[slurm-users] [ext] Dynamic Node Shrinking/Expanding for Running Jobs in Slurm

Hagdorn, Magnus Karl Moritz magnus.hagdorn at charite.de
Wed Jun 28 14:39:34 UTC 2023


Hi Maysam,
You need to describe your job in a little more detail. In the past I
have used a taskfarm approach [1], with the worker jobs submitted to
the cluster as a job array. This way the system could grow and shrink
depending on the tasks and compute nodes available.
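
To illustrate the general idea, here is a minimal sketch of such a worker. It is not the taskfarm code from [1]. It assumes a queue directory on a shared filesystem where rename is atomic, and the directory layout, the TASK_QUEUE_DIR variable and the function names are all made up for this example. Each array element runs the same script and pulls tasks until none are left, so the number of workers running at any moment can vary freely.

```python
import os
import tempfile
from pathlib import Path


def claim_next_task(todo: Path, running: Path, worker_id: str):
    """Claim one task file by renaming it; return the claimed path or None."""
    for task in sorted(todo.iterdir()):
        target = running / f"{task.name}.{worker_id}"
        try:
            # rename is atomic on a single POSIX filesystem (assumption:
            # the shared filesystem in use preserves this guarantee)
            os.rename(task, target)
        except FileNotFoundError:
            continue  # another worker claimed this task first
        return target
    return None


def run_worker(queue_dir: Path, worker_id: str) -> int:
    """Process tasks until the queue is empty; return how many were completed."""
    todo, running, done = (queue_dir / d for d in ("todo", "running", "done"))
    for d in (todo, running, done):
        d.mkdir(parents=True, exist_ok=True)
    completed = 0
    while (claimed := claim_next_task(todo, running, worker_id)) is not None:
        payload = claimed.read_text()
        result = payload.upper()  # placeholder for the real computation
        (done / claimed.name).write_text(result)
        claimed.unlink()
        completed += 1
    return completed


if __name__ == "__main__":
    # SLURM_ARRAY_TASK_ID is set by Slurm for job array elements;
    # TASK_QUEUE_DIR is a hypothetical variable used only in this sketch.
    worker_id = os.environ.get("SLURM_ARRAY_TASK_ID", "0")
    queue = Path(os.environ.get("TASK_QUEUE_DIR", tempfile.mkdtemp()))
    print(run_worker(queue, worker_id))
```

A script like this could be submitted as a job array with sbatch --array, and each element then starts whenever the scheduler finds room for it. Workers that start late simply find fewer tasks left, and a worker that finds the queue empty exits and frees its node.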
Regards
Magnus

[1] http://doi.org/10.5334/jors.393


On Wed, 2023-06-28 at 11:02 +0000, Rahmanpour Koushki, Maysam wrote:
> Dear Slurm Mailing List,
> 
> I hope this email finds you well. I am currently working on a project
> that requires the ability to dynamically shrink or expand nodes for
> running jobs in Slurm. However, I am facing some challenges and would
> greatly appreciate your assistance and expertise in finding a
> solution.
> In my research, I came across the following resources:
>    1. Slurm Advanced Usage Tutorial: I found a tutorial
> (https://slurm.schedmd.com/slurm_ug_2011/Advanced_Usage_Tutorial.pdf)
> that discusses advanced features of Slurm. It mentions the
> possibility of assigning and deassigning nodes to a job, which is
> exactly what I need. However, the tutorial refers to the FAQ for more
> detailed information.
>    2. Stack Overflow Question: I also came across a related question
> on Stack Overflow
> (https://stackoverflow.com/questions/49398201/how-to-update-job-node-number-in-slurm)
> that discusses updating the node number for a job in
> Slurm. The answer suggests that it is indeed possible, but again, it
> refers to the FAQ for further details.
> Upon reviewing the current FAQ, I found that it states node shrinking
> is only possible for pending jobs. Unfortunately, it does not provide
> additional information or examples to clarify if this functionality
> can be extended to running jobs.
> I would be grateful if anyone could provide insight into the
> following:
>    1. Is it possible to dynamically shrink or expand nodes for
> running jobs in Slurm? If so, how can it be achieved?
>    2. Are there any alternative methods or workarounds to accomplish
> dynamic node scaling for running jobs in Slurm?
> I kindly request your guidance, personal experiences, or any relevant
> resources that could shed light on this topic. Your expertise and
> assistance would greatly help me in successfully completing my
> project.
> Thank you in advance for your time and support. 
> Best regards,
> 
> Maysam
> 
> Johannes Gutenberg University of Mainz
> 

-- 
Magnus Hagdorn
Charité – Universitätsmedizin Berlin
Geschäftsbereich IT | Scientific Computing
 
Campus Charité Virchow Klinikum
Forum 4 | Ebene 02 | Raum 2.020
Augustenburger Platz 1
13353 Berlin
 
magnus.hagdorn at charite.de
https://www.charite.de
HPC Helpdesk: sc-hpc-helpdesk at charite.de