[slurm-users] Dynamic Node Shrinking/Expanding for Running Jobs in Slurm

Diego Zuccato diego.zuccato at unibo.it
Wed Jun 28 14:15:44 UTC 2023


IIUC it's not possible to increase resource usage once the job has 
started: it would mess up the scheduler and MPI communication (probably).

But I also think you're trying to find a problem for a "solution". Just 
state the problem you're facing instead of proposing a solution :)
What software are you running? How does it detect that a resize is 
needed? How would it handle the expansion?

Diego

On 28/06/2023 13:02, Rahmanpour Koushki, Maysam wrote:
> Dear Slurm Mailing List,
>
> I hope this email finds you well. I am currently working on a project 
> that requires the ability to dynamically shrink or expand the set of 
> nodes allocated to running jobs in Slurm. However, I am facing some 
> challenges and would greatly appreciate your assistance and expertise 
> in finding a solution.
> 
> In my research, I came across the following resources:
> 
>  1. Slurm Advanced Usage Tutorial: I found a tutorial
>     (https://slurm.schedmd.com/slurm_ug_2011/Advanced_Usage_Tutorial.pdf)
>     that discusses advanced features of Slurm. It mentions the
>     possibility of adding nodes to and removing nodes from a job, which
>     is exactly what I need. However, the tutorial refers to the FAQ for
>     more detailed information.
> 
>  2. Stack Overflow question: I also came across a related question on
>     Stack Overflow
>     (https://stackoverflow.com/questions/49398201/how-to-update-job-node-number-in-slurm)
>     that discusses updating the number of nodes for a job in Slurm. The
>     answer suggests that this is indeed possible, but again, it refers
>     to the FAQ for further details.
> 
> Upon reviewing the current FAQ, I found that it states node shrinking is 
> only possible for pending jobs. Unfortunately, it does not provide 
> additional information or examples to clarify whether this functionality 
> can be extended to running jobs.
> 
> I would be grateful if anyone could provide insight into the following:
> 
>  1. Is it possible to dynamically shrink or expand nodes for running
>     jobs in Slurm? If so, how can it be achieved?
> 
>  2. Are there any alternative methods or workarounds to accomplish
>     dynamic node scaling for running jobs in Slurm?
> 
> I kindly request your guidance, personal experiences, or any relevant 
> resources that could shed light on this topic. Your expertise and 
> assistance would greatly help me in successfully completing my project.
> 
> Thank you in advance for your time and support.
> 
> Best regards,
>
> Maysam
>
> Johannes Gutenberg University of Mainz
>

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
