[slurm-users] Dynamic Node Shrinking/Expanding for Running Jobs in Slurm

Rahmanpour Koushki, Maysam mrahmanp at uni-mainz.de
Thu Jun 29 08:46:33 UTC 2023

Thank you for the responses.

In response to some of the suggestions, I would like to provide further details on my specific use case. I am currently exploring malleable jobs, that is, jobs that can adapt their computing resources at runtime.

To tackle the MPI incompatibility issue associated with malleable jobs, there are solutions such as Flex-MPI, which extends MPI to support resource adaptivity at runtime. Furthermore, there are scheduling algorithms tailored to malleable jobs. These algorithms aim to allocate resources efficiently and optimize job scheduling, given that a malleable job's resource set can change while it runs.

My primary objective is to understand how Slurm can effectively support malleable jobs, so I am investigating how Slurm can expand and shrink the set of nodes allocated to a job at runtime.

Best Regards


From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Diego Zuccato <diego.zuccato at unibo.it>
Sent: Wednesday, June 28, 2023 4:15:44 PM
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] Dynamic Node Shrinking/Expanding for Running Jobs in Slurm

IIUC it's not possible to increase resource usage once the job is
started: it would mess up the scheduler and MPI comms (probably).

But I also think you're trying to find a problem for a "solution". Just
state the problem you're facing instead of proposing a solution :)
What software are you running? How does it detect that a resize is
needed? How would it handle the expansion?


On 28/06/2023 13:02, Rahmanpour Koushki, Maysam wrote:
> Dear Slurm Mailing List,
> I hope this email finds you well. I am currently working on a project
> that requires the ability to dynamically shrink or expand nodes for
> running jobs in Slurm. However, I am facing some challenges and would
> greatly appreciate your assistance and expertise in finding a solution.
> In my research, I came across the following resources:
>  1. Slurm Advanced Usage Tutorial: I found a tutorial
>     (https://slurm.schedmd.com/slurm_ug_2011/Advanced_Usage_Tutorial.pdf)
>     that discusses advanced features of Slurm. It mentions the
>     possibility of assigning and deassigning nodes to a job, which is
>     exactly what I need. However, the tutorial refers to the FAQ for
>     more detailed information.
>  2. Stack Overflow Question: I also came across a related question on
>     Stack Overflow
>     (https://stackoverflow.com/questions/49398201/how-to-update-job-node-number-in-slurm)
>     that discusses updating the node number for a job in Slurm. The
>     answer suggests that it is indeed possible, but again, it refers
>     to the FAQ for further details.
> Upon reviewing the current FAQ, I found that it states node shrinking is
> only possible for pending jobs. Unfortunately, it does not provide
> additional information or examples to clarify if this functionality can
> be extended to running jobs.
> I would be grateful if anyone could provide insight into the following:
>  1. Is it possible to dynamically shrink or expand nodes for running
>     jobs in Slurm? If so, how can it be achieved?
>  2. Are there any alternative methods or workarounds to accomplish
>     dynamic node scaling for running jobs in Slurm?
> I kindly request your guidance, personal experiences, or any relevant
> resources that could shed light on this topic. Your expertise and
> assistance would greatly help me in successfully completing my project.
> Thank you in advance for your time and support.
> Best regards,
> Maysam
> Johannes Gutenberg University of Mainz

Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
