[slurm-users] Dynamic Node Shrinking/Expanding for Running Jobs in Slurm
Diego Zuccato
diego.zuccato at unibo.it
Wed Jun 28 14:15:44 UTC 2023
IIUC it's not possible to increase a job's resource usage once the job
has started: it would probably mess up the scheduler and MPI comms.
But I also think you're trying to find a problem for a "solution". Just
state the problem you're facing instead of proposing a solution :)
What software are you running? How does it detect that a resize is
needed? How would it handle the expansion?
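
For reference, shrinking is a different case from expanding: as far as I know, the scontrol man page and the FAQ describe reducing a running job's size with "scontrol update" and NumNodes=. Below is a minimal Python sketch that only builds that command line and does not run it. The helper name, job ID and node count are hypothetical, and whether the application survives losing nodes is a separate question.

```python
import shlex
from typing import List


def build_shrink_command(job_id: int, num_nodes: int) -> List[str]:
    """Build (but do not run) an scontrol call asking Slurm to reduce a job's node count.

    Assumption: the job is allowed to shrink; Slurm itself decides whether
    the update is accepted.
    """
    if job_id <= 0 or num_nodes <= 0:
        raise ValueError("job_id and num_nodes must be positive integers")
    return ["scontrol", "update", f"JobId={job_id}", f"NumNodes={num_nodes}"]


if __name__ == "__main__":
    # Hypothetical job ID and target size, for illustration only.
    print(shlex.join(build_shrink_command(12345, 2)))
```

The list form can be handed to subprocess.run() on a submit host if you actually want to issue the request.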
Diego
On 28/06/2023 13:02, Rahmanpour Koushki, Maysam wrote:
> Dear Slurm Mailing List,
>
>
> I hope this email finds you well. I am currently working on a project
> that requires the ability to dynamically shrink or expand nodes for
> running jobs in Slurm. However, I am facing some challenges and would
> greatly appreciate your assistance and expertise in finding a solution.
>
> In my research, I came across the following resources:
>
> 1. Slurm Advanced Usage Tutorial: I found a tutorial
>    (https://slurm.schedmd.com/slurm_ug_2011/Advanced_Usage_Tutorial.pdf)
>    that discusses advanced features of Slurm. It mentions the possibility
>    of assigning and deassigning nodes to a job, which is exactly what I
>    need. However, the tutorial refers to the FAQ for more detailed
>    information.
>
> 2. Stack Overflow Question: I also came across a related question on
>    Stack Overflow
>    (https://stackoverflow.com/questions/49398201/how-to-update-job-node-number-in-slurm)
>    that discusses updating the node number for a job in Slurm. The answer
>    suggests that it is indeed possible, but again, it refers to the FAQ
>    for further details.
>
> Upon reviewing the current FAQ, I found that it states node shrinking is
> only possible for pending jobs. Unfortunately, it does not provide
> additional information or examples to clarify if this functionality can
> be extended to running jobs.
>
> I would be grateful if anyone could provide insight into the following:
>
> 1. Is it possible to dynamically shrink or expand nodes for running
>    jobs in Slurm? If so, how can it be achieved?
>
> 2. Are there any alternative methods or workarounds to accomplish
>    dynamic node scaling for running jobs in Slurm?
>
> I kindly request your guidance, personal experiences, or any relevant
> resources that could shed light on this topic. Your expertise and
> assistance would greatly help me in successfully completing my project.
>
> Thank you in advance for your time and support.
>
> Best regards,
>
>
> Maysam
>
>
> Johannes Gutenberg University of Mainz
>
>
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786