[slurm-users] Rolling reboot with at most N machines down simultaneously?

Corentin Mercier corentin.a.mercier at inria.fr
Fri Aug 5 09:27:00 UTC 2022


Hello, 

I think you could use SLURM's power saving mechanism to shut down all your nodes simultaneously.
Then running srun -N<nb_nodes> -C <your_node_group> true (or any other small job) will wake up <nb_nodes> nodes simultaneously.
You can even run srun while your nodes are still powering down; SLURM will boot them again as soon as they have finished powering down.
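
To make this concrete, here is a rough sketch of the sequence, not something I have run on a real cluster. It assumes power saving (SuspendProgram and ResumeProgram) is already configured in slurm.conf. Triggering the power-down with scontrol is my assumption about how you would start it by hand. The node list node[01-08], the feature name mygroup, the count of 8, and the DRY_RUN switch are all made-up placeholders for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: NODES, FEATURE, COUNT and DRY_RUN are hypothetical placeholders.
# Assumes SuspendProgram/ResumeProgram are configured in slurm.conf.
set -eu

NODES="${NODES:-node[01-08]}"   # hypothetical node list to power down
FEATURE="${FEATURE:-mygroup}"   # hypothetical node feature used with -C
COUNT="${COUNT:-8}"             # number of nodes the small job asks for
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Ask SLURM to power the nodes down through its power saving mechanism.
power_down() {
    run scontrol update NodeName="$NODES" State=POWER_DOWN
}

# Submit a trivial job; SLURM has to power COUNT nodes back up to run it.
wake_up() {
    run srun -N "$COUNT" -C "$FEATURE" true
}

power_down
wake_up
```

With the default DRY_RUN=1 the script only prints the two commands, so you can check them before setting DRY_RUN=0 on a real cluster.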

I hope this helps!

Regards, 
C.Mercier 

