[slurm-users] Gracefully shutting down cluster

Will Dennis wdennis at nec-labs.com
Thu Oct 3 19:01:07 UTC 2019


Hi all,

I want to be able to gracefully shut down Slurm and then the node itself with a single command that affects the entire cluster. My current understanding is that I can set the “RebootProgram” parameter in slurm.conf to a command, and then trigger the shutdown via “scontrol reboot_nodes”, which will end up executing the “RebootProgram” command on all of the cluster nodes.

So, if I define the value of “RebootProgram” to be “/sbin/shutdown -h now” and then issue the “scontrol reboot_nodes” command, this should end up doing a graceful shutdown of Slurm on the nodes, and then shut down the hardware nodes, correct? When the nodes are rebooted, it is my expectation that they would come back up and be available, right? I ask because I don’t want to “reboot” the nodes, I want to “shutdown/halt” the nodes.
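
To make the plan concrete, here is a rough sketch of what I have in mind. It is untested, and it assumes the same slurm.conf is distributed to every node; whether it behaves the way I expect is exactly what I am asking.

```
# slurm.conf (assumed to be identical on all nodes)
# Use a halt command instead of a reboot command.
RebootProgram="/sbin/shutdown -h now"

# Then, from an admin shell on the controller:
scontrol reconfigure     # have the daemons re-read slurm.conf
scontrol reboot_nodes    # ask Slurm to run RebootProgram on the nodes
```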

My environment is Slurm 16.05.4 running on a collection of (very) hetero nodes, no cluster manager software in use.

Thanks!
Will

