[slurm-users] Restart Job after sudden reboot of the node
chris at csamuel.org
Sat Jul 25 01:30:25 UTC 2020
On 7/24/20 12:28 pm, Saikat Roy wrote:
> If SLURM restarts automatically, is there any way to stop it?
If you would rather Slurm not start scheduling jobs when it is restarted,
you can set your partitions to `State=DOWN` in slurm.conf.
That way, should the node running slurmctld reboot, it won't start
scheduling jobs until you tell it to.
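A minimal sketch of what that looks like in slurm.conf. The partition name `batch`, the node list and the other partition options are hypothetical placeholders. With a partition in the DOWN state, jobs can still be queued but will not be allocated nodes until the partition is set back to UP, which you can do with `scontrol update` once you are happy to resume scheduling:

```
# slurm.conf (fragment), hypothetical partition and node names
# State=DOWN: jobs may still be submitted and queued, but none are
# started on this partition until an administrator sets it UP.
PartitionName=batch Nodes=node[01-16] Default=YES MaxTime=INFINITE State=DOWN

# When you are ready for scheduling to resume, from the command line:
#   scontrol update PartitionName=batch State=UP
```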
For compute nodes, I believe Slurm should detect any node that reboots
unexpectedly and mark it "DOWN" with the reason set to "Node unexpectedly rebooted".
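Whether such a node stays DOWN is governed by the `ReturnToService` option in slurm.conf. The fragment below is a sketch based on the option as described in the slurm.conf man page; it is worth checking the man page for your Slurm version. The node name `node01` is a hypothetical placeholder:

```
# slurm.conf (fragment)
# ReturnToService controls whether a DOWN node comes back on its own
# when its slurmd re-registers with slurmctld:
#   0 = stays DOWN until an administrator changes its state (the default)
#   1 = returns automatically only if it was set DOWN for being
#       non-responsive; "Node unexpectedly rebooted" nodes stay DOWN
#   2 = returns automatically whatever the reason it was set DOWN
ReturnToService=1

# List DOWN/drained nodes together with their reasons:
#   sinfo -R
# Return a node to service once you have checked it:
#   scontrol update NodeName=node01 State=RESUME
```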
All the best,
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA