[slurm-users] Nodes do not return to service after scontrol reboot

Christopher Samuel chris at csamuel.org
Tue Jun 16 17:16:19 UTC 2020


On 6/16/20 8:16 am, David Baker wrote:

> We are running Slurm v19.05.5 and I am experimenting with the *scontrol 
> reboot* command. I find that compute nodes reboot, but they are not 
> returned to service. Rather, they remain down following the reboot.

How are you using "scontrol reboot"?

We do:

scontrol reboot ASAP nextstate=resume reason=$REASON $NODE

That works for us, and we also have health checks in our epilog that can 
trigger it for known issues such as running low on unfragmented huge pages.
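
For illustration only, here is a minimal sketch of how that command could be 
wrapped in a small shell helper. The function name reboot_node and the 
DRY_RUN variable are hypothetical, not something Slurm provides; the only 
real piece is the scontrol invocation itself, with the same arguments as 
above. With nextstate=resume the node is asked to return to service once it 
is back, rather than being left down.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): reboot one node as soon as it is
# idle and ask Slurm to resume it afterwards.
#   $1 = node name, $2 = reason string
# Set DRY_RUN=1 to print the command instead of running it.
reboot_node() {
    node=$1
    reason=$2
    # Same arguments as the command shown above.
    set -- scontrol reboot ASAP nextstate=resume "reason=$reason" "$node"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Calling it with DRY_RUN=1 first is a cheap way to check what would be sent 
to scontrol before letting a health check invoke it for real.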

All the best,
Chris
-- 
   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA


