[slurm-users] Nodes do not return to service after scontrol reboot

David Baker D.J.Baker at soton.ac.uk
Thu Jun 18 06:32:36 UTC 2020


Hello Chris,

Thank you for your comments. The scontrol reboot command is now working as expected.

Best regards,
David

________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Christopher Samuel <chris at csamuel.org>
Sent: 16 June 2020 18:16
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Nodes do not return to service after scontrol reboot

On 6/16/20 8:16 am, David Baker wrote:

> We are running Slurm v19.05.5 and I am experimenting with the scontrol
> reboot command. I find that compute nodes reboot, but they are not
> returned to service. Rather, they remain down following the reboot.

How are you using "scontrol reboot"?

We do:

scontrol reboot ASAP nextstate=resume reason=$REASON $NODE

Which works for us (and we have health checks in our epilog that can
trigger this for known issues like running low on unfragmented huge pages).
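
As a rough illustration of the command above, here is a minimal dry-run sketch. The helper name reboot_cmd and the node name node001 are hypothetical and not from this thread; it only prints the scontrol invocation rather than running it, and it assumes a RebootProgram is already configured in slurm.conf.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not part of the original thread):
# print the scontrol command that would reboot a node and ask for it to
# be returned to service afterwards.
# Usage: reboot_cmd NODE REASON
reboot_cmd() {
    node=$1
    reason=$2
    # ASAP drains the node so it can reboot once its running jobs finish;
    # nextstate=resume asks for the node to be resumed after the reboot
    # instead of being left down.
    printf 'scontrol reboot ASAP nextstate=resume reason="%s" %s\n' \
        "$reason" "$node"
}

# Dry run with placeholder values: prints the command, does not execute it.
reboot_cmd node001 "low on huge pages"
```

Printing the command first makes it easy to check the node list and reason before anything is rebooted; once it looks right, the printed line can be run by hand.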

All the best,
Chris
--
   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
