[slurm-users] srun --reboot option is not working

MrBr @ GMail mrbr.mail at gmail.com
Mon Mar 9 11:38:11 UTC 2020


Hi all

I'm trying to use the --reboot option of srun to reboot the nodes before
allocation.
However, the nodes are not being rebooted.

The nodes get stuck in the "allocated#" state as shown by sinfo, or in "CF"
as shown by squeue.
The slurmctld and slurmd logs show no relevant information, even with debug
levels set to "debug5".
Eventually the nodes go "down" due to "ResumeTimeout reached".
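For reference, here is a minimal Python sketch of how the state strings above decode. It assumes the suffix meanings listed under NODE STATE CODES in the sinfo man page and the compact job state codes from the squeue man page. The tables are deliberately partial, and the names decode_node_state, SINFO_SUFFIXES and SQUEUE_CODES are hypothetical helpers, not part of Slurm.

```python
from typing import List, Tuple

# Partial table of suffix flags that sinfo appends to a node state
# (assumed from the sinfo man page, NODE STATE CODES).
SINFO_SUFFIXES = {
    "*": "not responding",
    "~": "powered off (power saving)",
    "#": "being powered up or configured",
    "%": "being powered down",
    "$": "in a maintenance reservation",
    "@": "pending reboot",
}

# Partial table of squeue compact job state codes relevant to this thread.
SQUEUE_CODES = {
    "PD": "PENDING",
    "CF": "CONFIGURING",  # resources allocated, waiting for them to become ready
    "R": "RUNNING",
}


def decode_node_state(state: str) -> Tuple[str, List[str]]:
    """Split an sinfo state such as 'allocated#' into base state and flag meanings."""
    flags = []
    base = state
    # Strip trailing suffix characters one at a time, right to left.
    while base and base[-1] in SINFO_SUFFIXES:
        flags.append(SINFO_SUFFIXES[base[-1]])
        base = base[:-1]
    flags.reverse()  # report flags in the order they appear in the string
    return base, flags


if __name__ == "__main__":
    print(decode_node_state("allocated#"))
    # prints ('allocated', ['being powered up or configured'])
    print(SQUEUE_CODES["CF"])
```

Read this way, "allocated#" together with "CF" means the job holds the nodes while Slurm waits for them to finish booting or configuring. That fits the eventual "ResumeTimeout reached".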

The strangest thing is that "scontrol reboot <nodename>" works without any
issues.
AFAIK both commands rely on the same RebootProgram.
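One possible investigation step is to confirm which RebootProgram and ResumeTimeout the controller is actually using. The minimal Python sketch below parses the "Key = Value" lines printed by `scontrol show config`. The helper names (parse_show_config, reboot_settings, read_live_config) are hypothetical, and the SAMPLE text and its values are made up for illustration only.

```python
import subprocess
from typing import Dict, Optional

# Made-up excerpt of `scontrol show config` output, for illustration only.
SAMPLE = """Configuration data as of 2020-03-09T11:38:11
RebootProgram           = /usr/sbin/reboot
ResumeTimeout           = 60 sec
"""


def parse_show_config(text: str) -> Dict[str, str]:
    """Parse 'Key = Value' lines; lines without '=' (e.g. the header) are skipped."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            continue
        config[key.strip()] = value.strip()
    return config


def reboot_settings(config: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Pick out the two settings mentioned in this thread (None if absent)."""
    return {k: config.get(k) for k in ("RebootProgram", "ResumeTimeout")}


def read_live_config() -> str:
    """Run `scontrol show config` on a host with Slurm installed (not called below)."""
    result = subprocess.run(
        ["scontrol", "show", "config"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(reboot_settings(parse_show_config(SAMPLE)))
```

On a real cluster, parse_show_config(read_live_config()) would replace the sample text.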

The srun documentation contains the following statement: "This is only
supported with some system configurations and will otherwise be silently
ignored".
Maybe I have one of these "unsupported" configurations?

Does anyone have a suggestion regarding the root cause of this behavior, or a
possible investigation path?

Tech data:
Slurm 19.05
The user who runs srun is an admin, although that is not required in
19.05.