[slurm-users] scontrol reboot does not allow new jobs to be scheduled if nextstate=RESUME is set
Tim Schneider
tim.schneider1 at tu-darmstadt.de
Tue Oct 24 19:39:46 UTC 2023
Hi,
From my understanding, if I run "scontrol reboot <node>", the node
should continue to operate as usual and reboot once it is idle. When
adding the ASAP flag ("scontrol reboot ASAP <node>"), the node should
go into drain state and not accept any new jobs.
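For reference, this is how I would summarize the two modes (node name
is just a placeholder for illustration):

# Reboot once idle; the node keeps accepting jobs in the meantime
scontrol reboot node01
# Drain first: no new jobs, reboot as soon as the running jobs finish
scontrol reboot ASAP node01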
Now my issue is that when I run "scontrol reboot nextstate=RESUME
<node>", the node goes into the "mix@" state (not drain), but no new
jobs get scheduled until the node reboots. Essentially I get draining
behavior, even though the node's state is not "drain". Note that this
behavior is caused by "nextstate=RESUME"; if I omit it, jobs get
scheduled as expected. Does anyone have an idea why that could be?
I am running slurm 22.05.9.
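In case it helps, this is how I inspect the node while the reboot is
pending ("scontrol show node" is standard; the grep is only there to
shorten the output):

# Show the node's State field and any flags (e.g. DRAIN, REBOOT_REQUESTED)
scontrol show node <node> | grep -i state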
Steps to reproduce:
# To prevent node from rebooting immediately
sbatch -t 1:00:00 -c 1 --mem-per-cpu 1G -w <node> ./long_running_script.sh
# Request reboot
scontrol reboot nextstate=RESUME <node>
# Run an interactive command; it does not start until
# "scontrol cancel_reboot <node>" is executed in another shell
srun -t 1:00:00 -c 1 --mem-per-cpu 1G -w <node> --pty bash
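To unblock the srun above, I run the following in a second shell (as
mentioned in the comment):

# Cancel the pending reboot; the srun then starts immediately
scontrol cancel_reboot <node>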
Thanks a lot in advance!
Best,
Tim