[slurm-users] scontrol reboot does not allow new jobs to be scheduled if nextstate=RESUME is set

Wed Oct 25 11:39:04 UTC 2023

Hi Chris,

thanks a lot for your response.

I just realized that I made a mistake in my post. In the section you 
quoted, the command should have been "scontrol reboot nextstate=RESUME" 
(without ASAP).

So to clarify: my problem is that if I run "scontrol reboot 
nextstate=RESUME", no new jobs get scheduled on the node until it has 
rebooted. On the other hand, if I run "scontrol reboot", jobs continue 
to get scheduled, which is what I want. I just don't understand why 
setting nextstate results in the nodes not accepting jobs anymore.
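
For reference, the two invocations I am comparing can be written out as below. This is only a minimal dry-run sketch: the node name "node001" and the "run" helper are placeholders of mine, not anything Slurm provides, and by default the script just prints the commands instead of executing them.

```shell
#!/bin/sh
# NODE is a hypothetical node name; replace it with a real one.
NODE="${NODE:-node001}"

# Dry run by default: print the command instead of executing it.
# Set DRY_RUN=0 to actually run the commands on a cluster.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Variant 1: plain reboot. In my tests, jobs keep getting scheduled
# on the node until it reboots.
run scontrol reboot "$NODE"

# Variant 2: reboot with nextstate=RESUME. In my tests, no new jobs
# get scheduled on the node until it has rebooted.
run scontrol reboot nextstate=RESUME "$NODE"

# Inspect the node state after either variant.
run scontrol show node "$NODE"
```

In practice only one of the two variants would be issued for a given node; they are listed together here purely for comparison.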

My use case is similar to the one you describe. We use the ASAP option 
when we install a new image to ensure that from the point of the 
reinstallation onwards, all jobs end up on nodes with the new 
configuration only. However, in some cases when we make only minor changes 
to the image configuration, we prefer to cause as little disruption as 
possible and just reinstall the nodes whenever they are idle. Here, 
being able to set nextstate=RESUME is useful, since we usually want the 
nodes to resume after reinstallation, no matter what their previous 
state was.
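
Roughly, the two workflows look like the sketch below. The node list, the reason strings and the "reboot_cmd" helper are hypothetical placeholders of mine; the function only prints the command it would use, so nothing is rebooted by running it.

```shell
#!/bin/sh
# Sketch only: NODES and the reason strings are hypothetical placeholders.
NODES="${NODES:-node[001-004]}"

# Print the scontrol command we would use for a given kind of image change.
reboot_cmd() {
    case "$1" in
        major)
            # New image: keep new jobs off the nodes until they have rebooted.
            echo "scontrol reboot ASAP nextstate=RESUME reason=new-image $NODES"
            ;;
        minor)
            # Minor change: intended to reboot nodes once they are idle
            # and have them resume afterwards.
            echo "scontrol reboot nextstate=RESUME reason=minor-update $NODES"
            ;;
        *)
            echo "usage: reboot_cmd major|minor" >&2
            return 1
            ;;
    esac
}

reboot_cmd major
reboot_cmd minor
```

The "minor" case is the one where I currently see jobs no longer being scheduled.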

Hope that clears it up and sorry for the confusion!

Best,

tim

On 25.10.23 02:10, Christopher Samuel wrote:
> On 10/24/23 12:39, Tim Schneider wrote:
>
>> Now my issue is that when I run "scontrol reboot ASAP nextstate=RESUME
>> <node>", the node goes in "mix@" state (not drain), but no new jobs get
>> scheduled until the node reboots. Essentially I get draining behavior,
>> even though the node's state is not "drain". Note that this behavior is
>> caused by "nextstate=RESUME"; if I leave that out, jobs get scheduled
>> as expected. Does anyone have an idea why that could be?
> The intent of the "ASAP" flag for "scontrol reboot" is to not let any
> more jobs onto a node until it has rebooted.
>
> IIRC that was from work we sponsored, the idea being that (for how our
> nodes are managed) we would build new images with the latest software
> stack, test them on a separate test system and then once happy bring
> them over to the production system and do an "scontrol reboot ASAP
> nextstate=resume reason=... $NODES" to ensure that from that point
> onwards no new jobs would start in the old software configuration, only
> the new one.
>
> Also slurmctld would know that these nodes are due to come back in
> "ResumeTimeout" seconds after the reboot is issued and so could plan for
> them as part of scheduling large jobs, rather than thinking there was no
> way it could do so and letting lots of smaller jobs get in the way.
>
> Hope that helps!
>
> All the best,
> Chris