[slurm-users] Reservation vs. Draining for Maintenance?

Thu Aug 6 22:55:58 UTC 2020

On 8/6/20 10:13 am, Jason Simms wrote:

> Later this month, I will have to bring down, patch, and reboot all nodes 
> in our cluster for maintenance. The two options available to set nodes 
> into a maintenance mode seem to be either: 1) creating a system-wide 
> reservation, or 2) setting all nodes into a DRAIN state.

We use both. :-)

For a system-wide outage we put reservations in place in advance, so 
that the system is already drained when the maintenance window 
starts.
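
As a rough sketch of the reservation approach, a maintenance reservation 
over all nodes could be created along these lines. The reservation name, 
start time and duration are hypothetical placeholders, and the command is 
only printed so that it can be reviewed before being run on a real 
cluster.

```shell
# Hypothetical values; adjust the name, start time and length for your site.
RES_NAME="maint_window"
START="2020-08-20T08:00:00"
DURATION_MIN=480   # duration in minutes

# Build the command as a string so it can be reviewed before it is run.
# flags=maint marks the reservation as maintenance; ignore_jobs lets it be
# created even if running jobs overlap the window.
RES_CMD="scontrol create reservation reservationname=${RES_NAME} starttime=${START} duration=${DURATION_MIN} users=root flags=maint,ignore_jobs nodes=ALL"
echo "$RES_CMD"
```

Jobs whose time limit would run into the reservation are not started on 
the reserved nodes, which is what empties the system ahead of the window.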

But for rolling upgrades we will build a new image, set nodes to use it 
and then do something like:

scontrol reboot ASAP nextstate=resume reason="Rolling upgrade" [nodes]

That lets running jobs complete while the nodes drain. Once a node is 
idle it reboots into the new image, and it resumes itself when it is 
back up and slurmd has started and checked in.

We use the same mechanism when we need to reboot nodes for other 
maintenance activities, say when huge pages are too fragmented and the 
only way to reclaim them is to reboot the node (these checks happen in 
the node epilog).
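
A minimal sketch of what such an epilog check might look like follows. 
It is not the check described above: the threshold MIN_HUGEPAGES, the 
helper names and the idea of comparing HugePages_Total against an 
expected count are all assumptions for illustration. The reboot request 
is echoed rather than executed.

```shell
# Hypothetical epilog fragment; the threshold and the meminfo-based test
# are assumptions, not the site's actual check.
MIN_HUGEPAGES=${MIN_HUGEPAGES:-1024}   # hypothetical site-specific minimum

# Print the HugePages_Total value from a meminfo-style file.
hugepages_total() {
    awk '/^HugePages_Total:/ {print $2}' "${1:-/proc/meminfo}"
}

# Succeed (exit 0) when the node has fewer huge pages than the minimum.
needs_reboot() {
    total=$(hugepages_total "$1")
    [ "${total:-0}" -lt "$MIN_HUGEPAGES" ]
}

# Echo the reboot request for the given node; remove "echo" to submit it.
request_reboot() {
    echo scontrol reboot ASAP nextstate=resume reason=hugepages_fragmented "$1"
}

# Example use inside an epilog (slurmd sets SLURMD_NODENAME):
# needs_reboot /proc/meminfo && request_reboot "$SLURMD_NODENAME"
```

Keeping the check and the request in separate functions makes it easy to 
test the check against a saved copy of /proc/meminfo before wiring it 
into the epilog.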

We paid for enhancements to Slurm 18.08 to ensure that slurmctld took 
these node states into account when scheduling jobs so that large jobs 
(as in requiring most of the nodes in the system) do not lose their 
scheduling window when a node has to be rebooted for this reason.

All the best,
Chris
-- 
   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA