[slurm-users] update node config while jobs are running

Andy Georges andy.georges at ugent.be
Tue Mar 10 11:03:18 UTC 2020


On Tue, Mar 10, 2020 at 05:49:07AM +0000, Rundall, Jacob D wrote:
> I need to update the configuration for the nodes in a cluster and I’d like to let jobs keep running while I do so. Specifically I need to add RealMemory=<blah> to the node definitions (NodeName=). Is it safe to do this for nodes where jobs are currently running? Or do I need to make sure nodes are drained while updating their config? We are using SelectType=select/linear on this cluster. Users would only be allocating complete nodes.
> Additionally, do I need to restart the Slurm daemons (slurmctld and slurmd) to make this change? I understand if I were adding completely new nodes I would need to do so (and that it’s advised to stop slurmctld, update config files, restart slurmd on all computes, and then start slurmctld). But is restarting the Slurm daemons also required when updating node config as I would like to do, or would ‘scontrol reconfigure’ suffice?

If you want the change to be persistent, you will need to update
slurm.conf (and/or other files in /etc/slurm).
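As an illustration of that slurm.conf edit, here is a minimal Python sketch. It assumes a simple slurm.conf in which each NodeName= definition sits on a single line (no backslash continuations). `add_real_memory` is a hypothetical helper and is not part of Slurm. The node names and the 191000 MB value are illustrative only. In practice the RealMemory value (in megabytes) would usually come from running `slurmd -C` on the node, which prints its detected hardware in slurm.conf format.

```python
def add_real_memory(conf_text: str, real_memory_mb: int) -> str:
    """Append RealMemory=<MB> to NodeName= lines that do not define it yet.

    Hypothetical helper (not part of Slurm). Assumes one node definition
    per line, i.e. no backslash line continuations.
    """
    out = []
    for line in conf_text.splitlines():
        # Keep any trailing '#' comment separate so the new key is not commented out.
        code, hash_mark, comment = line.partition("#")
        tokens = code.split()
        is_node_def = bool(tokens) and tokens[0].lower().startswith("nodename=")
        has_memory = any(t.lower().startswith("realmemory=") for t in tokens)
        if is_node_def and not has_memory:
            code = code.rstrip() + f" RealMemory={real_memory_mb}"
            if hash_mark:
                code += " "
            line = code + hash_mark + comment
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if conf_text.endswith("\n") else "")


conf = "NodeName=node[01-04] CPUs=32 State=UNKNOWN\nPartitionName=all Nodes=node[01-04]\n"
print(add_real_memory(conf, 191000), end="")
# NodeName=node[01-04] CPUs=32 State=UNKNOWN RealMemory=191000
# PartitionName=all Nodes=node[01-04]
```

Because the helper skips lines that already contain RealMemory=, running it a second time is harmless. You still have to distribute the edited file to every node, because Slurm expects slurm.conf to be consistent across the cluster.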

That said, scontrol reconfigure should suffice to get the running slurmd
daemons to pick up the change. Restarting slurmd and slurmctld is no big
deal either, as far as I know, provided each restart completes within the
timeouts you've set (SlurmctldTimeout, SlurmdTimeout). When slurmd
restarts, it sees the jobs still running on its node. When slurmctld
restarts, it polls the nodes for information and regains knowledge of the
running jobs. So it is no issue to do this live. I would first restart
slurmctld and then all the slurmds (once slurmctld is back up and running
properly).
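Which timeouts matter is an assumption here. The most likely candidates are SlurmdTimeout, which is how long slurmctld waits for a slurmd to respond before marking the node DOWN, and SlurmctldTimeout, which is how long a backup controller waits before taking over. `scontrol show config` reports the values currently in effect. The minimal Python sketch below reads the same two values from slurm.conf text so you can check them before planning a restart. `restart_timeouts` is a hypothetical helper. Its defaults are the ones documented in slurm.conf(5), and they may differ between Slurm versions.

```python
# Defaults as documented in slurm.conf(5); verify against your Slurm version.
DEFAULT_TIMEOUTS = {"SlurmctldTimeout": 120, "SlurmdTimeout": 300}


def restart_timeouts(conf_text: str) -> dict:
    """Return SlurmctldTimeout and SlurmdTimeout (seconds) from slurm.conf text.

    Hypothetical helper: ignores Include files and assumes plain integer
    values. Parameter names are matched case-insensitively, as in slurm.conf.
    """
    timeouts = dict(DEFAULT_TIMEOUTS)
    canonical = {name.lower(): name for name in timeouts}
    for line in conf_text.splitlines():
        code = line.split("#", 1)[0]  # drop comments
        for token in code.split():
            key, sep, value = token.partition("=")
            if sep and key.lower() in canonical:
                timeouts[canonical[key.lower()]] = int(value)
    return timeouts


print(restart_timeouts("SlurmdTimeout=600\n#SlurmctldTimeout=30\n"))
# {'SlurmctldTimeout': 120, 'SlurmdTimeout': 600}
```

A slurmd restart that finishes well within SlurmdTimeout should not cause the node to be marked DOWN.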

-- Andy