[slurm-users] Rolling upgrade of compute nodes

Ümit Seren uemit.seren at gmail.com
Mon May 30 19:06:31 UTC 2022

We did a couple of major and minor SLURM upgrades without draining the compute nodes.
Once slurmdbd and slurmctld were updated to the new major version, we did a package update on the compute nodes and restarted slurmd on them.
The existing running jobs continued to run fine, and new jobs on the same compute nodes were started by the updated slurmd daemon and also worked fine.
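
The ordering described above (slurmdbd and slurmctld first, slurmd on the compute nodes afterwards) can be checked with a small sketch like the following. It assumes version strings of the form "20.11.8" and compares only the major release, on the understanding that slurmdbd should be at least as new as slurmctld, and slurmctld at least as new as slurmd. The function names are hypothetical and not part of any Slurm tooling.

```python
from typing import Tuple


def major_release(version: str) -> Tuple[int, int]:
    """Return the major release of a version string, e.g. '20.11.8' -> (20, 11)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def order_is_safe(slurmdbd: str, slurmctld: str, slurmd: str) -> bool:
    """True if slurmdbd >= slurmctld >= slurmd when compared by major release."""
    return (
        major_release(slurmdbd)
        >= major_release(slurmctld)
        >= major_release(slurmd)
    )


if __name__ == "__main__":
    # Controllers already upgraded, compute nodes still on the old release.
    print(order_is_safe("20.11.8", "20.11.8", "19.05.7"))
    # slurmctld upgraded before slurmdbd: the order described above is violated.
    print(order_is_safe("19.05.7", "20.11.8", "19.05.7"))
```

This only checks the ordering; it does not check how many major releases apart the daemons are allowed to be.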

So, for us this worked smoothly.


From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
Date: Monday, 30. May 2022 at 20:58
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Rolling upgrade of compute nodes
On 30-05-2022 19:34, Chris Samuel wrote:
> On 30/5/22 10:06 am, Chris Samuel wrote:
>> If you switch that symlink those jobs will pick up the 20.11 srun
>> binary and that's where you may come unstuck.
> Just to quickly fix that, srun talks to slurmctld (which would also be
> 20.11 for you), slurmctld will talk to the slurmd's running the job
> (which would be 19.05, so OK) but then the slurmd would try and launch a
> 20.11 slurmstepd and that is where I suspect things could come undone.

How about restarting all slurmd's at version 20.11 in one shot?  No
reboot will be required.  There will be running 19.05 slurmstepd's for
the running job steps, even though slurmd is at 20.11.  You could
perhaps restart 20.11 slurmd one partition at a time in order to see if
it works correctly on a small partition of the cluster.
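
A rough sketch of the "one partition at a time" idea, assuming you already have a mapping from partition name to node names (the partition and node names below are made up). It orders the partitions smallest first, skips nodes that were already covered by an earlier partition, and only prints the restart commands rather than running them. The ssh invocation is just one possible way to reach the nodes.

```python
from typing import Dict, List


def restart_plan(partitions: Dict[str, List[str]]) -> List[List[str]]:
    """Return batches of nodes, smallest partition first, each node only once."""
    seen = set()
    plan = []
    # Sort by partition size, then by name so the order is deterministic.
    for name in sorted(partitions, key=lambda p: (len(partitions[p]), p)):
        batch = [node for node in partitions[name] if node not in seen]
        seen.update(batch)
        if batch:
            plan.append(batch)
    return plan


def restart_commands(batch: List[str]) -> List[str]:
    """Build (but do not run) one restart command per node in the batch."""
    return [f"ssh {node} systemctl restart slurmd" for node in batch]


if __name__ == "__main__":
    # Hypothetical layout: node n1 belongs to both partitions.
    partitions = {"big": ["n1", "n2", "n3"], "test": ["n1"]}
    for batch in restart_plan(partitions):
        for command in restart_commands(batch):
            print(command)
        print("# check that jobs on this batch are still healthy before continuing")
```

Printing the commands first makes it easy to review the batches on a small partition before touching the rest of the cluster.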

I think we have done this successfully when installing new RPMs on *all*
compute nodes in one shot, and I'm not aware of any job crashes.  Your
mileage may vary depending on job types!

Question: Has anyone had bad experiences with upgrading slurmd while
the cluster is running in production?
