[slurm-users] Rolling upgrade of compute nodes
Stephan Roth
stephan.roth at ee.ethz.ch
Mon May 30 06:54:20 UTC 2022
Hi Byron,
If you have the means to set up a test environment to try the upgrade
first, I recommend doing it.
The upgrade from 19.05 to 20.11 worked for two clusters I maintain with
a similar NFS-based setup, except that we keep the Slurm configuration
separate from the Slurm software shared over NFS.
Since this stays within the supported limit of two major releases, it
should work well if you restart the Slurm daemons in the recommended
order (see https://slurm.schedmd.com/SLUG19/Field_Notes_3.pdf) after
switching the soft link to 20.11 (a rough sketch follows the list):
1. slurmdbd
2. slurmctld
3. individual slurmd on your nodes
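Roughly, assuming systemd-managed daemons; the paths, host names and
node names below are only placeholders for whatever your site uses:

  # Point the shared soft link at the new release:
  ln -sfn /opt/slurm/slurm-20.11 /opt/slurm/default

  # Restart in the recommended order:
  ssh dbdhost  'systemctl restart slurmdbd'
  ssh ctldhost 'systemctl restart slurmctld'

  # Then roll over the compute nodes one at a time:
  for node in node01 node02 node03; do
      ssh "$node" 'systemctl restart slurmd'
  done

If you want the fallback described below, replace the plain restarts of
slurmdbd and slurmctld with the stop/backup/start sequence in the next
sketch.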
To be able to revert to 19.05, dump the accounting database between
stopping and starting slurmdbd, and back up StateSaveLocation between
stopping and restarting slurmctld.
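Something like this, assuming a MySQL/MariaDB accounting backend; the
database name, paths and host names are placeholders:

  ssh dbdhost  'systemctl stop slurmdbd'
  ssh dbdhost  'mysqldump slurm_acct_db > /backup/slurm_acct_db-19.05.sql'
  ssh dbdhost  'systemctl start slurmdbd'   # first 20.11 start converts the DB

  ssh ctldhost 'systemctl stop slurmctld'
  ssh ctldhost 'tar czf /backup/statesave-19.05.tar.gz -C /var/spool slurmctld'
  ssh ctldhost 'systemctl start slurmctld'

Note that the database conversion on the first start of the new
slurmdbd can take a while on large databases, so don't be alarmed if it
doesn't respond immediately.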
The slurmstepd processes of running jobs will continue to run under
19.05 after the slurmd daemons are restarted.
Check individual slurmd.log files for problems.
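A quick way to spot trouble across nodes (placeholder node names again;
adjust the log path to your SlurmdLogFile setting):

  for node in node01 node02 node03; do
      echo "== $node =="
      ssh "$node" 'grep -i error /var/log/slurm/slurmd.log | tail -n 5'
  done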
Cheers,
Stephan
On 30.05.22 00:09, byron wrote:
> Hi
>
> I'm currently doing an upgrade from 19.05 to 20.11.
>
> All of our compute nodes have the same installation of Slurm, NFS
> mounted. The system has been set up so that all the start scripts and
> configuration files point to the default installation, which is a soft
> link to the most recent installation of Slurm.
>
> This is the first time I've done an upgrade of Slurm, and I had been
> hoping to do a rolling upgrade as opposed to waiting for all the jobs
> to finish on all the compute nodes and then switching across, but I
> don't see how I can do it with this setup. Does anyone have any
> experience of this?
>
> Thanks