[slurm-users] Upcoming 22.05.5 release and current 22.05 upgrade warning
jbooth at schedmd.com
Fri Sep 30 21:13:58 UTC 2022
Hey folks -
As some of you have observed, one of the changes made in the Slurm 22.05
release was a security update where we added hashes to the RPCs. Part of
this change is that every Slurm binary now loads the "hash_k12" library.
Slurm validates that libraries are of the same version. Unfortunately, due
to an oversight, we failed to notice that the slurmstepd loads the hash_k12
library only after a job has completed. This means that if the hash_k12
library is upgraded before a job finishes, the slurmstepd will load the new
library when the job finishes, and will fail due to a mismatch of versions.
This results in nodes with slurmstepd processes stuck indefinitely. These
processes require manual intervention to clean up. There is no clean way to
resolve these hung slurmstepd processes.
This issue is being tracked in the following bugs:
Sites that are affected:
Sites that install and replace the current installation of Slurm,
overwriting the binaries and libraries in their current environment.
This means sites that use RPMs and perform a rolling upgrade, with jobs
still running, between 22.05 releases.
Sites that are not affected:
Sites that use symlinks pointing to different versions as part of their
upgrade process and keep previous versions in place should not be affected.
The only recommended way to upgrade between minor versions of 22.05 with
RPMs or upgrades that replace current binaries and libraries is to drain
the nodes of running jobs first.
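
As a rough illustration of the drain-first approach, the sketch below
builds the scontrol command to drain a node and then checks that the node
reports "drained" with no jobs left before packages are replaced. The
helper names (drain_command, node_state, jobs_on_node, safe_to_upgrade)
and the default reason string are made up for this example and are not
part of Slurm. It assumes scontrol, sinfo and squeue are on the PATH, and
it is deliberately conservative: any state other than exactly "drained"
is treated as not ready.

```python
import subprocess


def drain_command(node, reason="upgrade within 22.05"):
    """Build the scontrol call that stops new jobs from starting on a node."""
    return ["scontrol", "update", f"NodeName={node}",
            "State=DRAIN", f"Reason={reason}"]


def node_state(node):
    """Return the node state as reported by sinfo (e.g. 'draining', 'drained')."""
    out = subprocess.run(["sinfo", "-h", "-n", node, "-o", "%T"],
                         capture_output=True, text=True, check=True).stdout
    fields = out.split()
    return fields[0] if fields else ""


def jobs_on_node(node):
    """Return IDs of jobs that are still running or completing on the node."""
    out = subprocess.run(["squeue", "-h", "-w", node,
                          "-t", "running,completing", "-o", "%i"],
                         capture_output=True, text=True, check=True).stdout
    return out.split()


def safe_to_upgrade(state, job_ids):
    """True only if the node is fully drained and no jobs remain on it.

    Assumption of this sketch: states carrying a suffix (such as
    'drained*' for a non-responding node) are treated as not ready.
    """
    return state.strip().lower() == "drained" and not job_ids
```

A site script might run subprocess.run(drain_command("node01"), check=True),
then poll safe_to_upgrade(node_state("node01"), jobs_on_node("node01")) and
only replace the packages once it returns True. "node01" is a placeholder
node name.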
We are currently working on resolving this issue for future releases of
22.05, but for now, care should be taken when upgrading between minor
versions of 22.05.
We do apologize for the unfortunate oversight.
Director of Support, SchedMD LLC
Commercial Slurm Development and Support