[slurm-users] Slurm does not start after (stupid) upgrade from 16.05.9 to 20.11.7
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Wed Aug 25 09:13:41 UTC 2021
On 8/25/21 10:48 AM, Julien Tailleur wrote:
> We have been running a computing cluster using slurm since 2016, that I
> installed back then, with some help from others. I was pretty late on
> upgrades and decided to upgrade the cluster up to debian Bullseye, which
> runs slurm 20.11.7, starting from stretch, that runs slurm 16.05.9.
SchedMD documents that a single upgrade may span at most 2 major versions, see
https://slurm.schedmd.com/quickstart_admin.html#upgrade. So you would
have to go through 16.05 -> 17.02 -> 18.08 -> 20.02 -> 20.11 (soon 21.08
will be out). Whether you can find Debian packages for these old versions
is unknown to me.
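To make the 2-version rule concrete, here is a minimal Python sketch that checks whether a planned upgrade path stays within it. The release list is an assumption written from memory (it adds the intermediate 17.11 and 19.05 releases that the path above skips) and should be verified against SchedMD's release archive; the function name check_path is purely illustrative and not part of any Slurm tool.

```python
# Slurm major releases from 16.05 to 20.11, oldest first.
# Assumption: list written from memory, verify before relying on it.
RELEASES = ["16.05", "17.02", "17.11", "18.08", "19.05", "20.02", "20.11"]


def check_path(path, max_jump=2, releases=RELEASES):
    """Return (old, new, span) for every step of an upgrade path.

    Raises ValueError if a step goes backwards, stays in place,
    or spans more than `max_jump` major releases.
    """
    steps = []
    for old, new in zip(path, path[1:]):
        span = releases.index(new) - releases.index(old)
        if span < 1:
            raise ValueError(f"{old} -> {new} is not an upgrade")
        if span > max_jump:
            raise ValueError(
                f"{old} -> {new} spans {span} releases, limit is {max_jump}"
            )
        steps.append((old, new, span))
    return steps


# The 4-step path suggested above.
path = ["16.05", "17.02", "18.08", "20.02", "20.11"]
for old, new, span in check_path(path):
    print(f"{old} -> {new}: {span} release(s)")
```

Calling check_path(["16.05", "20.11"]) raises ValueError, because that direct jump spans 6 releases.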
I have collected some Slurm upgrading information in
It's written for CentOS, but the Slurm parts would be the same.
> While the update of the system in itself went smoothly, slurm is broken.
> Of course, that's the stage at which I thought "Oh, I should have checked
> if the upgrade is supposed to be harmless"... Now that the self-bashing
> is rightfully done, I would be very happy with some help! I hesitate
> between two strategies: removing slurm completely and a completely new
> installation, or trying to save what can be saved... I am tempted by the
> former since I remember suffering a bit to get the installation right in
> the first place...
A useable database dump from the old 16.05 is vital! You could start
again with Slurm 16.05 and upgrade in 4 steps as indicated above.
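As a rough sketch of how such a dump could be taken before each upgrade step, the Python helper below wraps mysqldump. It assumes the accounting database lives in MySQL/MariaDB under Slurm's default name slurm_acct_db (check StorageLoc in slurmdbd.conf) and that the calling user has database credentials configured, for example in ~/.my.cnf. The function names are hypothetical.

```python
import datetime
import subprocess


def dump_command(database="slurm_acct_db"):
    """Build the mysqldump command line for the accounting database."""
    # --single-transaction takes a consistent snapshot of InnoDB tables.
    return ["mysqldump", "--single-transaction", "--databases", database]


def default_filename(database="slurm_acct_db", today=None):
    """Return a dated file name such as slurm_acct_db-2021-08-25.sql."""
    today = today or datetime.date.today()
    return f"{database}-{today.isoformat()}.sql"


def dump_database(outfile, database="slurm_acct_db"):
    """Write the dump to `outfile`; raises CalledProcessError on failure."""
    with open(outfile, "wb") as fh:
        subprocess.run(dump_command(database), stdout=fh, check=True)
```

Running dump_database(default_filename()) before every step, with slurmdbd stopped, leaves one dated file per step to fall back on.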
Beware of potential database issues:
If the 4-step upgrade doesn't work, starting from scratch seems to be the
only option :-( My Slurm Wiki page may perhaps be of a little help: