Hi Bjørn-Helge,
On 3/7/25 08:59, Bjørn-Helge Mevik via slurm-users wrote:
My 2¢:
If upgrading the deb packages does *not* restart the services, then you can just upgrade all the slurm packages on the controller, then restart slurmdbd first and slurmctld afterwards. (This is how I do upgrades with rpms.) If upgrading *does* restart the services, then you'd have to stop and disable them first (stop slurmctld, then slurmdbd), and after the upgrade, enable and start them (slurmdbd first), as others have answered.
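As a rough illustration only (minor release upgrades, with slurmctld and slurmdbd on the same host), the two procedures above can be written out as a dry-run plan. This is a hypothetical Python sketch. The function name `plan_minor_upgrade` and the upgrade-command placeholder are my own assumptions, not part of any Slurm tooling. The sketch only builds the `systemctl` command lines and does not run them. It uses `systemctl disable --now` and `enable --now` to combine "stop and disable" and "enable and start" into single commands.

```python
"""Dry-run planner for a Slurm *minor release* upgrade on a host running
both slurmdbd and slurmctld. Hypothetical sketch: it only returns commands,
nothing is executed."""
from typing import List


def plan_minor_upgrade(upgrade_cmd: List[str],
                       packages_restart_services: bool) -> List[List[str]]:
    """Return the ordered commands for the procedure described above.

    upgrade_cmd is a placeholder (an assumption) for your own apt/dnf
    invocation that upgrades all the Slurm packages on the controller.
    """
    steps: List[List[str]] = []
    if packages_restart_services:
        # Stop and disable first: slurmctld, then slurmdbd.
        steps.append(["systemctl", "disable", "--now", "slurmctld"])
        steps.append(["systemctl", "disable", "--now", "slurmdbd"])
        steps.append(list(upgrade_cmd))
        # After the upgrade, enable and start slurmdbd first, then slurmctld.
        steps.append(["systemctl", "enable", "--now", "slurmdbd"])
        steps.append(["systemctl", "enable", "--now", "slurmctld"])
    else:
        # Running daemons keep the old binaries until they are restarted.
        steps.append(list(upgrade_cmd))
        steps.append(["systemctl", "restart", "slurmdbd"])
        steps.append(["systemctl", "restart", "slurmctld"])
    return steps


if __name__ == "__main__":
    # Print the plan for packages whose post-install restarts the services.
    for cmd in plan_minor_upgrade(["<package-upgrade-command>"], True):
        print(" ".join(cmd))
```

Whether your packages restart the services on upgrade depends on how they were built, so check the package post-install scripts before choosing the flag.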
Readers must understand that we are discussing *minor release* upgrades only.
When you have slurmctld and slurmdbd running on the same machine, all the Slurm packages will get upgraded simultaneously. The question is whether or not systemd is going to restart the services as part of the package's post-install step. This is the case with the EL8 RPM packages built from the Slurm tarballs (see [1]), and this works great for us with *minor release* upgrades.
As for the order of starting the slurmctld and slurmdbd services on the same server, I think it doesn't really matter with *minor release* upgrades, because there won't be any changes to the Slurm database format. Here I assume that Slurm minor upgrades don't crash the services :-) We have never experienced any such crashes across many, many past Slurm releases.
slurmctld can be restarted immediately after the upgrade even if slurmdbd is not available, so your cluster will keep running without any interruption of service. A little later you can enable and start slurmdbd, and the delayed start of slurmdbd doesn't cause any problems for slurmctld or the users. I emphasize that we're discussing *minor release* upgrades only!
@Bjørn-Helge: Do you think there is good reason to start slurmdbd before slurmctld when doing minor release upgrades?
All in all, Slurm is very resilient when doing upgrades! Major release upgrades involve Slurm database format changes, and these must be done carefully; see the information in [2].
IMHO, Best Practice is to run slurmdbd and slurmctld on separate servers. I understand that with small clusters one may not be able to afford multiple servers, though.
Best regards, Ole
[1] https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#build-slurm-pa...
[2] https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrading-slur...