Hi,
I'm building Slurm Debian packages from SchedMD sources using this tutorial https://www.schedmd.com/slurm/installation-tutorial/. Now I tried upgrading (minor release upgrade within 24.05) using these packages. https://slurm.schedmd.com/upgrades.html tells me to upgrade (a) slurmdbd (b) slurmctld (c) slurmd separately in this order, stopping each service for the upgrade. How can I follow this when the Debian packages have a dependency between slurmdbd + slurmctld that upgrades both packages at the same time?
thx Matthias
Hi,
If they are on two different machines, following the Slurm documentation is not a problem. I updated an Ubuntu installation a few days ago following the docs, and each daemon can run without the other running.
If they are on the same machine, my guess would be that you have to stop slurmdbd first, then slurmctld, update them both, and then start slurmdbd first and slurmctld second. Just be sure to disable them both in systemd so that they are not restarted in the wrong order (this happened to me with apt). I am not experienced with single-machine configurations, so someone else might be able to confirm or correct me.
Hope that helps
Paul
On 06/03/2025 17.04, Matthias Leopold via slurm-users wrote:
Hi,
I'm building Slurm Debian packages from SchedMD sources using this tutorial https://www.schedmd.com/slurm/installation-tutorial/. Now I tried upgrading (minor release upgrade within 24.05) using these packages. https://slurm.schedmd.com/upgrades.html tells me to upgrade (a) slurmdbd (b) slurmctld (c) slurmd separately in this order, stopping each service for the upgrade. How can I follow this when the Debian packages have a dependency between slurmdbd + slurmctld that upgrades both packages at the same time?
thx Matthias
On 3/6/25 18:10, Paul Musset via slurm-users wrote:
If they are on two different machines, following the Slurm documentation is not a problem. I updated an Ubuntu installation a few days ago following the docs, and each daemon can run without the other running.
If they are on the same machine, my guess would be that you have to stop slurmdbd first, then slurmctld, update them both, and then start slurmdbd first and slurmctld second. Just be sure to disable them both in systemd so that they are not restarted in the wrong order (this happened to me with apt). I am not experienced with single-machine configurations, so someone else might be able to confirm or correct me.
We also run slurmdbd and slurmctld on separate servers so that upgrading Slurm will be safer and in order to optimize the system performance. The slurmctld doesn't require slurmdbd to be available immediately, since slurmctld caches its data for a long time (hours?) until slurmdbd becomes available again.
When doing *minor Slurm updates* (like 24.05.4 to 24.05.5), the daemons will start immediately from Systemd, and we always do such RPM updates without experiencing any issues (we run the RPM-based RockyLinux).
Since you have slurmdbd and slurmctld on the same machine, I think these daemons don't depend (strongly) on each other and can be upgraded at the same time, as long as you are only making *minor version updates*. If you want to play it safe, you can disable automatic startup of slurmdbd by Systemd (systemctl disable slurmdbd; systemctl stop slurmdbd). When your Slurm packages have been upgraded and slurmctld is running normally, you can enable and start the slurmdbd with Systemd. The slurmctld will connect to slurmdbd successfully at this stage.
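The play-it-safe sequence for a minor update described above can be sketched as a small shell function. This is an illustration under assumptions, not a tested procedure: the unit names slurmdbd and slurmctld come from the thread, while the run helper, the DRY_RUN switch, the function name minor_update, and installing the locally built packages with dpkg -i are choices made for this sketch.

```shell
#!/bin/sh
# Sketch: minor-release update with slurmdbd held back until slurmctld is up.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# minor_update PACKAGE.deb...
# The arguments are the locally built .deb files (names are up to you).
minor_update() {
    # Keep systemd from starting slurmdbd during the package upgrade.
    run systemctl disable slurmdbd || return 1
    run systemctl stop slurmdbd || return 1
    # Assumption: the packages are installed from local files with dpkg.
    run dpkg -i "$@" || return 1
    # Continue only if slurmctld is running after the upgrade.
    run systemctl is-active slurmctld || return 1
    # Bring slurmdbd back; slurmctld reconnects to it on its own.
    run systemctl enable slurmdbd || return 1
    run systemctl start slurmdbd
}
```

Calling minor_update with the package files as arguments prints the six commands in order; setting DRY_RUN=0 in the environment executes them instead, stopping at the first failure.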
For *major release upgrades* the important thing when upgrading slurmdbd is to *disable* automatic startup from Systemd. After upgrading slurmdbd to a new *major release* you must start it manually because database conversion can take a long time (minutes to hours). Details are discussed in this Wiki page: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrade-slurmd...
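For the major-release case, the three phases around the package upgrade might look like the sketch below. It is not taken from the linked Wiki page. The -D (run in the foreground) and -v (more verbose) options are documented slurmdbd options; the function names, the run helper and the DRY_RUN switch are hypothetical, and running slurmdbd under the account configured as SlurmUser is left to the reader.

```shell
#!/bin/sh
# Sketch: hold slurmdbd back for a major-release upgrade and start it by hand.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Phase 1, before installing the new packages:
# stop slurmdbd and keep systemd from starting it again.
hold_slurmdbd() {
    run systemctl stop slurmdbd || return 1
    run systemctl disable slurmdbd
}

# Phase 2, after installing the new packages:
# start slurmdbd manually in the foreground so the database conversion
# (minutes to hours) can be watched until it completes.
convert_database() {
    run slurmdbd -D -vvv
}

# Phase 3, after the conversion has finished and the foreground
# slurmdbd has been stopped: hand the daemon back to systemd.
release_slurmdbd() {
    run systemctl enable slurmdbd || return 1
    run systemctl start slurmdbd
}
```

The phases are separate functions because the second one blocks for as long as the conversion runs and has to be ended by hand before the third makes sense.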
IHTH, Ole
My 2¢:
If upgrading the deb packages does *not* restart the services, then you can just upgrade all the slurm packages on the controller, then restart slurmdbd first and slurmctld afterwards. (This is how I do upgrades with rpms.) If upgrading *does* restart the services, then you'd have to stop and disable them first (stop slurmctld, then slurmdbd), and after the upgrade, enable and start them (slurmdbd first), as others have answered.
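The two cases above can be written down as one shell function with a mode argument. This is a sketch under assumptions rather than a verified procedure: the mode names no-restart and restart, the function name controller_upgrade, the run helper, the DRY_RUN switch and the use of dpkg -i for locally built packages are all made up for illustration. Only the ordering of the systemctl calls is taken from the text.

```shell
#!/bin/sh
# Sketch: upgrade slurmdbd and slurmctld on one controller host.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# controller_upgrade MODE PACKAGE.deb...
#   no-restart: the package upgrade does not restart the services
#   restart:    the package upgrade would restart the services
controller_upgrade() {
    mode=$1
    shift
    case "$mode" in
        no-restart)
            run dpkg -i "$@" || return 1
            ;;
        restart)
            # Stop slurmctld first, then slurmdbd, and disable both so the
            # package scripts cannot start them in the wrong order.
            run systemctl stop slurmctld || return 1
            run systemctl stop slurmdbd || return 1
            run systemctl disable slurmctld slurmdbd || return 1
            run dpkg -i "$@" || return 1
            run systemctl enable slurmctld slurmdbd || return 1
            ;;
        *)
            echo "usage: controller_upgrade no-restart|restart PACKAGE.deb..." >&2
            return 2
            ;;
    esac
    # Both cases end the same way: slurmdbd first, slurmctld afterwards.
    # "restart" also starts a unit that is currently stopped.
    run systemctl restart slurmdbd || return 1
    run systemctl restart slurmctld
}
```

In dry-run mode, controller_upgrade restart followed by the package files prints the full stop, disable, install, enable and restart sequence, which makes it easy to review the order before running anything for real.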
Hi Bjørn-Helge,
On 3/7/25 08:59, Bjørn-Helge Mevik via slurm-users wrote:
My 2¢:
If upgrading the deb packages does *not* restart the services, then you can just upgrade all the slurm packages on the controller, then restart slurmdbd first and slurmctld afterwards. (This is how I do upgrades with rpms.) If upgrading *does* restart the services, then you'd have to stop and disable them first (stop slurmctld, then slurmdbd), and after the upgrade, enable and start them (slurmdbd first), as others have answered.
Readers must understand that we are discussing *minor release* upgrades only.
When you have slurmctld and slurmdbd running on the same machine, all the Slurm packages will get upgraded simultaneously. The question is whether Systemd is going to restart the services in the package's post-install step. This is the case with the EL8 RPM packages built from the Slurm tarballs (see [1]), and this works great for us with *minor release* upgrades.
As for the order of starting slurmctld and slurmdbd services running on the same server, I think it doesn't really matter with *minor release* upgrades, because there won't be any changes to the Slurm database format. Here I assume that Slurm minor upgrades don't crash the services :-) We have never experienced any such crashes for many, many past Slurm releases.
The slurmctld can be restarted immediately after upgrading without slurmdbd being available, and thereby your cluster will keep running without any interruption of service. A little later you can enable and start slurmdbd, and the delay of slurmdbd doesn't cause any problems for slurmctld or the users. I emphasize that we're discussing *minor release* upgrades only!
@Bjørn-Helge: Do you think there is good reason to start slurmdbd before slurmctld when doing minor release upgrades?
All in all, Slurm is very resilient when doing upgrades! Major release upgrades involve Slurm database format changes, and these must be done carefully; see the information in [2].
IMHO, Best Practice is to run slurmdbd and slurmctld on separate servers. I understand that with small clusters one may not afford the use of multiple servers, though.
Best regards, Ole
[1] https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#build-slurm-pa...
[2] https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrading-slur...
Thanks for all the replies. I'll take away the hints about running slurmctld/slurmdbd on separate nodes and about disabling the systemd units when upgrading (I had thought of that).
Matthias