Hi Bjørn-Helge,
On 3/10/25 08:50, Bjørn-Helge Mevik via slurm-users wrote:
>> The slurmctld can be restarted immediately after upgrading without slurmdbd being available, and thereby your cluster will keep running without any interruption of service. A little later you can enable and start slurmdbd, and the delay of slurmdbd doesn't cause any problems for slurmctld or the users. I emphasize that we're discussing *minor release* upgrades only!
>> @Bjørn-Helge: Do you think there is good reason to start slurmdbd before slurmctld when doing minor release upgrades?
> Not any more, it appears. But (unless my memory fails me again) earlier, slurmctld would refuse to start unless slurmdbd was running (when it was configured to use slurmdbd). Slurmctld would be fine with slurmdbd stopping while slurmctld was running, but upon start, it would require slurmdbd to be already running.
That's an interesting observation! I haven't tried this, and it would be worth testing.
My guess is that recent versions of Slurm have no problem starting slurmctld when slurmdbd is down. With the changed logic, "scontrol reconfig" now restarts slurmctld, as introduced in 23.11 (or was it in 23.02?). See "Fixing 'scontrol reconfigure'" in https://slurm.schedmd.com/SLUG23/roadmap-slug23.pdf
Since our site is already at 24.11.2, I wonder if someone could run this test on an older Slurm release?
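For anyone willing to try, here is a minimal sketch of how such a test could be scripted. It is untested on my side and rests on assumptions: the systemd unit names slurmdbd and slurmctld, both daemons running on the same test host, and the helper name check_start_without_dbd, which is purely illustrative. By default it only prints the commands. Please run it only on a test system, and note that "systemctl is-active" right after a restart may still report "active" if slurmctld gives up a little later, so checking slurmctld.log as well is advisable.

```python
import subprocess

# Assumed systemd unit names; adjust them to match your installation.
STEPS = [
    ["systemctl", "stop", "slurmdbd"],        # take slurmdbd down first
    ["systemctl", "restart", "slurmctld"],    # then restart the controller
    ["systemctl", "is-active", "slurmctld"],  # exit code 0 means the unit is active
    ["scontrol", "ping"],                     # ask whether the controller responds
    ["systemctl", "start", "slurmdbd"],       # bring slurmdbd back afterwards
]


def check_start_without_dbd(runner=None, dry_run=True):
    """Run (or, by default, only print) the steps; return (command, returncode) pairs."""
    results = []
    for cmd in STEPS:
        if dry_run:
            print("would run:", " ".join(cmd))
            results.append((cmd, None))
            continue
        proc = (runner or subprocess.run)(cmd, capture_output=True, text=True)
        results.append((cmd, proc.returncode))
    return results


if __name__ == "__main__":
    # Dry run by default; pass dry_run=False (as root, on a test system) to execute.
    for cmd, code in check_start_without_dbd():
        print(" ".join(cmd), "->", code)
```

A non-zero return code for the "is-active" or "scontrol ping" step would suggest that slurmctld on that release did not come up while slurmdbd was down.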
Thanks, Ole