Hi Ron,
On 1/20/26 22:53, Ron Gould via slurm-users wrote:
Thank you for your pointers and sharing your experience.
We always upgrade Slurm while the cluster (700 nodes) is running production jobs, and we have never had any issues. As Davide said, the chance of errors seems to be very small. Minor version upgrades should be simple because Slurm is basically unchanged between them. Major version upgrades should be done a little more carefully, just to be on the safe side.
I have collected information on Slurm upgrading, database dumps etc. in these Wiki pages:
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrading-slur...
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_database/#backup-and-restore...
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_database/#backup-and-restore...
Please beware of a MariaDB upgrade issue that was resolved in 22.05.7: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_database/#slurm-database-mod...
IHTH, Ole
My user base is likely small compared to other institutions. Currently, I have about 10 users running about 30 jobs, with some started today and the oldest started in September.
Regarding the "waiting a week" between updates, most of the jobs are short-lived, with some taking less than a week. Given that I don't have a short WallClock value, I could update to 23.11 before those long jobs would have to be stopped and restarted under the new slurm dæmons. Doing a couple of updates would give me ample practice, and I can document the entire process.
I have daily, weekly, and monthly backups of my "slurm_acct_db" database. It's under 2 GB, should I have to re-import it. I don't suspect the slurmdbd upgrade will take long.
Prior to that DB backup, I have another script that backs up `${StateSaveLocation}` and "/etc/slurm". This is referenced in "https://slurm.schedmd.com/upgrades.html#backups".
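
For anyone documenting a similar procedure, the backup order described above could be sketched roughly as below. This is only an illustration under assumptions, not the actual scripts: the helper names, the "/backup/slurm" destination and the sample `scontrol show config` text are made up for the example, and the sketch only prints the commands rather than running them.

```python
import shlex
from datetime import date


def parse_state_save_location(scontrol_output: str) -> str:
    """Extract StateSaveLocation from the text printed by `scontrol show config`."""
    for line in scontrol_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "StateSaveLocation":
            return value.strip()
    raise ValueError("StateSaveLocation not found in scontrol output")


def backup_commands(state_dir: str, dest: str, stamp: str) -> list[list[str]]:
    """Return the backup commands in the order they would run; nothing is executed."""
    return [
        # First the controller state directory and the configuration files.
        ["tar", "-czf", f"{dest}/slurm-state-{stamp}.tar.gz", state_dir, "/etc/slurm"],
        # Then a dump of the accounting database.
        ["mysqldump", "--single-transaction",
         f"--result-file={dest}/slurm_acct_db-{stamp}.sql",
         "--databases", "slurm_acct_db"],
    ]


# Hypothetical sample text; on a real system this would come from `scontrol show config`.
sample = "SlurmctldPort           = 6817\nStateSaveLocation       = /var/spool/slurmctld\n"
state_dir = parse_state_save_location(sample)

# "/backup/slurm" is an assumed destination directory.
for cmd in backup_commands(state_dir, "/backup/slurm", date.today().isoformat()):
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before the upgrade; running them for real (for example with `subprocess.run(cmd, check=True)`) would also need database credentials, which are left out here.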