Can I update when jobs are running?

Gould, Ron (GRC-VBA0)[AEGIS]

20 Jan 2026

11:42 a.m.

The tl;dr is “This is my first upgrade since inheriting this Cluster, so I’m not sure what can or can’t be running during the upgrades.”.

My Cluster is running an old version, 22.05.3. This is my first upgrade since inheriting the Cluster. As such, I’d like to install 22.05.4 because it’s a short jump, and it fixes the bug my users are seeing.

The Cluster is composed of mostly Oracle Linux 8. I’m aware that I can upgrade within the two release compatibility window. I’ve read through the Upgrade guide and I’m unclear if downtime is required. Perhaps I’m unifying downtime requirements across different SLURM services where I should be interpreting that certain services have their own downtime requirements.

https://slurm.schedmd.com/upgrades.html https://slurm.schedmd.com/upgrades.html#procedure

In the Upgrade Procedure section https://slurm.schedmd.com/upgrades.html#procedure, there’re a couple of questionable things.

1. Is downtime required? Does downtime == “all jobs must be halted”? “Downtime”, to me, seems like nothing should be running. This statement indicates that jobs can be running during the upgrade.

Before considering the upgrade complete, wait for all jobs that were already running to finish. Any jobs started before the slurmd system was upgraded will be running with the old version of slurmstepd, so starting another upgrade or trying to use new features in the new version may cause problems.

Within a few paragraphs https://slurm.schedmd.com/upgrades.html#downtime, this message indicates I will need downtime:

Refer to the expected downtime guidance in the following sections for each relevant Slurm daemon

Further in the guide, in SLURMD (COMPUTE NODES) https://slurm.schedmd.com/upgrades.html#slurmd, I read

Upgrades will not interrupt running jobs as long as SlurmdTimeout is not reached during the process

This implies, at least, that existing running jobs can stay running.

2. There’re conflicting suggestions of using “rpm” to install the RPMs I built with “rpmbuild”. Should I use “dnf localinstall ./*.rpm”? I’m inferring that dependencies will (not) be handled correctly.

NOTE: If RPM/DEB packages are used, all packages present on each system must be upgraded together instead of piecewise. … Avoid using low-level package managers like rpm or dpkg as they may not properly enforce these dependencies

However, in SLURMDBD (ACCOUNTING) https://slurm.schedmd.com/upgrades.html#slurmdbd, this statement

Upgrade the slurmdbd daemon binaries, libraries, and its systemd unit file (if used). If using RPM/DEB packages, the package manager will take care of these

indicates I should be using RPM packages.

Lastly, to get to a current install, I need to step through multiple versions, with the condition that jobs started with a specific major version must finish within the compatibility window. GitLab has a tool where you plug in your current and intended versions and it tells you explicitly which versions are required along the upgrade path. I’d like a similarly explicit tool for SLURM, but I infer from the Compatibility Window https://slurm.schedmd.com/upgrades.html#compatibility_window that I can update like so:

1. Current = 22.05.3
2. 23.11
3. 25.05
4. 26.05

That feels like a big leapfrog between versions. I’d like the practice of upgrading. Is there any detriment to upgrading at a slower pace:

1. Current = 22.05.3
2. 22.05.11
3. 23.02.8
4. 23.11.11
5. 24.05.8
6. 24.11.7
7. 25.05.6
8. 25.11.2
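For anyone wanting a rough stand-in for the GitLab-style path checker, here is a minimal sketch. It assumes the ordered list of major releases is exactly the one in the slower path above, and it takes the window size as a parameter instead of hard-coding it. The function names `release_index` and `hop_allowed` are made up for illustration, and the window value should be confirmed against the upgrade guide for the target version.

```shell
# Major releases in order, taken from the slower upgrade path listed above.
RELEASES="22.05 23.02 23.11 24.05 24.11 25.05 25.11"

# Print the 1-based position of a major release in RELEASES.
# Returns non-zero if the release is not in the list.
release_index() {
    i=0
    for r in $RELEASES; do
        i=$((i + 1))
        if [ "$r" = "$1" ]; then
            echo "$i"
            return 0
        fi
    done
    return 1
}

# Succeed if moving from major release $1 to $2 goes forward by at most
# $3 major releases. The window size ($3) is an assumption supplied by
# the caller, not something this sketch knows about Slurm.
hop_allowed() {
    from=$(release_index "$1") || return 2
    to=$(release_index "$2") || return 2
    hops=$((to - from))
    [ "$hops" -ge 0 ] && [ "$hops" -le "$3" ]
}
```

Example use: `hop_allowed 22.05 23.11 2 && echo "within window"`. A release missing from the list (such as 26.05 here) makes the function return 2 instead of guessing.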

Paul Edmon

20 Jan

11:51 a.m.

I think those warnings are for the overly cautious. Certainly we have never waited for all jobs to exit before upgrading. Out of paranoia we pause all our jobs, but that is not required. Typically you can upgrade between versions without pausing or canceling jobs. That said you will want to look at the release notes and changelog for the version you want to upgrade to in case there is any issue that is flagged that requires more paranoia. Generally minor version upgrades are fine.

The thing I would note though is this phrase: "Any jobs started before the slurmd system was upgraded will be running with the old version of slurmstepd, so starting another upgrade or trying to use new features in the new version may cause problems." What this is really noting is that upgrading in quick succession (especially major upgrades) could be problematic. So say you were to go from 22.05.3 -> 23.11 and then immediately go to 25.05, that could cause problems. If you intend to go from your current version to the latest I recommend spacing out the upgrades, or taking a full downtime.

That said, I have never done an upgrade over that large a version change, so someone with more experience on the list should be able to answer any questions related to that. My gut says though that if I were trying to step to the latest version I would either clear out the existing jobs, or I would do one upgrade per week to give the jobs on the cluster time to adjust to the new version.

-Paul Edmon-

Ron Gould

12:29 p.m.

Thank you for that guidance. I am certainly in the "overly cautious" and "paranoid" groups.

I will probably go through the slower upgrade process (1-8 list), with at least a week between them.

And yes, if anyone has experience doing such a vault between versions, please chime in.

Davide DelVento

12:58 p.m.

Hi Ron,

I also am in the "paranoid" group, and I've always done updates with jobs "live". Depending on the size of your userbase, you may want to consider pausing the submission/start of new jobs while you execute the dnf commands (yes, I use them rather than the "raw" rpm, because I think they are less error-prone with e.g. dependencies).

Since you are in the same group as myself, you can save a list of running jobs before and after executing the dnf commands, and see if they match. If they do, congratulations, everything went well. If they don't, there is a (tiny) risk that the jobs which completed during that time might miss "something". Examine their logs and/or warn the users as appropriate.

To be clear, this tiny risk is about jobs that would complete *on their own* during that timeframe, not that the slurm update will cause healthy jobs to crash. What could happen is a race condition between the jobs terminating and the slurm update, which might try to update some information in some DB in an inconsistent way. My understanding is that the job itself (e.g. its output file) is safe; it's just the slurm records which might get into some trouble.
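The before/after comparison described above could look roughly like this. It is a sketch only: the function names are made up, and it assumes `squeue` is on the PATH and that `comm` and `sort` share the same locale.

```shell
# Print the IDs of currently running jobs, one per line, sorted so the
# result can be fed to comm(1).
snapshot_running_jobs() {
    squeue -h -t RUNNING -o "%i" | sort
}

# Print job IDs that are in the "before" snapshot ($1) but missing from
# the "after" snapshot ($2), i.e. jobs that left the running state while
# the packages were being upgraded. Both files must already be sorted.
jobs_gone_during_upgrade() {
    comm -23 "$1" "$2"
}
```

Typical use would be `snapshot_running_jobs > before.txt`, run the dnf commands, `snapshot_running_jobs > after.txt`, then `jobs_gone_during_upgrade before.txt after.txt`. Empty output means the two lists match for every job that was running beforehand; any IDs printed are the ones whose logs are worth a look.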

You mention "waiting at least a week" between a subsequent update, but really the key point is this

*Before considering the upgrade complete, **wait for all jobs that were already running to finish**.*

Which means: if you have a 6h wallclock limit, you can wait only 7h. If you have a 2 months wallclock limit you need to wait for a bit more than 2 months. If you don't have wallclock limit.... you may have to wait forever.... Wait! You have the list of jobs because you are paranoid like myself and made one as mentioned above, so you have to wait "only" for all of them to be completed before proceeding, not "forever".
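To check whether any job that predates the upgrade is still around, a small filter like the following might help. It is a hedged sketch: `pre_upgrade_jobs_still_running` is a made-up name, and the file of pre-upgrade job IDs (one per line, sorted) is assumed to have been saved by you beforehand.

```shell
# Read the current running job IDs on stdin and print those that also
# appear in the sorted pre-upgrade list given as $1. Empty output means
# every job that was running before the upgrade has finished.
pre_upgrade_jobs_still_running() {
    sort | comm -12 "$1" -
}
```

For example: `squeue -h -t RUNNING -o "%i" | pre_upgrade_jobs_still_running before.txt`, where `before.txt` is a hypothetical file name. Once this prints nothing, the "wait for all jobs that were already running to finish" condition is met for that upgrade step.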

With these precautions, most likely you won't encounter any issue (of course that gets weighted with the size of the cluster: if you have a huge one with hundreds of thousands of users/jobs/nodes, you will see things that have 0.001% chance of happening and that most of us never encounter)

HTH.

Ron Gould

1:53 p.m.

Thank you for your pointers and sharing your experience.

My user base is likely small compared to other institutions. Currently, I have about 10 users running about 30 jobs, with some started today and the oldest started in September.

Regarding the "waiting a week" between updates, most of the jobs are short lived, with some taking less than a week. Given that I don't have a short WallClock value, I could update to 23.11 before those long jobs would have to be stopped and restarted under the new slurm dæmons. Doing a couple updates would give me ample practice and I can document the entire thing.

I have daily, weekly, and monthly backups of my "slurm_acct_db" database. It's under 2 GB if I had to re-import it. I don't expect the slurmdbd upgrade to take long.

Prior to that DB backup, I have another script that backs up `${StateSaveLocation}` and "/etc/slurm". This is referenced in "https://slurm.schedmd.com/upgrades.html#backups".
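A rough sketch of what such a backup script could do. It only prints the commands it would run. The destination directory `/root/slurm-backup` and both function names are hypothetical, and the parser assumes `scontrol show config` prints lines of the form `StateSaveLocation = /path`.

```shell
# Extract the StateSaveLocation value from "scontrol show config" output
# supplied on stdin.
state_save_location() {
    awk -F'=' '$1 ~ /^StateSaveLocation *$/ {
        gsub(/^ +| +$/, "", $2)   # trim spaces around the value
        print $2
    }'
}

# Print (not run) backup commands for the state directory given as $1.
# $2 is the destination directory; the default is a hypothetical path.
print_backup_commands() {
    dest=${2:-/root/slurm-backup}
    echo "tar -czf $dest/statesave.tar.gz -C $1 ."
    echo "tar -czf $dest/etc-slurm.tar.gz -C /etc/slurm ."
    echo "mysqldump --single-transaction --result-file=$dest/slurm_acct_db.sql slurm_acct_db"
}
```

Usage might be `print_backup_commands "$(scontrol show config | state_save_location)"`, reviewing the printed commands before running them by hand.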

Ole Holm Nielsen

21 Jan

12:04 a.m.

Hi Ron,

On 1/20/26 22:53, Ron Gould via slurm-users wrote:

...

Thank you for your pointers and sharing your experience.

We always upgrade Slurm while the cluster (700 nodes) is running production jobs, and we never had any issues. As Davide said, the chance of errors seems to be very small. Minor version upgrades should be simple to do because Slurm is basically unchanged. Major version upgrades should be done a little more carefully, just to be on the safe side.

I have collected information on Slurm upgrading, database dumps etc. in these Wiki pages:

https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrading-slur...

https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_database/#backup-and-restore...

Please beware of a MariaDB upgrade issue that was resolved in 22.05.7: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_database/#slurm-database-mod...

IHTH, Ole

-- Ole Holm Nielsen PhD, Senior HPC Officer Department of Physics, Technical University of Denmark

Ron Gould

7:21 a.m.

Hello Ole.

Thank you for those references. I found some of those wiki articles on a different thread. Much appreciated.

My Cluster uses MySQL, but I did see a cautionary note where "MySQL update" and "22.05.7" overlapped.

Thanks, Ron

Ron Gould

7:26 a.m.

Whoops, no I am using MariaDB.

Christopher Samuel

20 Jan

1 p.m.

On 1/20/26 2:42 pm, Gould, Ron (GRC-VBA0)[AEGIS] via slurm-users wrote:

...

My Cluster is running an old version, 22.05.3. This is my first upgrade since inheriting the Cluster. As such, I’d like to install 22.05.4 because it’s a short jump, and it fixes the bug my users are seeing.

My one comment would be that you would be better off going to the last release in the 22.05.x series, which was 22.05.11, to get various fixes for security issues in the intervening releases in place.

Changelog:

https://github.com/SchedMD/slurm/blob/master/CHANGELOG/slurm-22.05.md

Best of luck, Chris

-- Chris Samuel : http://www.csamuel.org/ : Philadelphia, PA, USA

Ron Gould

1:56 p.m.

Thank you for that point. That had occurred to me, but as this is my first upgrade, I just wanted to upgrade to the version that fixes the bug my users see. I need a li'l win for this Cluster :) . When I have that, I'll take a couple more baby steps and do 22.05.11 with a quick turnaround.

Ron Gould

23 Jan

9:09 a.m.

I've read through the upgrade documentation a couple times and I've done some dry run stuff.

I have "slurmdbd" and "slurmctld" installed on the main head node. The instructions call for upgrading "slurmdbd" first. I'm trying to use the dry run options to `dnf` and `rpm` and I'm getting some messages. Perhaps I don't have the correct options specified to upgrade from 22.05.3 to 22.05.4.

It doesn't explicitly say "remove ${OldVersion} and install ${NewVersion}". I'm hesitant to remove the package out of fear the 22.05.4 version won't install.

Using `dnf`:

```
# dnf upgrade --assumeno --best --allowerasing ./slurm-slurmdbd-22.05.4-1.el7.x86_64.rpm
Dependencies resolved.
================================================================================
 Package              Arch         Version             Repository          Size
================================================================================
Removing:
 slurm-slurmdbd       x86_64       22.05.3-1.el7       @@commandline      2.4 M

Transaction Summary
================================================================================
Remove  1 Package

Freed space: 2.4 M
Operation aborted.
```

Using `rpm`:

```
# rpm --test --install --upgrade ./slurm-slurmdbd-22.05.4-1.el7.x86_64.rpm
error: Failed dependencies:
        slurm(x86-64) = 22.05.4-1.el7 is needed by slurm-slurmdbd-22.05.4-1.el7.x86_64
```

If I then tell it to install that dependency, I get:

```
# rpm --test --install --upgrade ./slurm-slurmdbd-22.05.4-1.el7.x86_64.rpm ./slurm-22.05.4-1.el7.x86_64.rpm
error: Failed dependencies:
        slurm(x86-64) = 22.05.3-1.el7 is needed by (installed) slurm-perlapi-22.05.3-1.el7.x86_64
        slurm(x86-64) = 22.05.3-1.el7 is needed by (installed) slurm-contribs-22.05.3-1.el7.x86_64
        slurm(x86-64) = 22.05.3-1.el7 is needed by (installed) slurm-slurmd-22.05.3-1.el7.x86_64
        slurm(x86-64) = 22.05.3-1.el7 is needed by (installed) slurm-devel-22.05.3-1.el7.x86_64
        slurm(x86-64) = 22.05.3-1.el7 is needed by (installed) slurm-libpmi-22.05.3-1.el7.x86_64
        slurm(x86-64) = 22.05.3-1.el7 is needed by (installed) slurm-pam_slurm-22.05.3-1.el7.x86_64
        slurm(x86-64) = 22.05.3-1.el7 is needed by (installed) slurm-slurmctld-22.05.3-1.el7.x86_64
```

John Hearns

3 Feb

6:09 a.m.

I would run

rpm -qa | grep slurm

This will tell you all the slurm packages you have on the system

Ron Gould

6:20 a.m.

Hi there.

The SLURM DB service is installed alongside the other SLURM packages on the node I'm starting with.

slurm-slurmdbd-22.05.3-1.el7.x86_64
slurm-22.05.3-1.el7.x86_64
slurm-contribs-22.05.3-1.el7.x86_64
slurm-devel-22.05.3-1.el7.x86_64
slurm-example-configs-22.05.3-1.el7.x86_64
slurm-libpmi-22.05.3-1.el7.x86_64
slurm-openlava-22.05.3-1.el7.x86_64
slurm-pam_slurm-22.05.3-1.el7.x86_64
slurm-perlapi-22.05.3-1.el7.x86_64
slurm-slurmctld-22.05.3-1.el7.x86_64
slurm-slurmd-22.05.3-1.el7.x86_64
slurm-torque-22.05.3-1.el7.x86_64

It doesn't seem like I am able to upgrade that one package first like the instructions say to. Is it okay to upgrade all of the SLURM packages in one go? If so, I would stop the slurmdbd service, back up the appropriate directories, back up the DB into a self-contained file, upgrade the packages all together, start the slurmdbd service ( `sudo -u slurm slurmdbd -D` ), and so forth according to the instructions.
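One way to make "all packages together" concrete is to derive the list of new RPM files from what is installed. This is a minimal sketch, assuming the locally built RPMs sit in the current directory and follow the same name-version-release.arch pattern as the installed packages. `new_rpm_files` is a made-up helper, not part of rpm or dnf.

```shell
# Read installed package names (name-version-release.arch, one per line)
# on stdin and print the matching local RPM file for the new build.
# $1 = old version-release, $2 = new version-release.
new_rpm_files() {
    old=$1
    new=$2
    while IFS= read -r pkg; do
        case $pkg in
            *-"$old".*)
                name=${pkg%%-"$old".*}   # text before -<old version>.
                arch=${pkg##*-"$old".}   # text after  -<old version>.
                echo "./$name-$new.$arch.rpm"
                ;;
        esac
    done
}
```

For a dry run, something like `rpm -qa | grep '^slurm' | new_rpm_files 22.05.3-1.el7 22.05.4-1.el7 | xargs dnf upgrade --assumeno` would hand every file to dnf in a single transaction, so the `slurm(x86-64) = ...` dependencies can be resolved together. Whether dnf is happy with the result still needs to be verified on the actual host.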

Is that a cogent plan?

Paul Edmon

6:42 a.m.

Similar to you we host our slurmdbd and slurmctld on the same host. So when we upgrade we follow the following steps:

1. Stop slurmctld

2. Stop slurmdbd

3. Take backups of the slurm database and slurm spool directory

4. Update rpms (make sure to neuter the autorestart function in the rpm spec, as that can restart things before you are ready)

5. Run slurmdbd -D in a tmux (as the upgrade can take a while and if you use systemd it will terminate early).

6. Once the slurmdbd upgrade is complete, stop the command line version of slurmdbd and restart it using systemd

7. Global restart of slurmctld and slurmd for the cluster.
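The steps above can be sketched as a dry-run script that only prints what it would do. The backup paths under `/root` and the spool directory `/var/spool/slurmctld` are assumptions for illustration, and `run` and `upgrade_head_node` are made-up names. It is not the exact procedure used at any site.

```shell
# Run a command, or only print it when DRY_RUN is 1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Arguments: the RPM files (or package names) to upgrade together.
upgrade_head_node() {
    run systemctl stop slurmctld                      # step 1
    run systemctl stop slurmdbd                       # step 2
    # step 3: backups (paths are hypothetical)
    run mysqldump --single-transaction --result-file=/root/slurm_acct_db.sql slurm_acct_db
    run tar -czf /root/slurm-spool.tar.gz -C /var/spool/slurmctld .
    run dnf upgrade "$@"                              # step 4
    # step 5: runs in the foreground; do this inside tmux and stop it
    # by hand once the database conversion has finished.
    run sudo -u slurm slurmdbd -D
    run systemctl start slurmdbd                      # step 6
    run systemctl restart slurmctld                   # step 7 (head node part)
    # Restarting slurmd on the compute nodes is site-specific and is
    # left out of this sketch.
}
```

Calling `upgrade_head_node ./slurm-*.rpm` with the default `DRY_RUN=1` prints each command prefixed with `+`, which is a cheap way to review the order before doing anything for real.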

-Paul Edmon-

Ron Gould

7:51 a.m.

Thank you Paul. That helps a lot.

Regarding item 4., do you use `dnf` or `rpm` to install the RPMs? What's your syntax?

Also in 4., I may need to rebuild my RPMs anyway, as I may have to update an option. What's the option to "neuter the autorestart function"?

Thanks for your help, Ron

Paul Edmon

8:01 a.m.

Yeah, we use dnf. I do:

dnf update slurm slurm-libpmi slurm-devel slurm-contribs slurm-slurmctld slurm-perlapi slurm-slurmrestd slurm-slurmdbd

For the autorestart compare:

https://github.com/SchedMD/slurm/blob/cead8b9d2c2360f976c77d9e9e7ab875de9d86...

With:

https://github.com/fasrc/slurm-spec/blob/75ec489f2a8da057432495c0a726ed80855...

You can see that in the official spec it has:

%systemd_postun_with_restart slurmdbd.service

Which means when the rpm is uninstalled it restarts the service. You don't want that to happen in this case as that can mess up the slurmdbd (unless you increase the systemd timeouts). So in our spec we nerf it and understand that we have to force a restart rather than letting dnf do the restart itself.

Basically it's a paranoia step we take, as we have been bitten by it before: our database got messed up due to this exact thing happening and we had to reimport our backup (which is why you take a backup).

-Paul Edmon-

Ron Gould

9:05 a.m.

Interesting. Is that something I can do with the existing code as downloaded (TAR.XZ file) or do I need to fork it and make changes?

Well, I have the code downloaded. I can extract it and modify it locally.

Maybe I've answered my own question. Please confirm.

Paul Edmon

9:09 a.m.

We generally maintain our own spec for that reason. It's just a clone of the spec that Slurm provides but with our edits. That repo I linked is what we use for maintaining that spec with a branch for each version (https://github.com/fasrc/slurm-spec)

-Paul Edmon-

Ron Gould

4 Feb

10:08 a.m.

I modified the "slurm.spec" file in the TAR.BZ2 file by prepending "###" to the

`%systemd_postun_with_restart slurmdbd.service`

line at the end, in the "%postun" section and added

`%systemd_postun slurmdbd.service`

That didn't seem to work.

I still ended up with a postuninstall script that tries to restart the slurmdbd service:

```
# rpm -q --scripts "rpmbuild.20260204/RPMS/x86_64/slurm-slurmdbd-22.05.4-1.el8.x86_64.rpm"
postinstall scriptlet (using /bin/sh):

if [ $1 -eq 1 ] ; then
        # Initial installation
        systemctl --no-reload preset slurmdbd.service &>/dev/null || :
fi
preuninstall scriptlet (using /bin/sh):

if [ $1 -eq 0 ] ; then
        # Package removal, not upgrade
        systemctl --no-reload disable --now slurmdbd.service &>/dev/null || :
fi
postuninstall scriptlet (using /bin/sh):
###
if [ $1 -ge 1 ] ; then
        # Package upgrade, not uninstall
        systemctl try-restart slurmdbd.service &>/dev/null || :
fi
```

The prepended "###" on that line might've been the problem. If I just changed the SPEC file to

```
...
%post slurmdbd
%systemd_post slurmdbd.service
%preun slurmdbd
%systemd_preun slurmdbd.service
%postun slurmdbd
%systemd_postun slurmdbd.service
```

I then get a better looking postuninstall script:

```
# rpm -q --scripts "rpmbuild.20260204 Modified SPEC v1/RPMS/x86_64/slurm-slurmdbd-22.05.4-1.el8.x86_64.rpm"
postinstall scriptlet (using /bin/sh):

if [ $1 -eq 1 ] ; then
        # Initial installation
        systemctl --no-reload preset slurmdbd.service &>/dev/null || :
fi
preuninstall scriptlet (using /bin/sh):

if [ $1 -eq 0 ] ; then
        # Package removal, not upgrade
        systemctl --no-reload disable --now slurmdbd.service &>/dev/null || :
fi
postuninstall program: /bin/sh
```

Is having an empty "postuninstall program: /bin/sh" entry okay?

Paul Edmon

10:27 a.m.

That's because your previous version had the restart in it. The restart unfortunately happens when the package is uninstalled, and thus applies to the previous version, not the new version. There isn't much you can do about that; you will just need to be careful.
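Since it is the scriptlets of the already-installed package that matter here, it may be worth checking those as well as the freshly built RPM. A minimal sketch follows; `scripts_restart_service` is a made-up helper that just looks for the `try-restart` call seen in the scriptlet output in this thread.

```shell
# Read "rpm -q --scripts" output on stdin; succeed if any scriptlet
# contains a "systemctl try-restart" call.
scripts_restart_service() {
    grep -q 'systemctl try-restart'
}
```

For example, `rpm -q --scripts slurm-slurmdbd | scripts_restart_service && echo "installed package will try to restart slurmdbd on upgrade"` queries the installed package instead of a file. Per the systemctl documentation, `try-restart` only acts on units that are currently running, so a slurmdbd that was stopped before the upgrade should not be restarted by it. Treat that as a reading of the man page, not a tested guarantee for this setup.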

-Paul Edmon-

Steffen Grunewald

11:56 p.m.

Stupid (?) question: wouldn't a `systemctl disable slurmdbd.service` be the best choice? I'm a Debian user and in the past this strategy seems to have worked, with the packages provided by Debian developers/maintainers ... is RPM really that different?

Thanks, S

Paul Edmon

5 Feb

6:47 a.m.

Probably. I've never tried that. I imagine that will work.

-Paul Edmon-

Ron Gould

7:09 a.m.

I was thinking that. I'm glad someone asked it.

Ron Gould

9 Feb

1:47 p.m.

Thanks y'all for the help. I upgraded from v22.05.3 to v22.05.4 on Friday. I thoroughly documented the process. I'm going to proceed through a few more minor versions to get the process down.

My DB size wasn't big enough to warrant starting `slurmdbd` manually versus letting systemd start it. Still, it's good to keep that in place for insurance.

Christopher Samuel

2:05 p.m.

On 2/9/26 4:47 pm, Ron Gould via slurm-users wrote:

...

My DB size wasn't big enough to warrant starting `slurmdbd` manually versus letting systemd start it. Still, it's good to keep that in place for insurance.

That's only an issue when doing a major Slurm upgrade from (eg) 23.11 to 24.05 or 24.11.

For a minor upgrade like you're doing you should be able to mix and match versions (though of course it's best to minimise it).

All the best, Chris
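
The major/minor distinction above can be checked mechanically. This is a minimal sketch, assuming the release is identified by the first two dot-separated fields of the version string (so 23.11 and 24.05 are different releases, while 22.05.3 and 22.05.4 are maintenance versions of the same one). The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: classify a Slurm upgrade as "major" or "minor".
# Assumption: the release is the first two dot-separated fields
# (e.g. 23.11); the third field is the maintenance version.
slurm_release() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

upgrade_kind() {
    # $1 = current version, $2 = target version
    if [ "$(slurm_release "$1")" = "$(slurm_release "$2")" ]; then
        echo minor
    else
        echo major
    fi
}

upgrade_kind 22.05.3 22.05.4    # prints "minor"
upgrade_kind 23.11.10 24.05.1   # prints "major"
```

For a "major" result, the earlier advice about starting `slurmdbd` by hand (for example in the foreground with `slurmdbd -D`) rather than through systemd is the case to plan for.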

Ron Gould

11 Feb

6:53 a.m.

Good to know. I'll add that to the documentation I made.

Cutts, Tim

9:12 a.m.

Out of interest, what happens if you’ve added some custom indexes to your slurmdbd database? I’ve had the slow query log running on ours and added some custom indexes for certain cases, which has made sacct quite a lot faster for certain queries, but I worry that I’ve created a land-mine for future upgrades that include schema changes. I can always delete the indexes as part of my SOP, of course, and re-create them afterwards.

Tim
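
The thread does not settle whether custom indexes interfere with schema changes, but the drop-and-recreate SOP described above could be scripted roughly as follows. This is a sketch under stated assumptions: the helper `index_sql`, the list file, and every table, index, and column name are hypothetical placeholders, and the generated statements use MySQL/MariaDB `DROP INDEX ... ON` and `CREATE INDEX ... ON` syntax.

```shell
#!/bin/sh
# Sketch: generate DROP/CREATE statements for custom indexes.
# Input lines on stdin: <table> <index_name> <comma-separated columns>
# All names fed to this function are site-specific; none come from Slurm.
index_sql() {
    # $1 = "drop" (before the upgrade) or "create" (after it)
    mode=$1
    while read -r tbl idx cols; do
        [ -n "$tbl" ] || continue   # skip blank lines
        case $mode in
            drop)   printf 'DROP INDEX %s ON %s;\n' "$idx" "$tbl" ;;
            create) printf 'CREATE INDEX %s ON %s (%s);\n' "$idx" "$tbl" "$cols" ;;
        esac
    done
}

# Example usage (hypothetical file and database name):
#   index_sql drop   < custom_indexes.txt | mysql slurm_acct_db
#   ... upgrade slurmdbd ...
#   index_sql create < custom_indexes.txt | mysql slurm_acct_db
```

Keeping the index list in one file means the same source drives both the removal before the upgrade and the re-creation afterwards.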

participants (9)

Christopher Samuel
Cutts, Tim
Davide DelVento
Gould, Ron (GRC-VBA0)[AEGIS]
John Hearns
Ole Holm Nielsen
Paul Edmon
Ron Gould
Steffen Grunewald