[slurm-users] Steps to upgrade slurm for a patchlevel change?

Ryan Novosielski novosirj at rutgers.edu
Fri Sep 29 06:48:24 UTC 2023

I started off writing that there’s really no particular process for these: just make your changes and start the new software (be mindful of any path under your software tree that might contain data, if you have that kind of setup), and that you might need to watch the timeouts. But I figured I’d have a look at the upgrade guide to be sure.
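One rough way to "watch the timeouts" is to compare how long each daemon will be down against SlurmctldTimeout and SlurmdTimeout before you start. The sketch below parses text in the `Name = N sec` form that `scontrol show config` prints. The exact output format, the sample values, the 50% safety margin and the helper names are all assumptions for illustration, not anything from the upgrade guide.

```python
import re
import subprocess

# Only these two settings are checked; other *Timeout values are ignored.
WATCHED = ("SlurmctldTimeout", "SlurmdTimeout")

# Hypothetical excerpt of `scontrol show config` output (values made up).
SAMPLE_CONFIG = """\
MessageTimeout          = 10 sec
SlurmctldTimeout        = 120 sec
SlurmdTimeout           = 300 sec
"""


def read_live_config():
    """Return the text of `scontrol show config` (needs a working Slurm client)."""
    return subprocess.run(["scontrol", "show", "config"],
                          capture_output=True, text=True, check=True).stdout


def parse_timeouts(text, names=WATCHED):
    """Return {name: seconds} for lines like 'SlurmdTimeout = 300 sec'."""
    timeouts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\w+)\s*=\s*(\d+)\s*sec\b", line)
        if m and m.group(1) in names:
            timeouts[m.group(1)] = int(m.group(2))
    return timeouts


def outage_warnings(timeouts, planned_outage_sec, margin=0.5):
    """List timeouts where the planned outage exceeds margin * timeout."""
    warnings = []
    for name, limit in sorted(timeouts.items()):
        if planned_outage_sec > limit * margin:
            warnings.append(f"{name}={limit}s, planned outage {planned_outage_sec}s")
    return warnings


if __name__ == "__main__":
    for w in outage_warnings(parse_timeouts(SAMPLE_CONFIG), planned_outage_sec=90):
        print("WARNING:", w)  # prints: WARNING: SlurmctldTimeout=120s, planned outage 90s
```

If a warning fires, the options are to shorten the outage or raise the timeout temporarily in slurm.conf for the duration of the upgrade.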

There’s really nothing onerous in there. I’d personally back up my database and state save directories just because I’d rather be safe than sorry, or in case I have to go backwards and want to be sure. You can run slurmctld for a good while with no database (note that -M/--clusters on the command line will be broken during that time); just be mindful of the RAM on the slurmctld machine, since it queues accounting messages in memory while the database is unavailable, and don’t restart it before the database is back up. Backing up our fairly large database doesn’t take all that long anyway. Whether or not step 5 is required mostly depends on how long you think it will take you to do steps 6-11 (which could really take seconds if your process is as simple as stop, change symlink, start). You’re going to do step 12 no matter what, you don’t need 13 if you skipped 5, and 14 is up to you. So in practice, that’s what you’re going to do anyway.
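When the process really is "stop, change symlink, start", the symlink change itself can be made atomic so that no node ever sees a missing link. A minimal sketch, assuming version-named install directories with a `current` link pointing at one of them (the layout and the function name are hypothetical); stopping and starting the daemons is left out. The trick is to create a temporary link in the same directory and rename it over the old one, because `rename()` replaces the link itself instead of following it into the old version's directory.

```python
import os
import tempfile


def repoint_symlink(link_path, new_target):
    """Atomically make link_path point at new_target; return the new target."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    tmp_link = os.path.join(
        link_dir, f".{os.path.basename(link_path)}.tmp.{os.getpid()}")
    if os.path.lexists(tmp_link):  # leftover from an interrupted run
        os.remove(tmp_link)
    os.symlink(new_target, tmp_link)
    # os.replace() uses rename(2) on POSIX: the old link is swapped out in one
    # step, so readers see either the old target or the new one, never neither.
    os.replace(tmp_link, link_path)
    return os.readlink(link_path)


if __name__ == "__main__":
    # Demo in a throwaway directory standing in for the real install tree.
    with tempfile.TemporaryDirectory() as root:
        for ver in ("23.02.4", "23.02.5"):
            os.makedirs(os.path.join(root, ver))
        current = os.path.join(root, "current")
        os.symlink(os.path.join(root, "23.02.4"), current)
        print(repoint_symlink(current, os.path.join(root, "23.02.5")))
```

Going back is the same call with the old version's directory, which is part of why keeping version-named directories around is handy.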

We just did an upgrade last week, and the only difference is that our compute nodes are stateless, so the compute node upgrades were just a reboot (we could have upgraded them while running, but we did it during a maintenance period anyway, so why bother?).

If you want to do this with running jobs, I’d definitely back up the state save directory, but as long as you watch the timeouts, it’s pretty uneventful. You won’t have the long database upgrade period, since no database modifications will be required, so it’s pretty much like upgrading anything else.
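Backing up the state save directory can be as simple as a timestamped copy, ideally taken while slurmctld is stopped so the files are not changing underneath the copy. A sketch, assuming the source is your StateSaveLocation from slurm.conf and the destination is a backup location of your choosing; both paths and the helper names are hypothetical.

```python
import os
import shutil
import time


def backup_state_dir(state_dir, backup_root, stamp=None):
    """Copy state_dir to backup_root/<dirname>-<stamp> and return that path.

    stamp defaults to local time like 20230929-064824. Raises FileExistsError
    rather than overwriting an existing backup with the same name.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    name = os.path.basename(os.path.normpath(state_dir))
    dest = os.path.join(backup_root, f"{name}-{stamp}")
    os.makedirs(backup_root, exist_ok=True)
    # symlinks=True copies links as links; copytree fails if dest exists.
    shutil.copytree(state_dir, dest, symlinks=True)
    return dest


def count_files(path):
    """Count regular entries under path, as a quick sanity check on the copy."""
    return sum(len(files) for _, _, files in os.walk(path))


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as root:
        state = os.path.join(root, "state")
        os.makedirs(state)
        open(os.path.join(state, "node_state"), "w").close()
        dest = backup_state_dir(state, os.path.join(root, "backups"))
        print(dest, count_files(dest) == count_files(state))
```

Note that `shutil` keeps permission bits and timestamps but not file ownership, so if you ever restore from such a copy, check that the files still belong to your SlurmUser.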

|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB A555B, Newark

On Sep 28, 2023, at 11:58, Groner, Rob <rug262 at psu.edu> wrote:

There are 14 steps to upgrading Slurm listed on their website, including shutting down and backing up the database. So far we've only updated Slurm during a downtime, and it's been a major version change, so we've taken all the steps indicated.

We now want to upgrade from 23.02.4 to 23.02.5.

Our slurm builds end up in version named directories, and we tell production which one to use via symlink.  Changing the symlink will automatically change it on our slurm controller node and all slurmd nodes.

Is there an expedited, simple, slimmed-down upgrade path to follow if we're looking at just a patch-level (x.y.Z) upgrade?
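The distinction being asked about comes down to whether the first two fields of the version (the YY.MM release, e.g. 23.02) stay the same. A small helper that classifies a version change; it is hypothetical, not part of Slurm, and assumes plain three-field version strings without suffixes such as release-candidate tags.

```python
def parse_version(v):
    """Split a Slurm version like '23.02.5' into integers (23, 2, 5)."""
    parts = v.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected YY.MM.patch, got {v!r}")
    return tuple(int(p) for p in parts)


def upgrade_kind(old, new):
    """Return 'same', 'downgrade', 'patch' (same YY.MM) or 'major'."""
    o, n = parse_version(old), parse_version(new)
    if n == o:
        return "same"
    if n < o:  # tuple comparison orders versions field by field
        return "downgrade"
    if n[:2] == o[:2]:
        return "patch"
    return "major"


if __name__ == "__main__":
    print(upgrade_kind("23.02.4", "23.02.5"))  # prints: patch
    print(upgrade_kind("23.02.5", "23.11.0"))  # prints: major
```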

