[slurm-users] Upgrading a slurm on a cluster, 17.02 --> 18.08

Baker D.J. D.J.Baker at soton.ac.uk
Tue Sep 25 07:54:31 MDT 2018

Thank you for your comments. I could potentially force the upgrade of the slurm and slurm-slurmdbd rpms using something like:

rpm -Uvh --noscripts --nodeps --force slurm-18.08.0-1.el7.x86_64.rpm slurm-slurmdbd-18.08.0-1.el7.x86_64.rpm

That will certainly work; however, the slurmctld (or, on my test node, the slurmd) will be killed. The logic is that in v17.02 the slurm rpm provides both slurmctld and slurmd, so upgrading that rpm destroys/kills the existing slurmctld or slurmd process. That is...

# rpm -q --whatprovides /usr/sbin/slurmctld

So if I force the upgrade of that rpm, I delete /usr/sbin/slurmctld and kill the running daemon. In the new rpm structure, slurmctld is provided by its own rpm.
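[Editorial sketch] To make the package split concrete, here is a dry-run sketch; the run() helper only prints each command rather than executing it, and the 18.08 package names follow the standard SchedMD spec file split (the exact rpm filenames are taken from the command earlier in this mail):

```shell
# Dry-run helper: print the command instead of executing it.
run() { echo "+ $*"; }

# 17.02: the monolithic "slurm" rpm owns both daemon binaries.
run rpm -q --whatprovides /usr/sbin/slurmctld
run rpm -q --whatprovides /usr/sbin/slurmd

# 18.08: each daemon ships in its own rpm, so a controller node needs the
# new slurm-slurmctld package installed alongside the base slurm rpm,
# otherwise the forced upgrade of "slurm" removes /usr/sbin/slurmctld.
run rpm -Uvh slurm-18.08.0-1.el7.x86_64.rpm slurm-slurmctld-18.08.0-1.el7.x86_64.rpm
```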

I would have thought that someone would have crossed this bridge, but maybe most admins don't use rpms...

Best regards,

From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Chris Samuel <chris at csamuel.org>
Sent: 25 September 2018 13:00
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] Upgrading a slurm on a cluster, 17.02 --> 18.08

On Tuesday, 25 September 2018 9:41:10 PM AEST Baker D. J.  wrote:

> I guess that the only solution is to upgrade all the slurm at once. That
> means that the slurmctld will be killed (unless it has been stopped first).

We don't use RPMs for Slurm [1], but the rpm command does have a --noscripts
option which (allegedly; I've never used it) suppresses the execution of
pre/post install scripts.

One big warning: do not use systemctl to start the new slurmdbd for the
first time when upgrading!

Stop the older one first (and then take a database dump), then run the new
slurmdbd process with the "-Dvvv" options (inside screen, just in case) so
that you can watch its progress and so systemd won't decide it's taking too
long to start and kill it partway through the database upgrades.

Once that's completed successfully you can ^C it and start it up via
systemctl once more.
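[Editorial sketch] The upgrade sequence above can be sketched as a dry run; the run() helper only prints each step, and the database name (slurm_acct_db) and dump filename are assumptions, not from the original mail:

```shell
# Dry-run helper: print each step instead of executing it.
run() { echo "+ $*"; }

run systemctl stop slurmdbd                                      # 1. stop the old daemon
run mysqldump --single-transaction slurm_acct_db                 # 2. take a database dump first
run /usr/sbin/slurmdbd -Dvvv                                     # 3. new slurmdbd in the foreground (inside screen)
# ...watch the verbose log until the schema conversion finishes, then Ctrl-C...
run systemctl start slurmdbd                                     # 4. hand the daemon back to systemd
```

Running the conversion in the foreground is the point: systemd's start timeout never applies, and -Dvvv lets you see the database upgrade progressing.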

Hope that's useful!

All the best,

[1] - I've always installed into ${shared_local_area}/slurm/${version} and had
a symlink called "latest" that points at the currently blessed version of
Slurm.  Then I stop slurmdbd, upgrade that as above, then I can do slurmctld
(with partitions marked down, just in case).  Once those are done I can
restart the slurmd daemons around the cluster.
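[Editorial sketch] The layout in footnote [1] might look like the following; the staging directory and version name are illustrative, not the actual paths:

```shell
# Illustrative shared area (mktemp stands in for ${shared_local_area}).
SHARED=$(mktemp -d)/slurm

# Install each Slurm version into its own directory...
mkdir -p "$SHARED/18.08.0/bin"

# ...and flip the "latest" symlink to the currently blessed version.
ln -sfn "$SHARED/18.08.0" "$SHARED/latest"

readlink "$SHARED/latest"
```

Because ln -sfn replaces the symlink in a single step, nodes that resolve ${SHARED}/latest pick up the new version without any per-node changes, and rolling back is just repointing the link.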

 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

