<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>This is one of the reasons we stick with RPMs rather than
      the symlink process. It's just cleaner, and it avoids having the
      install on shared storage that may get overwhelmed with traffic
      or suffer outages. The package manager also automatically removes
      the previous version and installs everything locally. I've never
      been a fan of the symlink method, as it runs counter to the entire
      point and design of Linux package managers, which are supposed
      to do this heavy lifting for you.</p>
<p><br>
</p>
<p>Rant aside :). Generally, for minor upgrades the process is less
      touchy. We use the following process, which works well for us
      but does create an outage for the duration of the
      upgrade.</p>
<p><br>
</p>
<p>1. Set all partitions to down: This makes sure no new jobs are
scheduled.</p>
<p>2. Suspend all jobs: This makes sure jobs aren't running while we
upgrade.</p>
<p>3. Stop slurmctld and slurmdbd.</p>
<p>4. Upgrade slurmdbd, then restart it.</p>
<p>5. Upgrade the slurmd and slurmctld across the cluster.</p>
<p>6. Restart slurmd and slurmctld simultaneously using choria.</p>
<p>7. Unsuspend all jobs.</p>
<p>8. Reopen all partitions.</p>
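<p>A rough sketch of steps 1, 2, 7, and 8 as a dry run, in case it
      helps. It only builds and prints scontrol commands and runs nothing.
      The partition names and job IDs are made-up placeholders, and the
      function names are mine, not part of any tool. Steps 3 to 6 are
      left as a comment because they depend on your packaging and
      orchestration.</p>
<pre>
```python
"""Dry-run sketch: print the scontrol commands for steps 1-2 and 7-8."""


def close_and_suspend(partitions, job_ids):
    """Commands for steps 1 and 2: mark partitions DOWN, then suspend jobs."""
    cmds = [["scontrol", "update", f"PartitionName={p}", "State=DOWN"]
            for p in partitions]
    cmds += [["scontrol", "suspend", str(j)] for j in job_ids]
    return cmds


def resume_and_reopen(partitions, job_ids):
    """Commands for steps 7 and 8: resume jobs, then mark partitions UP."""
    cmds = [["scontrol", "resume", str(j)] for j in job_ids]
    cmds += [["scontrol", "update", f"PartitionName={p}", "State=UP"]
             for p in partitions]
    return cmds


if __name__ == "__main__":
    # Hypothetical inputs; on a real cluster these would come from
    # (deduplicated) sinfo and squeue output.
    partitions = ["general", "gpu"]
    running_jobs = [1001, 1002]
    for cmd in close_and_suspend(partitions, running_jobs):
        print(" ".join(cmd))
    print("# steps 3-6: stop daemons, upgrade packages, restart (site-specific)")
    for cmd in resume_and_reopen(partitions, running_jobs):
        print(" ".join(cmd))
```
</pre>
<p>Printing the commands first makes it easy to eyeball the list
      before anything touches the scheduler.</p>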
<p><br>
</p>
<p>For major upgrades we always take a mysqldump and back up the
      slurmctld spool directory before upgrading, just in case something
      goes wrong. We've had this happen before when the slurmdbd upgrade
      cut out early. (Note: always run the slurmdbd and slurmctld
      upgrades in the foreground with -D and not via systemctl, as
      systemctl can time out and kill the upgrade midway through a
      large upgrade.)</p>
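<p>As an illustration of the backup step, here is a minimal sketch
      that builds (but does not run) a mysqldump command and a tar
      command. The database name, spool path, destination directory,
      and label are all assumptions, so substitute your own StorageLoc
      and StateSaveLocation. Credentials for mysqldump are left out on
      purpose.</p>
<pre>
```python
"""Dry-run sketch: build (not run) pre-upgrade backup commands."""
from pathlib import PurePosixPath


def backup_commands(db_name, state_dir, dest_dir, tag):
    """Return argv lists for a database dump and a state-directory archive."""
    dest = PurePosixPath(dest_dir)
    state = PurePosixPath(state_dir)
    dump = ["mysqldump", "--single-transaction", "--databases", db_name,
            f"--result-file={dest / (db_name + '-' + tag + '.sql')}"]
    archive = ["tar", "-czf", str(dest / f"statesave-{tag}.tar.gz"),
               "-C", str(state.parent), state.name]
    return [dump, archive]


if __name__ == "__main__":
    # All four arguments are assumptions; substitute your own database
    # name, state save directory, backup destination, and label.
    for cmd in backup_commands("slurm_acct_db", "/var/spool/slurmctld",
                               "/root/slurm-backup", "pre-upgrade"):
        print(" ".join(cmd))
```
</pre>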
<p><br>
</p>
<p>That said, I've also skipped steps 1, 2, 7, and 8 before for minor
      upgrades and it works fine. The slurmd, slurmctld, and slurmdbd
      can all run on different versions so long as slurmdbd >=
      slurmctld >= slurmd, i.e. each is at least as new as the next.
      So if you want to do a live upgrade you can do it. However, out
      of paranoia we generally stop everything. The entire process
      takes about an hour start to finish, with the longest part being
      the pausing of all the jobs.</p>
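<p>If you want a quick sanity check of that version ordering before
      a live upgrade, something like the sketch below would do. It
      treats equal versions as acceptable, assumes plain dotted numeric
      version strings, and does not check how many major releases apart
      the daemons are allowed to be. The function names are just for
      illustration.</p>
<pre>
```python
"""Sketch: check that daemon versions satisfy slurmdbd >= slurmctld >= slurmd."""


def parse_version(text):
    """Turn a version string such as '23.02.5' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def ordering_ok(slurmdbd, slurmctld, slurmd):
    """True when each daemon is at least as new as the next one down."""
    dbd, ctld, d = (parse_version(v) for v in (slurmdbd, slurmctld, slurmd))
    return dbd >= ctld >= d


if __name__ == "__main__":
    print(ordering_ok("23.02.5", "23.02.5", "23.02.4"))
    print(ordering_ok("23.02.4", "23.02.5", "23.02.4"))
```
</pre>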
<p><br>
</p>
<p>-Paul Edmon-</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 9/29/2023 9:48 AM, Groner, Rob
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:BL0PR02MB4499F4AD1D847AAB03562CE680C0A@BL0PR02MB4499.namprd02.prod.outlook.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<style type="text/css" style="display:none;">P {margin-top:0;margin-bottom:0;}</style>
<div style="font-family: Aptos, Aptos_EmbeddedFont,
Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
12pt; color: rgb(0, 0, 0);" class="elementToProof">
        I did already see the upgrade section of Jason's talk, but it
        wasn't much about the mechanics of the actual upgrade process;
        it seemed more of a big-picture view. It dealt a lot with different
        parts of slurm at different versions, which is something we
        don't have.</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont,
Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
12pt; color: rgb(0, 0, 0);" class="elementToProof">
<span style="display: inline !important; color: rgb(0, 0, 0);
background-color: rgb(255, 255, 255);" class="ContentPasted1"><br>
</span></div>
<div style="font-family: Aptos, Aptos_EmbeddedFont,
Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
12pt; color: rgb(0, 0, 0);" class="elementToProof">
<span style="display: inline !important; color: rgb(0, 0, 0);
          background-color: rgb(255, 255, 255);" class="ContentPasted1">One
          little wrinkle here is that while, yes, we're using a symlink
          to point to the current version of slurm, it's
          all on a shared filesystem. So ALL nodes, slurmdbd, and slurmctld
          are using that same symlink. There is no means to upgrade one
          component at a time, which means that to upgrade, EVERYTHING has to
          come down before it can come back up. Jason's slides seemed
          to indicate that, if there were separate symlinks, I
          could focus on just slurmdbd first and upgrade it, then
          focus on slurmctld and upgrade it, and then finally the nodes
          (take down their slurmd, update the link, bring up slurmd).
          So maybe that's what I'm missing.</span></div>
<div style="font-family: Aptos, Aptos_EmbeddedFont,
Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont,
Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
12pt; color: rgb(0, 0, 0);" class="elementToProof
ContentPasted0">
Otherwise, I think what I'm saying is that I see references to a
"rolling upgrade", but I don't see any guide to a rolling
upgrade. I just see the 14 steps in <a
href="https://slurm.schedmd.com/quickstart_admin.html#upgrade"
id="OWAe45506a3-ac1b-a0e7-1d1f-010ede9d465b"
class="OWAAutoLink moz-txt-link-freetext"
moz-do-not-send="true">https://slurm.schedmd.com/quickstart_admin.html#upgrade</a>,
        and I guess I'd always thought of that as the full-octane,
        high-fat upgrade. I've only ever done upgrades during one of our
        many scheduled downtimes, because the upgrades were always to a
        new major version, and because I'm a scared little chicken, so I
        figured there was maybe some smaller subset of steps when only
        upgrading a patch level. Smaller change, less risk, fewer
        precautionary steps...? I'm seeing now that's not the case.</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont,
Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
12pt; color: rgb(0, 0, 0);" class="elementToProof
ContentPasted0">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont,
Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
12pt; color: rgb(0, 0, 0);" class="elementToProof
ContentPasted0">
Thank you all for the suggestions!</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont,
Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
12pt; color: rgb(0, 0, 0);" class="elementToProof
ContentPasted0">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont,
Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
12pt; color: rgb(0, 0, 0);" class="elementToProof
ContentPasted0">
Rob</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont,
Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
12pt; color: rgb(0, 0, 0);" class="elementToProof
ContentPasted0">
<br>
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont,
Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
12pt; color: rgb(0, 0, 0);" class="elementToProof
ContentPasted0">
</div>
<div style="font-family: Aptos, Aptos_EmbeddedFont,
Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
12pt; color: rgb(0, 0, 0);">
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size: 11pt;
color: rgb(0, 0, 0);" face="Calibri, sans-serif"><b>From:</b>
slurm-users <a class="moz-txt-link-rfc2396E" href="mailto:slurm-users-bounces@lists.schedmd.com"><slurm-users-bounces@lists.schedmd.com></a> on
behalf of Ryan Novosielski <a class="moz-txt-link-rfc2396E" href="mailto:novosirj@rutgers.edu"><novosirj@rutgers.edu></a><br>
<b>Sent:</b> Friday, September 29, 2023 2:48 AM<br>
<b>To:</b> Slurm User Community List
<a class="moz-txt-link-rfc2396E" href="mailto:slurm-users@lists.schedmd.com"><slurm-users@lists.schedmd.com></a><br>
<b>Subject:</b> Re: [slurm-users] Steps to upgrade slurm for a
patchlevel change?</font>
<div> </div>
</div>
<div style="line-break:after-white-space">
        <div>I started off writing that there’s really no particular process
          for these: just make your changes and start the new software (being
          mindful of any path that might contain data that’s under your
          software tree, if you have that set up), and that you might
          need to watch the timeouts. But I figured I’d have a look at
          the upgrade guide to be sure.
<div><br>
</div>
          <div>There’s really nothing onerous in there. I’d personally
            back up my database and state save directories, just because
            I’d rather be safe than sorry, or in case I have to go
            backwards and want to be sure. You can run SlurmCtld for a
good while with no database (note that -M on the command
line will be broken during that time), just being mindful of
the RAM on the SlurmCtld machine/don’t restart it before the
DB is back up, and backing up our fairly large database
doesn’t take all that long. Whether or not 5 is required
mostly depends on how long you think it will take you to do
6-11 (which could really take you seconds if your process is
really as simple as stop, change symlink, start), 12 you’re
going to do no matter what, 13 you don’t need if you skipped
5, and 14 is up to you. So practically, that’s what you’re
going to do anyway.</div>
<div><br>
</div>
<div>We just did an upgrade last week, and the only difference
is that our compute nodes are stateless, so the compute node
upgrades were a reboot (we could upgrade them running, but
we did it during a maintenance period anyway, so why?).</div>
<div><br>
</div>
<div>If you want to do this with running jobs, I’d definitely
back up the state save directory, but as long as you watch
the timeouts, it’s pretty uneventful. You won’t have that
long database upgrade period, since no database
modifications will be required, so it’s pretty much like
upgrading anything else.</div>
<div><br>
<div>
<div>
<div dir="auto" style="letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform:
none; white-space: normal; word-spacing: 0px;
text-decoration: none; color: rgb(0, 0, 0);">
<div dir="auto" style="letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform:
none; white-space: normal; word-spacing: 0px;
text-decoration: none; color: rgb(0, 0, 0);">
<div dir="auto" style="letter-spacing: normal;
text-align: start; text-indent: 0px;
text-transform: none; white-space: normal;
word-spacing: 0px; text-decoration: none; color:
rgb(0, 0, 0);">
<div dir="auto" style="letter-spacing: normal;
text-align: start; text-indent: 0px;
text-transform: none; white-space: normal;
word-spacing: 0px; text-decoration: none; color:
rgb(0, 0, 0);">
<div style="letter-spacing: normal; text-align:
start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px; color:
rgb(0, 0, 0);">
--<br>
#BlackLivesMatter</div>
<div style="letter-spacing: normal; text-align:
start; text-indent: 0px; text-transform: none;
white-space: normal; word-spacing: 0px; color:
rgb(0, 0, 0);">
____<br>
|| \\UTGERS,
|---------------------------*O*---------------------------<br>
||_// the State<span class="x_Apple-tab-span" style="white-space:pre"> </span> |
Ryan Novosielski
- <a class="moz-txt-link-abbreviated" href="mailto:novosirj@rutgers.edu">novosirj@rutgers.edu</a><br>
|| \\ University | Sr. Technologist
- 973/972.0922 (2x0922) ~*~ RBHS Campus<br>
|| \\ of NJ<span class="x_Apple-tab-span" style="white-space:pre"> </span> |
Office of Advanced Research Computing - MSB
A555B, Newark<br>
`'</div>
</div>
</div>
</div>
</div>
</div>
<div><br>
<blockquote type="cite">
<div>On Sep 28, 2023, at 11:58, Groner, Rob
<a class="moz-txt-link-rfc2396E" href="mailto:rug262@psu.edu"><rug262@psu.edu></a> wrote:</div>
<br class="x_Apple-interchange-newline">
<div>
<div class="x_elementToProof"
style="font-style:normal;
font-variant-caps:normal; font-weight:400;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
font-size:12pt">
<br class="x_Apple-interchange-newline">
                      There are 14 steps to upgrading slurm listed on
                      their website, including shutting down and backing
                      up the database. So far we've only updated slurm
                      during a downtime, and it's been a major version
                      change, so we've taken all the steps indicated.</div>
<div class="x_elementToProof"
style="font-style:normal;
font-variant-caps:normal; font-weight:400;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
font-size:12pt">
<br>
</div>
<div class="x_elementToProof"
style="font-style:normal;
font-variant-caps:normal; font-weight:400;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
font-size:12pt">
We now want to upgrade from 23.02.4 to 23.02.5.</div>
<div class="x_elementToProof"
style="font-style:normal;
font-variant-caps:normal; font-weight:400;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
font-size:12pt">
<br>
</div>
<div class="x_elementToProof"
style="font-style:normal;
font-variant-caps:normal; font-weight:400;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
font-size:12pt">
Our slurm builds end up in version named
directories, and we tell production which one to
use via symlink. Changing the symlink will
automatically change it on our slurm controller
node and all slurmd nodes.</div>
<div class="x_elementToProof"
style="font-style:normal;
font-variant-caps:normal; font-weight:400;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
font-size:12pt">
<br>
</div>
<div class="x_elementToProof"
style="font-style:normal;
font-variant-caps:normal; font-weight:400;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
font-size:12pt">
                      Is there an expedited, simple, slimmed-down
                      upgrade path to follow if we're looking at just a
                      patch-level upgrade?</div>
<div class="x_elementToProof"
style="font-style:normal;
font-variant-caps:normal; font-weight:400;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
font-size:12pt">
<br>
</div>
<div class="x_elementToProof"
style="font-style:normal;
font-variant-caps:normal; font-weight:400;
letter-spacing:normal; text-align:start;
text-indent:0px; text-transform:none;
white-space:normal; word-spacing:0px;
text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
font-size:12pt">
Rob</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</div>
</blockquote>
</body>
</html>