<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>This is one of the reasons we stick with RPMs rather than the
      symlink process. It's just cleaner, and it avoids having the
      install on shared storage that may get overwhelmed with traffic
      or suffer outages. The package manager also automatically removes
      the previous versions and installs everything locally. I've never
      been a fan of the symlink method, as it runs counter to the entire
      point and design of Linux and package managers, which are supposed
      to do this heavy lifting for you.</p>
    <p><br>
    </p>
    <p>Rant aside :). Generally for minor upgrades the process is less
      touchy. For our setup we use the following process, which works
      well for us but does create an outage for the duration of the
      upgrade.</p>
    <p><br>
    </p>
    <p>1. Set all partitions to down: This makes sure no new jobs are
      scheduled.</p>
    <p>2. Suspend all jobs: This makes sure jobs aren't running while we
      upgrade.</p>
    <p>3. Stop slurmctld and slurmdbd.</p>
    <p>4. Upgrade slurmdbd, then restart it.</p>
    <p>5. Upgrade the slurmd and slurmctld across the cluster.</p>
    <p>6. Restart slurmd and slurmctld simultaneously using choria.</p>
    <p>7. Resume all suspended jobs.</p>
    <p>8. Reopen all partitions.</p>
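    <p>As a rough illustration only, the eight steps above could be
      laid out as an ordered plan like the sketch below. The function
      name and the placeholder strings are made up for this example;
      the package upgrade and the choria restart are site-specific, so
      they are left as placeholders rather than guessed at.</p>
<pre>
```python
"""Sketch of the eight-step upgrade as an ordered list of steps."""


def build_upgrade_plan(partitions, running_job_ids):
    """Return the steps in order.

    Each step is either an argv list that could be run as a command,
    or a placeholder string for a site-specific action.
    """
    plan = []
    # Step 1: mark every partition DOWN so no new jobs are scheduled.
    for name in partitions:
        plan.append(["scontrol", "update", f"PartitionName={name}", "State=DOWN"])
    # Step 2: suspend every running job.
    for job_id in running_job_ids:
        plan.append(["scontrol", "suspend", str(job_id)])
    # Step 3: stop the controller and the database daemon.
    plan.append(["systemctl", "stop", "slurmctld"])
    plan.append(["systemctl", "stop", "slurmdbd"])
    # Steps 4-6: site-specific (RPM upgrade, restart tooling), so
    # they stay as placeholders here instead of invented commands.
    plan.append("PLACEHOLDER: upgrade slurmdbd packages, then restart slurmdbd")
    plan.append("PLACEHOLDER: upgrade slurmd and slurmctld across the cluster")
    plan.append("PLACEHOLDER: restart slurmd and slurmctld together")
    # Step 7: resume the jobs suspended in step 2.
    for job_id in running_job_ids:
        plan.append(["scontrol", "resume", str(job_id)])
    # Step 8: reopen the partitions.
    for name in partitions:
        plan.append(["scontrol", "update", f"PartitionName={name}", "State=UP"])
    return plan
```
</pre>
    <p>The partition names could come from something like "sinfo -h -o
      %R" (deduplicated) and the job IDs from "squeue -h -t RUNNING -o
      %i", and only the list entries would be handed to something like
      subprocess.run. Treat this as a sketch, not as the exact tooling
      we run.</p>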
    <p><br>
    </p>
    <p>For major upgrades we always take a mysqldump and back up the
      spool for the slurmctld before upgrading, just in case something
      goes wrong. We've had this happen before, when the slurmdbd upgrade
      cut out early (note: always run the slurmdbd and slurmctld
      upgrades in -D mode and not via systemctl, as systemctl can time
      out and kill the upgrade midway for large upgrades).</p>
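    <p>A minimal sketch of that backup and foreground-run idea is
      below. The helper names, the destination directory and the spool
      path are assumptions for illustration; the database name and the
      state save location should be taken from your own slurmdbd.conf
      and slurm.conf. The --single-transaction flag assumes InnoDB
      tables.</p>
<pre>
```python
import posixpath


def backup_commands(db_name, state_save_dir, dest_dir):
    """Return argv lists that dump the accounting DB and archive the spool."""
    dump_file = posixpath.join(dest_dir, f"{db_name}.sql")
    archive = posixpath.join(dest_dir, "statesave.tar.gz")
    # Split the spool path so tar stores a relative directory name.
    parent, leaf = posixpath.split(state_save_dir.rstrip("/"))
    return [
        ["mysqldump", "--single-transaction", f"--result-file={dump_file}", db_name],
        ["tar", "-czf", archive, "-C", parent, leaf],
    ]


def foreground_command(daemon):
    """Return the argv that runs a Slurm daemon in the foreground (-D)."""
    if daemon not in ("slurmdbd", "slurmctld"):
        raise ValueError(f"unexpected daemon: {daemon}")
    return [daemon, "-D"]
```
</pre>
    <p>Running the daemon by hand with -D keeps the conversion in your
      terminal, so a systemd start timeout cannot kill it partway
      through.</p>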
    <p><br>
    </p>
    <p>That said, I've also skipped steps 1, 2, 7, and 8 before for minor
      upgrades and it works fine. The slurmd, slurmctld, and slurmdbd
      can all run on different versions so long as slurmdbd is at least
      as new as slurmctld, which in turn is at least as new as slurmd.
      So if you want to do a live upgrade you can do it. However, out of
      paranoia we generally stop everything. The entire process takes
      about an hour start to finish, with the longest part being the
      pausing of all the jobs.</p>
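    <p>The version ordering above (slurmdbd at least as new as
      slurmctld, slurmctld at least as new as slurmd) is easy to check
      before a live upgrade. The sketch below uses made-up function
      names and only checks the ordering; it does not check how many
      major releases apart the daemons are allowed to be.</p>
<pre>
```python
def parse_version(text):
    """Turn a version string such as '23.02.5' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def ordering_ok(slurmdbd, slurmctld, slurmd):
    """True when slurmdbd is no older than slurmctld, which is no older than slurmd."""
    dbd, ctld, node = (parse_version(v) for v in (slurmdbd, slurmctld, slurmd))
    return dbd >= ctld >= node
```
</pre>
    <p>For example, ordering_ok("23.02.5", "23.02.5", "23.02.4") is
      True (the state midway through a rolling upgrade), while
      ordering_ok("23.02.4", "23.02.5", "23.02.5") is False because
      slurmdbd was left behind.</p>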
    <p><br>
    </p>
    <p>-Paul Edmon-</p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 9/29/2023 9:48 AM, Groner, Rob
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:BL0PR02MB4499F4AD1D847AAB03562CE680C0A@BL0PR02MB4499.namprd02.prod.outlook.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <style type="text/css" style="display:none;">P {margin-top:0;margin-bottom:0;}</style>
      <div style="font-family: Aptos, Aptos_EmbeddedFont,
        Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
        12pt; color: rgb(0, 0, 0);" class="elementToProof">
        I did already see the upgrade section of Jason's talk, but it
        wasn't much about the mechanics of the actual upgrade process,
        more of a big picture it seemed.  It dealt a lot with different
        parts of slurm at different versions, which is something we
        don't have.</div>
      <div style="font-family: Aptos, Aptos_EmbeddedFont,
        Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
        12pt; color: rgb(0, 0, 0);" class="elementToProof">
        <span style="display: inline !important; color: rgb(0, 0, 0);
          background-color: rgb(255, 255, 255);" class="ContentPasted1"><br>
        </span></div>
      <div style="font-family: Aptos, Aptos_EmbeddedFont,
        Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
        12pt; color: rgb(0, 0, 0);" class="elementToProof">
        <span style="display: inline !important; color: rgb(0, 0, 0);
          background-color: rgb(255, 255, 255);" class="ContentPasted1">One
          little wrinkle here is that while, yes, we're using a symlink
          to point to what version of slurm is the current one...it's
          all on a shared filesystem.  So, ALL nodes, slurmdb, slurmctld
          are using that same symlink.  There is no means to upgrade one
          component at a time.  That means to upgrade, EVERYTHING has to
          come down before it could come back up.  Jason's slides seemed
          to indicate that, if there were separate symlinks, then I
          could focus on just the slurmdb first and upgrade it...then
          focus on slurmctld and upgrade it, and then finally the nodes
          (take down their slurmd, upgrade the link, bring up slurmd). 
          So maybe that's what I'm missing.</span></div>
      <div style="font-family: Aptos, Aptos_EmbeddedFont,
        Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
        12pt; color: rgb(0, 0, 0);" class="elementToProof">
        <br>
      </div>
      <div style="font-family: Aptos, Aptos_EmbeddedFont,
        Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
        12pt; color: rgb(0, 0, 0);" class="elementToProof
        ContentPasted0">
        Otherwise, I think what I'm saying is that I see references to a
        "rolling upgrade", but I don't see any guide to a rolling
        upgrade.  I just see the 14 steps  in <a
          href="https://slurm.schedmd.com/quickstart_admin.html#upgrade"
          id="OWAe45506a3-ac1b-a0e7-1d1f-010ede9d465b"
          class="OWAAutoLink moz-txt-link-freetext"
          moz-do-not-send="true">https://slurm.schedmd.com/quickstart_admin.html#upgrade</a>,
        and I guess I'd always thought of that as the full octane, high
        fat upgrade.  I've only ever done upgrades during one of our
        many scheduled downtimes, because the upgrades were always to a
        new major version, and because I'm a scared little chicken, so I
        figured there were maybe some smaller subset of steps if only
        upgrading a patchlevel change.  Smaller change, less risk, less
        precautionary steps...?  I'm seeing now that's not the case.</div>
      <div style="font-family: Aptos, Aptos_EmbeddedFont,
        Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
        12pt; color: rgb(0, 0, 0);" class="elementToProof
        ContentPasted0">
        <br>
      </div>
      <div style="font-family: Aptos, Aptos_EmbeddedFont,
        Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
        12pt; color: rgb(0, 0, 0);" class="elementToProof
        ContentPasted0">
        Thank you all for the suggestions!</div>
      <div style="font-family: Aptos, Aptos_EmbeddedFont,
        Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
        12pt; color: rgb(0, 0, 0);" class="elementToProof
        ContentPasted0">
        <br>
      </div>
      <div style="font-family: Aptos, Aptos_EmbeddedFont,
        Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
        12pt; color: rgb(0, 0, 0);" class="elementToProof
        ContentPasted0">
        Rob</div>
      <div style="font-family: Aptos, Aptos_EmbeddedFont,
        Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
        12pt; color: rgb(0, 0, 0);" class="elementToProof
        ContentPasted0">
        <br>
      </div>
      <div style="font-family: Aptos, Aptos_EmbeddedFont,
        Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
        12pt; color: rgb(0, 0, 0);" class="elementToProof
        ContentPasted0">
      </div>
      <div style="font-family: Aptos, Aptos_EmbeddedFont,
        Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size:
        12pt; color: rgb(0, 0, 0);">
        <br>
      </div>
      <hr tabindex="-1" style="display:inline-block; width:98%">
      <div id="divRplyFwdMsg" dir="ltr"><font style="font-size: 11pt;
          color: rgb(0, 0, 0);" face="Calibri, sans-serif"><b>From:</b>
          slurm-users <a class="moz-txt-link-rfc2396E" href="mailto:slurm-users-bounces@lists.schedmd.com"><slurm-users-bounces@lists.schedmd.com></a> on
          behalf of Ryan Novosielski <a class="moz-txt-link-rfc2396E" href="mailto:novosirj@rutgers.edu"><novosirj@rutgers.edu></a><br>
          <b>Sent:</b> Friday, September 29, 2023 2:48 AM<br>
          <b>To:</b> Slurm User Community List
          <a class="moz-txt-link-rfc2396E" href="mailto:slurm-users@lists.schedmd.com"><slurm-users@lists.schedmd.com></a><br>
          <b>Subject:</b> Re: [slurm-users] Steps to upgrade slurm for a
          patchlevel change?</font>
        <div> </div>
      </div>
      <div style="line-break:after-white-space">
        <div>I started off writing there’s really no particular process
          for these/just do your changes and start the new software (be
          mindful of any PATH that might contain data that’s under your
          software tree, if you have that setup), and that you might
          need to watch the timeouts, but I figured I’d have a look at
          the upgrade guide to be sure.
          <div><br>
          </div>
          <div>There’s really nothing onerous in there. I’d personally
            back up my database and state save directories just because
            I’d rather be safe than sorry, or in case I have to go
            backwards and want to be sure. You can run SlurmCtld for a
            good while with no database (note that -M on the command
            line will be broken during that time), just being mindful of
            the RAM on the SlurmCtld machine/don’t restart it before the
            DB is back up, and backing up our fairly large database
            doesn’t take all that long. Whether or not 5 is required
            mostly depends on how long you think it will take you to do
            6-11 (which could really take you seconds if your process is
            really as simple as stop, change symlink, start), 12 you’re
            going to do no matter what, 13 you don’t need if you skipped
            5, and 14 is up to you. So practically, that’s what you’re
            going to do anyway.</div>
          <div><br>
          </div>
          <div>We just did an upgrade last week, and the only difference
            is that our compute nodes are stateless, so the compute node
            upgrades were a reboot (we could upgrade them running, but
            we did it during a maintenance period anyway, so why?).</div>
          <div><br>
          </div>
          <div>If you want to do this with running jobs, I’d definitely
            back up the state save directory, but as long as you watch
            the timeouts, it’s pretty uneventful. You won’t have that
            long database upgrade period, since no database
            modifications will be required, so it’s pretty much like
            upgrading anything else.</div>
          <div><br>
            <div>
              <div>
                <div dir="auto" style="letter-spacing: normal;
                  text-align: start; text-indent: 0px; text-transform:
                  none; white-space: normal; word-spacing: 0px;
                  text-decoration: none; color: rgb(0, 0, 0);">
                  <div dir="auto" style="letter-spacing: normal;
                    text-align: start; text-indent: 0px; text-transform:
                    none; white-space: normal; word-spacing: 0px;
                    text-decoration: none; color: rgb(0, 0, 0);">
                    <div dir="auto" style="letter-spacing: normal;
                      text-align: start; text-indent: 0px;
                      text-transform: none; white-space: normal;
                      word-spacing: 0px; text-decoration: none; color:
                      rgb(0, 0, 0);">
                      <div dir="auto" style="letter-spacing: normal;
                        text-align: start; text-indent: 0px;
                        text-transform: none; white-space: normal;
                        word-spacing: 0px; text-decoration: none; color:
                        rgb(0, 0, 0);">
                        <div style="letter-spacing: normal; text-align:
                          start; text-indent: 0px; text-transform: none;
                          white-space: normal; word-spacing: 0px; color:
                          rgb(0, 0, 0);">
                          --<br>
                          #BlackLivesMatter</div>
                        <div style="letter-spacing: normal; text-align:
                          start; text-indent: 0px; text-transform: none;
                          white-space: normal; word-spacing: 0px; color:
                          rgb(0, 0, 0);">
                          ____<br>
                          || \\UTGERS,    
                          |---------------------------*O*---------------------------<br>
                          ||_// the State<span class="x_Apple-tab-span" style="white-space:pre"> </span> |
                                  Ryan Novosielski
                          - <a class="moz-txt-link-abbreviated" href="mailto:novosirj@rutgers.edu">novosirj@rutgers.edu</a><br>
                          || \\ University | Sr. Technologist
                          - 973/972.0922 (2x0922) ~*~ RBHS Campus<br>
                          ||  \\    of NJ<span class="x_Apple-tab-span" style="white-space:pre"> </span> |
                          Office of Advanced Research Computing - MSB
                          A555B, Newark<br>
                               `'</div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
              <div><br>
                <blockquote type="cite">
                  <div>On Sep 28, 2023, at 11:58, Groner, Rob
                    <a class="moz-txt-link-rfc2396E" href="mailto:rug262@psu.edu"><rug262@psu.edu></a> wrote:</div>
                  <br class="x_Apple-interchange-newline">
                  <div>
                    <div class="x_elementToProof"
                      style="font-style:normal;
                      font-variant-caps:normal; font-weight:400;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
                      font-size:12pt">
                      <br class="x_Apple-interchange-newline">
                      There's 14 steps to upgrading slurm listed on
                      their website, including shutting down and backing
                      up the database.  So far we've only updated slurm
                      during a downtime, and it's been a major version
                      change, so we've taken all the steps indicated.</div>
                    <div class="x_elementToProof"
                      style="font-style:normal;
                      font-variant-caps:normal; font-weight:400;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
                      font-size:12pt">
                      <br>
                    </div>
                    <div class="x_elementToProof"
                      style="font-style:normal;
                      font-variant-caps:normal; font-weight:400;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
                      font-size:12pt">
                      We now want to upgrade from 23.02.4 to 23.02.5.</div>
                    <div class="x_elementToProof"
                      style="font-style:normal;
                      font-variant-caps:normal; font-weight:400;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
                      font-size:12pt">
                      <br>
                    </div>
                    <div class="x_elementToProof"
                      style="font-style:normal;
                      font-variant-caps:normal; font-weight:400;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
                      font-size:12pt">
                      Our slurm builds end up in version named
                      directories, and we tell production which one to
                      use via symlink.  Changing the symlink will
                      automatically change it on our slurm controller
                      node and all slurmd nodes.</div>
                    <div class="x_elementToProof"
                      style="font-style:normal;
                      font-variant-caps:normal; font-weight:400;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
                      font-size:12pt">
                      <br>
                    </div>
                    <div class="x_elementToProof"
                      style="font-style:normal;
                      font-variant-caps:normal; font-weight:400;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
                      font-size:12pt">
                      Is there an expedited, simple, slimmed down
                      upgrade path to follow if we're looking at just a
                      . level upgrade?</div>
                    <div class="x_elementToProof"
                      style="font-style:normal;
                      font-variant-caps:normal; font-weight:400;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
                      font-size:12pt">
                      <br>
                    </div>
                    <div class="x_elementToProof"
                      style="font-style:normal;
                      font-variant-caps:normal; font-weight:400;
                      letter-spacing:normal; text-align:start;
                      text-indent:0px; text-transform:none;
                      white-space:normal; word-spacing:0px;
                      text-decoration:none;
font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;
                      font-size:12pt">
                      Rob</div>
                  </div>
                </blockquote>
              </div>
              <br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
  </body>
</html>