<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Thanks for the info and link to your bug report. Unfortunately,
      my GraceTime is already set to zero for that QOS: <br>
    </p>
    <pre>$ sacctmgr show qos interruptible format=Name,gracetime
      Name  GraceTime
---------- ----------
interrupt+   00:00:00</pre>
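    <p>In case it helps anyone compare settings, here is a minimal
      Python sketch for checking that output programmatically. It is
      only an illustration: the names <code>SAMPLE</code>,
      <code>grace_seconds</code> and <code>parse_qos_table</code> are
      hypothetical, the sample text is copied from the output above,
      and it assumes GraceTime is always printed as HH:MM:SS with no
      day field.</p>
    <pre>
```python
# Sample copied from the sacctmgr output above; on a real cluster this text
# would come from running the command itself.
SAMPLE = """\
      Name  GraceTime
---------- ----------
interrupt+   00:00:00
"""


def grace_seconds(value):
    """Convert an HH:MM:SS string into seconds (assumes no day field)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_qos_table(text):
    """Return {qos_name: grace_time_in_seconds} from the two-column table."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    result = {}
    # Skip the header row and the dashed separator row.
    for name, grace in rows[2:]:
        result[name] = grace_seconds(grace)
    return result


if __name__ == "__main__":
    for name, secs in parse_qos_table(SAMPLE).items():
        status = "GraceTime is zero" if secs == 0 else f"GraceTime is {secs}s"
        print(name, status)
```
</pre>
    <p>On a live cluster, the same function could be fed the text
      produced by the sacctmgr command shown above instead of the
      hard-coded sample.</p>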
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 2/26/21 3:58 PM, Michael Robbert
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:C311A8DA-6454-4065-BC32-B419FF04683D@mines.edu">
      <style>@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0in;
        font-size:10.0pt;
        font-family:"Courier New";}span.sn-widget-textblock-body
        {mso-style-name:sn-widget-textblock-body;}span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:"Consolas",serif;}span.EmailStyle24
        {mso-style-type:personal-reply;
        font-family:"Calibri",sans-serif;
        color:windowtext;}.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}div.WordSection1
        {page:WordSection1;}</style>
      <div class="WordSection1">
        <p class="MsoNormal">We saw something that sounds similar to
          this. See this bug report: <a
            href="https://bugs.schedmd.com/show_bug.cgi?id=10196"
            moz-do-not-send="true">https://bugs.schedmd.com/show_bug.cgi?id=10196</a><o:p></o:p></p>
        <p class="MsoNormal">SchedMD never found the root cause. They
          thought it might have something to do with a timing problem on
          Prolog scripts, but the thing that fixed it for us was to set
          GraceTime=0 on our preemptable QoS.<o:p></o:p></p>
        <p class="MsoNormal"><o:p> </o:p></p>
        <div>
          <div>
            <div>
              <p class="MsoNormal"><b><span style="color:#002060">Mike
                    Robbert<o:p></o:p></span></b></p>
              <p class="MsoNormal"><b><span style="color:#002060">Cyberinfrastructure
                    Specialist, Cyberinfrastructure and Advanced
                    Research Computing<o:p></o:p></span></b></p>
              <p class="MsoNormal"><span style="color:#767171">Information
                  and Technology Solutions (ITS)<o:p></o:p></span></p>
              <p class="MsoNormal"><span style="color:#767171">303-273-3786
                  | </span><a href="mailto:mrobbert@mines.edu"
                  moz-do-not-send="true"><span style="color:#0563C1">mrobbert@mines.edu</span></a><span
                  style="color:#767171"> </span><span
                  style="font-size:12.0pt;color:#767171"> <o:p></o:p></span></p>
              <p class="MsoNormal"><b><span style="color:#2B4160">Our
                    values:</span></b><span style="color:#2B4160"> </span><span
                  style="color:#767171">Trust | Integrity | Respect |
                  Responsibility</span><o:p></o:p></p>
            </div>
          </div>
        </div>
        <p class="MsoNormal"><o:p> </o:p></p>
        <div style="border:none;border-top:solid #B5C4DF
          1.0pt;padding:3.0pt 0in 0in 0in">
          <p class="MsoNormal"><b><span
                style="font-size:12.0pt;color:black">From: </span></b><span
              style="font-size:12.0pt;color:black">slurm-users
              <a class="moz-txt-link-rfc2396E" href="mailto:slurm-users-bounces@lists.schedmd.com">&lt;slurm-users-bounces@lists.schedmd.com&gt;</a> on behalf of
              Prentice Bisbal <a class="moz-txt-link-rfc2396E" href="mailto:pbisbal@pppl.gov">&lt;pbisbal@pppl.gov&gt;</a><br>
              <b>Reply-To: </b>Slurm User Community List
              <a class="moz-txt-link-rfc2396E" href="mailto:slurm-users@lists.schedmd.com">&lt;slurm-users@lists.schedmd.com&gt;</a><br>
              <b>Date: </b>Friday, February 26, 2021 at 12:38<br>
              <b>To: </b><a class="moz-txt-link-rfc2396E" href="mailto:slurm-users@lists.schedmd.com">"slurm-users@lists.schedmd.com"</a>
              <a class="moz-txt-link-rfc2396E" href="mailto:slurm-users@lists.schedmd.com">&lt;slurm-users@lists.schedmd.com&gt;</a><br>
              <b>Subject: </b>[External] [slurm-users] Preemption not
              working in 20.11<o:p></o:p></span></p>
        </div>
        <div>
          <p class="MsoNormal"><o:p> </o:p></p>
        </div>
        <p class="MsoNormal"><o:p> </o:p></p>
        <div>
          <p>We recently upgraded from Slurm 19.05.8 to 20.11.3. In our
            configuration, we have a preemptible partition named
            'interruptible' for long-running, low-priority jobs that use
            checkpoint/restart. Preempted jobs were killed and requeued
            rather than suspended. This configuration had been working
            without issue for 2+ years. <o:p></o:p></p>
          <p>After the upgrade, this has stopped working. Preempted jobs
            are killed and not requeued. My slurm.conf file is
            configured to requeue preempted jobs:<o:p></o:p></p>
          <p>$ grep -i requeue /etc/slurm/slurm.conf <br>
            #JobRequeue=1<br>
            PreemptMode=Requeue<o:p></o:p></p>
          <p>And the user's sbatch script included the --requeue option.
            <o:p></o:p></p>
          <p>The user reports that the error output from his preempted
            jobs now says:<o:p></o:p></p>
          <p><span class="sn-widget-textblock-body">slurmstepd: error:
              *** STEP 1075117.0 ON greene002 CANCELLED AT
              2021-02-25T16:07:48 ***</span><o:p></o:p></p>
          <p><span class="sn-widget-textblock-body">In the past, it
              would say PREEMPTED instead of CANCELLED. </span><br>
            <br>
            <o:p></o:p></p>
          <p><span class="sn-widget-textblock-body">Any ideas what would
              cause this? I've reported this to Slurm support but
              haven't heard anything back yet, so I figured I'd ask
              here, too. If this is a bug, I can't be the only one who
              has experienced it. </span><br>
            <br>
            <o:p></o:p></p>
          <pre>-- <o:p></o:p></pre>
          <pre>Prentice <o:p></o:p></pre>
        </div>
      </div>
    </blockquote>
    <pre class="moz-signature" cols="72">-- 
Prentice Bisbal
Lead Software Engineer
Research Computing
Princeton Plasma Physics Laboratory
<a class="moz-txt-link-freetext" href="http://www.pppl.gov">http://www.pppl.gov</a></pre>
  </body>
</html>