<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
font-size:10.0pt;
font-family:"Courier New";}
span.sn-widget-textblock-body
{mso-style-name:sn-widget-textblock-body;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:"Consolas",serif;}
span.EmailStyle24
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style></head><body lang=EN-US link="#0563C1" vlink="#954F72" style='word-wrap:break-word'><div class=WordSection1><p class=MsoNormal>We saw something that sounds similar to this. See this bug report: <a href="https://bugs.schedmd.com/show_bug.cgi?id=10196">https://bugs.schedmd.com/show_bug.cgi?id=10196</a><o:p></o:p></p><p class=MsoNormal>SchedMD never found the root cause. They thought it might be related to a timing problem with Prolog scripts, but what fixed it for us was setting GraceTime=0 on our preemptable QoS.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><div><div><div><p class=MsoNormal><b><span style='color:#002060'>Mike Robbert<o:p></o:p></span></b></p><p class=MsoNormal><b><span style='color:#002060'>Cyberinfrastructure Specialist, Cyberinfrastructure and Advanced Research Computing<o:p></o:p></span></b></p><p class=MsoNormal><span style='color:#767171'>Information and Technology Solutions (ITS)<o:p></o:p></span></p><p class=MsoNormal><span style='color:#767171'>303-273-3786 | </span><a href="mailto:mrobbert@mines.edu"><span style='color:#0563C1'>mrobbert@mines.edu</span></a><span style='color:#767171'> </span><span style='font-size:12.0pt;color:#767171'> <o:p></o:p></span></p></div></div></div><p class=MsoNormal><o:p> </o:p></p><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='font-size:12.0pt;color:black'>From: </span></b><span style='font-size:12.0pt;color:black'>slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Prentice Bisbal &lt;pbisbal@pppl.gov&gt;<br><b>Reply-To: </b>Slurm User Community List &lt;slurm-users@lists.schedmd.com&gt;<br><b>Date: </b>Friday, February 26, 2021 at 12:38<br><b>To: </b>"slurm-users@lists.schedmd.com" &lt;slurm-users@lists.schedmd.com&gt;<br><b>Subject: </b>[External] [slurm-users] Preemption not working in 20.11<o:p></o:p></span></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p>We recently upgraded from Slurm 19.05.8 to 20.11.3. In our configuration, we have an interruptible partition named 'interruptible' for long-running, low-priority jobs that use checkpoint/restart. Preempted jobs were killed and requeued rather than suspended. This configuration had been working without issue for more than 2 years. <o:p></o:p></p><p>After the upgrade, this stopped working. Preempted jobs are now killed and not requeued. 
</p><p>My slurm.conf file is configured to requeue preempted jobs:<o:p></o:p></p><pre>$ grep -i requeue /etc/slurm/slurm.conf<br>#JobRequeue=1<br>PreemptMode=Requeue<o:p></o:p></pre><p>The user's sbatch script also included the --requeue option.<o:p></o:p></p><p>The user reports that the error output from his preempted jobs now says:<o:p></o:p></p><pre><span class=sn-widget-textblock-body>slurmstepd: error: *** STEP 1075117.0 ON greene002 CANCELLED AT 2021-02-25T16:07:48 ***</span><o:p></o:p></pre><p><span class=sn-widget-textblock-body>In the past, it would say PREEMPTED instead of CANCELLED.</span><o:p></o:p></p><p><span class=sn-widget-textblock-body>Any ideas what could cause this? I've reported this to Slurm support but haven't heard back yet, so I figured I'd ask here, too. If this is a bug, I can't be the only one who has experienced it.</span><o:p></o:p></p><pre>-- <o:p></o:p></pre><pre>Prentice <o:p></o:p></pre></div></div></body></html>