<div dir="ltr">James, you might take a look at the CompleteWait and KillWait parameters in slurm.conf.<div><br></div><div>Regards,</div><div>Lyn</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jan 3, 2020 at 12:27 PM Erwin, James &lt;<a href="mailto:james.erwin@intel.com">james.erwin@intel.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div lang="EN-US">
<div class="gmail-m_8631981121286911434WordSection1">
<p class="MsoNormal">Hello,<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">I’ve recently updated a cluster to Slurm 19.05.4 and noticed that new jobs are starting on nodes that are still in the CG (completing) state. In an epilog I run node health checks that take about 2-3 minutes. In the previous version (the ancient 15.08), jobs
would not start on these nodes until the epilog had completed and the node was out of the CG state. Does anyone know why this overlap of R (running) with CG might be happening?<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">There is a release note for version 19.05.3 that looks possibly related, but I’m not sure exactly what it means:<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">* Changes in Slurm 19.05.3<u></u><u></u></p>
<p class="MsoNormal">==========================<u></u><u></u></p>
<p class="MsoNormal">...<u></u><u></u></p>
<p class="MsoNormal">-- Nodes in COMPLETING state treated as being currently available for job<u></u><u></u></p>
<p class="MsoNormal"> will-run test.<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">Thanks,<u></u><u></u></p>
<p class="MsoNormal">James<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
</blockquote></div>
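<div dir="ltr"><div>A minimal sketch of where those two settings live in slurm.conf, assuming the epilog takes about 3 minutes as described in the quoted message. The values are illustrative assumptions, not tested recommendations for this cluster:</div>
<pre>
# slurm.conf (fragment; values below are illustrative assumptions)

# Seconds the scheduler waits while any job is in the COMPLETING state
# before it schedules additional jobs. The default of 0 means pending
# jobs are started as soon as possible.
CompleteWait=240

# Seconds between SIGTERM and SIGKILL for a job's processes once the
# job reaches its time limit. The default is 30.
KillWait=30
</pre>
<div>The values currently in effect can be checked with <code>scontrol show config | grep -i -E "CompleteWait|KillWait"</code>. Note that CompleteWait applies cluster-wide, so a large value can delay scheduling on nodes unrelated to the completing job.</div></div>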