<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>We've invoked scontrol in our epilog script for years to close
off nodes without any issue. What the docs are really warning
against is gratuitous use of those commands. If your calls are
well circumscribed (i.e. scontrol is only invoked when you
actually have to close a node) and you only use them when you
have no other workaround, then you should be fine.</p>
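<p>To illustrate that pattern (scontrol runs only on the failure path), here is a minimal hypothetical sketch of an epilog written in Python. The <code>/tmp</code> free-space check and its 1 GiB threshold are placeholder assumptions standing in for your own checks. The sketch assumes slurmd exports <code>SLURMD_NODENAME</code> and <code>SLURM_JOB_ID</code> to the epilog, and it drains with <code>scontrol update NodeName=... State=DRAIN Reason=...</code> so the reason names the failed check instead of "Epilog error".</p>
<pre>
```python
#!/usr/bin/env python3
# Hypothetical epilog sketch: run local health checks and call scontrol
# only when one fails, so the drain reason says what actually went wrong.
import os
import shutil
import subprocess
import sys


def check_tmp_space(path="/tmp", min_free_gib=1.0):
    """Placeholder check (assumption): return a reason on failure, else None."""
    free_gib = shutil.disk_usage(path).free / 1024**3
    if free_gib >= min_free_gib:
        return None
    return f"epilog: {path} has only {free_gib:.1f} GiB free"


def first_failure(checks):
    """Run the checks in order and return the first failure reason, or None."""
    for check in checks:
        reason = check()
        if reason:
            return reason
    return None


def drain_command(node, reason):
    """Build the scontrol argv that drains `node` with a custom reason.

    Passed as a list (no shell), so spaces in the reason need no quoting.
    """
    return ["scontrol", "update", f"NodeName={node}",
            "State=DRAIN", f"Reason={reason}"]


def main():
    reason = first_failure([check_tmp_space])
    if reason is None:
        return 0  # healthy node: no Slurm command is invoked at all
    node = os.environ.get("SLURMD_NODENAME", os.uname().nodename)
    try:
        result = subprocess.run(drain_command(node, reason), timeout=30)
        drained = result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        drained = False
    # Exit 0 after our own drain succeeded so slurmd does not treat the
    # epilog as failed ("Epilog error"); otherwise exit 1 so slurmd still
    # drains the node as a fallback.
    return 0 if drained else 1


# Only act when launched by slurmd (which sets SLURM_JOB_ID), so running
# the file by hand cannot drain a node by accident.
if __name__ == "__main__" and "SLURM_JOB_ID" in os.environ:
    sys.exit(main())
```
</pre>
<p>Point <code>Epilog=</code> in slurm.conf at the script and make it executable on every compute node. Because scontrol only runs when a check fails, the common (healthy) path issues no Slurm commands at all, which is the kind of restricted use described above.</p>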
<p>-Paul Edmon-<br>
</p>
<div class="moz-cite-prefix">On 5/3/2022 3:46 AM,
<a class="moz-txt-link-abbreviated" href="mailto:taleintervenor@sjtu.edu.cn">taleintervenor@sjtu.edu.cn</a> wrote:<br>
</div>
<blockquote type="cite"
cite="mid:008501d85ec1$eccbade0$c66309a0$@sjtu.edu.cn">
<style>@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
text-align:justify;
text-justify:inter-ideograph;
font-size:10.5pt;
font-family:DengXian;}a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}span.EmailStyle17
{mso-style-type:personal-compose;
font-family:DengXian;
color:windowtext;}.MsoChpDefault
{mso-style-type:export-only;}div.WordSection1
{page:WordSection1;}</style>
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">Hi, all:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">We need to detect certain
problems at the end of each job, so we wrote a detection
script that runs in the Slurm epilog and should drain the node
if a check fails.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">I know that exiting the
epilog with a non-zero code makes Slurm drain the node
automatically. But then the drain reason is always <b>“Epilog
error”</b>, so our auto-repair program cannot tell how to
repair the node.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Another option is to call <b>scontrol</b>
directly from the epilog to drain the node, but the official
documentation at <a class="moz-txt-link-freetext" href="https://slurm.schedmd.com/prolog_epilog.html">https://slurm.schedmd.com/prolog_epilog.html</a> says:<o:p></o:p></span></p>
<p class="MsoNormal"><i><span lang="EN-US">Prolog and Epilog
scripts should be designed to be as short as possible and
should not call Slurm commands (e.g. squeue, scontrol,
sacctmgr, etc). … Slurm commands in these scripts can
potentially lead to performance issues and should not be
used.<o:p></o:p></span></i></p>
<p class="MsoNormal"><span lang="EN-US">So what is the best way
to drain a node from the epilog with a custom reason, or to
make Slurm record a more detailed message than just <b>“Epilog
error”</b>?<o:p></o:p></span></p>
</div>
</blockquote>
</body>
</html>