<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Make sure RebootProgram is configured in slurm.conf, and that the
program exists on the compute nodes and is executable by the user
slurmd runs as (normally root).<br>
</p>
<p>This is usually /sbin/reboot.</p>
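<p>As a rough way to check those points on a compute node, here is a
minimal shell sketch. check_reboot_program is a hypothetical helper
name, not a Slurm command, and it only reads the slurm.conf file you
pass it, so it will not see anything pulled in through Include lines
or a configless setup.</p>
<pre>
```shell
#!/bin/sh
# Sketch: report whether RebootProgram is set in a slurm.conf file and
# whether the program it names is executable on this node.
# check_reboot_program is a hypothetical helper, not part of Slurm.
check_reboot_program() {
    conf=$1
    # Take the last uncommented RebootProgram= line (parameter names
    # in slurm.conf are case-insensitive).
    line=$(grep -i '^[[:space:]]*RebootProgram[[:space:]]*=' "$conf" | tail -n 1)
    if [ -z "$line" ]; then
        echo "RebootProgram is not set in $conf"
        return 1
    fi
    # Keep the value after '=', drop quotes, and keep only the program
    # path (the first word, ignoring any arguments or trailing comment).
    prog=$(printf '%s\n' "$line" | cut -d= -f2- | tr -d '"' | awk '{print $1}')
    if [ -x "$prog" ]; then
        echo "OK: $prog is executable"
    else
        echo "RebootProgram=$prog is missing or not executable on this node"
        return 1
    fi
}
```
</pre>
<p>For example, run check_reboot_program /etc/slurm/slurm.conf as root
on the node (that path is an assumption, use wherever your slurm.conf
lives). To see what the running controller actually has loaded, scontrol
show config | grep -i RebootProgram should list the value.</p>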
<p>Brian Andrus<br>
</p>
<div class="moz-cite-prefix">On 6/7/2023 7:50 AM, Heinz, Michael
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:IA1PR11MB6244BF5CB002F8D7A1C9B0D2F653A@IA1PR11MB6244.namprd11.prod.outlook.com">
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'Intel Clear Light',sans-serif">Hey, all.</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'Intel Clear Light',sans-serif">So I added slurmdbd to our
slurm-23.02 install and made my account an admin, but when I
try to do an srun with --reboot it literally just sits
forever: no errors, nothing in the logs. It just sits with
the node in “CF” state until I cancel the job and set the node
to down and then back to idle again.</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'Intel Clear Light',sans-serif">I tried setting RebootProgram to a
script that just writes to a file in /tmp, but the script
never runs.</span></p>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:'Intel Clear Light',sans-serif">Any suggestions?</span></p>
<p class="MsoNormal">Michael Heinz</p>
<p class="MsoNormal">End-to-End Network Software Engineer</p>
<p class="MsoNormal"><a href="mailto:michael.heinz@intel.com"
moz-do-not-send="true" class="moz-txt-link-freetext">michael.heinz@intel.com</a></p>
</div>
</blockquote>
</body>
</html>