<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<p>Just off the top of my head here.</p>
<p>I would expect you need to have no jobs currently running on the
node, so you could submit a job to the node that sets the node to
drain, does any local things needed, then exits. As part of the
EpilogSlurmctld script, you could check for drained nodes based on
some reason (like 'MIG reconfig') and do the head node steps
there, with a final step that brings the node back online. <br>
</p>
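<p>Untested, but the check in the epilog could look something like
this in Python. I am assuming the node/reason pairs come from
something like sinfo -N -h -R -o "%N|%E", one node per line; the
function name and the sample node names are made up for
illustration.<br>
</p>
<pre>
```python
def nodes_awaiting_reconfig(sinfo_output, reason="MIG reconfig"):
    """Return node names whose drain reason matches `reason`.

    Assumes one "node|reason" pair per line, which is what I would
    expect from a node-oriented sinfo call with a "%N|%E" format.
    """
    matches = []
    for line in sinfo_output.splitlines():
        node, sep, node_reason = line.partition("|")
        # Skip blank or malformed lines that have no separator.
        if sep and node_reason.strip() == reason:
            matches.append(node.strip())
    return matches


# Hypothetical sample output, not taken from a real cluster.
sample = "gpu01|MIG reconfig\ngpu02|bad disk\ngpu03|MIG reconfig\n"
print(nodes_awaiting_reconfig(sample))  # prints ['gpu01', 'gpu03']
```
</pre>
<p>Matching on the exact reason string keeps the epilog from
touching nodes that were drained for some other reason.<br>
</p>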
<p><br>
</p>
<p>Or just do all those steps from a script outside slurm itself, on
the head node. You can use ssh/pdsh to connect to a node and
execute things there while it is out of the mix.<br>
</p>
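<p>As a rough sketch of that outside-of-slurm route, again in
Python: this only builds and prints the commands, in order, so you
can look them over first. The node name and the remote script path
are placeholders I made up; the remote step stands in for whatever
you do on the node (stopping slurmd, the nvidia-smi change,
starting slurmd again).<br>
</p>
<pre>
```python
import shlex


def reconfig_plan(node, remote_cmd, reason="MIG reconfig"):
    """Return the commands to run from the head node, in order.

    Not shown: after the drain you still have to wait until no jobs
    are left on the node before running the remote step.
    """
    return [
        # Stop new jobs from landing on the node and record why.
        ["scontrol", "update", f"NodeName={node}", "State=DRAIN",
         f"Reason={reason}"],
        # Do the local work on the node while it is out of the mix.
        ["ssh", node, remote_cmd],
        # Bring the node back online.
        ["scontrol", "update", f"NodeName={node}", "State=RESUME"],
    ]


# Hypothetical node name and script path, for illustration only.
for cmd in reconfig_plan("gpu01", "/usr/local/sbin/mig-reconfig.sh"):
    print(shlex.join(cmd))
```
</pre>
<p>Once the printed commands look right, each list could be handed
to subprocess.run(cmd, check=True) so the script stops at the first
failure.<br>
</p>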
<p><br>
</p>
<p>Brian Andrus<br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 9/23/2022 7:09 AM, Groner, Rob
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:BL0PR02MB44995429F41BA490DA56255E80519@BL0PR02MB4499.namprd02.prod.outlook.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<style type="text/css" style="display:none;">P {margin-top:0;margin-bottom:0;}</style>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255,
255, 255);" class="elementToProof">
<br>
</div>
<div dir="ltr">
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
I'm working through how to use the new dynamic node features
in order to take down a particular node, reconfigure it <span
class="x_ContentPasted0" style="color: rgb(0, 0, 0);
background-color: rgb(255, 255, 255); display: inline
!important;">(using nvidia MIG to change the number of
graphic cores available)</span> and give it back to slurm.</div>
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
<br>
</div>
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
I'm at the point where I can take a node out of slurm's
control from the master node (scontrol delete nodename....),
make the nvidia-smi change, and then execute slurmd on the
node with the changed configuration parameters. It then does
show up again in the sinfo output on the master node, with the
correct new resources.</div>
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
<br>
</div>
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
What I'm not sure about is...when I want to reconfigure the <span
class="x_ContentPasted1" style="color: rgb(0, 0, 0);
background-color: rgb(255, 255, 255); display: inline
!important;">
dynamic </span>node AGAIN, how do I do that on the target
node? I can use "scontrol delete" again on the scheduler
node, but on the
<span style="color: rgb(0, 0, 0); background-color: rgb(255,
255, 255); display: inline !important;">
dynamic</span> node, slurmd will still be running.
Currently, for testing purposes, I just find the process ID
and kill -9 it. Then I change the node configuration and
execute "slurmd -Z --conf=...." again. </div>
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
<br>
</div>
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
Is there a more elegant way to change the configuration on the
<span style="color: rgb(0, 0, 0); background-color: rgb(255,
255, 255); display: inline !important;">
dynamic</span> node than by killing the existing slurmd
process and starting it again? </div>
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
<br>
</div>
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
I'll note that I tried doing everything from the master
(slurmctld) node, since there is an option of creating the
node there with "scontrol create" instead of using slurmd on
the dynamic node. But when I tried that, the dynamic node I
created showed up in sinfo output with a ~ next to it (powered
off). The dynamic node docs page online did not mention what,
if anything, slurmd was supposed to be running as on the
dynamic node if attempting to handle delete and create only on
the master node. </div>
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
<br>
</div>
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
Thanks.</div>
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
<br>
</div>
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
Rob</div>
<div class="x_elementToProof" style="font-family: Calibri,
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0,
0, 0); background-color: rgb(255, 255, 255);">
<br>
</div>
</div>
</blockquote>
</body>
</html>