<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof">
I didn't say I felt slurmd could not run as a service on a dynamic node. I'm just saying that the example they give on their dynamic nodes webpage does not show slurmd running as a service. So they seem to imply there's a different way, other than with slurmd
running as a service, that you can create a dynamic node with a different configuration. In their example, they just execute slurmd with parameters on the command line. So...not as a service. </div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof">
I'm fine with the concept of stopping the service, changing the service parameters for the new configuration of the node, and then starting the service again. That's fine, and that makes sense. What I'm trying to say is that their documentation does not demonstrate
that way of handling dynamic nodes. So I'm trying to figure out what they meant to have happen to a dynamic node where slurmd is already running as a process and not as a service. Is there SOME OTHER WAY they expected that a dynamic node could reconfigure
itself other than through stopping/starting a service?</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof">
I think their limited documentation on dynamic nodes basically only covers creating a node ONCE and removing it ONCE, and not a scenario where you might reconfigure a single node multiple times in its life. Given that, and having<span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;"> the
service method of making it work, I'll just go with that. Thanks for the help.</span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof">
Rob</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);" class="elementToProof">
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> slurm-users &lt;slurm-users-bounces@lists.schedmd.com&gt; on behalf of Brian Andrus &lt;toomuchit@gmail.com&gt;<br>
<b>Sent:</b> Friday, September 23, 2022 12:24 PM<br>
<b>To:</b> slurm-users@lists.schedmd.com &lt;slurm-users@lists.schedmd.com&gt;<br>
<b>Subject:</b> Re: [slurm-users] slurmd and dynamic nodes</font>
<div> </div>
</div>
<div>
<div>
<p>You shouldn't have to change any parameters if you have it configured in the defaults. Just systemctl stop/start slurmd as needed.</p>
<p><br>
</p>
<p>Something like:</p>
<p>scontrol update state=drain nodename=&lt;node_to_change&gt; reason="MIG reconfig"</p>
<p>&lt;wait for it to be drained&gt;</p>
<p>ssh &lt;node_to_change&gt; "systemctl stop slurmd"</p>
<p>&lt;run reconfig stuff&gt;</p>
<p>ssh &lt;node_to_change&gt; "systemctl start slurmd"</p>
<p><br>
</p>
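<p>A minimal sketch of the steps above as a single script run from the head node. The function names, the <code>RECONFIG_CMD</code> and <code>POLL_SECONDS</code> variables, and the final <code>state=resume</code> call are assumptions added for illustration and are not part of the steps as written; a node drained by hand normally stays drained until it is resumed. The drained check reads the extended node state from <code>sinfo -o %T</code>.</p>
<pre>
```shell
#!/bin/sh
# Sketch only: drain, stop slurmd, reconfigure, start slurmd, resume.
# RECONFIG_CMD and POLL_SECONDS are hypothetical, site-specific settings.

# Succeed once sinfo reports the node as drained (no jobs left on it).
node_is_drained() {
    state=$(sinfo -h -n "$1" -o '%T') || return 1
    case "$state" in
        *drained*) return 0 ;;
        *) return 1 ;;
    esac
}

reconfigure_node() {
    node=$1
    # Refuse to start without the site-specific reconfiguration command.
    : "${RECONFIG_CMD:?set RECONFIG_CMD to the command that reconfigures the node}"

    scontrol update nodename="$node" state=drain reason="MIG reconfig" || return 1
    until node_is_drained "$node"; do
        sleep "${POLL_SECONDS:-30}"
    done

    ssh "$node" "systemctl stop slurmd" || return 1
    ssh "$node" "$RECONFIG_CMD" || return 1
    ssh "$node" "systemctl start slurmd" || return 1

    # Assumption, not in the original steps: undo the manual drain.
    scontrol update nodename="$node" state=resume
}
```
</pre>
<p>With <code>RECONFIG_CMD</code> set to whatever performs the MIG change on the node, <code>reconfigure_node node01</code> (a made-up node name) would run the whole cycle. Passwordless ssh to the node with sufficient privileges is assumed.</p>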
<p></p>
<p>Not sure what would make you feel slurmd cannot run as a service on a dynamic node. As long as you added the options to the systemd defaults file for it, you should be fine (usually /etc/default/slurmd or /etc/sysconfig/slurmd, depending on the distribution).<br>
</p>
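<p>For reference, the slurmd.service unit shipped with Slurm should read /etc/default/slurmd and /etc/sysconfig/slurmd as environment files and append <code>$SLURMD_OPTIONS</code> to the slurmd command line; <code>systemctl cat slurmd</code> shows what a given install actually does. Below is a minimal sketch of generating that line for a dynamic node. The function name and the <code>Gres=gpu:2</code> value are made up for illustration, and the sketch only accepts a <code>--conf</code> value without whitespace so that it can sidestep systemd's quoting rules.</p>
<pre>
```shell
#!/bin/sh
# Sketch only: print an environment-file line that makes the service start
# slurmd as a dynamic node (-Z) with the given single-token --conf value.
render_slurmd_defaults() {
    case "$1" in
        # Empty values and values with whitespace are rejected in this sketch.
        ''|*[[:space:]]*) return 1 ;;
    esac
    printf 'SLURMD_OPTIONS="-Z --conf=%s"\n' "$1"
}
```
</pre>
<p>The output of <code>render_slurmd_defaults 'Gres=gpu:2'</code> could be written into the defaults file while slurmd is stopped, so that the next <code>systemctl start slurmd</code> registers the node with the new configuration.</p>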
<p><br>
</p>
<p>Brian<br>
</p>
<p><br>
</p>
<div class="x_moz-cite-prefix">On 9/23/2022 7:40 AM, Groner, Rob wrote:<br>
</div>
<blockquote type="cite"><style type="text/css" style="display:none">
<!--
p
{margin-top:0;
margin-bottom:0}
-->
</style>
<div class="x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
Ya, we're still working out the mechanism for taking the node out, making the changes, and bringing it back. But the part I can't figure out is slurmd running on the remote node. What do I do with it? Do I run it standalone, and when I need to reconfigure,
I kill -9 it and execute it again with the new configuration? Or what if slurmd is running as a service (as it does on all our non-dynamic nodes)? Do I stop it, change its service parameters and then restart it to reconfigure the node? The docs on slurm
for dynamic nodes don't give any indication of how you handle slurmd running on the dynamic node. What is the preferred method? </div>
<div class="x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
<br>
</div>
<div class="x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
Rob</div>
<div class="x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> slurm-users
<a class="x_moz-txt-link-rfc2396E" href="mailto:slurm-users-bounces@lists.schedmd.com">
&lt;slurm-users-bounces@lists.schedmd.com&gt;</a> on behalf of Brian Andrus <a class="x_moz-txt-link-rfc2396E" href="mailto:toomuchit@gmail.com">
&lt;toomuchit@gmail.com&gt;</a><br>
<b>Sent:</b> Friday, September 23, 2022 10:24 AM<br>
<b>To:</b> <a class="x_moz-txt-link-abbreviated" href="mailto:slurm-users@lists.schedmd.com">
slurm-users@lists.schedmd.com</a> <a class="x_moz-txt-link-rfc2396E" href="mailto:slurm-users@lists.schedmd.com">
&lt;slurm-users@lists.schedmd.com&gt;</a><br>
<b>Subject:</b> Re: [slurm-users] slurmd and dynamic nodes</font>
<div> </div>
</div>
<div>
<div>
<p><br>
</p>
<p>Just off the top of my head here.</p>
<p>I would expect you need to have no jobs currently running on the node, so you could submit a job to the node that sets the node to drain, does any local things needed, then exits. As part of the EpilogSlurmctld script, you could check for drained nodes
based on some reason (like 'MIG reconfig') and do the head node steps there, with a final bit of bringing it back online.
<br>
</p>
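<p>The "check for drained nodes based on some reason" step could be sketched as below, for instance as a helper called from an EpilogSlurmctld script. The function name is hypothetical, and the exact-match comparison assumes the reason string was set verbatim with scontrol.</p>
<pre>
```shell
#!/bin/sh
# Sketch only: print each drained node whose reason matches the argument.
# Relies on sinfo's %N (node name) and %E (reason) format fields.
drained_nodes_with_reason() {
    wanted=$1
    sinfo -h -N -t drained -o '%N|%E' |
    while IFS='|' read -r node reason; do
        if [ "$reason" = "$wanted" ]; then
            echo "$node"
        fi
    done |
    sort -u
}
```
</pre>
<p>Calling <code>drained_nodes_with_reason 'MIG reconfig'</code> would then list the nodes that are ready for the head node steps, one per line.</p>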
<p><br>
</p>
<p>Or just do all those steps from a script outside slurm itself, on the head node. You can use ssh/pdsh to connect to a node and execute things there while it is out of the mix.<br>
</p>
<p><br>
</p>
<p>Brian Andrus<br>
</p>
<p><br>
</p>
<div class="x_x_moz-cite-prefix">On 9/23/2022 7:09 AM, Groner, Rob wrote:<br>
</div>
<blockquote type="cite"><style type="text/css" style="display:none">
<!--
p
{margin-top:0;
margin-bottom:0}
-->
</style>
<div class="x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
<br>
</div>
<div dir="ltr">
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
I'm working through how to use the new dynamic node features in order to take down a particular node, reconfigure it <span class="x_x_x_ContentPasted0" style="color:rgb(0,0,0); background-color:rgb(255,255,255); display:inline!important">(using nvidia MIG to
change the number of graphic cores available)</span> and give it back to slurm.</div>
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
<br>
</div>
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
I'm at the point where I can take a node out of slurm's control from the master node (scontrol delete nodename....), make the nvidia-smi change, and then execute slurmd on the node with the changed configuration parameters. It then does show up again in the
sinfo output on the master node, with the correct new resources.</div>
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
<br>
</div>
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
What I'm not sure about is...when I want to reconfigure the <span class="x_x_x_ContentPasted1" style="color:rgb(0,0,0); background-color:rgb(255,255,255); display:inline!important">
dynamic </span>node AGAIN, how do I do that on the target node? I can use "scontrol delete" again on the scheduler node, but on the
<span style="color:rgb(0,0,0); background-color:rgb(255,255,255); display:inline!important">
dynamic</span> node, slurmd will still be running. Currently, for testing purposes, I just find the process ID and kill -9 it. Then I change the node configuration and execute "slurmd -Z --conf=...." again. </div>
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
<br>
</div>
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
Is there a more elegant way to change the configuration on the <span style="color:rgb(0,0,0); background-color:rgb(255,255,255); display:inline!important">
dynamic</span> node than by killing the existing slurmd process and starting it again? </div>
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
<br>
</div>
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
I'll note that I tried doing everything from the master (slurmctld) node, since there is an option of creating the node there with "scontrol create" instead of using slurmd on the dynamic node. But when I tried that, the dynamic node I created showed up in
sinfo output with a ~ next to it (powered off). The dynamic node docs page online did not mention what, if anything, slurmd was supposed to be running as on the dynamic node if attempting to handle delete and create only on the master node. </div>
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
<br>
</div>
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
Thanks.</div>
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
<br>
</div>
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
Rob</div>
<div class="x_x_x_elementToProof" style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0); background-color:rgb(255,255,255)">
<br>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</body>
</html>