<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<span class="x_elementToProof" style="font-size:12pt;margin:0px;color:rgb(0, 0, 0) !important;background-color:rgb(255, 255, 255)"><span class="x_x_elementToProof" style="margin:0px;background-color:rgb(255, 255, 255) !important">Hi guys,</span></span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<div class="x_elementToProof" style="font-size:12pt;margin:0px;color:rgb(0, 0, 0) !important;background-color:rgb(255, 255, 255)">
<div class="x_x_elementToProof" style="margin:0px;background-color:rgb(255, 255, 255) !important">
<br>
</div>
<div class="x_x_elementToProof" style="margin:0px;background-color:rgb(255, 255, 255) !important">
I am trying to double-check that what I see is expected for Slurm 21.08.8-2 with A100 GPUs. Could someone please confirm the following:</div>
<div class="x_x_elementToProof" style="margin:0px;background-color:rgb(255, 255, 255) !important">
<br>
</div>
<div class="x_x_elementToProof" style="margin:0px;background-color:rgb(255, 255, 255) !important">
<ol>
<li><span style="margin:0px">According to the (very short) documentation, all I need to do to support already-partitioned MIG devices is to add AutoDetect=nvml to gres.conf (we use a global gres.conf for all nodes) and then add "Gres=gpu:4" to the NodeName line in
slurm.conf. In this case I get only very simple GPU detection: just the number of GPUs. According to the slurmd log, many MIG properties are detected, but they are not used. In scontrol, this node shows only "Gres=gpu:4(S:1)", nothing more. <br>
<br>
</span></li><li>
<div style="margin:0px">If I want to allow users to request MIG profiles instead of just a number of GPUs, I need to add NodeName lines with AutoDetect=nvml and the profile names to gres.conf, add the profile names to the NodeName line in slurm.conf, e.g. "<span style="margin:0px">Gres=gpu:1g.5gb:3,gpu:4g.20gb:1",
and then also add these profile names to GresTypes. That makes 3 places to update. Only then does scontrol start to show "</span><span style="margin:0px">Gres=gpu:4g.20gb:1(S:1),gpu:1g.5gb:3(S:1)" (in fact, in 21.08.8-2 I see "Gres=gpu:4g.20gb:1(S:1),gpu:1g.5gb:3(S:1),gpu:1g.5gb:1g.5gb:3,gpu:4g.20gb:4g.20gb:1",
which looks like a minor bug but does not seem important).<br>
</span></div>
<div style="margin:0px"><br>
</div>
</li><li>
<div style="margin:0px"><span style="margin:0px">To track MIG profile usage, I also need to add all current MIG profiles (those used on any node of my cluster) to <span style="margin:0px;background-color:rgb(255, 255, 255) !important;display:inline !important">AccountingStorageTRES
in slurm.conf.</span><br>
</span></div>
<div style="margin:0px"><span style="margin:0px"><br>
</span></div>
</li><li><span style="margin:0px">Each time I re-partition the MIG devices, I need to update the AccountingStorageTRES, NodeName and GresTypes lines in slurm.conf, plus the NodeName lines in gres.conf, and then restart every affected slurmd as well as slurmctld.<br>
<br>
</span></li></ol>
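<div style="margin:0px">To make items 1 to 3 concrete, here is a rough sketch of the lines I mean. The node name node01 is a placeholder, other node parameters are omitted, and the AccountingStorageTRES entries assume the usual gres/gpu:type form. I have left out the GresTypes change and the per-profile NodeName lines in gres.conf, so please treat this as an illustration rather than a tested configuration.</div>
<pre>
# gres.conf (global, item 1)
AutoDetect=nvml

# slurm.conf, item 1: plain GPU count (node01 is a placeholder name)
NodeName=node01 Gres=gpu:4

# slurm.conf, item 2: MIG profiles instead of a plain count
# (this replaces the item 1 NodeName line)
NodeName=node01 Gres=gpu:1g.5gb:3,gpu:4g.20gb:1

# slurm.conf, item 3: track usage per MIG profile
# (assumes the gres/gpu:type naming for typed GRES)
AccountingStorageTRES=gres/gpu:1g.5gb,gres/gpu:4g.20gb
</pre>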
<div style="margin:0px"><span style="margin:0px">Is my understanding of the documentation complete, or am I missing something that would let me avoid updating 4 places in slurm.conf/gres.conf every time the GPUs are re-partitioned?</span><br>
</div>
</div>
<div class="x_x_elementToProof" style="margin:0px;background-color:rgb(255, 255, 255) !important">
<br>
</div>
<div class="x_x_elementToProof" style="margin:0px;background-color:rgb(255, 255, 255) !important">
Best regards,</div>
<div class="x_x_elementToProof" style="margin:0px;background-color:rgb(255, 255, 255) !important">
<br>
</div>
<span class="x_x_elementToProof" style="margin:0px;background-color:rgb(255, 255, 255) !important">Taras</span></div>
<br>
</div>
</body>
</html>