<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p></p>
<div>Hi everyone,<br>
<br>
We recently enabled sharding so that multiple jobs can share a GPU. According to the Slurm documentation, once a GPU has been allocated as a gres/gpu resource it will not be available as a gres/shard (and vice versa).<br>
<br>
However, on nodes with a single GPU we have seen jobs that allocated gres/shard running at the same time as jobs that allocated gres/gpu. Has anyone else encountered this?<br>
<br>
The following output shows an example of gres/gpu and gres/shard allocations coexisting on one node:<br>
<br>
<pre>
squeue -w s-sc-gpu017
JOBID    PARTITION  NAME        USER  STAT  TIME        TIME_LIMI   NODES  NODELIST(REASON)  CPUS  MIN_MEMORY
1288220  gpu        spawner-ju  xx1   RUNN  1-20:49:59  2-00:00:00  1      s-sc-gpu017       32    120G
1291298  gpu        interactiv  xx2   RUNN  13:40       8:00:00     1      s-sc-gpu017       8     32000M
</pre>
<br>
<br>
scontrol show job 1288220 | grep TRES<br>
<br>
ReqTRES=cpu=32,mem=120G,node=1,billing=62,gres/shard=1<br>
AllocTRES=cpu=32,mem=120G,node=1,billing=62,gres/shard=1<br>
<br>
scontrol show job 1291298 | grep TRES<br>
<br>
ReqTRES=cpu=1,mem=32000M,node=1,billing=136,gres/gpu=1<br>
AllocTRES=cpu=8,mem=32000M,node=1,billing=143,gres/gpu=1,gres/gpu:nvidia_a100-pcie-40gb=1<br>
<br>
<br>
And the information on the node status shows both gres/gpu and gres/shard on the allocated TRES:<br>
<br>
scontrol show node s-sc-gpu017 | grep TRES<br>
<br>
CfgTRES=cpu=128,mem=500000M,billing=378,gres/gpu=1,gres/gpu:nvidia_a100-pcie-40gb=1,gres/shard=4<br>
AllocTRES=cpu=40,mem=154880M,gres/gpu=1,gres/gpu:nvidia_a100-pcie-40gb=1,gres/shard=1<br>
<br>
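To check other single-GPU nodes for the same overlap, a small script along the following lines might help. This is only a sketch: the function names parse_tres and has_gpu_shard_overlap are made up for illustration, and it assumes the AllocTRES line has the comma-separated name=value layout shown in the scontrol output above.<br>
<br>
<pre>
```python
def parse_tres(line):
    """Split a line such as 'AllocTRES=cpu=40,gres/gpu=1' into a dict.

    Keys are TRES names (e.g. 'gres/gpu'), values are the raw strings.
    """
    # Drop the leading 'AllocTRES' label; everything after the first '=' is payload.
    _, _, payload = line.strip().partition("=")
    tres = {}
    for item in payload.split(","):
        name, sep, value = item.partition("=")
        if sep:  # skip empty or malformed entries
            tres[name] = value
    return tres


def has_gpu_shard_overlap(alloc_tres_line):
    """Return True if both gres/gpu and gres/shard are allocated.

    Assumption: this only indicates a problem on nodes with a single GPU;
    on multi-GPU nodes both can legitimately be allocated at once.
    """
    tres = parse_tres(alloc_tres_line)
    gpus = int(tres.get("gres/gpu", "0"))
    shards = int(tres.get("gres/shard", "0"))
    return gpus != 0 and shards != 0


if __name__ == "__main__":
    # AllocTRES line of s-sc-gpu017 as reported by 'scontrol show node'.
    line = ("AllocTRES=cpu=40,mem=154880M,gres/gpu=1,"
            "gres/gpu:nvidia_a100-pcie-40gb=1,gres/shard=1")
    print(has_gpu_shard_overlap(line))
```
</pre>
On the AllocTRES line of s-sc-gpu017 this reports an overlap; on a node where only gres/shard is allocated it does not.<br>
<br>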
</div>
<div><br>
</div>
<div><br>
<br>
We are running Slurm version 23.02.4 on Rocky Linux 8.5, and the shard-related configuration in slurm.conf is as follows:<br>
<br>
GresTypes=gpu,shard,gpu/gfx90a,gpu/nvidia_a100-pcie-40gb,gpu/nvidia_a100-sxm4-40gb,gpu/nvidia_a100-sxm4-80gb,gpu/nvidia_a100_80gb_pcie<br>
<br>
AccountingStorageTRES=gres/gpu,gres/shard,gres/gpu:gfx90a,gres/gpu:nvidia_a100-pcie-40gb,gres/gpu:nvidia_a100-sxm4-40gb,gres/gpu:nvidia_a100-sxm4-80gb,gres/gpu:nvidia_a100_80gb_pcie<br>
<br>
NodeName=s-sc-gpu003 CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=500000 Gres=gpu:nvidia_a100-pcie-40gb:1,shard:4 State=UNKNOWN Weight=1<br>
<br>
NodeName=s-sc-gpu017 CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=500000 Gres=gpu:nvidia_a100-pcie-40gb:1,shard:4 State=UNKNOWN Weight=1<br>
<br>
NodeName=s-sc-gpu018 CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=500000 Gres=gpu:nvidia_a100-pcie-40gb:1,shard:4 State=UNKNOWN Weight=1<br>
<br>
NodeName=s-sc-gpu019 CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=500000 Gres=gpu:nvidia_a100-pcie-40gb:1,shard:4 State=UNKNOWN Weight=1<br>
<br>
NodeName=s-sc-gpu021 CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=500000 Gres=gpu:nvidia_a100-pcie-40gb:1,shard:4 State=UNKNOWN Weight=1</div>
<br>
<p></p>
<p>Kind Regards,</p>
<p><br>
</p>
<p>Andreas</p>
<p><br>
</p>
<div id="Signature">
<div id="divtagdefaultwrapper" dir="ltr" style="font-size: 12pt; color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, "EmojiFont", "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols;">
<div style="font-family:Tahoma; font-size:13px">
<div style="font-family:Tahoma; font-size:13px">
<div style="font-family:Tahoma; font-size:13px">-----------</div>
<div style="font-family:Tahoma; font-size:13px">Dr. Andreas Reppas
<div><br>
</div>
<div><span style="font-size:10.0pt; color:black" lang="DE">Geschäftsbereich IT | Scientific Computing</span></div>
<div><span style="font-size:10.0pt; color:black" lang="DE"><span style="font-size:10.0pt; color:black" lang="DE"><font style="font-size:10pt" size="2">Charité – Universitätsmedizin Berlin</font></span><br>
</span></div>
<div><span style="font-size:10.0pt; color:black" lang="DE">
<div>
<p class="western" style="margin-bottom:0in; line-height:100%"><font style="font-size:10pt" size="2"><br>
</font></p>
<p class="western" style="margin-bottom:0in; line-height:100%"><font style="font-size:10pt" size="2"></font></p>
<font style="font-size:10pt" size="2">
<p class="x_MsoNormal"><span style="color:black; font-family:"Calibri",sans-serif,serif,"EmojiFont"">Campus Charité
</span><span style="color:black; font-family:"Calibri",sans-serif,serif,"EmojiFont"" lang="DE">Virchow Klinikum</span></p>
<p class="x_MsoNormal"><span style="color:black; font-family:"Calibri",sans-serif,serif,"EmojiFont"" lang="DE">Forum 4</span><span style="color:black; font-family:"Calibri",sans-serif,serif,"EmojiFont""> | Ebene 02 | Raum
</span><span style="color:black; font-family:"Calibri",sans-serif,serif,"EmojiFont"" lang="DE">2.020</span></p>
<p class="x_MsoNormal"><span style="color:black; font-family:"Calibri",sans-serif,serif,"EmojiFont"">Augustenburger Platz 1</span></p>
<p class="x_MsoNormal"><span style="color:black; font-family:"Calibri",sans-serif,serif,"EmojiFont"">13353 Berlin</span></p>
<br>
</font>
<p></p>
<p class="western" style="margin-bottom:0in; line-height:100%"><font style="font-size:10pt" size="2"><br>
</font></p>
</div>
</span></div>
<div>andreas.reppas@charite.de<span style="font-size:10.0pt; font-family:"Calibri",sans-serif" lang="DE"><br>
</span></div>
<div><span style="font-size:10.0pt; font-family:"Calibri",sans-serif" lang="DE"><a href="https://www.charite.de" class="OWAAutoLink" id="LPNoLP">https://www.charite.de</a><br>
</span></div>
<div><span style="font-size:10.0pt; font-family:"Calibri",sans-serif" lang="DE"><br>
</span></div>
<div><br>
<span style="font-size:10.0pt; font-family:"Calibri",sans-serif" lang="DE"></span><span style="font-family:"Calibri",sans-serif"></span></div>
<div><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>