<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Dear all,</p>
<p>We have a node with 2 x 64 cores (two threads per core) and 8
GPUs, running Slurm 22.05.5.</p>
<p>In order to make use of individual threads, we changed</p>
<pre><code>SelectTypeParameters=CR_Core
NodeName=nodename CPUs=256 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2</code></pre>
<p>to</p>
<pre><code>SelectTypeParameters=CR_CPU
NodeName=nodename CPUs=256</code></pre>
<p>We are now able to allocate individual threads to jobs, despite the
following error in slurmd.log:</p>
<pre>error: Node configuration differs from hardware: CPUs=256:256(hw) Boards=1:1(hw) SocketsPerBoard=256:2(hw) CoresPerSocket=1:64(hw) ThreadsPerCore=1:2(hw)
</pre>
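<p>To spell out which fields disagree, here is a minimal Python sketch that
splits that log line into pairs. It assumes each token reads
<code>name=configured:detected(hw)</code>, which is my reading of the message
rather than anything documented here; <code>parse_mismatch</code> is a made-up
helper name.</p>
<pre>
```python
import re

# The slurmd.log line quoted above, copied verbatim.
LOG_LINE = (
    "error: Node configuration differs from hardware: "
    "CPUs=256:256(hw) Boards=1:1(hw) SocketsPerBoard=256:2(hw) "
    "CoresPerSocket=1:64(hw) ThreadsPerCore=1:2(hw)"
)


def parse_mismatch(line):
    """Return {field: (configured, detected)} for each 'name=a:b(hw)' token.

    Assumption: the number before the colon is the configured value and the
    number tagged '(hw)' is what slurmd detected on the node.
    """
    pairs = re.findall(r"(\w+)=(\d+):(\d+)\(hw\)", line)
    return {name: (int(conf), int(hw)) for name, conf, hw in pairs}


fields = parse_mismatch(LOG_LINE)

# Keep only the fields where the two numbers differ.
differing = {name: v for name, v in fields.items() if v[0] != v[1]}

for name, (conf, hw) in differing.items():
    print(f"{name}: configured {conf}, hardware {hw}")
```
</pre>
<p>Under that reading, the total CPU count matches, but the node is treated as
256 single-core, single-thread sockets instead of 2 sockets with 64 cores and
2 threads per core.</p>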
<p>However, since this change we can only use 4 of the 8 GPUs.<br>
The output of <code>sinfo -o %G</code> might be relevant.</p>
<p>With the original configuration it was:</p>
<pre>$ sinfo -o %G
GRES
gpu:A100:8(S:0,1)
</pre>
<p>Now it is:</p>
<pre>$ sinfo -o %G
GRES
gpu:A100:8(S:0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126)
</pre>
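<p>To compare the two GRES strings, here is a rough Python sketch that pulls
out the socket list. <code>parse_gres</code> is a hypothetical helper, and it
only handles a plain comma-separated list after <code>S:</code>, as in the two
outputs above.</p>
<pre>
```python
import re


def parse_gres(entry):
    """Split 'name:type:count(S:a,b,...)' into its parts.

    Assumption: the socket list is a plain comma-separated list of integers;
    range notation is not handled.
    """
    m = re.fullmatch(r"(\w+):(\w+):(\d+)\(S:([\d,]+)\)", entry)
    if m is None:
        raise ValueError(f"unexpected GRES entry: {entry!r}")
    name, gtype, count, sockets = m.groups()
    return name, gtype, int(count), [int(s) for s in sockets.split(",")]


before = "gpu:A100:8(S:0,1)"
# Shorthand for the second output above: every even index from 0 to 126.
after = "gpu:A100:8(S:" + ",".join(str(i) for i in range(0, 128, 2)) + ")"

for label, entry in (("before", before), ("after", after)):
    name, gtype, count, sockets = parse_gres(entry)
    print(f"{label}: {count} x {name}:{gtype}, {len(sockets)} socket indices, "
          f"min {min(sockets)}, max {max(sockets)}")
```
</pre>
<p>So the GPU count is still 8 in both cases, but the socket list grows from 2
entries to 64 (every second index up to 126).</p>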
<p>Has anyone faced this or a similar issue and could
point me in the right direction?</p>
<p>Best wishes,</p>
<p>Sebastian<br>
</p>
</body>
</html>