<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hello</p>
<p>We have four Xeon Phi (KNL) nodes, each with 64 SMT-4 cores (256
hyperthreads per node). They are configured in different KNL modes
(SNC4/flat, SNC4/cache, All2all/flat and All2all/cache). The node
that is in SNC4/flat won't let us allocate all 256 hyperthreads:
only 192 are allowed, and half the cores get only 2 hyperthreads
instead of 4:<br>
</p>
<pre><code>$ srun -c256 -w kona02 --exclusive grep -i cpu /proc/self/status
Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff,0000ffff,0000ffff,0000ffff,0000ffff
Cpus_allowed_list: 0-15,32-47,64-79,96-111,128-255</code></pre>
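<p>To make the missing hyperthreads easier to see, here is a minimal
Python sketch that expands the Cpus_allowed_list above and counts the
allowed hyperthreads per core. It assumes the usual Linux numbering on
KNL, where logical CPU n belongs to core n modulo 64; that mapping is
an assumption, not something shown in the output above, and should be
checked against lscpu or hwloc on the node. The helper names
parse_cpulist and threads_per_core are made up for this illustration.</p>
<pre>
```python
from collections import Counter

CORES = 64  # physical cores per node (CoresPerSocket=64 in slurm.conf)


def parse_cpulist(cpulist):
    """Expand a Linux cpulist string such as '0-3,8' into a set of CPU ids."""
    cpus = set()
    for chunk in cpulist.split(","):
        first, _, last = chunk.strip().partition("-")
        # A chunk without a dash is a single CPU id.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def threads_per_core(cpus, cores=CORES):
    """Count the allowed hyperthreads on each core.

    ASSUMPTION: logical CPU n sits on core n % cores, i.e. CPUs 0-63 are
    the first hyperthread of each core, 64-127 the second, and so on.
    """
    return Counter(cpu % cores for cpu in cpus)


# Cpus_allowed_list reported on kona02 (SNC4/flat).
allowed = parse_cpulist("0-15,32-47,64-79,96-111,128-255")
per_core = threads_per_core(allowed)
short = sorted(core for core in range(CORES) if per_core[core] != 4)

print(len(allowed), "CPUs allowed")
print(len(short), "cores with fewer than 4 hyperthreads:", short)
```
</pre>
<p>Under that numbering assumption, the script reports 192 allowed
CPUs, and the 32 affected cores are 16-31 and 48-63, each left with
only its third and fourth hyperthreads.</p>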
<p>Nodes configured in the other KNL modes are fine; we get all
256 hyperthreads:<br>
</p>
<pre><code>$ srun -c256 -w kona03 --exclusive grep -i cpu /proc/self/status
Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
Cpus_allowed_list: 0-255</code></pre>
<p>If we reconfigure the affected node to All2all/cache, it works
fine. If we reconfigure another node to SNC4/flat, it starts
having the same issue. So it looks like something fails only when
KNL is configured in SNC4/flat?</p>
<p>All nodes are configured the same in slurm.conf:</p>
<pre><code>NodeName=kona[01-04] Procs=256 CoresPerSocket=64 RealMemory=94000 Sockets=1 ThreadsPerCore=4 Feature=kona,intel,knightslanding,knl Weight=70</code></pre>
<p>FWIW, we're using SLURM 19.05.2. An upgrade is possible in the
future but not immediately. The "KNL" plugin is installed, but we
don't think we've done anything to configure it (at least, we have
never used it to reconfigure or reboot KNL nodes).</p>
<p>Thanks</p>
<p>Brice</p>
</body>
</html>