[slurm-users] missing hyperthreads on Xeon Phi in SNC4/Flat mode

Brice Goglin Brice.Goglin at inria.fr
Thu Oct 14 08:58:18 UTC 2021


We have four Xeon Phi (KNL) nodes, each with 64 cores and 4-way SMT (256 
hyperthreads total). They are configured in different KNL modes 
(SNC4/flat, SNC4/cache, All2all/flat and All2all/cache). The node that 
is in SNC4/flat won't let us allocate all 256 hyperthreads. Half the 
cores only get 2 hyperthreads instead of 4:

$ srun -c256 -w kona02 --exclusive grep -i cpu /proc/self/status
Cpus_allowed_list: 0-15,32-47,64-79,96-111,128-255
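
To make the gap in that list easier to see, here is a minimal Python sketch that expands the allowed list reported for kona02 and counts how many hyperthreads each core keeps. The mapping of logical CPU n to core n % 64 is an assumption about how Linux numbers the hyperthreads on this machine (it was not checked against the node's topology), and the helper name expand_cpu_list is purely illustrative.

```python
def expand_cpu_list(text):
    """Expand a kernel-style CPU list such as '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


CORES = 64    # cores per node, as described in the post
THREADS = 4   # hyperthreads per core (SMT-4)

# Value reported by srun on kona02 (SNC4/flat).
allowed = expand_cpu_list("0-15,32-47,64-79,96-111,128-255")
missing = sorted(set(range(CORES * THREADS)) - allowed)

# ASSUMPTION: logical CPU n belongs to core n % CORES.
threads_per_core = [0] * CORES
for cpu in allowed:
    threads_per_core[cpu % CORES] += 1

# Cores that received fewer than the full 4 hyperthreads.
short_cores = [core for core, n in enumerate(threads_per_core) if n < THREADS]

print(len(allowed), "CPUs allowed,", len(missing), "missing")
print(len(short_cores), "cores are short; thread counts seen:",
      sorted(set(threads_per_core)))
```

Under that numbering assumption, the missing CPUs are 16-31, 48-63, 80-95 and 112-127, so cores 16-31 and 48-63 keep only 2 of their 4 hyperthreads, which matches the "half the cores" observation.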

Nodes configured in the other KNL modes are fine; we get all 256 hyperthreads:

$ srun -c256 -w kona03 --exclusive grep -i cpu /proc/self/status
Cpus_allowed_list: 0-255

If we reconfigure the buggy node to All2all/cache, it works fine. If we 
reconfigure another node to SNC4/flat, it starts having the same issue. 
So it looks like something fails only when KNL is configured in SNC4/flat?

All nodes are configured the same in slurm.conf:

NodeName=kona[01-04]     Procs=256 CoresPerSocket=64 RealMemory=94000 Sockets=1 ThreadsPerCore=4 Feature=kona,intel,knightslanding,knl Weight=70
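
As a sanity check on that definition, the following short Python sketch (an illustration only, not something Slurm itself runs) parses the relevant fields of the NodeName line and confirms that Sockets x CoresPerSocket x ThreadsPerCore matches Procs. The variable names are hypothetical; the values are copied from the line above.

```python
# Relevant fields copied from the NodeName line in slurm.conf.
line = ("NodeName=kona[01-04] Procs=256 CoresPerSocket=64 RealMemory=94000 "
        "Sockets=1 ThreadsPerCore=4")

# Split "Key=Value" tokens into a dictionary.
fields = dict(item.split("=", 1) for item in line.split())

# Logical CPU count implied by the topology fields.
expected = (int(fields["Sockets"])
            * int(fields["CoresPerSocket"])
            * int(fields["ThreadsPerCore"]))

print("implied CPUs:", expected)
print("matches Procs:", expected == int(fields["Procs"]))
```

The declared topology is self-consistent at 256 logical CPUs, so the node definition alone does not seem to account for the 64 missing hyperthreads.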

FWIW, we're using SLURM 19.05.2. An upgrade is possible in the future 
but not immediately. The "KNL" plugin is installed but we don't think 
we've done anything to configure it (at least we never used it to 
reconfigure/reboot KNL nodes).


