[slurm-users] Ubuntu 20.04 - DRAIN state with SMT in node capabilities

Sven Duscha sven.duscha at tum.de
Wed Nov 24 17:13:30 UTC 2021


Dear all,

in our research group we use a small cluster of PowerEdge R720 servers. These are dual-socket machines, each equipped with two Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz processors and 256 GB of RAM.

For a long time we used CentOS-7.8 as the OS. I have since tried Debian-10 and Ubuntu-20.04 LTS, where the latter is the preferred OS to go forward with, so I would like to reinstall all the other nodes with Ubuntu-20.04.

I have the problem that when I enable SMT in /etc/slurm/slurm.conf by setting ThreadsPerCore=2 in the nodes' definition, e.g.:


NodeName=ekgen[2-7] RealMemory=250000 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN


the Ubuntu-20.04 LTS node quickly falls into the DRAIN state, showing the reason "Low socket*core*thread count":

sinfo -lNe
Wed Nov 24 16:52:58 2021
NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
ekgen1         1  cluster*        idle   16    2:8:1 480000 0      1   (null) none
ekgen2         1  cluster*       mixed   32    2:8:2 250000 0      1   (null) none
ekgen3         1    debian        idle   32    2:8:2 250000 0      1   (null) none
ekgen4         1  cluster*       mixed   32    2:8:2 250000 0      1   (null) none
ekgen5         1  cluster*        idle   32    2:8:2 250000 0      1   (null) none
ekgen6         1    debian        idle   32    2:8:2 250000 0      1   (null) none
ekgen7         1  cluster*        idle   32    2:8:2 250000 0      1   (null) none
ekgen8         1    debian    draining   32    2:8:2 250000 0      1   (null) Low socket*core*thre
ekgen9         1  cluster*        idle   32    2:8:2 192000 0      1   (null) none


And with the error message in slurmd.log:

[2021-04-30T18:34:09.551] error: Node configuration differs from hardware: Procs=16:32(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw) CoresPerSocket=8:8(hw) ThreadsPerCore=1:2(hw)

This message is a bit confusing to me: I thought the second value of each pair, the one marked with (hw), refers to the actual hardware value?


slurmd -C shows the same capabilities for the nodes running CentOS-7, Debian-10 and Ubuntu-20.04 (as would be expected for identical hardware):


CentOS-7:

slurmd -C
NodeName=ekgen2 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=257738
UpTime=161-05:27:14

slurmd --version
slurm 19.05.3-2

Debian-10:

slurmd -C

NodeName=ekgen3 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=257971
UpTime=7-12:25:25

slurmd --version
slurm 19.05.8


Ubuntu-20.04:

slurmd -C
NodeName=ekgen8 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=257901
UpTime=161-18:15:37

slurmd --version
slurm 19.05.3-2


The CentOS-7 and Debian-10 nodes accept the SMT configuration and run fine without the DRAIN state problem. The slurmd versions differ only in their patch levels, and the ones running on CentOS-7 and Ubuntu-20.04 are even identical.
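
For completeness: while testing the different configurations I distribute the changed slurm.conf to all nodes, re-read it and clear the drain state again, roughly like this (I assume this is the usual way to do it):

scontrol reconfigure                            # re-read slurm.conf on slurmctld and the slurmds
scontrol update NodeName=ekgen8 State=RESUME    # clear the DRAIN state of the affected node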


If I change the node declaration in slurm.conf to CoresPerSocket=16 and ThreadsPerCore=1, which isn't really how HT or SMT works (there is not double the number of fully fledged cores, only the execution units of each core being shared by two threads):

NodeName=ekgen[8] RealMemory=250000 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN

(This also differs from what slurmd -C reports on Ubuntu-20.04.)

and use that in the cluster configuration, the "drained" state does not seem to come up:


sinfo -lNe
Wed Nov 24 18:02:09 2021
NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
ekgen1         1  cluster*        idle   16    2:8:1 480000 0      1   (null) none
ekgen2         1  cluster*       mixed   32    2:8:2 250000 0      1   (null) none
ekgen3         1    debian        idle   32    2:8:2 250000 0      1   (null) none
ekgen4         1  cluster*       mixed   32    2:8:2 250000 0      1   (null) none
ekgen5         1  cluster*        idle   32    2:8:2 250000 0      1   (null) none
ekgen6         1    debian        idle   32    2:8:2 250000 0      1   (null) none
ekgen7         1  cluster*        idle   32    2:8:2 250000 0      1   (null) none
ekgen8         1    debian        idle   32   2:16:1 250000 0      1   (null) none
ekgen9         1  cluster*        idle   32    2:8:2 192000 0      1   (null) none


In practical use of Slurm this results in the same number of available job slots (32), but fine-grained resource allocation in terms of CoresPerSocket and ThreadsPerCore would not behave the same.
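
For example, as far as I understand, requests that distinguish between cores and hardware threads would be mapped differently under the two layouts; something like the following (job.sh is just a placeholder script, and the exact behaviour depends on the select plugin settings):

# with Sockets=2 CoresPerSocket=8 ThreadsPerCore=2:
sbatch --ntasks=16 --cpus-per-task=2 --hint=multithread job.sh   # one task per physical core, using both threads
sbatch --ntasks=16 --hint=nomultithread job.sh                   # one task per physical core, one thread each

# with Sockets=2 CoresPerSocket=16 ThreadsPerCore=1, Slurm treats every
# hardware thread as a core, so --hint=nomultithread has nothing to filter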

Our users aren't using such specific resource allocation anyway; it is hard enough to make them specify appropriate memory requirements.

So maybe this wouldn't be a big disadvantage, if it allows us to use 32 slots on the "16 cores with 2-way SMT" Xeons in the PowerEdge R720 machines with Ubuntu 20.04.


Has anyone else encountered this problem? Is there a better/proper way to use all SMT/HT cores?


Best regards,

Sven Duscha



-- 
Sven Duscha
Deutsches Herzzentrum München
Technische Universität München
Lazarettstraße 36
80636 München
