[slurm-users] Ubuntu20.04 - DRAIN state with SMT in node capabilities
Sven Duscha
sven.duscha at tum.de
Wed Nov 24 17:29:50 UTC 2021
Dear all,
a small update.
On 24.11.21 18:13, Sven Duscha wrote:
> So, maybe this wouldn't be a big disadvantage, if that allows us to
> use 32 slots on the "16 Cores with 2 SMT" Xeons in the PowerEdge R720
> machines with Ubuntu 20.04
>
>
> Has anyone else encountered this problem? Is there a better/proper way
> to use all SMT/HT cores?
It took about half an hour - with no jobs running besides some test
jobs - for the node to fall into the "drained" state again:
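(For anyone hitting the same state: the reason slurmctld recorded, and a
manual resume once the declaration is fixed, can be checked with the
standard tools - a sketch, using the affected node ekgen8 here:)

```
# Show the drain reason slurmctld recorded for the node
sinfo -R --nodes=ekgen8

# Compare what slurmd detects against what the controller has configured
slurmd -C
scontrol show node ekgen8

# Return the node to service once the declaration matches
scontrol update NodeName=ekgen8 State=RESUME
```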
sinfo -lNe
Wed Nov 24 18:23:05 2021
NODELIST  NODES PARTITION STATE    CPUS  S:C:T   MEMORY  TMP_DISK WEIGHT AVAIL_FE REASON
ekgen1    1     cluster*  idle     16    2:8:1   480000  0        1      (null)   none
ekgen2    1     cluster*  mixed    32    2:8:2   250000  0        1      (null)   none
ekgen3    1     debian    idle     32    2:8:2   250000  0        1      (null)   none
ekgen4    1     cluster*  mixed    32    2:8:2   250000  0        1      (null)   none
ekgen5    1     cluster*  idle     32    2:8:2   250000  0        1      (null)   none
ekgen6    1     debian    idle     32    2:8:2   250000  0        1      (null)   none
ekgen7    1     cluster*  idle     32    2:8:2   250000  0        1      (null)   none
ekgen8    1     debian    drained  32    2:16:1  250000  0        1      (null)   Low socket*core*thre
ekgen9    1     cluster*  idle     32    2:8:2   192000  0        1      (null)   none
Thus,
NodeName=ekgen[8] RealMemory=250000 Sockets=2 CoresPerSocket=16
ThreadsPerCore=1 State=UNKNOWN
isn't a working node declaration either.
The question remains why a declaration matching the output of slurmd -C
doesn't work with Ubuntu 20.04.
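(For comparison: the other SMT nodes in the sinfo listing above report
S:C:T = 2:8:2, so a declaration that exposes all 32 hardware threads as
2 sockets x 8 cores x 2 threads - a sketch based on that output, not a
verified fix - would look like:)

```
NodeName=ekgen[8] RealMemory=250000 Sockets=2 CoresPerSocket=8 \
    ThreadsPerCore=2 State=UNKNOWN
```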
P.S.: Fixed version typo in the subject.
--
Sven Duscha
Deutsches Herzzentrum München
Technische Universität München
Lazarettstraße 36
80636 München
+49 89 1218 2602