[slurm-users] Slurm 20.02.3 error: CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Tue Jun 16 08:12:06 UTC 2020
Today we upgraded the controller node from Slurm 19.05 to 20.02.3.
Immediately afterwards, all Slurm commands run on the controller node
print error messages for all partitions:
# sinfo --version
sinfo: error: NodeNames=a[001-140] CPUs=1 match no Sockets,
Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting
CPUs.
(lines deleted)
slurm 20.02.3
In slurm.conf we have defined NodeName like:
NodeName=a[001-140] Weight=10001 Boards=1 SocketsPerBoard=2
CoresPerSocket=4 ThreadsPerCore=1 ...
According to the slurm.conf manual, the CPUs value should then be
calculated automatically:
"If CPUs is omitted, its default will be set equal to the product of
Boards, Sockets, CoresPerSocket, and ThreadsPerCore."
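For the NodeName line above, that product is easy to check by hand. The following is a minimal Python sketch of the arithmetic only, not Slurm's actual code. The function name default_cpus is hypothetical, and it assumes that SocketsPerBoard takes the place of Sockets in the product:

```python
def default_cpus(boards=1, sockets_per_board=1, cores_per_socket=1,
                 threads_per_core=1):
    """Product described in the slurm.conf manual for an omitted CPUs value.

    Hypothetical helper for illustration; assumes SocketsPerBoard is the
    per-board socket count that enters the product.
    """
    return boards * sockets_per_board * cores_per_socket * threads_per_core


# Values from the NodeName=a[001-140] definition quoted above.
expected = default_cpus(boards=1, sockets_per_board=2,
                        cores_per_socket=4, threads_per_core=1)
print(expected)  # 8
```

So for these nodes one would expect CPUs=8 by default, whereas the error message reports CPUs=1.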
Has anyone else seen this error with Slurm 20.02?
I wonder if there is a problem with specifying SocketsPerBoard instead of
Sockets? The slurm.conf manual doesn't seem to prefer one over the other.
I've opened a bug https://bugs.schedmd.com/show_bug.cgi?id=9241
Thanks,
Ole