[slurm-users] Slurm 20.02.3 error: CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Jun 16 08:12:06 UTC 2020


Today we upgraded the controller node from 19.05 to 20.02.3, and 
immediately all Slurm commands (on the controller node) started printing 
error messages for all partitions:

# sinfo --version
sinfo: error: NodeNames=a[001-140] CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
(lines deleted)
slurm 20.02.3

In slurm.conf we have defined NodeName like:

NodeName=a[001-140] Weight=10001 Boards=1 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1 ...
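The wording of the error message suggests that a CPUs value is accepted only if it equals Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. The sketch below reproduces that check as inferred from the message text (it is not taken from the Slurm source, and the helper names are hypothetical). It assumes the node has Boards*SocketsPerBoard = 2 sockets in total. With this definition the acceptable values are 2 and 8, so CPUs=1 can never match:

```python
def acceptable_cpu_counts(sockets: int, cores_per_socket: int, threads_per_core: int) -> set:
    """CPU counts the error message implies are valid (hypothetical helper)."""
    return {
        sockets,
        sockets * cores_per_socket,
        sockets * cores_per_socket * threads_per_core,
    }


def cpus_match_topology(cpus: int, sockets: int, cores_per_socket: int, threads_per_core: int) -> bool:
    """True if cpus equals one of the three products named in the error message."""
    return cpus in acceptable_cpu_counts(sockets, cores_per_socket, threads_per_core)


# a[001-140]: assumption that total sockets = Boards * SocketsPerBoard = 1 * 2
sockets = 1 * 2
print(sorted(acceptable_cpu_counts(sockets, 4, 1)))  # [2, 8]
print(cpus_match_topology(1, sockets, 4, 1))  # False: the CPUs=1 from the error
print(cpus_match_topology(8, sockets, 4, 1))  # True: the value we expected
```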

According to the slurm.conf manual the CPUs should then be calculated 
automatically:

"If CPUs is omitted, its default will be set equal to the product of 
Boards, Sockets, CoresPerSocket, and ThreadsPerCore."
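For the a[001-140] definition above, that product is 1*2*4*1 = 8, not the CPUs=1 reported in the error. The small sketch below parses a NodeName line and computes this default. The helper names are hypothetical, and the sketch makes two assumptions: SocketsPerBoard (or Sockets if SocketsPerBoard is absent) supplies the socket factor in the manual's product, and any missing count is taken as 1. With Boards=1 the choice between the two socket parameters does not change the result.

```python
def parse_node_line(line: str) -> dict:
    """Split a slurm.conf NodeName line into key=value pairs (tokens without '=' are ignored)."""
    params = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value
    return params


def default_cpus(params: dict) -> int:
    """Product of Boards, sockets, CoresPerSocket and ThreadsPerCore, per the manual quote.

    Assumption: SocketsPerBoard (falling back to Sockets) is the socket factor;
    absent counts default to 1.
    """
    boards = int(params.get("Boards", 1))
    sockets = int(params.get("SocketsPerBoard", params.get("Sockets", 1)))
    cores = int(params.get("CoresPerSocket", 1))
    threads = int(params.get("ThreadsPerCore", 1))
    return boards * sockets * cores * threads


line = "NodeName=a[001-140] Weight=10001 Boards=1 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1"
print(default_cpus(parse_node_line(line)))  # 8
```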

Has anyone else seen this error with Slurm 20.02?

I wonder if there is a problem with specifying SocketsPerBoard instead of 
Sockets? The slurm.conf manual doesn't seem to prefer one over the other.

I've opened a bug https://bugs.schedmd.com/show_bug.cgi?id=9241

Thanks,
Ole



