[slurm-users] Slurm 20.02.3 error: CPUs=1 match no Sockets, Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore. Resetting CPUs.
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Tue Jun 16 09:35:27 UTC 2020
Hi Ahmet,
On 6/16/20 11:27 AM, mercan wrote:
> Did you check the /var/log/messages file for errors? systemd logs to this
> file instead of to the slurmctld log file.
>
> Ahmet M.
The syslog reports the same errors from slurmctld as are being reported by
every Slurm 20.02 command.
I have found a workaround: replace "Boards=1 SocketsPerBoard=2" with
"Sockets=2" on the NodeName lines in slurm.conf and reconfigure the
daemons. For some reason 20.02 doesn't handle "Boards" configurations
correctly.
Any site with "Boards" in slurm.conf should reconfigure to "Sockets"
before installing/upgrading to 20.02.
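A sketch of how one might locate the affected lines and apply the change (the
slurm.conf path and the exact restart steps depend on the installation):

   # Find NodeName lines still using the Boards syntax:
   grep -n 'SocketsPerBoard' /etc/slurm/slurm.conf
   # After editing slurm.conf, restart the controller and have the daemons
   # reread the configuration:
   systemctl restart slurmctld
   scontrol reconfigure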
It may be a good idea to track updates to bug
https://bugs.schedmd.com/show_bug.cgi?id=9241
Best regards,
Ole
> On 16.06.2020 11:12, Ole Holm Nielsen wrote:
>> Today we upgraded the controller node from 19.05 to 20.02.3, and
>> immediately all Slurm commands (on the controller node) give error
>> messages for all partitions:
>>
>> # sinfo --version
>> sinfo: error: NodeNames=a[001-140] CPUs=1 match no Sockets,
>> Sockets*CoresPerSocket or Sockets*CoresPerSocket*ThreadsPerCore.
>> Resetting CPUs.
>> (lines deleted)
>> slurm 20.02.3
>>
>> In slurm.conf we have defined NodeName like:
>>
>> NodeName=a[001-140] Weight=10001 Boards=1 SocketsPerBoard=2
>> CoresPerSocket=4 ThreadsPerCore=1 ...
>>
>> According to the slurm.conf manual the CPUs should then be calculated
>> automatically:
>>
>> "If CPUs is omitted, its default will be set equal to the product of
>> Boards, Sockets, CoresPerSocket, and ThreadsPerCore."
>>
>> Has anyone else seen this error with Slurm 20.02?
>>
>> I wonder if there is a problem with specifying SocketsPerBoard instead
>> of Sockets? The slurm.conf manual doesn't seem to prefer one over the
>> other.
>>
>> I've opened a bug https://bugs.schedmd.com/show_bug.cgi?id=9241