[slurm-users] [External] slurmd: error: Node configuration differs from hardware: CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw)

Robert Kudyba rkudyba at fordham.edu
Thu Apr 23 18:52:41 UTC 2020


On Thu, Apr 23, 2020 at 1:43 PM Michael Robbert <mrobbert at mines.edu> wrote:

> It looks like you have hyper-threading turned on, but haven't defined
> ThreadsPerCore=2. You either need to turn off hyper-threading in the BIOS
> or change the definition of ThreadsPerCore in slurm.conf.
>
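
For reference, the second option might look roughly like the fragment
below. This is only a sketch: the counts come from the error message and
the dmidecode output in this thread (2 sockets, 12 cores per socket, 48
logical CPUs on node003), while the way the nodes are grouped and any
other parameters on the real NodeName lines (memory, features, and so on)
are assumptions and are omitted here.

```
# slurm.conf fragment (sketch, other node parameters omitted)
# node003 with hyper-threading left enabled: 2 x 12 x 2 = 48 logical CPUs
NodeName=node003 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 CPUs=48
# node001 and node002 without hyper-threading: 2 x 12 x 1 = 24 CPUs
NodeName=node00[1-2] Sockets=2 CoresPerSocket=12 ThreadsPerCore=1 CPUs=24
```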

Nice find. node003 has hyper-threading enabled, but node001 and node002 do
not:
[root@node001 ~]# dmidecode -t processor | grep -E '(Core Count|Thread Count)'
        Core Count: 12
        Thread Count: 12
        Core Count: 12
        Thread Count: 12

[root@node003 ~]# dmidecode -t processor | grep -E '(Core Count|Thread Count)'
        Core Count: 12
        Thread Count: 24
        Core Count: 12

I found a great mini script <https://serverfault.com/a/792264/359447> to
disable hyperthreading without a reboot. I did get the following warning, but
I don't think it's a big issue:
 WARNING, didn't collect load info for all cpus, balancing is broken
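
The linked script is not reproduced here. As a rough illustration of the
general technique, the sketch below uses the Linux sysfs CPU topology files
to find the secondary hyper-thread of each core and take it offline. The
function names and the optional directory argument are hypothetical (the
argument only exists so the logic can be dry-run against a copy of the
tree); writing to the real /sys path requires root.

```shell
# Print the logical CPU numbers that are secondary hyper-thread siblings.
# $1: CPU sysfs directory (hypothetical override, defaults to the real path).
list_secondary_siblings() {
    cpu_dir="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_dir"/cpu[0-9]*/topology/thread_siblings_list; do
        [ -r "$f" ] || continue
        siblings=$(cat "$f")          # e.g. "0,24" or "0-1"
        primary=${siblings%%[,-]*}    # first entry is the lowest-numbered thread
        cpu=${f#"$cpu_dir"/cpu}       # strip the directory prefix and "cpu"
        cpu=${cpu%%/*}                # keep only the CPU number
        if [ "$cpu" != "$primary" ]; then
            echo "$cpu"
        fi
    done | sort -n
}

# Take every secondary sibling offline by writing 0 to its "online" file.
offline_secondary_siblings() {
    cpu_dir="${1:-/sys/devices/system/cpu}"
    # The list is computed once, before any CPU is taken offline.
    for cpu in $(list_secondary_siblings "$cpu_dir"); do
        echo 0 > "$cpu_dir/cpu$cpu/online"
    done
}
```

Running list_secondary_siblings with no argument only reads sysfs, so it
can be used first to see which CPUs offline_secondary_siblings would touch.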

Do I have to restart slurmctld on the head node and/or slurmd on node003?
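
One way to check what slurmd will report before restarting anything is
`slurmd -C`, which prints the hardware configuration it detects and exits.
The sketch below compares the detected CPU count with an expected value.
The helper names conf_field and check_cpus are hypothetical, and the
parsing assumes the usual "KEY=value KEY=value" output line.

```shell
# Extract one KEY's value from "KEY=value KEY=value ..." text on stdin.
conf_field() {
    key="$1"
    tr ' ' '\n' | sed -n "s/^${key}=//p" | head -n 1
}

# Compare the CPU count that slurmd detects with the expected count.
check_cpus() {
    expected="$1"
    detected=$(slurmd -C | conf_field CPUs)
    if [ "$detected" = "$expected" ]; then
        echo "OK: CPUs=$detected"
    else
        echo "MISMATCH: expected $expected, detected $detected"
        return 1
    fi
}
```

On node003, `check_cpus 24` should succeed once hyperthreading is really
off. slurmd detects the hardware when it starts, so restarting it on
node003 (for example `systemctl restart slurmd`, assuming a systemd unit
by that name) is the likely next step.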

Side question: is there a way within Slurm to test whether hyperthreading
improves performance and job speed?
