[slurm-users] [External] slurmd: error: Node configuration differs from hardware: CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw)

Michael Robbert mrobbert at mines.edu
Thu Apr 23 17:41:48 UTC 2020


It looks like you have hyper-threading turned on but haven't defined ThreadsPerCore=2. You either need to turn off hyper-threading in the BIOS or change the ThreadsPerCore definition in slurm.conf.
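
For that second option, a minimal sketch of the node definition, assuming all three nodes match the lscpu output quoted below, would be:

NodeName=node[001-003] Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=191800 Gres=gpu:v100:1

Note that changing a node's processor count generally requires restarting slurmctld and the affected slurmd daemons; "scontrol reconfigure" alone is not enough for node definition changes.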

 

Mike

 

From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Robert Kudyba <rkudyba at fordham.edu>
Reply-To: Slurm User Community List <slurm-users at lists.schedmd.com>
Date: Thursday, April 23, 2020 at 08:27
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: [External] [slurm-users] slurmd: error: Node configuration differs from hardware: CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw)

 


 

Running Slurm 20.02 on CentOS 7.7 with Bright Cluster 8.2. slurm.conf is on the head node. I don't see these errors on the other two nodes. After restarting slurmd on node003 I see this:

 

slurmd[400766]: error: Node configuration differs from hardware: CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw) CoresPerSocket=12:12(hw) ThreadsPerCore=1:2(hw)
Apr 23 10:05:49 node003 slurmd[400766]: Message aggregation disabled
Apr 23 10:05:49 node003 slurmd[400766]: CPU frequency setting not configured for this node
Apr 23 10:05:49 node003 slurmd[400770]: CPUs=24 Boards=1 Sockets=2 Cores=12 Threads=1 Memory=191880 TmpDisk=2038 Uptime=2488268 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
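
(For comparison, running slurmd -C on the compute node prints the hardware layout slurmd actually detects, in slurm.conf syntax, which can be checked directly against the NodeName definition:

slurmd -C
)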

From slurm.conf:
# Nodes
NodeName=node[001-003]  CoresPerSocket=12 RealMemory=191800 Sockets=2 Gres=gpu:v100:1
# Partitions
$O Hidden=NO OverSubscribe=FORCE:12 GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=N$
PartitionName=gpuq Default=NO MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidde$
# Generic resources types
GresTypes=gpu,mic
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU
SchedulerTimeSlice=60
EnforcePartLimits=YES

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
Stepping:              4
CPU MHz:               2600.000
BogoMIPS:              5200.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              19712K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47
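
(This lscpu output accounts for the numbers in the error: the hardware provides 2 sockets × 12 cores/socket × 2 threads/core = 48 logical CPUs, while the slurm.conf definition above, with ThreadsPerCore left at its default of 1, implies 2 × 12 × 1 = 24, hence CPUs=24:48(hw) and ThreadsPerCore=1:2(hw).)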

cat /etc/slurm/cgroup.conf | grep -v '#'
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=no
AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"
TaskAffinity=no
ConstrainCores=no
ConstrainRAMSpace=no
ConstrainSwapSpace=no
ConstrainDevices=no
ConstrainKmemSpace=yes
AllowedRamSpace=100
AllowedSwapSpace=0
MinKmemSpace=30
MaxKmemPercent=100
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30

What else can I check?
