[slurm-users] slurmd: error: Node configuration differs from hardware: CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw)

Robert Kudyba rkudyba at fordham.edu
Thu Apr 23 14:25:12 UTC 2020


Running Slurm 20.02 on CentOS 7.7 on Bright Cluster 8.2; slurm.conf lives on
the head node. I don't see these errors on the other two nodes. After
restarting slurmd on node003 I see this:

slurmd[400766]: error: Node configuration differs from hardware:
CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw)
CoresPerSocket=12:12(hw) ThreadsPerCore=1:2(hw)
Apr 23 10:05:49 node003 slurmd[400766]: Message aggregation disabled
Apr 23 10:05:49 node003 slurmd[400766]: CPU frequency setting not
configured for this node
Apr 23 10:05:49 node003 slurmd[400770]: CPUs=24 Boards=1 Sockets=2 Cores=12
Threads=1 Memory=191880 TmpDisk=2038 Uptime=2488268 CPUSpecList=(null)
FeaturesAvail=(null) FeaturesActive=(null)
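As I read the error format (my interpretation, not from the docs), the 24 vs 48 is just Slurm multiplying the topology two ways: slurm.conf gives no ThreadsPerCore, so it defaults to 1, while the hardware reports 2 threads per core. A quick sketch of that arithmetic:

```python
# Slurm derives the CPU count as Sockets * CoresPerSocket * ThreadsPerCore.

# Configured side: slurm.conf sets Sockets=2 CoresPerSocket=12,
# ThreadsPerCore is unset and defaults to 1.
configured = 2 * 12 * 1   # -> 24, the "CPUs=24" side of the error

# Hardware side, per lscpu: hyperthreading on, 2 threads per core.
detected = 2 * 12 * 2     # -> 48, the ":48(hw)" side

print(configured, detected)  # 24 48
```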

From slurm.conf:
# Nodes
NodeName=node[001-003]  CoresPerSocket=12 RealMemory=191800 Sockets=2
Gres=gpu:v100:1
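Not an answer, but one thing that jumps out: the NodeName line declares Sockets and CoresPerSocket but no ThreadsPerCore, so slurmd derives 24 CPUs while the hardware (hyperthreading enabled) reports 48. If the intent is to schedule all 48 logical CPUs, the node definition might look like this -- a sketch only, assuming node001/002 share the same topology:

```
NodeName=node[001-003] CPUs=48 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=191800 Gres=gpu:v100:1
```

Running `slurmd -C` on the node prints the NodeName line as slurmd detects it from hardware, which is a handy way to get counts that won't trigger this warning.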
# Partitions
$O Hidden=NO OverSubscribe=FORCE:12 GraceTime=0 PreemptMode=OFF ReqResv=NO
AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=N$
PartitionName=gpuq Default=NO MinNodes=1 AllowGroups=ALL
PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidde$
# Generic resources types
GresTypes=gpu,mic
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU
SchedulerTimeSlice=60
EnforcePartLimits=YES

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
Stepping:              4
CPU MHz:               2600.000
BogoMIPS:              5200.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              19712K
NUMA node0 CPU(s):
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
NUMA node1 CPU(s):
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47

cat /etc/slurm/cgroup.conf | grep -v '#'
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=no
AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"
TaskAffinity=no
ConstrainCores=no
ConstrainRAMSpace=no
ConstrainSwapSpace=no
ConstrainDevices=no
ConstrainKmemSpace=yes
AllowedRamSpace=100
AllowedSwapSpace=0
MinKmemSpace=30
MaxKmemPercent=100
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30

What else can I check?