[slurm-users] slurmd: error: Node configuration differs from hardware: CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw)
Robert Kudyba
rkudyba at fordham.edu
Thu Apr 23 14:25:12 UTC 2020
We're running Slurm 20.02 on CentOS 7.7 under Bright Cluster 8.2; slurm.conf
lives on the head node. I don't see these errors on the other two nodes. After
restarting slurmd on node003 I see this:
slurmd[400766]: error: Node configuration differs from hardware:
CPUs=24:48(hw) Boards=1:1(hw) SocketsPerBoard=2:2(hw)
CoresPerSocket=12:12(hw) ThreadsPerCore=1:2(hw)
Apr 23 10:05:49 node003 slurmd[400766]: Message aggregation disabled
Apr 23 10:05:49 node003 slurmd[400766]: CPU frequency setting not
configured for this node
Apr 23 10:05:49 node003 slurmd[400770]: CPUs=24 Boards=1 Sockets=2 Cores=12
Threads=1 Memory=191880 TmpDisk=2038 Uptime=2488268 CPUSpecList=(null)
FeaturesAvail=(null) FeaturesActive=(null)
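If I'm reading the error format right, the number before the colon in each
pair is what slurm.conf defines and the :N(hw) part is what slurmd detected
on the node. Assuming slurmd -C behaves the same on 20.02, I can ask the
daemon to print the detected hardware in slurm.conf syntax and compare:

# print the detected hardware as a slurm.conf-style node line
slurmd -C

Going by the (hw) values in the error above, I'd expect it to print something
close to:
NodeName=node003 CPUs=48 Boards=1 SocketsPerBoard=2 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=191880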
From slurm.conf:
# Nodes
NodeName=node[001-003] CoresPerSocket=12 RealMemory=191800 Sockets=2 Gres=gpu:v100:1
# Partitions
(both partition lines got truncated in my paste; the $ marks where the
terminal cut them off)
...O Hidden=NO OverSubscribe=FORCE:12 GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=N$
PartitionName=gpuq Default=NO MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidde$
# Generic resources types
GresTypes=gpu,mic
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU
SchedulerTimeSlice=60
EnforcePartLimits=YES
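If the mismatch is just hyperthreading, I'm guessing the node definition
needs ThreadsPerCore=2 so the totals come out to the 48 logical CPUs the
hardware reports. A minimal sketch of what I think the corrected line would
look like (only ThreadsPerCore added; everything else as in our current
config):

# 2 sockets x 12 cores x 2 threads = 48 CPUs, matching lscpu below
NodeName=node[001-003] Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=191800 Gres=gpu:v100:1

followed by an scontrol reconfigure. But I'm not sure whether that's the
right fix with SelectTypeParameters=CR_CPU, or whether the error is harmless
and safe to ignore.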
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
Stepping: 4
CPU MHz: 2600.000
BogoMIPS: 5200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 19712K
NUMA node0 CPU(s):
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
NUMA node1 CPU(s):
1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47
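So the arithmetic on this node is 2 sockets x 12 cores/socket x 2
threads/core = 48 logical CPUs, while our slurm.conf definition works out to
2 x 12 x 1 = 24, which lines up exactly with the CPUs=24:48(hw) and
ThreadsPerCore=1:2(hw) pairs in the error.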
cat /etc/slurm/cgroup.conf | grep -v '#'
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=no
AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"
TaskAffinity=no
ConstrainCores=no
ConstrainRAMSpace=no
ConstrainSwapSpace=no
ConstrainDevices=no
ConstrainKmemSpace=yes
AllowedRamSpace=100
AllowedSwapSpace=0
MinKmemSpace=30
MaxKmemPercent=100
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30
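For completeness, I can also dump what the controller currently believes
about the node and grep out the CPU geometry:

# slurmctld's view of node003
scontrol show node node003 | egrep 'CPUTot|Sockets|CoresPerSocket|ThreadsPerCore'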
What else can I check?