[slurm-users] "Low socket*core*thre" - solution?

Mahmood Naderan mahmood.nt at gmail.com
Sun May 6 06:03:59 MDT 2018


The frontend has the same chassis as the compute nodes: one
motherboard with two Opterons, each with 16 cores. However, the head
node is not registered correctly, while the compute nodes are added
without problems.

[root at rocks7 ~]# grep -R rocks7 /etc/slurm
/etc/slurm/partitions.conf.new:PartitionName=EMERALD AllowAccounts=em1,em4 Nodes=compute-0-[2-4],rocks7
/etc/slurm/slurmdbd.conf:DbdHost=rocks7
/etc/slurm/head.conf:ControlMachine=rocks7
/etc/slurm/head.conf:DefaultStorageHost=rocks7
/etc/slurm/parts:PartitionName=EMERALD AllowAccounts=em1,em4 Nodes=compute-0-[2-4],rocks7
/etc/slurm/parts.conf:PartitionName=EMERALD AllowAccounts=em1,em4 Nodes=compute-0-[2-4],rocks7
/etc/slurm/slurm.conf:NodeName=rocks7 NodeAddr=10.1.1.1 CPUs=20
/etc/slurm/slurm.conf:PartitionName=DEFAULT AllocNodes=rocks7 State=UP
[root at rocks7 ~]#
[root at rocks7 ~]#
[root at rocks7 ~]#
[root at rocks7 ~]# slurmd -C rocks7
NodeName=rocks7 slurmd: Considering each NUMA node as a socket
CPUs=32 Boards=1 SocketsPerBoard=4 CoresPerSocket=8 ThreadsPerCore=1
RealMemory=64261
UpTime=23-02:45:32
[root at rocks7 ~]# slurmd -C compute-0-0
NodeName=rocks7 slurmd: Considering each NUMA node as a socket
CPUs=32 Boards=1 SocketsPerBoard=4 CoresPerSocket=8 ThreadsPerCore=1
RealMemory=64261
UpTime=23-02:45:36
[root at rocks7 ~]#
[root at rocks7 ~]#
[root at rocks7 ~]#
[root at rocks7 ~]#
[root at rocks7 ~]# rocks run host compute-0-0 "lscpu"
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          4
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 1
Model name:            AMD Opteron(tm) Processor 6282 SE
Stepping:              2
CPU MHz:               1400.000
CPU max MHz:           2600.0000
CPU min MHz:           1400.0000
BogoMIPS:              5200.27
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
NUMA node2 CPU(s):     16-23
NUMA node3 CPU(s):     24-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx
mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl
nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3
cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic
cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt
lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate arat
npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
decodeassists pausefilter pfthreshold
[root at rocks7 ~]#
[root at rocks7 ~]#
[root at rocks7 ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          4
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 2
Model name:            AMD Opteron(tm) Processor 6380
Stepping:              0
CPU MHz:               1400.000
CPU max MHz:           2500.0000
CPU min MHz:           1400.0000
BogoMIPS:              4999.86
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             64K
L2 cache:              2048K
L3 cache:              6144K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
NUMA node2 CPU(s):     16-23
NUMA node3 CPU(s):     24-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx
mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl
nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3
fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy
svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core
perfctr_nb cpb hw_pstate bmi1 arat npt lbrv svm_lock nrip_save
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
[root at rocks7 ~]#
[root at rocks7 ~]#
[root at rocks7 ~]# scontrol show node rocks7,compute-0-0
NodeName=rocks7 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.01
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=10.1.1.1 NodeHostName=rocks7 Version=17.11
   OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017
   RealMemory=64261 AllocMem=0 FreeMem=10242 Sockets=1 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=281775 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=WHEEL,EMERALD
   BootTime=2018-04-13T13:05:00 SlurmdStartTime=2018-04-13T13:05:17
   CfgTRES=cpu=1,mem=64261M,billing=1
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Low socket*core*thread count, Low CPUs [root at 2018-05-05T21:49:45]

NodeName=compute-0-0 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=32 CPULoad=0.01
   AvailableFeatures=rack-0,32CPUs
   ActiveFeatures=rack-0,32CPUs
   Gres=(null)
   NodeAddr=10.1.1.254 NodeHostName=compute-0-0 Version=17.11
   OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017
   RealMemory=64261 AllocMem=0 FreeMem=63217 Sockets=32 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=444124 Weight=20511900 Owner=N/A MCS_label=N/A
   Partitions=CLUSTER,WHEEL,DIAMOND
   BootTime=2018-04-13T13:06:46 SlurmdStartTime=2018-05-05T21:17:51
   CfgTRES=cpu=32,mem=64261M,billing=47
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s


[root at rocks7 ~]#
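
To put the numbers from the transcript above side by side, here is a
minimal Python sketch. It only parses KEY=VALUE tokens copied from the
grep and "slurmd -C" output and compares the CPU counts. The helper
names parse_kv and topology_product are hypothetical, and the script
does not claim to reproduce the check slurmctld itself performs.

```python
import re


def parse_kv(line):
    """Return the KEY=VALUE tokens of a Slurm-style line as a dict of strings."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def topology_product(kv):
    """Multiply boards * sockets * cores * threads from a 'slurmd -C' line."""
    return (
        int(kv["Boards"])
        * int(kv["SocketsPerBoard"])
        * int(kv["CoresPerSocket"])
        * int(kv["ThreadsPerCore"])
    )


# NodeName line for rocks7 as grep found it in /etc/slurm/slurm.conf.
configured = parse_kv("NodeName=rocks7 NodeAddr=10.1.1.1 CPUs=20")

# Hardware line as printed by 'slurmd -C' on rocks7.
detected = parse_kv(
    "CPUs=32 Boards=1 SocketsPerBoard=4 CoresPerSocket=8 "
    "ThreadsPerCore=1 RealMemory=64261"
)

product = topology_product(detected)
print("detected CPUs:", detected["CPUs"], "topology product:", product)
print("configured CPUs:", configured["CPUs"])

# Flag any difference so the NodeName line can be reviewed by hand.
cpu_mismatch = int(configured["CPUs"]) != int(detected["CPUs"])
if cpu_mismatch:
    print("configured and detected CPU counts differ")
```

With the values above, the configured count (20) and the detected
count (32) differ, so the NodeName line for rocks7 seems worth
reviewing against the "slurmd -C" output.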
Regards,
Mahmood




On Sun, May 6, 2018 at 4:23 PM, Chris Samuel <chris at csamuel.org> wrote:
> On Sunday, 6 May 2018 7:28:55 PM AEST Mahmood Naderan wrote:
>
>> I also have noticed that State returned back to IDLE+DRAIN!
>
> Both you and Eric are having issues with Opteron 6300 series CPUs.
>
> I can't help but think the fact that each package in a socket has 2 NUMA nodes
> is the cause of your pain. So whilst Slurm says it's treating each NUMA node
> as a socket I wonder if at some point it's getting confused whether the number
> of sockets is really 2 or 4?
>
>> I am guessing to set Sockets to 32!!
>
> No, that's definitely wrong.
>
> What does this say?
>
> grep -R rocks7 /etc/slurm
>
> All the best,
> Chris
> --
>  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>
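
The "2 or 4 sockets" question in the quoted reply can be made concrete
with a little arithmetic. The sketch below is only an illustration:
the Topology class is a hypothetical helper, and the numbers are taken
from the lscpu and "slurmd -C" output in this message. Both views
describe the same 32 CPUs but split them differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """One way of breaking a node's CPUs into sockets, cores and threads."""
    sockets: int
    cores_per_socket: int
    threads_per_core: int

    def cpus(self):
        # Total logical CPUs implied by this breakdown.
        return self.sockets * self.cores_per_socket * self.threads_per_core


# lscpu: Socket(s)=2, Core(s) per socket=8, Thread(s) per core=2
lscpu_view = Topology(sockets=2, cores_per_socket=8, threads_per_core=2)

# slurmd -C (each NUMA node counted as a socket):
# SocketsPerBoard=4, CoresPerSocket=8, ThreadsPerCore=1
slurmd_view = Topology(sockets=4, cores_per_socket=8, threads_per_core=1)

for name, view in (("lscpu", lscpu_view), ("slurmd -C", slurmd_view)):
    print(name, view, "->", view.cpus(), "CPUs")

same_total = lscpu_view.cpus() == slurmd_view.cpus()
same_breakdown = lscpu_view == slurmd_view
```

The totals agree while the per-field values do not, so a node
definition that mixes fields from the two views (for example Sockets=2
with ThreadsPerCore=1) would multiply out to fewer than 32 CPUs.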


