[slurm-users] "Low socket*core*thre" - solution?

Werner Saar wernsaar at googlemail.com
Mon May 7 03:57:38 MDT 2018


Hi Mahmood,

Please try the following commands on rocks7:

systemctl restart slurmd

systemctl restart slurmctld

scontrol update node=rocks7 state=undrain


Best regards

Werner


On 05/06/2018 02:09 PM, Mahmood Naderan wrote:
> Still I think for some reasons, slurms put the frontend in drain
> state. Maybe, in order not to overload the main node by user jobs, it
> set the state to drain which is actually fake. I also checked the
> commands used in the slurm roll (package from Werner) and nothing was
> incorrect. Similar to setting up slurm manually on a cluster, but this
> time some automated scripts.
>
>
> Regards,
> Mahmood
>
>
>
>
> On Sun, May 6, 2018 at 4:33 PM, Mahmood Naderan <mahmood.nt at gmail.com> wrote:
>> The chassis of the frontend is the same as compute nodes. A mother
>> board with two opterons and each have 16 cores. However, the head node
>> is not included correctly, while the computes are added without
>> problem.
>>
>> [root at rocks7 ~]# grep -R rocks7 /etc/slurm
>> /etc/slurm/partitions.conf.new:PartitionName=EMERALD
>> AllowAccounts=em1,em4 Nodes=compute-0-[2-4],rocks7
>> /etc/slurm/slurmdbd.conf:DbdHost=rocks7
>> /etc/slurm/head.conf:ControlMachine=rocks7
>> /etc/slurm/head.conf:DefaultStorageHost=rocks7
>> /etc/slurm/parts:PartitionName=EMERALD AllowAccounts=em1,em4
>> Nodes=compute-0-[2-4],rocks7
>> /etc/slurm/parts.conf:PartitionName=EMERALD AllowAccounts=em1,em4
>> Nodes=compute-0-[2-4],rocks7
>> /etc/slurm/slurm.conf:NodeName=rocks7 NodeAddr=10.1.1.1 CPUs=20
>> /etc/slurm/slurm.conf:PartitionName=DEFAULT AllocNodes=rocks7 State=UP
>> [root at rocks7 ~]#
>> [root at rocks7 ~]#
>> [root at rocks7 ~]#
>> [root at rocks7 ~]# slurmd -C rocks7
>> NodeName=rocks7 slurmd: Considering each NUMA node as a socket
>> CPUs=32 Boards=1 SocketsPerBoard=4 CoresPerSocket=8 ThreadsPerCore=1
>> RealMemory=64261
>> UpTime=23-02:45:32
>> [root at rocks7 ~]# slurmd -C compute-0-0
>> NodeName=rocks7 slurmd: Considering each NUMA node as a socket
>> CPUs=32 Boards=1 SocketsPerBoard=4 CoresPerSocket=8 ThreadsPerCore=1
>> RealMemory=64261
>> UpTime=23-02:45:36
>> [root at rocks7 ~]#
>> [root at rocks7 ~]#
>> [root at rocks7 ~]#
>> [root at rocks7 ~]#
>> [root at rocks7 ~]# rocks run host compute-0-0 "lscpu"
>> Warning: untrusted X11 forwarding setup failed: xauth key data not generated
>> Architecture:          x86_64
>> CPU op-mode(s):        32-bit, 64-bit
>> Byte Order:            Little Endian
>> CPU(s):                32
>> On-line CPU(s) list:   0-31
>> Thread(s) per core:    2
>> Core(s) per socket:    8
>> Socket(s):             2
>> NUMA node(s):          4
>> Vendor ID:             AuthenticAMD
>> CPU family:            21
>> Model:                 1
>> Model name:            AMD Opteron(tm) Processor 6282 SE
>> Stepping:              2
>> CPU MHz:               1400.000
>> CPU max MHz:           2600.0000
>> CPU min MHz:           1400.0000
>> BogoMIPS:              5200.27
>> Virtualization:        AMD-V
>> L1d cache:             16K
>> L1i cache:             64K
>> L2 cache:              2048K
>> L3 cache:              6144K
>> NUMA node0 CPU(s):     0-7
>> NUMA node1 CPU(s):     8-15
>> NUMA node2 CPU(s):     16-23
>> NUMA node3 CPU(s):     24-31
>> Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
>> mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx
>> mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl
>> nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3
>> cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic
>> cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt
>> lwp fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate arat
>> npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
>> decodeassists pausefilter pfthreshold
>> [root at rocks7 ~]#
>> [root at rocks7 ~]#
>> [root at rocks7 ~]# lscpu
>> Architecture:          x86_64
>> CPU op-mode(s):        32-bit, 64-bit
>> Byte Order:            Little Endian
>> CPU(s):                32
>> On-line CPU(s) list:   0-31
>> Thread(s) per core:    2
>> Core(s) per socket:    8
>> Socket(s):             2
>> NUMA node(s):          4
>> Vendor ID:             AuthenticAMD
>> CPU family:            21
>> Model:                 2
>> Model name:            AMD Opteron(tm) Processor 6380
>> Stepping:              0
>> CPU MHz:               1400.000
>> CPU max MHz:           2500.0000
>> CPU min MHz:           1400.0000
>> BogoMIPS:              4999.86
>> Virtualization:        AMD-V
>> L1d cache:             16K
>> L1i cache:             64K
>> L2 cache:              2048K
>> L3 cache:              6144K
>> NUMA node0 CPU(s):     0-7
>> NUMA node1 CPU(s):     8-15
>> NUMA node2 CPU(s):     16-23
>> NUMA node3 CPU(s):     24-31
>> Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
>> mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx
>> mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl
>> nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3
>> fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy
>> svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
>> xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core
>> perfctr_nb cpb hw_pstate bmi1 arat npt lbrv svm_lock nrip_save
>> tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
>> [root at rocks7 ~]#
>> [root at rocks7 ~]#
>> [root at rocks7 ~]# scontrol show node rocks7,compute-0-0
>> NodeName=rocks7 Arch=x86_64 CoresPerSocket=1
>>     CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.01
>>     AvailableFeatures=(null)
>>     ActiveFeatures=(null)
>>     Gres=(null)
>>     NodeAddr=10.1.1.1 NodeHostName=rocks7 Version=17.11
>>     OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017
>>     RealMemory=64261 AllocMem=0 FreeMem=10242 Sockets=1 Boards=1
>>     State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=281775 Weight=1 Owner=N/A
>> MCS_label=N/A
>>     Partitions=WHEEL,EMERALD
>>     BootTime=2018-04-13T13:05:00 SlurmdStartTime=2018-04-13T13:05:17
>>     CfgTRES=cpu=1,mem=64261M,billing=1
>>     AllocTRES=
>>     CapWatts=n/a
>>     CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>>     ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>>     Reason=Low socket*core*thread count, Low CPUs [root at 2018-05-05T21:49:45]
>>
>> NodeName=compute-0-0 Arch=x86_64 CoresPerSocket=1
>>     CPUAlloc=0 CPUErr=0 CPUTot=32 CPULoad=0.01
>>     AvailableFeatures=rack-0,32CPUs
>>     ActiveFeatures=rack-0,32CPUs
>>     Gres=(null)
>>     NodeAddr=10.1.1.254 NodeHostName=compute-0-0 Version=17.11
>>     OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017
>>     RealMemory=64261 AllocMem=0 FreeMem=63217 Sockets=32 Boards=1
>>     State=IDLE ThreadsPerCore=1 TmpDisk=444124 Weight=20511900
>> Owner=N/A MCS_label=N/A
>>     Partitions=CLUSTER,WHEEL,DIAMOND
>>     BootTime=2018-04-13T13:06:46 SlurmdStartTime=2018-05-05T21:17:51
>>     CfgTRES=cpu=32,mem=64261M,billing=47
>>     AllocTRES=
>>     CapWatts=n/a
>>     CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>>     ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>>
>>
>> [root at rocks7 ~]#
>> Regards,
>> Mahmood




More information about the slurm-users mailing list