[slurm-users] "Low socket*core*thre" - solution?
Werner Saar
wernsaar at googlemail.com
Sat May 5 23:05:51 MDT 2018
Hi,
what is the output of the command:
slurmd -C rocks7
Best regards
Werner
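`slurmd -C`, run on the node itself, prints the hardware configuration that slurmd detects. Its output uses the same `NodeName=...` syntax as slurm.conf, so it can be compared field by field with the node's definition in slurm.conf. Below is a minimal shell sketch of that comparison. It assumes both lines are whitespace-separated `key=value` fields. `field_value` and `compare_node_lines` are hypothetical helper names, not Slurm tools.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Slurm). Assumption: both lines are
# whitespace-separated key=value fields, as in slurm.conf NodeName lines.

# field_value "<line>" <key>: print the value of <key>, fail if absent.
field_value() {
  local f
  for f in $1; do                      # split the line on whitespace
    if [ "${f%%=*}" = "$2" ]; then     # text before the first '=' is the key
      echo "${f#*=}"                   # text after the first '=' is the value
      return 0
    fi
  done
  return 1
}

# compare_node_lines "<slurm.conf line>" "<slurmd -C line>":
# print every topology/memory key set on BOTH lines whose values differ;
# exit status is 0 only when no differences were found.
compare_node_lines() {
  local key conf det rc=0
  for key in CPUs Boards SocketsPerBoard Sockets CoresPerSocket ThreadsPerCore RealMemory; do
    conf=$(field_value "$1" "$key") || continue   # not set in slurm.conf
    det=$(field_value "$2" "$key") || continue    # not reported by slurmd -C
    if [ "$conf" != "$det" ]; then
      echo "$key: slurm.conf=$conf slurmd=$det"
      rc=1
    fi
  done
  return $rc
}
```

This example assumes the `NodeName` line is the first line that `slurmd -C` prints. On rocks7 the helper might then be called as `compare_node_lines "$(grep '^NodeName=rocks7' /etc/slurm/slurm.conf)" "$(slurmd -C | head -n 1)"`. Only keys present on both lines are compared, so a slurm.conf line that sets just `CPUs=20` is checked on CPUs alone.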
On 05/05/2018 06:56 PM, Mahmood Naderan wrote:
> Quick follow-up.
> I see that Sockets for the head node is 1, while for the compute nodes
> it is 32. I think that is the reason why Slurm only sees one CPU
> (CPUTot=1).
>
> May I ask what the difference is between CPUs and Sockets in slurm.conf?
> Regards,
> Mahmood
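This note addresses the CPUs vs. Sockets question, based on our reading of the slurm.conf(5) man page. CPUs is the number of logical processors that Slurm schedules. Sockets, CoresPerSocket and ThreadsPerCore describe the physical topology, and the last two default to 1. If Sockets is omitted, it is inferred from CPUs, CoresPerSocket and ThreadsPerCore. A definition such as `NodeName=rocks7 NodeAddr=10.1.1.1 CPUs=20` therefore describes 20 single-core sockets. Below is a small shell sketch of that arithmetic. `infer_sockets` is a hypothetical name, and the integer division is an assumption, not Slurm's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the inference described in slurm.conf(5):
# Sockets = CPUs / (CoresPerSocket * ThreadsPerCore), with the last two
# defaulting to 1. Integer division is an assumption of this sketch.
infer_sockets() {
  local cpus=$1 cores=${2:-1} threads=${3:-1}
  echo $(( cpus / (cores * threads) ))
}

infer_sockets 20        # CPUs=20 only                 -> prints 20
infer_sockets 20 10 1   # CPUs=20, CoresPerSocket=10   -> prints 2
```

Spelling out Sockets, CoresPerSocket and ThreadsPerCore explicitly, to match what `slurmd -C` reports, avoids depending on this inference at all.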
>
> On Sat, May 5, 2018 at 9:24 PM, Mahmood Naderan <mahmood.nt at gmail.com> wrote:
>> Hi,
>> I also have the same problem. I think that, by default, Slurm won't add
>> the head node as a compute node. I manually set the state to resume;
>> however, the number of cores is still low (1), not what I specified
>> in slurm.conf.
>>
>>
>> [root@rocks7 mahmood]# scontrol show node rocks7
>> NodeName=rocks7 Arch=x86_64 CoresPerSocket=1
>> CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.14
>> AvailableFeatures=(null)
>> ActiveFeatures=(null)
>> Gres=(null)
>> NodeAddr=10.1.1.1 NodeHostName=rocks7 Version=17.11
>> OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017
>> RealMemory=64261 AllocMem=0 FreeMem=1247 Sockets=1 Boards=1
>> State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=281775 Weight=1 Owner=N/A MCS_label=N/A
>> Partitions=WHEEL,EMERALD
>> BootTime=2018-04-13T13:04:59 SlurmdStartTime=2018-04-13T13:05:17
>> CfgTRES=cpu=1,mem=64261M,billing=1
>> AllocTRES=
>> CapWatts=n/a
>> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>> Reason=Low socket*core*thread count, Low CPUs [root@2018-05-05T21:18:05]
>>
>> [root@rocks7 mahmood]# scontrol update node=rocks7 state=resume
>> [root@rocks7 mahmood]# scontrol show node rocks7
>> NodeName=rocks7 Arch=x86_64 CoresPerSocket=1
>> CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.14
>> AvailableFeatures=(null)
>> ActiveFeatures=(null)
>> Gres=(null)
>> NodeAddr=10.1.1.1 NodeHostName=rocks7 Version=17.11
>> OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017
>> RealMemory=64261 AllocMem=0 FreeMem=1247 Sockets=1 Boards=1
>> State=IDLE ThreadsPerCore=1 TmpDisk=281775 Weight=1 Owner=N/A MCS_label=N/A
>> Partitions=WHEEL,EMERALD
>> BootTime=2018-04-13T13:04:59 SlurmdStartTime=2018-04-13T13:05:17
>> CfgTRES=cpu=1,mem=64261M,billing=1
>> AllocTRES=
>> CapWatts=n/a
>> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>> [root@rocks7 mahmood]# grep -A 3 -B 3 rocks7 /etc/slurm/slurm.conf
>> DebugFlags=Priority,NO_CONF_HASH,backfill,BackfillMap
>>
>> NodeName=DEFAULT State=UNKNOWN
>> NodeName=rocks7 NodeAddr=10.1.1.1 CPUs=20
>> PartitionName=DEFAULT AllocNodes=rocks7 State=UP
>> PartitionName=DEBUG
>>
>> ####### Power Save Begin ##################
>>
>> Regards,
>> Mahmood
>>
>> On Sat, May 5, 2018 at 5:06 PM, Chris Samuel <chris at csamuel.org> wrote:
>>> On Thursday, 3 May 2018 10:28:46 AM AEST Matt Hohmeister wrote:
>>>
>>>> …and it looks good, except for the drain on my server/compute node:
>>> I think that if you've had the config wrong at some point in the past, slurmctld
>>> will remember the error, and you'll need to clear it manually with:
>>>
>>> scontrol update node=${NODE} state=resume
>>>
>>> All the best,
>>> Chris
>>> --
>>> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
>>>
>>>
More information about the slurm-users mailing list