[slurm-users] "Low socket*core*thread count" - solution?

Mahmood Naderan <mahmood.nt@gmail.com>
Mon May 7 10:27:07 MDT 2018


Oh yes, that was brilliant.

[root@rocks7 mahmood]# scontrol show node rocks7
NodeName=rocks7 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.02
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=10.1.1.1 NodeHostName=rocks7 Version=17.11
   OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017
   RealMemory=64261 AllocMem=0 FreeMem=1863 Sockets=1 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=281775 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=WHEEL,EMERALD
   BootTime=2018-04-13T13:05:00 SlurmdStartTime=2018-04-13T13:05:17
   CfgTRES=cpu=1,mem=64261M,billing=1
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Low socket*core*thread count, Low CPUs [root@2018-05-05T21:49:45]
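
(A handy check for this kind of drain reason: "slurmd -C" prints the hardware
layout slurmd actually detects, in slurm.conf syntax, so it can be compared
against the configured NodeName line. The 2x10 socket/core split shown below
is only an illustration; the real output depends on the machine.)

[root@rocks7 mahmood]# slurmd -C
NodeName=rocks7 CPUs=20 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=64261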

[root@rocks7 mahmood]# systemctl restart slurmd
[root@rocks7 mahmood]# systemctl restart slurmctld
[root@rocks7 mahmood]# scontrol update node=rocks7 state=undrain
[root@rocks7 mahmood]# scontrol show node rocks7
NodeName=rocks7 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.01
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=10.1.1.1 NodeHostName=rocks7 Version=17.11
   OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017
   RealMemory=64261 AllocMem=0 FreeMem=1833 Sockets=20 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=281775 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=WHEEL,EMERALD
   BootTime=2018-04-13T13:04:59 SlurmdStartTime=2018-05-07T20:53:02
   CfgTRES=cpu=20,mem=64261M,billing=20
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

[root@rocks7 mahmood]# grep rocks7 /etc/slurm/slurm.conf
NodeName=rocks7 NodeAddr=10.1.1.1 CPUs=20
PartitionName=DEFAULT AllocNodes=rocks7 State=UP
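
(Note: because this slurm.conf line defines only CPUs=20, Slurm fills in the
topology as twenty one-core sockets, which is exactly what scontrol now
reports. An explicit node definition would spell the layout out; the 2x10
split below is an assumption, and the real values should be taken from
"slurmd -C":)

NodeName=rocks7 NodeAddr=10.1.1.1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=64261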



So the trick was to restart slurmd (so the node re-registers with the CPU
count from slurm.conf) and then UNDRAIN the node rather than RESUME it.
Thanks
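
(Side note: "sinfo -R" lists all drained or downed nodes together with the
drain reason and who set it when, which makes nodes stuck like this easy to
spot:)

[root@rocks7 mahmood]# sinfo -R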

Regards,
Mahmood




On Mon, May 7, 2018 at 2:27 PM, Werner Saar <wernsaar@googlemail.com> wrote:
> Hi Mahmood,
>
> Please try the following commands on rocks7:
> systemctl restart slurmd
> systemctl restart slurmctld
> scontrol update node=rocks7 state=undrain
>
>
> Best regards
>
> Werner


