[slurm-users] Node is not allocating all CPUs

Brian Andrus toomuchit at gmail.com
Tue Apr 5 22:14:14 UTC 2022


You want to see what is output on the node itself when you run:


slurmd -C
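A minimal sketch of the comparison this suggests (the sample output is illustrative, assuming the node really has 2 sockets x 16 cores; the exact fields printed can vary by Slurm version):

```shell
# On node020: print the hardware configuration slurmd itself detects.
slurmd -C
# Illustrative output if detection matches the hardware:
# NodeName=node020 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=257600

# Compare that against the NodeName=node020 line in slurm.conf. If they
# disagree, fix slurm.conf to match the detected values, then push the
# change out to the running daemons:
scontrol reconfigure
```

If `slurmd -C` itself reports only 16 CPUs, the problem is below Slurm (BIOS/OS topology as seen on the node) rather than in slurm.conf.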


Brian Andrus


On 4/5/2022 2:11 PM, Guertin, David S. wrote:
> We've added a new GPU node to our cluster. It has two 16-core sockets 
> with hyperthreading turned off, so the total is 32 cores. But jobs are 
> only being allowed to use 16 cores.
>
> Here's the relevant line from slurm.conf:
>
> NodeName=node020 CoresPerSocket=16 RealMemory=257600 ThreadsPerCore=1 
> Boards=1 SocketsPerBoard=2 Weight=100 Gres=gpu:rtxa5000:4
>
> And here's scontrol output for the node. Note that even though 
> CPUTot=32, CfgTRES=cpu=16 instead of 32:
>
> # scontrol show node node020
> NodeName=node020 Arch=x86_64 CoresPerSocket=16
>    CPUAlloc=16 CPUTot=32 CPULoad=7.29
>    AvailableFeatures=(null)
>    ActiveFeatures=(null)
>    Gres=gpu:rtxa5000:4
>    NodeAddr=node020 NodeHostName=node020 Version=19.05.8
>    OS=Linux 3.10.0-1160.59.1.el7.x86_64 #1 SMP Wed Feb 23 16:47:03 UTC 
> 2022
>    RealMemory=257600 AllocMem=126976 FreeMem=1393 Sockets=2 Boards=1
>    State=MIXED ThreadsPerCore=1 TmpDisk=2038 Weight=100 Owner=N/A 
> MCS_label=N/A
>    Partitions=gpu-long,gpu-short,gpu-standard
>    BootTime=2022-04-05T11:37:08 SlurmdStartTime=2022-04-05T11:43:00
>    CfgTRES=cpu=16,mem=257600M,billing=16,gres/gpu=4
>    AllocTRES=cpu=16,mem=124G,gres/gpu=2
>    CapWatts=n/a
>    CurrentWatts=0 AveWatts=0
>    ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> Why isn't this node allocating all 32 cores?
>
> Thanks,
> David Guertin
