[slurm-users] Can one specify attributes on a GRES resource?
Will Dennis
wdennis at nec-labs.com
Fri Mar 22 02:39:50 UTC 2019
I tried doing this as follows:
Node's gres.conf:
##################################################################
# Slurm's Generic Resource (GRES) configuration file
##################################################################
Name=gpu File=/dev/nvidia0 Type=1050TI
Name=gpu_mem_per_card Count=4G
Name=gpu_cores_per_card Count=768
From slurm.conf:
NodeName=n75 CPUs=32 Gres=gpu:1,gpu_mem_per_card:no_consume:4G,gpu_cores_per_card:no_consume:768 Feature=GPUMODEL_1050TI
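(For reference, custom GRES names also have to be declared cluster-wide via GresTypes in slurm.conf; a hedged sketch of what that line would look like for this setup, assuming defaults elsewhere:)

```
# slurm.conf (cluster-wide) -- every GRES name used on any node
# must appear in this comma-separated list, or slurmd will report
# a count of 0 for it.
GresTypes=gpu,gpu_mem_per_card,gpu_cores_per_card
```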
But when I restarted slurmctld, distributed the slurm.conf to the cluster nodes, and did a "scontrol reconfigure", the node went into a "DRAIN" state with the following reason:
[root at slurm-controller ~]# scontrol show node n75
NodeName=n75 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUErr=0 CPUTot=32 CPULoad=0.01
AvailableFeatures=GPUMODEL_1050TI
ActiveFeatures=GPUMODEL_1050TI
Gres=gpu:1,gpu_mem_per_card:no_consume:4G,gpu_cores_per_card:no_consume:768
NodeAddr=n75 NodeHostName=n75 Version=16.05
OS=Linux RealMemory=128905 AllocMem=0 FreeMem=124347 Sockets=32 Boards=1
State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=240180 Weight=1 Owner=N/A MCS_label=N/A
BootTime=2019-02-13T14:35:03 SlurmdStartTime=2019-02-13T15:07:27
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Reason=gres/gpu_mem_per_card count too low (0 < 4294967296) [root at 2019-03-21T22:23:59] <<<<<<<<<<<<<<<
Why does it think that the "gres/gpu_mem_per_card" count is 0? How can I fix this?
-----Original Message-----
From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of Quirin Lohr
Sent: Wednesday, March 20, 2019 4:06 AM
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] Can one specify attributes on a GRES resource?
Hi Will,
I solved this by creating a new GRES:
Some nodes have VRAM:no_consume:12G
Some nodes have VRAM:no_consume:24G
"no_consume" because it would be for the whole node otherwise.
It only works because the nodes only have one type of GPUs each.
It is then requested with --gres=gpu:1,VRAM:16G
Here an extract of my slurm.conf
> NodeName=node7 Gres=gpu:p6000:8,VRAM:no_consume:24G Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=257843 Weight=10 Feature=p6000
> NodeName=node6 Gres=gpu:titanxpascal:8,VRAM:no_consume:12G Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=257854 Weight=1 Feature=titanxp
The CUDA cores could be implemented the same way (which is a nice idea, by the way).
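To make the mechanism concrete, a hypothetical batch script against the node definitions above might look like this (job name and command are illustrative, not from the original post):

```
#!/bin/bash
#SBATCH --job-name=vram-demo
# Request one GPU plus 16G of the non-consumable VRAM GRES; only nodes
# whose slurm.conf advertises VRAM of at least 16G (here, node7 with
# its 24G cards) are eligible, so the titanxpascal node is filtered out.
#SBATCH --gres=gpu:1,VRAM:16G
srun nvidia-smi
```

Because VRAM is declared no_consume, the 16G request acts as a per-node filter rather than a depletable allocation, which is why this only works cleanly on nodes with a single GPU type.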
Regards
Quirin
--
Quirin Lohr
Systemadministration
Technische Universität München
Fakultät für Informatik
Lehrstuhl für Bildverarbeitung und Mustererkennung
Boltzmannstrasse 3
85748 Garching
Tel. +49 89 289 17769
Fax +49 89 289 17757
quirin.lohr at in.tum.de
www.vision.in.tum.de