[slurm-users] can't allocate 1 gpu per job

Erik Bryer ebryer at isi.edu
Wed Jan 6 22:21:11 UTC 2021


I have 4 GRES GPUs called foolsgold that I am trying to allocate one per job. However, requesting 1 GPU seems to allocate all of the GPUs to that job. My batch script is:
#!/bin/bash
#SBATCH --partition=scavenge
#SBATCH --qos=scavenge
#SBATCH --account=borrowed
#SBATCH --nodes=1
#SBATCH --tasks=1
#SBATCH --time=00:05:20
#SBATCH --gpus=foolsgold:1
date
hostname -s
for ((i=1;i<=1000000000;i++)) ; do a=$((i++)) ; done
date

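To check what a job was actually given, a few lines like the following could be added to the batch script. This is only a sketch: the function name report_gpu_allocation is made up for illustration, and it assumes that Slurm exports SLURM_JOB_GPUS and CUDA_VISIBLE_DEVICES for the gpu GRES (it normally does, but that may not hold with placeholder device files such as /tmp/fg0) and that scontrol is on the PATH of the compute node.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script: print the GPU-related
# information Slurm exposes to the running job.
report_gpu_allocation() {
    # These variables are normally set by Slurm inside a job; show "unset"
    # when they are missing or empty so a missing allocation is obvious.
    echo "job id:               ${SLURM_JOB_ID:-unset}"
    echo "SLURM_JOB_GPUS:       ${SLURM_JOB_GPUS:-unset}"
    echo "CUDA_VISIBLE_DEVICES: ${CUDA_VISIBLE_DEVICES:-unset}"

    # "scontrol show job -d" adds per-node detail to the job record; keep only
    # the lines that mention GRES or TRES. Skipped when not inside a job or
    # when scontrol is not available.
    if [ -n "${SLURM_JOB_ID:-}" ] && command -v scontrol >/dev/null 2>&1; then
        scontrol show job -d "$SLURM_JOB_ID" | grep -i -E 'gres|tres' || true
    fi
}

report_gpu_allocation
```

If the job really holds all four devices, the variables and the scontrol output should list more than one index; a single index would suggest the allocation itself is correct and the problem lies in how it is being observed.
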
And the partition definition is:
PartitionName=scavtres Nodes=saga-test01,saga-test02 MaxTime=72:00:00 State=UP PriorityTier=0 PreemptMode=REQUEUE AllowQos=scavenge AllowAccounts=borrowed,gaia default=yes TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/foolsgold=200.0" OverSubscribe=FORCE

I have 2 compute nodes in this test cluster, each one with 4 gpus defined:
    NodeName=saga-test01 CPUS=2 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1800 State=UNKNOWN Gres=gpu:foolsgold:4
    NodeName=saga-test02 CPUS=2 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1800 State=UNKNOWN Gres=gpu:foolsgold:4

The /etc/slurm/gres.conf on the two compute nodes is:
Name=gpu Type=foolsgold File=/tmp/fg0
Name=gpu Type=foolsgold File=/tmp/fg1
Name=gpu Type=foolsgold File=/tmp/fg2
Name=gpu Type=foolsgold File=/tmp/fg3

How can I get exactly one GPU allocated per job?

Thanks,

Erik