[slurm-users] Scheduling GPUS

Thu Nov 7 21:19:59 UTC 2019

Greetings all:

I'm attempting to configure the scheduler to schedule our GPU boxes but
have run into a bit of a snag.

I have a box with two Tesla K80s.  With my current configuration, the
scheduler will run one job on the box, but if I submit a second job,
it stays queued until the first one finishes.

My submit script:

#SBATCH --partition=NodeSet1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:k80:1

My slurm.conf (the parts I think are relevant):

GresTypes=gpu

SelectType=select/cons_tres

PartitionName=NodeSet1 Nodes=cht-c[1-4],cph-gpu1 Default=YES MaxTime=INFINITE OverSubscribe=FORCE State=UP

NodeName=cph-gpu1 CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=257541 Gres=gpu:k80:2 Feature=gpu State=UNKNOWN

My gres.conf:

NodeName=cph-gpu1 Name=gpu Type=k80 File=/dev/nvidia[0-1]

and finally, the results of squeue:

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               208  NodeSet1   job.sh jmmosley PD       0:00      1 (Resources)
               207  NodeSet1   job.sh jmmosley  R       4:12      1 cph-gpu1
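
For anyone trying to reproduce this, here is a minimal sketch of how the allocation could be inspected while the first job is still running. The function name check_gpu_sched is hypothetical and only for illustration; scontrol show job, scontrol show node and scontrol show config are standard Slurm commands, and the default node name and job ID are the ones from the squeue output above. It only prints information; it does not change any configuration.

```shell
# Hypothetical helper: print what a running job holds and what the node has left.
# Usage: check_gpu_sched [node] [jobid]
check_gpu_sched() {
    node="${1:-cph-gpu1}"
    jobid="${2:-207}"

    # Bail out early when the Slurm client tools are not installed on this host.
    if ! command -v scontrol >/dev/null 2>&1; then
        echo "scontrol not found; run this on a host with the Slurm client tools" >&2
        return 1
    fi

    # CPUs, memory and GRES/TRES actually allocated to the running job.
    scontrol show job "$jobid"

    # Configured vs. allocated resources on the node (CPUs, memory, GRES).
    scontrol show node "$node"

    # Selection plugin settings and default memory limits in the live config.
    scontrol show config | grep -i -E 'SelectType|DefMemPer'
}
```

Called as `check_gpu_sched cph-gpu1 207`, this shows whether the running job has been given more of the node (CPUs or memory, for example) than the single GPU it asked for, which would be one possible reason for the second job pending with (Resources).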

Any idea what I am missing or have misconfigured?

Thanks in advance.

Mike

-- 
*J. Michael Mosley*
University Research Computing
The University of North Carolina at Charlotte
9201 University City Blvd
Charlotte, NC  28223
*704.687.7065*  *jmmosley at uncc.edu <mmosley at uncc.edu>*