[slurm-users] [External] Scheduling GPUS

Prentice Bisbal pbisbal at pppl.gov
Mon Nov 11 18:17:39 UTC 2019


Remove this line:

#SBATCH --nodes=1

With that directive, Slurm assumes you're requesting the whole node; 
--ntasks=1 should be adequate.
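
For illustration, a minimal sketch of the submit script with that line 
removed, reusing the partition and GRES names from your post (the srun 
line and executable name are placeholders I've added, not from your 
script):

#!/bin/bash
#SBATCH --partition=NodeSet1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:k80:1

# Placeholder workload (hypothetical executable); srun launches it inside
# the allocation, which should only expose the single K80 that was granted.
srun ./your_gpu_app

With two of those jobs submitted, 'scontrol show node cph-gpu1' should 
show both K80s allocated instead of the second job pending on Resources.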

On 11/7/19 4:19 PM, Mike Mosley wrote:
> Greetings all:
>
> I'm attempting to configure the scheduler to schedule our GPU boxes 
> but have run into a bit of a snag.
>
> I have a box with two Tesla K80s.  With my current configuration, the 
> scheduler will schedule one job on the box, but if I submit a second 
> job, it queues up until the first one finishes:
>
> My submit script:
>
> #SBATCH --partition=NodeSet1
>
> #SBATCH --nodes=1
>
> #SBATCH --ntasks=1
>
> #SBATCH --gres=gpu:k80:1
>
>
> My slurm.conf (the things I think are relevant)
>
> GresTypes=gpu
>
> SelectType=select/cons_tres
>
>
> PartitionName=NodeSet1 Nodes=cht-c[1-4],cph-gpu1 Default=YES 
> MaxTime=INFINITE OverSubscribe=FORCE State=UP
>
>
> NodeName=cph-gpu1 CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 
> RealMemory=257541 Gres=gpu:k80:2 Feature=gpu State=UNKNOWN
>
>
>
> My gres.conf:
>
> NodeName=cph-gpu1 Name=gpu Type=k80 File=/dev/nvidia[0-1]
>
>
>
> and finally, the results of squeue:
>
> $ squeue
>
>   JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
>     208  NodeSet1 job.sh jmmosley PD  0:00     1 (Resources)
>     207  NodeSet1 job.sh jmmosley  R  4:12     1 cph-gpu1
>
>
> Any idea what I am missing or have misconfigured?
>
>
>
> Thanks in advance.
>
>
> Mike
>
>
> -- 
>
> J. Michael Mosley
> University Research Computing
> The University of North Carolina at Charlotte
> 9201 University City Blvd
> Charlotte, NC  28223
> 704.687.7065 | mmosley at uncc.edu