[slurm-users] [External] Scheduling GPUS
Prentice Bisbal
pbisbal at pppl.gov
Mon Nov 11 18:17:39 UTC 2019
Remove this line:
#SBATCH --nodes=1
With it, Slurm assumes you're requesting the whole node. --ntasks=1 should be
adequate.
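
For example, a minimal submit script along these lines (a sketch only, keeping
the same partition and GRES request from your script; the srun line and
program name are placeholders, not taken from your job):

#!/bin/bash
#SBATCH --partition=NodeSet1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:k80:1

# Placeholder workload; replace with your actual GPU program.
srun ./my_gpu_program

With --nodes=1 removed, two such jobs should be able to share cph-gpu1, one
per K80.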
On 11/7/19 4:19 PM, Mike Mosley wrote:
> Greetings all:
>
> I'm attempting to configure the scheduler to schedule our GPU boxes
> but have run into a bit of a snag.
>
> I have a box with two Tesla K80s. With my current configuration, the
> scheduler will schedule one job on the box, but if I submit a second
> job, it queues up until the first one finishes:
>
> My submit script:
>
> #SBATCH --partition=NodeSet1
>
> #SBATCH --nodes=1
>
> #SBATCH --ntasks=1
>
> #SBATCH --gres=gpu:k80:1
>
>
> My slurm.conf (the things I think are relevant)
>
> GresTypes=gpu
>
> SelectType=select/cons_tres
>
>
> PartitionName=NodeSet1 Nodes=cht-c[1-4],cph-gpu1 Default=YES
> MaxTime=INFINITE OverSubscribe=FORCE State=UP
>
>
> NodeName=cph-gpu1 CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1
> RealMemory=257541 Gres=gpu:k80:2 Feature=gpu State=UNKNOWN
>
>
>
> My gres.conf:
>
> NodeName=cph-gpu1 Name=gpu Type=k80 File=/dev/nvidia[0-1]
>
>
>
> and finally, the results of squeue:
>
> $ squeue
>
>  JOBID  PARTITION  NAME     USER      ST  TIME  NODES  NODELIST(REASON)
>
>    208  NodeSet1   job.sh   jmmosley  PD  0:00      1  (Resources)
>
>    207  NodeSet1   job.sh   jmmosley   R  4:12      1  cph-gpu1
>
>
> Any idea what I am missing or have misconfigured?
>
>
>
> Thanks in advance.
>
>
> Mike
>
>
> --
>
> J. Michael Mosley
> University Research Computing
> The University of North Carolina at Charlotte
> 9201 University City Blvd
> Charlotte, NC 28223
> 704.687.7065 | mmosley at uncc.edu