[slurm-users] [External] Scheduling GPUS
Loris Bennett
loris.bennett at fu-berlin.de
Tue Nov 12 07:31:11 UTC 2019
Hi,
I don't think the statement below about --nodes=1 is true. It just
means you want one node and not more than one. This can be important
if multiple cores are requested, but the program is not, say, an MPI
program.
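For example, here is a minimal sketch of a job for which --nodes=1
matters (the program name is hypothetical):

#!/bin/bash
#SBATCH --nodes=1    # one node, and not more than one
#SBATCH --ntasks=4   # four cores in total

# Without --nodes=1 the four tasks could land on up to four different
# nodes; a shared-memory (non-MPI) program started here could then
# only use the cores on the node the batch script runs on.
./my_openmp_program  # hypothetical multithreaded, non-MPI binary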
You can see which cores a running job is using with
scontrol -d show job <job id>
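For instance, for the running job in the squeue output below (a sketch
of the usage, not output captured from this thread):

scontrol -d show job 207

The CPU_IDs field in the detailed output lists the cores allocated on
each node.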
HTH
Loris
Prentice Bisbal <pbisbal at pppl.gov> writes:
> Remove this line:
>
> #SBATCH --nodes=1
>
> Slurm assumes you're requesting the whole node. --ntasks=1 should be adequate.
>
> On 11/7/19 4:19 PM, Mike Mosley wrote:
>
> Greetings all:
>
> I'm attempting to configure the scheduler to schedule our GPU boxes but have run into a bit of a snag.
>
> I have a box with two Tesla K80s. With my current configuration, the scheduler will schedule one job on the box, but if I submit a second job, it queues up until the first
> one finishes:
>
> My submit script:
>
> #SBATCH --partition=NodeSet1
> #SBATCH --nodes=1
> #SBATCH --ntasks=1
> #SBATCH --gres=gpu:k80:1
>
> My slurm.conf (the things I think are relevant):
>
> GresTypes=gpu
> SelectType=select/cons_tres
> PartitionName=NodeSet1 Nodes=cht-c[1-4],cph-gpu1 Default=YES MaxTime=INFINITE OverSubscribe=FORCE State=UP
> NodeName=cph-gpu1 CPUs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=257541 Gres=gpu:k80:2 Feature=gpu State=UNKNOWN
>
> My gres.conf:
>
> NodeName=cph-gpu1 Name=gpu Type=k80 File=/dev/nvidia[0-1]
>
> and finally, the results of squeue:
>
> $ squeue
> JOBID  PARTITION  NAME    USER      ST  TIME  NODES  NODELIST(REASON)
>   208  NodeSet1   job.sh  jmmosley  PD  0:00      1  (Resources)
>   207  NodeSet1   job.sh  jmmosley  R   4:12      1  cph-gpu1
>
> Any idea what I am missing or have misconfigured?
>
> Thanks in advance.
>
> Mike
>
> --
>
> J. Michael Mosley
> University Research Computing
> The University of North Carolina at Charlotte
> 9201 University City Blvd
> Charlotte, NC 28223
> 704.687.7065 jmmosley at uncc.edu
>
--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
Email: loris.bennett at fu-berlin.de