[slurm-users] GRES GPU issues
Lou Nicotra
lnicotra at interactions.com
Mon Dec 3 07:13:09 MST 2018
Hi All, I have recently set up a Slurm cluster with my servers and I'm
running into an issue when submitting GPU jobs. It has something to do
with the GRES configuration, but I just can't figure out what is
wrong. Non-GPU jobs run fine.
Submitting a batch job fails with the following error:
sbatch: error: Batch job submission failed: Invalid Trackable RESource (TRES) specification
My batch job is as follows:
#!/bin/bash
#SBATCH --partition=tiger_1 # partition name
#SBATCH --gres=gpu:k20:1
#SBATCH --gres-flags=enforce-binding
#SBATCH --time=0:20:00 # wall clock limit
#SBATCH --output=gpu-%J.txt
#SBATCH --account=lnicotra
module load cuda
python gpu1
Here gpu1 is a GPU test script that runs correctly when invoked directly
via python. The tiger_1 partition has servers with GPUs, a mix of 1080GTX
and K20, as specified in slurm.conf.
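For readers less familiar with the syntax, the --gres value in the script above follows the name[:type][:count] pattern from the sbatch documentation, with the count defaulting to 1. A minimal Python sketch of how such a request breaks down is shown here; parse_gres is a hypothetical helper written for illustration, not part of Slurm, and it assumes a plain integer count and a type that is not purely numeric.

```python
def parse_gres(spec):
    """Split one --gres entry of the form name[:type][:count].

    Hypothetical helper for illustration only. Slurm's own parser
    handles more cases (comma-separated lists, count suffixes).
    """
    parts = spec.split(":")
    name = parts[0]
    rest = parts[1:]
    count = 1  # assumed default when no count is given
    # Treat a trailing all-digit field as the count.
    if rest and rest[-1].isdigit():
        count = int(rest[-1])
        rest = rest[:-1]
    # Whatever remains between name and count is taken as the type.
    gres_type = ":".join(rest) if rest else None
    return {"name": name, "type": gres_type, "count": count}


request = parse_gres("gpu:k20:1")
print(request)
```

Read this way, the job asks for one GPU of type k20, so the type string in the request has to correspond to a type the controller knows about.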
I have defined GRES resources in the slurm.conf file:
# GPU GRES
GresTypes=gpu
NodeName=tiger[01,05,10,15,20] Gres=gpu:1080gtx:2
NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Gres=gpu:k20:2
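To make the node ranges above easier to check by eye, here is a small Python sketch that expands the two bracket expressions into individual host names. expand_hostlist is a hypothetical helper that only handles a single bracket group; on a real cluster, `scontrol show hostnames` should do the same job.

```python
import re


def expand_hostlist(expr):
    """Expand a simple host expression such as tiger[01,05-07].

    Hypothetical helper: supports one bracket group only.
    """
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", expr)
    if not match:
        return [expr]
    prefix, body = match.groups()
    hosts = []
    for item in body.split(","):
        if "-" in item:
            start, end = item.split("-")
            width = len(start)  # keep zero padding, e.g. 02
            for n in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{n:0{width}d}")
        else:
            hosts.append(prefix + item)
    return hosts


gtx_nodes = expand_hostlist("tiger[01,05,10,15,20]")
k20_nodes = expand_hostlist("tiger[02-04,06-09,11-14,16-19,21-22]")
print(len(gtx_nodes), len(k20_nodes))  # prints: 5 17
print("tiger11" in k20_nodes)          # prints: True
```

So the two lines cover 22 nodes in total with no overlap, and tiger11 is in the K20 group.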
I also have a local gres.conf on each of the servers containing GPUs:
lnicotra@tiger11 ~# cat /etc/slurm/gres.conf
# GPU Definitions
# NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Name=gpu Type=K20 File=/dev/nvidia[0-1]
Name=gpu Type=K20 File=/dev/nvidia[0-1] Cores=0,1
and a similar one for the 1080GTX
# GPU Definitions
# NodeName=tiger[01,05,10,15,20] Name=gpu Type=1080GTX File=/dev/nvidia[0-1]
Name=gpu Type=1080GTX File=/dev/nvidia[0-1] Cores=0,1
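One detail visible in the snippets is that slurm.conf spells the GPU types in lower case (k20, 1080gtx) while gres.conf uses upper case (K20, 1080GTX). Whether this has anything to do with the error is not established in this message and may depend on the Slurm version. The Python sketch below only shows one way to spot such differences; the helper names are hypothetical and the regular expressions assume the simple line formats quoted here.

```python
import re

# Active (uncommented) lines copied from the configuration shown in this post.
slurm_conf = """\
NodeName=tiger[01,05,10,15,20] Gres=gpu:1080gtx:2
NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Gres=gpu:k20:2
"""
gres_conf_lines = [
    "# GPU Definitions",
    "Name=gpu Type=K20 File=/dev/nvidia[0-1] Cores=0,1",
    "Name=gpu Type=1080GTX File=/dev/nvidia[0-1] Cores=0,1",
]


def slurm_conf_types(text):
    """Collect type names from Gres=name:type:count entries."""
    return set(re.findall(r"Gres=\w+:(\w+):\d+", text))


def gres_conf_types(lines):
    """Collect Type= values from non-comment gres.conf lines."""
    types = set()
    for line in lines:
        if line.lstrip().startswith("#"):
            continue
        found = re.search(r"\bType=(\S+)", line)
        if found:
            types.add(found.group(1))
    return types


def case_only_mismatches(left, right):
    """Pairs that are equal ignoring case but not equal exactly."""
    return sorted(
        (a, b) for a in left for b in right
        if a.lower() == b.lower() and a != b
    )


mismatches = case_only_mismatches(
    slurm_conf_types(slurm_conf), gres_conf_types(gres_conf_lines)
)
print(mismatches)
```

Both GPU types are reported as differing only in case between the two files.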
The account manager seems to know about the GPUs...
lnicotra@tiger11 ~# sacctmgr show tres
    Type            Name     ID
-------- --------------- ------
     cpu                      1
     mem                      2
  energy                      3
    node                      4
 billing                      5
      fs            disk      6
    vmem                      7
   pages                      8
    gres             gpu   1001
    gres         gpu:k20   1002
    gres     gpu:1080gtx   1003
Can anyone point out what I am missing?
Thanks!
Lou
--
*Lou Nicotra*
IT Systems Engineer - SLT
Interactions LLC
o: 908-673-1833
m: 908-451-6983
lnicotra at interactions.com
www.interactions.com