[slurm-users] GRES GPU issues

Lou Nicotra lnicotra at interactions.com
Tue Dec 4 07:31:18 MST 2018


Thanks Michael. I will try 17.x as I also could not see anything wrong with
my settings... Will report back afterwards...

Lou

On Tue, Dec 4, 2018 at 9:11 AM Michael Di Domenico <mdidomenico4 at gmail.com>
wrote:

> Unfortunately, someone smarter than me will have to help further.  I'm
> not sure I see anything specifically wrong.  The one thing I might try
> is backing the software down to a 17.x release series.  I recently
> tried 18.x and had some issues.  I can't say whether it'll be any
> different, but you might be exposing an undiagnosed bug in the 18.x
> branch.
> On Mon, Dec 3, 2018 at 4:17 PM Lou Nicotra <lnicotra at interactions.com>
> wrote:
> >
> > Made the change in the gres.conf file on the local server and restarted
> > slurmd and slurmctld on master... Unfortunately, same error...
> >
> > Distributed the corrected gres.conf to all k20 servers, restarted slurmd
> > and slurmctld... Still the same error...
> >
> > On Mon, Dec 3, 2018 at 4:04 PM Brian W. Johanson <bjohanso at psc.edu>
> wrote:
> >>
> >> Is that a lowercase k in k20 specified in the batch script and NodeName
> >> line, and an uppercase K specified in gres.conf?
> >>
> >> On 12/03/2018 09:13 AM, Lou Nicotra wrote:
> >>
> >> Hi All, I have recently set up a Slurm cluster with my servers and I'm
> >> running into an issue while submitting GPU jobs. It has something to do
> >> with the GRES configuration, but I just can't seem to figure out what is
> >> wrong. Non-GPU jobs run fine.
> >>
> >> The error after submitting a batch job is as follows:
> >> sbatch: error: Batch job submission failed: Invalid Trackable RESource (TRES) specification
> >>
> >> My batch job is as follows:
> >> #!/bin/bash
> >> #SBATCH --partition=tiger_1   # partition name
> >> #SBATCH --gres=gpu:k20:1
> >> #SBATCH --gres-flags=enforce-binding
> >> #SBATCH --time=0:20:00  # wall clock limit
> >> #SBATCH --output=gpu-%J.txt
> >> #SBATCH --account=lnicotra
> >> module load cuda
> >> python gpu1
> >>
> >> Where gpu1 is a GPU test script that runs correctly when invoked via
> >> python. The tiger_1 partition has servers with GPUs, a mix of 1080GTX and
> >> K20, as specified in slurm.conf.
> >>
> >> I have defined GRES resources in the slurm.conf file:
> >> # GPU GRES
> >> GresTypes=gpu
> >> NodeName=tiger[01,05,10,15,20] Gres=gpu:1080gtx:2
> >> NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Gres=gpu:k20:2
> >>
> >> And have a local gres.conf on the servers containing GPUs...
> >> lnicotra at tiger11 ~# cat /etc/slurm/gres.conf
> >> # GPU Definitions
> >> # NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Name=gpu Type=K20 File=/dev/nvidia[0-1]
> >> Name=gpu Type=K20 File=/dev/nvidia[0-1] Cores=0,1
> >>
> >> and a similar one for the 1080GTX
> >> # GPU Definitions
> >> # NodeName=tiger[01,05,10,15,20] Name=gpu Type=1080GTX File=/dev/nvidia[0-1]
> >> Name=gpu Type=1080GTX File=/dev/nvidia[0-1] Cores=0,1
> >>
> >> The account manager seems to know about the GPUs...
> >> lnicotra at tiger11 ~# sacctmgr show tres
> >>     Type            Name     ID
> >> -------- --------------- ------
> >>      cpu                      1
> >>      mem                      2
> >>   energy                      3
> >>     node                      4
> >>  billing                      5
> >>       fs            disk      6
> >>     vmem                      7
> >>    pages                      8
> >>     gres             gpu   1001
> >>     gres         gpu:k20   1002
> >>     gres     gpu:1080gtx   1003
> >>
> >> Can anyone point out what I am missing?
> >>
> >> Thanks!
> >> Lou
> >>
> >>
> >> --
> >>
> >> Lou Nicotra
> >>
> >> IT Systems Engineer - SLT
> >>
> >> Interactions LLC
> >>
> >> o:  908-673-1833
> >>
> >> m: 908-451-6983
> >>
> >> lnicotra at interactions.com
> >>
> >> www.interactions.com
> >>
> >>
> *******************************************************************************
> >>
> >> This e-mail and any of its attachments may contain Interactions LLC
> proprietary information, which is privileged, confidential, or subject to
> copyright belonging to the Interactions LLC. This e-mail is intended solely
> for the use of the individual or entity to which it is addressed. If you
> are not the intended recipient of this e-mail, you are hereby notified that
> any dissemination, distribution, copying, or action taken in relation to
> the contents of and attachments to this e-mail is strictly prohibited and
> may be unlawful. If you have received this e-mail in error, please notify
> the sender immediately and permanently delete the original and any copy of
> this e-mail and any printout. Thank You.
> >>
> >>
> *******************************************************************************
> >>
> >>
>
>
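
For anyone comparing the two files by hand, here is a minimal sketch of the case check Brian asks about above. It is only an illustration: the function names are hypothetical, the parsing covers just the simple Gres=name:type:count and Type= forms quoted in this thread, and nothing in the thread establishes that a case difference is the actual cause of the TRES error.

```python
import re


def slurm_conf_gres_types(text):
    """Collect type names from Gres=name:type:count entries in slurm.conf text."""
    types = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        for spec in re.findall(r"\bGres=(\S+)", line, flags=re.IGNORECASE):
            for item in spec.split(","):
                parts = item.split(":")
                if len(parts) == 3:  # name:type:count; untyped entries are skipped
                    types.add(parts[1])
    return types


def gres_conf_types(text):
    """Collect the Type= values from gres.conf text, ignoring commented lines."""
    types = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0]
        match = re.search(r"\bType=(\S+)", line, flags=re.IGNORECASE)
        if match:
            types.add(match.group(1))
    return types


def case_mismatches(slurm_text, gres_text):
    """Return (slurm.conf type, gres.conf type) pairs that differ only in case."""
    declared = slurm_conf_gres_types(slurm_text)
    local = gres_conf_types(gres_text)
    return sorted(
        (d, loc)
        for d in declared
        for loc in local
        if d != loc and d.lower() == loc.lower()
    )


# Sample input copied from the configuration quoted in this thread.
SLURM_CONF = """\
GresTypes=gpu
NodeName=tiger[01,05,10,15,20] Gres=gpu:1080gtx:2
NodeName=tiger[02-04,06-09,11-14,16-19,21-22] Gres=gpu:k20:2
"""

GRES_CONF = """\
# GPU Definitions
Name=gpu Type=K20 File=/dev/nvidia[0-1] Cores=0,1
"""

if __name__ == "__main__":
    for declared, local in case_mismatches(SLURM_CONF, GRES_CONF):
        print(f"slurm.conf has {declared!r} but gres.conf has {local!r}")
```

With the K20 files as originally posted, this reports the k20/K20 pair; after changing gres.conf to lowercase it reports nothing, which matches the state in which the error still occurred.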
