[slurm-users] GPUs as resources which SLURM can control
Nicholas Yue
yue.nicholas at gmail.com
Thu Mar 21 02:23:00 UTC 2019
Thanks Michael. I noticed a couple of questions mentioning GRES on the mailing
list lately. I will share that information with our SLURM administrators.
Cheers
On Thu, 21 Mar 2019 at 12:56, Renfro, Michael <Renfro at tntech.edu> wrote:
> I think all you’re looking for is Generic Resource (GRES) scheduling,
> starting at https://slurm.schedmd.com/gres.html — if you’ve already seen
> that, then more details would be helpful.
>
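For reference, a minimal GRES setup along these lines might look like the
sketch below; the node name, CPU count, and memory values are placeholders
rather than details from this cluster:

    # slurm.conf (controller and compute nodes)
    GresTypes=gpu
    NodeName=gpunode01 Gres=gpu:4 CPUs=32 RealMemory=192000 State=UNKNOWN

    # gres.conf (on the GPU node) -- four devices to match Gres=gpu:4
    Name=gpu File=/dev/nvidia[0-3]
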
> If it all works correctly, then ‘sbatch --gres=gpu scriptname’ should run
> up to 4 of those jobs and leave the rest pending.
>
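As a concrete sketch of that workflow (the script and application names here
are assumed, not taken from this thread): a job script that requests one GPU,
submitted six times against a 4-GPU node, should leave two submissions
pending (PD) until a GPU frees up.

    #!/bin/bash
    #SBATCH --job-name=gpu-test
    #SBATCH --gres=gpu:1          # one GPU per job
    #SBATCH --time=01:00:00

    # Slurm exports CUDA_VISIBLE_DEVICES for the allocated GPU when
    # gres.conf lists the device files, so the job sees only its GPU.
    ./my_gpu_application          # placeholder for the real program

Submitting six copies (assuming the script is saved as gpu_job.sh) and
checking their state:

    for i in $(seq 1 6); do sbatch gpu_job.sh; done
    squeue -u $USER
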
> --
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology
> Services
> 931 372-3601 / Tennessee Tech University
>
> > On Mar 20, 2019, at 6:05 PM, Nicholas Yue <yue.nicholas at gmail.com> wrote:
> >
> > Hi,
> >
> > I am new to SLURM.
> >
> > I have access to a cluster where one of the nodes has 4 GPUs.
> >
> > We are running SLURM version 17.11.12.
> >
> > Is there some SBATCH token=value pair I can use when submitting jobs
> > (each of which runs an application that can only utilize 1 GPU) so that
> > if I submit 6 copies, 4 are dispatched and the remaining 2 stay in a
> > pending state (e.g. PD) until a GPU frees up?
> >
> >
> > +-----------------------------------------------------------------------------+
> > | NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
> > |-------------------------------+----------------------+----------------------+
> > | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> > | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> > |===============================+======================+======================|
> > |   0  Tesla P100-PCIE...  On   | 00000000:25:00.0 Off |                    0 |
> > | N/A   29C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> > |   1  Tesla P100-PCIE...  On   | 00000000:59:00.0 Off |                    0 |
> > | N/A   26C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> > |   2  Tesla P100-PCIE...  On   | 00000000:6D:00.0 Off |                    0 |
> > | N/A   27C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> > |   3  Tesla P100-PCIE...  On   | 00000000:99:00.0 Off |                    0 |
> > | N/A   31C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> >
> >
> > Cheers
> > --
> > Nicholas Yue
> > Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5
> > Custom Dev - C++ porting, OSX, Linux, Windows
> > http://au.linkedin.com/in/nicholasyue
> > https://vimeo.com/channels/naiadtools
>
>
--
Nicholas Yue
Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5
Custom Dev - C++ porting, OSX, Linux, Windows
http://au.linkedin.com/in/nicholasyue
https://vimeo.com/channels/naiadtools