[slurm-users] GPUs as resources which SLURM can control

Nicholas Yue yue.nicholas at gmail.com
Thu Mar 21 02:23:00 UTC 2019


Thanks Michael. I noticed a couple of questions on the mailing list
mentioning GRES lately. I will share that information with our SLURM
administrators.

Cheers

On Thu, 21 Mar 2019 at 12:56, Renfro, Michael <Renfro at tntech.edu> wrote:

> I think all you’re looking for is Generic Resource (GRES) scheduling,
> starting at https://slurm.schedmd.com/gres.html — if you’ve already seen
> that, then more details would be helpful.
>
> If it all works correctly, then ‘sbatch --gres=gpu scriptname’ should run
> up to 4 of those jobs and leave the rest pending.
>
> --
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology
> Services
> 931 372-3601     / Tennessee Tech University
>
> > On Mar 20, 2019, at 6:05 PM, Nicholas Yue <yue.nicholas at gmail.com>
> wrote:
> >
> > Hi,
> >
> >   I am new to SLURM.
> >
> >   I have access to a cluster where one of the nodes has 4 GPUs.
> >
> >   We are running SLURM version 17.11.12.
> >
> >   Is there some SBATCH token=value pair I can use when submitting jobs
> > (each of which runs an application that can only use 1 GPU) so that if I
> > submit 6 copies, 4 are dispatched and the remaining 2 stay in a pending
> > (PD) state until a GPU frees up?
> >
> >
> > +-----------------------------------------------------------------------------+
> > | NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
> > |-------------------------------+----------------------+----------------------+
> > | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> > | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> > |===============================+======================+======================|
> > |   0  Tesla P100-PCIE...  On   | 00000000:25:00.0 Off |                    0 |
> > | N/A   29C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> > |   1  Tesla P100-PCIE...  On   | 00000000:59:00.0 Off |                    0 |
> > | N/A   26C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> > |   2  Tesla P100-PCIE...  On   | 00000000:6D:00.0 Off |                    0 |
> > | N/A   27C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> > |   3  Tesla P100-PCIE...  On   | 00000000:99:00.0 Off |                    0 |
> > | N/A   31C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> > +-------------------------------+----------------------+----------------------+
> >
> >
> > Cheers
> > --
> > Nicholas Yue
> > Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5
> > Custom Dev - C++ porting, OSX, Linux, Windows
> > http://au.linkedin.com/in/nicholasyue
> > https://vimeo.com/channels/naiadtools
>
>
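
For anyone finding this thread later, here is a minimal sketch of the GRES
pieces Michael points to above. The node name, CPU/memory figures, device
paths, and script/application names below are illustrative assumptions, not
taken from this thread; see https://slurm.schedmd.com/gres.html for the
authoritative details. On the admin side, slurm.conf declares the gpu GRES
type and the node's count, and gres.conf maps that GRES to the device files:

    # slurm.conf (excerpt) -- declare the gpu GRES type and the node's count
    GresTypes=gpu
    NodeName=gpunode01 Gres=gpu:4 CPUs=32 RealMemory=192000 State=UNKNOWN

    # gres.conf on gpunode01 -- map the four GRES entries to the GPU devices
    NodeName=gpunode01 Name=gpu File=/dev/nvidia[0-3]

On the user side, each job then asks for one GPU, e.g. with a batch script
along these lines (gpu-job.sh and my_gpu_application are placeholders):

    #!/bin/bash
    #SBATCH --job-name=gpu-job
    #SBATCH --gres=gpu:1      # request one GPU for this job
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00

    # With File= entries in gres.conf, Slurm restricts the job to its
    # allocated GPU via CUDA_VISIBLE_DEVICES.
    srun ./my_gpu_application

Submitting six copies should then behave as Michael describes: four jobs
start on the four GPUs and the other two sit pending until a GPU frees up.

    for i in $(seq 1 6); do sbatch gpu-job.sh; done
    squeue -u $USER    # expect four jobs in R and two in PD (Resources)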

-- 
Nicholas Yue
Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5
Custom Dev - C++ porting, OSX, Linux, Windows
http://au.linkedin.com/in/nicholasyue
https://vimeo.com/channels/naiadtools