[slurm-users] GPUs as resources which SLURM can control
Renfro, Michael
Renfro at tntech.edu
Thu Mar 21 01:53:39 UTC 2019
I think all you’re looking for is Generic Resource (GRES) scheduling, starting at https://slurm.schedmd.com/gres.html. If you’ve already seen that, then more details would be helpful.
If it all works correctly, then ‘sbatch --gres=gpu scriptname’ should run up to 4 of those jobs and leave the rest pending.
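
As a rough sketch of the user side, assuming the GPUs have already been defined as a GRES by whoever administers the cluster: the script below writes a small batch file that requests one GPU and submits six copies of it. The file name gpu_job.sh and the program ./my_gpu_app are placeholders, not anything from your cluster, and the configuration lines in the comments are typical examples rather than your site's actual settings.

```shell
#!/bin/sh
# Sketch only: gpu_job.sh and ./my_gpu_app are hypothetical names.
# Assumes the administrator has configured the GPUs as a GRES, typically:
#   slurm.conf:  GresTypes=gpu
#                NodeName=<gpu-node> Gres=gpu:4 ...
#   gres.conf:   Name=gpu File=/dev/nvidia[0-3]

# Write a batch script that asks for exactly one GPU per job.
cat > gpu_job.sh <<'EOF'
#!/bin/sh
#SBATCH --job-name=one-gpu
#SBATCH --gres=gpu:1
# With GRES configured, Slurm normally sets CUDA_VISIBLE_DEVICES for the job.
echo "Allocated GPU(s): ${CUDA_VISIBLE_DEVICES:-none}"
./my_gpu_app
EOF

# Submit six copies, but only if sbatch is available on this machine.
if command -v sbatch >/dev/null 2>&1; then
    for i in 1 2 3 4 5 6; do
        sbatch gpu_job.sh
    done
    # Jobs still waiting for a free GPU should be listed with state PD.
    squeue -u "$USER"
fi
```

With four GPUs on the node, squeue would be expected to show four of the jobs running and two pending until a GPU is released.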
--
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University
> On Mar 20, 2019, at 6:05 PM, Nicholas Yue <yue.nicholas at gmail.com> wrote:
>
> Hi,
>
> I am new to SLURM.
>
> I have access to a cluster where one of the nodes has 4 GPUs.
>
> We are running SLURM version 17.11.12.
>
> Is there some SBATCH token=value pair I can use to submit jobs (each running an application that can only utilize 1 GPU) so that if I submit 6 copies, 4 will be dispatched and the remaining 2 will wait in a pending state, e.g. PD, until a GPU frees up?
>
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla P100-PCIE...  On   | 00000000:25:00.0 Off |                    0 |
> | N/A   29C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla P100-PCIE...  On   | 00000000:59:00.0 Off |                    0 |
> | N/A   26C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla P100-PCIE...  On   | 00000000:6D:00.0 Off |                    0 |
> | N/A   27C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  Tesla P100-PCIE...  On   | 00000000:99:00.0 Off |                    0 |
> | N/A   31C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
>
>
> Cheers
> --
> Nicholas Yue
> Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5
> Custom Dev - C++ porting, OSX, Linux, Windows
> http://au.linkedin.com/in/nicholasyue
> https://vimeo.com/channels/naiadtools