<div dir="ltr"><div>Thanks Michael. I noticed a couple of questions on the mailing list mentioning GRES lately. I will share that information to our SLURM administrators.</div><div><br></div><div>Cheers<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 21 Mar 2019 at 12:56, Renfro, Michael <<a href="mailto:Renfro@tntech.edu">Renfro@tntech.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I think all you’re looking for is Generic Resource (GRES) scheduling, starting at <a href="https://slurm.schedmd.com/gres.html" rel="noreferrer" target="_blank">https://slurm.schedmd.com/gres.html</a> — if you’ve already seen that, then more details would be helpful.<br>
<br>
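For your admins’ reference, --gres requests only work once the GPUs are defined on the server side. A minimal, untested sketch, assuming a node named gpunode01 and the usual /dev/nvidia* device files (both placeholders for your site’s values); the gres.html page above has the authoritative syntax:

    # slurm.conf: declare the GRES type and attach the 4 GPUs to the node
    # ("gpunode01" is a placeholder node name)
    GresTypes=gpu
    NodeName=gpunode01 Gres=gpu:4

    # gres.conf on that node: map the four P100s to their device files
    NodeName=gpunode01 Name=gpu File=/dev/nvidia[0-3]
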
If it all works correctly, then ‘sbatch --gres=gpu scriptname’ should run up to 4 of those jobs and leave the rest pending.

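For example, a batch script along these lines (untested; the job name, walltime, and binary are placeholders, and the --gres line is the part that matters):

    #!/bin/bash
    #SBATCH --job-name=gpu-job       # placeholder job name
    #SBATCH --gres=gpu:1             # request exactly one GPU per job
    #SBATCH --time=01:00:00          # placeholder walltime

    # Slurm sets CUDA_VISIBLE_DEVICES to the allocated GPU, so the
    # application only sees the one device it was given.
    ./my_gpu_application             # placeholder binary

Submit that six times, and squeue should show four jobs running and the other two pending.
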
-- 
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University

> On Mar 20, 2019, at 6:05 PM, Nicholas Yue <yue.nicholas@gmail.com> wrote:
> 
> Hi,
> 
> I am new to SLURM.
> 
> I have access to a cluster where one of the nodes has 4 GPUs.
> 
> We are running SLURM version 17.11.12.
> 
> Is there some SBATCH token=value pair I can use when submitting jobs (each of which runs an application that can only use 1 GPU), so that if I submit 6 copies, 4 are dispatched and the remaining 2 stay pending (PD) until a GPU frees up?
> 
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla P100-PCIE...  On   | 00000000:25:00.0 Off |                    0 |
> | N/A   29C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  Tesla P100-PCIE...  On   | 00000000:59:00.0 Off |                    0 |
> | N/A   26C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   2  Tesla P100-PCIE...  On   | 00000000:6D:00.0 Off |                    0 |
> | N/A   27C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> |   3  Tesla P100-PCIE...  On   | 00000000:99:00.0 Off |                    0 |
> | N/A   31C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
> +-------------------------------+----------------------+----------------------+
> 
> Cheers
> -- 
> Nicholas Yue
> Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5
> Custom Dev - C++ porting, OSX, Linux, Windows
> http://au.linkedin.com/in/nicholasyue
> https://vimeo.com/channels/naiadtools

-- 
Nicholas Yue
Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5
Custom Dev - C++ porting, OSX, Linux, Windows
http://au.linkedin.com/in/nicholasyue
https://vimeo.com/channels/naiadtools