[slurm-users] Mixing GPU Types on Same Node

Thomas M. Payerle payerle at umd.edu
Wed Mar 29 18:39:19 UTC 2023


You can probably use a job_submit lua script that looks at the --gpus flag
(and maybe the --gres=gpu:* flag as well) and forces a GPU type.  It is a bit
complicated, and I am not sure it will catch srun submissions.  I don't think
this is flexible enough to ensure jobs get the least powerful GPU among all
idle GPUs, but you can make typeless requests default to the lowest-end GPU
on the cluster; if nothing else, this forces users who want a more powerful
GPU to explicitly give a GPU type.
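
For what it's worth, a minimal sketch of that idea in job_submit.lua is below.
It assumes the GPU request shows up in the tres_per_* fields as a string like
"gres:gpu:2" or "gpu:gv100:1"; the exact field names and string formats vary
between Slurm versions, so check scontrol show job output on your own cluster
before trusting the pattern match.

-- job_submit.lua sketch: reject GPU requests that give only a count.
-- Field names (tres_per_node, tres_per_job, ...) and the exact
-- "gres:gpu:N" / "gpu:N" string formats vary by Slurm version, so
-- verify against your own jobs before relying on the pattern below.

local function gpu_count_without_type(tres)
   -- Matches "gpu:2" or "gres:gpu:2" (count only) but not "gpu:a100:2".
   -- Simplified: does not handle comma-separated gres lists.
   return tres ~= nil and string.match(tres, "gpu:%d+$") ~= nil
end

function slurm_job_submit(job_desc, part_list, submit_uid)
   local requests = { job_desc.tres_per_node or "",
                      job_desc.tres_per_job or "",
                      job_desc.tres_per_socket or "",
                      job_desc.tres_per_task or "" }
   for _, tres in ipairs(requests) do
      if gpu_count_without_type(tres) then
         -- "gv100" is just a placeholder for whatever Type= you use in gres.conf
         slurm.log_user("Please request a GPU type, e.g. --gpus=gv100:1")
         return slurm.ERROR
      end
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end

Instead of rejecting, you could rewrite the request to a default type, e.g.
job_desc.tres_per_node = string.gsub(job_desc.tres_per_node, "gpu:(%d+)$",
"gpu:gv100:%1"), again with "gv100" standing in for whatever the lowest-end
GPU type is at your site.  The plugin is enabled with JobSubmitPlugins=lua
in slurm.conf.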

On Wed, Mar 29, 2023 at 2:31 PM <collin.m.mccarthy at gmail.com> wrote:

> Hello,
>
>
>
> Apologies if this is in the docs but I couldn’t find it anywhere.
>
>
>
> I’ve been using Slurm to run a small 7-node cluster in a research lab for
> a couple of years now (I’m a PhD student). A couple of our nodes have
> heterogeneous GPU models. One in particular has quite a few: 2x NVIDIA
> A100s, 1x NVIDIA 3090, 2x NVIDIA GV100 w/ NVLink, 1x AMD MI100, 2x AMD
> MI200. This makes things a bit challenging but I need to work with what I
> have.
>
>
>
>    1. I’ve only been able to set this up previously on Slurm 20.02 by
>    “ignoring” the AMDs and just specifying the NVIDIA GPUs. That worked when
>    we had one or two people using the AMD GPUs and they could coordinate
>    between themselves. But now, we have more people interested. I’m upgrading
>    Slurm to 23.02 in the hope that it might fix some of these challenges, but
>    should this be possible? Ideally I would like to have AutoDetect=nvml
>    and AutoDetect=rsmi both on. If it’s not, I’ll shuffle GPUs around to
>    make this node NVIDIA-only.
>    2. I want everyone to allocate GPUs with --gpus=<type>:<num> instead
>    of --gpus=<num>, so they don’t “block” a nice GPU like an A100 when
>    they really just want any old GPU on the machine, like a GV100 or 3090. Can I
>    force people to specify a GPU type and not just a count? This is especially
>    important if I’m mixing AMDs and NVIDIAs on the same node. If not, can I
>    specify the “order” in which I want GPUs to be scheduled if they don’t
>    specify a type (so they get handed out from least-powerful to most-powerful
>    if people don’t care)?
>
>
>
> Any help and/or advice here is much appreciated. Slurm has been amazing
> for our lab (albeit challenging to set up at first) and I want to get
> everything dialed before I graduate :D .
>
>
>
> Thanks,
>
> -Collin
>


-- 
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads        payerle at umd.edu
5825 University Research Park               (301) 405-6135
University of Maryland
College Park, MD 20740-3831

