[slurm-users] NVIDIA MIG question

Yair Yarom irush at cs.huji.ac.il
Thu Nov 17 13:19:32 UTC 2022


Can you run more than 7 single-GPU jobs on the same node at the same time?
It could be that you've hit another limit on the node (e.g. memory or CPU),
or a limit set in the account, partition, or QOS.
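
To illustrate how a CPU or memory limit could cap a node at 7 GPUs even
though 14 MIGs are defined, here is a rough Python sketch. The node totals
(48 CPUs, 358400 MB, 14 MIGs) are from the sinfo output quoted further down;
the per-GPU requests are hypothetical values made up for the example, not
something I know about your configuration, and the function name is my own.

```python
def max_gpus_on_node(node_cpus, node_mem_mb, node_gpus,
                     cpus_per_gpu, mem_per_gpu_mb):
    """Return how many single-GPU requests fit on one node.

    The tightest of the three resources (GPUs, CPUs, memory) decides.
    """
    fit_by_cpu = node_cpus // cpus_per_gpu
    fit_by_mem = node_mem_mb // mem_per_gpu_mb
    return min(node_gpus, fit_by_cpu, fit_by_mem)


# Node totals for t-gc-1202, taken from the sinfo output in this thread.
NODE_CPUS, NODE_MEM_MB, NODE_GPUS = 48, 358400, 14

# HYPOTHETICAL per-GPU requests, chosen only to illustrate the effect.
print(max_gpus_on_node(NODE_CPUS, NODE_MEM_MB, NODE_GPUS,
                       cpus_per_gpu=1, mem_per_gpu_mb=51200))  # memory-bound
print(max_gpus_on_node(NODE_CPUS, NODE_MEM_MB, NODE_GPUS,
                       cpus_per_gpu=1, mem_per_gpu_mb=5120))   # GPU-bound
```

With a (hypothetical) 51200 MB per GPU the memory runs out at 7 GPUs; with
5120 MB per GPU all 14 fit. If a default like that applies to your jobs, it
would explain the behaviour without involving the MIG setup at all.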

On our setup we limit jobs to 1 GPU each (via a partition QOS), but we can
still use up all the MIGs with single-GPU jobs.
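
If you want to try the same test on your side, something along these lines
would submit several single-GPU jobs pinned to one node. This is only a
sketch: the helper names and the job body are my own, the node, account and
partition names are copied from your salloc example, and I haven't run it
against your cluster. It only prints the commands unless dry_run is False.

```python
import shlex
import subprocess


def build_sbatch_commands(node, n_jobs, account, partition,
                          job_body="nvidia-smi -L; sleep 120"):
    """Build one sbatch command line per single-GPU job, all pinned to `node`."""
    return [
        [
            "sbatch",
            "--gres=gpu:1",            # one GPU (i.e. one MIG) per job
            f"--nodelist={node}",      # force every job onto the same node
            f"--account={account}",
            f"--partition={partition}",
            f"--job-name=mig-test-{i}",
            f"--wrap={job_body}",      # run a shell command instead of a script
        ]
        for i in range(n_jobs)
    ]


def submit_all(commands, dry_run=True):
    """Print the commands, and submit them only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Names below are from the salloc example in this thread; 8 is one more
# than the 7 GPUs that could be allocated on the node.
commands = build_sbatch_commands("t-gc-1202", 8, "1gc5gb", "sla-prio")
submit_all(commands, dry_run=True)
```

Afterwards, "squeue -w t-gc-1202" should show whether all 8 jobs run at once
or the 8th stays pending, and the pending reason may point to the limit.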


On Wed, 16 Nov 2022 at 23:48, Groner, Rob <rug262 at psu.edu> wrote:

> That does help, thanks for the extra info.
>
> If I have two separate GPU cards in the node, and I set up 7 MIGs on each
> card, for a total of 14 MIG "gpus" in the node...then, SHOULD I be able to
> salloc requesting, say, 10 GPUs (7 from one card, 3 from the other)?  Because
> I can't.
>
> I can request up to 7 just fine.  When I request more than that, it adds
> in other nodes to try to give me that, even though there are theoretically
> 14 on the one node.  When I ask for 8, it gives me 7 from t-gc-1202 and
> then 1 from t-gc-1201.  When I ask for 10, then it fails because it can't
> give me 10 without using 2 cards in one node.
>
>
> [rug262@testsch ~ ]# sinfo -o "%20N  %10c  %10m  %25f  %50G "
> NODELIST              CPUS        MEMORY      AVAIL_FEATURES             GRES
> t-gc-1201             48          358400      3gc20gb                    gpu:nvidia_a100_3g.20gb:4(S:0)
> t-gc-1202             48          358400      1gc5gb                     gpu:nvidia_a100_1g.5gb:14(S:0)
>
>
> [rug262@testsch (RC) ~] salloc --gpus=10 --account=1gc5gb --partition=sla-prio
> salloc: Job allocation 5015 has been revoked.
> salloc: error: Job submit/allocate failed: Requested node configuration is not available
>
>
> Rob
>
> ------------------------------
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> Yair Yarom <irush at cs.huji.ac.il>
> *Sent:* Wednesday, November 16, 2022 3:48 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] NVIDIA MIG question
>
> Hi,
>
> From what we observed, Slurm sees each MIG as a distinct gres/gpu. So
> you can have 14 jobs each using a different MIG.
> However (unless something has changed in the past year), due to NVIDIA
> limitations, a single process can't access more than one MIG simultaneously
> (this is unrelated to Slurm). So while a user can request a Slurm
> job with 2 GPUs (MIGs), they'll have to run two distinct processes within
> that job in order to use both MIGs.
>
> HTH,
>
>
> On Tue, 15 Nov 2022 at 23:42, Laurence <laurence.field at cern.ch> wrote:
>
> Hi Rob,
>
>
> Yes, those questions make sense. From what I understand, MIG should
> essentially split the GPU so that they behave as separate cards. Hence two
> different users should be able to use two different MIG instances at the
> same time and also a single job could use all 14 instances. The result you
> observed suggests that MIG is a feature of the driver, i.e. lspci shows one
> device but nvidia-smi shows 7 devices.
>
>
> I haven't played around with this myself in Slurm but would be interested
> to know the answers.
>
>
> Laurence
>
>
> On 15/11/2022 17:46, Groner, Rob wrote:
>
> We have successfully used the nvidia-smi tool to take the 2 A100s in a
> node and split them into multiple GPU devices.  In one case, we split the 2
> GPUs into 7 MIG devices each, so 14 in that node total, and in the other
> case, we split the 2 GPUs into 2 MIG devices each, so 4 total in the node.
>
> From our limited testing so far, and from the "sinfo" output, it appears
> that Slurm might be considering all of the MIG devices on the node to be in
> the same socket (even though the MIG devices come from two separate
> graphics cards in the node).  The sinfo output says (S:0) after the 14
> devices are shown, indicating they're in socket 0.  That seems to be
> preventing 2 different users from using MIG devices at the same time.  Am I
> wrong that having 14 MIG gres devices show up in Slurm should mean that, in
> theory, 14 different users could each use one at the same time?
>
> Even IF that doesn't work...if I have 14 devices spread across 2 physical
> GPU cards, can one user utilize all 14 for a single job?  I would hope that
> Slurm would treat each of the MIG devices as its own separate card, which
> would mean 14 different jobs could run at the same time using their own
> particular MIG, right?
>
> Do those questions make sense to anyone?  🙂
>
> Rob

-- 

  /|       |
  \/       | Yair Yarom | System Group (DevOps)
  []       | The Rachel and Selim Benin School
  [] /\    | of Computer Science and Engineering
  []//\\/  | The Hebrew University of Jerusalem
  [//  \\  | T +972-2-5494522 | F +972-2-5494522
  //    \  | irush at cs.huji.ac.il
 //        |