<div dir="ltr"><div>Hi,</div><div><br></div><div>Regarding point 2: if you're using AccountingStorageTRES, I think you can list each gres/gpu:&lt;type&gt; to be tracked in addition to the generic gres/gpu. Then set "GrpTRES=gres/gpu=0" on all accounts, so that users can't request the generic gres/gpu, only gres/gpu:&lt;type&gt;.</div><div><br></div><div>We haven't tried this ourselves, though it's been on our to-do list for a while now, so I'd like to know whether it works :)<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 29 Mar 2023 at 21:31, &lt;<a href="mailto:collin.m.mccarthy@gmail.com">collin.m.mccarthy@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg-9012447007750961708"><div style="overflow-wrap: break-word;" lang="EN-US"><div class="m_-9012447007750961708WordSection1"><p class="MsoNormal">Hello,</p><p class="MsoNormal">&nbsp;</p><p class="MsoNormal">Apologies if this is in the docs but I couldn’t find it anywhere. </p><p class="MsoNormal">&nbsp;</p><p class="MsoNormal">I’ve been using <span class="m_-9012447007750961708SpellE">Slurm</span> to run a small 7-node cluster in a research lab for a couple of years now (I’m a PhD student). A couple of our nodes have heterogeneous GPU models. One in particular has quite a few: 2x NVIDIA A100s, 1x NVIDIA 3090, 2x NVIDIA GV100 w/ <span class="m_-9012447007750961708SpellE">NVLink</span>, 1x AMD MI100, 2x AMD MI200. This makes things a bit challenging but I need to work with what I have. </p><p class="MsoNormal">&nbsp;</p><ol style="margin-top:0in" type="1" start="1"><li class="m_-9012447007750961708MsoListParagraph" style="margin-left:0in">I’ve only been able to set this up previously on <span class="m_-9012447007750961708SpellE">Slurm</span> 20.02 by “ignoring” the AMDs and just specifying the NVIDIA GPUs. 
That worked when we had one or two people using the AMD GPUs and they could coordinate between themselves. But now we have more people interested. I’m upgrading <span class="m_-9012447007750961708SpellE">Slurm</span> to 23.02 in hopes that might fix some of the challenges, but should this be possible? Ideally I would like to have AutoDetect=<span class="m_-9012447007750961708SpellE">nvml</span> and AutoDetect=<span class="m_-9012447007750961708SpellE">rsmi</span> both on. If it’s not, I’ll shuffle GPUs around to make this node NVIDIA-only.</li><li class="m_-9012447007750961708MsoListParagraph" style="margin-left:0in">I want everyone to allocate GPUs with --<span class="m_-9012447007750961708SpellE">gpus</span>=&lt;type&gt;:&lt;num&gt; instead of --<span class="m_-9012447007750961708SpellE">gpus</span>=&lt;num&gt;, so they don’t “block” a nice GPU like an A100 when they really wanted any old GPU on the machine like a GV100 or 3090. Can I force people to specify a GPU type and not just a count? This is especially important if I’m mixing AMDs and NVIDIAs on the same node. If not, can I specify the “order” in which I want GPUs to be scheduled if they don’t specify a type (so they get handed out from least-powerful to most-powerful if people don’t care)? </li></ol><p class="MsoNormal">&nbsp;</p><p class="MsoNormal">Any help and/or advice here is much appreciated. <span class="m_-9012447007750961708SpellE">Slurm</span> has been amazing for our lab (albeit challenging to set up at first) and I want to get everything dialed in before I graduate :D . </p><p class="MsoNormal">&nbsp;</p><p class="MsoNormal">Thanks,</p><p class="MsoNormal">-Collin</p></div></div></div></blockquote></div><br clear="all"><br><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr">
<div>
<pre style="font-family:monospace"> <span style="color:rgb(133,12,27)">/|</span> |
<span style="color:rgb(133,12,27)">\/</span> | <span style="color:rgb(51,88,104);font-weight:bold">Yair Yarom </span><span style="color:rgb(51,88,104)">| System Group (DevOps)</span>
<span style="color:rgb(92,181,149)">[]</span> | <span style="color:rgb(51,88,104);font-weight:bold">The Rachel and Selim Benin School</span>
<span style="color:rgb(92,181,149)">[]</span> <span style="color:rgb(133,12,27)">/\</span> | <span style="color:rgb(51,88,104);font-weight:bold">of Computer Science and Engineering</span>
<span style="color:rgb(92,181,149)">[]</span><span style="color:rgb(0,161,146)">//</span><span style="color:rgb(133,12,27)">\</span><span style="color:rgb(133,12,27)">\</span><span style="color:rgb(49,154,184)">/</span> | <span style="color:rgb(51,88,104)">The Hebrew University of Jerusalem</span>
<span style="color:rgb(92,181,149)">[</span><span style="color:rgb(1,84,76)">/</span><span style="color:rgb(0,161,146)">/</span> <span style="color:rgb(41,16,22)">\</span><span style="color:rgb(41,16,22)">\</span> | <span style="color:rgb(51,88,104)">T +972-2-5494522 | F +972-2-5494522</span>
<span style="color:rgb(1,84,76)">//</span> <span style="color:rgb(21,122,134)">\</span> | <span style="color:rgb(51,88,104)"><a href="mailto:irush@cs.huji.ac.il" target="_blank">irush@cs.huji.ac.il</a></span>
<span style="color:rgb(127,130,103)">/</span><span style="color:rgb(1,84,76)">/</span> |
</pre>
</div>
</div></div>