<div dir="ltr"><div>Hi,</div><div><br></div><div>We have a similar configuration: a very heterogeneous cluster with cons_tres. Users specify CPUs, memory, GPUs, and a time limit, and Slurm schedules the job on any node that fits. Indeed, there is currently no guarantee that you won't be left with a node whose GPUs are unusable because no CPUs or memory remain free.<br></div><div><br></div><div>We use one partition with 100% of the nodes and a time limit of 2 days, and a second partition with ~90% of the nodes and a limit of 7 days. This gives shorter jobs a chance to run without waiting behind long jobs.</div><div><br></div><div>We also assign weights to the nodes so that smaller nodes (resource-wise) are selected first. This keeps smaller jobs from filling up the larger nodes (unless the smaller nodes are already occupied).</div><div><br></div><div>HTH,</div><div> Yair.<br></div><div><br></div><div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Feb 8, 2021 at 1:41 PM Ansgar Esztermann-Kirchner <<a href="mailto:aeszter@mpibpc.mpg.de" target="_blank">aeszter@mpibpc.mpg.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello List,<br>
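<div><br></div><div>[Editor's note: for illustration, a minimal slurm.conf sketch of the two-partition / node-weight setup described above. Node names, counts, memory sizes, and weight values are made up; adjust to your hardware.]</div><div><pre style="font-family:monospace"># Slurm allocates nodes with the lowest Weight first,
# so give the smaller nodes lower weights.
NodeName=small[01-20] CPUs=8  Gres=gpu:1 RealMemory=64000  Weight=10
NodeName=large[01-10] CPUs=64 Gres=gpu:4 RealMemory=512000 Weight=50

# One partition with 100% of the nodes and a 2-day limit,
# a second with ~90% of the nodes and a 7-day limit.
PartitionName=short Nodes=small[01-20],large[01-10] MaxTime=2-00:00:00 Default=YES
PartitionName=long  Nodes=small[01-18],large[01-09] MaxTime=7-00:00:00
</pre></div>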
<br>
we're running a heterogeneous cluster (just x86_64, but a lot of<br>
different node types from 8 to 64 HW threads, 1 to 4 GPUs).<br>
Our processing power (for our main application, at least) is <br>
exclusively provided by the GPUs, so cons_tres looks quite promising:<br>
depending on the size of the job, request an appropriate number of<br>
GPUs. Of course, you have to request some CPUs as well -- ideally,<br>
evenly distributed among the GPUs (e.g. 10 per GPU on a 20-core, 2-GPU<br>
node; 16 on a 64-core, 4-GPU node).<br>
Of course, one could use different partitions for different nodes, and<br>
then submit individual jobs with CPU requests tailored to one such<br>
partition, but I'd prefer a more flexible approach where a given job<br>
could run on any large enough node.<br>
<br>
Is there anyone with a similar setup? Any config options I've missed,<br>
or do you have a work-around?<br>
<br>
Thanks,<br>
<br>
A.<br>
<br>
-- <br>
Ansgar Esztermann<br>
Sysadmin Dep. Theoretical and Computational Biophysics<br>
<a href="http://www.mpibpc.mpg.de/grubmueller/esztermann" rel="noreferrer" target="_blank">http://www.mpibpc.mpg.de/grubmueller/esztermann</a><br>
</blockquote></div><br clear="all"><br>-- <br><div dir="ltr"><div dir="ltr">
<div>
<pre style="font-family:monospace"> <span style="color:rgb(133,12,27)">/|</span> |
<span style="color:rgb(133,12,27)">\/</span> | <span style="color:rgb(51,88,104);font-weight:bold">Yair Yarom </span><span style="color:rgb(51,88,104)">| System Group (DevOps)</span>
<span style="color:rgb(92,181,149)">[]</span> | <span style="color:rgb(51,88,104);font-weight:bold">The Rachel and Selim Benin School</span>
<span style="color:rgb(92,181,149)">[]</span> <span style="color:rgb(133,12,27)">/\</span> | <span style="color:rgb(51,88,104);font-weight:bold">of Computer Science and Engineering</span>
<span style="color:rgb(92,181,149)">[]</span><span style="color:rgb(0,161,146)">//</span><span style="color:rgb(133,12,27)">\</span><span style="color:rgb(133,12,27)">\</span><span style="color:rgb(49,154,184)">/</span> | <span style="color:rgb(51,88,104)">The Hebrew University of Jerusalem</span>
<span style="color:rgb(92,181,149)">[</span><span style="color:rgb(1,84,76)">/</span><span style="color:rgb(0,161,146)">/</span> <span style="color:rgb(41,16,22)">\</span><span style="color:rgb(41,16,22)">\</span> | <span style="color:rgb(51,88,104)">T +972-2-5494522 | F +972-2-5494522</span>
<span style="color:rgb(1,84,76)">//</span> <span style="color:rgb(21,122,134)">\</span> | <span style="color:rgb(51,88,104)"><a href="mailto:irush@cs.huji.ac.il" target="_blank">irush@cs.huji.ac.il</a></span>
<span style="color:rgb(127,130,103)">/</span><span style="color:rgb(1,84,76)">/</span> |
</pre>
</div>
</div></div></div>