<div dir="ltr"><div>This is related to this other thread: <a href="https://groups.google.com/g/slurm-users/c/88pZ400whu0/m/9FYFqKh6AQAJ" target="_blank">https://groups.google.com/g/slurm-users/c/88pZ400whu0/m/9FYFqKh6AQAJ</a></div><div>AFAIK, the only (rudimentary) solution is the MaxCPUsPerNode partition option combined with separate GPU and CPU partitions, but having something like "CpusReservedPerGpu" would be nice.</div><div><br></div><div>@Aaron would you be willing to share such a script?<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 21 Oct 2020 at 0:01, Relu Patrascu (&lt;<a href="mailto:relu@cs.toronto.edu" target="_blank">relu@cs.toronto.edu</a>&gt;) wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

I thought of doing this, but I'm guessing you don't have preemption <br>

enabled.<br>

<br>

With preemption enabled this becomes more complicated and error-prone, but <br>

I'll think some more about it. It'd be nice to leverage Slurm's scheduling <br>

engine and just add this constraint.<br>

<br>

Relu<br>

<br>

On 2020-10-20 16:20, Aaron Jackson wrote:<br>

> I look after a very heterogeneous GPU Slurm setup and some nodes have<br>

> quite few cores. We use a job_submit lua script which calculates the<br>

> number of requested cpu cores per gpu. This is then used to scan through<br>

> a table of 'weak nodes' based on a 'max cores per gpu' property. The<br>

> node names are appended to the job desc exc_nodes property.<br>

><br>

> It's not particularly elegant but it does work quite well for us.<br>
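<br>
The approach described above can be sketched roughly as follows. This is only an illustration in Python, not the actual script: a real job_submit plugin of this kind is written in Lua, and the node names, per-node limits, and function names below are hypothetical. It assumes a request is "too heavy" for a weak node when its CPU cores per GPU exceed that node's limit.<br>

```python
from typing import Dict, List, Optional

# Hypothetical table of "weak" nodes: node name -> max CPU cores per GPU.
WEAK_NODES: Dict[str, int] = {"gpu01": 2, "gpu02": 4, "gpu03": 8}


def nodes_to_exclude(cpus: int, gpus: int, weak_nodes: Dict[str, int]) -> List[str]:
    """Return the weak nodes whose per-GPU core limit the request exceeds."""
    if gpus <= 0:
        return []  # CPU-only request: this sketch leaves it untouched
    cores_per_gpu = cpus / gpus
    return sorted(name for name, limit in weak_nodes.items() if cores_per_gpu > limit)


def append_exc_nodes(existing: Optional[str], extra: List[str]) -> str:
    """Append node names to a comma-separated exclusion list, skipping duplicates."""
    current = [name for name in (existing or "").split(",") if name]
    for name in extra:
        if name not in current:
            current.append(name)
    return ",".join(current)
```

For example, a request for 6 cores and 1 GPU would exclude the two hypothetical nodes limited to 2 and 4 cores per GPU, and the result would be merged into whatever exclusion list the user already supplied.<br>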

><br>

> Aaron<br>

><br>

><br>

> On 20 October 2020 at 18:17 BST, Relu Patrascu wrote:<br>

><br>

>> Hi all,<br>

>><br>

>> We have a GPU cluster and have run into this issue occasionally. Assume<br>

>> four GPUs per node; when a user requests a GPU on such a node, and all<br>

>> the cores, or all the RAM, the other three GPUs will be wasted for the<br>

>> duration of the job, as slurm has no more cores or RAM available to<br>

>> allocate those GPUs to subsequent jobs.<br>

>><br>

>><br>

>> We have a "soft" solution to this, but it's not ideal. That is, we<br>

>> assigned large TresBillingWeights to cpu consumption, thus discouraging<br>

>> users from allocating many CPUs.<br>

>><br>

>><br>

>> Ideal for us would be to be able to define a number of CPUs to always be<br>

>> available on a node, for each GPU. It would help to have a similar feature for<br>

>> an amount of RAM.<br>

>><br>

>><br>

>> Take for example a node that has:<br>

>><br>

>> * four GPUs<br>

>><br>

>> * 16 CPUs<br>

>><br>

>><br>

>> Let's assume that most jobs would work just fine with a minimum number<br>

>> of 2 CPUs per GPU. Then we could set in the node definition a variable<br>

>> such as<br>

>><br>

>>    CpusReservedPerGpu = 2<br>

>><br>

>> The first job to run on this node could get between 2 and 10 CPUs, thus<br>

>> leaving at least 6 CPUs for potential incoming jobs (2 for each of the other three GPUs).<br>
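<br>
The arithmetic behind that range can be checked with a small sketch. Note that CpusReservedPerGpu is only the setting proposed in this thread, not an existing Slurm option, and the function name and parameters here are hypothetical. The sketch assumes every GPU that stays free must keep its reserved CPUs available.<br>

```python
from typing import Tuple


def cpu_range(free_cpus: int, free_gpus: int, gpus_requested: int,
              reserved_per_gpu: int) -> Tuple[int, int]:
    """Return (min, max) CPUs a job could get under the proposed reservation rule."""
    if gpus_requested < 1 or gpus_requested > free_gpus:
        raise ValueError("gpus_requested must be between 1 and free_gpus")
    # Each requested GPU comes with at least its reserved share of CPUs.
    minimum = reserved_per_gpu * gpus_requested
    # CPUs reserved for the GPUs that remain free cannot be handed out.
    maximum = free_cpus - reserved_per_gpu * (free_gpus - gpus_requested)
    if maximum < minimum:
        raise ValueError("not enough free CPUs to honour the reservation")
    return minimum, maximum


# First job on an empty node with 16 CPUs and four GPUs, asking for one GPU.
print(cpu_range(free_cpus=16, free_gpus=4, gpus_requested=1, reserved_per_gpu=2))
```

With the numbers from the example this prints (2, 10), matching the range stated above.<br>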

>><br>

>><br>

>> We couldn't find a way to do this. Are we missing something? We'd rather<br>

>> not modify the source code again :/<br>

>><br>

>> Regards,<br>

>><br>

>> Relu<br>

><br>

<br>

</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div style="font-size:12.8px">Stephan Schott Verdugo<br></div><span style="font-size:12.8px">Biochemist</span><br style="font-size:12.8px"><div style="font-size:12.8px"><br>Heinrich-Heine-Universitaet Duesseldorf<br>Institut fuer Pharm. und Med. Chemie<br>Universitaetsstr. 1<br>40225 Duesseldorf<br>Germany</div></div></div>