<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>You could limit the resources with the QOS. It is not per node,
but you have some options:<br>
</p>
<p><a class="moz-txt-link-freetext" href="https://slurm.schedmd.com/qos.html#limits">https://slurm.schedmd.com/qos.html#limits</a></p>
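<p>For example, a rough sketch along these lines (the QOS name and
the 64G limit are only placeholders, adjust to your setup):</p>
<pre># create a QOS and cap the memory a single job may request
sacctmgr add qos longjobs_qos
sacctmgr modify qos longjobs_qos set MaxTRESPerJob=mem=64G

# attach it to the partition in slurm.conf
PartitionName=longjobs Nodes=ALL QOS=longjobs_qos</pre>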
<p>Otherwise you could just enforce a memory limit per partition
(MaxMemPerNode) and put a higher weight on the GPU nodes, so that
the CPU nodes are allocated before the GPU nodes.</p>
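<p>A minimal slurm.conf sketch (node names, memory sizes and
weights are made up, not your actual configuration):</p>
<pre># CPU nodes get the lower weight, so they are allocated first
NodeName=cpu[01-21] RealMemory=192000 Weight=1
NodeName=gpu[01-12] RealMemory=384000 Weight=10 Gres=gpu:4

# cap the per-node memory a job may use in the longjobs partition
PartitionName=longjobs Nodes=cpu[01-21],gpu[01-12] MaxMemPerNode=64000</pre>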
<p>Have you checked the GPU scheduling in SLURM 19? It's much more
flexible. This is copied from the release notes:</p>
<blockquote type="cite">
<pre> -- Add select/cons_tres plugin, which offers similar functionality to cons_res
    with far greater GPU scheduling flexibility.
 -- Add GPU scheduling options to slurm.conf, available both globally and
    per-partition: DefCpusPerGPU and DefMemPerGPU.
 -- Add GPU scheduling options for salloc, sbatch and srun:
    --cpus-per-gpu, -G/--gpus, --gpu-bind, --gpu-freq, --gpus-per-node,
    --gpus-per-socket, --gpus-per-task and --mem-per-gpu.</pre>
</blockquote>
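<p>With cons_tres that could look roughly like this (the values are
only an illustration, not a recommendation):</p>
<pre># slurm.conf
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

# a GPU job then requests memory per GPU instead of per node
sbatch --gpus-per-node=4 --cpus-per-gpu=8 --mem-per-gpu=32G job.sh</pre>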
<p>We use gres for GPUs together with the backfill scheduler and
have no problems running both CPU and GPU jobs on the same nodes.<br>
</p>
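<p>For reference, such a setup typically boils down to something
like this (the device paths are placeholders for your hardware):</p>
<pre># slurm.conf
GresTypes=gpu
SchedulerType=sched/backfill

# gres.conf on the GPU nodes
Name=gpu File=/dev/nvidia[0-3]</pre>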
<p>Cheers,</p>
<p>Barbara<br>
</p>
<div class="moz-cite-prefix">On 7/18/19 6:06 PM, Daniel Vecerka
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:8ce29086-c539-9471-6013-a21828c3a1e3@fel.cvut.cz">Dear all,
<br>
<br>
we are using SLURM 18.08.6; we have 12 nodes with 4 x GPUs and 21
CPU-only nodes. We have 3 partitions: <br>
gpu: only GPU nodes, <br>
cpu: only CPU nodes, <br>
longjobs: all nodes. <br>
<br>
Jobs in longjobs have the lowest priority and can be preempted
to suspend. Our goal is to allow using the GPU nodes also for
backfilling CPU jobs. The problem is with CPU jobs which require
a lot of memory. Those jobs can block GPU jobs in the queue,
because suspended jobs do not release their memory, so GPU jobs
will not be started even when free GPUs are available. <br>
<br>
My question is: is there any partition or node option that allows
limiting TRES memory, but only on specific nodes? That way, jobs
in the longjobs partition with high memory requirements would be
started only on CPU nodes, while the GPU nodes would run only GPU
jobs (without a memory limit) and CPU jobs below the memory limit. <br>
<br>
Or put differently: is there any way to reserve some memory on the
GPU nodes only for jobs in the gpu partition, so that it can't be
used by jobs in the longjobs partition? <br>
<br>
Thanks in advance, Daniel Vecerka, CTU Prague <br>
</blockquote>
<br>
</body>
</html>