<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>You could limit the resources with the QOS. It is not per node,
but you have some options:<br>
</p>
<p><a class="moz-txt-link-freetext" href="https://slurm.schedmd.com/qos.html#limits">https://slurm.schedmd.com/qos.html#limits</a></p>
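<p>For example, a rough sketch along these lines (the QOS name and
the 64G limit are only placeholders, adjust to your setup):</p>
<pre># create a QOS and cap the memory a single job may request
sacctmgr add qos longjobs_qos
sacctmgr modify qos longjobs_qos set MaxTRESPerJob=mem=64G

# attach it to the partition in slurm.conf
PartitionName=longjobs Nodes=ALL QOS=longjobs_qos</pre>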
<p>Otherwise you could just enforce a memory limit per partition
(MaxMemPerNode) and put a higher weight on the GPU nodes, so that
the CPU nodes are allocated before the GPU nodes.</p>
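<p>A minimal slurm.conf sketch (node names, memory sizes and
weights are made up, not your actual configuration):</p>
<pre># CPU nodes get the lower weight, so they are allocated first
NodeName=cpu[01-21] RealMemory=192000 Weight=1
NodeName=gpu[01-12] RealMemory=384000 Weight=10 Gres=gpu:4

# cap the per-node memory a job may use in the longjobs partition
PartitionName=longjobs Nodes=cpu[01-21],gpu[01-12] MaxMemPerNode=64000</pre>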
<p>Have you checked the GPU scheduling in SLURM 19? It's much more
flexible. This is copied from the release notes:</p>
<blockquote type="cite">
<pre> -- Add select/cons_tres plugin, which offers similar functionality to cons_res
    with far greater GPU scheduling flexibility.
 -- Add GPU scheduling options to slurm.conf, available both globally and
    per-partition: DefCpusPerGPU and DefMemPerGPU.
 -- Add GPU scheduling options for salloc, sbatch and srun:
    --cpus-per-gpu, -G/--gpus, --gpu-bind, --gpu-freq, --gpus-per-node,
    --gpus-per-socket, --gpus-per-task and --mem-per-gpu.</pre>
</blockquote>
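<p>With cons_tres that could look roughly like this (the values are
only an illustration, not a recommendation):</p>
<pre># slurm.conf
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

# a GPU job then requests memory per GPU instead of per node
sbatch --gpus-per-node=4 --cpus-per-gpu=8 --mem-per-gpu=32G job.sh</pre>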
<p>We use gres for GPUs together with the backfill scheduler and
have no problems running both CPU and GPU jobs on the same nodes.<br>
</p>
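<p>For reference, such a setup typically boils down to something
like this (the device paths are placeholders for your hardware):</p>
<pre># slurm.conf
GresTypes=gpu
SchedulerType=sched/backfill

# gres.conf on the GPU nodes
Name=gpu File=/dev/nvidia[0-3]</pre>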
<p>Cheers,</p>
<p>Barbara<br>
</p>
<div class="moz-cite-prefix">On 7/18/19 6:06 PM, Daniel Vecerka
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:8ce29086-c539-9471-6013-a21828c3a1e3@fel.cvut.cz">Dear all,
<br>
<br>
we are using SLURM 18.08.6; we have 12 nodes with 4 x GPUs and 21
CPU-only nodes. We have 3 partitions: <br>
gpu: only GPU nodes, <br>
cpu: only CPU nodes, <br>
longjobs: all nodes. <br>
<br>
Jobs in longjobs have the lowest priority and can be preempted
to suspend. Our goal is to allow using the GPU nodes also for
backfilling CPU jobs. The problem is with CPU jobs which require
a lot of memory. Those jobs can block GPU jobs in the queue,
because suspended jobs do not release their memory, so GPU jobs
will not be started even when free GPUs are available. <br>
<br>
My question is: is there any partition or node option that allows
limiting TRES memory, but only on specific nodes? That way, jobs
in the longjobs partition with high memory requirements would be
started only on CPU nodes, while the GPU nodes would run only GPU
jobs (without a memory limit) and CPU jobs below the memory limit. <br>
<br>
Or put differently: is there any way to reserve some memory on the
GPU nodes only for jobs in the gpu partition, so that it can't be
used by jobs in the longjobs partition? <br>
<br>
Thanks in advance, Daniel Vecerka, CTU Prague <br>
</blockquote>
<br>
</body>
</html>