[slurm-users] Backfill CPU jobs on GPU nodes

Barbara Krašovec barbara.krasovec at ijs.si
Fri Jul 19 07:11:26 UTC 2019

You could limit the resources with the QOS. It is not per node, but you
have some options:


Otherwise you could just enforce the limits per partition and set a higher
Weight on the GPU nodes, so that the CPU nodes (lower weight) are allocated
before the GPU nodes.
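
Something along these lines, as an untested sketch. The node names, CPU
counts, memory sizes and the 64 GB cap are made up for illustration; only
the node counts and partition names come from the question quoted below.
Note that MaxMemPerNode applies to every node in the partition, CPU nodes
included, so it is a per-partition cap and not a per-node one.

```
# slurm.conf fragment (illustrative sketch; names and sizes are assumptions)

# Nodes with a lower Weight are allocated first, so the CPU-only nodes
# fill up before the GPU nodes are touched.
NodeName=cpu[01-21] CPUs=32 RealMemory=192000 Weight=10
NodeName=gpu[01-12] CPUs=32 RealMemory=192000 Gres=gpu:4 Weight=100

PartitionName=gpu Nodes=gpu[01-12] PriorityTier=10
PartitionName=cpu Nodes=cpu[01-21] PriorityTier=10
# MaxMemPerNode is given in megabytes and limits the memory a job in this
# partition can use on each node.
PartitionName=longjobs Nodes=cpu[01-21],gpu[01-12] PriorityTier=1 MaxMemPerNode=65536
```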

Have you checked the GPU scheduling in Slurm 19.05? It is much more
flexible. This is copied from the release notes:

>   -- Add select/cons_tres plugin, which offers similar functionality to cons_res
>      with far greater GPU scheduling flexibility.
>   -- Add GPU scheduling options to slurm.conf, available both globally and
>      per-partition: DefCpusPerGPU and DefMemPerGPU.
>   -- Add GPU scheduling options for salloc, sbatch and srun:
>     --cpus-per-gpu, -G/--gpus, --gpu-bind, --gpu-freq, --gpus-per-node,
>     --gpus-per-socket, --gpus-per-task and --mem-per-gpu.
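
As a rough illustration of how these options fit together (the values are
assumptions, not recommendations), the slurm.conf side could look like this:

```
# slurm.conf fragment for Slurm 19.05 (illustrative; values are assumptions)
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

# Defaults used when a job asks for GPUs but does not say how many CPUs
# or how much memory (in megabytes) it wants per GPU.
DefCpusPerGPU=4
DefMemPerGPU=16384
```

A job could then override the defaults on the command line, for example
"sbatch --gpus=2 --cpus-per-gpu=8 --mem-per-gpu=32G job.sh", where job.sh
stands for any batch script.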

We use gres for GPUs together with the backfill scheduler and have no
problems running both CPU and GPU jobs on the same node.
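
For reference, a minimal setup of that kind would look roughly like the
following. This is a sketch, not a copy of our configuration: the device
paths assume four NVIDIA GPUs per node, as in the question below.

```
# gres.conf on each GPU node (device paths are an assumption)
Name=gpu File=/dev/nvidia[0-3]

# slurm.conf
SchedulerType=sched/backfill
GresTypes=gpu
```

GPU jobs then request their devices with --gres=gpu:N, and CPU jobs simply
request no gres and use whatever cores and memory are left on the node.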



On 7/18/19 6:06 PM, Daniel Vecerka wrote:
> Dear all,
>  we are using SLURM 18.08.6. We have 12 nodes with 4 GPUs each and 21
> CPU-only nodes. We have 3 partitions:
>   gpu: only GPU nodes,
>   cpu: only CPU nodes,
>   longjobs: all nodes.
> Jobs in longjobs have the lowest priority and can be preempted by
> suspension. Our goal is to allow the GPU nodes to be used for backfill
> CPU jobs as well. The problem is with CPU jobs that require a lot of
> memory. Those jobs can block GPU jobs in the queue, because suspended
> jobs do not release their memory, so GPU jobs will not start even when
> free GPUs are available.
> My question is: is there any partition or node option that limits
> TRES memory only on specific nodes, so that jobs in the longjobs
> partition with high memory requirements start only on CPU nodes,
> while GPU nodes run only GPU jobs (without a memory limit) and CPU
> jobs below the memory limit?
> Or, put differently: is there any way to reserve some memory on
> GPU nodes for jobs in the gpu partition only, so that it cannot be
> used by jobs in the longjobs partition?
> Thanks in advance,    Daniel Vecerka, CTU Prague
