[slurm-users] Backfill CPU jobs on GPU nodes
Daniel Vecerka
vecerka at fel.cvut.cz
Thu Jul 18 16:06:33 UTC 2019
Dear all,
we are using SLURM 18.08.6 and have 12 nodes with 4 GPUs each plus 21
CPU-only nodes. We have 3 partitions:
gpu: GPU nodes only,
cpu: CPU nodes only,
longjobs: all nodes.
Jobs in longjobs have the lowest priority and can be preempted to
suspend. Our goal is to allow GPU nodes to also be used for backfill CPU
jobs. The problem is with CPU jobs that require a lot of memory: those
jobs can block GPU jobs in the queue, because suspended jobs do not
release their memory, so GPU jobs will not start even though free GPUs
are available.
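For reference, a setup like the one described might look roughly like the
slurm.conf sketch below. The node names, sizes, and preemption details are
placeholders I am assuming, not values from the actual cluster:

```
# Hypothetical sketch of the described layout; names and sizes are placeholders.
NodeName=cpu[01-21] CPUs=32 RealMemory=192000
NodeName=gpu[01-12] CPUs=32 RealMemory=384000 Gres=gpu:4

# Preempt lower-priority-tier partitions by suspending their jobs
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG

PartitionName=gpu      Nodes=gpu[01-12]            PriorityTier=2
PartitionName=cpu      Nodes=cpu[01-21]            PriorityTier=2
# Lowest tier: jobs here can be suspended by gpu/cpu partition jobs
PartitionName=longjobs Nodes=cpu[01-21],gpu[01-12] PriorityTier=1 PreemptMode=suspend
```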
My question is: is there any partition or node option that limits TRES
memory, but only on specific nodes? That way, jobs in the longjobs
partition with high memory requirements would start only on CPU nodes,
while GPU nodes would run only GPU jobs (without a memory limit) and CPU
jobs below the memory limit.
Or, put differently: is there any way to reserve some memory on the GPU
nodes exclusively for jobs in the gpu partition, so that it cannot be
used by jobs in the longjobs partition?
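One possible approach (an untested sketch, not an answer from the list):
since MaxMemPerNode is a per-partition setting, the longjobs partition could
be split into two overlapping partitions, with the memory cap applied only
to the GPU-node variant. The partition name longjobs_gpu and the 64 GB cap
below are hypothetical:

```
# Hypothetical sketch: replace the single longjobs partition with two
# overlapping ones, capping per-job memory only on the GPU nodes.
# The 65536 MB (64 GB) value is a placeholder.
PartitionName=longjobs     Nodes=cpu[01-21] PriorityTier=1 PreemptMode=suspend
PartitionName=longjobs_gpu Nodes=gpu[01-12] PriorityTier=1 PreemptMode=suspend MaxMemPerNode=65536
```

Users could then submit with a comma-separated partition list, e.g.
`sbatch -p longjobs,longjobs_gpu ...`, and Slurm would start the job in the
first partition where it fits, so high-memory jobs would only ever land on
the CPU nodes. Setting the cap to RealMemory minus a chosen reserve would
also approximate the second question: that reserve stays usable only by
jobs in the gpu partition.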
Thanks in advance, Daniel Vecerka, CTU Prague