[slurm-users] Backfill CPU jobs on GPU nodes

Daniel Vecerka vecerka at fel.cvut.cz
Thu Jul 18 16:06:33 UTC 2019


Dear all,

  We are running SLURM 18.08.6 on a cluster with 12 GPU nodes (4 GPUs 
each) and 21 CPU-only nodes. We have 3 partitions:
   gpu: only GPU nodes,
   cpu: only CPU nodes,
   longjobs: all nodes.

Jobs in longjobs have the lowest priority and can be preempted by 
suspension. Our goal is to let backfill CPU jobs use the GPU nodes as 
well. The problem is CPU jobs that require a lot of memory. Those jobs 
can block GPU jobs in the queue: suspended jobs do not release their 
memory, so GPU jobs will not start even when free GPUs are 
available.
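
To make the blocking scenario concrete, here is a minimal sketch in Python. 
It is only a toy model of the accounting described above, not Slurm code: 
the Node and Job classes, the node size (256 GB, 4 GPUs), and the job sizes 
are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    mem_gb: int          # memory requested by the job (hypothetical value)
    gpus: int = 0        # GPUs requested by the job
    suspended: bool = False


@dataclass
class Node:
    mem_gb: int          # total node memory (hypothetical value)
    gpus: int            # total GPUs on the node
    jobs: list = field(default_factory=list)

    def free_mem_gb(self) -> int:
        # Suspended jobs are counted too: they keep their memory allocation.
        return self.mem_gb - sum(j.mem_gb for j in self.jobs)

    def free_gpus(self) -> int:
        return self.gpus - sum(j.gpus for j in self.jobs)

    def can_start(self, job: Job) -> bool:
        # A job starts only if both memory and GPUs are available.
        return job.mem_gb <= self.free_mem_gb() and job.gpus <= self.free_gpus()


# A GPU node holding one suspended, memory-hungry CPU job from longjobs.
node = Node(mem_gb=256, gpus=4)
node.jobs.append(Job("longjobs-cpu", mem_gb=200, suspended=True))

# A pending GPU job that needs one GPU and 100 GB of memory.
gpu_job = Job("gpu-job", mem_gb=100, gpus=1)

print("free GPUs:", node.free_gpus())
print("free memory (GB):", node.free_mem_gb())
print("GPU job can start:", node.can_start(gpu_job))
```

In this toy model all four GPUs are idle, but the GPU job still cannot 
start because the suspended CPU job keeps holding its memory.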

My question is: is there any partition or node option that limits TRES 
memory, but only on specific nodes? That way, jobs in the longjobs 
partition with high memory requirements would start only on CPU nodes, 
while GPU nodes would run only GPU jobs (without a memory limit) and CPU 
jobs below the memory limit.

Or, put differently: is there any way to reserve some memory on the GPU 
nodes for jobs in the gpu partition only, so that it cannot be used by 
jobs in the longjobs partition?

Thanks in advance,
Daniel Vecerka, CTU Prague


