[slurm-users] CPU allocation for the GPU jobs.

Mon Jul 13 09:17:41 UTC 2020

Hi Team,

We have separate partitions for the GPU nodes and the CPU-only nodes.

Scenario: jobs in our environment request either 4 CPUs + 1 GPU or 4 CPUs
only, and both kinds are submitted to nodeGPUsmall and nodeGPUbig. Once all
the GPUs are in use, the remaining GPU jobs wait in the queue for GPU
resources. The CPU-only jobs also stay pending, even though plenty of CPU
resources are free, because the queued GPU jobs have a higher priority than
the CPU-only ones.
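One way to confirm that the CPU-only jobs are held back by priority rather than by a real lack of resources is to look at the pending reason that squeue reports. The following is a minimal sketch, not a tested site tool: the helper names parse_squeue and pending_jobs are made up for illustration, and it assumes squeue is on the PATH of the machine where it runs. A reason of "Priority" means higher-priority jobs are queued ahead in the partition, while "Resources" means the job itself is waiting for resources.

```python
import subprocess

# squeue format string: job id, partition, pending reason, separated by "|".
SQUEUE_FORMAT = "%i|%P|%r"


def parse_squeue(text):
    """Turn 'jobid|partition|reason' lines into a list of dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, partition, reason = line.split("|", 2)
        rows.append({"job_id": job_id, "partition": partition, "reason": reason})
    return rows


def pending_jobs(partitions=("nodeGPUsmall", "nodeGPUbig")):
    """Ask squeue for pending jobs in the given partitions (needs a Slurm client)."""
    cmd = [
        "squeue",
        "--noheader",
        "--states=PENDING",
        "--partition=" + ",".join(partitions),
        "--format=" + SQUEUE_FORMAT,
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_squeue(out)
```

Calling pending_jobs() on a login node and printing each row shows, per job, whether it is waiting on "Priority" or on "Resources".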

Is there an option that lets the CPU-only jobs run once all the GPU
resources are exhausted, or some custom solution we could consider? There
is no such issue with the CPU-only partitions.

Below is my Slurm configuration file:

NodeName=node[1-12] NodeAddr=node[1-12] Sockets=2 CoresPerSocket=10 RealMemory=128833 State=UNKNOWN
NodeName=node[13-16] NodeAddr=node[13-16] Sockets=2 CoresPerSocket=10 RealMemory=515954 Feature=HIGHMEM State=UNKNOWN
NodeName=node[28-32] NodeAddr=node[28-32] Sockets=2 CoresPerSocket=28 RealMemory=257389
NodeName=node[32-33] NodeAddr=node[32-33] Sockets=2 CoresPerSocket=24 RealMemory=773418
NodeName=node[17-27] NodeAddr=node[17-27] Sockets=2 CoresPerSocket=18 RealMemory=257687 Feature=K2200 Gres=gpu:2
NodeName=node[34] NodeAddr=node34 Sockets=2 CoresPerSocket=24 RealMemory=773410 Feature=RTX Gres=gpu:8

PartitionName=node Nodes=node[1-10,14-16,28-33,35] Default=YES MaxTime=INFINITE State=UP Shared=YES
PartitionName=nodeGPUsmall Nodes=node[17-27] Default=NO MaxTime=INFINITE State=UP Shared=YES
PartitionName=nodeGPUbig Nodes=node[34] Default=NO MaxTime=INFINITE State=UP Shared=YES
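
To put a rough number on "plenty of CPU resources", the per-node arithmetic can be sketched from the node definitions above. This assumes each core counts as one CPU (no ThreadsPerCore is set in the excerpt), that every GPU is held by exactly one 4-CPU + 1-GPU job, and that memory is not the limiting factor. The function name idle_cores is only illustrative.

```python
# Cores left idle on a GPU node once every GPU is held by a 4-CPU + 1-GPU job.
# Node shapes are taken from the configuration excerpt; one job per GPU and
# one CPU per core are assumptions.
CPUS_PER_JOB = 4

nodes = {
    "node[17-27]": {"sockets": 2, "cores_per_socket": 18, "gpus": 2},
    "node34": {"sockets": 2, "cores_per_socket": 24, "gpus": 8},
}


def idle_cores(sockets, cores_per_socket, gpus, cpus_per_job=CPUS_PER_JOB):
    """Total cores minus the cores consumed by one GPU job per GPU."""
    total = sockets * cores_per_socket
    used = gpus * cpus_per_job
    return total - used


for name, shape in nodes.items():
    free = idle_cores(**shape)
    print(f"{name}: {free} idle cores, room for {free // CPUS_PER_JOB} more 4-CPU jobs")
```

Under these assumptions each node in node[17-27] has 28 cores idle and node34 has 16 cores idle while all of their GPUs are busy.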

Regards
Navin.