<div dir="ltr">Hi Team,<div><br><div>We have separate partitions for the GPU nodes and for the CPU-only nodes.<br></div></div><div><br></div><div>Scenario: the jobs submitted in our environment request either 4 CPUs + 1 GPU or 4 CPUs only, in
nodeGPUsmall and nodeGPUbig. When all the GPUs are exhausted, the remaining GPU jobs wait in the queue for GPU resources to free up. The problem is that the CPU-only jobs do not go through either, even though plenty of CPU resources are available: a job that only needs CPUs also pends behind these GPU jobs (the priority of the GPU jobs is higher than that of the CPU ones). </div><div><br></div><div>Is there any option so that when all GPU resources are exhausted, the CPU-only jobs are still allowed to run? Is there a way to deal with this, or some custom solution we could think of? There is no issue with the CPU-only partitions.</div><div><br></div><div>Below is my Slurm configuration file:</div><div><br></div><div><br></div><div>NodeName=node[1-12] NodeAddr=node[1-12] Sockets=2 CoresPerSocket=10 RealMemory=128833 State=UNKNOWN<br>NodeName=node[13-16] NodeAddr=node[13-16] Sockets=2 CoresPerSocket=10 RealMemory=515954 Feature=HIGHMEM State=UNKNOWN<br>NodeName=node[28-32] NodeAddr=node[28-32] Sockets=2 CoresPerSocket=28 RealMemory=257389<br></div><div>NodeName=node[32-33] NodeAddr=node[32-33] Sockets=2 CoresPerSocket=24 RealMemory=773418</div><div>NodeName=node[17-27] NodeAddr=node[17-27] Sockets=2 CoresPerSocket=18 RealMemory=257687 Feature=K2200 Gres=gpu:2 </div><div>NodeName=node[34] NodeAddr=node34 Sockets=2 CoresPerSocket=24 RealMemory=773410 Feature=RTX Gres=gpu:8<br><br><br>PartitionName=node Nodes=node[1-10,14-16,28-33,35] Default=YES MaxTime=INFINITE State=UP Shared=YES <br>PartitionName=nodeGPUsmall Nodes=node[17-27] Default=NO MaxTime=INFINITE State=UP Shared=YES <br>PartitionName=nodeGPUbig Nodes=node[34] Default=NO MaxTime=INFINITE State=UP Shared=YES </div><div><br></div><div>Regards</div><div>Navin.<br><br><br></div></div>
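<div><br></div><div>P.S. One idea I came across but have not tested: overlap the GPU nodes with an extra CPU-only partition, capped with MaxCPUsPerNode so the GPU jobs always keep some CPUs on each node, and ordered with PriorityTier. The partition name and the cap of 28 (out of the 36 cores on node[17-27]) below are just example values, not something we run today:</div><div><br></div><div>PartitionName=nodeGPUsmallCPU Nodes=node[17-27] Default=NO MaxTime=INFINITE State=UP Shared=YES MaxCPUsPerNode=28 PriorityTier=1<br>PartitionName=nodeGPUsmall Nodes=node[17-27] Default=NO MaxTime=INFINITE State=UP Shared=YES PriorityTier=2<br></div><div><br></div><div>I also read that backfill scheduling needs job time limits to plan around pending jobs, so MaxTime=INFINITE with no per-job time limits might be part of why the CPU-only jobs cannot be backfilled ahead of the pending GPU jobs, but I am not sure.</div>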