[slurm-users] Specific limits over GRES - still relevant?

Thu Jul 1 14:43:05 UTC 2021

Hi,

I'm preparing to use Slurm with DGX A100 systems in a MIG 
configuration. I will have several gres:gpu types there, so I tried to 
reproduce the situation described in "Specific limits over GRES" at 
https://slurm.schedmd.com/resource_limits.html, but I can't.

In my test environment I have only 2 nodes with 1 vGPU each.

gres.conf has
* Name=gpu Type=V100D-8C
slurm.conf has
* AccountingStorageTRES=gres/gpu:V100D-8C
* GresTypes=gpu
* NodeName=deepops-node1  Gres=gpu:V100D-8C:1
* NodeName=deepops-node2  Gres=gpu:V100D-8C:1
QOS v100d-8c has
* MaxTRES=gres/gpu:v100d-8c=1
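
For reference, a QOS limit of this form can be set with sacctmgr. This is 
only a sketch, assuming the QOS does not exist yet; these are not 
necessarily the exact commands used in this setup.

```shell
# Assumed commands: create the QOS, then set a per-job TRES limit on it.
sacctmgr add qos v100d-8c
sacctmgr modify qos v100d-8c set MaxTRESPerJob=gres/gpu:v100d-8c=1
# Show the resulting limit.
sacctmgr show qos v100d-8c format=Name,MaxTRES
```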

srun -q v100d-8c -N2 --gres=gpu:v100d-8c:1 hostname
-> job is pending with "QOSMaxGRESPerJob", as expected

srun -q v100d-8c -N2 --gres=gpu:1 hostname
-> job is ALSO pending with "QOSMaxGRESPerJob"

According to https://slurm.schedmd.com/resource_limits.html, the 
resource limit shouldn't be enforced in the second srun example, so I 
would have to use the Lua job submit plugin. Is this documentation still 
applicable in Slurm 20.11.7? Is my test configuration OK? Does the 
"Specific limits over GRES" issue only appear when you have multiple 
GRES types?
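
For context, the workaround described in the documentation is a job 
submit plugin that rejects GPU requests that do not name a type. The core 
of that check is sketched below in shell rather than Lua, purely for 
illustration. The function name has_untyped_gpu is hypothetical and not 
part of Slurm, and the pattern is my assumption of what "untyped" means 
("gpu" or "gpu:<count>").

```shell
#!/bin/sh
# Hypothetical helper, not part of Slurm: succeeds (exit 0) if any
# comma-separated item in a --gres string requests gpu without a type,
# e.g. "gpu" or "gpu:1"; fails (exit 1) for "gpu:v100d-8c:1".
has_untyped_gpu() {
    # Split the GRES string on commas, one item per line, then look for
    # an item that is exactly "gpu" optionally followed by ":<count>".
    printf '%s\n' "$1" | tr ',' '\n' | grep -Eq '^gpu(:[0-9]+)?$'
}

# Classify the two requests from the srun commands above.
for gres in "gpu:v100d-8c:1" "gpu:1"; do
    if has_untyped_gpu "$gres"; then
        echo "$gres: untyped"
    else
        echo "$gres: typed"
    fi
done
```

A real plugin would apply the same test to the job's GRES field at 
submission time and reject the job when it matches.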

Thanks for any advice,
Matthias