[slurm-users] Specific limits over GRES - still relevant?
matthias.leopold at meduniwien.ac.at
Thu Jul 1 14:43:05 UTC 2021
I'm trying to prepare for using Slurm with DGX A100 systems with MIG
configuration. I will have several gres:gpu types there so I tried to
reproduce the situation described in "Specific limits over GRES" from
https://slurm.schedmd.com/resource_limits.html, but I can't.
In my test environment I have only 2 nodes with 1 vGPU each.
* Name=gpu Type=V100D-8C
slurm conf has
* NodeName=deepops-node1 Gres=gpu:V100D-8C:1
* NodeName=deepops-node2 Gres=gpu:V100D-8C:1
QOS v100d-8c has a MaxTRESPerJob limit set for this GRES type.
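For reference, a per-job typed-GRES limit of this kind is usually created along the lines of the example in resource_limits.html (QOS name and limit value below mirror my test setup and are assumptions, not a dump of my actual accounting database):

```shell
# create the QOS and cap jobs at one V100D-8C GPU each
sacctmgr add qos v100d-8c
sacctmgr modify qos v100d-8c set MaxTRESPerJob=gres/gpu:v100d-8c=1
```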
srun -q v100d-8c -N2 --gres=gpu:v100d-8c:1 hostname
-> job is pending with "QOSMaxGRESPerJob", as expected
srun -q v100d-8c -N2 --gres=gpu:1 hostname
-> job is ALSO pending with "QOSMaxGRESPerJob"
According to https://slurm.schedmd.com/resource_limits.html, the limit should
*not* be enforced in the second srun example (the request is untyped), which
is why one would need the Lua job_submit plugin to add the type. Is this
documentation still applicable to Slurm 20.11.7? Is my test configuration OK?
Does the "Specific limits over GRES" issue only appear when you have multiple
GRES types configured?
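In case it helps the discussion, this is roughly the job_submit.lua workaround I had prepared, rewriting an untyped --gres=gpu:N request into the typed form so the typed QOS limit catches both spellings. It is only a sketch: the type name v100d-8c is from my test setup, and the job_desc field carrying the GRES string has changed names across Slurm versions, so this is untested against 20.11.7:

```lua
-- job_submit.lua (sketch): force untyped gpu requests to the typed form
-- so a MaxTRESPerJob limit on gres/gpu:v100d-8c applies either way.
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- field name may be "gres" or "tres_per_node" depending on version
    local gres = job_desc.gres
    if gres ~= nil then
        -- rewrite "gpu:N" (no type given) into "gpu:v100d-8c:N"
        local count = string.match(gres, "^gpu:(%d+)$")
        if count ~= nil then
            job_desc.gres = "gpu:v100d-8c:" .. count
        end
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```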
Thanks for any advice