[slurm-users] Specific limits over GRES - still relevant?
Matthias Leopold
matthias.leopold at meduniwien.ac.at
Thu Jul 1 14:43:05 UTC 2021
Hi,
I'm trying to prepare for running Slurm on DGX A100 systems with MIG
configured. I will have several gres:gpu types there, so I tried to
reproduce the situation described in "Specific limits over GRES" at
https://slurm.schedmd.com/resource_limits.html, but I can't.
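(For context: on the DGX A100 systems the node definitions will eventually
carry several typed GPU GRES per node, roughly along these lines - the MIG
type names below are only placeholders, not my actual config:
* NodeName=dgx-a100-01 Gres=gpu:1g.5gb:7,gpu:3g.20gb:2
)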
In my test environment I have only 2 nodes with 1 vGPU each.
gres.conf has
* Name=gpu Type=V100D-8C
slurm.conf has
* AccountingStorageTRES=gres/gpu:V100D-8C
* GresTypes=gpu
* NodeName=deepops-node1 Gres=gpu:V100D-8C:1
* NodeName=deepops-node2 Gres=gpu:V100D-8C:1
QOS v100d-8c has
* MaxTRES=gres/gpu:v100d-8c=1
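(For completeness, I created the QOS limit roughly like this, via sacctmgr:
* sacctmgr add qos v100d-8c
* sacctmgr modify qos v100d-8c set MaxTRES=gres/gpu:v100d-8c=1
)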
srun -q v100d-8c -N2 --gres=gpu:v100d-8c:1 hostname
-> job is pending with "QOSMaxGRESPerJob", as expected
srun -q v100d-8c -N2 --gres=gpu:1 hostname
-> job is ALSO pending with "QOSMaxGRESPerJob"
According to https://slurm.schedmd.com/resource_limits.html, the limit
should not be enforced in the second srun example (an untyped GRES
request), so I would have to use the lua job submit plugin to get it
enforced. Is this documentation still applicable to Slurm 20.11.7? Is my
test configuration OK? Does the "Specific limits over GRES" issue only
appear when you have multiple GRES types?
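If the lua plugin is indeed still needed, I assume something along these
lines would do it - an untested sketch, and the job_desc field holding the
--gres request has changed names across releases (gres vs. tres_per_node),
so it would need to be checked against 20.11:

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- assumption: only V100D-8C GPUs exist in this cluster, so an untyped
    -- request like "gpu:1" can safely be rewritten to the typed form
    local gres = job_desc.tres_per_node
    if gres ~= nil and string.match(gres, "^gpu:%d+$") then
        job_desc.tres_per_node = string.gsub(gres, "^gpu:", "gpu:V100D-8C:")
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end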
Thanks for any advice,
Matthias