Dear All,
I tried to implement a strict limit on GrpTRESMins for each user. The effect I'm trying to achieve is that once the limit of GPU minutes is reached, no new jobs can start: no decay, no automatic resource replenishment. After the GPU-minute limit is reached, each user should have to ask for more minutes. But despite exceeding the limits, users *can* still run new jobs.
* When adding a user to the cluster, I set the following (the commands for double-checking both settings are shown after this list):
sacctmgr --immediate add user name=... ... QOS=2gpu2d GrpTRESMins=gres/gpu=20000
* In slurm.conf ("safe" means the limits and associations options are set automatically); storage is MariaDB via slurmdbd:
GresTypes=gpu
AccountingStorageTRES=gres/gpu
AccountingStorageEnforce=qos,safe
# This disables GPU-minute replenishing.
PriorityDecayHalfLife=0
PriorityUsageResetPeriod=NONE
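Both settings can be double-checked on the running cluster, roughly like this (user name and format fields here are only placeholders):

  sacctmgr show assoc where user=redacted format=account,user,partition,qos,grptresmins
  scontrol show config | grep -i AccountingStorageEnforce

The first command should report the stored GrpTRESMins value for the association, the second the enforcement flags the controller is actually running with.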
But when I look at the user's account info and usage, I can see that the limits are not enforced.
   Account             User    Partition          QOS          GrpTRESMins
---------- ---------------- ------------ ------------ --------------------
  redacted         redacted        a6000       2gpu2d       gres/gpu=10000
--------------------------------------------------------------------------------
Top 1 Users 2024-01-05T00:00:00 - 2024-01-17T19:59:59 (1108800 secs)
Usage reported in TRES Minutes
--------------------------------------------------------------------------------
       Login     Used        TRES Name
------------ -------- ----------------
    redacted   184311         gres/gpu
    redacted  1558558              cpu
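(That usage report comes from sreport's "user top" report; roughly something like the command below, with the dates and exact options only indicative:

  sreport user top start=2024-01-05 end=2024-01-18 TopCount=1 -t minutes --tres=gres/gpu,cpu
)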
Could someone explain where the problem could be? Am I missing something? Apparently yes :)
Kind regards