[slurm-users] Slurm stopped obeying QOS limits - can't figure out why...
Aravindh Sampathkumar
aravindh at fastmail.com
Wed May 22 13:34:07 UTC 2019
Hello.
I'm running Slurm 18.08.1 and have configured limits for our users using QOS. The limits are set on the default QOS, which most users belong to.
# sacctmgr show qos
Name Priority GraceTime Preempt PreemptMode Flags UsageThres UsageFactor GrpTRES GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit GrpWall MaxTRES MaxTRESPerNode MaxTRESMins MaxWall MaxTRESPU MaxJobsPU MaxSubmitPU MaxTRESPA MaxJobsPA MaxSubmitPA MinTRES
---------- ---------- ---------- ---------- ----------- ---------------------------------------- ---------- ----------- ------------- ------------- ------------- ------- --------- ----------- ------------- -------------- ------------- ----------- ------------- --------- ----------- ------------- --------- ----------- -------------
normal 0 00:00:00 cluster 1.000000 cpu=72,mem=7+ 10000
nav 0 00:00:00 cluster 1.000000
eva 0 00:00:00 cluster 1.000000 cpu=18,mem=1+
emre-high 0 00:00:00 cluster 1.000000
Nothing has changed recently, but today I noticed that the QOS limits, which were working until now, have silently stopped being enforced. A single user was able to submit enough jobs to saturate the cluster, annoying the other users.
There are no errors in the slurmctld logs.
How can I go about troubleshooting this? Any suggestions are welcome.
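One possible first check, as a rough sketch rather than a confirmed diagnosis: slurmctld only enforces QOS limits when AccountingStorageEnforce includes "limits", so it may be worth confirming that setting and listing which QOS each association actually carries. The function names enforces_limits and check_qos_enforcement are hypothetical helpers, not part of Slurm, and the sketch assumes that "scontrol show config" prints the setting as a comma-separated list.

```shell
#!/bin/sh
# Sketch only: enforces_limits and check_qos_enforcement are hypothetical
# helper names, not Slurm commands.

# Return 0 if a comma-separated AccountingStorageEnforce value contains "limits".
# Assumption: scontrol prints the expanded flag list (e.g. associations,limits,qos).
enforces_limits() {
    case ",$1," in
        *,limits,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the live setting from slurmctld and list the QOS attached to each association.
check_qos_enforcement() {
    command -v scontrol >/dev/null 2>&1 || { echo "scontrol not found" >&2; return 2; }
    value=$(scontrol show config | awk -F'= *' '/^AccountingStorageEnforce/ {print $2}')
    echo "AccountingStorageEnforce = ${value}"
    if enforces_limits "$value"; then
        echo "limits are enforced by slurmctld"
    else
        echo "limits are NOT enforced, so QOS limits would be ignored"
    fi
    # Show which QOS and default QOS each user association has.
    sacctmgr -P show assoc format=Cluster,Account,User,QOS,DefaultQOS
}
```

Running check_qos_enforcement on the controller node would print the current setting and the per-user QOS assignments; comparing these against the QOS table above may show whether the affected user's jobs are running under the "normal" QOS at all.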
/ Aravindh