[slurm-users] Slurm stopped obeying QOS limits - can't figure out why...

Aravindh Sampathkumar aravindh at fastmail.com
Wed May 22 13:34:07 UTC 2019


Hello.

I'm running Slurm 18.08.1 and have configured limits for our users using QOS. The limits are set on the default QOS, which most users belong to.

# sacctmgr show qos
 Name Priority GraceTime Preempt PreemptMode Flags UsageThres UsageFactor GrpTRES GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit GrpWall MaxTRES MaxTRESPerNode MaxTRESMins MaxWall MaxTRESPU MaxJobsPU MaxSubmitPU MaxTRESPA MaxJobsPA MaxSubmitPA MinTRES
---------- ---------- ---------- ---------- ----------- ---------------------------------------- ---------- ----------- ------------- ------------- ------------- ------- --------- ----------- ------------- -------------- ------------- ----------- ------------- --------- ----------- ------------- --------- ----------- -------------
 normal 0 00:00:00 cluster 1.000000 cpu=72,mem=7+ 10000 
 nav 0 00:00:00 cluster 1.000000 
 eva 0 00:00:00 cluster 1.000000 cpu=18,mem=1+ 
emre-high 0 00:00:00 cluster 1.000000


Nothing has changed recently, but today I noticed that the QOS limits, which had been working until now, have silently stopped being enforced. A single user was able to submit enough jobs to saturate the cluster, to the annoyance of other users.

There are no errors in the slurmctld logs.

How can I go about troubleshooting this? Any suggestions are welcome.

/ Aravindh



-- 
 Aravindh Sampathkumar
 aravindh at fastmail.com
