[slurm-users] Quickly throttling/limiting a specific user's jobs
Ransom, Geoffrey M.
Geoffrey.Ransom at jhuapl.edu
Tue Sep 22 21:33:50 UTC 2020
Hello
We had a user submit a large number of array jobs with short actual run times (20-80 seconds, mostly at the low end), and slurmctld was falling behind on RPC calls while trying to handle them. It was awkward to set arraytaskthrottle=5 on each of the queued array jobs while slurmctld was struggling with the RPC load.
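For reference, the per-job throttling described above can be scripted roughly as follows. This is only a sketch under assumptions: the helper names (array_job_ids, throttle_commands, apply_throttle) are made up for illustration, and it relies on `squeue -h -u USER -o %F` printing one base job ID per line. Non-array jobs would also show up in that list, and scontrol would presumably reject ArrayTaskThrottle for them, so the list may need filtering first.

```python
import shlex
import subprocess


def array_job_ids(squeue_output):
    """Return unique base job IDs from squeue output, in first-seen order."""
    seen = []
    for line in squeue_output.splitlines():
        job_id = line.strip()
        if job_id and job_id not in seen:
            seen.append(job_id)
    return seen


def throttle_commands(job_ids, limit):
    """Build one 'scontrol update' argument list per array job."""
    return [
        ["scontrol", "update", f"JobId={job_id}", f"ArrayTaskThrottle={limit}"]
        for job_id in job_ids
    ]


def apply_throttle(user, limit, dry_run=True):
    """List the user's jobs, then print (or run) the throttle commands."""
    # -h drops the header; %F prints the base job ID of a job array.
    listing = subprocess.run(
        ["squeue", "-h", "-u", user, "-o", "%F"],
        check=True, capture_output=True, text=True,
    ).stdout
    for cmd in throttle_commands(array_job_ids(listing), limit):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling apply_throttle("someuser", 5) only prints the commands; passing dry_run=False would run them one at a time, which still costs one update per job against an already loaded slurmctld.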
I'm looking to create a QOS with MaxJobsPerUser=50 that I can quickly add to a user to throttle their jobs, but:
1) Adding a QOS to the user does not affect already-queued jobs, so I still have to collect all of the user's job IDs and modify each one directly.
2) I queued up a test job with the QOS set, and it still runs 100 jobs at a time (the arraytaskthrottle value I set on the job) instead of limiting the user to 50 jobs.
3) I tried adding the OverPartQOS flag to see if that changed the behavior, but it did not seem to do anything. The test cluster I ran this on has no other QOSes defined, but our production cluster has a partition QOS in place that limits a single user to about 80% of the CPUs with MaxTRESPerUser.
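To make the setup above concrete, here is a rough sketch of the commands involved, built as argument lists so they can be reviewed before anything is run. The QOS name "throttle", the user name, and the helper names are placeholders, not what is on our clusters. One thing I still need to check on the test cluster: as far as I understand, QOS limits such as MaxJobsPerUser are only enforced when AccountingStorageEnforce in slurm.conf includes "limits", which might be related to item 2.

```python
import shlex


def qos_setup_commands(qos, user, max_jobs):
    """Commands to create a throttling QOS and allow a user to use it."""
    return [
        # -i (--immediate) commits without the interactive confirmation.
        ["sacctmgr", "-i", "add", "qos", qos],
        ["sacctmgr", "-i", "modify", "qos", qos,
         "set", f"MaxJobsPerUser={max_jobs}"],
        ["sacctmgr", "-i", "modify", "user", "where", f"name={user}",
         "set", f"qos+={qos}"],
    ]


def move_jobs_to_qos_commands(job_ids, qos):
    """Commands to switch already-queued jobs over to the new QOS."""
    return [
        ["scontrol", "update", f"JobId={job_id}", f"QOS={qos}"]
        for job_id in job_ids
    ]


def render(commands):
    """Format argument lists as shell lines for review."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

For example, print(render(qos_setup_commands("throttle", "someuser", 50))) shows the three sacctmgr lines. The second helper covers item 1: every queued job still has to be updated individually, which is the part I would like to avoid.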
Is there a quick way to limit how many jobs a specific user can run at one time, on the cluster or in a partition, when we need to throttle them back in an emergency but don't want to flat out kill their jobs?
Thanks.