[slurm-users] Longer queuing times for larger jobs

Loris Bennett loris.bennett at fu-berlin.de
Fri Jan 31 13:04:15 UTC 2020

Hi David,

David Baker <D.J.Baker at soton.ac.uk> writes:

> Hello,
> Our SLURM cluster is relatively small. We have 350 standard compute
> nodes each with 40 cores. The largest job that users can run on the
> partition is one requesting 32 nodes. Our cluster is a general
> university research resource and so there are many different sizes of
> jobs ranging from single core jobs, that get routed to a serial
> partition via job_submit.lua, through to jobs requesting 32
> nodes. When we first started the service, 32 node jobs were typically
> taking in the region of 2 days to schedule -- recently queuing times
> have started to get out of hand. Our setup is essentially...
> PriorityFavorSmall=NO 
> FairShareDampeningFactor=5
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=7-0
> PriorityWeightAge=400000
> PriorityWeightPartition=1000
> PriorityWeightJobSize=500000
> PriorityWeightQOS=1000000
> PriorityMaxAge=7-0
> To try to reduce the queuing times for our bigger jobs should we
> potentially increase the PriorityWeightJobSize factor in the first
> instance to bump up the priority of such jobs? Or should we
> potentially define a set of QOSs which we assign to jobs in our
> job_submit.lua depending on the size of the job. In other words, let's
> say there is a "large" QOS that gives the largest jobs a higher
> priority and also limits how many of those jobs a single user can submit?
> Your advice would be appreciated, please. At the moment these large
> jobs are not accruing a sufficiently high priority to rise above the
> other jobs in the cluster.

We have always gone for the weighting approach, rather than the QOS
routing one.  I have always thought that QOS routing potentially takes
away some of the users' freedom unnecessarily.  What if someone wants
to submit a large number of 32-node jobs and is perfectly happy to wait
a (long) while?  We have QOSs with higher priorities, but with
restricted MaxWall, MaxJobs, MaxSubmit, MaxTRESPU, and users have to
request them explicitly.
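
To get a feel for what the weighting approach does with the values you
posted, here is a rough sketch of the multifactor sum in Python.  It is
only an approximation under stated assumptions: the job size factor is
taken as the simple fraction of the cluster's nodes requested (the real
calculation also looks at CPUs), the fairshare term is left out because
its weight is not in your excerpt, and the QOS and partition factors
default to 0.  The function names are mine, not Slurm's.

```python
# Rough sketch of the priority/multifactor sum using the weights from the
# quoted slurm.conf. Each factor is a float between 0.0 and 1.0.
WEIGHTS = {
    "age": 400_000,       # PriorityWeightAge
    "partition": 1_000,   # PriorityWeightPartition
    "job_size": 500_000,  # PriorityWeightJobSize
    "qos": 1_000_000,     # PriorityWeightQOS
}

TOTAL_NODES = 350    # standard compute nodes, from the original post
MAX_AGE_DAYS = 7.0   # PriorityMaxAge=7-0


def age_factor(days_queued):
    """Grows linearly with queue time and saturates at PriorityMaxAge."""
    return min(days_queued / MAX_AGE_DAYS, 1.0)


def job_size_factor(nodes):
    """ASSUMPTION: fraction of all nodes requested (PriorityFavorSmall=NO).

    Slurm's actual calculation also takes CPUs into account.
    """
    return nodes / TOTAL_NODES


def priority(nodes, days_queued, qos=0.0, partition=0.0):
    """Weighted sum of the factors; the fairshare term is deliberately omitted."""
    factors = {
        "age": age_factor(days_queued),
        "partition": partition,
        "job_size": job_size_factor(nodes),
        "qos": qos,
    }
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)


# A freshly submitted 32-node job versus a 1-node job queued for one day.
print(f"32 nodes, 0 days: {priority(32, 0):.0f}")
print(f" 1 node,  1 day:  {priority(1, 1):.0f}")
```

Under these assumptions a 32-node job gets a job size factor of only
about 0.09, so its size bonus is worth less than a day of queue age for
a single-node job.  That would be consistent with the large jobs not
rising above the rest, and suggests PriorityWeightJobSize would have to
be raised considerably relative to PriorityWeightAge to make a
difference.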



Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de
