[slurm-users] Effect of PriorityMaxAge on job throughput

Michael Gutteridge michael.gutteridge at gmail.com
Tue Apr 9 17:59:54 UTC 2019

It might be useful to include the various priority factors you've got
configured.  The fact that adjusting PriorityMaxAge had a dramatic effect
suggests that the age factor is weighted pretty high; it might be worth
looking at that value relative to the other factors.
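
As a rough illustration of why PriorityMaxAge matters so much when the age
term dominates, here is a small Python sketch. It assumes the documented
priority/multifactor behaviour that the age factor grows linearly with
eligible queue time and saturates at 1.0 once PriorityMaxAge is reached.
The function name and the weight of 1000 are hypothetical, not taken from
David's configuration, and Slurm's own integer arithmetic may round
differently.

```python
def age_priority(wait_hours, max_age_hours, weight_age):
    """Approximate age contribution: weight * min(wait / max_age, 1.0)."""
    if max_age_hours <= 0:
        raise ValueError("max_age_hours must be positive")
    # The age factor is capped at 1.0 once the job has waited PriorityMaxAge.
    age_factor = min(wait_hours / max_age_hours, 1.0)
    return weight_age * age_factor


WEIGHT_AGE = 1000  # hypothetical PriorityWeightAge, for illustration only

# The same job, eligible for 24 hours, under the two settings discussed.
with_week = age_priority(24, 7 * 24, WEIGHT_AGE)  # PriorityMaxAge=7-0
with_day = age_priority(24, 1 * 24, WEIGHT_AGE)   # PriorityMaxAge=1-0
print(f"7-0: {with_week:.1f}  1-0: {with_day:.1f}")
```

Under these assumptions the job reaches its full age contribution after one
day instead of one seventh of it, and every job that has waited longer than
PriorityMaxAge ties on age, so the other factors decide the order.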

Have you looked at PriorityWeightJobSize?  It might have some utility if
you're finding large jobs getting short shrift.
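
To give a feel for the scale involved, the Python sketch below estimates
the job size term with PriorityFavorSmall=NO. It assumes a simplified
linear model in which the factor is the requested node count divided by
the nodes available, so that a job asking for every node gets 1.0. The
real calculation can also take CPUs into account, so treat this as an
approximation. The function name and the weight of 1000 are hypothetical;
the 22-node and 464-node figures come from the original message.

```python
def job_size_priority(req_nodes, total_nodes, weight_job_size):
    """Approximate job size contribution, assuming PriorityFavorSmall=NO."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    # Simplified linear model: a whole-machine job gets a factor of 1.0.
    size_factor = min(req_nodes / total_nodes, 1.0)
    return weight_job_size * size_factor


WEIGHT_JOB_SIZE = 1000  # hypothetical PriorityWeightJobSize

# A 22-node job in a 464-node partition.
contribution = job_size_priority(22, 464, WEIGHT_JOB_SIZE)
print(f"job size contribution: {contribution:.1f}")
```

In this model a 22-node job only earns about 5% of the job size weight, so
PriorityWeightJobSize would need to be large relative to the other weights
before it noticeably lifts such jobs.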

 - Michael

On Tue, Apr 9, 2019 at 2:01 AM David Baker <D.J.Baker at soton.ac.uk> wrote:

> Hello,
> I've finally got the job throughput/turnaround to be reasonable in our
> cluster. Most of the time the job activity on the cluster sets the default
> QOS to 32 nodes (there are 464 nodes in the default queue). Jobs requesting
> nodes close to the QOS level (for example 22 nodes) are scheduled within 24
> hours which is better than it has been. Still I suspect there is room for
> improvement. I note that these large jobs still struggle to be given a
> starttime, however many jobs are now being given a starttime following my
> SchedulerParameters makeover.
> I used advice from the mailing list and the Slurm high throughput document
> to help me make changes to the scheduling parameters. They are now...
> SchedulerParameters=assoc_limit_continue,batch_sched_delay=20,bf_continue,bf_interval=300,bf_min_age_reserve=10800,bf_window=3600,bf_resolution=600,bf_yield_interval=1000000,partition_job_depth=500,sched_max_job_start=200,sched_min_interval=2000000
> Also..
> PriorityFavorSmall=NO
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=7-0
> PriorityMaxAge=1-0
> The most significant change was actually reducing "PriorityMaxAge" from 7-0
> to 1-0. Before that change the larger jobs could hang around in the queue
> for days. Does it make sense therefore to further reduce PriorityMaxAge to
> less than 1 day? Your advice would be appreciated, please.
> Best regards,
> David
