[slurm-users] Larger jobs tend to get starved out on our cluster

Baker D.J. D.J.Baker at soton.ac.uk
Wed Jan 9 09:40:16 MST 2019


Hello,

A colleague has intimated that larger jobs tend to get starved out on our Slurm cluster. It's not a busy time at the moment, so it's difficult to test this properly, but back in November it was not completely unusual for a larger job to have to wait up to a week to start.

I've extracted the key scheduling configuration from our slurm.conf (see below) and I would appreciate your comments, please. Even at the busiest of times we see many single-node compute jobs running on the cluster, started either by the main scheduler or by backfill.

Looking at the scheduling configuration, do you think that I'm favouring small jobs too much? That is, for example, should I increase PriorityWeightJobSize to encourage larger jobs to run?
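
For reference, my possibly naive reading of the multifactor plugin documentation is that each job's priority is a weighted sum of normalised factors, each in the range 0.0 to 1.0, roughly:

job_priority = (PriorityWeightFairshare * fairshare_factor) +
               (PriorityWeightAge       * age_factor) +
               (PriorityWeightJobSize   * job_size_factor) +
               (PriorityWeightPartition * partition_factor) +
               (PriorityWeightQOS       * qos_factor)

If that's right then, with the weights below, even the largest possible job only gains 100,000 from its size (a tenth of what fair-share can contribute), and I believe SMALL_RELATIVE_TO_TIME scales the size factor down further by the job's time limit.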

I was very keen not to starve out small/medium jobs, but perhaps there is now too much emphasis on them in our setup.
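
For what it's worth, when the cluster gets busy again I plan to look at the per-factor breakdown for pending jobs with something like:

sprio -l

which, if I've understood the man page correctly, should show how much of each pending job's priority is coming from age, fair-share and job size.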

My colleague comes from a Moab background, and he was surprised not to see nodes being reserved for pending jobs. It could be that Slurm simply works differently, trying to make efficient use of the cluster by backfilling more aggressively than Moab does. We certainly see a great deal of backfill activity.
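
We have mostly been judging that from the backfill counters in the scheduler diagnostics, e.g.:

sdiag | grep -i backfill

although I'm not entirely sure how best to interpret those numbers.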

In that respect, does anyone understand the mechanism Slurm uses to reserve nodes/resources for pending jobs, or can anyone point me at where to look for that type of information?
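
The closest thing I've found so far is the expected start times (and, I think, the planned nodes) that the backfill scheduler publishes for pending jobs, e.g.:

squeue --start --state=PENDING

but I'm not sure whether that amounts to a reservation in the Moab sense, or whether I should be looking elsewhere.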

Best regards,
David

SchedulerType=sched/backfill
SchedulerParameters=bf_window=3600,bf_resolution=180,bf_max_job_user=4

SelectType=select/cons_res
SelectTypeParameters=CR_Core
FastSchedule=1
PriorityFavorSmall=NO
PriorityFlags=DEPTH_OBLIVIOUS,SMALL_RELATIVE_TO_TIME,FAIR_TREE
PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0

PriorityWeightFairshare=1000000
PriorityWeightAge=100000
PriorityWeightPartition=0
PriorityWeightJobSize=100000
PriorityWeightQOS=10000
PriorityMaxAge=7-0
