[slurm-users] Backfill Scheduling
loris.bennett at fu-berlin.de
Tue Jun 27 06:10:31 UTC 2023
Reed Dier <reed.dier at focusvq.com> writes:
> Hoping this will be an easy one for the community.
> The priority schema was recently reworked for our cluster, with only
> PriorityWeightQOS and PriorityWeightAge contributing to the priority
> value, while PriorityWeightAssoc, PriorityWeightFairshare,
> PriorityWeightJobSize, and PriorityWeightPartition are now set to 0,
> and PriorityFavorSmall set to NO.
> The cluster is fairly loaded right now, with a big backlog of work (~250 running jobs, ~40K pending jobs).
> The majority of these jobs are arrays, which runs the pending job count up quickly.
> What I’m trying to figure out is:
> The next highest priority job array in the queue is waiting on resources, everything else on priority, which makes sense.
> However, there is a good portion of the cluster unused, seemingly
> dammed by the next up job being large, while there are much smaller
> jobs behind it that could easily fit into the available resources.
> Is this an issue with the relative FIFO nature of the priority scheduling currently with all of the other factors disabled,
> or since my queue is fairly deep, is this due to bf_max_job_test being
> the default 100, and it can’t look deep enough into the queue to find
> a job that will fit into what is unoccupied?
It could be that bf_max_job_test is too low. On our system some users
think it is a good idea to submit lots of jobs with identical resource
requirements by writing a loop around sbatch. Such jobs will exhaust
the bf_max_job_test very quickly. Thus we increased the limit to 1000
and try to persuade users to use job arrays instead of home-grown loops.
This seems to work OK.
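To illustrate why a shallow test depth can leave nodes idle, here is a toy model in Python. It is only a sketch: the real backfill scheduler also takes time limits and the expected start times of higher-priority jobs into account, which this ignores. The names Job and first_backfill_candidate and the queue contents are made up for the example; only the idea of a cap on how many pending jobs are examined corresponds to bf_max_job_test.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    cpus: int


def first_backfill_candidate(pending, free_cpus, max_job_test):
    """Return the first job (in priority order) that fits into free_cpus.

    Only the first max_job_test entries of the queue are examined,
    which mimics the effect of a limited test depth. Toy model only.
    """
    for job in pending[:max_job_test]:
        if job.cpus <= free_cpus:
            return job
    return None


# Hypothetical queue, already sorted by priority:
# 150 large jobs at the front, 50 small jobs behind them.
pending = [Job(i, 64) for i in range(150)]
pending += [Job(150 + i, 4) for i in range(50)]

free_cpus = 16  # idle resources, too small for any of the large jobs

print(first_backfill_candidate(pending, free_cpus, 100))   # None
print(first_backfill_candidate(pending, free_cpus, 1000))  # Job(job_id=150, cpus=4)
```

With a depth of 100 the search stops before it reaches the first small job, so nothing is started even though 16 CPUs are free. With a depth of 1000 the small job at position 150 is found.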
> Hoping to know where I might want to swing my hammer next, without whacking the wrong setting
> Appreciate any advice,
One problem we still have to address is that we don't have an
array-enabled version of the 'subgXX' script for the quantum
chemistry program Gaussian. This is a Perl script which parses the
input for the program, generates a job script and submits it. An
array-enabled version would have to stipulate a specific mapping
between the array task ID and the way the input files are
organised. We are currently not sure about the best way to do this
in a suitably generic way.
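One possible mapping, given purely as a sketch and not as what we actually do, is to sort the input files in a directory by name and use the array task ID as an index into that list. The function names and the '*.com' pattern below are assumptions made for the example; SLURM_ARRAY_TASK_ID is the environment variable Slurm sets for each array task.

```python
import os
from pathlib import Path


def input_for_task(input_dir, task_id, pattern="*.com"):
    """Map an array task ID to one input file.

    Files matching `pattern` are sorted by name, and the task ID is
    used as a zero-based index into the sorted list.
    """
    files = sorted(Path(input_dir).glob(pattern))
    if not 0 <= task_id < len(files):
        raise IndexError(
            f"task ID {task_id} out of range for {len(files)} input files"
        )
    return files[task_id]


def input_from_env(input_dir, environ=os.environ):
    """Look up the input file for the array task described by `environ`."""
    task_id = int(environ["SLURM_ARRAY_TASK_ID"])
    return input_for_task(input_dir, task_id)
```

For N input files the job would then be submitted with an array range of 0 to N-1, e.g. 'sbatch --array=0-9' for ten files. The obvious weakness is that the mapping changes if files are added to or removed from the directory between submission and execution, so a generic version would probably need to write out a fixed file list at submission time.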
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin