[slurm-users] Tuning the backfill scheduler

Michael Gutteridge michael.gutteridge at gmail.com
Thu Oct 11 05:54:11 MDT 2018


We've run into similar problems with backfill (though apparently not at the
scale you've got).  We have a number of users who will drop 5,000+ jobs at
once; as you've indicated, this can play havoc with backfill.

Two of the newer* parameters for the backfill scheduler that have been a real
help for us are "bf_max_job_assoc" and "bf_max_job_user".  These limit the
number of jobs the scheduler considers per association and per user,
respectively.
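As a sketch, they go in SchedulerParameters in slurm.conf alongside whatever
options you already have; the limits below are illustrative values I made up,
not recommendations:

```
SchedulerParameters=bf_continue,bf_max_job_user=50,bf_max_job_assoc=100
```

Once a user (or association) hits the limit, their remaining jobs are simply
skipped for that backfill pass, so one user's flood can't starve everyone else.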


- Michael

*I think these are newer; I don't actually know when they were added (I'm
currently on 17.11.5)

On Wed, Oct 10, 2018 at 6:08 PM Richard Feltstykket <
rafeltstykket at ucdavis.edu> wrote:

> Hello list,
> My cluster usually has a pretty heterogeneous job load and spends a lot of
> its time memory-bound.  Occasionally I have users who submit 100k+ short,
> low resource jobs.  Despite having several thousand free cores and enough
> RAM to run the jobs, the backfill scheduler would never backfill them.  It
> turns out that there were a number of factors: They were deep down in the
> queue, from an account with low priority, and there were a lot of them for
> the scheduler to consider.  After a bunch of tuning, the backfill scheduler
> parameters I finally settled on are:
> SchedulerParameters=defer,bf_continue,bf_interval=20,bf_resolution=600,bf_yield_interval=1000000,sched_min_interval=2000000,bf_max_time=600,bf_max_job_test=1000000
> After implementing these changes the backfill scheduler began to
> successfully schedule these jobs on the cluster.  While the cluster has a
> deep queue, the load on the slurmctld host can get pretty high.  However, no
> users have reported issues with the responsiveness of the various Slurm commands
> and the backup controller has never taken over either.  Changes have been
> in place for a month or so with no ill effects that I have observed.
> While I was troubleshooting I was definitely combing the list archives for
> other people's tuning suggestions, so I figured I would post a message here
> for posterity as well as see if anyone has similar settings or feedback
> :-).
> Cheers,
> Richard
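For readability, here are Richard's settings broken out one per line, with
descriptions paraphrased from the slurm.conf man page (verify the details
against the documentation for your Slurm version):

```
defer                       # don't try to schedule each job at submit time; wait for a pass
bf_continue                 # let backfill resume where it left off after releasing locks
bf_interval=20              # start a backfill pass every 20 seconds
bf_resolution=600           # compute backfill start times at 600-second resolution
bf_yield_interval=1000000   # release locks every 1,000,000 usec (1 s) during a pass
sched_min_interval=2000000  # at least 2,000,000 usec (2 s) between main scheduler runs
bf_max_time=600             # cap a single backfill pass at 600 seconds
bf_max_job_test=1000000     # consider up to 1,000,000 jobs per backfill pass
```

The coarse bf_resolution and the very large bf_max_job_test are what let the
scheduler get through a 100k-job queue, at the cost of less precise start-time
estimates and higher slurmctld load.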
