[slurm-users] Excessive use of backfill on a cluster

Loris Bennett loris.bennett at fu-berlin.de
Tue Nov 20 06:26:14 MST 2018


Hi David,

Baker D.J. <D.J.Baker at soton.ac.uk> writes:

> Hello,
>
> We are running Slurm 18.08.0 on our cluster and I am concerned that
> Slurm appears to be using backfill scheduling excessively. In fact the
> vast majority of jobs are being scheduled using backfill. So, for
> example, I have just submitted a set of three serial jobs. They all
> started on a compute node that was completely free, but
> disconcertingly in the slurmctld log they were all reported as started
> using backfill, and that doesn't make sense to me...
>
> [2018-11-20T12:31:27.598] backfill: Started JobId=217031 in batch on red158
> [2018-11-20T12:32:28.004] backfill: Started JobId=217032 in batch on red158
> [2018-11-20T12:33:58.608] backfill: Started JobId=217033 in batch on red158
>
> Either I don't understand how backfill works in Slurm or the
> above is odd. Has anyone seen this "overuse" (unnecessary use) of
> backfill on their cluster, and/or could anyone offer advice, please?

I am not sure what "excessive backfilling" might mean.  If you have
a job which requires a large amount of resources to become available
before it can start, then backfilling will allow other jobs with a lower
priority to be run, if this can be achieved without delaying the start
of the large job.  So if a job needs 100 nodes, at some point 99 of them
will be idle.  Jobs which can start and finish before the 100th node
becomes available will indeed be backfilled onto the idle nodes.  This is
how backfilling is supposed to work.
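
To make the rule concrete, here is a minimal Python sketch of the check
described above.  It is only an illustration, not Slurm's actual
algorithm: the names Job and can_backfill and all the numbers are made up
for the example, each job is checked independently, and the real
scheduler takes many more factors into account.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int    # number of nodes requested
    runtime: int  # requested time limit, in minutes


def can_backfill(job, idle_nodes, minutes_until_reserved_start):
    """Return True if the job fits on the currently idle nodes and would
    finish before the waiting high-priority job is due to start."""
    fits = job.nodes <= idle_nodes
    finishes_in_time = job.runtime <= minutes_until_reserved_start
    return fits and finishes_in_time


# Hypothetical situation: a 100-node job is waiting, 99 nodes are idle,
# and the 100th node is expected to become free in 60 minutes.
idle_nodes = 99
minutes_until_reserved_start = 60

queue = [
    Job("short_serial", nodes=1, runtime=30),
    Job("long_serial", nodes=1, runtime=120),
    Job("wide", nodes=100, runtime=10),
]

backfilled = [
    job.name
    for job in queue
    if can_backfill(job, idle_nodes, minutes_until_reserved_start)
]
print(backfilled)
```

Here only the short serial job qualifies: the long one would still be
running when the 100th node frees up, and the wide one does not fit on
the 99 idle nodes.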

Or am I misunderstanding your problem?

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de


