[slurm-users] Excessive use of backfill on a cluster

Loris Bennett loris.bennett at fu-berlin.de
Tue Nov 20 08:58:02 MST 2018


Hi David,

We have

  PriorityType=priority/multifactor
  PriorityDecayHalfLife=14-0
  PriorityWeightFairshare=10000000
  PriorityWeightAge=10000
  PriorityWeightPartition=10000
  PriorityWeightJobSize=0
  PriorityWeightQOS=10000
  PriorityMaxAge=7-0
  PriorityCalcPeriod=5
  SchedulerType=sched/backfill
  SchedulerParameters=max_job_bf=50,bf_interval=60,bf_window=20160,default_queue_depth=1000

In particular our main priority factor, by a long way, is Fairshare,
with a slight advantage for old jobs and QOSs with a short run-time.
With your priority weights, QOS is the most important by a factor of 10.
I'm not quite sure what effects this will have, other than that your
priorities will be a bit more static, since the total priority will have
a reduced time-dependency.
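
To make the "factor of 10" concrete, here is roughly how the
multifactor sum works out with your weights (each factor is normalised
to the range 0.0-1.0 before being weighted, and I'm ignoring the effect
of your PriorityFlags here):

  priority =  100000 * age_factor
            + 100000 * fairshare_factor
            + 100000 * jobsize_factor
            + 1000000 * qos_factor

So a QOS with even a middling normalised factor of 0.5 contributes
500000, which is already five times the most any other single factor
can contribute.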

We had to add the SchedulerParameters settings to get backfill working
properly at all, but this obviously isn't your problem.
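
If you want to see how the weights actually play out on your cluster,
something like the following should show the configured weights, the
per-job factor contributions and the backfill statistics (the exact
output will of course depend on your site):

  sprio -w    # configured priority weights
  sprio -l    # per-job priority factor contributions
  sdiag       # scheduler statistics, incl. jobs started by backfill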

Cheers,

Loris

Baker D.J. <D.J.Baker at soton.ac.uk> writes:

> Hello,
>
> Thank you for your reply and for the explanation. That makes sense --
> your explanation of backfill is as we expected. I think it's more that
> we are surprised that almost all our jobs were being scheduled using
> backfill. We very rarely see any being scheduled normally. It could be
> that we haven't actually tuned our priority weights particularly
> well. We potentially need a setup that will allow users to run
> everything from small jobs (including very small, short-duration test
> jobs with a high QOS) to large jobs running over a range of times,
> without too many users losing out. Initially, we had our Age and Job
> size scaling factors too low, but we currently have the setup shown
> below.
>
> Any thoughts, please? 
>
> Best regards,
>
> David
>
> PriorityParameters = (null)
> PriorityDecayHalfLife = 14-00:00:00
> PriorityCalcPeriod = 00:05:00
> PriorityFavorSmall = No
> PriorityFlags = SMALL_RELATIVE_TO_TIME,DEPTH_OBLIVIOUS
> PriorityMaxAge = 14-00:00:00
> PriorityUsageResetPeriod = NONE
> PriorityType = priority/multifactor
> PriorityWeightAge = 100000
> PriorityWeightFairShare = 100000
> PriorityWeightJobSize = 100000
> PriorityWeightPartition = 0
> PriorityWeightQOS = 1000000
> PriorityWeightTRES = (null)
> PropagatePrioProcess = 0
>
> ----------------------------------------------------------------------
> From: Loris Bennett <loris.bennett at fu-berlin.de>
> Sent: 20 November 2018 13:26:14
> To: Baker D.J.
> Cc: Slurm User Community List
> Subject: Re: [slurm-users] Excessive use of backfill on a cluster 
> Hi David,
>
> Baker D.J. <D.J.Baker at soton.ac.uk> writes:
>
>> Hello,
>>
>> We are running Slurm 18.08.0 on our cluster and I am concerned that
>> Slurm appears to be using backfill scheduling excessively. In fact the
>> vast majority of jobs are being scheduled using backfill. So, for
>> example, I have just submitted a set of three serial jobs. They all
>> started on a compute node that was completely free, but
>> disconcertingly in the slurmctld log they were all reported as started
>> using backfill and that isn't making sense...
>>
>> [2018-11-20T12:31:27.598] backfill: Started JobId=217031 in batch on red158
>> [2018-11-20T12:32:28.004] backfill: Started JobId=217032 in batch on red158
>> [2018-11-20T12:33:58.608] backfill: Started JobId=217033 in batch on red158
>>
>> I either don't understand the context of backfill in Slurm or the
>> above is odd. Has anyone seen this "overuse" (unnecessary use) of
>> backfill on their cluster, and/or could anyone offer advice, please?
>
> I am not sure what "excessive backfilling" might mean. If you have
> a job which requires a large amount of resources to become available
> before it can start, then backfilling will allow other jobs with a lower
> priority to be run, if this can be achieved without delaying the start
> of the large job. So if a job needs 100 nodes, at some point 99 of them
> will be idle. Jobs which can start and finish before the 100th node
> becomes available will indeed be backfilled onto those empty nodes. This
> is how backfilling is supposed to work.
>
> Or am I misunderstanding your problem?
>
> Cheers,
>
> Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de


