[slurm-users] Slurm Fairshare / Multifactor Priority
Christoph Brüning
christoph.bruening at uni-wuerzburg.de
Wed May 29 15:07:42 UTC 2019
Hi Chad,
for us (also running Slurm 17.11), the crucial point was the balance
between PriorityWeightFairshare, PriorityWeightAge and PriorityMaxAge.
We set PriorityWeightAge high (higher than PriorityWeightFairshare,
in fact), so that even a job from some power user will eventually be
first in the queue and can't be sort of DDoS-ed by jobs from little-used
accounts.
The question then is: How long must that job have already been waiting
in the queue?
Consider the following simplified account tree:
      root
     /    \
    A      B
   / \     |
  X   Y    Z
When the cluster is basically occupied by X, this also has an impact on
Y's fair share value, because X's usage also counts toward their common
parent account A. This can lead to a situation where the difference
between X's and Y's fair share values is pretty small, even though Y has
hardly used any resources.
With a low value of PriorityMaxAge, the situation is basically FIFO
between X and Y, as X's jobs only need a couple of hours (or even less)
in the queue to compensate for the difference in fair share priority.
We're currently running with the following settings, and since we
increased PriorityMaxAge to three weeks it has been working fine:
PriorityMaxAge=21-0
PriorityWeightAge=1500000
PriorityWeightFairshare=1000000
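
To make the trade-off concrete, here is a rough back-of-the-envelope
sketch (not something we actually run). It assumes the linear age factor
from the multifactor plugin documentation, min(age / PriorityMaxAge, 1.0),
and that only the age and fair share terms differ between the two jobs.
The function name and the fair share gap of 0.01 are made up for
illustration.

```python
def hours_to_offset(fairshare_gap, weight_fairshare, weight_age, max_age_days):
    """Extra queue hours after which the age term offsets a fair-share gap.

    Assumes the linear age factor min(age / PriorityMaxAge, 1.0) and that
    only the age and fair-share terms differ between the two jobs.
    Returns None if the age term alone can never close the gap.
    """
    # Age factor (0.0 to 1.0) that yields as many priority points
    # as the fair-share gap does.
    age_factor_needed = fairshare_gap * weight_fairshare / weight_age
    if age_factor_needed > 1.0:
        return None
    return age_factor_needed * max_age_days * 24.0


# Hypothetical fair-share factor gap of 0.01 between Y and X.
gap = 0.01
for max_age_days in (1, 21):
    hours = hours_to_offset(gap, 1000000, 1500000, max_age_days)
    print(f"PriorityMaxAge={max_age_days}-0: {hours:.2f} h")
```

Under these assumptions, a head start of about ten minutes is enough for
X's job with PriorityMaxAge=1-0, versus a bit more than three hours with
PriorityMaxAge=21-0. The estimate only holds while the older job has not
yet reached PriorityMaxAge itself.
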
For the array jobs, you can set MaxArraySize. But remember to increase
MaxJobCount as well!
Best,
Christoph
On 29/05/2019 16.17, Julius, Chad wrote:
> All,
>
> We rushed our Slurm install due to a short timeframe and missed some
> important items. We are now looking to implement a better system than
> the first in, first out we have now. My question, are the defaults
> listed in the slurm.conf file a good start? Would anyone be willing to
> share their Scheduling section in their .conf? Also we are looking to
> increase the maximum array size but I don’t see that in the slurm.conf
> in version 17. Am I looking at an upgrade of Slurm in the near future
> or can I just add MaxArraySize=somenumber?
>
> The defaults as of 17.11.8 are:
>
> # SCHEDULING
> #SchedulerAuth=
> #SchedulerPort=
> #SchedulerRootFilter=
> #PriorityType=priority/multifactor
> #PriorityDecayHalfLife=14-0
> #PriorityUsageResetPeriod=14-0
> #PriorityWeightFairshare=100000
> #PriorityWeightAge=1000
> #PriorityWeightPartition=10000
> #PriorityWeightJobSize=1000
> #PriorityMaxAge=1-0
>
> *Chad Julius*
> Cyberinfrastructure Engineer Specialist
> *Division of Technology & Security*
> SOHO 207, Box 2231
> Brookings, SD 57007
> Phone: 605-688-5767
> www.sdstate.edu <http://www.sdstate.edu/>
>
--
Dr. Christoph Brüning
Universität Würzburg
Rechenzentrum
Am Hubland
D-97074 Würzburg
Tel.: +49 931 31-80499