[slurm-users] Using Nice to Break Ties

Paul Edmon pedmon at cfa.harvard.edu
Tue Sep 14 14:32:47 UTC 2021


We use the classic fairshare algorithm here with users having their 
shares set to parent and pulling from the group pool rather than 
having each user have their own fairshare (you can see our doc here: 
https://docs.rc.fas.harvard.edu/kb/fairshare/). This has worked very 
well for us for many years.  However, there is a use case where this 
doesn't work, namely breaking ties internal to a group.  We have a lot of 
private partitions, each owned by a specific group, and when a bunch of 
users in that group submit jobs the queue turns into FIFO instead of 
letting lower-usage users go first, because with the parent flag everyone 
in the group has the same fairshare.  Now this is obviously solved by 
giving every user their own fairshare, but this has the downside of 
hurting the user's priority back on the shared partitions with other 
groups, where they will not be able to use their group's full fairshare 
but are instead stuck with their own.  Thus their total group fairshare 
may be something like 0.4 while their personal fairshare is stuck at 0 
because they are one of the heaviest users in the lab.

Now I get the feeling that Fair Tree might solve this but I can't move 
to it as it's taken years for our users to even understand and accept 
the classic fairshare model.  As such I'm trying to come up with 
solutions that work within the model.  One option I have been 
considering is using the job_submit.lua script to set a Nice value for 
every job based on that user's usage.  Basically the Nice value would 
break the ties internal to the group and allow non-FIFO scheduling 
within an account without impacting its overall fairshare relative 
to other groups.
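
To make the idea concrete, here is a rough sketch of the arithmetic such 
a script might do, written in Python for brevity rather than as an actual 
job_submit.lua.  Everything in it is an assumption for illustration: the 
function name nice_from_usage, the max_nice cap of 1000, the user names, 
and the idea that per-user usage within the account is available as a 
simple mapping.  It only relies on the fact that a larger Nice value 
lowers a job's priority, so heavier users within the group end up behind 
lighter ones.

```python
def nice_from_usage(usage_by_user, max_nice=1000):
    """Return a Nice value per user, proportional to their share of group usage.

    usage_by_user: hypothetical mapping of user name -> usage within one
    account (how these numbers are obtained is left open here).
    max_nice: hypothetical cap; it should stay small relative to the
    fairshare contribution to priority so that it only breaks ties.
    """
    total = sum(usage_by_user.values())
    if total <= 0:
        # No recorded usage in the group: nobody is penalized.
        return {user: 0 for user in usage_by_user}
    return {
        user: round(max_nice * usage / total)
        for user, usage in usage_by_user.items()
    }


if __name__ == "__main__":
    # Made-up numbers: one heavy user, one light user, one idle user.
    group_usage = {"alice": 900.0, "bob": 100.0, "carol": 0.0}
    print(nice_from_usage(group_usage))
```

The cap matters: if the Nice values are large compared with the 
fairshare term, they would stop being a tie breaker and start reordering 
jobs across groups, which is exactly what I want to avoid.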

Before I start messing around with this, though, I wanted to ping the 
wisdom of the group and see how others handle tie breaking internal to 
an account/group/lab.  What solutions have people used for this?

-Paul Edmon-



