[slurm-users] Slurm Fairshare / Multifactor Priority

Kilian Cavalotti kilian.cavalotti.work at gmail.com
Wed May 29 15:04:53 UTC 2019


Hi Paul,

I'm wondering about this part in your SchedulerParameters:

### default_queue_depth should be some multiple of the partition_job_depth,
### ideally number_of_partitions * partition_job_depth, but typically the main
### loop exits prematurely if you go over about 400. A partition_job_depth of
### 10 seems to work well.

Do you remember if that's still the case, or whether it was related to a
reported issue? That sure sounds like something that would need to be fixed
if it hasn't been already.
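
If I'm reading that rule of thumb right, the arithmetic would be:

    number_of_partitions * partition_job_depth = default_queue_depth

so your default_queue_depth=1150 with partition_job_depth=10 would suggest on
the order of 115 partitions, with the caveat that the comment says going much
above ~400 made the main loop exit early.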

Cheers,
-- 
Kilian

On Wed, May 29, 2019 at 7:42 AM Paul Edmon <pedmon at cfa.harvard.edu> wrote:

> For reference we are running 18.08.7
>
> -Paul Edmon-
> On 5/29/19 10:39 AM, Paul Edmon wrote:
>
> Sure.  Here is what we have:
>
> ########################## Scheduling #####################################
> ### This section is specific to scheduling
>
> ### Tells the scheduler to enforce limits for all partitions
> ### that a job submits to.
> EnforcePartLimits=ALL
>
> ### Lets slurm know that we have a jobsubmit.lua script
> JobSubmitPlugins=lua
>
> ### When a job is launched this has slurmctld send the user information
> ### instead of having AD do the lookup on the node itself.
> LaunchParameters=send_gids
>
> ### Maximum sizes for Jobs.
> MaxJobCount=200000
> MaxArraySize=10000
> DefMemPerCPU=100
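>
> ### For illustration only (a hypothetical submission, not one of our jobs):
> ### with DefMemPerCPU=100, a job that does not request memory explicitly,
> ### e.g.
> ###   sbatch -n 4 --time=1:00:00 job.sh
> ### gets a default allocation of 4 * 100 MB = 400 MB, unless it sets
> ### --mem or --mem-per-cpu itself.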
>
> ### Job Timers
> CompleteWait=0
>
> ### We set the EpilogMsgTime long so that Epilog Messages don't pile up all
> ### at one time due to forced exit, which can cause problems for the master.
> EpilogMsgTime=3000000
> InactiveLimit=0
> KillWait=30
>
> ### This only applies to the reservation time limit; the job must still
> ### obey the partition time limit.
> ResvOverRun=UNLIMITED
> MinJobAge=600
> Waittime=0
>
> ### Scheduling parameters
> ### FastSchedule 2 lets slurm know not to auto detect the node config
> ### but rather follow our definition.  We also use setting 2 because, due
> ### to our geographic size, nodes may drop out of slurm and then reconnect.
> ### If we had 1 they would be set to drain when they reconnect.  Setting it
> ### to 2 allows them to rejoin without issue.
> FastSchedule=2
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
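>
> ### To illustrate what FastSchedule=2 trusts (a hypothetical node line, not
> ### our real hardware): the scheduler uses the resources declared in
> ### slurm.conf rather than what slurmd detects, e.g.
> ###   NodeName=compute[001-004] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=128000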
>
> ### Governs default preemption behavior
> PreemptType=preempt/partition_prio
> PreemptMode=REQUEUE
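>
> ### Sketch of how partition_prio preemption plays out (hypothetical
> ### partition names and tiers): pending jobs in the higher PriorityTier
> ### partition may requeue running jobs from the lower tier on shared nodes:
> ###   PartitionName=priority Nodes=compute[001-100] PriorityTier=10 PreemptMode=OFF
> ###   PartitionName=scavenge Nodes=compute[001-100] PriorityTier=1  PreemptMode=REQUEUE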
>
> ### default_queue_depth should be some multiple of the partition_job_depth,
> ### ideally number_of_partitions * partition_job_depth, but typically the main
> ### loop exits prematurely if you go over about 400. A partition_job_depth of
> ### 10 seems to work well.
> SchedulerParameters=\
> default_queue_depth=1150,\
> partition_job_depth=10,\
> max_sched_time=50,\
> bf_continue,\
> bf_interval=30,\
> bf_resolution=600,\
> bf_window=11520,\
> bf_max_job_part=0,\
> bf_max_job_user=10,\
> bf_max_job_test=10000,\
> bf_max_job_start=1000,\
> bf_ignore_newly_avail_nodes,\
> kill_invalid_depend,\
> pack_serial_at_end,\
> nohold_on_prolog_fail,\
> preempt_strict_order,\
> preempt_youngest_first,\
> max_rpc_cnt=8
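>
> ### A note on the units above, as I read the scheduler docs: max_sched_time,
> ### bf_interval, and bf_resolution are in seconds (bf_resolution=600 is a
> ### 10-minute backfill granularity), while bf_window is in minutes
> ### (11520 minutes = 8 days, which should be at least as long as the longest
> ### allowed time limit).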
>
> ################################ Fairshare ################################
> ### This section sets the fairshare calculations
>
> PriorityType=priority/multifactor
>
> ### Settings for fairshare calculation frequency and shape.
> FairShareDampeningFactor=1
> PriorityDecayHalfLife=28-0
> PriorityCalcPeriod=1
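>
> ### Rough arithmetic on the two settings above: PriorityCalcPeriod=1 means
> ### priorities are recalculated every minute, and PriorityDecayHalfLife=28-0
> ### means recorded usage halves every 28 days, so usage from 56 days ago
> ### only counts at about 25% of its original weight.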
>
> ### Settings for fairshare weighting.
> PriorityMaxAge=7-0
> PriorityWeightAge=10000000
> PriorityWeightFairshare=20000000
> PriorityWeightJobSize=0
> PriorityWeightPartition=0
> PriorityWeightQOS=1000000000
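>
> ### Illustrative worked example of how these weights combine (made-up
> ### factor values): the multifactor plugin computes roughly
> ###   priority = PriorityWeightQOS * qos_factor
> ###            + PriorityWeightFairshare * fairshare_factor
> ###            + PriorityWeightAge * age_factor
> ### so a job with qos_factor=0.5, fairshare_factor=0.25, and age_factor=1.0
> ### lands at 1000000000*0.5 + 20000000*0.25 + 10000000*1.0 = 515000000,
> ### i.e. QOS dominates, then fairshare, then age; JobSize and Partition
> ### contribute nothing here since their weights are 0.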
>
> I'm happy to chat about any of the settings if you want, or share our full
> config.
>
> -Paul Edmon-
> On 5/29/19 10:17 AM, Julius, Chad wrote:
>
> All,
>
>
>
> We rushed our Slurm install due to a short timeframe and missed some
> important items.  We are now looking to implement a better system than the
> first-in, first-out we have now.  My question: are the defaults listed in
> the slurm.conf file a good start?  Would anyone be willing to share the
> Scheduling section of their .conf?  Also, we are looking to increase the
> maximum array size, but I don't see that setting in the slurm.conf in
> version 17.  Am I looking at an upgrade of Slurm in the near future, or can
> I just add MaxArraySize=somenumber?
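>
> (What I had in mind, if it really is that simple, is just dropping a line
> like MaxArraySize=10000 into slurm.conf, reconfiguring or restarting
> slurmctld, and checking the result with
> "scontrol show config | grep -i MaxArraySize", but I have not tried it yet.)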
>
>
>
> The defaults as of 17.11.8 are:
>
> # SCHEDULING
> #SchedulerAuth=
> #SchedulerPort=
> #SchedulerRootFilter=
> #PriorityType=priority/multifactor
> #PriorityDecayHalfLife=14-0
> #PriorityUsageResetPeriod=14-0
> #PriorityWeightFairshare=100000
> #PriorityWeightAge=1000
> #PriorityWeightPartition=10000
> #PriorityWeightJobSize=1000
> #PriorityMaxAge=1-0
>
>
> *Chad Julius*
>
> Cyberinfrastructure Engineer Specialist
>
>
>
> *Division of Technology & Security*
>
> SOHO 207, Box 2231
>
> Brookings, SD 57007
>
> Phone: 605-688-5767
>
>
>
> www.sdstate.edu
>

-- 
Kilian