<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>I believe it is still the case, but I haven't tested it. I put
this in way back when partition_job_depth was first introduced
(which was eons ago now). We run about 100 or so partitions, so
this has served us well as a general rule. What happens is that
if you set the partition job depth too deep, the scheduler may not
get through all the partitions before it has to give up and start
again. This led to partition starvation in the past, where jobs
were waiting to be scheduled in a partition that had space but
never started because the main loop never got to them. The
backfill loop took too long to clean up, so those jobs took
forever to schedule.</p>
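<p>As a rough illustration of how those numbers hang together for us
(the headroom above the product is a judgment call, not anything
derived):</p>
<p>### ~100 partitions * partition_job_depth=10 => ~1000 jobs considered per main loop<br>
### default_queue_depth=1150 just leaves some headroom above that product<br>
default_queue_depth=1150<br>
partition_job_depth=10<br>
</p>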
<p>With the various improvements to the scheduler this may no longer
be the case, but I haven't taken the time to test it on our
cluster, as our current setup has worked well.<br>
</p>
<p>-Paul Edmon-<br>
</p>
<div class="moz-cite-prefix">On 5/29/19 11:04 AM, Kilian Cavalotti
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAJz=VjEBtExg_LJPTwXMLVN+qUAO=uGyUgd3xuZWM7gqtrigVQ@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">Hi Paul,
<div><br>
</div>
<div>I'm wondering about this part in your SchedulerParameters:<br>
<br>
### default_queue_depth should be some multiple of the
partition_job_depth,<br>
### ideally number_of_partitions * partition_job_depth, but
typically the main<br>
### loop exits prematurely if you go over about 400. A
partition_job_depth of<br>
### 10 seems to work well.<br>
<br>
Do you remember if that's still the case, or if it was related
to a reported issue? That sure sounds like something that would
need to be fixed if it hasn't been already.</div>
<div><br>
</div>
<div>Cheers,</div>
<div>-- </div>
<div>Kilian</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, May 29, 2019 at 7:42
AM Paul Edmon <<a href="mailto:pedmon@cfa.harvard.edu"
moz-do-not-send="true">pedmon@cfa.harvard.edu</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>For reference we are running 18.08.7</p>
<p>-Paul Edmon-<br>
</p>
<div class="gmail-m_-4707203304347826915moz-cite-prefix">On
5/29/19 10:39 AM, Paul Edmon wrote:<br>
</div>
<blockquote type="cite">
<p>Sure. Here is what we have:</p>
<p>########################## Scheduling
#####################################<br>
### This section is specific to scheduling<br>
<br>
### Tells the scheduler to enforce limits for all
partitions<br>
### that a job submits to.<br>
EnforcePartLimits=ALL<br>
<br>
### Lets Slurm know that we have a jobsubmit.lua script<br>
JobSubmitPlugins=lua<br>
<br>
### When a job is launched, this has slurmctld send the user information<br>
### instead of having AD do the lookup on the node itself.<br>
LaunchParameters=send_gids<br>
<br>
### Maximum sizes for Jobs.<br>
MaxJobCount=200000<br>
MaxArraySize=10000<br>
DefMemPerCPU=100<br>
<br>
### Job Timers<br>
CompleteWait=0<br>
<br>
### We set the EpilogMsgTime long so that epilog messages don't pile up all<br>
### at one time due to forced exit, which can cause problems for the master.<br>
EpilogMsgTime=3000000<br>
InactiveLimit=0<br>
KillWait=30<br>
<br>
### This only applies to the reservation time limit; the job must still obey<br>
### the partition time limit.<br>
ResvOverRun=UNLIMITED<br>
MinJobAge=600<br>
Waittime=0<br>
<br>
### Scheduling parameters<br>
### FastSchedule 2 tells Slurm not to auto-detect the node config<br>
### but rather follow our definition. We also use setting 2 because, due to our<br>
### geographic size, nodes may drop out of Slurm and then reconnect. If we had 1,<br>
### they would be set to drain when they reconnect. Setting it to 2 allows them<br>
### to rejoin without issue.<br>
FastSchedule=2<br>
SchedulerType=sched/backfill<br>
SelectType=select/cons_res<br>
SelectTypeParameters=CR_Core_Memory<br>
<br>
### Governs default preemption behavior<br>
PreemptType=preempt/partition_prio<br>
PreemptMode=REQUEUE<br>
<br>
### default_queue_depth should be some multiple of the
partition_job_depth,<br>
### ideally number_of_partitions * partition_job_depth,
but typically the main<br>
### loop exits prematurely if you go over about 400. A
partition_job_depth of<br>
### 10 seems to work well.<br>
SchedulerParameters=\<br>
default_queue_depth=1150,\<br>
partition_job_depth=10,\<br>
max_sched_time=50,\<br>
bf_continue,\<br>
bf_interval=30,\<br>
bf_resolution=600,\<br>
bf_window=11520,\<br>
bf_max_job_part=0,\<br>
bf_max_job_user=10,\<br>
bf_max_job_test=10000,\<br>
bf_max_job_start=1000,\<br>
bf_ignore_newly_avail_nodes,\<br>
kill_invalid_depend,\<br>
pack_serial_at_end,\<br>
nohold_on_prolog_fail,\<br>
preempt_strict_order,\<br>
preempt_youngest_first,\<br>
max_rpc_cnt=8<br>
<br>
################################ Fairshare
################################<br>
### This section sets the fairshare calculations<br>
<br>
PriorityType=priority/multifactor<br>
<br>
### Settings for fairshare calculation frequency and
shape.<br>
FairShareDampeningFactor=1<br>
PriorityDecayHalfLife=28-0<br>
PriorityCalcPeriod=1<br>
<br>
### Settings for fairshare weighting.<br>
PriorityMaxAge=7-0<br>
PriorityWeightAge=10000000<br>
PriorityWeightFairshare=20000000<br>
PriorityWeightJobSize=0<br>
PriorityWeightPartition=0<br>
PriorityWeightQOS=1000000000</p>
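<p>If it's useful, a quick way to confirm what the running slurmctld
actually picked up (this also shows MaxArraySize and MaxJobCount) is
something like:</p>
<p>$ scontrol show config | grep -E 'SchedulerParameters|MaxArraySize|MaxJobCount'<br>
</p>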
<p>I'm happy to chat about any of the settings if you
want, or share our full config.</p>
<p>-Paul Edmon-<br>
</p>
<div class="gmail-m_-4707203304347826915moz-cite-prefix">On
5/29/19 10:17 AM, Julius, Chad wrote:<br>
</div>
<blockquote type="cite">
<div class="gmail-m_-4707203304347826915WordSection1">
<p class="MsoNormal">All, </p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">We rushed our Slurm install due
to a short timeframe and missed some important
items. We are now looking to implement a better
system than the first in, first out we have now. My
question, are the defaults listed in the slurm.conf
file a good start? Would anyone be willing to share
their Scheduling section in their .conf? Also we
are looking to increase the maximum array size but I
don’t see that in the slurm.conf in version 17. Am
I looking at an upgrade of Slurm in the near future
or can I just add MaxArraySize=somenumber?</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">The defaults as of 17.11.8 are:</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"># SCHEDULING</p>
<p class="MsoNormal">#SchedulerAuth=</p>
<p class="MsoNormal">#SchedulerPort=</p>
<p class="MsoNormal">#SchedulerRootFilter=</p>
<p class="MsoNormal">#PriorityType=priority/multifactor</p>
<p class="MsoNormal">#PriorityDecayHalfLife=14-0</p>
<p class="MsoNormal">#PriorityUsageResetPeriod=14-0</p>
<p class="MsoNormal">#PriorityWeightFairshare=100000</p>
<p class="MsoNormal">#PriorityWeightAge=1000</p>
<p class="MsoNormal">#PriorityWeightPartition=10000</p>
<p class="MsoNormal">#PriorityWeightJobSize=1000</p>
<p class="MsoNormal">#PriorityMaxAge=1-0</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><b>Chad Julius</b></p>
<p class="MsoNormal">Cyberinfrastructure Engineer
Specialist</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><b>Division of Technology &
Security</b></p>
<p class="MsoNormal">SOHO 207, Box 2231</p>
<p class="MsoNormal">Brookings, SD 57007</p>
<p class="MsoNormal">Phone: 605-688-5767</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><a href="http://www.sdstate.edu/"
target="_blank" moz-do-not-send="true"><span
style="color:rgb(5,99,193)">www.sdstate.edu</span></a></p>
<p class="MsoNormal"><img style="width: 2.6041in;
height: 0.75in;"
id="gmail-m_-4707203304347826915Picture_x0020_1"
src="cid:part3.A7F75314.0340623B@cfa.harvard.edu"
alt="cid:image007.png@01D24AF4.6CEECA30" class=""
width="250" height="72" border="0"></p>
<p class="MsoNormal"> </p>
</div>
</blockquote>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="gmail_signature">Kilian</div>
</blockquote>
</body>
</html>