<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>I believe it is still the case, but I haven't tested it.  I put
      this in way back when partition_job_depth was first introduced
      (which was eons ago now).  We run about 100 or so partitions, so
      this has served us well as a general rule.  What happens is that
      if you set partition_job_depth too deep, the scheduler may not get
      through all the partitions before it has to give up and start
      again.  In the past this led to partition starvation: jobs were
      waiting to be scheduled in a partition that had space, but they
      never started because the main loop never got to them.  The
      backfill loop took too long to clean up, so those jobs took
      forever to schedule.</p>
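    <p>The rule of thumb we ended up with is roughly default_queue_depth
      = number_of_partitions * partition_job_depth.  A quick sketch with
      our numbers (the partition count is approximate; you can count
      yours with something like "scontrol show partition | grep -c
      PartitionName="):</p>
    <pre>### Sketch only: sizing the two depths together
### default_queue_depth ~= number_of_partitions * partition_job_depth
### With ~100 partitions and partition_job_depth=10:
###   100 * 10 = 1000   (we pad ours a little, hence the 1150 further down)
SchedulerParameters=default_queue_depth=1000,partition_job_depth=10,...</pre>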
    <p>With the various improvements to the scheduler this may no longer
      be the case, but I haven't taken the time to test it on our
      cluster, as our current setup has worked well.<br>
    </p>
    <p>-Paul Edmon-<br>
    </p>
    <div class="moz-cite-prefix">On 5/29/19 11:04 AM, Kilian Cavalotti
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAJz=VjEBtExg_LJPTwXMLVN+qUAO=uGyUgd3xuZWM7gqtrigVQ@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">Hi Paul, 
        <div><br>
        </div>
        <div>I'm wondering about this part in your SchedulerParameters:<br>
          <br>
          ### default_queue_depth should be some multiple of the
          partition_job_depth,<br>
          ### ideally number_of_partitions * partition_job_depth, but
          typically the main<br>
          ### loop exits prematurely if you go over about 400. A
          partition_job_depth of<br>
          ### 10 seems to work well.<br>
          <br>
          Do you remember if that's still the case, or whether it was
          related to a reported issue? That sure sounds like something
          that would need to be fixed if it hasn't been
          already.</div>
        <div><br>
        </div>
        <div>Cheers,</div>
        <div>-- </div>
        <div>Kilian</div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Wed, May 29, 2019 at 7:42
          AM Paul Edmon <<a href="mailto:pedmon@cfa.harvard.edu"
            moz-do-not-send="true">pedmon@cfa.harvard.edu</a>> wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div bgcolor="#FFFFFF">
            <p>For reference we are running 18.08.7</p>
            <p>-Paul Edmon-<br>
            </p>
            <div class="gmail-m_-4707203304347826915moz-cite-prefix">On
              5/29/19 10:39 AM, Paul Edmon wrote:<br>
            </div>
            <blockquote type="cite">
              <p>Sure.  Here is what we have:</p>
              <p>########################## Scheduling
                #####################################<br>
                ### This section is specific to scheduling<br>
                <br>
                ### Tells the scheduler to enforce limits for all
                partitions<br>
                ### that a job submits to.<br>
                EnforcePartLimits=ALL<br>
                <br>
                ### Lets slurm know that we have a jobsubmit.lua script<br>
                JobSubmitPlugins=lua<br>
                <br>
                ### When a job is launched, this has slurmctld send the
                user information<br>
                ### instead of having AD do the lookup on the node
                itself.<br>
                LaunchParameters=send_gids<br>
                <br>
                ### Maximum sizes for Jobs.<br>
                MaxJobCount=200000<br>
                MaxArraySize=10000<br>
                DefMemPerCPU=100<br>
                <br>
                ### Job Timers<br>
                CompleteWait=0<br>
                <br>
                ### We set EpilogMsgTime long so that epilog messages
                don't pile up all<br>
                ### at one time due to forced exit, which can cause
                problems for the master.<br>
                EpilogMsgTime=3000000<br>
                InactiveLimit=0<br>
                KillWait=30<br>
                <br>
                ### This only applies to the reservation time limit; the
                job must still obey<br>
                ### the partition time limit.<br>
                ResvOverRun=UNLIMITED<br>
                MinJobAge=600<br>
                Waittime=0<br>
                <br>
                ### Scheduling parameters<br>
                ### FastSchedule=2 tells slurm not to auto-detect the
                node config<br>
                ### but rather to follow our definition.  We also use
                setting 2 because, due to our<br>
                ### geographic spread, nodes may drop out of slurm and
                then reconnect.  With setting 1<br>
                ### they would be set to drain when they reconnect;
                setting 2 allows them to rejoin<br>
                ### without issue.<br>
                FastSchedule=2<br>
                SchedulerType=sched/backfill<br>
                SelectType=select/cons_res<br>
                SelectTypeParameters=CR_Core_Memory<br>
                <br>
                ### Governs the default preemption behavior<br>
                PreemptType=preempt/partition_prio<br>
                PreemptMode=REQUEUE<br>
                <br>
                ### default_queue_depth should be some multiple of the
                partition_job_depth,<br>
                ### ideally number_of_partitions * partition_job_depth,
                but typically the main<br>
                ### loop exits prematurely if you go over about 400. A
                partition_job_depth of<br>
                ### 10 seems to work well.<br>
                SchedulerParameters=\<br>
                default_queue_depth=1150,\<br>
                partition_job_depth=10,\<br>
                max_sched_time=50,\<br>
                bf_continue,\<br>
                bf_interval=30,\<br>
                bf_resolution=600,\<br>
                bf_window=11520,\<br>
                bf_max_job_part=0,\<br>
                bf_max_job_user=10,\<br>
                bf_max_job_test=10000,\<br>
                bf_max_job_start=1000,\<br>
                bf_ignore_newly_avail_nodes,\<br>
                kill_invalid_depend,\<br>
                pack_serial_at_end,\<br>
                nohold_on_prolog_fail,\<br>
                preempt_strict_order,\<br>
                preempt_youngest_first,\<br>
                max_rpc_cnt=8<br>
                <br>
                ################################ Fairshare
                ################################<br>
                ### This section sets the fairshare calculations<br>
                <br>
                PriorityType=priority/multifactor<br>
                <br>
                ### Settings for fairshare calculation frequency and
                shape.<br>
                FairShareDampeningFactor=1<br>
                PriorityDecayHalfLife=28-0<br>
                PriorityCalcPeriod=1<br>
                <br>
                ### Settings for fairshare weighting.<br>
                PriorityMaxAge=7-0<br>
                PriorityWeightAge=10000000<br>
                PriorityWeightFairshare=20000000<br>
                PriorityWeightJobSize=0<br>
                PriorityWeightPartition=0<br>
                PriorityWeightQOS=1000000000</p>
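              <p>As a rough guide to how those weights interact: the
                multifactor plugin normalizes each factor to a 0-1 range
                and adds up the weighted factors, so with the values
                above QOS dominates, then fairshare, then age (job size
                and partition are zeroed out on purpose).  A sketch of
                the sum:</p>
              <pre>### Sketch of the multifactor priority sum with the weights above:
### priority = 1000000000 * qos_factor
###          +   20000000 * fairshare_factor
###          +   10000000 * age_factor
###          +          0 * jobsize_factor
###          +          0 * partition_factor</pre>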
              <p>I'm happy to chat about any of the settings if you
                want, or share our full config.</p>
              <p>-Paul Edmon-<br>
              </p>
              <div class="gmail-m_-4707203304347826915moz-cite-prefix">On
                5/29/19 10:17 AM, Julius, Chad wrote:<br>
              </div>
              <blockquote type="cite">
                <div class="gmail-m_-4707203304347826915WordSection1">
                  <p class="MsoNormal">All, </p>
                  <p class="MsoNormal"> </p>
                  <p class="MsoNormal">We rushed our Slurm install due
                    to a short timeframe and missed some important
                    items.  We are now looking to implement a better
                    system than the first in, first out we have now.  My
                    question, are the defaults listed in the slurm.conf
                    file a good start?  Would anyone be willing to share
                    their Scheduling section in their .conf?  Also we
                    are looking to increase the maximum array size but I
                    don’t see that in the slurm.conf in version 17.  Am
                    I looking at an upgrade of Slurm in the near future
                    or can I just add MaxArraySize=somenumber?</p>
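                  <p class="MsoNormal">(For the array-size part, the
                    change itself should be small; a minimal sketch with
                    an example value only:)</p>
                  <pre># Check what the controller currently allows:
#   scontrol show config | grep -i MaxArraySize
# Then in slurm.conf (array indices run 0 .. MaxArraySize-1):
MaxArraySize=10000
# Restart slurmctld so the new limit takes effect.</pre>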
                  <p class="MsoNormal"> </p>
                  <p class="MsoNormal">The defaults as of 17.11.8 are:</p>
                  <p class="MsoNormal"> </p>
                  <p class="MsoNormal"># SCHEDULING</p>
                  <p class="MsoNormal">#SchedulerAuth=</p>
                  <p class="MsoNormal">#SchedulerPort=</p>
                  <p class="MsoNormal">#SchedulerRootFilter=</p>
                  <p class="MsoNormal">#PriorityType=priority/multifactor</p>
                  <p class="MsoNormal">#PriorityDecayHalfLife=14-0</p>
                  <p class="MsoNormal">#PriorityUsageResetPeriod=14-0</p>
                  <p class="MsoNormal">#PriorityWeightFairshare=100000</p>
                  <p class="MsoNormal">#PriorityWeightAge=1000</p>
                  <p class="MsoNormal">#PriorityWeightPartition=10000</p>
                  <p class="MsoNormal">#PriorityWeightJobSize=1000</p>
                  <p class="MsoNormal">#PriorityMaxAge=1-0</p>
                  <p class="MsoNormal"> </p>
                  <p class="MsoNormal"><b>Chad Julius</b></p>
                  <p class="MsoNormal">Cyberinfrastructure Engineer
                    Specialist</p>
                  <p class="MsoNormal"> </p>
                  <p class="MsoNormal"><b>Division of Technology &
                      Security</b></p>
                  <p class="MsoNormal">SOHO 207, Box 2231</p>
                  <p class="MsoNormal">Brookings, SD 57007</p>
                  <p class="MsoNormal">Phone: 605-688-5767</p>
                  <p class="MsoNormal"> </p>
                  <p class="MsoNormal"><a href="http://www.sdstate.edu/"
                      target="_blank" moz-do-not-send="true"><span
                        style="color:rgb(5,99,193)">www.sdstate.edu</span></a></p>
                  <p class="MsoNormal"><img style="width: 2.6041in;
                      height: 0.75in;"
                      id="gmail-m_-4707203304347826915Picture_x0020_1"
                      src="cid:part3.A7F75314.0340623B@cfa.harvard.edu"
                      alt="cid:image007.png@01D24AF4.6CEECA30" class=""
                      width="250" height="72" border="0"></p>
                  <p class="MsoNormal"> </p>
                </div>
              </blockquote>
            </blockquote>
          </div>
        </blockquote>
      </div>
      <br clear="all">
      <div><br>
      </div>
      -- <br>
      <div dir="ltr" class="gmail_signature">Kilian</div>
    </blockquote>
  </body>
</html>