<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>I believe it is still the case, but I haven't tested it. I put
this in way back when partition_job_depth was first introduced
(which was eons ago now). We run about 100 or so partitions, so
this has served us well as a general rule. What happens is that
if you set the partition job depth too deep, the scheduler may not
get through all the partitions before it has to give up and start
again. This led to partition starvation in the past, where jobs
were waiting to be scheduled in a partition that had space but
never started because the main loop never got to them. The
backfill loop took too long to clean up, so those jobs took
forever to schedule.</p>
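<p>As a rough illustration of how those numbers hang together for us
(the headroom above the product is a judgment call, not anything
derived):</p>
<p>### ~100 partitions * partition_job_depth=10 => ~1000 jobs considered per main loop<br>
### default_queue_depth=1150 just leaves some headroom above that product<br>
default_queue_depth=1150<br>
partition_job_depth=10<br>
</p>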
<p>With the various improvements to the scheduler this may no longer
be the case, but I haven't taken the time to test it on our
cluster, as our current setup has worked well.<br>
</p>
<p>-Paul Edmon-<br>
</p>
<div class="moz-cite-prefix">On 5/29/19 11:04 AM, Kilian Cavalotti
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAJz=VjEBtExg_LJPTwXMLVN+qUAO=uGyUgd3xuZWM7gqtrigVQ@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">Hi Paul,
<div><br>
</div>
<div>I'm wondering about this part in your SchedulerParameters:<br>
<br>
### default_queue_depth should be some multiple of the
partition_job_depth,<br>
### ideally number_of_partitions * partition_job_depth, but
typically the main<br>
### loop exits prematurely if you go over about 400. A
partition_job_depth of<br>
### 10 seems to work well.<br>
<br>
Do you remember if that's still the case, or if it was related
to a reported issue? That sure sounds like something that would
need to be fixed if it hasn't been already.</div>
<div><br>
</div>
<div>Cheers,</div>
<div>-- </div>
<div>Kilian</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, May 29, 2019 at 7:42
AM Paul Edmon <<a href="mailto:pedmon@cfa.harvard.edu"
moz-do-not-send="true">pedmon@cfa.harvard.edu</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>For reference we are running 18.08.7</p>
<p>-Paul Edmon-<br>
</p>
<div class="gmail-m_-4707203304347826915moz-cite-prefix">On
5/29/19 10:39 AM, Paul Edmon wrote:<br>
</div>
<blockquote type="cite">
<p>Sure. Here is what we have:</p>
<p>########################## Scheduling
#####################################<br>
### This section is specific to scheduling<br>
<br>
### Tells the scheduler to enforce limits for all
partitions<br>
### that a job submits to.<br>
EnforcePartLimits=ALL<br>
<br>
### Lets Slurm know that we have a jobsubmit.lua script<br>
JobSubmitPlugins=lua<br>
<br>
### When a job is launched, this has slurmctld send the user information<br>
### instead of having AD do the lookup on the node itself.<br>
LaunchParameters=send_gids<br>
<br>
### Maximum sizes for Jobs.<br>
MaxJobCount=200000<br>
MaxArraySize=10000<br>
DefMemPerCPU=100<br>
<br>
### Job Timers<br>
CompleteWait=0<br>
<br>
### We set the EpilogMsgTime long so that epilog messages don't pile up all<br>
### at one time due to forced exit, which can cause problems for the master.<br>
EpilogMsgTime=3000000<br>
InactiveLimit=0<br>
KillWait=30<br>
<br>
### This only applies to the reservation time limit; the job must still obey<br>
### the partition time limit.<br>
ResvOverRun=UNLIMITED<br>
MinJobAge=600<br>
Waittime=0<br>
<br>
### Scheduling parameters<br>
### FastSchedule 2 tells Slurm not to auto-detect the node config<br>
### but rather follow our definition. We also use setting 2 because, due to our<br>
### geographic size, nodes may drop out of Slurm and then reconnect. If we had 1,<br>
### they would be set to drain when they reconnect. Setting it to 2 allows them<br>
### to rejoin without issue.<br>
FastSchedule=2<br>
SchedulerType=sched/backfill<br>
SelectType=select/cons_res<br>
SelectTypeParameters=CR_Core_Memory<br>
<br>
### Governs default preemption behavior<br>
PreemptType=preempt/partition_prio<br>
PreemptMode=REQUEUE<br>
<br>
### default_queue_depth should be some multiple of the
partition_job_depth,<br>
### ideally number_of_partitions * partition_job_depth,
but typically the main<br>
### loop exits prematurely if you go over about 400. A
partition_job_depth of<br>
### 10 seems to work well.<br>
SchedulerParameters=\<br>
default_queue_depth=1150,\<br>
partition_job_depth=10,\<br>
max_sched_time=50,\<br>
bf_continue,\<br>
bf_interval=30,\<br>
bf_resolution=600,\<br>
bf_window=11520,\<br>
bf_max_job_part=0,\<br>
bf_max_job_user=10,\<br>
bf_max_job_test=10000,\<br>
bf_max_job_start=1000,\<br>
bf_ignore_newly_avail_nodes,\<br>
kill_invalid_depend,\<br>
pack_serial_at_end,\<br>
nohold_on_prolog_fail,\<br>
preempt_strict_order,\<br>
preempt_youngest_first,\<br>
max_rpc_cnt=8<br>
<br>
################################ Fairshare
################################<br>
### This section sets the fairshare calculations<br>
<br>
PriorityType=priority/multifactor<br>
<br>
### Settings for fairshare calculation frequency and
shape.<br>
FairShareDampeningFactor=1<br>
PriorityDecayHalfLife=28-0<br>
PriorityCalcPeriod=1<br>
<br>
### Settings for fairshare weighting.<br>
PriorityMaxAge=7-0<br>
PriorityWeightAge=10000000<br>
PriorityWeightFairshare=20000000<br>
PriorityWeightJobSize=0<br>
PriorityWeightPartition=0<br>
PriorityWeightQOS=1000000000</p>
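<p>If it's useful, a quick way to confirm what the running slurmctld
actually picked up (this also shows MaxArraySize and MaxJobCount) is
something like:</p>
<p>$ scontrol show config | grep -E 'SchedulerParameters|MaxArraySize|MaxJobCount'<br>
</p>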
<p>I'm happy to chat about any of the settings if you
want, or share our full config.</p>
<p>-Paul Edmon-<br>
</p>
<div class="gmail-m_-4707203304347826915moz-cite-prefix">On
5/29/19 10:17 AM, Julius, Chad wrote:<br>
</div>
<blockquote type="cite">
<div class="gmail-m_-4707203304347826915WordSection1">
<p class="MsoNormal">All, </p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">We rushed our Slurm install due
to a short timeframe and missed some important
items. We are now looking to implement a better
system than the first in, first out we have now. My
question, are the defaults listed in the slurm.conf
file a good start? Would anyone be willing to share
their Scheduling section in their .conf? Also we
are looking to increase the maximum array size but I
don’t see that in the slurm.conf in version 17. Am
I looking at an upgrade of Slurm in the near future
or can I just add MaxArraySize=somenumber?</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">The defaults as of 17.11.8 are:</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"># SCHEDULING</p>
<p class="MsoNormal">#SchedulerAuth=</p>
<p class="MsoNormal">#SchedulerPort=</p>
<p class="MsoNormal">#SchedulerRootFilter=</p>
<p class="MsoNormal">#PriorityType=priority/multifactor</p>
<p class="MsoNormal">#PriorityDecayHalfLife=14-0</p>
<p class="MsoNormal">#PriorityUsageResetPeriod=14-0</p>
<p class="MsoNormal">#PriorityWeightFairshare=100000</p>
<p class="MsoNormal">#PriorityWeightAge=1000</p>
<p class="MsoNormal">#PriorityWeightPartition=10000</p>
<p class="MsoNormal">#PriorityWeightJobSize=1000</p>
<p class="MsoNormal">#PriorityMaxAge=1-0</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><b>Chad Julius</b></p>
<p class="MsoNormal">Cyberinfrastructure Engineer
Specialist</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><b>Division of Technology &
Security</b></p>
<p class="MsoNormal">SOHO 207, Box 2231</p>
<p class="MsoNormal">Brookings, SD 57007</p>
<p class="MsoNormal">Phone: 605-688-5767</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><a href="http://www.sdstate.edu/"
target="_blank" moz-do-not-send="true"><span
style="color:rgb(5,99,193)">www.sdstate.edu</span></a></p>
<p class="MsoNormal"><img style="width: 2.6041in;
height: 0.75in;"
id="gmail-m_-4707203304347826915Picture_x0020_1"
src="cid:part3.A7F75314.0340623B@cfa.harvard.edu"
alt="cid:image007.png@01D24AF4.6CEECA30" class=""
width="250" height="72" border="0"></p>
<p class="MsoNormal"> </p>
</div>
</blockquote>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr" class="gmail_signature">Kilian</div>
</blockquote>
</body>
</html>