<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <br>

    If you haven't looked at the man page for slurm.conf, it will answer

    most, if not all, of your questions: <br>

    <a class="moz-txt-link-freetext" href="https://slurm.schedmd.com/slurm.conf.html">https://slurm.schedmd.com/slurm.conf.html</a>. I would rely on

    the man page that was distributed with the version you have

    installed, though, as options do change between releases.<br>

    <br>

    There is a ton of information in it, and it is tedious to get through, but

    reading it more than once opens many doors.<br>

    <br>

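    DefaultTime is listed there as a partition option; it sets the time

    limit for jobs that do not request one. <br>

    If you are scheduling gres/gpu resources, it's quite possible that

    cores are free on a node while no corresponding GPUs are available,

    in which case jobs requesting GPUs will stay pending.<br>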
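    <br>
    As a rough sketch of the DefaultTime point (this is not taken from your
    attached slurm.conf; the node list and both time values are placeholders
    I made up for illustration), a partition line could look something like
    this:<br>
    <pre>
# slurm.conf fragment: illustrative values only, adjust to your site.
# DefaultTime applies to jobs submitted without a time limit;
# if it is not set, such jobs get the partition's MaxTime.
PartitionName=GPUsmall Nodes=node[18-19] DefaultTime=01:00:00 MaxTime=2-00:00:00 State=UP
</pre>
    With finite limits like these, sched/backfill can estimate when running
    jobs will end and fit shorter jobs into idle resources. Please check the
    option names against the man page for your installed version before
    copying anything.<br>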

    <br>

    -b<br>

    <br>

    <div class="moz-cite-prefix">On 4/24/20 2:49 PM, navin srivastava

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAK8-jZAVMt8tNDGYWYssL0UkhcQ7TyuV5k0Lyh+Qd8bRx3KHRg@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <div dir="auto">Thanks Brian.

        <div dir="auto"><br>

        </div>

        <div dir="auto">I need to check the job order.<br>

          <div dir="auto"><br>

          </div>

          <div dir="auto">Is there any way to define a default time

            limit for a job if the user does not specify one?</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">Also, what does fairtree mean in the

            priority settings in the slurm.conf file?</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">The sets of nodes are different in each

            partition. FIFO does not care about partitioning.</div>

          <div dir="auto">Is it strict ordering, meaning the job that

            arrived first goes first, and until it runs no other job is allowed to start?</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">Also, the priority is high for the GPUsmall

            partition and low for normal jobs. The nodes of the normal

            partition are full, but cores on the GPUsmall nodes are available.</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">Regards<br>

          </div>

          <div dir="auto">Navin</div>

        </div>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Fri, Apr 24, 2020, 23:49

          Brian W. Johanson &lt;<a href="mailto:bjohanso@psc.edu"

            moz-do-not-send="true">bjohanso@psc.edu</a>&gt; wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">

          <div> <tt>Without seeing the jobs in your queue, I would

              expect the next job in FIFO order to be too large to fit

              in the current idle resources. <br>

              <br>

              Configure it to use the backfill scheduler: </tt><tt><tt>SchedulerType=sched/backfill<br>

                <br>

              </tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; SchedulerType<br>
              &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Identifies the type of scheduler to be
              used. Note the slurmctld daemon must be restarted for a
              change in scheduler type to become effective
              (reconfiguring a running daemon has no effect for this
              parameter). The scontrol command can be used to manually
              change job priorities if desired. Acceptable values
              include:<br>
              <br>
              &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; sched/backfill<br>
              &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; For a backfill scheduling module to
              augment the default FIFO scheduling. Backfill scheduling
              will initiate lower-priority jobs if doing so does not
              delay the expected initiation time of any higher
              priority job. Effectiveness of backfill scheduling is
              dependent upon users specifying job time limits, otherwise
              all jobs will have the same time limit and backfilling is
              impossible. Note documentation for the
              SchedulerParameters option above. This is the default
              configuration.<br>
              <br>
              &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; sched/builtin<br>
              &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; This is the FIFO scheduler which
              initiates jobs in priority order. If any job in the
              partition can not be scheduled, no lower priority job in
              that partition will be scheduled. An exception is made
              for jobs that can not run due to partition constraints
              (e.g. the time limit) or down/drained nodes. In that
              case, lower priority jobs can be initiated and not impact
              the higher priority job.<br>

              <br>

              <br>

              <br>

              Your partitions are set with MaxTime=INFINITE; if your

              users are not specifying a reasonable time limit for their

              jobs, backfill won't help either.<br>

              <br>

              <br>

              -b<br>

              <br>

            </tt><br>

            <div>On 4/24/20 1:52 PM, navin srivastava wrote:<br>

            </div>

            <blockquote type="cite">

              <div dir="ltr">In addition to the above, when I look at the

                sprio output for both jobs, it says:

                <div><br>

                </div>

                <div>For the normal queue, all jobs show the same

                  priority:</div>

                <div><br>

                </div>

                <div>
<pre>
          JOBID PARTITION   PRIORITY  FAIRSHARE
        1291352 normal         15789      15789
</pre>
                </div>

                <div><br>

                </div>

                <div>For GPUsmall, all jobs show the same priority:</div>

                <div><br>

                </div>

                <div>
<pre>
          JOBID PARTITION   PRIORITY  FAIRSHARE
        1291339 GPUsmall       21052      21053
</pre>
                </div>

              </div>

              <br>

              <div class="gmail_quote">

                <div dir="ltr" class="gmail_attr">On Fri, Apr 24, 2020

                  at 11:14 PM navin srivastava &lt;<a

                    href="mailto:navin.altair@gmail.com" target="_blank"

                    rel="noreferrer" moz-do-not-send="true">navin.altair@gmail.com</a>&gt;

                  wrote:<br>

                </div>

                <blockquote class="gmail_quote" style="margin:0px 0px

                  0px 0.8ex;border-left:1px solid

                  rgb(204,204,204);padding-left:1ex">

                  <div dir="ltr">Hi Team,<br>

                    <div><br>

                    </div>

                    <div>We are facing an issue in our environment.

                      Resources are free, but jobs go into the

                      queued (pending) state and do not run.</div>

                    <div><br>

                    </div>

                    <div>I have attached the slurm.conf file here.</div>

                    <div><br>

                    </div>

                    <div>scenario:-</div>

                    <div><br>

                    </div>

                    <div>There are jobs in only 2 partitions:</div>

                    <div>344 jobs are in PD state in the normal partition;

                      the nodes belonging to the normal partition

                      are full and no more jobs can run.</div>

                    <div><br>

                    </div>

                    <div>1300 jobs in the GPUsmall partition are

                      queued, and enough CPUs are available to execute

                      them, but I see the jobs are not being scheduled on the free

                      nodes.</div>

                    <div><br>

                    </div>

                    <div>There are no pending jobs in any other

                      partition.</div>

                    <div>eg:-</div>

                    <div>node status:- node18</div>

                    <div><br>

                    </div>

                    <div>NodeName=node18 Arch=x86_64 CoresPerSocket=18<br>

                      &nbsp; &nbsp;CPUAlloc=6 CPUErr=0 CPUTot=36 CPULoad=4.07<br>
                      &nbsp; &nbsp;AvailableFeatures=K2200<br>
                      &nbsp; &nbsp;ActiveFeatures=K2200<br>
                      &nbsp; &nbsp;Gres=gpu:2<br>
                      &nbsp; &nbsp;NodeAddr=node18 NodeHostName=node18 Version=17.11<br>
                      &nbsp; &nbsp;OS=Linux 4.4.140-94.42-default #1 SMP Tue Jul 17 07:44:50 UTC 2018 (0b375e4)<br>
                      &nbsp; &nbsp;RealMemory=1 AllocMem=0 FreeMem=79532 Sockets=2 Boards=1<br>
                      &nbsp; &nbsp;State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A<br>
                      &nbsp; &nbsp;Partitions=GPUsmall,pm_shared<br>
                      &nbsp; &nbsp;BootTime=2019-12-10T14:16:37 SlurmdStartTime=2019-12-10T14:24:08<br>
                      &nbsp; &nbsp;CfgTRES=cpu=36,mem=1M,billing=36<br>
                      &nbsp; &nbsp;AllocTRES=cpu=6<br>
                      &nbsp; &nbsp;CapWatts=n/a<br>
                      &nbsp; &nbsp;CurrentWatts=0 LowestJoules=0 ConsumedJoules=0<br>
                      &nbsp; &nbsp;ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s<br>

                    </div>

                    <div><br>

                    </div>

                    <div>node19:-</div>

                    <div><br>

                    </div>

                    <div>NodeName=node19 Arch=x86_64 CoresPerSocket=18<br>

                      &nbsp; &nbsp;CPUAlloc=16 CPUErr=0 CPUTot=36 CPULoad=15.43<br>
                      &nbsp; &nbsp;AvailableFeatures=K2200<br>
                      &nbsp; &nbsp;ActiveFeatures=K2200<br>
                      &nbsp; &nbsp;Gres=gpu:2<br>
                      &nbsp; &nbsp;NodeAddr=node19 NodeHostName=node19 Version=17.11<br>
                      &nbsp; &nbsp;OS=Linux 4.12.14-94.41-default #1 SMP Wed Oct 31 12:25:04 UTC 2018 (3090901)<br>
                      &nbsp; &nbsp;RealMemory=1 AllocMem=0 FreeMem=63998 Sockets=2 Boards=1<br>
                      &nbsp; &nbsp;State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A<br>
                      &nbsp; &nbsp;Partitions=GPUsmall,pm_shared<br>
                      &nbsp; &nbsp;BootTime=2020-03-12T06:51:54 SlurmdStartTime=2020-03-12T06:53:14<br>
                      &nbsp; &nbsp;CfgTRES=cpu=36,mem=1M,billing=36<br>
                      &nbsp; &nbsp;AllocTRES=cpu=16<br>
                      &nbsp; &nbsp;CapWatts=n/a<br>
                      &nbsp; &nbsp;CurrentWatts=0 LowestJoules=0 ConsumedJoules=0<br>
                      &nbsp; &nbsp;ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s<br>

                    </div>

                    <div><br>

                    </div>

                    <div>Could you please help me understand what

                      the reason could be?</div>

                    <div><br>

                    </div>


                  </div>

                </blockquote>

              </div>

            </blockquote>

            <br>

          </div>

        </blockquote>

      </div>

    </blockquote>

    <br>

  </body>

</html>