I suspect your setting for "MaxJobCount" is too low. From the slurm.conf man page:
MaxJobCount
    The maximum number of jobs SLURM can have in its active database
    at one time. Set the values of MaxJobCount and MinJobAge to
    ensure the slurmctld daemon does not exhaust its memory or other
    resources. Once this limit is reached, requests to submit
    additional jobs will fail. The default value is 5000 jobs. This
    value may not be reset via "scontrol reconfig". It only takes
    effect upon restart of the slurmctld daemon. May not exceed
    65533.
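A quick way to check whether you are bumping into it (standard
commands; exact output formatting varies a bit by Slurm version):

    # the limit the controller is actually running with
    scontrol show config | grep -i maxjobcount

    # how many jobs are currently in the queue
    squeue -h | wc -l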
So if you already have 5000 jobs (the default limit) in the active
database, the remaining jobs aren't even looked at.
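If that is what's happening, the fix is a slurm.conf edit plus a
controller restart. A minimal sketch (the value here is illustrative,
not a recommendation; 65533 is the hard cap):

    # slurm.conf
    MaxJobCount=65000   # up from the 5000 default

    # "scontrol reconfig" will NOT apply this; restart the daemon,
    # e.g. if it runs under systemd:
    systemctl restart slurmctld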
Brian Andrus
On 5/12/2022 7:34 AM, David Henkemeyer wrote:
> Question for the braintrust:
>
> I have 3 partitions:
>
>   * Partition A_highpri: 80 nodes
>   * Partition A_lowpri: same 80 nodes
>   * Partition B_lowpri: 10 different nodes
>
> There is no overlap between A and B partitions.
>
> Here is what I'm observing: if I fill the queue with ~20-30k jobs
> for partition A_highpri and several thousand for partition A_lowpri,
> then a bit later submit jobs to partition B_lowpri, the partition B
> jobs *are queued rather than running right away, with a pending
> reason of "Priority"*, which doesn't seem right to me. Yes, there
> are higher-priority jobs pending in the queue (the jobs bound for
> A_highpri), but there aren't any higher-priority jobs pending *for
> the same partition* as the partition B jobs, so in theory the
> partition B jobs should not be held up. Eventually the scheduler
> gets around to scheduling them, but it seems to take a while for the
> scheduler (which is probably quite busy handling job starts, job
> stops, etc.) to figure this out.
>
> If I submit fewer jobs to the A partitions (~3k), the scheduler
> starts the partition B jobs much faster, as expected. As I increase
> beyond 3k, the partition B jobs get held up longer and longer.
>
> I can raise the priority on partition B, and that does solve the
> problem, but I don't want those jobs to impact the partition
> A_lowpri jobs. In fact, *I don't want any cross-partition
> influence*.
>
> I'm hoping there is a Slurm parameter I can tweak to make Slurm
> recognize that these partition B jobs should never have a pending
> reason of "Priority", or to treat these as 2 separate queues, or
> something like that. Spinning up a 2nd Slurm controller is not
> ideal for us (unless there is a lightweight way to do it).
>
> Thanks
> David