<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>I don't know offhand.  You can sort of construct a similar
      system in Slurm, but I've never seen it as a native option.</p>
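    <p>The closest thing I can think of (not a native feature, and the
      account name and numbers below are purely illustrative) is to turn
      off usage decay and put a hard TRES-minutes allocation on each
      account that resets every quarter:</p>
    <pre># slurm.conf: no usage decay, so spent minutes stay spent until the reset
# (assumes PriorityType=priority/multifactor and AccountingStorageEnforce=limits)
PriorityDecayHalfLife=0
PriorityUsageResetPeriod=QUARTERLY

# sacctmgr: give the hypothetical "labfoo" account a 100,000 cpu-minute
# allocation; once it is spent, no new jobs from that account are scheduled
# until the quarterly reset
sacctmgr modify account where name=labfoo set GrpTRESMins=cpu=100000</pre>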
    <p>-Paul Edmon-<br>
    </p>
    <div class="moz-cite-prefix">On 6/20/19 10:32 AM, John Hearns wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAPqNE2U16u_mLF4sa=hNUFfaBch4kyhOb+=bznOxr63Lg=01Tw@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div>Paul, you refer to banking resources. Which leads me to
          ask: are schemes such as Gold still used with Slurm these
          days?</div>
        <div>Gold was a utility where groups could top up with a virtual
          amount of money, which was then spent as they consumed
          resources.</div>
        <div>Altair also wrote a similar system for PBS, which they
          offered to us when I was in Formula 1. It was quite a good
          system, and at the time</div>
        <div>we had a requirement for allocating resources to groups of
          users.</div>
        <div><br>
        </div>
        <div>I guess the sophisticated fairshare mechanisms discussed in
          this thread make schemes like Gold obsolete.</div>
      </div>
      <br>
      <div class="gmail_quote">
        <div class="gmail_attr" dir="ltr">On Thu, 20 Jun 2019 at 15:24,
          Paul Edmon <<a href="mailto:pedmon@cfa.harvard.edu"
            moz-do-not-send="true">pedmon@cfa.harvard.edu</a>> wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">
          <div bgcolor="#FFFFFF">
            <p>People will specify which partition they need or if they
              want multiple they use this:<br>
            </p>
            <p>#SBATCH -p general,shared,serial_requeue</p>
            <p>The scheduler will then simply select whichever partition
              the job can run in first.  Naturally, there is a risk that
              you will end up running in a more expensive partition.</p>
            <p>Our time limit is only applied to our public partitions;
              our owned partitions (of which we have roughly 80) have no
              time limit, so jobs on dedicated resources pay no penalty.
              We've been working on getting rid of owned partitions and
              moving to school/department-based partitions, where all
              the purchased resources for different PIs go into the same
              bucket and the PIs compete among themselves rather than
              with the wider community.  We've found that this works
              pretty well, as most PIs only use their purchased
              resources sporadically.  Thus there are usually idle cores
              lying around that we backfill with our serial queues.
              Since those jobs are requeueable, we can reclaim that idle
              space immediately.  We are also toying with a
              high-priority partition open to people with high
              fairshare, so that they can get an immediate response;
              those with high fairshare tend to be bursty users.</p>
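            <p>A minimal slurm.conf sketch of that kind of requeue-based
              backfill (partition and node names here are only
              illustrative, not our actual config):</p>
            <pre># slurm.conf: owned partitions sit in a higher PriorityTier and may
# preempt the backfill partition by requeueing its jobs
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

PartitionName=lab_owned      Nodes=holy[0001-0016] PriorityTier=10 MaxTime=INFINITE
PartitionName=serial_requeue Nodes=holy[0001-0016] PriorityTier=1  MaxTime=7-00:00:00</pre>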
            <p>Our current halflife is set to a month and we keep 6
              months of data in our database.  I'd actually like to get
              rid of the halflife and just go to a 3-month moving window
              to allow people to bank their fairshare, but we haven't
              done that yet, as people have been having a hard enough
              time understanding our current system.  That's not due to
              its complexity; it's more that most people just flat out
              aren't cognizant of their usage and think the resource is
              functionally infinite.</p>
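            <p>In configuration terms that corresponds roughly to the
              following (parameter values illustrative):</p>
            <pre># slurm.conf: one-month fairshare usage half-life
PriorityType=priority/multifactor
PriorityDecayHalfLife=30-0

# slurmdbd.conf: keep about six months of accounting data
PurgeJobAfter=6month
PurgeStepAfter=6month
PurgeUsageAfter=6month</pre>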
            <p>-Paul Edmon-<br>
            </p>
            <div class="gmail-m_-8899424503205039134moz-cite-prefix">On
              6/19/19 5:16 PM, Fulcomer, Samuel wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">
                <div dir="ltr">Hi Paul,
                  <div><br>
                  </div>
                  <div>Thanks. Your setup is interesting. I see that you
                    have your processor types segregated in their own
                    partitions (with the exception of the requeue
                    partition), and that's how you get at the weighting
                    mechanism. Do you have your users explicitly specify
                    multiple partitions in the batch commands/scripts in
                    order to take advantage of this, or do you use a
                    plugin for it?</div>
                  <div><br>
                  </div>
                  <div>It sounds like you don't impose any hard limit on
                    simultaneous resource use, and allow everything to
                    fairshare out with the help of the 7-day TimeLimit.
                    We haven't been imposing any TimeLimit on our condo
                    users, which would be an issue for us with your
                    config. For our exploratory and priority users, we
                    impose an effective time limit with
                    GrpTRESRunMins=cpu (and gres/gpu= for the GPU
                    usage). In addition, since we have so many priority
                    users, we don't explicitly set a rawshare value for
                    them (they all execute under the "default" account).
                    We set rawshare for the condo accounts as
                    cores-purchased/total-cores*1000. </div>
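                  <div><br>
                  </div>
                  <div>(Roughly, in sacctmgr terms; the QoS/account names
                    and limit values here are placeholders:)</div>
                  <pre># per-user priority QoS: cap cpu- and gpu-minutes of currently running work
# (assumes gres/gpu is a tracked TRES)
sacctmgr modify qos where name=pri-jsmith set GrpTRESRunMins=cpu=1209600,gres/gpu=40320

# condo rawshare = cores_purchased / total_cores * 1000,
# e.g. 512 of 16384 cores -> 512/16384*1000 = 31 (rounded)
sacctmgr modify account where name=condo-doe set fairshare=31</pre>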
                  <div><br>
                  </div>
                  <div>What's your fairshare decay setting (don't
                    remember the proper name at the moment)?</div>
                  <div><br>
                  </div>
                  <div>Regards,</div>
                  <div>Sam</div>
                  <div><br>
                  </div>
                  <div><br>
                  </div>
                </div>
                <br>
                <div class="gmail_quote">
                  <div class="gmail_attr" dir="ltr">On Wed, Jun 19, 2019
                    at 3:44 PM Paul Edmon <<a
                      href="mailto:pedmon@cfa.harvard.edu"
                      target="_blank" moz-do-not-send="true">pedmon@cfa.harvard.edu</a>>
                    wrote:<br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px
                    0px
0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">
                    <div bgcolor="#FFFFFF">
                      <p>We do a similar thing here at Harvard:</p>
                      <p><a
class="gmail-m_-8899424503205039134gmail-m_8457408054565706666moz-txt-link-freetext"
href="https://www.rc.fas.harvard.edu/fairshare/" target="_blank"
                          moz-do-not-send="true">https://www.rc.fas.harvard.edu/fairshare/</a></p>
                      <p>We simply weight all the partitions based on
                        their core type and then allocate Shares for
                        each account based on what they have purchased.
                        We don't use QoS at all; we rely purely on
                        fairshare weighting for resource usage.  It has
                        worked pretty well for our purposes.</p>
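                      <p>Schematically, and with made-up partition and
                        account names and weights, that looks like:</p>
                      <pre># slurm.conf: bill newer cores at a higher rate so fairshare usage
# reflects the hardware actually consumed
PartitionName=intel_sandy   Nodes=sandy[001-064]   TRESBillingWeights="CPU=1.0"
PartitionName=intel_cascade Nodes=cascade[001-032] TRESBillingWeights="CPU=2.0"

# sacctmgr: Shares proportional to what each group purchased
sacctmgr modify account where name=lab_smith set fairshare=256</pre>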
                      <p>-Paul Edmon-<br>
                      </p>
                      <div
class="gmail-m_-8899424503205039134gmail-m_8457408054565706666moz-cite-prefix">On
                        6/19/19 3:30 PM, Fulcomer, Samuel wrote:<br>
                      </div>
                      <blockquote type="cite">
                        <div dir="ltr"><br>
                          <div>(...and yes, the name is inspired by a
                            certain OEM's software licensing schemes...)</div>
                          <div><br>
                          </div>
                          <div>At Brown we run a ~400-node cluster
                            containing nodes of multiple architectures
                            (Sandy/Ivy, Haswell/Broadwell, and
                            Sky/Cascade), purchased in some cases with
                            University funds and in others with
                            investigator funding (~50:50).  They all
                            appear in the default SLURM partition. We
                            have 3 classes of SLURM users:</div>
                          <div><br>
                          </div>
                          <div>
                            <ol>
                              <li>Exploratory - no-charge access to up
                                to 16 cores</li>
                              <li>Priority - $750/quarter for access to
                                up to 192 cores (and with a
                                GrpTRESRunMins=cpu limit). Each user has
                                their own QoS</li>
                              <li>Condo - an investigator group who paid
                                for nodes added to the cluster. The
                                group has its own QoS and SLURM Account.
                                The QoS allows use of the number of
                                cores purchased and has a much higher
                                priority than the QoSes of the "priority"
                                users. (A rough sacctmgr sketch of these
                                three classes follows right after this
                                list.)</li>
                            </ol>
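                            <div>(A rough sacctmgr sketch of these three
                              classes; the names and limit values are
                              placeholders, not our production settings:)</div>
                            <pre># 1. Exploratory: no charge, at most 16 cores in use at once
sacctmgr add qos exploratory
sacctmgr modify qos where name=exploratory set GrpTRES=cpu=16

# 2. Priority: one QoS per user, 192-core cap plus a running cpu-minutes limit
sacctmgr add qos pri-jsmith
sacctmgr modify qos where name=pri-jsmith set GrpTRES=cpu=192 GrpTRESRunMins=cpu=1000000

# 3. Condo: per-group account and QoS sized to the purchased cores, with a
#    much higher priority than the priority-user QoSes
sacctmgr add account condo-doe
sacctmgr add qos condo-doe
sacctmgr modify qos where name=condo-doe set GrpTRES=cpu=256 Priority=100000</pre>
                            <div><br>
                            </div>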
                            <div>The first problem with this scheme is
                              that condo users who have purchased the
                              older hardware now have access to the
                              newest without penalty. In addition, we're
                              encountering resistance to the idea of
                              turning off their hardware and terminating
                              their condos (despite MOUs stating a 5yr
                              life). The pushback is the stated belief
                              that the hardware should run until it
                              dies.</div>
                          </div>
                          <div><br>
                          </div>
                          <div>What I propose is a new TRES called a
                            Processor Performance Unit (PPU) that would
                            be specified on the Node line in slurm.conf,
                            and used such that GrpTRES=ppu=N would be
                            calculated as the number of allocated cores
                            multiplied by their associated PPU values.</div>
                          <div><br>
                          </div>
                          <div>We could then assign a base PPU to the
                            oldest hardware, say "1" for Sandy/Ivy, and
                            increase it for later architectures based on
                            their performance improvement. We'd set the
                            condo QoS to GrpTRES=ppu=N*X+M*Y,..., where
                            N is the number of cores of the oldest
                            architecture and X is its configured PPU per
                            core, with further terms for any newer
                            nodes/cores the investigator has purchased
                            since.</div>
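                          <div><br>
                          </div>
                          <div>To make the arithmetic concrete, a sketch
                            of what that could look like (this is
                            proposed syntax that does not exist in Slurm
                            today; node names and PPU values are made
                            up):</div>
                          <pre># slurm.conf (proposed, non-existent syntax): a PPU weight per Node line
NodeName=sandy[001-016]   CPUs=16 PPU=1.0
NodeName=cascade[001-004] CPUs=32 PPU=2.0

# condo QoS: 256 Sandy cores * 1.0 + 128 Cascade cores * 2.0 = 512 PPU
sacctmgr modify qos where name=condo-doe set GrpTRES=ppu=512</pre>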
                          <div><br>
                          </div>
                          <div>The result is that the investigator group
                            gets to run on an approximation of the
                            performance they've purchased, rather than
                            on the raw purchased core count.</div>
                          <div><br>
                          </div>
                          <div>Thoughts?</div>
                          <div><br>
                          </div>
                          <div><br>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </blockquote>
                </div>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </body>
</html>