<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>I don't know offhand. You can sort of construct a similar
system in Slurm, but I've never seen it as a native option.</p>
<p>-Paul Edmon-<br>
</p>
<div class="moz-cite-prefix">On 6/20/19 10:32 AM, John Hearns wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAPqNE2U16u_mLF4sa=hNUFfaBch4kyhOb+=bznOxr63Lg=01Tw@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div>Paul, you refer to banking resources, which leads me to
ask: are schemes such as Gold still used with Slurm these days?</div>
<div>Gold was a utility where groups could top up a virtual
balance of money, which was then spent as they consumed
resources.</div>
<div>Altair also wrote a similar system for PBS, which they
offered to us when I was in Formula 1. It was quite a good
system, and at the time we had a requirement for allocating
resources to groups of users.</div>
<div><br>
</div>
<div>I guess the sophisticated fairshare mechanisms discussed in
this thread make schemes like Gold obsolete.</div>
</div>
<br>
<div class="gmail_quote">
<div class="gmail_attr" dir="ltr">On Thu, 20 Jun 2019 at 15:24,
Paul Edmon <<a href="mailto:pedmon@cfa.harvard.edu"
moz-do-not-send="true">pedmon@cfa.harvard.edu</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">
<div bgcolor="#FFFFFF">
<p>People will specify which partition they need, or if they
want multiple they use this:<br>
</p>
<p>#SBATCH -p general,shared,serial_requeue</p>
<p>The scheduler will then simply start the job in whichever of
those partitions it can run in first. Naturally there is a risk
that you will end up running in a more expensive partition.</p>
<p>Our time limit is only applied to our public partitions;
our owned partitions (of which we have roughly 80) have no
time limit, so jobs on dedicated resources incur no penalty.
We've been working on getting rid of owned partitions and
moving to school/department-based partitions, where all the
purchased resources for different PIs go into the same bucket
and the PIs compete against each other rather than against the
wider community. We've found that this ends up working pretty
well, as most PIs only use their purchased resources
sporadically. Thus there are usually idle cores lying around
that we backfill with our serial queues; since those jobs are
requeueable, we can give the owners immediate access to that
space when they need it back. We are also toying with a
high-priority partition open only to people with high
fairshare, so that they can get an immediate response, as
those with high fairshare tend to be bursty users.</p>
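<p>As a rough illustration of that public/owned split (the
partition names, node ranges, and limits below are invented for
the example, not our actual config):</p>
<pre>
# Public partition: 7-day limit, backfilled by requeueable serial jobs
PartitionName=general        Nodes=pub[001-400]  MaxTime=7-00:00:00 State=UP
PartitionName=serial_requeue Nodes=ALL           MaxTime=7-00:00:00 PriorityTier=1 PreemptMode=REQUEUE State=UP

# Owned/PI partition: no time limit, restricted to the owning group
PartitionName=pi_lab         Nodes=own[001-020]  MaxTime=UNLIMITED AllowGroups=pi_lab State=UP
</pre>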
<p>Our current half-life is set to a month, and we keep 6
months of data in our database. I'd actually like to get
rid of the half-life and just go to a 3-month moving window
to allow people to bank their fairshare, but we haven't
done that yet as people have been having a hard enough
time understanding our current system. That's not due to
its complexity; it's more that most people just flat out
aren't cognizant of their usage and think the resource is
functionally infinite.</p>
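<p>(For reference, that half-life is the PriorityDecayHalfLife
parameter in slurm.conf; a one-month setting would look roughly
like this:)</p>
<pre>
# Decay recorded usage with a 30-day half-life
PriorityDecayHalfLife=30-0
# How long usage history is kept is governed separately,
# e.g. by PurgeUsageAfter in slurmdbd.conf
</pre>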
<p>-Paul Edmon-<br>
</p>
<div class="gmail-m_-8899424503205039134moz-cite-prefix">On
6/19/19 5:16 PM, Fulcomer, Samuel wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">Hi Paul,
<div><br>
</div>
<div>Thanks. Your setup is interesting. I see that you
have your processor types segregated into their own
partitions (with the exception of the requeue
partition), and that's how you get at the weighting
mechanism. Do you have your users explicitly specify
multiple partitions in their batch commands/scripts in
order to take advantage of this, or do you use a
plugin for it?</div>
<div><br>
</div>
<div>It sounds like you don't impose any hard limit on
simultaneous resource use, and allow everything to
fairshare out with the help of the 7-day TimeLimit.
We haven't been imposing any TimeLimit on our condo
users, which would be an issue for us with your
config. For our exploratory and priority users, we
impose an effective time limit with
GrpTRESRunMins=cpu (and gres/gpu= for GPU
usage). In addition, since we have so many priority
users, we don't explicitly set a rawshare value for
them (they all execute under the "default" account).
We set rawshare for the condo accounts as
cores-purchased/total-cores*1000.</div>
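<div>For reference, limits and shares like these are typically
set with sacctmgr; the QoS/account names and numbers below are
hypothetical:</div>
<pre>
# Effective time limit: cap the sum of (running cores x remaining minutes)
sacctmgr modify qos where name=priority set GrpTRESRunMins=cpu=500000,gres/gpu=20000

# Condo rawshare ~ cores-purchased / total-cores * 1000
# e.g. 384 purchased cores on a 9600-core cluster -> 40
sacctmgr modify account where name=condo_lab set fairshare=40
</pre>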
<div><br>
</div>
<div>What's your fairshare decay setting (don't
remember the proper name at the moment)?</div>
<div><br>
</div>
<div>Regards,</div>
<div>Sam</div>
<div><br>
</div>
<div><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div class="gmail_attr" dir="ltr">On Wed, Jun 19, 2019
at 3:44 PM Paul Edmon <<a
href="mailto:pedmon@cfa.harvard.edu"
target="_blank" moz-do-not-send="true">pedmon@cfa.harvard.edu</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px
0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">
<div bgcolor="#FFFFFF">
<p>We do a similar thing here at Harvard:</p>
<p><a
class="gmail-m_-8899424503205039134gmail-m_8457408054565706666moz-txt-link-freetext"
href="https://www.rc.fas.harvard.edu/fairshare/" target="_blank"
moz-do-not-send="true">https://www.rc.fas.harvard.edu/fairshare/</a></p>
<p>We simply weight all the partitions based on
their core type and then we allocate Shares for
each account based on what they have purchased.
We don't use QoS at all, so we just rely purely
on fairshare weighting for resource usage. It
has worked pretty well for our purposes.</p>
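<p>One way to express that kind of per-partition weighting is
TRESBillingWeights in slurm.conf, with Shares set per account via
sacctmgr; this is only a sketch with made-up names and weights,
not necessarily the exact mechanism or values we use:</p>
<pre>
# slurm.conf: charge usage more heavily on newer core types
PartitionName=sandy     Nodes=sandy[001-040] TRESBillingWeights="CPU=1.0"
PartitionName=broadwell Nodes=bdw[001-040]   TRESBillingWeights="CPU=1.5"
PartitionName=cascade   Nodes=casc[001-040]  TRESBillingWeights="CPU=2.0"

# sacctmgr: Shares proportional to what each account purchased
sacctmgr modify account where name=lab_a set fairshare=128
</pre>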
<p>-Paul Edmon-<br>
</p>
<div
class="gmail-m_-8899424503205039134gmail-m_8457408054565706666moz-cite-prefix">On
6/19/19 3:30 PM, Fulcomer, Samuel wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div>(...and yes, the name is inspired by a
certain OEM's software licensing schemes...)</div>
<div><br>
</div>
<div>At Brown we run a ~400-node cluster
containing nodes of multiple architectures
(Sandy/Ivy, Haswell/Broadwell, and
Sky/Cascade) purchased in some cases by
University funds and in others by
investigator funding (~50:50). They all
appear in the default SLURM partition. We
have three classes of SLURM users:</div>
<div><br>
</div>
<div>
<ol>
<li>Exploratory - no-charge access to up
to 16 cores.</li>
<li>Priority - $750/quarter for access to
up to 192 cores (and with a
GrpTRESRunMins=cpu limit). Each user has
their own QoS.</li>
<li>Condo - an investigator group who paid
for nodes added to the cluster. The
group has its own QoS and SLURM Account.
The QoS allows use of the number of
cores purchased and has a much higher
priority than the QoSes of the "priority"
users (see the sketch after this list).</li>
</ol>
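<div>A rough sacctmgr sketch of that three-class setup (the
names, caps, and priorities are hypothetical):</div>
<pre>
# Exploratory: free tier, capped at 16 cores in use at once
sacctmgr add qos exploratory
sacctmgr modify qos where name=exploratory set GrpTRES=cpu=16 Priority=10

# Priority: one QoS per user, 192-core cap plus a core-minute throttle
sacctmgr add qos prio_usera
sacctmgr modify qos where name=prio_usera set GrpTRES=cpu=192 GrpTRESRunMins=cpu=500000 Priority=100

# Condo: per-group QoS sized to the purchased cores, much higher priority
sacctmgr add qos condo_lab
sacctmgr modify qos where name=condo_lab set GrpTRES=cpu=384 Priority=1000
</pre>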
<div>The first problem with this scheme is
that condo users who have purchased the
older hardware now have access to the
newest without penalty. In addition, we're
encountering resistance to the idea of
turning off their hardware and terminating
their condos (despite MOUs stating a 5-year
life). The pushback is the stated belief
that the hardware should run until it
dies.</div>
</div>
<div><br>
</div>
<div>What I propose is a new TRES called a
Processor Performance Unit (PPU) that would
be specified on the Node line in slurm.conf,
and used such that GrpTRES=ppu=N would be
calculated as the number of allocated cores
multiplied by their associated PPU numbers.</div>
<div><br>
</div>
<div>We could then assign a base PPU to the
oldest hardware, say "1" for Sandy/Ivy, and
increase it for later architectures based on
their performance improvement. We'd set the
condo QoS to GrpTRES=ppu=N*X+M*Y,..., where
N is the number of cores of the oldest
architecture, X is its configured PPU per
core, and the N*X term is repeated for any
newer nodes/cores the investigator has
purchased since.</div>
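<div>To make the arithmetic concrete (hypothetical numbers and
hypothetical syntax, since no such TRES exists in Slurm today):
a group that bought 128 Sandy Bridge cores at PPU 1.0 and later
64 Cascade Lake cores at PPU 1.6 would get
GrpTRES=ppu=128*1.0+64*1.6=230, spendable on whichever
architecture happens to be free.</div>
<pre>
# Hypothetical slurm.conf, if a "ppu" TRES existed
NodeName=sandy[001-008] CPUs=16 ppu=1.0   # Sandy/Ivy
NodeName=casc[001-002]  CPUs=32 ppu=1.6   # Sky/Cascade

# Hypothetical condo QoS limit: 128*1.0 + 64*1.6 = 230.4 ~= 230
sacctmgr modify qos where name=condo_lab set GrpTRES=ppu=230
</pre>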
<div><br>
</div>
<div>The result is that the investigator group
gets to run on an approximation of the
performance that they've purchased, rather
than on the raw purchased core count.</div>
<div><br>
</div>
<div>Thoughts?</div>
<div><br>
</div>
<div><br>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>