<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>In this case, I would run LINPACK on each generation of node
(either the full node or just one core) and then normalize the
results. I would recommend using the performance of a single core
of the slowest node as your basis for normalization, so that it
has a multiplier of 1 and the newer systems have multipliers
greater than 1. You can then multiply each per-core multiplier by
the number of cores in the corresponding system to get a final
multiplier for a whole node, if needed. <br>
</p>
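<p>For example, with illustrative HPL numbers (made up here, not
measured):</p>
<pre>Per-core HPL Rmax (illustrative):
  Sandy Bridge : 10 GFLOPS/core  ->  multiplier 10/10 = 1.0
  Broadwell    : 16 GFLOPS/core  ->  multiplier 16/10 = 1.6
  Skylake      : 24 GFLOPS/core  ->  multiplier 24/10 = 2.4

Whole-node multiplier = per-core multiplier x cores/node,
e.g. a 32-core Skylake node: 2.4 x 32 = 76.8</pre>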
<pre class="moz-signature" cols="72">Prentice </pre>
<div class="moz-cite-prefix">On 6/19/19 3:30 PM, Fulcomer, Samuel
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAOORAuFBURpBz-rSiMpKyw8uh+=Wsbi7nsMKoZYp25z9hO0YwQ@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr"><br>
<div>(...and yes, the name is inspired by a certain OEM's
software licensing schemes...)</div>
<div><br>
</div>
<div>At Brown we run a ~400-node cluster containing nodes of
multiple architectures (Sandy/Ivy, Haswell/Broadwell, and
Sky/Cascade), purchased in some cases with University funds and
in others with investigator funding (~50:50). They all appear
in the default SLURM partition. We have three classes of SLURM
users:</div>
<div><br>
</div>
<div>
<ol>
<li>Exploratory - no-charge access to up to 16 cores</li>
<li>Priority - $750/quarter for access to up to 192 cores
(and with a GrpTRESRunMins=cpu limit). Each user has their
own QoS</li>
<li>Condo - an investigator group that paid for nodes added
to the cluster. The group has its own QoS and SLURM
Account. The QoS allows use of the number of cores
purchased and has a much higher priority than the QoSes of
the "priority" users. (A sketch of these QoS definitions
follows the list.)</li>
</ol>
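<div><br>
</div>
<div>A sketch of how the per-user "priority" and condo QoS might be
created with sacctmgr (the names and limit values here are
hypothetical):</div>
<pre># per-user priority QoS: core cap plus a cpu-minutes-in-flight cap
sacctmgr add qos prio_jsmith
sacctmgr modify qos prio_jsmith set GrpTRES=cpu=192 \
    GrpTRESRunMins=cpu=1000000 Priority=10

# condo QoS sized to the purchased core count, much higher priority
sacctmgr add qos condo_doe
sacctmgr modify qos condo_doe set GrpTRES=cpu=256 Priority=100</pre>
<div><br>
</div>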
<div>The first problem with this scheme is that condo users
who have purchased the older hardware now have access to the
newest without penalty. In addition, we're encountering
resistance to the idea of turning off their hardware and
terminating their condos (despite MOUs stating a 5yr life).
The pushback is the stated belief that the hardware should
run until it dies.</div>
</div>
<div><br>
</div>
<div>What I propose is a new TRES called a Processor Performance
Unit (PPU) that would be specified on the Node line in
slurm.conf, and used such that GrpTRES=ppu=N is calculated as
the number of allocated cores multiplied by their associated
per-core PPU values.</div>
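<div><br>
</div>
<div>In slurm.conf this might look like the following (hypothetical
syntax; PPU does not exist today, and the node names and values
are made up):</div>
<pre># hypothetical slurm.conf lines for the proposed per-core PPU weight
NodeName=sandy[001-064]  CPUs=16 PPU=1.0
NodeName=broad[001-032]  CPUs=24 PPU=1.6
NodeName=sky[001-032]    CPUs=32 PPU=2.4</pre>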
<div><br>
</div>
<div>We could then assign a base PPU to the oldest hardware,
say, "1" for Sandy/Ivy, and increase it for later architectures
in proportion to their performance improvement. We'd set the
condo QoS to GrpTRES=ppu=N*X+M*Y+..., where N is the number of
cores of the oldest architecture the group purchased and X is
the configured PPU/core for that architecture, repeating (M
cores at Y PPU/core, and so on) for any newer nodes/cores the
investigator has purchased since.</div>
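<div><br>
</div>
<div>As a made-up worked example: a condo that bought 4 Sandy nodes
(16 cores each, PPU 1.0) and later 2 Skylake nodes (32 cores
each, PPU 2.4) would get:</div>
<pre>GrpTRES=ppu = 64*1.0 + 64*2.4 = 64 + 153.6 = 217.6</pre>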
<div><br>
</div>
<div>The result is that the investigator group gets to run on an
approximation of the performance they've purchased, rather
than on the raw purchased core count.</div>
<div><br>
</div>
<div>Thoughts?</div>
</div>
</blockquote>
</body>
</html>