<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>We do a similar thing here at Harvard:</p>
<p><a class="moz-txt-link-freetext" href="https://www.rc.fas.harvard.edu/fairshare/">https://www.rc.fas.harvard.edu/fairshare/</a></p>
<p>We simply weight all the partitions based on their core type and
then we allocate Shares for each account based on what they have
purchased. We don't use QoS at all, so we just rely purely on
fairshare weighting for resource usage. It has worked pretty well
for our purposes.</p>
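<p>For illustration only, a minimal sketch of that kind of setup
(partition names, billing weights, account names, and Share values
below are hypothetical, not our actual configuration):</p>
<pre>
# slurm.conf: bill CPUs more heavily on partitions with newer core types
PartitionName=sandy     Nodes=sandy[001-100] TRESBillingWeights="CPU=1.0"
PartitionName=broadwell Nodes=bdw[001-100]   TRESBillingWeights="CPU=1.5"
PartitionName=skylake   Nodes=sky[001-100]   TRESBillingWeights="CPU=2.0"

# sacctmgr: give each account fairshare Shares in proportion to its purchase
sacctmgr modify account name=lab_a set fairshare=100
sacctmgr modify account name=lab_b set fairshare=250
</pre>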
<p>-Paul Edmon-<br>
</p>
<div class="moz-cite-prefix">On 6/19/19 3:30 PM, Fulcomer, Samuel
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAOORAuFBURpBz-rSiMpKyw8uh+=Wsbi7nsMKoZYp25z9hO0YwQ@mail.gmail.com">
<div dir="ltr"><br>
<div>(...and yes, the name is inspired by a certain OEM's
software licensing schemes...)</div>
<div><br>
</div>
<div>At Brown we run a ~400-node cluster containing nodes of
multiple architectures (Sandy/Ivy, Haswell/Broadwell, and
Sky/Cascade), purchased in some cases with University funds and
in others with investigator funding (~50:50). They all appear
in the default SLURM partition. We have 3 classes of SLURM
users:</div>
<div><br>
</div>
<div>
<ol>
<li>Exploratory - no-charge access to up to 16 cores</li>
<li>Priority - $750/quarter for access to up to 192 cores
(and with a GrpTRESRunMins=cpu limit). Each user has their
own QoS.</li>
<li>Condo - an investigator group who paid for nodes added
to the cluster. The group has its own QoS and SLURM
Account. The QoS allows use of the number of cores
purchased and has a much higher priority than the QoSes of
the "priority" users (see the sketch after this list).</li>
</ol>
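<div>(Illustrative sketch of the QoS setup described in the list above;
the QoS names, the condo core count, and the GrpTRESRunMins value
are made up, but the options are standard sacctmgr/QoS settings.)</div>
<pre>
# 1. Exploratory: no charge, capped at 16 cores per user
sacctmgr add qos exploratory
sacctmgr modify qos exploratory set MaxTRESPerUser=cpu=16

# 2. Priority: one QoS per user, up to 192 cores plus a cpu-minutes cap
sacctmgr add qos priority_jsmith
sacctmgr modify qos priority_jsmith set GrpTRES=cpu=192 GrpTRESRunMins=cpu=1000000

# 3. Condo: one QoS per group, sized to the purchased cores, much higher priority
sacctmgr add qos condo_doelab
sacctmgr modify qos condo_doelab set GrpTRES=cpu=256 Priority=10000
</pre>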
<div>The first problem with this scheme is that condo users
who purchased the older hardware now have access to the
newest hardware without penalty. In addition, we're encountering
resistance to the idea of turning off their hardware and
terminating their condos (despite MOUs stating a 5-year life);
the pushback is the stated belief that the hardware should
run until it dies.</div>
</div>
<div><br>
</div>
<div>What I propose is a new TRES called a Processor Performance
Unit (PPU). It would be specified on the Node line in
slurm.conf, and usage counted against a GrpTRES=ppu=N limit
would be calculated as the number of allocated cores multiplied
by their associated per-core PPU values.</div>
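<div><br>
</div>
<div>A sketch of what that might look like (proposed syntax only - nothing
here is supported by Slurm today, and the PPU values are invented):</div>
<pre>
# slurm.conf (hypothetical): per-core PPU declared on each Node line
NodeName=sandy[001-064] CPUs=16 PPU=1.0   # oldest architecture = baseline
NodeName=sky[001-032]   CPUs=24 PPU=1.8   # newer cores count for more

# A job allocating 24 Skylake cores would then consume 24 * 1.8 = 43.2 ppu
# against its QoS's GrpTRES=ppu limit.
</pre>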
<div><br>
</div>
<div>We could then assign a base PPU to the oldest hardware,
say "1" for Sandy/Ivy, and increase it for later architectures
based on their performance improvement. We'd set the condo QoS to
GrpTRES=ppu=N*X+M*Y+..., where N is the number of cores of the
oldest architecture the group purchased, X is that architecture's
configured PPU per core, and each additional M*Y term covers any
newer nodes/cores the investigator has purchased since.</div>
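<div><br>
</div>
<div>A worked example of that arithmetic with made-up numbers (core
counts, PPU values, and the QoS name are hypothetical):</div>
<pre>
# Group bought 128 Sandy Bridge cores (PPU 1.0), then 64 Skylake cores (PPU 1.8):
#   N*X = 128 * 1.0 = 128.0 ppu
#   M*Y =  64 * 1.8 = 115.2 ppu
#   condo limit     = 243.2 ppu  (round as preferred)
# With the proposed TRES, the condo QoS limit would become something like:
sacctmgr modify qos condo_doelab set GrpTRES=ppu=243
</pre>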
<div><br>
</div>
<div>The result is that the investigator group gets to run on an
approximation of the performance that they've purchased,
rather than on the raw purchased core count.</div>
<div><br>
</div>
<div>Thoughts?</div>
<div><br>
</div>
<div><br>
</div>
</div>
</blockquote>
</body>
</html>