[slurm-users] Proposal for new TRES - "Processor Performance Units"....

Alex Chekholko alex at calicolabs.com
Wed Jun 19 19:41:09 UTC 2019

Hey Samuel,

Can't you just adjust the existing "cpu" limit numbers using those same
multipliers?  Someone bought 100 CPUs 5 years ago, now that's ~70 CPUs.

Or vice versa: someone buys 100 CPUs today and gets a setting of 130 CPUs,
because the CPUs are normalized to the old performance. It would
probably look bad politically to reduce someone's number, but giving a new
customer a larger number should be fine.
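The rescaling above is simple arithmetic. Here is a minimal Python sketch of it. The function name and the 0.7 / 1.3 factors are illustrative assumptions taken from the example numbers, not measured performance ratios and not anything Slurm computes for you. The result is what you would then put in the existing cpu limit (e.g. GrpTRES=cpu=...).

```python
# Hypothetical helper (not part of Slurm): rescale a purchased CPU count
# into a "cpu" limit expressed in units of a chosen baseline generation.
def normalized_cpu_limit(purchased_cpus: int, perf_factor: float) -> int:
    """Return purchased_cpus * perf_factor, rounded to whole CPUs.

    perf_factor is the per-core performance of the purchased hardware
    relative to the baseline generation (1.3 = 30% faster than baseline).
    """
    if purchased_cpus < 0 or perf_factor <= 0:
        raise ValueError("need purchased_cpus >= 0 and perf_factor > 0")
    return round(purchased_cpus * perf_factor)


# Baseline = today's hardware: 100 five-year-old CPUs count as ~70.
print(normalized_cpu_limit(100, 0.7))  # 70
# Baseline = the old hardware: 100 CPUs bought today count as 130.
print(normalized_cpu_limit(100, 1.3))  # 130
```

Normalizing to the oldest generation (all factors >= 1) is the variant that never lowers anyone's existing number.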


On Wed, Jun 19, 2019 at 12:32 PM Fulcomer, Samuel <samuel_fulcomer at brown.edu> wrote:

> (...and yes, the name is inspired by a certain OEM's software licensing
> schemes...)
> At Brown we run a ~400 node cluster containing nodes of multiple
> architectures (Sandy/Ivy, Haswell/Broadwell, and Sky/Cascade) purchased in
> some cases with University funds and in others with investigator funding
> (~50:50).  They all appear in the default SLURM partition. We have 3
> classes of SLURM users:
>    1. Exploratory - no-charge access to up to 16 cores
>    2. Priority - $750/quarter for access to up to 192 cores (and with a
>    GrpTRESRunMins=cpu limit). Each user has their own QoS
>    3. Condo - an investigator group who paid for nodes added to the
>    cluster. The group has its own QoS and SLURM Account. The QoS allows use of
>    the number of cores purchased and has a much higher priority than the QoSes
>    of the "priority" users.
> The first problem with this scheme is that condo users who have purchased
> the older hardware now have access to the newest without penalty. In
> addition, we're encountering resistance to the idea of turning off their
> hardware and terminating their condos (despite MOUs stating a 5yr life).
> The pushback is the stated belief that the hardware should run until it
> dies.
> What I propose is a new TRES called a Processor Performance Unit (PPU)
> that would be specified on the Node line in slurm.conf, and used such that
> GrpTRES=ppu=N was calculated as the number of allocated cores multiplied by
> their associated PPU numbers.
> We could then assign a base PPU to the oldest hardware, say, "1" for
> Sandy/Ivy and increase for later architectures based on performance
> improvement. We'd set the condo QoS to GrpTRES=ppu=N*X+M*Y+..., where N is
> the number of cores of the oldest architecture the group purchased and X is
> that architecture's configured PPU/core, with M*Y (and so on) being the same
> product for any newer nodes/cores the investigator has purchased since.
> The result is that the investigator group gets to run on an approximation
> of the performance that they've purchased, rather than on the raw purchased core
> count.
> Thoughts?
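To make the proposed accounting concrete, here is a minimal Python sketch. It assumes a hypothetical per-architecture PPU table (only Sandy/Ivy = 1 comes from the proposal; the Haswell/Broadwell and Sky/Cascade values are placeholders, not benchmarks) and models the check a GrpTRES=ppu=N limit would perform: allocated cores times PPU, summed over the QoS's running jobs plus the new job, must not exceed the condo's N*X+M*Y+... total. None of this is existing Slurm code, and "ppu" is the proposed TRES, not a built-in one.

```python
# Sketch of the proposed "ppu" TRES accounting. The PPU values are
# hypothetical placeholders except Sandy/Ivy = 1, which the proposal uses
# as the base unit.
PPU_PER_CORE = {
    "sandy_ivy": 1.0,
    "haswell_broadwell": 1.5,  # placeholder, not a benchmark result
    "sky_cascade": 2.0,        # placeholder, not a benchmark result
}


def ppu_total(cores_by_arch: dict[str, int]) -> float:
    """Sum of cores * PPU-per-core over architectures.

    Used both for a condo's limit (N*X + M*Y + ... over purchased cores)
    and for a job's usage (allocated cores times the PPU of their nodes).
    """
    return sum(cores * PPU_PER_CORE[arch] for arch, cores in cores_by_arch.items())


def job_fits(new_job: dict[str, int],
             running_jobs: list[dict[str, int]],
             ppu_limit: float) -> bool:
    """Mimic a GrpTRES-style check: PPU held by the QoS's running jobs
    plus the new job's PPU must not exceed the limit."""
    in_use = sum(ppu_total(job) for job in running_jobs)
    return in_use + ppu_total(new_job) <= ppu_limit


if __name__ == "__main__":
    # Condo bought 64 Sandy/Ivy cores and later 32 Sky/Cascade cores.
    limit = ppu_total({"sandy_ivy": 64, "sky_cascade": 32})
    print(limit)                                     # 128.0
    print(job_fits({"sky_cascade": 64}, [], limit))  # True  (64 * 2.0 = 128)
    print(job_fits({"sky_cascade": 65}, [], limit))  # False (130 > 128)
    print(job_fits({"sandy_ivy": 64},
                   [{"sky_cascade": 32}], limit))    # True  (64 + 64 = 128)
```

With 64 Sandy/Ivy and 32 Sky/Cascade cores purchased, the limit is 128 PPU, so the group could run 128 cores on the oldest nodes or 64 on the newest: an approximation of the performance purchased rather than the raw core count.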