[slurm-users] Proposal for new TRES - "Processor Performance Units"....
Alex Chekholko
alex at calicolabs.com
Wed Jun 19 19:41:09 UTC 2019
Hey Samuel,
Can't you just adjust the existing "cpu" limit numbers using those same
multipliers? Someone bought 100 CPUs 5 years ago; now that's ~70 CPUs.
Or vice versa: someone buys 100 CPUs today and gets a setting of 130 CPUs
because the CPUs are normalized to the old performance. Reducing someone's
number would probably look bad politically, but giving a new customer a
larger number should be fine.
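
As a rough sketch of that arithmetic (Python, with hypothetical names; the
0.7 and 1.3 multipliers are just the ballpark figures above, not measured
values), the normalized "cpu" limit could be computed like this:

```python
def normalized_cpu_limit(purchased_cores: int, multiplier: float) -> int:
    """Scale a purchased core count by a relative-performance multiplier."""
    return round(purchased_cores * multiplier)


# Illustrative multipliers only (assumptions, not benchmark results).
OLD_RELATIVE_TO_NEW = 0.7  # 5-year-old CPU measured against today's CPU
NEW_RELATIVE_TO_OLD = 1.3  # today's CPU measured against the old baseline

# Option 1: normalize to new hardware, so the old purchase shrinks.
old_condo_limit = normalized_cpu_limit(100, OLD_RELATIVE_TO_NEW)

# Option 2: normalize to old hardware, so the new purchase grows.
new_condo_limit = normalized_cpu_limit(100, NEW_RELATIVE_TO_OLD)

print(old_condo_limit, new_condo_limit)
```

Either result would then be used as the cpu value in the group's existing
QoS limit; only one of the two baselines would be used on a given cluster.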
Regards,
Alex
On Wed, Jun 19, 2019 at 12:32 PM Fulcomer, Samuel <samuel_fulcomer at brown.edu>
wrote:
>
> (...and yes, the name is inspired by a certain OEM's software licensing
> schemes...)
>
> At Brown we run a ~400 node cluster containing nodes of multiple
> architectures (Sandy/Ivy, Haswell/Broadwell, and Sky/Cascade) purchased in
> some cases by University funds and in others by investigator funding
> (~50:50). They all appear in the default SLURM partition. We have 3
> classes of SLURM users:
>
>
> 1. Exploratory - no-charge access to up to 16 cores
> 2. Priority - $750/quarter for access to up to 192 cores (and with a
> GrpTRESRunMins=cpu limit). Each user has their own QoS
> 3. Condo - an investigator group who paid for nodes added to the
> cluster. The group has its own QoS and SLURM Account. The QoS allows use of
> the number of cores purchased and has a much higher priority than the QoSes
> of the "priority" users.
>
> The first problem with this scheme is that condo users who have purchased
> the older hardware now have access to the newest without penalty. In
> addition, we're encountering resistance to the idea of turning off their
> hardware and terminating their condos (despite MOUs stating a 5yr life).
> The pushback is the stated belief that the hardware should run until it
> dies.
>
> What I propose is a new TRES called a Processor Performance Unit (PPU)
> that would be specified on the Node line in slurm.conf, and used such that
> GrpTRES=ppu=N was calculated as the number of allocated cores multiplied by
> their associated PPU numbers.
>
> We could then assign a base PPU to the oldest hardware, say, "1" for
> Sandy/Ivy and increase for later architectures based on performance
> improvement. We'd set the condo QoS to GrpTRES=ppu=N*X+M*Y,..., where N is
> the number of cores of the oldest architecture and X is its configured
> PPU/core, with M and Y the same for any newer nodes/cores the investigator
> has purchased since.
>
> The result is that the investigator group gets to run on an approximation
> of the performance that they've purchased, rather than on the raw purchased
> core count.
>
> Thoughts?
>
>
>
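
For reference, the PPU accounting proposed in the quoted message can be
sketched in Python as below. This is only an illustration under stated
assumptions: "ppu" is a proposed TRES, not something Slurm provides; the
per-architecture PPU values, the core counts, and all function names are
made up for the example.

```python
from typing import Mapping

# Hypothetical PPU-per-core values; real ones would come from benchmarks.
PPU_PER_CORE = {
    "sandy_ivy": 1.0,          # base value for the oldest hardware
    "haswell_broadwell": 1.5,  # assumed, for illustration only
    "sky_cascade": 2.0,        # assumed, for illustration only
}


def ppu_total(cores_by_arch: Mapping[str, int]) -> float:
    """Sum cores * PPU/core over architectures (the N*X + M*Y + ... form)."""
    return sum(cores * PPU_PER_CORE[arch] for arch, cores in cores_by_arch.items())


def fits(limit: float, running: Mapping[str, int], request: Mapping[str, int]) -> bool:
    """True if the request keeps the group's allocated PPUs within its limit."""
    return ppu_total(running) + ppu_total(request) <= limit


# A condo that bought 64 Sandy/Ivy cores and later 32 Sky/Cascade cores.
purchased = {"sandy_ivy": 64, "sky_cascade": 32}
limit = ppu_total(purchased)          # 64*1.0 + 32*2.0
qos_setting = f"GrpTRES=ppu={limit:g}"  # proposed syntax, not existing Slurm

# The group currently holds 48 cores on the newest nodes.
running = {"sky_cascade": 48}
ok_old_nodes = fits(limit, running, {"sandy_ivy": 20})
ok_mid_nodes = fits(limit, running, {"haswell_broadwell": 24})

print(qos_setting, ok_old_nodes, ok_mid_nodes)
```

The point of the sketch is that the same limit buys fewer cores on newer
nodes: 48 Sky/Cascade cores already consume 96 of the 128 PPUs here.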