[slurm-users] Proposal for new TRES - "Processor Performance Units"....

Thu Jun 20 06:58:50 UTC 2019

Janne, thank you. That FGCI benchmark in a container is pretty smart.
I always say that real application benchmarks beat synthetic benchmarks.
Taking a small mix of applications like that and taking a geometric mean is
great.

Note:  *"a reference result run on a Dell PowerEdge C4130"*
In the old days CERN had a standard unit of compute, which was equivalent
to a VAX.
I am sure that unit has long been retired.
Though I must say that, from participating in CERN tenders a few years ago,
I know they use SPECfp measurements to compare systems.

On Thu, 20 Jun 2019 at 07:41, Janne Blomqvist <janne.blomqvist at aalto.fi>
wrote:

> On 19/06/2019 22.30, Fulcomer, Samuel wrote:
> >
> > (...and yes, the name is inspired by a certain OEM's software licensing
> > schemes...)
> >
> > At Brown we run a ~400 node cluster containing nodes of multiple
> > architectures (Sandy/Ivy, Haswell/Broadwell, and Sky/Cascade) purchased
> > in some cases by University funds and in others by investigator funding
> > (~50:50).  They all appear in the default SLURM partition. We have 3
> > classes of SLURM users:
> >
> >  1. Exploratory - no-charge access to up to 16 cores
> >  2. Priority - $750/quarter for access to up to 192 cores (and with a
> >     GrpTRESRunMins=cpu limit). Each user has their own QoS
> >  3. Condo - an investigator group who paid for nodes added to the
> >     cluster. The group has its own QoS and SLURM Account. The QoS allows
> >     use of the number of cores purchased and has a much higher priority
> >     than the QoS' of the "priority" users.
> >
> > The first problem with this scheme is that condo users who have
> > purchased the older hardware now have access to the newest without
> > penalty. In addition, we're encountering resistance to the idea of
> > turning off their hardware and terminating their condos (despite MOUs
> > stating a 5yr life). The pushback is the stated belief that the hardware
> > should run until it dies.
> >
> > What I propose is a new TRES called a Processor Performance Unit (PPU)
> > that would be specified on the Node line in slurm.conf, and used such
> > that GrpTRES=ppu=N was calculated as the number of allocated cores
> > multiplied by their associated PPU numbers.
> >
> > We could then assign a base PPU to the oldest hardware, say, "1" for
> > Sandy/Ivy and increase for later architectures based on performance
> > improvement. We'd set the condo QoS to GrpTRES=ppu=N*X+M*Y,..., where N
> > is the number of cores of the oldest architecture multiplied by the
> > configured PPU/core, X, and repeat for any newer nodes/cores the
> > investigator has purchased since.
> >
> > The result is that the investigator group gets to run on an
> > approximation of the performance that they've purchased, rather than on
> > the raw purchased core count.
> >
> > Thoughts?
> >
> >
>
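
To make the arithmetic of the proposal concrete, here is a rough sketch in
Python. The PPU ratings and architecture labels are invented for
illustration, not measured values, and a "ppu" TRES is not an existing Slurm
feature; the code only mirrors the GrpTRES=ppu=N*X+M*Y sum described above.

```python
from dataclasses import dataclass

# Hypothetical PPU-per-core ratings; real values would come from benchmarks.
PPU_PER_CORE = {
    "sandy-ivy": 1.0,          # oldest hardware is the baseline
    "haswell-broadwell": 1.5,  # invented number
    "sky-cascade": 2.0,        # invented number
}


@dataclass(frozen=True)
class Purchase:
    arch: str   # key into PPU_PER_CORE
    cores: int  # cores the investigator paid for


def ppu_budget(purchases):
    """Value for GrpTRES=ppu=...: sum of cores * PPU/core over all purchases."""
    return sum(p.cores * PPU_PER_CORE[p.arch] for p in purchases)


def ppu_in_use(allocated):
    """PPU consumed by running jobs; `allocated` maps arch -> allocated cores."""
    return sum(cores * PPU_PER_CORE[arch] for arch, cores in allocated.items())


def can_start(budget, allocated, arch, cores):
    """True if a new job of `cores` on `arch` keeps the group within budget."""
    return ppu_in_use(allocated) + cores * PPU_PER_CORE[arch] <= budget


if __name__ == "__main__":
    budget = ppu_budget([Purchase("sandy-ivy", 64)])
    print(budget)
    print(can_start(budget, {}, "sky-cascade", 32))
    print(can_start(budget, {}, "sky-cascade", 33))
```

With these made-up ratings, a group that bought 64 Sandy/Ivy cores gets a
budget of 64 PPU, which is worth only 32 cores on the Sky/Cascade nodes.
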
> What we do is that our nodes are grouped into separate partitions based
> on the CPU model. E.g. the partition "batch-skl" is where our Skylake
> (6148) nodes are. Then we have a job_submit.lua script which sends jobs
> without an explicit partition spec to all batch-xxx partitions (checking
> constraints etc. along the way). Then for each partition we set
> TRESBillingWeights= to "normalize" the fairshare consumption based on
> the geometric mean of a set of hopefully not too unrepresentative
> single-node benchmarks [1].
>
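
The normalisation described above could be sketched roughly as follows. The
scores are invented, the partition names other than batch-skl are
placeholders, and I am assuming per-core scores where higher means faster;
the real benchmark in [1] may report something different.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one value, all of them positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-core benchmark scores (higher is faster), one per application.
SCORES = {
    "batch-ivb": [1.0, 1.0, 1.0],  # placeholder name, used as the reference
    "batch-hsw": [1.2, 1.5, 1.3],  # placeholder name, invented scores
    "batch-skl": [1.6, 2.5, 1.9],  # name from the thread, invented scores
}


def cpu_billing_weights(scores, reference):
    """CPU billing weight per partition, relative to the reference partition."""
    ref = geometric_mean(scores[reference])
    return {part: geometric_mean(vals) / ref for part, vals in scores.items()}


if __name__ == "__main__":
    weights = cpu_billing_weights(SCORES, "batch-ivb")
    for part in sorted(weights):
        # Emit a line in the style of a slurm.conf partition definition.
        print(f'PartitionName={part} TRESBillingWeights="CPU={weights[part]:.2f}"')
```

The geometric mean keeps one unusually fast application from dominating the
weight, which an arithmetic mean of speedups would allow.
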
> We also set a memory billing weight, and have MAX_TRES among our
> PriorityFlags, approximating dominant resource fairness (DRF) [2]
>
> [1] https://github.com/AaltoScienceIT/docker-fgci-benchmark
>
> [2] https://people.eecs.berkeley.edu/~alig/papers/drf.pdf
>
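
As a small illustration of what MAX_TRES changes: by default Slurm bills the
sum of the weighted TRES, while with MAX_TRES a job is billed for whichever
per-node resource is largest after weighting. The sketch below is simplified
to one node with CPU and memory only (no GRES, no global TRES such as
licenses), and the weights are assumptions, not anyone's real configuration.

```python
def billable_tres(cpus, mem_gb, cpu_weight, mem_weight_per_gb, max_tres):
    """Simplified billing for a single-node job using CPU and memory only.

    max_tres=True  -> charge the dominant weighted resource (MAX_TRES flag)
    max_tres=False -> charge the sum of the weighted resources (default)
    """
    cpu_cost = cpus * cpu_weight
    mem_cost = mem_gb * mem_weight_per_gb
    if max_tres:
        return max(cpu_cost, mem_cost)
    return cpu_cost + mem_cost


if __name__ == "__main__":
    # Assumed weights: 1.0 per core and 0.25 per GB of memory.
    print(billable_tres(4, 64, 1.0, 0.25, max_tres=True))
    print(billable_tres(4, 64, 1.0, 0.25, max_tres=False))
```

With these assumed weights, a 4-core job asking for 64 GB is billed 16 under
MAX_TRES, because memory dominates, rather than 20 under the default sum.
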
> --
> Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
> Aalto University School of Science, PHYS & NBE
> +358503841576 || janne.blomqvist at aalto.fi
>
>