[slurm-users] Proposal for new TRES - "Processor Performance Units"....

Fulcomer, Samuel samuel_fulcomer at brown.edu
Wed Jun 19 21:16:15 UTC 2019


Hi Paul,

Thanks. Your setup is interesting. I see that you have your processor types
segregated in their own partitions (with the exception of the requeue
partition), and that's how you get at the weighting mechanism. Do you have
your users explicitly specify multiple partitions in the batch
commands/scripts in order to take advantage of this, or do you use a plugin
for it?
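
I'm imagining something like users listing several partitions at submit time,
with per-partition billing weights in slurm.conf (the partition names and
weights below are just made up):

    sbatch -p sandy,haswell,skylake job.sh

    # slurm.conf (illustrative)
    PartitionName=sandy   Nodes=sandy[001-100] TRESBillingWeights="CPU=1.0"
    PartitionName=haswell Nodes=hsw[001-100]   TRESBillingWeights="CPU=1.5"
    PartitionName=skylake Nodes=sky[001-100]   TRESBillingWeights="CPU=2.0"

...but perhaps you handle it some other way.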

It sounds like you don't impose any hard limit on simultaneous resource
use, and allow everything to fairshare out with the help of the 7 day
TimeLimit. We haven't been imposing any TimeLimit on our condo users, which
would be an issue for us with your config. For our exploratory and priority
users, we impose an effective time limit with GrpTRESRunMins=cpu (and
gres/gpu= for the GPU usage). In addition, since we have so many priority
users, we don't explicitly set a rawshare value for them (they all execute
under the "default" account). We set rawshare for the condo accounts as
(cores purchased / total cores) * 1000.
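
For concreteness, our per-user limits and condo shares end up looking roughly
like the following (the names and numbers here are only illustrative):

    sacctmgr modify qos jdoe set GrpTRESRunMins=cpu=1000000,gres/gpu=20000
    sacctmgr modify account condo_smith set fairshare=50

where the 50 would come from, e.g., 480 purchased cores / 9600 total cores * 1000.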

What's your fairshare decay setting (PriorityDecayHalfLife, if I'm remembering
the name right)?

Regards,
Sam



On Wed, Jun 19, 2019 at 3:44 PM Paul Edmon <pedmon at cfa.harvard.edu> wrote:

> We do a similar thing here at Harvard:
>
> https://www.rc.fas.harvard.edu/fairshare/
>
> We simply weight all the partitions based on their core type and then we
> allocate Shares for each account based on what they have purchased.  We
> don't use QoS at all, so we just rely purely on fairshare weighting for
> resource usage.  It has worked pretty well for our purposes.
>
> -Paul Edmon-
> On 6/19/19 3:30 PM, Fulcomer, Samuel wrote:
>
>
> (...and yes, the name is inspired by a certain OEM's software licensing
> schemes...)
>
> At Brown we run a ~400 node cluster containing nodes of multiple
> architectures (Sandy/Ivy, Haswell/Broadwell, and Sky/Cascade) purchased in
> some cases by University funds and in others by investigator funding
> (~50:50).  They all appear in the default SLURM partition. We have 3
> classes of SLURM users:
>
>
>    1. Exploratory - no-charge access to up to 16 cores
>    2. Priority - $750/quarter for access to up to 192 cores (and with a
>    GrpTRESRunMins=cpu limit). Each user has their own QoS.
>    3. Condo - an investigator group that paid for nodes added to the
>    cluster. The group has its own QoS and SLURM Account. The QoS allows use of
>    the number of cores purchased and has a much higher priority than the QoSes
>    of the "priority" users.
>
> The first problem with this scheme is that condo users who have purchased
> the older hardware now have access to the newest without penalty. In
> addition, we're encountering resistance to the idea of turning off their
> hardware and terminating their condos (despite MOUs stating a 5yr life).
> The pushback is the stated belief that the hardware should run until it
> dies.
>
> What I propose is a new TRES called a Processor Performance Unit (PPU)
> that would be specified on the Node line in slurm.conf. A job's ppu usage
> would be counted as the number of allocated cores multiplied by their
> associated PPU values, and charged against a GrpTRES=ppu=N limit.
>
> We could then assign a base PPU to the oldest hardware, say, "1" for
> Sandy/Ivy and increase for later architectures based on performance
> improvement. We'd set the condo QoS to GrpTRES=ppu=N*X+M*Y+..., where N is
> the number of cores of the oldest architecture purchased, X is the configured
> PPU/core for that architecture, and the further terms cover any newer
> nodes/cores the investigator has purchased since.
>
> The result is that the investigator group gets to run on an approximation
> of the performance they've purchased, rather than on the raw purchased core
> count.
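>
> For example (hypothetical syntax, illustrative numbers): a condo that bought
> 64 Sandy Bridge nodes (16 cores each) and later 32 Skylake nodes (32 cores
> each), with PPUs of 1 and 2 respectively, might be configured as
>
>    NodeName=sandy[001-064] CPUs=16 PPU=1 ...
>    NodeName=sky[001-032]   CPUs=32 PPU=2 ...
>
> with its QoS set to
>
>    GrpTRES=ppu=3072    (i.e., 64*16 cores at PPU 1 + 32*32 cores at PPU 2)
>
> so the group could run the equivalent of 3072 Sandy-era cores' worth of work
> on any mix of hardware.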
>
> Thoughts?
>
>
>