[slurm-users] tie a reservation to a QoS?

Bill Wichser bill at princeton.edu
Mon Oct 28 15:43:57 UTC 2019


One thing we changed years ago was how we think about buy-in. 
While researchers are in fact buying nodes for the cluster, it's rarely 
the case that they get any rights to "their" nodes.  Instead they are 
buying an equivalent amount of CPU time, averaged over 30 days.

We provide reports, explain how fairshare works and how their particular 
value was chosen, and give them all the ways that we as sysadmins look at 
the data; for the most part they have accepted that buy-in means CPU 
hours per month rather than a particular piece of hardware.
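
For what it's worth, the reports come straight out of the accounting 
tools; something along these lines (the account name is made up):

   # fairshare weight and rolling usage for a contributing group
   sshare -A physics_buyin -l
   # CPU hours the same group actually consumed over a month
   sreport -t hours cluster AccountUtilizationByUser \
       accounts=physics_buyin start=2019-10-01 end=2019-11-01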

Now, that journey was not easy, and there is a discussion with every new 
researcher who wants a piece of the cluster.  But it works for us.  There 
is one big caveat, though: at least a 10% share of the cluster needs to 
be held as a public portion.

This gives some leeway in scheduling and allows others who have not 
contributed to access the same resources, albeit at a much lower 
priority.  That 10% has for us become more like 25% or even more, as 
we have a large and ever-growing base of users who have not contributed.

This different way of thinking has meant we have not had to deal with 
dedicated partitions and QOSes: CPU time over a 30-day sliding window 
has been accepted, can be shown quantitatively, and is simply a much 
easier way to schedule when ALL resources can be used.
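
One way to get that kind of 30-day memory in Slurm is the multifactor 
priority plugin's usage decay.  Roughly, in slurm.conf (the numbers here 
are illustrative, not necessarily what we run in production):

   PriorityType=priority/multifactor
   # a usage half-life of two weeks gives the fairshare calculation an
   # effective memory on the order of a month
   PriorityDecayHalfLife=14-0
   PriorityWeightFairshare=100000
   PriorityWeightAge=1000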

Bill

On 10/28/19 11:11 AM, Tina Friedrich wrote:
> Hello,
> 
> Is there a way to tie a reservation to a QoS (instead of to an
> account or user), or to enforce a QoS for jobs submitted into a reservation?
> 
> The problem I'm trying to solve is this: some of our resources are bought
> on a co-investment basis. As part of that, the 'owning' group can get very
> high scheduling priority (via a QoS) on an equivalent amount of
> resource. Additionally, they can request a number of reservations per
> year for 'their' nodes. However, that lends itself to gaming the
> system - they can now submit jobs into the reservation with 'normal'
> priority, and then run jobs on the rest of the cluster using the higher
> priority - really not the plan.
> 
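> (For reference, the setup looks roughly like this - the QoS name,
> account name and TRES value here are made up:
> 
>    sacctmgr add qos name=grpA_prio priority=10000 grptres=cpu=128
>    sacctmgr modify account name=grpA set qos+=grpA_prio
> 
> i.e. the QoS carries the high priority but is capped to the
> 'equivalent amount of resource' via GrpTRES.)
> 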
> Basically, I need a way to ensure that - even when a reservation is in
> place - those groups 'use up' their priority resources first & then all
> other jobs they submit are run with 'lower' priority.
> 
> I'm currently dealing with it by modifying the QoS every time a
> reservation is created. But that isn't really sustainable on an ongoing
> basis - this isn't a one-off for one group, it's part of our operations
> model, and there's a (growing) number of them.
> 
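> (The manual dance looks roughly like this - reservation, node and QoS
> names are made up, and the GrpTRESMins value is just a placeholder:
> 
>    scontrol create reservation reservationname=grpA_2019_11 \
>        accounts=grpA nodes=node[101-108] \
>        starttime=2019-11-04T09:00 duration=7-00:00:00
>    sacctmgr modify qos name=grpA_prio set grptresmins=cpu=<reduced>
> 
> and then undoing the QoS change again once the reservation ends.)
> 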
> One (easy) way I can see would be to ensure that you cannot use a
> reservation without using the respective priority QoS - however, from my
> reading of the docs there's no way to do that. (As only the one account
> has access to the QoS, being able to tie a reservation to a QoS would
> sort of solve my problem :) ).
> 
> Any ideas? The only thing I can come up with involves a lot of scripting,
> and it would certainly be error prone (and not the most flexible).
> 
> Tina
> 


