[slurm-users] GrpTRESMins and GrpTRESRaw usage

gerard.gil at cines.fr gerard.gil at cines.fr
Fri Jun 24 06:56:36 UTC 2022


Hi Miguel, 

It sounds good! 

But does it mean you have to request this "NoDecay" QOS to benefit from the fairshare priority? 

Does this also mean that if all the QOSes we use are created with NoDecay, we can take advantage of the fairshare priority and, for all jobs, still use the GrpTRESMins limit? 

Thanks 

Regards, 
Gérard 

> De: "Miguel Oliveira" <miguel.oliveira at uc.pt>
> À: "Slurm-users" <slurm-users at lists.schedmd.com>
> Cc: "slurm-users" <slurm-users at schedmd.com>
> Envoyé: Jeudi 23 Juin 2022 18:42:28
> Objet: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

> Hi Gérard,

> It is not exactly true that you have no solution to limit projects. If you
> implement each project as an account, then you can create an account QOS with
> the NoDecay flag.
> This will not affect associations, so priority and fair share are not impacted.

> The way we do it is to create a qos:

> sacctmgr -i --quiet create qos "{{ item.account }}" set
> flags=DenyOnLimit,NoDecay GrpTRESMin=cpu=600

> And then use this qos when the account (project) is created:

> sacctmgr -i --quiet add account "{{ item.account }}" Parent="{{ item.parent }}"
> QOS="{{ item.account }}" Fairshare=1 Description="{{ item.description }}"

> We even have a Slurm bank implementation to play along with this technique and
> it has not failed us too badly yet! :)

> Hope that helps,

> Miguel Afonso Oliveira

>> On 23 Jun 2022, at 14:57, gerard.gil at cines.fr wrote:

>> Hi Ole and B/H,

>> Thanks for your answers.

>> You're right, B/H, and as I tuned the TRESBillingWeights option to only count
>> CPUs, in my case: number of reserved cores = "TRES billing cost"

>> You're right again: I forgot the PriorityDecayHalfLife parameter, which is also
>> used by the fairshare Multifactor Priority.
>> We use multifactor priority to manage the priority of jobs in the queue, and we
>> set the values of PriorityDecayHalfLife and PriorityUsageResetPeriod according
>> to those needs.
>> So PriorityDecayHalfLife will decay GrpTRESRaw, and GrpTRESMins can't be used as
>> we want.
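
For context, a hypothetical slurm.conf excerpt illustrating the combination being discussed; the parameter values below are assumptions for illustration, not CINES' actual configuration:

    # Multifactor priority with a usage half-life; the same decayed usage
    # counters are what GrpTRESMins limits are checked against
    PriorityType=priority/multifactor
    PriorityDecayHalfLife=14-0
    PriorityUsageResetPeriod=NONE

    # Count only CPUs in the billing TRES, so billing == number of reserved cores
    PartitionName=compute Nodes=node[001-100] Default=YES TRESBillingWeights="CPU=1.0"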

>> Setting the NoDecay flag on a QOS could be an option, but I suppose it also
>> impacts the fairshare Multifactor Priority of all jobs using this QOS.

>> This means I have no solution to limit a project as we want, unless SchedMD
>> changes this behavior or adds a new feature.

>> Thanks a lot.

>> Regards,
>> Gérard
>> http://www.cines.fr/

>>> De: "Bjørn-Helge Mevik" < [ mailto:b.h.mevik at usit.uio.no | b.h.mevik at usit.uio.no
>>> ] >
>>> À: [ mailto:slurm-users at schedmd.com | slurm-users at schedmd.com ]
>>> Envoyé: Jeudi 23 Juin 2022 12:39:27
>>> Objet: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage

>>> Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk> writes:

>>>> Hi Bjørn-Helge,
>>> Hello, Ole! :)

>>>> On 6/23/22 09:18, Bjørn-Helge Mevik wrote:

>>>>> In Slurm, the same internal variables are used for fairshare calculations as
>>>>> for GrpTRESMins (and similar), so when fair share priorities are in use,
>>>>> Slurm will reduce the accumulated GrpTRESMins usage over time. This means that
>>>>> it is impossible(*) to use GrpTRESMins limits and fairshare
>>>>> priorities at the same time.
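
One way to watch these shared counters (a hedged illustration; the account name is hypothetical and the available columns depend on the Slurm version):

    # Raw usage plus the group limits and usage tracked for an account;
    # with decay enabled, GrpTRESRaw shrinks every PriorityCalcPeriod
    sshare -l -A myproject -o Account,User,RawUsage,GrpTRESRaw%40,GrpTRESMins%40
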
>>>> This is a surprising observation!
>>> I discovered it quite a few years ago, when we wanted to use Slurm to
>>> enforce cpu hour quota limits (instead of using Maui+Gold). Can't
>>> remember anymore if I was surprised or just sad. :D

>>>> We use a 14-day HalfLife in slurm.conf:
>>>> PriorityDecayHalfLife=14-0

>>>> Since our longest running jobs can run only 7 days, maybe our limits
>>>> never get reduced as you describe?
>>> The accumulated usage is reduced every 5 minutes (by default; see
>>> PriorityCalcPeriod). The reduction is done by multiplying the
>>> accumulated usage by a number slightly less than 1. The number is
>>> chosen so that the accumulated usage is reduced to 50 % after
>>> PriorityDecayHalfLife (given that you don't run anything more in
>>> between, of course). With a halflife of 14 days and the default calc
>>> period, that number is very close to 1 (0.9998281 if my calculations are
>>> correct :).
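
A quick sanity check of that factor from the shell, assuming the default PriorityCalcPeriod of 5 minutes:

    # 14 days = 4032 five-minute calc periods; the per-period decay factor
    # is the 4032nd root of 0.5
    awk 'BEGIN { printf "%.7f\n", 0.5 ^ (1 / (14 * 24 * 60 / 5)) }'
    # prints 0.9998281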

>>> Note: I read all about these details on the schedmd web pages some years
>>> ago. I cannot find them again (the parts about the multiplication with
>>> a number smaller than 1 to get the half life), so I might be wrong on
>>> some of the details.

>>>> BTW, I've written a handy script for displaying user limits in a
>>>> readable format:
>>>> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits

>>> Nice!

>>> --
>>> B/H