[slurm-users] GrpTRESMins and GrpTRESRaw usage
gerard.gil at cines.fr
Thu Jun 30 18:12:05 UTC 2022
Hi Miguel,
I finally found the time to test the QOS NoDecay configuration vs GrpTRESMins account limit.
Here is my benchmark:
1) Initialize the benchmark configuration
- reset all RawUsage (on QOS and account)
- set a limit on Account GrpTRESMins
- run several jobs with a controlled elapsed CPU time on a QOS.
- reset account RawUsage
- set a limit on Account GrpTRESMins below the QOS RawUsage
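The reset and limit steps above can be scripted. A minimal sketch, assuming the account is dci, the QOS is support, and a Slurm version whose sacctmgr accepts RawUsage=0 resets on both accounts and QOS; the 4100-minute value matches the limit used in this test:

```shell
# Reset raw usage on the account association and on the QOS (admin rights needed)
sacctmgr -i modify account dci set RawUsage=0
sacctmgr -i modify qos support set RawUsage=0

# Set the GrpTRESMins limit (CPU minutes) on the account, not on the QOS
sacctmgr -i modify account dci set GrpTRESMins=cpu=4100

# Run a job with a controlled elapsed time against the QOS
sbatch --qos=support --time=5 TRESMIN.slurm
```

The -i flag skips sacctmgr's interactive confirmation, which is convenient when repeating the benchmark.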
Here is the initial state before running the benchmark:
toto at login1: ~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,rawusage
Account User GrpTRESRaw GrpTRESMins RawUsage
-------------------- ---------- ----------------------------------------------------- ------------------------------ -----------
dci cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0 cpu=4100 0
Account RawUsage = 0
GrpTRESMins cpu=4100
toto at login1:~/TEST$ scontrol -o show assoc_mgr | grep "^QOS" | grep support
QOS=support(8) UsageRaw=253632.000000 GrpJobs=N(0) GrpJobsAccrue=N(0) GrpSubmitJobs=N(0) GrpWall=N(132.10) GrpTRES=cpu=N(0),mem=N(0),energy=N(0),node=2106(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0) GrpTRESMins=cpu=N(4227),mem=N(7926000),energy=N(0),node=N(132),billing=N(4227),fs/disk=N(0),vmem=N(0),pages=N(0) GrpTRESRunMins=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0) MaxWallPJ=1440 MaxTRESPJ=node=700 MaxTRESPN= MaxTRESMinsPJ= MinPrioThresh= MinTRESPJ= PreemptMode=OFF Priority=10 Account Limits= dci={MaxJobsPA=N(0) MaxJobsAccruePA=N(0) MaxSubmitJobsPA=N(0) MaxTRESPA=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)} User Limits= 1145={MaxJobsPU=N(0) MaxJobsAccruePU=N(0) MaxSubmitJobsPU=N(0) MaxTRESPU=cpu=N(0),mem=N(0),energy=N(0),node=2106(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)}
QOS support RawUsage = 253632 s, i.e. 4227 min
Since the QOS support RawUsage exceeds the GrpTRESMins limit, Slurm should prevent any job from starting for this account if it works as expected.
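To make this comparison mechanical rather than by eye, the UsageRaw seconds can be extracted from the scontrol line and converted to minutes before comparing against the limit. A minimal sketch, using the record and the 4100-minute limit from the session above:

```shell
# One QOS record as printed by `scontrol -o show assoc_mgr` (trimmed to the
# fields used here; values copied from the session in this message)
line='QOS=support(8) UsageRaw=253632.000000 GrpWall=N(132.10)'
limit_min=4100   # GrpTRESMins cpu limit set on the dci account

# Extract the UsageRaw seconds and convert to whole minutes
usage_sec=$(printf '%s\n' "$line" | grep -o 'UsageRaw=[0-9]*' | cut -d= -f2)
usage_min=$(( usage_sec / 60 ))

echo "usage=${usage_min} min, limit=${limit_min} min"
if [ "$usage_min" -gt "$limit_min" ]; then
    echo "usage exceeds limit: new jobs on this account should be blocked"
fi
```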
2) Run the benchmark to check whether the GrpTRESMins limit is enforced against the QOS RawUsage
toto at login1:~/TEST$ sbatch TRESMIN.slurm
Submitted batch job 3687
toto at login1:~/TEST$ squeue
JOBID ADMIN_COMM MIN_MEMOR SUBMIT_TIME PRIORITY PARTITION QOS USER STATE TIME_LIMIT TIME NODES REASON START_TIME
3687 BDW28 60000M 2022-06-30T19:36:42 1100000 bdw28 support toto RUNNING 5:00 0:02 1 None 2022-06-30T19:36:42
The job is running even though the GrpTRESMins limit is below the QOS support RawUsage.
Is there anything wrong with my control process that invalidates the result?
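One of the open questions earlier in this thread was whether the per-QOS RawUsage values have to be summed by hand to obtain an account-wide figure. If that turns out to be necessary, the summation itself is easy to script from the assoc_mgr output; a minimal sketch over illustrative data (the QOS names and numbers below are invented for the example, not real cluster values):

```shell
# Illustrative `scontrol -o show assoc_mgr` output, one line per QOS.
sample='QOS=support(8) UsageRaw=253632.000000 GrpWall=N(132.10)
QOS=court(5) UsageRaw=120000.000000 GrpWall=N(10.00)
QOS=long(6) UsageRaw=60000.000000 GrpWall=N(42.00)'

# Sum UsageRaw (seconds) over all QOS lines and convert to minutes
total_min=$(printf '%s\n' "$sample" |
    awk 'match($0, /UsageRaw=[0-9]+/) {
             # strip the 9-character "UsageRaw=" prefix from the match
             sum += substr($0, RSTART + 9, RLENGTH - 9)
         }
         END { printf "%d\n", sum / 60 }')

echo "total usage across QOS: ${total_min} min"
```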
Thanks
Gérard
http://www.cines.fr/
> From: "gerard gil" <gerard.gil at cines.fr>
> To: "Slurm-users" <slurm-users at lists.schedmd.com>
> Sent: Wednesday, 29 June 2022 19:13:56
> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
> Hi Miguel,
>> If I understood you correctly your goal was to limit the number of minutes each
>> project can run. By associating each project to a slurm account with a nodecay
>> QoS then you will have achieved your goal.
> Here is what I want to do:
> "All jobs submitted to an account, regardless of the QOS they use, have to be
> constrained to a number of minutes set by the limit associated with that
> account (and not with the QOS)."
>> Try a project with a very small limit and you will see that it won’t run
> I already tested the GrpTRESMins limit and confirmed it works as expected.
> Then I saw the decay effect on GrpTRESRaw (which I first thought was the right
> metric to look at) and tried to find a way to fix it.
> It's really very important for me to trust it, so I need a deterministic test to
> prove it.
> I'm testing this GrpTRESMins limit with NoDecay set on QOS resetting all
> RawUsage (Account and QOS) to be sure it works as I expect.
> I print the account GrpTRESRaw (in min) at the end of my test jobs to set a new
> limit with GrpTRESMins and see how it behaves.
> I'll report back on the results. I hope it works.
>> You don’t have to add anything.
>> Each QoS will accumulate its respective usage, i.e., the usage of all users on
>> that account. Users can even be on different accounts (projects) and charge the
>> respective project with the parameter --account on sbatch.
> If Slurm does this to manage the limit, I would also like to obtain the current
> RawUsage for an account.
> Do you know how to get it?
>> The GrpTRESMins is always changed on the QoS with a command like:
>> sacctmgr update qos where qos=... set GrpTRESMin=cpu=….
> That's right if you want to set a limit to a QOS.
> But I don't know whether the same limit value will also apply to all other QOS,
> or if I have to apply the same limit to every QOS.
> Is my account limit the sum of all the QOS limits?
> Actually I'm setting the limit on the Account using the command:
> sacctmgr modify account myaccount set grptresmins=cpu=60000 qos=...
> With this setting I saw the limit is set to the account and not to the QOS.
> The sacctmgr show QOS command shows an empty GrpTRESMins field for all QOS.
> Thanks again for your help.
> I hope I'm close to getting the answer to my issue.
> Best,
> Gérard
> http://www.cines.fr/
>> From: "Miguel Oliveira" <miguel.oliveira at uc.pt>
>> To: "Slurm-users" <slurm-users at lists.schedmd.com>
>> Sent: Wednesday, 29 June 2022 01:28:58
>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>> Hi Gérard,
>> If I understood you correctly your goal was to limit the number of minutes each
>> project can run. By associating each project to a slurm account with a nodecay
>> QoS then you will have achieved your goal.
>> Try a project with a very small limit and you will see that it won’t run.
>> You don’t have to add anything. Each QoS will accumulate its respective usage,
>> i.e., the usage of all users on that account. Users can even be on different
>> accounts (projects) and charge the respective project with the parameter
>> --account on sbatch.
>> The GrpTRESMins is always changed on the QoS with a command like:
>> sacctmgr update qos where qos=... set GrpTRESMin=cpu=….
>> Hope that makes sense!
>> Best,
>> MAO
>>> On 28 Jun 2022, at 18:30, gerard.gil at cines.fr wrote:
>>> Hi Miguel,
>>> OK, I didn't know this command.
>>> I'm not sure I understand how it works with regard to my goal.
>>> I used the following command, inspired by the one you gave me, and I obtain a
>>> UsageRaw for each QOS.
>>> scontrol -o show assoc_mgr -accounts=myaccount Users=" "
>>> Do I have to sum up all QOS RawUsage values to obtain the RawUsage of
>>> myaccount with NoDecay?
>>> If I set GrpTRESMins for an Account and not for a QOS, does Slurm sum up
>>> these QOS RawUsage values to check whether the GrpTRESMins account limit is
>>> reached?
>>> Thanks again for your precious help.
>>> Gérard
>>> [ http://www.cines.fr/ ]
>>>> From: "Miguel Oliveira" <miguel.oliveira at uc.pt>
>>>> To: "Slurm-users" <slurm-users at lists.schedmd.com>
>>>> Sent: Tuesday, 28 June 2022 17:23:18
>>>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>>> Hi Gérard,
>>>> The way you are checking is against the association and as such it ought to be
>>>> decreasing in order to be used by fair share appropriately.
>>>> The counter used that does not decrease is on the QoS, not the association. You
>>>> can check that with:
>>>> scontrol -o show assoc_mgr | grep "^QOS=<account>"
>>>> That ought to give you two numbers. The first is the limit, or N for no limit,
>>>> and the second, in parentheses, the usage.
>>>> Hope that helps.
>>>> Best,
>>>> Miguel Afonso Oliveira
>>>>> On 28 Jun 2022, at 08:58, gerard.gil at cines.fr wrote:
>>>>> Hi Miguel,
>>>>> I modified my test configuration to evaluate the effect of NoDecay.
>>>>> I modified all QOS, adding the NoDecay flag.
>>>>> toto at login1:~/TEST$ sacctmgr show QOS
>>>>> Name Priority GraceTime Preempt PreemptExemptTime PreemptMode Flags UsageThres UsageFactor GrpTRES GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit GrpWall MaxTRES MaxTRESPerNode MaxTRESMins MaxWall MaxTRESPU MaxJobsPU MaxSubmitPU MaxTRESPA MaxJobsPA MaxSubmitPA MinTRES
>>>>> normal 0 00:00:00 cluster NoDecay 1.000000
>>>>> interactif 10 00:00:00 cluster NoDecay 1.000000 node=50 node=22 1-00:00:00 node=50
>>>>> petit 4 00:00:00 cluster NoDecay 1.000000 node=1500 node=22 1-00:00:00 node=300
>>>>> gros 6 00:00:00 cluster NoDecay 1.000000 node=2106 node=700 1-00:00:00 node=700
>>>>> court 8 00:00:00 cluster NoDecay 1.000000 node=1100 node=100 02:00:00 node=300
>>>>> long 4 00:00:00 cluster NoDecay 1.000000 node=500 node=200 5-00:00:00 node=200
>>>>> special 10 00:00:00 cluster NoDecay 1.000000 node=2106 node=2106 5-00:00:00 node=2106
>>>>> support 10 00:00:00 cluster NoDecay 1.000000 node=2106 node=700 1-00:00:00 node=2106
>>>>> visu 10 00:00:00 cluster NoDecay 1.000000 node=4 node=700 06:00:00 node=4
>>>>> I submitted a bunch of jobs to check the NoDecay behaviour, and I noticed that
>>>>> RawUsage as well as GrpTRESRaw cpu are still decreasing.
>>>>> toto at login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>> Account User GrpTRESRaw GrpTRESMins RawUsage
>>>>> dci cpu=6932,mem=12998963,energy=0,node=216,billing=6932,fs/disk=0,vmem=0,pages=0 cpu=17150 415966
>>>>> toto at login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>> Account User GrpTRESRaw GrpTRESMins RawUsage
>>>>> dci cpu=6931,mem=12995835,energy=0,node=216,billing=6931,fs/disk=0,vmem=0,pages=0 cpu=17150 415866
>>>>> toto at login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>> Account User GrpTRESRaw GrpTRESMins RawUsage
>>>>> dci cpu=6929,mem=12992708,energy=0,node=216,billing=6929,fs/disk=0,vmem=0,pages=0 cpu=17150 415766
>>>>> Is there something I forgot to do?
>>>>> Best,
>>>>> Gérard
>>>>> Cordialement,
>>>>> Gérard Gil
>>>>> Département Calcul Intensif
>>>>> Centre Informatique National de l'Enseignement Superieur
>>>>> 950, rue de Saint Priest
>>>>> 34097 Montpellier CEDEX 5
>>>>> FRANCE
>>>>> tel : (334) 67 14 14 14
>>>>> fax : (334) 67 52 37 63
>>>>> web : http://www.cines.fr
>>>>>> From: "Gérard Gil" <gerard.gil at cines.fr>
>>>>>> To: "Slurm-users" <slurm-users at lists.schedmd.com>
>>>>>> Cc: "slurm-users" <slurm-users at schedmd.com>
>>>>>> Sent: Friday, 24 June 2022 14:52:12
>>>>>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>>>>> Hi Miguel,
>>>>>> Good !!
>>>>>> I'll try these options on all existing QOS and see if everything works as
>>>>>> expected.
>>>>>> I'll inform you of the results.
>>>>>> Thanks a lot
>>>>>> Best,
>>>>>> Gérard
>>>>>> ----- Original Message -----
>>>>>>> From: "Miguel Oliveira" <miguel.oliveira at uc.pt>
>>>>>>> To: "Slurm-users" <slurm-users at lists.schedmd.com>
>>>>>>> Cc: "slurm-users" <slurm-users at schedmd.com>
>>>>>>> Sent: Friday, 24 June 2022 14:07:16
>>>>>>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>>>>>> Hi Gérard,
>>>>>>> I believe so. All our accounts correspond to one project and all have an
>>>>>>> associated QoS with NoDecay and DenyOnLimit. This is enough to restrict usage
>>>>>>> on each individual project.
>>>>>>> You only need these flags on the QoS. The association will carry on as usual and
>>>>>>> fairshare will not be impacted.
>>>>>>> Hope that helps,
>>>>>>> Miguel Oliveira
>>>>>>>> On 24 Jun 2022, at 12:56, gerard.gil at cines.fr wrote:
>>>>>>>> Hi Miguel,
>>>>>>>>> Why not? You can have multiple QoSs and you have other techniques to change
>>>>>>>>> priorities according to your policies.
>>>>>>>> Does this answer my question?
>>>>>>>> "If all configured QOS use NoDecay, can we take advantage of the FairShare
>>>>>>>> priority with Decay and of all jobs' GrpTRESRaw with NoDecay?"
>>>>>>>> Thanks
>>>>>>>> Best,
>>>>>>>> Gérard