[slurm-users] GrpTRESMins and GrpTRESRaw usage
gerard.gil at cines.fr
Thu Jun 30 18:12:05 UTC 2022
Hi Miguel,
I finally found the time to test the QOS NoDecay configuration vs GrpTRESMins account limit.
Here is my benchmark:
1) Initialize the benchmark configuration
- reset all RawUsage (on QOS and account)
- set a limit on Account GrpTRESMins
- run several jobs with a controlled elapsed CPU time on a QOS.
- reset account RawUsage
- set a limit on Account GrpTRESMins below the QOS RawUsage
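The reset and limit steps above can be scripted. A minimal sketch, assuming the account is dci, the QOS is support, and a Slurm version whose sacctmgr accepts RawUsage=0 resets on both accounts and QOS; the 4100-minute value matches the limit used in this test:

```shell
# Reset raw usage on the account association and on the QOS (admin rights needed)
sacctmgr -i modify account dci set RawUsage=0
sacctmgr -i modify qos support set RawUsage=0

# Set the GrpTRESMins limit (CPU minutes) on the account, not on the QOS
sacctmgr -i modify account dci set GrpTRESMins=cpu=4100

# Run a job with a controlled elapsed time against the QOS
sbatch --qos=support --time=5 TRESMIN.slurm
```

The -i flag skips sacctmgr's interactive confirmation, which is convenient when repeating the benchmark.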
Here is the initial state before running the benchmark:
toto at login1: ~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,rawusage
Account User GrpTRESRaw GrpTRESMins RawUsage
-------------------- ---------- ----------------------------------------------------- ------------------------------ -----------
dci cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0 cpu=4100 0
Account RawUsage = 0
GrpTRESMins cpu=4100
toto at login1:~/TEST$ scontrol -o show assoc_mgr | grep "^QOS" | grep support
QOS=support(8) UsageRaw=253632.000000 GrpJobs=N(0) GrpJobsAccrue=N(0) GrpSubmitJobs=N(0) GrpWall=N(132.10) GrpTRES=cpu=N(0),mem=N(0),energy=N(0),node=2106(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0) GrpTRESMins=cpu=N(4227),mem=N(7926000),energy=N(0),node=N(132),billing=N(4227),fs/disk=N(0),vmem=N(0),pages=N(0) GrpTRESRunMins=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0) MaxWallPJ=1440 MaxTRESPJ=node=700 MaxTRESPN= MaxTRESMinsPJ= MinPrioThresh= MinTRESPJ= PreemptMode=OFF Priority=10 Account Limits= dci={MaxJobsPA=N(0) MaxJobsAccruePA=N(0) MaxSubmitJobsPA=N(0) MaxTRESPA=cpu=N(0),mem=N(0),energy=N(0),node=N(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)} User Limits= 1145={MaxJobsPU=N(0) MaxJobsAccruePU=N(0) MaxSubmitJobsPU=N(0) MaxTRESPU=cpu=N(0),mem=N(0),energy=N(0),node=2106(0),billing=N(0),fs/disk=N(0),vmem=N(0),pages=N(0)}
QOS support RawUsage = 253632 s, i.e. 4227 min
Since the QOS support RawUsage exceeds the GrpTRESMins limit, Slurm should prevent any job from starting for this account if it works as expected.
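To make this comparison mechanical rather than by eye, the UsageRaw seconds can be extracted from the scontrol line and converted to minutes before comparing against the limit. A minimal sketch, using the record and the 4100-minute limit from the session above:

```shell
# One QOS record as printed by `scontrol -o show assoc_mgr` (trimmed to the
# fields used here; values copied from the session in this message)
line='QOS=support(8) UsageRaw=253632.000000 GrpWall=N(132.10)'
limit_min=4100   # GrpTRESMins cpu limit set on the dci account

# Extract the UsageRaw seconds and convert to whole minutes
usage_sec=$(printf '%s\n' "$line" | grep -o 'UsageRaw=[0-9]*' | cut -d= -f2)
usage_min=$(( usage_sec / 60 ))

echo "usage=${usage_min} min, limit=${limit_min} min"
if [ "$usage_min" -gt "$limit_min" ]; then
    echo "usage exceeds limit: new jobs on this account should be blocked"
fi
```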
2) Run the benchmark to check whether the GrpTRESMins limit is enforced against the QOS RawUsage
toto at login1:~/TEST$ sbatch TRESMIN.slurm
Submitted batch job 3687
toto at login1:~/TEST$ squeue
JOBID ADMIN_COMM MIN_MEMOR SUBMIT_TIME PRIORITY PARTITION QOS USER STATE TIME_LIMIT TIME NODES REASON START_TIME
3687 BDW28 60000M 2022-06-30T19:36:42 1100000 bdw28 support toto RUNNING 5:00 0:02 1 None 2022-06-30T19:36:42
The job is running even though the GrpTRESMins limit is below the QOS support RawUsage.
Is there anything wrong with my control process that invalidates the result?
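One of the open questions earlier in this thread was whether the per-QOS RawUsage values have to be summed by hand to obtain an account-wide figure. If that turns out to be necessary, the summation itself is easy to script from the assoc_mgr output; a minimal sketch over illustrative data (the QOS names and numbers below are invented for the example, not real cluster values):

```shell
# Illustrative `scontrol -o show assoc_mgr` output, one line per QOS.
sample='QOS=support(8) UsageRaw=253632.000000 GrpWall=N(132.10)
QOS=court(5) UsageRaw=120000.000000 GrpWall=N(10.00)
QOS=long(6) UsageRaw=60000.000000 GrpWall=N(42.00)'

# Sum UsageRaw (seconds) over all QOS lines and convert to minutes
total_min=$(printf '%s\n' "$sample" |
    awk 'match($0, /UsageRaw=[0-9]+/) {
             # strip the 9-character "UsageRaw=" prefix from the match
             sum += substr($0, RSTART + 9, RLENGTH - 9)
         }
         END { printf "%d\n", sum / 60 }')

echo "total usage across QOS: ${total_min} min"
```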
Thanks
Gérard
http://www.cines.fr/
> From: "gerard gil" <gerard.gil at cines.fr>
> To: "Slurm-users" <slurm-users at lists.schedmd.com>
> Sent: Wednesday, 29 June 2022 19:13:56
> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
> Hi Miguel,
>> If I understood you correctly your goal was to limit the number of minutes each
>> project can run. By associating each project to a slurm account with a nodecay
>> QoS then you will have achieved your goal.
> Here is what I want to do:
> "All jobs submitted to an account, regardless of the QOS they use, have to be
> constrained to a number of minutes set by the limit associated with that
> account (and not with the QOS)."
>> Try a project with a very small limit and you will see that it won’t run
> I already tested the GrpTRESMins limit and confirmed it works as expected.
> Then I saw the decay effect on GrpTRESRaw (which I first thought was the right
> metric to look at) and tried to find a way to fix it.
> It's really very important for me to trust it, so I need a deterministic test to
> prove it.
> I'm testing this GrpTRESMins limit with NoDecay set on QOS resetting all
> RawUsage (Account and QOS) to be sure it works as I expect.
> I print the account GrpTRESRaw (in min) at the end of my test jobs to set a new
> limit with GrpTRESMins and see how it behaves.
> I'll report back on the results. I hope it works.
>> You don’t have to add anything.
>> Each QoS will accumulate its respective usage, i.e., the usage of all users on
>> that account. Users can even be on different accounts (projects) and charge the
>> respective project with the parameter --account on sbatch.
> If Slurm does this to manage the limit, I would also like to obtain the current
> RawUsage for an account.
> Do you know how to get it?
>> The GrpTRESMins is always changed on the QoS with a command like:
>> sacctmgr update qos where qos=... set GrpTRESMin=cpu=….
> That's right if you want to set a limit to a QOS.
> But I don't know whether the same limit value will also apply to all other QOS,
> or if I have to apply the same limit to every QOS.
> Is my account limit the sum of all the QOS limits?
> Actually I'm setting the limit on the Account using the command:
> sacctmgr modify account myaccount set grptresmins=cpu=60000 qos=...
> With this setting I saw the limit is set to the account and not to the QOS.
> The sacctmgr show QOS command shows an empty GrpTRESMins field for all QOS.
> Thanks again for your help.
> I hope I'm close to getting the answer to my issue.
> Best,
> Gérard
> http://www.cines.fr/
>> From: "Miguel Oliveira" <miguel.oliveira at uc.pt>
>> To: "Slurm-users" <slurm-users at lists.schedmd.com>
>> Sent: Wednesday, 29 June 2022 01:28:58
>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>> Hi Gérard,
>> If I understood you correctly your goal was to limit the number of minutes each
>> project can run. By associating each project to a slurm account with a nodecay
>> QoS then you will have achieved your goal.
>> Try a project with a very small limit and you will see that it won’t run.
>> You don’t have to add anything. Each QoS will accumulate its respective usage,
>> i.e., the usage of all users on that account. Users can even be on different
>> accounts (projects) and charge the respective project with the parameter
>> --account on sbatch.
>> The GrpTRESMins is always changed on the QoS with a command like:
>> sacctmgr update qos where qos=... set GrpTRESMin=cpu=….
>> Hope that makes sense!
>> Best,
>> MAO
>>> On 28 Jun 2022, at 18:30, gerard.gil at cines.fr wrote:
>>> Hi Miguel,
>>> OK, I didn't know this command.
>>> I'm not sure I understand how it works with regard to my goal.
>>> I used the following command, inspired by the one you gave me, and I obtain a
>>> UsageRaw for each QOS.
>>> scontrol -o show assoc_mgr -accounts=myaccount Users=" "
>>> Do I have to sum up all QOS RawUsage values to obtain the RawUsage of
>>> myaccount with NoDecay?
>>> If I set GrpTRESMins for an Account and not for a QOS, does Slurm sum up
>>> these QOS RawUsage values to check whether the GrpTRESMins account limit is
>>> reached?
>>> Thanks again for your precious help.
>>> Gérard
>>> [ http://www.cines.fr/ ]
>>>> From: "Miguel Oliveira" <miguel.oliveira at uc.pt>
>>>> To: "Slurm-users" <slurm-users at lists.schedmd.com>
>>>> Sent: Tuesday, 28 June 2022 17:23:18
>>>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>>> Hi Gérard,
>>>> The way you are checking is against the association and as such it ought to be
>>>> decreasing in order to be used by fair share appropriately.
>>>> The counter used that does not decrease is on the QoS, not the association. You
>>>> can check that with:
>>>> scontrol -o show assoc_mgr | grep "^QOS=<account>"
>>>> That ought to give you two numbers. The first is the limit, or N for no limit,
>>>> and the second, in parentheses, the usage.
>>>> Hope that helps.
>>>> Best,
>>>> Miguel Afonso Oliveira
>>>>> On 28 Jun 2022, at 08:58, gerard.gil at cines.fr wrote:
>>>>> Hi Miguel,
>>>>> I modified my test configuration to evaluate the effect of NoDecay.
>>>>> I modified all QOS, adding the NoDecay flag.
>>>>> toto at login1:~/TEST$ sacctmgr show QOS
>>>>> Name Priority GraceTime Preempt PreemptExemptTime PreemptMode Flags UsageThres UsageFactor GrpTRES GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit GrpWall MaxTRES MaxTRESPerNode MaxTRESMins MaxWall MaxTRESPU MaxJobsPU MaxSubmitPU MaxTRESPA MaxJobsPA MaxSubmitPA MinTRES
>>>>> normal 0 00:00:00 cluster NoDecay 1.000000
>>>>> interactif 10 00:00:00 cluster NoDecay 1.000000 node=50 node=22 1-00:00:00 node=50
>>>>> petit 4 00:00:00 cluster NoDecay 1.000000 node=1500 node=22 1-00:00:00 node=300
>>>>> gros 6 00:00:00 cluster NoDecay 1.000000 node=2106 node=700 1-00:00:00 node=700
>>>>> court 8 00:00:00 cluster NoDecay 1.000000 node=1100 node=100 02:00:00 node=300
>>>>> long 4 00:00:00 cluster NoDecay 1.000000 node=500 node=200 5-00:00:00 node=200
>>>>> special 10 00:00:00 cluster NoDecay 1.000000 node=2106 node=2106 5-00:00:00 node=2106
>>>>> support 10 00:00:00 cluster NoDecay 1.000000 node=2106 node=700 1-00:00:00 node=2106
>>>>> visu 10 00:00:00 cluster NoDecay 1.000000 node=4 node=700 06:00:00 node=4
>>>>> I submitted a bunch of jobs to check the NoDecay behaviour, and I noticed that
>>>>> RawUsage as well as GrpTRESRaw cpu are still decreasing.
>>>>> toto at login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>> Account User GrpTRESRaw GrpTRESMins RawUsage
>>>>> dci cpu=6932,mem=12998963,energy=0,node=216,billing=6932,fs/disk=0,vmem=0,pages=0 cpu=17150 415966
>>>>> toto at login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>> Account User GrpTRESRaw GrpTRESMins RawUsage
>>>>> dci cpu=6931,mem=12995835,energy=0,node=216,billing=6931,fs/disk=0,vmem=0,pages=0 cpu=17150 415866
>>>>> toto at login1:~/TEST$ sshare -A dci -u " " -o account,user,GrpTRESRaw%80,GrpTRESMins,RawUsage
>>>>> Account User GrpTRESRaw GrpTRESMins RawUsage
>>>>> dci cpu=6929,mem=12992708,energy=0,node=216,billing=6929,fs/disk=0,vmem=0,pages=0 cpu=17150 415766
>>>>> Is there something I forgot to do?
>>>>> Best,
>>>>> Gérard
>>>>> Cordialement,
>>>>> Gérard Gil
>>>>> Département Calcul Intensif
>>>>> Centre Informatique National de l'Enseignement Superieur
>>>>> 950, rue de Saint Priest
>>>>> 34097 Montpellier CEDEX 5
>>>>> FRANCE
>>>>> tel : (334) 67 14 14 14
>>>>> fax : (334) 67 52 37 63
>>>>> web : http://www.cines.fr
>>>>>> From: "Gérard Gil" <gerard.gil at cines.fr>
>>>>>> To: "Slurm-users" <slurm-users at lists.schedmd.com>
>>>>>> Cc: "slurm-users" <slurm-users at schedmd.com>
>>>>>> Sent: Friday, 24 June 2022 14:52:12
>>>>>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>>>>> Hi Miguel,
>>>>>> Good !!
>>>>>> I'll try these options on all existing QOS and see if everything works as
>>>>>> expected.
>>>>>> I'll inform you of the results.
>>>>>> Thanks a lot
>>>>>> Best,
>>>>>> Gérard
>>>>>> ----- Original Message -----
>>>>>>> From: "Miguel Oliveira" <miguel.oliveira at uc.pt>
>>>>>>> To: "Slurm-users" <slurm-users at lists.schedmd.com>
>>>>>>> Cc: "slurm-users" <slurm-users at schedmd.com>
>>>>>>> Sent: Friday, 24 June 2022 14:07:16
>>>>>>> Subject: Re: [slurm-users] GrpTRESMins and GrpTRESRaw usage
>>>>>>> Hi Gérard,
>>>>>>> I believe so. All our accounts correspond to one project and all have an
>>>>>>> associated QoS with NoDecay and DenyOnLimit. This is enough to restrict usage
>>>>>>> on each individual project.
>>>>>>> You only need these flags on the QoS. The association will carry on as usual and
>>>>>>> fairshare will not be impacted.
>>>>>>> Hope that helps,
>>>>>>> Miguel Oliveira
>>>>>>>> On 24 Jun 2022, at 12:56, gerard.gil at cines.fr wrote:
>>>>>>>> Hi Miguel,
>>>>>>>>> Why not? You can have multiple QoSs and you have other techniques to change
>>>>>>>>> priorities according to your policies.
>>>>>>>> Does this answer my question?
>>>>>>>> "If all configured QOS use NoDecay, can we take advantage of the FairShare
>>>>>>>> priority with Decay and of all jobs' GrpTRESRaw with NoDecay?"
>>>>>>>> Thanks
>>>>>>>> Best,
>>>>>>>> Gérard