[slurm-users] [EXT] Re: [External] maxRSS and aveRSS
Prentice Bisbal
pbisbal at pppl.gov
Sat Mar 13 00:37:21 UTC 2021
On 3/12/21 6:37 PM, Sean Crosby wrote:
>
>
> On Sat, 13 Mar 2021 at 08:48, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>
> It sounds like you're confusing job steps and tasks. For an MPI
> program, tasks and MPI ranks are the same thing. A Slurm job can have
> multiple steps. A single job step could have only 1 task, while
> another step in the same job can use 1,000 tasks. When looking at
> the amount of memory for a job, the important number is the
> largest value of MaxRSS for all the job steps. Why is this important?
> Because if you don't request at least this much with your --mem
> specification, your job may fail.
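Picking out the step with the largest MaxRSS can be sketched in Python by parsing `sacct -p` output, which is pipe-separated with a trailing `|` on each line. This is a minimal sketch under assumptions: the sample output, job ID, and the helpers `rss_to_bytes` and `largest_step_maxrss` are hypothetical, and the suffix table assumes K/M/G/T are powers of 1024. Keep in mind Sean's point in his reply: each step's MaxRSS is the peak of a single task, not the step's total.

```python
# Hypothetical output of: sacct -j 1234 -o JobID,MaxRSS -p
# (-p prints pipe-separated fields with a trailing "|").
SAMPLE = """JobID|MaxRSS|
1234||
1234.batch|5120K|
1234.0|962245K|
1234.1|2048000K|
"""

# Assumed multipliers for sacct memory suffixes (binary units).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def rss_to_bytes(value: str) -> int:
    """Convert a sacct memory string such as '962245K' to bytes; '' -> 0."""
    if not value:
        return 0
    if value[-1].isalpha():
        return int(float(value[:-1]) * UNITS[value[-1].upper()])
    return int(value)  # no suffix: treat as bytes


def largest_step_maxrss(text: str) -> tuple[str, int]:
    """Return (JobID, MaxRSS in bytes) of the step with the largest MaxRSS."""
    lines = text.strip().splitlines()
    header = lines[0].split("|")[:-1]  # drop the empty field after the trailing "|"
    best = ("", 0)
    for line in lines[1:]:
        row = dict(zip(header, line.split("|")[:-1]))
        rss = rss_to_bytes(row.get("MaxRSS", ""))
        if rss > best[1]:
            best = (row["JobID"], rss)
    return best


step, peak = largest_step_maxrss(SAMPLE)
print(step, peak // 1024, "KiB")  # 1234.1 2048000 KiB
```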
>
> Based on your definition of AveRSS (I didn't go back and check
> the documentation myself), it sounds like you're doing unnecessary
> math, since I'm sure Slurm sums up the individual per-task maximum
> RSS values to get MaxRSS, and then divides that by the
> number of tasks to get AveRSS.
>
> This is incorrect. MaxRSS is the peak amount of RAM used by the single
> task that used the most RAM; it is not a sum across tasks. That is why
> there are also MaxRSSNode and MaxRSSTask values. MaxRSSNode is the node
> that task ran on, and MaxRSSTask is the task ID of the task that used
> the most RAM.
Thanks for the correction. That's what I originally thought, but then I
read the definition he provided, which is exactly the same as in the
documentation, and completely misinterpreted it. When I look at the
sacct documentation and see that same definition in the context of all
the other MaxRSS values, it's clear I screwed up. Sorry!
SchedMD should reword that so it's clear what it represents even out of
context.
When I read "Maximum resident set size of all tasks in job" I
automatically thought "maximum of the *sum* of the RSSes of each task."
Prentice
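Sean's definitions can be illustrated with a small Python sketch. The per-task peak values are hypothetical, and the sketch assumes AveRSS is the mean of the per-task peaks (in line with the documented "average resident set size of all tasks"). Under that assumption, multiplying AveRSS by the task count gives back the sum of the per-task peaks, which is what Xiaojing's AveRSS × tasks approach computes.

```python
# Hypothetical peak RSS (KiB) of each task in one job step, keyed by task ID.
task_peak_rss_kib = {0: 900_000, 1: 950_000, 2: 870_000, 3: 962_245}

# MaxRSS / MaxRSSTask: the largest single-task peak and the task that hit it.
max_rss_task, max_rss = max(task_peak_rss_kib.items(), key=lambda kv: kv[1])

# AveRSS (assumed here): the mean of the per-task peaks.
ave_rss = sum(task_peak_rss_kib.values()) / len(task_peak_rss_kib)

# AveRSS x number of tasks recovers the sum of the per-task peaks.
# Tasks need not peak at the same moment, so this can overstate
# the memory actually in use at any one time.
sum_of_peaks = ave_rss * len(task_peak_rss_kib)

print(max_rss_task, max_rss)  # 3 962245
print(ave_rss)                # 920561.25
print(sum_of_peaks)           # 3682245.0
```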
>
> If you are trying to work out the RAM that the job as a whole used,
> use TRESUsageInTot
>
> For a job on our cluster:
>
> # sacct -j 24207294 -o JobID,Node,AveRSS,MaxRSS,MaxRSSTask,MaxRSSNode,TRESUsageInTot -p
> JobID|NodeList|AveRSS|MaxRSS|MaxRSSTask|MaxRSSNode|TRESUsageInTot|
> 24207294.0|spartan-bm[055-056,058-059,061-062,085,091-093,096,098-099,104,108,112-117,120-124]|927811665|962245K|3|spartan-bm058|cpu=4784-18:38:23,energy=0,fs/disk=3555263283,mem=217455859K,pages=2438,vmem=434981656K|
>
> This shows that AveRSS was 884MB, MaxRSS was 939MB (used by task 3
> on spartan-bm058), and all tasks in total used 212359MB.
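The unit conversions behind these figures can be checked with a short Python sketch. It assumes a bare number (as in the AveRSS column) is bytes and that the K suffix means KiB, which is how the figures above work out; the "MB" values are therefore binary megabytes (MiB). The helper names `to_bytes` and `tres_value` are hypothetical.

```python
# Fields copied from the sacct -p output above.
AVE_RSS = "927811665"  # AveRSS: no suffix, treated as bytes
MAX_RSS = "962245K"    # MaxRSS: K suffix, treated as KiB
TRES_IN_TOT = ("cpu=4784-18:38:23,energy=0,fs/disk=3555263283,"
               "mem=217455859K,pages=2438,vmem=434981656K")

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
MIB = 1024 * 1024


def to_bytes(value: str) -> int:
    """Convert '962245K'-style sacct values to bytes; bare numbers are bytes."""
    if value[-1].isalpha():
        return int(float(value[:-1]) * UNITS[value[-1].upper()])
    return int(value)


def tres_value(tres: str, name: str) -> str:
    """Pick one 'name=value' entry out of a comma-separated TRES string."""
    return dict(item.split("=", 1) for item in tres.split(","))[name]


ave_mib = to_bytes(AVE_RSS) // MIB
max_mib = to_bytes(MAX_RSS) // MIB
total_mib = to_bytes(tres_value(TRES_IN_TOT, "mem")) // MIB
print(ave_mib, max_mib, total_mib)  # 884 939 212359
```

Floor division is used because it reproduces the rounded-down figures quoted above.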
>
> Also remember that --mem is a per node memory request. It is not a per
> job or a per task memory request.
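Because --mem is a per-node request, one rough way to relate the job-wide total to a per-node figure is to divide the mem value from TRESUsageInTot by the number of nodes. The Python sketch below does that for the example job. `expand_hostlist` is a hypothetical, simplified helper that only handles a single `prefix[ranges]` group (`scontrol show hostnames` is the tool to use on a real cluster). An average like this says nothing about how unevenly memory was spread, so the busiest node may have needed considerably more.

```python
import re


def expand_hostlist(hostlist: str) -> list[str]:
    """Expand 'prefix[a-b,c,...]' into host names, keeping zero padding.

    Simplified sketch: one bracket group only, no nested or multiple groups.
    """
    match = re.fullmatch(r"([^\[]+)\[([^\]]+)\]", hostlist)
    if match is None:
        return [hostlist]  # a single host name
    prefix, ranges = match.groups()
    hosts = []
    for part in ranges.split(","):
        start, _, end = part.partition("-")
        width = len(start)
        for n in range(int(start), int(end or start) + 1):
            hosts.append(f"{prefix}{n:0{width}d}")
    return hosts


nodes = expand_hostlist(
    "spartan-bm[055-056,058-059,061-062,085,091-093,096,098-099,"
    "104,108,112-117,120-124]"
)
total_mem_kib = 217455859  # mem= from TRESUsageInTot above
avg_per_node_mib = total_mem_kib / len(nodes) / 1024

print(len(nodes))               # 26
print(round(avg_per_node_mib))  # 8168
```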
>
> Sean
>
> On 3/9/21 3:41 AM, xiaojinghu93 at 163.com wrote:
>> Hi guys,
>> I would like to calculate the CPU efficiency and memory efficiency of Slurm jobs.
>>
>> I am having difficulty calculating the real "memory" a job uses.
>> According to Slurm, "MaxRSS" means "Maximum resident set size of all tasks in job". If so, how can I get the memory used by a single job? As far as I understand, to know the memory used by a single job/job step, I need to sum up the memory used by each task. So I think I should use the "AveRSS" field, which gives the "Average resident set size of all tasks in job". If I multiply AveRSS by the number of tasks, I should get the real memory a job/job step used.
>>
>> But I studied the code of the "seff" command, and it claims to be equivalent to "sacct -P -n -a --format JobID,User,Group,State,Cluster,AllocCPUS,REQMEM,TotalCPU,Elapsed,MaxRSS,ExitCode,NNodes,NTasks -j <job_id>", which means I should use "MaxRSS".
>>
>> Can anyone explain this?
>>
>> Very grateful for any help.
>> Thank you!
>>
>> Regards,
>> Xiaojing
>>
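For the CPU-efficiency part of the question, a commonly used definition is TotalCPU divided by AllocCPUS × Elapsed; treat that as an assumption rather than a copy of seff's exact code. The Python sketch below parses sacct's `[D-]HH:MM:SS` and `MM:SS.mmm` duration formats and applies that ratio. The helper names and the example job values are hypothetical.

```python
def slurm_duration_seconds(value: str) -> float:
    """Parse sacct durations such as '4784-18:38:23', '08:00:00' or '01:02.345'."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    fields = [float(f) for f in value.split(":")]
    # Pad missing leading fields so we always have hours, minutes, seconds.
    while len(fields) < 3:
        fields.insert(0, 0.0)
    hours, minutes, seconds = fields
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(total_cpu: str, alloc_cpus: int, elapsed: str) -> float:
    """Fraction of allocated core-time used: TotalCPU / (AllocCPUS * Elapsed)."""
    core_seconds = alloc_cpus * slurm_duration_seconds(elapsed)
    if core_seconds == 0:
        return 0.0
    return slurm_duration_seconds(total_cpu) / core_seconds


# Hypothetical job: 4 CPUs for 8 hours, 24 CPU-hours of work.
print(cpu_efficiency("1-00:00:00", 4, "08:00:00"))  # 0.75
```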