[slurm-users] [EXT] Re: [External] maxRSS and aveRSS

Sean Crosby scrosby at unimelb.edu.au
Fri Mar 12 23:37:17 UTC 2021


On Sat, 13 Mar 2021 at 08:48, Prentice Bisbal <pbisbal at pppl.gov> wrote:

> It sounds like you're confusing job steps and tasks. For an MPI program,
> tasks and MPI ranks are the same thing. A Slurm job has multiple steps. A
> single job step could have only 1 task, while another step in the same job
> can use 1,000 tasks.  When looking at the amount of memory for a job, the
> important number is the largest value of MaxRSS across all the job steps.
> Why is this important? Because if you don't request at least this much with
> your --mem specification, your job may fail.
>
> Based on your definition of AveRSS (I didn't go back and check the
> documentation myself), it sounds like you're doing unnecessary math, since
> I'm sure Slurm sums up the maximum RSS values of the individual tasks to
> get MaxRSS, and then divides that by the number of tasks to get AveRSS.
>
This is incorrect. MaxRSS is not a sum over tasks. It is the peak RSS of
the single task that used the most RAM. That is why there are also
MaxRSSNode and MaxRSSTask fields: MaxRSSNode is the node that task ran on,
and MaxRSSTask is its task ID.

If you are trying to work out the RAM that the job as a whole used, use
the mem value in TRESUsageInTot.

For a job on our cluster:

# sacct -j 24207294 -o JobID,Node,AveRSS,MaxRSS,MaxRSSTask,MaxRSSNode,TRESUsageInTot -p
JobID|NodeList|AveRSS|MaxRSS|MaxRSSTask|MaxRSSNode|TRESUsageInTot|
24207294.0|spartan-bm[055-056,058-059,061-062,085,091-093,096,098-099,104,108,112-117,120-124]|927811665|962245K|3|spartan-bm058|cpu=4784-18:38:23,energy=0,fs/disk=3555263283,mem=217455859K,pages=2438,vmem=434981656K|

This shows that AveRSS was 884MB, that the task with the largest RSS was
task 3 on spartan-bm058, which used 939MB, and that all tasks together used
212359MB.
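
If you want to check those conversions yourself, here is a minimal Python
sketch that parses the output line above. The helper names
(parse_size_to_bytes, parse_tres) are made up for illustration, and it
assumes that a value without a suffix is in bytes and that K means 1024
bytes, which is what the figures above imply.

```python
def parse_size_to_bytes(value: str) -> int:
    """Convert a sacct size such as '962245K' or '927811665' to bytes."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    suffix = value[-1].upper()
    if suffix in units:
        return int(float(value[:-1]) * units[suffix])
    return int(value)  # assumption: no suffix means plain bytes


def parse_tres(tres: str) -> dict:
    """Split 'cpu=...,mem=...,vmem=...' into a dict of strings."""
    return dict(item.split("=", 1) for item in tres.split(","))


# Header and row copied from the sacct -p output above.
header = "JobID|NodeList|AveRSS|MaxRSS|MaxRSSTask|MaxRSSNode|TRESUsageInTot|"
row = (
    "24207294.0|spartan-bm[055-056,058-059,061-062,085,091-093,096,"
    "098-099,104,108,112-117,120-124]|927811665|962245K|3|spartan-bm058|"
    "cpu=4784-18:38:23,energy=0,fs/disk=3555263283,mem=217455859K,"
    "pages=2438,vmem=434981656K|"
)

# -p output is pipe-delimited with a trailing pipe, so strip it first.
fields = dict(zip(header.rstrip("|").split("|"), row.rstrip("|").split("|")))
tres = parse_tres(fields["TRESUsageInTot"])

MB = 1024**2
ave_bytes = parse_size_to_bytes(fields["AveRSS"])
max_bytes = parse_size_to_bytes(fields["MaxRSS"])
tot_bytes = parse_size_to_bytes(tres["mem"])

ave_mb = ave_bytes // MB  # 884
max_mb = max_bytes // MB  # 939
tot_mb = tot_bytes // MB  # 212359

# Total memory divided by the per-task average gives the implied task count.
implied_tasks = round(tot_bytes / ave_bytes)

print(f"AveRSS={ave_mb}MB MaxRSS={max_mb}MB (task {fields['MaxRSSTask']} "
      f"on {fields['MaxRSSNode']}) total={tot_mb}MB")
print(f"total / AveRSS = {implied_tasks}")
```

The last ratio comes out at about 240, which would be consistent with a
240-task step (the task count itself is not in the output above). So
AveRSS times the number of tasks and the mem value in TRESUsageInTot
describe the same total, and MaxRSS alone does not.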

Also remember that --mem is a per node memory request. It is not a per job
or a per task memory request.
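
Because --mem is per node, a rough way to size it is to multiply the
largest per-task RSS by the number of tasks placed on one node. The Python
sketch below does that with the MaxRSS value from the output above. The
tasks_per_node value of 10 is purely hypothetical (the real task layout is
not shown in this thread), and treating every task as if it peaked at
MaxRSS is a deliberately conservative assumption, not something Slurm
computes for you.

```python
import math


def per_node_mem_upper_bound_mb(max_rss_kb: int, tasks_per_node: int) -> int:
    """Conservative per-node estimate: every task peaks at MaxRSS."""
    return math.ceil(max_rss_kb * tasks_per_node / 1024)


max_rss_kb = 962245   # MaxRSS of 962245K from the sacct output above
tasks_per_node = 10   # hypothetical layout, for illustration only

mem_mb = per_node_mem_upper_bound_mb(max_rss_kb, tasks_per_node)
print(f"--mem={mem_mb}M")
```

Any headroom on top of that figure is a judgment call for your own jobs.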

Sean

On 3/9/21 3:41 AM, xiaojinghu93 at 163.com wrote:
>
> Hi guys,
> I would like to calculate the CPU efficiency and memory efficiency of Slurm jobs.
>
> I am having difficulty calculating the real memory a job uses.
> According to Slurm, "MaxRSS" means "Maximum resident set size of all tasks in job". If so, how can I get the memory used by a single job? As far as I understand, if I need to know the memory used by a single job or job step, I need to sum up the memory used by each task. So I think I should use the "AveRSS" field, which gives the "average resident set size of all tasks in job". If I multiply AveRSS by the number of tasks, I should get the real memory the job or job step used.
>
> But I studied the code of the "seff" command, and it claims to be equivalent to "sacct -P -n -a --format JobID,User,Group,State,Cluster,AllocCPUS,REQMEM,TotalCPU,Elapsed,MaxRSS,ExitCode,NNodes,NTasks -j <job_id>", which means I should use MaxRSS.
>
> Can anyone give me some explanation on that?
>
> Very grateful for any help.
> Thank you!
>
> Regards,
> Xiaojing
>
>
>