[slurm-users] Way MaxRSS should be interpreted

Gareth.Williams at csiro.au Gareth.Williams at csiro.au
Tue Apr 17 07:03:29 MDT 2018


I think the situation is likely to be a little different. Let’s consider a Fortran program that statically or dynamically defines large arrays. This defines a virtual memory size, which is like declaring the maximum amount of memory you might use if you fill the arrays. That amount of real memory + swap must be available for the program to run; after all, you might use that amount... Speaking loosely, Linux has a soft memory allocation policy, so memory may not actually be allocated until it is used. If the program happens to read a smaller dataset and the arrays are not filled, then the resident set size may be significantly smaller than the virtual memory size. Further, swapped-out memory doesn’t count towards the RSS, so it might be even smaller.

Effectively, RSS for a process is its actual footprint in RAM. It will change over the life of the process/job and Slurm will track the maximum (MaxRSS). I’d actually expect MaxRSS to be the maximum of the sum of RSS of known processes as sampled periodically through the job, but I’m guessing. This should apply reasonably to parallel jobs if the sum spans nodes (or it wouldn’t be the first batch system to only effectively account for the first allocated node).

The whole Linux memory tracking/accounting system has gotchas, as shared memory (say for library code) has to be accounted for somewhere, but we can reasonably assume in HPC that memory use is dominated by unique computational working set data, so MaxRSS is a good estimate of how much RAM is needed to run a given job.
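
To make the candidate definitions concrete, here is a small Python sketch comparing "maximum of the sum of RSS as sampled periodically" with the per-process alternatives. The sample values and function names are invented for illustration; this is not how Slurm's accounting is implemented, only a way to see that the definitions can give quite different numbers.

```python
# Hypothetical RSS samples in MiB: one row per sampling interval,
# one column per process in the job. All values are invented.
samples = [
    [100, 900, 100],
    [500, 300, 200],
    [900, 100, 100],
]


def max_of_sum(rows):
    """Peak of the job's total footprint across sampling intervals."""
    return max(sum(row) for row in rows)


def max_single(rows):
    """Largest RSS seen for any single process in any interval."""
    return max(max(row) for row in rows)


def sum_of_peaks(rows):
    """Sum of each process's own peak, regardless of when it occurred."""
    return sum(max(col) for col in zip(*rows))


print(max_of_sum(samples))    # 1100
print(max_single(samples))    # 900
print(sum_of_peaks(samples))  # 2000
```

With processes that peak at different times, the sum of individual peaks (2000) overstates the largest total footprint actually observed (1100), while the single-process maximum (900) understates it.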

Gareth

From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of E.S. Rosenberg
Sent: Tuesday, 17 April 2018 10:42 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Way MaxRSS should be interpreted

Hi Loris,
Thanks for your explanation!
I would have interpreted it as max(sum()).

Is there a way to get max(sum()) or at least some form of sum()? The assumption that all processes are peaking at the same value is not a valid one unless all threads have essentially the same workload...
Thanks again!
Eli

On Tue, Apr 17, 2018 at 2:09 PM, Loris Bennett <loris.bennett at fu-berlin.de> wrote:
Hi Eli,

"E.S. Rosenberg" <esr+slurm-dev at mail.hebrew.edu> writes:

> Hi fellow slurm users,
> We have been struggling for a while with understanding how MaxRSS is reported.
>
> This is because jobs often die with MaxRSS not even approaching 10% of the requested memory.
>
> I just found the following document:
> https://research.csc.fi/-/a
>
> It says:
> "maxrss = maximum amount of memory used at any time by any process in that job. This applies directly for serial jobs. For parallel jobs you need to multiply with the number of cores (max 16 or 24 as this is
> reported only for that node that used the most memory)"
>
> While 'man sacct' says:
> "Maximum resident set size of all tasks in job."
>
> Which explanation is correct? How should I be interpreting MaxRSS?

As far as I can tell, both explanations are correct, but the
text in 'man sacct' is confusing.

  "Maximum resident set size of all tasks in job."

is analogous to

  "maximum height of all people in the room"

rather than

  "total height of all people in the room"

More specifically it means

  "Maximum individual resident set size out of the group of resident set
  sizes associated with all tasks in job."

It doesn't mean

  "Sum of the resident set sizes of all the tasks"

I'm a native English-speaker and I keep on stumbling over this in 'man
sacct' and then remembering that I have already worked out how it was
supposed to be interpreted.

My suggestion for improving this would be

  "Maximum individual resident set size of all resident set sizes
  associated with the tasks in job."

It's a little clunky, but I hope it is clearer.
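
As a rough illustration of the two readings, here is a short Python sketch. The per-task values are invented, and it assumes one core per task so that the "multiply with the number of cores" rule from the quoted document can be applied to the task count; it is only meant to show how the numbers relate, not what sacct computes internally.

```python
# Hypothetical peak RSS per task in MiB; the task names and values are invented.
task_peak_rss = {"task0": 1200, "task1": 800, "task2": 950, "task3": 400}

# "maximum height of all people in the room"
largest = max(task_peak_rss.values())

# "total height of all people in the room"
total = sum(task_peak_rss.values())

# Rule of thumb from the quoted document: multiply the maximum by the
# number of cores (assumed here to be one core per task).
estimate = largest * len(task_peak_rss)

print(largest, total, estimate)  # 1200 3350 4800
```

With uneven tasks like these, the multiplied estimate (4800) is noticeably higher than the actual sum (3350); the two only coincide when every task peaks at the same value.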

Cheers,

Loris

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de
