[slurm-users] Way MaxRSS should be interpreted

Loris Bennett loris.bennett at fu-berlin.de
Tue Apr 17 05:09:17 MDT 2018


Hi Eli,

"E.S. Rosenberg" <esr+slurm-dev at mail.hebrew.edu> writes:

> Hi fellow slurm users,
> We have been struggling for a while with understanding how MaxRSS is reported.
>
> This is because jobs often die with MaxRSS sometimes not even approaching 10% of the requested memory.
>
> I just found the following document:
> https://research.csc.fi/-/a
>
> It says:
> "maxrss = maximum amount of memory used at any time by any process in that job. This applies directly for serial jobs. For parallel jobs you need to multiply with the number of cores (max 16 or 24 as this is
> reported only for that node that used the most memory)"
>
> While 'man sacct' says:
> "Maximum resident set size of all tasks in job."
>
> Which explanation is correct? How should I be interpreting MaxRSS?

As far as I can tell, both explanations are correct, but the
text in 'man sacct' is confusing.

  "Maximum resident set size of all tasks in job."

is analogous to

  "maximum height of all people in the room"

rather than 

  "total height of all people in the room"

More specifically it means

  "Maximum individual resident set size out of the group of resident set
  sizes associated with all tasks in job."

It doesn't mean

  "Sum of the resident set sizes of all the tasks"
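
To make the difference concrete, here is a minimal Python sketch. The
per-task values and task names are made up purely for illustration and
are not real sacct output; the helper names are hypothetical as well.

```python
# Hypothetical per-task resident set sizes in MB (illustrative numbers only,
# not taken from any real job).
task_rss_mb = {"task0": 1200, "task1": 950, "task2": 3100, "task3": 800}


def max_rss(rss_by_task):
    """Largest single-task RSS: the 'maximum height in the room' reading."""
    return max(rss_by_task.values())


def total_rss(rss_by_task):
    """Sum over all tasks: the reading that MaxRSS does NOT correspond to."""
    return sum(rss_by_task.values())


max_rss_mb = max_rss(task_rss_mb)
total_rss_mb = total_rss(task_rss_mb)

# Multiplying the maximum by the number of tasks can never underestimate
# the total, which is why it serves as a rough upper bound.
upper_bound_mb = max_rss_mb * len(task_rss_mb)

print(f"max over tasks : {max_rss_mb} MB")
print(f"sum over tasks : {total_rss_mb} MB")
print(f"max * ntasks   : {upper_bound_mb} MB")
```

With these numbers the maximum is 3100 MB while the sum is 6050 MB, so
a job can use far more memory in total than the single reported maximum
suggests.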

I'm a native English speaker and I keep on stumbling over this in 'man
sacct' and then remembering that I have already worked out how it was
supposed to be interpreted.

My suggestion for improving this would be

  "Maximum individual resident set size of all resident set sizes
  associated with the tasks in job."

It's a little clunky, but I hope it is clearer.

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de


