[slurm-users] Way MaxRSS should be interpreted

E.S. Rosenberg esr+slurm-dev at mail.hebrew.edu
Tue Apr 17 07:49:56 MDT 2018


Hi Gareth,
Your assessment is also what I would have thought: MaxRSS should be the
maximum of the sum of all RSS in a sample. Swap and shared memory do
complicate things, but I think most people expect jobs to be killed only if
their RSS exceeds their memory request.

That being said, as far as I understand the current Slurm reporting
mechanisms, there is actually no way to get the total MaxRSS of a job, only
that of whatever step/subjob/thread was largest in memory.
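
To make the difference concrete, here is a minimal sketch with made-up
numbers. The task names and sample values are hypothetical, and this is not
how Slurm collects its data; it only illustrates the quantities being
discussed in this thread.

```python
# Hypothetical per-task RSS samples in MB, one list per task, all taken at
# the same sampling instants. The numbers are invented for illustration.
samples = {
    "task0": [100, 900, 200, 100],
    "task1": [100, 200, 800, 100],
    "task2": [300, 300, 300, 300],
}

# Peak of each individual task over the life of the job.
per_task_peak = {task: max(rss) for task, rss in samples.items()}

# What MaxRSS appears to report: the largest single-task peak.
max_of_max = max(per_task_peak.values())

# What I would have expected: the largest total footprint at any one sample.
max_of_sum = max(sum(column) for column in zip(*samples.values()))

# Upper bound if every task happened to peak at the same moment.
sum_of_max = sum(per_task_peak.values())

# The "multiply MaxRSS by the number of tasks" rule of thumb.
max_times_ntasks = max_of_max * len(samples)

print(max_of_max, max_of_sum, sum_of_max, max_times_ntasks)
# prints: 900 1400 2000 2700
```

With these numbers the reported value (900) understates the real peak
footprint (1400), while multiplying by the task count (2700) overstates it,
because the tasks do not peak at the same time.
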
Thanks,
Eli

On Tue, Apr 17, 2018 at 4:03 PM, <Gareth.Williams at csiro.au> wrote:

> I think the situation is likely to be a little different. Let’s consider a
> Fortran program that statically or dynamically defines large arrays. This
> defines a virtual memory size, like declaring that this is the maximum
> amount of memory you might use if you fill the arrays. That amount of real
> memory + swap must be available for the program to run; after all, you
> might use that amount. Speaking loosely, Linux has a soft memory
> allocation policy, so memory may not actually be allocated until it is used.
> If the program happens to read a smaller dataset and the arrays are not
> filled, then the resident set size may be significantly smaller than the
> virtual memory size. Further, swapped-out memory doesn’t count toward the
> RSS, so it might be even smaller. Effectively, RSS for a process is the
> actual footprint in RAM. It will change over the life of the process/job,
> and Slurm will track the maximum (MaxRSS). I’d actually expect MaxRSS to be
> the maximum of the sum of RSS of known processes as sampled periodically
> through the job, but I’m guessing. This should apply reasonably to
> parallel jobs if the sum spans nodes (or it wouldn’t be the first batch
> system to only effectively account for the first allocated node). The whole
> Linux memory tracking/accounting system has gotchas, as shared memory (say
> for library code) has to be accounted for somewhere, but we can reasonably
> assume in HPC that memory use is dominated by unique computational working
> set data, so MaxRSS is a good estimate of how much RAM is needed to run a
> given job.
>
>
>
> Gareth
>
>
>
> *From:* slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] *On
> Behalf Of *E.S. Rosenberg
> *Sent:* Tuesday, 17 April 2018 10:42 PM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] Way MaxRSS should be interpreted
>
>
>
> Hi Loris,
>
> Thanks for your explanation!
>
> I would have interpreted it as max(sum()).
>
>
>
> Is there a way to get max(sum()) or at least some form of sum()? The
> assumption that all processes are peaking at the same value is not a valid
> one unless all threads have essentially the same workload...
>
> Thanks again!
>
> Eli
>
>
>
> On Tue, Apr 17, 2018 at 2:09 PM, Loris Bennett <loris.bennett at fu-berlin.de>
> wrote:
>
> Hi Eli,
>
> "E.S. Rosenberg" <esr+slurm-dev at mail.hebrew.edu> writes:
>
> > Hi fellow slurm users,
> > We have been struggling for a while with understanding how MaxRSS is
> > reported.
> >
> > This is because jobs often die with MaxRSS sometimes not even approaching
> > 10% of the requested memory.
> >
> > I just found the following document:
> > https://research.csc.fi/-/a
> >
> > It says:
> > "maxrss = maximum amount of memory used at any time by any process in
> > that job. This applies directly for serial jobs. For parallel jobs you need
> > to multiply with the number of cores (max 16 or 24 as this is
> > reported only for that node that used the most memory)"
> >
> > While 'man sacct' says:
> > "Maximum resident set size of all tasks in job."
> >
> > Which explanation is correct? How should I be interpreting MaxRSS?
>
> As far as I can tell, both explanations are correct, but the
> text in 'man sacct' is confusing.
>
>   "Maximum resident set size of all tasks in job."
>
> is analogous to
>
>   "maximum height of all people in the room"
>
> rather than
>
>   "total height of all people in the room"
>
> More specifically it means
>
>   "Maximum individual resident set size out of the group of resident set
>   sizes associated with all tasks in job."
>
> It doesn't mean
>
>   "Sum of the resident set sizes of all the tasks"
>
> I'm a native English-speaker and I keep on stumbling over this in 'man
> sacct' and then remembering that I have already worked out how it was
> supposed to be interpreted.
>
> My suggestion for improving this would be
>
>   "Maximum individual resident set size of all resident set sizes
>   associated with the tasks in job."
>
> It's a little clunky, but I hope it is clearer.
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin         Email loris.bennett at fu-berlin.de
>
>
>