[slurm-users] slurm accounting shows more MaxRSS than physically available memory

Ohlerich, Martin Martin.Ohlerich at lrz.de
Wed Nov 2 11:53:43 UTC 2022


Dear "fellow sufferers",

I am a bit puzzled about the meaning of MaxRSS. The documentation says:
"Maximum resident set size of all tasks in job."
What does "maximum" refer to here? The maximum over the job's duration, if I understand correctly. But it does not seem to be the size of all tasks summed up, so to speak; rather, it seems to be the largest RSS reached by any single task over the job's lifetime. Is that right?
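To make the readings in the question concrete, here is a small Python sketch over made-up per-task RSS samples (the task names, values and helper functions are all hypothetical). It contrasts the interpretation suspected above, the largest peak reached by any single task, with two ways of "summing up" the tasks. It only illustrates the definitions; it does not reproduce how Slurm's accounting plugins actually sample or aggregate memory.

```python
# Made-up RSS samples in KiB: one list per task, one entry per polling
# interval. Task names and values are hypothetical, for illustration only.
samples = {
    "task0": [10_000, 45_000, 30_000],
    "task1": [12_000, 20_000, 50_000],
    "task2": [8_000, 40_000, 35_000],
}


def peak_per_task(samples):
    """Largest RSS each task reached over the whole job."""
    return {task: max(values) for task, values in samples.items()}


def max_over_tasks(samples):
    """Reading A: the largest peak of any single task."""
    return max(peak_per_task(samples).values())


def sum_of_task_peaks(samples):
    """Reading B: every task's peak, summed (peaks may occur at different times)."""
    return sum(peak_per_task(samples).values())


def peak_of_summed_rss(samples):
    """Reading C: the largest total RSS of all tasks at any one sampling instant."""
    return max(sum(instant) for instant in zip(*samples.values()))


print("A:", max_over_tasks(samples))      # A: 50000
print("B:", sum_of_task_peaks(samples))   # B: 135000
print("C:", peak_of_summed_rss(samples))  # C: 115000
```

With these numbers the three readings differ by more than a factor of two, so which one MaxRSS means matters a lot when comparing it with a node's physical memory.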

In any case, I observed the following:

login08:~> sacct -j 2408392 -o 'maxrss,maxrssnode%20'
    MaxRSS           MaxRSSNode
---------- --------------------
102124920K         i02r09c03s02

... so about 102 GB, if I counted the decimal places correctly. On the other hand, this specific node actually has only

i02r09c03s02:~> cat /proc/meminfo
MemTotal:       98436736 kB

i.e. about 98 GB of RAM.
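The comparison above can be double-checked with a short Python sketch that converts both figures to bytes. It assumes that the K suffix printed by sacct is 1024-based, like the kB unit in /proc/meminfo (which the kernel counts in units of 1024 bytes). The helper names are illustrative, and the regular expressions only cover the formats shown in this post.

```python
import re

# Assumption: sacct's K/M/G/T suffixes are binary (1024-based) multipliers.
SLURM_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
GiB = 1024**3


def slurm_mem_to_bytes(value: str) -> int:
    """Convert a sacct memory field such as '102124920K' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)", value.strip())
    if match is None:
        raise ValueError(f"unrecognised memory value: {value!r}")
    number, suffix = match.groups()
    return int(float(number) * SLURM_UNITS[suffix])


def memtotal_bytes(meminfo_text: str) -> int:
    """Extract MemTotal from /proc/meminfo text; the kernel's 'kB' is 1024 bytes."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) * 1024


# Values copied from the sacct and /proc/meminfo output in this post.
max_rss = slurm_mem_to_bytes("102124920K")
mem_total = memtotal_bytes("MemTotal:       98436736 kB\n")

print(f"MaxRSS:   {max_rss / GiB:.1f} GiB ({max_rss / 1e9:.1f} GB)")
print(f"MemTotal: {mem_total / GiB:.1f} GiB ({mem_total / 1e9:.1f} GB)")
print(f"excess:   {(max_rss - mem_total) / GiB:.1f} GiB")
```

Under that assumption MaxRSS is about 97.4 GiB (104.6 GB) against a MemTotal of about 93.9 GiB (100.8 GB), i.e. roughly 3.5 GiB more than the node physically has. Reading K as 1000 bytes instead changes the absolute numbers but not the conclusion, since both values are given in the same unit.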

Does anybody know of a reasonable explanation for how this can happen? The situation looks even worse if MaxRSS is the maximum RSS of only one task (rank) on that node: what about the other tasks, which certainly consume memory as well? On top of that, the OS itself takes up quite a lot of memory on these diskless compute nodes.
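The "even worse" argument can be written as a simple memory budget. This Python sketch takes the MaxRSS and MemTotal figures from this post and assumes that MaxRSS is the peak of a single task and that the other tasks and the OS occupy separate physical memory at the same time. The per-rank and OS figures in the second call are hypothetical placeholders, not measurements.

```python
# All values in KiB. MaxRSS and MemTotal are the figures from this post.
MEM_TOTAL_KIB = 98_436_736
MAX_RSS_KIB = 102_124_920


def implied_node_usage_kib(max_task_rss, other_task_rss, os_footprint):
    """Memory the node would need if MaxRSS is one task's peak and the other
    tasks and the OS occupy distinct physical memory at the same time."""
    return max_task_rss + sum(other_task_rss) + os_footprint


def overcommit_kib(max_task_rss, other_task_rss, os_footprint, mem_total):
    """How far the implied usage exceeds physical memory (negative = it fits)."""
    usage = implied_node_usage_kib(max_task_rss, other_task_rss, os_footprint)
    return usage - mem_total


# Even with no other tasks and no OS footprint, MaxRSS already exceeds MemTotal.
print(overcommit_kib(MAX_RSS_KIB, [], 0, MEM_TOTAL_KIB))  # 3688184

# Hypothetical: three more ranks at 2 GiB each and a 4 GiB in-RAM OS image.
print(overcommit_kib(MAX_RSS_KIB, [2 * 1024**2] * 3, 4 * 1024**2, MEM_TOTAL_KIB))  # 14173944
```

Any non-negative contribution from the other ranks or the OS only widens the gap, which is why the per-task interpretation makes the reported value even harder to explain.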

I would be glad if you could share any ideas about this finding. Thank you!
Kind regards,
Martin
