[slurm-users] virtual memory limit exceeded
Noam Bernstein
noam.bernstein at nrl.navy.mil
Fri Nov 9 07:34:36 MST 2018
> On Nov 9, 2018, at 3:14 AM, Bjørn-Helge Mevik <b.h.mevik at usit.uio.no> wrote:
>
> Noam Bernstein <noam.bernstein at nrl.navy.mil> writes:
>
>> Can anyone shed some light on where the _virtual_ memory limit comes from?
>
> Perhaps it comes from a VSizeFactor setting in slurm.conf:
>
> VSizeFactor
> Memory specifications in job requests apply to real memory size (also known as
> resident set size). It is possible to enforce virtual memory limits for both
> jobs and job steps by limiting their virtual memory to some percentage of
> their real memory allocation. The VSizeFactor parameter specifies the job's or
> job step's virtual memory limit as a percentage of its real memory limit. For
> example, if a job's real memory limit is 500MB and VSizeFactor is set to 101
> then the job will be killed if its real memory exceeds 500MB or its virtual
> memory exceeds 505MB (101 percent of the real memory limit). The default
> value is 0, which disables enforcement of virtual memory limits. The value
> may not exceed 65533 percent.
Aha - thanks. There’s lots of documentation, which is mostly great, but sometimes it’s hard to find any particular thing.
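For the archive, the arithmetic in that man page excerpt can be written out as below. This is only a sketch of the documented example: `vsize_limit_mb` is a hypothetical helper of my own, not anything in Slurm, and I'm assuming plain integer arithmetic, which may not match how Slurm rounds internally.

```c
#include <assert.h>

/* Hypothetical helper, NOT part of Slurm: virtual memory limit implied by
 * a real memory limit (in MB) and a VSizeFactor percentage.
 * Returns 0 when VSizeFactor is 0, i.e. no virtual memory limit is enforced. */
unsigned long long vsize_limit_mb(unsigned long long real_limit_mb,
                                  unsigned factor_percent)
{
    /* The documentation quoted above says the value may not exceed 65533. */
    assert(factor_percent <= 65533);
    if (factor_percent == 0)
        return 0; /* default: enforcement disabled */
    return real_limit_mb * factor_percent / 100;
}
```

With the numbers from the documentation, a 500MB real limit and VSizeFactor=101 give a 505MB virtual limit.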
>
>> 1. If I define DefMemPerCPU in the partition line, and the job doesn't request
>> anything else, what memory measure should I expect this to be the limit
>> on? RSS?
>
> Yes, RSS.
>
> To test our memory settings, we often use a small program like the
> attached C program. It first allocates memory (to test vmem limit),
> then fills it with values (to test rss limit), and finally re-reads the
> memory (to check for swapping).
Thanks - that’s a very helpful example.
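Since attachments don't always survive in the archive, here is a minimal sketch of that kind of three-phase test. It is my own guess at the structure from the description above, not the actual attached program; the function names and the `MEMTEST_MAIN` guard are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Phase 1: allocate. This grows the virtual size; pages are typically
 * not resident until written, so it exercises the vmem limit. */
unsigned char *reserve(size_t bytes)
{
    return malloc(bytes);
}

/* Phase 2: write a non-zero value to every byte so the pages become
 * resident. This exercises the RSS limit. */
void touch(unsigned char *buf, size_t bytes)
{
    assert(buf != NULL);
    memset(buf, 1, bytes);
}

/* Phase 3: read everything back; with swap enabled this is where paging
 * would show up as a slowdown. The sum keeps the loop from being
 * optimized away. */
unsigned long long reread(const unsigned char *buf, size_t bytes)
{
    unsigned long long sum = 0;
    for (size_t i = 0; i < bytes; i++)
        sum += buf[i];
    return sum;
}

#ifdef MEMTEST_MAIN
/* Usage: ./memtest [size in MB], default 100 MB. */
int main(int argc, char **argv)
{
    size_t mb = (argc > 1) ? strtoul(argv[1], NULL, 10) : 100;
    size_t bytes = mb * 1024 * 1024;

    unsigned char *buf = reserve(bytes);
    if (buf == NULL) {
        perror("malloc");
        return 1;
    }
    printf("allocated %zu MB\n", mb);

    touch(buf, bytes);
    printf("filled %zu MB\n", mb);

    printf("re-read, byte sum = %llu\n", reread(buf, bytes));
    free(buf);
    return 0;
}
#endif
```

Built with something like `cc -DMEMTEST_MAIN -o memtest memtest.c` and run inside a job with a size just above and just below the requested memory, the last message printed shows which phase the job got through before being killed.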
Thanks also to Chris Samuel, who pointed out the difference between Slurm enforcing memory limits itself and cgroups having the kernel enforce them, and that the error message indicates it must be the former.
Noam