[slurm-users] virtual memory limit exceeded

Noam Bernstein noam.bernstein at nrl.navy.mil
Fri Nov 9 07:34:36 MST 2018


> On Nov 9, 2018, at 3:14 AM, Bjørn-Helge Mevik <b.h.mevik at usit.uio.no> wrote:
> 
> Noam Bernstein <noam.bernstein at nrl.navy.mil> writes:
> 
>> Can anyone shed some light on where the _virtual_ memory limit comes from?
> 
> Perhaps it comes from a VSizeFactor setting in slurm.conf:
> 
>       VSizeFactor
>              Memory specifications in job requests apply to real memory size (also known as
>              resident set size). It is possible to enforce virtual memory limits for both
>              jobs and job steps by limiting their virtual memory to some percentage of
>              their real memory allocation. The VSizeFactor parameter specifies the job's or
>              job step's virtual memory limit as a percentage of its real memory limit. For
>              example, if a job's real memory limit is 500MB and VSizeFactor is set to 101
>              then the job will be killed if its real memory exceeds 500MB or its virtual
>              memory exceeds 505MB (101 percent of the real memory limit). The default
>              value is 0, which disables enforcement of virtual memory limits. The value
>              may not exceed 65533 percent.
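
As a worked check of the arithmetic in the man page excerpt above, here is a minimal Python sketch. The function name vsize_limit_mb is hypothetical and is not part of Slurm; it only reproduces the percentage calculation, and how Slurm itself rounds the result is not stated in the excerpt.

```python
from typing import Optional


def vsize_limit_mb(real_limit_mb: float, vsize_factor: int) -> Optional[float]:
    """Return the virtual memory limit in MB implied by VSizeFactor.

    Hypothetical helper for illustration only. Returns None when
    vsize_factor is 0, which the man page says disables enforcement.
    """
    # The man page states the value may not exceed 65533 percent.
    if not 0 <= vsize_factor <= 65533:
        raise ValueError("VSizeFactor must be between 0 and 65533")
    if vsize_factor == 0:
        return None
    # The virtual limit is a percentage of the real memory limit.
    return real_limit_mb * vsize_factor / 100


if __name__ == "__main__":
    # The man page's own example: 500MB real limit, VSizeFactor=101.
    print(vsize_limit_mb(500, 101))
    print(vsize_limit_mb(500, 0))
```

With the man page's numbers, vsize_limit_mb(500, 101) gives 505, and a factor of 0 gives None, meaning no virtual memory limit is enforced.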

Aha - thanks.  There’s a lot of documentation, which is mostly great, but sometimes it’s hard to find one particular thing.

> 
>> 1. If I define DefMemPerCPU in the partition line, and the job doesn't request
>> anything else, what memory measure should I expect this to be the limit
>> on? RSS?
> 
> Yes, RSS.
> 
> To test our memory settings, we often use a small program like the
> attached C program.  It first allocates memory (to test vmem limit),
> then fills it with values (to test rss limit), and finally re-reads the
> memory (to check for swapping).
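
The attached C program is not reproduced in this archive. As a rough illustration of the three phases described above, here is a minimal Python sketch; it is an analogue under stated assumptions, not the original program. The name memory_probe and the one-write-per-page strategy are assumptions, and it relies on anonymous mmap pages being backed lazily until first touched, as is typical on Linux.

```python
import mmap
import sys

PAGE = mmap.PAGESIZE


def memory_probe(size_mb: int) -> int:
    """Allocate, touch, and re-read size_mb MiB; return the page count read back.

    Hypothetical helper for illustration, not the program from the thread.
    """
    size = size_mb * 1024 * 1024
    # Phase 1: an anonymous mapping reserves address space, which grows
    # the virtual size before any page is actually used.
    buf = mmap.mmap(-1, size)
    try:
        # Phase 2: write one byte per page so each page gets backed by
        # real memory, which grows the resident set size.
        for off in range(0, size, PAGE):
            buf[off] = 1
        # Phase 3: read every page back; if pages were swapped out, this
        # pass has to bring them in again and runs noticeably slower.
        touched = sum(buf[off] for off in range(0, size, PAGE))
    finally:
        buf.close()
    return touched


if __name__ == "__main__":
    # Size in MiB from the command line, defaulting to a small value.
    mb = int(sys.argv[1]) if len(sys.argv) > 1 else 16
    print(f"touched and re-read {memory_probe(mb)} pages of {PAGE} bytes")
```

Running this inside a job with a size just below and then just above the requested memory should show which phase, if any, triggers the limit.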

Thanks - that’s a very helpful example.

And thanks also to Chris Samuel, who pointed out the difference between Slurm enforcing memory limits itself and cgroups relying on the kernel to enforce them, and that the error message indicates it must be the former.


											Noam

