[slurm-users] blastx fails with "Error memory mapping"

Mark Hahn hahn at mcmaster.ca
Fri Jan 24 20:00:57 UTC 2020


apologies for a long response; didn't have time for a shorter one ;)

>> you have it backwards.  slurm creates a cgroup for the job (step)
>> and uses the cgroup control to tell the kernel how much memory to
>> permit the job-step to use.
>
> I would like to know how I can increase the threshold in the slurm config
> files.  I cannot find it.

maybe I'm not being clear.  if you enable cgroups in slurm.conf,
then for the simple case of a single-node, single-step job,
slurm creates a cgroup tree for the job (also identified by uid),
and the process it starts on your behalf (running the job script)
is controlled by the *.limit_in_bytes settings:

[hahn at gra-login3 ~]$ salloc
...
salloc: Granted job allocation 26782102
[hahn at gra796 ~]$ bin/show_my_cgroup --debug
...
gra796:14630: DEBUG pid=14630 find_cgroup(14630,memory) =
/sys/fs/cgroup/memory/slurm/uid_3000566/job_26782102/step_0
gra796:14630: memory.usage_in_bytes=8822784
gra796:14630: memory.limit_in_bytes=268435456
gra796:14630: memory.memsw.usage_in_bytes=8822784
gra796:14630: memory.memsw.limit_in_bytes=268435456
...
[hahn at gra796 ~]$ cd /sys/fs/cgroup/memory/slurm/uid_3000566/job_26782102/step_0
[hahn at gra796 step_0]$ ls -l
...
-rw-r--r-- 1 root root 0 Jan 24 14:43 memory.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 24 14:43 memory.max_usage_in_bytes
...
-rw-r--r-- 1 root root 0 Jan 24 14:43 memory.memsw.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 24 14:43 memory.memsw.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan 24 14:43 memory.memsw.usage_in_bytes
...
[hahn at gra796 step_0]$ cat memory.memsw.limit_in_bytes 
268435456
[hahn at gra796 step_0]$ cat memory.limit_in_bytes 
268435456
[hahn at gra796 step_0]$ cat memory.kmem.limit_in_bytes 
9223372036854771712

in other words, on this system, such a job defaults to 256M (because I didn't
salloc with --mem), and the controls for the job step's processes
(in this case, step_0) live in that specific sub-cgroup.  (the kmem value
above is just the kernel's "no limit" sentinel.)
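
the clean way to get a bigger limit is simply to ask Slurm for the memory
at submission time.  a sketch (4G is an arbitrary example; the 256M default
itself comes from DefMemPerNode/DefMemPerCPU in slurm.conf):

$ salloc --mem=4G                  # per-node request; cgroup limits become 4G
$ sbatch --mem-per-cpu=2G job.sh   # or scale the request with cpu count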

if I, as root, came in while that step was executing and wrote a different
number into one of those limit files, I could expand or contract what the
kernel permits the cgroup.  for instance, I could leave RSS limited but allow
the cgroup more VM just by raising memsw.limit_in_bytes.
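
purely as an illustration of the mechanism (cgroup v1; paths as in the
transcript above), that root-only poke would look like:

# as root on the compute node, while the step is running:
cd /sys/fs/cgroup/memory/slurm/uid_3000566/job_26782102/step_0
# grow the memory+swap ("VSZ") limit to 4G; memory.limit_in_bytes
# stays at 256M, so RSS is still constrained:
echo $((4*1024*1024*1024)) > memory.memsw.limit_in_bytes

(v1 requires memsw.limit >= memory.limit, so raising memsw alone is
always legal.)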

of course, this is a bad idea: the reason you have Slurm around is to make
the right settings in the first place!  you get to choose whether the
RSS limit (memory.limit_in_bytes) is the same as the VSZ limit 
(memory.memsw.limit_in_bytes).

that distinction seems to be your whole issue.  mmapping a file increases VSZ
but not necessarily RSS, since pages are only faulted in as they're touched;
VSZ can easily and safely go vastly beyond physical memory when you use
mmap to read a large file.
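
you can watch the gap yourself while blastx runs; <pid> below is a
placeholder for the actual process id, and both values are in KiB:

$ ps -o pid,vsz,rss,cmd -p <pid>
# or, equivalently, straight from /proc:
$ grep -E 'VmSize|VmRSS' /proc/<pid>/status

with a big mmap'd file you'll see VmSize far above VmRSS.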

> According to [1], " No value is provided by cgroups for virtual memory size
> ('vsize') "
>
> [1] https://slurm.schedmd.com/slurm.conf.html

depends on whether "ConstrainSwapSpace=yes" appears in cgroup.conf.
(it's yes on the system above)
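
for reference, the relevant knobs look something like this; a sketch, not
your site's actual config (parameter names are from the cgroup.conf man
page, and slurm.conf must also set TaskPlugin=task/cgroup):

# cgroup.conf (sketch)
ConstrainRAMSpace=yes     # sets memory.limit_in_bytes from the memory request
ConstrainSwapSpace=yes    # also sets memory.memsw.limit_in_bytes
AllowedSwapSpace=0        # extra swap allowed, as a percent of allocated RAM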

regards,
-- 
Mark Hahn | SHARCnet Sysadmin | hahn at sharcnet.ca | http://www.sharcnet.ca
           | McMaster RHPCS    | hahn at mcmaster.ca | 905 525 9140 x24687
           | Compute/Calcul Canada                | http://www.computecanada.ca


