[slurm-users] blastx fails with "Error memory mapping"
Mark Hahn
hahn at mcmaster.ca
Fri Jan 24 20:00:57 UTC 2020
apologies for a long response; didn't have time for a shorter one ;)
>> you have it backwards. slurm creates a cgroup for the job (step)
>> and uses the cgroup control to tell the kernel how much memory to
>> permit the job-step to use.
>
> I would like to know how I can increase the threshold in the slurm config
> files. I cannot find it.
maybe I'm not being clear: if you enable cgroups in slurm.conf, then for
the simple case of a single-node, single-step job, slurm creates a cgroup
tree for the job (nested under a per-uid cgroup), and the process it starts
on your behalf (running the job script) is controlled by that tree's
*.limit_in_bytes settings:
[hahn at gra-login3 ~]$ salloc
...
salloc: Granted job allocation 26782102
[hahn at gra796 ~]$ bin/show_my_cgroup --debug
...
gra796:14630: DEBUG pid=14630 find_cgroup(14630,memory) =
/sys/fs/cgroup/memory/slurm/uid_3000566/job_26782102/step_0
gra796:14630: memory.usage_in_bytes=8822784
gra796:14630: memory.limit_in_bytes=268435456
gra796:14630: memory.memsw.usage_in_bytes=8822784
gra796:14630: memory.memsw.limit_in_bytes=268435456
...
[hahn at gra796 ~]$ cd /sys/fs/cgroup/memory/slurm/uid_3000566/job_26782102/step_0
[hahn at gra796 step_0]$ ls -l
...
-rw-r--r-- 1 root root 0 Jan 24 14:43 memory.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 24 14:43 memory.max_usage_in_bytes
...
-rw-r--r-- 1 root root 0 Jan 24 14:43 memory.memsw.limit_in_bytes
-rw-r--r-- 1 root root 0 Jan 24 14:43 memory.memsw.max_usage_in_bytes
-r--r--r-- 1 root root 0 Jan 24 14:43 memory.memsw.usage_in_bytes
...
[hahn at gra796 step_0]$ cat memory.memsw.limit_in_bytes
268435456
[hahn at gra796 step_0]$ cat memory.limit_in_bytes
268435456
[hahn at gra796 step_0]$ cat memory.kmem.limit_in_bytes
9223372036854771712
in other words, on this system, such a job defaults to 256M (because I didn't
salloc with --mem), and the cgroup that controls the job step's processes
(in this case, step_0) is a specific sub-cgroup of the job's tree.
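to connect that back to the original question about config: the 256M default
and the cgroup limits come from the cluster's slurm.conf and cgroup.conf plus
whatever the job requests. roughly -- a sketch, not your exact config, values
illustrative:

# slurm.conf
TaskPlugin=task/cgroup
ProctrackType=proctrack/cgroup
DefMemPerCPU=256          # MB per allocated CPU when the user gives no --mem

# cgroup.conf
ConstrainRAMSpace=yes     # slurmd derives memory.limit_in_bytes from the job's allocation
ConstrainSwapSpace=yes    # and also sets memory.memsw.limit_in_bytes

and the normal way to raise the threshold is simply to request more memory
for the job:

[hahn at gra-login3 ~]$ salloc --mem=16G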
if I, as root, came in while that step was executing and wrote a different
number into one of those limit files, I could expand or contract the limit
that the kernel enforces on the cgroup. for instance, I could leave RSS limited
but allow the cgroup more VM just by raising memsw.limit_in_bytes.
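just to make the mechanism concrete, that manual override would look something
like this (run as root on the compute node; the path is from the example above,
the value is arbitrary):

[root at gra796 ~]# cd /sys/fs/cgroup/memory/slurm/uid_3000566/job_26782102/step_0
[root at gra796 step_0]# echo 8589934592 > memory.memsw.limit_in_bytes   # raise mem+swap limit to 8G, leave memory.limit_in_bytes alone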
of course, this is a bad idea: the reason you have Slurm around is to make
the right settings in the first place! you get to choose whether the
RSS limit (memory.limit_in_bytes) is the same as the VSZ limit
(memory.memsw.limit_in_bytes).
that distinction seems to be your whole issue. mmap'ing a file increases VSZ
but not necessarily RSS, and since you're only using mmap to read a large file,
VSZ can easily and safely go vastly beyond physical memory.
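you can see that distinction for any running process; for example (the pid
here is just a placeholder for your blastx process):

[hahn at gra796 ~]$ grep -E 'VmSize|VmRSS|VmSwap' /proc/12345/status

VmSize will include the entire mmap'd database file, while VmRSS only counts
the pages actually resident in RAM.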
> According to [1], " No value is provided by cgroups for virtual memory size
> ('vsize') "
>
> [1] https://slurm.schedmd.com/slurm.conf.html
depends on whether "ConstrainSwapSpace=yes" appears in cgroup.conf.
(it's yes on the system above)
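so if the goal is to let mmap-heavy jobs like blastx map far more than their
RAM allocation, the knob is in cgroup.conf. a sketch -- check the cgroup.conf
man page for your Slurm version before copying this:

# cgroup.conf
ConstrainRAMSpace=yes      # keep the RSS limit tied to the job's allocation
ConstrainSwapSpace=no      # don't constrain mem+swap (memory.memsw.limit_in_bytes)
# or, to keep it constrained but with headroom:
#ConstrainSwapSpace=yes
#AllowedSwapSpace=100      # percent of allocated memory allowed on top as swap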
regards,
--
Mark Hahn | SHARCnet Sysadmin | hahn at sharcnet.ca | http://www.sharcnet.ca
| McMaster RHPCS | hahn at mcmaster.ca | 905 525 9140 x24687
| Compute/Calcul Canada | http://www.computecanada.ca