[slurm-users] ConstrainRAMSpace=yes and page cache?

Fri Jun 14 08:14:25 UTC 2019

Dear Kilian,

thanks for pointing this out. I should have mentioned that I had
already browsed the cgroups.conf man page up and down but did not find
any specific hints on how to achieve the desired behavior. Maybe I am
still missing something obvious?

Also, the kernel cgroups documentation indicates that page cache
and anonymous memory are both accounted as userland memory[1]:

--- snip ---
While not completely water-tight, all major memory usages by a given
cgroup are tracked so that the total memory consumption can be
accounted and controlled to a reasonable extent. Currently, the
following types of memory usages are tracked.

    Userland memory - page cache and anonymous memory.
    Kernel data structures such as dentries and inodes.
    TCP socket buffers.
--- snip ---

That's why I'm somewhat unsure whether the *KmemSpace options in
cgroups.conf can address this issue, since they seem to concern kernel
data structures rather than page cache.

I guess my question simply boils down to whether there is a Slurm-ish
way to prevent active page caches from being counted against memory
constraints when ConstrainRAMSpace=yes is set?
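
To see how much of a job's memory charge is page cache rather than
anonymous memory, one could look at the memory.stat file of the job's
cgroup. The following is only a minimal sketch: the function name
mem_breakdown is made up for illustration, and the cgroup path is an
assumption (cgroup v1 layout as commonly set up by Slurm) that may
differ on other sites or under cgroup v2.

```shell
#!/bin/bash
# Summarize page cache vs. anonymous memory from a cgroup memory.stat file.
# Handles the cgroup v1 keys (cache/rss) and the v2 keys (file/anon);
# the values in memory.stat are given in bytes.
mem_breakdown() {
    awk '
        $1 == "cache" || $1 == "file" { cache = $2 }
        $1 == "rss"   || $1 == "anon" { anon  = $2 }
        END { printf "page_cache_MiB=%d anon_MiB=%d\n", cache / 1048576, anon / 1048576 }
    ' "$1"
}

# ASSUMPTION: cgroup v1 hierarchy as typically created by Slurm;
# adjust the path for your site (it differs under cgroup v2).
stat_file="/sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_${SLURM_JOB_ID:-none}/memory.stat"
if [ -r "$stat_file" ]; then
    mem_breakdown "$stat_file"
fi
```

Run from within a job step, this should show the page cache share
growing towards the job's memory limit while a large file is written
or read.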

Best regards
Jürgen

[1] https://www.kernel.org/doc/html/v4.18/admin-guide/cgroup-v2.html

-- 
Jürgen Salk
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Universität Ulm
Telefon: +49 (0)731 50-22478
Telefax: +49 (0)731 50-22471

* Kilian Cavalotti <kilian.cavalotti.work at gmail.com> [190613 17:27]:
> Hi Jürgen,
> 
> I would take a look at the various *KmemSpace options in
> cgroups.conf, they can certainly help with this.
> 
> Cheers,
> -- 
> Kilian
> 
> On Thu, Jun 13, 2019 at 2:41 PM Juergen Salk
> <juergen.salk at uni-ulm.de> wrote:
> >
> > Dear all,
> >
> > I'm just starting to get used to Slurm and play around with it in
> > a small test environment within our old cluster.
> >
> > For our next system we will probably have to abandon our current
> > exclusive user node access policy in favor of a shared user
> > policy, i.e. jobs from different users will then run side by side
> > on the same node at the same time. In order to prevent the jobs
> > from interfering with each other, I have set both
> > ConstrainCores=yes and ConstrainRAMSpace=yes in cgroups.conf,
> > which works as expected for limiting the memory of the processes
> > to the value requested at job submission (e.g. by --mem=...
> > option).
> >
> > However, I've noticed that ConstrainRAMSpace=yes also caps the
> > page cache available to a job, for which the Linux kernel would
> > normally use any otherwise unused memory in a flexible way. This
> > may result in a significant performance impact, as we have quite
> > a number of I/O-intensive applications (dominated by read
> > operations) that are known to benefit a lot from page caching.
> >
> > Here comes a small example to illustrate this issue. The job
> > writes a 16 GB file to a local scratch file system, measures the
> > amount of data cached in memory and then reads the file previously
> > written.
> >
> > $ cat job.slurm
> > #!/bin/bash
> > #SBATCH --partition=standard
> > #SBATCH --nodes=1
> > #SBATCH --ntasks-per-node=1
> > #SBATCH --time=00:10:00
> >
> > # Get amount of data cached in memory before writing the file
> > cached1=`awk '$1=="Cached:" {print $2}' /proc/meminfo`
> >
> > # Write 16 GB file to local scratch SSD
> > dd if=/dev/zero of=$SCRATCH/testfile count=16 bs=1024M
> >
> > # Get amount of data cached in memory after writing the file
> > cached2=`awk '$1=="Cached:" {print $2}' /proc/meminfo`
> >
> > # Print difference of data cached in memory
> > echo -e "\nIncreased cached data by $(((cached2-cached1)/1000000)) GB\n"
> >
> > # Read the file previously written
> > dd if=$SCRATCH/testfile of=/dev/null count=16 bs=1024M
> >
> > $
> >
> > For reference, this is the result *without* ConstrainRAMSpace=yes
> > set in cgroups.conf and submitted with `sbatch --mem=2G
> > --gres=scratch:16 job.slurm`:
> >
> > --- snip ---
> > 16+0 records in
> > 16+0 records out
> > 17179869184 bytes (17 GB) copied, 10.9839 s, 1.6 GB/s
> >
> > Increased cached data by 16 GB
> >
> > 16+0 records in
> > 16+0 records out
> > 17179869184 bytes (17 GB) copied, 5.03225 s, 3.4 GB/s
> > --- snip ---
> >
> > Note that there is 16 GB of data cached and the read performance
> > is 3.4 GB/s as the data is actually read from page cache.
> >
> > And this is the result *with* ConstrainRAMSpace=yes set in
> > cgroups.conf and submitted with the very same command:
> >
> > --- snip ---
> > 16+0 records in
> > 16+0 records out
> > 17179869184 bytes (17 GB) copied, 13.3163 s, 1.3 GB/s
> >
> > Increased cached data by 1 GB
> >
> > 16+0 records in
> > 16+0 records out
> > 17179869184 bytes (17 GB) copied, 11.1098 s, 1.5 GB/s
> > --- snip ---
> >
> > Now only 1 GB of data has been cached (which is roughly the 2 GB
> > that have been requested for the job minus 1 GB allocated by the
> > dd buffer) resulting in a read performance degradation to 1.5 GB/s
> > (compared to 3.4 GB/s as above).
> >
> > Finally, this is the result *with* ConstrainRAMSpace=yes set
> > in cgroups.conf and the job submitted with `sbatch --mem=18G
> > --gres=scratch:16 job.slurm`:
> >
> > --- snip ---
> > 16+0 records in
> > 16+0 records out
> > 17179869184 bytes (17 GB) copied, 11.0601 s, 1.6 GB/s
> >
> > Increased cached data by 16 GB
> >
> > 16+0 records in
> > 16+0 records out
> > 17179869184 bytes (17 GB) copied, 5.01643 s, 3.4 GB/s
> > --- snip ---
> >
> > This is almost the same result as in the unconstrained case (i.e.
> > without ConstrainRAMSpace=yes set in cgroups.conf) as the amount
> > of memory requested for the job (18 GB) is large enough to allow
> > the file to be fully cached in memory.
> >
> > I do not think this is an issue with Slurm itself but rather a
> > consequence of how cgroups are supposed to work. However, I wonder
> > how others cope with this.
> >
> > Maybe we have to teach our users to also consider page cache when
> > requesting a certain amount of memory for their jobs?
> >
> > Any comment or idea would be highly appreciated.
> >
> > Thank you in advance.
> >
> > Best regards
> > Jürgen
> >
> > -- 
> > Jürgen Salk
> > Scientific Software & Compute Services (SSCS)
> > Kommunikations- und Informationszentrum (kiz)
> > Universität Ulm
> > Telefon: +49 (0)731 50-22478
> > Telefax: +49 (0)731 50-22471
> >
> 
> 
> -- Kilian
> 

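As a rough illustration of the idea raised in the quoted message
(considering page cache when requesting memory), a job's --mem value
could be estimated as its anonymous memory plus the size of the files
it should be able to keep cached. This is only a sketch of a rule of
thumb, not an established Slurm practice; the function name
suggest_mem_gib is made up for illustration, and it assumes GNU stat.

```shell
#!/bin/bash
# Rough --mem estimate in GiB: anonymous memory plus files to keep cached.
# Usage: suggest_mem_gib ANON_GIB [FILE...]
suggest_mem_gib() {
    local anon_gib=$1; shift
    local total_bytes=0 f size
    for f in "$@"; do
        size=$(stat -c %s "$f") || return 1   # apparent size in bytes (GNU stat)
        total_bytes=$((total_bytes + size))
    done
    local gib=$((1024 * 1024 * 1024))
    # Round the file total up to whole GiB, then add the anonymous part.
    echo $(( anon_gib + (total_bytes + gib - 1) / gib ))
}
```

For the 16 GiB test file and the 1 GiB dd buffer from the example,
`suggest_mem_gib 1 "$SCRATCH/testfile"` would print 17, which is in
line with the observation that --mem=18G was enough to cache the whole
file.
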
-- 
GPG A997BA7A | 87FC DA31 5F00 C885 0DC3  E28F BD0D 4B33 A997 BA7A