[slurm-users] ConstrainRAMSpace=yes and page cache?

Kilian Cavalotti kilian.cavalotti.work at gmail.com
Fri Jun 14 00:27:18 UTC 2019


Hi Jürgen,

I would take a look at the various *KmemSpace options in cgroup.conf;
they can certainly help with this.
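
Something along these lines, just as a sketch (untested; the option names are
taken from the cgroup.conf man page, the values are only placeholders you
would have to tune for your nodes):

    ConstrainKmemSpace=yes
    MaxKmemPercent=100
    MinKmemSpace=30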

Cheers,
-- 
Kilian

On Thu, Jun 13, 2019 at 2:41 PM Juergen Salk <juergen.salk at uni-ulm.de> wrote:
>
> Dear all,
>
> I'm just starting to get used to Slurm and am playing around with it in a small
> test environment within our old cluster.
>
> For our next system we will probably have to abandon our current exclusive-user
> node access policy in favor of a shared policy, i.e. jobs from different
> users will then run side by side on the same node at the same time. In order to
> prevent the jobs from interfering with each other, I have set both
> ConstrainCores=yes and ConstrainRAMSpace=yes in cgroup.conf, which works as
> expected for limiting the memory of the processes to the value requested at job
> submission (e.g. by the --mem=... option).
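>
> In cgroup.conf this boils down to the following two lines (simplified
> sketch, any other settings omitted here):
>
>   ConstrainCores=yes
>   ConstrainRAMSpace=yes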
>
> However, I've noticed that ConstrainRAMSpace=yes also caps the available page
> cache, for which the Linux kernel normally exploits any unused areas of memory
> in a flexible way. This may have a significant performance impact, as we have
> quite a number of I/O-demanding applications (predominantly read operations)
> that are known to benefit a lot from page caching.
>
> Here is a small example to illustrate the issue. The job writes a 16 GB
> file to a local scratch file system, measures the amount of data cached in
> memory, and then reads the file previously written.
>
> $ cat job.slurm
> #!/bin/bash
> #SBATCH --partition=standard
> #SBATCH --nodes=1
> #SBATCH --ntasks-per-node=1
> #SBATCH --time=00:10:00
>
> # Get amount of data cached in memory before writing the file
> cached1=`awk '$1=="Cached:" {print $2}' /proc/meminfo`
>
> # Write 16 GB file to local scratch SSD
> dd if=/dev/zero of=$SCRATCH/testfile count=16 bs=1024M
>
> # Get amount of data cached in memory after writing the file
> cached2=`awk '$1=="Cached:" {print $2}' /proc/meminfo`
>
> # Print difference of data cached in memory
> echo -e "\nIncreased cached data by $(((cached2-cached1)/1000000)) GB\n"
>
> # Read the file previously written
> dd if=$SCRATCH/testfile of=/dev/null count=16 bs=1024M
>
> $
>
> For reference, this is the result *without* ConstrainRAMSpace=yes
> set in cgroup.conf and submitted with `sbatch --mem=2G --gres=scratch:16 job.slurm`:
>
> --- snip ---
> 16+0 records in
> 16+0 records out
> 17179869184 bytes (17 GB) copied, 10.9839 s, 1.6 GB/s
>
> Increased cached data by 16 GB
>
> 16+0 records in
> 16+0 records out
> 17179869184 bytes (17 GB) copied, 5.03225 s, 3.4 GB/s
> --- snip ---
>
> Note that 16 GB of data has been cached and the read
> performance is 3.4 GB/s, as the data is actually served from the page
> cache.
>
> And this is the result *with* ConstrainRAMSpace=yes set in cgroup.conf
> and the job submitted with the very same command:
>
> --- snip ---
> 16+0 records in
> 16+0 records out
> 17179869184 bytes (17 GB) copied, 13.3163 s, 1.3 GB/s
>
> Increased cached data by 1 GB
>
> 16+0 records in
> 16+0 records out
> 17179869184 bytes (17 GB) copied, 11.1098 s, 1.5 GB/s
> --- snip ---
>
> Now only about 1 GB of data has been cached (which is roughly
> the 2 GB requested for the job minus the 1 GB allocated by the dd
> buffer), resulting in a read performance degradation to 1.5 GB/s
> (compared to the 3.4 GB/s above).
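>
> As a cross-check one can look at the job's memory cgroup from within the
> job (sketch only; the exact path depends on how the cgroup hierarchy is
> mounted on the compute nodes):
>
>   cgdir=/sys/fs/cgroup/memory/slurm/uid_${UID}/job_${SLURM_JOB_ID}
>   cat $cgdir/memory.limit_in_bytes            # the 2 GB limit set by Slurm
>   grep -E '^(cache|rss) ' $cgdir/memory.stat  # page cache vs. anonymous memory
>
> which should show that the page cache created by the job counts towards the
> job's memory limit.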
>
> Finally, this is the result *with* ConstrainRAMSpace=yes
> set in cgroup.conf and the job submitted with
> `sbatch --mem=18G --gres=scratch:16 job.slurm`:
>
> --- snip ---
> 16+0 records in
> 16+0 records out
> 17179869184 bytes (17 GB) copied, 11.0601 s, 1.6 GB/s
>
> Increased cached data by 16 GB
>
> 16+0 records in
> 16+0 records out
> 17179869184 bytes (17 GB) copied, 5.01643 s, 3.4 GB/s
> --- snip ---
>
> This is almost the same result as in the unconstrained case (i.e. without
> ConstrainRAMSpace=yes set in cgroup.conf), as the amount of memory requested
> for the job (18 GB) is large enough to allow the file to be fully cached in
> memory.
>
> I do not think this is an issue with Slurm itself, but rather with how the
> memory cgroup is supposed to work: page cache pages created by a job are
> charged against the job's cgroup memory limit just like its anonymous
> memory. However, I wonder how others cope with this.
>
> Maybe we have to teach our users to also consider page cache when
> requesting a certain amount of memory for their jobs?
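>
> In the example above that would mean requesting roughly the memory the
> application itself needs (~2 GB) plus the amount of file data that should
> stay in the page cache (16 GB), i.e. the --mem=18G used above.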
>
> Any comment or idea would be highly appreciated.
>
> Thank you in advance.
>
> Best regards
> Jürgen
>
> --
> Jürgen Salk
> Scientific Software & Compute Services (SSCS)
> Kommunikations- und Informationszentrum (kiz)
> Universität Ulm
> Telefon: +49 (0)731 50-22478
> Telefax: +49 (0)731 50-22471
>


