[slurm-users] Cgroups and swap with 18.08.1?

John Hearns hearnsj at googlemail.com
Tue Oct 16 03:09:13 MDT 2018


Bill, you know this already, but permit me an observation from PPBpro.
Turn up the logging level to maximum on the nodes, tail the slurm log, and
start a job.
Look HARD at exactly what the log is telling you. As Richard Feynman
said, you are the easiest person to fool.
Don't take the log to say what you think is happening. Remember that log
messages take effort to put in the code,
well, at least some keystrokes, so they usually mean something!
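
On my earlier question of whether the processes really are running within a
cgroup, here is a minimal sketch of how I would check. It assumes the cgroup v1
layout with the memory controller mounted at /sys/fs/cgroup/memory, which is
what I would expect on a stock Ubuntu 18.04 node. The function names
memory_cgroup_of and show_memory_limits are made up for illustration and are
not part of Slurm.

```shell
#!/bin/sh
# Print the memory-controller cgroup path recorded in a /proc/<pid>/cgroup
# style file. Each line there has the form "hierarchy-id:controllers:path".
# Assumes cgroup v1 and a path that contains no ':' characters.
memory_cgroup_of() {
    awk -F: '{
        n = split($2, ctrl, ",")
        for (i = 1; i <= n; i++)
            if (ctrl[i] == "memory") { print $3; exit }
    }' "$1"
}

# Show the limits the kernel is enforcing for a memory cgroup path.
# Assumes the memory controller is mounted at /sys/fs/cgroup/memory.
# The memsw file only exists when swap accounting is enabled.
show_memory_limits() {
    dir="/sys/fs/cgroup/memory$1"
    for f in memory.limit_in_bytes memory.memsw.limit_in_bytes memory.swappiness; do
        if [ -r "$dir/$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$dir/$f")"
        else
            printf '%s: not readable\n' "$f"
        fi
    done
}
```

On the node, while the job is running, something like
show_memory_limits "$(memory_cgroup_of /proc/17698/cgroup)" would do, with
17698 being the PID from the ps output quoted below. If the path printed by
memory_cgroup_of does not sit under a slurm hierarchy (typically something
like /slurm/uid_<uid>/job_<jobid>/...), the process is not in the job's cgroup
at all. If it is, I would expect memory.memsw.limit_in_bytes to match
memory.limit_in_bytes when no swap is supposed to be allowed.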

On Tue, 16 Oct 2018 at 10:04, John Hearns <hearnsj at googlemail.com> wrote:

> Rather dumb question from me - you have checked those processes are
> running within a cgroup?
> I have no experience in constraining the swap usage using cgroups, so
> sorry if I am adding nothing to the debate here.
>
> On Tue, 16 Oct 2018 at 04:49, Bill Broadley <bill at cse.ucdavis.edu> wrote:
>
>>
>> Greetings,
>>
>> I'm using ubuntu-18.04 and slurm-18.08.1 compiled from source.
>>
>> I followed the directions on:
>> https://slurm.schedmd.com/cgroups.html
>>
>> And:
>> https://slurm.schedmd.com/cgroup.conf.html
>>
>> That resulted in:
>> $ cat slurm.conf | egrep -i "cgroup|CR_"
>> ProctrackType=proctrack/cgroup
>> TaskPlugin=task/cgroup
>> SelectTypeParameters=CR_CPU_MEMORY
>> JobAcctGatherType=jobacct_gather/cgroup
>>
>> $ cat /etc/default/grub  | grep GRUB_CMDLINE_LINUX=
>> GRUB_CMDLINE_LINUX='cgroup_enable=memory swapaccount=1 console=tty0 transparent_hugepage=madvise console=ttyS0,57600'
>>
>> $ cat cgroup.conf
>> CgroupAutomount=yes
>> ConstrainCores=yes
>> ConstrainDevices=yes
>> ConstrainRAMSpace=yes
>> ConstrainSwapSpace=yes
>> MaxSwapPercent=0
>> AllowedSwapSpace=0
>>
>> So I expect jobs not to use swap.  It turns out that if I run a 3GB RAM
>> process with sbatch --mem=1000, I just get a process that uses 1GB of RAM
>> and 2GB of swap.
>>
>> So a 3GB process with --mem=1000:
>>   $ ps acux
>>   USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>>   bill     17698 11.1  1.5 2817020 1015392 ?     D    20:40   0:13 stream\
>>
>>   $ smem
>>   User     Count     Swap      USS      PSS      RSS
>>   bill         1  1795552  1017048  1017076  1018492
>>
>> With --mem=3000 zero swap is used and the job consumes 100% of a CPU.
>>
>>