[slurm-users] Jobs escaping cgroup device controls after some amount of time.

Nate Coraor nate at bx.psu.edu
Tue May 1 07:18:05 MDT 2018


Thanks Andy,

I've been able to confirm that in my case, any jobs that ran for at least
30 minutes (puppet's run interval) would lose their cgroups, and that the
time those cgroups disappear corresponds exactly with puppet runs. I'm not
sure whether this change of cgroup to the root is what causes the oom event
that Slurm detects - I looked through
src/plugins/task/cgroup/task_cgroup_memory.c and the memory cgroup
documentation and it's not clear to me what would happen if you've created
the oom event listener on a specific cgroup and that cgroup disappears. But
since I disabled puppet overnight, jobs running longer than 30 minutes are
completing, and cgroups are persisting, whereas before that, they were not.

--nate

On Mon, Apr 30, 2018 at 5:47 PM, Andy Georges <Andy.Georges at ugent.be> wrote:

>
>
> > On 30 Apr 2018, at 22:37, Nate Coraor <nate at bx.psu.edu> wrote:
> >
> > Hi Shawn,
> >
> > I'm wondering if you're still seeing this. I've recently enabled
> task/cgroup on 17.11.5 running on CentOS 7 and just discovered that jobs
> are escaping their cgroups. For me this is resulting in a lot of jobs
> ending in OUT_OF_MEMORY that shouldn't, because it appears slurmd thinks
> the oom-killer has triggered when it hasn't. I'm not using GRES or devices,
> only:
>
> I am not sure that you are making the correct conclusion here.
>
> There is a known cgroups issue, due to
>
> https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
>
> Relevant part:
>
> The memory controller has a long history. A request for comments for the
> memory
> controller was posted by Balbir Singh [1]. At the time the RFC was posted
> there were several implementations for memory control. The goal of the
> RFC was to build consensus and agreement for the minimal features required
> for memory control. The first RSS controller was posted by Balbir Singh[2]
> in Feb 2007. Pavel Emelianov [3][4][5] has since posted three versions of
> the
> RSS controller. At OLS, at the resource management BoF, everyone suggested
> that we handle both page cache and RSS together. Another request was raised
> to allow user space handling of OOM. The current memory controller is
> at version 6; it combines both mapped (RSS) and unmapped Page
> Cache Control [11].
>
> Are the jobs killed prematurely? If not, then you ran into the above.
>
> Kind regards.
> — Andy
>
