[slurm-users] slurm memory cgroup seems to have vanished
Andy Georges
Andy.Georges at UGent.be
Wed Apr 4 10:00:31 MDT 2018
Hi,
For some reason I am seeing memory cgroups disappear on the nodes:
[root at node3108 memory]# file $PWD/slurm
/sys/fs/cgroup/memory/slurm: cannot open (No such file or directory)
There is, however, a job running, and the cgroups for the other controllers are still present:
[root at node3108 memory]# ls /sys/fs/cgroup/cpu,cpuacct/slurm/
cgroup.clone_children cgroup.procs cpuacct.usage cpu.cfs_period_us cpu.rt_period_us cpu.shares notify_on_release uid_2540915 uid_2541917 uid_2541963
cgroup.event_control cpuacct.stat cpuacct.usage_percpu cpu.cfs_quota_us cpu.rt_runtime_us cpu.stat tasks uid_2540941 uid_2541948
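To rule out the whole memory hierarchy having been unmounted (as opposed to just the slurm directory having been removed), I intend to check something along these lines; the num_cgroups column in /proc/cgroups should show how many memory cgroups the kernel still knows about:

grep memory /proc/mounts
cat /proc/cgroups
ls /sys/fs/cgroup/memory/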
The config has:
[root at node3108 memory]# grep "CR_C" /etc/slurm/slurm.conf
SelectTypeParameters=CR_Core_Memory
[root at node3108 memory]# grep "cgr" /etc/slurm/slurm.conf
JobAcctGatherType=jobacct_gather/cgroup
ProctrackType=proctrack/cgroup
TaskPlugin=task/affinity,task/cgroup
[root at node3108 memory]# cat /etc/slurm/cgroup.conf
AllowedSwapSpace=10
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
TaskAffinity=yes
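One thing I started wondering about is whether empty cgroups are being removed behind slurmd's back, e.g. by a release agent. I have not verified this yet, but the root of the memory hierarchy should at least show whether one is configured:

cat /sys/fs/cgroup/memory/release_agent
cat /sys/fs/cgroup/memory/notify_on_release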
Slurmd logs show:
[2018-04-03T08:00:04.224] [4389.extern] Considering each NUMA node as a socket
[2018-04-03T08:00:04.240] [4389.extern] task/cgroup: /slurm/uid_2540941/job_4389: alloc=184300MB mem.limit=184300MB memsw.limit=202730MB
[2018-04-03T08:00:04.240] [4389.extern] task/cgroup: /slurm/uid_2540941/job_4389/step_extern: alloc=184300MB mem.limit=184300MB memsw.limit=202730MB
[2018-04-03T08:00:04.492] task_p_slurmd_batch_request: 4389
[2018-04-03T08:00:04.492] task/affinity: job 4389 CPU input mask for node: 0xFFFFFFFFF
[2018-04-03T08:00:04.492] task/affinity: job 4389 CPU final HW mask for node: 0xFFFFFFFFF
[2018-04-03T08:00:05.534] Launching batch job 4389 for UID 2540941
[2018-04-03T08:00:05.587] [4389.batch] Considering each NUMA node as a socket
[2018-04-03T08:00:05.597] [4389.batch] task/cgroup: /slurm/uid_2540941/job_4389: alloc=184300MB mem.limit=184300MB memsw.limit=202730MB
[2018-04-03T08:00:05.598] [4389.batch] task/cgroup: /slurm/uid_2540941/job_4389/step_batch: alloc=184300MB mem.limit=184300MB memsw.limit=202730MB
[2018-04-03T08:00:05.668] [4389.batch] task_p_pre_launch: Using sched_affinity for tasks
[2018-04-03T08:00:08.594] launch task 4389.0 request from 2540941.2540941 at 10.141.4.9 (port 5299)
[2018-04-03T08:00:08.594] lllp_distribution jobid [4389] auto binding off: mask_cpu,one_thread
[2018-04-03T08:00:08.645] [4389.0] Considering each NUMA node as a socket
[2018-04-03T08:00:08.654] [4389.0] task/cgroup: /slurm/uid_2540941/job_4389: alloc=184300MB mem.limit=184300MB memsw.limit=202730MB
[2018-04-03T08:00:08.655] [4389.0] task/cgroup: /slurm/uid_2540941/job_4389/step_0: alloc=184300MB mem.limit=184300MB memsw.limit=202730MB
[2018-04-03T08:00:08.669] [4389.0] task_p_pre_launch: Using sched_affinity for tasks
[2018-04-04T17:45:51.336] [4389.0] _oom_event_monitor: oom-kill event count: 1
[2018-04-04T17:45:51.524] [4389.batch] _oom_event_monitor: oom-kill event count: 1
[2018-04-04T17:45:51.691] [4389.extern] _oom_event_monitor: oom-kill event count: 1
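Since the oom-kill events show up in the step logs, I also plan to cross-check the kernel log on the node for the corresponding OOM killer records and for any cgroup-related messages, roughly like this (assuming the messages are still in the ring buffer or in syslog):

dmesg -T | grep -i -E 'oom|killed process'
grep -i -E 'oom|cgroup' /var/log/messages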
Currently, these processes are running as the job on the node:
root 425230 0.0 0.0 308260 4616 ? Sl Apr03 0:09 slurmstepd: [4389.extern]
root 425235 0.0 0.0 107904 604 ? S Apr03 0:00 \_ sleep 1000000
root 425329 0.0 0.0 308592 4940 ? Sl Apr03 0:43 slurmstepd: [4389.batch]
vsc40941 425334 0.0 0.0 113280 1660 ? S Apr03 0:00 \_ /bin/bash /var/spool/slurm/slurmd/job04389/slurm_script
vsc40941 425584 0.0 0.0 223660 14920 ? S Apr03 0:04 \_ /usr/bin/python /apps/gent/CO7/skylake-ib-PILOT/software/vsc-mympirun/4.1.0/bin/mympirun --hybrid 8 --output /user/scratch/gent/gvo000/gvo00003/vsc40941/pilot_testi
vsc40941 425602 0.0 0.0 113284 1548 ? S Apr03 0:00 \_ /bin/sh /apps/gent/CO7/skylake-ib-PILOT/software/impi/2018.1.163-iccifort-2018.1.163-GCC-6.4.0-2.28/bin64/mpirun --file=/user/home/gent/vsc409/vsc40941/.mympiru
vsc40941 425607 0.0 0.0 15916 1640 ? S Apr03 0:00 \_ mpiexec.hydra --file=/user/home/gent/vsc409/vsc40941/.mympirun_7xwr8q/4389_20180403_080008/mpdboot --machinefile /user/home/gent/vsc409/vsc40941/.mympirun_7
vsc40941 425608 0.0 0.0 252972 4800 ? Sl Apr03 0:00 \_ /bin/srun --nodelist node3108.skitty.os -N 1 -n 1 --input none /apps/gent/CO7/skylake-ib-PILOT/software/impi/2018.1.163-iccifort-2018.1.163-GCC-6.4.0-2.
vsc40941 425609 0.0 0.0 48204 712 ? S Apr03 0:00 \_ /bin/srun --nodelist node3108.skitty.os -N 1 -n 1 --input none /apps/gent/CO7/skylake-ib-PILOT/software/impi/2018.1.163-iccifort-2018.1.163-GCC-6.4.
root 425617 0.0 0.0 376888 4676 ? Sl Apr03 1:06 slurmstepd: [4389.0]
vsc40941 425624 0.0 0.0 19340 1928 ? S Apr03 0:00 \_ /apps/gent/CO7/skylake-ib-PILOT/software/impi/2018.1.163-iccifort-2018.1.163-GCC-6.4.0-2.28/bin64/pmi_proxy --control-port node3108.skitty.os:44097 --pmi-connect alltoa
vsc40941 425628 99.7 0.6 1754940 1305896 ? Rl Apr03 2021:06 \_ vasp
vsc40941 425629 99.7 0.6 1770904 1320548 ? Rl Apr03 2021:01 \_ vasp
vsc40941 425630 99.7 0.6 1768488 1325680 ? Rl Apr03 2020:47 \_ vasp
vsc40941 425631 99.7 0.6 1745188 1314856 ? Rl Apr03 2021:28 \_ vasp
vsc40941 425632 99.7 0.6 1786948 1346932 ? Rl Apr03 2021:35 \_ vasp
vsc40941 425633 99.7 0.6 1755904 1318080 ? Rl Apr03 2020:47 \_ vasp
vsc40941 425634 99.7 0.6 1740088 1300324 ? Rl Apr03 2021:36 \_ vasp
vsc40941 425635 99.7 0.6 1751500 1312788 ? Rl Apr03 2021:34 \_ vasp
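To see which memory cgroup those processes are currently attached to, I would expect something like the following to help (PIDs taken from the ps output above; I have not captured this yet):

for pid in 425334 425624 425628; do
    echo "== $pid =="
    grep memory /proc/$pid/cgroup
done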
Furthermore, I see the following open files:
[root at node3108 memory]# lsof -p 425624
<snip>
pmi_proxy 425624 vsc40941 10r REG 0,24 0 12195624 /sys/fs/cgroup/memory/slurm/uid_2540941/job_4389/step_0/memory.oom_control (deleted)
pmi_proxy 425624 vsc40941 11w REG 0,24 0 12195612 /sys/fs/cgroup/memory/slurm/uid_2540941/job_4389/step_0/cgroup.event_control (deleted)
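If I interpret the "(deleted)" markers correctly, the step's memory cgroup directory was removed from underneath the still-open event-monitor file descriptors. The slurmstepd processes should hold the same kind of descriptors, so I would expect (though I have not checked) that the following shows the same picture for them:

ls -l /proc/425230/fd /proc/425329/fd /proc/425617/fd | grep -i cgroup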
Is there anything you can point me to that would help me understand why the cgroup has gone? We've seen this with other (similar) jobs, which were killed by the OOM killer for exceeding the cgroup memory limit (even though the user claims no more than 1 GB should be used per node). I am not sure that is related, but we'd prefer to keep the memory cgroups around. In those earlier cases, once the nodes were empty, a new job recreated the memory cgroup hierarchy.
Given that the following line appears in the logs:
[2018-04-03T08:00:05.597] [4389.batch] task/cgroup: /slurm/uid_2540941/job_4389: alloc=184300MB mem.limit=184300MB memsw.limit=202730MB
I am assuming that the memory cgroup for the job was still present when that line was logged; is that correct?
Thanks in advance,
— Andy