[slurm-users] How to automatically kill a job that exceeds its memory limits (--mem-per-cpu)?
b.h.mevik at usit.uio.no
Tue Oct 8 06:34:39 UTC 2019
Jean-mathieu CHANTREIN <jean-mathieu.chantrein at univ-angers.fr> writes:
> I tried using, in slurm.conf
> TaskPlugin=task/affinity, task/cgroup
> and in cgroup.conf:
We have a very similar setup, the biggest difference being that we have
MemLimitEnforce=no and leave the killing to the kernel's cgroup. For
us, jobs are killed as they should be. Here are a couple of things you
could check:
- Does it work if you remove the space in "TaskPlugin=task/affinity,
  task/cgroup"? (Slurm can be quite picky when parsing slurm.conf.)
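  For example, the line would then read (a minimal fragment, not a
  complete slurm.conf):

  ```
  TaskPlugin=task/affinity,task/cgroup
  ```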
- Check slurmd.log on the job's node(s) to see whether the cgroup
  actually gets activated and starts limiting memory for the job, and
  whether there are any cgroup-related errors.
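  A sketch of that check (the log path and the log lines below are
  illustrative assumptions, not real Slurm output; the sample file just
  makes the commands self-contained):

  ```shell
  # Point at a sample log so the sketch runs anywhere; on a real node
  # this would be slurmd's actual log file (path varies per site).
  LOG=/tmp/slurmd.log.sample
  cat > "$LOG" <<'EOF'
  [2019-10-08T06:00:01] task/cgroup: job memory cgroup set up (illustrative line)
  [2019-10-08T06:00:02] error: something cgroup-related failed (illustrative line)
  EOF

  # Did the task/cgroup plugin do anything for the job?
  grep -c 'task/cgroup' "$LOG"
  # Were any errors logged?
  grep -ci 'error' "$LOG"
  ```
  
  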
- While a job is running, look at the job's cgroup memory directory on
  the compute node (typically
  /sys/fs/cgroup/memory/slurm/uid_<num>/job_<num>). Do the values
  there, for instance memory.limit_in_bytes and
  memory.max_usage_in_bytes, make sense?
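  One way to sanity-check those values: memory.limit_in_bytes should
  come out to --mem-per-cpu times the number of allocated CPUs. The
  sketch below runs against a mock of the cgroup directory (the real
  path is on the compute node; the numbers are assumed for
  illustration):

  ```shell
  # Mock of /sys/fs/cgroup/memory/slurm/uid_<num>/job_<num>
  DIR=/tmp/mock_cgroup/job_42
  mkdir -p "$DIR"

  # Assumed job parameters: --mem-per-cpu=2048 (MB) on 2 CPUs.
  MEM_PER_CPU_MB=2048
  NCPUS=2
  echo $(( MEM_PER_CPU_MB * NCPUS * 1024 * 1024 )) > "$DIR/memory.limit_in_bytes"

  # The check itself: does the cgroup limit match what the job asked for?
  LIMIT=$(cat "$DIR/memory.limit_in_bytes")
  EXPECTED=$(( MEM_PER_CPU_MB * NCPUS * 1024 * 1024 ))
  if [ "$LIMIT" -eq "$EXPECTED" ]; then
      echo "limit matches --mem-per-cpu * ncpus"
  else
      echo "limit does NOT match: got $LIMIT, expected $EXPECTED"
  fi
  ```
  
  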
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo