[slurm-users] Job memory not being tracked correctly for some processes

JinSung Kang jskang815 at gmail.com
Wed Jan 30 16:36:18 UTC 2019


Hello,

Memory usage for one of our jobs is going over its limit, but Slurm lets it
keep running and does not terminate it. The job forks multiple processes, but
I don't think we have had problems with Slurm calculating total memory usage
for this kind of job before. I've tested the limit with a single-threaded job,
and the killing mechanism does work. Does anyone know what might be the problem?

The RSS I see for the job with ps is way over the limit (5x). Previous
instances of this job have either timed out or been cancelled, and their
memory usage is not logged in sacct, so I have no idea what MaxRSS Slurm
actually sees.
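
For reference, this is roughly how the per-job RSS can be totalled from ps
output. It is only a sketch, not how Slurm itself does its accounting: the
function name total_rss_kib and the sample PIDs are made up for illustration,
and it assumes input in the form produced by "ps -e -o pid=,ppid=,rss="
(RSS in KiB). Summing RSS counts shared pages once per process, so the total
can overstate real usage, and any child that has been re-parented away from
the job's process tree will be missed.

```python
def total_rss_kib(ps_output: str, root_pid: int) -> int:
    """Sum the RSS (KiB) of root_pid and all of its descendants.

    ps_output is expected to hold one "PID PPID RSS" triple per line,
    e.g. from `ps -e -o pid=,ppid=,rss=`.
    """
    children = {}  # ppid -> list of child pids
    rss = {}       # pid  -> resident set size in KiB
    for line in ps_output.strip().splitlines():
        fields = line.split()
        # Skip blank lines, header lines, or anything not purely numeric.
        if len(fields) != 3 or not all(f.isdigit() for f in fields):
            continue
        pid, ppid, kib = (int(f) for f in fields)
        rss[pid] = kib
        children.setdefault(ppid, []).append(pid)

    # Walk the process tree iteratively, starting from the root PID.
    total = 0
    stack = [root_pid]
    seen = set()
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        total += rss.get(pid, 0)
        stack.extend(children.get(pid, []))
    return total


# Hypothetical sample: PID 100 stands in for the job's top-level process.
SAMPLE = """
  100     1  2048
  101   100  4096
  102   100  4096
  103   101  1024
  200     1  9999
"""

if __name__ == "__main__":
    print(total_rss_kib(SAMPLE, 100), "KiB")
```

Comparing this total against the job's requested memory is how I arrived at
the figure above; PID 200 in the sample is unrelated to the job and is left
out of the sum.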

I have included some information to explain my configuration, but let me
know if you need more to figure it out.

Ubuntu 16.04.5
Slurm version 17.02.9

JobAcctGatherType=jobacct_gather/linux
The virtual memory limit has been removed (set to unlimited).

Thank you,

Jin