[slurm-users] What's the best way to suppress core dump files from jobs?
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Wed Mar 21 05:08:00 MDT 2018
We experience problems with MPI jobs dumping lots of multi-GB core dump
files (one per MPI task), which causes problems for file servers and
compute nodes.
The user has "ulimit -c 0" in his .bashrc file, but that's ignored when
slurmd starts the job, and the slurmd process's limits are used instead.
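To illustrate what's going on (a quick check, not part of the user's report):
compare the core limit a job step actually gets with the limits slurmd itself
runs with on a compute node, e.g.

  # From a login node: the core-file limit inside a job step
  srun bash -c 'ulimit -c'
  # On the compute node (assuming a single slurmd process):
  grep -i "core" /proc/$(pgrep -x slurmd)/limits

If both agree and neither is 0, the job is indeed inheriting slurmd's limit
rather than the user's.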
I should mention that we have decided to configure slurm.conf with
PropagateResourceLimitsExcept=ALL
because it's desirable to have rather restrictive user limits on login
nodes. Unfortunately, this means that the user's "ulimit -c 0" isn't
propagated to any batch job.
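For reference, this is the slurm.conf fragment in question, with its effect
on core dumps spelled out:

  # slurm.conf: do not propagate any resource limits from the submitting
  # shell to the job, so jobs inherit slurmd's limits (including RLIMIT_CORE)
  PropagateResourceLimitsExcept=ALL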
What's the best way to suppress core dump files from jobs? Does anyone
have good or bad experiences?
One working solution is to modify the slurmd Systemd service file
/usr/lib/systemd/system/slurmd.service to add a line:
LimitCORE=0
I've documented further details in my Slurm Wiki page
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#slurmd-systemd-limits.
However, it's a bit cumbersome to modify the Systemd service file on
all compute nodes.
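One way to make that less painful (just a sketch of standard systemd
practice, not something from my Wiki page; the drop-in file name is
arbitrary) is a drop-in override instead of editing the packaged unit
file, which can then be pushed out by a configuration management tool:

  # /etc/systemd/system/slurmd.service.d/core.conf
  [Service]
  LimitCORE=0

  # Afterwards, on each compute node:
  systemctl daemon-reload
  systemctl restart slurmd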
Thanks for sharing any experiences.
/Ole