[slurm-users] What's the best way to suppress core dump files from jobs?
Bill Barth
bbarth at tacc.utexas.edu
Wed Mar 21 06:08:20 MDT 2018
You could set /etc/security/limits.conf on every node to contain something like (check my syntax):
* soft core 0
* hard core 0
And make sure that /etc/pam.d/slurm.* and /etc/pam.d/system-auth* contain:
session required pam_limits.so
session required pam_limits.so
…so that limits are enforced for each user session. We have these lines in several other PAM files, but those above might be the minimum set for use with SLURM and SSH. Both sets of files might not be necessary, but if you allow ssh to compute nodes after a job is started, you probably need both.
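To confirm which limit a job step actually inherits, you can compare the core-file limit in a login shell against what slurmd hands to a job step. A minimal sketch (the partition name here is a placeholder; adjust for your site):

```shell
# Core-file limit in the current (login) shell
ulimit -c

# Core-file limit as slurmd applies it inside a job step
srun -N1 -p normal bash -c 'ulimit -c'
```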
Best,
Bill.
--
Bill Barth, Ph.D., Director, HPC
bbarth at tacc.utexas.edu | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445
On 3/21/18, 6:08 AM, "slurm-users on behalf of Ole Holm Nielsen" <slurm-users-bounces at lists.schedmd.com on behalf of Ole.H.Nielsen at fysik.dtu.dk> wrote:
We experience problems with MPI jobs dumping lots (1 per MPI task) of
multi-GB core dump files, causing problems for file servers and compute
nodes.
The user has "ulimit -c 0" in his .bashrc file, but that's ignored when
slurmd starts the job, and the slurmd process limits are applied instead.
I should mention that we have decided to configure slurm.conf with
PropagateResourceLimitsExcept=ALL
because it's desirable to have rather restrictive user limits on login
nodes. Unfortunately, this means that the user's "ulimit -c 0" isn't
propagated to any batch job.
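Independent of Slurm's limit propagation, a process can always lower its own core limit at startup (for example from a user's job script or an application's init code). A minimal sketch using Python's standard resource module; lowering a limit never requires privileges:

```python
import resource

# Disable core dumps for this process and its children:
# set both the soft and hard RLIMIT_CORE to 0 bytes.
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print(soft, hard)
```

The same effect in a batch script is just "ulimit -c 0" at the top of the script itself, which, unlike .bashrc, is executed by the job.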
What's the best way to suppress core dump files from jobs? Does anyone
have good or bad experiences?
One working solution is to modify the slurmd Systemd service file
/usr/lib/systemd/system/slurmd.service to add a line:
LimitCORE=0
I've documented further details in my Slurm Wiki page
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#slurmd-systemd-limits.
However, it's a bit cumbersome to modify the Systemd service file on
all compute nodes.
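One way to avoid editing the packaged unit file itself is a systemd drop-in override, which survives package upgrades and is easy to push out with configuration management. A sketch (the drop-in file name is arbitrary; the .d directory path is standard systemd convention):

```shell
# Create a drop-in override for slurmd without touching the packaged unit file
mkdir -p /etc/systemd/system/slurmd.service.d
cat > /etc/systemd/system/slurmd.service.d/limits.conf <<'EOF'
[Service]
LimitCORE=0
EOF

# Reload unit files and restart slurmd so the new limit takes effect
systemctl daemon-reload
systemctl restart slurmd
```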
Thanks for sharing any experiences.
/Ole