[slurm-users] What's the best way to suppress core dump files from jobs?

Bill Barth bbarth at tacc.utexas.edu
Wed Mar 21 06:08:20 MDT 2018


You could set /etc/security/limits.conf on every node to contain something like (check my syntax):

* soft core 0
* hard core 0

And make sure that /etc/pam.d/slurm.* and /etc/pam.d/system-auth* contain:

session     required      pam_limits.so
session     required      pam_limits.so

…so that limits are enforced for each user session. We have these lines in several other PAM files, but those above might be the minimum set for use with SLURM and SSH. Both sets of files might not be necessary, but if you allow ssh to compute nodes after a job is started, you probably need both.
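
To confirm that the limits actually reach a job, it may help to print the soft and hard core limits from inside one. Here is a minimal sketch, assuming a POSIX-like shell with the usual ulimit builtin; report_core_limits is a made-up helper name, not something Slurm provides:

```shell
# Hypothetical helper: print the soft and hard core-file size limits
# of the shell that calls it.
report_core_limits() {
    printf 'soft=%s hard=%s\n' "$(ulimit -S -c)" "$(ulimit -H -c)"
}

report_core_limits
```

Running the same two ulimit queries through srun (for example, srun bash -c 'ulimit -S -c; ulimit -H -c') and again in an ssh session on a compute node should report 0 in both cases if limits.conf is being picked up on both paths. As far as I know, /etc/pam.d/slurm is only consulted when UsePAM=1 is set in slurm.conf, so that is worth checking if the srun case still shows a non-zero limit.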

Best,
Bill.


-- 
Bill Barth, Ph.D., Director, HPC
bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435            |   Fax:   (512) 475-9445
 
 

On 3/21/18, 6:08 AM, "slurm-users on behalf of Ole Holm Nielsen" <slurm-users-bounces at lists.schedmd.com on behalf of Ole.H.Nielsen at fysik.dtu.dk> wrote:

    We experience problems with MPI jobs dumping lots (1 per MPI task) of 
    multi-GB core dump files, causing problems for file servers and compute 
    nodes.
    
    The user has "ulimit -c 0" in his .bashrc file, but that's ignored when 
    slurmd starts the job, and the slurmd process limits are used instead.
    
    I should mention that we have decided to configure slurm.conf with
       PropagateResourceLimitsExcept=ALL
    because it's desirable to have rather restrictive user limits on login 
    nodes.  Unfortunately, this means that the user's "ulimit -c 0" isn't 
    propagated to any batch job.
    
    What's the best way to suppress core dump files from jobs?  Does anyone 
    have good or bad experiences?
    
    One working solution is to modify the slurmd Systemd service file 
    /usr/lib/systemd/system/slurmd.service to add a line:
       LimitCORE=0
    I've documented further details in my Slurm Wiki page
    https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#slurmd-systemd-limits.
    However, it's a bit cumbersome to modify the Systemd service file on
    all compute nodes.
    
    Thanks for sharing any experiences.
    
    /Ole
    
    


