[slurm-users] ulimit in sbatch script

Bill Barth bbarth at tacc.utexas.edu
Sun Apr 15 13:02:48 MDT 2018


Mahmood, sorry to presume. I meant to address the root user and your ssh to the node in your example. 

At our site, we use UsePAM=1 in our slurm.conf, and our /etc/pam.d/slurm and slurm.pam files both contain pam_limits.so, so it could be that way for you, too. That is, Slurm could be setting the limits for your users' job scripts, while for root SSH logins the limits are being set by PAM through another config file. Also, root’s limits may be set differently by PAM (in /etc/security/limits.conf) or by the kernel at boot time.
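
For illustration, a setup like the one described above might look roughly like the excerpt below. This is only a sketch: the "session required" control line is an assumption about how pam_limits.so is typically listed, and the rest of the module stack differs from site to site.

```
# slurm.conf (excerpt)
UsePAM=1

# /etc/pam.d/slurm (excerpt; control flag and surrounding stack are assumptions)
session    required    pam_limits.so
```

With something like this in place, limits from /etc/security/limits.conf would be applied to job processes through the slurm PAM service, independently of whatever /etc/pam.d/sshd applies to SSH logins.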

Finally, users should be careful using ulimit in their job scripts, because it can only change the limits for that shell script process (and its children), not across nodes. That job script appears to apply to only one node, but if they want different limits for jobs that span nodes, they may need to use other features of Slurm to apply them on all the nodes their job uses (cgroups, perhaps?).
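
The scoping point above can be checked locally with a small sketch like the following. It assumes bash and uses the open-files soft limit purely as an example; the subshell stands in for a job script, and the variable names are made up for illustration. It does not involve Slurm at all.

```shell
#!/bin/bash
# Sketch: a limit changed with ulimit applies to the shell that changed it
# and to that shell's children, not to the parent shell.

# Soft limit on open files as seen by this shell before any change.
before=$(ulimit -Sn)

# Pick a lower value to set; lowering a soft limit is always permitted.
if [ "$before" = "unlimited" ]; then
    target=256
else
    target=$(( before / 2 ))
fi

# Lower the limit inside a subshell (stand-in for a job script) and record
# what the subshell itself and a child process started from it report.
in_subshell=$( ulimit -Sn "$target"; ulimit -Sn )
in_child=$( ulimit -Sn "$target"; bash -c 'ulimit -Sn' )

# Back in the parent shell, the limit is unchanged.
after=$(ulimit -Sn)

echo "parent before: $before"
echo "subshell:      $in_subshell"
echo "child of it:   $in_child"
echo "parent after:  $after"
```

The subshell and its child report the lowered value, while the parent shell still reports the original one. Whether a limit set in a job script reaches processes on other nodes depends on how Slurm is configured, which this sketch does not exercise.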

Best,
Bill.

-- 
Bill Barth, Ph.D., Director, HPC
bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435            |   Fax:   (512) 475-9445
 
 

On 4/15/18, 1:41 PM, "slurm-users on behalf of Mahmood Naderan" <slurm-users-bounces at lists.schedmd.com on behalf of mahmood.nt at gmail.com> wrote:

    Excuse me... I think the problem is not pam.d.
    How do you interpret the following output?
    
    
    [hamid at rocks7 case1_source2]$ sbatch slurm_script.sh
    Submitted batch job 53
    [hamid at rocks7 case1_source2]$ tail -f hvacSteadyFoam.log
    max memory size         (kbytes, -m) 65536000
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 4096
    virtual memory          (kbytes, -v) 72089600
    file locks                      (-x) unlimited
    ^C
    [hamid at rocks7 case1_source2]$ squeue
                 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                    53   CLUSTER hvacStea    hamid  R       0:27      1 compute-0-3
    [hamid at rocks7 case1_source2]$ ssh compute-0-3
    Warning: untrusted X11 forwarding setup failed: xauth key data not generated
    Last login: Sun Apr 15 23:03:29 2018 from rocks7.local
    Rocks Compute Node
    Rocks 7.0 (Manzanita)
    Profile built 19:21 11-Apr-2018
    
    Kickstarted 19:37 11-Apr-2018
    [hamid at compute-0-3 ~]$ ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 256712
    max locked memory       (kbytes, -l) unlimited
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 4096
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
    [hamid at compute-0-3 ~]$
    
    
    
    As you can see, the log file, where I put "ulimit -a" before the main
    command, shows limited virtual memory. However, when I log in to the
    node, it shows unlimited!
    
    Regards,
    Mahmood
    
    
    
    
    On Sun, Apr 15, 2018 at 11:01 PM, Bill Barth <bbarth at tacc.utexas.edu> wrote:
    > Are you using pam_limits.so in any of your /etc/pam.d/ configuration files? That would enforce /etc/security/limits.conf for all users, and those limits are usually unlimited for root. Root is almost always allowed to do things bad enough to crash the machine or run it out of resources. If the /etc/pam.d/sshd file has pam_limits.so in it, that’s probably where the unlimited setting for root is coming from.
    >
    > Best,
    > Bill.
    
    


