[slurm-users] ulimit in sbatch script
Bill Barth
bbarth at tacc.utexas.edu
Sun Apr 15 13:02:48 MDT 2018
Mahmood, sorry to presume. I meant to address the root user and your SSH to the node in your example.
At our site, we use UsePAM=1 in our slurm.conf, and our /etc/pam.d/slurm and slurm.pam files both contain pam_limits.so, so it could be set up that way for you, too. That is, Slurm could be setting the limits for your users' job scripts, while the limits for SSH sessions are being set by PAM through a different config file. Also, root's limits are potentially set differently by PAM (in /etc/security/limits.conf) or by the kernel at boot time.
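For reference, the relevant pieces look roughly like this (a sketch, not our verbatim configs; your file names and paths may differ):

    # slurm.conf
    UsePAM=1

    # /etc/pam.d/slurm -- makes PAM apply /etc/security/limits.conf
    # to the tasks Slurm launches
    session    required    pam_limits.so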
Finally, users should be careful using ulimit in their job scripts because it only changes the limits for that shell script's process, not across nodes. Your job script appears to apply to only one node, but if users want different limits for jobs that span nodes, they may need to use other features of SLURM to apply them on all the nodes their job uses (cgroups, perhaps?).
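As a rough illustration (the node count and -v value here are made up, and whether srun-launched tasks inherit the script's limits depends on the PropagateResourceLimits setting in slurm.conf and on PAM on each node):

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=1

    # Only changes the limit for this shell on the first node:
    ulimit -v 72089600

    # What each remote task actually sees depends on
    # PropagateResourceLimits and the per-node PAM config:
    srun bash -c 'ulimit -v'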
Best,
Bill.
--
Bill Barth, Ph.D., Director, HPC
bbarth at tacc.utexas.edu | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445
On 4/15/18, 1:41 PM, "slurm-users on behalf of Mahmood Naderan" <slurm-users-bounces at lists.schedmd.com on behalf of mahmood.nt at gmail.com> wrote:
Excuse me... I think the problem is not pam.d.
How do you interpret the following output?
[hamid at rocks7 case1_source2]$ sbatch slurm_script.sh
Submitted batch job 53
[hamid at rocks7 case1_source2]$ tail -f hvacSteadyFoam.log
max memory size (kbytes, -m) 65536000
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) 72089600
file locks (-x) unlimited
^C
[hamid at rocks7 case1_source2]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                53   CLUSTER hvacStea    hamid  R       0:27      1 compute-0-3
[hamid at rocks7 case1_source2]$ ssh compute-0-3
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Last login: Sun Apr 15 23:03:29 2018 from rocks7.local
Rocks Compute Node
Rocks 7.0 (Manzanita)
Profile built 19:21 11-Apr-2018
Kickstarted 19:37 11-Apr-2018
[hamid at compute-0-3 ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256712
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[hamid at compute-0-3 ~]$
As you can see, the log file, where I put "ulimit -a" before the main
command, shows a limited virtual memory. However, when I log in to the
node, it says unlimited!
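For comparison, I guess something like this would show the limits inside the running job itself, though I have not tried it here (if I am reading the srun options right):

    [hamid at rocks7 case1_source2]$ srun --jobid=53 bash -c 'ulimit -v'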
Regards,
Mahmood
On Sun, Apr 15, 2018 at 11:01 PM, Bill Barth <bbarth at tacc.utexas.edu> wrote:
> Are you using pam_limits.so in any of your /etc/pam.d/ configuration files? That would enforce /etc/security/limits.conf for all users, and the limits there are usually unlimited for root. Root is almost always allowed to do things bad enough to crash the machine or run it out of resources. If the /etc/pam.d/sshd file has pam_limits.so in it, that's probably where the unlimited setting for root is coming from (a quick check is sketched after the quote).
>
> Best,
> Bill.
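A quick way to check which PAM stacks pull in pam_limits.so, plus the /etc/security/limits.conf syntax it enforces (the values below are illustrative, not taken from this thread):

    $ grep -l pam_limits.so /etc/pam.d/*

    # /etc/security/limits.conf: <domain> <type> <item> <value>
    # e.g. cap every user's address space (ulimit -v) at 72089600 KB:
    *    hard    as    72089600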