<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>So I decided to eat my own dog food, and tested this out myself.
First of all, running ulimit through srun "naked" like that
doesn't work, since ulimit is a bash shell builtin, so I had to
write a simple shell script: <br>
</p>
<pre>$ cat ulimit.sh
#!/bin/bash
ulimit -a
</pre>
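<p>(As an aside, if you don't want a wrapper script at all, something
along these lines should also work, since it hands the builtin to a
new bash process on the compute node; the srun options here just
mirror the ones I use below.)<br>
</p>
<pre>$ srun -N1 -n1 -t 00:01:00 --mem=1G bash -c 'ulimit -a'</pre>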
<p>By default, the core file size is set to zero in our environment
as a good security practice and to keep users' core dumps from
filling up the filesystem. My default ulimit settings: <br>
</p>
<p><font face="monospace">$ ulimit -a <br>
core file size (blocks, -c) 0<br>
data seg size (kbytes, -d) unlimited<br>
scheduling priority (-e) 0<br>
file size (blocks, -f) unlimited<br>
pending signals (-i) 128054<br>
max locked memory (kbytes, -l) unlimited<br>
max memory size (kbytes, -m) unlimited<br>
open files (-n) 1024<br>
pipe size (512 bytes, -p) 8<br>
POSIX message queues (bytes, -q) 819200<br>
real-time priority (-r) 0<br>
stack size (kbytes, -s) unlimited<br>
cpu time (seconds, -t) unlimited<br>
max user processes (-u) 4096<br>
virtual memory (kbytes, -v) unlimited<br>
file locks (-x) unlimited</font><br>
</p>
<p>Now I run my ulimit.sh script through srun: <br>
</p>
<p><font face="monospace">$ srun -N1 -n1 -t 00:01:00 --mem=1G
./ulimit.sh <br>
srun: job 1249977 queued and waiting for resources<br>
srun: job 1249977 has been allocated resources<br>
core file size (blocks, -c) 0<br>
data seg size (kbytes, -d) unlimited<br>
scheduling priority (-e) 0<br>
file size (blocks, -f) unlimited<br>
pending signals (-i) 257092<br>
max locked memory (kbytes, -l) unlimited<br>
max memory size (kbytes, -m) 1048576<br>
open files (-n) 1024<br>
pipe size (512 bytes, -p) 8<br>
POSIX message queues (bytes, -q) 819200<br>
real-time priority (-r) 0<br>
stack size (kbytes, -s) unlimited<br>
cpu time (seconds, -t) unlimited<br>
max user processes (-u) 4096<br>
virtual memory (kbytes, -v) unlimited<br>
file locks (-x) unlimited</font><br>
</p>
<p>Now I set the core file size: <br>
</p>
<p><font face="monospace">$ ulimit -c 1024<br>
(base) [pbisbal@sunfire01 ulimit]$ ulimit -c <br>
1024</font><br>
</p>
<p>And run ulimit.sh through srun again: <br>
</p>
<p><font face="monospace">$ srun -N1 -n1 -t 00:01:00 --mem=1G
./ulimit.sh <br>
srun: job 1249978 queued and waiting for resources<br>
srun: job 1249978 has been allocated resources<br>
core file size (blocks, -c) 1024<br>
data seg size (kbytes, -d) unlimited<br>
scheduling priority (-e) 0<br>
file size (blocks, -f) unlimited<br>
pending signals (-i) 257092<br>
max locked memory (kbytes, -l) unlimited<br>
max memory size (kbytes, -m) 1048576<br>
open files (-n) 1024<br>
pipe size (512 bytes, -p) 8<br>
POSIX message queues (bytes, -q) 819200<br>
real-time priority (-r) 0<br>
stack size (kbytes, -s) unlimited<br>
cpu time (seconds, -t) unlimited<br>
max user processes (-u) 4096<br>
virtual memory (kbytes, -v) unlimited<br>
file locks (-x) unlimited</font><br>
</p>
<p>This confirms that the limits propagated by
PropagateResourceLimits come from the user's environment, not from
PAM. If you have UsePAM enabled as Ryan suggested in a previous
e-mail, that puts *upper limits* on the values propagated by
PropagateResourceLimits. According to the slurm.conf man page, it
doesn't necessarily override the limits set in the environment when
the job is submitted:<br>
</p>
<p>
<blockquote type="cite"> UsePAM If set to 1, PAM (Pluggable
Authentication Modules for Linux)<br>
will be enabled. PAM is used to establish the
upper bounds for<br>
resource limits. With PAM support enabled, local
system adminis‐<br>
trators can dynamically configure system resource
limits. Chang‐<br>
ing the upper bound of a resource limit will not
alter the lim‐<br>
its of running jobs, only jobs started after a
change has been<br>
made will pick up the new limits. The default
value is 0 (not<br>
to enable PAM support)....</blockquote>
</p>
<p>So, with UsePAM=1 and PropagateResourceLimits=ALL (the default
for that setting): if I set my core file size to 0 and
/etc/security/limits.conf sets it to 1024, the core file size in my
job will stay 0. If I instead set it to 2048, Slurm will reduce
that limit to 1024. <br>
</p>
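<p>For illustration, a configuration along these lines should
produce that behavior. This is only a sketch using the values from
my example above, not our production config, and note that
pam_limits expresses the core size in KB (if I remember correctly)
rather than the 512-byte blocks ulimit reports, so the numbers may
need adjusting: <br>
</p>
<pre># slurm.conf (shared by the compute nodes)
UsePAM=1
PropagateResourceLimits=ALL   # ALL is the default

# /etc/security/limits.conf on the compute nodes
# hard cap on core file size for everyone
*    hard    core    1024</pre>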
<p>Note that setting UsePAM=1 alone isn't enough. You also need to
configure a PAM service named slurm (i.e., an /etc/pam.d/slurm
file), as Ryan pointed out. <br>
</p>
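<p>Something like the following minimal /etc/pam.d/slurm is what
I'd expect, with pam_limits.so in the session stack being the piece
that reads limits.conf. Again, treat this as a sketch and check the
Slurm and pam_limits documentation for your distribution: <br>
</p>
<pre># /etc/pam.d/slurm (sketch)
auth      required   pam_localuser.so
account   required   pam_unix.so
# pam_limits applies the limits from /etc/security/limits.conf
session   required   pam_limits.so</pre>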
<pre class="moz-signature" cols="72">Prentice</pre>
<div class="moz-cite-prefix">On 4/29/21 12:35 PM, Prentice Bisbal
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:b35727e7-2585-6d6c-f989-9d4ec2d2f649@pppl.gov">
<p>On 4/28/21 2:26 AM, Diego Zuccato wrote:<br>
</p>
<blockquote type="cite"
cite="mid:727503b2-587b-b7a8-fe21-2ea98b3a310c@unibo.it">Il
27/04/2021 17:31, Prentice Bisbal ha scritto: <br>
<br>
<blockquote type="cite">I don't think PAM comes into play here.
Since Slurm is starting the processes on the compute nodes as
the user, etc., PAM is being bypassed. <br>
</blockquote>
Then maybe slurmd somehow goes throught the PAM stack another
way, since limits on the frontend got propagated (as implied by
PropagateResourceLimits default value of ALL). <br>
And I can confirm that setting it to NONE seems to have solved
the issue: users on the frontend get limited resources, and jobs
on the nodes get the resources they asked for. <br>
<br>
</blockquote>
<p>In this case, Slurm is deliberately looking at the resource
limits in effect when the job is submitted on the submission host,
and then copying them to the job's environment. From the
slurm.conf documentation (<a class="moz-txt-link-freetext"
href="https://slurm.schedmd.com/slurm.conf.html"
moz-do-not-send="true">https://slurm.schedmd.com/slurm.conf.html</a>):
<br>
</p>
<p> </p>
<blockquote type="cite">
<dl compact="compact">
<dt><b>PropagateResourceLimits</b></dt>
<dd> A comma-separated list of resource limit names. The
slurmd daemon uses these names to obtain the associated
(soft) limit values from the user's process environment on
the submit node. These limits are then propagated and
applied to the jobs that will run on the compute nodes.</dd>
</dl>
</blockquote>
<p>Then later on, it indicates that all resource limits are
propagated by default: <br>
</p>
<p> </p>
<blockquote type="cite">The following limit names are supported by
Slurm (although some options may not be supported on some
systems):
<dl compact="compact">
<dt><b>ALL</b></dt>
<dd> All limits listed below (default)<br>
</dd>
</dl>
</blockquote>
<p>You should be able to verify this yourself in the following
manner: <br>
</p>
<p>1. Start two separate shells on the submission host</p>
<p>2. Change the limits in one of the shells. For example, reduce
core size to 0, with 'ulimit -c 0' in just one shell. <br>
</p>
<p>3. Then run 'srun ulimit -a' from each shell. <br>
</p>
<p>4. Compare the output. The shell where you changed the limit
should show that core size is now zero. <br>
</p>
<p>--</p>
<p>Prentice<br>
</p>
</blockquote>
</body>
</html>