[slurm-users] pam_slurm_adopt and memory constraints?
Juergen Salk
juergen.salk at uni-ulm.de
Fri Jul 12 13:21:31 UTC 2019
Dear all,
I have configured pam_slurm_adopt in our Slurm test environment by
following the corresponding documentation:
https://slurm.schedmd.com/pam_slurm_adopt.html
I've set `PrologFlags=contain` in slurm.conf and also have task/cgroup
enabled along with task/affinity (i.e. `TaskPlugin=task/affinity,task/cgroup`).
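Both settings can be cross-checked against the running configuration
with scontrol, e.g.:

$ scontrol show config | grep -E 'PrologFlags|TaskPlugin'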
This is the current configuration in cgroup.conf:
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainKmemSpace=no
TaskAffinity=yes
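With these settings, processes started under Slurm's control (the job
script itself or anything launched via srun) are expected to sit in the
job's memory cgroup; a quick sanity check from inside a job script would
be something like:

grep memory: /proc/self/cgroup   # expected: .../slurm/uid_<uid>/job_<jobid>/...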
PAM is enabled in /etc/ssh/sshd_config, i.e. `UsePAM yes`, which is
the default on RHEL7 anyway. SELinux is disabled on the system.
PAM configuration in /etc/pam.d/sshd (last two lines only):
[...]
# Authorize users that have a running job on node
account sufficient pam_slurm_adopt.so
account required pam_access.so nodefgroup
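With this in place, an ssh attempt by a user who has no job running on
the node is rejected in the account phase, roughly like this (quoted
from memory, so the exact wording may differ):

$ ssh n1521
Access denied by pam_slurm_adopt: you have no active jobs on this node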
This all works fine so far: users can only log into a compute node if
they have at least one job running on it, and access is denied by
pam_slurm_adopt for everyone else. When a user does log into a compute
node, the ssh session is indeed "somewhat" adopted into the extern
step of one of the running jobs. However, this happens *only* for the
cpuset (and freezer) controllers:
$ sbatch --mem=2G job.slurm
Submitted batch job 357
$ scontrol show job 357 | grep BatchHost
BatchHost=n1521
$ ssh n1521
Last login: Fri Jul 12 14:43:19 2019 from XXXX
$ cat /proc/self/cgroup
11:cpuset:/slurm/uid_900002/job_357/step_extern
10:hugetlb:/
9:perf_event:/
8:devices:/user.slice
7:net_prio,net_cls:/
6:cpuacct,cpu:/user.slice
5:pids:/user.slice
4:blkio:/user.slice
3:memory:/user.slice
2:freezer:/slurm/uid_900002/job_357/step_extern
1:name=systemd:/user.slice/user-900002.slice/session-35494.scope
$
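For comparison, I would expect the job's memory cgroup to exist and to
carry the requested 2 GB limit, with the ssh shell simply not attached
to it. Assuming cgroup v1 with the memory controller mounted under
/sys/fs/cgroup/memory, that could be cross-checked directly in the
cgroup filesystem:

$ cat /sys/fs/cgroup/memory/slurm/uid_900002/job_357/memory.limit_in_bytes
$ grep memory: /proc/$$/cgroup   # the ssh shell itself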
Thus, the ssh session seems to be totally unconstrained by cgroups in
terms of memory usage. In fact, I was able to launch a test
application from the interactive ssh session that consumed almost all
of the memory on that node. That's obviously undesirable in a shared
environment where jobs from different users run side by side on the
same node.
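For illustration, something along these lines run from within the ssh
session is enough to reproduce the effect (just a sketch, assuming the
stress(1) tool is available on the node; any sufficiently hungry test
program will do):

$ stress --vm 1 --vm-bytes 64G --vm-keep --timeout 60
  (a single worker dirtying ~64 GB, far beyond the 2 GB requested with
  --mem, yet unconstrained by the job's memory cgroup)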
I suppose this is nevertheless the expected behavior and simply the
way it is when using pam_slurm_adopt to restrict access to the compute
nodes. Is that right? Or did I miss something obvious?
Thank you in advance for any comment.
Best regards
Jürgen Salk
PS: This is Slurm version 18.08.7 if that matters.
--
Jürgen Salk
Scientific Software & Compute Services (SSCS)
Kommunikations- und Informationszentrum (kiz)
Universität Ulm
Telefon: +49 (0)731 50-22478
Telefax: +49 (0)731 50-22471