[slurm-users] Proposed changes to pam_slurm_adopt
Jerome.Vienne at Squarepoint-Capital.com
Tue Jun 12 10:30:45 MDT 2018
While testing pam_slurm_adopt on a CentOS 7 node with multiple jobs running, I realized that it was failing with error messages like:
Jun 12 11:57:37 server pam_slurm_adopt: From 192.168.1.48 port 36512 as test1234: unable to determine source job
Jun 12 11:57:37 server pam_slurm_adopt: Couldn't stat path '/cgroup/memory/slurm/uid_1002/job_104'
Jun 12 11:57:37 server pam_slurm_adopt: Couldn't stat path '/cgroup/memory/slurm/uid_1002/job_105'
Jun 12 11:57:37 server pam_slurm_adopt: Couldn't stat path '/cgroup/memory/slurm/uid_1002/job_106'
Jun 12 11:57:37 server pam_slurm_adopt: Couldn't stat path '/cgroup/memory/slurm/uid_1002/job_107'
As explained in the documentation of the plugin, I knew that for my case I had to change the subsystem in the function _indeterminate_multiple(), replacing "memory" with "cpuset".
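For readers who have not looked at the plugin, here is a minimal sketch of the kind of check behind the "Couldn't stat path" messages. It assumes the path layout visible in the log lines above (root/subsystem/slurm/uid_N/job_N). The names build_job_cgroup_path and job_cgroup_exists, and the root parameter, are hypothetical and not taken from the plugin source.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not from the plugin): write
 * <root>/<subsystem>/slurm/uid_<uid>/job_<job_id> into buf.
 * Returns 0 on success, -1 if the path does not fit in buf. */
static int build_job_cgroup_path(char *buf, size_t len, const char *root,
                                 const char *subsystem,
                                 unsigned int uid, unsigned int job_id)
{
    int n = snprintf(buf, len, "%s/%s/slurm/uid_%u/job_%u",
                     root, subsystem, uid, job_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: returns 1 if the job's cgroup directory can be
 * stat()ed under the given subsystem, 0 otherwise. If the subsystem is
 * not the one Slurm actually uses on the node, every job path fails. */
static int job_cgroup_exists(const char *root, const char *subsystem,
                             unsigned int uid, unsigned int job_id)
{
    char path[4096];
    struct stat st;

    if (build_job_cgroup_path(path, sizeof(path), root, subsystem,
                              uid, job_id) != 0)
        return 0;
    return stat(path, &st) == 0;
}
```

With "memory" hard-coded as the subsystem, a node whose job cgroups exist only under cpuset would fail this check for every job, which matches the log above.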
But I am not a fan of hard-coded values, so I decided to modify the plugin to accept a new option that I called "cgroup_subsystem".
To select the subsystem used by slurm/cgroup, just set the value of subsystem after pam_slurm_adopt.so in /etc/pam.d/sshd, for example:
account sufficient pam_slurm_adopt.so subsystem=cpuset
If nothing is set, the default subsystem is memory like before.
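As an illustration only, the option handling could look roughly like the sketch below. PAM hands a module the arguments written after its name in /etc/pam.d/sshd as argc/argv, so the sketch scans them for "subsystem=". The function name parse_subsystem and the DEFAULT_SUBSYSTEM macro are hypothetical, and the attached modification may do this differently.

```c
#include <string.h>

#define DEFAULT_SUBSYSTEM "memory"  /* same default as before the change */

/* Hypothetical helper: scan PAM-style module arguments for
 * "subsystem=<name>" and return a pointer to <name>.
 * If the option is absent or has an empty value, the default is kept.
 * When the option appears several times, the last non-empty one wins. */
static const char *parse_subsystem(int argc, const char **argv)
{
    static const char key[] = "subsystem=";
    const size_t key_len = sizeof(key) - 1;
    const char *result = DEFAULT_SUBSYSTEM;

    for (int i = 0; i < argc; i++) {
        if (argv[i] != NULL &&
            strncmp(argv[i], key, key_len) == 0 &&
            argv[i][key_len] != '\0')
            result = argv[i] + key_len;
    }
    return result;
}
```

The returned string would then replace the hard-coded "memory" wherever the job cgroup path is built.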
With that setting and the proposed modifications, everything worked as expected:
Jun 12 11:59:14 server pam_slurm_adopt: From 192.168.1.48 port 36644 as test1234: unable to determine source job
Jun 12 11:59:14 server pam_slurm_adopt: action_unknown: Picked job 116
Jun 12 11:59:14 server pam_slurm_adopt: Process 86610 adopted into job 116
I am attaching the modified version. I believe that it will be useful for some people and might be added to the next version of pam_slurm_adopt.