[slurm-users] pam_slurm_adopt always claims no active jobs even when they do
Wensheng Deng
wd35 at nyu.edu
Thu Oct 29 12:10:20 UTC 2020
Interesting...
On Thu, Oct 29, 2020 at 7:56 AM Paul Raines <raines at nmr.mgh.harvard.edu>
wrote:
> The debugging was useful. The problem turned out to be that I am running
> with SELinux enabled due to corporate policy. SELinux was blocking sshd
> from accessing the socket files in /var/slurm/spool/d:
>
> time->Thu Oct 29 07:53:50 2020
> type=AVC msg=audit(1603972430.809:2800): avc: denied { write } for pid=403840 comm="sshd" name="rtx-05_811.4294967295" dev="md122" ino=2228938 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 tcontext=system_u:object_r:var_t:s0 tclass=sock_file permissive=1
>
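For anyone hitting the same denial, here is a minimal Python sketch for pulling the relevant fields out of an AVC record like the one above. The function name parse_avc is made up for illustration; the record is the one quoted above, joined onto a single line. It is only meant to make the record easier to read (sshd_t writing to a sock_file labelled var_t), not to replace the audit tools.

```python
import re

# The AVC record quoted above, joined onto a single line.
AVC = (
    'type=AVC msg=audit(1603972430.809:2800): avc: denied { write } for '
    'pid=403840 comm="sshd" name="rtx-05_811.4294967295" dev="md122" '
    'ino=2228938 scontext=system_u:system_r:sshd_t:s0-s0:c0.c1023 '
    'tcontext=system_u:object_r:var_t:s0 tclass=sock_file permissive=1'
)


def parse_avc(record):
    """Hypothetical helper: split an AVC record into its key=value fields."""
    # Collect key=value pairs; values are either quoted or run to whitespace.
    pairs = re.findall(r'(\w+)=("[^"]*"|\S+)', record)
    fields = {key: value.strip('"') for key, value in pairs}
    # The denied permissions sit between the braces after "denied".
    perms = re.search(r"denied\s+\{([^}]*)\}", record)
    fields["perms"] = perms.group(1).split() if perms else []
    # The SELinux type is the third colon-separated part of a context.
    fields["source_type"] = fields["scontext"].split(":")[2]
    fields["target_type"] = fields["tcontext"].split(":")[2]
    return fields


avc = parse_avc(AVC)
print(avc["comm"], avc["perms"], avc["source_type"], "->",
      avc["target_type"], avc["tclass"], avc["name"])
```

The permissive=1 field indicates the access was logged rather than blocked at the time this particular record was written.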
> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>
>
>
> On Mon, 26 Oct 2020 9:26am, Paul Raines wrote:
>
> >
> > With debugging on I get:
> >
> > Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug: Reading slurm.conf file: /etc/slurm/slurm.conf
> > Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug4: found jobid = 808, stepid = 4294967295
> > Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug4: found jobid = 808, stepid = 0
> > Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug: _step_connect: connect() failed dir /var/slurm/spool/d node rtx-03 step 808.4294967295 Permission denied
> > Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: debug3: unable to connect to step 808.4294967295 on rtx-03: Permission denied
> > Oct 26 09:22:33 rtx-03 pam_slurm_adopt[176647]: send_user_msg: Access denied by pam_slurm_adopt: you have no active jobs on this node
> > Oct 26 09:22:33 rtx-03 sshd[176647]: pam_access(sshd:account): access denied for user `raines' from `10.162.254.11'
> > Oct 26 09:22:33 rtx-03 sshd[176647]: fatal: Access denied for user raines by PAM account configuration [preauth]
> >
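The file name in the SELinux denial (rtx-05_811.4294967295) and the "dir ... node ... step ..." fields in the _step_connect message suggest the step sockets are named <node>_<jobid>.<stepid> inside the spool directory. A small Python sketch under that assumption, useful for checking that the socket the module is trying to reach actually exists. The helper names are hypothetical, and reading 4294967295 (0xFFFFFFFF) as the id of the "extern" step created by PrologFlags=Contain is my understanding for this Slurm release, not something the log itself states.

```python
import os
import stat

# 4294967295 as it appears in the log; assumed here to be the "extern" step.
EXTERN_STEP_ID = 0xFFFFFFFF


def step_socket_path(spool_dir, node, job_id, step_id):
    """Hypothetical helper: build the socket path, assuming the
    <node>_<jobid>.<stepid> naming seen in the log and the AVC record."""
    return f"{spool_dir.rstrip('/')}/{node}_{job_id}.{step_id}"


def is_socket(path):
    """Return True if path exists and is a UNIX domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


path = step_socket_path("/var/slurm/spool/d", "rtx-03", 808, EXTERN_STEP_ID)
print(path, "socket present:", is_socket(path))
```

If the socket is present and connect() still fails with "Permission denied" as above, the next thing to look at is file permissions or a security layer such as SELinux rather than the job state.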
> >
> > -- Paul Raines (http://help.nmr.mgh.harvard.edu)
> >
> >
> >
> > On Fri, 23 Oct 2020 11:12pm, Wensheng Deng wrote:
> >
> >> Append 'log_level=debug5' to the pam_slurm_adopt line in system-auth,
> >> restart sshd, then start a new job and a new ssh session. Then check the
> >> log messages in /var/log/secure...
> >>
> >>
> >> On Fri, Oct 23, 2020 at 9:04 PM Paul Raines <raines at nmr.mgh.harvard.edu>
> >> wrote:
> >>
> >>>
> >>> I am running Slurm 20.02.3 on CentOS 7 systems. I have pam_slurm_adopt
> >>> set up in /etc/pam.d/system-auth, and slurm.conf has
> >>> PrologFlags=Contain,X11
> >>> I have also masked systemd-logind.
> >>>
> >>> But pam_slurm_adopt always denies login with "Access denied by
> >>> pam_slurm_adopt: you have no active jobs on this node" even when the
> >>> user most definitely has a job running on the node via srun
> >>>
> >>> Any clues as to why pam_slurm_adopt thinks there is no job?
> >>>
> >>> serena [raines] squeue
> >>> JOBID PARTITION NAME USER   ST TIME     NODES NODELIST(REASON)
> >>> 785   lcnrtx    tcsh raines R  19:44:51 1     rtx-03
> >>> serena [raines] ssh rtx-03
> >>> Access denied by pam_slurm_adopt: you have no active jobs on this node
> >>> Authentication failed.
> >>>
> >>>
> >>>
> >>
> >