[slurm-users] Issues with pam_slurm_adopt
Brian Andrus
toomuchit at gmail.com
Fri Apr 8 19:43:31 UTC 2022
Check SELinux.
Run "getenforce" on the node; if it returns "Enforcing", try "setenforce 0".
Slurm doesn't play well when SELinux is enabled.
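
For example (a minimal check, assuming the SELinux userspace tools,
getenforce/setenforce, are installed on the node):

  # show the current SELinux mode
  getenforce
  # if it prints "Enforcing", switch to permissive mode for a quick test
  setenforce 0
  # to make this persistent, set SELINUX=permissive (or disabled) in
  # /etc/selinux/config and reboot the node
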
Brian Andrus
On 4/8/2022 10:53 AM, Nicolas Greneche wrote:
> Hi,
>
> I have an issue with pam_slurm_adopt since moving from 21.08.5 to
> 21.08.6: it no longer works.
>
> When I log in directly to the node with the root account:
>
> Apr 8 19:06:49 magi46 pam_slurm_adopt[20400]: Ignoring root user
> Apr 8 19:06:49 magi46 sshd[20400]: Accepted publickey for root from 172.16.0.3 port 50884 ssh2: ...
> Apr 8 19:06:49 magi46 sshd[20400]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
>
> Everything is OK.
>
> I submit a very simple job, an infinite loop to keep the first compute
> node busy:
>
> nicolas.greneche at magi3:~/test-bullseye/infinite$ cat infinite.slurm
> #!/bin/bash
> #SBATCH --job-name=infinite
> #SBATCH --output=%x.%j.out
> #SBATCH --error=%x.%j.err
> #SBATCH --nodes=1
> srun infinite.sh
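>
> infinite.sh itself is just the infinite loop, something like:
>
> #!/bin/bash
> # keep looping so the job (and its srun step) stays active on the node
> while true; do
>     sleep 1
> done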
>
> nicolas.greneche at magi3:~/test-bullseye/infinite$ sbatch infinite.slurm
> Submitted batch job 203
>
> nicolas.greneche at magi3:~/test-bullseye/infinite$ squeue
> JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
>   203   COMPUTE infinite nicolas.  R  0:03     1 magi46
>
> I have a job running on the node. When I try to log in to the node with
> the same regular account:
>
> nicolas.greneche at magi3:~/test-bullseye/infinite$ ssh magi46
> Access denied by pam_slurm_adopt: you have no active jobs on this node
> Connection closed by 172.16.0.46 port 22
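>
> (As a cross-check, the job should also be visible from the node side;
> something like this, run on magi46, should list job 203 and its steps:
>
> squeue -w magi46 -u nicolas.greneche   # this user's jobs on this node
> scontrol listpids                      # PIDs of the steps slurmd manages here
> )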
>
> In auth.log, we can see that the job (JOBID 203) is found, but the PAM
> module decides that I have no running job on the node:
>
> Apr 8 19:11:32 magi46 sshd[20542]: pam_access(sshd:account): access denied for user `nicolas.greneche' from `172.16.0.3'
> Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: debug2: _establish_config_source: using config_file=/run/slurm/conf/slurm.conf (cached)
> Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: debug: slurm_conf_init: using config_file=/run/slurm/conf/slurm.conf
> Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: debug: Reading slurm.conf file: /run/slurm/conf/slurm.conf
> Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: debug: Reading cgroup.conf file /run/slurm/conf/cgroup.conf
> Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: debug4: found StepId=203.batch
> Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: debug4: found StepId=203.0
> Apr 8 19:11:32 magi46 pam_slurm_adopt[20542]: send_user_msg: Access denied by pam_slurm_adopt: you have no active jobs on this node
> Apr 8 19:11:32 magi46 sshd[20542]: fatal: Access denied for user nicolas.greneche by PAM account configuration [preauth]
>
> I may have missed something; if you have any tips, I'd be delighted.
>
> As appendices, here are the sshd PAM configuration on the compute
> nodes and the slurm.conf:
>
> root at magi46:~# cat /etc/pam.d/sshd
> @include common-auth
> account required pam_nologin.so
> account required pam_access.so
> account required pam_slurm_adopt.so log_level=debug5
>
> @include common-account
> session [success=ok ignore=ignore module_unknown=ignore default=bad] pam_selinux.so close
> session required pam_loginuid.so
> session optional pam_keyinit.so force revoke
>
> @include common-session
> session optional pam_motd.so motd=/run/motd.dynamic
> session optional pam_motd.so noupdate
> session optional pam_mail.so standard noenv
> session required pam_limits.so
> session required pam_env.so
> session required pam_env.so user_readenv=1 envfile=/etc/default/locale
> session [success=ok ignore=ignore module_unknown=ignore default=bad] pam_selinux.so open
>
> @include common-password
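>
> (If it helps, the same account stack can be exercised outside sshd with
> pamtester, assuming it is installed on the node; run on magi46 while job
> 203 is active:
>
> pamtester -v sshd nicolas.greneche acct_mgmt
> )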
>
> root at slurmctld:~# cat /etc/slurm/slurm.conf
> ClusterName=magi
> ControlMachine=slurmctld
> SlurmUser=slurm
> AuthType=auth/munge
>
> MailProg=/usr/bin/mail
> SlurmdDebug=debug
>
> StateSaveLocation=/var/slurm
> SlurmdSpoolDir=/var/slurm
> SlurmctldPidFile=/var/slurm/slurmctld.pid
> SlurmdPidFile=/var/slurm/slurmd.pid
> SlurmdLogFile=/var/log/slurm/slurmd.log
> SlurmctldLogFile=/var/log/slurm/slurmctld.log
> SlurmctldParameters=enable_configless
>
> AccountingStorageHost=slurmctld
> JobAcctGatherType=jobacct_gather/linux
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStorageEnforce=associations
> JobRequeue=0
> SlurmdTimeout=600
>
> SelectType=select/cons_tres
> SelectTypeParameters=CR_CPU
>
> TmpFS=/scratch
>
> GresTypes=gpu
> PriorityType="priority/multifactor"
>
> Nodename=magi3 Boards=1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN
> Nodename=magi[107] Boards=1 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=92000 State=UNKNOWN
> Nodename=magi[46-53] Boards=1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=64000 State=UNKNOWN
>
> PartitionName=MISC-56c Nodes=magi107 Priority=3000 MaxTime=INFINITE State=UP
> PartitionName=COMPUTE Nodes=magi[46-53] Priority=3000 MaxTime=INFINITE State=UP Default=YES
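>
> (The nodes run configless, hence the /run/slurm/conf/slurm.conf path in
> the log above. To rule out a stale cache, the config the node actually
> uses can be compared with the controller's copy, e.g.:
>
> # on magi46: dump the config slurmd/pam_slurm_adopt see
> scontrol show config | grep -i slurmctldparameters
> # or, from slurmctld, fetch the cached file and diff it
> scp magi46:/run/slurm/conf/slurm.conf /tmp/ && diff /tmp/slurm.conf /etc/slurm/slurm.conf
> )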
>
> Thank you,
>