[slurm-users] Issues with pam_slurm_adopt

Nicolas Greneche nicolas.greneche at univ-paris13.fr
Fri Apr 8 20:55:46 UTC 2022


Yes, they are all stored in the LDAP directory:

root at magi3:~# id nicolas.greneche
uid=6001(nicolas.greneche) gid=6001(nicolas.greneche) 
groupes=6001(nicolas.greneche)

root at magi46:~# id nicolas.greneche
uid=6001(nicolas.greneche) gid=6001(nicolas.greneche) 
groupes=6001(nicolas.greneche)

UIDs are consistent across the whole cluster.
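
A minimal sketch of one way to run that check against every node at once (assuming root SSH access; magi3 and magi46 stand in for the full node list):

# Compare the UID/GID that each node's NSS (LDAP) lookup returns
for node in magi3 magi46; do
    echo "== $node =="
    ssh "$node" id nicolas.greneche
done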

On 08/04/2022 at 22:46, Brian Andrus wrote:
> Ok. Next I would check that the uid of the user is the same on the 
> compute node as the head node.
> 
> It looks like it is identifying the job, but doesn't see it as yours.
> 
> Brian Andrus
> 
> 
> On 4/8/2022 1:40 PM, Nicolas Greneche wrote:
>> Hi Brian,
>>
>> Thanks. SELinux is in neither strict nor targeted mode; I'm running 
>> Slurm on Debian Bullseye with both SELinux and AppArmor disabled.
>>
>> Thank you for your suggestion,
>>
>> On 08/04/2022 at 21:43, Brian Andrus wrote:
>>> Check selinux.
>>>
>>> Run "getenforce" on the node; if it reports "Enforcing", try running 
>>> "setenforce 0".
>>>
>>> Slurm doesn't play well if SELinux is enabled.
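>>>
>>> A minimal sketch of that check (assuming a root shell on the compute node):
>>>
>>> # Print the current SELinux mode: Enforcing, Permissive or Disabled
>>> getenforce
>>> # Temporarily switch to permissive mode for testing (not persistent across reboots)
>>> setenforce 0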
>>>
>>> Brian Andrus
>>>
>>>
>>> On 4/8/2022 10:53 AM, Nicolas Greneche wrote:
>>>> Hi,
>>>>
>>>> I have an issue with pam_slurm_adopt since I moved from 21.08.5 to 
>>>> 21.08.6: it no longer works.
>>>>
>>>> When I log directly into the node with the root account:
>>>>
>>>> Apr  8 19:06:49 magi46 pam_slurm_adopt[20400]: Ignoring root user
>>>> Apr  8 19:06:49 magi46 sshd[20400]: Accepted publickey for root from 
>>>> 172.16.0.3 port 50884 ssh2: ...
>>>> Apr  8 19:06:49 magi46 sshd[20400]: pam_unix(sshd:session): session 
>>>> opened for user root(uid=0) by (uid=0)
>>>>
>>>> Everything is OK.
>>>>
>>>> I submit a very simple job, an infinite loop, to keep the first 
>>>> compute node busy:
>>>>
>>>> nicolas.greneche at magi3:~/test-bullseye/infinite$ cat infinite.slurm
>>>> #!/bin/bash
>>>> #SBATCH --job-name=infinite
>>>> #SBATCH --output=%x.%j.out
>>>> #SBATCH --error=%x.%j.err
>>>> #SBATCH --nodes=1
>>>> srun infinite.sh
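>>>>
>>>> infinite.sh itself is not shown above; as a rough sketch, it could be 
>>>> nothing more than a loop that keeps the step alive (hypothetical content):
>>>>
>>>> #!/bin/bash
>>>> # hypothetical infinite.sh: keep the job step busy until the job is cancelled
>>>> while true; do
>>>>     sleep 60
>>>> done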
>>>>
>>>> nicolas.greneche at magi3:~/test-bullseye/infinite$ sbatch infinite.slurm
>>>> Submitted batch job 203
>>>>
>>>> nicolas.greneche at magi3:~/test-bullseye/infinite$ squeue
>>>>   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>>>>     203   COMPUTE infinite nicolas.  R       0:03      1 magi46
>>>>
>>>> I have a job running on the node. When I try to log in to the node 
>>>> with the same regular account:
>>>>
>>>> nicolas.greneche at magi3:~/test-bullseye/infinite$ ssh magi46
>>>> Access denied by pam_slurm_adopt: you have no active jobs on this node
>>>> Connection closed by 172.16.0.46 port 22
>>>>
>>>> In auth.log, we can see that the job (JOBID 203) is found, but the 
>>>> PAM module decides that I have no running job on the node:
>>>>
>>>> Apr  8 19:11:32 magi46 sshd[20542]: pam_access(sshd:account): access 
>>>> denied for user `nicolas.greneche' from `172.16.0.3'
>>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug2: 
>>>> _establish_config_source: using 
>>>> config_file=/run/slurm/conf/slurm.conf (cached)
>>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug: 
>>>> slurm_conf_init: using config_file=/run/slurm/conf/slurm.conf
>>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug:  Reading 
>>>> slurm.conf file: /run/slurm/conf/slurm.conf
>>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug:  Reading 
>>>> cgroup.conf file /run/slurm/conf/cgroup.conf
>>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug4: found 
>>>> StepId=203.batch
>>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug4: found 
>>>> StepId=203.0
>>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: send_user_msg: Access 
>>>> denied by pam_slurm_adopt: you have no active jobs on this node
>>>> Apr  8 19:11:32 magi46 sshd[20542]: fatal: Access denied for user 
>>>> nicolas.greneche by PAM account configuration [preauth]
>>>>
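>>>> As a cross-check (just a sketch of what could be compared), the UID 
>>>> that Slurm recorded for the job can be put side by side with the UID 
>>>> the compute node resolves for my account:
>>>>
>>>> # On the controller: the UserId Slurm stored for job 203
>>>> scontrol show job 203 | grep -o 'UserId=[^ ]*'
>>>> # On magi46: the UID/GID that sshd and the PAM stack will resolve
>>>> id nicolas.greneche
>>>>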
>>>> I may have missed something; if you have any tips, I'd be delighted.
>>>>
>>>> As appendices, here are the sshd PAM configuration on the compute 
>>>> nodes and the slurm.conf:
>>>>
>>>> root at magi46:~# cat /etc/pam.d/sshd
>>>> @include common-auth
>>>> account    required     pam_nologin.so
>>>> account  required     pam_access.so
>>>> account  required     pam_slurm_adopt.so log_level=debug5
>>>>
>>>> @include common-account
>>>> session [success=ok ignore=ignore module_unknown=ignore default=bad] pam_selinux.so close
>>>> session    required     pam_loginuid.so
>>>> session    optional     pam_keyinit.so force revoke
>>>>
>>>> @include common-session
>>>> session    optional     pam_motd.so  motd=/run/motd.dynamic
>>>> session    optional     pam_motd.so noupdate
>>>> session    optional     pam_mail.so standard noenv
>>>> session    required     pam_limits.so
>>>> session    required     pam_env.so
>>>> session    required     pam_env.so user_readenv=1 envfile=/etc/default/locale
>>>> session [success=ok ignore=ignore module_unknown=ignore default=bad] pam_selinux.so open
>>>>
>>>> @include common-password
>>>>
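>>>> For reference, the pam_slurm_adopt documentation says the module should 
>>>> be the last entry in the account stack; a sketch of how the account 
>>>> section above might be reordered (same modules as above, simply moved; 
>>>> untested here):
>>>>
>>>> account    required     pam_nologin.so
>>>> account    required     pam_access.so
>>>> @include common-account
>>>> account    required     pam_slurm_adopt.so log_level=debug5
>>>>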
>>>> root at slurmctld:~# cat /etc/slurm/slurm.conf
>>>> ClusterName=magi
>>>> ControlMachine=slurmctld
>>>> SlurmUser=slurm
>>>> AuthType=auth/munge
>>>>
>>>> MailProg=/usr/bin/mail
>>>> SlurmdDebug=debug
>>>>
>>>> StateSaveLocation=/var/slurm
>>>> SlurmdSpoolDir=/var/slurm
>>>> SlurmctldPidFile=/var/slurm/slurmctld.pid
>>>> SlurmdPidFile=/var/slurm/slurmd.pid
>>>> SlurmdLogFile=/var/log/slurm/slurmd.log
>>>> SlurmctldLogFile=/var/log/slurm/slurmctld.log
>>>> SlurmctldParameters=enable_configless
>>>>
>>>> AccountingStorageHost=slurmctld
>>>> JobAcctGatherType=jobacct_gather/linux
>>>> AccountingStorageType=accounting_storage/slurmdbd
>>>> AccountingStorageEnforce=associations
>>>> JobRequeue=0
>>>> SlurmdTimeout=600
>>>>
>>>> SelectType=select/cons_tres
>>>> SelectTypeParameters=CR_CPU
>>>>
>>>> TmpFS=/scratch
>>>>
>>>> GresTypes=gpu
>>>> PriorityType="priority/multifactor"
>>>>
>>>> Nodename=magi3 Boards=1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN
>>>> Nodename=magi[107] Boards=1 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=92000 State=UNKNOWN
>>>> Nodename=magi[46-53] Boards=1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=64000 State=UNKNOWN
>>>>
>>>> PartitionName=MISC-56c Nodes=magi107 Priority=3000 MaxTime=INFINITE State=UP
>>>> PartitionName=COMPUTE Nodes=magi[46-53] Priority=3000 MaxTime=INFINITE State=UP Default=YES
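>>>>
>>>> While pasting this I notice that the pam_slurm_adopt documentation also 
>>>> requires PrologFlags=contain in slurm.conf, so that an "extern" step 
>>>> exists for the module to adopt ssh sessions into; a sketch of the line 
>>>> as it might be added (I have not yet verified whether it changes 
>>>> anything here):
>>>>
>>>> # Required by pam_slurm_adopt: create an extern step that ssh sessions can be adopted into
>>>> PrologFlags=contain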
>>>>
>>>> Thank you,
>>>>
>>>
>>
> 

-- 
Nicolas Greneche
USPN
Research support / CISO
https://www-magi.univ-paris13.fr


