[slurm-users] Issues with pam_slurm_adopt

Brian Andrus toomuchit at gmail.com
Fri Apr 8 20:46:49 UTC 2022


OK. Next, I would check that the user's uid is the same on the compute 
node as on the head node.

It looks like it is identifying the job, but doesn't see it as yours.
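
A quick way to compare, using the account from your squeue output:

  # run on both the head node (magi3) and the compute node (magi46),
  # then compare the uid/gid fields
  id nicolas.greneche
  getent passwd nicolas.greneche

pam_slurm_adopt looks up jobs on the node by the uid of the incoming 
connection, so a uid mismatch would produce exactly this "you have no 
active jobs on this node" denial even though the job steps are found.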

Brian Andrus


On 4/8/2022 1:40 PM, Nicolas Greneche wrote:
> Hi Brian,
>
> Thanks. SELinux is in neither strict nor targeted mode; I'm running 
> SLURM on Debian Bullseye with SELinux and AppArmor disabled.
>
> Thank you for your suggestion,
>
> On 08/04/2022 at 21:43, Brian Andrus wrote:
>> Check selinux.
>>
>> Run "getenforce" on the node; if it reports "Enforcing", try running 
>> "setenforce 0" to switch it to permissive mode.
>>
>> Slurm doesn't play well when SELinux is enabled.
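>>
>> If that helps and you want the change to survive a reboot, the usual 
>> place is /etc/selinux/config (assuming your install has one):
>>
>>     # /etc/selinux/config
>>     SELINUX=permissive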
>>
>> Brian Andrus
>>
>>
>> On 4/8/2022 10:53 AM, Nicolas Greneche wrote:
>>> Hi,
>>>
>>> I have an issue with pam_slurm_adopt since I moved from 21.08.5 to 
>>> 21.08.6: it no longer works.
>>>
>>> When I log in directly to the node with the root account:
>>>
>>> Apr  8 19:06:49 magi46 pam_slurm_adopt[20400]: Ignoring root user
>>> Apr  8 19:06:49 magi46 sshd[20400]: Accepted publickey for root from 172.16.0.3 port 50884 ssh2: ...
>>> Apr  8 19:06:49 magi46 sshd[20400]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
>>>
>>> Everything is OK.
>>>
>>> I submit a very simple job, an infinite loop to keep the first 
>>> compute node busy:
>>>
>>> nicolas.greneche@magi3:~/test-bullseye/infinite$ cat infinite.slurm
>>> #!/bin/bash
>>> #SBATCH --job-name=infinite
>>> #SBATCH --output=%x.%j.out
>>> #SBATCH --error=%x.%j.err
>>> #SBATCH --nodes=1
>>> srun infinite.sh
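>>>
>>> infinite.sh itself is nothing special, just a busy loop along these 
>>> lines:
>>>
>>> #!/bin/bash
>>> # spin forever so the job keeps the node allocated
>>> while true; do sleep 60; done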
>>>
>>> nicolas.greneche@magi3:~/test-bullseye/infinite$ sbatch infinite.slurm
>>> Submitted batch job 203
>>>
>>> nicolas.greneche@magi3:~/test-bullseye/infinite$ squeue
>>>  JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
>>>    203   COMPUTE infinite nicolas.  R  0:03     1 magi46
>>>
>>> I have a job running on the node. When I try to log in to the node 
>>> with the same regular account:
>>>
>>> nicolas.greneche@magi3:~/test-bullseye/infinite$ ssh magi46
>>> Access denied by pam_slurm_adopt: you have no active jobs on this node
>>> Connection closed by 172.16.0.46 port 22
>>>
>>> In the auth.log, we can see that the job (JOBID 203) is found, but 
>>> PAM decides that I have no running job on the node:
>>>
>>> Apr  8 19:11:32 magi46 sshd[20542]: pam_access(sshd:account): access denied for user `nicolas.greneche' from `172.16.0.3'
>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug2: _establish_config_source: using config_file=/run/slurm/conf/slurm.conf (cached)
>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug: slurm_conf_init: using config_file=/run/slurm/conf/slurm.conf
>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug:  Reading slurm.conf file: /run/slurm/conf/slurm.conf
>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug:  Reading cgroup.conf file /run/slurm/conf/cgroup.conf
>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug4: found StepId=203.batch
>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug4: found StepId=203.0
>>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: send_user_msg: Access denied by pam_slurm_adopt: you have no active jobs on this node
>>> Apr  8 19:11:32 magi46 sshd[20542]: fatal: Access denied for user nicolas.greneche by PAM account configuration [preauth]
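>>>
>>> For reference, pam_slurm_adopt matches the uid of the incoming ssh 
>>> connection against the owner of the job steps it finds, so the uid 
>>> the job runs under can be double-checked with:
>>>
>>> scontrol show job 203 | grep UserId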
>>>
>>> I may have missed something; if you have any tips, I'll be delighted.
>>>
>>> As appendices, here are the sshd PAM configuration on the compute 
>>> nodes and the slurm.conf:
>>>
>>> root@magi46:~# cat /etc/pam.d/sshd
>>> @include common-auth
>>> account    required     pam_nologin.so
>>> account  required     pam_access.so
>>> account  required     pam_slurm_adopt.so log_level=debug5
>>>
>>> @include common-account
>>> session [success=ok ignore=ignore module_unknown=ignore default=bad] pam_selinux.so close
>>> session    required     pam_loginuid.so
>>> session    optional     pam_keyinit.so force revoke
>>>
>>> @include common-session
>>> session    optional     pam_motd.so  motd=/run/motd.dynamic
>>> session    optional     pam_motd.so noupdate
>>> session    optional     pam_mail.so standard noenv
>>> session    required     pam_limits.so
>>> session    required     pam_env.so
>>> session    required     pam_env.so user_readenv=1 envfile=/etc/default/locale
>>> session [success=ok ignore=ignore module_unknown=ignore default=bad] pam_selinux.so open
>>>
>>> @include common-password
>>>
>>> root@slurmctld:~# cat /etc/slurm/slurm.conf
>>> ClusterName=magi
>>> ControlMachine=slurmctld
>>> SlurmUser=slurm
>>> AuthType=auth/munge
>>>
>>> MailProg=/usr/bin/mail
>>> SlurmdDebug=debug
>>>
>>> StateSaveLocation=/var/slurm
>>> SlurmdSpoolDir=/var/slurm
>>> SlurmctldPidFile=/var/slurm/slurmctld.pid
>>> SlurmdPidFile=/var/slurm/slurmd.pid
>>> SlurmdLogFile=/var/log/slurm/slurmd.log
>>> SlurmctldLogFile=/var/log/slurm/slurmctld.log
>>> SlurmctldParameters=enable_configless
>>>
>>> AccountingStorageHost=slurmctld
>>> JobAcctGatherType=jobacct_gather/linux
>>> AccountingStorageType=accounting_storage/slurmdbd
>>> AccountingStorageEnforce=associations
>>> JobRequeue=0
>>> SlurmdTimeout=600
>>>
>>> SelectType=select/cons_tres
>>> SelectTypeParameters=CR_CPU
>>>
>>> TmpFS=/scratch
>>>
>>> GresTypes=gpu
>>> PriorityType="priority/multifactor"
>>>
>>> Nodename=magi3 Boards=1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN
>>> Nodename=magi[107] Boards=1 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=92000 State=UNKNOWN
>>> Nodename=magi[46-53] Boards=1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=64000 State=UNKNOWN
>>>
>>> PartitionName=MISC-56c Nodes=magi107 Priority=3000 MaxTime=INFINITE State=UP
>>> PartitionName=COMPUTE Nodes=magi[46-53] Priority=3000 MaxTime=INFINITE State=UP Default=YES
>>>
>>> Thank you,
>>>
>>
>


