[slurm-users] Issues with pam_slurm_adopt

Nicolas Greneche nicolas.greneche at univ-paris13.fr
Fri Apr 22 18:12:37 UTC 2022


Hi Juergen,

I found what went wrong. I forgot to specify:

PrologFlags=contain

before:

ProctrackType=proctrack/cgroup

My bad; it is specified in the documentation here:

https://slurm.schedmd.com/pam_slurm_adopt.html#SLURM_CONFIG
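
For reference, the relevant excerpt of my slurm.conf now reads:

PrologFlags=contain
ProctrackType=proctrack/cgroup

With PrologFlags=contain set, slurmd creates the "extern" step when the job
starts, which is the step pam_slurm_adopt adopts incoming ssh sessions into.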

Many thanks for all your kind responses.

On 08/04/2022 at 23:55, Juergen Salk wrote:
> Hi Nicolas,
> 
> it looks like you have pam_access.so placed in your PAM stack *before*
> pam_slurm_adopt.so, so this may get in your way. In fact, the logs
> indicate that it's pam_access and not pam_slurm_adopt that denies access
> in the first place:
> 
> Apr  8 19:11:32 magi46 sshd[20542]: pam_access(sshd:account): access denied for user `nicolas.greneche' from `172.16.0.3'
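> 
> (pam_access consults /etc/security/access.conf, so presumably that file
> contains a catch-all deny rule that matches regular users. An allow entry
> along the lines of "+ : nicolas.greneche : ALL" placed before it would
> satisfy pam_access, but reordering the modules as shown below makes that
> unnecessary for users with jobs on the node.)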
> 
> Maybe the following web page is useful for setting up your
> PAM stack with pam_slurm_adopt:
> 
>   https://slurm.schedmd.com/pam_slurm_adopt.html
> 
> --- snip ---
> 
> If you always want to allow access for an administrative group (e.g.,
> wheel), stack the pam_access module after pam_slurm_adopt. A success
> with pam_slurm_adopt is sufficient to allow access, but the pam_access
> module can allow others, such as administrative staff, access even
> without jobs on that node:
> 
> account    sufficient    pam_slurm_adopt.so
> account    required      pam_access.so
> 
> --- snip ---
> 
> We did it that way and this works fine for us. There is just one
> drawback, though, namely that administrative users who are allowed
> to access compute nodes without having jobs on them always get
> an annoying message from pam_slurm_adopt when doing so, even though
> the login succeeds:
> 
> Access denied by pam_slurm_adopt: you have no active jobs on this node
> 
> We've gotten used to it, but now that I see it on the web page, maybe
> I'll take a look at the alternative approach with pam_listfile.so.
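> 
> If I remember the page correctly, that alternative stacks pam_listfile.so
> in front, so that administrative users listed in a file get in before
> pam_slurm_adopt is consulted at all; roughly:
> 
> account    sufficient    pam_listfile.so item=user sense=allow onerr=fail file=/etc/ssh/allowed_users
> account    required      pam_slurm_adopt.so
> 
> (the file name is just an example; any root-owned list of admin user
> names would do)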
> 
> Best regards
> Jürgen
> 
> 
> * Nicolas Greneche <nicolas.greneche at univ-paris13.fr> [220408 19:53]:
>> Hi,
>>
>> I have an issue with pam_slurm_adopt since moving from 21.08.5 to 21.08.6:
>> it no longer works.
>>
>> When I log in directly to the node with the root account:
>>
>> Apr  8 19:06:49 magi46 pam_slurm_adopt[20400]: Ignoring root user
>> Apr  8 19:06:49 magi46 sshd[20400]: Accepted publickey for root from
>> 172.16.0.3 port 50884 ssh2: ...
>> Apr  8 19:06:49 magi46 sshd[20400]: pam_unix(sshd:session): session opened
>> for user root(uid=0) by (uid=0)
>>
>> Everything is OK.
>>
>> I submit a very simple job, an infinite loop, to keep the first compute
>> node busy:
>>
>> nicolas.greneche at magi3:~/test-bullseye/infinite$ cat infinite.slurm
>> #!/bin/bash
>> #SBATCH --job-name=infinite
>> #SBATCH --output=%x.%j.out
>> #SBATCH --error=%x.%j.err
>> #SBATCH --nodes=1
>> srun infinite.sh
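>>
>> (infinite.sh is not shown above; it is just an endless loop, something
>> like "while true; do sleep 60; done".)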
>>
>> nicolas.greneche at magi3:~/test-bullseye/infinite$ sbatch infinite.slurm
>> Submitted batch job 203
>>
>> nicolas.greneche at magi3:~/test-bullseye/infinite$ squeue
>>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>>                203   COMPUTE infinite nicolas.  R       0:03      1 magi46
>>
>> I have a job running on the node. When I try to log in to the node with
>> the same regular account:
>>
>> nicolas.greneche at magi3:~/test-bullseye/infinite$ ssh magi46
>> Access denied by pam_slurm_adopt: you have no active jobs on this node
>> Connection closed by 172.16.0.46 port 22
>>
>> In auth.log, we can see that the job (JOBID 203) is found, but PAM
>> decides that I have no running job on the node:
>>
>> Apr  8 19:11:32 magi46 sshd[20542]: pam_access(sshd:account): access denied
>> for user `nicolas.greneche' from `172.16.0.3'
>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug2:
>> _establish_config_source: using config_file=/run/slurm/conf/slurm.conf
>> (cached)
>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug:  slurm_conf_init:
>> using config_file=/run/slurm/conf/slurm.conf
>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug:  Reading slurm.conf
>> file: /run/slurm/conf/slurm.conf
>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug:  Reading cgroup.conf
>> file /run/slurm/conf/cgroup.conf
>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug4: found
>> StepId=203.batch
>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: debug4: found StepId=203.0
>> Apr  8 19:11:32 magi46 pam_slurm_adopt[20542]: send_user_msg: Access denied
>> by pam_slurm_adopt: you have no active jobs on this node
>> Apr  8 19:11:32 magi46 sshd[20542]: fatal: Access denied for user
>> nicolas.greneche by PAM account configuration [preauth]
>>
>> I may have missed something; if you have any tips, I'd be delighted.
>>
>> Appended below are the sshd PAM configuration on the compute nodes and
>> the slurm.conf:
>>
>> root at magi46:~# cat /etc/pam.d/sshd
>> @include common-auth
>> account    required     pam_nologin.so
>> account  required     pam_access.so
>> account  required     pam_slurm_adopt.so log_level=debug5
>>
>> @include common-account
>> session [success=ok ignore=ignore module_unknown=ignore default=bad] pam_selinux.so close
>> session    required     pam_loginuid.so
>> session    optional     pam_keyinit.so force revoke
>>
>> @include common-session
>> session    optional     pam_motd.so  motd=/run/motd.dynamic
>> session    optional     pam_motd.so noupdate
>> session    optional     pam_mail.so standard noenv
>> session    required     pam_limits.so
>> session    required     pam_env.so
>> session    required     pam_env.so user_readenv=1 envfile=/etc/default/locale
>> session [success=ok ignore=ignore module_unknown=ignore default=bad] pam_selinux.so open
>>
>> @include common-password
>>
>> root at slurmctld:~# cat /etc/slurm/slurm.conf
>> ClusterName=magi
>> ControlMachine=slurmctld
>> SlurmUser=slurm
>> AuthType=auth/munge
>>
>> MailProg=/usr/bin/mail
>> SlurmdDebug=debug
>>
>> StateSaveLocation=/var/slurm
>> SlurmdSpoolDir=/var/slurm
>> SlurmctldPidFile=/var/slurm/slurmctld.pid
>> SlurmdPidFile=/var/slurm/slurmd.pid
>> SlurmdLogFile=/var/log/slurm/slurmd.log
>> SlurmctldLogFile=/var/log/slurm/slurmctld.log
>> SlurmctldParameters=enable_configless
>>
>> AccountingStorageHost=slurmctld
>> JobAcctGatherType=jobacct_gather/linux
>> AccountingStorageType=accounting_storage/slurmdbd
>> AccountingStorageEnforce=associations
>> JobRequeue=0
>> SlurmdTimeout=600
>>
>> SelectType=select/cons_tres
>> SelectTypeParameters=CR_CPU
>>
>> TmpFS=/scratch
>>
>> GresTypes=gpu
>> PriorityType="priority/multifactor"
>>
>> Nodename=magi3 Boards=1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN
>> Nodename=magi[107] Boards=1 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=92000 State=UNKNOWN
>> Nodename=magi[46-53] Boards=1 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=64000 State=UNKNOWN
>>
>> PartitionName=MISC-56c Nodes=magi107 Priority=3000 MaxTime=INFINITE State=UP
>> PartitionName=COMPUTE Nodes=magi[46-53] Priority=3000 MaxTime=INFINITE State=UP Default=YES
>>
>> Thank you,
>>
>> -- 
>> Nicolas Greneche
>> USPN
>> Support à la recherche / RSSI
>> https://www-magi.univ-paris13.fr
>>
> 

-- 
Nicolas Greneche
USPN
Support à la recherche / RSSI
https://www-magi.univ-paris13.fr


