[slurm-users] pam_slurm_adopt not working for all users

Brian Andrus toomuchit at gmail.com
Tue May 25 16:23:55 UTC 2021


Your mistake is assuming munge is involved: munge has nothing to do with 
sshd, which is the daemon you are connecting to. sshd can use PAM (hence 
the ability to use pam_slurm_adopt), but munge has no PAM integration 
that I am aware of.
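For reference, pam_slurm_adopt hooks into sshd's PAM *account* stack. Per
the Slurm docs it goes at the end of the account section of
/etc/pam.d/sshd, roughly like this (exact stack ordering varies by
distribution, so treat this as a sketch):

    account    sufficient    pam_slurm_adopt.so

Note that the account phase only controls authorization (may this user
have a session on this node?); authentication itself is still handled by
sshd in the usual way, i.e. via keys or passwords.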

As for your /etc/skel bits: those are copied when a user's home 
directory is first created, e.g. at initial login (if so configured). 
So, depending on how and where the home directory was created, such 
items should appear automatically.
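If home directories are created at first login rather than by your user
management tooling, that is typically done with pam_mkhomedir in the
session stack, e.g. (an illustration of the mechanism, not necessarily
your site's config):

    session    required    pam_mkhomedir.so    skel=/etc/skel    umask=0077

pam_mkhomedir copies the contents of /etc/skel into the newly created
home directory.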
SSH keys, however, are not created automatically. As others have 
mentioned, you can put a script in /etc/profile.d/ to run such initial 
setup. We have HPC_Setup.sh in there, which creates SSH keys, sets up 
the user's .forward file and handles other setup tasks; a sketch of 
this approach follows below.
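A minimal sketch of such a script, assuming the goal is a
passphrase-less key for intra-cluster ssh (the key type, the
authorized_keys handling and the .forward target are all assumptions,
since the real HPC_Setup.sh is not shown):

    # /etc/profile.d/HPC_Setup.sh -- sketch only
    if [ ! -f "$HOME/.ssh/id_rsa" ]; then
        mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
        # Empty passphrase so intra-cluster ssh prompts for nothing (assumption)
        ssh-keygen -q -t rsa -N "" -f "$HOME/.ssh/id_rsa"
        # Authorize the key against the shared home directory (assumption)
        cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
        chmod 600 "$HOME/.ssh/authorized_keys"
    fi
    if [ ! -f "$HOME/.forward" ]; then
        # Hypothetical mail domain; replace with your site's
        echo "$USER@example.org" > "$HOME/.forward"
    fi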

Brian Andrus

On 5/25/2021 5:09 AM, Loris Bennett wrote:
> Hi everyone,
>
> Thanks for all the replies.
>
> I think my main problem is that I expect logging in to a node with a job
> to work with pam_slurm_adopt but without any SSH keys.  My assumption
> was that MUNGE takes care of the authentication, since users' jobs start
> on nodes without the need for keys.
>
> Can someone confirm that this expectation is wrong and, if possible, why
> the analogy with jobs is incorrect?
>
> I have a vague memory that this used to work on our old cluster with an
> older version of Slurm, but I could be thinking of a time before we set
> up pam_slurm_adopt.
>
> Cheers,
>
> Loris
>
> Brian Andrus <toomuchit at gmail.com> writes:
>
>> Oh, you could also use the ssh-agent to manage the keys, then use 'ssh-add
>> ~/.ssh/id_rsa' to type the passphrase once for your whole session (from that
>> system).
>>
>> Brian Andrus
>>
>>
>> On 5/21/2021 5:53 AM, Loris Bennett wrote:
>>> Hi,
>>>
>>> We have set up pam_slurm_adopt using the official Slurm documentation
>>> and Ole's information on the subject.  It works for a user who has SSH
>>> keys set up, though the passphrase is still needed:
>>>
>>>     $ salloc --partition=gpu --gres=gpu:1 --qos=hiprio --ntasks=1 --time=00:30:00 --mem=100
>>>     salloc: Granted job allocation 7202461
>>>     salloc: Waiting for resource configuration
>>>     salloc: Nodes g003 are ready for job
>>>
>>>     $ ssh g003
>>>     Warning: Permanently added 'g003' (ECDSA) to the list of known hosts.
>>>     Enter passphrase for key '/home/loris/.ssh/id_rsa':
>>>     Last login: Wed May  5 08:50:00 2021 from login.curta.zedat.fu-berlin.de
>>>
>>>     $ ssh g004
>>>     Warning: Permanently added 'g004' (ECDSA) to the list of known hosts.
>>>     Enter passphrase for key '/home/loris/.ssh/id_rsa':
>>>     Access denied: user loris (uid=182317) has no active jobs on this node.
>>>     Access denied by pam_slurm_adopt: you have no active jobs on this node
>>>     Authentication failed.
>>>
>>> If SSH keys are not set up, then the user is asked for a password:
>>>
>>>     $ squeue --me
>>>                  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>>>                7201647      main test_job nokeylee  R    3:45:24      1 c005
>>>                7201646      main test_job nokeylee  R    3:46:09      1 c005
>>>     $ ssh c005
>>>     Warning: Permanently added 'c005' (ECDSA) to the list of known hosts.
>>>     nokeylee at c005's password:
>>>
>>> My assumption was that a user should be able to log into a node on which
>>> that person has a running job without any further ado, i.e. without the
>>> necessity to set up anything else or to enter any credentials.
>>>
>>> Is this assumption correct?
>>>
>>> If so, how can I best debug what I have done wrong?
>>>
>>> Cheers,
>>>
>>> Loris
>>>


