[slurm-users] pam_slurm_adopt not working for all users

Thu May 27 15:27:24 UTC 2021

On Thursday, 27 May 2021, at 08:19:14 (+0200),
Loris Bennett wrote:

> Thanks for the detailed explanations.  I was obviously completely
> confused about what MUNGE does.  Would it be possible to say, in very
> hand-waving terms, that MUNGE performs a similar role for the access of
> processes to nodes as SSH does for the access of users to nodes?

If you replace the word "processes" with the word "jobs," you've got
it. :-)

MUNGE is really just intended to be a simple, lightweight solution to
allow for creating a single, global "credential domain" among all the
hosts in an HPC cluster using a single shared secret.  Without going
into too much detail with the crypto stuff, it basically allows a
trusted local entity to cryptographically prove to another that
they're both part of the same trust/cred domain; having established
this, they know they can trust each other to provide and/or validate
credentials between hosts.

But I want to emphasize the "single shared secret" part.  That means
there's a single trust domain.  Think "root of trust" with nothing but
the root of trust.  So you can authenticate each host in a single group
to all the others such that all are equals, but that's it.
There's no additional facility for authenticating different roles or
anything like that.  Either you have the same shared secret or you
don't; nothing else is possible.
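
To make the "single shared secret" point concrete, here is a minimal
Python sketch.  It is not MUNGE's actual credential format or API; the
function names (make_credential, verify_credential) and the HMAC-SHA256
layout are assumptions made purely for illustration.  What it shows is
that any holder of the secret can both mint and validate a credential,
and that nothing in the scheme distinguishes one holder from another.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 8    # random bytes so two credentials never look alike
UID_LEN = 4      # UID stored as a 4-byte big-endian integer
TAG_LEN = 32     # length of an HMAC-SHA256 tag


def make_credential(secret: bytes, uid: int, payload: bytes = b"") -> bytes:
    """Mint a toy credential: nonce | uid | payload, followed by an HMAC tag."""
    body = secrets.token_bytes(NONCE_LEN) + uid.to_bytes(UID_LEN, "big") + payload
    tag = hmac.new(secret, body, hashlib.sha256).digest()
    return body + tag


def verify_credential(secret: bytes, cred: bytes):
    """Return (uid, payload) if the tag verifies under `secret`, else None."""
    if len(cred) < NONCE_LEN + UID_LEN + TAG_LEN:
        return None
    body, tag = cred[:-TAG_LEN], cred[-TAG_LEN:]
    expected = hmac.new(secret, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    uid = int.from_bytes(body[NONCE_LEN:NONCE_LEN + UID_LEN], "big")
    return uid, body[NONCE_LEN + UID_LEN:]


# Illustration only: one key shared by every host in the (hypothetical) cluster.
cluster_key = secrets.token_bytes(32)
cred = make_credential(cluster_key, 1000, b"job 42")
print(verify_credential(cluster_key, cred))              # same secret: accepted
print(verify_credential(secrets.token_bytes(32), cred))  # other secret: rejected
```

In this toy scheme, anything holding the key can mint a credential for
any UID, so the whole model rests on keeping that one key private to
trusted hosts.  There is no slot anywhere for "this host is a login
node" or "this host may only verify."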

> Regarding keys vs. host-based SSH, I see that host-based would be more
> elegant, but would involve more configuration.  What exactly are the
> simplification gains you see? I just have a single cluster and naively I
> would think dropping a script into /etc/profile.d on the login node
> would be less work than re-configuring SSH for the login node and
> multiple compute node images.

I like to think of it as "one and done."  At least in our case at
LANL, and at LBNL previously, all nodes of the same type/group boot
the same VNFS image.  As long as I don't need to cryptographically
differentiate among, say, compute nodes, I only have to set up a
single set of credentials for all the hosts, and I'm done.

It also saves overall support time in my experience.  By taking the
responsibility for inter-machine trust myself at the system level, I
don't have to worry about (1) modifying a user's SSH config without
their knowledge, (2) running the risk of them messing with their
config and breaking it, or (3) any user support/services calls about
"why can't I do any of the things on the stuff?!"  :-)

It is totally a personal/team choice, but I'll be honest:  Once I
"discovered" host-based authentication and all the headaches it saved
our sysadmin and consulting teams, I was kicking myself for having
done it the other way for so long! :-D

> Regarding AuthorizedKeysCommand, I don't think we can use that, because
> users don't necessarily have existing SSH keys.  What abuse scenarios
> were you thinking of in connection with in-homedir key pairs?

Users don't have to have existing keys for it to work; the command you
specify can easily create a key pair, drop the private key, and output
the public key.  Or even simpler, you can specify a value for
"AuthorizedKeysFile" that points to a directory users can't write to,
and store a key pair for each user in that location.  Lots of ways to
do it.
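
As one hedged illustration, here is a small Python sketch of a helper
that sshd could run via AuthorizedKeysCommand.  It is a variation on
the two options above: it prints a user's public key from a directory
users can't write to.  The directory /etc/ssh/user_keys, the
<user>.pub naming, and the function name lookup_keys are all
hypothetical; it assumes sshd_config has something along the lines of
"AuthorizedKeysCommand /path/to/this/script %u" plus a suitable
"AuthorizedKeysCommandUser", and that the script itself is owned by
root and not writable by anyone else.

```python
import re
import sys
from pathlib import Path

# Hypothetical root-owned directory holding one "<user>.pub" file per user.
KEY_DIR = Path("/etc/ssh/user_keys")

# Conservative username pattern; rejects "/", "..", and other path tricks.
_SAFE_NAME = re.compile(r"[a-z_][a-z0-9_-]*")


def lookup_keys(username: str, key_dir: Path = KEY_DIR) -> str:
    """Return the authorized_keys lines for `username`, or "" if there are none."""
    if not _SAFE_NAME.fullmatch(username):
        return ""
    try:
        return (key_dir / f"{username}.pub").read_text()
    except OSError:
        # Missing or unreadable file: print nothing, so sshd sees no keys.
        return ""


# sshd passes the username as the first argument (the %u token).
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.stdout.write(lookup_keys(sys.argv[1]))
```

Because the keys live outside the home directory and are only ever
read by the helper, users can neither edit nor break them, which is
the point of moving them in the first place.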

But if I'm being frank about it, if I had my druthers, we'd be using
certificates for authentication, not files.  The advantages are, in my
very humble opinion, well worth a little extra setup time!

As far as abuse of keys goes:  What's stopping your user from taking
that private key you created for them (which is, as you recall,
*unencrypted*) outside of your cluster to another host somewhere else
on campus, maybe one that has tons of untrusted folks with root?
Then any of those folks can SSH to your cluster as that user.

Credential theft is a *huge* problem in HPC across the world, so I
always recommend that sysadmins think of it as Public Enemy #1!  The
more direct and permanent control you have over user credentials, the
better. :-)

> Would it be correct to say that, if one were daft enough, one could
> build some sort of terminal server on top of MUNGE without using SSH,
> but which could then replicate basic SSH behaviour?

No; that would only provide a method to authenticate servers at best.
You can't authenticate users for the reasons I noted above.  Single
shared key, single trust domain.

> Your explanation is very clear, but it still seems like quite a few
> steps with various gotchas, like the fact that, as I understand it,
> shosts.equiv has to contain all the possible ways a host might be
> addressed (short name, long name, IP).

You are correct, though that's easy to automate with a teensy weensy
shell script.  But yes, there's more up-front configuration.  Again,
though, I truly believe it saves admin time in the long run (not to
mention user support staff time and user pain).  But again, that's a
personal or team choice.
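
For what it's worth, the shosts.equiv generation can be sketched
roughly like this (in Python here rather than shell).  The function
names host_aliases and render_shosts_equiv are made up for the
example, and it assumes every node name resolves through the normal
resolver; treat it as a starting point, not a finished tool.

```python
import socket


def host_aliases(name, resolve_fqdn=socket.getfqdn, resolve_ip=socket.gethostbyname):
    """Return the short name, FQDN, and IPv4 address for one host, without duplicates."""
    fqdn = resolve_fqdn(name)
    short = fqdn.split(".", 1)[0]
    candidates = [short, fqdn]
    try:
        candidates.append(resolve_ip(fqdn))
    except OSError:
        # Name did not resolve to an address; keep just the names.
        pass
    aliases = []
    for entry in candidates:
        if entry and entry not in aliases:
            aliases.append(entry)
    return aliases


def render_shosts_equiv(hosts, **resolvers):
    """Build shosts.equiv content: one line per way each host might be addressed."""
    lines = []
    for host in hosts:
        for alias in host_aliases(host, **resolvers):
            if alias not in lines:
                lines.append(alias)
    return "\n".join(lines) + "\n"
```

Calling render_shosts_equiv(["cn01", "cn02"]) and writing the result
to the shosts.equiv file baked into the node image would cover the
short name, long name, and IP for each host.  The resolver arguments
are only there so the lookup can be swapped out, for example to read
from a static host list instead of DNS.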

I'm not sure if I'm clearing things up or just muddying the waters.
But hopefully at least *some* of that helped! :-D

Michael

-- 
Michael E. Jennings <mej at lanl.gov> - [PGPH: he/him/his/Mr]  --  hpc.lanl.gov
HPC Systems Engineer   --   Platforms Team   --  HPC Systems Group (HPC-SYS)
Strategic Computing Complex, Bldg. 03-2327, Rm. 2341    W: +1 (505) 606-0605
Los Alamos National Laboratory,  P.O. Box 1663,  Los Alamos, NM   87545-0001