[slurm-users] x11 forwarding not available?

Michael Jennings mej at lanl.gov
Tue Oct 16 13:41:15 MDT 2018


On Tuesday, 16 October 2018, at 09:30:13 (-0400),
Dave Botsch wrote:

> Hrm... it looks like the default install of OHPC went with DSA keys
> instead:
> 
> .ssh]$ cat config 
> # Added by Warewulf  2018-10-08
> Host *
>    IdentityFile ~/.ssh/cluster
>    StrictHostKeyChecking=no
> $ file cluster
> cluster: PEM DSA private key

That's not OHPC.  That's a (rather unfortunate) part of Warewulf
called `cluster-env`, a tool used to seamlessly make passphrase-less
SSH work within a cluster without admin/user intervention.  You can
see the code here:
  https://github.com/warewulf/warewulf3/blob/master/cluster/bin/cluster-env

If you install the warewulf-cluster RPM, a script installed as
/etc/profile.d/cluster-env.sh will run /usr/bin/cluster-env on each
login (for sh/ksh/bash users...and an equivalent script is installed
for csh/tcsh users).  See
e.g. https://github.com/warewulf/warewulf3/blob/master/cluster/etc/cluster-env.sh
for the stub script.

The above version on GitHub has been updated to use RSA keys instead
of DSA, but the *actually* correct solution, rather than forcibly
altering each user's SSH configuration and ~/.ssh/ contents, is to
enable host-based authentication for SSH in /etc/ssh/sshd_config (or
GSSAPI authentication, or host-based certificates, or any of the other
options available to have machines authenticate themselves so that
users can move between cluster hosts seamlessly and securely).
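
For reference, a minimal sketch of what host-based authentication
involves, assuming stock OpenSSH.  The node names are placeholders, and
the details should be checked against sshd_config(5) and ssh_config(5)
for the version in use:

```
# /etc/ssh/sshd_config on every node that accepts logins
HostbasedAuthentication yes

# /etc/ssh/ssh_config on every node that initiates logins
# (EnableSSHKeysign goes before any Host block)
EnableSSHKeysign yes
Host *
    HostbasedAuthentication yes

# /etc/ssh/shosts.equiv on the accepting side, one trusted client per line
# (placeholder names)
node001.cluster.example
node002.cluster.example
```

The accepting side also needs each client's public host key in
/etc/ssh/ssh_known_hosts, and sshd has to be restarted or reloaded
after the change.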

When that utility was written, DSA was the "state-of-the-art," and it
unfortunately went untouched for a very long time.  The key type
should not have been hard-coded with no way to permit site-specific
configuration, but it was.  As I said, though, there are better ways
to accomplish user auth between nodes without passphrases, and I
recommend disabling `cluster-env` and using one of those alternatives
instead.  (In fact, it's probably best to remove the entire
warewulf-cluster RPM.  wwinit and wwfirstboot are similarly ancient
tools in need of updating/replacement.)

As for X11 forwarding/authentication, there is no easy/simple answer
to why it won't work.  Lots of things need to be in sync for it to
work, including xauth, xhost, $DISPLAY, firewall rules, etc., and
there are numerous opportunities for minor misconfigurations to break
the whole kit and caboodle.  To troubleshoot, I recommend comparing
the value of $DISPLAY and the output of `xauth list` and `xhost`
under both working and non-working conditions and looking for a
pattern.  Also make sure `ssh -Y` (trusted forwarding) is used at
every hop, not just `ssh -X`.
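
The comparison described above could be captured with something like
the following.  This is only a sketch: `x11_snapshot` is a made-up
name, and it assumes a POSIX shell with `xauth` and `xhost` on the
PATH (it reports when either one is missing).

```shell
# Print the X11-related state of the current session so that output from
# a working login and a broken one can be compared with diff(1).
x11_snapshot() {
    echo "host=$(uname -n)"
    echo "DISPLAY=${DISPLAY:-<unset>}"
    echo "XAUTHORITY=${XAUTHORITY:-<unset>}"
    for cmd in "xauth list" "xhost"; do
        echo "--- $cmd"
        # ${cmd%% *} is the program name without its arguments.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            # Keep going on failure; the error text itself is useful.
            $cmd 2>&1 || echo "(exit status $?)"
        else
            echo "(${cmd%% *} not installed)"
        fi
    done
}
```

Run `x11_snapshot > good.txt` in a session where forwarding works and
`x11_snapshot > bad.txt` in one where it does not, then
`diff good.txt bad.txt`.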

Our solution at LANL uses a 130-line Perl script that does proper
NFS-based locking of the user's ~/.Xauthority file, forcibly resets
their $DISPLAY to the correct value, and adds the correct entry to
~/.Xauthority using `xauth add`.  In our experience, that is the
only way to correctly handle all cases.  (And no, unfortunately I
can't share it, but it's not a difficult thing to write.)
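
To give a rough idea of the lock-then-`xauth add` step only, here is a
shell sketch.  It is not the LANL script.  The function name, the
lock directory name, and the `XAUTH` override are all made up for
illustration, and the mkdir-based lock is a stand-in whose behavior
on a given NFS setup would need to be verified.  Working out the
correct $DISPLAY and obtaining the cookie are site-specific and left
out.

```shell
# Hypothetical helper: add one cookie to the Xauthority file while
# holding a lock.  Usage: xauth_add_locked DISPLAY HEXKEY
xauth_add_locked() {
    display=$1
    cookie=$2
    authfile=${XAUTHORITY:-$HOME/.Xauthority}
    lockdir="$authfile.lockdir"   # assumed name, not xauth's own lock files
    tries=0
    # mkdir either creates the directory or fails, so one caller wins.
    until mkdir "$lockdir" 2>/dev/null; do
        tries=$((tries + 1))
        if [ "$tries" -ge 30 ]; then
            echo "could not lock $authfile" >&2
            return 1
        fi
        sleep 1
    done
    # XAUTH can point at a different binary; it defaults to xauth.
    "${XAUTH:-xauth}" -f "$authfile" add "$display" MIT-MAGIC-COOKIE-1 "$cookie"
    status=$?
    rmdir "$lockdir"
    return "$status"
}
```

A real version would also want to clean up a stale lock left behind
by a killed process, which this sketch does not attempt.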

Michael

-- 
Michael E. Jennings <mej at lanl.gov>
HPC Systems Team, Los Alamos National Laboratory
Bldg. 03-2327, Rm. 2341     W: +1 (505) 606-0605


