[slurm-users] Srun not setting DISPLAY with --x11 for one account

Simon Andrews simon.andrews at babraham.ac.uk
Mon Jan 27 14:16:27 UTC 2020


Thanks for the reply.  I’m using 18.08.

I’ve done some more digging and found that it’s not just one account but a load of newly created accounts which suffer from this problem.  Older accounts are OK, but new ones don’t work.

In the simplest case a completely new account fails.  It contains the expected .bashrc and .bash_profile and .bash_logout which are copied from /etc/skel.  I really can’t see what’s in any way different between those and the older accounts which still work.

Is there any way to turn up the debugging on srun to try to see what it’s attempting (and what might be failing)?
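
One possible starting point, offered as a sketch rather than a confirmed fix: srun has a -v (--verbose) option that can be repeated for more detail. The wrapper name srun_x11_verbose is hypothetical, the partition and options are simply those from the original report, and how much X11-related detail is printed will depend on the Slurm version.

```shell
# Hypothetical wrapper: rerun the failing command with srun's verbosity raised.
# -v may be given several times; each extra v makes srun's messages more detailed.
srun_x11_verbose() {
    srun -vvv --pty -p interactive --x11 --preserve-env /bin/bash
}
```

Running this once from a working account and once from a failing one, and comparing what srun prints before the shell starts, may show where the two diverge. On the compute node side, the slurmd log level is set by SlurmdDebug in slurm.conf.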

Simon.

From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of William Brown
Sent: 24 January 2020 17:21
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Srun not setting DISPLAY with --x11 for one account

There are differences for X11 between Slurm versions so it may help to know which version you have.

I tried some of your commands on our Slurm 19.05.3-2 cluster and, interestingly, in the session on the compute node I don't see the cookie for the login node. This was with MobaXterm:

[user@prdubrvm005 ~]$ xauth list
prdubrvm005.research.rcsi.com/unix:10  MIT-MAGIC-COOKIE-1  2efc5dd851736e3848193f65d038eca8
[user@prdubrvm005 ~]$ srun --pty  --x11  --preserve-env /bin/bash
[user@prdubrhpc1-02 ~]$ xauth list
prdubrhpc1-02.research.rcsi.com/unix:95  MIT-MAGIC-COOKIE-1  2efc5dd851736e3848193f65d038eca8
[user@prdubrhpc1-02 ~]$ echo $DISPLAY
localhost:95.0

Any per-user problem would make me suspect the user having a different shell, or something in their login script.  Can you make their .bashrc and .bash_profile just exit?  Or look for hidden configuration files for <something> in their home directory?
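
A rough sketch of how that comparison might be done, assuming you have root (or otherwise sufficient) access to read both home directories. The function names login_shell_of and hidden_entries are hypothetical helpers, not part of Slurm:

```shell
# Hypothetical helper: print the login shell recorded for an account.
login_shell_of() {
    getent passwd "$1" | cut -d: -f7
}

# Hypothetical helper: list the hidden files and directories directly under
# a home directory, sorted, so two accounts can be compared with diff.
hidden_entries() {
    ( cd "$1" && ls -A | grep '^\.' | LC_ALL=C sort )
}

# Example use (account names "good" and "bad" are placeholders):
#   login_shell_of good; login_shell_of bad
#   hidden_entries ~good > /tmp/good.txt
#   hidden_entries ~bad  > /tmp/bad.txt
#   diff /tmp/good.txt /tmp/bad.txt
```

Anything present in the working account but missing from the failing one (or the other way round) would be a candidate for a closer look.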

William



On Fri, 24 Jan 2020 at 16:05, Simon Andrews <simon.andrews at babraham.ac.uk> wrote:
I have a weird problem which I can’t get to the bottom of.

We have a cluster where users can start interactive sessions on compute nodes which forward the X11 display from their session on the head node.  This generally works fine, but for one user's account it doesn’t.  The X11 connection to the head node is fine, but it isn’t forwarded on to the compute node.

The symptoms are shown below:

A good user gets this:

[good@headnode ~]$ xauth list
headnode.babraham.ac.uk/unix:12  MIT-MAGIC-COOKIE-1  f04a2bf9a921a3357e44373655add14a

[good@headnode ~]$ echo $DISPLAY
localhost:12.0

[good@headnode ~]$ srun --pty -p interactive --x11  --preserve-env /bin/bash

[good@compute ~]$ xauth list
headnode.babraham.ac.uk/unix:12  MIT-MAGIC-COOKIE-1  f04a2bf9a921a3357e44373655add14a
compute/unix:25  MIT-MAGIC-COOKIE-1  f04a2bf9a921a3357e44373655add14a

[good@compute ~]$ echo $DISPLAY
localhost:25.0

So the cookie is copied from the head node and forwarded and the DISPLAY variable is updated.

The bad user gets this:

[bad@headnode ~]$ xauth list
headnode.babraham.ac.uk/unix:10  MIT-MAGIC-COOKIE-1  c39a493a37132d308b37469d363d8692

[bad@headnode ~]$ echo $DISPLAY
localhost:10.0

[bad@headnode ~]$ srun --pty -p interactive --x11  --preserve-env /bin/bash

[bad@compute ~]$ xauth list
headnode.babraham.ac.uk/unix:10  MIT-MAGIC-COOKIE-1  c39a493a37132d308b37469d363d8692

[bad@compute ~]$ echo $DISPLAY
localhost:10.0

So the cookie isn’t copied and the DISPLAY isn’t updated.  I can’t see any errors in the logs and I can’t see anything different about this account.
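
The difference between the two cases can be reduced to a simple check. This is only a heuristic sketch: it assumes that a successful forward always gives the job a DISPLAY different from the head node's, which holds for the values shown here but is not guaranteed in general. The function name display_was_forwarded is hypothetical.

```shell
# Hypothetical helper: succeed only if the DISPLAY seen inside the job ($2)
# is non-empty and differs from the DISPLAY on the head node ($1).
# Heuristic only: a forwarded display could in principle reuse the same number.
display_was_forwarded() {
    [ -n "$2" ] && [ "$2" != "$1" ]
}
```

With the values above, display_was_forwarded localhost:12.0 localhost:25.0 succeeds (the working account), while display_was_forwarded localhost:10.0 localhost:10.0 fails (the broken account).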

If I do a straightforward ssh -Y from the head node to a compute node from the bad account then that works fine; it’s only the srun-specific X forwarding which fails.

Any ideas or suggestions for debugging would be appreciated as I’m running out of things to try!

Simon.
The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT Registered Charity No. 1053902.
The information transmitted in this email is directed only to the addressee. If you received this in error, please contact the sender and delete this email from your system. The contents of this e-mail are the views of the sender and do not necessarily represent the views of the Babraham Institute. Full conditions at: http://www.babraham.ac.uk/terms