[slurm-users] X11 forwarding issues

Russell Jones arjones85 at gmail.com
Tue Nov 17 15:41:51 UTC 2020


Thank you!

I do have X11UseLocalhost set to no and X11Forwarding set to yes:

[root@cluster-cn02 ssh]# sshd -T | grep -i X11
x11displayoffset 10
x11maxdisplays 1000
x11forwarding yes
x11uselocalhost no


No firewalls on this network between the login node and compute node.
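Given the `_x11_socket_read: slurm_open_msg_conn: Connection timed out` error further down, it may still be worth probing the X11 path by hand. A minimal sketch, assuming the hostnames and display numbers from the logs in this thread (a TCP X display `host:N` listens on port 6000+N; the `nc` probe is only meaningful on the actual cluster):

```shell
#!/bin/sh
# Derive the TCP port an X display listens on (6000 + display number).
# Handles DISPLAY values like "cluster-cn02.domain:66.0" or "localhost:28.0".
x11_port() {
    disp=${1#*:}        # strip "host:" prefix  -> "66.0"
    disp=${disp%%.*}    # strip ".screen" part  -> "66"
    echo $((6000 + disp))
}

x11_port "cluster-cn02.domain:66.0"   # prints 6066
x11_port "localhost:28.0"             # prints 6028

# From the other node, check that the port is actually reachable
# (uncomment on a real cluster; host and DISPLAY here are assumptions):
# nc -zv -w 5 cluster-cn02 "$(x11_port "cluster-cn02.domain:66.0")"
```

If the probe times out the same way the extern step does, the problem is network reachability (or a port range restriction) rather than the Slurm X11 plugin itself.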

On Tue, Nov 17, 2020 at 1:21 AM Patrick Bégou <
Patrick.Begou at legi.grenoble-inp.fr> wrote:

> Hi Russell Jones,
>
> did you try stopping the firewall on the client node cluster-cn02?
>
> Patrick
>
> On 16/11/2020 at 19:20, Russell Jones wrote:
>
> Here are some debug logs from the compute node after launching an
> interactive shell with the x11 flag. I see it report that X11 forwarding
> was established, and then it ends with a connection timeout.
>
>
> [2020-11-16T12:12:34.097] debug:  Checking credential with 492 bytes of
> sig data
> [2020-11-16T12:12:34.098] _run_prolog: run job script took usec=1284
> [2020-11-16T12:12:34.098] _run_prolog: prolog with lock for job 30873 ran
> for 0 seconds
> [2020-11-16T12:12:34.111] debug:  AcctGatherEnergy NONE plugin loaded
> [2020-11-16T12:12:34.112] debug:  AcctGatherProfile NONE plugin loaded
> [2020-11-16T12:12:34.113] debug:  AcctGatherInterconnect NONE plugin loaded
> [2020-11-16T12:12:34.114] debug:  AcctGatherFilesystem NONE plugin loaded
> [2020-11-16T12:12:34.115] debug:  switch NONE plugin loaded
> [2020-11-16T12:12:34.116] debug:  init: Gres GPU plugin loaded
> [2020-11-16T12:12:34.116] [30873.extern] debug:  Job accounting gather
> LINUX plugin loaded
> [2020-11-16T12:12:34.117] [30873.extern] debug:  cont_id hasn't been set
> yet not running poll
> [2020-11-16T12:12:34.117] [30873.extern] debug:  Message thread started
> pid = 18771
> [2020-11-16T12:12:34.119] [30873.extern] debug:  task NONE plugin loaded
> [2020-11-16T12:12:34.120] [30873.extern] Munge credential signature plugin
> loaded
> [2020-11-16T12:12:34.121] [30873.extern] debug:  job_container none plugin
> loaded
> [2020-11-16T12:12:34.121] [30873.extern] debug:  spank: opening plugin
> stack /apps/slurm/cluster/20.02.0/etc/plugstack.conf
> [2020-11-16T12:12:34.121] [30873.extern] debug:  X11Parameters: (null)
> [2020-11-16T12:12:34.133] [30873.extern] X11 forwarding established on
> DISPLAY=cluster-cn02.domain:66.0
> [2020-11-16T12:12:34.133] [30873.extern] debug:  jag_common_poll_data:
> Task 0 pid 18775 ave_freq = 4023000 mem size/max 6750208/6750208 vmem
> size/max 147521536/147521536, disk read size/max (7200/7200), disk write
> size/max (374/374), time 0.000000(0+0) Energy tot/max 0/0 TotPower 0
> MaxPower 0 MinPower 0
> [2020-11-16T12:12:34.133] [30873.extern] debug:  x11 forwarding local
> display is 66
> [2020-11-16T12:12:34.133] [30873.extern] debug:  x11 forwarding local
> xauthority is /tmp/.Xauthority-MkU8aA
> [2020-11-16T12:12:34.202] launch task 30873.0 request from UID:1368
> GID:512 HOST:172.21.150.10 PORT:4795
> [2020-11-16T12:12:34.202] debug:  Checking credential with 492 bytes of
> sig data
> [2020-11-16T12:12:34.202] [30873.extern] debug:  Handling
> REQUEST_X11_DISPLAY
> [2020-11-16T12:12:34.202] [30873.extern] debug:  Leaving
> _handle_get_x11_display
> [2020-11-16T12:12:34.202] debug:  Leaving stepd_get_x11_display
> [2020-11-16T12:12:34.202] debug:  Waiting for job 30873's prolog to
> complete
> [2020-11-16T12:12:34.202] debug:  Finished wait for job 30873's prolog to
> complete
> [2020-11-16T12:12:34.213] debug:  AcctGatherEnergy NONE plugin loaded
> [2020-11-16T12:12:34.214] debug:  AcctGatherProfile NONE plugin loaded
> [2020-11-16T12:12:34.214] debug:  AcctGatherInterconnect NONE plugin loaded
> [2020-11-16T12:12:34.214] debug:  AcctGatherFilesystem NONE plugin loaded
> [2020-11-16T12:12:34.215] debug:  switch NONE plugin loaded
> [2020-11-16T12:12:34.215] debug:  init: Gres GPU plugin loaded
> [2020-11-16T12:12:34.216] [30873.0] debug:  Job accounting gather LINUX
> plugin loaded
> [2020-11-16T12:12:34.216] [30873.0] debug:  cont_id hasn't been set yet
> not running poll
> [2020-11-16T12:12:34.216] [30873.0] debug:  Message thread started pid =
> 18781
> [2020-11-16T12:12:34.216] debug:  task_p_slurmd_reserve_resources: 30873 0
> [2020-11-16T12:12:34.217] [30873.0] debug:  task NONE plugin loaded
> [2020-11-16T12:12:34.217] [30873.0] Munge credential signature plugin
> loaded
> [2020-11-16T12:12:34.217] [30873.0] debug:  job_container none plugin
> loaded
> [2020-11-16T12:12:34.217] [30873.0] debug:  mpi type = pmix
> [2020-11-16T12:12:34.244] [30873.0] debug:  spank: opening plugin stack
> /apps/slurm/cluster/20.02.0/etc/plugstack.conf
> [2020-11-16T12:12:34.244] [30873.0] debug:  mpi type = pmix
> [2020-11-16T12:12:34.244] [30873.0] debug:  (null) [0] mpi_pmix.c:153
> [p_mpi_hook_slurmstepd_prefork] mpi/pmix: start
> [2020-11-16T12:12:34.244] [30873.0] debug:  mpi/pmix: setup sockets
> [2020-11-16T12:12:34.273] [30873.0] debug:  cluster-cn02 [0]
> pmixp_client_v2.c:69 [_errhandler_reg_callbk] mpi/pmix: Error handler
> registration callback is called with status=0, ref=0
> [2020-11-16T12:12:34.273] [30873.0] debug:  cluster-cn02 [0]
> pmixp_client.c:697 [pmixp_libpmix_job_set] mpi/pmix: task initialization
> [2020-11-16T12:12:34.273] [30873.0] debug:  cluster-cn02 [0]
> pmixp_agent.c:229 [_agent_thread] mpi/pmix: Start agent thread
> [2020-11-16T12:12:34.273] [30873.0] debug:  cluster-cn02 [0]
> pmixp_agent.c:330 [pmixp_agent_start] mpi/pmix: agent thread started: tid =
> 70366934331824
> [2020-11-16T12:12:34.273] [30873.0] debug:  cluster-cn02 [0]
> pmixp_agent.c:335 [pmixp_agent_start] mpi/pmix: timer thread started: tid =
> 70366933283248
> [2020-11-16T12:12:34.273] [30873.0] debug:  cluster-cn02 [0]
> pmixp_agent.c:267 [_pmix_timer_thread] mpi/pmix: Start timer thread
> [2020-11-16T12:12:34.273] [30873.0] debug:    stdin uses a pty object
> [2020-11-16T12:12:34.274] [30873.0] debug:  init pty size 34:159
> [2020-11-16T12:12:34.274] [30873.0] in _window_manager
> [2020-11-16T12:12:34.274] [30873.0] debug level = 2
> [2020-11-16T12:12:34.274] [30873.0] debug:  IO handler started pid=18781
> [2020-11-16T12:12:34.275] [30873.0] starting 1 tasks
> [2020-11-16T12:12:34.276] [30873.0] task 0 (18801) started
> 2020-11-16T12:12:34
> [2020-11-16T12:12:34.276] [30873.0] debug:  task_p_pre_launch_priv: 30873.0
> [2020-11-16T12:12:34.288] [30873.0] debug:  jag_common_poll_data: Task 0
> pid 18801 ave_freq = 4023000 mem size/max 9961472/9961472 vmem size/max
> 512557056/512557056, disk read size/max (0/0), disk write size/max (0/0),
> time 0.000000(0+0) Energy tot/max 0/0 TotPower 0 MaxPower 0 MinPower 0
> [2020-11-16T12:12:34.288] [30873.0] debug:  Sending launch resp rc=0
> [2020-11-16T12:12:34.288] [30873.0] debug:  mpi type = pmix
> [2020-11-16T12:12:34.288] [30873.0] debug:  cluster-cn02 [0]
> mpi_pmix.c:180 [p_mpi_hook_slurmstepd_task] mpi/pmix: Patch environment for
> task 0
> [2020-11-16T12:12:34.289] [30873.0] debug:  task_p_pre_launch: 30873.0,
> task 0
> [2020-11-16T12:12:39.475] [30873.extern] error: _x11_socket_read:
> slurm_open_msg_conn: Connection timed out
>
> On Mon, Nov 16, 2020 at 11:50 AM Russell Jones <arjones85 at gmail.com>
> wrote:
>
>> Hello,
>>
>> Thanks for the reply!
>>
>> We are using Slurm 20.02.0.
>>
>> On Mon, Nov 16, 2020 at 10:59 AM sathish <sathish.sathishkumar at gmail.com>
>> wrote:
>>
>>> Hi Russell Jones,
>>>
>>> I believe you are using a Slurm version older than 19.05. The X11
>>> forwarding code was revamped and has worked as expected since version
>>> 19.05.0.
>>>
>>>
>>> On Mon, Nov 16, 2020 at 10:02 PM Russell Jones <arjones85 at gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Hoping I can get pointed in the right direction here.
>>>>
>>>> I have X11 forwarding enabled in Slurm; however, I cannot seem to get it
>>>> working properly. It works when I test with "ssh -Y" to the compute node
>>>> from the login node, but when I try through Slurm the DISPLAY variable
>>>> looks very different and I get an error. Example below:
>>>>
>>>> [user@cluster-1 ~]$ ssh -Y cluster-cn02
>>>> Last login: Mon Nov 16 10:09:18 2020 from 172.21.150.10
>>>> -bash-4.2$ env | grep -i display
>>>> DISPLAY=172.21.150.102:10.0
>>>> -bash-4.2$ xclock
>>>> Warning: Missing charsets in String to FontSet conversion
>>>> ** Clock pops up and works **
>>>>
>>>> [user@cluster-1 ~]$ srun -p cluster -w cluster-cn02 --x11 --pty bash -l
>>>> bash-4.2$ env | grep -i display
>>>> DISPLAY=localhost:28.0
>>>> bash-4.2$ xclock
>>>> Error: Can't open display: localhost:28.0
>>>>
>>>>
>>>> Any ideas on where to begin looking? I'm not sure why the DISPLAY
>>>> variable is being set to localhost instead of the login node.
>>>>
>>>> Thanks!
>>>>
>>>>
>>>
>>> --
>>> Regards.....
>>> Sathish
>>>
>>
>

