Hi,
we've recently upgraded our Slurm from 24.11.3 to 25.05.1, and it seems that since the upgrade the ssh X11 forwarding is broken.
Quick recap:
* on Monday the 14th I performed the slurmdbd and slurmctld upgrades - X forwarding was still working
* on Tuesday the 15th I performed the slurmd upgrades - X forwarding stopped working
The issue is very hard to pin down and it looks like it sits somewhere in the Slurm code. You can submit a job with --x11 and it starts correctly. The Xauthority file is created and you have all the magic cookies needed, but when you try to start any application you get an error, related to permissions I guess. See for yourself:
```
me@sand ~ ssh -X -Y ui
[wcss] me@ui.wcss.pl:~ > srun -p lem-cpu-short -A kdm-staff --gres=storage:local:50G -c 12 --mem 12G -t 1:0:0 --x11 --pty /bin/bash
[wcss] me@r17ch05b01 ~ > xauth list
r17ch05b01.lem.kdm.wcss.pl/unix:91 MIT-MAGIC-COOKIE-1 d82a2efd
[wcss] me@r17ch05b01 ~ > xterm
xterm: Xt error: Can't open display: localhost:91.0
[wcss] me@r17ch05b01 ~ > date && telnet -4 localhost 6091 || date
Wed Jul 23 12:02:39 CEST 2025
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Connection closed by foreign host.
Wed Jul 23 12:02:41 CEST 2025
```
As you can see, the connection to the port is dropped/killed after a second or two. It doesn't really matter which ssh flags you pick (-X, -Y, or both). X forwarding works when you log in as a regular user outside of a Slurm job. Also, if I do ssh localhost inside a job, I can connect to the port assigned to $DISPLAY and the connection isn't dropped - but X applications still don't work, since $DISPLAY and the cookies get messed up when you make a triple jump with one hop within the same host.
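For anyone who wants to repeat the port test on their own nodes, here is a minimal bash sketch of it. It relies on the usual X11 convention that display N listens on TCP port 6000 + N (which is why display 91 above maps to port 6091). The function names `x11_port` and `probe_x11_port` are made up for this example, and the 10-second wait is an arbitrary choice; `probe_x11_port` needs bash because it uses `/dev/tcp`.

```shell
#!/usr/bin/env bash

# Print the TCP port behind an X11 display string such as "localhost:91.0".
# Assumes the usual convention: port = 6000 + display number.
x11_port() {
    local display="${1:-$DISPLAY}"
    local num="${display##*:}"   # strip "host:"   -> "91.0"
    num="${num%%.*}"             # strip ".screen" -> "91"
    echo $((6000 + $num))
}

# Hypothetical helper: connect over IPv4 (like "telnet -4") and report
# whether the forwarder keeps the connection open or drops it.
probe_x11_port() {
    local port rc start elapsed
    port="$(x11_port "$1")"
    start=$SECONDS
    {
        # Nothing is sent, so read blocks until the peer closes the
        # connection (EOF) or the 10 s timeout expires (status > 128).
        read -r -t 10 -n 1 <&3
        rc=$?
    } 3<>"/dev/tcp/127.0.0.1/${port}" || {
        echo "port ${port}: connect failed"
        return 1
    }
    elapsed=$((SECONDS - start))
    if [ "$rc" -gt 128 ]; then
        echo "port ${port}: still open after ${elapsed} s"
    elif [ "$rc" -eq 0 ]; then
        echo "port ${port}: peer sent data after ${elapsed} s"
    else
        echo "port ${port}: closed by peer after ${elapsed} s"
    fi
}
```

Inside the job shell, `probe_x11_port "$DISPLAY"` should then report a close after a second or two if the node shows the behaviour described above, and "still open" if the forwarder keeps the connection alive.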
Our worker nodes are mostly on AlmaLinux el9.5. Some are on el8.10 - and there you actually can do some X forwarding, but you must use both -X and -Y (which wasn't the case before the Slurm upgrade). TLS is disabled in slurm.conf. I am 100% sure that both SSHD and Xorg are properly configured.
Has anyone encountered a similar issue? Or any comment from the Slurm dev team?
Best regards
Patryk
--
Wroclaw Centre for Networking and Supercomputing
On Wednesday, 23 July 2025 12:19:42 CEST Patryk Bełzak via slurm-users wrote:
> we've recently upgraded our Slurm from 24.11.3 to 25.05.1, and it seems that since the upgrade the ssh X11 forwarding is broken.
I did create a bug report:
https://support.schedmd.com/show_bug.cgi?id=23190
I got the following response by email:
Currently, this bug is showing as unsupported in our system. Unsupported bugs are given a very low priority and most times the unsupported bugs are never reviewed by the support team as their focus is on sites with support contracts.
If you have a support contract you might be able to raise the priority.
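regards
Markus Köberl
--
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: markus.koeberl@tugraz.at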
Based on this bug report, SchedMD fixed an X11 forwarding issue in 25.05. Maybe this is related and it was not fixed after all?
https://support.schedmd.com/show_bug.cgi?id=22034#c6
And the purported fix:
https://github.com/SchedMD/slurm/commit/3842c368a439e22a37329e596994f52bda2d...
Regards
--
Mick Timony
Senior DevOps Engineer
LASER, Longwood, & O2 Cluster Admin
Harvard Medical School
--
________________________________
From: Markus Köberl via slurm-users <slurm-users@lists.schedmd.com>
Sent: Thursday, July 24, 2025 2:15 AM
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>; Patryk Bełzak <patryk.belzak@pwr.edu.pl>
Subject: [slurm-users] Re: X11 forwarding broken in 25.05.1
> I did create a bug report:
> https://support.schedmd.com/show_bug.cgi?id=23190