As usual - everything is a DNS (name resolution) problem.
If your data centre is on fire, then your DNS server is burning. So it is a DNS problem.
On Mon, 6 Oct 2025 at 17:23, Tilman Hoffbauer via slurm-users <slurm-users@lists.schedmd.com> wrote:
Hello,
we had this issue previously. It was connected to timeouts: the socket disappeared due to a timeout before a reply could be sent back. In our case this was caused by link-local multicast name resolution (LLMNR) being on by default in systemd-resolved, which was evident from slow calls to `getent hosts <hostname>`.
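For anyone who wants to check several nodes for the same symptom, here is a rough sketch of timing name resolution from Python instead of running `getent hosts` by hand. The function names, the number of attempts and the 1-second threshold are assumptions chosen for illustration, not values taken from Slurm or systemd-resolved. `socket.getaddrinfo` goes through the system resolver, so on a typical Linux host it should show roughly the same delays, but it is not an exact stand-in for `getent`.

```python
import socket
import time


def time_lookup(hostname, attempts=3):
    """Resolve hostname several times and return each duration in seconds."""
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # Uses the system resolver configuration, like most services do.
            socket.getaddrinfo(hostname, None)
        except socket.gaierror:
            # A failed lookup can still be a slow lookup, so keep the timing.
            pass
        durations.append(time.perf_counter() - start)
    return durations


def is_slow(durations, threshold=1.0):
    """Flag the lookup as slow if any attempt exceeded threshold seconds.

    The 1.0 s default is an arbitrary illustrative cut-off.
    """
    return max(durations) > threshold


# Example use (hostname is whatever node name you want to test):
#   timings = time_lookup("localhost")
#   print(timings, "slow" if is_slow(timings) else "ok")
```

If lookups for node names turn out to be slow and LLMNR is the cause, the relevant knob is the `LLMNR=` setting in the `[Resolve]` section of systemd-resolved's `resolved.conf`. Whether turning it off is appropriate depends on your site.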
Hope this helps,
Tilman Hoffbauer

On 10/6/25 17:35, Ozeryan, Vladimir via slurm-users wrote:
Hello everyone,
Not sure if you guys have heard this tune already, but has anyone come across a solution for the “Unexpected missing socket error”? There is nothing useful in the logs, but the message appears on the compute nodes and on the Slurm controller node.
Thank you,
Vlad Ozeryan
AMDS – AB1 Linux-Support
Vladimir.Ozeryan@jhuapl.edu
Ext. 23966