A few things to check: make sure DNS/hostname resolution works, disable any firewalls for testing (you can lock them down again afterwards), and make sure the slurm.conf file is identical on all nodes.
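If it helps, the checks above can be scripted and run on the broken node and on a working node, then the outputs compared. This is a minimal sketch under a few assumptions: slurm.conf is at /etc/slurm/slurm.conf (adjust for your install), the controller is named by SlurmctldHost= (or the older ControlMachine=), and slurmctld listens on the default port 6817 unless SlurmctldPort= says otherwise. The parser is deliberately simple and ignores Include files. The helper names (parse_controllers, check_controller, conf_checksum) are made up for this example.

```python
import hashlib
import os
import socket
from typing import List, Tuple

DEFAULT_SLURMCTLD_PORT = 6817  # Slurm's default SlurmctldPort


def parse_controllers(conf_text: str) -> Tuple[List[str], int]:
    """Return (controller hostnames, slurmctld port) from slurm.conf text.

    Simplified: only looks at SlurmctldHost / ControlMachine / SlurmctldPort
    lines and does not follow Include directives.
    """
    hosts: List[str] = []
    port = DEFAULT_SLURMCTLD_PORT
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        key = key.lower()
        if key in ("slurmctldhost", "controlmachine"):
            # "ctl01(10.0.0.1)" -> "ctl01"; the address in parentheses is optional
            hosts.append(value.split("(", 1)[0].strip())
        elif key == "slurmctldport":
            # A range such as "6817-6818" is allowed; use the first port
            port = int(value.split("-", 1)[0])
    return hosts, port


def check_controller(host: str, port: int, timeout: float = 3.0) -> str:
    """Resolve the host name, then try a plain TCP connect to slurmctld."""
    try:
        addr = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return f"{host}: name resolution failed ({exc})"
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return f"{host} ({addr}): port {port} reachable"
    except OSError as exc:  # refused, timed out, filtered by a firewall, ...
        return f"{host} ({addr}): cannot connect to port {port} ({exc})"


def conf_checksum(path: str) -> str:
    """SHA-256 of slurm.conf; compare this value across nodes."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def main(conf_path: str = "/etc/slurm/slurm.conf") -> None:
    # The path is an assumption; it differs between packages and builds.
    if not os.path.isfile(conf_path):
        print(f"{conf_path} not found, pass the right path")
        return
    with open(conf_path) as fh:
        hosts, port = parse_controllers(fh.read())
    print("slurm.conf sha256:", conf_checksum(conf_path))
    for host in hosts:
        print(check_controller(host, port))


if __name__ == "__main__":
    main()
```

If the checksum differs between nodes, or the broken node cannot resolve or reach the controller while a working node can, that narrows down the cause.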

I've just upgraded all my nodes from Slurm 20.11.9 to 24.05.2, along with the OS from CentOS 7.9 to RHEL 9.10.

Sid


On Tue, 19 Nov 2024, 03:23 Daniel Rodriguez Lopez (ext) via slurm-users, <slurm-users@lists.schedmd.com> wrote:
Dear all,

We recently tried to pin the same Slurm version on every node of our
cluster. After the installation (Slurm 20.11.9) on one of the compute
nodes, most of the commands (squeue, sinfo, scontrol show config, etc.)
return this error:

  error: Unable to contact slurm controller (connect failure)

The log files don't show any errors, even with both debug levels set
to debug5. The rest of the cluster works as usual.

I would appreciate any insight into what could be the cause.

Thank you and regards,
Daniel


--
slurm-users mailing list -- slurm-users@lists.schedmd.com