[slurm-users] SlurmcltdHost confusion

Jens Elkner jel+slurm at cs.ovgu.de
Thu Dec 14 12:08:44 UTC 2023


On Wed, Dec 13, 2023 at 08:16:39PM +0000, Jackson, Gary L. wrote:
Hi Gary,

> The SlurmctldHost value is set like the following in my slurm.conf:
> 
> SlurmctldHost=host0,host1
> 
> That seems to be legal according to the documentation. However, I get error messages like the following:
> 
> $ srun id
> 
> srun: error: get_addr_info: getaddrinfo() failed: Name or service not known
> srun: error: slurm_set_addr: Unable to resolve "host0,host1"
> srun: error: Unable to establish control machine address
> srun: error: Unable to allocate resources: Address already in use
...
> What’s going on?

Not sure, but I've seen such errors when using a node name that was not
"registered" via NodeName or discovered otherwise. A look at the code at
that time revealed that the message is IMHO misleading: slurm does
__not__ make a DNS lookup - it simply searches its internal list of known
nodes and, if the name is not found, emits such messages.

Other options: try one SlurmctldHost=... line per host to rule out a
format error. I'm not sure whether it also supports ranges (like
SlurmctldHost=host[0-1]).
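
For illustration, the one-host-per-line variant could look like the
sketch below. host0 and host1 are simply the names from your mail. As far
as I read the slurm.conf man page, the parameter may be repeated, with
the first entry taken as the primary controller and later entries as
backups. I have not tested this against your setup.

```
# slurm.conf fragment (sketch): one controller per line
SlurmctldHost=host0
SlurmctldHost=host1
```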

Last but not least, 'Address already in use': checking whether another
instance or something else is already listening on the related port
shouldn't hurt ...
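
A quick way to check this, as a rough sketch: the helper name
port_listening is made up for this mail, and 6817 is only the default
SlurmctldPort, so adjust it if your slurm.conf sets something else. It
assumes the ss tool from iproute2 is installed.

```shell
# Hypothetical helper: succeeds if some process listens on TCP port $1.
port_listening() {
    command -v ss >/dev/null 2>&1 || { echo "ss not found" >&2; return 2; }
    # -H: no header, -l: listening sockets only, -t: TCP, -n: numeric ports
    ss -Hltn "sport = :$1" | grep -q .
}

port=6817   # default SlurmctldPort; adjust if your slurm.conf overrides it
if port_listening "$port"; then
    echo "something is already listening on port $port"
else
    echo "nothing seems to be listening on port $port"
fi
```

Run it on the controller host(s); as root, 'ss -ltnp' additionally shows
which process owns the socket.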

Have fun,
jel.
-- 
Otto-von-Guericke University     http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany         Tel: +49 391 67 52768


