[slurm-users] bug when using SlurmctldParameters=cloud_reg_addrs ? error: get_name_info: getnameinfo() failed: Name or service not known
Pablo Escobar Lopez
pablo.escobarlopez at unibas.ch
Mon Oct 25 16:56:13 UTC 2021
I have configured Slurm cloud scheduling for OpenStack. I am using CentOS 7
with Slurm version 20.11.8 installed from the EPEL RPMs. It is working fine,
but I am getting some strange errors in the Slurm master logs which I think
are a bug.
I am using these options in my slurm.conf (including
SlurmctldParameters=cloud_reg_addrs) so that the cloud nodes work in
"configless" mode and the IP of each cloud node is automatically updated on
the Slurm master when the node contacts it, as described in the docs.
When the cloud nodes are shut down I get this info using scontrol:
$> scontrol show node demo-slurm-compute-05 | grep -i NodeAddr
When the cloud node boots and contacts the master, the IP is properly
updated, so the "cloud_reg_addrs" option seems to work fine. This is the
output of scontrol after a cloud node boots:
$> scontrol show node demo-slurm-compute-dynamic-05 | grep NodeAddr
NodeAddr=192.168.105.128 NodeHostName=192.168.105.128 Version=20.11.8
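To check this on many nodes, the scontrol line above can be parsed with a
small script. This is only a sketch: the helper names parse_scontrol_fields
and is_ip_literal are made up for illustration, and the input line is the
one shown above. It splits the Key=Value pairs and reports whether
NodeHostName holds an IP literal rather than a name.

```python
import ipaddress


def parse_scontrol_fields(line):
    """Split a 'Key=Value Key=Value' line of scontrol output into a dict."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that do not contain '='
            fields[key] = value
    return fields


def is_ip_literal(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


# Line copied from the scontrol output above.
line = "NodeAddr=192.168.105.128 NodeHostName=192.168.105.128 Version=20.11.8"
fields = parse_scontrol_fields(line)
hostname_is_ip = is_ip_literal(fields["NodeHostName"])
print(fields["NodeAddr"], fields["NodeHostName"], hostname_is_ip)
```

Note that after registration NodeHostName is the IP itself, not a name.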
Still, every time a new cloud node boots and contacts the Slurm master I
get these errors in the Slurm master log "slurmctld.log":
error: get_name_info: getnameinfo() failed: Name or service not known
error: slurm_auth_get_host: Lookup failed for 192.168.105.128
It seems that even though the node IP is updated on the master, slurmctld
still tries to resolve a hostname for it, which triggers this error. Despite
the error, the node joins the cluster and can execute jobs.
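To illustrate what I think is happening, here is a minimal Python sketch of
a reverse lookup that fails in the same way. It is not Slurm code. I am only
assuming, from the text of the log message, that slurmctld calls
getnameinfo() and requires a name to come back (the NI_NAMEREQD flag). The
fake_resolver_without_ptr function is a made-up stand-in for a resolver with
no reverse (PTR) record for the address, so the example runs without DNS.

```python
import socket


def reverse_lookup(ip, resolver=socket.getnameinfo):
    """Return the hostname for ip, or None if no name can be found."""
    try:
        # NI_NAMEREQD makes the lookup fail instead of returning the
        # numeric address when no name is known.
        host, _service = resolver((ip, 0), socket.NI_NAMEREQD)
        return host
    except socket.gaierror as exc:
        print(f"Lookup failed for {ip}: {exc}")
        return None


def fake_resolver_without_ptr(sockaddr, flags):
    """Hypothetical resolver: behaves as if there is no reverse record."""
    raise socket.gaierror(socket.EAI_NONAME, "Name or service not known")


result = reverse_lookup("192.168.105.128", resolver=fake_resolver_without_ptr)
```

Calling reverse_lookup("192.168.105.128") without the resolver argument on
the Slurm master should show whether the system resolver can find a name for
the cloud node's address.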
Has anyone experienced this problem? Is this a bug or am I doing something
wrong with my config?