[slurm-users] node lookup failure for login/control servers
Xand Meaden
xand.meaden at kcl.ac.uk
Thu Jan 6 16:12:39 UTC 2022
Hi,
We're running Slurm 20.11.7 on Ubuntu, using distro-supplied packages.
On the controller (the host running slurmctld) we frequently see the
following in the logs:
Jan 06 13:39:56 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmpctl01
Jan 06 14:25:11 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmplogin1
Jan 06 14:39:56 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmpctl01
Jan 06 15:25:11 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmplogin1
Jan 06 15:39:56 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmpctl01
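The entries above seem to recur on a fixed schedule per hostname. As a rough sketch (not anything Slurm itself provides), the Python snippet below parses the same five entries and prints the gap in seconds between consecutive failures for each name. The function name `lookup_failure_intervals`, the regular expression, and the year 2022 (taken from the date of this message, since the log timestamps omit the year) are assumptions made for illustration; it also assumes an English locale for the month abbreviation.

```python
import re
from collections import defaultdict
from datetime import datetime

# The five log entries quoted above, each wrapped entry joined onto one line.
LOG = """\
Jan 06 13:39:56 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmpctl01
Jan 06 14:25:11 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmplogin1
Jan 06 14:39:56 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmpctl01
Jan 06 15:25:11 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmplogin1
Jan 06 15:39:56 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmpctl01
"""

# Hypothetical pattern matching only the message format shown in this post.
PATTERN = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ slurmctld\[\d+\]: error: "
    r"_find_node_record\(\d+\): lookup failure for (\S+)$"
)

ASSUMED_YEAR = "2022"  # assumption: the logs carry no year, so use the message date


def lookup_failure_intervals(log_text):
    """Return {hostname: [seconds between consecutive lookup failures]}."""
    times = defaultdict(list)
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            stamp, host = match.groups()
            when = datetime.strptime(
                f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S"
            )
            times[host].append(when)
    return {
        host: [int((b - a).total_seconds()) for a, b in zip(ts, ts[1:])]
        for host, ts in times.items()
    }


intervals = lookup_failure_intervals(LOG)
for host, gaps in sorted(intervals.items()):
    print(host, gaps)
```

For the entries shown, every gap comes out at 3600 seconds, so each name fails its lookup once an hour, at a different offset for each name.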
I can't work out why it's trying to find these "nodes". erchpctmpctl01
is the Slurm controller, and erchpctmplogin1 is a login server. Neither
is running slurmd, so why is slurmctld looking for these nodes?
In slurm.conf we have:
SlurmctldHost=erchpctmpctl01
but nothing for the login node, as I don't believe this is required.
We have two other clusters running Slurm 19.05 on CentOS 7 and have
never seen this error on either of them.
Any ideas on suppressing these messages are gratefully received :)
Regards,
Xand
--
Xand Meaden
Senior Research Infrastructure Engineer
e-Research
King's College London