[slurm-users] node lookup failure for login/control servers

Xand Meaden xand.meaden at kcl.ac.uk
Thu Jan 6 16:12:39 UTC 2022


Hi,

We're running Slurm 20.11.7 on Ubuntu, using distro-supplied packages.
On the controller (the host running slurmctld) we frequently see the
following in the logs:

Jan 06 13:39:56 erchpctmpctl01 slurmctld[897782]: error:
_find_node_record(764): lookup failure for erchpctmpctl01
Jan 06 14:25:11 erchpctmpctl01 slurmctld[897782]: error:
_find_node_record(764): lookup failure for erchpctmplogin1
Jan 06 14:39:56 erchpctmpctl01 slurmctld[897782]: error:
_find_node_record(764): lookup failure for erchpctmpctl01
Jan 06 15:25:11 erchpctmpctl01 slurmctld[897782]: error:
_find_node_record(764): lookup failure for erchpctmplogin1
Jan 06 15:39:56 erchpctmpctl01 slurmctld[897782]: error:
_find_node_record(764): lookup failure for erchpctmpctl01
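
To check how regularly each failure recurs, here is a minimal Python sketch.
It assumes the wrapped log lines above have been rejoined onto single lines
and that month names parse in an English/C locale. The function name
failure_intervals and the year default are only illustrative; nothing here
comes from Slurm itself.

```python
import re
from collections import defaultdict
from datetime import datetime

# Journal excerpts from this message, with mail-wrapped lines rejoined.
LOG = """\
Jan 06 13:39:56 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmpctl01
Jan 06 14:25:11 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmplogin1
Jan 06 14:39:56 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmpctl01
Jan 06 15:25:11 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmplogin1
Jan 06 15:39:56 erchpctmpctl01 slurmctld[897782]: error: _find_node_record(764): lookup failure for erchpctmpctl01
"""

# Captures the timestamp and the name that slurmctld failed to look up.
PATTERN = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ slurmctld\[\d+\]: error: "
    r"_find_node_record\(\d+\): lookup failure for (\S+)$"
)


def failure_intervals(log_text, year=2022):
    """Map each looked-up name to the gaps (in seconds) between its errors.

    The journal timestamps carry no year, so one is assumed (hypothetical
    default: 2022, the year of this message).
    """
    seen = defaultdict(list)
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            seen[match.group(2)].append(stamp)
    return {
        host: [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
        for host, times in seen.items()
    }


if __name__ == "__main__":
    for host, gaps in failure_intervals(LOG).items():
        print(host, gaps)
```

On the excerpt above every gap comes out as 3600 seconds, so each name is
looked up once an hour at a fixed offset. That may point to something
scheduled, though the log lines alone do not show what.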

I can't work out why it's trying to find these "nodes". erchpctmpctl01
is the Slurm controller, and erchpctmplogin1 is a login server. Neither
is running slurmd, so why is slurmctld looking for these nodes?

In slurm.conf we have:

SlurmctldHost=erchpctmpctl01

but nothing for the login node, as I don't believe this is required.

We have two other clusters running Slurm 19.05 on CentOS 7 and have
never seen this error on either of them.

Any ideas on suppressing these messages are gratefully received :)

Regards,
Xand

-- 
Xand Meaden
Senior Research Infrastructure Engineer
e-Research
King's College London




