[slurm-users] Configless Slurm: DNS SRV record does not work without FQDN on EL8 systems
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Mon Jul 12 07:38:04 UTC 2021
With Configless Slurm you can use a DNS SRV record to point to your
slurmctld server. We're in the process of testing various CentOS 8 (EL8)
alternatives (AlmaLinux, RockyLinux, CentOS 8 Stream), and I've found a
strange behavior on all EL8 systems:
On CentOS 7.9 compute nodes and servers, the "host" command shows the DNS
SRV record without the DNS domain having to be appended:
$ host -t SRV _slurmctld._tcp
_slurmctld._tcp.nifl.fysik.dtu.dk has SRV record 0 0 6817 que.nifl.fysik.dtu.dk.
whereas the "dig" command doesn't return the answer:
$ dig +short -t SRV -n _slurmctld._tcp
On all EL8 and Fedora 34 (FC34) systems in our network, neither "host" nor
"dig" returns an answer. The DNS information is returned only if the FQDN
is given:
$ host -t SRV _slurmctld._tcp.nifl.fysik.dtu.dk.
$ dig +short -t SRV -n _slurmctld._tcp.nifl.fysik.dtu.dk.
Needless to say, the correct DNS domain is configured in /etc/resolv.conf.
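For sites that want to repeat the test, the fully qualified SRV name can be derived from the resolver configuration instead of being typed by hand. The following is only a minimal sketch: the function name srv_fqdns is hypothetical, and it assumes the search domains appear on a "search" or "domain" line in resolv.conf, with the last such line taking precedence.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm or BIND): print the fully
# qualified SRV name for slurmctld, one line per search domain found
# in a resolv.conf-style file.
# Usage: srv_fqdns [path-to-resolv.conf]
srv_fqdns() {
    conf="${1:-/etc/resolv.conf}"
    # Remember the last "search" or "domain" line, then emit one
    # FQDN (with a trailing dot) for each domain listed on it.
    awk '$1 == "search" || $1 == "domain" { line = $0 }
         END {
             n = split(line, f, " ")
             for (i = 2; i <= n; i++) {
                 d = f[i]
                 sub(/\.$/, "", d)   # avoid a doubled trailing dot
                 print "_slurmctld._tcp." d "."
             }
         }' "$conf"
}
```

Each printed name can then be passed to "host -t SRV" to compare the fully qualified lookup with the short one on the same node.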
Additionally, I have access to the Slurm cluster at another university,
and on their EL7 nodes "host" works as expected, but on an AlmaLinux 8.4
node it doesn't. So I believe the DNS SRV record problem is not due to
our particular network or DNS setup.
Question: Can other sites with any EL8 nodes and Configless Slurm test the
"host" command as shown above?
Question: Does anyone know why the "host" command apparently changed
behavior from EL7 to EL8 (and FC34) with regard to the lookup of SRV records?
This issue is tracked in Slurm bug
https://bugs.schedmd.com/show_bug.cgi?id=11878#c2
Thanks,
Ole
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark