[slurm-users] Configless Slurm: DNS SRV record does not work without FQDN on EL8 systems

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Mon Jul 12 07:38:04 UTC 2021


With Configless Slurm you can use a DNS SRV record to point to your
slurmctld server.  We're in the process of testing various CentOS 8 (EL8)
alternatives (AlmaLinux, Rocky Linux, CentOS Stream 8), and I've found
strange behavior on all EL8 systems:

On CentOS 7.9 compute nodes and servers the "host" command shows the DNS
SRV record without the DNS domain having to be appended:

$ host -t SRV _slurmctld._tcp
_slurmctld._tcp.nifl.fysik.dtu.dk has SRV record 0 0 6817 que.nifl.fysik.dtu.dk.

whereas the "dig" command doesn't return the answer:

$ dig +short -t SRV -n _slurmctld._tcp

On all EL8 and Fedora 34 systems in our network, neither "host" nor
"dig" returns an answer.  The DNS information is returned only if the
fully qualified domain name (FQDN) is given:

$ host -t SRV _slurmctld._tcp.nifl.fysik.dtu.dk.
$ dig +short -t SRV -n _slurmctld._tcp.nifl.fysik.dtu.dk.

Needless to say, the correct DNS domain is configured in /etc/resolv.conf.
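
As a workaround, the fully qualified SRV name can be built from
/etc/resolv.conf.  The following is only a minimal sketch: the helper
names first_search_domain and slurmctld_srv_name are hypothetical, and it
assumes that the last "search" or "domain" line in resolv.conf is the one
that applies and that its first entry is the cluster's domain:

```shell
#!/bin/sh

# Print the first domain listed on the last "search" or "domain" line of a
# resolv.conf-style file (default: /etc/resolv.conf).
# Assumption: the last such line is the one the resolver uses.
first_search_domain() {
    conf="${1:-/etc/resolv.conf}"
    awk '$1 == "search" || $1 == "domain" { d = $2 }
         END { if (d != "") print d }' "$conf"
}

# Build the fully qualified SRV name for slurmctld in the given domain.
# A trailing dot on the domain is tolerated; the result always ends in ".".
slurmctld_srv_name() {
    [ -n "$1" ] || return 1
    printf '_slurmctld._tcp.%s.\n' "${1%.}"
}
```

With these two functions loaded in the shell, the lookup becomes:

$ host -t SRV "$(slurmctld_srv_name "$(first_search_domain)")"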

Additionally, I have access to the Slurm cluster at another university, 
and on their EL7 nodes "host" works as expected, but on an AlmaLinux 8.4 
node it doesn't.  So I believe the DNS SRV record problem is not due to 
our particular network or DNS setup.

Question: Can other sites with any EL8 nodes and Configless Slurm test the 
"host" command as shown above?

Question: Does anyone know why the "host" command apparently changed 
behavior from EL7 to EL8 (and FC34) as regards the lookup of SRV records?
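
To make the first question easy to test at other sites, here is a small
sketch that runs both "host" lookups and reports which of them returned an
SRV record.  The function names classify_srv_lookups and
report_srv_lookups are hypothetical, and the check simply looks for the
"has SRV record" text that "host" prints on success:

```shell
#!/bin/sh

# Classify two captured "host" outputs.
# $1: output of the short-name lookup, $2: output of the FQDN lookup.
classify_srv_lookups() {
    short_ok=no
    fqdn_ok=no
    case "$1" in *"has SRV record"*) short_ok=yes ;; esac
    case "$2" in *"has SRV record"*) fqdn_ok=yes ;; esac
    echo "short=$short_ok fqdn=$fqdn_ok"
}

# Run both lookups for the DNS domain given as $1 and print the result.
report_srv_lookups() {
    domain="${1%.}"
    short_out=$(host -t SRV _slurmctld._tcp 2>&1)
    fqdn_out=$(host -t SRV "_slurmctld._tcp.${domain}." 2>&1)
    classify_srv_lookups "$short_out" "$fqdn_out"
}
```

Calling report_srv_lookups with your own DNS domain as the argument prints
"short=no fqdn=yes" on a node that shows the behavior described above, and
"short=yes fqdn=yes" on a node where the short name resolves.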

This issue is tracked in Slurm bug 
https://bugs.schedmd.com/show_bug.cgi?id=11878#c2

Thanks,
Ole

-- 
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark


