[slurm-users] enable_configless, srun and DNS vs. hosts file
mark.c.dixon at durham.ac.uk
Wed Nov 10 15:13:30 UTC 2021
I'm using the "enable_configless" mode to avoid the need for a shared
slurm.conf file, and am having similar trouble to others when running srun:
srun: error: fwd_tree_thread: can't find address for host cn120, check slurm.conf
srun: error: Task launch for StepId=113.0 failed on node cn120: Can't find an address, check slurm.conf
srun: error: Application launch failed: Can't find an address, check slurm.conf
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
I understand that the accepted solution is to add the nodenames to DNS. Is
that really correct?
I ask because it would be a great help if Slurm instead used the more
usual mechanism and consulted the sources listed in /etc/nsswitch.conf. We
use a large /etc/hosts file instead of DNS for our cluster and would
rather not start running named if we can help it.
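For reference, here is a rough way to confirm that the node names do
resolve through the ordinary libc path on a given host. It is only a
sketch: it assumes Python 3 on a glibc system, where socket.getaddrinfo
calls the C library resolver and so follows /etc/nsswitch.conf
(including /etc/hosts). The helper names expand_simple_range and
unresolved_hosts are made up for illustration and are not part of Slurm,
and the expansion only handles a single zero-padded range like
cn[001-999].

```python
import socket


def expand_simple_range(prefix, first, last, width=3):
    """Expand e.g. ("cn", 1, 3) into ["cn001", "cn002", "cn003"].

    Hypothetical helper: covers only one zero-padded numeric range,
    not the full Slurm hostlist syntax.
    """
    return [f"{prefix}{i:0{width}d}" for i in range(first, last + 1)]


def unresolved_hosts(hostnames, resolve=socket.getaddrinfo):
    """Return the names that the system resolver cannot map to an address.

    The resolve argument exists so the lookup can be swapped out for
    testing; by default it is the libc-backed socket.getaddrinfo.
    """
    missing = []
    for name in hostnames:
        try:
            resolve(name, None)
        except socket.gaierror:
            missing.append(name)
    return missing


# Example use on a submit or compute host (performs real lookups):
#   print(unresolved_hosts(expand_simple_range("cn", 1, 999)))
```

If this reports no missing names while srun still fails with "Can't find
an address", that would suggest the lookup srun performs is not going
through the same resolver path, which is the behaviour I am asking about.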
PS Adding a line like "NodeName=cn[001-999]" to the slurm.conf file on the
submit/compute hosts makes the errors go away (I hope that skipping the
node detail, or adding nodes that don't exist [yet], won't cause other
problems).