[slurm-users] enable_configless, srun and DNS vs. hosts file

Mark Dixon mark.c.dixon at durham.ac.uk
Wed Nov 10 15:13:30 UTC 2021


I'm using the "enable_configless" mode to avoid the need for a shared 
slurm.conf file, and am having similar trouble to others when running 
"srun", e.g.

   srun: error: fwd_tree_thread: can't find address for host cn120, check slurm.conf
   srun: error: Task launch for StepId=113.0 failed on node cn120: Can't find an address, check slurm.conf
   srun: error: Application launch failed: Can't find an address, check slurm.conf
   srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

I understand that the accepted solution is to add the nodenames to DNS. Is 
that really correct?

I ask because it would be a great help if Slurm instead used the more 
usual mechanism and consulted the sources listed in /etc/nsswitch.conf. We 
use a large /etc/hosts file instead of DNS for our cluster and would 
rather not start running named if we can help it.
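
For anyone wanting to confirm what the system resolver itself returns for a node name, here is a minimal sketch in Python. It assumes a glibc-style system where getaddrinfo() consults the sources in /etc/nsswitch.conf (so /etc/hosts is honoured when "files" is listed). The function name resolve_via_system is made up for illustration, and this only shows what the OS resolver sees; it says nothing about how srun itself performs the lookup.

```python
import socket


def resolve_via_system(names):
    """Map each name to its first resolved address, or None if lookup fails.

    socket.getaddrinfo() calls the C library resolver, which on glibc
    follows the "hosts:" line in /etc/nsswitch.conf.
    """
    results = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, None)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the textual address.
            results[name] = infos[0][4][0]
        except socket.gaierror:
            # The resolver could not find the name in any configured source.
            results[name] = None
    return results
```

Calling resolve_via_system(["cn120"]) on a submit host would show whether the name is visible through /etc/hosts at all, which helps separate a resolver problem from a Slurm configuration problem.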



PS Adding a line like "NodeName=cn[001-999]" to the slurm.conf file on the
    submit/compute hosts makes this go away (I hope that skipping the node
    detail, or adding nodes that don't exist [yet], won't cause other problems).
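
To make clear which names a range such as cn[001-999] covers, here is a rough Python sketch that expands the simple "prefix[lo-hi]" form only. The helper expand_simple_hostlist is hypothetical and does not implement Slurm's full hostlist syntax (comma lists, multiple brackets and so on); on a machine with Slurm installed, "scontrol show hostnames" should be the authoritative way to expand a hostlist.

```python
import re


def expand_simple_hostlist(expr):
    """Expand 'prefix[lo-hi]' into individual names, keeping zero padding.

    Anything that does not match this one simple form is returned
    unchanged as a single-element list.
    """
    match = re.fullmatch(r"([^\[\]]+)\[(\d+)-(\d+)\]", expr)
    if match is None:
        return [expr]
    prefix, lo, hi = match.groups()
    width = len(lo)  # preserve the padding width of the lower bound
    return [f"{prefix}{i:0{width}d}" for i in range(int(lo), int(hi) + 1)]
```

The resulting names could then be checked against /etc/hosts to see which of the declared nodes do not exist yet.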
