[slurm-users] enable_configless, srun and DNS vs. hosts file

Paul Brunk pbrunk at uga.edu
Fri Nov 12 14:37:34 UTC 2021


We run configless.  If we add a node to slurm.conf and don't restart slurmd on our submit nodes, then attempts to submit to that new node get the error you saw.  Restarting slurmd on the submit node fixes it.  This is the documented behavior (adding nodes requires restarting slurmd everywhere).  Could this be what you're seeing (as opposed to /etc/hosts vs DNS)?

Wishing that I'd just listened this time,
Paul Brunk, system administrator, Workstation Support Group
GACRC (formerly RCC) 
UGA EITS  (formerly UCNS)

-----Original Message-----
From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Mark Dixon
Sent: Wednesday, November 10, 2021 10:14
To: slurm-users at lists.schedmd.com
Subject: [slurm-users] enable_configless, srun and DNS vs. hosts file



I'm using the "enable_configless" mode to avoid the need for a shared slurm.conf file, and am having similar trouble to others when running "srun", e.g.

   srun: error: fwd_tree_thread: can't find address for host cn120, check slurm.conf
   srun: error: Task launch for StepId=113.0 failed on node cn120: Can't find an address, check slurm.conf
   srun: error: Application launch failed: Can't find an address, check slurm.conf
   srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

I understand that the accepted solution is to add the nodenames to DNS. Is that really correct?

I ask because it would be a great help if slurm instead used the more usual mechanism and consulted the sources listed in /etc/nsswitch.conf. We use a large /etc/hosts file instead of DNS for our cluster and would rather not start running named if we can help it.
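
One way to separate a name-resolution problem from a Slurm configuration problem is to ask the system resolver directly on the submit host. The sketch below is only an illustration, not anything Slurm ships: the function name unresolvable_hosts and its resolver parameter are made up for this example. It relies on Python's socket.getaddrinfo, which goes through the C library resolver and therefore follows /etc/nsswitch.conf (including /etc/hosts where that is listed).

```python
import socket


def unresolvable_hosts(names, resolver=socket.getaddrinfo):
    """Return the names that the resolver cannot map to an address.

    `resolver` is a hypothetical hook so the lookup can be swapped out;
    by default it is the system resolver used by the C library.
    """
    missing = []
    for name in names:
        try:
            # The port is irrelevant here, so None is passed;
            # only the address lookup matters.
            resolver(name, None)
        except socket.gaierror:
            # Raised when the name cannot be resolved.
            missing.append(name)
    return missing
```

Calling unresolvable_hosts(["cn120"]) on the submit host and getting an empty list back would suggest that the host name itself resolves there, and that the srun error has some other cause.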



PS Adding a line like "NodeName=cn[001-999]" to the submit/compute host
    slurm.conf file makes this go away (I hope that skipping the node
    details, or adding nodes that don't exist [yet], won't cause other
    problems).
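
To see which names an expression such as cn[001-999] covers, the range can be expanded into individual host names. On a machine with Slurm installed, "scontrol show hostnames" does this properly. The sketch below is a rough stand-in that assumes the simplest form only (one prefix followed by one zero-padded numeric range); the function name expand_simple_hostlist is made up for this example and does not handle comma lists, multiple brackets or suffixes.

```python
import re


def expand_simple_hostlist(expr):
    """Expand 'prefix[lo-hi]' into a list of names, keeping zero padding.

    Handles a single bracketed numeric range only; any other input is
    returned unchanged as a one-element list.
    """
    match = re.fullmatch(r"([^\[\]]*)\[(\d+)-(\d+)\]", expr)
    if match is None:
        return [expr]
    prefix, low, high = match.groups()
    # Pad to the width of the lower bound, e.g. "001" gives width 3.
    width = len(low)
    return [f"{prefix}{n:0{width}d}" for n in range(int(low), int(high) + 1)]
```

The resulting list can then be checked against /etc/hosts, or against whatever other source the cluster uses, to find names that are declared in slurm.conf but do not exist yet.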
