[slurm-users] srun at front-end nodes with --enable_configless fails with "Can't find an address, check slurm.conf"

Josef Dvoracek jose at fzu.cz
Mon Mar 22 17:22:19 UTC 2021


Hi @list;

I have configured a "configless" Slurm cluster with a fairly minimal
slurm.conf on every host except, of course, the slurmctld server. All
nodes run slurmd, including the front-end/login nodes, so that they pull
the config.

Submitting batch jobs with sbatch works fine, but interactive jobs
started with srun fail with:

$ srun --verbose -w n26 --pty /bin/bash
...
srun: error: fwd_tree_thread: can't find address for host n26, check slurm.conf
srun: error: Task launch for 200137.0 failed on node n26: Can't find an address, check slurm.conf
srun: error: Application launch failed: Can't find an address, check slurm.conf
...


Does this mean that all relevant NodeName entries have to be specified
manually on submit hosts?
I thought that running slurmd there would pull the configuration from
slurmserver (and I can see that the file is indeed successfully pulled
into /run/slurm/conf/slurm.conf).
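A minimal sketch for checking this on a login node, assuming the two paths mentioned in this post: it only counts lines starting with NodeName= (including any NodeName=DEFAULT lines) in each file, and it does not tell you which file srun actually reads.

```shell
#!/bin/sh
# Report how many NodeName= entries each candidate slurm.conf contains.
# Paths are the ones from this post; adjust for your site.
count_nodenames() {
    if [ -r "$1" ]; then
        # Match case-insensitively to be safe; grep -c prints 0 when nothing matches.
        grep -ci '^[[:space:]]*nodename=' "$1"
    else
        echo "missing"
    fi
}

for f in /etc/slurm/slurm.conf /run/slurm/conf/slurm.conf; do
    printf '%s: %s NodeName line(s)\n' "$f" "$(count_nodenames "$f")"
done
```

On the front-end described here, one would expect 0 for the local minimal file and a non-zero count for the pulled copy.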


So far I have found two workarounds:

workaround1:

specify the node names in slurm.conf on the login/front-end nodes:

NodeName=n[(...)n26(...)] Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN

Then srun works as expected.


workaround2:

point the SLURM_CONF environment variable at the slurm.conf pulled by
slurmd:

export SLURM_CONF=/run/slurm/conf/slurm.conf

Again, srun then works as expected.
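Workaround 2 could be applied to every login shell instead of being exported by hand. A minimal sketch, assuming a hypothetical file /etc/profile.d/slurm-configless.sh that is sourced by sh-compatible login shells, and assuming the pulled copy lives at /run/slurm/conf/slurm.conf as above. The helper name use_pulled_slurm_conf is made up for illustration; it leaves SLURM_CONF alone if a user has already set it.

```shell
# Hypothetical file: /etc/profile.d/slurm-configless.sh (sourced by login shells).
# Points Slurm client commands (srun, sbatch, ...) at the slurm.conf pulled by slurmd.
use_pulled_slurm_conf() {
    # Optional argument overrides the default path used in this post.
    _slurm_pulled="${1:-/run/slurm/conf/slurm.conf}"
    # Only act if SLURM_CONF is not already set and the pulled file is readable.
    if [ -z "${SLURM_CONF:-}" ] && [ -r "$_slurm_pulled" ]; then
        SLURM_CONF="$_slurm_pulled"
        export SLURM_CONF
    fi
    # Avoid leaking the helper variable into the user's interactive shell.
    unset _slurm_pulled
}

use_pulled_slurm_conf
```

If slurmd has not yet pulled the file (for example right after boot), the variable is simply not set and the shell falls back to whatever the client would do without it.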


Is this the expected behavior? I expected srun on a configless
login/front-end node with a running slurmd to pick up the pulled
configuration, but apparently that is not the case.

cheers

josef


Setup on the front-end and compute nodes:

[root@FRONTEND ~]# slurmd --version
slurm 20.02.5
[root@FRONTEND ~]#

[root@FRONTEND ~]# cat /etc/sysconfig/slurmd
SLURMD_OPTIONS="--conf-server slurmserver2.DOMAIN"
[root@FRONTEND ~]#

[root@FRONTEND ~]# cat /etc/slurm/slurm.conf
ClusterName=CLUSTERNAME
ControlMachine=slurmserver2.DOMAIN
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=slurmserver2.DOMAIN
AccountingStoragePort=7031
SlurmctldParameters=enable_configless
[root@FRONTEND ~]#

-- 
Josef Dvoracek
Institute of Physics | Czech Academy of Sciences
cell: +420 608 563 558 | https://telegram.me/jose_d | FZU phone nr. : 2669
