<div dir="ltr"><div class="gmail-postcell gmail-post-layout--right" style="margin:0px;padding-top:0px;padding-bottom:0px;padding-left:0px;border:0px;font-variant-numeric:inherit;font-variant-east-asian:inherit;font-stretch:inherit;line-height:inherit;font-family:-apple-system,"system-ui","Segoe UI Adjusted","Segoe UI","Liberation Sans",sans-serif;font-size:13px;vertical-align:top;box-sizing:inherit;width:auto;min-width:0px;color:rgb(35,38,41)"><div class="gmail-s-prose gmail-js-post-body" 
style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;font-size:15px;vertical-align:baseline;box-sizing:inherit;width:659px"><p style="margin-top:0px;margin-right:0px;margin-left:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;box-sizing:inherit;clear:both"><br>Hi,</p><p style="margin-top:0px;margin-right:0px;margin-left:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;box-sizing:inherit;clear:both">I have been maintaining a <a href="https://hub.docker.com/repository/registry-1.docker.io/hpcnow/slurm_simulator/general" rel="nofollow noreferrer" style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;box-sizing:inherit">Slurm simulator</a> for ages. I have everything automated in order to try new features and keep my configuration up to date, version after version. Unfortunately, starting with version 21, the front-end mode makes the slurmd daemon crash with the following error message:</p><pre style="margin-top:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;vertical-align:baseline;box-sizing:inherit;width:auto;max-height:600px;overflow:auto"><code style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;vertical-align:baseline;box-sizing:inherit;background-color:transparent;white-space:inherit;border-radius:0px">slurmd: error: _find_node_record: lookup failure for node "slurm-simulator"
slurmd: error: _find_node_record: lookup failure for node "slurm-simulator", alias "slurm-simulator"
slurmd: error: slurmd initialization failed
</code></pre><p style="margin-top:0px;margin-right:0px;margin-left:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;box-sizing:inherit;clear:both">The exact same container, with the same configuration but using version 20.11.9, works just fine. I reproduced the same steps manually in a VM to rule out any noise introduced by the container, but the result is the same.</p><p style="margin-top:0px;margin-right:0px;margin-left:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;box-sizing:inherit;clear:both">The configuration shown below is the one available in the container.</p><pre style="margin-top:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;vertical-align:baseline;box-sizing:inherit;width:auto;max-height:600px;overflow:auto"><code style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;vertical-align:baseline;box-sizing:inherit;background-color:transparent;white-space:inherit;border-radius:0px">[root@slurm-simulator /]# cat /etc/slurm/slurm.conf
ClusterName=simulator
SlurmctldHost=slurm-simulator
FrontendName=slurm-simulator
MpiDefault=none
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
AccountingStorageType=accounting_storage/slurmdbd
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
SlurmdParameters=config_overrides
include /etc/slurm/nodes.conf
include /etc/slurm/partitions.conf
[root@slurm-simulator /]# cat /etc/slurm/nodes.conf
NodeName=node[001-10] RealMemory=248000 Sockets=2 CoresPerSocket=32 ThreadsPerCore=1 State=UNKNOWN NodeAddr=slurm-simulator NodeHostName=slurm-simulator
[root@slurm-simulator /]# cat /etc/slurm/partitions.conf
PartitionName=long Nodes=node[001-10] Default=YES State=UP OverSubscribe=NO MaxTime=14-00:00:00
</code></pre><p style="margin-top:0px;margin-right:0px;margin-left:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;box-sizing:inherit;clear:both">The error can be reproduced by running the following commands:</p><pre style="margin-top:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;vertical-align:baseline;box-sizing:inherit;width:auto;max-height:600px;overflow:auto"><code style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;vertical-align:baseline;box-sizing:inherit;background-color:transparent;white-space:inherit;border-radius:0px">docker run --rm --detach \
--name "${USER}_simulator" \
-h "slurm-simulator" \
--security-opt seccomp:unconfined \
--privileged -e container=docker \
-v /run -v /sys/fs/cgroup:/sys/fs/cgroup \
--cgroupns=host \
hpcnow/slurm_simulator:21.08.8-2 /usr/sbin/init
docker exec -ti ${USER}_simulator /bin/bash
slurmd -D -vvvvv
</code></pre><p style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;box-sizing:inherit;clear:both">If you try the same commands with v20.11.9, it works. I have tried using the new SlurmdParameters=config_overrides option, but I still get the same problem. </p><p style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;box-sizing:inherit;clear:both"><br></p><p style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;box-sizing:inherit;clear:both">Any ideas or suggestions? </p><p style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;box-sizing:inherit;clear:both"><br></p><p style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-stretch:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;box-sizing:inherit;clear:both">Thanks!</p></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, 11 Jul 2022 at 23:21, Jordi Blasco <<a href="mailto:jbllistes@gmail.com">jbllistes@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Thanks, Ole,<div><br></div><div>I checked /etc/nsswitch.conf and I have even set up a dnsmasq service, just in case.</div><div><br></div><div><font face="monospace">[root@slurm-simulator /]# cat /etc/nsswitch.conf | grep hosts<br># Valid databases are: aliases, ethers, group, gshadow, hosts,<br>hosts: files dns 
myhostname<br></font></div><div><font face="monospace"><br></font></div><div><font face="monospace">[root@slurm-simulator /]# ping slurm-simulator -c 1<br>PING slurm-simulator (172.17.0.4) 56(84) bytes of data.<br>64 bytes from slurm-simulator (172.17.0.4): icmp_seq=1 ttl=64 time=0.022 ms<br><br>--- slurm-simulator ping statistics ---<br>1 packets transmitted, 1 received, 0% packet loss, time 0ms<br>rtt min/avg/max/mdev = 0.022/0.022/0.022/0.000 ms</font><br></div><div><font face="monospace"><br></font></div><div><font face="monospace">[root@slurm-simulator /]# cat /etc/resolv.conf | grep -v "^#"<br>nameserver 172.17.0.4<br>nameserver 172.31.0.2<br>search eu-west-3.compute.internal<br>[root@slurm-simulator /]# host slurm-simulator<br>slurm-simulator has address 172.17.0.4</font></div><font face="monospace">[root@slurm-simulator /]# host 172.17.0.4<br></font><div><font face="monospace">4.0.17.172.in-addr.arpa domain name pointer slurm-simulator.<br></font></div><div><font face="monospace"><br></font></div><div><br></div><div>Regards,</div><div><br></div><div>Jordi</div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, 11 Jul 2022 at 23:09, Ole Holm Nielsen <<a href="mailto:Ole.H.Nielsen@fysik.dtu.dk" target="_blank">Ole.H.Nielsen@fysik.dtu.dk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 7/11/22 12:54, Jordi Blasco wrote:<br>
> I use the front-end node mode <br>
> <<a href="https://slurm.schedmd.com/faq.html#multi_slurmd" rel="noreferrer" target="_blank">https://slurm.schedmd.com/faq.html#multi_slurmd</a>> to emulate a real <br>
> cluster in order to validate the Slurm configuration in a Docker container <br>
> and develop custom plugins. With versions 21.08.8-2 and 22.05.2, slurmd is <br>
> complaining about not being able to find the frontend node.<br>
> <br>
> slurmd -D -vvv<br>
> ...<br>
> slurmd: error: _find_node_record: lookup failure for node "slurm-simulator"<br>
> slurmd: error: _find_node_record: lookup failure for node <br>
> "slurm-simulator", alias "slurm-simulator"<br>
> slurmd: error: slurmd initialization failed<br>
<br>
This could be a DNS lookup issue. Can you ping the node named <br>
"slurm-simulator"?<br>
<br>
/Ole<br>
<br>
</blockquote></div>
</blockquote></div>