<div dir="ltr"><div class="gmail-gE gmail-iv gmail-gt" style="padding:20px 0px 0px;font-size:0.875rem;font-family:Roboto,RobotoDraft,Helvetica,Arial,sans-serif"><table cellpadding="0" class="gmail-cf gmail-gJ" style="border-collapse:collapse;margin-top:0px;width:auto;font-size:0.875rem;letter-spacing:0.2px;display:block"><tbody style="display:block"></tbody></table><span style="font-family:Arial,Helvetica,sans-serif;font-size:small">Hi All,</span></div><div class="gmail-" style="font-family:Roboto,RobotoDraft,Helvetica,Arial,sans-serif;font-size:medium"><div id="gmail-:29p" class="gmail-ii gmail-gt" style="font-size:0.875rem;direction:ltr;margin:8px 0px 0px;padding:0px"><div id="gmail-:3e0" class="gmail-a3s gmail-aiL" style="overflow:hidden;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:small;line-height:1.5;font-family:Arial,Helvetica,sans-serif"><div dir="ltr"><div><br></div><div>I need help resolving the following issue.</div><div><br></div><div>My compute node (snode) is reported in state UNKNOWN with Reason=NO NETWORK ADDRESS FOUND.</div><div><br></div><div>Master node (smaster), contents of /etc/slurm/slurm.conf:</div><div><br></div><div>[root@smaster ~]# cat /etc/slurm/slurm.conf<br># slurm.conf file generated by configurator easy.html.<br># Put this file on all nodes of your cluster.<br># See the slurm.conf man page for more information.<br>#<br>ControlMachine=smaster<br>ControlAddr=192.168.1.195<br>#<br>#MailProg=/bin/mail<br>MpiDefault=none<br>#MpiParams=ports=#-#<br>ProctrackType=proctrack/pgid<br>ReturnToService=1<br>SlurmctldPidFile=/var/run/slurmctld.pid<br>#SlurmctldPort=6817<br>SlurmdPidFile=/var/run/slurmd.pid<br>#SlurmdPort=6818<br>SlurmdSpoolDir=/var/spool/slurmd<br>SlurmUser=slurm<br>#SlurmdUser=root<br>StateSaveLocation=/var/spool/slurmctld<br>SwitchType=switch/none<br>TaskPlugin=task/none<br>#<br>#<br># TIMERS<br>#KillWait=30<br>#MinJobAge=300<br>#SlurmctldTimeout=120<br>#SlurmdTimeout=300<br>#<br>#<br># 
SCHEDULING<br>SchedulerType=sched/backfill<br>SelectType=select/cons_tres<br>SelectTypeParameters=CR_Core<br>#<br># LOGGING AND ACCOUNTING<br>AccountingStorageType=accounting_storage/none<br>ClusterName=scluster<br>#JobAcctGatherFrequency=30<br>JobAcctGatherType=jobacct_gather/none<br>#SlurmctldDebug=3<br>SlurmctldLogFile=/var/log/slurmctld.log<br>#SlurmdDebug=3<br>SlurmdLogFile=/var/log/slurmd.log<br>#<br>#<br># COMPUTE NODES<br>NodeName=smaster NodeAddr=192.168.1.195 CPUs=2 RealMemory=1024 State=UNKNOWN<br>NodeName=sndode NodeAddr=192.168.1.196 CPUs=2 RealMemory=1024 State=UNKNOWN<br>#PartitionName=debug Nodes=sndode Default=YES MaxTime=INFINITE State=UP<br>PartitionName=debug Nodes=sndode Default=YES MaxTime=INFINITE State=UP<br>PartitionName=hpc Nodes=smaster Default=YES MaxTime=INFINITE State=UP</div><div><br></div><div><b>On Master Node (smaster):</b></div><div><br></div><div>[root@smaster ~]# sinfo -Nl<br>Tue Feb 02 18:11:00 2021<br>NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON<br>smaster        1      hpc*        idle 2       2:1:1   1024        0      1   (null) none<br>sndode         1     debug    unknown* 2       2:1:1   1024        0      1   (null) NO NETWORK ADDRESS F<br>[root@smaster ~]# scontrol show nodes<br>NodeName=smaster Arch=x86_64 CoresPerSocket=1<br>   CPUAlloc=0 CPUTot=2 CPULoad=0.01<br>   AvailableFeatures=(null)<br>   ActiveFeatures=(null)<br>   Gres=(null)<br>   NodeAddr=192.168.1.195 NodeHostName=smaster Version=20.11.2<br>   OS=Linux 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020<br>   RealMemory=1024 AllocMem=0 FreeMem=4500 Sockets=2 Boards=1<br>   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A<br>   Partitions=hpc<br>   BootTime=2021-02-02T10:53:56 SlurmdStartTime=2021-02-02T13:21:10<br>   CfgTRES=cpu=2,mem=1G,billing=2<br>   AllocTRES=<br>   CapWatts=n/a<br>   CurrentWatts=0 AveWatts=0<br>   ExtSensorsJoules=n/s ExtSensorsWatts=0 
ExtSensorsTemp=n/s<br>   Comment=(null)<br><br>NodeName=sndode CoresPerSocket=1<br>   CPUAlloc=0 CPUTot=2 CPULoad=N/A<br>   AvailableFeatures=(null)<br>   ActiveFeatures=(null)<br>   Gres=(null)<br>   NodeAddr=192.168.1.196 NodeHostName=sndode<br>   RealMemory=1024 AllocMem=0 FreeMem=N/A Sockets=2 Boards=1<br>   State=UNKNOWN* ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A<br>   Partitions=debug<br>   BootTime=None SlurmdStartTime=None<br>   CfgTRES=cpu=2,mem=1G,billing=2<br>   AllocTRES=<br>   CapWatts=n/a<br>   CurrentWatts=0 AveWatts=0<br>   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s<br>   Reason=NO NETWORK ADDRESS FOUND [slurm@2021-02-02T10:58:11]<br>   Comment=(null)<br><br>[root@smaster ~]#<br></div><div><br></div><div><b>On Compute Node (snode):</b></div><div><br></div><div>[root@snode ~]# for i in munge slurmd; do service $i status; done<br>Redirecting to /bin/systemctl status munge.service<br>● munge.service - MUNGE authentication service<br>   Loaded: loaded (/usr/lib/systemd/system/munge.service; enabled; vendor preset: disabled)<br>   Active: active (running) since Tue 2021-02-02 13:29:11 IST; 4h 43min ago<br>     Docs: man:munged(8)<br>  Process: 17759 ExecStart=/usr/sbin/munged (code=exited, status=0/SUCCESS)<br> Main PID: 17761 (munged)<br>    Tasks: 4<br>   Memory: 600.0K<br>   CGroup: /system.slice/munge.service<br>           └─17761 /usr/sbin/munged<br><br>Feb 02 13:29:11 <a href="http://snode.calligotech.com">snode.calligotech.com</a> systemd[1]: Starting MUNGE authentication service...<br>Feb 02 13:29:11 <a href="http://snode.calligotech.com">snode.calligotech.com</a> systemd[1]: Started MUNGE authentication service.<br>Redirecting to /bin/systemctl status slurmd.service<br>● slurmd.service - Slurm node daemon<br>   Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled; vendor preset: disabled)<br>   Active: failed (Result: exit-code) since Tue 2021-02-02 13:29:12 IST; 4h 43min ago<br>  Process: 17785 
ExecStart=/usr/sbin/slurmd -D $SLURMD_OPTIONS (code=exited, status=1/FAILURE)<br> Main PID: 17785 (code=exited, status=1/FAILURE)<br><br>Feb 02 13:29:11 <a href="http://snode.calligotech.com">snode.calligotech.com</a> systemd[1]: Started Slurm node daemon.<br>Feb 02 13:29:12 <a href="http://snode.calligotech.com">snode.calligotech.com</a> systemd[1]: slurmd.service: main process exited, code=exited, status=1/FAILURE<br>Feb 02 13:29:12 <a href="http://snode.calligotech.com">snode.calligotech.com</a> systemd[1]: Unit slurmd.service entered failed state.<br>Feb 02 13:29:12 <a href="http://snode.calligotech.com">snode.calligotech.com</a> systemd[1]: slurmd.service failed.<br>[root@snode ~]# sinfo -Nl<br>Tue Feb 02 18:12:47 2021<br>NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON<br>smaster        1      hpc*        idle 2       2:1:1   1024        0      1   (null) none<br>sndode         1     debug    unknown* 2       2:1:1   1024        0      1   (null) NO NETWORK ADDRESS F<br>[root@snode ~]# scontrol show nodes<br>NodeName=smaster Arch=x86_64 CoresPerSocket=1<br>   CPUAlloc=0 CPUTot=2 CPULoad=0.01<br>   AvailableFeatures=(null)<br>   ActiveFeatures=(null)<br>   Gres=(null)<br>   NodeAddr=192.168.1.195 NodeHostName=smaster Version=20.11.2<br>   OS=Linux 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020<br>   RealMemory=1024 AllocMem=0 FreeMem=4502 Sockets=2 Boards=1<br>   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A<br>   Partitions=hpc<br>   BootTime=2021-02-02T10:53:56 SlurmdStartTime=2021-02-02T13:21:10<br>   CfgTRES=cpu=2,mem=1G,billing=2<br>   AllocTRES=<br>   CapWatts=n/a<br>   CurrentWatts=0 AveWatts=0<br>   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s<br>   Comment=(null)<br><br>NodeName=sndode CoresPerSocket=1<br>   CPUAlloc=0 CPUTot=2 CPULoad=N/A<br>   AvailableFeatures=(null)<br>   ActiveFeatures=(null)<br>   Gres=(null)<br>   NodeAddr=192.168.1.196 
NodeHostName=sndode<br>   RealMemory=1024 AllocMem=0 FreeMem=N/A Sockets=2 Boards=1<br>   State=UNKNOWN* ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A<br>   Partitions=debug<br>   BootTime=None SlurmdStartTime=None<br>   CfgTRES=cpu=2,mem=1G,billing=2<br>   AllocTRES=<br>   CapWatts=n/a<br>   CurrentWatts=0 AveWatts=0<br>   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s<br>   Reason=NO NETWORK ADDRESS FOUND [slurm@2021-02-02T10:58:11]<br>   Comment=(null)<br><br>[root@snode ~]# sinfo<br>PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST<br>debug        up    1:00:00      1   unk* sndode<br>hpc*         up   infinite      1   idle smaster<br>[root@snode ~]#<br></div><div><br></div><div>Please help me to resolve this issue.</div><div><br></div><div>Regards,</div><div>Zain</div><div><br></div><div><br></div></div></div></div></div></div>