<div dir="auto">Steve, you've exhausted my best ideas... hope someone else can jump in!<div dir="auto"><br></div><div dir="auto">Andy</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Nov 27, 2020, 11:19 AM Steve Bland <<a href="mailto:sbland@rossvideo.com">sbland@rossvideo.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">




<div dir="ltr">
<div id="m_-9201662187336214011divRplyFwdMsg" dir="ltr">
<div> </div>
</div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Andy</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I appreciate you making me check again; things do get missed.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
SELinux is off and firewalld is disabled:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<p style="margin-top:0px;margin-bottom:0px;margin:0in;font-size:11pt;font-family:Calibri,sans-serif">
[root@SRVGRIDSLURM01 ~]# sestatus</p>
<p style="margin-top:0px;margin-bottom:0px;margin:0in;font-size:11pt;font-family:Calibri,sans-serif">
SELinux status:                 disabled</p>
<p style="margin-top:0px;margin-bottom:0px;margin:0in;font-size:11pt;font-family:Calibri,sans-serif">
[root@SRVGRIDSLURM01 ~]# systemctl status firewalld</p>
<p style="margin-top:0px;margin-bottom:0px;margin:0in;font-size:11pt;font-family:Calibri,sans-serif">
● firewalld.service - firewalld - dynamic firewall daemon</p>
<p style="margin-top:0px;margin-bottom:0px;margin:0in;font-size:11pt;font-family:Calibri,sans-serif">
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)</p>
<p style="margin-top:0px;margin-bottom:0px;margin:0in;font-size:11pt;font-family:Calibri,sans-serif">
   Active: inactive (dead)</p>
<p style="margin-top:0px;margin-bottom:0px;margin:0in;font-size:11pt;font-family:Calibri,sans-serif">
     Docs: man:firewalld(1)</p>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
The one thing I can think of is that the system running slurmctld has two network interfaces. It serves as a gateway, so it has two network addresses. Two of the test slurmd nodes are on the other side of that gateway box; one is on the same box. The two on
 the other side of the gateway have a different IP address range and possibly a different netmask.</div>
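<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
A minimal sketch of how that suspicion could be checked, using only Python's standard ipaddress module. It reports which of the controller's directly attached networks each node address falls into. The network list is an assumption: 192.168.1.0/24 is only inferred from the NodeAddr values in the slurm.conf excerpt in this message, and 10.0.0.0/24 is a purely hypothetical stand-in for the second interface. The real values would come from the gateway box itself.</div>
<pre>
```python
import ipaddress

# Hypothetical interface networks for the slurmctld gateway box. Only
# 192.168.1.0/24 is suggested by the NodeAddr values in slurm.conf, and
# even its /24 mask is an assumption.
CONTROLLER_NETWORKS = [
    ipaddress.ip_network("192.168.1.0/24"),
    ipaddress.ip_network("10.0.0.0/24"),  # hypothetical second interface
]


def network_for(addr, networks=CONTROLLER_NETWORKS):
    """Return the controller-side network containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    for net in networks:
        if ip in net:
            return net
    return None


def report(node_addrs):
    """Map each node name to the network its address belongs to, as text."""
    result = {}
    for name, addr in node_addrs.items():
        net = network_for(addr)
        result[name] = str(net) if net is not None else "no directly attached network"
    return result


if __name__ == "__main__":
    # NodeAddr values as listed in slurm.conf.
    nodes = {
        "SRVGRIDSLURM01": "192.168.1.60",
        "SRVGRIDSLURM02": "192.168.1.61",
        "srvgridslurm03": "192.168.1.62",
    }
    for name, where in report(nodes).items():
        print(name, "is on", where)
```
</pre>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
A node that lands in "no directly attached network" would only be reachable through routing, which is where a wrong mask on either side would show up.</div>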
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
This is from slurm.conf for the three nodes. I know they are talking; I can see it in the logs when the logging level is set to debug.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
The NodeName info comes from slurmd -C, so that is correct.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I added the IP addresses (NodeAddr), but that made no difference.<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<p style="margin-top:0px;margin-bottom:0px;margin:0in;font-size:11pt;font-family:Calibri,sans-serif">
# COMPUTE NODES</p>
<p style="margin-top:0px;margin-bottom:0px;margin:0in;font-size:11pt;font-family:Calibri,sans-serif">
NodeName=SRVGRIDSLURM01 NodeAddr=192.168.1.60 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7821</p>
<p style="margin-top:0px;margin-bottom:0px;margin:0in;font-size:11pt;font-family:Calibri,sans-serif">
NodeName=SRVGRIDSLURM02 NodeAddr=192.168.1.61 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7821</p>
<p style="margin-top:0px;margin-bottom:0px;margin:0in;font-size:11pt;font-family:Calibri,sans-serif">
NodeName=srvgridslurm03 NodeAddr=192.168.1.62 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=7821</p>
<p style="margin-top:0px;margin-bottom:0px;margin:0in;font-size:11pt;font-family:Calibri,sans-serif">
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP</p>
</div>
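<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
For checking reachability independently of Slurm's own logging, here is a rough sketch that pulls NodeName/NodeAddr pairs out of slurm.conf text and tries a plain TCP connection to each slurmd. It assumes slurmd listens on Slurm's default SlurmdPort of 6818 (change SLURMD_PORT if slurm.conf overrides it), and it does not expand bracketed node ranges. The helper names are made up for this illustration.</div>
<pre>
```python
import socket

SLURMD_PORT = 6818  # Slurm's default SlurmdPort; adjust if slurm.conf overrides it


def parse_nodes(conf_text):
    """Extract {NodeName: NodeAddr} from the NodeName= lines of slurm.conf text."""
    nodes = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line.startswith("NodeName="):
            continue  # skips comments, PartitionName= lines, and blanks
        fields = dict(item.split("=", 1) for item in line.split() if "=" in item)
        # Fall back to the node name when no NodeAddr is given.
        nodes[fields["NodeName"]] = fields.get("NodeAddr", fields["NodeName"])
    return nodes


def can_connect(host, port=SLURMD_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_cluster(conf_path):
    """Return {NodeName: bool} saying whether each node's slurmd port answered."""
    with open(conf_path) as handle:
        nodes = parse_nodes(handle.read())
    return {name: can_connect(addr) for name, addr in nodes.items()}
```
</pre>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Run on the control node, something like check_cluster("/etc/slurm/slurm.conf") (the path is an assumption and varies by install) gives a quick yes/no per node. A successful connect only shows that the port is reachable in that direction; it says nothing about the return path from slurmd to slurmctld.</div>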
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
About the only thing I can think of is to make one of the nodes on the other side of the gateway the control node.<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<br>
<div style="font-family:Tahoma;font-size:13px">
<p style="margin-top:0px;margin-bottom:0px;margin:0cm 0cm 0.0001pt;font-size:11pt;font-family:Calibri,sans-serif">
<b><span style="font-size:10pt;font-family:Arial,sans-serif;color:rgb(31,73,125)">Steve Bland</span></b><span style="font-size:10pt;font-family:Arial,sans-serif;color:rgb(31,73,125)"><br>
<i>Technical Product Manager</i></span><i><span style="font-size:10pt;font-family:Arial,sans-serif;color:rgb(31,73,125)"></span></i></p>
<p style="margin-top:0px;margin-bottom:0px;margin:0cm 0cm 0.0001pt;font-size:11pt;font-family:Calibri,sans-serif">
<i><span style="font-size:10pt;font-family:Arial,sans-serif;color:rgb(31,73,125)">Third Party Products</span></i><span style="font-size:10pt;font-family:Arial,sans-serif;color:rgb(31,73,125)"><br>
Ross Video | Production Technology Experts<br>
T: +1 (613) 228-0688 ext.4219<br>
<a href="http://www.rossvideo.com/" style="color:purple" target="_blank" rel="noreferrer"><span style="color:blue">www.rossvideo.com</span></a></span></p>
</div>
<hr style="display:inline-block;width:98%">
<div id="m_-9201662187336214011x_divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Andy Riebs <<a href="mailto:andy.riebs@gmail.com" target="_blank" rel="noreferrer">andy.riebs@gmail.com</a>> on behalf of Andy Riebs <<a href="mailto:andy@candooz.com" target="_blank" rel="noreferrer">andy@candooz.com</a>><br>
<b>Sent:</b> 26 November 2020 13:40<br>
<b>To:</b> Steve Bland <<a href="mailto:sbland@rossvideo.com" target="_blank" rel="noreferrer">sbland@rossvideo.com</a>>; Slurm User Community List <<a href="mailto:slurm-users@lists.schedmd.com" target="_blank" rel="noreferrer">slurm-users@lists.schedmd.com</a>><br>
<b>Subject:</b> Re: [EXTERNAL] Re: [slurm-users] trying to diagnose a connectivity issue between the slurmctld process and the slurmd nodes</font>
<div> </div>
</div>
<div>
<p style="margin-top:0px;margin-bottom:0px">One last shot on the firewall front Steve -- does the control node have a firewall enabled? I've seen cases where that can cause the sporadic messaging failures that you seem to be seeing.</p>
<p style="margin-top:0px;margin-bottom:0px">That failing, I'll defer to anyone with different ideas!</p>
<p style="margin-top:0px;margin-bottom:0px">Andy<br>
</p>
<div>On 11/26/2020 1:01 PM, Steve Bland wrote:<br>
</div>
<blockquote type="cite">

</blockquote>
</div>
</div>
</div>

</blockquote></div>