<div dir="ltr"><div><div>Glad to hear that you made it work.<br><br></div>Regards<br></div>Matthieu<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">2018-05-21 21:21 GMT+02:00 Sean Caron <span dir="ltr"><<a href="mailto:scaron@umich.edu" target="_blank">scaron@umich.edu</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Just wanted to follow up. In addition to passing all traffic to the SLURM controller, I opened port 6818/TCP to all other compute nodes and this seems to have resolved the issue. Thanks again, Matthieu!<div><br></div><div>Best,</div><div><br></div><div>Sean</div><div><br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 17, 2018 at 8:06 PM, Sean Caron <span dir="ltr"><<a href="mailto:scaron@umich.edu" target="_blank">scaron@umich.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Awesome tip. Thanks so much, Matthieu. I hadn't considered that. I will give that a shot and see what happens.<div><br></div><div>Best,</div><div><br></div><div>Sean</div><div><br></div></div><div class="m_2215946678730725843HOEnZb"><div class="m_2215946678730725843h5"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 17, 2018 at 4:49 PM, Matthieu Hautreux <span dir="ltr"><<a href="mailto:matthieu.hautreux@gmail.com" target="_blank">matthieu.hautreux@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi,</div><div><br></div>Communications in Slurm are not only performed from controller to slurmd and from slurmd to controller. You need to ensure that your login nodes can reach the controller and the slurmd nodes as well as ensure that slurmd on the various nodes can contact each other. 
This last requirement is because of the tree logic used in Slurm communication:<div><br></div><div>- to ensure scalability, slurmctld uses a communication tree (see TreeWidth in "man slurm.conf"), used for example to periodically check that all the nodes are working properly</div><div>- the exact same logic is used by srun when it contacts the various slurmd involved in its step</div><div>- reversed tree communications are performed among the slurmds of a step when it ends, to send accounting data and other information to the controller</div><div><br></div><div>- only some communications are point-to-point between slurmd and controller, especially the "registering call" performed at slurmd startup.</div><div><br></div><div>When slurmds cannot contact each other because of network failures (partitioning) or overly restrictive filtering, you see the kind of flapping that you have. This is because the point-to-point communication at slurmd registration makes the nodes appear to the controller, tree checks make some of them disappear, and retries can lead to point-to-point communications to some nodes when the number of destination nodes contacted by the controller at the same time is lower than the configured TreeWidth, so nodes suddenly reappear... until the next check... 
and so on.</div><div><br></div><div>Two options for you:</div><div><br></div><div>- be less restrictive in your filtering rules</div><div>- set TreeWidth to 1 in slurm.conf, but you will lose the performance/scalability of Slurm's internal communication</div><div><br></div><div>If your cluster is large, I would recommend the first one.</div><div><br></div><div>HTH</div><span class="m_2215946678730725843m_8599250747320344412HOEnZb"><font color="#888888"><div>Matthieu</div></font></span><div><br></div><div>PS: you can look at this presentation for a few details on the communication logic:</div><div><a href="https://slurm.schedmd.com/SUG14/message_aggregation.pdf" target="_blank">https://slurm.schedmd.com/SUG1<wbr>4/message_aggregation.pdf</a><br></div><div><br></div><div><br></div></div><div class="m_2215946678730725843m_8599250747320344412HOEnZb"><div class="m_2215946678730725843m_8599250747320344412h5"><div class="gmail_extra"><br><div class="gmail_quote">2018-05-17 22:21 GMT+02:00 Sean Caron <span dir="ltr"><<a href="mailto:scaron@umich.edu" target="_blank">scaron@umich.edu</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Sorry, how do you mean? The environment is very basic. Compute nodes and SLURM controller are on an RFC1918 subnet. Gateways are dual homed with one leg on a public IP and one leg on the RFC1918 cluster network. It used to be that nodes that only had a leg on the RFC1918 network (compute nodes and the SLURM controller) had no firewall at all and nodes that were dual homed basically were set to just permit all traffic from the cluster side NIC (i.e. 
iptables rule like -A INPUT -i ethX -j ACCEPT).<div><br></div><div>Now we're trying to go back to the gateways and compute nodes and actually codify, instead of just passing all traffic from the cluster side NIC, what ports and protocols are actually in use, or at least, what server-to-server communication is expected and normative, and then define a rule set to permit those while dropping other traffic not explicitly whitelisted.</div><div><br></div><div>The compute and gateway nodes work fine with SLURM even when iptables is enabled and the policy is "permit all traffic from that NIC" but once we tighten it down just a little bit to "permit all traffic to and from the SLURM controller" we see these weird instances of node state flapping. It's not clear to me why this is the case since from the standpoint of node to controller communications, these policies are logically very similar, but there it is. The nodes shouldn't have to talk to anything else besides the SLURM controller for SLURM to work, so long as time is synched up between them and there are no issues with the nodes getting to slurm.conf.</div><div><br></div><div>Best,</div><div><br></div><div>Sean</div><div><br></div></div><div class="m_2215946678730725843m_8599250747320344412m_-4999014754154529261HOEnZb"><div class="m_2215946678730725843m_8599250747320344412m_-4999014754154529261h5"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 17, 2018 at 1:21 PM, Patrick Goetz <span dir="ltr"><<a href="mailto:pgoetz@math.utexas.edu" target="_blank">pgoetz@math.utexas.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Does your SMS have a dedicated interface for node traffic?<span><br>

<br>

On 05/16/2018 04:00 PM, Sean Caron wrote:<br>

</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>

I see some chatter on 6818/TCP from the compute node to the SLURM controller, and from the SLURM controller to the compute node.<br>

<br>

The policy is to permit all packets inbound from SLURM controller regardless of port and protocol, and perform no filtering whatsoever on any output packets to anywhere. I wouldn't expect this to interfere.<br>

<br>

Anyway, it's not that it NEVER works once the firewall is switched on. It's that it flaps. The firewall is clearly passing enough traffic to have the node marked as up some of the time. But why the periodic "not responding" ... "responding" cycles? Once it says "not responding" I can still scontrol ping from the compute node in question, and standard ICMP ping from one to the other works as well.<br>

<br>

Best,<br>

<br>

Sean<br>

<br>

<br></span><span>

On Wed, May 16, 2018 at 2:13 PM, Alex Chekholko <<a href="mailto:alex@calicolabs.com" target="_blank">alex@calicolabs.com</a> <mailto:<a href="mailto:alex@calicolabs.com" target="_blank">alex@calicolabs.com</a>>> wrote:<br>

<br>

    Add a logging rule to your iptables and look at what traffic is<br>

    actually being blocked?<br>

<br>

    On Wed, May 16, 2018 at 11:11 AM Sean Caron <<a href="mailto:scaron@umich.edu" target="_blank">scaron@umich.edu</a><br></span><div><div class="m_2215946678730725843m_8599250747320344412m_-4999014754154529261m_329189167299971363h5">

    <mailto:<a href="mailto:scaron@umich.edu" target="_blank">scaron@umich.edu</a>>> wrote:<br>

<br>

        Hi all,<br>

<br>

        Does anyone use SLURM in a scenario where there is an iptables<br>

        firewall on the compute nodes on the same network it uses to<br>

        communicate with the SLURM controller and DBD machine?<br>

<br>

        I have the very basic situation where ...<br>

<br>

        1. There is no iptables firewall enabled at all on the SLURM<br>

        controller/DBD machine.<br>

<br>

        2. Compute nodes are set to permit all ports and protocols from<br>

        the SLURM controller with a rule like:<br>

<br>

        -A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT<br>

<br>

        If I enable this on the compute nodes, they flap up and down in a<br>

        "Not responding" state. If I switch off the firewall on the<br>

        compute nodes, they work fine.<br>

<br>

        When firewall is up on the compute nodes, SLURM controller can<br>

        ping compute nodes, no problem. I have no reason to believe all<br>

        ports and protocols are not being passed. Time is synched. No<br>

        trouble accessing slurm.conf on any of the clients.<br>

<br>

        Has anyone seen this before? There seems to be very little<br>

        information about SLURM's interactions with iptables. I know<br>

        this is kind of a funky scenario but regulatory requirements<br>

        have me needing to tighten down our cluster network a little<br>

        bit. Is this like a latency issue, or ...?<br>

<br>

        Thanks,<br>

<br>

        Sean<br>

<br>

<br>

</div></div></blockquote>

<br>

</blockquote></div><br></div>

</div></div></blockquote></div><br></div>

</div></div></blockquote></div><br></div>

</div></div></blockquote></div><br></div>

</div></div></blockquote></div><br></div>
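<div>For anyone finding this thread later, the fix reported above (accept everything from the controller, plus 6818/TCP from the other compute nodes) can be sketched as a small rule generator. This is only an illustration under assumptions: the addresses 10.0.0.1 and 10.0.0.0/24 and the function name slurm_input_rules are hypothetical, and 6818 is simply the slurmd port seen in the thread, so check SlurmdPort in your own slurm.conf before using anything like this.</div>

```python
import ipaddress

# Port observed in the thread; assumed to match SlurmdPort in slurm.conf.
SLURMD_PORT = 6818


def slurm_input_rules(controller_ip, compute_cidr, slurmd_port=SLURMD_PORT):
    """Build iptables INPUT rules for a compute node (hypothetical helper).

    controller_ip -- IPv4 address of the Slurm controller
    compute_cidr  -- IPv4 network holding the other compute nodes
    """
    # Validate the inputs; both calls raise ValueError on malformed values.
    controller = ipaddress.IPv4Address(controller_ip)
    compute_net = ipaddress.IPv4Network(compute_cidr)
    if not 1 <= slurmd_port <= 65535:
        raise ValueError(f"invalid port: {slurmd_port}")

    return [
        # Original policy from the thread: everything from the controller.
        f"-A INPUT -s {controller}/32 -j ACCEPT",
        # Added rule: slurmd-to-slurmd traffic used by the communication tree.
        f"-A INPUT -s {compute_net} -p tcp --dport {slurmd_port} -j ACCEPT",
    ]


if __name__ == "__main__":
    # Example values only; substitute your own controller and cluster subnet.
    for rule in slurm_input_rules("10.0.0.1", "10.0.0.0/24"):
        print(rule)
```

<div>The output uses the same "-A INPUT ..." form as the rules quoted in the thread, so it could be pasted into an iptables rules file. It covers only what was reported to work here; login nodes running srun may need additional rules that this sketch does not attempt to model.</div>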