[slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled
Sean Caron
scaron at umich.edu
Mon May 21 13:21:08 MDT 2018
Just wanted to follow up. In addition to passing all traffic to the SLURM
controller, I opened port 6818/TCP to all the other compute nodes, and this
seems to have resolved the issue. Thanks again, Matthieu!
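
For reference, a rough sketch of the kind of rule involved (6818 is the
default SlurmdPort; the 10.0.0.0/16 range below is just a stand-in for the
cluster subnet, adjust for your own network):

    # allow slurmd-to-slurmd traffic from the other compute nodes
    -A INPUT -s 10.0.0.0/16 -p tcp --dport 6818 -j ACCEPT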
Best,
Sean
On Thu, May 17, 2018 at 8:06 PM, Sean Caron <scaron at umich.edu> wrote:
> Awesome tip. Thanks so much, Matthieu. I hadn't considered that. I will
> give that a shot and see what happens.
>
> Best,
>
> Sean
>
>
> On Thu, May 17, 2018 at 4:49 PM, Matthieu Hautreux <
> matthieu.hautreux at gmail.com> wrote:
>
>> Hi,
>>
>> Communications in Slurm are not only performed from the controller to slurmd
>> and from slurmd to the controller. You need to ensure that your login nodes
>> can reach the controller and the slurmd nodes, and that the slurmd daemons on
>> the various nodes can contact each other. This last requirement comes from
>> the tree logic used in Slurm communication:
>>
>> - to ensure scalability, slurmctld uses a communication tree (see
>> TreeWidth in "man slurm.conf"), used for example to periodically check that
>> all the nodes are working properly
>> - the exact same logic is used by srun when it contacts the various
>> slurmd daemons involved in its step
>> - reverse-tree communications are performed among the slurmds of a step at
>> its end, to send accounting data and other information back to the controller
>>
>> - only some communications are point-to-point between slurmd and the
>> controller, notably the "registering call" performed at slurmd startup.
>>
>> When slurmd daemons cannot contact each other because of network failures
>> (partitioning) or overly restrictive filtering, you see the kind of
>> flapping that you have. The point-to-point communication at slurmd
>> registration makes nodes appear up to the controller, the tree checks then
>> make some of them disappear, and retries can lead to point-to-point
>> communications with some nodes when the number of destination nodes contacted
>> by the controller at the same time is lower than the configured TreeWidth, so
>> those nodes suddenly reappear... until the next check... and so on.
>>
>> Two options for you (rough sketches below):
>>
>> - be less restrictive in your filtering rules
>> - set TreeWidth to 1 in slurm.conf, but you will lose the
>> performance/scalability of Slurm's internal communication
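>>
>> For example, something along these lines (6818 is the default SlurmdPort;
>> the 10.0.0.0/16 subnet is only an example range for the compute nodes):
>>
>>     # option 1: also accept slurmd-to-slurmd traffic between compute nodes
>>     -A INPUT -s 10.0.0.0/16 -p tcp --dport 6818 -j ACCEPT
>>
>>     # option 2: in slurm.conf, disable the communication tree
>>     TreeWidth=1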
>>
>> If your cluster is large, I would recommend the first one.
>>
>> HTH
>> Matthieu
>>
>> PS: you can look at that presentation for a few details on the
>> communication logic:
>> https://slurm.schedmd.com/SUG14/message_aggregation.pdf
>>
>>
>>
>> 2018-05-17 22:21 GMT+02:00 Sean Caron <scaron at umich.edu>:
>>
>>> Sorry, how do you mean? The environment is very basic. Compute nodes and the
>>> SLURM controller are on an RFC1918 subnet. Gateways are dual-homed, with one
>>> leg on a public IP and one leg on the RFC1918 cluster network. It used to
>>> be that nodes with only a leg on the RFC1918 network (compute nodes and
>>> the SLURM controller) had no firewall at all, and dual-homed nodes
>>> were set to simply permit all traffic from the cluster-side NIC
>>> (i.e. an iptables rule like -A INPUT -i ethX -j ACCEPT).
>>>
>>> Now, instead of just passing all traffic from the cluster-side NIC, we're
>>> going back to the gateways and compute nodes to actually codify which ports
>>> and protocols are in use, or at least what server-to-server communication
>>> is expected and normative, and then define a rule set that permits those
>>> while dropping any traffic not explicitly whitelisted.
>>>
>>> The compute and gateway nodes work fine with SLURM even when iptables is
>>> enabled and the policy is "permit all traffic from that NIC", but once we
>>> tighten it down just a little to "permit all traffic to and from the
>>> SLURM controller" we see these weird instances of node state flapping. It's
>>> not clear to me why this is the case, since from the standpoint of
>>> node-to-controller communications these policies are logically very similar,
>>> but there it is. The nodes shouldn't have to talk to anything besides the
>>> SLURM controller for SLURM to work, as long as time is synced between
>>> them and there are no issues with the nodes reading slurm.conf.
>>>
>>> Best,
>>>
>>> Sean
>>>
>>>
>>> On Thu, May 17, 2018 at 1:21 PM, Patrick Goetz <pgoetz at math.utexas.edu>
>>> wrote:
>>>
>>>> Does your SMS have a dedicated interface for node traffic?
>>>>
>>>> On 05/16/2018 04:00 PM, Sean Caron wrote:
>>>>
>>>>> I see some chatter on 6818/TCP from the compute node to the SLURM
>>>>> controller, and from the SLURM controller to the compute node.
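>>>>>
>>>>> A rough way to watch that chatter from the compute node (ethX standing in
>>>>> for the cluster-facing interface):
>>>>>
>>>>>     tcpdump -i ethX -nn 'tcp port 6818'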
>>>>>
>>>>> The policy is to permit all packets inbound from the SLURM controller
>>>>> regardless of port and protocol, and to perform no filtering whatsoever on
>>>>> outbound packets to anywhere. I wouldn't expect this to interfere.
>>>>>
>>>>> Anyway, it's not that it NEVER works once the firewall is switched on.
>>>>> It's that it flaps. The firewall is clearly passing enough traffic to have
>>>>> the node marked as up some of the time. But why the periodic "not
>>>>> responding" ... "responding" cycles? Once it says "not responding" I can
>>>>> still scontrol ping from the compute node in question, and standard ICMP
>>>>> ping from one to the other works as well.
>>>>>
>>>>> Best,
>>>>>
>>>>> Sean
>>>>>
>>>>>
>>>>> On Wed, May 16, 2018 at 2:13 PM, Alex Chekholko <alex at calicolabs.com
>>>>> <mailto:alex at calicolabs.com>> wrote:
>>>>>
>>>>> Add a logging rule to your iptables and look at what traffic is
>>>>> actually being blocked?
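>>>>>
>>>>> Something along these lines, placed just before the final drop (the
>>>>> prefix string is arbitrary):
>>>>>
>>>>>     -A INPUT -j LOG --log-prefix "iptables dropped: " --log-level 4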
>>>>>
>>>>> On Wed, May 16, 2018 at 11:11 AM Sean Caron <scaron at umich.edu
>>>>> <mailto:scaron at umich.edu>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Does anyone use SLURM in a scenario where there is an iptables
>>>>> firewall on the compute nodes, on the same network they use to
>>>>> communicate with the SLURM controller and DBD machine?
>>>>>
>>>>> I have the very basic situation where ...
>>>>>
>>>>> 1. There is no iptables firewall enabled at all on the SLURM
>>>>> controller/DBD machine.
>>>>>
>>>>> 2. Compute nodes are set to permit all ports and protocols from
>>>>> the SLURM controller with a rule like:
>>>>>
>>>>> -A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
>>>>>
>>>>> If I enable this on the compute nodes, they flap up and down in
>>>>> "Not responding" state. If I switch off the firewall on the
>>>>> compute nodes, they work fine.
>>>>>
>>>>> When the firewall is up on the compute nodes, the SLURM controller can
>>>>> ping the compute nodes, no problem. I have no reason to believe any
>>>>> ports or protocols are being blocked. Time is synced. No
>>>>> trouble accessing slurm.conf on any of the clients.
>>>>>
>>>>> Has anyone seen this before? There seems to be very little
>>>>> information about SLURM's interactions with iptables. I know
>>>>> this is kind of a funky scenario, but regulatory requirements
>>>>> mean I need to tighten down our cluster network a little
>>>>> bit. Is this a latency issue, or ...?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Sean
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>