[slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

Matthieu Hautreux matthieu.hautreux at gmail.com
Mon May 21 13:36:01 MDT 2018


Glad to hear that you made it work.

Regards
Matthieu


2018-05-21 21:21 GMT+02:00 Sean Caron <scaron at umich.edu>:

> Just wanted to follow up. In addition to passing all traffic to the SLURM
> controller, I opened port 6818/TCP to all other compute nodes, and this
> seems to have resolved the issue. Thanks again, Matthieu!
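>
> In iptables terms, the resulting per-node policy is roughly the following
> (the controller address and cluster subnet below are placeholders, and
> 6818 is only the default SlurmdPort):
>
>     -A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
>     -A INPUT -s 10.0.0.0/16 -p tcp --dport 6818 -j ACCEPT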
>
> Best,
>
> Sean
>
>
> On Thu, May 17, 2018 at 8:06 PM, Sean Caron <scaron at umich.edu> wrote:
>
>> Awesome tip. Thanks so much, Matthieu. I hadn't considered that. I will
>> give that a shot and see what happens.
>>
>> Best,
>>
>> Sean
>>
>>
>> On Thu, May 17, 2018 at 4:49 PM, Matthieu Hautreux <
>> matthieu.hautreux at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Communications in Slurm are not only performed from the controller to
>>> slurmd and from slurmd to the controller. You need to ensure that your
>>> login nodes can reach the controller and the slurmd nodes, and also that
>>> the slurmd daemons on the various nodes can contact each other. This last
>>> requirement comes from the tree logic used in Slurm communication:
>>>
>>> - to ensure scalability, slurmctld uses a communication tree (see
>>> TreeWidth in "man slurm.conf", and the check shown below), used for
>>> example to periodically verify that all the nodes are working properly
>>> - the exact same logic is used by srun when it contacts the various
>>> slurmd daemons involved in its step
>>> - reversed tree communications are performed among the slurmds of a step
>>> at its end to send accounting data and other information back to the
>>> controller
>>>
>>> - only some communications are point-to-point between slurmd and the
>>> controller, notably the "registering call" performed at slurmd startup.
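>>>
>>> A quick way to see the TreeWidth actually in effect, assuming a standard
>>> Slurm client installation, is:
>>>
>>>     scontrol show config | grep TreeWidth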
>>>
>>> When the slurmd daemons cannot contact each other because of network
>>> failures (partitioning) or overly restrictive filtering, you see the kind
>>> of flapping that you have. The point-to-point communication performed at
>>> slurmd registration makes the nodes appear to the controller, the tree
>>> checks then make some of them disappear, and retries can lead to
>>> point-to-point communications to some nodes when the number of
>>> destination nodes contacted by the controller at one time is lower than
>>> the configured TreeWidth, so nodes suddenly reappear... until the next
>>> check... and so on.
>>>
>>> Two options for you:
>>>
>>> - be less restrictive in your filtering rules
>>> - set TreeWidth to 1 in slurm.conf, but you will lose the
>>> performance/scalability of Slurm's internal communications
>>>
>>> If your cluster is large, I would recommend the first one.
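>>>
>>> For the first option, this typically means allowing the SlurmdPort
>>> (6818/tcp by default) between all the compute nodes in addition to the
>>> controller traffic. The second option, as a sketch, is a single line in
>>> slurm.conf followed by a restart of the daemons:
>>>
>>>     TreeWidth=1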
>>>
>>> HTH
>>> Matthieu
>>>
>>> PS: you can look at this presentation for a few details on the
>>> communication logic:
>>> https://slurm.schedmd.com/SUG14/message_aggregation.pdf
>>>
>>>
>>>
>>> 2018-05-17 22:21 GMT+02:00 Sean Caron <scaron at umich.edu>:
>>>
>>>> Sorry, how do you mean? The environment is very basic. Compute nodes
>>>> and the SLURM controller are on an RFC1918 subnet. Gateways are dual-homed,
>>>> with one leg on a public IP and one leg on the RFC1918 cluster network. It
>>>> used to be that nodes with only a leg on the RFC1918 network (compute nodes
>>>> and the SLURM controller) had no firewall at all, and dual-homed nodes were
>>>> basically set to permit all traffic from the cluster-side NIC (i.e., an
>>>> iptables rule like -A INPUT -i ethX -j ACCEPT).
>>>>
>>>> Now we're trying to go back to the gateways and compute nodes and, instead
>>>> of just passing all traffic from the cluster-side NIC, actually codify what
>>>> ports and protocols are in use, or at least what server-to-server
>>>> communication is expected and normative, and then define a rule set that
>>>> permits those while dropping other traffic not explicitly whitelisted.
>>>>
>>>> The compute and gateway nodes work fine with SLURM even when iptables
>>>> is enabled and the policy is "permit all traffic from that NIC" but once we
>>>> tighten it down just a little bit to "permit all traffic to and from the
>>>> SLURM controller," we see these weird instances of node state flapping. It's
>>>> not clear to me why this is the case, since from the standpoint of
>>>> node-to-controller communications these policies are logically very similar,
>>>> but there it is. The nodes shouldn't have to talk to anything else besides the
>>>> SLURM controller for SLURM to work, so long as time is synched up between
>>>> them and there are no issues with the nodes getting to slurm.conf.
>>>>
>>>> Best,
>>>>
>>>> Sean
>>>>
>>>>
>>>> On Thu, May 17, 2018 at 1:21 PM, Patrick Goetz <pgoetz at math.utexas.edu>
>>>> wrote:
>>>>
>>>>> Does your SMS have a dedicated interface for node traffic?
>>>>>
>>>>> On 05/16/2018 04:00 PM, Sean Caron wrote:
>>>>>
>>>>>> I see some chatter on 6818/TCP from the compute node to the SLURM
>>>>>> controller, and from the SLURM controller to the compute node.
>>>>>>
>>>>>> The policy is to permit all packets inbound from the SLURM controller
>>>>>> regardless of port and protocol, and to perform no filtering whatsoever on
>>>>>> any outbound packets to anywhere. I wouldn't expect this to interfere.
>>>>>>
>>>>>> Anyway, it's not that it NEVER works once the firewall is switched
>>>>>> on. It's that it flaps. The firewall is clearly passing enough traffic to
>>>>>> have the node marked as up some of the time. But why the periodic "not
>>>>>> responding" ... "responding" cycles? Once it says "not responding" I can
>>>>>> still scontrol ping from the compute node in question, and standard ICMP
>>>>>> ping from one to the other works as well.
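>>>>>>
>>>>>> For reference, TCP reachability on the default Slurm ports (6817 for
>>>>>> slurmctld, 6818 for slurmd) can be spot-checked with something like nc,
>>>>>> if it's available; the hostnames below are placeholders:
>>>>>>
>>>>>>     nc -zv slurm-controller 6817
>>>>>>     nc -zv compute-node 6818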
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Sean
>>>>>>
>>>>>>
>>>>>> On Wed, May 16, 2018 at 2:13 PM, Alex Chekholko <alex at calicolabs.com>
>>>>>> wrote:
>>>>>>
>>>>>>     Add a logging rule to your iptables and look at what traffic is
>>>>>>     actually being blocked?
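>>>>>>
>>>>>>     For example (a sketch; the log prefix string is arbitrary), a LOG
>>>>>>     rule placed just before the final DROP:
>>>>>>
>>>>>>         -A INPUT -j LOG --log-prefix "iptables dropped: "
>>>>>>
>>>>>>     and then the kernel log (dmesg or /var/log/messages) shows which
>>>>>>     ports are actually being hit.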
>>>>>>
>>>>>>     On Wed, May 16, 2018 at 11:11 AM Sean Caron <scaron at umich.edu>
>>>>>>     wrote:
>>>>>>
>>>>>>         Hi all,
>>>>>>
>>>>>>         Does anyone use SLURM in a scenario where there is an iptables
>>>>>>         firewall on the compute nodes on the same network it uses to
>>>>>>         communicate with the SLURM controller and DBD machine?
>>>>>>
>>>>>>         I have the very basic situation where ...
>>>>>>
>>>>>>         1. There is no iptables firewall enabled at all on the SLURM
>>>>>>         controller/DBD machine.
>>>>>>
>>>>>>         2. Compute nodes are set to permit all ports and protocols from
>>>>>>         the SLURM controller with a rule like:
>>>>>>
>>>>>>         -A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
>>>>>>
>>>>>>         If I enable this on the compute nodes, they flap up and down in
>>>>>>         "Not responding" state. If I switch off the firewall on the
>>>>>>         compute nodes, they work fine.
>>>>>>
>>>>>>         When the firewall is up on the compute nodes, the SLURM
>>>>>>         controller can ping the compute nodes, no problem. I have no
>>>>>>         reason to believe all ports and protocols are not being passed.
>>>>>>         Time is synched. No trouble accessing slurm.conf on any of the
>>>>>>         clients.
>>>>>>
>>>>>>         Has anyone seen this before? There seems to be very little
>>>>>>         information about SLURM's interactions with iptables. I know
>>>>>>         this is kind of a funky scenario but regulatory requirements
>>>>>>         have me needing to tighten down our cluster network a little
>>>>>>         bit. Is this like a latency issue, or ...?
>>>>>>
>>>>>>         Thanks,
>>>>>>
>>>>>>         Sean
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>