[slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

Thu May 17 18:06:39 MDT 2018

Awesome tip. Thanks so much, Matthieu. I hadn't considered that. I will
give that a shot and see what happens.
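
A minimal sketch of what the first option below (less restrictive
filtering) might look like on a compute node, written in the same
iptables-save style as the rule from the original post. The 10.0.0.0/24
subnet is a hypothetical placeholder for the RFC1918 cluster network, and
6818/TCP is assumed to be the slurmd port because that is where traffic was
observed earlier in this thread. Untested; adjust to the actual
configuration.

```
# Accept replies to connections this node initiated itself.
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Existing rule: everything from the SLURM controller.
-A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
# Assumption: peer slurmd daemons and login nodes live in 10.0.0.0/24
# and reach this node's slurmd on 6818/TCP (tree communication, srun).
-A INPUT -s 10.0.0.0/24 -p tcp -m tcp --dport 6818 -j ACCEPT
# Log (rate-limited) whatever is about to be dropped, then drop it.
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "slurm-fw-drop: "
-A INPUT -j DROP
```

The LOG rule follows Alex's suggestion further down the thread: anything
else that still needs to get through should show up in the kernel log with
that prefix before it is dropped.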

Best,

Sean

On Thu, May 17, 2018 at 4:49 PM, Matthieu Hautreux <
matthieu.hautreux at gmail.com> wrote:

> Hi,
>
> Communications in Slurm are not only performed from controller to slurmd
> and from slurmd to controller. You need to ensure that your login nodes can
> reach the controller and the slurmd nodes as well as ensure that slurmd on
> the various nodes can contact each other. This last requirement is because
> of the tree logic used in slurm communication :
>
> - to ensure scalability, slurmctld uses a communication tree (see TreeWidth
> in "man slurm.conf"), used for example to periodically check that all the
> nodes are working properly
> - the same exact logic is used by srun when it contacts the various slurmd
> involved in its step
> - reversed tree communications are performed among slurmds of steps at
> their end to send accounting data and other stuff to the controller
>
> - only some communications are point-to-point between slurmd and
> controller, especially the "registering call" performed at slurmd startup.
>
> When the slurmds cannot contact each other because of network failures
> (partitioning) or overly restrictive filtering, you see the kind of
> flapping that you have. Point-to-point communication at slurmd
> registration makes the nodes appear to the controller, and tree checks then
> make some of them disappear. Retries can lead to point-to-point
> communication with some nodes when the number of destination nodes
> contacted by the controller at the same time is lower than the configured
> TreeWidth, so those nodes suddenly reappear... until the next check... and
> so on.
>
> Two options for you :
>
> - be less restrictive in your filtering rules
> - set TreeWidth to 1 in slurm.conf, but you will lose the
> performance/scalability of Slurm's internal communication
>
> If your cluster is large, I would recommend the first one.
>
> HTH
> Matthieu
>
> PS : you can look at that presentation for a few details on the
> communication logic :
> https://slurm.schedmd.com/SUG14/message_aggregation.pdf
>
>
>
> 2018-05-17 22:21 GMT+02:00 Sean Caron <scaron at umich.edu>:
>
>> Sorry, how do you mean? The environment is very basic. Compute nodes and
>> SLURM controller are on an RFC1918 subnet. Gateways are dual homed with one
>> leg on a public IP and one leg on the RFC1918 cluster network. It used to
>> be that nodes that only had a leg on the RFC1918 network (compute nodes and
>> the SLURM controller) had no firewall at all and nodes that were dual homed
>> basically were set to just permit all traffic from the cluster side NIC
>> (i.e. iptables rule like -A INPUT -i ethX -j ACCEPT).
>>
>> Now we're trying to go back to the gateways and compute nodes and
>> actually codify, instead of just passing all traffic from the cluster side
>> NIC, what ports and protocols are actually in use, or at least, what
>> server-to-server communication is expected and normative, and then define a
>> rule set to permit those while dropping other traffic not explicitly
>> whitelisted.
>>
>> The compute and gateway nodes work fine with SLURM even when iptables is
>> enabled and the policy is "permit all traffic from that NIC" but once we
>> tighten it down just a little bit to "permit all traffic to and from the
>> SLURM controller" we see these weird instances of node state flapping. It's
>> not clear to me why this is the case since from the standpoint of node to
>> controller communications, these policies are logically very similar, but
>> there it is. The nodes shouldn't have to talk to anything else besides the
>> SLURM controller for SLURM to work, so long as time is synched up between
>> them and there are no issues with the nodes getting to slurm.conf.
>>
>> Best,
>>
>> Sean
>>
>>
>> On Thu, May 17, 2018 at 1:21 PM, Patrick Goetz <pgoetz at math.utexas.edu>
>> wrote:
>>
>>> Does your SMS have a dedicated interface for node traffic?
>>>
>>> On 05/16/2018 04:00 PM, Sean Caron wrote:
>>>
>>>> I see some chatter on 6818/TCP from the compute node to the SLURM
>>>> controller, and from the SLURM controller to the compute node.
>>>>
>>>> The policy is to permit all packets inbound from SLURM controller
>>>> regardless of port and protocol, and perform no filtering whatsoever on any
>>>> output packets to anywhere. I wouldn't expect this to interfere.
>>>>
>>>> Anyway, it's not that it NEVER works once the firewall is switched on.
>>>> It's that it flaps. The firewall is clearly passing enough traffic to have
>>>> the node marked as up some of the time. But why the periodic "not
>>>> responding" ... "responding" cycles? Once it says "not responding" I can
>>>> still scontrol ping from the compute node in question, and standard ICMP
>>>> ping from one to the other works as well.
>>>>
>>>> Best,
>>>>
>>>> Sean
>>>>
>>>>
>>>> On Wed, May 16, 2018 at 2:13 PM, Alex Chekholko <alex at calicolabs.com
>>>> <mailto:alex at calicolabs.com>> wrote:
>>>>
>>>>     Add a logging rule to your iptables and look at what traffic is
>>>>     actually being blocked?
>>>>
>>>>     On Wed, May 16, 2018 at 11:11 AM Sean Caron <scaron at umich.edu
>>>>     <mailto:scaron at umich.edu>> wrote:
>>>>
>>>>         Hi all,
>>>>
>>>>         Does anyone use SLURM in a scenario where there is an iptables
>>>>         firewall on the compute nodes on the same network it uses to
>>>>         communicate with the SLURM controller and DBD machine?
>>>>
>>>>         I have the very basic situation where ...
>>>>
>>>>         1. There is no iptables firewall enabled at all on the SLURM
>>>>         controller/DBD machine.
>>>>
>>>>         2. Compute nodes are set to permit all ports and protocols from
>>>>         the SLURM controller with a rule like:
>>>>
>>>>         -A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
>>>>
>>>>         If I enable this on the compute nodes, they flap up and down in
>>>>         "Not responding" state. If I switch off the firewall on the
>>>>         compute nodes, they work fine.
>>>>
>>>>         When firewall is up on the compute nodes, SLURM controller can
>>>>         ping compute nodes, no problem. I have no reason to believe all
>>>>         ports and protocols are not being passed. Time is synched. No
>>>>         trouble accessing slurm.conf on any of the clients.
>>>>
>>>>         Has anyone seen this before? There seems to be very little
>>>>         information about SLURM's interactions with iptables. I know
>>>>         this is kind of a funky scenario but regulatory requirements
>>>>         have me needing to tighten down our cluster network a little
>>>>         bit. Is this like a latency issue, or ...?
>>>>
>>>>         Thanks,
>>>>
>>>>         Sean
>>>>
>>>>
>>>>
>>>
>>
>