[slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled
Patrick Goetz
pgoetz at math.utexas.edu
Thu May 17 11:21:34 MDT 2018
Does your SMS have a dedicated interface for node traffic?
On 05/16/2018 04:00 PM, Sean Caron wrote:
> I see some chatter on 6818/TCP from the compute node to the SLURM
> controller, and from the SLURM controller to the compute node.
>
> The policy is to permit all packets inbound from SLURM controller
> regardless of port and protocol, and perform no filtering whatsoever on
> any output packets to anywhere. I wouldn't expect this to interfere.
>
> Anyway, it's not that it NEVER works once the firewall is switched on.
> It's that it flaps. The firewall is clearly passing enough traffic to
> have the node marked as up some of the time. But why the periodic "not
> responding" ... "responding" cycles? Once it says "not responding" I can
> still scontrol ping from the compute node in question, and standard ICMP
> ping from one to the other works as well.
>
> Best,
>
> Sean
>
>
> On Wed, May 16, 2018 at 2:13 PM, Alex Chekholko <alex at calicolabs.com> wrote:
>
> Add a logging rule to your iptables and look at what traffic is
> actually being blocked?
>
> On Wed, May 16, 2018 at 11:11 AM Sean Caron <scaron at umich.edu> wrote:
>
> Hi all,
>
> Does anyone use SLURM in a scenario where there is an iptables
> firewall on the compute nodes on the same network it uses to
> communicate with the SLURM controller and DBD machine?
>
> I have the very basic situation where ...
>
> 1. There is no iptables firewall enabled at all on the SLURM
> controller/DBD machine.
>
> 2. Compute nodes are set to permit all ports and protocols from
> the SLURM controller with a rule like:
>
> -A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
>
> If I enable this on the compute nodes, they flap up and down in the
> "Not responding" state. If I switch off the firewall on the
> compute nodes, they work fine.
>
> When the firewall is up on the compute nodes, the SLURM controller can
> ping compute nodes, no problem. I have no reason to believe all
> ports and protocols are not being passed. Time is synched. No
> trouble accessing slurm.conf on any of the clients.
>
> Has anyone seen this before? There seems to be very little
> information about SLURM's interactions with iptables. I know
> this is kind of a funky scenario but regulatory requirements
> have me needing to tighten down our cluster network a little
> bit. Is this like a latency issue, or ...?
>
> Thanks,
>
> Sean
>
>
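For reference, the logging rule Alex suggests in the quoted text could look roughly like the fragment below. This is only a sketch in the same iptables-save style as Sean's rule: the log prefix string, the rate limit, and the placement directly after the controller ACCEPT rule are illustrative assumptions, not anything taken from Sean's actual ruleset. It also assumes that packets not matched by the ACCEPT rule end up dropped, either by the chain policy or by a later rule.

```
# Existing rule from the thread: accept everything from the controller.
-A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
# Assumed addition: log (rate-limited) whatever the rule above did not accept.
# The prefix "slurm-fw-drop: " is a made-up label to make the entries easy to find.
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "slurm-fw-drop: " --log-level 4
```

LOG is a non-terminating target, so the packet continues on to whatever rule or policy follows. The entries go to the kernel log, and the SRC=, PROTO= and DPT= fields in each one should show which hosts and ports are being turned away while a node is marked "not responding".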