[slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled

Patrick Goetz pgoetz at math.utexas.edu
Thu May 17 11:21:34 MDT 2018


Does your SMS have a dedicated interface for node traffic?

On 05/16/2018 04:00 PM, Sean Caron wrote:
> I see some chatter on 6818/TCP from the compute node to the SLURM 
> controller, and from the SLURM controller to the compute node.
> 
> The policy is to permit all packets inbound from SLURM controller 
> regardless of port and protocol, and perform no filtering whatsoever on 
> any output packets to anywhere. I wouldn't expect this to interfere.
> 
> Anyway, it's not that it NEVER works once the firewall is switched on. 
> It's that it flaps. The firewall is clearly passing enough traffic to 
> have the node marked as up some of the time. But why the periodic "not 
> responding" ... "responding" cycles? Once it says "not responding" I can 
> still scontrol ping from the compute node in question, and standard ICMP 
> ping from one to the other works as well.
> 
> Best,
> 
> Sean
> 
> 
> On Wed, May 16, 2018 at 2:13 PM, Alex Chekholko <alex at calicolabs.com> wrote:
> 
>     Add a logging rule to your iptables and look at what traffic is
>     actually being blocked?
> 
>     On Wed, May 16, 2018 at 11:11 AM Sean Caron <scaron at umich.edu> wrote:
> 
>         Hi all,
> 
>         Does anyone use SLURM in a scenario where there is an iptables
>         firewall on the compute nodes on the same network it uses to
>         communicate with the SLURM controller and DBD machine?
> 
>         I have the very basic situation where ...
> 
>         1. There is no iptables firewall enabled at all on the SLURM
>         controller/DBD machine.
> 
>         2. Compute nodes are set to permit all ports and protocols from
>         the SLURM controller with a rule like:
> 
>         -A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
> 
>         If I enable this on the compute nodes, they flap up and down in
>         the "Not responding" state. If I switch off the firewall on the
>         compute nodes, they work fine.
> 
>         When the firewall is up on the compute nodes, the SLURM
>         controller can ping the compute nodes, no problem. I have no
>         reason to believe all ports and protocols are not being passed.
>         Time is synced. No trouble accessing slurm.conf on any of the
>         clients.
> 
>         Has anyone seen this before? There seems to be very little
>         information about SLURM's interactions with iptables. I know
>         this is kind of a funky scenario but regulatory requirements
>         have me needing to tighten down our cluster network a little
>         bit. Is this like a latency issue, or ...?
> 
>         Thanks,
> 
>         Sean
> 
> 
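
For reference, the logging rule Alex suggests in the quoted thread might
look something like the sketch below. It is only an illustration: the
helper name print_rules, the address 192.0.2.10 (a documentation
placeholder standing in for IP.of.SLURM.controller), the rate limit and
the log prefix are all assumptions, and the rest of Sean's ruleset is not
shown in the thread. The script only prints two rules in the same
"-A INPUT" syntax used above; it does not touch the running firewall.

```sh
#!/bin/sh
# Sketch: print an ACCEPT rule for the controller followed by a
# rate-limited LOG rule for whatever was not accepted earlier.
# print_rules is a hypothetical helper; $1 is the controller's address.
print_rules() {
    ip="$1"
    # Accept everything from the SLURM controller (rule from the thread).
    printf '%s\n' "-A INPUT -s ${ip}/32 -j ACCEPT"
    # Log packets that reach this point; limit and prefix are assumptions.
    printf '%s\n' '-A INPUT -m limit --limit 5/min -j LOG --log-prefix "INPUT-unmatched: "'
}

# 192.0.2.10 is a placeholder address, not a value from the thread.
print_rules 192.0.2.10
```

If the LOG rule sits after the ACCEPT rule and before whatever finally
drops or rejects packets, the kernel log (dmesg or syslog) should show
the source address and destination port of traffic that the ACCEPT rule
did not match, tagged with the chosen prefix.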
