[slurm-users] SLURM nodes flap in "Not responding" status when iptables firewall enabled
Sean Caron
scaron at umich.edu
Wed May 16 15:00:01 MDT 2018
I see some chatter on 6818/TCP from the compute node to the SLURM
controller, and from the SLURM controller to the compute node.
The policy is to permit all packets inbound from the SLURM controller,
regardless of port and protocol, and to perform no filtering whatsoever on
outbound packets to anywhere. I wouldn't expect this to interfere.
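Something along these lines will show that traffic (eth0 here is just a
placeholder for whatever interface the node uses to reach the controller, and
6817 is the default slurmctld port unless slurm.conf overrides it):

# On a compute node: watch SLURM traffic to/from the controller
tcpdump -n -i eth0 host IP.of.SLURM.controller and '(tcp port 6817 or tcp port 6818)'
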
Anyway, it's not that it NEVER works once the firewall is switched on. It's
that it flaps. The firewall is clearly passing enough traffic to have the
node marked as up some of the time. But why the periodic "not responding"
... "responding" cycles? Once it says "not responding" I can still scontrol
ping from the compute node in question, and standard ICMP ping from one to
the other works as well.
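The flapping itself is easy to watch with standard SLURM commands (compute-01
below is just a placeholder node name):

# On the controller: list nodes currently down or not responding, with reasons
sinfo -R
# Show the state and reason recorded for a single node
scontrol show node compute-01
# On the compute node: confirm the controller is reachable and responding
scontrol ping
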
Best,
Sean
On Wed, May 16, 2018 at 2:13 PM, Alex Chekholko <alex at calicolabs.com> wrote:
> Add a logging rule to your iptables and look at what traffic is actually
> being blocked?
>
> On Wed, May 16, 2018 at 11:11 AM Sean Caron <scaron at umich.edu> wrote:
>
>> Hi all,
>>
>> Does anyone use SLURM in a scenario where there is an iptables firewall
>> on the compute nodes, on the same network they use to communicate with
>> the SLURM controller and DBD machine?
>>
>> I have the very basic situation where ...
>>
>> 1. There is no iptables firewall enabled at all on the SLURM
>> controller/DBD machine.
>>
>> 2. Compute nodes are set to permit all ports and protocols from the SLURM
>> controller with a rule like:
>>
>> -A INPUT -s IP.of.SLURM.controller/32 -j ACCEPT
>>
>> If I enable this on the compute nodes, they flap up and down in "Not
>> responding" state. If I switch off the firewall on the compute nodes, they
>> work fine.
>>
>> When the firewall is up on the compute nodes, the SLURM controller can
>> ping the compute nodes with no problem. I have no reason to believe any
>> ports or protocols are being blocked. Time is synced. There is no trouble
>> accessing slurm.conf on any of the clients.
>>
>> Has anyone seen this before? There seems to be very little information
>> about SLURM's interactions with iptables. I know this is kind of a funky
>> scenario but regulatory requirements have me needing to tighten down our
>> cluster network a little bit. Is this like a latency issue, or ...?
>>
>> Thanks,
>>
>> Sean
>>
>>
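(Following up on Alex's logging suggestion quoted above: a rule along these
lines, inserted ahead of the chain's final drop or reject, would log whatever
the compute node firewall is actually discarding. This is only a sketch; the
log prefix and rate limit are arbitrary choices.)

-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables dropped: " --log-level 4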