fix(network_policy): consolidate policy chains #906

Closed · wants to merge 1 commit from the consolidate_network_policy_chains branch

Conversation

@aauren (Collaborator) commented May 22, 2020

@murali-reddy

Instead of attaching each individual pod FW chain to each primary chain in the filter table (INPUT, FORWARD, and OUTPUT), consolidate all of the rules into a chain called KUBE-ROUTER-FIREWALL.

This reduces the number of rules we have to track by about 4 per pod per network policy, and it allows us to better control where network policy rules are placed relative to service rules and other rules in the primary filter chains.

Fixes #905 by ensuring that KUBE-ROUTER-SERVICES is always evaluated first in the INPUT chain, before the new global firewall chain KUBE-ROUTER-FIREWALL. This way, access to services that are local to the node passes through INPUT (while the traffic is still src -> pod IP addr : dst -> service VIP), gets properly disambiguated via IPVS, and then gets filtered correctly when it passes through OUTPUT (by which point it is src -> pod IP addr : dst -> pod IP addr of the service endpoint).
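For illustration, here is a minimal sketch of the consolidation described above, assuming the coreos go-iptables library (which kube-router uses for iptables management). The pod chain name, pod IP, and exact rule specs are placeholders for this example, not the actual rules from the PR:

```go
package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatal(err)
	}

	// Ensure the consolidated chain exists in the filter table
	// (ClearChain creates it if missing, and flushes it otherwise).
	if err := ipt.ClearChain("filter", "KUBE-ROUTER-FIREWALL"); err != nil {
		log.Fatal(err)
	}

	// One hook point per built-in chain instead of one jump per pod chain.
	for _, builtin := range []string{"INPUT", "FORWARD", "OUTPUT"} {
		if err := ipt.AppendUnique("filter", builtin, "-j", "KUBE-ROUTER-FIREWALL"); err != nil {
			log.Fatal(err)
		}
	}

	// Per-pod firewall chains now hang off the consolidated chain.
	// The chain name and pod IP below are hypothetical placeholders.
	podChain := "KUBE-POD-FW-EXAMPLE"
	if err := ipt.ClearChain("filter", podChain); err != nil {
		log.Fatal(err)
	}
	if err := ipt.AppendUnique("filter", "KUBE-ROUTER-FIREWALL",
		"-d", "10.244.1.5/32", "-j", podChain); err != nil {
		log.Fatal(err)
	}
}
```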

@murali-reddy (Member)
@aauren unfortunately this PR breaks the usability of kube-router's network policy implementation as something that can be used independently with any service proxy and CNI:

  • many users use kube-router as the CNI and network policy implementation along with kube-proxy
  • relying on the node's pod CIDR is specific to kube-router acting as the CNI, so matching by pod IP is needed for a generic implementation

@aauren (Collaborator, Author) commented May 25, 2020

@murali-reddy I understand what you're saying. I now see why you don't rely on podCIDR much in the current code base.

However, this PR could easily be adapted to instead use an ipset that contains all of the pod IP addresses that are local to the node, rather than relying on the podCIDR. This would allow different CNIs to handle their IP address provisioning differently while still implementing single hook points into the primary iptables filter chains. Relying on these single hooks in the primary chains will make our rules easier to maintain, less invasive to the host, and also reduce the number of rules that we have to add to the host, which improves sync performance.
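A rough sketch of what maintaining such a local-pod ipset could look like, shelling out to the ipset binary. The set name kube-router-local-pods and the helper function are hypothetical, and a real implementation would also need to remove stale entries:

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

// syncLocalPodSet is a hypothetical helper: it ensures the set exists and adds
// the given node-local pod IPs. The "-exist" flag makes both calls idempotent.
func syncLocalPodSet(podIPs []string) error {
	if out, err := exec.Command("ipset", "create", "kube-router-local-pods", "hash:ip", "-exist").CombinedOutput(); err != nil {
		return fmt.Errorf("ipset create failed: %v: %s", err, out)
	}
	for _, ip := range podIPs {
		if out, err := exec.Command("ipset", "add", "kube-router-local-pods", ip, "-exist").CombinedOutput(); err != nil {
			return fmt.Errorf("ipset add of %s failed: %v: %s", ip, err, out)
		}
	}
	return nil
}

func main() {
	// Example IPs only; in practice these would come from the node's local pod list.
	if err := syncLocalPodSet([]string{"10.244.1.5", "10.244.1.9"}); err != nil {
		log.Fatal(err)
	}
}
```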

In terms of being able to prioritize service rules for other service proxies (like kube-proxy) on the INPUT chain I would be interested to hear your thoughts on possible solutions. I would argue that there may not be a clean way to make this work for all of the potential proxy solutions that someone could use.

Unless we keep a list of all of the potential chains from the main service proxy providers and look for chains by name (like I did with KUBE-ROUTER-SERVICES), we're left with either insert or append semantics, both of which will likely break other proxy and firewall offerings that are concurrently manipulating iptables. In this case, append will break the default firewall engine that comes with Fedora, CentOS, and Red Hat, which at least some of our users use, and insert will likely break most other service proxies.
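As a sketch of the "look for chains by name" idea: the firewall jump could be inserted immediately after an existing service chain jump when one is found in INPUT, and appended otherwise. This assumes the coreos go-iptables library and that the proxy's jump rule mentions KUBE-ROUTER-SERVICES; a real implementation would also need to avoid inserting duplicate rules on each sync:

```go
package main

import (
	"log"
	"strings"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatal(err)
	}

	// "iptables -S"-style listing: index 0 is the chain policy, so entry i
	// (for i >= 1) corresponds to rule number i in the chain.
	rules, err := ipt.List("filter", "INPUT")
	if err != nil {
		log.Fatal(err)
	}

	pos := 0
	for i, rule := range rules {
		if strings.Contains(rule, "KUBE-ROUTER-SERVICES") {
			pos = i + 1 // place our jump right after the service proxy's jump
			break
		}
	}

	if pos > 0 {
		err = ipt.Insert("filter", "INPUT", pos, "-j", "KUBE-ROUTER-FIREWALL")
	} else {
		// No known proxy chain found: fall back to append semantics.
		err = ipt.AppendUnique("filter", "INPUT", "-j", "KUBE-ROUTER-FIREWALL")
	}
	if err != nil {
		log.Fatal(err)
	}
}
```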

In any case, I think that at the very least making sure that we don't break our own use case (like this PR does) seems prudent. While it is a huge strength that kube-router is so modular and is able to work with many other proxy, firewall, and routing solutions, this may be a case where trying to be all things to all people may not be possible for us.

@aauren closed this May 27, 2020
@murali-reddy (Member) commented May 28, 2020

Thanks @aauren for agreeing to close this PR in favour of #909, #914, and #915. Clarifying the questions you raised:

> However, this PR could easily be adapted to instead use an ipset that contains all of the pod IP addresses that are local to the node, rather than relying on the podCIDR. This would allow different CNIs to handle their IP address provisioning differently while still implementing single hook points into the primary iptables filter chains.

Sure, we could optimize by maintaining an ipset. Right now, as part of #909, I added a rule in each built-in chain, e.g. in the INPUT chain, to jump traffic to KUBE-ROUTER-INPUT. That rule currently applies to all traffic. Modifying it to jump only when the source or destination matches a set of local pod IPs would be the optimization.
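A minimal sketch of that optimization, assuming the coreos go-iptables library; the set name kube-router-local-pods is hypothetical, while KUBE-ROUTER-INPUT is the chain referred to above:

```go
package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatal(err)
	}

	// Two rules, because a hash:ip set is matched against one direction at a
	// time: one for packets sourced from a local pod, one for packets headed
	// to a local pod. All other traffic skips the kube-router chain entirely.
	for _, dir := range []string{"src", "dst"} {
		err := ipt.AppendUnique("filter", "INPUT",
			"-m", "set", "--match-set", "kube-router-local-pods", dir,
			"-j", "KUBE-ROUTER-INPUT")
		if err != nil {
			log.Fatal(err)
		}
	}
}
```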

> Relying on these single hooks in the primary chains will make our rules easier to maintain, less invasive to the host, and also reduce the number of rules that we have to add to the host, which improves sync performance.

#909 makes it less invasive now

> In terms of being able to prioritize service rules for other service proxies (like kube-proxy) on the INPUT chain I would be interested to hear your thoughts on possible solutions. I would argue that there may not be a clean way to make this work for all of the potential proxy solutions that someone could use.

The best way is not to make any assumptions/expectations about the proxy, to keep things modular and loosely coupled. #914 does this in a proxy-agnostic manner.

@aauren deleted the consolidate_network_policy_chains branch September 4, 2023 15:20
Successfully merging this pull request may close these issues.

Egress network policies should not be applied to pod to service VIP traffic