Support applying ClusterNetworkPolicy to Nodes in Antrea #5671

Closed
tnqn opened this issue Nov 6, 2023 · 6 comments
Labels
area/network-policy Issues or PRs related to network policies. kind/design Categorizes issue or PR as related to design. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@tnqn
Member

tnqn commented Nov 6, 2023

Describe what you are trying to solve

#4213 asked for the ability to apply NetworkPolicies not only to Pods but also to Kubernetes Nodes. Antrea currently has a feature called ExternalNode, which enables users to apply NetworkPolicies to non-Kubernetes Nodes. From a technical perspective, the two requirements are essentially the same.

The implementation of ExternalNode moves the primary network interfaces of Nodes to the OVS bridge and uses OpenFlow flows to enforce policies. This has run into significant challenges in compatibility with OSes, especially when DHCP is used and on cloud platforms. #5192 and #5221 are two examples.

This proposal describes a simple and steady way to apply NetworkPolicies to Kubernetes Nodes, which can also be used by ExternalNode in the future, unifying the implementation and resolving the known problems.

Describe the solution you have in mind

Dataplane

There could be four technical approaches to apply NetworkPolicies to Nodes:

| Approach | Pros | Cons |
|---|---|---|
| OVS, bridge network interface | 1. Maximum code reuse<br>2. Efficient | 1. Hard to support multiple interfaces<br>2. Hard to be compatible with various network device managers |
| tc-eBPF | 1. Full control over packets | 1. Hard to support multiple interfaces<br>2. More effort to implement a firewall from scratch, including a priority-based packet classifier and connection tracking |
| OVS with tc-eBPF redirecting packets only | 1. Good code reuse | 1. Hard to support multiple interfaces<br>2. Complex datapath<br>3. Uncertain side effects |
| iptables/netfilter | 1. Mature firewall mechanism<br>2. Easy to support multiple interfaces | 1. Poor performance with a significant number of rules |

All things considered, iptables/netfilter seems the most practicable approach, provided its performance disadvantage doesn't cause a problem in this use case. So let's look at how it behaves with different numbers of rules. I ran benchmarks using iperf and netperf with different numbers of rules in the iptables filter table, and here are the numbers:
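[image: benchmark results (iperf Bitrate, netperf TCP_RR and TCP_CRR) for different numbers of iptables rules]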

As seen from the numbers, throughput (measured by Bitrate) and latency (measured by TCP_RR) are almost unaffected by the number of rules. This matches the expectation, because in most cases the number of rules only affects the first packet of a connection. TCP_CRR, which opens a new connection for every transaction, got worse as the number of rules increased: it decreased by 9.09% with 50 rules and by 13.6% with 100 rules. However, we expect one NetworkPolicy rule to be mapped to one iptables rule regardless of the number of IPs it uses (thanks to ipset), so it's unlikely that the number of rules applied to a single Node will exceed 100. It could well be lower, as users will likely just deny all traffic and allow only specific accesses, following the principle of least privilege.
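To illustrate why the rule count stays independent of the number of IPs, here is a minimal sketch of how one policy rule could be rendered as a single iptables rule that matches an ipset. The `nodePolicyRule` type, the `NODE-INGRESS` chain, and the `NODE-POL-` set prefix are hypothetical names chosen for this example and are not taken from the Antrea implementation. Only the `ipset create/add` commands and the iptables `-m set --match-set` match are standard.

```go
package main

import (
	"fmt"
	"strings"
)

// nodePolicyRule is a hypothetical, simplified ingress rule (not an Antrea type).
type nodePolicyRule struct {
	Name     string   // used to derive the ipset name
	Protocol string   // "tcp" or "udp"
	Port     int      // destination port on the Node
	Sources  []string // source CIDRs allowed or dropped by this rule
	Accept   bool     // true renders ACCEPT, false renders DROP
}

// render returns the ipset commands (one "create" plus one "add" per CIDR)
// and the single iptables rule that references the set.
func render(chain string, r nodePolicyRule) (ipsetCmds []string, iptablesRule string) {
	// Assumed naming scheme; real ipset names are limited to 31 characters.
	set := "NODE-POL-" + strings.ToUpper(r.Name)
	ipsetCmds = append(ipsetCmds, fmt.Sprintf("ipset create %s hash:net", set))
	for _, cidr := range r.Sources {
		ipsetCmds = append(ipsetCmds, fmt.Sprintf("ipset add %s %s", set, cidr))
	}
	target := "DROP"
	if r.Accept {
		target = "ACCEPT"
	}
	iptablesRule = fmt.Sprintf(
		"iptables -A %s -p %s --dport %d -m set --match-set %s src -j %s",
		chain, r.Protocol, r.Port, set, target)
	return ipsetCmds, iptablesRule
}

func main() {
	rule := nodePolicyRule{
		Name:     "apiserver",
		Protocol: "tcp",
		Port:     6443,
		Sources:  []string{"10.0.0.0/24", "192.168.1.0/24", "172.16.0.0/16"},
		Accept:   true,
	}
	cmds, ipt := render("NODE-INGRESS", rule)
	for _, c := range cmds {
		fmt.Println(c)
	}
	fmt.Println(ipt)
}
```

Adding more source CIDRs only grows the ipset; the filter table still holds one rule per policy rule, which is the property the benchmark reasoning relies on.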

So we propose to use iptables/netfilter in dataplane.

API Change

Considering that Node is a cluster-scoped resource, and given the current API types in Antrea, we will make the following API changes:

  1. A policy/rule's AppliedTo can use NodeSelector, which means the policy/rule is applied to Nodes, instead of Pods.
  2. NodeSelector in AppliedTo is only applicable to ClusterNetworkPolicy.
  3. Optionally, we can enforce a validation that AppliedTo can't use NodeSelector and other selectors together to simplify the implementation.
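The validation described in the list above could look roughly like the following sketch. The `appliedTo` struct and `validateAppliedTo` function are hypothetical simplifications for illustration (selectors are reduced to label maps, where `nil` means "not set"); they are not the Antrea API types or the actual webhook code.

```go
package main

import (
	"errors"
	"fmt"
)

// appliedTo is a hypothetical, simplified AppliedTo entry.
// A nil map means the selector is not set; an empty map selects everything.
type appliedTo struct {
	PodSelector       map[string]string
	NamespaceSelector map[string]string
	NodeSelector      map[string]string
}

// validateAppliedTo enforces two assumed rules:
//  1. nodeSelector is only allowed in a ClusterNetworkPolicy;
//  2. nodeSelector cannot be mixed with other selectors in the same policy.
func validateAppliedTo(kind string, peers []appliedTo) error {
	nodes, others := 0, 0
	for _, p := range peers {
		if p.NodeSelector != nil {
			nodes++
		}
		if p.PodSelector != nil || p.NamespaceSelector != nil {
			others++
		}
	}
	if nodes == 0 {
		return nil // nothing Node-specific to validate
	}
	if kind != "ClusterNetworkPolicy" {
		return errors.New("nodeSelector in appliedTo is only supported by ClusterNetworkPolicy")
	}
	if others > 0 {
		return errors.New("nodeSelector cannot be combined with other selectors in appliedTo")
	}
	return nil
}

func main() {
	nodeOnly := []appliedTo{{NodeSelector: map[string]string{}}}
	mixed := []appliedTo{
		{NodeSelector: map[string]string{}},
		{PodSelector: map[string]string{"app": "web"}},
	}
	fmt.Println(validateAppliedTo("ClusterNetworkPolicy", nodeOnly))
	fmt.Println(validateAppliedTo("ClusterNetworkPolicy", mixed))
	fmt.Println(validateAppliedTo("NetworkPolicy", nodeOnly))
}
```

Rejecting mixed selectors at admission time means a given policy or rule is handled by exactly one enforcement path, which is the simplification the optional validation aims for.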

Below is an example of a ClusterNetworkPolicy applied to Nodes. It allows all Nodes in this cluster, and clients whose IPs are in 10.0.0.0/24, to access TCP port 6443 on control-plane Nodes, and drops all other accesses to these Nodes.

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: node-policy
spec:
  priority: 5
  tier: securityops
  appliedTo:
    - nodeSelector:
        matchExpressions:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
  ingress:
    - action: Allow
      ports:
        - port: 6443
          protocol: TCP
      from:
        - ipBlock:
            cidr: 10.0.0.0/24       # External clients
        - nodeSelector: {}          # Nodes in this cluster
    - action: Drop

Configuration Change

  1. A feature gate NodeNetworkPolicy will control the enablement of the feature.
  2. A misconfigured NodeNetworkPolicy is disruptive and perhaps irreversible. For example, a deny-all rule could stop all communication between kube-apiserver and Antrea components, after which no further policy change can be applied. To guard against this, we add a collection of static rules that take priority over NodeNetworkPolicies by default. Users can configure these "privileged" rules in the configuration file if the default ones don't work for them.
  3. To support applying NodeNetworkPolicy to multiple interfaces of the Node, there will be a configuration option allowing users to specify the network interfaces to which NodeNetworkPolicies will be applied, e.g. [eth*, ens160]. By default, it applies to all network interfaces.
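As a rough illustration of the interface option, the sketch below selects interfaces by pattern. It assumes shell-style glob semantics for entries such as `eth*` and uses Go's standard `path.Match` for that; the proposal does not specify the matching rules, and `interfaceSelected` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"path"
)

// interfaceSelected reports whether the interface name matches any of the
// configured patterns. An empty pattern list selects all interfaces, which
// mirrors the proposed default.
func interfaceSelected(patterns []string, name string) (bool, error) {
	if len(patterns) == 0 {
		return true, nil
	}
	for _, p := range patterns {
		ok, err := path.Match(p, name)
		if err != nil {
			return false, fmt.Errorf("invalid interface pattern %q: %w", p, err)
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	patterns := []string{"eth*", "ens160"}
	for _, name := range []string{"eth0", "ens160", "ens192", "lo"} {
		ok, err := interfaceSelected(patterns, name)
		fmt.Printf("%-7s selected=%v err=%v\n", name, ok, err)
	}
}
```

Validating the patterns once at agent startup, rather than on every match, would surface a malformed entry before any rule is installed.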

Workflow

[image: workflow diagram]

Interfaces

In antrea-agent, we will reuse the Reconciler interface, and create a new implementation, which programs iptables and ipset to enforce the given rules.

// Reconciler is an interface that knows how to reconcile the desired state of CompletedRule.
type Reconciler interface {
	// Reconcile reconciles the desired state of the provided CompletedRule.
	Reconcile(rule *CompletedRule) error

	// BatchReconcile reconciles the desired state of the provided CompletedRules.
	BatchReconcile(rules []*CompletedRule) error

	// Forget cleans up the actual state of the specified ruleID.
	Forget(ruleID string) error
}

// NodeRuleReconciler implements Reconciler using iptables and ipset. It handles rules
// applied to Nodes.
type NodeRuleReconciler struct {
	iptables iptables.Interface
	ipset    ipset.Interface
}
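To make the "reconcile the desired state" contract more concrete, here is a minimal in-memory sketch. It is not the proposed NodeRuleReconciler: `completedRule`, `fakeNodeReconciler`, and `diffMembers` are hypothetical names, and instead of calling iptables or ipset it only records the operations it would perform. The point is that Reconcile compares the desired members with what is already installed and applies only the difference, so repeated calls are idempotent.

```go
package main

import (
	"fmt"
	"sort"
)

// completedRule is a hypothetical, simplified stand-in for CompletedRule.
type completedRule struct {
	ID      string
	Sources []string // desired set members (e.g. source CIDRs)
}

// diffMembers returns the members to add and to delete so that actual
// becomes desired. Results are sorted for deterministic output.
func diffMembers(actual, desired []string) (toAdd, toDel []string) {
	have := make(map[string]bool, len(actual))
	for _, m := range actual {
		have[m] = true
	}
	want := make(map[string]bool, len(desired))
	for _, m := range desired {
		if !want[m] && !have[m] {
			toAdd = append(toAdd, m)
		}
		want[m] = true
	}
	for _, m := range actual {
		if !want[m] {
			toDel = append(toDel, m)
		}
	}
	sort.Strings(toAdd)
	sort.Strings(toDel)
	return toAdd, toDel
}

// fakeNodeReconciler records operations instead of programming the kernel.
type fakeNodeReconciler struct {
	sets map[string][]string // ruleID -> members currently "installed"
	ops  []string            // log of operations that would be executed
}

// Reconcile applies only the difference between installed and desired state.
func (r *fakeNodeReconciler) Reconcile(rule *completedRule) error {
	toAdd, toDel := diffMembers(r.sets[rule.ID], rule.Sources)
	for _, m := range toAdd {
		r.ops = append(r.ops, "add "+rule.ID+" "+m)
	}
	for _, m := range toDel {
		r.ops = append(r.ops, "del "+rule.ID+" "+m)
	}
	r.sets[rule.ID] = append([]string(nil), rule.Sources...)
	return nil
}

// Forget removes all state kept for the rule.
func (r *fakeNodeReconciler) Forget(ruleID string) error {
	delete(r.sets, ruleID)
	r.ops = append(r.ops, "destroy "+ruleID)
	return nil
}

func main() {
	r := &fakeNodeReconciler{sets: map[string][]string{}}
	_ = r.Reconcile(&completedRule{ID: "r1", Sources: []string{"10.0.0.0/24", "10.0.1.0/24"}})
	_ = r.Reconcile(&completedRule{ID: "r1", Sources: []string{"10.0.1.0/24", "10.0.2.0/24"}})
	_ = r.Forget("r1")
	for _, op := range r.ops {
		fmt.Println(op)
	}
}
```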

In Controller.syncRule, we will call NodeRuleReconciler instead of the existing reconciler to realize a rule if it applies to Nodes:

func (c *Controller) syncRule(key string) error {
	...
	if rule.IsAppliedToNodes() {
		err = c.nodeRuleReconciler.Reconcile(rule)
	} else {
		err = c.podRuleReconciler.Reconcile(rule)
	}
	...
}
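The dispatch above can be exercised in isolation with stub reconcilers. The following is a self-contained sketch under simplified assumptions: `completedRule`, `reconciler`, `recordingReconciler`, and `controller` are hypothetical stand-ins, and `syncRule` takes the rule directly rather than looking it up by key.

```go
package main

import "fmt"

// completedRule is a hypothetical, simplified stand-in for CompletedRule.
type completedRule struct {
	ID             string
	AppliedToNodes bool
}

// IsAppliedToNodes reports whether the rule targets Nodes rather than Pods.
func (r *completedRule) IsAppliedToNodes() bool { return r.AppliedToNodes }

// reconciler is the subset of the Reconciler interface needed here.
type reconciler interface {
	Reconcile(rule *completedRule) error
}

// recordingReconciler is a stub that remembers which rules it received.
type recordingReconciler struct {
	name string
	seen []string
}

func (r *recordingReconciler) Reconcile(rule *completedRule) error {
	r.seen = append(r.seen, rule.ID)
	return nil
}

// controller holds one reconciler per enforcement path.
type controller struct {
	podRuleReconciler  reconciler
	nodeRuleReconciler reconciler
}

// syncRule routes the rule to the reconciler that matches its target.
func (c *controller) syncRule(rule *completedRule) error {
	if rule.IsAppliedToNodes() {
		return c.nodeRuleReconciler.Reconcile(rule)
	}
	return c.podRuleReconciler.Reconcile(rule)
}

func main() {
	pods := &recordingReconciler{name: "pod"}
	nodes := &recordingReconciler{name: "node"}
	c := &controller{podRuleReconciler: pods, nodeRuleReconciler: nodes}

	_ = c.syncRule(&completedRule{ID: "allow-apiserver", AppliedToNodes: true})
	_ = c.syncRule(&completedRule{ID: "allow-web"})

	fmt.Println(nodes.name, nodes.seen)
	fmt.Println(pods.name, pods.seen)
}
```

Because both paths share one interface, the controller logic stays unchanged apart from the single branch, which is the main benefit of reusing Reconciler.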

Plan

  • NodeNetworkPolicy targets release 1.15
  • Unify implementation of ExternalNode on Linux (Long term)
@tnqn tnqn added kind/design Categorizes issue or PR as related to design. area/network-policy Issues or PRs related to network policies. labels Nov 6, 2023
@hongliangl
Contributor

Dataplane PRs:

@tnqn
Member Author

tnqn commented Nov 15, 2023

@hongliangl I suppose there is a PR implementing the NodeRuleReconciler that consumes the above new util methods. Is it ready for review? It's hard to judge whether a util is reasonable without looking at how it is consumed.

@ColonelBundy

@tnqn Thank you for bringing some attention to this. I like your proposal and it looks like it would cover all the use cases I had from my original issue.

Well done 👍

@hongliangl
Contributor

> @hongliangl I suppose there is a PR implementing the NodeRuleReconciler that consumes the above new util methods. Is it ready for review? It's hard to judge whether a util is reasonable without looking at how it is consumed.

Please look at this PR #5658

Contributor

This issue is stale because it has been open 90 days with no activity. Remove the stale label or comment, or this will be closed in 90 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 14, 2024
@tnqn
Member Author

tnqn commented Feb 23, 2024

Close this as completed. Related PRs: #5658 #5716

@tnqn tnqn closed this as completed Feb 23, 2024