
Merge pull request #4089 from jkh52/knp-template
KEP-1281: update stale content.
k8s-ci-robot authored Jun 21, 2023
2 parents 78f729a + d3688bc commit 35befff
Showing 1 changed file with 52 additions and 50 deletions.
102 changes: 52 additions & 50 deletions keps/sig-api-machinery/1281-network-proxy/README.md
@@ -18,9 +18,9 @@
- [Security](#security)
- [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
- [User Stories](#user-stories)
- [Combined Master and Node Network](#combined-master-and-node-network)
- [Master and Untrusted Node Network](#master-and-untrusted-node-network)
- [Master and Node Networks which are not IP Routable](#master-and-node-networks-which-are-not-ip-routable)
- [Combined Control Plane and Node Network](#combined-control-plane-and-node-network)
- [Control Plane and Untrusted Node Network](#control-plane-and-untrusted-node-network)
- [Control Plane and Node Networks which are not IP Routable](#control-plane-and-node-networks-which-are-not-ip-routable)
- [Better Monitoring](#better-monitoring)
- [Design Details](#design-details)
- [Risks and Mitigations](#risks-and-mitigations)
@@ -40,16 +40,16 @@ default behavior is insufficient.

## Motivation

Kubernetes has outgrown the [SSH tunnels](https://github.com/kubernetes/kubernetes/issues/54076).
They complicate KAS code and only one cloud provider implemented them.
After a year of deprecation time, they will be removed in an upcoming release.
Historically, Kubernetes used [SSH tunnels](https://github.com/kubernetes/kubernetes/issues/54076), but they only
functioned on GCE; they were deprecated in 1.9 and [removed in
1.22](https://github.com/kubernetes/kubernetes/pull/102297).

In retrospect, having an explicit level of indirection that separates user-initiated network traffic from API
server-initiated traffic is a useful concept.
Cloud providers want to control how API server to pod, node and service network traffic is implemented.
Cloud providers may choose to run their API server (control network) and the cluster nodes (cluster network)
on isolated networks. The control and cluster networks may have overlapping IP addresses.
There for they require a non IP routing layer (SSH tunnel are an example).
Therefore they require a non-IP routing proxy layer (an SSH tunnel is an example).
Adding this layer enables metadata audit logging. It allows validation of outgoing API server connections.
Structuring the API server in this way is a forcing function for keeping architectural layering violations out of apiserver.
In combination with a firewall, this separation of networks protects against security concerns such as
@@ -58,32 +58,32 @@ In combination with a firewall, this separation of networks protects against sec
### Goals

Delete the SSH Tunnel/Node Dialer code from Kube APIServer.
Enable admins to fix https://groups.google.com/d/msg/kubernetes-security-announce/tyd-MVR-tY4/tyREP9-qAwAJ.
Enable admins to mitigate https://groups.google.com/d/msg/kubernetes-security-announce/tyd-MVR-tY4/tyREP9-qAwAJ.
Allow isolation of the Control network from the Cluster network.

### Non-Goals

Build a general purpose Proxy which does everything. (Users should build their own
custom proxies with the desired behavior, based on the provided proxy)
custom proxies with the desired behavior, based on the provided proxy.)
Handle requests from the Cluster to the Control Plane. (The proxy can be extended to
do this. However that is left to the User if they want that behavior)
do this. However that is left to the User if they want that behavior.)

## Definitions

- **Master Network** An IP reachable network space which contains the master components, such as Kubernetes API Server,
- **Control Plane Network** An IP reachable network space which contains the control plane components, such as Kubernetes API Server,
Connectivity Proxy and ETCD server.
- **Node Network** An IP reachable network space which contains all the cluster's Nodes, for alpha.
Worth noting that the Node Network may be completely disjoint from the Master network.
It may have overlapping IP addresses to the Master Network or other means of network isolation.
Direct IP routability between cluster and master networks should not be assumed.
Worth noting that the Node Network may be completely disjoint from the Control Plane network.
It may have overlapping IP addresses to the Control Plane Network or other means of network isolation.
Direct IP routability between cluster and control plane networks should not be assumed.
Later versions may relax the requirement from all nodes to some.
- **KAS** Kubernetes API Server, responsible for serving the Kubernetes API to clients.
- **KMS** Key Management Service, plugins for secrets encryption key management
- **Egress Selector** A component built into the KAS which provides a golang dialer for outgoing connection requests.
The dialer provided depends on NetworkContext information.
- **Konnectivity Server** The proxy server which runs in the master network.
- **Konnectivity Server** The proxy server which runs in the control plane network.
It has a secure channel established to the cluster network.
It could work on either a HTTP Connect mechanism or gRPC.
It could work on either a gRPC or HTTP Connect mechanism.
If the former, it would expose a gRPC interface to the KAS to provide the connectivity service.
If the latter, it would use standard HTTP Connect.
Formerly known as the Network Proxy Server.
@@ -95,15 +95,15 @@ Implies no overlapping (i.e. shared) IPs on the network.

## Proposal

We will run a connectivity server inside the master network.
We will run a connectivity server inside the control plane network.
It could work on either an HTTP Connect mechanism or gRPC.
For the alpha version we will attempt to get this working with HTTP Connect.
We will evaluate HTTP Connect for scalability, error handling and traffic types.
For scalability we will be looking at the number of required open connections.
Increasing usage of webhooks means we need better than 1 request per connection (multiplexing).
We also need the tunnel to be tolerant of errors in the requests it is transporting.
HTTP-Connect only supports HTTP requests and not things like DNS requests.
We assume that for HTTP URL request,s it will be the proxy which does the DNS lookup.
We assume that for HTTP URL requests, it will be the proxy which does the DNS lookup.
However this means that the KAS cannot perform a DNS request and then issue a follow-on request.
If no issues are found with HTTP Connect in these areas we will proceed with it.
If an issue is found then we will update the KEP and switch the client to the gRPC solution.
@@ -112,7 +112,7 @@ This should be as simple as switching the connection mode of the client code.
It may be desirable to allow out of band data (metadata) to be transmitted from the KAS to the Proxy Server.
We expect to handle metadata in the HTTP Connect case using http 'X' headers on the Connect request.
This means that the metadata can only be sent when establishing a KAS to Proxy tunnel.
For the GRPC case we just update the interface to the KAS.
For the gRPC case we just update the interface to the KAS.
In this case the metadata can be sent even during tunnel usage.
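The HTTP Connect metadata path described above can be sketched in Go. This is a minimal illustration under stated assumptions, not KEP-specified code: the header name `X-Egress-Selection` is a hypothetical example of an 'X' header attached to the Connect request, and the addresses are placeholders.

```go
package main

import (
	"fmt"
	"net/http"
)

// buildConnectRequest sketches how a client might open an HTTP Connect
// tunnel to the Konnectivity Server, attaching out-of-band metadata as
// 'X' headers on the CONNECT request. Since the headers ride on the
// CONNECT itself, metadata can only be sent at tunnel establishment.
func buildConnectRequest(proxyAddr, target string, metadata map[string]string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodConnect, "http://"+proxyAddr, nil)
	if err != nil {
		return nil, err
	}
	// For CONNECT, the request target names the tunnel destination.
	req.Host = target
	for k, v := range metadata {
		req.Header.Set(k, v)
	}
	return req, nil
}

func main() {
	req, err := buildConnectRequest("1.2.3.4:5678", "10.0.0.7:10250", map[string]string{
		"X-Egress-Selection": "cluster", // hypothetical metadata header
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.Host, req.Header.Get("X-Egress-Selection"))
}
```

In the gRPC case, by contrast, metadata would travel on the stream itself, which is why it can be sent even during tunnel usage.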

Each connectivity proxy allows secure connections to one or more cluster networks.
@@ -124,7 +124,7 @@ Running the connectivity proxy in a separate process has a few advantages.
- The connectivity proxy can be extended without recompiling the KAS.
Administrators can run their own variants of the connectivity proxy.
- Traffic can be audited or forwarded (eg. via a proprietary VPN) using a custom connectivity proxy.
- The separation removes master <-> cluster connectivity concerns from the KAS.
- The separation removes control plane <-> cluster connectivity concerns from the KAS.
- The code and responsibility separation lowers the complexity of the KAS code base.
- The separation reduces the effects of issues such as crashes in the connectivity proxy impacting the KAS.
Connectivity issues will not stop the KAS from serving API requests.
@@ -133,13 +133,13 @@ A problem with a node, set of nodes or load-balancers configuration, may be fixe

![API Server Network Proxy Simple Cluster](NetworkProxySimpleCluster.png)
The diagram shows API Server’s outgoing traffic flow.
The user (in blue box), master network (in purple cloud) and
The user (in blue box), control plane network (in purple cloud) and
a cluster network (in green cloud) are represented.

The user (blue) initiates communication to the KAS.
The KAS then initiates connections to other components.
It could be node/pod/service in cluster networks (red dotted arrow to green cloud),
or etcd for storage in the same master network (blue arrow) or mutate the request
or etcd for storage in the same control plane network (blue arrow) or mutate the request
based on an admission web-hook (red dotted arrow to purple cloud).
The KAS handles these cases with NetworkContext-based traffic routing.
The connectivity proxy should be able to do routing solely based on IP.
@@ -154,8 +154,8 @@ The minimal NetworkContext looks like the following struct in golang:
type EgressType int

const (
// Master is the EgressType for traffic intended to go to the control plane.
Master EgressType = iota
// ControlPlane is the EgressType for traffic intended to go to the control plane.
ControlPlane EgressType = iota
// Etcd is the EgressType for traffic intended to go to Kubernetes persistence store.
Etcd
// Cluster is the EgressType for traffic intended to go to the system being managed by Kubernetes.
@@ -174,12 +174,12 @@ type NetworkContext struct {
EgressSelectionName specifies the network to route traffic to.
The KAS starts with a list of registered konnectivity service names. These
correspond to networks we route traffic to. So the KAS knows where to
proxy the traffic to, otherwise it return an “Unknown network” error.
proxy the traffic to, otherwise it returns an “Unknown network” error.

The KAS starts with a proxy configuration like the below example.
The example specifies 4 networks. "direct" specifies the KAS talking directly
on the local network (no proxy). "master" specifies the KAS talks to a proxy
listening at 1.2.3.4:5678. "cluster" specifies the KAS talk to a proxy
on the local network (no proxy). "controlplane" specifies the KAS talks to a proxy
listening at 1.2.3.4:5678. "cluster" specifies the KAS talks to a proxy
listening at 1.2.3.5:5679. While these are represented as resources
they are not intended to be loaded dynamically. The names are not case
sensitive. The KAS loads this resource list as a configuration at start time.
@@ -191,7 +191,7 @@ egressSelections:
- name: direct
connection:
type: direct
- name: master
- name: controlplane
connection:
type: grpc
url: grpc://1.2.3.4:5678
@@ -246,10 +246,11 @@ It should run on the same machine and must run in the same flat network as the K
It listens on a port for gRPC connections from the KAS.
This port would be for forwarding traffic to the appropriate cluster.
It should have an administrative port speaking https.
The administrative port serves the liveness probe and metrics.
The administrative port serves metrics and (optional) debug/pprof handlers.
It should have a health check port, serving liveness and readiness probes.
The liveness probe prevents a partially broken cluster
where the KAS cannot connect to the cluster. This port also serves
pprof debug commands and monitoring data for the proxy.
where the KAS cannot connect to the cluster.
The readiness probe indicates that at least one Konnectivity Agent is connected.

### Direct Connection

@@ -264,17 +265,17 @@ Simple clusters (not needing network segregation) run this way to avoid the over
The majority of the KAS communication originates from incoming requests.
Here we cover the outgoing requests. This is our understanding of those requests
and some details as to how they fit in this model. For the alpha release we
support 'master', 'etcd' and 'cluster' connectivity service names.
support 'controlplane', 'etcd' and 'cluster' connectivity service names.

- **ETCD** It is possible to make etcd talk via the proxy.
The etcd client takes a transport.
(https://github.com/etcd-io/etcd/blob/master/client/client.go#L101)
(https://github.com/etcd-io/etcd/blob/main/client/internal/v2/client.go#L101)
We will add configuration as to which proxy an etcd client should use.
(https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/storagebackend/config.go)
This will add an extra process hop to our main scaling axis.
We will scale test the impact and publish the results. As a precaution
we will add an extra network configuration 'etcd' separate from ‘master’.
Etcd requests can be configured separately from the rest of 'master'.
we will add an extra network configuration 'etcd' separate from ‘controlplane’.
Etcd requests can be configured separately from the rest of 'controlplane'.
- **Pods/Exec**, **Pods/Proxy**, **Pods/Portforward**, **Pods/Attach**, **Pods/Log**
Pod requests (and pod sub-resource requests) are meant for the cluster
and will be routed based on the ‘cluster’ NetworkContext.
@@ -287,14 +288,13 @@ and will be routed based on the ‘cluster’ NetworkContext.
- **Admission Webhooks**
Admission webhooks can be destined for a service or a URL.
If destined for a service then the service rules apply (send to 'cluster').
If destined for a URL then we will use the ‘master’ NetworkContext.
If destined for a URL then we will use the ‘controlplane’ NetworkContext.
- **Aggregated API Server (and OpenAPI requests for aggregated resources)**
Aggregated API Servers can be destined for a service or a URL.
Aggregated API Servers can be destined for a service.
If destined for a service then the service rules apply.
If destined for a URL then we will use the ‘master’ NetworkContext.
- **Authentication, Authorization and Audit Webhooks**
These Webhooks use a kube config file to determine destination.
Given that we use a ‘master’ NetworkContext.
Given that we use a ‘controlplane’ NetworkContext.

**Note**: KMS is also an egress endpoint but will not be covered as egress since it only supports a [Dialer](https://github.com/kubernetes/kubernetes/blob/e8bc121341807f9e33a076f6725b1b1a18d75ba0/staging/src/k8s.io/apiserver/pkg/storage/value/encrypt/envelope/grpc_service.go#L74) using unix domain sockets (UDS). This is used for communicating between processes running on the same host. In the future, we may consider adding egressSelector support if KMS accepts other protocols.
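The routing rules in the list above can be summarized as a lookup table. The request-kind labels below are illustrative strings chosen for this sketch, not real KAS identifiers, and the table only covers the cases named in this section.

```go
package main

import "fmt"

// egressFor summarizes the routing rules above as a lookup table:
// etcd traffic gets its own 'etcd' network; pod sub-resources go to
// 'cluster'; webhooks destined for a service follow service rules
// ('cluster'), while URL-destined webhooks and auth/audit webhooks use
// 'controlplane'.
func egressFor(requestKind string, destinedForService bool) string {
	switch requestKind {
	case "etcd":
		return "etcd"
	case "pods/exec", "pods/proxy", "pods/portforward", "pods/attach", "pods/log":
		return "cluster"
	case "admission-webhook":
		if destinedForService {
			return "cluster"
		}
		return "controlplane" // URL-destined webhooks use 'controlplane'
	case "auth-webhook", "audit-webhook":
		return "controlplane"
	default:
		return "controlplane"
	}
}

func main() {
	fmt.Println(egressFor("pods/log", false))          // cluster
	fmt.Println(egressFor("admission-webhook", true))  // cluster
	fmt.Println(egressFor("admission-webhook", false)) // controlplane
	fmt.Println(egressFor("etcd", false))              // etcd
}
```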

@@ -321,28 +321,28 @@ implementation of the API Server Network Proxy.

## User Stories

#### Combined Master and Node Network
#### Combined Control Plane and Node Network

Customers can run a cluster which combines the master and cluster networks.
Customers can run a cluster which combines the control plane and cluster networks.
They set all of their connectivity configuration to direct.
This bypasses the proxy and optimizes the performance. For a customer with no
security concerns with a combined network, this is a fairly simple, straightforward configuration.

#### Master and Untrusted Node Network
#### Control Plane and Untrusted Node Network

A customer may want to isolate their master from their cluster network. This may be a
A customer may want to isolate their control plane from their cluster network. This may be a
simple separation of concerns or due to something like running untrusted workloads on
the cluster network. Placing a firewall between the master and
the cluster network. Placing a firewall between the control plane and
cluster networks accomplishes this. A few ports, such as the KAS public port and the Proxy public port,
are opened between these networks. Separation of concerns minimizes the
accidental interactions between the control plane and cluster networks. It prevents bandwidth
consumption on the cluster network from negatively impacting the control plane. The
consumption on the cluster network negatively impacting the control plane. The
combination of firewall and proxy minimizes the interaction between the networks to
a set which can be more easily reasoned about, checked and monitored.

#### Master and Node Networks which are not IP Routable
#### Control Plane and Node Networks which are not IP Routable

If master and cluster network CIDRs are not controlled by the same entity, then they
If control plane and cluster network CIDRs are not controlled by the same entity, then they
can end up having conflicting IP CIDRs. Traffic cannot be routed between
them based strictly on IP address. The connectivity proxy solves this issue.
It also solves connectivity using a VPN tunnel. The proxy offloads the work of sending traffic
@@ -352,7 +352,7 @@ to the cluster network from the KAS. The proxy gives us extensibility.

Instrumenting the network proxy requests with out of band data
(e.g. requester identity/tradition context) enables a Proxy to
provide increased monitoring of Master originated requests.
provide increased monitoring of Control Plane originated requests.


## Design Details
@@ -367,7 +367,7 @@ both for several releases.

### Test Plan

The primary test plan is to set up a network namespace with a firewall dividing the master and cluster
The primary test plan is to set up a network namespace with a firewall dividing the control plane and cluster
networks. Then running the existing tests for logs, proxy and portforward to ensure the
routing works correctly. It should work with the correct configuration and fail correctly
with a direct configuration. Normal tests would be run with the direct
@@ -408,6 +408,8 @@ Beta:
- Feature went Alpha in 1.16 with limited functionality. It will cover the log
sub resource and communication to the etcd server.

- Feature went Beta in 1.18.

## Alternatives [optional]

- Leave SSH Tunnels (deprecated) in the KAS. Prevents us from making the KAS cloud provider agnostic. Blocks out of tree effort.
@@ -416,5 +418,5 @@

## Infrastructure Needed [optional]

Any one wishing to use this feature will need to create network proxy images/pods on the master and set up the ConnectivityServiceConfiguration.
Anyone wishing to use this feature will need to create network proxy images/pods on the control plane and set up the EgressSelectorConfiguration.
The network proxy provided is meant as a reference implementation. Users are expected to extend it for their needs.
