KEP-1645: Add specification for multi-network scenario #3045
Changes from 12 commits
a2fa9f4
172dd80
79db429
ba397fa
4f2fbea
4ff7540
222a033
da1c8a0
ee29840
67a89be
340fd9e
e3fda31
ea8a3c8
4420a11
dceb39e
0d43ec2
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -93,6 +93,8 @@ tags, and then generate with `hack/update-toc.sh`. | |
- [Exporting Services](#exporting-services) | ||
- [Restricting Exports](#restricting-exports) | ||
- [Importing Services](#importing-services) | ||
- [Multi-network scenario](#multi-network-scenario) | ||
- [Known limitation](#known-limitation) | ||
- [ClusterSet Service Behavior Expectations](#clusterset-service-behavior-expectations) | ||
- [Service Types](#service-types) | ||
- [ClusterSetIP](#clustersetip) | ||
|
@@ -260,6 +262,12 @@ nitty-gritty. | |
The cluster name should be consistent for the life of a cluster and its | ||
membership in the clusterset. Implementations should treat name mutation as | ||
a delete of the membership followed by recreation with the new name. | ||
- **cluster network** - An identifier for the cluster network. Each cluster can have an optional name that identifies the network it is running in. The network name must be a valid [RFC | ||
1123](https://tools.ietf.org/html/rfc1123) DNS label. Two or more clusters within the ClusterSet can have the same network identifier. | ||
|
||
The network name should be consistent for the duration of the cluster's | ||
membership in the clusterset. Implementations should treat network change as | ||
a delete of the membership followed by recreation with the new name. | ||
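The DNS-label requirement above can be checked mechanically. The following is a minimal sketch, assuming the lowercase-only convention Kubernetes applies to RFC 1123 labels; the helper name `is_valid_network_name` is hypothetical and not part of the KEP.

```python
import re

# RFC 1123 DNS label, restricted to lowercase as Kubernetes does (assumption):
# alphanumerics and '-', starting and ending with an alphanumeric.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
_MAX_LABEL_LENGTH = 63  # maximum length of a single DNS label


def is_valid_network_name(name: str) -> bool:
    """Return True if `name` could serve as a cluster network name."""
    return len(name) <= _MAX_LABEL_LENGTH and _DNS_LABEL.fullmatch(name) is not None
```

Under this sketch, `network-1` is accepted while `Network_1`, `-net`, and the empty string are rejected.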
|
||
[namespace sameness]: https://github.com/kubernetes/community/blob/master/sig-multicluster/namespace-sameness-position-statement.md | ||
|
||
|
@@ -664,6 +672,12 @@ endpoints: | |
The `ServiceImport.Spec.IP` (VIP) can be used to access this service from within | ||
this cluster. | ||
|
||
|
||
#### Multi-network scenario | ||
One or more clusters in a ClusterSet can be running on a discrete network (a non-flat network). An MCS controller can use the `network.k8s.io` `ClusterProperty` to determine if a cluster in a `ClusterSet` is running on a discrete network. Note that the endpoints of the `EndpointSlice` for a cluster on a discrete network may only be representative of the pods backing the multi-cluster service and not the real pod addresses. | ||
How does the MCS controller know how to populate the endpoint slices with the appropriate endpoints? Does it use the ClusterProperty's Some common API here may be useful, ie: an annotation on either the The annotation could target an Ingress endpoint, public hostname, IP address, or in-cluster service.
I understand that the MCS controller implementation is not part of the multi cluster API, but it is at the very least useful to think of a full end to end solution to ensure the API is robust. |
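One possible reading of the multi-network paragraph, as a hedged sketch: a controller compares the `network.k8s.io` values of the importing and exporting clusters and publishes real pod addresses only when they match. The function names, the treatment of a missing identifier as a shared default network, and the notion of pre-supplied "representative" addresses are all assumptions made for illustration; the KEP does not define how those addresses are discovered.

```python
from typing import List, Optional


def same_network(local: Optional[str], remote: Optional[str]) -> bool:
    """Compare two network identifiers.

    Assumption: clusters without an identifier (None) share one default,
    flat network, so None matches only None.
    """
    return local == remote


def addresses_to_publish(
    importing_network: Optional[str],
    exporting_network: Optional[str],
    pod_ips: List[str],
    representative_ips: List[str],
) -> List[str]:
    """Pick the addresses to place in the imported EndpointSlice.

    Pod IPs are used when both clusters are on the same network; otherwise
    the caller-supplied representative addresses (hypothetical, e.g. a
    gateway in front of the exporting cluster) are used instead.
    """
    if same_network(importing_network, exporting_network):
        return list(pod_ips)
    return list(representative_ips)
```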
||
##### Known limitation | ||
In a multi-network scenario where the `EndpointSlice`s do not contain the actual pod addresses, there is currently no Kubernetes-native way to distribute traffic in proportion to the actual number of pods. There is active ongoing work in SIG-Network to add an attribute to represent the number of endpoints for `EndpointSlice`s. This will provide a way for kube-proxy to load balance across `EndpointSlice`s. | ||
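The skew described in this limitation can be made concrete with a small calculation. This is an illustrative sketch only: it assumes each cluster is represented by a single address, and the `weighted` mode stands in for a hypothetical future in which the pod count behind each address is known. It does not model kube-proxy's actual behavior.

```python
from fractions import Fraction
from typing import Dict


def per_pod_share(pods_per_cluster: Dict[str, int], weighted: bool) -> Dict[str, Fraction]:
    """Fraction of total traffic received by each pod, keyed by cluster.

    weighted=False: traffic is split evenly across the per-cluster addresses.
    weighted=True: traffic is split in proportion to each cluster's pod count.
    Every cluster must have at least one pod.
    """
    total_pods = sum(pods_per_cluster.values())
    shares = {}
    for cluster, pods in pods_per_cluster.items():
        if weighted:
            cluster_share = Fraction(pods, total_pods)
        else:
            cluster_share = Fraction(1, len(pods_per_cluster))
        # Traffic reaching a cluster is assumed to spread evenly over its pods.
        shares[cluster] = cluster_share / pods
    return shares
```

For two clusters with 1 and 9 pods, the even split sends 1/2 of all traffic to the lone pod and 1/18 to each of the nine, whereas the weighted split gives every pod 1/10.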
|
||
|
||
### ClusterSet Service Behavior Expectations | ||
|
||
#### Service Types | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -90,6 +90,7 @@ tags, and then generate with `hack/update-toc.sh`. | |
- [Multi-Cluster Services](#multi-cluster-services) | ||
- [Diagnostics](#diagnostics) | ||
- [Multi-tenant controllers](#multi-tenant-controllers) | ||
- [Multi-network scenario](#multi-network-scenario) | ||
- [<code>ClusterProperty</code> CRD](#-crd) | ||
- [Well known properties](#well-known-properties) | ||
- [Property: <code>id.k8s.io</code>](#property-) | ||
|
@@ -98,10 +99,16 @@ tags, and then generate with `hack/update-toc.sh`. | |
- [Contents](#contents) | ||
- [Consumers](#consumers) | ||
- [Notable scenarios](#notable-scenarios) | ||
- [Property: <code>clusterset.k8s.io</code>](#property--1) | ||
- [Property: <code>network.k8s.io</code>](#property--1) | ||
- [Uniqueness](#uniqueness-1) | ||
- [Lifespan](#lifespan-1) | ||
- [Contents](#contents-1) | ||
- [Consumers](#consumers-1) | ||
- [Notable scenarios](#notable-scenarios-1) | ||
- [Property: <code>clusterset.k8s.io</code>](#property--2) | ||
- [Lifespan](#lifespan-2) | ||
- [Contents](#contents-2) | ||
- [Consumers](#consumers-2) | ||
- [Additional Properties](#additional-properties) | ||
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) | ||
- [Risks and Mitigations](#risks-and-mitigations) | ||
|
@@ -294,6 +301,9 @@ My controller interacts with multiple clusters and needs to disambiguate between | |
|
||
_For example, [CAPN's virtualcluster project](https://github.com/kubernetes-sigs/cluster-api-provider-nested) is implementing a multi-tenant scheduler that schedules tenant namespaces only in certain parent clusters, and a separate syncer running in each parent cluster controller needs to compare the name of the parent cluster to determine whether the namespace should be synced. ([ref](https://github.com/kubernetes/enhancements/issues/2149#issuecomment-768486457))._ | ||
|
||
#### Multi-network scenario | ||
|
||
Within a ClusterSet, I have one or more clusters where pods across these clusters are not directly routable (a non-flat network). | ||
|
||
### `ClusterProperty` CRD | ||
|
||
|
@@ -307,7 +317,7 @@ The schema for `ClusterProperty` is intentionally loose to support multiple form | |
|
||
### Well known properties | ||
|
||
The `ClusterProperty` CRD will support two specific properties under the well known names `id.k8s.io` and `clusterset.k8s.io`. Being "well known" means that they must conform to the requirements described below, and therefore can be depended on by multi-cluster implementations to achieve use cases dependent on knowledge of a cluster's ID or ClusterSet membership. | ||
The `ClusterProperty` CRD will support three specific properties under the well known names `id.k8s.io`, `network.k8s.io` and `clusterset.k8s.io`. Being "well known" means that they must conform to the requirements described below, and therefore can be depended on by multi-cluster implementations to achieve use cases dependent on knowledge of a cluster's ID or ClusterSet membership. | ||
I asked somewhere (can't find where) about whether we can/should revisit the need for suffix for "well known" properties vs user-defined. I.e could/should these be "id", "network", "clusterset" ? |
||
|
||
The requirements below use the keywords **must, should,** and **may** purposefully in accordance with [RFC-2119](https://tools.ietf.org/html/rfc2119). | ||
|
||
|
@@ -353,6 +363,37 @@ Contains a unique identifier for the containing cluster. | |
|
||
**Reusing cluster names**: Since an `id.k8s.io ClusterProperty` has no restrictions on whether or not a ClusterProperty can be repeatable, if a cluster unregisters from a ClusterSet it is permitted under this standard to rejoin later with the same `id.k8s.io ClusterProperty` it had before. Similarly, a *different* cluster could join a ClusterSet with the same `id.k8s.io ClusterProperty` that had been used by another cluster previously, as long as both do not have membership in the same ClusterSet at the same time. Finally, two or more clusters may have the same `id.k8s.io ClusterProperty` concurrently (though they **should** not; see "Uniqueness" above) *as long as* they both do not have membership in the same ClusterSet. | ||
|
||
#### Property: `network.k8s.io` | ||
|
||
|
||
Contains an identifier representing the network for the cluster. | ||
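As an illustration, the property could be represented as an object like the one built below. This is a sketch, not a normative manifest: the `about.k8s.io/v1alpha1` group/version and the `spec.value` field are assumed from the `ClusterProperty` CRD as commonly published and are not shown in this diff, and the helper name is hypothetical.

```python
def network_cluster_property(network_name: str) -> dict:
    """Build a ClusterProperty object carrying the cluster's network identifier."""
    return {
        "apiVersion": "about.k8s.io/v1alpha1",  # assumed group/version
        "kind": "ClusterProperty",
        "metadata": {"name": "network.k8s.io"},  # the well known property name
        "spec": {"value": network_name},  # assumed field holding the identifier
    }
```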
|
||
|
||
##### Uniqueness | ||
|
||
* The identifier **need not** exist (as it is only applicable to the multi-network scenario) and **need not** be unique. | ||
|
||
##### Lifespan | ||
|
||
|
||
* The identifier, if it exists, **should** be immutable for the lifespan of a ClusterSet membership. | ||
|
||
|
||
|
||
##### Contents | ||
|
||
* The identifier **should** be a valid string. | ||
* The identifier **should** be a human readable description. | ||
|
||
|
||
##### Consumers | ||
|
||
* **Should** be able to rely on the identifier, if it exists, remaining unmodified for the entire duration of its membership in a ClusterSet. | ||
* **Should** watch the `network.k8s.io` property to handle potential changes if they live beyond the ClusterSet membership. | ||
* **May** rely on the existence of an identifier for clusters that do not belong to a ClusterSet so long as the implementation provides one. | ||
|
||
|
||
##### Notable scenarios | ||
|
||
**Cluster changes its network**: Since a `network.k8s.io ClusterProperty` must be immutable for the duration of its *membership* in a given ClusterSet, the property contents can be "changed" by unregistering the cluster from the ClusterSet and reregistering it with the new network name. | ||
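A consumer that observes a changed `network.k8s.io` value might model it as the unregister/reregister sequence described in this scenario. The sketch below is one hedged interpretation; the event tuple shape and the function name `membership_events` are hypothetical.

```python
from typing import List, Optional, Tuple

# (action, cluster name, network identifier); shape chosen for illustration only.
Event = Tuple[str, str, Optional[str]]


def membership_events(
    cluster: str, old_network: Optional[str], new_network: Optional[str]
) -> List[Event]:
    """Translate a network change into membership events.

    An unchanged identifier produces no events; a changed identifier is
    treated as a delete of the old membership followed by a recreation
    with the new network name.
    """
    if old_network == new_network:
        return []
    return [("delete", cluster, old_network), ("create", cluster, new_network)]
```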
|
||
#### Property: `clusterset.k8s.io` | ||
|
||
Contains an identifier that relates the containing cluster to the ClusterSet in which it belongs. | ||
|
This makes an implicit assumption that egress network connectivity is mapped via identifiers to figure if given traffic needs to go through a proxy or is a direct connection. Has it been considered whether or not to express that specifically (e.g. the API is focused on "this traffic is proxied" vs saying "we have labeled our networks and this labeling implicitly implies that there is some proxying". The second statement assumes more about the environment and imposes a specific naming, labeling mechanism.
Yes, will reword this to be more generic after the right name is chosen in favor of `network`.