KEP-1645: Add specification for multi-network scenario #3045

14 changes: 14 additions & 0 deletions keps/sig-multicluster/1645-multi-cluster-services-api/README.md
@@ -93,6 +93,8 @@ tags, and then generate with `hack/update-toc.sh`.
- [Exporting Services](#exporting-services)
- [Restricting Exports](#restricting-exports)
- [Importing Services](#importing-services)
- [Multi-network scenario](#multi-network-scenario)
- [Known limitation](#known-limitation)
- [ClusterSet Service Behavior Expectations](#clusterset-service-behavior-expectations)
- [Service Types](#service-types)
- [ClusterSetIP](#clustersetip)
@@ -260,6 +262,12 @@ nitty-gritty.
The cluster name should be consistent for the life of a cluster and its
membership in the clusterset. Implementations should treat name mutation as
a delete of the membership followed by recreation with the new name.
- **cluster network** - An identifier for the cluster network. Each cluster can have an optional name that identifies the network it is running in. The network name must be a valid [RFC 1123](https://tools.ietf.org/html/rfc1123) DNS label. Two or more clusters within the ClusterSet can have the same network identifier.

Member

This makes an implicit assumption that egress network connectivity is mapped via identifiers to determine whether given traffic needs to go through a proxy or is a direct connection. Has it been considered whether to express that specifically (e.g. the API is focused on "this traffic is proxied" vs. saying "we have labeled our networks and this labeling implicitly implies that there is some proxying")? The second statement assumes more about the environment and imposes a specific naming and labeling mechanism.

Author

Yes, I will reword this to be more generic once the right name is chosen in favor of `network`.

The network name should be consistent during its
membership in the clusterset. Implementations should treat network change as
a delete of the membership followed by recreation with the new name.
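
The requirement that the network name be a valid RFC 1123 DNS label can be checked mechanically. The following is a minimal sketch in Python, not part of the KEP; the function name `is_valid_network_name` is hypothetical, and the rule applied is the usual Kubernetes reading of a DNS label (lowercase alphanumerics or `-`, starting and ending with an alphanumeric, at most 63 characters).

```python
import re

# Kubernetes-style RFC 1123 DNS label: lowercase alphanumerics or '-',
# starting and ending with an alphanumeric character.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
_MAX_LABEL_LENGTH = 63


def is_valid_network_name(name: str) -> bool:
    """Return True if `name` is a valid RFC 1123 DNS label.

    Hypothetical helper for illustration; the KEP does not define it.
    """
    if len(name) > _MAX_LABEL_LENGTH:
        return False
    # fullmatch requires the whole string to match, so an empty string,
    # a leading or trailing '-', or an uppercase letter is rejected.
    return _DNS_LABEL.fullmatch(name) is not None
```

For example, `is_valid_network_name("network-a")` is accepted, while `"Network_A"` and `"-edge"` are rejected.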

[namespace sameness]: https://github.com/kubernetes/community/blob/master/sig-multicluster/namespace-sameness-position-statement.md

@@ -664,6 +672,12 @@ endpoints:
The `ServiceImport.Spec.IP` (VIP) can be used to access this service from within
this cluster.


#### Multi-network scenario
One or more clusters in a ClusterSet can be running on a discrete network (a non-flat network). An MCS controller can use the `network.k8s.io` `ClusterProperty` to determine whether a cluster in a `ClusterSet` is running on a discrete network. Note that the endpoints of the `EndpointSlice` for a cluster on a discrete network may only be representative of the pods backing the multi-cluster service, and may not be the real pod addresses.
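
As a rough illustration of how a controller might use the `network.k8s.io` property, the sketch below compares the property values of two clusters. It is not taken from the KEP: the helper names and the plain-dictionary representation of a cluster's properties are hypothetical, and the treatment of a missing property (both absent means one flat network; one absent and one present means different networks) is an assumption the text above does not settle.

```python
from typing import Dict, Optional

NETWORK_PROPERTY = "network.k8s.io"


def network_of(cluster_properties: Dict[str, str]) -> Optional[str]:
    """Return the cluster's network identifier, or None if it is not set."""
    return cluster_properties.get(NETWORK_PROPERTY)


def directly_routable(local: Dict[str, str], remote: Dict[str, str]) -> bool:
    """Guess whether pods in two clusters can reach each other directly.

    Assumption (not stated in the KEP): if neither cluster declares a
    network, both are treated as sharing one flat network. If only one
    declares a network, the clusters are treated as being on different
    networks.
    """
    return network_of(local) == network_of(remote)


# Usage: cluster properties modelled as name -> value mappings.
cluster_a = {"id.k8s.io": "cluster-a", NETWORK_PROPERTY: "network-1"}
cluster_b = {"id.k8s.io": "cluster-b", NETWORK_PROPERTY: "network-1"}
cluster_c = {"id.k8s.io": "cluster-c", NETWORK_PROPERTY: "network-2"}
```

Under these assumptions, `cluster_a` and `cluster_b` are directly routable, while traffic from `cluster_a` to `cluster_c` would need endpoints that stand in for the pods in `cluster_c`.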

> Note that the endpoints of the EndpointSlice for a cluster on discrete network may only be representative of the pods backing the multi-cluster service

How does the MCS controller know how to populate the endpoint slices with the appropriate endpoints? Does it use the ClusterProperty's network value? Or is that a network name that can signal to the MCS controller to use some mapping of endpoints that exists $somewhere?

Some common API here may be useful, i.e., an annotation on either the ServiceImport or the ServiceExport would allow for different services to leverage different services (maybe for some QoS isolation reasons).

The annotation could target an Ingress endpoint, public hostname, IP address, or in-cluster service.

proxy.mcs.k8s.io: <proxy-url>

I understand that the MCS controller implementation is not part of the multi cluster API, but it is at the very least useful to think of a full end to end solution to ensure the API is robust.

##### Known limitation
In a multi-network scenario where the `EndpointSlice`s do not contain the actual pod addresses, there is currently no Kubernetes-native way to distribute traffic in proportion to the actual number of pods. There is ongoing work in SIG-Network to add an attribute that represents the number of endpoints for `EndpointSlice`s. This will provide a way for kube-proxy to load balance across `EndpointSlice`s.
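
The arithmetic behind this limitation can be shown with a small Python sketch. It is illustrative only and does not describe kube-proxy behavior: the function names are hypothetical, and the model assumes each advertised address stands in for a known, non-zero number of pods and that traffic is split across addresses according to the given weights.

```python
from fractions import Fraction
from typing import List


def even_weights(pods_per_address: List[int]) -> List[Fraction]:
    """Split traffic equally across addresses, ignoring pod counts."""
    n = len(pods_per_address)
    return [Fraction(1, n)] * n


def proportional_weights(pods_per_address: List[int]) -> List[Fraction]:
    """Split traffic in proportion to the pods behind each address."""
    total = sum(pods_per_address)
    return [Fraction(pods, total) for pods in pods_per_address]


def per_pod_share(pods_per_address: List[int],
                  weights: List[Fraction]) -> List[Fraction]:
    """Share of total traffic received by one pod behind each address."""
    return [w / pods for w, pods in zip(weights, pods_per_address)]


# One address fronting 9 pods in a remote network, one address for a
# single directly reachable pod.
pods = [9, 1]
skewed = per_pod_share(pods, even_weights(pods))           # 1/18 and 1/2
balanced = per_pod_share(pods, proportional_weights(pods))  # 1/10 each
```

With an even split, the single pod receives nine times the traffic of each pod behind the shared address. Weighting by pod count evens this out, which is why an endpoint-count attribute would help.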

### ClusterSet Service Behavior Expectations

#### Service Types
45 changes: 43 additions & 2 deletions keps/sig-multicluster/2149-clusterid/README.md
@@ -90,6 +90,7 @@ tags, and then generate with `hack/update-toc.sh`.
- [Multi-Cluster Services](#multi-cluster-services)
- [Diagnostics](#diagnostics)
- [Multi-tenant controllers](#multi-tenant-controllers)
- [Multi-network scenario](#multi-network-scenario)
- [<code>ClusterProperty</code> CRD](#-crd)
- [Well known properties](#well-known-properties)
- [Property: <code>id.k8s.io</code>](#property-)
@@ -98,10 +99,16 @@ tags, and then generate with `hack/update-toc.sh`.
- [Contents](#contents)
- [Consumers](#consumers)
- [Notable scenarios](#notable-scenarios)
- [Property: <code>clusterset.k8s.io</code>](#property--1)
- [Property: <code>network.k8s.io</code>](#property--1)
- [Uniqueness](#uniqueness-1)
- [Lifespan](#lifespan-1)
- [Contents](#contents-1)
- [Consumers](#consumers-1)
- [Notable scenarios](#notable-scenarios-1)
- [Property: <code>clusterset.k8s.io</code>](#property--2)
- [Lifespan](#lifespan-2)
- [Contents](#contents-2)
- [Consumers](#consumers-2)
- [Additional Properties](#additional-properties)
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
- [Risks and Mitigations](#risks-and-mitigations)
@@ -294,6 +301,9 @@ My controller interacts with multiple clusters and needs to disambiguate between

_For example, [CAPN's virtualcluster project](https://github.com/kubernetes-sigs/cluster-api-provider-nested) is implementing a multi-tenant scheduler that schedules tenant namespaces only in certain parent clusters, and a separate syncer running in each parent cluster controller needs to compare the name of the parent cluster to determine whether the namespace should be synced. ([ref](https://github.com/kubernetes/enhancements/issues/2149#issuecomment-768486457))._

#### Multi-network scenario

Within a ClusterSet, I have one or more clusters whose pods are not directly routable from the other clusters (a non-flat network).

### `ClusterProperty` CRD

@@ -307,7 +317,7 @@ The schema for `ClusterProperty` is intentionally loose to support multiple form

### Well known properties

The `ClusterProperty` CRD will support two specific properties under the well known names `id.k8s.io` and `clusterset.k8s.io`. Being "well known" means that they must conform to the requirements described below, and therefore can be depended on by multi-cluster implementations to achieve use cases dependent on knowledge of a cluster's ID or ClusterSet membership.
The `ClusterProperty` CRD will support three specific properties under the well known names `id.k8s.io`, `network.k8s.io` and `clusterset.k8s.io`. Being "well known" means that they must conform to the requirements described below, and therefore can be depended on by multi-cluster implementations to achieve use cases dependent on knowledge of a cluster's ID or ClusterSet membership.

Member

I asked somewhere (can't find where) whether we can/should revisit the need for a suffix for "well known" properties vs. user-defined ones, i.e., could/should these be "id", "network", "clusterset"?


The requirements below use the keywords **must, should,** and **may** purposefully in accordance with [RFC-2119](https://tools.ietf.org/html/rfc2119).

@@ -353,6 +363,37 @@ Contains a unique identifier for the containing cluster.

**Reusing cluster names**: Since an `id.k8s.io ClusterProperty` has no restrictions on whether or not a ClusterProperty can be repeatable, if a cluster unregisters from a ClusterSet it is permitted under this standard to rejoin later with the same `id.k8s.io ClusterProperty` it had before. Similarly, a *different* cluster could join a ClusterSet with the same `id.k8s.io ClusterProperty` that had been used by another cluster previously, as long as both do not have membership in the same ClusterSet at the same time. Finally, two or more clusters may have the same `id.k8s.io ClusterProperty` concurrently (though they **should** not; see "Uniqueness" above) *as long as* they both do not have membership in the same ClusterSet.

#### Property: `network.k8s.io`

Contains an identifier representing the network for the cluster.


##### Uniqueness

* The identifier **need not** exist (as it is only applicable to multi-network scenarios) and **need not** be unique.

##### Lifespan

* The identifier, if it exists, **should** be immutable for the lifespan of a ClusterSet membership.


##### Contents

* The identifier **should** be a valid string.
* The identifier **should** be a human readable description.


##### Consumers

* **Should** be able to rely on the identifier, if it exists, remaining unmodified for the entire duration of the cluster's membership in a ClusterSet.
* **Should** watch the `network.k8s.io` property to handle potential changes if they live beyond the ClusterSet membership.
* **May** rely on the existence of an identifier for clusters that do not belong to a ClusterSet so long as the implementation provides one.


##### Notable scenarios

**Cluster changes its network**: Since a `network.k8s.io ClusterProperty` must be immutable for the duration of its *membership* in a given ClusterSet, the property contents can be "changed" by unregistering the cluster from the ClusterSet and reregistering it with the new network name.
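
The unregister-then-reregister pattern described above can be sketched as follows. This is a toy in-memory model in Python, not an API defined by the KEP; the class `ClusterSetMembership`, its methods, and the event tuples are all hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple


class ClusterSetMembership:
    """Toy registry mapping member cluster IDs to their network names."""

    def __init__(self) -> None:
        self._networks: Dict[str, Optional[str]] = {}
        # Events that a consumer watching the membership would observe.
        self.events: List[Tuple[str, str, Optional[str]]] = []

    def register(self, cluster_id: str, network: Optional[str] = None) -> None:
        if cluster_id in self._networks:
            # The network is fixed for the duration of a membership, so an
            # existing member cannot be registered again with new values.
            raise ValueError(f"{cluster_id} is already a member")
        self._networks[cluster_id] = network
        self.events.append(("added", cluster_id, network))

    def unregister(self, cluster_id: str) -> None:
        network = self._networks.pop(cluster_id)
        self.events.append(("removed", cluster_id, network))

    def change_network(self, cluster_id: str, new_network: str) -> None:
        """Change the network by leaving and rejoining the ClusterSet."""
        self.unregister(cluster_id)
        self.register(cluster_id, new_network)

    def network_of(self, cluster_id: str) -> Optional[str]:
        return self._networks[cluster_id]


members = ClusterSetMembership()
members.register("cluster-a", "network-1")
members.change_network("cluster-a", "network-2")
```

A consumer watching `members.events` sees a removal followed by an addition rather than an in-place update, so it never observes a member whose network changed mid-membership.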

#### Property: `clusterset.k8s.io`

Contains an identifier that relates the containing cluster to the ClusterSet in which it belongs.