Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-4020: Replace StorageversionAPI with AggregatedDiscovery to fetch served resources by peer apiservers #5113

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

richabanker
Copy link
Contributor

@richabanker richabanker commented Jan 31, 2025

  • One-line PR description: Replace StorageversionAPI with AggregatedDisovery to fetch served resources by peer apiservers. Also some formatting changes.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 31, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: richabanker
Once this PR has been reviewed and has the lgtm label, please assign fedebongio for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 31, 2025
Yes, enabling this feature will result in new API calls. Specifically:

- Discovery calls via a loopback client: The local apiserver will use a loopback client to discover the resources it serves for each group-version
- Remote discovery calls to peer apiservers: The event handler for apiserver identity lease informer will make remote discovery calls to each peer apiserver whose lease object is added or updated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often is this anticipated to happen? If there are no upgrades going on, are these still expected to happen on a regular interval or only once per apiserver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the local discovery calls - those should happen only once upon server startup.
For aggregated discovery calls - since those are tied to the apiserver identity lease informer, unless we can identify a way to know when an apiserver was re-started just by looking at a lease-update, these calls will be made everytime the lease was updated. Given that leases are short lived, this could be a concern.

@@ -397,6 +376,7 @@ We will test the feature mostly in integration test and unit test. We may add e2

#### Beta

- Version aware proying implemented
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/proying/proxying

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually removed the Version aware proxying bit from this PR since I modified this PR to only focus on "replacing SV API with Agg Discovery"

@@ -305,7 +284,7 @@ For the mTLS between source and destination apiservers, we will do the following
2. The server (destination apiserver) will check the client (source apiserver) certs to determine that the proxy request is from an authenticated client. We will use requestheader authentication (and NOT client cert authentication) for this. The client (source apiserver) will provide the [proxy-client certfiles](https://github.com/kubernetes/kubernetes/blob/release-1.27/cmd/kube-apiserver/app/options/options.go#L222-L233) to the server (destination apiserver) which will verify the presented certs using the CA bundle provided in the [--requestheader-client-ca-file](https://github.com/kubernetes/kubernetes/blob/release-1.27/staging/src/k8s.io/apiserver/pkg/server/options/authentication.go#L125-L128) passed to the apiserver upon bootstrap

### Discovery Merging
TODO: detailed description of discovery merging. (not scheduled until beta.)
Maintaining the Remote Discovery cache, which provides a merged view of all resources available in the cluster at any given time (even during mixed version state), naturally solves the problem of serving a comprehensive discovery document.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to elaborate more on the discovery merging mechanism. Is the merged discovery only generated and used by MVP to decide on proxying or would the discovery handling itself need to change to yield a merged document?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted this, will address merged discovery in a separate PR


5. If the proxy call fails for network issues or any reason, we serve 503 with error `Error while proxying request to destination apiserver`

6. We will also add a poststarthook for the apiserver to ensure that it does not start serving requests until we are done creating/updating SV objects
6. We will also add a poststarthook for the apiserver to ensure that it does not start serving requests until we have populated both Local Discovery and Remote Discovery caches.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on local discovery poststarthook. Remote discovery could be considerably slower? Do we really want to block on those as well? When working on aggregated discovery, we specifically did not block startup on remote apiservices (which is slightly different than remote apiservers) to avoid adding excessive startup time.


- We will add a new header in client requests to specify the compatibility version of the client

- We will publish [min-compatibility version](https://github.com/kubernetes/enhancements/blob/master/keps/sig-architecture/4330-compatibility-versions/README.md#--min-compatibility-version), binary-version of an apiserver as annotations in the apiserver-identity lease
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sweet! I don't see any section that states how we'll be using these annotations, but +1.


- With [compatibility versions](https://github.com/kubernetes/enhancements/blob/master/keps/sig-architecture/4330-compatibility-versions/README.md) information available for kubernetes components, we can leverage that to perform version aware proxying between API servers, and ensure that a request is proxied to an eligible apiserver given that it was generated by a client compatible with the apiserver/s present in a cluster. Version aware proxying lets us ensure that requests from clients (possibly) expecting to use a feature introduced in later K8s versions, are not incorrectly routed to apiservers at older versions

- We will add a new header in client requests to specify the compatibility version of the client
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a MVP or compatibility version feature? I don't fully recall how client side feature gates work but does it make sense to gate client-side compatibility version headers on MVP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me open a separate PR for this discussion

@richabanker richabanker changed the title KEP-4020: Add beta criteria for Mixed Version Proxy [WIP] KEP-4020: Add beta criteria for Mixed Version Proxy Feb 4, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 4, 2025
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 5, 2025
@richabanker richabanker changed the title [WIP] KEP-4020: Add beta criteria for Mixed Version Proxy [WIP] KEP-4020: Replace StorageversionAPI with AggregatedDisovery to fetch served resources by peer apiservers Feb 5, 2025
@richabanker richabanker changed the title [WIP] KEP-4020: Replace StorageversionAPI with AggregatedDisovery to fetch served resources by peer apiservers [WIP] KEP-4020: Replace StorageversionAPI with AggregatedDiscovery to fetch served resources by peer apiservers Feb 5, 2025
@richabanker richabanker force-pushed the mvp-update branch 4 times, most recently from 9bef8fe to a173ee4 Compare February 5, 2025 21:28
@richabanker richabanker changed the title [WIP] KEP-4020: Replace StorageversionAPI with AggregatedDiscovery to fetch served resources by peer apiservers KEP-4020: Replace StorageversionAPI with AggregatedDiscovery to fetch served resources by peer apiservers Feb 5, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants