Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update KEP 740 #4852

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

HarshalNeelkamal
Copy link

@HarshalNeelkamal HarshalNeelkamal commented Sep 13, 2024

  • One-line PR description: Update KEP for External ServiceAccount JWT signing

  • Issue link: API for external signing of Service Account tokens #740

  • Other comments:

    • The KEP was marked provisional in KEP-740: fix incorrect status #3653 due to being stale for over 3 years.
    • This PR attempts to bring the KEP up to speed with recent changes.
    • The intention is that underlying implementation can be picked back up.
    • A consensus regarding external JWT signing was achieved in the community (as documented here)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 13, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: HarshalNeelkamal
Once this PR has been reviewed and has the lgtm label, please assign mikedanese for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels Sep 13, 2024
@k8s-ci-robot
Copy link
Contributor

Welcome @HarshalNeelkamal!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 13, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @HarshalNeelkamal. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 13, 2024
Copy link
Contributor

@ahmedtd ahmedtd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One high-level point of discussion... does this need any feature gates? It seems like it is explicitly controlled by a kube-apiserver flag, so there's no point in additionally adding a feature gate.

@HarshalNeelkamal
Copy link
Author

/cc @liggitt @enj @micahhausler

- "@liggit"
- "@tallclair"
approvers:
- "@mikedanese"
- "@taahm"
- "@liggit"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: liggitt

owning-sig: sig-auth
participating-sigs: []
reviewers:
- "@mikedanese"
- "@taahm"
- "@liggit"
- "@tallclair"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add @enj @micahhausler as reviewers

* Initial PR: kubernetes/kubernetes#73110
Initial PRs:
- kubernetes/kubernetes#73110
- kubernetes/kubernetes#125177
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy all missing sections / questions from https://raw.githubusercontent.com/kubernetes/enhancements/master/keps/NNNN-kep-template/README.md so they're all here, even if they're optional for alpha and we leave them as TODO for now

- [New API](#new-api)
- [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
- [Plugins] (#plugins)
- [Updates for token generation](#updates-for-token-generation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +35 to +37
Enable integration with alternate key management solutions for service account keys. While at it, make the association between kube-apiserver and service account keys more agile.

Benefits are 2 fold:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can simplify to just list item 1 and 2 alone and drop the preamble

message SignPayloadRequest {
// payload is the content to be signed. JWT headers must be included by the caller
message SignJWTRequest {
// The raw payload to be signed. A serialized JSON object, no base64 wrapping.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make this already be base64 encoded to remove a possible mistake in the external signer encoding (and symmetric with the "Already wrapped in URL-safe base64" fields in the response)

message SignPayloadRequest {
// payload is the content to be signed. JWT headers must be included by the caller
message SignJWTRequest {
// The raw payload to be signed. A serialized JSON object, no base64 wrapping.
bytes payload = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's call this claims to make it clear which bit of the jwt this is

Suggested change
bytes payload = 1;
bytes claims = 1;

message SignJWTResponse {
// The header for the JWT. Already wrapped in URL-safe base64, valid for
// directly inserting into the final JWT.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hoist in the docs from https://docs.google.com/document/d/1QVtBX8J2Tk70toefDegXMXigMFzrdQfezjlhBrKR29Y/edit?tab=t.0#bookmark=id.xf2j5nbnuczz with the restrictions kube-apiserver will put on the returned header

  // header must contain alg, kid, typ claims.
  // type must be “JWT”.
  // alg must be one of the algorithms supported by kube-apiserver (currently RS256, ES256, ES384, ES512).

string algorithm = 4;
// The timestamp for when it's appropriate to refetch verification keys to
// pick changes if any.
google.protobuf.Timestamp suggested_next_fetch_at = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a minimum we'd accept here? (what would kube-apiserver do if this is before time.Now()? make sure we don't hot-loop or do something crazy if skew is encountered like that)

is this what informs the GetCacheAgeMaxSeconds() implementation for PublicKeysGetter? something like suggested_next_fetch_at.Sub(time.Now())/time.Second? that would mean the OIDC docs would return shrinking cache maxAge headers as we got closer and closer to the internal refresh time

Need to document how to use this and/or the metrics to safely add new keys and promote them to signing keys in an external signer

- `<package>`: `<date>` - `<test coverage>`
-->

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an integration test section describing wiring up a real working kube-apiserver configured to run with this signer (KMS KEP might have a good testing section you can mimic)

@dims
Copy link
Member

dims commented Sep 18, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 18, 2024
@k8s-ci-robot
Copy link
Contributor

@HarshalNeelkamal: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-enhancements-verify fda0a65 link true /test pull-enhancements-verify

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

- [Implementation History](#implementation-history)
<!-- /toc -->

## Summary

The Kubernetes API server has always read service account keys from disk as the process starts, and kept them in memory for the duration of the server's lifetime. As the API server can now verify and issue projected volume tokens, it would be advantageous to support external signing and verifying of token data over an API, as well as reading public keys from an API.
Service account keys are used by kube-apiserver for JWT signing and authentication. Keys are stored on the disk and are loaded by kube-apiserver during process start time. Key management for service account keys is entirely within the cluster and is less flexible. This KEP proposes updates that will allow Kubernetes distributions to integrate with key management solutions of their choice (eg: HSMs, cloud KMSes).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

"... and is less flexible."

Less flexible than what? It might be better to either compare it to something, or just state what it can't/doesn't do (ex: keys are exportable)


### Risks and Mitigations

- New token generation and validation could suffer a performance difference when reading over a socket, as an external process will be signing data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the performance change is not really because a separate process is signing material (indeed, the external signer could be optimized and sign in even less CPU cycles than kube-api), its the communication over a socket (and potentially a remote RPC call, but depends on the signer implementation)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Status: Needs Triage
Development

Successfully merging this pull request may close these issues.

6 participants