Skip to content

Conversation

@ardaguclu
Copy link
Member

@ardaguclu ardaguclu commented Dec 3, 2025

This PR is based on #1872 (changes in enhancements/kube-apiserver/kms-encryption-foundations.md).

There are many aspects that need to be implemented to support KMS in OpenShift. We have decided to open more granular EPs to better track the work.

This EPs main aim is to focus on the encryption controller changes in library-go. This EP defers some concepts to future in order to start with simpler, manageable iterations.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 3, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 3, 2025

@ardaguclu: This pull request references CNTRLPLANE-2120 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

This PR is based on #1872 (changes in enhancements/kube-apiserver/kms-encryption-foundations.md).

There are many aspects that need to be implemented to support KMS in OpenShift. We have decided to open more granular EPs to better track the work.

This EPs main aim is to focus on the encryption controller changes in library-go. This EP defers some concepts to future in order to start with simpler, manageable iterations.

PoC PR openshift/library-go#2045 (this is just a PoC, original PR will be opened when this EP merges).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from hasbro17 and yuqi-zhang December 3, 2025 09:35
@ardaguclu ardaguclu force-pushed the kms-encryption-controllers branch from c091cbc to f734b05 Compare December 3, 2025 09:43
Copy link
Member

@flavianmissi flavianmissi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to take a short break from reviewing, but leaving the comments I got so far.

@ardaguclu
Copy link
Member Author

/cc @ibihim @flavianmissi

@openshift-ci openshift-ci bot requested review from flavianmissi and ibihim December 3, 2025 14:08
@ardaguclu ardaguclu force-pushed the kms-encryption-controllers branch 2 times, most recently from 1794054 to 1ddc3d8 Compare December 4, 2025 07:19
@ardaguclu
Copy link
Member Author

@flavianmissi I was uncomfortable about the disconnects between the sections and the verbosity. So I overhauled the EP to have better clarity. Please let me know your thoughts.

@ardaguclu ardaguclu force-pushed the kms-encryption-controllers branch 3 times, most recently from e920a9c to 5804b76 Compare December 4, 2025 08:35
@ardaguclu ardaguclu force-pushed the kms-encryption-controllers branch from f39a0d7 to 8f79ed6 Compare December 5, 2025 04:21
@flavianmissi
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 5, 2025
@ardaguclu
Copy link
Member Author

/cc @benluddy

@openshift-ci openshift-ci bot requested a review from benluddy December 5, 2025 13:00
@ardaguclu
Copy link
Member Author

As we agreed with @flavianmissi, in next iterations there will be another condition to notify users to delete unused kms plugins from cluster, when prune_controller prunes them.

@benluddy
Copy link

/approve
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 19, 2026
- As a cluster admin, I want the same migration and monitoring experience for KMS as local encryption, so I don't need to learn new procedures.
- As a security admin, I want encryption keys stored outside the cluster, so compromised control plane nodes cannot access keys.

### API Extensions
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoelSpeed just for clarification; since we don't plan any API updates in tech preview, do we still need your review/approval?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I suspect you don't. Do you envision any long term API changes as a result of this feature? If you expect any API changes in the long term then I would expect those to be represented in TechPreview though, so that they can be thought out and tested before we GA

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you envision any long term API changes as a result of this feature? If you expect any API changes in the long term then I would expect those to be represented in TechPreview though, so that they can be thought out and tested before we GA

Changes in the first iteration will be backported to 4.21. So we wanted to keep it as simple as possible. However, in the next iteration (very likely in 4.22 release cycle), it is almost certain that we'll have API changes (based on the results of our architectural discussions). GA is expected to be in 4.23.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What defines the first iteration vs second iteration?

How will you prove the functionality of the first iteration meets the quality bar and is stable enough to be backported?

You might want to, in that case, run this as two separate feature gates, with the first set of functionality behind the first, focus on getting that tested and promoted, and then adding the new API and second phase behind the second

Given you are now talking about GA in 4.23, that means you'll want that API TP in 4.22 ideally, I'd suggest starting to flesh that out and having an API review here. You could wait until you've completed the first phase in theory but it might delay timelines

Copy link
Member Author

@ardaguclu ardaguclu Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This EP aims to provide foundations in order to enable KMS in encryption controllers in library-go. So basically, this EP is not strictly bound to API definitions (it just tracks the changes in the API fields to initiate migration e.g. #1900 (comment)). When we have API definitions in the next iterations, the work in here should remain almost same.

What defines the first iteration vs second iteration?

The minimal part that can be backported to 4.21 will be first and the rest (that do not need to be backported) will be considered as second.

How will you prove the functionality of the first iteration meets the quality bar and is stable enough to be backported?

We'll add e2e and integration tests even in the first iteration (as called out in the EP)

You might want to, in that case, run this as two separate feature gates, with the first set of functionality behind the first, focus on getting that tested and promoted, and then adding the new API and second phase behind the second

We added API definitions in this EP but actually this EP's focus is independent from API fields. So conceptually in the next iteration, there won't be any changes in here (maybe slight changes).

Given you are now talking about GA in 4.23, that means you'll want that API TP in 4.22 ideally, I'd suggest starting to flesh that out and having an API review here. You could wait until you've completed the first phase in theory but it might delay timelines

Totally agree with you. I'm aware that our current situation is not great :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoelSpeed I'd appreciate if you can have a look at this EP, when you have a time. Even though we won't make any API changes in this version, your feedback will be valuable about the future. Thank you.

Copy link
Member Author

@ardaguclu ardaguclu Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were planning to use EncryptionType (KMS) without KMSConfig which is supposed to be deprecated. However, during our tests we've realized that this validation disallows empty KMSConfig.

Therefore, with respect to this EP, we'll need to have API change to loosen the validation.
cc: @benluddy

@benluddy
Copy link

/approve

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: benluddy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 21, 2026
annotations:
encryption.apiserver.operator.openshift.io/mode: "kms"
data:
encryption.apiserver.operator.openshift.io-key: "<base64-encoded-kms-config>"
Copy link
Contributor

@ibihim ibihim Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary anymore as this is now a static file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct. But we need to make controllers endpoint agnostic so that they can work in the future changes. There is a similar discussion #1900 (comment)

@ibihim
Copy link
Contributor

ibihim commented Jan 21, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 21, 2026
Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any major concerns here about proceeding, but I think more detail is required about re-use of an existing API and what the migration strategy looks like for folks who already have this configured on upgrade, are we breaking their existing workflows?

kind: APIServer
spec:
encryption:
type: KMS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field already exists doesn't it? Can you explain somewhere what the current behaviour is so that we can see how it has changed with this EP? What happens to clusters already using this feature?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for review. I've updated the EP by mentioning this. Please let me know your thoughts.

Comment on lines +199 to +201
For Tech Preview v1, no new API fields are added to the APIServer resource.
Users simply set `encryption.type: KMS` ([EncryptionType](https://github.com/openshift/api/blob/6fb7fdae95fd20a36809d502cfc0e0459550d527/config/v1/types_apiserver.go#L214))
and deploy KMS plugins at the hardcoded endpoint `unix:///var/run/kmsplugin/kms.sock`. Current `KMSConfig` will not be used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the current KMSConfig do today?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does nothing. There is no functionality listening KMSConfig today.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 22, 2026
Comment on lines 199 to 200
The `encryption.type` field already supports the `KMS` value ([EncryptionType](https://github.com/openshift/api/blob/master/config/v1/types_apiserver.go#L214)), and the `KMSConfig` struct exists in the API.
However, the encryption controllers do not implement KMS support. Setting `type: KMS` has no effect - controllers ignore it and no encryption occurs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but the existing fields are behind the KMSEncryptionProvider feature gate right? And that's only in devpreview/techpreview?

That means we can be more flexible in making changes. Worth double checking what I'm saying and calling this out in the EP

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoelSpeed
Copy link
Contributor

LGTM from the perspective of this is enough to get the first bit going.

IMO we should tombstone the existing KMS fields that are unimplemented and proceed with the v1 of the feature, with a view to considering how the v2 will need API changes in the future so that we can start planning those out/check how those would fit with tombstoning

@ardaguclu
Copy link
Member Author

ardaguclu commented Jan 22, 2026

LGTM from the perspective of this is enough to get the first bit going.

IMO we should tombstone the existing KMS fields that are unimplemented and proceed with the v1 of the feature, with a view to considering how the v2 will need API changes in the future so that we can start planning those out/check how those would fit with tombstoning

After noticing that we can't get rid of API changes in #1900 (comment), maybe in the meantime we can tombstone these fields as well. I'd like to hear thoughts from @benluddy

EDITED: We discussed with @p0lyn0mial as well. Due to the requirement of 4.21 backport, it would be simpler and less painful to only loosen the validation.

@ardaguclu
Copy link
Member Author

/hold cancel
It looks like we agree upon the overall design of this EP. Besides, I'll keep this EP up to date.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 26, 2026
@ibihim
Copy link
Contributor

ibihim commented Jan 26, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 26, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 26, 2026

@ardaguclu: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 67e388d into openshift:master Jan 26, 2026
2 checks passed
@ardaguclu ardaguclu deleted the kms-encryption-controllers branch January 26, 2026 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants