From c6afdd4d83c8682ba06b73da3b24fed84966873d Mon Sep 17 00:00:00 2001 From: Roman Bednar Date: Fri, 3 Jun 2022 16:18:35 +0200 Subject: [PATCH 01/10] Add KEP for Retroactive default StorageClass assignment --- .../README.md | 711 ++++++++++++++++++ .../kep.yaml | 49 ++ 2 files changed, 760 insertions(+) create mode 100644 keps/sig-storage/3333-reconcile-default-storage-class/README.md create mode 100644 keps/sig-storage/3333-reconcile-default-storage-class/kep.yaml diff --git a/keps/sig-storage/3333-reconcile-default-storage-class/README.md b/keps/sig-storage/3333-reconcile-default-storage-class/README.md new file mode 100644 index 00000000000..380d9a5953a --- /dev/null +++ b/keps/sig-storage/3333-reconcile-default-storage-class/README.md @@ -0,0 +1,711 @@ +# KEP-3333: Retroactive default StorageClass assignment + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Behavior change](#behavior-change) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Alpha](#alpha) + - [Beta](#beta) + - [GA](#ga) + - [Deprecation](#deprecation) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) + - [New annotation](#new-annotation) + - [Bind to "" PVs at least once](#bind-to--pvs-at-least-once) +- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) + + +## Release Signoff Checklist + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + +We intend to change behaviour of default storage class assignment to be +retroactive for existing persistent volume claims without any storage class +assigned. This changes the existing Kubernetes behaviour slightly, which is +further described in sections below. + +A PVC with `pvc.spec.storageClassName=nil` will **always** get a default storage +class, regardless if the storage class is created before or after the PVC +is created. + +## Motivation + +1. It’s hard to mark a different SC as the default one. Cluster admin can + choose between two bad solutions: + + 1. Cluster has two default SCs for a short time, i.e. admin marks the new + default SC as default and then marks the old default SC as non-default. When + there are two default SCs in a cluster, the PVC admission plugin refuses to + accept new PVCs with `pvc.spec.storageClassName = nil`. Hence, cluster users + may get errors when creating PVCs at the wrong time. They must know it’s a + transient error and retry later. + + 2. Cluster has no default SC for a short time, i.e. admin marks the old + default SC as non-default and then marks the new default SC as default. + Since there is no default SC for some time, PVCs with + `pvc.spec.storageClassName = nil` created during this time will not get any + SC and are Pending forever. Users must be smart enough to delete the PVC and + re-create it. + +2. When users want to change the default SC parameters, they must delete the SC + and re-create it, Kubernetes API does not allow change in the SC. So there is no + default SC for some time and second case above applies here too. + +3. Defined ordering during cluster installation. Kubernetes cluster installation + tools must be currently smart enough to create a default SC before starting + anything that may create PVCs that need it. If such a tool supports multiple + cloud providers, storage backends and addons that require storage (such an image + registry), it may be quite complicated to do the ordering right. + +4. If a default storage class does not exist, PVCs with + `pvc.spec.storageClassName=nil` are bound to PVs with + `pv.spec.storageClassName=""`. This can be confusing to users, the PVC is bound + to different PVs or dynamically provisioned, depending on if the default SC + exists or not at the time when the PVC is created. + +### Goals + +* Loosen ordering requirements between PVCs and default SC creation. +* Improve user experience with `pvc.spec.storageClassName=nil`. + +### Non-Goals + +* Introduce new API for default SCs. + +## Proposal + +Right now, the default SC is applied to PVCs in an admission plugin, i.e. only +during PVC creation. Later on, PV controller will bind PVCs with +`pvc.spec.storageClassName=nil` to PVs with `pv.spec.storageClassName=""`, +if such a PV exists. + +We want to change the existing `pvc.spec.storageClassName=nil` behavior +to always require a default SC. It would never bind to a PV with +`pv.spec.storageClassName=""` and it would be `Pending` until a default SC +is created. When a default SC is created, the PVC `pvc.spec.storageClassName` +will be updated with the new default SC name in the PV controller. + +This behavior should be simpler from the user perspective. + +We plan to re-use the existing `storageclass.kubernetes.io/is-default-class` +SC annotation. + +### User Stories (Optional) + + +#### Story 1 + +Admin needs to change the default SC from SC1 to SC2 +1. The admin marks the current default SC1 as non-default. +2. Another user creates PVC requesting a default SC, by leaving + `pvc.spec.storageClassName=nil`. The default SC does not exist at + this point, therefore the admission plugin leaves the PVC untouched with + `pvc.spec.storageClassName=nil`. +3. The admin marks SC2 as default. +4. PV controller, when reconciling the PVC, updates + `pvc.spec.storageClassName=nil` to the new SC2. +5. PV controller uses the new SC2 when binding / provisioning the PVC. + +#### Story 2 + +An installation tool wants to install Kubernetes with a CSI driver providing +a default SC and an application that wants to use it (such as image registry). + +1. The installer creates PVC for the image registry first, requesting the + default storage class by leaving `pvc.spec.storageClassName=nil`. +2. The installer creates a default SC. +3. PV controller, when reconciling the PVC, updates + `pvc.spec.storageClassName=nil` to the new default SC. +4. PV controller uses the new default SC when binding / provisioning the PVC. + +### Notes/Constraints/Caveats (Optional) + +#### Behavior change + +Currently, when a default SC is not available at PVC creation time, a PVC +requesting a default SC (`pvc.spec.storageClassName=nil`) will keep +`pvc.spec.storageClassName=nil` forever. Such PVC can be bound only to PV with +`pvc.spec.storageClassName=""`. + +With the new behavior, the PV controller will wait until a default SC exists. +This may break an existing cluster that depends on the behavior described above. +We expect that the new behavior is better than the existing one. Users that +want to bind their PVC to PVs with `pv.spec.storageClassName=""` can create +PVCs explicitly requesting `pvc.spec.storageClassName=""`. + +### Risks and Mitigations + +This KEP changes current Kubernetes behavior as discussed above. + +Risk: Users depend on existing Kubernetes behavior. + +Mitigation: + +* Document the behavior change with a release note. + +* Suggest users that expect to bind to PVs with `storageClassName=""` to create + PVCs with `storageClassName=""` and not `storageClassName=nil` in Kubernetes + docs. + +## Design Details + +This KEP requires changes in: +* kube-controller-manager / its PV controller: + * Binding of PVCs with `pvc.spec.storageClassName=nil` must be changed to + ignore PVs with `pv.spec.storageClassName=""`. + * `pvc.spec.storageClassName=nil` in PVC must be reconciled to a current + default SC, if it exists, or after it's created. + +* kube-apiserver / PVC update validations: + * PVC update from `pvc.spec.storageClassName=nil` to + `pvc.spec.storageClassName=` is now forbidden, and it + must be allowed for this KEP to work. + +### Test Plan + +[x] I/we understand the owners of the involved components may require updates to +existing tests to make this code solid enough prior to committing the changes necessary +to implement this enhancement. + +##### Prerequisite testing updates + +Kubernetes already has a good coverage of PV/PVC binding tests in +[test/e2e/storage/persistent_volumes.go](https://github.com/kubernetes/kubernetes/blob/ccfac6d3200f63656575819e7b5976c12c3019a6/test/e2e/storage/persistent_volumes.go) + +There are no e2e tests that cover behavior of the default SC presence / absence, +they will be added only with the new behavior. + +##### Unit tests + +All changes should be only in the packages below, which have enough unit test +coverage, we will add only unit tests for the new or changed code. + +- `pkg/controller/volume/persistentvolume`: 2022-06-03 - 79% +- `pkg/apis/core/validation/`: 2022-06-03 - 82% + +##### Integration tests + + + +- test/integration/volume/persistent_volumes_test.go: Has few cases for PV/PVC + binding. + +We plan to extend these test to include default SC and how it will be applied to +existing PVCs. We use integration tests almost like e2e tests of the new +behavior to be sure that change of the default SC won't affect other tests +running in parallel. + +##### e2e tests + + + +There are no tests that would check how the default SC, because it would need to +create/delete default SCs, which could affect other tests running in parallel +that may expect that a default SC exists (typically StatefulSet e2e tests). + +We plan to add a few `[Disruptive]` `[Serial]` tests with the new behavior as +"smoke" tests of the new behavior, still, most of the tests will be integration +ones. + +### Graduation Criteria + +#### Alpha + +- Feature implemented behind a feature flag. +- Initial integration tests completed and enabled. + +#### Beta + +- Implement and enable e2e tests, visible in Testgrid and linked in KEP. + +#### GA + +- No users complaining about the new behavior. +- Scalability tests with existing framework (Kubemark?, TBD). +- Allowing time for feedback (at least 2 releases between beta and GA). +- No conformance tests, since we don't test StorageClasses there. +- Manually test version skew between the API server and KCM. See the expected behavior below in Version Skew Strategy. + +#### Deprecation + +- Announce deprecation and support policy of the existing flag. +- Two versions passed since introducing the functionality that deprecates the + flag (to address version skew). +- Address feedback on usage/changed behavior, provided on GitHub issues. +- Deprecate the flag. + +### Upgrade / Downgrade Strategy + +No change in cluster upgrade / downgrade process. + +### Version Skew Strategy + +This feature is implemented only in the API server and KCM and controlled by +`RetroactiveDefaultStorageClass` feature gate. Following cases may happen: + +| API server | KCM | behavior | +|------------|-----|----------------------------------------------------------------------------------------------------------------------------------| +| off | off | Existing Kubernetes behavior. | +| on | off| Existing Kubernetes behavior, only users can change `pvc.spec.storageClassName=nil` to a SC name. | +| off | on | PV controller may try to change `pvc.spec.storageClassName=nil` to a new default SC name, which will fail on the API server. (*) | +| on | on | New behavior. | + +*) For this case, we strongly suggest that the feature is enabled in the API +server first and then in KCM. Similarly, the feature should be disabled in KCM +first and then in the API server. This follows generic Kubernetes version skew +support. + +## Production Readiness Review Questionnaire + + + +### Feature Enablement and Rollback + + + +###### How can this feature be enabled / disabled in a live cluster? + + + +- [ ] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: + - Components depending on the feature gate: +- [ ] Other + - Describe the mechanism: + - Will enabling / disabling the feature require downtime of the control + plane? + - Will enabling / disabling the feature require downtime or reprovisioning + of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled). + +###### Does enabling the feature change any default behavior? + + + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + + + +###### What happens if we reenable the feature if it was previously rolled back? + +###### Are there any tests for feature enablement/disablement? + + + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? + + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + + + +###### How can someone using this feature know that it is working for their instance? + + + +- [ ] Events + - Event Reason: +- [ ] API .status + - Condition name: + - Other field: +- [ ] Other (treat as last resort) + - Details: + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +- [ ] Metrics + - Metric name: + - [Optional] Aggregation method: + - Components exposing the metric: +- [ ] Other (treat as last resort) + - Details: + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + + + +###### Will enabling / using this feature result in introducing new API types? + + + +###### Will enabling / using this feature result in any new calls to the cloud provider? + + + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + + + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + + + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +###### What are other known failure modes? + + + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + + + +## Drawbacks + + + +## Alternatives + +### New annotation + +Keep the existing behavior of `storageclass.kubernetes.io/is-default-class` +annotation as it is, but add a new SC annotation, say +`storageclass.kubernetes.io/default-is-retroactive`, that will retroactively +mark all existing PVCs with `storageClassName: nil` to the new default SC. + +This way, we don't break existing Kubernetes behavior, trading it for a bigger +complexity on user side - they need to use two annotations instead of one. + +We assume that more users will want the new behavior introduced in this KEP +than users that want to keep the existing behavior. + +### Bind to `""` PVs at least once + +Today, if there is no default SC, a PVC with `pvc.spec.storageClassName=nil` +will be bound to PV with `pv.spec.storageClassName=""`. +To keep at least part of this behavior, PV controller could try to bind such +PVCs to PV at least once and only after that change +`pvc.spec.storageClassName` to a newly created SC. + +We find this behavior to be too complicated for users - a PVC with +`pvc.spec.storageClassName=nil`: +* Would be dynamically provisioned by a default + SC, if the SC exists at the point when PVC is created. +* Or, when the default SC does not exist at the time of PVC creation, then + it would be bound to existing PV with `pv.spec.storageClassName=""`. +* Or, when such PV does not exist, then provisioned by a newly created default + SC. + +We think that using `pvc.spec.storageClassName=nil` *only* for a default SC, +regardless when the SC or PVC is created, is more robust user experience. + +## Infrastructure Needed (Optional) + + \ No newline at end of file diff --git a/keps/sig-storage/3333-reconcile-default-storage-class/kep.yaml b/keps/sig-storage/3333-reconcile-default-storage-class/kep.yaml new file mode 100644 index 00000000000..572e899d935 --- /dev/null +++ b/keps/sig-storage/3333-reconcile-default-storage-class/kep.yaml @@ -0,0 +1,49 @@ +title: Retroactive default StorageClass assignement +kep-number: 3333 +authors: + - "@RomanBednar" +owning-sig: sig-storage +participating-sigs: +status: provisional +creation-date: 2022-06-03 +reviewers: + - "@jsafrane" +approvers: + - TBD + +##### WARNING !!! ###### +# prr-approvers has been moved to its own location +# You should create your own in keps/prod-readiness +# Please make a copy of keps/prod-readiness/template/nnnn.yaml +# to keps/prod-readiness/sig-xxxxx/00000.yaml (replace with kep number) +#prr-approvers: + +see-also: +replaces: + +# The target maturity stage in the current dev cycle for this KEP. +stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.25" + +# The milestone at which this feature was, or is targeted to be, at each stage. +milestone: + alpha: "v1.25" + beta: "v1.26" + stable: "v1.28" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: + - name: RetroactiveDefaultStorageClass + components: + - kube-controller-manager +disable-supported: true + +# The following PRR answers are required at beta release +metrics: +# TBD +# - my_feature_metric From 03d09b598572ce3adc0b9cf1ce00f832bc15354a Mon Sep 17 00:00:00 2001 From: Roman Bednar Date: Mon, 6 Jun 2022 11:43:11 +0200 Subject: [PATCH 02/10] Address review comments --- .../README.md | 35 +++++++++++++++++-- 1 file changed, 32 insertions(+), 3 deletions(-) diff --git a/keps/sig-storage/3333-reconcile-default-storage-class/README.md b/keps/sig-storage/3333-reconcile-default-storage-class/README.md index 380d9a5953a..b5a831e300f 100644 --- a/keps/sig-storage/3333-reconcile-default-storage-class/README.md +++ b/keps/sig-storage/3333-reconcile-default-storage-class/README.md @@ -78,6 +78,20 @@ is created. ## Motivation +When user needs to provision a storage they create a PVC to request a volume. +A control loop looks for any new PVCs and based on current state of the +cluster the volume will be provided using one of the following methods: + +* Static provisioning - PVC did not specify any storage class and there is + already an existing PV that can be bound to it. Alternatively users can set + `pvc.spec.storageClassName=""` to disable dynamic provisioning explicitly. +* Dynamic provisioning - there is no existing PV that could be bound but PVC + did specify a storage class or there is exactly one storage class in the + cluster marked as default. + +Considering the "normal" operation described above there are additional +cases that can be problematic: + 1. It’s hard to mark a different SC as the default one. Cluster admin can choose between two bad solutions: @@ -86,7 +100,7 @@ is created. there are two default SCs in a cluster, the PVC admission plugin refuses to accept new PVCs with `pvc.spec.storageClassName = nil`. Hence, cluster users may get errors when creating PVCs at the wrong time. They must know it’s a - transient error and retry later. + transient error and manually retry later. 2. Cluster has no default SC for a short time, i.e. admin marks the old default SC as non-default and then marks the new default SC as default. @@ -133,6 +147,9 @@ to always require a default SC. It would never bind to a PV with is created. When a default SC is created, the PVC `pvc.spec.storageClassName` will be updated with the new default SC name in the PV controller. +Any PVs with `pv.spec.storageClassName=""` will be able to bind only to a PVC +with `pvc.spec.storageClassName=""`. + This behavior should be simpler from the user perspective. We plan to re-use the existing `storageclass.kubernetes.io/is-default-class` @@ -140,7 +157,6 @@ SC annotation. ### User Stories (Optional) - #### Story 1 Admin needs to change the default SC from SC1 to SC2 @@ -166,6 +182,18 @@ a default SC and an application that wants to use it (such as image registry). `pvc.spec.storageClassName=nil` to the new default SC. 4. PV controller uses the new default SC when binding / provisioning the PVC. + +#### Story 3 (current behavior) + +User wants to provision a volume and there is one default storage class set by +admin. + +1. Admin creates a default storage class. +2. Another user creates PVC requesting a default SC, by leaving + `pvc.spec.storageClassName=nil`. Since the default SC already exists + the admission plugin changes the `nil` to a name of the default storage + class. + ### Notes/Constraints/Caveats (Optional) #### Behavior change @@ -179,7 +207,8 @@ With the new behavior, the PV controller will wait until a default SC exists. This may break an existing cluster that depends on the behavior described above. We expect that the new behavior is better than the existing one. Users that want to bind their PVC to PVs with `pv.spec.storageClassName=""` can create -PVCs explicitly requesting `pvc.spec.storageClassName=""`. +PVCs explicitly requesting `pvc.spec.storageClassName=""` to provision those +PVs statically. ### Risks and Mitigations From 4756dd6f33d462c31dc129f1c6fd516f0b2a1ca8 Mon Sep 17 00:00:00 2001 From: Roman Bednar Date: Mon, 6 Jun 2022 12:14:51 +0200 Subject: [PATCH 03/10] Add PRR --- .../README.md | 152 ++++-------------- .../kep.yaml | 2 + 2 files changed, 36 insertions(+), 118 deletions(-) diff --git a/keps/sig-storage/3333-reconcile-default-storage-class/README.md b/keps/sig-storage/3333-reconcile-default-storage-class/README.md index b5a831e300f..af688a27386 100644 --- a/keps/sig-storage/3333-reconcile-default-storage-class/README.md +++ b/keps/sig-storage/3333-reconcile-default-storage-class/README.md @@ -10,6 +10,7 @@ - [User Stories (Optional)](#user-stories-optional) - [Story 1](#story-1) - [Story 2](#story-2) + - [Story 3 (current behavior)](#story-3-current-behavior) - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) - [Behavior change](#behavior-change) - [Risks and Mitigations](#risks-and-mitigations) @@ -334,7 +335,7 @@ No change in cluster upgrade / downgrade process. This feature is implemented only in the API server and KCM and controlled by `RetroactiveDefaultStorageClass` feature gate. Following cases may happen: -| API server | KCM | behavior | +| API server | KCM | Behavior | |------------|-----|----------------------------------------------------------------------------------------------------------------------------------| | off | off | Existing Kubernetes behavior. | | on | off| Existing Kubernetes behavior, only users can change `pvc.spec.storageClassName=nil` to a SC name. | @@ -348,48 +349,14 @@ support. ## Production Readiness Review Questionnaire - - ### Feature Enablement and Rollback - - ###### How can this feature be enabled / disabled in a live cluster? - - -- [ ] Feature gate (also fill in values in `kep.yaml`) - - Feature gate name: - - Components depending on the feature gate: +- [X] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: RetroactiveDefaultStorageClass + - Components depending on the feature gate: kube-apiserver, + kube-controller-manager - [ ] Other - Describe the mechanism: - Will enabling / disabling the feature require downtime of the control @@ -399,43 +366,27 @@ well as the [existing list] of feature gates. ###### Does enabling the feature change any default behavior? - +Yes. See "Behavior change" section above for details. ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? - +Yes. It has to be disabled in a reverse order of enabling the feature - +first disable the feature in KCM then in API server. See "Version Skew +Strategy" section above for more details. ###### What happens if we reenable the feature if it was previously rolled back? +No issues are expected. The case is exactly the same as when the feature is +enabled for the first time. + ###### Are there any tests for feature enablement/disablement? - +Unit tests will cover feature enablement/disablement. ### Rollout, Upgrade and Rollback Planning +Will be provided when graduating to beta. + @@ -583,67 +534,37 @@ previous answers based on experience in the field. ###### Will enabling / using this feature result in any new API calls? - +Yes. + +- API call type: PATCH PVC +- estimated throughput: low, only once for PVCs that have + `pvc.spec.storageClassName=nil` +- originating component(s): kube-controller-manager ###### Will enabling / using this feature result in introducing new API types? - +No. ###### Will enabling / using this feature result in any new calls to the cloud provider? - +No. ###### Will enabling / using this feature result in increasing size or count of the existing API objects? - +No. ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? - +With current behavior the PVC state would be stuck in `Pending` state so the +pod would not be scheduled at all. ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? - +PV controller already has all the informers it will need for this change to +be implemented. ### Troubleshooting @@ -689,12 +610,11 @@ Major milestones might include: - the version of Kubernetes where the KEP graduated to general availability - when the KEP was retired or superseded --> +- 1.25: initial version ## Drawbacks - +See "Behavior change" section above. ## Alternatives @@ -733,8 +653,4 @@ regardless when the SC or PVC is created, is more robust user experience. ## Infrastructure Needed (Optional) - \ No newline at end of file +Not needed. \ No newline at end of file diff --git a/keps/sig-storage/3333-reconcile-default-storage-class/kep.yaml b/keps/sig-storage/3333-reconcile-default-storage-class/kep.yaml index 572e899d935..293d9388498 100644 --- a/keps/sig-storage/3333-reconcile-default-storage-class/kep.yaml +++ b/keps/sig-storage/3333-reconcile-default-storage-class/kep.yaml @@ -8,6 +8,7 @@ status: provisional creation-date: 2022-06-03 reviewers: - "@jsafrane" + - "@xing-yang" approvers: - TBD @@ -41,6 +42,7 @@ feature-gates: - name: RetroactiveDefaultStorageClass components: - kube-controller-manager + - kube-apiserver disable-supported: true # The following PRR answers are required at beta release From 7a3358681aa4f7cef02d907aea5fbade1752b503 Mon Sep 17 00:00:00 2001 From: Roman Bednar Date: Mon, 6 Jun 2022 13:01:42 +0200 Subject: [PATCH 04/10] Add PRR approval file --- keps/prod-readiness/sig-storage/3333.yaml | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 keps/prod-readiness/sig-storage/3333.yaml diff --git a/keps/prod-readiness/sig-storage/3333.yaml b/keps/prod-readiness/sig-storage/3333.yaml new file mode 100644 index 00000000000..bb72228af33 --- /dev/null +++ b/keps/prod-readiness/sig-storage/3333.yaml @@ -0,0 +1,3 @@ +kep-number: 3333 +alpha: + approver: "@wojtek-t" \ No newline at end of file From c53beb7fe4c4aa7c1e67f278f51f3a5d58d1286d Mon Sep 17 00:00:00 2001 From: Roman Bednar Date: Wed, 8 Jun 2022 13:25:38 +0200 Subject: [PATCH 05/10] Address review comments 2 --- .../README.md | 24 +++++++++++-------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/keps/sig-storage/3333-reconcile-default-storage-class/README.md b/keps/sig-storage/3333-reconcile-default-storage-class/README.md index af688a27386..34c4de614e6 100644 --- a/keps/sig-storage/3333-reconcile-default-storage-class/README.md +++ b/keps/sig-storage/3333-reconcile-default-storage-class/README.md @@ -151,7 +151,8 @@ will be updated with the new default SC name in the PV controller. Any PVs with `pv.spec.storageClassName=""` will be able to bind only to a PVC with `pvc.spec.storageClassName=""`. -This behavior should be simpler from the user perspective. +This behavior should be simpler from the user perspective, without the need to +change PV admission plugin. We plan to re-use the existing `storageclass.kubernetes.io/is-default-class` SC annotation. @@ -291,9 +292,10 @@ https://storage.googleapis.com/k8s-triage/index.html We expect no non-infra related flakes in the last month as a GA graduation criteria. --> -There are no tests that would check how the default SC, because it would need to -create/delete default SCs, which could affect other tests running in parallel -that may expect that a default SC exists (typically StatefulSet e2e tests). +There are no tests that would check how the default SC works, because it would +need to create/delete default SCs, which could affect other tests running in +parallel that may expect that a default SC exists (typically StatefulSet e2e +tests). We plan to add a few `[Disruptive]` `[Serial]` tests with the new behavior as "smoke" tests of the new behavior, still, most of the tests will be integration @@ -309,14 +311,15 @@ ones. #### Beta - Implement and enable e2e tests, visible in Testgrid and linked in KEP. +- Scalability tests with existing framework ([perf-tests](https://github.com/kubernetes/perf-tests/tree/81c96c34e3c1f11c5fe91744ef7ce7bf44c2fe5c/clusterloader2/testing/experimental/storage/pod-startup/volume-types/persistentvolume)). +- Allowing time for feedback (at least 2 releases between beta and GA). +- No conformance tests, since we don't test StorageClasses there. +- Manually test version skew between the API server and KCM. See the expected + behavior below in Version Skew Strategy. #### GA - No users complaining about the new behavior. -- Scalability tests with existing framework (Kubemark?, TBD). -- Allowing time for feedback (at least 2 releases between beta and GA). -- No conformance tests, since we don't test StorageClasses there. -- Manually test version skew between the API server and KCM. See the expected behavior below in Version Skew Strategy. #### Deprecation @@ -555,8 +558,9 @@ No. ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? -For WIP SLI "Startup latency of schedulable stateful pods" there can be one -extra API call in case there is no default SC while creating a PVC. +For WIP SLI ["Startup latency of schedulable stateful pods"](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md#definition) +there can be one extra API call in case there is no default SC while creating a +PVC. With current behavior the PVC state would be stuck in `Pending` state so the pod would not be scheduled at all. From cd609a8d02c2fe01ab0754ff09da4b61fdb6851e Mon Sep 17 00:00:00 2001 From: Roman Bednar Date: Thu, 9 Jun 2022 13:56:54 +0200 Subject: [PATCH 06/10] Address review comments 3 --- .../README.md | 24 ++++++++++++------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/keps/sig-storage/3333-reconcile-default-storage-class/README.md b/keps/sig-storage/3333-reconcile-default-storage-class/README.md index 34c4de614e6..6706dbf1071 100644 --- a/keps/sig-storage/3333-reconcile-default-storage-class/README.md +++ b/keps/sig-storage/3333-reconcile-default-storage-class/README.md @@ -69,13 +69,13 @@ Items marked with (R) are required *prior to targeting to a milestone / release* ## Summary We intend to change behaviour of default storage class assignment to be -retroactive for existing persistent volume claims without any storage class -assigned. This changes the existing Kubernetes behaviour slightly, which is -further described in sections below. +retroactive for existing *unbound* persistent volume claims without any storage +class assigned. This changes the existing Kubernetes behaviour slightly, which +is further described in sections below. A PVC with `pvc.spec.storageClassName=nil` will **always** get a default storage -class, regardless if the storage class is created before or after the PVC -is created. +class, regardless if the storage class is created before or after the PVC is +created. ## Motivation @@ -209,8 +209,8 @@ With the new behavior, the PV controller will wait until a default SC exists. This may break an existing cluster that depends on the behavior described above. We expect that the new behavior is better than the existing one. Users that want to bind their PVC to PVs with `pv.spec.storageClassName=""` can create -PVCs explicitly requesting `pvc.spec.storageClassName=""` to provision those -PVs statically. +PVCs explicitly requesting `pvc.spec.storageClassName=""` to bind the PVC to a +statically provisioned PV. ### Risks and Mitigations @@ -239,6 +239,8 @@ This KEP requires changes in: * PVC update from `pvc.spec.storageClassName=nil` to `pvc.spec.storageClassName=` is now forbidden, and it must be allowed for this KEP to work. + * Validation will still reject changing `pvc.spec.storageClassName` from any + other value than `nil`. ### Test Plan @@ -333,6 +335,10 @@ ones. No change in cluster upgrade / downgrade process. +But all PVCs with `pvc.spec.storageClassName=nil` that exist in a cluster prior +to update and are not bound will be updated with default storage class name if +there is one. + ### Version Skew Strategy This feature is implemented only in the API server and KCM and controlled by @@ -640,8 +646,8 @@ than users that want to keep the existing behavior. Today, if there is no default SC, a PVC with `pvc.spec.storageClassName=nil` will be bound to PV with `pv.spec.storageClassName=""`. To keep at least part of this behavior, PV controller could try to bind such -PVCs to PV at least once and only after that change -`pvc.spec.storageClassName` to a newly created SC. +PVCs to PV at least once and only after there are no such PVs left the +`pvc.spec.storageClassName` would change to a newly created SC. We find this behavior to be too complicated for users - a PVC with `pvc.spec.storageClassName=nil`: From e1b4d81803ccc502dc75462d5cc9771d883b9f1c Mon Sep 17 00:00:00 2001 From: Roman Bednar Date: Fri, 10 Jun 2022 13:55:22 +0200 Subject: [PATCH 07/10] Address review comments 4 --- .../3333-reconcile-default-storage-class/README.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/keps/sig-storage/3333-reconcile-default-storage-class/README.md b/keps/sig-storage/3333-reconcile-default-storage-class/README.md index 6706dbf1071..d7cd0f8036f 100644 --- a/keps/sig-storage/3333-reconcile-default-storage-class/README.md +++ b/keps/sig-storage/3333-reconcile-default-storage-class/README.md @@ -96,14 +96,14 @@ cases that can be problematic: 1. It’s hard to mark a different SC as the default one. Cluster admin can choose between two bad solutions: - 1. Cluster has two default SCs for a short time, i.e. admin marks the new + 1. Option 1: Cluster has two default SCs for a short time, i.e. admin marks the new default SC as default and then marks the old default SC as non-default. When there are two default SCs in a cluster, the PVC admission plugin refuses to accept new PVCs with `pvc.spec.storageClassName = nil`. Hence, cluster users may get errors when creating PVCs at the wrong time. They must know it’s a transient error and manually retry later. - 2. Cluster has no default SC for a short time, i.e. admin marks the old + 2. Option 2: Cluster has no default SC for a short time, i.e. admin marks the old default SC as non-default and then marks the new default SC as default. Since there is no default SC for some time, PVCs with `pvc.spec.storageClassName = nil` created during this time will not get any @@ -111,8 +111,12 @@ cases that can be problematic: re-create it. 2. When users want to change the default SC parameters, they must delete the SC - and re-create it, Kubernetes API does not allow change in the SC. So there is no - default SC for some time and second case above applies here too. + and re-create it, Kubernetes API does not allow change in the SC. So there is + no default SC for some time and second case above applies here too. + Re-creating the storage class to change parameters can be useful in cases + there is a quota set for the SC, and since the quota is coupled with SC name + users can not use Option 1 because second SC would have a different name + and so the existing quota would not apply to it. 3. Defined ordering during cluster installation. Kubernetes cluster installation tools must be currently smart enough to create a default SC before starting From 90737303738358c0e6decab885438725c7289b93 Mon Sep 17 00:00:00 2001 From: Roman Bednar Date: Mon, 13 Jun 2022 14:42:43 +0200 Subject: [PATCH 08/10] Rewrite KEP to alternative considered --- .../README.md | 85 +++++-------------- 1 file changed, 23 insertions(+), 62 deletions(-) diff --git a/keps/sig-storage/3333-reconcile-default-storage-class/README.md b/keps/sig-storage/3333-reconcile-default-storage-class/README.md index d7cd0f8036f..5cc58eb1872 100644 --- a/keps/sig-storage/3333-reconcile-default-storage-class/README.md +++ b/keps/sig-storage/3333-reconcile-default-storage-class/README.md @@ -38,7 +38,7 @@ - [Drawbacks](#drawbacks) - [Alternatives](#alternatives) - [New annotation](#new-annotation) - - [Bind to "" PVs at least once](#bind-to--pvs-at-least-once) + - [Bind PVCs with nil will wait for default SC](#bind-pvcs-with--will-wait-for-default-sc) - [Infrastructure Needed (Optional)](#infrastructure-needed-optional) @@ -73,10 +73,6 @@ retroactive for existing *unbound* persistent volume claims without any storage class assigned. This changes the existing Kubernetes behaviour slightly, which is further described in sections below. -A PVC with `pvc.spec.storageClassName=nil` will **always** get a default storage -class, regardless if the storage class is created before or after the PVC is -created. - ## Motivation When user needs to provision a storage they create a PVC to request a volume. @@ -84,8 +80,7 @@ A control loop looks for any new PVCs and based on current state of the cluster the volume will be provided using one of the following methods: * Static provisioning - PVC did not specify any storage class and there is - already an existing PV that can be bound to it. Alternatively users can set - `pvc.spec.storageClassName=""` to disable dynamic provisioning explicitly. + already an existing PV that can be bound to it. * Dynamic provisioning - there is no existing PV that could be bound but PVC did specify a storage class or there is exactly one storage class in the cluster marked as default. @@ -124,16 +119,9 @@ cases that can be problematic: cloud providers, storage backends and addons that require storage (such an image registry), it may be quite complicated to do the ordering right. -4. If a default storage class does not exist, PVCs with - `pvc.spec.storageClassName=nil` are bound to PVs with - `pv.spec.storageClassName=""`. This can be confusing to users, the PVC is bound - to different PVs or dynamically provisioned, depending on if the default SC - exists or not at the time when the PVC is created. - ### Goals * Loosen ordering requirements between PVCs and default SC creation. -* Improve user experience with `pvc.spec.storageClassName=nil`. ### Non-Goals @@ -141,25 +129,9 @@ cases that can be problematic: ## Proposal -Right now, the default SC is applied to PVCs in an admission plugin, i.e. only -during PVC creation. Later on, PV controller will bind PVCs with -`pvc.spec.storageClassName=nil` to PVs with `pv.spec.storageClassName=""`, -if such a PV exists. - -We want to change the existing `pvc.spec.storageClassName=nil` behavior -to always require a default SC. It would never bind to a PV with -`pv.spec.storageClassName=""` and it would be `Pending` until a default SC -is created. When a default SC is created, the PVC `pvc.spec.storageClassName` -will be updated with the new default SC name in the PV controller. - -Any PVs with `pv.spec.storageClassName=""` will be able to bind only to a PVC -with `pvc.spec.storageClassName=""`. - -This behavior should be simpler from the user perspective, without the need to -change PV admission plugin. - -We plan to re-use the existing `storageclass.kubernetes.io/is-default-class` -SC annotation. +Change PV controller behavior so it will assign a default SC to unbound PVCs +with `pvc.spec.storageClassName=nil` after any storage class is marked as +default. ### User Stories (Optional) @@ -209,33 +181,28 @@ requesting a default SC (`pvc.spec.storageClassName=nil`) will keep `pvc.spec.storageClassName=nil` forever. Such PVC can be bound only to PV with `pvc.spec.storageClassName=""`. -With the new behavior, the PV controller will wait until a default SC exists. -This may break an existing cluster that depends on the behavior described above. -We expect that the new behavior is better than the existing one. Users that -want to bind their PVC to PVs with `pv.spec.storageClassName=""` can create -PVCs explicitly requesting `pvc.spec.storageClassName=""` to bind the PVC to a -statically provisioned PV. +With the new behavior, the PV controller will keep binding PVCs with +`pvc.spec.storageClassName=nil` to PVs with `pv.spec.storageClassName=""` until +a default SC exists. After admin creates default storage class, the `nil` value +of the PVC will change to this SC name and the PVC will no longer bind to PVs +with `pv.spec.storageClassName=""`. ### Risks and Mitigations -This KEP changes current Kubernetes behavior as discussed above. - Risk: Users depend on existing Kubernetes behavior. Mitigation: * Document the behavior change with a release note. -* Suggest users that expect to bind to PVs with `storageClassName=""` to create - PVCs with `storageClassName=""` and not `storageClassName=nil` in Kubernetes - docs. +* Suggest users that the only guaranteed way to bind PVs with + `storageClassName=""` is to create PVCs with `storageClassName=""` and not + `storageClassName=nil`. ## Design Details This KEP requires changes in: * kube-controller-manager / its PV controller: - * Binding of PVCs with `pvc.spec.storageClassName=nil` must be changed to - ignore PVs with `pv.spec.storageClassName=""`. * `pvc.spec.storageClassName=nil` in PVC must be reconciled to a current default SC, if it exists, or after it's created. @@ -645,25 +612,19 @@ complexity on user side - they need to use two annotations instead of one. We assume that more users will want the new behavior introduced in this KEP than users that want to keep the existing behavior. -### Bind to `""` PVs at least once +### Bind PVCs with `nil` will wait for default SC Today, if there is no default SC, a PVC with `pvc.spec.storageClassName=nil` will be bound to PV with `pv.spec.storageClassName=""`. -To keep at least part of this behavior, PV controller could try to bind such -PVCs to PV at least once and only after there are no such PVs left the -`pvc.spec.storageClassName` would change to a newly created SC. - -We find this behavior to be too complicated for users - a PVC with -`pvc.spec.storageClassName=nil`: -* Would be dynamically provisioned by a default - SC, if the SC exists at the point when PVC is created. -* Or, when the default SC does not exist at the time of PVC creation, then - it would be bound to existing PV with `pv.spec.storageClassName=""`. -* Or, when such PV does not exist, then provisioned by a newly created default - SC. - -We think that using `pvc.spec.storageClassName=nil` *only* for a default SC, -regardless when the SC or PVC is created, is more robust user experience. + +We could simplify this by changing the behavior so that PVCs with +`pvc.spec.storageClassName=nil` would always bind to default SC or would get SC +assigned retroactively, and still keep part of the existing behavior by binding +PVs with `pv.spec.storageClassName=""` only to PVCs +with `pvc.spec.storageClassName=""`. + +This change would affect small group of users but might be still too breaking +for users who rely on these semantics. ## Infrastructure Needed (Optional) From 6d49829105e63c7916125c4c63c22e7963667040 Mon Sep 17 00:00:00 2001 From: Roman Bednar Date: Tue, 14 Jun 2022 13:29:33 +0200 Subject: [PATCH 09/10] Change status to implementable --- keps/sig-storage/3333-reconcile-default-storage-class/kep.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/keps/sig-storage/3333-reconcile-default-storage-class/kep.yaml b/keps/sig-storage/3333-reconcile-default-storage-class/kep.yaml index 293d9388498..3aa70557972 100644 --- a/keps/sig-storage/3333-reconcile-default-storage-class/kep.yaml +++ b/keps/sig-storage/3333-reconcile-default-storage-class/kep.yaml @@ -4,7 +4,7 @@ authors: - "@RomanBednar" owning-sig: sig-storage participating-sigs: -status: provisional +status: implementable creation-date: 2022-06-03 reviewers: - "@jsafrane" From c6d69a727ebf56681a7d615c4f6234e4de86a5ec Mon Sep 17 00:00:00 2001 From: Roman Bednar Date: Wed, 15 Jun 2022 08:11:35 +0200 Subject: [PATCH 10/10] Address review comments 5 --- .../3333-reconcile-default-storage-class/README.md | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/keps/sig-storage/3333-reconcile-default-storage-class/README.md b/keps/sig-storage/3333-reconcile-default-storage-class/README.md index 5cc58eb1872..504f766039d 100644 --- a/keps/sig-storage/3333-reconcile-default-storage-class/README.md +++ b/keps/sig-storage/3333-reconcile-default-storage-class/README.md @@ -24,7 +24,6 @@ - [Alpha](#alpha) - [Beta](#beta) - [GA](#ga) - - [Deprecation](#deprecation) - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) - [Version Skew Strategy](#version-skew-strategy) - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) @@ -280,10 +279,10 @@ ones. - Feature implemented behind a feature flag. - Initial integration tests completed and enabled. +- Implement and enable e2e tests, visible in Testgrid and linked in KEP. #### Beta -- Implement and enable e2e tests, visible in Testgrid and linked in KEP. - Scalability tests with existing framework ([perf-tests](https://github.com/kubernetes/perf-tests/tree/81c96c34e3c1f11c5fe91744ef7ce7bf44c2fe5c/clusterloader2/testing/experimental/storage/pod-startup/volume-types/persistentvolume)). - Allowing time for feedback (at least 2 releases between beta and GA). - No conformance tests, since we don't test StorageClasses there. @@ -294,14 +293,6 @@ ones. - No users complaining about the new behavior. -#### Deprecation - -- Announce deprecation and support policy of the existing flag. -- Two versions passed since introducing the functionality that deprecates the - flag (to address version skew). -- Address feedback on usage/changed behavior, provided on GitHub issues. -- Deprecate the flag. - ### Upgrade / Downgrade Strategy No change in cluster upgrade / downgrade process.