KEP-3762: graduate PersistentVolumeLastPhaseTransitionTime to beta in v1.29 #4125

Merged

Conversation

RomanBednar
Contributor

@RomanBednar RomanBednar commented Jul 17, 2023

  • One-line PR description: update KEP for PersistentVolumeLastPhaseTransitionTime feature to match actual implementation and graduate to beta

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 17, 2023
@k8s-ci-robot k8s-ci-robot requested a review from msau42 July 17, 2023 12:22
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Jul 17, 2023
@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 17, 2023
@wojtek-t
Member

/assign @jsafrane

/sig storage

@RomanBednar RomanBednar changed the title KEP-3762: update KEP for PersistentVolume last phase transition time WIP: KEP-3762: update KEP for PersistentVolume last phase transition time Aug 3, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 3, 2023
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 12, 2023
@RomanBednar RomanBednar changed the title WIP: KEP-3762: update KEP for PersistentVolume last phase transition time KEP-3762: graduate PersistentVolumeLastPhaseTransitionTime to beta in v1.29 Sep 12, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 12, 2023
@RomanBednar
Contributor Author

Test rollout-rollback-rollout

Perform pre-upgrade tests (1.27.5)

Create a PVC to provision a volume:

$ cat /tmp/pvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-1
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-hostpath-sc
$ kc create -f /tmp/pvc.yaml

Verify the PV does not have lastPhaseTransitionTime set:

$ kc get pv/$(kc get pvc/pvc-1 -o json | jq '.spec.volumeName' | tr -d "\"")  -o json | jq  '.status.lastPhaseTransitionTime'
null
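
The PVC-to-PV lookup above can also be written without `jq` and `tr`. This is a minimal sketch, assuming `kc` in this thread is shorthand for `kubectl`; the `KC` variable and the function name `pv_transition_time` are hypothetical and not part of the original test.

```shell
# KC names the kubectl binary (assumption: the thread's `kc` stands for kubectl).
KC="${KC:-kubectl}"

# Print lastPhaseTransitionTime of the PV bound to the given PVC.
# Prints "null" when the field is unset, mirroring the jq output above.
pv_transition_time() {
  pvc="$1"
  pv="$("$KC" get "pvc/${pvc}" -o jsonpath='{.spec.volumeName}')" || return 1
  if [ -z "$pv" ]; then
    echo "PVC ${pvc} is not bound to a PV" >&2
    return 1
  fi
  ts="$("$KC" get "pv/${pv}" -o jsonpath='{.status.lastPhaseTransitionTime}')" || return 1
  echo "${ts:-null}"
}
```

Usage would be `pv_transition_time pvc-1`. The jsonpath output is empty for a missing field, which is why the function substitutes `null`.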

Upgrade cluster (1.27.5 -> 1.28.1)

Check available versions:

$ dnf search kubeadm --showduplicates --quiet | grep 1.28
kubeadm-1.28.0-0.x86_64 : Command-line utility for administering a Kubernetes cluster.
kubeadm-1.28.1-0.x86_64 : Command-line utility for administering a Kubernetes cluster.

Upgrade kubeadm:

$ sudo dnf install -y kubeadm-1.28.1-0

Prepare config file that enables FeatureGate:

$ cat /tmp/config.yaml
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: PersistentVolumeLastPhaseTransitionTime=true
controllerManager:
  extraArgs:
    cluster-cidr: 10.244.0.0/16
    feature-gates: PersistentVolumeLastPhaseTransitionTime=true
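
After the upgrade it may be worth confirming that the gate from the config above actually reached the control plane. The sketch below assumes the kubeadm convention of static pod manifests under `/etc/kubernetes/manifests`; the function name `gate_enabled` is hypothetical.

```shell
# Succeeds if the manifest passes <gate>=true through --feature-gates.
# The default path is the kubeadm convention; pass another path if yours differs.
gate_enabled() {
  gate="$1"
  manifest="${2:-/etc/kubernetes/manifests/kube-apiserver.yaml}"
  grep -Eq -- "--feature-gates=([^[:space:]]*,)?${gate}=true(,|[[:space:]]|\$)" "$manifest"
}
```

For example, `gate_enabled PersistentVolumeLastPhaseTransitionTime && echo enabled` checks the API server, and passing the `kube-controller-manager.yaml` manifest as the second argument checks KCM.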

Perform kubeadm upgrade:

$ sudo kubeadm upgrade plan --config /tmp/config.yaml
$ sudo kubeadm upgrade apply --config /tmp/config.yaml v1.28.1

Perform kubelet upgrade:

$ sudo dnf install -y kubelet-1.28.1-0
$ sudo systemctl daemon-reload 
$ sudo systemctl restart kubelet

Perform post-upgrade tests

Create a second PVC to provision a volume:

$ cat /tmp/pvc2.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-2
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-hostpath-sc
$ kc create -f /tmp/pvc2.yaml

Verify it has lastPhaseTransitionTime set:

$ kc get pv/$(kc get pvc/pvc-2 -o json | jq '.spec.volumeName' | tr -d "\"")  -o json | jq  '.status.lastPhaseTransitionTime'
"2023-09-12T08:53:09Z"

Change the reclaim policy on the first PV to Retain:

$ kc get pv/pvc-0c9ea251-b156-4786-ac82-8713b76bb312
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM           STORAGECLASS      REASON   AGE
pvc-0c9ea251-b156-4786-ac82-8713b76bb312   1Gi        RWO            Retain           Bound    default/pvc-1   csi-hostpath-sc            52m

Delete PVC for the first volume to release the PV:

$ kc delete pvc/pvc-1

Verify the first (pre-upgrade) PV transitioned phase and its transition timestamp is now set:

$ kc get pv/pvc-f2eee26c-bca3-448b-9198-d4948f54dce3 -o json | jq '.status.phase'
"Released"

$ kc get pv/pvc-f2eee26c-bca3-448b-9198-d4948f54dce3 -o json | jq '.status.lastPhaseTransitionTime'
"2023-09-12T08:58:01Z"

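The phase change after deleting a PVC is not instantaneous, so a scripted version of the check above would likely need to poll. A minimal sketch, assuming `kc` stands for `kubectl`; the `KC` and `POLL_INTERVAL` variables and the function name `wait_for_pv_phase` are hypothetical.

```shell
# KC names the kubectl binary (assumption: the thread's `kc` stands for kubectl).
KC="${KC:-kubectl}"

# Poll until the PV reaches the wanted phase; give up after <tries> attempts.
wait_for_pv_phase() {
  pv="$1"
  want="$2"
  tries="${3:-30}"
  while [ "$tries" -gt 0 ]; do
    phase="$("$KC" get "pv/${pv}" -o jsonpath='{.status.phase}')" || return 1
    [ "$phase" = "$want" ] && return 0
    tries=$((tries - 1))
    sleep "${POLL_INTERVAL:-1}"
  done
  return 1
}
```

Usage would be `wait_for_pv_phase <pv-name> Released` before reading `.status.lastPhaseTransitionTime`.
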
Downgrade cluster (1.28.1 -> 1.27.5)

$ kc version -o json | jq '.serverVersion.gitVersion'
"v1.27.5"

Perform post-rollback tests

$ cat /tmp/pvc3.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-3
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-hostpath-sc
$ kc create -f /tmp/pvc3.yaml

Verify new PV does not have lastPhaseTransitionTime set:

$ kc get pv/$(kc get pvc/pvc-3 -o json | jq '.spec.volumeName' | tr -d "\"")  -o json | jq  '.status.lastPhaseTransitionTime'
null

Verify lastPhaseTransitionTime of previous PVs cannot be accessed anymore:

$ kc get pv/$(kc get pvc/pvc-2 -o json | jq '.spec.volumeName' | tr -d "\"")  -o json | jq  '.status.lastPhaseTransitionTime'
null

Verify lastPhaseTransitionTime cannot be set manually:

$ kc patch pvc/pvc-3 -p '{"status":{"lastPhaseTransitionTime":"2023-09-11T13:07:09Z"}}'
Warning: unknown field "status.lastPhaseTransitionTime"
persistentvolumeclaim/pvc-3 patched (no change)

Upgrade cluster again (1.27.5 -> 1.28.1)

Install/update kubeadm:

$ sudo dnf install -y kubeadm-1.28.1-0

Perform kubeadm upgrade:

$ sudo kubeadm upgrade plan --config /tmp/config.yaml
$ sudo kubeadm upgrade apply --config /tmp/config.yaml v1.28.1

Perform post-upgrade tests again

Verify timestamp is available again and unchanged on old PVs:

$ kc get pv/$(kc get pvc/pvc-2 -o json | jq '.spec.volumeName' | tr -d "\"")  -o json | jq  '.status.lastPhaseTransitionTime'
"2023-09-12T08:53:09Z"
$ kc get pv/pvc-f2eee26c-bca3-448b-9198-d4948f54dce3 -o json | jq '.status.lastPhaseTransitionTime'
"2023-09-12T08:58:01Z"

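With more than a couple of volumes, the "unchanged after rollback" check is easier as a snapshot comparison: record the timestamps before the downgrade and compare after the second upgrade. A sketch under the same assumption that `kc` stands for `kubectl`; `KC`, `snapshot_pv_times` and `changed_pv_times` are hypothetical names.

```shell
# KC names the kubectl binary (assumption: the thread's `kc` stands for kubectl).
KC="${KC:-kubectl}"

# Print "name<TAB>timestamp" for every PV that has a timestamp, sorted.
snapshot_pv_times() {
  "$KC" get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.lastPhaseTransitionTime}{"\n"}{end}' \
    | awk -F'\t' '$2 != ""' \
    | LC_ALL=C sort
}

# Print snapshot lines from the first file that are missing or different
# in the second file. Empty output means every timestamp survived.
changed_pv_times() {
  LC_ALL=C comm -23 "$1" "$2"
}
```

Typical use: `snapshot_pv_times > before.txt` ahead of the downgrade, `snapshot_pv_times > after.txt` once the cluster is upgraded again, then `changed_pv_times before.txt after.txt`. PVs created in between only appear in the second file and are ignored.
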
Change reclaim policy on an existing PV, release it and check lastPhaseTransitionTime is set correctly:

$ kc get pv/pvc-2e55f2fd-b0dc-4c95-b8d5-085d16ee6d27 -o json | jq '.spec.persistentVolumeReclaimPolicy'
"Delete"

$ kc patch pv/pvc-2e55f2fd-b0dc-4c95-b8d5-085d16ee6d27 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
persistentvolume/pvc-2e55f2fd-b0dc-4c95-b8d5-085d16ee6d27 patched

$ kc get pv/pvc-2e55f2fd-b0dc-4c95-b8d5-085d16ee6d27 -o json | jq '.spec.persistentVolumeReclaimPolicy'
"Retain"

$ kc get pv/pvc-2e55f2fd-b0dc-4c95-b8d5-085d16ee6d27 -o json | jq '.status.phase'
"Bound"

$ kc delete pvc/pvc-2
persistentvolumeclaim "pvc-2" deleted

$ kc get pv/pvc-2e55f2fd-b0dc-4c95-b8d5-085d16ee6d27 -o json | jq '.status.phase'
"Released"

$ kc get pv/pvc-2e55f2fd-b0dc-4c95-b8d5-085d16ee6d27 -o json | jq '.status.lastPhaseTransitionTime'
"2023-09-12T12:05:07Z"

$ date
Tue Sep 12 12:05:24 PM UTC 2023
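
The comparison against `date` above can be turned into a number. This sketch assumes GNU `date` (its `-d` option parses RFC 3339 strings); the function name `seconds_between` is hypothetical.

```shell
# Seconds elapsed between two RFC 3339 timestamps.
# The second argument defaults to the current time. Assumes GNU date.
seconds_between() {
  start="$(date -u -d "$1" +%s)" || return 1
  if [ -n "${2:-}" ]; then
    end="$(date -u -d "$2" +%s)" || return 1
  else
    end="$(date -u +%s)"
  fi
  echo $((end - start))
}
```

For the values in this run, `seconds_between 2023-09-12T12:05:07Z 2023-09-12T12:05:24Z` prints `17`, so the timestamp was written seconds before the check.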

@wojtek-t
Member

The above test is great (I will look at details later), but can you put it into the KEP itself?

@wojtek-t wojtek-t self-assigned this Sep 13, 2023
@RomanBednar RomanBednar force-pushed the pv-phase-transition-time-update branch 2 times, most recently from aca086a to 2ae551c Compare September 13, 2023 13:22
@RomanBednar
Contributor Author

The above test is great (I will look at details later), but can you put it into the KEP itself?

@wojtek-t Sure, moved.

Member

@wojtek-t wojtek-t left a comment

A few, very minor comments - overall this looks reasonable.

| on | on | New behavior.
Version skew is not applicable, KCM was not changed in scope of this enhancement.

| API server | KCM | Behavior |
Member

let's remove the KCM column from the table - it doesn't matter

Contributor Author

Removed.

timestamp field when a volume transitions to a different phase. Also, if the feature gate is disabled the value must be
re-set to `nil` when updating or creating a volume.
We need to update API server to support the newly proposed field and set a value of the new timestamp field when a volume
transitions to a different phase. The timestamp field must be set to current time also for newly created volumes.
Member

I like the change - thanks!


See "Upgrade / Downgrade Strategy" and "Notes/Constraints/Caveats" sections for more details.

###### Are there any tests for feature enablement/disablement?

Unit tests for enabling and disabling feature gate will be added when transitioning to beta. See "Graduation criteria"
section.
Unit tests for enabling and disabling feature gate will be added in alpha - see "Graduation criteria" section.
Member

Were the tests added?

In fact this comment from the KEP template was introduced for exactly the cases like this:

Additionally, for features that are introducing a new API field, unit tests that
are exercising the `switch` of feature gate itself (what happens if I disable a
feature gate after having objects written with the new field) are also critical.
You can take a look at one potential example of such test in:
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282

Contributor Author

Yes, I've added a link to it.

@@ -676,13 +651,20 @@ rollout. Similarly, consider large clusters and how enablement/disablement
will rollout across nodes.
-->

Rollout should not fail and there should be no need for rollback as this enhancement only adds a new field.
Also, rollback in terms of removal of this new field is not possible, once a PV is updated with the new field it can not
be removed after disabling the feature.
Member

The field won't be deleted automatically, but we can manually reset it, right?

Contributor Author

Exactly, I'm adding a better explanation with examples.

@@ -676,13 +651,20 @@ rollout. Similarly, consider large clusters and how enablement/disablement
will rollout across nodes.
-->

Rollout should not fail and there should be no need for rollback as this enhancement only adds a new field.
Member

Well - not very likely, but one can imagine some nil-pointer exception or something like that (i.e. crashes of kube-apiserver when it tries to set it).

Contributor Author

Sure, rewording a bit.

$ date
Tue Sep 12 12:05:24 PM UTC 2023
```

Member

This is great - thanks a lot for thorough testing!

- Other field:
- [ ] Other (treat as last resort)
- Details:
- [X] API .status
Member

line 944: please add something like:

N/A - no SLI defined here

Contributor Author

Added.

@@ -846,6 +1074,8 @@ Describe them, providing:
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
-->

Yes, all PV objects will have an entirely new field.
Member

nit: please add that it's a timestamp, so together with the field name it will be < 50 bytes or so
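
The "< 50 bytes" estimate can be checked by counting the bytes of the field as it appears in JSON. This is only a rough sketch: it measures the JSON form, and the size as stored may differ depending on the storage encoding. The function name `json_field_bytes` is hypothetical.

```shell
# Count the bytes of a JSON string field written as "name":"value".
json_field_bytes() {
  printf '%s' "\"$1\":\"$2\"" | wc -c | tr -d ' '
}

json_field_bytes lastPhaseTransitionTime 2023-09-12T08:53:09Z   # prints 48
```

The field name contributes 25 bytes with quotes, the colon 1, and the quoted RFC 3339 timestamp 22.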

Contributor Author

Added.

Member

@wojtek-t wojtek-t left a comment

Just two super minor comments - once applied this LGTM.

|-------------|-----------------------|
| off | Existing Kubernetes behavior.|
| on | New behavior. |
| off | Existing Kubernetes behavior. |
Member

nit: remove these two lines now - those are duplicates now :)

Contributor Author

Missed that, removing.

- Components exposing the metric:
- [ ] Other (treat as last resort)
- Details:
N/A - no SLI defined
Member

Sorry for confusion - I meant adding this for the question above (for SLO) - if we don't have SLIs, then SLO doesn't make sense.

Contributor Author

Thanks, moving.

@wojtek-t
Member

/lgtm
/approve PRR

Thanks!

@RomanBednar - you still need SIG approval for it

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 20, 2023
Member

@jsafrane jsafrane left a comment

Please update the checklist at the beginning of the KEP.
Very well written KEP otherwise!

is enabled, and persisted if already set and feature gate is disabled.

Feature enablement tests:
https://github.com/RomanBednar/kubernetes/blob/294f5c9a42fead4a4cc75340a6b9171c9c657b3e/pkg/registry/core/persistentvolume/strategy_test.go#L45
Member

The link should lead to kubernetes/kubernetes, the commit is already merged there

Contributor Author

@RomanBednar RomanBednar Sep 22, 2023

The link is wrong indeed - fixed. And checklist is updated too. Thank you for the review!

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 22, 2023
@jsafrane
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 22, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane, RomanBednar, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 22, 2023
@k8s-ci-robot k8s-ci-robot merged commit 92aacea into kubernetes:master Sep 22, 2023
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Sep 22, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
4 participants