diff --git a/keps/prod-readiness/sig-node/3673.yaml b/keps/prod-readiness/sig-node/3673.yaml
index 92a9cf61dd0..ee8f3af7e03 100644
--- a/keps/prod-readiness/sig-node/3673.yaml
+++ b/keps/prod-readiness/sig-node/3673.yaml
@@ -2,4 +2,6 @@ kep-number: 3673
 alpha:
   approver: "@wojtek-t"
 beta:
+  approver: "@wojtek-t"
+stable:
   approver: "@wojtek-t"
\ No newline at end of file
diff --git a/keps/sig-node/3673-kubelet-parallel-image-pull-limit/README.md b/keps/sig-node/3673-kubelet-parallel-image-pull-limit/README.md
index 75ff7d64d1e..76eb3b2aa67 100644
--- a/keps/sig-node/3673-kubelet-parallel-image-pull-limit/README.md
+++ b/keps/sig-node/3673-kubelet-parallel-image-pull-limit/README.md
@@ -383,21 +383,25 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
 -->
 
 A new node_e2e test with `serialize-image-pulls==false` will be added test parallel image pull limits.
+
 1. When maxParallelImagePulls is reached, all further image pulls will be blocked.
 2. Verify the behavior when the same image is pulled in parallel, which will happen when image pull policy is `Always`.
 
-- :
+- pull image parallel test cases: https://github.com/kubernetes/kubernetes/blob/6c258fa74b2f0644a6b31a7ce3e613dda41effd4/test/e2e_node/image_pull_test.go
 
 ### Graduation Criteria
 
 #### Alpha
+
 - Initial e2e tests completed and enabled
 
 #### Beta
+
 - Gather feedback from developers and surveys
 - Add e2e test to cover the parallel image pull case
 
 #### GA
+
 - Gather feedback from real-world usage from kubernetes vendors.
 - Allowing time for feedback.
 
@@ -566,8 +570,8 @@ feature.
 NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
 -->
 
-The code change itself cannot be rolled back. But a user can roll back to the existing default behavior by setting `maxParallelImagePulls` to 0, or not setting it and letting it default to 0.
-
+A user can roll back to the previous default behavior by setting `serialize-image-pulls` to true and restarting kubelet.
+Similarly, setting `maxParallelImagePulls` to 1 will be equivalent to setting `serialize-image-pulls` to true.
 
 ###### What happens if we reenable the feature if it was previously rolled back?
 
@@ -575,7 +579,7 @@ Nothing will happen.
 
 ###### Are there any tests for feature enablement/disablement?
 
-This feature is purely in-memory, so such test isn't really needed.
+Yes, see e2e tests section above.
-
+N/A
 ###### How can a rollout or rollback fail? Can it impact already running workloads?
-
-
-This is an opt-in feature, and it does not change any default behavior. If there is any bug in this feature, image pulls might fail.
-No running workloads will be imapcted.
+No running workloads will be impacted. Note that when changing MaxParallelImagePulls, kubelet restart is required. Since the parallel image pull counter is maintained in memory, restarting kubelet will reset the counter and potentially allow more image pulls than the limit.
 ###### What specific metrics should inform a rollback?
-
-
 In worst case, image pulls might fail. Users can monitor image pull k8s events and `runtime_operations_errors_total` metric to see if there is an increase
-of image pull failures.
+of image pull failures.
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
@@ -633,7 +621,8 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
 
-This is an opt-in feature, and it does not change any default behavior. We manually tested enabling and disabling this feature by changing kubelet config and
+See e2e tests section above. The upgrade->downgrade->upgrade path needs multiple kubelet restarts and is not quite necessary for this feature.
+We manually tested enabling and disabling this feature by changing kubelet config and
 restarting kubelet.
 The manual test steps are as following:
@@ -644,20 +633,17 @@ The manual test steps are as following:
 serializeImagePulls: false
 maxParallelImagePulls: 2
 ```
-3. Deploy three pods, each with a different container image to the one-node cluster. All the three images are 5GB. The relatively-big size makes sure there is enough time between image pulling events, and makes it easier for us to observe the behavior.
-4. Observe the k8s events by running `kubectl get events`, and observe that exactly two images finish pulling first, and then the remaining one image finishes.
-5. Manually change the MaxParallelImagePulls setting by SSH-ing to the node again and removing the `serializeImagePulls` entry and `maxParallelImagePulls` entry.
-6. Deploy two pods, each with a different container image to the cluster. Both of the two images are 5GB, and they are different images from the three images deployed in step 3.
-7. Observe the k8s events by running `kubectl get events`, and observe that exactly one image finishes pulling first, and then the remaining one image finishes.
-
-
+1. Deploy three pods, each with a different container image to the one-node cluster. All the three images are 5GB. The relatively-big size makes sure there is enough time between image pulling events, and makes it easier for us to observe the behavior.
+2. Observe the k8s events by running `kubectl get events`, and observe that exactly two images finish pulling first, and then the remaining one image finishes.
+3. Manually change the MaxParallelImagePulls setting by SSH-ing to the node again and removing the `serializeImagePulls` entry and `maxParallelImagePulls` entry.
+4. Deploy two pods, each with a different container image to the cluster. Both of the two images are 5GB, and they are different images from the three images deployed in step 3.
+5. Observe the k8s events by running `kubectl get events`, and observe that exactly one image finishes pulling first, and then the remaining one image finishes.
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
-
 No.
 
 ### Monitoring Requirements
diff --git a/keps/sig-node/3673-kubelet-parallel-image-pull-limit/kep.yaml b/keps/sig-node/3673-kubelet-parallel-image-pull-limit/kep.yaml
index 15a8d8d1a64..5a5e44444c3 100644
--- a/keps/sig-node/3673-kubelet-parallel-image-pull-limit/kep.yaml
+++ b/keps/sig-node/3673-kubelet-parallel-image-pull-limit/kep.yaml
@@ -6,22 +6,22 @@ authors:
 owning-sig: sig-node
 status: implementable
 creation-date: 2023-01-05
-last-updated: 2024-10-31
+last-updated: 2025-10-09
 reviewers:
   - "@SergeyKanzhelev"
 approvers:
   - "@mrunalp"
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: beta
+stage: stable
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.32"
+latest-milestone: "v1.35"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.27"
   beta: "v1.32"
-  stable: "v1.34"
+  stable: "v1.35"