2 changes: 2 additions & 0 deletions keps/prod-readiness/sig-node/3673.yaml
@@ -2,4 +2,6 @@ kep-number: 3673
alpha:
  approver: "@wojtek-t"
beta:
  approver: "@wojtek-t"
stable:
  approver: "@wojtek-t"
53 changes: 20 additions & 33 deletions keps/sig-node/3673-kubelet-parallel-image-pull-limit/README.md
@@ -383,21 +383,26 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
-->

A new node_e2e test with `serialize-image-pulls==false` will be added to test parallel image pull limits.

1. When maxParallelImagePulls is reached, all further image pulls will be blocked.
2. Verify the behavior when the same image is pulled in parallel, which will happen when image pull policy is `Always`.

- <test>: <link to test coverage>
- pull image parallel test cases: https://github.com/kubernetes/kubernetes/blob/6c258fa74b2f0644a6b31a7ce3e613dda41effd4/test/e2e_node/image_pull_test.go
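
As a rough illustration of case 2 above, the manifests below sketch two pods that reference the same image with `imagePullPolicy: Always`. The pod names and the image reference are placeholders and are not taken from the linked test; the sketch also assumes both pods are scheduled to the same node.

```yaml
# Hypothetical manifests: two pods that use the same image.
apiVersion: v1
kind: Pod
metadata:
  name: same-image-a            # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: app
      image: registry.example.com/demo/app:1.0   # placeholder image
      imagePullPolicy: Always   # request a pull on every container start
---
apiVersion: v1
kind: Pod
metadata:
  name: same-image-b            # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: app
      image: registry.example.com/demo/app:1.0   # same placeholder image
      imagePullPolicy: Always
```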

### Graduation Criteria

#### Alpha

- Initial e2e tests completed and enabled

#### Beta

- Gather feedback from developers and surveys
- Add e2e test to cover the parallel image pull case

#### GA

- Change the default value of `serialize-image-pulls` to false and set the default value of `maxParallelImagePulls` to 2.

Review comment from Member:

Hmm - do we really want to change the default for "GA"?

With https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/5241-beta-featuregate-promotion-requirements, we generally want the GA to effectively be kind of "no-op". Changing the default might be a bit unexpected here.

I know that it doesn't explicitly affect the user (it may affect them implicitly because some pods startup (due to image pulling) may be slower/faster), but still it might not be intuitive.

Let me ping other PRR approvers about it for their thoughts.

Reply from Member Author:

If so, should we make changing the default a beta-2 step for this KEP?

- Gather feedback from real-world usage from kubernetes vendors.
- Allowing time for feedback.

@@ -552,7 +557,7 @@ Any change of default behavior may be surprising to users or break existing
automations, so be extremely careful here.
-->

The change itself will not change any default behavior. The default behavior will only be changed when the user explicitly sets `maxParallelImagePulls` to a non-zero value.
Yes. The default value of `serialize-image-pulls` will be changed to false, and the default value of `maxParallelImagePulls` will be changed to 2.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
<!--
@@ -566,16 +571,16 @@ feature.
NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
-->

The code change itself cannot be rolled back. But a user can roll back to the existing default behavior by setting `maxParallelImagePulls` to 0, or not setting it and letting it default to 0.

A user can roll back to the previous default behavior by setting `serialize-image-pulls` to true and restarting kubelet.
Similarly, setting `maxParallelImagePulls` to 1 will be equivalent to setting `serialize-image-pulls` to true.
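
A minimal sketch of the rollback described above, written as a kubelet configuration file fragment. The field names are those of the `KubeletConfiguration` API. Where the file lives depends on how the kubelet is deployed, and the kubelet has to be restarted for the change to take effect.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Option A: return to serial image pulls (the previous default).
serializeImagePulls: true
# Option B (use instead of option A): keep parallel pulls enabled but cap
# them at one, which the text above describes as equivalent to serial pulls.
# serializeImagePulls: false
# maxParallelImagePulls: 1
```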

###### What happens if we reenable the feature if it was previously rolled back?

Nothing will happen.

###### Are there any tests for feature enablement/disablement?

This feature is purely in-memory, so such test isn't really needed.
Yes, see e2e tests section above.

<!--
The e2e framework does not currently support enabling or disabling feature
@@ -595,35 +600,19 @@ https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05
<!--
This section must be completed when targeting beta to a release.
-->

N/A

###### How can a rollout or rollback fail? Can it impact already running workloads?

<!--
Try to be as paranoid as possible - e.g., what if some components will restart
mid-rollout?

Be sure to consider highly-available clusters, where, for example,
feature flags will be enabled on some API servers and not others during the
rollout. Similarly, consider large clusters and how enablement/disablement
will rollout across nodes.
-->

This is an opt-in feature, and it does not change any default behavior. If there is any bug in this feature, image pulls might fail.
No running workloads will be imapcted.
No running workloads will be impacted.

Note that when changing MaxParallelImagePulls, kubelet restart is required. Since the parallel image pull counter
is maintained in memory, restarting kubelet will reset the counter and potentially allow more image pulls than the limit.

###### What specific metrics should inform a rollback?

<!--
What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->

In the worst case, image pulls might fail. Users can monitor image pull k8s events and the `runtime_operations_errors_total` metric to see if there is an increase
in image pull failures.
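
One way to watch for such an increase is a Prometheus alerting rule along the lines of the sketch below. It is illustrative only. It assumes the metric is exposed with the `kubelet_` prefix and carries an `operation_type="pull_image"` label, which should be checked against the metrics your kubelet version actually exports. The rule name, group name, window, and threshold are arbitrary placeholders.

```yaml
groups:
  - name: image-pull-rollback-signals        # placeholder group name
    rules:
      - alert: KubeletImagePullErrorsIncreasing   # placeholder alert name
        # Assumed metric name and label; verify them on your cluster.
        expr: |
          sum by (instance) (
            rate(kubelet_runtime_operations_errors_total{operation_type="pull_image"}[10m])
          ) > 0
        for: 15m                              # arbitrary hold period
        labels:
          severity: warning
        annotations:
          summary: Image pull errors are increasing on {{ $labels.instance }}
```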

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

@@ -633,7 +622,8 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->

This is an opt-in feature, and it does not change any default behavior. We manually tested enabling and disabling this feature by changing kubelet config and
See e2e tests section above. The upgrade->downgrade->upgrade path needs multiple kubelet restarts and is not quite necessary for this feature.
We manually tested enabling and disabling this feature by changing kubelet config and
restarting kubelet.

The manual test steps are as follows:
@@ -644,20 +634,17 @@ The manual test steps are as follows:
serializeImagePulls: false
maxParallelImagePulls: 2
```
3. Deploy three pods, each with a different container image to the one-node cluster. All the three images are 5GB. The relatively-big size makes sure there is enough time between image pulling events, and makes it easier for us to observe the behavior.
4. Observe the k8s events by running `kubectl get events`, and observe that exactly two images finish pulling first, and then the remaining one image finishes.
5. Manually change the MaxParallelImagePulls setting by SSH-ing to the node again and removing the `serializeImagePulls` entry and `maxParallelImagePulls` entry.
6. Deploy two pods, each with a different container image to the cluster. Both of the two images are 5GB, and they are different images from the three images deployed in step 3.
7. Observe the k8s events by running `kubectl get events`, and observe that exactly one image finishes pulling first, and then the remaining one image finishes.


1. Deploy three pods, each with a different container image to the one-node cluster. All the three images are 5GB. The relatively-big size makes sure there is enough time between image pulling events, and makes it easier for us to observe the behavior.
2. Observe the k8s events by running `kubectl get events`, and observe that exactly two images finish pulling first, and then the remaining one image finishes.
3. Manually change the MaxParallelImagePulls setting by SSH-ing to the node again and removing the `serializeImagePulls` entry and `maxParallelImagePulls` entry.
4. Deploy two pods, each with a different container image to the cluster. Both of the two images are 5GB, and they are different images from the three images deployed in step 3.
5. Observe the k8s events by running `kubectl get events`, and observe that exactly one image finishes pulling first, and then the remaining one image finishes.
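
The "deploy three pods" step above could be reproduced with manifests along these lines. This is only a sketch: the pod names and image references are hypothetical, and any three distinct, sufficiently large images would serve the same purpose.

```yaml
# Hypothetical manifests: three pods, each with a different large image.
apiVersion: v1
kind: Pod
metadata:
  name: pull-test-1                                  # placeholder name
spec:
  containers:
    - name: main
      image: registry.example.com/test/large-a:5gb   # placeholder image
---
apiVersion: v1
kind: Pod
metadata:
  name: pull-test-2                                  # placeholder name
spec:
  containers:
    - name: main
      image: registry.example.com/test/large-b:5gb   # placeholder image
---
apiVersion: v1
kind: Pod
metadata:
  name: pull-test-3                                  # placeholder name
spec:
  containers:
    - name: main
      image: registry.example.com/test/large-c:5gb   # placeholder image
```

After applying the manifests, `kubectl get events` shows the order in which the pulls complete, as described in the steps above.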

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

<!--
Even if applying deprecation policies, they may still surprise some users.
-->

No.

### Monitoring Requirements
8 changes: 4 additions & 4 deletions keps/sig-node/3673-kubelet-parallel-image-pull-limit/kep.yaml
@@ -6,22 +6,22 @@ authors:
owning-sig: sig-node
status: implementable
creation-date: 2023-01-05
last-updated: 2024-10-31
last-updated: 2025-10-09
reviewers:
- "@SergeyKanzhelev"
approvers:
- "@mrunalp"

# The target maturity stage in the current dev cycle for this KEP.
stage: beta
stage: stable

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.32"
latest-milestone: "v1.35"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.27"
  beta: "v1.32"
  stable: "v1.34"
  stable: "v1.35"