Skip to content

Conversation

natasha41575
Copy link
Contributor

@natasha41575 natasha41575 commented Sep 22, 2025

  • One-line PR description: Graduate InPlacePodVerticalScaling to GA.
  • Other comments: We believe we are ready to graduate InPlacePodVerticalScaling to GA in kubernetes 1.35. See comments below for how we have met the graduation criteria, and more details in the KEP for other updates.

/sig node autoscaling scheduling

/cc @tallclair @dchen1107

/cc @dom4ha
for sig-scheduling

/cc @jackfrancis
for sig-autoscaling

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Sep 22, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 22, 2025
- Resize atomicity
- Exposing allocated resources in the pod status
- QOS class changes
- The subset of pod resize tests [here](https://github.com/kubernetes/kubernetes/blob/1aec2eb0030d2f121b4cf78998e9391d9389f1a0/test/e2e/common/node/pod_resize.go) under `doPodResizeTests` and `doPodResizeErrorTests` that meet the Conformance test requirements are promoted to Conformance.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a tracking bug for the Conformance endpoints: kubernetes/kubernetes#133607

@dom4ha
Copy link
Member

dom4ha commented Sep 23, 2025

/label lead-opted-in
/milestone v1.35

@natasha41575
Copy link
Contributor Author

/cc @wojtek-t

@jpbetz jpbetz mentioned this pull request Oct 1, 2025
95 tasks
@jpbetz
Copy link
Contributor

jpbetz commented Oct 1, 2025

/approve
For PRR (Everything appears to have been filled out by Beta, which is what I like to see!)

lack of support for resize is now a significant missing piece of that functionality; however
we don't believe this is a strong enough reason to block IPPR GA. We can, however, consider
whether this should block GA of pod level resources.
- `UpdatePodSandboxResources` is implemented by containerd & CRI-O. This is implemented by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note here that we will at least test this in e2e for stable. We need to validate that NRI plugins in future will be able to "decline" the resize by returning error from this method call.

Copy link
Contributor Author

@natasha41575 natasha41575 Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced offline, a summary of the open question:

All errors returned by this call are currently ignored: https://github.com/kubernetes/kubernetes/blob/0a4651c9910533f4649b8a11c334cf23237b1ccc/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L792

I think what we need is a decision about whether or not we should continue ignoring this error in all cases, or if we should only ignore the error in the "unimplemented" case. @tallclair do you have context on why we ignore the error entirely? Can we change it to ignore only when call is not implemented by the CRI?

If we make that change, that will allow us to support and use CRIProxy to test the behavior that Sergey is talking about (that NRI plugins can intercept and block the resize call via UpdatePodSandboxResources).

Copy link
Contributor Author

@natasha41575 natasha41575 Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SergeyKanzhelev I thought more and realized actually I think the behavior that you desire for an NRI plugin to block a resize can be achieved by intercepting the UpdateContainerResources CRI call here: https://github.com/kubernetes/kubernetes/blob/0a4651c9910533f4649b8a11c334cf23237b1ccc/pkg/kubelet/kuberuntime/kuberuntime_container.go#L409.

If I recall correctly the UpdatePodSandboxResources was intended to be purely informative, my guess is that's why the decision was made that it should just stay as a best-effort call and the error logged and not bubbled up.

I will look more into this tomorrow, but I think what we actually want here is have coverage that an NRI plugin can decline the resize the resize by returning an error from UpdateContainerResources, not from UpdatePodSandboxResources.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tallclair do you have context on why we ignore the error entirely? Can we change it to ignore only when call is not implemented by the CRI?
If errors are completely ignored, how can NRI plugins actually block resizes?

Copy link
Contributor Author

@natasha41575 natasha41575 Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If errors are completely ignored, how can NRI plugins actually block resizes?

Errors from UpdatePodSandboxResources are ignored, but errors from UpdateContainerResources are not, so an NRI plugin can block resizes by intercepting UpdateContainerResources.

Can you add a note here that we will at least test this in e2e for stable. We need to validate that NRI plugins in future will be able to "decline" the resize by returning error from this method call.

I added a note under e2e tests that we will use CRI Proxy to test that NRI plugins can intercept UpdateContainerResources to block a resize.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The big difference between UpdatePodSandboxResources and UpdateContainerResources is that container cgroups are managed by the runtime, while Pod cgroups are managed by the Kubelet. Since the Kubelet is managing the cgroup and doesn't have direct feedback from NRI, we decided to make this an informational best-effort call.

If we wanted to allow errors to prevent the Kubelet from modifying the cgroup, I think this would be technically feasible, but we would need to change the order of calls. Currently, Kubelet modifies the cgroup before calling the CRI. I'm not sure if one is more technically correct than the other. An argument for calling CRI (and NRI, by extension) first is that it's more consistent with UpdateContainerResources NRI, which is called before modifying the cgroups.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors from UpdatePodSandboxResources are ignored, but errors from UpdateContainerResources are not, so an NRI plugin can block resizes by intercepting UpdateContainerResources.

I'm not sure, but with Pod Level Resources there might not be an UpdateContainerResources call for all resizes.
/cc @ndixita

Copy link
Contributor Author

@natasha41575 natasha41575 Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of offline discussion:

We'll remove UpdatePodSandboxResources from the graduation criteria of this KEP and move it to #5419. We can also stop ignoring the error from UpdatePodSandboxResources (when the containerd implementation is available).

Copy link
Member

@dom4ha dom4ha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve for sig-scheduling

We agreed that races with scheduler should be out-of-scope of this feature. Races when handling individual pods are not very harmful as they usually require another round of scheduling.

Things will change when we consider the Workload-Aware Scheduling effort in which scheduler needs to perform scheduling of a group of pods and maintain its integrity.

At that time it will become critical to assure that the startup can be performed without races as well as the scheduler is in the loop for any preemption decisions when some workload might be affected.

This work does not block this GA promotion though.

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Oct 13, 2025
@k8s-ci-robot k8s-ci-robot requested a review from ndixita October 14, 2025 00:05
Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

I am happy with downscoping of this KEP and moving it to GA.

Among other things, what was originally in graduation criteria is still valueable to build:

  • VPA need to start taking advantage of this KEP. The IPPR KEP made changes to prevent known disruptions when resize is requested, however VPA is not using it and instead forcing the disruption when kubelet reporting it cannot do it in non-disruptive way.
  • Clarify the exact semantic on interaction with CRI. Container runtimes and NRI plugins MUST have a say in whether resize it allowed. This need to be improved in Pod-level resources resize KEP AND more changes may be needed long term.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 15, 2025
Copy link
Member

@tallclair tallclair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

In think the feature is in good shape and useful as-is. We have additional enhancements we want to make, but these can progress as separate features. I am fully supportive of graduating to GA.

@dchen1107
Copy link
Member

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, dom4ha, jackfrancis, jpbetz, natasha41575, SergeyKanzhelev, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2025
@SergeyKanzhelev
Copy link
Member

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 16, 2025
@k8s-ci-robot k8s-ci-robot merged commit 0653f14 into kubernetes:master Oct 16, 2025
3 of 4 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in SIG Scheduling Oct 16, 2025
@natasha41575 natasha41575 deleted the ippr-ga branch October 16, 2025 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lead-opted-in Denotes that an issue has been opted in to a release lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.