-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP 1287: Graduate InPlacePodVerticalScaling to GA #5562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Resize atomicity | ||
- Exposing allocated resources in the pod status | ||
- QOS class changes | ||
- The subset of pod resize tests [here](https://github.com/kubernetes/kubernetes/blob/1aec2eb0030d2f121b4cf78998e9391d9389f1a0/test/e2e/common/node/pod_resize.go) under `doPodResizeTests` and `doPodResizeErrorTests` that meet the Conformance test requirements are promoted to Conformance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a tracking bug for the Conformance endpoints: kubernetes/kubernetes#133607
65f1319
to
0b6845c
Compare
/label lead-opted-in |
/cc @wojtek-t |
/approve |
lack of support for resize is now a significant missing piece of that functionality; however | ||
we don't believe this is a strong enough reason to block IPPR GA. We can, however, consider | ||
whether this should block GA of pod level resources. | ||
- `UpdatePodSandboxResources` is implemented by containerd & CRI-O. This is implemented by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a note here that we will at least test this in e2e for stable. We need to validate that NRI plugins in future will be able to "decline" the resize by returning error from this method call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synced offline, a summary of the open question:
All errors returned by this call are currently ignored: https://github.com/kubernetes/kubernetes/blob/0a4651c9910533f4649b8a11c334cf23237b1ccc/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L792
I think what we need is a decision about whether or not we should continue ignoring this error in all cases, or if we should only ignore the error in the "unimplemented" case. @tallclair do you have context on why we ignore the error entirely? Can we change it to ignore only when call is not implemented by the CRI?
If we make that change, that will allow us to support and use CRIProxy to test the behavior that Sergey is talking about (that NRI plugins can intercept and block the resize call via UpdatePodSandboxResources).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SergeyKanzhelev I thought more and realized actually I think the behavior that you desire for an NRI plugin to block a resize can be achieved by intercepting the UpdateContainerResources CRI call here: https://github.com/kubernetes/kubernetes/blob/0a4651c9910533f4649b8a11c334cf23237b1ccc/pkg/kubelet/kuberuntime/kuberuntime_container.go#L409.
If I recall correctly the UpdatePodSandboxResources
was intended to be purely informative, my guess is that's why the decision was made that it should just stay as a best-effort call and the error logged and not bubbled up.
I will look more into this tomorrow, but I think what we actually want here is have coverage that an NRI plugin can decline the resize the resize by returning an error from UpdateContainerResources, not from UpdatePodSandboxResources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tallclair do you have context on why we ignore the error entirely? Can we change it to ignore only when call is not implemented by the CRI?
If errors are completely ignored, how can NRI plugins actually block resizes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If errors are completely ignored, how can NRI plugins actually block resizes?
Errors from UpdatePodSandboxResources
are ignored, but errors from UpdateContainerResources
are not, so an NRI plugin can block resizes by intercepting UpdateContainerResources
.
Can you add a note here that we will at least test this in e2e for stable. We need to validate that NRI plugins in future will be able to "decline" the resize by returning error from this method call.
I added a note under e2e tests that we will use CRI Proxy to test that NRI plugins can intercept UpdateContainerResources to block a resize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The big difference between UpdatePodSandboxResources
and UpdateContainerResources
is that container cgroups are managed by the runtime, while Pod cgroups are managed by the Kubelet. Since the Kubelet is managing the cgroup and doesn't have direct feedback from NRI, we decided to make this an informational best-effort call.
If we wanted to allow errors to prevent the Kubelet from modifying the cgroup, I think this would be technically feasible, but we would need to change the order of calls. Currently, Kubelet modifies the cgroup before calling the CRI. I'm not sure if one is more technically correct than the other. An argument for calling CRI (and NRI, by extension) first is that it's more consistent with UpdateContainerResources NRI, which is called before modifying the cgroups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errors from UpdatePodSandboxResources are ignored, but errors from UpdateContainerResources are not, so an NRI plugin can block resizes by intercepting UpdateContainerResources.
I'm not sure, but with Pod Level Resources there might not be an UpdateContainerResources call for all resizes.
/cc @ndixita
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of offline discussion:
We'll remove UpdatePodSandboxResources
from the graduation criteria of this KEP and move it to #5419. We can also stop ignoring the error from UpdatePodSandboxResources
(when the containerd implementation is available).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve for sig-scheduling
We agreed that races with scheduler should be out-of-scope of this feature. Races when handling individual pods are not very harmful as they usually require another round of scheduling.
Things will change when we consider the Workload-Aware Scheduling effort in which scheduler needs to perform scheduling of a group of pods and maintain its integrity.
At that time it will become critical to assure that the startup can be performed without races as well as the scheduler is in the loop for any preemption decisions when some workload might be affected.
This work does not block this GA promotion though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
I am happy with downscoping of this KEP and moving it to GA.
Among other things, what was originally in graduation criteria is still valueable to build:
- VPA need to start taking advantage of this KEP. The IPPR KEP made changes to prevent known disruptions when resize is requested, however VPA is not using it and instead forcing the disruption when kubelet reporting it cannot do it in non-disruptive way.
- Clarify the exact semantic on interaction with CRI. Container runtimes and NRI plugins MUST have a say in whether resize it allowed. This need to be improved in Pod-level resources resize KEP AND more changes may be needed long term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
In think the feature is in good shape and useful as-is. We have additional enhancements we want to make, but these can progress as separate features. I am fully supportive of graduating to GA.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dchen1107, dom4ha, jackfrancis, jpbetz, natasha41575, SergeyKanzhelev, tallclair The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/unhold |
/sig node autoscaling scheduling
/cc @tallclair @dchen1107
/cc @dom4ha
for sig-scheduling
/cc @jackfrancis
for sig-autoscaling