Commit 2efe8d5

KEP-1287: Add back container status allocatedResources

1 parent bc08140 · commit 2efe8d5

1 file changed: 23 additions, 53 deletions

keps/sig-node/1287-in-place-update-pod-resources/README.md

@@ -28,11 +28,12 @@
   - [Notes](#notes)
   - [Lifecycle Nuances](#lifecycle-nuances)
   - [Atomic Resizes](#atomic-resizes)
+  - [Edge-triggered Resizes](#edge-triggered-resizes)
+  - [Memory Limit Decreases](#memory-limit-decreases)
   - [Sidecars](#sidecars)
   - [QOS Class](#qos-class)
   - [Resource Quota](#resource-quota)
   - [Affected Components](#affected-components)
-  - [Instrumentation](#instrumentation)
   - [Static CPU & Memory Policy](#static-cpu--memory-policy)
   - [Future Enhancements](#future-enhancements)
    - [Mutable QOS Class "Shape"](#mutable-qos-class-shape)
@@ -64,7 +65,7 @@
 - [Implementation History](#implementation-history)
 - [Drawbacks](#drawbacks)
 - [Alternatives](#alternatives)
-  - [Allocated Resources](#allocated-resources-1)
+  - [Allocated Resource Limits](#allocated-resource-limits)
 <!-- /toc -->

 ## Release Signoff Checklist
@@ -216,8 +217,7 @@ PodStatus is extended to show the resources applied to the Pod and its Container
 * Pod.Status.ContainerStatuses[i].Resources (new field, type
   v1.ResourceRequirements) shows the **actual** resources held by the Pod and
   its Containers for running containers, and the allocated resources for non-running containers.
-* Pod.Status.AllocatedResources (new field) reports the aggregate pod-level allocated resources,
-  computed from the container-level allocated resources.
+* Pod.Status.ContainerStatuses[i].AllocatedResources (new field) reports the allocated resource requests.
 * Pod.Status.Resize (new field, type map[string]string) explains what is
   happening for a given resource on a given container.

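The resulting shape of the status fields can be sketched as trimmed-down stand-ins. These are illustrative types only, not the real `k8s.io/api/core/v1` definitions; field names follow the diff above, quantities are plain strings for brevity:

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for v1.ResourceList.
type ResourceList map[string]string

// ContainerStatus sketches the container-status fields named in the hunk above.
type ContainerStatus struct {
	// Resources holds the actual resources for running containers; for
	// non-running containers the kubelet reports the allocated resources here.
	Resources struct {
		Requests ResourceList
		Limits   ResourceList
	}
	// AllocatedResources reports the allocated resource requests (requests only).
	AllocatedResources ResourceList
}

// PodStatus sketches the pod-status fields named in the hunk above.
type PodStatus struct {
	ContainerStatuses []ContainerStatus
	// Resize explains what is happening for a given resource on a given container.
	Resize map[string]string
}

func main() {
	var st PodStatus
	cs := ContainerStatus{AllocatedResources: ResourceList{"cpu": "1500m"}}
	cs.Resources.Requests = ResourceList{"cpu": "1500m"}
	st.ContainerStatuses = append(st.ContainerStatuses, cs)
	fmt.Println(st.ContainerStatuses[0].AllocatedResources["cpu"]) // prints 1500m
}
```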
@@ -234,43 +234,13 @@ Additionally, a new `Pod.Spec.Containers[i].ResizePolicy[]` field (type

 When the Kubelet admits a pod initially or admits a resize, all resource requirements from the spec
 are cached and checkpointed locally. When a container is (re)started, these are the requests and
-limits used. The allocated resources are only reported in the API at the pod-level, through the
-`Pod.Status.AllocatedResources` field.
+limits used. Only the allocated requests are reported in the API, through the
+`Pod.Status.ContainerStatuses[i].AllocatedResources` field.

-```
-type PodStatus struct {
-  // ...
-
-  // AllocatedResources is the pod-level allocated resources. Only allocated requests are included.
-  // +optional
-  AllocatedResources *PodAllocatedResources `json:"allocatedResources,omitempty"`
-}
-
-// PodAllocatedResources is used for reporting pod-level allocated resources.
-type PodAllocatedResources struct {
-  // Requests is the pod-level allocated resource requests, either directly
-  // from the pod-level resource requirements if specified, or computed from
-  // the total container allocated requests.
-  // +optional
-  Requests v1.ResourceList
-}
-
-```
-
-The alpha implementation of In-Place Pod Vertical Scaling included `AllocatedResources` in the
-container status, but only included requests. This field will remain in alpha, guarded by the
-separate `InPlacePodVerticalScalingAllocatedStatus` feature gate, and is a candidate for future
-removal. With the allocated status feature gate enabled, Kubelet will continue to populate the field
-with the allocated requests from the checkpoint.
-
-The scheduler uses `max(spec...resources, status.allocatedResources, status...resources)` for fit
+The scheduler uses `max(spec...resources, status...allocatedResources, status...resources)` for fit
 decisions, but since the actual resources are only relevant and reported for running containers, the
 Kubelet sets `status...resources` equal to the allocated resources for non-running containers.

-See [`Alternatives: Allocated Resources`](#allocated-resources-1) for alternative APIs considered.
-
-The allocated resources API should be reevaluated prior to GA.
-
 #### Subresource

 Resource changes can only be made via the new `/resize` subresource, which accepts Update and Patch
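The `max(...)` fit rule in the hunk above can be sketched as follows. The type and function names here are illustrative simplifications, not the actual scheduler code; quantities are milli-CPU integers for brevity:

```go
package main

import "fmt"

// containerRequests collects the three per-resource values the scheduler
// considers, per the KEP: desired (spec), allocated, and actual requests.
type containerRequests struct {
	Spec      int64 // spec.containers[i].resources.requests[cpu]
	Allocated int64 // status.containerStatuses[i].allocatedResources[cpu]
	Actual    int64 // status.containerStatuses[i].resources.requests[cpu]
}

// effectiveRequest returns the maximum of the three values, i.e. what the
// scheduler counts for fit decisions during a pending resize.
func effectiveRequest(c containerRequests) int64 {
	m := c.Spec
	if c.Allocated > m {
		m = c.Allocated
	}
	if c.Actual > m {
		m = c.Actual
	}
	return m
}

func main() {
	// Mid-resize-down: spec is already lowered to 1 CPU, but 1.5 CPU is still
	// allocated and applied, so the node must keep 1500m reserved.
	fmt.Println(effectiveRequest(containerRequests{Spec: 1000, Allocated: 1500, Actual: 1500})) // prints 1500
}
```

Taking the maximum is conservative by design: until the kubelet finishes actuating a downsize, the node may still be using the old, larger amount, so scheduling against the smaller spec value could overcommit the node.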
@@ -498,7 +468,7 @@ To compute the Node resources allocated to Pods, pending resizes must be factore
 The scheduler will use the maximum of:
 1. Desired resources, computed from container requests in the pod spec, unless the resize is marked as `Infeasible`
 1. Actual resources, computed from the `.status.containerStatuses[i].resources.requests`
-1. Allocated resources, reported in `.status.allocatedResources.requests`
+1. Allocated resources, reported in `.status.containerStatuses[i].allocatedResources`

 ### Flow Control

@@ -518,7 +488,7 @@ This is intentionally hitting various edge-cases for demonstration.
 1. kubelet runs the pod and updates the API
    - `spec.containers[0].resources.requests[cpu]` = 1
    - `status.resize` = unset
-   - `status.allocatedResources.requests[cpu]` = 1
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1
    - `status.containerStatuses[0].resources.requests[cpu]` = 1
    - actual CPU shares = 1024

@@ -542,67 +512,67 @@ This is intentionally hitting various edge-cases for demonstration.
    - apiserver validates the request and accepts the operation
    - `spec.containers[0].resources.requests[cpu]` = 2
    - `status.resize` = `"InProgress"`
-   - `status.allocatedResources.requests[cpu]` = 1.5
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
    - `status.containerStatuses[0].resources.requests[cpu]` = 1
    - actual CPU shares = 1024

 1. Container runtime applied cpu=1.5
    - `spec.containers[0].resources.requests[cpu]` = 2
    - `status.resize` = `"InProgress"`
-   - `status.allocatedResources.requests[cpu]` = 1.5
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
    - `status.containerStatuses[0].resources.requests[cpu]` = 1
    - actual CPU shares = 1536

 1. kubelet syncs the pod, and sees resize #2 (cpu = 2)
    - kubelet decides this is feasible, but currently insufficient available resources
    - `spec.containers[0].resources.requests[cpu]` = 2
    - `status.resize[cpu]` = `"Deferred"`
-   - `status.allocatedResources.requests[cpu]` = 1.5
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
    - actual CPU shares = 1536

 1. Resize #3: cpu = 1.6
    - apiserver validates the request and accepts the operation
    - `spec.containers[0].resources.requests[cpu]` = 1.6
    - `status.resize[cpu]` = `"Deferred"`
-   - `status.allocatedResources.requests[cpu]` = 1.5
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.5
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
    - actual CPU shares = 1536

 1. Kubelet syncs the pod, and sees resize #3 and admits it
    - `spec.containers[0].resources.requests[cpu]` = 1.6
    - `status.resize[cpu]` = `"InProgress"`
-   - `status.allocatedResources.requests[cpu]` = 1.6
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
    - actual CPU shares = 1536

 1. Container runtime applied cpu=1.6
    - `spec.containers[0].resources.requests[cpu]` = 1.6
    - `status.resize[cpu]` = `"InProgress"`
-   - `status.allocatedResources.requests[cpu]` = 1.6
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.5
    - actual CPU shares = 1638

 1. Kubelet syncs the pod
    - `spec.containers[0].resources.requests[cpu]` = 1.6
    - `status.resize[cpu]` = unset
-   - `status.allocatedResources.requests[cpu]` = 1.6
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.6
    - actual CPU shares = 1638

 1. Resize #4: cpu = 100
    - apiserver validates the request and accepts the operation
    - `spec.containers[0].resources.requests[cpu]` = 100
    - `status.resize[cpu]` = unset
-   - `status.allocatedResources.requests[cpu]` = 1.6
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.6
    - actual CPU shares = 1638

 1. Kubelet syncs the pod, and sees resize #4
    - this node does not have 100 CPUs, so kubelet cannot admit it
    - `spec.containers[0].resources.requests[cpu]` = 100
    - `status.resize[cpu]` = `"Infeasible"`
-   - `status.allocatedResources.requests[cpu]` = 1.6
+   - `status.containerStatuses[0].allocatedResources[cpu]` = 1.6
    - `status.containerStatuses[0].resources.requests[cpu]` = 1.6
    - actual CPU shares = 1638

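The admission decisions in the walkthrough above can be condensed into a small sketch. This is a deliberate simplification of the kubelet behavior the KEP describes, with illustrative names, not real kubelet code:

```go
package main

import "fmt"

// Resize status values, per the walkthrough above.
const (
	InProgress = "InProgress"
	Deferred   = "Deferred"
	Infeasible = "Infeasible"
)

// classifyResize mirrors the walkthrough: a resize larger than node capacity
// is Infeasible; one that fits the node but exceeds currently available
// resources is Deferred; an admissible resize becomes InProgress (and only
// then does the kubelet update allocatedResources from the checkpoint).
func classifyResize(desiredMilliCPU, nodeCapacity, available int64) string {
	switch {
	case desiredMilliCPU > nodeCapacity:
		return Infeasible
	case desiredMilliCPU > available:
		return Deferred
	default:
		return InProgress
	}
}

func main() {
	// Resize #2 from the walkthrough: cpu = 2 fits the node but not the
	// currently available resources, so it is Deferred.
	fmt.Println(classifyResize(2000, 8000, 1600)) // prints Deferred
	// Resize #4: cpu = 100 exceeds node capacity, so it is Infeasible.
	fmt.Println(classifyResize(100000, 8000, 1600)) // prints Infeasible
}
```

Note how in the walkthrough `allocatedResources` only ever changes at an admission step, never when the runtime finishes applying a change; the runtime's progress is reflected in `status...resources` and the actual CPU shares instead.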
@@ -795,7 +765,7 @@ With InPlacePodVerticalScaling enabled, resource quota needs to consider pending
 to how this is handled by scheduling, resource quota will use the maximum of:
 1. Desired resources, computed from container requests in the pod spec, unless the resize is marked as `Infeasible`
 1. Actual resources, computed from the `.status.containerStatuses[i].resources.requests`
-1. Allocated resources, reported in `.status.allocatedResources.requests`
+1. Allocated resources, reported in `.status.containerStatuses[i].allocatedResources`

 To properly handle scale-down, resource quota controller now needs to evaluate
 pod updates where `.status...resources` changed.
@@ -1107,7 +1077,7 @@ Setup a guaranteed class Pod with two containers (c1 & c2).
 #### Backward Compatibility and Negative Tests

 1. Verify that Node is allowed to update only a Pod's AllocatedResources field.
-1. Verify that only Node account is allowed to udate AllocatedResources field.
+1. Verify that only Node account is allowed to update AllocatedResources field.
 1. Verify that updating Pod Resources in workload template spec retains current
    behavior:
    - Updating Pod Resources in Job template is not allowed.
@@ -1478,7 +1448,7 @@ _This section must be completed when targeting beta graduation to a release._
 - Improve memory limit downsize handling
 - Rename ResizeRestartPolicy `NotRequired` to `PreferNoRestart`,
   and update CRI `UpdateContainerResources` contract
-- Add pod-level `AllocatedResources`
+- Add back `AllocatedResources` field to resolve a scheduler corner case
 - Switch to edge-triggered resize actuation

 ## Drawbacks
@@ -1500,9 +1470,9 @@ information to express the idea and why it was not acceptable.
 We considered having scheduler approve the resize. We also considered PodSpec as
 the location to checkpoint allocated resources.

-### Allocated Resources
+### Allocated Resource Limits

-If we need allocated resources & limits in the pod status API, the following options have been
+If we need allocated limits in the pod status API, the following options have been
 considered:

 **Option 1: New field "AcceptedResources"**
