KEP-5526: Pod Level Resource Managers 1.36 #5775
k8s-ci-robot merged 1 commit into kubernetes:master
Conversation
Added a glossary with the terms that may cause some confusion; refer to the previous comments in the original PR.
Force-pushed from 947d142 to 729da77.
/assign @ndixita
/assign @ffromani
/assign @pravk03
/assign @swatisehgal
> to the container's cgroup.
>
> 6. **[New Logic] Container and Pod Removal:** When a container is removed, its
>    individual assignment is deleted from the state. The pod-level resource
Are the container resources also released back to pod's shared pool?
No, because of the static behavior of the CPU and Memory managers, the podSharedPool remains the same until pod termination. Rewrote that section to be clearer about it.
This behavior as written looks reasonable for alpha. If it ever becomes undesirable, we would need a clear use case to think about further changes.
+1 to keep the simplicity for alpha.
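To make the removal semantics above concrete, here is a minimal Go sketch; the type and method names are hypothetical and do not reflect the actual kubelet CPU manager API:

```go
package sketch

import "k8s.io/utils/cpuset"

// Hypothetical state shape; names are illustrative, not the real kubelet types.
type podCPUState struct {
	podSharedPool cpuset.CPUSet            // pod-level pool, unchanged until pod termination
	containers    map[string]cpuset.CPUSet // exclusive per-container assignments
}

// RemoveContainer deletes only the container's individual assignment; the
// pod-level pool is intentionally left untouched (static behavior for alpha).
func (s *podCPUState) RemoveContainer(containerName string) {
	delete(s.containers, containerName)
}

// RemovePod releases everything, including the pod-level pool.
func RemovePod(pods map[string]*podCPUState, podUID string) {
	delete(pods, podUID)
}
```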
Force-pushed from 729da77 to bc51a40.
PRR looks good for alpha. The PRR approval is already codified by the approval from last release (https://github.com/KevinTMtz/enhancements/blob/pod-level-resource-managers-1-36/keps/prod-readiness/sig-node/5526.yaml), so no other approval steps are required for PRR.
Force-pushed from bc51a40 to c43a33d.
ffromani left a comment:
The KEP changes look reasonable and match our conversation in the PR.
The only point worth double-clicking on is the state file changes and the backward/forward upgrade and rollback flows.
While what we have is a good start, I'd like to make sure we are on par with the existing quality standards, and to see if there are cheap and reasonable ways to improve. For example: is it time (and is it worth it?) to finally begin adding an explicit version to the checkpoint file, using a field, encoded in the filename, or both?
I'll think about it keeping practicality in mind; it would be very unfair to charge these improvements to this KEP unless they are very cheap, like wrapping new fields in a struct as we already did. I'll pause reviews momentarily to let other reviewers chime in.
> state information associated with a pod-level resource allocation. While its
> initial implementation only encapsulates the overall `CPUSet` assigned to the
> pod (`CPUSet cpuset.CPUSet`), using a struct instead of a raw type is a
> strategic choice for future-proofing. It allows for the seamless addition of new
Yes, it is meant to be forward compatible. The deserialization code in older kubelets will just ignore unknown fields in the struct.
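A minimal sketch of that forward-compatibility property, assuming the pod-level entry is serialized as JSON with the CPUSet stored as a string; the struct and field names are illustrative, not the actual checkpoint schema:

```go
package sketch

import "encoding/json"

// Hypothetical pod-level entry as an older kubelet might define it.
type podCPUEntry struct {
	CPUSet string `json:"cpuSet"` // e.g. "0-3"
}

// A newer kubelet wrote an extra field this struct does not know about.
var newerCheckpointFragment = []byte(`{"cpuSet":"0-3","futureField":"x"}`)

// decodeOldReader shows that encoding/json simply drops unknown fields, so the
// older struct definition still decodes the newer entry without error.
func decodeOldReader() (podCPUEntry, error) {
	var e podCPUEntry
	err := json.Unmarshal(newerCheckpointFragment, &e)
	return e, err // e.CPUSet == "0-3"
}
```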
> To achieve this, the CPU Manager performs a check to determine the
> `ResourceIsolationLevel` for each container. This logic evaluates the
> container's resource assignment and its eligibility for exclusivity to
> distinguish between:
IIRC we decided to keep this concept internal to the kubelet (which is fine). Just add a note that this is an internal concept and not part of the user-facing API.
Added note, thank you.
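For illustration, the internal concept could look roughly like the sketch below; the constant and function names are hypothetical, only the `ResourceIsolationLevel` term itself comes from the KEP:

```go
package sketch

// ResourceIsolationLevel is an internal kubelet concept (not user-facing API)
// describing where a container's resources come from.
type ResourceIsolationLevel int

const (
	// IsolatedAtContainer: the container owns exclusive resources carved out of the pod budget.
	IsolatedAtContainer ResourceIsolationLevel = iota
	// IsolatedAtPod: the container runs in the pod's shared pool (pod-level resources set).
	IsolatedAtPod
	// IsolatedAtNode: the container runs in the node's shared pool.
	IsolatedAtNode
)

// isolationLevelFor is an illustrative classification: a container eligible for
// exclusive allocation (e.g. integer CPU request == limit under the static
// policy) is container-isolated; otherwise it falls back to the pod pool when
// pod-level resources are set, or to the node pool when they are not.
func isolationLevelFor(podLevelResourcesSet, exclusiveEligible bool) ResourceIsolationLevel {
	switch {
	case exclusiveEligible:
		return IsolatedAtContainer
	case podLevelResourcesSet:
		return IsolatedAtPod
	default:
		return IsolatedAtNode
	}
}
```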
> The `PodEntry` struct encapsulates the pod-level memory assignment, currently
> containing the overall `MemoryBlocks` slice assigned to the pod. As with the CPU
> Manager, using a struct here is a forward-looking design choice. It allows for
> the future inclusion of additional metadata, such as the original topology hint
> or request details, without disrupting the state file schema or requiring complex
> migrations.
LGTM. I need another pass to make sure I'm not neglecting anything, because the memory manager is the less active resource manager, so details fade quickly.
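A rough sketch of the shape being discussed, with a locally defined `Block` so it stays self-contained; the field names mirror the memory manager's state but should be treated as illustrative:

```go
package sketch

import v1 "k8s.io/api/core/v1"

// Block mirrors the kind of memory-block record the memory manager keeps
// (NUMA affinity, resource type, size); defined locally for illustration.
type Block struct {
	NUMAAffinity []int           `json:"numaAffinity"`
	Type         v1.ResourceName `json:"type"`
	Size         uint64          `json:"size"`
}

// PodEntry wraps the pod-level assignment in a struct so that future metadata
// (e.g. the original topology hint or request details) can be added without
// changing the state file schema.
type PodEntry struct {
	MemoryBlocks []Block `json:"memoryBlocks"`
	// Future fields go here.
}
```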
> * **Mitigation (Forward Compatibility):** The changes to the state file
>   are additive (new fields only), preserving the overall format. However,
>   because older Kubelets cannot gracefully handle these new fields, the
>   only safe rollback path involves draining the node and deleting the
>   state file before restarting the older Kubelet version.
We discussed a more compatible approach with multiple checkpoints, but that made the code much more complex without obvious compatibility gains. Looking at the big picture, this seems to be the sweet spot between simplicity and robustness.
> * **Mitigation (Backward Compatibility):** The new Kubelet version will be
>   implemented to transparently read the old state file format. When it
>   admits a new pod using this feature, it will write the new pod-aware
>   entries, ensuring a seamless upgrade path without manual intervention.
this is easy and cheap, should be no bother.
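A sketch of that upgrade read path, assuming the new pod-level entries are an additive map in the same JSON checkpoint; the field names are simplified, not the real checkpoint layout:

```go
package sketch

import "encoding/json"

// cpuCheckpoint is a simplified, illustrative view of the state file: the
// existing per-container entries plus a new, additive pod-level map.
type cpuCheckpoint struct {
	DefaultCPUSet string            `json:"defaultCpuSet"`
	Entries       map[string]string `json:"entries"`              // existing container assignments
	PodEntries    map[string]string `json:"podEntries,omitempty"` // written only by the newer kubelet
}

// loadCheckpoint reads either format: an old file simply has no "podEntries"
// key, so PodEntries decodes as nil and is treated as "no pod-level allocations".
func loadCheckpoint(data []byte) (*cpuCheckpoint, error) {
	var cp cpuCheckpoint
	if err := json.Unmarshal(data, &cp); err != nil {
		return nil, err
	}
	return &cp, nil
}
```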
Force-pushed from c43a33d to b909fd0.
> compatible with the changes presented in this KEP. Standard init containers will
> be allocated exclusive slices from the pod's managed resource pool. Because they
> run sequentially, their resources are made reusable for subsequent app
> containers. Restartable init containers (sidecars) that are `Guaranteed` will
nit: We could add that sidecar containers won't yield their resources, as they are intended to run for the entire duration of the pod's life.
Done, reworded to be clearer about it.
lgtm for the partitioning logic and overall design. Pending items to review from my end: the CPU quota management section and the PRR questionnaire, to be done by EOD.
Force-pushed from b909fd0 to 44e13e4.
Took another pass and lgtm! In a follow-up PR, let's add:
Force-pushed from 44e13e4 to b30db39.
/lgtm
My comments have been addressed, the KEP is in good shape, and I don't have any low-hanging fruit for improvements.
[EDIT] I still want to deep-dive into the memory manager's fine details, but there are no red (or yellow) flags in the changes here, so I prefer to move forward.
Regarding this point, research in the GitHub history and in the KEP has been inconclusive so far, and the most credible theory remains unconfirmed. I'd welcome further investigation in this area, but we need some more hard data.
For context, item 1 ("A small note about topology manager in-memory state being cleared when a container is removed.") is relevant because of this discussion in kubernetes/kubernetes#134345 (comment) from RestartAllContainers.
Force-pushed from b30db39 to 42f5e85.
Amended the commit with the required updates to
/lgtm
Confirmed by inter-diff.
> exclusive allocations (for both `Guaranteed` sidecars and `Guaranteed`
> application containers) have been reserved. This ensures that shared sidecars do
> not interfere with the exclusive resources of application containers once they
> start.
+1 on podSharedPool. This allows us to treat a Pod as a single NUMA unit while allowing internal flexibility, and we still maintain backward compatibility at the container level.
@pravk03 for vis.
> | Pod-level resources | Container-level resources | Behavior |
> | --- | --- | --- |
> | Unset | Set | Current behavior (alignment and exclusive allocation based on individual container resources). |
> | Set | Unset | No alignment or exclusive allocation. Containers run in the node's shared pool. |
> | Set | Set | Alignment and exclusive allocation for containers that specify `Guaranteed` resources; pod-level resources are ignored for allocation decisions. |
Can you help me understand the difference between this category (pod is set, and container is set) and the one above (pod is unset, and container is set) when `topology-manager-scope` is set to `container`?
When only container-level resources are used (pod unset, container set) and exclusive resources are desired, all containers must have Guaranteed QoS, which requires each of them to have equal requests and limits. In summary, it is an all-or-nothing situation: either all containers receive exclusive resources, or none of them do.
When pod-level resources are used (pod set, container set), Guaranteed QoS is no longer required for each container, only for the overall pod, but the containers that need exclusive resources must still be Guaranteed. For example, this enables a hybrid model: a pod with Guaranteed QoS determined by its pod-level resources and three containers, only one of which is Guaranteed, so only the Guaranteed container obtains exclusive resources, while the non-Guaranteed containers access the shared node resources, constrained by the pod-level limit.
Thanks for the detailed explanation. It makes sense to me, but can you add an example below to reflect this difference?
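As an illustration of the hybrid model described above, here is a sketch of such a pod, assuming the pod-level `spec.resources` field from the pod-level resources feature; the names and quantities are made up, and memory is omitted for brevity (a real Guaranteed pod would also need memory requests equal to limits):

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func cpu(v string) corev1.ResourceList {
	return corev1.ResourceList{corev1.ResourceCPU: resource.MustParse(v)}
}

// hybridPod: pod-level requests == limits (4 CPUs), so the pod is Guaranteed,
// but only "worker" is individually Guaranteed and gets exclusive CPUs; the
// other two containers share the remaining pod budget.
func hybridPod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "hybrid-isolation-example"},
		Spec: corev1.PodSpec{
			Resources: &corev1.ResourceRequirements{Requests: cpu("4"), Limits: cpu("4")},
			Containers: []corev1.Container{
				{
					Name:      "worker",
					Resources: corev1.ResourceRequirements{Requests: cpu("2"), Limits: cpu("2")}, // Guaranteed -> exclusive CPUs
				},
				{Name: "helper-a"}, // no requests/limits -> pod shared pool
				{Name: "helper-b"}, // no requests/limits -> pod shared pool
			},
		},
	}
}
```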
> ###### Special configurations
>
> **Underutilization of Pod-Level Resources**
This is by design, but please ensure there is a log message warning users.
This log message change would then be part of the PLR implementation, @KevinTMtz. We can submit it as a follow-up PR, as it wouldn't affect this KEP in particular.
> sidecar behavior, if a restartable init container is individually
> `Guaranteed`, it is allocated an exclusive slice that is reserved for
> the entire pod lifecycle and is *not* added to the reusable set.
> - **Standard Init Containers:** If individually `Guaranteed`, they are
In this case, shouldn't we have a pod resource underutilization issue? Since standard init containers run sequentially, and before any restartable sidecar container, can we change the logic by allowing them to burst to the entire pod?
Guaranteed init containers get exclusive resources corresponding to their request and limit.
Non-Guaranteed init containers are granted access to the entire pod-level resource budget minus any resources exclusively allocated to Guaranteed sidecars running at that time. So yes, they are able to burst to whatever is not exclusively allocated in the pod.
My point is: once pod-level resources are set, can we allow standard init containers to burst to the entire pod resource budget and ignore their own resource requests and limits, since they run sequentially anyway?
That behavior can be obtained if no request and limit are specified for the init containers. I think it is better to stay consistent and respect the expected behavior of the container-level limit the user set; if the user wants a container to burst into the available pod resources, the pod shared pool will take care of that when no container-level limit is specified.
sgtm. Let's document the best practice for this.
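To make the shared-pool behavior above concrete, here is a minimal sketch of the computation; the function and parameter names are hypothetical:

```go
package sketch

import "k8s.io/utils/cpuset"

// sharedPoolFor returns the CPUs a non-Guaranteed (init) container can burst
// into: the pod-level allocation minus everything exclusively assigned to
// Guaranteed sidecars running at that point.
func sharedPoolFor(podCPUs cpuset.CPUSet, exclusiveSidecarCPUs []cpuset.CPUSet) cpuset.CPUSet {
	pool := podCPUs
	for _, c := range exclusiveSidecarCPUs {
		pool = pool.Difference(c)
	}
	return pool
}
```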
> 1. The `topology-manager-scope` is set to `pod`.
> 2. The sum of resource requests from all containers that are `Guaranteed`
>    exactly equals the total resource budget defined in `pod.spec.resources`.
> 3. There is at least one other container in the pod that does not qualify for
What about ephemeral container(s)? Today ephemeral containers borrow node/pod resources; can we explicitly document this? With the topology-manager-scope being introduced, should we allow them to use the pod's shared resources, or ...?
Ephemeral containers are not offered any alignment or exclusive allocation by the resource managers, since they lack any guarantees for resources or execution. I think it is better to keep them separate, as they currently are, without any special handling. Do you have any specific concerns about ephemeral containers and resource managers?
No, I don't have many concerns. I just want us to document it clearly in the design and the user docs.
@dchen1107 thanks for bringing this up. I have a few questions now that I missed earlier.
Consider a case: a pod with PLR set has 1 Guaranteed container and 2 non-Guaranteed containers.
Pod-level limit = 4 CPUs
Container 1 limit = 2 CPUs
Containers 2 and 3 can share the other 2 CPUs from the pod shared pool and will have the corresponding cpuset set. If the pod is using all 4 CPUs, there is no way to inject an ephemeral container into such a pod, and there is no way to add enough buffer to the pod-level limit to accommodate the ephemeral container under the pod-level limits.
Yes, I asked Kevin for a follow-up PR for ephemeral containers and other small changes. We just need to make a decision on where to borrow resources for the ephemeral container: the pod shared pool? If there is no shared pool, can it borrow from the first container? Or from the container being debugged?
> them. However, to completely eliminate any risk of parsing errors, the
> safest and recommended rollback procedure involves draining the node and
> deleting the state file.
> * **Failure Mode:** Forward compatibility is not supported. An older
AFAIK, today's K8s rollback flow still requires a node drain as the best practice, so in this case I don't have much concern for alpha. Let's reconsider this in beta.
/lgtm
Please send me the follow-up PR based on the comments later.
[APPROVALNOTIFIER] This PR is APPROVED
This pull request has been approved by: dchen1107, KevinTMtz. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/cc
Interesting KEP with good discussion, looking forward to the implementation.
This KEP change does not modify the originally proposed behavior; it only adds some specifics that surfaced during the implementation of the feature.
The most significant additions are a new property in the CPU and Memory managers' state to keep track of the pod-level exclusive resource allocation (which requires a PRR review perspective) and the introduction of a new term, `ResourceIsolationLevel`, that describes whether a container's resources belong to the container, the pod, or the node (currently only used by the CPU manager to handle CPU quota enablement/disablement).