KEP-4800: Split UnCoreCache awareness #4810
Conversation
Welcome @ajcaldelas!
Hi @ajcaldelas. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/cc
/ok-to-test
###### How can this feature be enabled / disabled in a live cluster?
Cannot be dynamically enabled/disabled because of the CPU Manager state |
Is this true? So the only way to enable this feature is to create a node with these settings?
Could I not set this policy and restart kubelet?
I agree, @kannon92's phrasing seems more correct and easier to follow. We can totally enable the feature, but this requires a kubelet restart and possibly the deletion of the cpu manager state file.
Would something like this clear it up?

> This feature depends on the CPU Manager state file found in /var/lib/kubelet/cpu_manager_state. Because of this, enabling the feature requires deleting this file and restarting the kubelet with the feature enabled.

This is basically the reasoning. Maybe @wongchar could help me out here with the phrasing and with more reasoning as to why this happens.
That's correct. Need to drain the node, enable/disable the feature, remove the cpu_manager_state file, reload daemons, and restart kubelet
I'd write
"disabling the feature requires a kubelet config change and a kubelet restart" or so.
The state file deletion is a side effect inherited from how the cpumanager works. Nothing specific to this feature seems to imply or require that a deletion is warranted.
Actually, deletions of state files should be avoided entirely and they are a smell, but that's another story for another day.
How does this sound?

> From our testing, going from `none` to `static` cannot be done dynamically because of the `cpu_manager_state` file. The node needs to be drained and the policy checkpoint file needs to be removed before restarting the kubelet. This requirement stems from the `CPUManager` and has nothing to do with this feature specifically.
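For reference, a rough sketch of the procedure described in this thread (illustrative only; the node name, drain flags, and config locations are assumptions that will vary by cluster and distro):

```bash
# Sketch of the enable/disable procedure discussed above (not normative).

# 1. Drain the node so no Guaranteed pods are holding exclusively allocated CPUs.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# 2. Enable (or disable) the policy option in the kubelet configuration, then
#    remove the CPU Manager checkpoint so it can be rebuilt on restart.
rm /var/lib/kubelet/cpu_manager_state

# 3. Reload units, restart the kubelet, and return the node to service.
systemctl daemon-reload
systemctl restart kubelet
kubectl uncordon <node-name>
```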
/assign @klueska
According to the KEP board you are marked as the sig-node approver for this.
@haircommander can you review this and see if your "request changes" is still needed?
LGTM
/lgtm (formally this time)
- Modify the "Allocate" static policy to check for the option, prefer-align-cpus-by-uncorecache. For platforms where SMT is enabled, prefer-align-cpus-by-uncorecache will continue to follow default behaviour and try to allocate full cores when possible. prefer-align-cpus-by-uncorecache can be enabled along with full-pcpus-only and enforce full core assignment with uncorecache alignment.
- prefer-align-cpus-by-uncorecache will be compatible with the default CPU allocation logic with future plans to be compatible with the options distribute-cpus-across-numa and distribute-cpus-across-cores. |
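To make the full-pcpus-only combination above concrete, here is a minimal kubelet configuration sketch, assuming the option stays behind the CPUManagerPolicyAlphaOptions feature gate while it is alpha (the file path is illustrative):

```bash
# Minimal KubeletConfiguration sketch enabling the option together with
# full-pcpus-only (path and feature-gate wiring are assumptions, not normative).
cat <<'EOF' > /etc/kubernetes/kubelet-config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CPUManagerPolicyAlphaOptions: true        # assumed gate while the option is alpha
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  full-pcpus-only: "true"                   # enforce full physical core assignment
  prefer-align-cpus-by-uncorecache: "true"  # prefer uncore cache (L3) alignment
EOF
```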
Does this mean that for now you cannot combine these options together? What happens if you do?
Correct, for now enabling the mentioned flags together will cause a failure and the node will not be schedulable.
We reached agreement that expanding/improving compatibility between options is a Beta graduation blocker.
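To illustrate the compatibility point above: per this thread, a configuration like the sketch below (again illustrative) combines the new option with distribute-cpus-across-numa, which is not yet supported, so the kubelet is expected to fail and the node to stay unschedulable until the compatibility work planned for Beta lands.

```bash
# Unsupported combination per the discussion above (illustrative sketch only);
# the kubelet is expected to fail with this config, leaving the node unschedulable.
cat <<'EOF' > /etc/kubernetes/kubelet-config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  prefer-align-cpus-by-uncorecache: "true"
  distribute-cpus-across-numa: "true"   # not yet compatible with the new option
EOF
```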
Co-authored-by: Kevin Klues <[email protected]>
/lgtm
@klueska please let us know if there's anything else, thanks!
/approve
@klueska: GitHub didn't allow me to request PR reviews from the following users: for, final, approval. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/approve
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ajcaldelas, klueska, mrunalp, soltysh. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
One-line PR description: Introduce a new CPU Manager Static policy option for L3 Cache awareness
Issue link: Split L3 Cache Topology Awareness in CPU Manager #4800