-
Notifications
You must be signed in to change notification settings - Fork 41.6k
Add CPUManager policy option to distribute CPUs across NUMA nodes instead of packing them #105631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CPUManager policy option to distribute CPUs across NUMA nodes instead of packing them #105631
Conversation
|
/sig node |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: klueska The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/triage accepted |
|
/cc @fromanirh @swatisehgal The core algorithm hasn't been implemented yet, but the base infrastructure is now in place. Please take note of:
|
|
@fromanirh @swatisehgal |
357ba1d to
6f155d0
Compare
Signed-off-by: Kevin Klues <[email protected]>
This parameter ensures that CPUs are always allocated in groups of size 'cpuGroupSize'. This is important, for example, to ensure that all CPUs (i.e. hyperthreads) from the same core are handed out together. Signed-off-by: Kevin Klues <[email protected]>
Signed-off-by: Kevin Klues <[email protected]>
002a080 to
d9725e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again a batch of minor comments, mostly for future cleanups (when we move to beta or so). The algorithm itself is nicely written and very nicely commented. I want to review the tests carefully today/tomorrow, and if everything goes as I expect, I'll add my LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this can (and probably should) be a free function as well, because it doesn't use cpuAccumulator fields at all. Not worth changing now, let's just not it down for the future refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: using bool here raise a flag. Could we use a naked bool maybe? or should we just move to a enum using integer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with changing this to an enum. It's only a bool, because my first iteration of this didn't use constants at all -- I had the functions directly returning a bool, with true for continue and false for break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can totally wait for the refactoring, but it seems to suggest a plain bool could be simpler and equally readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
|
@fromanirh
Yes, as I pointed out here, I struggled with where to put this because I wanted it namespaced somehow. It ill be pulled out to a top-level function once we move to a |
Signed-off-by: Kevin Klues <[email protected]>
Sure, I was recording them all. I understand the issue at hand and I'm actually pushing (slightly :) ) to wait and address all these nits in the future cleanup work. |
d9725e6 to
86f9c26
Compare
|
Based on the feedback above, I am removing the hold on this PR. Just need an lgtm from here then. |
|
/test pull-kubernetes-integration |
|
/lgtm |
|
/lgtm |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR implements the logic describe in KEP-2902.
It adds a new
CPUManagerpolicy option calleddistribute-cpus-across-numato the staticCPUManagerpolicy. When enabled, this will trigger theCPUManagerto evenly distribute CPUs across NUMA nodes in cases where more than one NUMA node is required to satisfy the allocation.Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: