KEP-5776: Configurable cAdvisor Metrics Collection #5776
NahumLitvin wants to merge 1 commit into kubernetes:master
Add KEP for ConfigurableCAdvisorMetrics feature gate that allows operators to disable expensive ProcessMetrics collection via KubeletConfiguration.cadvisor.includedMetrics.processMetrics. Production data shows 99.4% kubelet CPU reduction on high-density nodes (200+ pods) when ProcessMetrics is disabled. Related: kubernetes/kubernetes#123340
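A minimal Go sketch of how the gate and the new field might interact, assuming the field is ignored while the gate is off and that an unset field falls back to the default of true. The type names, the `EnabledMetrics` helper, and the collector string values are hypothetical and chosen for illustration only; they are not taken from the kubelet or cAdvisor source.

```go
package main

// MetricKind names a metric collector. The values below are illustrative
// placeholders, not the identifiers used by cAdvisor itself.
type MetricKind string

const (
	CPUMetrics     MetricKind = "cpu"
	MemoryMetrics  MetricKind = "memory"
	ProcessMetrics MetricKind = "process"
)

// IncludedMetrics is a hypothetical mirror of the proposed
// cadvisor.includedMetrics section. A nil pointer means "not set",
// in which case the default (true) applies.
type IncludedMetrics struct {
	ProcessMetrics *bool
}

// CAdvisorConfiguration is a hypothetical mirror of the proposed
// KubeletConfiguration.cadvisor section.
type CAdvisorConfiguration struct {
	IncludedMetrics IncludedMetrics
}

// EnabledMetrics returns the set of collectors that should run.
// Assumption: when the ConfigurableCAdvisorMetrics gate is disabled,
// the configuration is ignored and every collector stays enabled.
func EnabledMetrics(gateEnabled bool, cfg CAdvisorConfiguration) map[MetricKind]bool {
	enabled := map[MetricKind]bool{
		CPUMetrics:     true,
		MemoryMetrics:  true,
		ProcessMetrics: true,
	}
	if !gateEnabled {
		return enabled
	}
	// Only an explicit false removes the collector.
	if p := cfg.IncludedMetrics.ProcessMetrics; p != nil && !*p {
		delete(enabled, ProcessMetrics)
	}
	return enabled
}
```

Under these assumptions, calling `EnabledMetrics(true, cfg)` with `processMetrics` set to false drops only the process collector, while the same configuration with the gate off leaves collection unchanged. Using a pointer for the field keeps "unset" distinguishable from an explicit false, which matters when the default is true.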
|
You need to have a KEP issue first.
/sig instrumentation
|
Per-component metric manipulations should ideally happen through a common interface, kubernetes/kubernetes#131572, for multiple reasons (common underlying DX expectations, the ability to ship faster since most changes can be made in component-base and trickle down to components, etc.). For component-specific behaviors, it would be good to have those gates and their configurations exposed through the same interface, although the implementation currently does not account for that and will need to be updated to accommodate per-component behaviors. This should also be discussed first, because the proposal effectively goes in the other direction: currently all options exposed in the Metrics API, as well as their behaviors, live in component-base.
Summary
This KEP proposes adding a cadvisor configuration section to KubeletConfiguration that allows operators to selectively disable expensive cAdvisor metric collectors. The primary use case is disabling ProcessMetrics collection, which scans /proc for every thread in every container and causes extreme CPU overhead on high-density nodes.
Production Evidence
Testing on EKS 1.31 clusters with ~200 pods/node showed:
Proposal
Feature gate: ConfigurableCAdvisorMetrics (target 1.33)
Configuration field: KubeletConfiguration.cadvisor.includedMetrics.processMetrics (default true)
Related
/sig node
/kind kep