[improve][broker] Support cgroup v2 by using jdk.internal.platform.Metrics in Pulsar Loadbalancer #16832
/pulsarbot run-failure-checks
pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/LinuxInfoUtils.java
LGTM
/pulsarbot run-failure-checks
@@ -97,4 +101,25 @@ public void testNoNICSpeed() throws Exception {
    }

    @Test
    public void testCGroupMetrics() throws IOException {
Should we check that the test really uses jdk.internal.platform.Metrics and not the old way?
Good idea, we can assert metrics != null.
/pulsarbot run-failure-checks
The pr had no activity for 30 days, mark with Stale label.
Any updates? I think this PR makes sense for supporting cgroup v2.
try {
    if (metrics != null && getCpuUsageMethod != null) {
        return (long) getCpuUsageMethod.invoke(metrics);
For backward compatibility, don't we need to multiply by the limit in calculateBrokerHostUsage?
public void calculateBrokerHostUsage() {
    ...
    double totalCpuLimit = getTotalCpuLimit(isCGroupsEnabled);
    if (isCGroupsEnabled && metrics != null && getCpuUsageMethod != null) {
        // cgroup cpuUsage is already scaled, [0.0, 1.0]
        usage.setCpu(new ResourceUsage(cpuUsage * totalCpuLimit, totalCpuLimit));
    } else {
        usage.setCpu(new ResourceUsage(cpuUsage, totalCpuLimit));
    }
}
Metrics.getCpuUsage returns the aggregate time, so I think we don't need to multiply by the limit.
/**
* Returns the aggregate time, in nanoseconds, consumed by all
* tasks in the Isolation Group.
*
* @return Time in nanoseconds, -1 if unknown or
* -2 if the metric is not supported.
*
*/
public long getCpuUsage();
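Because the javadoc above describes a cumulative counter rather than a ratio, a usage figure has to come from the difference between two samples divided by the elapsed wall-clock time. The following is only a minimal sketch of that arithmetic, not the code in this PR: the class name CpuUsageSampler, the method cpuPercent, and the "percent of one core" scale (so two fully busy cores give 200) are assumptions made for illustration.

```java
/** Hypothetical helper for illustration; not part of LinuxInfoUtils. */
final class CpuUsageSampler {

    /**
     * Converts two readings of an aggregate CPU-time counter (nanoseconds)
     * into a percentage of one core over the sampling interval.
     * Returns NaN when a reading is negative (the javadoc uses -1 and -2
     * for unknown/unsupported), when the counter went backwards, or when
     * the interval is not positive.
     */
    static double cpuPercent(long prevUsageNanos, long currUsageNanos, long elapsedNanos) {
        if (prevUsageNanos < 0 || currUsageNanos < 0 || elapsedNanos <= 0) {
            return Double.NaN;
        }
        long delta = currUsageNanos - prevUsageNanos;
        if (delta < 0) {
            return Double.NaN;
        }
        return 100.0 * delta / elapsedNanos;
    }

    public static void main(String[] args) {
        // Half a second of CPU time consumed during one second of wall time.
        System.out.println(cpuPercent(0L, 500_000_000L, 1_000_000_000L));
    }
}
```

With this scale the result can be compared directly against a limit expressed as 100 times the number of available cores, which is why no extra multiplication by the limit would be needed under that assumption.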
I will update this PR soon.
# Conflicts:
#	bin/pulsar
#	pom.xml
#	pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/LinuxInfoUtils.java
add test
Yes, we need a separate PR to refactor it.
We need this to be included in branch-3.0 and branch-2.11 as soon as possible. I backported the changes to branch-2.10 with all dependencies in #20659. This is becoming urgent: AKS Kubernetes 1.25+ switches to cgroup v2, and AKS Kubernetes 1.24 goes end-of-life on July 30, 2023. GKE continues to have a way to select between cgroup v1 and cgroup v2.
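For anyone checking which cgroup version a node or container is actually running, one common heuristic is that a cgroup v2 (unified) mount exposes a cgroup.controllers file at its root. The sketch below assumes the conventional /sys/fs/cgroup mount point; the class CgroupVersionCheck and its method are hypothetical names used only for illustration and are not part of Pulsar.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not part of Pulsar. */
final class CgroupVersionCheck {

    /**
     * Heuristic: a cgroup v2 hierarchy has a "cgroup.controllers" file
     * at its root, while a pure cgroup v1 layout does not.
     */
    static boolean isCgroupV2(Path cgroupRoot) {
        return Files.isRegularFile(cgroupRoot.resolve("cgroup.controllers"));
    }

    public static void main(String[] args) {
        // Assumes the conventional mount point; hybrid setups may differ.
        System.out.println("cgroup v2: " + isCgroupV2(Paths.get("/sys/fs/cgroup")));
    }
}
```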
Master Issue: #16601
Motivation
The Pulsar load balancer detects CPU limits using the cgroup v1 API. jdk.internal.platform.Metrics already supports both cgroup v1 and v2, so we should use jdk.internal.platform.Metrics to get the cgroup metrics.
Reference: https://code.yawk.at/java/17/java.base/jdk/internal/platform/
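As a rough illustration of the idea, the sketch below looks up jdk.internal.platform.Metrics reflectively and falls back gracefully when it is unavailable. It is not the code in this PR: the class CgroupMetricsProbe and its methods are hypothetical, and it assumes the JVM is started with an option such as --add-opens java.base/jdk.internal.platform=ALL-UNNAMED, since the package is not exported by default on recent JDKs. Without that option, or on a non-Linux host, the probe simply reports the metrics as unavailable.

```java
import java.lang.reflect.Method;

/** Hypothetical helper for illustration; not the PR's LinuxInfoUtils code. */
final class CgroupMetricsProbe {
    private static final Object METRICS;
    private static final Method GET_CPU_USAGE;

    static {
        Object metrics = null;
        Method getCpuUsage = null;
        try {
            // Internal JDK API: may be missing or inaccessible, so use reflection.
            Class<?> metricsClass = Class.forName("jdk.internal.platform.Metrics");
            metrics = metricsClass.getMethod("systemMetrics").invoke(null);
            if (metrics != null) {
                getCpuUsage = metricsClass.getMethod("getCpuUsage");
            }
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Treat any failure as "not available" so callers can fall back.
            metrics = null;
            getCpuUsage = null;
        }
        METRICS = metrics;
        GET_CPU_USAGE = getCpuUsage;
    }

    static boolean isAvailable() {
        return METRICS != null && GET_CPU_USAGE != null;
    }

    /** Aggregate CPU time in nanoseconds, or -1 when it cannot be read. */
    static long cpuUsageNanos() {
        if (!isAvailable()) {
            return -1L;
        }
        try {
            return (long) GET_CPU_USAGE.invoke(METRICS);
        } catch (ReflectiveOperationException | RuntimeException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        System.out.println("available=" + isAvailable() + ", cpuUsageNanos=" + cpuUsageNanos());
    }
}
```

A caller could check isAvailable() first and keep the existing cgroup v1 file parsing as the fallback path.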
Modifications
Use jdk.internal.platform.Metrics to get the cgroup metrics in LinuxInfoUtils.
Verifying this change
This change is already covered by existing tests, such as testCGroupMetrics.
Does this pull request potentially affect one of the following parts:
If yes was chosen, please highlight the changes
Documentation
Check the box below or label this PR directly.
Need to update docs?
doc-required (Your PR needs to update docs and you will update later)
doc-not-needed (Please explain why)
doc (Your PR contains doc changes)
doc-complete (Docs have been already added)