[feat][broker] PIP-264: Add Java runtime metrics #22616

Merged
merged 28 commits into from
May 7, 2024
28 commits
c40a356
Draft runtime metrics
dragosvictor Apr 16, 2024
92e0412
Test more metrics
dragosvictor Apr 16, 2024
f3995c5
Merge remote-tracking branch 'origin/master' into pip-264-java-runtim…
dragosvictor Apr 17, 2024
7aa4ea4
Comment test
dragosvictor Apr 19, 2024
4392dba
Merge remote-tracking branch 'origin/master' into pip-264-java-runtim…
dragosvictor Apr 25, 2024
d1c0ec4
Move RuntimeMetrics instance to OpenTelemetryService class
dragosvictor Apr 25, 2024
209f75f
Enable experimental JMX metrics
dragosvictor Apr 25, 2024
48ef978
Test Cleanup
dragosvictor Apr 25, 2024
7c3581d
Cleanup
dragosvictor Apr 25, 2024
7c21288
Allow parsing of scientific notation values in PrometheusMetricsClient
dragosvictor Apr 25, 2024
5b5b159
Fix license check
dragosvictor Apr 25, 2024
785a90d
Merge remote-tracking branch 'origin/master' into pip-264-java-runtim…
dragosvictor Apr 26, 2024
647ca66
Merge remote-tracking branch 'origin/master' into pip-264-java-runtim…
dragosvictor Apr 29, 2024
5d26bdd
Merge remote-tracking branch 'apache/master' into pip-264-java-runtim…
merlimat May 1, 2024
331ea42
Merge remote-tracking branch 'origin/master' into pip-264-java-runtim…
dragosvictor May 3, 2024
92c138a
Merge remote-tracking branch 'origin/master' into pip-264-java-runtim…
dragosvictor May 6, 2024
bfd7022
Update library versions
dragosvictor May 6, 2024
a0fb5b6
[fix][test] Clear MockedPulsarServiceBaseTest fields to prevent test …
lhotari May 6, 2024
4af0528
Merge remote-tracking branch 'lhotari/lh-fix-test-memory-leak' into p…
dragosvictor May 6, 2024
ec01dc6
Merge remote-tracking branch 'origin/master' into pip-264-java-runtim…
lhotari May 6, 2024
a7e0480
Drop Otel instance references at closing time to fix test runtime OOM…
lhotari May 6, 2024
65031f4
Update conf/pulsar_env.sh
dragosvictor May 6, 2024
75a1af6
Remove extra dependency to opentelemetry-runtime-telemetry-java17 in …
dragosvictor May 6, 2024
1f38e15
Capture heap dump on OOM in integration tests
lhotari May 7, 2024
6a14953
Run AdminApiTransactionMultiBrokerTest in isolation since it requires…
lhotari May 7, 2024
2c682b3
Increase Pulsar container -Xmx from 128M to 150M
lhotari May 7, 2024
11756ff
Run AdminApiTransactionMultiBrokerTest independently
lhotari May 7, 2024
d4f5149
increase test JVM heap max size to 1300m
lhotari May 7, 2024
2 changes: 2 additions & 0 deletions build/run_unit_group.sh
@@ -85,6 +85,8 @@ function test_group_broker_group_2() {

function test_group_broker_group_3() {
mvn_test -pl pulsar-broker -Dgroups='broker-admin'
# run AdminApiTransactionMultiBrokerTest independently with a larger heap size
mvn_test -pl pulsar-broker -DtestMaxHeapSize=1500M -Dtest=org.apache.pulsar.broker.admin.v3.AdminApiTransactionMultiBrokerTest -DtestForkCount=1 -DtestReuseFork=false
}

function test_group_broker_group_4() {
4 changes: 4 additions & 0 deletions conf/pulsar_env.sh
@@ -94,3 +94,7 @@ PULSAR_EXTRA_OPTS="${PULSAR_EXTRA_OPTS:-" -Dpulsar.allocator.exit_on_oom=true -D
#Wait time before forcefully kill the pulsar server instance, if the stop is not successful
#PULSAR_STOP_TIMEOUT=

# Enable semantically stable telemetry for JVM metrics, unless otherwise overridden by the user.
if [ -z "$OTEL_SEMCONV_STABILITY_OPT_IN" ]; then
export OTEL_SEMCONV_STABILITY_OPT_IN=jvm
fi
2 changes: 2 additions & 0 deletions distribution/server/src/assemble/LICENSE.bin.txt
@@ -542,6 +542,8 @@ The Apache Software License, Version 2.0
- io.opentelemetry.instrumentation-opentelemetry-instrumentation-api-1.33.2.jar
- io.opentelemetry.instrumentation-opentelemetry-instrumentation-api-semconv-1.33.2-alpha.jar
- io.opentelemetry.instrumentation-opentelemetry-resources-1.33.2-alpha.jar
- io.opentelemetry.instrumentation-opentelemetry-runtime-telemetry-java17-1.33.2-alpha.jar
- io.opentelemetry.instrumentation-opentelemetry-runtime-telemetry-java8-1.33.2-alpha.jar
- io.opentelemetry.semconv-opentelemetry-semconv-1.25.0-alpha.jar

BSD 3-clause "New" or "Revised" License
3 changes: 2 additions & 1 deletion pom.xml
@@ -115,6 +115,7 @@ flexible messaging model and an intuitive client API.</description>
--add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED <!--MBeanStatsGenerator-->
--add-opens java.base/jdk.internal.platform=ALL-UNNAMED <!--LinuxInfoUtils-->
</test.additional.args>
<testMaxHeapSize>1300M</testMaxHeapSize>
<testReuseFork>true</testReuseFork>
<testForkCount>4</testForkCount>
<testRealAWS>false</testRealAWS>
@@ -1652,7 +1653,7 @@ flexible messaging model and an intuitive client API.</description>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
- <argLine>${testJacocoAgentArgument} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${testHeapDumpPath} -XX:+ExitOnOutOfMemoryError -Xmx1G -XX:+UseZGC
+ <argLine>${testJacocoAgentArgument} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${testHeapDumpPath} -XX:+ExitOnOutOfMemoryError -Xmx${testMaxHeapSize} -XX:+UseZGC
-Dpulsar.allocator.pooled=true
-Dpulsar.allocator.leak_detection=Advanced
-Dpulsar.allocator.exit_on_oom=false

@@ -59,7 +59,7 @@ public static Multimap<String, Metric> parseMetrics(String metrics) {
// or
// pulsar_subscriptions_count{cluster="standalone", namespace="public/default",
// topic="persistent://public/default/test-2"} 0.0
- Pattern pattern = Pattern.compile("^(\\w+)\\{([^}]+)}\\s([+-]?[\\d\\w.-]+)$");
+ Pattern pattern = Pattern.compile("^(\\w+)\\{([^}]+)}\\s([+-]?[\\d\\w.+-]+)$");
Pattern tagsPattern = Pattern.compile("(\\w+)=\"([^\"]+)\"(,\\s?)?");

Splitter.on("\n").split(metrics).forEach(line -> {
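
The effect of the one-character change to the value pattern can be checked in isolation. The sketch below copies both patterns from the hunk above; the class name, the `parseValue` helper, and the sample metric line (including the metric name `jvm_memory_limit_bytes`) are made up for illustration and are not part of PrometheusMetricsClient.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetricLinePatternDemo {
    // Pattern before the change: '+' is accepted only as a leading sign.
    static final Pattern OLD = Pattern.compile("^(\\w+)\\{([^}]+)}\\s([+-]?[\\d\\w.-]+)$");
    // Pattern after the change: '+' is also accepted inside the value, e.g. in "E+9".
    static final Pattern NEW = Pattern.compile("^(\\w+)\\{([^}]+)}\\s([+-]?[\\d\\w.+-]+)$");

    // Hypothetical helper: returns the raw value token, or null if the line does not match.
    static String parseValue(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        return matcher.matches() ? matcher.group(3) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample line with a positive exponent in scientific notation.
        String line = "jvm_memory_limit_bytes{cluster=\"standalone\"} 1.073741824E+9";

        // The old pattern rejects the line because of the '+' in the exponent.
        System.out.println("old: " + parseValue(OLD, line));
        // The new pattern captures the full token, which Double.parseDouble accepts.
        String value = parseValue(NEW, line);
        System.out.println("new: " + value + " -> " + Double.parseDouble(value));
    }
}
```

Values with a negative exponent (such as `1.0E-3`) or no exponent sign already matched the old pattern, since `-` was in the character class; only the `E+` form was affected.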

@@ -40,7 +40,7 @@
import org.testng.annotations.Test;

@Slf4j
- @Test(groups = "broker-admin")
+ @Test(groups = "broker-admin-isolated")
public class AdminApiTransactionMultiBrokerTest extends TransactionTestBase {

private static final int NUM_BROKERS = 16;
14 changes: 14 additions & 0 deletions pulsar-opentelemetry/pom.xml
@@ -58,6 +58,10 @@
<groupId>io.opentelemetry.semconv</groupId>
<artifactId>opentelemetry-semconv</artifactId>
</dependency>
<dependency>
<groupId>io.opentelemetry.instrumentation</groupId>
<artifactId>opentelemetry-runtime-telemetry-java17</artifactId>
</dependency>

<dependency>
<groupId>com.google.guava</groupId>
@@ -130,6 +134,16 @@
</execution>
</executions>
</plugin>

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<systemPropertyVariables>
<otel.semconv-stability.opt-in>jvm</otel.semconv-stability.opt-in>
</systemPropertyVariables>
</configuration>
</plugin>
</plugins>
</build>
</project>

@@ -21,6 +21,7 @@
import static com.google.common.base.Preconditions.checkArgument;
import com.google.common.annotations.VisibleForTesting;
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.instrumentation.runtimemetrics.java17.RuntimeMetrics;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdkBuilder;
@@ -29,6 +30,7 @@
import java.io.Closeable;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import lombok.Builder;
import org.apache.commons.lang3.StringUtils;
@@ -42,7 +44,9 @@ public class OpenTelemetryService implements Closeable {
public static final String OTEL_SDK_DISABLED_KEY = "otel.sdk.disabled";
static final int MAX_CARDINALITY_LIMIT = 10000;

- private final OpenTelemetrySdk openTelemetrySdk;
+ private final AtomicReference<OpenTelemetrySdk> openTelemetrySdkReference = new AtomicReference<>();
+
+ private final AtomicReference<RuntimeMetrics> runtimeMetricsReference = new AtomicReference<>();

/**
* Instantiates the OpenTelemetry SDK. All attributes are overridden by system properties or environment
@@ -94,15 +98,28 @@ public OpenTelemetryService(String clusterName,
builderCustomizer.accept(sdkBuilder);
}

- openTelemetrySdk = sdkBuilder.build().getOpenTelemetrySdk();
+ openTelemetrySdkReference.set(sdkBuilder.build().getOpenTelemetrySdk());
+
+ // For a list of exposed metrics, see https://opentelemetry.io/docs/specs/semconv/runtime/jvm-metrics/
+ runtimeMetricsReference.set(RuntimeMetrics.builder(openTelemetrySdkReference.get())
+ .enableAllFeatures()
+ .enableExperimentalJmxTelemetry()
+ .build());
}

public OpenTelemetry getOpenTelemetry() {
- return openTelemetrySdk;
+ return openTelemetrySdkReference.get();
}

@Override
public void close() {
- openTelemetrySdk.close();
+ RuntimeMetrics runtimeMetrics = runtimeMetricsReference.getAndSet(null);
+ if (runtimeMetrics != null) {
+ runtimeMetrics.close();
+ }
+ OpenTelemetrySdk openTelemetrySdk = openTelemetrySdkReference.getAndSet(null);
+ if (openTelemetrySdk != null) {
+ openTelemetrySdk.close();
+ }
}
}
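
The close() change above swaps a final field for an AtomicReference that is cleared with getAndSet(null), which makes a second close() a no-op and drops the reference so the closed object can be garbage collected. A minimal, dependency-free sketch of that pattern follows; `CountingResource` and `Holder` are hypothetical stand-ins, not the OpenTelemetry classes used in the PR.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class IdempotentCloseDemo {
    // Hypothetical stand-in for a closeable resource; it only counts close() calls.
    static final class CountingResource implements Closeable {
        final AtomicInteger closeCalls = new AtomicInteger();

        @Override
        public void close() {
            closeCalls.incrementAndGet();
        }
    }

    // Hypothetical holder mirroring the shape of the close() method in the hunk above.
    static final class Holder implements Closeable {
        private final AtomicReference<CountingResource> resourceReference = new AtomicReference<>();

        Holder(CountingResource resource) {
            resourceReference.set(resource);
        }

        CountingResource get() {
            return resourceReference.get();
        }

        @Override
        public void close() {
            // Atomically take the resource and clear the reference in one step.
            CountingResource resource = resourceReference.getAndSet(null);
            if (resource != null) {
                resource.close();
            }
        }
    }

    public static void main(String[] args) {
        CountingResource resource = new CountingResource();
        Holder holder = new Holder(resource);
        holder.close();
        holder.close(); // second call finds null and does nothing
        System.out.println("close calls: " + resource.closeCalls.get());
        System.out.println("reference cleared: " + (holder.get() == null));
    }
}
```

One consequence of this design, visible in the sketch, is that the getter returns null after close(), so callers that may outlive the holder need to tolerate that.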

@@ -198,4 +198,52 @@ public void testServiceIsDisabledByDefault() throws Exception {
// Validate that the callback has not been called.
assertThat(callback).isFalse();
}

@Test
public void testJvmRuntimeMetrics() {
// Attempt collection of GC metrics. The metrics should be populated regardless if GC is triggered or not.
Runtime.getRuntime().gc();

var metrics = reader.collectAllMetrics();

// Process Metrics
// Replaces process_cpu_seconds_total
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.cpu.time"));

// Memory Metrics
// Replaces jvm_memory_bytes_used
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.memory.used"));
// Replaces jvm_memory_bytes_committed
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.memory.committed"));
// Replaces jvm_memory_bytes_max
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.memory.limit"));
// Replaces jvm_memory_bytes_init
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.memory.init"));
// Replaces jvm_memory_pool_allocated_bytes_total
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.memory.used_after_last_gc"));

// Buffer Pool Metrics
// Replaces jvm_buffer_pool_used_bytes
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.buffer.memory.usage"));
// Replaces jvm_buffer_pool_capacity_bytes
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.buffer.memory.limit"));
// Replaces jvm_buffer_pool_used_buffers
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.buffer.count"));

// Garbage Collector Metrics
// Replaces jvm_gc_collection_seconds
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.gc.duration"));

// Thread Metrics
// Replaces jvm_threads_state, jvm_threads_current and jvm_threads_daemon
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.thread.count"));

// Class Loading Metrics
// Replaces jvm_classes_currently_loaded
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.class.count"));
// Replaces jvm_classes_loaded_total
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.class.loaded"));
// Replaces jvm_classes_unloaded_total
assertThat(metrics).anySatisfy(metric -> assertThat(metric).hasName("jvm.class.unloaded"));
}
}
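
The "Replaces ..." comments in the test above amount to a mapping from the earlier Prometheus-style JVM metric names to the OpenTelemetry names asserted by the test. The sketch below collects that mapping into a lookup table; the entries are copied from the test comments only, while the class and method names are hypothetical, and the table should not be read as an official or complete migration list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JvmMetricNameMapping {
    // Legacy name -> OpenTelemetry name, as stated in the test comments.
    static final Map<String, String> LEGACY_TO_OTEL = new LinkedHashMap<>();

    static {
        LEGACY_TO_OTEL.put("process_cpu_seconds_total", "jvm.cpu.time");
        LEGACY_TO_OTEL.put("jvm_memory_bytes_used", "jvm.memory.used");
        LEGACY_TO_OTEL.put("jvm_memory_bytes_committed", "jvm.memory.committed");
        LEGACY_TO_OTEL.put("jvm_memory_bytes_max", "jvm.memory.limit");
        LEGACY_TO_OTEL.put("jvm_memory_bytes_init", "jvm.memory.init");
        LEGACY_TO_OTEL.put("jvm_memory_pool_allocated_bytes_total", "jvm.memory.used_after_last_gc");
        LEGACY_TO_OTEL.put("jvm_buffer_pool_used_bytes", "jvm.buffer.memory.usage");
        LEGACY_TO_OTEL.put("jvm_buffer_pool_capacity_bytes", "jvm.buffer.memory.limit");
        LEGACY_TO_OTEL.put("jvm_buffer_pool_used_buffers", "jvm.buffer.count");
        LEGACY_TO_OTEL.put("jvm_gc_collection_seconds", "jvm.gc.duration");
        // Three legacy thread metrics are covered by a single OpenTelemetry metric.
        LEGACY_TO_OTEL.put("jvm_threads_state", "jvm.thread.count");
        LEGACY_TO_OTEL.put("jvm_threads_current", "jvm.thread.count");
        LEGACY_TO_OTEL.put("jvm_threads_daemon", "jvm.thread.count");
        LEGACY_TO_OTEL.put("jvm_classes_currently_loaded", "jvm.class.count");
        LEGACY_TO_OTEL.put("jvm_classes_loaded_total", "jvm.class.loaded");
        LEGACY_TO_OTEL.put("jvm_classes_unloaded_total", "jvm.class.unloaded");
    }

    // Hypothetical helper: returns the OpenTelemetry name, or null for unknown legacy names.
    static String otelNameFor(String legacyName) {
        return LEGACY_TO_OTEL.get(legacyName);
    }

    public static void main(String[] args) {
        LEGACY_TO_OTEL.forEach((legacy, otel) -> System.out.println(legacy + " -> " + otel));
    }
}
```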
2 changes: 1 addition & 1 deletion tests/docker-images/latest-version-image/conf/bookie.conf
@@ -22,7 +22,7 @@ autostart=false
redirect_stderr=true
stdout_logfile=/var/log/pulsar/bookie.log
directory=/pulsar
- environment=PULSAR_MEM="-Xmx128M -XX:MaxDirectMemorySize=512M",PULSAR_GC="-XX:+UseZGC"
+ environment=PULSAR_MEM="-Xmx128M -XX:MaxDirectMemorySize=512M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/pulsar -XX:+ExitOnOutOfMemoryError",PULSAR_GC="-XX:+UseZGC"
command=/pulsar/bin/pulsar bookie
user=pulsar
stopwaitsecs=15
2 changes: 1 addition & 1 deletion tests/docker-images/latest-version-image/conf/broker.conf
@@ -22,7 +22,7 @@ autostart=false
redirect_stderr=true
stdout_logfile=/var/log/pulsar/broker.log
directory=/pulsar
- environment=PULSAR_MEM="-Xmx128M",PULSAR_GC="-XX:+UseZGC"
+ environment=PULSAR_MEM="-Xmx150M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/pulsar -XX:+ExitOnOutOfMemoryError",PULSAR_GC="-XX:+UseZGC"
command=/pulsar/bin/pulsar broker
user=pulsar
stopwaitsecs=15

@@ -22,7 +22,7 @@ autostart=false
redirect_stderr=true
stdout_logfile=/var/log/pulsar/functions_worker.log
directory=/pulsar
- environment=PULSAR_MEM="-Xmx128M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/pulsar/logs/functions",PULSAR_GC="-XX:+UseZGC"
+ environment=PULSAR_MEM="-Xmx150M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/pulsar -XX:+ExitOnOutOfMemoryError",PULSAR_GC="-XX:+UseZGC"
command=/pulsar/bin/pulsar functions-worker
user=pulsar
stopwaitsecs=15

@@ -22,7 +22,7 @@ autostart=false
redirect_stderr=true
stdout_logfile=/var/log/pulsar/global-zk.log
directory=/pulsar
- environment=PULSAR_MEM="-Xmx128M",PULSAR_GC="-XX:+UseZGC"
+ environment=PULSAR_MEM="-Xmx128M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/pulsar -XX:+ExitOnOutOfMemoryError",PULSAR_GC="-XX:+UseZGC"
command=/pulsar/bin/pulsar configuration-store
user=pulsar
stopwaitsecs=15

@@ -22,7 +22,7 @@ autostart=false
redirect_stderr=true
stdout_logfile=/var/log/pulsar/local-zk.log
directory=/pulsar
- environment=PULSAR_MEM="-Xmx128M",PULSAR_GC="-XX:+UseZGC"
+ environment=PULSAR_MEM="-Xmx128M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/pulsar -XX:+ExitOnOutOfMemoryError",PULSAR_GC="-XX:+UseZGC"
command=/pulsar/bin/pulsar zookeeper
user=pulsar
stopwaitsecs=15
2 changes: 1 addition & 1 deletion tests/docker-images/latest-version-image/conf/proxy.conf
@@ -22,7 +22,7 @@ autostart=false
redirect_stderr=true
stdout_logfile=/var/log/pulsar/proxy.log
directory=/pulsar
- environment=PULSAR_MEM="-Xmx128M",PULSAR_GC="-XX:+UseZGC"
+ environment=PULSAR_MEM="-Xmx150M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/pulsar -XX:+ExitOnOutOfMemoryError",PULSAR_GC="-XX:+UseZGC"
command=/pulsar/bin/pulsar proxy
user=pulsar
stopwaitsecs=15

@@ -22,7 +22,7 @@ autostart=false
redirect_stderr=true
stdout_logfile=/var/log/pulsar/pulsar-websocket.log
directory=/pulsar
- environment=PULSAR_MEM="-Xmx128M",PULSAR_GC="-XX:+UseZGC"
+ environment=PULSAR_MEM="-Xmx150M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/pulsar -XX:+ExitOnOutOfMemoryError",PULSAR_GC="-XX:+UseZGC"
command=/pulsar/bin/pulsar websocket
user=pulsar
stopwaitsecs=15