Emit health metrics for OTel SDK metric collection and export #145179
Merged
24 commits
75b857f Bump OTel to get health metrics (mamazzol)
713645b Add health metrics to OTel SDK (mamazzol)
a215113 Small refactor (mamazzol)
0fb8b2f Merge branch 'main' into otel-health (mamazzol)
5a1de63 Merge from main (mamazzol)
66006de Merge branch 'main' into otel-health (mamazzol)
d35976a Merge branch 'main' into otel-health (mamazzol)
67bc726 Merge branch 'main' into otel-health (mamazzol)
d6ecec0 Adding test for health metrics exported (mamazzol)
1b83ae1 Add toDuration method to TimeValue (mamazzol)
987fa0a Change RunTask to run ES with OTEL SDK enabled without mock APM server (mamazzol)
9d6a989 PR Feedback (mamazzol)
843f705 Merge branch 'main' into otel-health (mamazzol)
716124b simplify disable APM Agent logic (mamazzol)
0950d19 Remove duplicated flush (mamazzol)
18d12d7 Merge branch 'main' into otel-health (prdoyle)
13e9b5f Merge branch 'main' into otel-health (prdoyle)
814da9c Reinstate double flush to ensure flushing of health metrics (mamazzol)
65882c3 Merge branch 'main' into otel-health (mamazzol)
365d364 Merge branch 'main' into otel-health (mamazzol)
538db6a Merge branch 'main' into otel-health (mamazzol)
40f61bb Merge branch 'main' into otel-health (mamazzol)
d5abad8 Merge branch 'main' into otel-health (mamazzol)
b44c92b Merge branch 'main' into otel-health (prdoyle)
@@ -10,10 +10,12 @@
 package org.elasticsearch.telemetry.apm.internal;
 
 import io.opentelemetry.api.metrics.Meter;
+import io.opentelemetry.api.metrics.MeterProvider;
 import io.opentelemetry.exporter.otlp.http.metrics.OtlpHttpMetricExporter;
 import io.opentelemetry.exporter.otlp.http.metrics.OtlpHttpMetricExporterBuilder;
-import io.opentelemetry.instrumentation.runtimemetrics.java17.RuntimeMetrics;
+import io.opentelemetry.instrumentation.runtimetelemetry.RuntimeTelemetry;
 import io.opentelemetry.sdk.OpenTelemetrySdk;
+import io.opentelemetry.sdk.common.InternalTelemetryVersion;
 import io.opentelemetry.sdk.metrics.SdkMeterProvider;
 import io.opentelemetry.sdk.metrics.export.AggregationTemporalitySelector;
 import io.opentelemetry.sdk.metrics.export.PeriodicMetricReader;
@@ -23,15 +25,14 @@
 import org.elasticsearch.common.settings.Settings;
 import org.elasticsearch.core.TimeValue;
 
-import java.time.Duration;
+import java.util.Objects;
 import java.util.concurrent.TimeUnit;
 
 import static org.elasticsearch.telemetry.TelemetryProvider.OTEL_METRICS_ENABLED_SYSTEM_PROPERTY;
 
 public class OTelSdkMeterSupplier implements MeterSupplier {
     private final Settings settings;
-    private volatile SdkMeterProvider meterProvider;
-    private volatile RuntimeMetrics runtimeMetrics;
+    private volatile OTelMetricsResources resources;
     private static final Object mutex = new Object();
 
     OTelSdkMeterSupplier(Settings settings) {
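Note on the field change: the two separately tracked fields are folded into a single holder that is created lazily and released as one unit. A minimal stdlib-only sketch of that lifecycle follows. `LazyResourceHolderSketch`, its nested `Resources` class, and the counters are hypothetical stand-ins for illustration, not the Elasticsearch or OpenTelemetry types, and the sketch makes no claim about how the real supplier behaves beyond what the diff shows.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the supplier: one lazily created, closeable bundle guarded by a mutex.
final class LazyResourceHolderSketch implements AutoCloseable {

    // Stand-in for the bundle of providers; it only counts how often it was closed.
    static final class Resources implements AutoCloseable {
        final AtomicInteger closes = new AtomicInteger();

        @Override
        public void close() {
            closes.incrementAndGet();
        }
    }

    private final Object mutex = new Object();
    private final AtomicInteger creations = new AtomicInteger();
    private volatile Resources resources;

    // Creates the bundle on first use and returns the same instance afterwards.
    Resources get() {
        synchronized (mutex) {
            if (resources == null) {
                creations.incrementAndGet();
                resources = new Resources();
            }
            return resources;
        }
    }

    // Closes the bundle once and clears the field, so a later get() builds a fresh one.
    @Override
    public void close() {
        synchronized (mutex) {
            if (resources != null) {
                resources.close();
                resources = null;
            }
        }
    }

    int creations() {
        return creations.get();
    }
}
```

With a single nullable field there is only one "initialized or not" state to reason about, which is presumably what makes the later `close()` and `attemptFlushMetrics()` bodies shorter.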
@@ -41,27 +42,41 @@ public class OTelSdkMeterSupplier implements MeterSupplier {
     @Override
     public Meter get() {
         synchronized (mutex) {
-            if (meterProvider == null) {
-                var exporter = createOTLPExporter();
-                TimeValue intervalTimeValue = OTelSdkSettings.TELEMETRY_OTEL_METRICS_INTERVAL.get(settings);
-                var reader = PeriodicMetricReader.builder(exporter).setInterval(Duration.ofMillis(intervalTimeValue.millis())).build();
-                meterProvider = SdkMeterProvider.builder()
-                    .setResource(Resource.builder().put("service.name", "elasticsearch").build())
-                    .registerMetricReader(reader)
-                    .build();
-                if (OTelSdkSettings.TELEMETRY_OTEL_METRICS_ENABLED.get(settings)) {
-                    var otelSdk = OpenTelemetrySdk.builder().setMeterProvider(meterProvider).build();
-                    // RuntimeMetrics uses two underlying implementations to gather the full set of metric data, JFR and JMX.
-                    // The metrics gathered by the two implementations are mutually exclusive and the union of them produces the full
-                    // set of available metrics. See more at: https://ela.st/otel-runtime-telemetry
-                    runtimeMetrics = RuntimeMetrics.builder(otelSdk).build();
-                }
+            if (resources == null) {
+                resources = createMeteringResources();
             }
-            return meterProvider.get("elasticsearch");
+            return resources.systemMeterProvider().get("elasticsearch");
         }
     }
 
-    private OtlpHttpMetricExporter createOTLPExporter() {
+    private OTelMetricsResources createMeteringResources() {
+        TimeValue intervalTimeValue = OTelSdkSettings.TELEMETRY_OTEL_METRICS_INTERVAL.get(settings);
+
+        // Reader to collect metrics about OTLPExporter
+        var metricHealthReader = PeriodicMetricReader.builder(createOTLPExporter(MeterProvider.noop()))
+            .setInterval(intervalTimeValue.toDuration())
+            .build();
+        var metricHealthProvider = sdkMeterProvider(metricHealthReader);
+
+        var reader = PeriodicMetricReader.builder(createOTLPExporter(metricHealthProvider))
+            .setInterval(intervalTimeValue.toDuration())
+            .build();
+        var systemMeterProvider = sdkMeterProvider(reader);
+        var otelSdk = OpenTelemetrySdk.builder().setMeterProvider(systemMeterProvider).build();
+
+        // RuntimeTelemetry uses JMX (Java 8+) and JFR (Java 17+) to collect JVM metrics. See https://ela.st/otel-runtime-telemetry
+        var runtimeTelemetry = OTelSdkSettings.TELEMETRY_OTEL_METRICS_ENABLED.get(settings) ? RuntimeTelemetry.create(otelSdk) : null;
+        return new OTelMetricsResources(systemMeterProvider, metricHealthProvider, runtimeTelemetry);
+    }
+
+    private static SdkMeterProvider sdkMeterProvider(PeriodicMetricReader reader) {
+        return SdkMeterProvider.builder()
+            .setResource(Resource.builder().put("service.name", "elasticsearch").build())
+            .registerMetricReader(reader)
+            .build();
+    }
+
+    private OtlpHttpMetricExporter createOTLPExporter(MeterProvider healthExportMeterProvider) {
         String endpoint = OTelSdkSettings.TELEMETRY_OTEL_METRICS_ENDPOINT.get(settings);
         if (endpoint == null || endpoint.isEmpty()) {
             throw new IllegalStateException(
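Note on `intervalTimeValue.toDuration()`: this hunk replaces `Duration.ofMillis(intervalTimeValue.millis())` with a call to the helper added in commit 1b83ae1. The body of that helper is not shown in this diff, so the following is only a sketch of one plausible shape. `SimpleTimeValue` is a hypothetical miniature and not `org.elasticsearch.core.TimeValue`; converting through the unit rather than through milliseconds is an assumption.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical miniature of a time value; not the real org.elasticsearch.core.TimeValue.
record SimpleTimeValue(long duration, TimeUnit unit) {

    // Whole milliseconds, truncating any sub-millisecond remainder.
    long millis() {
        return unit.toMillis(duration);
    }

    // Assumed shape of a toDuration() helper: convert via the unit instead of via millis().
    Duration toDuration() {
        return Duration.of(duration, unit.toChronoUnit());
    }
}
```

For whole-second or whole-millisecond values both conversions agree; they differ only for sub-millisecond units, where `millis()` truncates.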
@@ -70,7 +85,9 @@ private OtlpHttpMetricExporter createOTLPExporter() {
         }
         OtlpHttpMetricExporterBuilder builder = OtlpHttpMetricExporter.builder()
             .setEndpoint(endpoint)
-            .setAggregationTemporalitySelector(AggregationTemporalitySelector.deltaPreferred());
+            .setMeterProvider(() -> healthExportMeterProvider)
+            .setAggregationTemporalitySelector(AggregationTemporalitySelector.deltaPreferred())
+            .setInternalTelemetryVersion(InternalTelemetryVersion.LATEST);

Review comment (Contributor): (This line here seems to be the real change in this commit; the rest is refactoring.)

         String authHeader = getAuthorizationHeader();
         if (authHeader != null) {
             builder.addHeader("Authorization", authHeader);
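Note on the wiring: the exporter for the system metrics is handed the health provider, while the health provider's own exporter is handed `MeterProvider.noop()`, so exporter self-metrics travel exactly one hop and do not feed back into themselves. The sketch below models only that topology with stdlib types. `HealthPipelineSketch`, `MetricSink`, `Pipeline`, and the metric names are hypothetical; nothing here is an OpenTelemetry class, and the real SDK decides which self-metrics are emitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Stdlib-only model of the two-pipeline wiring; none of these types are OpenTelemetry classes.
final class HealthPipelineSketch {

    // Stand-in for a meter provider: something that accepts recorded metric names.
    interface MetricSink {
        void record(String metricName);
    }

    // Plays the role of MeterProvider.noop(): recordings are silently dropped.
    static final MetricSink NOOP = metricName -> {};

    // Stand-in for a provider together with its exporter.
    static final class Pipeline implements MetricSink {
        private final String name;
        private final Supplier<MetricSink> healthSink;
        final List<String> pending = new ArrayList<>();
        final List<String> exported = new ArrayList<>();

        Pipeline(String name, Supplier<MetricSink> healthSink) {
            this.name = name;
            this.healthSink = healthSink;
        }

        @Override
        public void record(String metricName) {
            pending.add(metricName);
        }

        // Exports everything pending, then reports the export to the configured health sink.
        void export() {
            exported.addAll(pending);
            pending.clear();
            // Hypothetical metric name, used only for this illustration.
            healthSink.get().record(name + ".exports");
        }
    }
}
```

Wiring a `health` pipeline to `NOOP` and a `system` pipeline to `health` mirrors the order in `createMeteringResources()`: the health pipeline has to exist before the system exporter can point at it.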
@@ -95,24 +112,45 @@ private String getAuthorizationHeader() {
     @Override
     public void attemptFlushMetrics() {
         synchronized (mutex) {
-            if (meterProvider != null) {
-                // If the timeout expires, this quietly returns, which is ok in this context.
-                meterProvider.forceFlush().join(10, TimeUnit.SECONDS);
+            if (resources != null) {
+                resources.systemMeterProvider.forceFlush().join(10, TimeUnit.SECONDS);
+                resources.meterHealthMeterProvider.forceFlush().join(10, TimeUnit.SECONDS);
+                // PeriodicMetricReader records collection.duration after
+                // each collection, so a second cycle is required to collect and export it.
+                resources.systemMeterProvider.forceFlush().join(10, TimeUnit.SECONDS);
+                resources.meterHealthMeterProvider.forceFlush().join(10, TimeUnit.SECONDS);

(Review thread: mamazzol marked this conversation as resolved.)

             }
         }
     }
 
     @Override
     public void close() {
         synchronized (mutex) {
-            if (runtimeMetrics != null) {
-                runtimeMetrics.close();
-                runtimeMetrics = null;
+            if (resources != null) {
+                resources.close();
+                resources = null;
             }
-            if (meterProvider != null) {
-                meterProvider.close();
-                meterProvider = null;
         }
     }
+
+    private record OTelMetricsResources(
+        SdkMeterProvider systemMeterProvider,
+        SdkMeterProvider meterHealthMeterProvider,
+        RuntimeTelemetry runtimeTelemetry
+    ) implements AutoCloseable {

(Review thread: mamazzol marked this conversation as resolved.)

+
+        OTelMetricsResources {
+            Objects.requireNonNull(systemMeterProvider, "systemMeterProvider");
+            Objects.requireNonNull(meterHealthMeterProvider, "meterHealthMeterProvider");
+        }
+
+        @Override
+        public void close() {
+            if (runtimeTelemetry != null) {
+                runtimeTelemetry.close();
+            }
+            systemMeterProvider.close();
+            meterHealthMeterProvider.close();
+        }
+    }
 }
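Note on the double flush: the comment in `attemptFlushMetrics()` says the reader records `collection.duration` only after a collection has finished, so the value produced by one flush cannot be part of that same flush. A toy model of that timing is sketched below. `DoubleFlushSketch` is hypothetical and stdlib-only; the only name taken from the diff is the metric `collection.duration`, and where the real reader records it is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a provider whose reader records a self-metric after each collection.
final class DoubleFlushSketch {
    final List<String> pending = new ArrayList<>();
    final List<String> exported = new ArrayList<>();

    void record(String metricName) {
        pending.add(metricName);
    }

    // Collects and exports what is pending, then records the duration of that collection.
    void forceFlush() {
        exported.addAll(pending);
        pending.clear();
        // Recorded only after the collection above, so it has to wait for the next cycle.
        pending.add("collection.duration");
    }
}
```

After one `forceFlush()` the duration sample is still pending; a second call exports it, which matches the reasoning given for reinstating the second flush in commit 814da9c.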