
chore: Update to OpenTelemetry 0.31.0#8922

Merged
rohan-b99 merged 113 commits into dev from bryn/otel-0.31-migration
Mar 13, 2026

Conversation

@BrynCooke (Contributor) commented Feb 26, 2026

Updates the Router to OTel 0.31.0, the latest version at the time of writing.

This is mostly a matter of adapting to changed upstream APIs; however, a few areas are not compatible and required code changes:

  1. Observable gauge. Previously it was possible to register and deregister callbacks; this is no longer possible. To avoid breaking existing code, I have introduced our own registry of callbacks that the current gauges delegate to. However, observable gauges should be avoided in future in favour of regular sync gauges.
  2. Zipkin exporter. It has an upstream regression and no longer populates the service name. We cannot work around this, but given that Zipkin supports OTLP, we should tell people to migrate and eventually remove the Zipkin exporter.
  3. Datadog. We are migrating to the upstream code, so we will lose priority sampling values of -1 and 2 (ALWAYS DROP/ALWAYS SAMPLE).
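The delegation idea in item 1 can be sketched in plain Rust, assuming a router-owned registry: the upstream gauge would be built once with a single callback that calls `observe_all()`, while router code registers and deregisters its own callbacks through handles. All names here (`GaugeCallbackRegistry`, `CallbackHandle`, `observe_all`) are hypothetical; this is a stdlib-only illustration, not the Router's implementation or the OpenTelemetry API.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical callback type: pushes observed values into a sink.
type GaugeCallback = Box<dyn Fn(&mut Vec<f64>) + Send + Sync>;

/// Hypothetical registry that gauges delegate to, so callbacks can be
/// added and removed after the upstream instrument has been built.
#[derive(Default)]
pub struct GaugeCallbackRegistry {
    next_id: AtomicU64,
    callbacks: Mutex<HashMap<u64, GaugeCallback>>,
}

/// Dropping the handle deregisters its callback.
pub struct CallbackHandle {
    id: u64,
    registry: Arc<GaugeCallbackRegistry>,
}

impl Drop for CallbackHandle {
    fn drop(&mut self) {
        self.registry.callbacks.lock().unwrap().remove(&self.id);
    }
}

impl GaugeCallbackRegistry {
    /// Register a callback; it stays active until the returned handle is dropped.
    pub fn register(
        registry: &Arc<Self>,
        callback: impl Fn(&mut Vec<f64>) + Send + Sync + 'static,
    ) -> CallbackHandle {
        let id = registry.next_id.fetch_add(1, Ordering::Relaxed);
        registry.callbacks.lock().unwrap().insert(id, Box::new(callback));
        CallbackHandle { id, registry: Arc::clone(registry) }
    }

    /// What the single build-time upstream callback would invoke on each collection.
    /// Iteration order is unspecified because the callbacks live in a HashMap.
    pub fn observe_all(&self) -> Vec<f64> {
        let mut observations = Vec::new();
        for callback in self.callbacks.lock().unwrap().values() {
            callback(&mut observations);
        }
        observations
    }
}
```

Tying deregistration to `Drop` on the handle mirrors the old register/deregister lifecycle without requiring the upstream API to support it.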

Closes #7794
Closes #8368


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • PR description explains the motivation for the change and relevant context for reviewing
  • PR description links appropriate GitHub/Jira tickets (creating when necessary)
  • Changeset is included for user-facing changes
  • Changes are compatible[1]
  • Documentation[2] completed
  • Performance impact assessed and acceptable
  • Metrics and logs are added[3] and documented
  • Tests added and passing[4]
    • Unit tests
    • Integration tests
    • Manual tests, as necessary

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. A lot of (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best-practices.

  4. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

bryn added 30 commits February 19, 2026 19:40
Add comprehensive migration plan document covering:
- Dependency updates (OTel 0.31, datadog 0.19)
- Internal datadog exporter removal
- API changes (Resource, Key/KeyValue, instrument builders)
- SpanExporter trait lifetime changes with Arc<Mutex> pattern
- MetricExporter temporality configuration
- Observable gauge lifecycle management
- 18 phases with verified answers to all open questions

Update all OpenTelemetry crates to their latest compatible versions:
- opentelemetry/sdk/otlp/zipkin/prometheus/http/semantic-conventions: 0.31
- opentelemetry-aws: 0.19
- opentelemetry-datadog: 0.19
- tracing-opentelemetry: 0.32 (compatible with OTel 0.31)

This is Phase 1 of the OTel 0.31 migration.

Replace the forked internal datadog exporter with the external
opentelemetry-datadog crate (version 0.19).

Key changes:
- Delete tracing/datadog_exporter/ directory with all internal exporter code
- Create tracing/datadog/propagator.rs preserving our custom propagator with
  full SamplingPriority support (UserReject, AutoReject, AutoKeep, UserKeep)
  that the external crate doesn't provide
- Update DatadogExporter usage to opentelemetry_datadog::DatadogExporter
- Update imports throughout to use the new locations

The external crate's DatadogTraceState::with_priority_sampling only takes
a bool, so we keep our custom propagator implementation for full sampling
priority control needed by the DatadogAgentSampling.
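To illustrate why a `bool` is not enough, here is a stdlib-only sketch of the four sampling priorities the custom propagator keeps. The numeric values (-1, 0, 1, 2) are Datadog's conventional `x-datadog-sampling-priority` header values; the enum layout and method names are hypothetical and do not mirror the propagator's actual code.

```rust
/// Hypothetical mirror of the four Datadog sampling priorities.
#[repr(i8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SamplingPriority {
    UserReject = -1,
    AutoReject = 0,
    AutoKeep = 1,
    UserKeep = 2,
}

impl SamplingPriority {
    /// Parse the value of an `x-datadog-sampling-priority` header.
    pub fn from_header(value: &str) -> Option<Self> {
        match value.trim().parse::<i8>().ok()? {
            -1 => Some(Self::UserReject),
            0 => Some(Self::AutoReject),
            1 => Some(Self::AutoKeep),
            2 => Some(Self::UserKeep),
            _ => None,
        }
    }

    /// Serialize back to the header value.
    pub fn as_header(self) -> String {
        (self as i8).to_string()
    }

    /// A bool-only API keeps just "sampled or not", so the difference between
    /// automatic and user decisions (0 vs -1, 1 vs 2) is lost.
    pub fn is_sampled(self) -> bool {
        matches!(self, Self::AutoKeep | Self::UserKeep)
    }
}
```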

Remove NamedTokioRuntime and the named_runtime_channel module, which are no
longer needed with OpenTelemetry SDK 0.31.

The OTel 0.31 BatchSpanProcessor::builder() no longer takes a runtime
parameter - it uses tokio internally. This simplifies all trace exporter
configurations (Apollo, Datadog, OTLP, Zipkin).

OpenTelemetry SDK 0.31 replaces the Resource constructor API with a
builder pattern:
- Resource::new(attrs) -> Resource::builder_empty().with_attributes(attrs).build()
- Resource::from_detectors(...) -> Resource::builder_empty().with_detectors(...).build()
- Resource::empty() -> Resource::builder_empty().build()

OpenTelemetry 0.31 removes the convenience methods Key::string(),
Key::array(), etc. Update all usages to use KeyValue::new() with
explicit Value types where needed.

OpenTelemetry SDK 0.31 renames the instrument builder finalization
method from .init() to .build() for consistency with other builder
patterns in the SDK.

OpenTelemetry SDK 0.31 adds a new required field to SpanData:
parent_span_is_remote. Set to false since we construct SpanData
internally from LightSpanData, not from actual OTel spans with
remote parent detection.

Update SpanExporter implementations for OTel 0.31 API changes:
- export(&mut self, ...) -> export(&self, ...)
- shutdown(&mut self) -> shutdown(&self) returning ExportResult
- BoxFuture -> impl Future return type

Updated exporters:
- NamedSpanExporter (error_handler.rs)
- ExporterWrapper (datadog/mod.rs)
- FailingSpanExporter (test mock)

Note: apollo_telemetry::Exporter needs an Arc<Mutex> refactoring for
interior mutability, which is a larger change.

- Enable semconv_experimental feature for semantic conventions
- Rename TracerProvider to SdkTracerProvider
- Rename Builder to TracerProviderBuilder in tracing reload
- Rewrite error_handler.rs for OTel 0.31 compatibility:
  - Remove global error handler (set_error_handler no longer exists)
  - Remove MetricsError usage (no longer exists in OTel 0.31)
  - Rename NamedMetricsExporter to NamedMetricExporter
  - Update SpanExporter and PushMetricExporter implementations
- Rename MetricsExporterBuilder to MetricExporterBuilder
- Fix SpanExporter trait implementation signatures

Remaining work: metrics aggregation module requires significant
refactoring due to InstrumentProvider trait changes in OTel 0.31.

opentelemetry_otlp 0.31 removed the new_exporter() function in favor of
direct builder patterns:
- TonicExporterBuilder::default() for gRPC transport
- HttpExporterBuilder::default() for HTTP transport
- SpanExporterBuilder::new().with_tonic() for span exporters
- MetricExporterBuilder::new().with_tonic() for metric exporters

Also updated build_metrics_exporter() to use with_temporality().build()
as the aggregation/temporality selector API has been simplified.

opentelemetry_sdk 0.31 simplified the ResourceDetector trait by
removing the timeout parameter from detect(). Updated all three
implementations (StaticResourceDetector, EnvServiceNameDetector,
ConfigResourceDetector) to use the new signature.

The opentelemetry-datadog crate was only in dev-dependencies but is
used in the main source code. Moved it to regular dependencies and
removed the temporary comment noting it was disabled.

Version 0.19 is compatible with opentelemetry 0.31.

The TemporalitySelector trait was removed in OpenTelemetry SDK 0.31.
Temporality is now set directly on metric exporters via with_temporality().

Removed:
- CustomTemporalitySelector struct and TemporalitySelector impl
- Related temporality override tests
- Import of CustomTemporalitySelector in metrics/apollo/mod.rs

Added simple From<&Temporality> conversion for direct temporality setting.
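A minimal sketch of the kind of `From<&Temporality>` conversion described above, assuming a router config enum named `Temporality` and using a local `SdkTemporality` stand-in instead of the real SDK type (which may have more variants). `ExporterBuilder` is a hypothetical placeholder showing where the converted value would be passed to `with_temporality()`.

```rust
/// Hypothetical router config enum (what a user writes in YAML).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Temporality {
    #[default]
    Cumulative,
    Delta,
}

/// Stand-in for the SDK's temporality type; not the real opentelemetry_sdk enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SdkTemporality {
    Cumulative,
    Delta,
}

impl From<&Temporality> for SdkTemporality {
    fn from(value: &Temporality) -> Self {
        match value {
            Temporality::Cumulative => SdkTemporality::Cumulative,
            Temporality::Delta => SdkTemporality::Delta,
        }
    }
}

/// Hypothetical exporter builder: temporality is set directly on the exporter.
#[derive(Debug, Default)]
pub struct ExporterBuilder {
    temporality: Option<SdkTemporality>,
}

impl ExporterBuilder {
    pub fn with_temporality(mut self, temporality: SdkTemporality) -> Self {
        self.temporality = Some(temporality);
        self
    }
}
```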

In OTel SDK 0.31, the `new_view(Instrument, Stream)` function was removed
and replaced with closure-based views: `Fn(&Instrument) -> Option<Stream>`.

Changes:
- Update MetricsBuilder::with_view() to accept closure instead of Box<dyn View>
- Replace MetricView's TryInto<Box<dyn View>> with into_view_fn() method
- Update allocation views to use closure pattern
- Use Stream::builder() API instead of Stream::new() builder pattern
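The closure-based view shape `Fn(&Instrument) -> Option<Stream>` can be illustrated with local stand-in types. `Instrument`, `Stream`, `MetricsBuilder::resolve`, and `bucket_view` below are hypothetical simplifications (for example, this sketch lets the first matching view win), not the SDK's types or its view-resolution rules.

```rust
/// Stand-ins for the SDK's `Instrument` and `Stream`, with only the fields used here.
#[derive(Debug, Clone)]
pub struct Instrument {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Stream {
    pub name: Option<String>,
    pub histogram_boundaries: Option<Vec<f64>>,
}

type ViewFn = Box<dyn Fn(&Instrument) -> Option<Stream> + Send + Sync>;

/// Hypothetical builder whose `with_view` takes a closure instead of `Box<dyn View>`.
#[derive(Default)]
pub struct MetricsBuilder {
    views: Vec<ViewFn>,
}

impl MetricsBuilder {
    pub fn with_view(
        mut self,
        view: impl Fn(&Instrument) -> Option<Stream> + Send + Sync + 'static,
    ) -> Self {
        self.views.push(Box::new(view));
        self
    }

    /// Simplified resolution for this sketch: the first view returning `Some` wins.
    pub fn resolve(&self, instrument: &Instrument) -> Option<Stream> {
        self.views.iter().find_map(|view| view(instrument))
    }
}

/// Hypothetical helper: a view that sets histogram buckets for one instrument name.
pub fn bucket_view(
    name: &'static str,
    boundaries: Vec<f64>,
) -> impl Fn(&Instrument) -> Option<Stream> + Send + Sync + 'static {
    move |instrument: &Instrument| {
        (instrument.name == name).then(|| Stream {
            name: None,
            histogram_boundaries: Some(boundaries.clone()),
        })
    }
}
```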

The global shutdown_tracer_provider() function was removed in OTel 0.31.
Instead, set a new default tracer provider which causes the old one to
be returned and dropped, triggering its shutdown.
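The replace-and-drop shutdown pattern can be sketched with a global slot and a `Drop` impl, assuming (as the commit message describes) that dropping the old provider is what triggers its shutdown. `Provider`, `GLOBAL`, `SHUTDOWNS`, `set_provider`, and `shutdown_global` are hypothetical stand-ins for the global tracer provider machinery.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Counts shutdowns so the behaviour is observable in this sketch.
static SHUTDOWNS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical provider whose `Drop` performs shutdown.
pub struct Provider {
    pub name: &'static str,
}

impl Drop for Provider {
    fn drop(&mut self) {
        // A real provider would flush and shut down its span processors here.
        SHUTDOWNS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical global slot standing in for the global tracer provider.
static GLOBAL: Mutex<Option<Provider>> = Mutex::new(None);

/// Install a new provider and hand back the previous one, if any.
pub fn set_provider(provider: Provider) -> Option<Provider> {
    GLOBAL.lock().unwrap().replace(provider)
}

/// "Shut down" by swapping in a no-op provider and dropping the old one.
pub fn shutdown_global() {
    let previous = set_provider(Provider { name: "noop" });
    drop(previous); // Dropping the returned provider runs its shutdown logic.
}
```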

The new_pipeline() function was removed in opentelemetry-zipkin 0.31.
Use ZipkinExporter::builder() with with_collector_endpoint() instead.

Service name is now handled via the Resource on the TracerProvider
rather than being set directly on the exporter.

OTel 0.31 changed SpanExporter trait methods from &mut self to &self:
- export(&mut self, ...) -> export(&self, ...)
- shutdown(&mut self) -> shutdown(&self) -> OTelSdkResult
- set_resource(&mut self, ...) -> set_resource(&self, ...)

Introduce ExporterInner struct wrapped in Mutex to provide interior
mutability while keeping all existing method implementations intact.
The outer Exporter delegates to inner.lock().*_impl() methods.

Also updates ApolloOtlpExporter to use &self and return OTelSdkResult,
and replaces TraceError/ExportResult with OTelSdkError/OTelSdkResult.
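A stdlib-only sketch of the `ExporterInner`-in-a-`Mutex` pattern, assuming a synchronous `export` for brevity (the real trait returns a future). `SpanExporterLike` and `SpanData` here are hypothetical stand-ins; the point is that the outer `&self` method locks the inner state and delegates to an unchanged `&mut self` `*_impl` method.

```rust
use std::sync::Mutex;

/// Stand-in for the SDK's span data.
#[derive(Debug, Clone)]
pub struct SpanData {
    pub name: String,
}

/// Hypothetical trait with the new `&self` receiver (async return type omitted).
pub trait SpanExporterLike {
    fn export(&self, batch: Vec<SpanData>) -> Result<(), String>;
}

/// The existing state and `&mut self` methods, kept as they were.
#[derive(Default)]
struct ExporterInner {
    exported: Vec<String>,
}

impl ExporterInner {
    fn export_impl(&mut self, batch: Vec<SpanData>) -> Result<(), String> {
        self.exported.extend(batch.into_iter().map(|span| span.name));
        Ok(())
    }
}

/// The outer exporter: the `Mutex` supplies interior mutability for `&self`.
#[derive(Default)]
pub struct Exporter {
    inner: Mutex<ExporterInner>,
}

impl SpanExporterLike for Exporter {
    fn export(&self, batch: Vec<SpanData>) -> Result<(), String> {
        // Lock, then delegate to the unchanged `&mut self` implementation.
        let mut inner = self.inner.lock().map_err(|e| e.to_string())?;
        inner.export_impl(batch)
    }
}
```

With the real async signature, doing the locked work before building the returned future avoids holding a `std::sync::MutexGuard` across an `.await`.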

In OTel SDK 0.31, the tracer provider struct was renamed from
TracerProvider to SdkTracerProvider. The builder type remains
TracerProviderBuilder.

Updated all usages across:
- src/tracer.rs
- src/plugins/telemetry/reload/activation.rs
- src/plugins/telemetry/reload/otel.rs
- src/plugins/telemetry/reload/tracing.rs
- src/plugins/telemetry/otel/tracer.rs
- tests/common.rs

Temporality moved from opentelemetry_sdk::metrics::data::Temporality
to opentelemetry_sdk::metrics::Temporality in OTel SDK 0.31.

Update the migration plan with detailed findings from implementation
attempt:

- Add current status section tracking completed commits
- Add Phase 10A for tonic 0.14.5 upgrade (required first)
- Update Phase 10B/11 with exact SpanExporter/SpanProcessor signatures
- Rewrite Phase 18 with observable instrument API changes discovered:
  - ObservableCounter::new() takes 0 args in 0.31
  - with_inner() is pub(crate) only
  - observe() removed from observable types
  - Solution: store extra observables in keep_alive collection
- Add Phase 19 for trace Config API (builder methods removed)
- Add Phase 20/21 for remaining fixes and test updates
- Add summary of remaining commits in order

opentelemetry-otlp 0.31 depends on tonic 0.14.5. Update direct
dependency to match and avoid version conflicts.

Feature names changed in tonic 0.14:
- tls → tls-ring
- tls-roots → tls-native-roots

Also update tonic-build to 0.14.5 for compatibility.

SpanProcessor trait changes in OTel 0.31:
- force_flush() now returns OTelSdkResult instead of TraceResult<()>
- shutdown() replaced by shutdown_with_timeout(Duration)
- Import path: opentelemetry_sdk::error::OTelSdkResult

Updated implementations:
- ApolloFilterSpanProcessor in tracing/mod.rs
- DatadogSpanProcessor in tracing/datadog/span_processor.rs
- MockSpanProcessor in test code
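The `shutdown()` to `shutdown_with_timeout(Duration)` change can be illustrated with a hypothetical trait in which `shutdown()` is a provided method that delegates to `shutdown_with_timeout`, and a wrapping filter processor forwards lifecycle calls to its delegate. `ProcessorLike`, `FilterProcessor`, `RecordingProcessor`, `SdkResult`, and the 5-second default are assumptions for this sketch, not the SDK's definitions.

```rust
use std::cell::Cell;
use std::time::Duration;

/// Hypothetical result type standing in for `OTelSdkResult`.
pub type SdkResult = Result<(), String>;

/// Hypothetical processor trait: implementors provide `shutdown_with_timeout`,
/// and `shutdown` is a provided method applying a default timeout.
pub trait ProcessorLike {
    fn force_flush(&self) -> SdkResult;
    fn shutdown_with_timeout(&self, timeout: Duration) -> SdkResult;

    fn shutdown(&self) -> SdkResult {
        // Assumed default for this sketch.
        self.shutdown_with_timeout(Duration::from_secs(5))
    }
}

/// A filtering processor that forwards lifecycle calls to the processor it wraps.
pub struct FilterProcessor<P> {
    pub delegate: P,
}

impl<P: ProcessorLike> ProcessorLike for FilterProcessor<P> {
    fn force_flush(&self) -> SdkResult {
        self.delegate.force_flush()
    }

    fn shutdown_with_timeout(&self, timeout: Duration) -> SdkResult {
        self.delegate.shutdown_with_timeout(timeout)
    }
}

/// Test double that records the timeout it was given.
#[derive(Default)]
pub struct RecordingProcessor {
    pub last_timeout: Cell<Option<Duration>>,
}

impl ProcessorLike for RecordingProcessor {
    fn force_flush(&self) -> SdkResult {
        Ok(())
    }

    fn shutdown_with_timeout(&self, timeout: Duration) -> SdkResult {
        self.last_timeout.set(Some(timeout));
        Ok(())
    }
}
```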

SpanExporter trait changes in OTel 0.31:
- export() returns impl Future instead of BoxFuture (remove #[async_trait])
- shutdown(&self) → shutdown(&mut self)
- set_resource(&self) → set_resource(&mut self)
- force_flush(&mut self) required (has default impl)

Updated implementations:
- Exporter in apollo_telemetry.rs
- ApolloOtlpExporter::shutdown in apollo_otlp_exporter.rs
- Call site in shutdown_impl now uses &mut self.otlp_exporter

In tonic 0.14, the prost-related build functionality was moved from
tonic-build to a separate tonic-prost-build crate. This includes the
configure() function used for protobuf compilation.

Changes:
- Add tonic-prost-build 0.14.0 as a build dependency
- Update studio.rs to use tonic_prost_build::configure()
- Fix compile_protos() call to pass PathBuf directly (API change)

ExportError was moved from the opentelemetry crate to the opentelemetry_sdk crate.

Aggregation enum moved from opentelemetry_sdk::metrics to
opentelemetry_sdk::metrics::aggregation module.

Major API changes in the metrics module:
- MeterProvider::versioned_meter replaced with meter_with_scope
- SyncCounter/SyncHistogram/SyncGauge/SyncUpDownCounter traits replaced
  with unified SyncInstrument trait using measure() method
- AsyncInstrument::as_any() method removed
- InstrumentProvider methods now take builder types directly instead of
  individual parameters
- Observable instrument callbacks are now set via builder, not
  register_callback method
- opentelemetry::metrics::Result type removed

Updated AggregateInstrumentProvider macros to use new InstrumentBuilder,
HistogramBuilder, and AsyncInstrumentBuilder parameter types.
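The switch to a unified `measure()` method makes fan-out instruments straightforward. Below is a hypothetical sketch of an aggregate instrument that forwards each measurement to several delegates, using a local `SyncInstrument<T>` trait and `KeyValue` stand-in rather than the real `opentelemetry` types; `AggregateInstrument` and `Recorder` are illustrative names only.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for `opentelemetry::KeyValue`.
#[derive(Debug, Clone, PartialEq)]
pub struct KeyValue {
    pub key: &'static str,
    pub value: String,
}

/// Hypothetical trait shaped like a unified sync-instrument API:
/// a single `measure()` replaces separate add/record methods.
pub trait SyncInstrument<T>: Send + Sync {
    fn measure(&self, value: T, attributes: &[KeyValue]);
}

/// Fans one measurement out to every delegate instrument.
pub struct AggregateInstrument<T> {
    delegates: Vec<Arc<dyn SyncInstrument<T>>>,
}

impl<T> AggregateInstrument<T> {
    pub fn new(delegates: Vec<Arc<dyn SyncInstrument<T>>>) -> Self {
        Self { delegates }
    }
}

impl<T: Copy + Send + Sync> SyncInstrument<T> for AggregateInstrument<T> {
    fn measure(&self, value: T, attributes: &[KeyValue]) {
        for delegate in &self.delegates {
            delegate.measure(value, attributes);
        }
    }
}

/// Test double recording (value, attribute count) pairs.
#[derive(Default)]
pub struct Recorder {
    pub seen: Mutex<Vec<(u64, usize)>>,
}

impl SyncInstrument<u64> for Recorder {
    fn measure(&self, value: u64, attributes: &[KeyValue]) {
        self.seen.lock().unwrap().push((value, attributes.len()));
    }
}
```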

The AggregationSelector trait was removed in OpenTelemetry SDK 0.31.
Histogram bucket boundaries are now configured using the Views API
on the MeterProvider instead of passing an aggregation selector to
exporters.

Changes:
- Remove CustomAggregationSelector from metrics/mod.rs
- Update OTLP exporter to use MetricExporter::builder() pattern
- Update Prometheus exporter to remove with_aggregation_selector()
- Add histogram bucket boundary views to both exporters
- Update Apollo metrics to use MetricExporter::builder()
- Add spec_unstable_metrics_views feature for Aggregation type access
- Fix Aggregation import path (now opentelemetry_sdk::metrics::Aggregation)

OpenTelemetry 0.31 introduced significant changes to the metrics API:

MeterProvider changes:
- `versioned_meter()` replaced by `meter_with_scope(InstrumentationScope)`
- `GlobalMeterProvider` removed, use `Arc<dyn MeterProvider + Send + Sync>`
- Added `public_dynamic()` for dynamic meter providers

InstrumentProvider changes:
- Methods now take builder types (InstrumentBuilder, HistogramBuilder,
  AsyncInstrumentBuilder) instead of individual parameters
- `register_callback()` and related types (Observer, CallbackRegistration)
  removed

Observable instrument changes:
- Observable instruments are now marker types without `observe()` method
- Observations happen through callbacks registered at build time
- Aggregate observables now leak delegate storage to keep registrations alive

Other changes:
- Remove duplicate Eq/Hash derives from prost (now included by default)
- AsyncInstrument trait now requires `T: Send + Sync` bounds
- StreamBuilder::with_allowed_attribute_keys() now takes impl IntoIterator
- Update test code to use new APIs
@BrynCooke BrynCooke requested a review from goto-bus-stop March 11, 2026 10:03
@goto-bus-stop (Member)

I like the changeset: it has all the important information and only the important information 👍

@goto-bus-stop (Member) left a comment

I went through pretty much everything, and these are all my comments!

Comment thread apollo-router/src/plugins/telemetry/config.rs Outdated
Comment thread .changesets/maint_bryn_otel_0_31_migration.md Outdated
Comment thread apollo-router/Cargo.toml Outdated
Comment thread apollo-router/Cargo.toml
# This means including the rmp library
# opentelemetry-datadog = { version = "0.12.0", features = ["reqwest-client"] }
opentelemetry-aws = "0.19"
rmp = "0.8"
Member


Should we retain the TEMP DATADOG comments around this?

Contributor Author


I mean, we don't have the dependency at all anymore, so maybe not?

Comment thread apollo-router/tests/integration/telemetry/otlp/tracing.rs
// manually filter salsa logs because some of them run at the INFO level https://github.com/salsa-rs/salsa/issues/425
let log_level = format!("{log_level},salsa=error");
// filter opentelemetry internal logs to warn level (OTel 0.31 emits INFO logs for provider setup)
let log_level = format!("{log_level},salsa=error,opentelemetry=warn");
Member


Is it still possible for users to explicitly set the opentelemetry log level with RUST_LOG?

Contributor Author


No, we're effectively saying that it will always be warn.

Comment thread apollo-router/src/plugins/telemetry/tracing/apollo.rs Outdated
Comment thread apollo-router/src/cache/metrics.rs
@goto-bus-stop goto-bus-stop changed the title Bryn/otel 0.31 migration feat: Update to OpenTelemetry 0.31.0 Mar 11, 2026
@goto-bus-stop goto-bus-stop changed the title feat: Update to OpenTelemetry 0.31.0 maint: Update to OpenTelemetry 0.31.0 Mar 11, 2026
@goto-bus-stop goto-bus-stop changed the title maint: Update to OpenTelemetry 0.31.0 chore: Update to OpenTelemetry 0.31.0 Mar 11, 2026
@goto-bus-stop (Member) commented Mar 13, 2026

I guess both the reintroduced tests are a bit flaky...

Edit: oh, that one is just because of the name, thanks for the fix, Rohan 😁

@rohan-b99 rohan-b99 enabled auto-merge (squash) March 13, 2026 11:29
@rohan-b99 rohan-b99 disabled auto-merge March 13, 2026 11:30
@rohan-b99 rohan-b99 merged commit 56a2fcc into dev Mar 13, 2026
10 of 11 checks passed
@rohan-b99 rohan-b99 deleted the bryn/otel-0.31-migration branch March 13, 2026 11:31
smyrick pushed a commit that referenced this pull request Mar 17, 2026
Co-authored-by: bryn <bryn@apollographql.com>
Co-authored-by: Renée <renee.kooi@apollographql.com>
Co-authored-by: rohan-b99 <43239788+rohan-b99@users.noreply.github.com>
smyrick pushed a commit that referenced this pull request Mar 20, 2026
Co-authored-by: bryn <bryn@apollographql.com>
Co-authored-by: Renée <renee.kooi@apollographql.com>
Co-authored-by: rohan-b99 <43239788+rohan-b99@users.noreply.github.com>
@abernix abernix mentioned this pull request Mar 31, 2026

3 participants