Provide unit conversion for common non-second duration instruments (TSH-20621) by theJC · Pull Request #8415 · apollographql/router

theJC · 2025-10-14T04:48:17Z

Apollo customers integrate their OTLP streams with a variety of observability platform providers, which differ in how sophisticated they are about ingesting incoming data.

We have a migration-blocking use case that requires a metric stream to be sent in units of milliseconds. If this were a brand-new metric we wouldn't need this, but:

The custom metric has existed for years on our Apollo Gateway based federated graph, which emits it in milliseconds, and it is defined as milliseconds in Datadog.
Our internal customers have leveraged this metric heavily over the years: 2,322 different Datadog artifacts (dashboards, SLOs, monitors, etc.) use it, and it is referenced 4,184 times. We do not have the capacity to transition customers to a new metric that uses different units at this time; that would have to be a gradual process after our Router migration is complete.
We need both Router and Gateway to produce this metric while we migrate the remainder of traffic off our Gateway solution, so that those using this metric to monitor the performance and availability of subgraphs have continuity throughout the migration of clients from the Gateway solution to the Router solution.
When we attempted to send this metric via Router, the values emitted by Router were observed to be 1000 times lower, because Router reports durations in seconds. The Datadog ingestion pipeline will not convert the units of incoming data to match how the metric is defined in the metric metadata.
Therefore we need Router to be able to emit this metric in ms units. I am fine with verbiage in the documentation saying that one should strive to use seconds for durations, and use other units only when the reality of integrating with various OTLP-ingesting systems requires some flexibility, especially for migrations where customers have invested heavily in a particular metric in the previous, Gateway-based incarnation of their federated graph.

bnjjj

Could you please provide a proper description of the PR so I can be sure I understand the end goal? Also, I'm not against this change, but I can see one main issue: the lack of consistency (which we probably already have) in the way we measure time here. I think it would be good to be consistent and always use seconds everywhere, except potentially for really long durations.

theJC · 2025-10-14T14:09:56Z

@bnjjj -- Updated the description, apologies, I ran out of steam before crashing last night ;)

In a clean-room implementation of a brand-new supergraph, I completely agree with you on wanting consistent time units where possible. However, for migration cases where a metric already exists from Router's predecessor and the migration plan requires Router to keep emitting that same metric, or for cases where the OTLP integration platform has limitations, I believe Router's customers need a small amount of flexibility on units.

Also, in case this helps, ref: TSH-20621

theJC · 2025-10-14T16:00:11Z

System-tested this change using the otel-collector Docker image:

I sent a request in and had the collector log the custom metric (using ms) and http.client.request.duration (using seconds). Additional attributes on the metrics have been removed here to keep the snippets terse:

Metric #0
Descriptor:
     -> Name: http.client.request.duration
     -> Description: Duration of HTTP client requests.
     -> Unit: s
     -> DataType: Histogram
     -> AggregationTemporality: Delta
HistogramDataPoints #0
Data point attributes:
     -> clientname: Str(client-name)
     -> graphql.operation.name: Str(thisIsATestOperation)
     -> http.response.status_code: Int(200)
     -> subgraph.name: Str(graphql-diagnostics-api)
StartTimestamp: 2025-10-14 15:33:42.996522 +0000 UTC
Timestamp: 2025-10-14 15:57:08.024024 +0000 UTC
Count: 1
Sum: 0.122261
Min: 0.122261
Max: 0.122261

Metric #1
Descriptor:
     -> Name: custom.metric.call.time
     -> Description: Call time for a subgraph service and process results
     -> Unit: ms
     -> DataType: Histogram
     -> AggregationTemporality: Delta
HistogramDataPoints #0
Data point attributes:
     -> clientname: Str(client-name)
     -> graphql.operation.name: Str(thisIsATestOperation)
     -> subgraph.name: Str(graphql-diagnostics-api)
StartTimestamp: 2025-10-14 15:33:42.996545 +0000 UTC
Timestamp: 2025-10-14 15:57:08.024048 +0000 UTC
Count: 1
Sum: 122.359542
Min: 122.359542
Max: 122.359542
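As a quick sanity check on the two data points above, the seconds-based reading can be converted to milliseconds and compared with the ms-based reading. The sketch below is illustrative only: the helper names `seconds_to_millis` and `relative_diff` and the 1% tolerance are assumptions, not part of Router. The two instruments are separate measurements, so the values are expected to be close rather than identical.

```rust
/// Convert a reading expressed in seconds to milliseconds.
/// (Hypothetical helper for this check, not a Router API.)
fn seconds_to_millis(seconds: f64) -> f64 {
    seconds * 1000.0
}

/// Relative difference between two non-zero readings in the same unit.
fn relative_diff(a: f64, b: f64) -> f64 {
    (a - b).abs() / a.abs().max(b.abs())
}

fn main() {
    // Sum of http.client.request.duration from the collector log (Unit: s).
    let http_client_sum_s = 0.122261;
    // Sum of custom.metric.call.time from the collector log (Unit: ms).
    let custom_sum_ms = 122.359542;

    let http_client_sum_ms = seconds_to_millis(http_client_sum_s);
    let diff = relative_diff(http_client_sum_ms, custom_sum_ms);

    println!("http.client.request.duration in ms: {http_client_sum_ms}");
    println!("custom.metric.call.time in ms:      {custom_sum_ms}");
    println!("relative difference:                {diff}");

    // Assumed tolerance: the readings should agree to within 1%.
    assert!(diff < 0.01, "readings differ by more than 1%");
}
```

If the custom metric were still being emitted in seconds, the relative difference would be close to 1 and the final assertion would fail, which is the 1000x discrepancy described in the PR description.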

bnjjj · 2025-10-15T08:58:09Z

.changesets/feat_convertNonSecondTimeUnits.md

+telemetry:
+  instrumentation:
+    instruments:
+      router:
+        http.server.request.duration:
+          unit: "ms"  # Values are now automatically converted to milliseconds


This configuration is probably wrong: http.server.request.duration is not a custom metric, it's a built-in/OTel metric. Setting the unit would work for a custom instrument but not for built-in/OTel ones.

Good catch! Updated to an example that does work and is representative of actual intended use case.

I think it's still invalid. Let me give you a correct one

Thanks for the suggestion, I've used it

BrynCooke · 2025-10-15T09:27:41Z

apollo-router/src/plugins/telemetry/config_new/instruments.rs

+/// Defaults to seconds for any other unit string.
+fn duration_to_f64(duration: std::time::Duration, unit: &str) -> f64 {
+    match unit {
+        "ms" => duration.as_secs_f64() * 1000.0,


why not as_millis() as f64 for consistency with the other units?

Good 👀. I'm using as_secs_f64() * 1000 here because:

duration.as_millis() returns a u128 integer, which truncates any fractional milliseconds.

as_millis_f64() exists but it's currently unstable (Tracking Issue for Duration::as_millis_{f64,f32} rust-lang/rust#122451)
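To make the trade-off in this thread concrete, here is a minimal sketch comparing the two approaches. Only the "ms" arm and the fall-back-to-seconds behaviour are taken from the diff above; the "us" and "ns" arms and the `truncated_millis` helper are assumptions added for illustration and may not match the merged code.

```rust
use std::time::Duration;

/// Sketch of a unit-aware conversion. The "ms" arm and the seconds
/// fallback follow the diff; the "us" and "ns" arms are assumed.
fn duration_to_f64(duration: Duration, unit: &str) -> f64 {
    match unit {
        "ms" => duration.as_secs_f64() * 1_000.0,
        "us" => duration.as_secs_f64() * 1_000_000.0, // assumed arm
        "ns" => duration.as_secs_f64() * 1_000_000_000.0, // assumed arm
        // Defaults to seconds for any other unit string.
        _ => duration.as_secs_f64(),
    }
}

/// The alternative raised in review: `as_millis()` returns a u128,
/// so any fractional milliseconds are dropped before the cast.
fn truncated_millis(duration: Duration) -> f64 {
    duration.as_millis() as f64
}

fn main() {
    // 122.359 ms, similar in size to the value in the system test above.
    let d = Duration::from_micros(122_359);

    println!("as_millis() as f64:     {}", truncated_millis(d));
    println!("as_secs_f64() * 1000.0: {}", duration_to_f64(d, "ms"));
}
```

With the integer-based approach, any duration under one millisecond (for example 999 microseconds) would be recorded as 0, which is why the floating-point path is preferable for histograms of fast calls.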

bnjjj · 2025-10-15T15:14:49Z

.changesets/feat_convertNonSecondTimeUnits.md

I think it's still invalid. Let me give you a correct one

bnjjj · 2025-10-15T15:37:09Z

@Mergifyio copy dev
@theJC Thanks so much for opening this! Now that it looks like approval is on the horizon, we're going to move this PR over to a direct branch on the repository so the full CI run can happen, including having access to our GITHUB_TOKEN which allows us to go over the GitHub anonymous download rate-limits which aren't currently being permitted on your PR.
You will briefly see a new PR show up in the metadata here, and it will preserve your contribution credit!

mergify · 2025-10-15T15:37:17Z

copy dev

✅ Pull request copies have been created

Details

#8423 Provide unit conversion for common non-second duration instruments (TSH-20621) (copy #8415) has been created for branch dev

…SH-20621) (copy #8415) (#8423) Signed-off-by: Benjamin <5719034+bnjjj@users.noreply.github.com> Co-authored-by: Jon Christiansen <467023+theJC@users.noreply.github.com>

theJC requested a review from a team October 14, 2025 04:48

bnjjj reviewed Oct 14, 2025

theJC changed the title ~~Provide unit conversion for common non-second duration instruments~~ Provide unit conversion for common non-second duration instruments (TSH-20621) Oct 14, 2025

theJC force-pushed the convertNonSecondTimeUnits branch from 7515b9d to 0a3f1cc Compare October 14, 2025 16:11

theJC requested a review from a team as a code owner October 14, 2025 17:47

bnjjj suggested changes Oct 15, 2025

BrynCooke reviewed Oct 15, 2025

theJC added 7 commits October 15, 2025 10:04

Provide unit conversion for common non-second duration instruments

4bb451f

Add changeset

70e5df5

Update documentation

093de82

Update doc to incorporate suggested changes from Apollo librarian (AI…

6486219

…-generated suggestions)

Apply lint formatting fixes

0186eb9

Update doc to incorporate suggested changes from Apollo librarian (AI…

3e997f1

…-generated suggestions)

Update changeset content per MR feedback

2a688cc

theJC force-pushed the convertNonSecondTimeUnits branch from 4e47c08 to 2a688cc Compare October 15, 2025 15:04

theJC requested review from BrynCooke and bnjjj October 15, 2025 15:07

bnjjj suggested changes Oct 15, 2025

Update .changesets/feat_convertNonSecondTimeUnits.md

e1951ce

Co-authored-by: Coenen Benjamin <benjamin.coenen@hotmail.com>

bnjjj approved these changes Oct 15, 2025

mergify bot mentioned this pull request Oct 15, 2025

Provide unit conversion for common non-second duration instruments (TSH-20621) (copy #8415) #8423

Merged

10 tasks

bnjjj closed this Oct 15, 2025

abernix mentioned this pull request Oct 27, 2025

prep release: v2.8.0 #8495

Merged

theJC deleted the convertNonSecondTimeUnits branch December 8, 2025 18:13
