Skip to content

FEATURE: Add the ability to rename metrics (TSH-20610)#8412

Closed
theJC wants to merge 8 commits intoapollographql:devfrom
theJC:metricRenameSupport
Closed

FEATURE: Add the ability to rename metrics (TSH-20610)#8412
theJC wants to merge 8 commits intoapollographql:devfrom
theJC:metricRenameSupport

Conversation

@theJC
Copy link
Contributor

@theJC theJC commented Oct 13, 2025

The need

Many Apollo customers use observability platforms (like Datadog). While OTLP semantic naming is idealistically useful for consistent metrics naming across systems, it does create practical cost management issues. Datadog, for instance, only allows tag indexing to be enabled/disabled by metric name, not by the service emitting the metric. This means customers who wish to have the ability to control the costs for tag ingestion specific by service require the need to rename Router-emitted metrics to isolate them to be able to have the necessary cost insights and controls per emanating service.

For example, for 'http.server.request.duration', the tags needed for Router monitoring can significantly differ from those for other services emitting the same metric. For one service the cardinality of a given tag may be tolerable to budget for, whereas the cardinality of the same tag by another service may not be.

Our previous federated GraphQL solution, Apollo Gateway, we prefixed all metrics with the service name, ensuring uniqueness and provided us direct control over the cost incurred by controlling the indexing on its tags isolated from other services.

The implementation

The logic added to production code is quite minimal; the bulk of this PR involves testing to ensure it works as intended and only affects customer metrics, not Apollo Studio metrics. This change uses the Rust opentelemetry crate's ability to specify instrument names via views, exposing this through a 'rename' directive in the Router config YAML.


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • PR description explains the motivation for the change and relevant context for reviewing
  • PR description links appropriate GitHub/Jira tickets (creating when necessary)
  • Changeset is included for user-facing changes
  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Metrics and logs are added3 and documented
  • Tests added and passing4
    • Unit tests
    • Integration tests
    • Manual tests, as necessary

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. A lot of (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best-practices.

  4. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@theJC theJC requested a review from a team October 13, 2025 15:16
@theJC theJC requested review from a team as code owners October 13, 2025 15:16
@theJC theJC changed the title FEATURE: Add the ability to rename metrics FEATURE: Add the ability to rename metrics (TSH-20610) Oct 13, 2025
@theJC theJC force-pushed the metricRenameSupport branch 2 times, most recently from e568b4b to bba0c7b Compare October 13, 2025 21:38
Copy link
Contributor

@bnjjj bnjjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just needs to apply some changes in docs


<Note>

**Important naming considerations:**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@theJC It doesn't appear in the PR (I don't know why) but I think it's good if you can follow the suggestions here made by our AI tool

Copy link
Contributor Author

@theJC theJC Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

- Dots (`.`) are converted to underscores (`_`)
- Unit suffixes are added (e.g., `_seconds`, `_bytes`)
- Example: `apollo.router.http.duration` becomes `apollo_router_http_duration_seconds`
- **Apollo Studio metrics** are not affected by renaming—they continue to use original metric names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

@@ -165,7 +220,7 @@ telemetry:
| `service_name` | `unknown_service:router` | The OpenTelemetry service name. |
| `service_namespace` | | The OpenTelemetry namespace. |
| `resource` | | The OpenTelemetry resource to attach to metrics. |
| `views` | | Override default buckets or configuration for metrics (including dropping the metric itself) |
| `views` | | Customize metrics using OpenTelemetry views: rename metrics, override default buckets or attributes, or drop metrics entirely. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

Copy link
Contributor Author

@theJC theJC Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This AI assistant is killing me, every time I correct something, the next run comes back with completely different suggestions. Hopefully this is good enough.

Actually in the last run, its suggestions are exactly the text I already have in there.... seems like that bot is having some sort of issue.

@theJC theJC force-pushed the metricRenameSupport branch from fba3994 to c1b4e23 Compare October 14, 2025 17:31
@theJC theJC force-pushed the metricRenameSupport branch from ff67902 to 033b358 Compare October 15, 2025 15:03
@bnjjj
Copy link
Contributor

bnjjj commented Oct 15, 2025

@Mergifyio copy dev
@theJC Thanks so much for opening this! Now that it looks like approval is on the horizon, we're going to move this PR over to a direct branch on the repository so the full CI run can happen, including having access to our GITHUB_TOKEN which allows us to go over the GitHub anonymous download rate-limits which aren't currently being permitted on your PR.
You will briefly see a new PR show up in the metadata here, and it will preserve your contribution credit!

@mergify
Copy link
Contributor

mergify bot commented Oct 15, 2025

copy dev

✅ Pull request copies have been created

Details

@bnjjj bnjjj closed this Oct 15, 2025
bnjjj pushed a commit that referenced this pull request Oct 16, 2025
…8424)

Signed-off-by: Benjamin <5719034+bnjjj@users.noreply.github.com>
Co-authored-by: Jon Christiansen <467023+theJC@users.noreply.github.com>
@abernix abernix mentioned this pull request Oct 27, 2025
@theJC theJC deleted the metricRenameSupport branch December 8, 2025 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants