FEATURE: Add the ability to rename metrics (TSH-20610) #8412
theJC wants to merge 8 commits into apollographql:dev
e568b4b to bba0c7b
bnjjj left a comment:
LGTM, just needs some changes applied to the docs.
<Note>

**Important naming considerations:**
@theJC It doesn't appear in the PR (I don't know why), but I think it would be good if you could follow the suggestions our AI tool made here.
- Dots (`.`) are converted to underscores (`_`)
- Unit suffixes are added (e.g., `_seconds`, `_bytes`)
- Example: `apollo.router.http.duration` becomes `apollo_router_http_duration_seconds`
- **Apollo Studio metrics** are not affected by renaming; they continue to use the original metric names
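To make the conversion above concrete, here is a small Rust sketch under simplifying assumptions. `to_prometheus_name` and `unit_suffix` are hypothetical helpers, not the exporter's real code, and the unit-to-suffix table covers only a few OpenTelemetry unit strings (`s`, `ms`, `By`); a real Prometheus exporter applies more rules (for example, suffixes for counters) than shown here.

```rust
/// Assumed mapping from OpenTelemetry (UCUM-style) unit strings to
/// Prometheus name suffixes. Only a few units are covered in this sketch.
fn unit_suffix(unit: &str) -> Option<&'static str> {
    match unit {
        "s" => Some("seconds"),
        "ms" => Some("milliseconds"),
        "By" => Some("bytes"),
        _ => None,
    }
}

/// Hypothetical, simplified version of the Prometheus name conversion:
/// replace dots (and any other non-alphanumeric character) with underscores,
/// then append the unit suffix if the name does not already end with it.
fn to_prometheus_name(otel_name: &str, unit: &str) -> String {
    let mut name: String = otel_name
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
        .collect();

    if let Some(suffix) = unit_suffix(unit) {
        let tail = format!("_{suffix}");
        if !name.ends_with(&tail) {
            name.push_str(&tail);
        }
    }
    name
}

fn main() {
    // The example from the list above.
    let converted = to_prometheus_name("apollo.router.http.duration", "s");
    println!("{converted}");
}
```

In this model a renamed metric goes through the same conversion as any other, so the name a user picks in a rename is not necessarily the exact name that appears in Prometheus.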
@@ -165,7 +220,7 @@ telemetry:
| `service_name` | `unknown_service:router` | The OpenTelemetry service name. |
| `service_namespace` | | The OpenTelemetry namespace. |
| `resource` | | The OpenTelemetry resource to attach to metrics. |
-| `views` | | Override default buckets or configuration for metrics (including dropping the metric itself) |
+| `views` | | Customize metrics using OpenTelemetry views: rename metrics, override default buckets or attributes, or drop metrics entirely. |
This AI assistant is killing me: every time I correct something, the next run comes back with completely different suggestions. Hopefully this is good enough.
Actually, in the last run its suggestions were exactly the text I already have in there... it seems like that bot is having some sort of issue.
fba3994 to c1b4e23
…-generated suggestions)
ff67902 to 033b358
@Mergifyio copy dev
✅ Pull request copies have been created
The need
Many Apollo customers use observability platforms such as Datadog. While OpenTelemetry semantic naming is useful in principle for consistent metric naming across systems, it creates practical cost-management problems. Datadog, for instance, only allows tag indexing to be enabled or disabled by metric name, not by the service emitting the metric. As a result, customers who want to control tag-ingestion costs per service need to rename Router-emitted metrics so that those metrics are isolated, giving them cost visibility and control for each emitting service.
For example, for 'http.server.request.duration', the tags needed for Router monitoring can differ significantly from those needed by other services emitting the same metric. A tag's cardinality may be affordable for one service but not for another.
With our previous federated GraphQL solution, Apollo Gateway, we prefixed all metrics with the service name. This guaranteed unique metric names and gave us direct control over the cost of tag indexing, isolated from other services.
The implementation
The logic added to production code is minimal; most of this PR is tests verifying that renaming works as intended and affects only customer metrics, not Apollo Studio metrics. The change uses the Rust opentelemetry crate's ability to set instrument names through views and exposes it as a 'rename' directive in the Router YAML configuration.
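The core idea can be modeled without the real crate. The std-only Rust sketch below is a simplification under stated assumptions: `RenameView`, `Pipeline`, `exported_name`, and the target name `router.http.server.request.duration` are all hypothetical, and this is neither the opentelemetry crate's view API nor the Router's actual code. It only illustrates the property the tests check: a rename attached to the customer-facing pipeline changes the exported name there, while a pipeline without that view (standing in for Apollo Studio) keeps the original name.

```rust
/// Hypothetical model of a rename "view": when an instrument with a
/// matching name is exported, the stream uses `new_name` instead.
struct RenameView {
    instrument_name: String,
    new_name: String,
}

/// Hypothetical metrics pipeline with its own list of views.
struct Pipeline {
    views: Vec<RenameView>,
}

impl Pipeline {
    /// Returns the name under which this pipeline exports an instrument:
    /// the first matching view's new name, or the original name otherwise.
    fn exported_name<'a>(&'a self, instrument_name: &'a str) -> &'a str {
        self.views
            .iter()
            .find(|v| v.instrument_name == instrument_name)
            .map(|v| v.new_name.as_str())
            .unwrap_or(instrument_name)
    }
}

fn main() {
    // Customer-facing pipeline, configured with a (hypothetical) rename.
    let customer = Pipeline {
        views: vec![RenameView {
            instrument_name: "http.server.request.duration".to_string(),
            new_name: "router.http.server.request.duration".to_string(),
        }],
    };
    // Stand-in for the Apollo Studio pipeline: no user views applied.
    let studio = Pipeline { views: Vec::new() };

    let name = "http.server.request.duration";
    println!("customer: {}", customer.exported_name(name));
    println!("studio:   {}", studio.exported_name(name));
}
```

In this model the instrument itself keeps its original name, and only the pipeline's views decide the exported name, which is why one instrument can appear under two names in two exports.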
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.
Configuration is an important part of many changes. Where applicable, please try to document configuration examples.
Many (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best practices.
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.