✅ Docs preview ready
The preview is ready to be viewed.
File Changes: 4 new, 9 changed, 9 removed
Build ID: db0ec3998b801bf109974510
URL: https://www.apollographql.com/docs/deploy-preview/db0ec3998b801bf109974510
cddaf34 to 6bf8c79
added 7 commits
October 20, 2025 11:59
…Value
This refactors the increment handling in the telemetry system to use opentelemetry::Value (F64) instead of i64. This provides better type compatibility with the OpenTelemetry specification and allows for more flexible metric value handling.
Technical changes:
- Updated Increment enum variants to use Option<opentelemetry::Value>
- Modified value conversion functions to work with opentelemetry::Value
- Updated all increment operations in counters and histograms
Adds a new metric to track router processing overhead, which measures the time spent in the router that is not waiting for subgraph requests. This metric provides insight into the router's own processing time and helps identify performance bottlenecks within the router itself.
Technical changes:
- Added router_overhead instrument configuration
- Integrated RouterOverheadAttributes for metric attributes
- Wired up metric collection in router request/response lifecycle
6bf8c79 to a69f46d
d7eb2a6 to 82af50e
bnjjj
reviewed
Oct 20, 2025
    .f64_histogram(ROUTER_OVERHEAD_METRIC)
    .with_unit("s")
    .with_description(
        "Router processing overhead (time not spent waiting for subgraphs).",
Contributor
Should we document somewhere that, if they have a coprocessor, its time is included in this metric?
Contributor
BTW, if #8389 lands, we'll also be able to enable this layer for coprocessors, I think.
6aa1dab to 1b7b555
bnjjj
approved these changes
Oct 21, 2025
Merged
This was referenced Jan 30, 2026
Router overhead metric
The apollo.router.overhead histogram provides a direct measurement of router processing overhead. This metric tracks the time the router spends on tasks other than waiting for subgraph requests, including GraphQL parsing, validation, query planning, response composition, and plugin execution.
The overhead calculation excludes time spent waiting for subgraph responses, giving you visibility into the router's actual processing time versus subgraph latency. This metric helps identify when the router itself is a bottleneck versus when delays are caused by downstream services.
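To make the idea concrete, here is a minimal sketch of one way such an overhead value could be derived. It is not the router's implementation: the `Interval` type and the `router_overhead` and `subgraph_active_at` functions are hypothetical names introduced for illustration, and the sketch assumes that overlapping (parallel) subgraph calls count as waiting time only once.

```rust
use std::time::Duration;

/// Hypothetical half-open interval [start, end), measured from the start of
/// the router request. Not a type from the router codebase.
#[derive(Clone, Copy, Debug)]
pub struct Interval {
    pub start: Duration,
    pub end: Duration,
}

/// Returns the part of `total` during which no subgraph request was in flight.
/// Assumption: parallel subgraph calls that overlap are only counted once.
pub fn router_overhead(total: Duration, subgraph_calls: &[Interval]) -> Duration {
    // Clamp every call to the request window and drop empty intervals.
    let mut calls: Vec<Interval> = subgraph_calls
        .iter()
        .map(|c| Interval {
            start: c.start.min(total),
            end: c.end.min(total),
        })
        .filter(|c| c.end > c.start)
        .collect();
    calls.sort_by_key(|c| c.start);

    // Sweep over the sorted calls, summing the length of their union.
    let mut waiting = Duration::ZERO;
    let mut covered_until = Duration::ZERO;
    for call in calls {
        let start = call.start.max(covered_until);
        if call.end > start {
            waiting += call.end - start;
            covered_until = call.end;
        }
    }
    total.saturating_sub(waiting)
}

/// Returns true if any subgraph call is in flight at instant `at`. This
/// mirrors the idea behind a "subgraph requests active" flag.
pub fn subgraph_active_at(at: Duration, subgraph_calls: &[Interval]) -> bool {
    subgraph_calls.iter().any(|c| c.start <= at && at < c.end)
}
```

For example, a 100 ms request with subgraph calls covering 10 to 40 ms, 30 to 60 ms, and 80 to 90 ms spends 60 ms waiting on subgraphs (the first two calls overlap), which leaves 40 ms of overhead under these assumptions.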
Important considerations for router overhead metrics:
Version variability: Router overhead may vary between router versions. For example, a correctness fix or security improvement may result in higher overhead. Always compare overhead measurements within the same router version.
Configuration requirements: For meaningful overhead measurements, configure operation limits and traffic shaping. Without these controls, unbounded request complexity or traffic spikes can skew overhead measurements.
CPU saturation: High overhead values often indicate CPU saturation. When the router's CPU resources are exhausted, processing time increases significantly. Monitor CPU utilization alongside overhead metrics to identify resource constraints.
Available attributes:
subgraph.active_requests: A boolean indicating whether any subgraph requests were active at the time the overhead was calculated. This attribute is critical for filtering meaningful overhead measurements.
When to filter out subgraph.active_requests: true: For operations that stream results (such as queries with @defer), the overhead metric becomes less meaningful when subgraph requests are still active, since the router is in a waiting state rather than actively processing. When analyzing overhead to identify router processing bottlenecks, exclude measurements where subgraph.active_requests: true to focus only on pure router processing time without subgraph wait time interference.
Configuration example:
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.
Configuration is an important part of many changes. Where applicable please try to document configuration examples.
A lot of (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best-practices.
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.