feat: memory tracking metrics by rohan-b99 · Pull Request #8717 · apollographql/router

rohan-b99 · 2025-12-04T17:01:42Z

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

The majority of this work comes from #8525; this PR adds some extra tests and a few more places where memory tracking is applied.

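Performance testing with vegeta (2,500 requests at roughly 500 requests/second against each build) shows the changes have negligible impact. The first report is the `dev` baseline and the second is this branch: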

% cat dev/perf.59242.vegeta | vegeta report                                  
Requests      [total, rate, throughput]         2500, 500.20, 496.03
Duration      [total, attack, wait]             5.04s, 4.998s, 41.965ms
Latencies     [min, mean, 50, 90, 95, 99, max]  3.481ms, 12.243ms, 6.396ms, 19.94ms, 29.15ms, 135.015ms, 311.06ms
Bytes In      [total, mean]                     6248410, 2499.36
Bytes Out     [total, mean]                     14312750, 5725.10
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:2500

% cat memory-tracking-metrics/perf.64046.vegeta | vegeta report 
Requests      [total, rate, throughput]         2500, 500.14, 497.60
Duration      [total, attack, wait]             5.024s, 4.999s, 25.518ms
Latencies     [min, mean, 50, 90, 95, 99, max]  3.368ms, 11.976ms, 5.898ms, 20.586ms, 29.021ms, 122.014ms, 314.001ms
Bytes In      [total, mean]                     10612414, 4244.97
Bytes Out     [total, mean]                     14312750, 5725.10
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:2500
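To put a number on "negligible", here is a small sketch that computes the relative change per metric. It treats the first report (read from `dev/`) as the baseline and the second (read from `memory-tracking-metrics/`) as this branch. The values are copied from the two reports. `pct_change`, `ROWS` and `print_comparison` are illustrative names and are not part of the router.

```rust
/// Relative change from the baseline to the branch, in percent.
fn pct_change(baseline: f64, branch: f64) -> f64 {
    (branch - baseline) / baseline * 100.0
}

/// (metric, dev/ report, memory-tracking-metrics/ report), copied from the vegeta output above.
const ROWS: [(&str, f64, f64); 7] = [
    ("throughput (req/s)", 496.03, 497.60),
    ("mean latency (ms)", 12.243, 11.976),
    ("p50 latency (ms)", 6.396, 5.898),
    ("p90 latency (ms)", 19.94, 20.586),
    ("p95 latency (ms)", 29.15, 29.021),
    ("p99 latency (ms)", 135.015, 122.014),
    ("max latency (ms)", 311.06, 314.001),
];

/// Prints one line per metric with the signed percentage change.
fn print_comparison() {
    for (metric, dev, branch) in ROWS {
        println!("{metric:<20} {dev:>9.3} -> {branch:>9.3}  ({:+.1}%)", pct_change(dev, branch));
    }
}
```

With these inputs, mean latency changes by about -2.2%, p50 by -7.8%, p90 by +3.2%, p99 by -9.6% and max by +0.9%, while throughput moves by +0.3%. The deltas point in both directions over a single 5-second run. That pattern looks more like run-to-run variation than a systematic cost, which is consistent with the claim above. Repeated or longer runs would be needed to say more.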

Exceptions

Note any exceptions here

Notes

It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.
Configuration is an important part of many changes. Where applicable, please try to document configuration examples.
Many (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best practices.
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

This adds allocation tracking utilities so that we can get a handle on how much a single request is allocating.

Add allocation metrics for the query planner and requests

apollo-librarian · 2025-12-04T17:01:58Z

✅ Docs preview ready

The preview is ready to be viewed.

File Changes

0 new, 1 changed, 0 removed

* graphos/routing/(latest)/observability/router-telemetry-otel/enabling-telemetry/standard-instruments.mdx

Build ID: eeafb8119b1eea6f90cf00bd
Build Logs: View logs

URL: https://www.apollographql.com/docs/deploy-preview/eeafb8119b1eea6f90cf00bd

aaronArinder

super excited for this

aaronArinder · 2026-01-09T17:07:39Z

apollo-router/src/plugins/telemetry/metrics/allocation/mod.rs

+            // Verify metrics were recorded
+            // Note: We can't easily assert on histogram values, but the test verifies
+            // the layer compiles and runs without errors


is it hard to assert because the values might be wildly different, or for some other reason? it'd be great to have proof that the metrics emitted are what folks think they'll be (i.e., scoped to the request, to query planning, etc.)

rohan-b99 · 2026-01-12

Thanks for catching this; I've added some `assert_histogram_sum!` calls here instead.

aaronArinder · 2026-01-09T17:09:28Z

apollo-router/src/allocator.rs

+// Thread-local to track the current task's allocation stats.
+//
+// ## Why Cell<Option<NonNull<T>>> instead of Cell<Option<Arc<T>>> or Mutex<Option<Arc<T>>>?
+//
+// We use a NonNull pointer instead of Arc because:
+//
+// 1. **Cell requires Copy**: Cell::get() requires T: Copy, but Arc<T> is not Copy
+//    because it has a Drop implementation for reference counting.
+//
+// 2. **TLS destructors conflict with global allocators**: If we stored Option<Arc<T>>
+//    in the thread-local, its Drop implementation would run when the thread exits.
+//    This Drop could call the allocator (to deallocate the Arc), causing a fatal
+//    reentrancy error: "the global allocator may not use TLS with destructors".
+//
+// 3. **Cell is faster than Mutex**: Cell has zero overhead (just a memory read/write),
+//    while Mutex requires atomic operations and potential thread parking. Since we
+//    access this on every allocation, performance is critical.
+//
+// ## Safety invariants:
+//
+// - The NonNull pointer is only valid while a MemoryTrackedFuture holding the corresponding
+//   Arc is on the call stack (either in poll() or with_memory_tracking()).
+// - We manually manage Arc reference counts when propagating across tasks.
+// - The pointer always points to valid AllocationStats when Some.


🙏 so, so nice
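The safety invariants above tie the raw pointer's validity to a future that is currently being polled. Below is a minimal sketch of that pattern. It uses hypothetical names (`TaskStats`, `Tracked`, `record_alloc`) rather than the router's actual `MemoryTrackedFuture`. Each `poll` installs the task's stats pointer in a destructor-free thread-local and restores the previous value afterwards, so work done between polls is not attributed to the task.

For simplicity, the sketch makes some assumptions:

- `record_alloc` is called by hand instead of from a global allocator.
- The inner future is boxed to avoid pin projection.
- The cross-task reference-count handling mentioned above is not shown.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Hypothetical per-task counters (a stand-in for the PR's AllocationStats).
#[derive(Default)]
pub struct TaskStats {
    pub bytes: AtomicUsize,
}

thread_local! {
    // Copy + no Drop: no TLS destructor is registered for this slot.
    static CURRENT: Cell<Option<NonNull<TaskStats>>> = const { Cell::new(None) };
}

/// Hypothetical hook; in the PR this job is done inside the global allocator.
pub fn record_alloc(bytes: usize) {
    let _ = CURRENT.try_with(|cur| {
        if let Some(stats) = cur.get() {
            // SAFETY: CURRENT is Some only while Tracked::poll keeps the Arc alive.
            unsafe { stats.as_ref() }.bytes.fetch_add(bytes, Ordering::Relaxed);
        }
    });
}

/// Wraps a future and makes its stats "current" for the duration of each poll.
pub struct Tracked<F> {
    inner: Pin<Box<F>>,
    stats: Arc<TaskStats>,
}

impl<F: Future> Tracked<F> {
    pub fn new(inner: F, stats: Arc<TaskStats>) -> Self {
        Tracked { inner: Box::pin(inner), stats }
    }
}

impl<F: Future> Future for Tracked<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        // Tracked is Unpin because the inner future is boxed, so get_mut is allowed.
        let this = self.get_mut();
        let ptr = NonNull::from(&*this.stats);
        let prev = CURRENT.with(|cur| cur.replace(Some(ptr)));
        let result = this.inner.as_mut().poll(cx);
        // Restore whatever was current before (e.g. an enclosing tracked task).
        CURRENT.with(|cur| cur.set(prev));
        result
    }
}

/// Test helper: returns Pending once, then completes.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Minimal single-threaded executor that busy-polls with a no-op waker (std only).
fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: the vtable functions do nothing and never read the data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

In this sketch, the `record_alloc(100)` before the yield and the `record_alloc(50)` after it are both charged to the task. Calls made outside `block_on` are ignored.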

aaronArinder · 2026-01-09T17:10:20Z

apollo-router/src/allocator.rs

+/// If a parent context exists, creates a child context that tracks to the parent.
+/// If no parent exists, creates a new root context with the given name.
+/// This is useful for tracking allocations in synchronous code or threads.
+#[allow(dead_code)]


still dead? assuming so; also, same question but for the other allow(dead_code)s

rohan-b99 · 2026-01-12

I've removed a couple of unused functions, but had to add some more feature gates so that `cargo xtask lint` would pass.
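The doc comment under discussion describes child contexts that "track to" a parent. Below is a sketch of one way to model that. It uses hypothetical names (`Stats`, `child_or_root`, `record`) and assumes a child simply forwards every recorded byte count to each ancestor. Under that assumption, a parent's total covers its own work plus all of its children's. This PR page does not say whether the router nests its query-planner scope under the request scope in this way.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical allocation-stats node: a named counter with an optional parent.
pub struct Stats {
    pub name: &'static str,
    bytes_allocated: AtomicUsize,
    parent: Option<Arc<Stats>>,
}

impl Stats {
    /// Child of `parent` when one exists, otherwise a new root; both take `name`.
    pub fn child_or_root(parent: Option<&Arc<Stats>>, name: &'static str) -> Arc<Stats> {
        Arc::new(Stats {
            name,
            bytes_allocated: AtomicUsize::new(0),
            parent: parent.cloned(),
        })
    }

    /// Adds `bytes` to this node and to every ancestor up to the root.
    pub fn record(&self, bytes: usize) {
        let mut node = Some(self);
        while let Some(stats) = node {
            stats.bytes_allocated.fetch_add(bytes, Ordering::Relaxed);
            node = stats.parent.as_deref();
        }
    }

    pub fn bytes_allocated(&self) -> usize {
        self.bytes_allocated.load(Ordering::Relaxed)
    }
}
```

Forwarding on every call costs one atomic add per ancestor. An alternative is to fold a child's total into its parent only when the child is dropped. That trades immediacy for fewer atomic operations on the allocation hot path.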

aaronArinder · 2026-01-09T17:11:27Z

apollo-router/src/allocator.rs

+// on top. The tracking uses thread-locals with raw pointers to avoid TLS destructor
+// issues (see CURRENT_TASK_STATS documentation above).
+#[cfg(all(feature = "global-allocator", not(feature = "dhat-heap"), unix))]
+unsafe impl GlobalAlloc for CustomAllocator {


never a scarier line existed
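For readers wondering what an `unsafe impl GlobalAlloc` involves, here is a minimal sketch of a counting allocator. It delegates to `std::alloc::System` and attributes allocations to a per-thread measurement scope. The names (`CountingAllocator`, `measure`) are hypothetical. This is not the router's `CustomAllocator`, which is gated above on the `global-allocator` feature, `not(feature = "dhat-heap")` and `unix`.

The sketch follows the constraint described in the thread-local comment earlier: the thread-local holds a `Cell` of a `Copy` type with no `Drop`, so no TLS destructor is registered. It also assumes a target with native thread-local storage, such as Linux or macOS. Where thread-locals are allocated lazily, touching one from inside the allocator could recurse. Only allocations are counted here; the PR's metrics also track deallocations.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

/// Hypothetical counting allocator: forwards to `System` and adds each
/// allocation's size to the current thread's measurement scope, if any.
pub struct CountingAllocator;

thread_local! {
    // `None` means "not measuring". Cell<Option<usize>> is Copy with no Drop, so
    // no TLS destructor is registered for this slot.
    static SCOPE_BYTES: Cell<Option<usize>> = const { Cell::new(None) };
}

fn note_alloc(size: usize) {
    // try_with returns Err instead of panicking if the slot is unavailable.
    let _ = SCOPE_BYTES.try_with(|scope| {
        if let Some(total) = scope.get() {
            scope.set(Some(total + size));
        }
    });
}

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            note_alloc(layout.size());
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // alloc_zeroed and realloc keep their default implementations, which call
    // `alloc`/`dealloc` above, so zeroed and resized allocations are counted too.
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Runs `f`, returning its result and the bytes allocated on this thread meanwhile.
/// A nested `measure` also adds its total to the enclosing scope.
pub fn measure<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let outer = SCOPE_BYTES.with(|scope| scope.replace(Some(0)));
    let result = f();
    let inner = SCOPE_BYTES.with(|scope| scope.get()).unwrap_or(0);
    SCOPE_BYTES.with(|scope| scope.set(outer.map(|total| total + inner)));
    (result, inner)
}
```

Because `realloc` is left at its default, a growing `Vec` is counted as a fresh allocation of the new size each time. An implementation that forwards `realloc` to `System` directly would need to count the size difference itself.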

carodewig · 2026-01-13T21:17:45Z

.changesets/feat_memory_tracking_metrics.md

@@ -0,0 +1,5 @@
+### Implement memory tracking metrics for requests ([PR #8717](https://github.com/apollographql/router/pull/8717))
+
+Adds the `apollo.router.request.memory` and `apollo.router.query_planner.memory` metrics which track allocations/deallocations during the request lifecycle.


Could this include some more context about how this is done? If I were reading this without context, I wouldn't be aware of the fact that this required a custom allocator.

bryn and others added 9 commits December 4, 2025 11:16

(feat) Allocation tracking

f99a067

This adds allocation tracking utilities so that we can get a handle on how much a single request is allocating.

(feat) Allocation tracking

27ea90a

This adds allocation tracking utilities so that we can get a handle on how much a single request is allocating.

(feat) Allocation tracking

2dbe668

Add allocation metrics for the query planner and requests

Add attributes to remove warnings and test for dealloc/realloc/zeroing

6cff27e

Add some extra with_memory_tracking calls

b345105

Remove extra tracking/debug code

33ef661

format/lint

76407a8

Remove extra compute job tracking

9fefc80

Update assertion as meter provider is now added by default

31f246b

rohan-b99 requested a review from a team December 4, 2025 17:01

Add changeset and docs

e5e73ba

rohan-b99 requested a review from a team as a code owner December 4, 2025 17:36

rohan-b99 added 4 commits December 4, 2025 17:37

Merge branch 'dev' into memory-tracking-metrics

4adef8c

Add conditional compile only for unix

a6eb2c3

Merge branch 'memory-tracking-metrics' of https://github.com/apollographql/router into memory-tracking-metrics

76eb0c1

Flag imports

9ee5064

rohan-b99 changed the title ~~Memory tracking metrics~~ feat: memory tracking metrics Dec 5, 2025

rohan-b99 added 4 commits December 5, 2025 17:44

Merge branch 'dev' into memory-tracking-metrics

a7445ad

Merge branch 'dev' into memory-tracking-metrics

104bdc5

Merge branch 'dev' into memory-tracking-metrics

a9ef97c

Merge branch 'dev' into memory-tracking-metrics

aa52529

aaronArinder approved these changes Jan 9, 2026

rohan-b99 added 5 commits January 12, 2026 09:40

Assert on histogram values in test_allocation_metrics_layer

bdb823d

Remove dead code/unused functions

fc20934

Make histogram test work across platforms

93dbdc8

Clippy

14f3c98

Feature gate more fields

d4b0f0a

carodewig reviewed Jan 13, 2026

rohan-b99 added 2 commits January 14, 2026 10:57

Improve changelog

03104d2

Merge branch 'dev' into memory-tracking-metrics

20e44b8

rohan-b99 merged commit 2dd451c into dev Jan 14, 2026
15 checks passed

rohan-b99 deleted the memory-tracking-metrics branch January 14, 2026 13:45

abernix mentioned this pull request Jan 27, 2026

prep release: v2.11.0 #8835

Merged

rohan-b99 merged 25 commits into dev from memory-tracking-metrics

rohan-b99 commented Dec 4, 2025 • edited by atlassian bot

3 participants
