feat: add memory limit option for cooperative cancellation #8808

Merged
rohan-b99 merged 23 commits into dev from
rohan-b99/cooperative-cancellation-memory-limit
Jan 22, 2026

Contributor

@rohan-b99 rohan-b99 commented Jan 14, 2026

Adds a memory_limit option to the experimental_cooperative_cancellation configuration that caps how much memory query planning operations may allocate. When the limit is exceeded during query planning, the router will:

  • In enforce mode: Cancel the query planning task and return an error to the client
  • In measure mode: Record the cancellation outcome in metrics but allow the query planning to complete

In both modes, the query is logged in a warn-level message.

The memory limit works alongside the existing timeout option; whichever limit is reached first triggers cancellation. This helps prevent complex queries from consuming excessive memory during query planning.
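The interaction between the two limits and the two modes can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the router's implementation: `Mode`, `Exceeded`, `Limits`, `Outcome`, `first_exceeded`, and `outcome` are hypothetical names, and the samples stand in for whatever measurements the router takes while planning runs.

```rust
use std::time::Duration;

/// Hypothetical mirror of the `mode` setting.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Enforce,
    Measure,
}

/// Which limit was crossed first.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Exceeded {
    Memory,
    Timeout,
}

/// Hypothetical limits; `None` means the option is not configured.
struct Limits {
    memory_limit: Option<u64>, // bytes
    timeout: Option<Duration>,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Completed,
    Cancelled(Exceeded),    // enforce: planning is cancelled, client gets an error
    RecordedOnly(Exceeded), // measure: outcome goes to metrics, planning finishes
}

/// Walks time-ordered `(elapsed, allocated_bytes)` samples and reports the
/// first limit crossed. If both are crossed in the same sample, this sketch
/// arbitrarily reports memory first.
fn first_exceeded(limits: &Limits, samples: &[(Duration, u64)]) -> Option<Exceeded> {
    for &(elapsed, allocated) in samples {
        if limits.memory_limit.map_or(false, |max| allocated > max) {
            return Some(Exceeded::Memory);
        }
        if limits.timeout.map_or(false, |max| elapsed > max) {
            return Some(Exceeded::Timeout);
        }
    }
    None
}

/// Maps the mode and the first crossed limit (if any) to an outcome.
fn outcome(mode: Mode, exceeded: Option<Exceeded>) -> Outcome {
    match (mode, exceeded) {
        (_, None) => Outcome::Completed,
        (Mode::Enforce, Some(which)) => Outcome::Cancelled(which),
        (Mode::Measure, Some(which)) => Outcome::RecordedOnly(which),
    }
}
```

With a 5s timeout and a 50,000,000-byte limit, a sample that crosses the memory limit at 2s yields `Exceeded::Memory` even if a later sample also passes the timeout.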

Platform requirements: This feature is only available on Unix platforms when the global-allocator feature is enabled and dhat-heap is not enabled (same requirements as memory tracking metrics).

Example configuration:

supergraph:
  query_planning:
    experimental_cooperative_cancellation:
      enabled: true
      mode: enforce  # or "measure" to only record metrics
      memory_limit: 50mb  # Supports formats like "50mb", "1gb", "1024kb", etc.
      timeout: 5s  # Optional: can be combined with memory_limit

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • PR description explains the motivation for the change and relevant context for reviewing
  • PR description links appropriate GitHub/Jira tickets (creating when necessary)
  • Changeset is included for user-facing changes
  • Changes are compatible [1]
  • Documentation [2] completed
  • Performance impact assessed and acceptable
  • Metrics and logs are added [3] and documented
  • Tests added and passing [4]
    • Unit tests
    • Integration tests
    • Manual tests, as necessary

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. A lot of (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best-practices.

  4. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@rohan-b99 rohan-b99 requested a review from a team January 14, 2026 15:21
apollo-librarian bot commented Jan 14, 2026

✅ Docs preview has no changes

The preview was not built because there were no changes.

Build ID: ca679689729cf46982242e1c
Build Logs: View logs


@rohan-b99 rohan-b99 changed the title from "Rohan b99/cooperative cancellation memory limit" to "feat: add memory limit option for cooperative cancellation" Jan 14, 2026
@rohan-b99 rohan-b99 requested a review from a team as a code owner January 14, 2026 17:19
Contributor

@aaronArinder aaronArinder left a comment

🙏


let exceeded_memory_limit_setter = exceeded_memory_limit.clone();
let task = if let Some(memory_limit) = self.cooperative_cancellation.memory_limit() {
let stats = crate::allocator::current().expect("memory limit cooperative cancellation is set but no stats are available");
Contributor

I think getting the current stats happens in the main thread, no? It looks like the task is returned below; so, if we panic here, we'll unwind all the way to program termination; should we emit a warning/error instead and continue on as though cooperative cancellation isn't in enforce mode?

Contributor Author

Good catch, I've changed this to log an error and continue instead
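The shape of that change, replacing a panicking `expect` with a logged error and a fallback, can be sketched as follows. This is an assumption-laden illustration rather than the merged code: `AllocStats`, `MemoryBudget`, and `memory_budget` are hypothetical names, `eprintln!` stands in for the router's error logging, and "continue" is assumed here to mean planning proceeds without the memory limit.

```rust
/// Hypothetical snapshot of allocator statistics for the current task.
struct AllocStats {
    bytes_allocated: u64,
}

/// Hypothetical budget derived from the configured limit and a baseline reading.
#[derive(Debug, PartialEq)]
struct MemoryBudget {
    limit_bytes: u64,
    baseline_bytes: u64,
}

/// Returns a budget when a limit is configured and stats are available.
/// When stats are missing, logs an error and returns `None` instead of
/// panicking, so the caller can plan without memory-based cancellation.
fn memory_budget(configured_limit: Option<u64>, stats: Option<AllocStats>) -> Option<MemoryBudget> {
    match (configured_limit, stats) {
        (Some(limit_bytes), Some(stats)) => Some(MemoryBudget {
            limit_bytes,
            baseline_bytes: stats.bytes_allocated,
        }),
        (Some(_), None) => {
            // Stand-in for an error-level log call.
            eprintln!("memory limit is set but no allocation stats are available; continuing without it");
            None
        }
        (None, _) => None,
    }
}
```

Returning `Option` keeps the failure local: a missing stats source degrades to planning without the memory limit rather than unwinding the caller.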

abort_handle.abort();
}
});
log::warn!("memory limit exceeded planning query: {}", &query);
Contributor

all the pain and sorrow that this line could have helped us avoid ❤️

}
None => planning_task.await,
} else {
unreachable!("cooperative cancellation is not in enforce or measure mode");
Contributor

this will panic if it actually turns out to be unreachable for some reason (like someone being too quick with a refactor); should we return a CacheResolverError instead? That might save us a hard conversation with a customer later, but it's also sort of unlikely that someone would refactor this into a real panic without someone else catching it before merging

Contributor Author

Good forward thinking - since the underlying mode is an enum anyway, I removed the if and changed to a match so we only have the 2 cases to deal with in the first place
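A minimal sketch of why the `match` removes the need for `unreachable!`: with a two-variant enum, the compiler checks exhaustiveness, so there is no fallthrough branch left to panic in. The names `Mode` and `on_limit_exceeded` and the message strings are hypothetical, not taken from the router.

```rust
/// Hypothetical two-variant mode enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Enforce,
    Measure,
}

/// Decides what happens once a limit has been exceeded.
/// Adding a third `Mode` variant later would fail to compile here until it
/// is handled, instead of reaching a runtime `unreachable!`.
fn on_limit_exceeded(mode: Mode) -> Result<&'static str, String> {
    match mode {
        Mode::Enforce => Err("query planning cancelled: limit exceeded".to_string()),
        Mode::Measure => Ok("limit exceeded recorded; planning continues"),
    }
}
```

The trade-off is that a forgotten case surfaces as a compile error during the refactor rather than as a panic in production.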

@aaronArinder aaronArinder mentioned this pull request Jan 20, 2026
@rohan-b99 rohan-b99 merged commit 23f81cd into dev Jan 22, 2026
15 checks passed
@rohan-b99 rohan-b99 deleted the rohan-b99/cooperative-cancellation-memory-limit branch January 22, 2026 14:19
@abernix abernix mentioned this pull request Jan 27, 2026