feat: add memory limit option for cooperative cancellation #8808
This adds allocation tracking utilities so that we can get a handle on how much a single request is allocating.
Add allocation metrics for the query planner and requests
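As a rough illustration of what allocation tracking can look like, here is a minimal sketch of a counting allocator. It is not the router's `crate::allocator` implementation, which is not shown in this PR page; `CountingAllocator`, `allocated_bytes`, and `measure` are hypothetical names. The counter is also process-wide, whereas attributing allocations to a single request would need some per-task context on top of this.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper (an assumption, not the router's allocator): forwards
/// every call to the system allocator and keeps a running total of the bytes
/// requested.
pub struct CountingAllocator;

/// Total bytes requested since process start. Never decremented.
static ALLOCATED_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Reads the running total of requested bytes.
pub fn allocated_bytes() -> usize {
    ALLOCATED_BYTES.load(Ordering::Relaxed)
}

/// Runs `f` and reports how many bytes were requested while it ran.
/// Allocations made concurrently by other threads are included in the count.
pub fn measure<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = allocated_bytes();
    let out = f();
    (out, allocated_bytes() - before)
}
```

A call such as `measure(|| vec![0u8; 4096])` returns the vector together with a byte count of at least 4096.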
let exceeded_memory_limit_setter = exceeded_memory_limit.clone();
let task = if let Some(memory_limit) = self.cooperative_cancellation.memory_limit() {
    let stats = crate::allocator::current().expect("memory limit cooperative cancellation is set but no stats are available");
I think getting the current stats happens on the main thread, no? It looks like the task is returned below, so if we panic here we'll unwind all the way to program termination. Should we emit a warning/error instead and continue on as though cooperative cancellation isn't in enforce mode?
Good catch, I've changed this to log an error and continue instead.
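The shape of that change might look like the sketch below, which replaces the `expect` with a fallback. The updated code is not shown on this page, so everything here is an assumption: `AllocationStats`, `MemoryGuard`, and `memory_guard` are hypothetical names, and `eprintln!` stands in for the router's error logging.

```rust
/// Hypothetical stand-in for the per-request allocation stats handle.
#[derive(Debug, Clone, PartialEq)]
pub struct AllocationStats {
    pub bytes_allocated: usize,
}

/// Whether the memory limit is enforced for this planning task.
#[derive(Debug, PartialEq)]
pub enum MemoryGuard {
    Enabled { stats: AllocationStats, limit: usize },
    Disabled,
}

/// Decides how to proceed without panicking when stats are missing.
pub fn memory_guard(
    memory_limit: Option<usize>,
    current_stats: Option<AllocationStats>,
) -> MemoryGuard {
    match (memory_limit, current_stats) {
        (Some(limit), Some(stats)) => MemoryGuard::Enabled { stats, limit },
        (Some(_), None) => {
            // Assumption: stand-in for an error-level log call.
            eprintln!(
                "memory limit cooperative cancellation is set but no stats are available; \
                 continuing without a memory limit"
            );
            MemoryGuard::Disabled
        }
        (None, _) => MemoryGuard::Disabled,
    }
}
```

With this shape, a missing stats handle degrades to planning without a memory limit rather than taking the process down.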
        abort_handle.abort();
    }
});
log::warn!("memory limit exceeded planning query: {}", &query);
all the pain and sorrow that this line could have helped us avoid ❤️
    }
    None => planning_task.await,
} else {
    unreachable!("cooperative cancellation is not in enforce or measure mode");
This will panic if it actually turns out to be reachable for some reason (like someone being too quick with a refactor). Should we return a CacheResolverError instead? That might save us a hard conversation with a customer later, but it's also sort of unlikely that someone would refactor this into a real panic without someone else catching it before merging.
Good forward thinking - since the underlying mode is an enum anyway, I removed the if and changed to a match so we only have the 2 cases to deal with in the first place
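A minimal sketch of that idea: matching on the mode enum makes the two cases exhaustive, so no `unreachable!` branch is needed. The names `Mode`, `Outcome`, and `on_planning_finished` are hypothetical, and the per-mode behavior shown (enforce cancels, measure only records) is an assumption based on the mode names, not taken from the diff.

```rust
/// Hypothetical mirror of the two cooperative cancellation modes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Enforce,
    Measure,
}

/// Hypothetical result of a planning attempt under a memory limit.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Cancelled,
    Completed { limit_exceeded: bool },
}

/// The compiler checks that every `Mode` variant is handled, so adding a
/// third variant later becomes a compile error instead of a runtime panic.
pub fn on_planning_finished(mode: Mode, limit_exceeded: bool) -> Outcome {
    match mode {
        // Assumption: enforce mode cancels once the limit is exceeded.
        Mode::Enforce => {
            if limit_exceeded {
                Outcome::Cancelled
            } else {
                Outcome::Completed { limit_exceeded: false }
            }
        }
        // Assumption: measure mode records the overrun but lets planning finish.
        Mode::Measure => Outcome::Completed { limit_exceeded },
    }
}
```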
Adds a `memory_limit` option to the `experimental_cooperative_cancellation` configuration that allows you to set a maximum memory allocation limit for query planning operations. When the memory limit is exceeded during query planning, the router responds according to the configured cooperative cancellation mode (enforce or measure). In both modes, the query will be logged in a warn message.
The memory limit works alongside the existing `timeout` option, and whichever limit is reached first will trigger cancellation. This feature helps prevent excessive memory usage from complex queries or query planning operations that consume too much memory.

Platform requirements: This feature is only available on Unix platforms when the `global-allocator` feature is enabled and `dhat-heap` is not enabled (same requirements as memory tracking metrics).

Example configuration: