
Memory is coupled to group by cardinality, even when the aggregate output is truncated by a limit clause #7191

Closed
avantgardnerio opened this issue Aug 4, 2023 · 6 comments · Fixed by #7192
Labels
enhancement New feature or request

Comments

@avantgardnerio
Contributor

Is your feature request related to a problem or challenge?

Currently, there is only one aggregation implementation: GroupedHashAggregateStream. It does a lovely job, but it allocates memory for every unique group-by value.

For large datasets, this can cause OOM errors, even if the very next operation is a sort by max(x) limit y.

Describe the solution you'd like

I would like to add a GroupedAggregateStream based on a PriorityQueue of grouped values that can be used instead of GroupedHashAggregateStream under the specific conditions above, so that Top K queries work even on datasets with cardinality larger than available memory.
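A minimal sketch of the idea, with names and types that are mine rather than DataFusion's: pair a hash map from group to its current aggregate with an ordered structure from aggregate back to group, so at most k groups are ever resident no matter how many distinct groups the input contains.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical sketch (not DataFusion's API): compute the top `k` groups by
/// max(value), holding at most `k` groups in memory regardless of cardinality.
fn topk_max(rows: impl IntoIterator<Item = (String, i64)>, k: usize) -> Vec<(String, i64)> {
    assert!(k > 0);
    // group -> current max; never holds more than k entries
    let mut maxes: HashMap<String, i64> = HashMap::new();
    // ordered (max, group) pairs, acting as the priority queue
    let mut heap: BTreeSet<(i64, String)> = BTreeSet::new();

    for (group, value) in rows {
        if let Some(&cur) = maxes.get(&group) {
            // existing group: only reorder if its max improved
            if value > cur {
                heap.remove(&(cur, group.clone()));
                heap.insert((value, group.clone()));
                maxes.insert(group, value);
            }
        } else {
            if heap.len() == k {
                let min = heap.iter().next().unwrap().0;
                // value can't beat the current k-th best: a no-op, nothing
                // allocated (ties are dropped here; a real implementation
                // would need to define tie behavior)
                if value <= min {
                    continue;
                }
                // evict the smallest group to make room
                let evicted = heap.iter().next().unwrap().clone();
                heap.remove(&evicted);
                maxes.remove(&evicted.1);
            }
            heap.insert((value, group.clone()));
            maxes.insert(group, value);
        }
    }
    // emit best-first, like ORDER BY max(value) DESC LIMIT k
    heap.into_iter().rev().map(|(v, g)| (g, v)).collect()
}
```

Feeding five rows across four groups through `topk_max(rows, 2)` keeps only two groups live at any point, which is the memory bound the issue is after.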

Describe alternatives you've considered

A more generalized implementation where we:

  1. sort by group_val
  2. aggregate by group_val, emitting rows in a stream as the aggregate for each group is computed
  3. feed that into a (new) generalized TopKExec node that is only responsible for doing the top K operation

Unfortunately, despite being more general, I'm told that this approach will still OOM in our case.
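For reference, step 2 of that alternative can be sketched as a constant-state pass over input already sorted by group (the function name and shapes here are mine, not an existing DataFusion API); the memory pressure moves into the sort in step 1, which is presumably why it can still OOM:

```rust
/// Hypothetical sketch of step 2: with input sorted by group, each group's
/// aggregate (here max) can be emitted as soon as the group changes, so only
/// one group's state is held at a time.
fn sorted_group_max(sorted: &[(&str, i64)]) -> Vec<(String, i64)> {
    let mut out = Vec::new();
    let mut cur: Option<(&str, i64)> = None;
    for &(group, value) in sorted {
        cur = match cur {
            Some((g, v)) if g == group => Some((g, v.max(value))),
            Some((g, v)) => {
                // group boundary: stream out the finished aggregate
                out.push((g.to_string(), v));
                Some((group, value))
            }
            None => Some((group, value)),
        };
    }
    if let Some((g, v)) = cur {
        out.push((g.to_string(), v));
    }
    out
}
```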

Additional context

Please see the following similar (but not identical) tickets for related top K issues:

  1. Top-K query optimization in sort uses substantial memory  #7149
  2. Improve Memory usage + performance with large numbers of groups / High Cardinality Aggregates #6937
  3. Improve aggregate performance with specialized groups accumulator for single string group by #7064
  4. Optimize "per partition" top-k : ROW_NUMBER < 5 / TopK #6899
@tustvold
Contributor

tustvold commented Aug 4, 2023

Perhaps we could add a redact group API to the new row accumulators; this would allow using them for this as well as for window functions.

@avantgardnerio
Contributor Author

Perhaps we could add a redact group API to the new row accumulators; this would allow using them for this as well as for window functions.

I struggled with this for a bit. Originally I rejected using GroupValuesRows because it had a hash table in it, whereas this needed to be ordered. Eventually I realized I needed a BiMap and ended up with both a hash table (group to value) and a priority queue (value to group). I think this means we could merge the two implementations by passing an optional limit to GroupValuesRows.

I don't think we'd want to always evict groups, because we might not even need to add them in the first place if the value being aggregated is less/greater than the min/max of the priority queue - so it would be a no-op.

@avantgardnerio
Contributor Author

Also, I think this optimization would usually be applied for a single term like max(x), not max(x), min(y), so we would probably want this to apply to PrimitiveGroupsAccumulator if anything.

@tustvold
Contributor

tustvold commented Aug 4, 2023

I needed a BiMap

Yeah, I think you need both a priority queue to work out which groups to keep and a HashMap to work out which rows belong to which groups. I can't think of an obvious way to avoid this.

I don't think we'd want to always evict groups, because we might not even need to add them in the first place if the value being aggregated is less/greater than the min/max of the priority queue - so it would be a no-op.

I was envisaging something like adding support to GroupsAccumulator::inter to optionally return a list of groups to redact, possibly as a BooleanBuffer. This would effectively be the groups from previous calls to GroupsAccumulator::inter that are no longer needed. These would then be fed to a new GroupValues::evict method to clear them out of the various aggregators, possibly using something relatively cheap like Vec::retain.

Or something to that effect, just spitballing here. I really want to get Window functions using GroupsAccumulator so that we can get rid of the old scalar accumulators (#7112), and the GroupValues::evict API would enable this.
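A rough sketch of what such an evict path could look like, with a plain Vec<bool> standing in for the BooleanBuffer and made-up per-group state (GroupValues::evict does not exist yet, and these struct and field names are illustrative only):

```rust
/// Hypothetical per-group state, standing in for GroupValues plus one
/// accumulator. Group ids are implicit indices into the parallel Vecs.
struct GroupState {
    groups: Vec<String>, // group values, indexed by group id
    maxes: Vec<i64>,     // one accumulated value per group
}

impl GroupState {
    /// Drop every group whose bit in `redact` is set, compacting the
    /// remaining state with Vec::retain. Surviving groups get new,
    /// contiguous group ids (their new index).
    fn evict(&mut self, redact: &[bool]) {
        assert_eq!(redact.len(), self.groups.len());
        let mut bits = redact.iter();
        self.groups.retain(|_| !*bits.next().unwrap());
        let mut bits = redact.iter();
        self.maxes.retain(|_| !*bits.next().unwrap());
    }
}
```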

@avantgardnerio
Contributor Author

get rid of the old scalar accumulators

Interesting... I thought we were going the other way, due to this comment.

@tustvold
Contributor

tustvold commented Aug 4, 2023

By scalar, I am referring to the ones based around ScalarValue, i.e. https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.Accumulator.html
