Spill large blocks to separate spill files in distinct aggregates by pgupta2 · Pull Request #17096 · prestodb/presto

pgupta2 · 2021-12-14T08:09:21Z

The distinct aggregates spilling logic compacts all values corresponding
to the same groupId into a single array block and puts it in the output
block of the aggregator. This is done for each aggregate function
in the HashAggregationBuilder. If multiple distinct aggregate functions
are present, we perform the same kind of compaction into the
corresponding blocks at the same position within a page.

As a result, a single row within the page can become very large, and
spilling the page can fail with an 'integer overflow' error during
serialization. Instead of compacting all values into a single array
block, we can spill them into a separate spill file if the block size goes
beyond a certain threshold.

When large block spill is enabled, we use a hybrid approach to prepare a
page for spill. If the array block for a groupId exceeds the threshold, we
spill it into a separate spill file and store the serialized file handle in the
output block. If the size is within the threshold, we store the values
directly in the output block. With this approach, a separate spill happens
only when a block is huge. Since this should happen rarely, most of the time
we store the data directly in the output block of the aggregate.
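
The hybrid approach described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `LargeBlockSpillSketch`, `Entry`, `prepareForSpill` and `readBack` are hypothetical names, values are plain `long`s rather than Presto blocks, and the "file handle" is assumed to be just the UTF-8 bytes of a temp file path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; none of these names come from the Presto code base.
final class LargeBlockSpillSketch
{
    // One output entry per groupId: either the values themselves or a handle to a spill file.
    static final class Entry
    {
        final boolean spilled;
        final byte[] payload; // serialized values if inline, serialized file path if spilled

        Entry(boolean spilled, byte[] payload)
        {
            this.spilled = spilled;
            this.payload = payload;
        }
    }

    private LargeBlockSpillSketch() {}

    static byte[] serialize(long[] values)
    {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long value : values) {
            buffer.putLong(value);
        }
        return buffer.array();
    }

    static long[] deserialize(byte[] bytes)
    {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long[] values = new long[bytes.length / Long.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getLong();
        }
        return values;
    }

    // Keep small groups inline; write groups above the threshold to their own file
    // and store only the (assumed) file handle in the output entry.
    static Entry prepareForSpill(long[] values, long thresholdBytes, Path spillDirectory)
    {
        byte[] serialized = serialize(values);
        if (serialized.length <= thresholdBytes) {
            return new Entry(false, serialized);
        }
        try {
            Path file = Files.createTempFile(spillDirectory, "large-block-", ".bin");
            Files.write(file, serialized);
            return new Entry(true, file.toString().getBytes(StandardCharsets.UTF_8));
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Resolve an entry back to its values, reading and deleting the spill file if needed.
    static long[] readBack(Entry entry)
    {
        if (!entry.spilled) {
            return deserialize(entry.payload);
        }
        try {
            Path file = Paths.get(new String(entry.payload, StandardCharsets.UTF_8));
            byte[] bytes = Files.readAllBytes(file);
            Files.delete(file);
            return deserialize(bytes);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the output entry for a large group stays small no matter how many values the group holds, which is the property that avoids the oversized row during page serialization.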

Test plan -

Unit Tests
Tested ~350 production pipelines that have distinct aggregates for correctness and performance.

== RELEASE NOTES ==

General Changes
* Add a new configuration property ``experimental.distinct-aggregation-large-block-spill-enabled`` to enable spilling of blocks that are larger than ``experimental.distinct-aggregation-large-block-size-threshold`` bytes into a separate spill file. This can be overridden by the ``distinct_aggregation_large_block_spill_enabled`` session property.
* Add a new configuration property ``experimental.distinct-aggregation-large-block-size-threshold`` to define the threshold size beyond which a block will be spilled into a separate spill file. This can be overridden by the ``distinct_aggregation_large_block_size_threshold`` session property.

highker · 2022-01-05T21:00:03Z

The PR is on my radar. I will review it next week.

highker

High-level comments + nits only. Didn't dig deep into logic details.
Do we have unit test coverage?

highker · 2022-01-13T23:21:50Z

...o-main/src/main/java/com/facebook/presto/operator/aggregation/GenericAccumulatorFactory.java

Generic comment: this class has become way too big. It would be good to split it into multiple files in a followup PR

highker · 2022-01-13T23:45:53Z

...o-main/src/main/java/com/facebook/presto/operator/aggregation/GenericAccumulatorFactory.java

remove else {

highker · 2022-01-13T23:47:04Z

...o-main/src/main/java/com/facebook/presto/operator/aggregation/GenericAccumulatorFactory.java

s/page -> page.getPositionCount()/Page::getPositionCount

highker · 2022-01-13T23:50:17Z

presto-main/src/main/java/com/facebook/presto/spiller/StandaloneSpiller.java

Would be good to add a javadoc to explain the difference between this and Spiller

highker · 2022-01-13T23:51:23Z

presto-main/src/main/java/com/facebook/presto/spiller/StandaloneSpiller.java

Would recommend using a class to represent the file handle, like SpillerHandle or SpilledFileHandle or whatever name makes sense. byte[] could be confusing. At first glance, I thought it was the returned stream.

highker · 2022-01-13T23:56:53Z