Add segmented aggregation in AggregationNode#17458
Add segmented aggregation in AggregationNode#17458rschlussel merged 3 commits intoprestodb:masterfrom
Conversation
presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/AddLocalExchanges.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java
Outdated
Show resolved
Hide resolved
621cd0b to
204fc3c
Compare
presto-spi/src/main/java/com/facebook/presto/spi/plan/AggregationNode.java
Outdated
Show resolved
Hide resolved
204fc3c to
d485f8c
Compare
presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/AddLocalExchanges.java
Outdated
Show resolved
Hide resolved
258edde to
7a629b7
Compare
presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/AddLocalExchanges.java
Outdated
Show resolved
Hide resolved
7a629b7 to
0ed7385
Compare
yuanzhanhku
left a comment
There was a problem hiding this comment.
LGTM. Thanks for implementing this feature!
|
@rschlussel gentle ping just incase you miss the notification |
rschlussel
left a comment
There was a problem hiding this comment.
I haven't reviewed the tests yet because I was a bit confused about them. Is this feature available already on the execution side?
presto-spi/src/main/java/com/facebook/presto/spi/plan/AggregationNode.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/com/facebook/presto/spi/plan/AggregationNode.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/AddLocalExchanges.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/AddLocalExchanges.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/AddLocalExchanges.java
Outdated
Show resolved
Hide resolved
...ain/src/main/java/com/facebook/presto/sql/planner/optimizations/HashGenerationOptimizer.java
Outdated
Show resolved
Hide resolved
No, this is the worker side change: #17618 |
rschlussel
left a comment
There was a problem hiding this comment.
I think you need to add a session property specific to segmented aggregation that is disabled by default for now. Otherwise we will have a regression while we wait for the execution side to be ready.
...n/java/com/facebook/presto/sql/planner/iterative/rule/PushPartialAggregationThroughJoin.java
Outdated
Show resolved
Hide resolved
...ain/src/main/java/com/facebook/presto/sql/planner/optimizations/HashGenerationOptimizer.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/optimizations/AddLocalExchanges.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/planPrinter/PlanPrinter.java
Outdated
Show resolved
Hide resolved
|
Thanks for the review @rschlussel
|
3ea8767 to
6feadd0
Compare
presto-main/src/main/java/com/facebook/presto/SystemSessionProperties.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/SystemSessionProperties.java
Outdated
Show resolved
Hide resolved
Enable segmented aggregation if the prefix of the sorted-by columns is a subset of the group by column
Currently we add sorted-columns to both streamPartitionColumns and localProperties only when the bucket columns are the same as the prefix of the sort columns, there are two issues 1.the when condition is too strict and eliminates some cases where we can also expose those properties 2.adding sorted-columns as streamPartitionColumns also tighten the condition, for example table that is bucketed by A and sorted by <A, B>; using <A, B> as the streamPartitionColumns is a more strict rule when we should only use A instead Instead now we: Add bucketed-by columns to streamPartitionColumns Add sorted-by columns to localProperty
6feadd0 to
c4e7c70
Compare
If the grouped-by keys contains elements from the prefix of the sorted-by key, we can enable segmented aggregation
For example:the table is sorted by F1, F3 and and we do Group by F1, F2
F2 is not sorted, so we can’t do streaming aggregation for each <F1, F2> group; however since F1 is sorted, we segment the data by F1’s value, for example the first segment, F1’s values are all a, now we can build a hashtable for each segment and do calculation and flush the data once a segment is finished
so it still saves CPU because we don’t do look up for F1, and the result hash table we keep is also smaller compared to the full hash table
