Add FuseCrossJoinedGlobalAggregations rule #14271
lukasz-stec wants to merge 1 commit into trinodb:master from
Improvement for a basic query that matches this rule: after
lukasz-stec left a comment:
Most comments addressed; the documentation of the fuse operation is still pending.

Can we somehow fix this? It seems like a pretty important test.

Fixing the entire test infrastructure is hard, and I'm not sure we want to do it just for this case, because the current assumption makes a lot of things easier.
Also, I have this case tested in AbstractTestJoinQueries.testFuseCrossJoinOnGlobalAggregationWithDuplicatedAggregation.

There's no point in adding a disabled test. I don't understand what the issue is from "test infra assumption that projection always has unique target expression".

> There's no point in adding a disabled test.
I added it for two reasons. First, to avoid questions, now and in the future, about why this case is not tested here. Second, if the testing infrastructure ever allows non-unique symbol targets, we could enable this test and drop AbstractTestJoinQueries.testFuseCrossJoinOnGlobalAggregationWithDuplicatedAggregation.
> I didn't get what's the issue from "test infra assumption that projection always has unique target expression"
The issue is that AliasMatcher assumes that the plan has unique symbol-to-expression assignments, so, for example, a plan like
```java
project(ImmutableMap.of(
        "sumLeft", PlanMatchPattern.expression("sumLeft"),
        "sumRight", PlanMatchPattern.expression("sumLeft")),
```
is not allowed, because two different symbols are mapped to the same expression (the symbol sumLeft).
In this case, you get:
```
java.lang.IllegalStateException: Ambiguous expression "sumLeft" matches multiple assignments ["sumLeft", "sumLeft"]
    at com.google.common.base.Preconditions.checkState(Preconditions.java:821)
    at io.trino.sql.planner.assertions.ExpressionMatcher.getAssignedSymbol(ExpressionMatcher.java:88)
    at io.trino.sql.planner.assertions.AliasMatcher.detailMatches(AliasMatcher.java:57)
```
I don't think it can be fixed easily; at least, I don't know how.
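To make the failure mode above concrete, here is a hypothetical, heavily simplified model (not Trino's ExpressionMatcher or AliasMatcher code): a projection is a map from output symbol to expression text, and an alias is resolved by looking for the single symbol assigned the expected expression. When sumLeft and sumRight are both assigned sumLeft, the lookup cannot choose one and fails with an ambiguity error, much like the exception quoted above. The class and method names are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified model of resolving an alias by its assigned expression
final class AssignmentLookup
{
    private AssignmentLookup() {}

    static String getAssignedSymbol(Map<String, String> assignments, String expectedExpression)
    {
        // Collect every output symbol whose assignment equals the expected expression
        List<String> matches = assignments.entrySet().stream()
                .filter(entry -> entry.getValue().equals(expectedExpression))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
        if (matches.size() > 1) {
            // Two outputs share one expression: there is no unique symbol to bind the alias to
            throw new IllegalStateException("Ambiguous expression \"" + expectedExpression + "\" matches multiple assignments " + matches);
        }
        if (matches.isEmpty()) {
            throw new IllegalArgumentException("No assignment for expression \"" + expectedExpression + "\"");
        }
        return matches.get(0);
    }
}
```

With unique assignments the lookup succeeds; the duplicated-aggregation plan is exactly the case this kind of matcher cannot express.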

lukasz-stec left a comment:
Comments addressed.

raunaqmorarka left a comment:
Preliminary comments; will add more.

Added prefix PR #14559 for

Returning the original symbol when the mapping doesn't exist is a bit risky: it assumes that all the unmapped symbols come from the left side's output symbols.
What if we add identity symbol mappings for the symbols from the left node's output? Then we can always require a non-null mapped symbol.

The logic is that we want to use the mapped (left) symbol if the mapping exists, and the right symbol if it does not. This stems directly from the definition of the fuse operation (see below): a right symbol is either unique (not present on the left) or it must be mapped. Put differently, if there is no mapping, then the right symbol has to be among the fused plan's outputs.
If Fuse(P1, P2) = (P, M, L, R), then:
- P is the fused resulting plan. The schema of P includes all output columns in P1 and, optionally, additional output columns from P2.
- M is a mapping from the output columns of P2 to output columns of P.
- L and R are two filter conditions defined over the output columns of P to restore P1 and P2, respectively.
> What if we add identity symbol mappings for the symbols from left node output
That wouldn't work. We could add identity mappings for the right node's symbols, but that would deviate from the paper (which describes in detail how the mapping is constructed for each plan node), and IMO keeping the implementation close to the paper makes it more straightforward and less error-prone.
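A minimal sketch of the quoted definition, assuming plans are reduced to their sets of output column names and that M only holds entries for P2 columns that have to be renamed (the reading given in the reply above). FuseResult and isValidFor are hypothetical names, not part of the PR; the check encodes the two schema conditions: P exposes every P1 column, and every P2 column, once passed through M (or left as-is when unmapped), is an output of P.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical model of Fuse(P1, P2) = (P, M, L, R); plans are reduced to output column names
record FuseResult(
        Set<String> fusedOutputs,              // schema of P
        Map<String, String> mapping,           // M: P2 output column -> P output column
        Predicate<Map<String, Object>> left,   // L: filter over a row of P that restores P1
        Predicate<Map<String, Object>> right)  // R: filter over a row of P that restores P2
{
    // Checks the schema conditions of the definition for the given P1 and P2 outputs
    boolean isValidFor(Set<String> p1Outputs, Set<String> p2Outputs)
    {
        // P must include all output columns of P1
        if (!fusedOutputs.containsAll(p1Outputs)) {
            return false;
        }
        // Each P2 column is either mapped to a column of P or, when unmapped, is itself a column of P
        for (String column : p2Outputs) {
            String target = mapping.getOrDefault(column, column);
            if (!fusedOutputs.contains(target)) {
                return false;
            }
        }
        return true;
    }
}
```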

It's an effect of adding another RemoveRedundantIdentityProjections rule; I don't know the exact details.
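For context, a hedged sketch of what makes an identity projection redundant, assuming a projection is modelled as a map from output symbol to expression, with a bare symbol reference written as the symbol's own name. It illustrates the general idea behind a rule like RemoveRedundantIdentityProjections, not its actual Trino implementation; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a projection maps each output symbol to an expression string
final class IdentityProjections
{
    private IdentityProjections() {}

    // Identity: every output symbol is assigned a reference to itself
    static boolean isIdentity(Map<String, String> assignments)
    {
        return assignments.entrySet().stream()
                .allMatch(entry -> entry.getKey().equals(entry.getValue()));
    }

    // Redundant: an identity projection that outputs exactly the symbols its source produces
    static boolean isRedundant(Map<String, String> assignments, List<String> sourceOutputs)
    {
        return isIdentity(assignments) && assignments.keySet().equals(Set.copyOf(sourceOutputs));
    }
}
```

A pruning identity projection (fewer outputs than the source) is not redundant under this model, because removing it would change the node's output schema.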

We could, but I think it's useful. First, it directly matches the paper's definition; second, it allows for convenience methods like map (with a default return value) and requiredMap.
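A sketch of what such convenience methods could look like, assuming the mapping is a plain map from right-side symbol names to fused-plan symbol names. SymbolMapping and both signatures are hypothetical: map falls back to the right symbol itself, following the Fuse definition discussed above, while requiredMap fails fast when a caller needs an explicit entry.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical symbol mapping M (right-side symbol -> fused-plan symbol)
final class SymbolMapping
{
    private final Map<String, String> mapping;

    SymbolMapping(Map<String, String> mapping)
    {
        this.mapping = Map.copyOf(mapping);
    }

    // Mapped symbol if present; otherwise the right symbol itself,
    // which by the Fuse definition must then be an output of the fused plan
    String map(String rightSymbol)
    {
        return mapping.getOrDefault(rightSymbol, rightSymbol);
    }

    // For callers that require an explicit mapping: fail instead of defaulting
    String requiredMap(String rightSymbol)
    {
        return Optional.ofNullable(mapping.get(rightSymbol))
                .orElseThrow(() -> new IllegalStateException("No mapping for symbol " + rightSymbol));
    }
}
```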

I don't think so; AggregationNode.Aggregation#equals includes the mask in the comparison.
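To illustrate the point with a hypothetical stand-in (not Trino's AggregationNode.Aggregation): when equality includes the mask, two otherwise identical aggregations that differ only in their mask are not considered equal, so an equality-based deduplication keeps both. The record below relies on Java's generated record equals, which compares every component.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified aggregation call; record equals() compares all components, mask included
record Aggregation(String function, List<String> arguments, Optional<String> mask) {}

final class AggregationDeduplication
{
    private AggregationDeduplication() {}

    // Keeps the first occurrence of each equal aggregation; only exact duplicates
    // (same function, arguments, and mask) are collapsed
    static List<Aggregation> distinct(List<Aggregation> aggregations)
    {
        return aggregations.stream().distinct().toList();
    }
}
```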

Instead of exposing FusedPlanNode in the public API, I think this logic can move to PlanNodeFuser, and we can have it return an Optional<PlanNode> instead.

IMO FusedPlanNode is the public API for fusion.
For this specific case I could move the check, but for the more general version of this rule that is coming (JoinOnKeys), and for the other rules, the filters can be present.

Okay, are the filters going to be handled differently depending on the parent node?
Can we move the part about adding projections into PlanNodeFuser by adding Set<Symbol> resultOutputSymbols to the API, or is this part also specific to this rule?

> are the filters going to be handled differently based on what is the parent node?
Yes, at least that is what I see in the paper (the implementation can vary).
> Can we move the part about adding projections into PlanNodeFuser by adding Set<Symbol> resultOutputSymbols to API or is this part also specific to this rule?
I don't think it's specific to this rule, but we also don't have to add it to the fusion API.
I refactored this and moved the logic to the static method io.trino.sql.planner.iterative.rule.fuse.PlanNodeFuser#fuse(io.trino.sql.planner.iterative.Rule.Context, io.trino.sql.planner.plan.PlanNode, io.trino.sql.planner.plan.PlanNode).

Why are we using rightAggregation.getMask() instead of mappedMask?

Good catch, I think mappedMask should be used. I need to figure out a test case for this (in progress).
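A sketch of the fix being discussed, under the assumption that the mask is an optional symbol name and the mapping M is a map from right-side to fused-plan symbols: the right aggregation's mask is rewritten through M, like its other symbol references, rather than reused as-is. Presumably, reusing the unmapped mask could leave the fused aggregation pointing at a right-side symbol that was merged into a left-side one. MaskRewrite and mappedMask are illustrative names, not the PR's code.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: translate a right-side aggregation mask into fused-plan symbols
final class MaskRewrite
{
    private MaskRewrite() {}

    static Optional<String> mappedMask(Optional<String> rightMask, Map<String, String> mapping)
    {
        // Unmapped symbols are kept as-is, following the "mapped symbol, else right symbol" rule
        return rightMask.map(symbol -> mapping.getOrDefault(symbol, symbol));
    }
}
```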

Please rebase to fix conflicts.

lukasz-stec left a comment:
Comments addressed; a test for the aggregation-with-mask case is still pending.

We can probably use node.getId() here, as the original JoinNode should get discarded.

I moved this to PlanNodeFuser, so it's not easy to reuse the id.

Maybe the inverted form would be easier to read: !(a && b) == (!a || !b)

Honestly, I'm conflicted. Usually a logical AND is easier to read, but in this case the logic is: if any of the corresponding values differ, bail out. For this case, OR over NOT seems more readable to me.
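Both spellings are equivalent by De Morgan's law. A small sketch with hypothetical parameters (two pairs of corresponding values, compared with Objects.equals) showing the two forms of the bail-out condition side by side:

```java
import java.util.Objects;

// Two equivalent ways to write "bail out if any corresponding value differs"
final class BailoutCheck
{
    private BailoutCheck() {}

    // OR over NOT: bail out as soon as any pair differs
    static boolean shouldBailOutOr(Object leftA, Object rightA, Object leftB, Object rightB)
    {
        return !Objects.equals(leftA, rightA) || !Objects.equals(leftB, rightB);
    }

    // NOT over AND (De Morgan): bail out unless every pair matches
    static boolean shouldBailOutNotAnd(Object leftA, Object rightA, Object leftB, Object rightB)
    {
        return !(Objects.equals(leftA, rightA) && Objects.equals(leftB, rightB));
    }
}
```

The OR form reads as a list of independent reasons to stop, which grows naturally when more compared fields are added.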

Add a simplified version of the JoinOnKeys rule from the "Computation Reuse via Fusion in Amazon Athena" paper that acts only on a cross join over global aggregations over table scans with an optional filter. This can transform select * from (select count(*) c1 from table where a = 1), (select count(*) c2 from table where b = 1) into select count(*) filter (where a = 1), count(*) filter (where b = 1) from table where a = 1 or b = 1, making the query read the table only once.
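
Description
Add a simplified version of the JoinOnKeys rule from the "Computation Reuse via Fusion in Amazon Athena" paper that acts only on a cross join over global aggregations over table scans with an optional filter. This can transform:
select * from (select count(*) c1 from table where a = 1), (select count(*) c2 from table where b = 1)
into
select count(*) filter (where a = 1), count(*) filter (where b = 1) from table where a = 1 or b = 1
making the query read the table only once.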
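As a hedged illustration of why the rewrite preserves results, the sketch below evaluates both query shapes over an in-memory list of rows, each modelled as int[] {a, b}. It is a semantic model only, not how Trino plans or executes the query, and FusedCounts is a hypothetical name. Each FILTER clause restores one of the original counts, and the combined WHERE a = 1 OR b = 1 only drops rows that neither count would include.

```java
import java.util.List;
import java.util.function.Predicate;

// Semantic model of the rewrite; rows are int[] {a, b}
final class FusedCounts
{
    private FusedCounts() {}

    // Original shape: one scan per subquery, each with its own WHERE clause
    static long[] twoScans(List<int[]> table)
    {
        long c1 = table.stream().filter(row -> row[0] == 1).count();
        long c2 = table.stream().filter(row -> row[1] == 1).count();
        return new long[] {c1, c2};
    }

    // Fused shape: a single scan with WHERE a = 1 OR b = 1 and one FILTER per aggregate
    static long[] oneScan(List<int[]> table)
    {
        Predicate<int[]> left = row -> row[0] == 1;   // FILTER (WHERE a = 1) restores c1
        Predicate<int[]> right = row -> row[1] == 1;  // FILTER (WHERE b = 1) restores c2
        long c1 = 0;
        long c2 = 0;
        for (int[] row : table) {
            if (!(left.test(row) || right.test(row))) {
                continue; // removed by the combined WHERE clause
            }
            if (left.test(row)) {
                c1++;
            }
            if (right.test(row)) {
                c2++;
            }
        }
        return new long[] {c1, c2};
    }
}
```

Like a global aggregation, both methods still produce a single result (zero counts) for an empty table.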
Non-technical explanation
Speed up queries containing subqueries that match the specific pattern of a cross join over global aggregations on the same table, by reading the table only once.
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(X) Release notes are required, with the following suggested text: