Do not extract distinct operator when de-correlating global aggregation #12582
kasiafi merged 1 commit into trinodb:master
"Fixes"?
Not sure. The plan still differs in some aspects from the expected plan described in the issue. I think this solution is good, but we could investigate further.
Before this change, rules `TransformCorrelatedGlobalAggregationWithoutProjection` and `TransformCorrelatedGlobalAggregationWithProjection` extracted and handled two aggregations in the correlated subquery:

- a global aggregation
- a "distinct operator", i.e. an aggregation with grouping symbols but without any aggregate functions

It is a common case that both aggregations are present. They result from a call like `count(distinct x)`.

Before this change, if both aggregations were present in the subquery, the rules would move them both on top of the de-correlated join. This behavior was suboptimal in some cases, specifically when the "distinct operator" could be de-correlated in place. Moving the "distinct operator" on top of the join blocked other optimizations, e.g. `PushAggregationThroughOuterJoin`.

After this change, the "distinct operator" is moved on top of the de-correlated join only if it can't be de-correlated in the subquery.
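To make the two aggregation shapes and the new placement decision concrete, here is a minimal toy sketch. It is not Trino's planner code: `DistinctOperatorSketch`, `Aggregation`, `Placement`, and `placeDistinctOperator` are hypothetical names invented for illustration, and the boolean `canDecorrelateInSubquery` stands in for whatever check the real rules perform.

```java
import java.util.List;

// Toy model only: these classes are hypothetical and are not Trino's planner API.
final class DistinctOperatorSketch
{
    // Where the "distinct operator" ends up after de-correlation.
    enum Placement { IN_SUBQUERY, ABOVE_JOIN }

    // Minimal stand-in for an aggregation node: grouping symbols plus aggregate functions.
    static final class Aggregation
    {
        final List<String> groupingSymbols;
        final List<String> aggregateFunctions;

        Aggregation(List<String> groupingSymbols, List<String> aggregateFunctions)
        {
            this.groupingSymbols = groupingSymbols;
            this.aggregateFunctions = aggregateFunctions;
        }

        // A global aggregation has no grouping symbols.
        boolean isGlobal()
        {
            return groupingSymbols.isEmpty();
        }

        // A "distinct operator" groups by some symbols but computes no aggregate functions.
        boolean isDistinctOperator()
        {
            return !groupingSymbols.isEmpty() && aggregateFunctions.isEmpty();
        }
    }

    // Behavior described for "after this change": lift the distinct operator above the
    // join only when it cannot be de-correlated inside the subquery.
    static Placement placeDistinctOperator(boolean canDecorrelateInSubquery)
    {
        return canDecorrelateInSubquery ? Placement.IN_SUBQUERY : Placement.ABOVE_JOIN;
    }

    public static void main(String[] args)
    {
        // count(distinct x) modeled as a distinct operator on x feeding a global count(x).
        Aggregation distinct = new Aggregation(List.of("x"), List.of());
        Aggregation global = new Aggregation(List.of(), List.of("count(x)"));
        System.out.println(distinct.isDistinctOperator() + " " + global.isGlobal());
        System.out.println(placeDistinctOperator(true));
        System.out.println(placeDistinctOperator(false));
    }
}
```

In this simplified picture, the old behavior corresponds to always choosing `ABOVE_JOIN` whenever both aggregations were present, which is what kept later rules from seeing a plain aggregation directly over the join.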
@Test
public void testCorrelatedDistinctAggregationRewriteToLeftOuterJoin()
{
    assertPlan(
so this is testing PushAggregationThroughOuterJoin?
Yes. It checks the de-correlation rule `TransformCorrelatedGlobalAggregationWithoutProjection` combined with `PushAggregationThroughOuterJoin`. Before the change, `PushAggregationThroughOuterJoin` didn't fire because of the misplaced "distinct operator".
wdym?
@sopel39 I checked the EXPLAIN closely. I can see one difference between the plan in version The mask was added as a correctness fix, which addresses any aggregations which are "null-sensitive", that is, return different results on empty input than on input consisting of My conclusion is that this PR fixes the issue even though the mask was introduced.
@sopel39 let me know if you have any more concerns or you think it is ready to go.
I think it looks good
Fixes: #12564
Credits to @sopel39 for investigating this complicated case.
This change might need a release notes entry as a perf improvement / regression fix.