Support pushdown of count aggregation with distinct#8562
Support pushdown of count aggregation with distinct#8562losipiuk merged 4 commits intotrinodb:masterfrom
count aggregation with distinct#8562Conversation
i am afraid there isn't, since we don't have a mock connector for exercising aggregation pushdown SPI calls. |
|
Taking these words back. Since you're making |
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/expression/ImplementCount.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/expression/ImplementCount.java
Outdated
Show resolved
Hide resolved
|
@alexjo2144 can you please review build output? |
Yeah, I need to think about this a little more. There are some pushdowns that I'm still not fully understanding. For example, with the original code |
5086bb1 to
cca1e07
Compare
core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix/src/test/java/io/trino/plugin/phoenix/TestPhoenixConnectorTest.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java
Outdated
Show resolved
Hide resolved
75311f9 to
3831de0
Compare
3ef0814 to
950f732
Compare
59c5585 to
02581d8
Compare
ad90d51 to
32b3974
Compare
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
1c6d6f2 to
94e67a3
Compare
94e67a3 to
cf2e758
Compare
|
@findepi should be good now (I hope). Please give it another look. |
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/TestingH2JdbcClient.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Allows pushdown of multiple distinct aggregations as long as
only one non-count aggregation is distinct.
i may have asked about that but don't remember answer -- why "only one non-count" ?
There was a problem hiding this comment.
I am not sure what exactly @alexjo2144 had in mind here.
There are many constraints here. And definitely not all queries which satisfy the condition from commit message will be be fully pushed down
E.g. this one will not:
select max(distinct regionkey), count(distinct regionkey), count(distinct nationkey) from nation;
I suggest to just leave title line in commit message and drop the rest.
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix/src/test/java/io/trino/plugin/phoenix/TestPhoenixConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix/src/test/java/io/trino/plugin/phoenix/TestPhoenixConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix5/src/test/java/io/trino/plugin/phoenix5/TestPhoenixConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
6d29af0 to
86fa04e
Compare
findepi
left a comment
There was a problem hiding this comment.
"core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java"
There was a problem hiding this comment.
add a comment why this is inside pushIntoTableScanRulesExceptJoins set, otherwise looks like unintentional
maybe it shouldn't be here? instead, PushLimitThroughMarkDistinct could come together with MultipleDistinctAggregationToMarkDistinct?
There was a problem hiding this comment.
maybe it shouldn't be here? instead, PushLimitThroughMarkDistinct could come together with MultipleDistinctAggregationToMarkDistinct?
per @kasiafi
After removing MultipleDistinctAggregationToMarkDistinct from an IterativeOptimizer, there's no way that a MarkDistinctNode could be present at that stage, so the rule PushLimitThroughMarkDistinct can be also removed. However, I'd keep it there next to other PushLimit rules in case it's needed again in the future. Actually, the PushLimit rules could be extracted like projectionPushdownRules.
There was a problem hiding this comment.
We had MultipleDistinctAggregationToMarkDistinct once and now we have twice. What's the reason?
|
@kasiafi done (hopefully) as we discussed. Much simpler. Thanks. Let's see what CI says. |
There was a problem hiding this comment.
| .addAll(limitPushdownRules) | |
| .addAll(limitPushdownRules) // including through MarkDistinct |
There was a problem hiding this comment.
Actually having this here results in test failures. @kasiafi looked deeply into that and the problem comes from extra call of PushLimitThroughOuterJoin and the fact that applyJoin will not consume same limit twice.
So without this optimization went
... Limit -> OuterJoin -> Scan(no limit)
(PushLimitThroughOuterJoin)
... Limit -> OuterJoin -> Limit -> Scan(no limit)
(PushLimitIntoTableScan)
... Limit -> OuterJoin -> Scan(with limit)
now join is pushable to TS
With this extra line the optimization continues from when it completed previously
... Limit -> OuterJoin -> Scan(with limit)
(PushLimitThroughOuterJoin)
... Limit -> OuterJoin -> Limit -> Scan(with limit)
(PushLimitIntoTableScan; applyJoin return Optional.empty() because of https://github.com/trinodb/trino/blob/408476e5d4067db61324f773510c27731373b91b/plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java#L459-L461)
... Limit -> OuterJoin -> Limit -> Scan(with limit)
!!!!! join no longer pushable to table scan
This could be addressed with #9056 but I am not super convinced this is the way to go.
There was a problem hiding this comment.
The change in PlanOptimizers looks OK.
There was a problem hiding this comment.
... so that multiple distinct aggregations can be pushed into a connector
There was a problem hiding this comment.
i wonder whether we actually considered solving the problem differently. instead of changing rule order, we could have a rule that matches to MarkDistinct and invokes aggregation pushdown.
trying this out could be a follow up. if successful, we could simply this class, presumably even rolling the changes back
There was a problem hiding this comment.
MarkDistinct is not always a result of the MultipleDistinctAggregationToMarkDistinct rule. And when it is, it could be hard to identify the original aggregations because of intervening rules.
Also, applying rule MultipleDistinctAggregationToMarkDistinct later might be possibly beneficial from the viewpoint of decorrelation.
plugin/trino-base-jdbc/src/test/java/io/trino/plugin/jdbc/BaseJdbcConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix/src/test/java/io/trino/plugin/phoenix/TestPhoenixConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-phoenix5/src/test/java/io/trino/plugin/phoenix5/TestPhoenixConnectorTest.java
Outdated
Show resolved
Hide resolved
e13aba1 to
5d919d9
Compare
5d919d9 to
1d9830c
Compare
1d9830c to
18444e5
Compare
Moves the distinct aggregation rewriting optimizers to after PushAggregationIntoTableScan.
TODO: I'm not as familiar with testing around PlanOptimizers, is there a good way to test this?
#4323