Support pushdown of count aggregation with distinct#8562

Merged
losipiuk merged 4 commits into trinodb:master from alexjo2144:pushdown-count-distinct
Sep 2, 2021

Conversation

@alexjo2144
Member

@alexjo2144 commented Jul 14, 2021

Moves the distinct aggregation rewriting optimizers to after PushAggregationIntoTableScan.

TODO: I'm not as familiar with testing around PlanOptimizers, is there a good way to test this?

#4323

@cla-bot cla-bot bot added the cla-signed label Jul 14, 2021
@findepi
Member

findepi commented Jul 15, 2021

TODO: I'm not as familiar with testing around PlanOptimizers, is there a good way to test this?

I'm afraid there isn't, since we don't have a mock connector for exercising aggregation pushdown SPI calls.
Unless you create one.

@findepi
Member

findepi commented Jul 15, 2021

Taking these words back. Since you're making ImplementCount support DISTINCT aggregations, you should be able to verify that pushdown is happening in io.trino.plugin.jdbc.BaseJdbcConnectorTest#testAggregationPushdown (e.g. with PostgreSQL).

@findepi
Member

findepi commented Jul 16, 2021

@alexjo2144 can you please review build output?

@alexjo2144
Member Author

@alexjo2144 can you please review build output?

Yeah, I need to think about this a little more. There are some pushdowns that I'm still not fully understanding.

For example, with the original code `SELECT min(DISTINCT regionkey)` is fully pushed down as `SELECT min("regionkey") AS "tmp" FROM (SELECT "regionkey" FROM "tpch"."nation" GROUP BY "regionkey")`, which is a little odd. However, after my changes only the GROUP BY is pushed down, which is definitely not what we want.
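(As an aside, that rewrite is semantically sound: an aggregate over DISTINCT values equals the same aggregate over a GROUP BY subquery. A quick sanity check with sqlite3, using a toy `nation` table rather than the actual tpch catalog:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nation (nationkey INTEGER, regionkey INTEGER)")
conn.executemany("INSERT INTO nation VALUES (?, ?)",
                 [(0, 0), (1, 1), (2, 1), (3, 1), (4, 4)])

# Aggregates with DISTINCT, as written by the user
distinct_min = conn.execute(
    "SELECT min(DISTINCT regionkey) FROM nation").fetchone()[0]
distinct_count = conn.execute(
    "SELECT count(DISTINCT regionkey) FROM nation").fetchone()[0]

# The grouped-subquery form the optimizer rewrites to
grouped_min = conn.execute(
    "SELECT min(regionkey) FROM"
    " (SELECT regionkey FROM nation GROUP BY regionkey)").fetchone()[0]
grouped_count = conn.execute(
    "SELECT count(*) FROM"
    " (SELECT regionkey FROM nation GROUP BY regionkey)").fetchone()[0]

# Both forms agree: distinct regionkeys are {0, 1, 4}
assert distinct_min == grouped_min == 0
assert distinct_count == grouped_count == 3
```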

@alexjo2144 force-pushed the pushdown-count-distinct branch 7 times, most recently from 5086bb1 to cca1e07 on July 21, 2021
@alexjo2144 force-pushed the pushdown-count-distinct branch 3 times, most recently from 75311f9 to 3831de0 on July 26, 2021
@alexjo2144 force-pushed the pushdown-count-distinct branch 4 times, most recently from 3ef0814 to 950f732 on August 13, 2021
@losipiuk force-pushed the pushdown-count-distinct branch 7 times, most recently from 59c5585 to 02581d8 on August 18, 2021
@losipiuk force-pushed the pushdown-count-distinct branch from ad90d51 to 32b3974 on August 25, 2021
@losipiuk force-pushed the pushdown-count-distinct branch from 1c6d6f2 to 94e67a3 on August 26, 2021
@losipiuk force-pushed the pushdown-count-distinct branch from 94e67a3 to cf2e758 on August 26, 2021
@losipiuk
Member

@findepi should be good now (I hope). Please give it another look.

Member

Allows pushdown of multiple distinct aggregations as long as
only one non-count aggregation is distinct.

I may have asked about this before but don't remember the answer -- why "only one non-count"?

Member

I am not sure what exactly @alexjo2144 had in mind here.
There are many constraints at play, and definitely not all queries which satisfy the condition from the commit message will be fully pushed down.

E.g. this one will not:

`select max(distinct regionkey), count(distinct regionkey), count(distinct nationkey) from nation;`

I suggest just leaving the title line in the commit message and dropping the rest.

Member

redundant move

Member

Is this related?

Member

@findepi left a comment

"core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java"

Member

Add a comment explaining why this is inside the pushIntoTableScanRulesExceptJoins set; otherwise it looks unintentional.

maybe it shouldn't be here? instead, PushLimitThroughMarkDistinct could come together with MultipleDistinctAggregationToMarkDistinct?

Member

maybe it shouldn't be here? instead, PushLimitThroughMarkDistinct could come together with MultipleDistinctAggregationToMarkDistinct?

per @kasiafi

After removing MultipleDistinctAggregationToMarkDistinct from an IterativeOptimizer, there's no way that a MarkDistinctNode could be present at that stage, so the rule PushLimitThroughMarkDistinct can be also removed. However, I'd keep it there next to other PushLimit rules in case it's needed again in the future. Actually, the PushLimit rules could be extracted like projectionPushdownRules.

Member

never mind. Reworking after chat with @kasiafi

Member

We had MultipleDistinctAggregationToMarkDistinct once and now we have it twice. What's the reason?

Member

Does not make sense.

@losipiuk
Member

@kasiafi done (hopefully) as we discussed. Much simpler. Thanks. Let's see what CI says.

Member

Suggested change:

-    .addAll(limitPushdownRules)
+    .addAll(limitPushdownRules) // including through MarkDistinct

Member

@losipiuk Aug 31, 2021

Actually, having this here results in test failures. @kasiafi looked deeply into that; the problem comes from an extra call of PushLimitThroughOuterJoin and the fact that applyJoin will not consume the same limit twice.
So without this, the optimization went:

... Limit -> OuterJoin -> Scan(no limit)
(PushLimitThroughOuterJoin)
... Limit -> OuterJoin -> Limit -> Scan(no limit)
(PushLimitIntoTableScan)
... Limit -> OuterJoin -> Scan(with limit)

now the join is pushable to the table scan

With this extra line the optimization continues from where it completed previously:

... Limit -> OuterJoin -> Scan(with limit)
(PushLimitThroughOuterJoin)
... Limit -> OuterJoin -> Limit -> Scan(with limit)
(PushLimitIntoTableScan; applyJoin returns Optional.empty() because of https://github.com/trinodb/trino/blob/408476e5d4067db61324f773510c27731373b91b/plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/DefaultJdbcMetadata.java#L459-L461)
... Limit -> OuterJoin -> Limit -> Scan(with limit)

!!!!! join no longer pushable to table scan

This could be addressed with #9056 but I am not super convinced this is the way to go.
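(For anyone following along, the stuck state above can be reproduced with a toy model of the rule sequence. The plan is just a tuple of node names, outermost first; the rule names mirror Trino's but the matching logic is heavily simplified and hypothetical:)

```python
def limit_through_outer_join(plan):
    # Pattern-based: fires whenever a Limit sits directly above an
    # OuterJoin with no Limit directly below the join -- a limit already
    # fused into the scan is invisible to this pattern, so it re-fires.
    if plan[:2] == ("Limit", "OuterJoin") and plan[2] != "Limit":
        return plan[:2] + ("Limit",) + plan[2:]
    return None

def limit_into_table_scan(plan):
    # Mirrors applyLimit returning Optional.empty() when the scan
    # already carries a limit: only a bare Scan accepts one.
    if plan[-2:] == ("Limit", "Scan"):
        return plan[:-2] + ("Scan+Limit",)
    return None

def join_into_table_scan(plan):
    # Join pushdown only matches an OuterJoin sitting directly on a scan.
    if plan[-2] == "OuterJoin" and plan[-1].startswith("Scan"):
        return plan[:-2] + ("Scan+Join",)
    return None

# First round: limit pushdown completes and the join becomes pushable.
p = ("Limit", "OuterJoin", "Scan")
p = limit_through_outer_join(p)   # Limit -> OuterJoin -> Limit -> Scan
p = limit_into_table_scan(p)      # Limit -> OuterJoin -> Scan+Limit
assert join_into_table_scan(p) == ("Limit", "Scan+Join")

# With the extra limit rules in the later round: the re-fired rule
# leaves a Limit the scan refuses, and that Limit blocks join pushdown.
q = limit_through_outer_join(p)   # Limit -> OuterJoin -> Limit -> Scan+Limit
assert limit_into_table_scan(q) is None   # applyLimit -> Optional.empty()
assert join_into_table_scan(q) is None    # join no longer pushable
```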

Member

@findepi, @kasiafi I suggest we go without limit pushdown here for now. It should not matter; whatever was to be pushed down should have been already.
Do you feel it is good to be merged?

Member

The change in PlanOptimizers looks OK.

Member

... so that multiple distinct aggregations can be pushed into a connector

Member

I wonder whether we actually considered solving the problem differently: instead of changing rule order, we could have a rule that matches MarkDistinct and invokes aggregation pushdown.

Trying this out could be a follow-up. If successful, we could simplify this class, presumably even rolling the changes back.

Member

MarkDistinct is not always a result of the MultipleDistinctAggregationToMarkDistinct rule. And when it is, it could be hard to identify the original aggregations because of intervening rules.
Also, applying the MultipleDistinctAggregationToMarkDistinct rule later might be beneficial from the viewpoint of decorrelation.

@losipiuk force-pushed the pushdown-count-distinct branch from e13aba1 to 5d919d9 on September 1, 2021
@losipiuk force-pushed the pushdown-count-distinct branch from 5d919d9 to 1d9830c on September 1, 2021
@losipiuk force-pushed the pushdown-count-distinct branch from 1d9830c to 18444e5 on September 2, 2021
@losipiuk merged commit d55bb1c into trinodb:master on Sep 2, 2021
@ebyhr added this to the 362 milestone on Sep 13, 2021
