Split multiple distinct aggregations to sub queries #22355
Merged: raunaqmorarka merged 4 commits into trinodb:master on Jul 26, 2024
Branch updated from 7063e24 to db41f8a.
lukasz-stec approved these changes on Jun 11, 2024.
Resolved (outdated) review threads on:
...in/src/main/java/io/trino/sql/planner/iterative/rule/DistinctAggregationStrategyChooser.java
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java
.../main/java/io/trino/sql/planner/iterative/rule/MultipleDistinctAggregationsToSubqueries.java
...t/java/io/trino/sql/planner/iterative/rule/TestMultipleDistinctAggregationsToSubqueries.java
Branch updated from 7023050 to fa6b76a.
Resolved (outdated) review thread on:
core/trino-main/src/test/java/io/trino/sql/planner/iterative/rule/test/PlanBuilder.java
raunaqmorarka approved these changes on Jun 17, 2024.
Branch updated from 7d1da53 to 51cf7ad.
raunaqmorarka approved these changes on Jul 25, 2024.
Resolved (outdated) review threads on:
core/trino-main/src/test/java/io/trino/connector/MockConnector.java
testing/trino-tests/src/test/java/io/trino/tests/TestDistinctToSubqueriesAggregations.java (2 threads)
core/trino-main/src/test/java/io/trino/connector/MockConnectorFactory.java (2 threads)
The rule splits distinct aggregations on different arguments into separate subqueries and joins the grouped results on the grouping keys, if any. This allows SingleDistinctAggregationToGroupBy to kick in for each subquery, which can significantly improve parallelism and performance when the grouped query is cheap to duplicate.
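To make the rewrite concrete, here is a minimal in-memory Java sketch. It is not Trino code; the class, the `Row` record, and the sample data are hypothetical. It evaluates `SELECT k, count(DISTINCT a), count(DISTINCT b) FROM t GROUP BY k` directly, then evaluates the split form: one single-distinct subquery per argument, each computed as a deduplicating `GROUP BY k, col` followed by a count per `k`, with the results joined on `k`. The sketch assumes non-null values and ignores SQL NULL semantics.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

public class DistinctToSubqueriesSketch {
    // Hypothetical input row: grouping key k and two non-null columns a and b.
    record Row(String k, int a, int b) {}

    // Original query: SELECT k, count(DISTINCT a), count(DISTINCT b) FROM t GROUP BY k
    static Map<String, long[]> direct(List<Row> rows) {
        Map<String, Set<Integer>> distinctA = new TreeMap<>();
        Map<String, Set<Integer>> distinctB = new TreeMap<>();
        for (Row r : rows) {
            distinctA.computeIfAbsent(r.k(), key -> new HashSet<>()).add(r.a());
            distinctB.computeIfAbsent(r.k(), key -> new HashSet<>()).add(r.b());
        }
        Map<String, long[]> result = new TreeMap<>();
        for (String k : distinctA.keySet()) {
            result.put(k, new long[] {distinctA.get(k).size(), distinctB.get(k).size()});
        }
        return result;
    }

    // One subquery per distinct argument: SELECT k, count(DISTINCT col) FROM t GROUP BY k.
    // With a single distinct argument it can be evaluated as two plain GROUP BYs.
    static Map<String, Long> singleDistinct(List<Row> rows, Function<Row, Integer> column) {
        // Inner GROUP BY k, col: deduplicate (k, col) pairs.
        Set<Map.Entry<String, Integer>> pairs = new HashSet<>();
        for (Row r : rows) {
            pairs.add(Map.entry(r.k(), column.apply(r)));
        }
        // Outer GROUP BY k: count the surviving pairs.
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), 1L, Long::sum);
        }
        return counts;
    }

    // Rewritten query: run one subquery per distinct argument and join the results on k.
    static Map<String, long[]> splitAndJoin(List<Row> rows) {
        Map<String, Long> countA = singleDistinct(rows, Row::a);
        Map<String, Long> countB = singleDistinct(rows, Row::b);
        Map<String, long[]> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : countA.entrySet()) {
            // Both subqueries read the same source, so every k appears in both maps.
            result.put(e.getKey(), new long[] {e.getValue(), countB.get(e.getKey())});
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("x", 1, 10), new Row("x", 1, 20), new Row("x", 2, 20),
                new Row("y", 3, 30), new Row("y", 3, 30));
        Map<String, long[]> expected = direct(rows);
        Map<String, long[]> rewritten = splitAndJoin(rows);
        for (String k : expected.keySet()) {
            // Prints "x direct=[2, 2] split=[2, 2]" and "y direct=[1, 1] split=[1, 1]".
            System.out.println(k + " direct=" + Arrays.toString(expected.get(k))
                    + " split=" + Arrays.toString(rewritten.get(k)));
        }
    }
}
```

Each subquery scans the source again, which is why the rewrite pays off only when the source is cheap to duplicate.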
Make the distinct aggregation strategy choices mutually exclusive, so that the order of optimizer rules does not matter.
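Why exclusivity removes the ordering dependence can be shown with a small Java sketch. The strategy names, the `Aggregation` record, and the chooser logic below are hypothetical and only loosely modeled on the rules discussed here. Every rule consults one shared, deterministic chooser and fires only when the chooser picked that rule's own strategy, so at most one rule can match any given aggregation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ExclusiveStrategySketch {
    // Hypothetical strategy names, for illustration only.
    enum Strategy { MARK_DISTINCT, SPLIT_TO_SUBQUERIES }

    // Hypothetical summary of an aggregation node: is its source cheap to duplicate?
    record Aggregation(boolean sourceCheapToDuplicate) {}

    // One deterministic chooser shared by all rules: exactly one strategy per aggregation.
    static Strategy choose(Aggregation aggregation) {
        return aggregation.sourceCheapToDuplicate() ? Strategy.SPLIT_TO_SUBQUERIES : Strategy.MARK_DISTINCT;
    }

    // A rule fires only when the shared chooser picked that rule's own strategy.
    record Rule(Strategy strategy) {
        boolean matches(Aggregation aggregation) {
            return choose(aggregation) == strategy;
        }
    }

    // Try the rules in the given order and return the strategy of the first one that fires.
    static Strategy firstFiring(List<Rule> rules, Aggregation aggregation) {
        for (Rule rule : rules) {
            if (rule.matches(aggregation)) {
                return rule.strategy();
            }
        }
        throw new IllegalStateException("no rule fired");
    }

    public static void main(String[] args) {
        List<Rule> forward = List.of(new Rule(Strategy.MARK_DISTINCT), new Rule(Strategy.SPLIT_TO_SUBQUERIES));
        List<Rule> reversed = new ArrayList<>(forward);
        Collections.reverse(reversed);
        for (boolean cheap : new boolean[] {true, false}) {
            Aggregation aggregation = new Aggregation(cheap);
            // Same result for both rule orders.
            System.out.println(firstFiring(forward, aggregation) + " " + firstFiring(reversed, aggregation));
        }
    }
}
```

If each rule instead used its own independent predicate (for example, one rule matching any aggregation with several distinct arguments), two rules could match the same node and whichever ran first would win, making the plan depend on rule order.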
Make MultipleDistinctAggregationsToSubqueries fire when distinct_aggregations_strategy=AUTOMATIC and statistics give confidence that the rule will be beneficial. The aggregation source is limited to table scan, filter, and project nodes.
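The source-shape restriction can be sketched as a recursive check over a plan tree. This is a hypothetical Java model, not Trino's `PlanNode` classes: the node records, `shouldSplit`, and the group-count threshold are all assumptions. In particular, the PR does not describe the actual statistics heuristic, so the threshold only stands in for "stats are known and favorable"; missing stats make the gate refuse to fire.

```java
import java.util.OptionalDouble;

public class AutomaticGateSketch {
    // Hypothetical, minimal plan-node model; these are not Trino's PlanNode classes.
    interface Node {}
    record TableScan(String table) implements Node {}
    record Filter(Node source) implements Node {}
    record Project(Node source) implements Node {}
    record Join(Node left, Node right) implements Node {}

    // True only if the subtree consists of table scan, filter, and project nodes.
    static boolean isSimpleSource(Node node) {
        if (node instanceof TableScan) {
            return true;
        }
        if (node instanceof Filter filter) {
            return isSimpleSource(filter.source());
        }
        if (node instanceof Project project) {
            return isSimpleSource(project.source());
        }
        return false; // joins or any other node disqualify the source
    }

    // Hypothetical AUTOMATIC gate: fire only for a simple source with known, favorable stats.
    // The group-count threshold is a placeholder, not the rule's real cost model.
    static boolean shouldSplit(Node source, OptionalDouble estimatedGroups, double maxGroups) {
        return isSimpleSource(source)
                && estimatedGroups.isPresent() // unknown stats: not confident, do not fire
                && estimatedGroups.getAsDouble() <= maxGroups;
    }

    public static void main(String[] args) {
        Node simple = new Project(new Filter(new TableScan("orders")));
        Node joined = new Project(new Join(new TableScan("orders"), new TableScan("customer")));
        System.out.println(shouldSplit(simple, OptionalDouble.of(10), 1_000));  // true
        System.out.println(shouldSplit(simple, OptionalDouble.empty(), 1_000)); // false: no stats
        System.out.println(shouldSplit(joined, OptionalDouble.of(10), 1_000));  // false: join in source
    }
}
```

Restricting the source to scans, filters, and projections keeps the cost of reading it once per subquery predictable, which is what makes a stats-based decision trustworthy.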
raunaqmorarka approved these changes on Jul 26, 2024.
Description
Introduce a new rule that splits distinct aggregations on different arguments into subqueries and joins the grouped results on the grouping keys, if any.
This allows SingleDistinctAggregationToGroupBy to kick in and significantly improve parallelism and performance when the grouped query is cheap to duplicate.
Benchmarks

Some queries (a simple aggregation on top of a table scan, with a low-cardinality GROUP BY) improve significantly, for example 5s with the new rule versus 40s using MarkDistinct.
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text: