-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Remove redundant distinct over group by #18512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
kaikalur
merged 1 commit into
prestodb:master
from
feilong-liu:remove_distinct_over_group_by
Oct 21, 2022
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
44 changes: 44 additions & 0 deletions
44
...n/java/com/facebook/presto/benchmark/SqlRemoveRedundantDistinctAggregationBenchmarks.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| /* | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| package com.facebook.presto.benchmark; | ||
|
|
||
| import com.facebook.airlift.log.Logger; | ||
| import com.facebook.presto.testing.LocalQueryRunner; | ||
| import com.google.common.collect.ImmutableMap; | ||
| import org.intellij.lang.annotations.Language; | ||
|
|
||
| import java.util.Map; | ||
|
|
||
| import static com.facebook.presto.benchmark.BenchmarkQueryRunner.createLocalQueryRunner; | ||
|
|
||
| public class SqlRemoveRedundantDistinctAggregationBenchmarks | ||
| extends AbstractSqlBenchmark | ||
| { | ||
| private static final Logger LOGGER = Logger.get(SqlRewriteConditionalAggregationBenchmarks.class); | ||
|
|
||
| public SqlRemoveRedundantDistinctAggregationBenchmarks(LocalQueryRunner localQueryRunner, @Language("SQL") String sql) | ||
| { | ||
| super(localQueryRunner, "remove_redundant_distinct_aggregation", 10, 20, sql); | ||
| } | ||
|
|
||
| public static void main(String[] args) | ||
| { | ||
| Map<String, String> disableOptimization = ImmutableMap.of("remove_redundant_distinct_aggregation_enabled", "false"); | ||
| String sql = "select distinct orderkey, partkey, suppkey, avg(extendedprice) from lineitem group by orderkey, partkey, suppkey"; | ||
| LOGGER.info("Without optimization"); | ||
| new SqlRemoveRedundantDistinctAggregationBenchmarks(createLocalQueryRunner(disableOptimization), sql).runBenchmark(new SimpleLineBenchmarkResultWriter(System.out)); | ||
| LOGGER.info("With optimization"); | ||
| new SqlRemoveRedundantDistinctAggregationBenchmarks(createLocalQueryRunner(), sql).runBenchmark(new SimpleLineBenchmarkResultWriter(System.out)); | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
142 changes: 142 additions & 0 deletions
142
...ava/com/facebook/presto/sql/planner/optimizations/RemoveRedundantDistinctAggregation.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,142 @@ | ||
| /* | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| package com.facebook.presto.sql.planner.optimizations; | ||
|
|
||
| import com.facebook.presto.Session; | ||
| import com.facebook.presto.spi.WarningCollector; | ||
| import com.facebook.presto.spi.plan.AggregationNode; | ||
| import com.facebook.presto.spi.plan.PlanNode; | ||
| import com.facebook.presto.spi.plan.PlanNodeIdAllocator; | ||
| import com.facebook.presto.spi.plan.ProjectNode; | ||
| import com.facebook.presto.spi.relation.VariableReferenceExpression; | ||
| import com.facebook.presto.sql.planner.PlanVariableAllocator; | ||
| import com.facebook.presto.sql.planner.TypeProvider; | ||
| import com.facebook.presto.sql.planner.plan.InternalPlanVisitor; | ||
| import com.google.common.collect.ImmutableList; | ||
|
|
||
| import java.util.List; | ||
| import java.util.Set; | ||
|
|
||
| import static com.facebook.presto.SystemSessionProperties.isRemoveRedundantDistinctAggregationEnabled; | ||
| import static com.facebook.presto.spi.plan.AggregationNode.isDistinct; | ||
| import static com.facebook.presto.sql.planner.plan.ChildReplacer.replaceChildren; | ||
| import static com.google.common.collect.ImmutableList.toImmutableList; | ||
| import static com.google.common.collect.ImmutableSet.toImmutableSet; | ||
| import static java.util.Objects.requireNonNull; | ||
|
|
||
| /** | ||
| * Remove the redundant distinct if output is already distinct. | ||
| * For example, for query select distinct k, sum(x) from table group by k | ||
| * The plan will change | ||
| * <p> | ||
| * From: | ||
| * <pre> | ||
| * - Aggregation group by k, sum | ||
| * - Aggregation (sum <- AGG(x)) group by k | ||
| * </pre> | ||
| * To: | ||
| * <pre> | ||
| * - Aggregation (sum <- AGG(x)) group by k | ||
| * </pre> | ||
| * <p> | ||
| */ | ||
| public class RemoveRedundantDistinctAggregation | ||
| implements PlanOptimizer | ||
| { | ||
| @Override | ||
| public PlanNode optimize(PlanNode plan, Session session, TypeProvider types, PlanVariableAllocator variableAllocator, PlanNodeIdAllocator idAllocator, WarningCollector warningCollector) | ||
| { | ||
| if (isRemoveRedundantDistinctAggregationEnabled(session)) { | ||
| PlanWithProperties result = new RemoveRedundantDistinctAggregation.Rewriter().accept(plan); | ||
| return result.getNode(); | ||
| } | ||
| return plan; | ||
| } | ||
|
|
||
| private static class PlanWithProperties | ||
| { | ||
| private final PlanNode node; | ||
| // Variables in each set combines to be distinct in the output of the plan node. | ||
| private final List<Set<VariableReferenceExpression>> distinctVariableSet; | ||
|
|
||
| public PlanWithProperties(PlanNode node, List<Set<VariableReferenceExpression>> distinctVariableSet) | ||
| { | ||
| this.node = requireNonNull(node, "node is null"); | ||
| this.distinctVariableSet = requireNonNull(distinctVariableSet, "StreamProperties is null"); | ||
| } | ||
|
|
||
| public PlanNode getNode() | ||
| { | ||
| return node; | ||
| } | ||
|
|
||
| public List<Set<VariableReferenceExpression>> getProperties() | ||
| { | ||
| return distinctVariableSet; | ||
| } | ||
| } | ||
|
|
||
| private static class Rewriter | ||
| extends InternalPlanVisitor<PlanWithProperties, Void> | ||
| { | ||
| @Override | ||
| public PlanWithProperties visitPlan(PlanNode node, Void context) | ||
| { | ||
| // For nodes such as join, unnest etc. the distinct properties may be violated, hence pass empty list for these cases. | ||
| return planAndRecplace(node, false); | ||
| } | ||
|
|
||
| @Override | ||
| public PlanWithProperties visitAggregation(AggregationNode node, Void context) | ||
| { | ||
| PlanWithProperties child = accept(node.getSource()); | ||
| if (isDistinct(node) && child.getProperties().stream().anyMatch(node.getGroupingKeys()::containsAll)) { | ||
| return child; | ||
| } | ||
| ImmutableList.Builder<Set<VariableReferenceExpression>> properties = ImmutableList.builder(); | ||
| // Only do it for aggregations with one single grouping set | ||
| if (node.getGroupingSetCount() == 1 && !node.getGroupingKeys().isEmpty()) { | ||
| properties.add(node.getGroupingKeys().stream().collect(toImmutableSet())); | ||
| } | ||
| PlanNode newAggregation = node.replaceChildren(ImmutableList.of(child.getNode())); | ||
| return new PlanWithProperties(newAggregation, properties.build()); | ||
| } | ||
|
|
||
| @Override | ||
| public PlanWithProperties visitProject(ProjectNode node, Void context) | ||
| { | ||
| return planAndRecplace(node, true); | ||
| } | ||
|
|
||
| private PlanWithProperties accept(PlanNode node) | ||
| { | ||
| PlanWithProperties result = node.accept(this, null); | ||
| return new PlanWithProperties( | ||
| result.getNode().assignStatsEquivalentPlanNode(node.getStatsEquivalentPlanNode()), | ||
| result.getProperties()); | ||
| } | ||
|
|
||
| private PlanWithProperties planAndRecplace(PlanNode node, boolean passProperties) | ||
| { | ||
| List<PlanWithProperties> children = node.getSources().stream().map(this::accept).collect(toImmutableList()); | ||
| PlanNode result = replaceChildren(node, children.stream().map(PlanWithProperties::getNode).collect(toImmutableList())); | ||
| if (!passProperties) { | ||
| return new PlanWithProperties(result, ImmutableList.of()); | ||
| } | ||
| ImmutableList.Builder<Set<VariableReferenceExpression>> properties = ImmutableList.builder(); | ||
| children.stream().map(PlanWithProperties::getProperties).forEach(properties::addAll); | ||
| return new PlanWithProperties(result, properties.build()); | ||
| } | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we sure we want to enable it by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is a very general optimization that should always help and we are being very conservative and also adding more tests for making sure it works for different patterns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main worry is that if there is a bug in the code and it would big pain to do fixes in prod. If we have high confidence with full correctness in verifiers, I'm also ok either way.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - we will do some targeted verifier runs as the pattern is relatively easy to look for in the logs.
Main issue is if we add something turned off we rarely turn it on - e.g optimize_nulls_in_join has been there for ~2 years but we never turned it on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I will run verifier test and report here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get about 40 queries which trigger this optimization, with most queries showing about 20% reduction in cpu time.