Remove Limit/TopN/Sort/DistinctLimit node if it's source is a scalar by Praveen2112 · Pull Request #441 · trinodb/trino

Praveen2112 · 2019-03-11T11:53:00Z

No description provided.

Praveen2112 · 2019-03-11T11:53:17Z

@martint, @findepi, @kokosing Can you please review this?

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneLimitOverScalar.java

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneTopNOverScalar.java

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneSortOverScalar.java

Praveen2112 · 2019-03-11T13:51:37Z

@findepi Have updated it. Please let me know if there are any more changes.

kokosing

There is also DistinctLimitNode

kokosing · 2019-03-11T14:20:46Z

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneLimitOverScalar.java

Use https://github.com/prestosql/presto/blob/master/presto-main/src/main/java/io/prestosql/sql/planner/plan/Patterns.java#L239?

kokosing · 2019-03-11T14:22:42Z

presto-main/src/main/java/io/prestosql/sql/planner/PlanOptimizers.java

Update the commit message, here and in other commits.

kokosing · 2019-03-11T14:23:06Z

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneTopNOverScalar.java

Remove comments that do not add value to the code; the name PruneTopNOverScalar says the same.

martint · 2019-03-11T17:46:17Z

presto-main/src/test/java/io/prestosql/sql/planner/iterative/rule/TestPruneLimitOverScalar.java

See #437 (comment)

martint · 2019-03-11T17:46:28Z

presto-main/src/test/java/io/prestosql/sql/planner/iterative/rule/TestPruneLimitOverScalar.java

martint · 2019-03-11T17:46:35Z

presto-main/src/test/java/io/prestosql/sql/planner/iterative/rule/TestPruneLimitOverScalar.java

martint · 2019-03-11T17:48:47Z

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneLimitOverScalar.java

"... when the subplan is guaranteed to produce fewer rows than the limit"

martint · 2019-03-11T17:49:27Z

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneLimitOverScalar.java

I'd call this "RemoveRedundantLimit"

Praveen2112 · 2019-03-12T12:48:28Z

@findepi, @martint, @kokosing I have updated the code. Please let me know if there are any changes.

presto-tests/src/main/java/io/prestosql/tests/AbstractTestQueries.java

kokosing · 2019-03-12T13:34:31Z

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/RemoveRedundantLimit.java

What is wrong with limitCount == 0?

Since both the RemoveRedundantLimit and EvaluateZeroLimit rules are in the same optimizer, we don't want a LimitNode with a count of 0 to be removed when its source is scalar, because the EvaluateZeroLimit rule might then not fire.

Maybe we could merge these two rules together; they are very simple and they both remove a redundant limit.
Another option is to add a third rule that replaces any plan for which isAtMost(node, context.getLookup(), 0) holds with Values. Then the order would not matter.

Another option is to add a third rule that replaces any plan for which isAtMost(node, context.getLookup(), 0) holds with Values. Then the order would not matter.

I like that.

@kokosing, @martint I have merged the two rules.
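A rough sketch of what a merged rule could decide, written against a toy model rather than the real planner classes. The class name, the `Action` enum, and the `maxSourceRows` parameter are all hypothetical; in the actual rule the row bound would come from a cardinality check such as the `isAtMost` helper mentioned above.

```java
// Toy decision logic only; this is not the Presto Rule API.
final class RedundantLimitSketch
{
    private RedundantLimitSketch() {}

    // Hypothetical outcomes of a merged "remove redundant limit" rule.
    enum Action { REPLACE_WITH_EMPTY_VALUES, REMOVE_LIMIT, KEEP_LIMIT }

    // maxSourceRows is an assumed upper bound on the rows the source can
    // produce; a negative value stands for "unknown".
    static Action decide(long limit, long maxSourceRows)
    {
        // Check LIMIT 0 first, so a single-row source never hides it.
        if (limit == 0) {
            return Action.REPLACE_WITH_EMPTY_VALUES;
        }
        // The source can never exceed the limit, so the limit does nothing.
        if (maxSourceRows >= 0 && maxSourceRows <= limit) {
            return Action.REMOVE_LIMIT;
        }
        return Action.KEEP_LIMIT;
    }
}
```

Handling the zero case first inside one rule is what makes the firing order between two separate rules irrelevant.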

presto-main/src/main/java/io/prestosql/sql/planner/PlanOptimizers.java

martint

A few comments. Also, can you add some tests similar to those in presto-main/src/test/java/io/prestosql/sql/query ?

martint · 2019-03-12T22:03:33Z

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/RemoveRedundantTopN.java

TopN cannot be removed blindly. The ordering matters. You can replace it with a SortNode in this case.

It's only safe to remove if the row count is guaranteed to be 1.
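To illustrate the point about ordering, here is a small sketch that models TopN and Sort on plain Java lists instead of plan nodes. The class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Models TopN and Sort on in-memory lists; not planner code.
final class TopNOrderingSketch
{
    private TopNOrderingSketch() {}

    // TopN: order the rows ascending, then keep at most the first n.
    static List<Integer> topN(List<Integer> rows, int n)
    {
        List<Integer> sorted = sort(rows);
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    // Sort: order the rows ascending and keep all of them.
    static List<Integer> sort(List<Integer> rows)
    {
        List<Integer> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);
        return sorted;
    }
}
```

With rows `[3, 1, 2]` and N = 10, `topN` returns the same list as `sort`, which differs from the unsorted input, so dropping the node entirely would change the result. With a single row, the input, `sort`, and `topN` all agree, which is the only case where removal is safe.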

martint · 2019-03-12T22:07:14Z

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneSortOverScalar.java

Rename to RemoveSingleRowSort. "scalar" is not a proper classification for a subquery -- it's a feature of the context in which it's used. I.e., "in a place that expects a scalar value". (The "isScalar" method in QueryCardinalityUtil is misnamed)

martint · 2019-03-12T22:09:28Z

...main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneDistinctLimitOverScalar.java

Rename to RemoveSingleRowDistinctLimit

kokosing · 2019-03-13T09:51:59Z

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/RemoveRedundantLimit.java

Maybe we could merge these two rules together; they are very simple and they both remove a redundant limit.
Another option is to add a third rule that replaces any plan for which isAtMost(node, context.getLookup(), 0) holds with Values. Then the order would not matter.

kokosing · 2019-03-13T09:56:22Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

Another approach would be to use something similar to io.prestosql.sql.planner.TestLogicalPlanner#assertPlanContainsNoApplyOrAnyJoin, where you would check that there is no Limit or TopN in the plan. The plan assertion is a bit simpler that way. The reasoning behind an anyNot that wraps anyTree might not be trivial, and I am not sure the pattern is correct.

kokosing · 2019-03-13T09:56:59Z

presto-main/src/test/java/io/prestosql/sql/query/TestSubqueries.java

Please uppercase SQL keywords, here and in the other tests below, as a separate commit before this change.

kokosing · 2019-03-13T09:58:48Z

presto-tests/src/main/java/io/prestosql/tests/AbstractTestQueries.java

Nice, you have just extended support for correlated subqueries a bit ;)

kokosing · 2019-03-13T09:59:19Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

same comments

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

kokosing · 2019-03-13T10:02:10Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

same comments

kokosing · 2019-03-13T10:03:16Z

...main/src/main/java/io/prestosql/sql/planner/iterative/rule/RemoveSingleRowDistinctLimit.java

You could extend this rule to support the regular MarkDistinctNode as well.

kokosing · 2019-03-13T10:08:09Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

DistinctLimitNode was not pruned.

But the subplan here is not a scalar

Right, I overlooked GROUP BY.

kokosing · 2019-03-13T10:08:55Z

...main/src/main/java/io/prestosql/sql/planner/iterative/rule/RemoveSingleRowDistinctLimit.java

I think you are losing output symbols here. See hashSymbol in DistinctLimitNode. Also notice that DistinctLimitNode::getOutputSymbols returns distinctSymbols, which might differ from node.getSource().getOutputSymbols().

I wonder why the tests didn't find that already, so please make sure that there is test coverage for it. Can you please run your test from TestLogicalPlanner with coverage or debugging to see if your rule was triggered?

Yes, but the hashSymbol in DistinctLimitNode will be added only if we set optimize_hash_generation to true, and IIRC it will be added in HashGenerationOptimizer, which is invoked after this optimizer. So we can safely assume that hashSymbol will be empty.

Then please verify in the rule that hashSymbol is empty. Also verify that DistinctLimitNode::getOutputSymbols is the same as node.getSource().getOutputSymbols().
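The two checks requested here could look roughly like the guard below. It is a sketch over plain strings and lists, not the real DistinctLimitNode; the class name, method name, and parameters are hypothetical, and comparing the symbol lists in order is an assumption (the actual rule might compare them differently).

```java
import java.util.List;
import java.util.Optional;

// Toy guard; symbols are modelled as plain strings.
final class DistinctLimitGuardSketch
{
    private DistinctLimitGuardSketch() {}

    // Returns true only when replacing the node with its source cannot
    // drop or change any output symbol.
    static boolean canReplaceWithSource(
            Optional<String> hashSymbol,
            List<String> distinctSymbols,
            List<String> sourceOutputSymbols)
    {
        // A precomputed hash symbol would be an extra input to account for.
        if (hashSymbol.isPresent()) {
            return false;
        }
        // The node outputs its distinct symbols, so they must match the
        // source outputs (assumed here: same symbols in the same order).
        return distinctSymbols.equals(sourceOutputSymbols);
    }
}
```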

kokosing · 2019-03-14T09:34:29Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

Also check there is no MarkDistinctNode. You could also extract a method from this and reuse in assertion below, like:

assertFalse(planContainsDistinctNode("SELECT distinct(c) FROM (SELECT count(*) as c FROM orders) LIMIT 10"));
assertTrue(planContainsDistinctNode("SELECT distinct(c) FROM (SELECT count(*) as c FROM orders GROUP BY orderkey) LIMIT 10"));

Please do the same for TopN and Sort.
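A helper like the suggested planContainsDistinctNode boils down to a recursive search over the plan tree. The sketch below uses a hypothetical `Node` class with a string type tag instead of the real plan node classes, so it only shows the shape of such a check.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy plan tree search; Node is a stand-in, not a Presto class.
final class PlanSearchSketch
{
    private PlanSearchSketch() {}

    static final class Node
    {
        final String type;
        final List<Node> sources;

        Node(String type, Node... sources)
        {
            this.type = type;
            this.sources = Arrays.asList(sources);
        }
    }

    // Depth-first search: true if any node in the tree matches.
    static boolean contains(Node root, Predicate<Node> predicate)
    {
        if (predicate.test(root)) {
            return true;
        }
        for (Node source : root.sources) {
            if (contains(source, predicate)) {
                return true;
            }
        }
        return false;
    }
}
```

A test can then assert the absence of one node type and the presence of another with the same helper, which keeps each assertion to a single line.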

kokosing · 2019-03-14T09:35:45Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

Right, I overlooked GROUP BY.

kokosing · 2019-03-14T09:39:41Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

Unexpected node for the above query -> format("Unexpected sort node for query: '%s'", query). Do the same for all below and above.

martint

@Praveen2112, I think there are still some comments that need to be addressed.

martint · 2019-04-29T22:30:01Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

Typo: "sor"

kokosing · 2019-04-30T07:28:58Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

static import for OPTIMIZED

kokosing · 2019-04-30T07:29:45Z

presto-main/src/test/java/io/prestosql/sql/query/TestSubqueries.java

this should belong to previous commit

kokosing · 2019-04-30T07:30:52Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

can you also test SELECT * FROM (VALUES 1,2,3,4,5,6) LIMIT 10?

kokosing · 2019-04-30T07:31:56Z

presto-main/src/test/java/io/prestosql/sql/query/TestSubqueries.java

// cannot enforce LIMIT on correlated subquery

kokosing · 2019-04-30T07:38:25Z

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/ReplaceTopNWithSort.java

shouldn't you use isAtMost here?

kokosing · 2019-04-30T07:39:11Z

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/ReplaceTopNWithSort.java

Yes, but inlining this here would save us one iterative optimizer loop. Also I think you could handle the case with limit = 0 here.

However, as you pointed it could be a matter of taste. Up to you.

kokosing · 2019-04-30T07:43:19Z

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/RemoveSingleRowSort.java

also handle the case where cardinality is 0

kokosing · 2019-04-30T07:46:25Z

presto-main/src/main/java/io/prestosql/sql/planner/PlanOptimizers.java

Also, DistinctLimit with limit higher than cardinality of its source node can be rewritten to DistinctNode

Will merge that feature to RemoveSingleRowDistinctLimit

kokosing · 2019-05-06T09:26:03Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

nit: This could be extracted as separate commit.

kokosing · 2019-05-06T09:28:29Z

presto-main/src/main/java/io/prestosql/sql/planner/PlanOptimizers.java

Commit message

Prune unnecessary TopNNode

Replace TopN node:
1. With a Sort node when the subplan is guaranteed to produce fewer rows than N
2. With its source node when the subplan produces a single row
3. With a Values node when N is 0
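The three cases in this commit message can be read as an ordered decision. The sketch below is one possible reading over a toy model: the class, the `Rewrite` enum, and the `maxSourceRows` bound are hypothetical, and treating "at most N rows" the same as "fewer than N" is an assumption (a TopN over exactly N rows still returns all of them, sorted).

```java
// Toy decision logic; not the actual RemoveRedundantTopN rule.
final class RedundantTopNSketch
{
    private RedundantTopNSketch() {}

    enum Rewrite { EMPTY_VALUES, SOURCE, SORT, KEEP_TOPN }

    // maxSourceRows is an assumed upper bound on source rows;
    // a negative value stands for "unknown".
    static Rewrite decide(long n, long maxSourceRows)
    {
        if (n == 0) {
            return Rewrite.EMPTY_VALUES; // case 3: nothing can be returned
        }
        if (maxSourceRows < 0) {
            return Rewrite.KEEP_TOPN;    // no bound, no rewrite
        }
        if (maxSourceRows <= 1) {
            return Rewrite.SOURCE;       // case 2: nothing to order
        }
        if (maxSourceRows <= n) {
            return Rewrite.SORT;         // case 1: all rows kept, order still matters
        }
        return Rewrite.KEEP_TOPN;
    }
}
```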

kokosing · 2019-05-06T09:29:00Z

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/RemoveRedundantTopN.java

nit: else is redundant

kokosing · 2019-05-06T09:31:07Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

testRedundantTopNRemoval?

kokosing · 2019-05-06T09:32:01Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

can you please extract each test case as separate test method?

kokosing · 2019-05-06T09:32:07Z

presto-main/src/test/java/io/prestosql/sql/planner/iterative/rule/TestRemoveRedundantTopN.java

can you please extract each test case as separate test method?

kokosing · 2019-05-06T09:32:48Z

presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/RemoveSingleRowSort.java

kokosing · 2019-05-06T09:32:54Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

can you please extract each test case as separate test method?

kokosing · 2019-05-06T09:34:27Z

...main/src/main/java/io/prestosql/sql/planner/iterative/rule/RemoveRedundantDistinctLimit.java

Why is replacing Distinct with Aggregation better? Shouldn't you use a regular DistinctNode here?

Using a MarkDistinct node requires an additional FilterNode and ProjectNode, so I used the AggregationNode with no aggregation functions.

Yes, that's ok. MarkDistinct serves a different purpose. @kokosing, there's no explicit DistinctNode -- it's planned as a GROUP BY with no aggregation functions.

Sounds good.
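As a small illustration of why a GROUP BY with no aggregation functions behaves like DISTINCT, the sketch below groups rows by their full value and computes nothing per group. It works on plain strings and is not planner code; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Grouping with no aggregates, modelled on an in-memory list.
final class DistinctAsGroupBySketch
{
    private DistinctAsGroupBySketch() {}

    static List<String> distinctViaGroupBy(List<String> rows)
    {
        // One entry per group key; the value is a placeholder because
        // nothing is aggregated.
        Map<String, Boolean> groups = new LinkedHashMap<>();
        for (String row : rows) {
            groups.put(row, Boolean.TRUE);
        }
        // The group keys are exactly the distinct rows.
        return new ArrayList<>(groups.keySet());
    }
}
```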

kokosing · 2019-05-06T09:35:18Z

presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

can you please extract each test case as separate test method?

We have added one test method for each optimizer rule we implemented. So should we write each query pattern as a separate method?

…n is know to single row or less rows than requested Cherry-pick of trinodb/trino#441 Co-authored-by: praveenkrishna <praveenkrishna@tutanota.com>

In addition to the Cherry-pick for removing redundant Limit/TopN/Sort/DistinctLimit, there are a few more rules added to replace any input that is zero-TopN/DistinctLimit/Limit Cherry-pick of trinodb/trino#441 Co-authored-by: praveenkrishna <praveenkrishna@tutanota.com>

cla-bot bot added the cla-signed label Mar 11, 2019

findepi reviewed Mar 11, 2019

Praveen2112 force-pushed the scalar_query_simplification branch from f32461d to f11830c Compare March 11, 2019 13:49

kokosing reviewed Mar 11, 2019

martint requested changes Mar 11, 2019

Praveen2112 force-pushed the scalar_query_simplification branch 2 times, most recently from 53d2872 to 38f4a2a Compare March 12, 2019 12:46

Praveen2112 changed the title ~~Remove Limit/TopN/Sort node if it's source is a scalar~~ Remove Limit/TopN/Sort/DistinctLimit node if it's source is a scalar Mar 12, 2019

kokosing reviewed Mar 12, 2019

martint requested changes Mar 12, 2019

Praveen2112 force-pushed the scalar_query_simplification branch 2 times, most recently from 5b1c546 to 3ec0abb Compare March 13, 2019 09:43

kokosing reviewed Mar 13, 2019

Praveen2112 force-pushed the scalar_query_simplification branch 2 times, most recently from 6bd6771 to fd9e2a9 Compare March 14, 2019 03:16

kokosing reviewed Mar 14, 2019

Praveen2112 force-pushed the scalar_query_simplification branch from fd9e2a9 to 3105607 Compare March 16, 2019 14:27

martint mentioned this pull request Apr 11, 2019

Indicate progress only if limit is applied #618

Merged

martint self-requested a review April 22, 2019 22:07

Praveen2112 force-pushed the scalar_query_simplification branch 2 times, most recently from 582e861 to 354c49f Compare April 29, 2019 14:10

martint reviewed Apr 29, 2019

kokosing reviewed Apr 30, 2019

Praveen2112 force-pushed the scalar_query_simplification branch 2 times, most recently from febdab8 to e4d4d1b Compare May 5, 2019 05:26

kokosing reviewed May 6, 2019

martint self-requested a review May 6, 2019 23:08

ebyhr mentioned this pull request Mar 28, 2024

Order by limit sorting problem, is there any forced sorting configuration? #21300

Closed
