Remove Limit/TopN/Sort/DistinctLimit node if its source is a scalar #441
presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneLimitOverScalar.java
presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneTopNOverScalar.java
presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneSortOverScalar.java
f32461d to f11830c
@findepi I have updated it. Please let me know if there are any more changes.
kokosing left a comment
There is also DistinctLimitNode.
Update the commit message, here and in the other commits.
Remove comments that do not add value to the code; the name PruneTopNOverScalar already says the same.
"... when the subplan is guaranteed to produce fewer rows than the limit"
I'd call this "RemoveRedundantLimit"
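A minimal sketch of the condition quoted above, assuming a toy plan model: the Node, Values, Limit, GlobalAggregation and TableScan records are hypothetical stand-ins, not Presto's PlanNode classes, and maxCardinality is an illustrative helper. The Limit is replaced by its source only when the source's known maximum row count does not exceed the limit.

```java
import java.util.OptionalLong;

// Hypothetical, simplified plan model; these are not Presto's PlanNode classes.
interface Node {}

record Values(int rowCount) implements Node {}

record Limit(long count, Node source) implements Node {}

// Stands in for a global aggregation such as SELECT count(*) ..., which yields exactly one row.
record GlobalAggregation(Node source) implements Node {}

record TableScan(String table) implements Node {}

final class RemoveRedundantLimitSketch {
    private RemoveRedundantLimitSketch() {}

    // Upper bound on the rows a node can produce; empty means "no static bound known".
    static OptionalLong maxCardinality(Node node) {
        if (node instanceof Values values) {
            return OptionalLong.of(values.rowCount());
        }
        if (node instanceof GlobalAggregation) {
            return OptionalLong.of(1);
        }
        if (node instanceof Limit limit) {
            OptionalLong source = maxCardinality(limit.source());
            return OptionalLong.of(source.isPresent() ? Math.min(source.getAsLong(), limit.count()) : limit.count());
        }
        return OptionalLong.empty(); // e.g. a table scan
    }

    // Replace the Limit with its source when the source can never exceed the limit.
    // A LIMIT 0 over a one-row source is kept, because 1 > 0.
    static Node apply(Limit limit) {
        OptionalLong bound = maxCardinality(limit.source());
        if (bound.isPresent() && bound.getAsLong() <= limit.count()) {
            return limit.source();
        }
        return limit;
    }
}
```

Phrasing the check as "at most `count` rows" rather than "source is a scalar" means a LIMIT 0 over a single-row source is never dropped by this rule.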
53d2872 to 38f4a2a
presto-tests/src/main/java/io/prestosql/tests/AbstractTestQueries.java
What is wrong with limitCount == 0?
Since both the RemoveRedundantLimit and EvaluateZeroLimit rules are in the same optimizer, we don't want a LimitNode with a count of 0 to be removed when its source is a scalar, because then the EvaluateZeroLimit rule might not fire.
Maybe we could merge these two rules together; they are very simple and both remove a redundant limit.
Another option is to add a third function that replaces any plan for which isAtMost(node, context.getLookup(), 0) holds with a Values node. Then the order would not matter.
Another option is add third function which replaces any plan that isAtMost(node, context.getLookup(), 0) with Values. Then order would not bother.
I like that.
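A sketch of that third function, under the same kind of toy model (all records and helpers here are hypothetical; isAtMost below only imitates the idea of the isAtMost(node, context.getLookup(), 0) call mentioned above and ignores output symbols): any subplan proven to produce no rows is replaced with an empty Values node, so the result no longer depends on which Limit rule fires first.

```java
import java.util.OptionalLong;

// Hypothetical, simplified plan model; these are not Presto's PlanNode classes.
interface Node {}

record Values(int rowCount) implements Node {}

record Limit(long count, Node source) implements Node {}

record Filter(String predicate, Node source) implements Node {}

record TableScan(String table) implements Node {}

final class ReplaceEmptySubplanSketch {
    private ReplaceEmptySubplanSketch() {}

    static OptionalLong maxCardinality(Node node) {
        if (node instanceof Values values) {
            return OptionalLong.of(values.rowCount());
        }
        if (node instanceof Limit limit) {
            OptionalLong source = maxCardinality(limit.source());
            return OptionalLong.of(source.isPresent() ? Math.min(source.getAsLong(), limit.count()) : limit.count());
        }
        if (node instanceof Filter filter) {
            return maxCardinality(filter.source()); // a filter never adds rows
        }
        return OptionalLong.empty();
    }

    // Imitates the idea of isAtMost(node, lookup, n): true only when a bound is known.
    static boolean isAtMost(Node node, long maxRows) {
        OptionalLong bound = maxCardinality(node);
        return bound.isPresent() && bound.getAsLong() <= maxRows;
    }

    // Any subplan proven to be empty becomes an empty Values node.
    static Node apply(Node node) {
        return isAtMost(node, 0) ? new Values(0) : node;
    }
}
```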
presto-main/src/main/java/io/prestosql/sql/planner/PlanOptimizers.java
martint left a comment
A few comments. Also, can you add some tests similar to those in presto-main/src/test/java/io/prestosql/sql/query?
TopN cannot be removed blindly. The ordering matters. You can replace it with a SortNode in this case.
It's only safe to remove if the row count is guaranteed to be 1.
Rename to RemoveSingleRowSort. "Scalar" is not a proper classification for a subquery; it's a feature of the context in which the subquery is used, i.e., "in a place that expects a scalar value". (The "isScalar" method in QueryCardinalityUtil is misnamed.)
Rename to RemoveSingleRowDistinctLimit
5b1c546 to 3ec0abb
Another approach would be to use something similar to io.prestosql.sql.planner.TestLogicalPlanner#assertPlanContainsNoApplyOrAnyJoin, where you would check that there is no Limit or TopN in the plan. Plan assertion is a bit simpler that way. Reasoning about anyNot wrapping anyTree might not be trivial, and I am not sure the pattern is correct.
Please uppercase SQL keywords, here and in the other tests below, as a separate commit before this change.
Nice, you have just extended support for correlated subqueries a bit ;)
presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java
You could extend this rule to support a regular MarkDistinctNode as well.
DistinctLimitNode was not pruned.
But the subplan here is not a scalar.
I think you are losing output symbols here. See hashSymbol in DistinctLimitNode. Also notice that DistinctLimitNode::getOutputSymbols returns distinctSymbols, which might be different from node.getSource().getOutputSymbols().
I wonder why the tests didn't catch that already, so please make sure that there is test coverage for it. Can you please run your test from TestLogicalPlanner with coverage or a debugger to see if your rule was triggered?
Yes, but the hashSymbol in DistinctLimitNode is added only if optimize_hash_generation is set to true, and IIRC it is added by HashGenerationOptimizer, which is invoked after this optimizer. So we can safely assume that hashSymbol will be empty.
Then please verify in the rule that hashSymbol is empty. Also verify that DistinctLimitNode::getOutputSymbols is the same as node.getSource().getOutputSymbols().
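The checks requested above can be sketched as follows, again on hypothetical records rather than the real DistinctLimitNode (Source, maxRows and the apply method are illustrative assumptions). The rewrite bails out when a hash symbol is present or the outputs would change, and only then drops a DistinctLimit over a source known to produce at most one row.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model; field names loosely follow the review discussion.
// maxRows is a proven upper bound on the source's row count (Long.MAX_VALUE if unknown).
record Source(List<String> outputSymbols, long maxRows) {}

record DistinctLimit(Source source, long limit, List<String> distinctSymbols, Optional<String> hashSymbol) {}

final class RemoveSingleRowDistinctLimitSketch {
    private RemoveSingleRowDistinctLimitSketch() {}

    // Returns the source when dropping the DistinctLimit is provably safe, otherwise empty.
    static Optional<Source> apply(DistinctLimit node) {
        if (node.hashSymbol().isPresent()) {
            return Optional.empty(); // removing the node would drop the hash symbol
        }
        if (!node.distinctSymbols().equals(node.source().outputSymbols())) {
            return Optional.empty(); // outputs would change (conservative, order-sensitive check)
        }
        if (node.limit() == 0 || node.source().maxRows() > 1) {
            return Optional.empty(); // LIMIT 0 must stay empty; more than one row may hold duplicates
        }
        return Optional.of(node.source()); // a single row is already distinct and within the limit
    }
}
```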
6bd6771 to fd9e2a9
Also check that there is no MarkDistinctNode. You could also extract a method from this and reuse it in the assertion below, like:
assertFalse(planContainsDistinctNode("SELECT distinct(c) FROM (SELECT count(*) as c FROM orders) LIMIT 10"));
assertTrue(planContainsDistinctNode("SELECT distinct(c) FROM (SELECT count(*) as c FROM orders GROUP BY orderkey) LIMIT 10"));
Please do the same for TopN and Sort.
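A minimal sketch of such an extracted helper, assuming a hypothetical string-typed plan tree instead of a plan produced by the test framework from a SQL string. Only the planContainsDistinctNode name comes from the review; PlanNode, planContains and the type strings are illustrative. A depth-first search reports whether any node matches, and helpers for TopN and Sort would differ only in the predicate.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified plan tree: a node type name plus its sources.
record PlanNode(String type, List<PlanNode> sources) {
    static PlanNode leaf(String type) {
        return new PlanNode(type, List.of());
    }

    static PlanNode of(String type, PlanNode... sources) {
        return new PlanNode(type, List.of(sources));
    }
}

final class PlanSearchSketch {
    private PlanSearchSketch() {}

    // Depth-first search: true if any node in the tree satisfies the predicate.
    static boolean planContains(PlanNode node, Predicate<PlanNode> predicate) {
        if (predicate.test(node)) {
            return true;
        }
        for (PlanNode source : node.sources()) {
            if (planContains(source, predicate)) {
                return true;
            }
        }
        return false;
    }

    // Mirrors the helper suggested in the review: checks for DistinctLimit and MarkDistinct nodes.
    static boolean planContainsDistinctNode(PlanNode root) {
        return planContains(root, node -> node.type().equals("DistinctLimit") || node.type().equals("MarkDistinct"));
    }
}
```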
Change "Unexpected node for the above query" to format("Unexpected sort node for query: '%s'", query). Do the same for all the assertions below and above.
fd9e2a9 to 3105607
582e861 to 354c49f
martint left a comment
@Praveen2112, I think there are still some comments that need to be addressed.
This should belong to the previous commit.
can you also test SELECT * FROM (VALUES 1,2,3,4,5,6) LIMIT 10?
// cannot enforce LIMIT on correlated subquery
Shouldn't you use isAtMost here?
Yes, but inlining this here would save us one iterative optimizer loop. Also I think you could handle the case with limit = 0 here.
However, as you pointed out, it could be a matter of taste. Up to you.
Also handle the case where the cardinality is 0.
Also, DistinctLimit with limit higher than cardinality of its source node can be rewritten to DistinctNode
Will merge that feature into RemoveSingleRowDistinctLimit.
febdab8 to e4d4d1b
nit: This could be extracted as separate commit.
Commit message
Prune unnecessary TopNNode
Replace TopN node
1. With a Sort node when the subplan is guaranteed to produce fewer rows than N
2. With its source node when the subplan produces a single row
3. With a Values node when N is 0
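The three cases in this commit message can be sketched as one decision function. The records are hypothetical stand-ins for the plan nodes, output symbols are ignored, and the sketch uses "at most N rows" for case 1, which is equally safe because a TopN over at most N rows never discards anything; only the ordering has to be preserved.

```java
import java.util.OptionalLong;

// Hypothetical, simplified plan model; these are not Presto's TopNNode/SortNode classes.
interface Node {}

record Values(int rowCount) implements Node {}

record TableScan(String table) implements Node {}

record Sort(String orderBy, Node source) implements Node {}

record TopN(long count, String orderBy, Node source) implements Node {}

final class RemoveRedundantTopNSketch {
    private RemoveRedundantTopNSketch() {}

    static OptionalLong maxCardinality(Node node) {
        if (node instanceof Values values) {
            return OptionalLong.of(values.rowCount());
        }
        if (node instanceof Sort sort) {
            return maxCardinality(sort.source()); // sorting never changes the row count
        }
        if (node instanceof TopN topN) {
            OptionalLong source = maxCardinality(topN.source());
            return OptionalLong.of(source.isPresent() ? Math.min(source.getAsLong(), topN.count()) : topN.count());
        }
        return OptionalLong.empty();
    }

    static Node apply(TopN topN) {
        if (topN.count() == 0) {
            return new Values(0); // case 3: TopN 0 produces no rows
        }
        OptionalLong bound = maxCardinality(topN.source());
        if (!bound.isPresent() || bound.getAsLong() > topN.count()) {
            return topN; // rows may still need to be cut
        }
        if (bound.getAsLong() <= 1) {
            return topN.source(); // case 2: zero or one row is trivially ordered
        }
        return new Sort(topN.orderBy(), topN.source()); // case 1: keep the ordering, drop the cut
    }
}
```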
Can you please extract each test case as a separate test method?
Can you please extract each test case as a separate test method?
Can you please extract each test case as a separate test method?
Why is replacing Distinct with Aggregation better? Shouldn't you use a regular DistinctNode here?
Using a MarkDistinct node requires an additional FilterNode and ProjectNode, so I used an AggregationNode with no aggregation functions.
Yes, that's OK. MarkDistinct serves a different purpose. @kokosing, there's no explicit DistinctNode; it's planned as a GROUP BY with no aggregation functions.
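The rewrite proposed earlier (a DistinctLimit whose limit is at least the source cardinality becomes a plain distinct) can be sketched with that representation: an aggregation grouped by the distinct symbols with no aggregate functions. The records are hypothetical, and the maxRows bound is assumed to be supplied by some cardinality check such as the isAtMost call discussed above.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical, simplified plan model; these are not Presto's plan node classes.
interface Node {}

// maxRows is a proven upper bound on the rows this source can produce, if one is known.
record Source(List<String> outputSymbols, OptionalLong maxRows) implements Node {}

record DistinctLimit(long limit, List<String> distinctSymbols, Node source) implements Node {}

// A DISTINCT is modeled as grouping by the distinct symbols with no aggregate functions.
record Aggregation(List<String> groupingKeys, List<String> aggregateFunctions, Node source) implements Node {}

final class DistinctLimitToDistinctSketch {
    private DistinctLimitToDistinctSketch() {}

    static Node apply(DistinctLimit node) {
        if (!(node.source() instanceof Source source)) {
            return node; // the sketch only knows bounds for Source nodes
        }
        OptionalLong bound = source.maxRows();
        // Distinct rows never outnumber input rows, so a limit >= the input bound cannot cut anything.
        if (bound.isPresent() && bound.getAsLong() <= node.limit()) {
            return new Aggregation(node.distinctSymbols(), List.of(), source);
        }
        return node;
    }
}
```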
Can you please extract each test case as a separate test method?
We have added a test method for each optimizer we implemented. So should we write each query pattern as a separate method?
…n is known to be a single row or fewer rows than requested Cherry-pick of trinodb/trino#441 Co-authored-by: praveenkrishna <praveenkrishna@tutanota.com>
In addition to the cherry-pick for removing redundant Limit/TopN/Sort/DistinctLimit, a few more rules are added to replace any input that is a zero-count TopN/DistinctLimit/Limit. Cherry-pick of trinodb/trino#441 Co-authored-by: praveenkrishna <praveenkrishna@tutanota.com>