Avoid planning unnecessary LIMIT/TopN/Sort/DistinctLimit#14915
rschlussel merged 2 commits into prestodb:master
@mbasmanova @tdcmeehan This commit is ready for review. Thank you!
Hi @rschlussel, could you please help review this PR? It is a performance fix. Thank you!
rschlussel left a comment:
I recommend adding a new rule to replace any input that isAtMost(0) with an empty values node, and then removing the specialized logic here for 0 input.
This new rule can also replace the EvaluateZeroSample rule if you add SampleNode to the CardinalityExtractorPlanVisitor, and have it return zero for 0% sample (and otherwise, atLeast 0L).
When would this be true? This doesn't seem related to sort nodes. Maybe it should be its own rule that replaces subplans returning zero rows with an empty values node.
So the new rule is EvaluateZeroCount, which evaluates zero-count Sample, TopN, Limit, and DistinctLimit.
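A minimal sketch of the idea behind such a rule, assuming a toy plan model: compute row-count bounds for each node and replace any subplan whose upper bound is 0 with an empty Values node, treating a 0% sample as zero rows and any other sample as "at least 0", as the review suggests. All names here (`ZeroCardinalitySketch`, `Cardinality`, `replaceZeroCardinality`, and the node records) are hypothetical; this is not Presto's `EvaluateZeroCount`, its `Rule` interface, or its plan node classes.

```java
// Hypothetical toy model for illustration only; NOT Presto's PlanNode, Rule, or cardinality classes.
final class ZeroCardinalitySketch {
    interface PlanNode {}

    record TableScan(String table) implements PlanNode {}
    record Values(long rowCount) implements PlanNode {}
    record Limit(long count, PlanNode source) implements PlanNode {}
    record TopN(long count, PlanNode source) implements PlanNode {}
    record DistinctLimit(long limit, PlanNode source) implements PlanNode {}
    record Sample(double ratio, PlanNode source) implements PlanNode {}

    // Row-count bounds [min, max]; Long.MAX_VALUE stands for "unbounded".
    record Cardinality(long min, long max) {
        boolean isAtMost(long n) {
            return max <= n;
        }
    }

    static Cardinality cardinality(PlanNode node) {
        if (node instanceof Values v) {
            return new Cardinality(v.rowCount(), v.rowCount());
        }
        if (node instanceof Limit l) {
            return capAt(cardinality(l.source()), l.count());
        }
        if (node instanceof TopN t) {
            return capAt(cardinality(t.source()), t.count());
        }
        if (node instanceof DistinctLimit d) {
            Cardinality source = cardinality(d.source());
            // DISTINCT over a non-empty input yields at least one row, if the limit allows it.
            long min = (source.min() > 0 && d.limit() > 0) ? 1 : 0;
            return new Cardinality(min, Math.min(source.max(), d.limit()));
        }
        if (node instanceof Sample s) {
            // A 0% sample returns nothing; otherwise we only know "at least 0 rows".
            return s.ratio() == 0.0 ? new Cardinality(0, 0) : new Cardinality(0, Long.MAX_VALUE);
        }
        // Table scans and anything else: size unknown.
        return new Cardinality(0, Long.MAX_VALUE);
    }

    private static Cardinality capAt(Cardinality source, long count) {
        return new Cardinality(Math.min(source.min(), count), Math.min(source.max(), count));
    }

    // Replace any subplan that provably returns no rows with an empty Values node.
    static PlanNode replaceZeroCardinality(PlanNode node) {
        if (!(node instanceof Values) && cardinality(node).isAtMost(0)) {
            return new Values(0);
        }
        if (node instanceof Limit l) {
            return new Limit(l.count(), replaceZeroCardinality(l.source()));
        }
        if (node instanceof TopN t) {
            return new TopN(t.count(), replaceZeroCardinality(t.source()));
        }
        if (node instanceof DistinctLimit d) {
            return new DistinctLimit(d.limit(), replaceZeroCardinality(d.source()));
        }
        if (node instanceof Sample s) {
            return new Sample(s.ratio(), replaceZeroCardinality(s.source()));
        }
        return node;
    }
}
```

For example, `replaceZeroCardinality(new Limit(0, new TableScan("orders")))` yields an empty `Values` node, while a `Limit(10, ...)` over the same scan is left unchanged. Using one generic cardinality check, rather than one rule per node type, is what lets a single rule cover Sample, TopN, Limit, and DistinctLimit.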
don't need this if block anymore.
if we change the isAtMost(0) rule to be separate, then this should be isAtMostScalar()
Since the new rule does not target a particular plan node, I'd prefer to keep this code as it is: it looks up the cardinality from the tree, and it is verified by testForZeroCardinality.
same comment as above regarding zero and scalar
this check is moved to the new rule now
Also, can you separate the changes in the analyzer/planner (don't plan unnecessary sort) into their own commit, separate from the new optimizers, so it's easier to see what's going on there? I haven't looked at that part yet because it's hard to find all the related pieces.
I think this PR should be broken up into 2 or even 3 PRs. In particular, it would be good to do the redundant ORDER BY removal in a separate PR.
@kaikalur @rschlussel Thank you all for reviewing the code changes. I am reworking this PR into multiple commits and moving the unnecessary sort and redundant ORDER BY changes into separate PRs.
@rschlussel Could you please review this PR again? All fixes are now in the right commits. Many thanks!
@rschlussel Could you please approve this fix so it can be merged? It is a good performance fix, and it would be great to include it soon if possible. Many thanks for your help!
rschlussel left a comment:
The title for the first commit is too long. Shorten it to "Avoid planning unnecessary LIMIT/TopN/Sort/DistinctLimit", and then add more description in the commit message body. We generally follow these guidelines: https://chris.beams.io/posts/git-commit/
...to-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/EvaluateZeroSample.java
presto-main/src/main/java/com/facebook/presto/SystemSessionProperties.java
presto-main/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
this belongs in the previous commit where DistinctLimit was introduced.
why is this test removed?
Actually, this is working as designed, because ORDER BY in a subquery can be ignored. Rows in a table (or in a subquery in the FROM clause) do not come in any specific order; only ORDER BY ... LIMIT changes the result, i.e., the set of rows.
why is this test removed?
The purpose of the 2nd commit is to stop preserving a subquery's ORDER BY; ordering is now meaningful only in the outermost query. The original design of this cherry-pick states the same concept.
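To make the semantics concrete, here is a sketch over a hypothetical toy plan tree: a Sort below the outermost query is dropped because it does not change the set of rows, while a TopN (ORDER BY ... LIMIT) is kept because it decides which rows survive. According to the review, the PR makes this change in the analyzer/planner (by not planning the sort) rather than as a plan rewrite like this one. The class and record names, and the assumption that the outermost ORDER BY is planned as a Sort directly under `Output`, are illustrative only.

```java
// Hypothetical toy plan model for illustration; not Presto's planner classes.
final class SubquerySortSketch {
    interface PlanNode {}

    record TableScan(String table) implements PlanNode {}
    record Filter(String predicate, PlanNode source) implements PlanNode {}
    record Sort(String key, PlanNode source) implements PlanNode {}
    record TopN(long count, String key, PlanNode source) implements PlanNode {}
    record Output(PlanNode source) implements PlanNode {}

    // Assumption: the outermost ORDER BY is planned as a Sort directly under Output.
    static PlanNode optimize(PlanNode root) {
        if (root instanceof Output o) {
            if (o.source() instanceof Sort s) {
                // Keep the outermost ORDER BY: it defines the order the client sees.
                return new Output(new Sort(s.key(), dropUnobservableSorts(s.source())));
            }
            return new Output(dropUnobservableSorts(o.source()));
        }
        return dropUnobservableSorts(root);
    }

    private static PlanNode dropUnobservableSorts(PlanNode node) {
        if (node instanceof Sort s) {
            // ORDER BY without LIMIT inside a subquery does not change the set of rows.
            return dropUnobservableSorts(s.source());
        }
        if (node instanceof TopN t) {
            // ORDER BY ... LIMIT picks which rows survive, so it must stay.
            return new TopN(t.count(), t.key(), dropUnobservableSorts(t.source()));
        }
        if (node instanceof Filter f) {
            return new Filter(f.predicate(), dropUnobservableSorts(f.source()));
        }
        return node;
    }
}
```

With this model, a plan like `Output(Filter(Sort(TableScan)))` becomes `Output(Filter(TableScan))`, while `Output(Filter(TopN(Sort(TableScan))))` keeps its TopN and loses only the inner Sort.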
Hi @rschlussel, I revised both commits. Regarding the last removed test, I have revised the test case. Basically, the 2nd commit preserves less ordering in subqueries, in which only the combination of
rschlussel left a comment:
Nearly there, just some small comments about the tests.
It seems in this case the limit is also redundant. We're just not smart enough to remove it. Maybe instead have 2 rows in the values node.
Or I can change it to an upper bound of LIMIT; then it will make sense. But this is a good catch: to make the optimizer smarter, we should remove the limit when there's an enforceSingleRow, such as with count(*). I propose we fix that in a separate PR so that TransformCorrelatedSingleRowSubqueryToProject can take it into account.
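A sketch of the smarter behavior proposed for the follow-up PR, assuming a toy plan model: drop a Limit whose source can never produce more rows than the limit, such as a global aggregation like count(*) or an enforce-single-row node. It also shows why two rows in the values node keep the test meaningful: LIMIT 1 over two rows is not redundant. Every name here is hypothetical; this is not Presto's TransformCorrelatedSingleRowSubqueryToProject or its cardinality utilities.

```java
// Hypothetical toy plan model for illustration only; not Presto's planner or optimizer classes.
final class RedundantLimitSketch {
    interface PlanNode {}

    record TableScan(String table) implements PlanNode {}
    record Values(long rowCount) implements PlanNode {}
    // A global aggregation such as SELECT count(*) FROM t always produces exactly one row.
    record GlobalAggregation(String function, PlanNode source) implements PlanNode {}
    // Fails at runtime if its source yields more than one row, so its output has at most one row.
    record EnforceSingleRow(PlanNode source) implements PlanNode {}
    record Limit(long count, PlanNode source) implements PlanNode {}

    // Upper bound on the number of rows a node can return; Long.MAX_VALUE means unknown.
    static long maxRows(PlanNode node) {
        if (node instanceof Values v) {
            return v.rowCount();
        }
        if (node instanceof GlobalAggregation || node instanceof EnforceSingleRow) {
            return 1;
        }
        if (node instanceof Limit l) {
            return Math.min(l.count(), maxRows(l.source()));
        }
        return Long.MAX_VALUE;
    }

    // Drop Limit(n) when its (already simplified) source can never exceed n rows.
    static PlanNode removeRedundantLimits(PlanNode node) {
        if (node instanceof Limit l) {
            PlanNode source = removeRedundantLimits(l.source());
            return maxRows(source) <= l.count() ? source : new Limit(l.count(), source);
        }
        if (node instanceof GlobalAggregation g) {
            return new GlobalAggregation(g.function(), removeRedundantLimits(g.source()));
        }
        if (node instanceof EnforceSingleRow e) {
            return new EnforceSingleRow(removeRedundantLimits(e.source()));
        }
        return node;
    }
}
```

Here `Limit(1, GlobalAggregation("count(*)", ...))` collapses to the aggregation, `Limit(1, Values(1))` collapses to the values node, and `Limit(1, Values(2))` is kept, which is why a two-row values node still exercises the limit.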
I think limit here was incidental and we still want this test for unsupported subqueries to ensure they throw proper errors.
Then I will add a separate test case with GROUP BY and LIMIT 1 to guard this sanity test.
In addition to the cherry-pick for removing redundant Limit/TopN/Sort/DistinctLimit, a few more rules are added to replace any input that is a zero-count TopN/DistinctLimit/Limit.
Cherry-pick of trinodb/trino#441
Co-authored-by: praveenkrishna <praveenkrishna@tutanota.com>
Cherry-pick of trinodb/trino#818
Co-author: Martin Traverso <mtraverso@gmail.com>
rschlussel left a comment:
Looks good. Thanks for sticking with it!
I'm not sure we should have those plan tests. We are planning more rewrites and optimizer rules, so it's going to be painful to test these in the future. I'd say the plan tests should not be included. @rongrong WDYT?
Which plan tests do you mean? If you mean the TPC-H plan tests, those already exist, so if you want to get rid of them, I think that belongs in a separate PR. (I think their purpose is just to let you know if you are altering TPC-H or TPC-DS plans, so that you can benchmark whether there was a regression.)
Cherry-pick of trinodb/trino#441 and trinodb/trino#818
fixes #14897