Skip to content

Avoid planning unnecessary LIMIT/TopN/Sort/DistinctLimit#14915

Merged
rschlussel merged 2 commits intoprestodb:masterfrom
fgwang7w:14897
Dec 28, 2020
Merged

Avoid planning unnecessary LIMIT/TopN/Sort/DistinctLimit#14915
rschlussel merged 2 commits intoprestodb:masterfrom
fgwang7w:14897

Conversation

@fgwang7w
Copy link
Member

@fgwang7w fgwang7w commented Jul 29, 2020

Cherry-pick of trinodb/trino#441 and trinodb/trino#818

== RELEASE NOTES ==

General Changes
* Avoid planning unnecessary LIMIT/TopN/Sort/DistinctLimit when relation is know to single row or less rows than requested
* The analyzer will emit a warning if a redundant ORDER BY is present

fixes #14897

@fgwang7w fgwang7w requested a review from tdcmeehan July 29, 2020 05:53
@fgwang7w fgwang7w requested a review from mbasmanova July 29, 2020 05:54
@fgwang7w fgwang7w force-pushed the 14897 branch 4 times, most recently from 237c0b0 to 915e3e5 Compare July 30, 2020 01:04
@fgwang7w
Copy link
Member Author

@mbasmanova @tdcmeehan this commit is ready for review. thank you!

@tdcmeehan tdcmeehan requested review from kaikalur and rongrong July 30, 2020 14:13
@fgwang7w fgwang7w force-pushed the 14897 branch 4 times, most recently from b6e735d to de71aad Compare August 3, 2020 17:18
@fgwang7w fgwang7w force-pushed the 14897 branch 4 times, most recently from cd08001 to b77e70e Compare August 19, 2020 03:27
@fgwang7w fgwang7w requested a review from rschlussel August 19, 2020 18:47
@fgwang7w
Copy link
Member Author

Hi @rschlussel could you please help review this pr? this item is perf-related, thank you!

Copy link
Contributor

@rschlussel rschlussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend adding a new rule to replace any input that isAtMost(0) with values node, and then removing the specialized logic here for 0 input.

This new rule can also replace the EvaluateZeroSample rule if you add SampleNode to the CardinalityExtractorPlanVisitor, and have it return zero for 0% sample (and otherwise, atLeast 0L).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when would this e true. this doesn't seem related to sort nodes. Maybe it should be its own rule to replace subplans that would return zero rows with an empty values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the new rule is EvaluateZeroCount which evaluates zeroSample, zero-topN, zero-limit, and zero-distinctLimit count.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need this if block anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we change the isAtMost(0) rule to be separate, then this should be isAtMostScalar()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the new rule does not implement a particular planNode, my take is to remain this code as it is to lookup from the tree and can be verified by testForZeroCardinality

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above regarding zero and scalar

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check is moved to the new rule now

@rschlussel
Copy link
Contributor

also, can you separate out the changes in the analyzer/planner (don't plan unnecessary sort) into its own commit separate from the new optimizers, so it's easier to see what's going on there. I haven't looked at that part yet because it's hard to find all the related pieces.

@kaikalur
Copy link
Contributor

kaikalur commented Sep 1, 2020

I'm thinking this PR should be broken up into 2 or even 3. In particular, it will be good to do redundant orderby in a separate PR

@fgwang7w
Copy link
Member Author

fgwang7w commented Sep 2, 2020

@kaikalur @rschlussel thank you all for reviewing the code changes. I am currently reworking this PR into multiple commits and put the unnecessary sort and redundant orderby in different PRs.

@fgwang7w
Copy link
Member Author

fgwang7w commented Nov 6, 2020

@rschlussel could you please help review this PR again? all fix are in right commit version now. many thanks!

@fgwang7w
Copy link
Member Author

fgwang7w commented Dec 9, 2020

@rschlussel could you please help approve this fix to be merged? This is a good performance fix to be included if possible soon. many thanks for help!

Copy link
Contributor

@rschlussel rschlussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The title for the first commit is too long. Shorten it to "Avoid planning unnecessary LIMIT/TopN/Sort/DistinctLimit", and then add more description in the commit message body. We generally follow these guidelines: https://chris.beams.io/posts/git-commit/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this belongs in the previous commit where DistinctLimit was introduced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this test removed?

Copy link
Member Author

@fgwang7w fgwang7w Dec 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually this is working as design because ORDER BY in a subquery can be ignored. Rows in a table (or in a subquery in the FROM clause) do not come in any specific order, only when ORDER BY ... LIMIT changes the result, the set of rows.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this test removed?

Copy link
Member Author

@fgwang7w fgwang7w Dec 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of the 2nd commit is to make the subquery's ORDER BY not preserved. The ordering only makes sense on the outermost query now. The original design of this cherry-pick states the same concept.

@fgwang7w fgwang7w force-pushed the 14897 branch 7 times, most recently from a87e1ac to 4da8ff6 Compare December 14, 2020 23:21
@fgwang7w
Copy link
Member Author

Hi @rschlussel I revised both commits, regarding the last test removed, I have revised the testcase. Basically the 2nd commit introduces less preserved ordering in the subquery in which only combination of ORDER BYand LIMIT changes the specific order and set of rows, otherwise it would remain as unordered rows per SQL standard. @kaikalur please FYI

@fgwang7w fgwang7w force-pushed the 14897 branch 2 times, most recently from ba959e5 to f3eb48c Compare December 15, 2020 05:37
@fgwang7w fgwang7w requested a review from rschlussel December 15, 2020 18:22
@fgwang7w fgwang7w force-pushed the 14897 branch 2 times, most recently from 6ffd10f to 6d3513b Compare December 16, 2020 22:27
Copy link
Contributor

@rschlussel rschlussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nearly there. just some small comments about the tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems in this case the limit is also redundant. We're just not smart enough to remove it. Maybe instead have 2 rows in the values node.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or I can change it to an upper bound of LIMIT, then it will make sense,but this is a good catch, we should remove limit if there's a enforceSingleRow like count(*) to make the optimizer smarter. I propose we fix it in a separate PR so that TransformCorrelatedSingleRowSubqueryToProject can take it into account

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think limit here was incidental and we still want this test for unsupported subqueries to ensure they throw proper errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then I will add a seperate tc with group by a limit 1 to guard this sanity test

In addition to the Cherry-pick for removing redundant Limit/TopN/Sort/DistinctLimit,
there are a few more rules added to replace any input that is zero-TopN/DistinctLimit/Limit

Cherry-pick of trinodb/trino#441

Co-authored-by: praveenkrishna <praveenkrishna@tutanota.com>
Cherry-pick of trinodb/trino#818

Co-author: Martin Traverso <mtraverso@gmail.com>
@fgwang7w fgwang7w requested a review from rschlussel December 28, 2020 08:29
Copy link
Contributor

@rschlussel rschlussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for sticking with it!

@kaikalur
Copy link
Contributor

kaikalur commented Dec 28, 2020

I'm not sure if we should hav those plan tests. We are planning to do more rewrites/optimizer rules so it's going to be painful to test these in the future. I say the plan tests should not be included. @rongrong WDYT?

@rschlussel
Copy link
Contributor

I'm not sure if we should hav those plan tests. We are planning to do more rewrites/optimizer rules so it's going to be painful to test these in the future. I say the plan tests should not be included. @rongrong WDYT?

Which plan tests do you mean? If you mean the tpch plan tests, those already exist, so if you want to get rid of them, i think that belongs in a separate pr. (I think the purpose of them is just to let you know if you are altering tpch or tpcds plans, so that you can benchmark whether there was a regression).

Copy link
Contributor

@kaikalur kaikalur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

remove Limit/TopN/Sort/DistinctLimit when relation is know to single row or less rows than requested

5 participants