Skip to content

Add support for FETCH FIRST WITH TIES clause#832

Closed
kasiafi wants to merge 5 commits intotrinodb:masterfrom
kasiafi:FetchFirstWithTies
Closed

Add support for FETCH FIRST WITH TIES clause#832
kasiafi wants to merge 5 commits intotrinodb:masterfrom
kasiafi:FetchFirstWithTies

Conversation

@kasiafi
Copy link
Member

@kasiafi kasiafi commented May 28, 2019

Relates to #1

@cla-bot cla-bot bot added the cla-signed label May 28, 2019
@kasiafi kasiafi force-pushed the FetchFirstWithTies branch from ba9e3c4 to 128425e Compare May 28, 2019 13:25
@martint martint self-requested a review May 30, 2019 02:43
@kasiafi kasiafi force-pushed the FetchFirstWithTies branch from 128425e to 7a1d23d Compare May 30, 2019 13:18
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this "non-trivial" rewrite that looks almost the same code as the other ones got me thinking about whether we're representing "with ties" properly in the plan. Let me describe my thought process and then suggest how we might want to do it:

  1. The rewrites look similar and are not trivial so we should somehow share the implementation
  2. At a fundamental level, what we're doing is look for a "limit with ties" over something that is "ordered"
  3. One option would be to somehow tag the Sort / TopN node with a marker interface and a single method to get the ordering they compute. However, this wouldn't work for the case with project in between -- we'd still need separate rules.
  4. Could we model this with traits (not supported yet, but...)? Unfortunately, this is not sufficient. We could attempt to match a "limit with ties" over "something that guarantees some kind of ordering". Traits might describe a tighter guarantee than what was originally in the ORDER BY clause. Imagine, for instance, a case where the data source is sorted by more columns than show up in the ORDER BY clause.

Which leads me to the following conclusion: "limit with ties" is not just a simple limit and a flag. The set of columns used to define the ties is meaningful. Conceptually one could think of FETCH FIRST vs ORDER BY ... FETCH FIRST WITH TIES as two completely different operations that happen to share some of the same syntax in the language.

And the suggestion on how we might approach it: add the ordering scheme to "limit with ties" (or create a new node for it). These rules would just find a "limit with ties" and rewrite into the new form without worrying about whether there's a sort or topN or anything else under them.

Copy link
Member Author

@kasiafi kasiafi May 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which leads me to the following conclusion: "limit with ties" is not just a simple limit and a flag. The set of columns used to define the ties is meaningful.

Of course, FETCH FIRST ... WITH TIES must be aware of the exact ordering scheme from the corresponding ORDER BY clause. Adding the ordering scheme to the LimitNode would simplify things a lot.

However, that would be quite a different approach from how ORDER BY ... LIMIT is handled.
As I recall, LimitNode and SortNode are planned distinctly, possibly with an OffsetNode between them, and there are rules responsible for merging them into TopNNode.
I thought it was right to let QueryPlanner simply rewrite a query into a plan and let the rules take care of the implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the analogy between 'with ties' and TopN isn't so close. Pairing a LimitNode with ordering scheme to create a TopNNode is a matter of optimisation, while in the case of 'with ties' it is a necessity.

Copy link
Member Author

@kasiafi kasiafi May 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thought.
If we are going to add the ordering scheme to LimitNode with ties, then why don't we add it to LimitNode without ties, too? I'm aware this idea doesn't comply with the principle of QueryPlanner being merely a translator AST -> plan. However, there are benefits:

  • simplicity at planning stage (no check for with ties)
  • easy transformation of LimitNode without ties into TopNNode (currently there are multiple rules to check if the source is sorted / project + sorted). Same as for LimitNode with ties.
  • maybe we could even quit planning Sort at all when Limit is present and there are no other sort-dependent clauses (Offset). LimitNode without ties would be transformed into TopNNode, and LimitNode with ties would be transformed into WindowNode with OrderingScheme, which subsequently would emit a distributed Sort for efficiency (or so I heard ;) )

@kasiafi kasiafi force-pushed the FetchFirstWithTies branch from 7a1d23d to 29e7dc0 Compare June 8, 2019 19:34
@kasiafi
Copy link
Member Author

kasiafi commented Jun 8, 2019

@martint could you please review?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could prune any columns that are not needed for breaking ties

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, is not needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I mistakenly assumed that Limit with ties would never match anyway.
I'll add it to the rule.
BTW if I was right, then what was the proper thing to do? Exclude it in the pattern? Or make the rule so that it takes account of all possibilities?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excluding in the pattern is fine, assuming that's what we need to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Everything this method does should apply to limit with ties if it were ever encountered, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I wanted to be sure that LimitNode with ties never reaches this point. The same in the other places you pointed out. Should I support LimitNode with ties there? Is there a way to say "no LimitNodes with ties behind this point"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's fine to not support that here for now. I'm more concerned about other optimizers like UnaliasSymbolReferences, etc, which tend to scattered all over the place.

Copy link
Member Author

@kasiafi kasiafi Jun 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please specify, in which optimizers I need to support Limit with ties? And whether other optimizers should throw exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnaliasSymbolReferences and PruneUnreferencedOutputs and LimitPushdown (for now, since we're in the process of replacing it with Rules).

We should look into whether we need to do anything for WindowFilterPushdown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can all go on the same line

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a derived property, so it shouldn't be part of the json serialization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is already removed in the following commit where final implementation of LimitNode with ties is added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I missed that when reviewing commits individually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't abbreviate variable names. It makes the code harder to read

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This renders as: Limit[10 withTies: ]. Did you mean to put the colon before withTies? Maybe do Limit[10+ties] instead.

kasiafi added 2 commits June 10, 2019 22:44
This change adds withTies::boolean to LimitNode.
Existing optimizer rules are changed and commented accordingly
to LimitNode with ties expected behavior.
Implementation of LimitNode with ties is not yet added,
but IllegalStateException("Unexpected node") is thrown
when LimitNode with ties is encountered past the point
of its replacement.
This change adds an Optimizer rule to replace LimitNode
having 'with ties' property.
Also, Plan tests are added, and AbstractTestQueries to prove
correct semantics and demonstrate how WITH TIES clause
cooperates with ORDER BY and OFFSET clauses.
@kasiafi kasiafi force-pushed the FetchFirstWithTies branch from 29e7dc0 to bf2affc Compare June 10, 2019 20:45
@kasiafi
Copy link
Member Author

kasiafi commented Jun 10, 2019

Applied comments.

@martint
Copy link
Member

martint commented Jun 11, 2019

Good job! I made a couple of minor adjustments to a commit message and comment and merged it. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants