Add support for FETCH FIRST WITH TIES clause #832
kasiafi wants to merge 5 commits into trinodb:master
ba9e3c4 to 128425e
128425e to 7a1d23d
Seeing this "non-trivial" rewrite, whose code looks almost the same as the other ones, got me thinking about whether we're representing "with ties" properly in the plan. Let me describe my thought process and then suggest how we might want to do it:
- The rewrites look similar and are not trivial so we should somehow share the implementation
- At a fundamental level, what we're doing is looking for a "limit with ties" over something that is "ordered"
- One option would be to tag the Sort / TopN nodes with a marker interface that has a single method returning the ordering they compute. However, this wouldn't work for the case with a Project in between; we'd still need separate rules.
- Could we model this with traits (not supported yet, but...)? We could attempt to match a "limit with ties" over "something that guarantees some kind of ordering". Unfortunately, this is not sufficient: traits might describe a tighter guarantee than what was originally in the ORDER BY clause. Imagine, for instance, a case where the data source is sorted by more columns than show up in the ORDER BY clause.
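To make the last point concrete, here is a minimal Java 17 sketch. It assumes the rows are already sorted and each row is a simple `(a, b)` pair. The `Row` record and the `fetchFirstWithTies` helper are hypothetical and are not Trino code. The input is sorted by `(a, b)`, but whether two rows tie depends only on the columns named in ORDER BY:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of WITH TIES semantics; not Trino code.
final class WithTiesSemantics {
    record Row(int a, int b) {}

    // Returns the first n rows of an already-sorted input, plus every following
    // row whose tie key equals the tie key of the n-th row.
    static <K> List<Row> fetchFirstWithTies(List<Row> sorted, int n, Function<Row, K> tieKey) {
        List<Row> result = new ArrayList<>();
        if (n <= 0 || sorted.isEmpty()) {
            return result;
        }
        int limit = Math.min(n, sorted.size());
        result.addAll(sorted.subList(0, limit));
        K lastKey = tieKey.apply(sorted.get(limit - 1));
        for (int i = limit; i < sorted.size() && tieKey.apply(sorted.get(i)).equals(lastKey); i++) {
            result.add(sorted.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // The input is sorted by (a, b), a tighter order than ORDER BY a requires.
        List<Row> rows = List.of(new Row(1, 1), new Row(2, 1), new Row(2, 2), new Row(3, 1));

        // ORDER BY a FETCH FIRST 2 ROWS WITH TIES: ties are decided on a alone.
        System.out.println(fetchFirstWithTies(rows, 2, Row::a).size()); // 3

        // Using the source's (a, b) ordering as the tie key drops the row (2, 2).
        System.out.println(fetchFirstWithTies(rows, 2, r -> List.of(r.a(), r.b())).size()); // 2
    }
}
```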
Which leads me to the following conclusion: "limit with ties" is not just a simple limit and a flag. The set of columns used to define the ties is meaningful. Conceptually one could think of FETCH FIRST vs ORDER BY ... FETCH FIRST WITH TIES as two completely different operations that happen to share some of the same syntax in the language.
And the suggestion on how we might approach it: add the ordering scheme to "limit with ties" (or create a new node for it). These rules would just find a "limit with ties" and rewrite into the new form without worrying about whether there's a sort or topN or anything else under them.
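Below is a minimal Java 17 sketch of that suggestion. It uses hypothetical, heavily simplified plan-node records rather than Trino's actual `PlanNode` classes. The limit carries its own `tiesResolvingScheme`, and `withTies()` is derived from that scheme instead of being stored. The rewrite rule looks only at the limit, never at its source. The target shape here (a rank window followed by a filter) is an assumption:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, heavily simplified plan model; not Trino's PlanNode classes.
final class LimitWithTiesRewrite {
    interface PlanNode {}

    record OrderingScheme(List<String> orderBy) {}

    record TableScan(String table) implements PlanNode {}

    record ProjectNode(PlanNode source, List<String> outputs) implements PlanNode {}

    // The columns that define ties are stored on the limit itself.
    record LimitNode(PlanNode source, long count, Optional<OrderingScheme> tiesResolvingScheme)
            implements PlanNode {
        // Derived from the scheme rather than stored as a separate flag.
        boolean withTies() {
            return tiesResolvingScheme.isPresent();
        }
    }

    // Assumed target shape: rank() over the tie-defining ordering, then a filter on it.
    record RankWindowNode(PlanNode source, OrderingScheme orderBy, String rankColumn) implements PlanNode {}

    record RankFilterNode(PlanNode source, String rankColumn, long maxRank) implements PlanNode {}

    // Matches any limit with ties, regardless of what sits beneath it.
    static PlanNode rewrite(PlanNode node) {
        if (node instanceof LimitNode limit && limit.withTies()) {
            OrderingScheme scheme = limit.tiesResolvingScheme().orElseThrow();
            RankWindowNode window = new RankWindowNode(limit.source(), scheme, "rank");
            return new RankFilterNode(window, "rank", limit.count());
        }
        return node;
    }
}
```

The rule never inspects `limit.source()`, so the same code handles a limit over a Sort, a TopN, or a Project. A complete rule would also need to project away the rank column.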
> Which leads me to the following conclusion: "limit with ties" is not just a simple limit and a flag. The set of columns used to define the ties is meaningful.
Of course, FETCH FIRST ... WITH TIES must be aware of the exact ordering scheme from the corresponding ORDER BY clause. Adding the ordering scheme to the LimitNode would simplify things a lot.
However, that would be quite a different approach from how ORDER BY ... LIMIT is handled.
As I recall, LimitNode and SortNode are planned distinctly, possibly with an OffsetNode between them, and there are rules responsible for merging them into TopNNode.
I thought it was right to let QueryPlanner simply rewrite a query into a plan and let the rules take care of the implementation.
Maybe the analogy between 'with ties' and TopN isn't so close. Pairing a LimitNode with an ordering scheme to create a TopNNode is a matter of optimisation, while in the case of 'with ties' it is a necessity.
Another thought.
If we are going to add the ordering scheme to LimitNode with ties, why not add it to LimitNode without ties, too? I'm aware this idea doesn't comply with the principle that QueryPlanner is merely an AST -> plan translator. However, there are benefits:
- simplicity at planning stage (no check for with ties)
- easy transformation of LimitNode without ties into TopNNode (currently, there are multiple rules that check whether the source is a Sort or a Project over a Sort). The same goes for LimitNode with ties.
- maybe we could even skip planning a Sort altogether when a Limit is present and there are no other sort-dependent clauses (such as Offset). LimitNode without ties would be transformed into TopNNode, and LimitNode with ties would be transformed into a WindowNode with an OrderingScheme, which would subsequently emit a distributed Sort for efficiency (or so I heard ;) )
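The window-based form in the last point relies on one fact. Over the tie-defining ordering, a row belongs to FETCH FIRST n ROWS WITH TIES exactly when its `rank()` is at most n. The following minimal Java sketch assumes the input is already sorted and reduces each row to its ordering key. `RankFilter` and its methods are hypothetical names, not Trino's window operator:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the rank() <= n formulation; not Trino code.
final class RankFilter {
    // rank() over sorted keys: 1 + the number of rows that sort strictly before
    // the row, i.e. the 1-based position of the first row in its peer group.
    static int[] rank(List<Integer> sortedKeys) {
        int[] ranks = new int[sortedKeys.size()];
        for (int i = 0; i < sortedKeys.size(); i++) {
            boolean peerOfPrevious = i > 0 && sortedKeys.get(i).equals(sortedKeys.get(i - 1));
            ranks[i] = peerOfPrevious ? ranks[i - 1] : i + 1;
        }
        return ranks;
    }

    // Keeps the rows whose rank does not exceed n.
    static List<Integer> limitWithTies(List<Integer> sortedKeys, long n) {
        int[] ranks = rank(sortedKeys);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < sortedKeys.size(); i++) {
            if (ranks[i] <= n) {
                result.add(sortedKeys.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> keys = List.of(1, 2, 2, 2, 3);
        System.out.println(limitWithTies(keys, 2)); // [1, 2, 2, 2]
        System.out.println(limitWithTies(keys, 4)); // [1, 2, 2, 2]
        System.out.println(limitWithTies(keys, 5)); // [1, 2, 2, 2, 3]
    }
}
```

With n = 2, the three rows with key 2 all get rank 2, so all of them pass the filter. This matches "first two rows plus their ties".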
The Analyzer throws a NOT_SUPPORTED exception.
7a1d23d to 29e7dc0
@martint could you please review?
This could prune any columns that are not needed for breaking ties.
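Here is a minimal Java sketch of that pruning, assuming columns are identified by name. Below a limit with ties, the source must still produce two groups of columns: the ones the parent reads, and the ones that decide ties. Everything else can be dropped. `LimitColumnPruning` and `requiredSourceColumns` are hypothetical names, not Trino's pruning rule:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper; not Trino's actual column-pruning rule.
final class LimitColumnPruning {
    // Columns the limit's source must keep producing; anything else can be pruned.
    static Set<String> requiredSourceColumns(Set<String> referencedByParent,
                                             Optional<List<String>> tiesResolvingColumns) {
        Set<String> required = new LinkedHashSet<>(referencedByParent);
        // A limit with ties compares these columns to find peers of the last row,
        // so they must survive even when the parent never reads them.
        tiesResolvingColumns.ifPresent(required::addAll);
        return required;
    }

    public static void main(String[] args) {
        // The parent reads only "name", but "score" is needed for breaking ties.
        Set<String> required = requiredSourceColumns(Set.of("name"), Optional.of(List.of("score")));
        System.out.println(required); // [name, score]
    }
}
```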
Because I mistakenly assumed that Limit with ties would never match anyway.
I'll add it to the rule.
BTW, if I had been right, what would have been the proper thing to do? Exclude it in the pattern, or make the rule account for all possibilities?
Excluding in the pattern is fine, assuming that's what we need to do.
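Excluding limits with ties in the pattern could look like the following minimal Java 17 sketch. It uses hypothetical simplified nodes instead of Trino's `Pattern` API and mirrors the `withTies` boolean from the first commit. The rule merges a Limit over a Sort into a TopN, but it only matches when the limit has no ties. A TopN keeps at most `count` rows and would silently drop the tied ones:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified nodes and rule; not Trino's Pattern/Rule API.
final class MergeLimitWithSortSketch {
    interface PlanNode {}

    record TableScan(String table) implements PlanNode {}

    record SortNode(PlanNode source, List<String> orderBy) implements PlanNode {}

    record LimitNode(PlanNode source, long count, boolean withTies) implements PlanNode {}

    record TopNNode(PlanNode source, long count, List<String> orderBy) implements PlanNode {}

    // The pattern: a Limit without ties directly over a Sort.
    static boolean matches(PlanNode node) {
        return node instanceof LimitNode limit
                && !limit.withTies()
                && limit.source() instanceof SortNode;
    }

    // TopN keeps at most `count` rows, so merging a limit with ties would lose tied rows.
    static Optional<PlanNode> apply(PlanNode node) {
        if (!matches(node)) {
            return Optional.empty();
        }
        LimitNode limit = (LimitNode) node;
        SortNode sort = (SortNode) limit.source();
        return Optional.of(new TopNNode(sort.source(), limit.count(), sort.orderBy()));
    }
}
```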
Why? Everything this method does should apply to limit with ties if it were ever encountered, no?
Yes, but I wanted to be sure that LimitNode with ties never reaches this point. The same goes for the other places you pointed out. Should I support LimitNode with ties there? Is there a way to say "no LimitNodes with ties past this point"?
I guess it's fine not to support that here for now. I'm more concerned about other optimizers, like UnaliasSymbolReferences, etc., which tend to be scattered all over the place.
Could you please specify which optimizers need to support Limit with ties, and whether the other optimizers should throw an exception?
UnaliasSymbolReferences, PruneUnreferencedOutputs, and LimitPushdown (for now, since we're in the process of replacing it with Rules).
We should look into whether we need to do anything for WindowFilterPushdown.
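This sketch illustrates one reason UnaliasSymbolReferences is on that list. It assumes the limit carries its tie-defining columns, as suggested earlier in the thread. In that case, unaliasing must rewrite those column references as well; otherwise the limit would compare columns that no longer exist. `UnaliasTies` and its methods are hypothetical and are not Trino's UnaliasSymbolReferences:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not Trino's UnaliasSymbolReferences.
final class UnaliasTies {
    // Replaces an alias with its canonical column, following chains such as c -> b -> a.
    // Assumes the alias map contains no cycles.
    static String canonicalize(String column, Map<String, String> aliases) {
        String current = column;
        while (aliases.containsKey(current)) {
            current = aliases.get(current);
        }
        return current;
    }

    // The tie-defining columns of a limit with ties must be canonicalized too.
    static List<String> canonicalizeOrdering(List<String> tiesResolvingColumns, Map<String, String> aliases) {
        return tiesResolvingColumns.stream()
                .map(column -> canonicalize(column, aliases))
                .toList();
    }
}
```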
These can all go on the same line.
This is a derived property, so it shouldn't be part of the json serialization.
This check is already removed in the following commit, where the final implementation of LimitNode with ties is added.
Got it. I missed that when reviewing commits individually.
Don't abbreviate variable names. It makes the code harder to read.
This renders as: Limit[10 withTies: ]. Did you mean to put the colon before withTies? Maybe do Limit[10+ties] instead.
This change adds withTies::boolean to LimitNode.
Existing optimizer rules are changed, and commented, according
to the expected behavior of LimitNode with ties.
The implementation of LimitNode with ties is not yet added,
but IllegalStateException("Unexpected node") is thrown
when a LimitNode with ties is encountered past the point
of its replacement.
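This minimal Java 17 sketch shows that safeguard using hypothetical simplified nodes. It stands in for an optimizer that runs after the point where limits with ties should already be replaced. Plain limits pass through, but any surviving limit with ties makes the visitor fail fast with `IllegalStateException("Unexpected node")`. The class and method names are illustrative only:

```java
import java.util.List;

// Hypothetical, simplified plan model; not Trino's classes.
final class LimitWithTiesGuard {
    interface PlanNode {
        List<PlanNode> sources();
    }

    record TableScan(String table) implements PlanNode {
        public List<PlanNode> sources() {
            return List.of();
        }
    }

    record LimitNode(PlanNode source, long count, boolean withTies) implements PlanNode {
        public List<PlanNode> sources() {
            return List.of(source);
        }
    }

    // Walks the plan; plain limits are fine, a limit with ties is a bug at this stage.
    static void visit(PlanNode node) {
        if (node instanceof LimitNode limit && limit.withTies()) {
            throw new IllegalStateException("Unexpected node");
        }
        for (PlanNode source : node.sources()) {
            visit(source);
        }
    }
}
```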
This change adds an optimizer rule that replaces LimitNode with the 'with ties' property. It also adds plan tests, as well as AbstractTestQueries tests that verify the semantics and demonstrate how the WITH TIES clause interacts with the ORDER BY and OFFSET clauses.
29e7dc0 to bf2affc
Applied comments.
Good job! I made a couple of minor adjustments to a commit message and comment and merged it. Thanks!
Relates to #1