Extend fast inequality join#8614
Conversation
758a8ee to
c292c1e
Compare
|
@kokosing, @sopel39 , @anusudarsan I cleaned this up as the whole feature from the start felt to messy to me. Please give it (hopefully) last read. |
c292c1e to
0750369
Compare
0750369 to
20cc040
Compare
anusudarsan
left a comment
There was a problem hiding this comment.
@losipiuk LGTM mod comment. Also, I fixed a checkstyle error.
There was a problem hiding this comment.
Should we rename applyLessThanFunction -> applySearchFunction as a part of the refactoring commit 9e5b4b38040a59120bd3de6ac67c832403efb97e ?
8c10e51 to
4e689f5
Compare
|
@sopel39 last pass? |
findepi
left a comment
There was a problem hiding this comment.
One important comment about correctness, a bit of refactor and wording. Feel free to ignore anything you find a nit-picking.
There was a problem hiding this comment.
- while cleanup up this javadoc, first line is:
This class assumes that lessThanFunction is a superset of the whole filtering
This is remotely related to truth. Before other changes in this PR, no ANDs nor ORs are supported.
After other changes, only specific ANDs and ORs are supported (those that compare to same build side symbol).
- Further, in the following
by passing any of the filterFunction_i to the SortedPositionLinks. We could not
s/We could not/We cannot/
There was a problem hiding this comment.
I simplified the Javadoc very much.
There was a problem hiding this comment.
s/buildSymbolRef/{@code buildSymbolRef}
There was a problem hiding this comment.
s/{@code f(probePosition)}/{@code f(probeColumn1, probeColumn2, ..., probeColumnN)}- note about binary search is a bit misleading, we do binary search only for
>, >=cases
There was a problem hiding this comment.
I will think about the binary search case
There was a problem hiding this comment.
s/a/{@code a}, same for x,y,z
There was a problem hiding this comment.
In SortedPositionLinks you said about >, <, <=, >=. Here, < is only for terseness or this class indeed supports only <?
There was a problem hiding this comment.
pls do not overwrite input param startingPosition
There was a problem hiding this comment.
Return left (ie any) and put TODO to make it cost-based decision.
If there are two < conditions on two different build symbols, better to have sorted position links on one of them than on none.
There was a problem hiding this comment.
Why not a list? Expression de-duplication is something optimizer should be doing, a list should suffice.
There was a problem hiding this comment.
replace \n with in the queries
There was a problem hiding this comment.
is there a point in renaming & extracting the class you remove 2 commits later?
There was a problem hiding this comment.
Discussed. Let's leave it.
|
@anusudarsan I am addressing the comments |
4e689f5 to
4229b05
Compare
There was a problem hiding this comment.
Interesting what does it do in plan printer. Unconditionally to the flag enabling inequi join. (no change requested)
There was a problem hiding this comment.
Consider converting it to array here.
There was a problem hiding this comment.
Isn't links==null the case of no collisions?
In original code, we had the following here:
if (applyLessThanFunction(startingPosition, probePosition, allProbeChannelsPage)) {
return startingPosition;
}
and this seemingly could return startingPosition even if sortedPositionLinks[startingPosition]==null
There was a problem hiding this comment.
You can safely inline left & right
There was a problem hiding this comment.
Now i dont understand what this if does here. Looks like it's redundant and lowerBound could cover it. (no change required, but you can think about it..)
There was a problem hiding this comment.
retain .stream() on prev line
There was a problem hiding this comment.
Add a test in TestPositionLinks too
There was a problem hiding this comment.
Shan't this new cool method be used in next() too?
4d9bcbf to
d32d2ed
Compare
|
Squashed fixups. |
1563e88 to
7883b37
Compare
|
@findepi. Would you like to take a look in some spare time? ;) |
7883b37 to
1f0e036
Compare
There was a problem hiding this comment.
This looks suspicious: call to Optional.get without .ifPresent. You can avoid having those misleading Optional-s, if you replace .collect(groupingBy(...)) with .collect(toMap(SortExpressionContext::getSortExpression, c->c, SortExpressionExtractor::merge).
There was a problem hiding this comment.
cmt msg "Consider different sort expressions in SortExpressionExtractor" -- it is not quite clear. Maybe
"Extract sort expressions from complex join filters" ?
There was a problem hiding this comment.
Consider using .reversed() instead of -
There was a problem hiding this comment.
I tried that before. As well as Comparator.reverseOrder. But then I have to add explicit cast of context to SortExpressionContext.
Like this:
.sorted(comparing(context -> ((SortExpressionContext) context).getSearchExpressions().size()).reversed())
I prefer - in that case unless you can help me get rid of cast.
There was a problem hiding this comment.
Alas. However, rather than cast, you should rather s/context ->/(SortExpressionContext context) ->.
You can also -1 * instead of - to make it more visible. Or leave as is, not a big deal.
|
Addressed comments. One question. |
There was a problem hiding this comment.
Alas. However, rather than cast, you should rather s/context ->/(SortExpressionContext context) ->.
You can also -1 * instead of - to make it more visible. Or leave as is, not a big deal.
f4deff6 to
f4e0b62
Compare
Use Row prefix to point fact that class operates in channels domain.
Add SortExpressionContext which captures logical sort expression.
Pass explicit searchExpression when doing inequality filtering for join. Previously whole filterFunction was assummed to be the search expression. With explici searchExpression we can capture more cases when we want to use subset of filter function conjunts (possibly transformed) as search function.
f4e0b62 to
318332a
Compare
The sorted position links is searched for each of the expression in the range predicate. Thus this optimization works only for predicates with AND (conjuncts). The iteration over the position links is stopped as soon as any of the filter expression evaluates to false.
TestPositionLinks used `rightPage` to simulate access to build-side data. Changing code to use TEST_PAGE which more accuratelly simulates implementation of behaviour of standard implementation of JoinFilterFunction which have build-side data embeded.
This extends functionality added in #7097, for #6922.
Internal review - Teradata#630
The PR extends the functionality to speed up query with range predicates eg: benchmarkRangePredicateJoin . But I added benchmark tests for other queries which were already addressed by the optimization. So you can see the comparison below with and without this optimization.
Complete Benchmarking results
@losipiuk