
Extend fast inequality join #8614

Merged
losipiuk merged 20 commits into prestodb:master from Teradata:extend-fast-inequality-join on Sep 11, 2017

@anusudarsan
Contributor

anusudarsan commented Jul 27, 2017

This extends functionality added in #7097, for #6922.

Internal review - Teradata#630

This PR extends the optimization to speed up queries with range predicates, e.g. benchmarkRangePredicateJoin. I also added benchmarks for other queries that were already covered by the optimization, so the comparison below shows results with and without this change.

Benchmark                                                       (buckets)  (fastInequalityJoins)      master (ms/op)   PR branch (ms/op)
BenchmarkInequalityJoin.benchmarkJoin                                 100                   true   222.267 ±  36.490   234.191 ±  33.999
BenchmarkInequalityJoin.benchmarkJoin                                 100                  false  2409.789 ± 193.371  2360.016 ± 189.837
BenchmarkInequalityJoin.benchmarkJoinWithArithmeticInPredicate        100                   true   279.125 ±  23.798   280.396 ±  17.991
BenchmarkInequalityJoin.benchmarkJoinWithArithmeticInPredicate        100                  false  2375.963 ±  78.662  2376.180 ± 125.109
BenchmarkInequalityJoin.benchmarkJoinWithFunctionPredicate            100                   true   193.858 ±  12.845   216.786 ±  12.600
BenchmarkInequalityJoin.benchmarkJoinWithFunctionPredicate            100                  false  2288.445 ±  55.931  2408.483 ±  97.140
BenchmarkInequalityJoin.benchmarkRangePredicateJoin                   100                   true  2435.688 ± 143.372   247.428 ±  11.549
BenchmarkInequalityJoin.benchmarkRangePredicateJoin                   100                  false  2433.708 ±  64.086  2487.442 ±  60.085

Complete benchmarking results

Benchmark                                                     (buckets)  (fastInequalityJoins)  (filterOutCoefficient)  Mode  Cnt     Score     Error  Units
BenchmarkInequalityJoin.benchmarkJoin                               100                   true                      10  avgt   30   234.191 ±  33.999  ms/op
BenchmarkInequalityJoin.benchmarkJoin                               100                  false                      10  avgt   30  2360.016 ± 189.837  ms/op
BenchmarkInequalityJoin.benchmarkJoin                              1000                   true                      10  avgt   30   187.426 ±  24.792  ms/op
BenchmarkInequalityJoin.benchmarkJoin                              1000                  false                      10  avgt   30   414.487 ±  27.297  ms/op
BenchmarkInequalityJoin.benchmarkJoin                             10000                   true                      10  avgt   30   198.977 ±  35.756  ms/op
BenchmarkInequalityJoin.benchmarkJoin                             10000                  false                      10  avgt   30   239.980 ±  18.026  ms/op
BenchmarkInequalityJoin.benchmarkJoin                             60000                   true                      10  avgt   30   173.009 ±   6.956  ms/op
BenchmarkInequalityJoin.benchmarkJoin                             60000                  false                      10  avgt   30   181.165 ±   8.649  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithArithmeticInPredicate        100                   true                      10  avgt   30   280.396 ±  17.991  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithArithmeticInPredicate        100                  false                      10  avgt   30  2376.180 ± 125.109  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithArithmeticInPredicate       1000                   true                      10  avgt   30   210.539 ±  11.744  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithArithmeticInPredicate       1000                  false                      10  avgt   30   471.721 ±  45.251  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithArithmeticInPredicate      10000                   true                      10  avgt   30   203.669 ±   9.373  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithArithmeticInPredicate      10000                  false                      10  avgt   30   259.281 ±  13.036  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithArithmeticInPredicate      60000                   true                      10  avgt   30   203.048 ±  10.953  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithArithmeticInPredicate      60000                  false                      10  avgt   30   199.349 ±   9.362  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithFunctionPredicate          100                   true                      10  avgt   30   216.786 ±  12.600  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithFunctionPredicate          100                  false                      10  avgt   30  2408.483 ±  97.140  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithFunctionPredicate         1000                   true                      10  avgt   30   195.763 ±  13.215  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithFunctionPredicate         1000                  false                      10  avgt   30   580.025 ± 120.265  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithFunctionPredicate        10000                   true                      10  avgt   30   226.885 ±  24.685  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithFunctionPredicate        10000                  false                      10  avgt   30   274.404 ±  18.313  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithFunctionPredicate        60000                   true                      10  avgt   30   197.643 ±  11.469  ms/op
BenchmarkInequalityJoin.benchmarkJoinWithFunctionPredicate        60000                  false                      10  avgt   30   210.390 ±  15.268  ms/op
BenchmarkInequalityJoin.benchmarkRangePredicateJoin                 100                   true                      10  avgt   30   247.428 ±  11.549  ms/op
BenchmarkInequalityJoin.benchmarkRangePredicateJoin                 100                  false                      10  avgt   30  2487.442 ±  60.085  ms/op
BenchmarkInequalityJoin.benchmarkRangePredicateJoin                1000                   true                      10  avgt   30   240.810 ±  14.584  ms/op
BenchmarkInequalityJoin.benchmarkRangePredicateJoin                1000                  false                      10  avgt   30   527.124 ±  47.483  ms/op
BenchmarkInequalityJoin.benchmarkRangePredicateJoin               10000                   true                      10  avgt   30   226.683 ±  11.559  ms/op
BenchmarkInequalityJoin.benchmarkRangePredicateJoin               10000                  false                      10  avgt   30   270.130 ±  13.167  ms/op
BenchmarkInequalityJoin.benchmarkRangePredicateJoin               60000                   true                      10  avgt   30   226.237 ±   8.305  ms/op
BenchmarkInequalityJoin.benchmarkRangePredicateJoin               60000                  false                      10  avgt   30   218.149 ±   9.184  ms/op

losipiuk force-pushed the extend-fast-inequality-join branch 2 times, most recently from 758a8ee to c292c1e on August 10, 2017 14:23
losipiuk requested review from kokosing and sopel39 on August 10, 2017 14:24
@losipiuk
Contributor

@kokosing, @sopel39, @anusudarsan I cleaned this up, as the whole feature felt too messy to me from the start.
The refactoring commits should probably be squashed together (please decide). I left them separate for the sake of review.

Please give it (hopefully) last read.

losipiuk force-pushed the extend-fast-inequality-join branch from c292c1e to 0750369 on August 10, 2017 14:28
anusudarsan force-pushed the extend-fast-inequality-join branch from 0750369 to 20cc040 on August 17, 2017 07:50
Contributor Author

anusudarsan left a comment

@losipiuk LGTM mod comment. Also, I fixed a checkstyle error.

Contributor Author

Should we rename applyLessThanFunction -> applySearchFunction as a part of the refactoring commit 9e5b4b38040a59120bd3de6ac67c832403efb97e ?

Contributor

I added renaming commit.

losipiuk force-pushed the extend-fast-inequality-join branch 3 times, most recently from 8c10e51 to 4e689f5 on August 18, 2017 12:46
@losipiuk
Contributor

@sopel39 last pass?

Contributor

findepi left a comment

One important comment about correctness, plus a bit of refactoring and wording. Feel free to ignore anything you find nit-picky.

Contributor

  • while cleaning up this Javadoc: the first line is:

This class assumes that lessThanFunction is a superset of the whole filtering

This is only remotely related to the truth. Before the other changes in this PR, neither ANDs nor ORs are supported.
After the other changes, only specific ANDs and ORs are supported (those that compare to the same build-side symbol).

  • Further, in the following

by passing any of the filterFunction_i to the SortedPositionLinks. We could not

s/We could not/We cannot/

Contributor

I simplified the Javadoc very much.

Contributor

s/buildSymbolRef/{@code buildSymbolRef}

Contributor

N/A

Contributor

  • s/{@code f(probePosition)}/{@code f(probeColumn1, probeColumn2, ..., probeColumnN)}
  • note about binary search is a bit misleading, we do binary search only for >, >= cases

Contributor

I will think about the binary search case

Contributor

s/a/{@code a}, same for x,y,z

Contributor

In SortedPositionLinks you mentioned >, <, <=, >=. Here, is < used only for terseness, or does this class indeed support only <?

Contributor

N/A

Contributor

pls do not overwrite input param startingPosition

Contributor

Return left (i.e. any) and put a TODO to make it a cost-based decision.
If there are two < conditions on two different build symbols, better to have sorted position links on one of them than on none.

Contributor

?

Contributor

Why not a list? Expression de-duplication is something optimizer should be doing, a list should suffice.

Contributor

why no longer static?

Contributor

replace \n with in the queries

Contributor

is there a point in renaming & extracting the class you remove 2 commits later?

Contributor

Discussed. Let's leave it.

@losipiuk
Contributor

@anusudarsan I am addressing the comments

losipiuk force-pushed the extend-fast-inequality-join branch from 4e689f5 to 4229b05 on August 24, 2017 13:55
Contributor

Interesting: what does it do in the plan printer? It runs unconditionally, regardless of the flag enabling inequality joins. (no change requested)

Contributor

👍 :)

Contributor

Consider converting it to array here.

Contributor

Isn't links==null the case of no collisions?

In the original code, we had the following here:

if (applyLessThanFunction(startingPosition, probePosition, allProbeChannelsPage)) {
    return startingPosition;
}

and this seemingly could return startingPosition even if sortedPositionLinks[startingPosition]==null

Contributor

You can safely inline left & right

Contributor

Now I don't understand what this if does here. It looks redundant, and lowerBound could cover it. (no change required, but you can think about it..)

Contributor

retain .stream() on prev line

Contributor

unel.

Contributor

simply return false

Contributor

Add a test in TestPositionLinks too

Contributor

Shouldn't this cool new method be used in next() too?

Contributor

cmt msg typo "embended"

losipiuk force-pushed the extend-fast-inequality-join branch from 4d9bcbf to d32d2ed on August 29, 2017 11:41
@losipiuk
Contributor

Squashed fixups.

Contributor

"can not" is not valid, see #8770

Contributor

?

losipiuk force-pushed the extend-fast-inequality-join branch 3 times, most recently from 1563e88 to 7883b37 on September 8, 2017 11:57
@losipiuk
Contributor

losipiuk commented Sep 8, 2017

@findepi, would you like to take a look in your spare time? ;)

losipiuk force-pushed the extend-fast-inequality-join branch from 7883b37 to 1f0e036 on September 8, 2017 14:18
Contributor

findepi left a comment

Only minor comments

Contributor

This looks suspicious: a call to Optional.get without .isPresent. You can avoid those misleading Optionals if you replace .collect(groupingBy(...)) with .collect(toMap(SortExpressionContext::getSortExpression, c->c, SortExpressionExtractor::merge)).
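A minimal sketch of the toMap suggestion, under stated assumptions: SortExpressionContext is reduced to a hypothetical class holding a sort expression and its search expressions as strings, and merge is a hypothetical helper that concatenates the search expressions of two contexts with the same sort expression. With a merge function, Collectors.toMap produces one value per key directly, whereas a groupingBy with a reducing(BinaryOperator) downstream collector produces Optional values that then need unwrapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SortExpressionMergeSketch
{
    // Hypothetical stand-in for SortExpressionContext: one sort expression plus
    // the search expressions that can be evaluated against it.
    static final class SortExpressionContext
    {
        private final String sortExpression;
        private final List<String> searchExpressions;

        SortExpressionContext(String sortExpression, List<String> searchExpressions)
        {
            this.sortExpression = sortExpression;
            this.searchExpressions = List.copyOf(searchExpressions);
        }

        String getSortExpression()
        {
            return sortExpression;
        }

        List<String> getSearchExpressions()
        {
            return searchExpressions;
        }
    }

    // Hypothetical merge function: both contexts share a sort expression,
    // so their search expressions are concatenated (earlier one first).
    static SortExpressionContext merge(SortExpressionContext left, SortExpressionContext right)
    {
        List<String> combined = new ArrayList<>(left.getSearchExpressions());
        combined.addAll(right.getSearchExpressions());
        return new SortExpressionContext(left.getSortExpression(), combined);
    }

    // toMap with a merge function yields exactly one context per sort expression,
    // so there is no Optional to unwrap.
    static Map<String, SortExpressionContext> groupBySortExpression(List<SortExpressionContext> contexts)
    {
        return contexts.stream()
                .collect(Collectors.toMap(SortExpressionContext::getSortExpression, context -> context, SortExpressionMergeSketch::merge));
    }

    public static void main(String[] args)
    {
        Map<String, SortExpressionContext> grouped = groupBySortExpression(List.of(
                new SortExpressionContext("b.y", List.of("b.y > p.x - 10")),
                new SortExpressionContext("b.z", List.of("b.z < p.w")),
                new SortExpressionContext("b.y", List.of("b.y < p.x"))));
        System.out.println(grouped.get("b.y").getSearchExpressions()); // [b.y > p.x - 10, b.y < p.x]
    }
}
```

For a sequential stream, toMap calls the merge function with the existing value first, so search expressions keep their encounter order.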

Contributor

cmt msg "Consider different sort expressions in SortExpressionExtractor" -- it is not quite clear. Maybe

"Extract sort expressions from complex join filters" ?

Contributor

s/search expression/&s

Contributor

rm LF

Contributor

Consider using .reversed() instead of -

Contributor

I tried that before, as well as Comparator.reverseOrder, but then I have to add an explicit cast of context to SortExpressionContext.
Like this:

                .sorted(comparing(context -> ((SortExpressionContext) context).getSearchExpressions().size()).reversed())

I prefer - in that case, unless you can help me get rid of the cast.

Contributor

Alas. However, rather than cast, you should rather s/context ->/(SortExpressionContext context) ->.

You can also use -1 * instead of - to make it more visible. Or leave it as is, not a big deal.
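A minimal sketch of the typed-lambda suggestion, assuming a hypothetical SortExpressionContext with getSearchExpressions(). When .reversed() is chained directly on Comparator.comparing(...), that call is no longer in a target-typed position, so the compiler cannot infer the lambda's parameter type from sorted(...) and falls back to Object; declaring the parameter type in the lambda removes the need for the cast.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SearchExpressionOrderingSketch
{
    // Hypothetical stand-in for the PR's SortExpressionContext.
    static final class SortExpressionContext
    {
        private final String sortExpression;
        private final List<String> searchExpressions;

        SortExpressionContext(String sortExpression, List<String> searchExpressions)
        {
            this.sortExpression = sortExpression;
            this.searchExpressions = List.copyOf(searchExpressions);
        }

        String getSortExpression()
        {
            return sortExpression;
        }

        List<String> getSearchExpressions()
        {
            return searchExpressions;
        }
    }

    // Orders contexts so that the one covering the most search expressions comes first.
    static List<SortExpressionContext> mostSearchExpressionsFirst(List<SortExpressionContext> contexts)
    {
        return contexts.stream()
                // The explicit parameter type lets comparing(...) infer SortExpressionContext
                // even with .reversed() chained on it, so no cast is needed.
                .sorted(Comparator.comparing((SortExpressionContext context) -> context.getSearchExpressions().size()).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<SortExpressionContext> ordered = mostSearchExpressionsFirst(List.of(
                new SortExpressionContext("b.a", List.of("b.a < p.a")),
                new SortExpressionContext("b.y", List.of("b.y > p.x - 10", "b.y < p.x")),
                new SortExpressionContext("b.z", List.of())));
        for (SortExpressionContext context : ordered) {
            System.out.println(context.getSortExpression()); // b.y, then b.a, then b.z
        }
    }
}
```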

@losipiuk
Contributor

Addressed comments. One question.


losipiuk force-pushed the extend-fast-inequality-join branch from f4deff6 to f4e0b62 on September 11, 2017 08:09
losipiuk and others added 10 commits September 11, 2017 10:13
Use Row prefix to indicate that the class operates in the
channels domain.
Add SortExpressionContext, which captures the logical sort expression.
Pass an explicit searchExpression when doing inequality filtering for joins.
Previously the whole filterFunction was assumed to be the search expression.
With an explicit searchExpression we can capture more cases where we want
to use a subset of the filter function conjuncts (possibly transformed)
as the search function.
losipiuk force-pushed the extend-fast-inequality-join branch from f4e0b62 to 318332a on September 11, 2017 08:13
losipiuk and others added 10 commits September 11, 2017 10:13
The sorted position links are searched for each of the expressions in the
range predicate. Thus this optimization works only for predicates with AND (conjuncts).
The iteration over the position links is stopped as soon as any of the
filter expressions evaluates to false.
TestPositionLinks used `rightPage` to simulate access to build-side
data. Change code to use TEST_PAGE, which more accurately simulates
the behaviour of the standard JoinFilterFunction implementation,
which has build-side data embedded.
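A simplified model of the search described in the range-predicate commit message above may help when reading SortedPositionLinks. It is a sketch under stated assumptions, not Presto code: build-side rows are reduced to long values sorted by the sort expression, and SortedRangeScanSketch and SearchPredicate are hypothetical names. Following the review discussion, the start offset is the maximum over all search expressions, a binary search runs only when an expression is false at the current offset (the >, >= shapes), and iteration stops as soon as any search expression evaluates to false.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedRangeScanSketch
{
    // Hypothetical search expression: (build-side sort value, probe value) -> boolean.
    @FunctionalInterface
    interface SearchPredicate
    {
        boolean test(long buildValue, long probeValue);
    }

    private final long[] sortedBuildValues;
    private final List<SearchPredicate> searchPredicates;

    SortedRangeScanSketch(long[] buildValues, List<SearchPredicate> searchPredicates)
    {
        // Build side is kept sorted by the sort expression (here: the value itself).
        this.sortedBuildValues = buildValues.clone();
        Arrays.sort(this.sortedBuildValues);
        this.searchPredicates = List.copyOf(searchPredicates);
    }

    // Start offset: the largest start offset required by any single search predicate.
    int start(long probeValue)
    {
        int offset = 0;
        for (SearchPredicate predicate : searchPredicates) {
            offset = Math.max(offset, startFor(predicate, offset, probeValue));
        }
        return offset;
    }

    private int startFor(SearchPredicate predicate, int from, long probeValue)
    {
        if (from >= sortedBuildValues.length || predicate.test(sortedBuildValues[from], probeValue)) {
            // Shapes like `build < probe` hold on a prefix: they do not move the start offset.
            return from;
        }
        // Shapes like `build >= probe - 10` are false on a prefix and then true:
        // binary search for the first offset where the predicate holds.
        int low = from;
        int high = sortedBuildValues.length;
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (predicate.test(sortedBuildValues[mid], probeValue)) {
                high = mid;
            }
            else {
                low = mid + 1;
            }
        }
        return low;
    }

    // Walks forward from start() and stops as soon as any search predicate is false.
    List<Long> matches(long probeValue)
    {
        List<Long> result = new ArrayList<>();
        for (int offset = start(probeValue); offset < sortedBuildValues.length; offset++) {
            long buildValue = sortedBuildValues[offset];
            for (SearchPredicate predicate : searchPredicates) {
                if (!predicate.test(buildValue, probeValue)) {
                    return result;
                }
            }
            result.add(buildValue);
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Range predicate: build >= probe - 10 AND build < probe
        SortedRangeScanSketch links = new SortedRangeScanSketch(
                new long[] {42, 3, 17, 25, 8, 31, 20},
                List.<SearchPredicate>of(
                        (build, probe) -> build >= probe - 10,
                        (build, probe) -> build < probe));
        System.out.println(links.matches(30)); // [20, 25]
    }
}
```

Because every search expression is checked at each step, a conjunction scans only the matching interval plus the first non-matching row; a disjunction would not form a single contiguous interval, which is why the commit restricts the optimization to AND.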

4 participants