Conversation
|
@anu can you post what performance results you had before applying your patch? |
kokosing
left a comment
There was a problem hiding this comment.
I just skimmed this so far. I need a closer look.
It is awesome that you worked on that!
There was a problem hiding this comment.
should not it belong to previous commit?
There was a problem hiding this comment.
Use Optional.map as in the statement above instead of ternary operator
There was a problem hiding this comment.
instead of creating expressions by hand you can use something like:
expression("b1 > p1 OR b2 <= p1")
where expression is:
private Expression expression(String sql) {
return rewriteIdentifiersToSymbolReferences(new SqlParser().createExpression(sql));
}
|
Why is that targeted for |
losipiuk
left a comment
There was a problem hiding this comment.
I can not grasp this PR. Could we meet so you explain to me what is happening here?
There was a problem hiding this comment.
I would explicitly state we are talking about arithmetics here. Maybe call method benchmarkJoinWithArithmeticInPredicate
There was a problem hiding this comment.
It seems you renamed the wrong test method. The one with sin(). But left original name for test method with arithmetic.
There was a problem hiding this comment.
Shouldn't comment here be sth like:
// test with function calls in predicate
There was a problem hiding this comment.
I pushed the fixup commit 518d7ea to PR branch which simplifies the extractor. See what you think.
I found the visitExpression logic with HashSet hard to follow. You can naturally put logic of supporting just AND with matching sort expressions within visitLogicalBinaryExpression.
Take a look at commit and see what you think.
There was a problem hiding this comment.
You can do it in more functional way (IMO more a bit more readable) as:
return expression.getChildren().stream()
.flatMap(child -> process(child).stream())
.collect(toImmutableList());There was a problem hiding this comment.
I do not understand why we keep old lessThanFunction and add a list of filterFunctions.
Would lessThanFunction not be on of the functions in filterFunctions list?
There was a problem hiding this comment.
We use the lessThanFunction (a<b AND b< a+10 in case of range predicates), for the next() method. I can get rid of it and use filterFunctions ({a< b, b< a+10}) in a loop in next() method too (like I do in start()).
There was a problem hiding this comment.
I would rename filterFunctions to inequalityFilterConjuncts or inequalityFilterExpressions and then either remove lessThanFunction.
Or rename it to combinedIneqalityFIlterExpressions and ensure that it is clear when this is constructed that it is just a conjunction of function passed in the other parameter.
|
This is not going to sprint-59. I had the PR branch locally rebased on that. I will change it once the review is done |
There was a problem hiding this comment.
This does not seem good.
It will work correctly for case like this
a < x AND a < x +10
But what if we have other conjuncts in filter function. Which are not in shape of
[sort_symbol] [<=|>=|<|>] [expression using probe side symbols>].
Then those conjuncts will be used as filter functions in SortedPositionLInks.start(). And they will not work fine when passed as lessThanFunction to binary search.
Am I right?
For sure we need tests for that.
There was a problem hiding this comment.
as discussed added test in ATQ.
cbc96f1 to
ce681df
Compare
|
Can you please squash fixup commits? |
ce681df to
da032be
Compare
|
@kokosing done |
losipiuk
left a comment
There was a problem hiding this comment.
Looks good. Some minor comments. Mostly concerning clarifications in tests.
There was a problem hiding this comment.
It seems you renamed the wrong test method. The one with sin(). But left original name for test method with arithmetic.
There was a problem hiding this comment.
Please explain in the comment what are you testing here. This is not supported case for inequality fast joins.
There was a problem hiding this comment.
This case is supported by this optimization since the non-equi join condition occurs in the ON clause. updated the comment.
There was a problem hiding this comment.
Please format the queries so every condition is in separate line. This is hard to read in this shape.
Also add comment what is this test testing.
I understand that it for testing regression in fast inequality join code.
And the code is working for this query because expressions used in equality conditions are rewritten to symbols. So actual expression in join is in shape of build_symbol < probe_expression.
What about seconde query?
There was a problem hiding this comment.
Actually this Javadoc is out of sync with the codebase. As we are not supporting g(build_column, ....) but only build_column as sort expression. Maybe add a commit which puts that javadoc in sync with current implementation?
There was a problem hiding this comment.
We do support g(build_column) as sort expression, provided the expression is pushed to the Scan node (eg: when the expression appears in the ON clause). Updated the doc accordingly.
There was a problem hiding this comment.
Yeah - but in this case it is essentially a symbol when it comes to filter function in join.
I would rather keep in javadoc that g(b_symbol) is not supported. As this javadoc is describing what is supported locally.
It should not need to be updated anytime we add some optimizer in other place which transforms expression which does not have supported shape to one which has.
Then this Javadoc would be unmaintainble (if it is not such already :) )
There was a problem hiding this comment.
Make paremeter ordering consistent with applyLessThanFunction. I.e. make filterFunction first parameter.
There was a problem hiding this comment.
What about using Optional.map()
List<Class<? extends InternalJoinFilterFunction>> internalInequalityJoinFilterConjuncts =
sortChannel.map(channel -> channel.getInequalityJoinFilterConjuncts().stream()
.map(rowExpression -> compileInternalJoinFilterFunction(rowExpression, leftBlocksSize))
.collect(toImmutableList()))
.orElse(ImmutableMap.of());If you do not like this please at least use ImmutableMap.of() instead new ArrayList(). We use ImmutableMap.of() as empty list by convention.
There was a problem hiding this comment.
tried this, but had to change the type of internalInequalityJoinFilterConjuncts to ImmutableList which also needs changing the constructor of IsolatedJoinFilterFunctionFactory. So keeping it as-is.
There was a problem hiding this comment.
JoinFilterCacheKey currently has sortChannel field?
It seems unnecessary to me. Anyway, it should either be removed. Or taken into consideration in equals and hashCode.
It is not related strictly to this PR but maybe you could fix that that along the way as you are working on this class anyway. As a separate commit of course :).
There was a problem hiding this comment.
The sortChannel is still being used to call internalCompileFilterFunctionFactory. So added it to equals and hashcode
There was a problem hiding this comment.
Not up to date. Probe side expression can be aribtrary one. Not just symbol.
There was a problem hiding this comment.
Added a new javadoc commit to reflect what was supported, and later added info about range predicate support.
|
And one more question. How did evaluating conjuncts one by one influence performance? |
|
no regression after evaluating conjuncts one by one . Here is the result: I will address the rest of the comments. |
f7e97a4 to
1c62815
Compare
anusudarsan
left a comment
There was a problem hiding this comment.
addressed comments @losipiuk
There was a problem hiding this comment.
We do support g(build_column) as sort expression, provided the expression is pushed to the Scan node (eg: when the expression appears in the ON clause). Updated the doc accordingly.
There was a problem hiding this comment.
Added a new javadoc commit to reflect what was supported, and later added info about range predicate support.
There was a problem hiding this comment.
The sortChannel is still being used to call internalCompileFilterFunctionFactory. So added it to equals and hashcode
There was a problem hiding this comment.
This case is supported by this optimization since the non-equi join condition occurs in the ON clause. updated the comment.
There was a problem hiding this comment.
tried this, but had to change the type of internalInequalityJoinFilterConjuncts to ImmutableList which also needs changing the constructor of IsolatedJoinFilterFunctionFactory. So keeping it as-is.
1c62815 to
66ae46c
Compare
|
Travis is red. Please change base of the PR to prestodb/master. I think you can do it without creating new one. |
66ae46c to
da718de
Compare
da718de to
1f0d3bf
Compare
|
travis is green. rebased on master and squashed the commits. |
|
Thanks, I will do one last pass tomorrow and merge if everything looks good. |
There was a problem hiding this comment.
please reintroduce the check in new code checking if provided List is not empty.
There was a problem hiding this comment.
Or actually it can be moved to SortedPositionLinks constructor.
There was a problem hiding this comment.
done. closing this and opening a PR upstream
The sorted position links is searched for each of the expression in the range predicate. Thus this optimization works only for predicates with AND (conjuncts). The iteration over the position links is stopped as soon as each of the filter expression is false.
1f0d3bf to
7bf10af
Compare
|
prestodb#8614 created. closing this. |
Follow up PR to extend functionality added in prestodb#7097. The tests result below. @losipiuk Please review.
PR branch
sprint-59 branch
The PR extends the functionality to speed up query with range predicates eg:
benchmarkRangePredicateJoin. But I added benchmark tests for other queries which were already addressed by the optimization. So you can see the comparison below with and without this optimization.