Add optimization to rewrite left join with array contains to equi-join#21420
Add optimization to rewrite left join with array contains to equi-join#21420feilong-liu merged 1 commit intoprestodb:masterfrom
Conversation
ee36d43 to
4c0b21a
Compare
vivek-bharathan
left a comment
There was a problem hiding this comment.
This looks good to me overall. Just one open question - do you think there is any benefit in making this optimization a cost-based decision? Maybe based on the cardinality of the array? Maybe based on HBO?
Some of the problems to accomplish this are -
@feilong-liu Without optimizationA Broadcast/Replicated Join is chosen With optimizationA Partitioned Join is chosen, even though the build side would be small enough to replicate IMO, we should keep the default for this feature flag to |
There was a problem hiding this comment.
does the join criteria need to be empty? if we have a query like this
r left join s on (a=b and contains(...))
wouldn't we benefit from applying the same transformation?
There was a problem hiding this comment.
For this case, the two tables will be partitioned on a and b and is already an equi-join. The benefit of apply this transformation will be unknown.
There was a problem hiding this comment.
perhaps you can use Stream.findFirst to do this, something like this:
Optional<RowExpression> arrayContains = andConjuncts.stream().findFirst(...)
if (!arrayContains.isPresent()) ...
There was a problem hiding this comment.
I think using findFirst will lead to code like: andConjuncts.stream().filter(...).findFirst()?
The logic in the for loop is not one or two simple functions, I think it may be better to use a for loop here.
There was a problem hiding this comment.
you're right, it requires a function with a lambda expression to be applied first but I think the function is the same few lines you have below (and can even be refactored into a function):
Optional<RowExpression> arrayContains = andConjuncts.stream().filter(conjunct->isSupportedArrayContains(conjunct)).findFirst();
if (!arrayContains.isPresent()) {
remainingConjuncts = andConjuncts.stream().filter(x -> !x.equals(arrayContains.get()).collect(toImmutableList());
}
There was a problem hiding this comment.
these are set at the same time so we probably don't need to check the disjunction.
Also, this may not be necessary if you rewrite it with the findFirst suggestion above
e61e34e to
e938638
Compare
Agree that the UNNEST cardinality estimation is a problem which we need to solve. Hopefully HBO will be able to resolve most of it. I've made it default to false per suggestion. |
There was a problem hiding this comment.
this looks like we're doing a n^2 computation: scanning andConjuncts again in a for-loop on top of andConjuncts.
I know that the break below makes it linear and non-quadratic but I think it is confusing and less readable that way.
e938638 to
7708108
Compare
7708108 to
ab79ebd
Compare
ab79ebd to
d588f23
Compare
Yeah, ideally a cost based decision is better. But most cases this optimization is beneficial, as it avoids doing a cartesian product for the join. |
d588f23 to
cbd4fd9
Compare
Description
Left join with array contains condition and no equi join condition result in a join with a filter and no equi join condition, and can only be executed with broadcast and not able to be partitioned. In this PR, I add an optimization to change the plan into a equi join.
Motivation and Context
To improve performance for left join with array contains condition.
Impact
Improve performance for left join with array contains condition.
Test Plan
Unit test and verifier suite
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.