Fix join reordering for non-trivial equi join conditions#20413
Fix join reordering for non-trivial equi join conditions#20413ajaygeorge merged 1 commit intoprestodb:masterfrom
Conversation
0d9dd72 to
46c8624
Compare
d077299 to
98b73d9
Compare
presto-expressions/src/main/java/com/facebook/presto/expressions/LogicalRowExpressions.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/ReorderJoins.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/ReorderJoins.java
Outdated
Show resolved
Hide resolved
98b73d9 to
fe72d1f
Compare
rschlussel
left a comment
There was a problem hiding this comment.
I'm not sure i'll get a chance to do a full review, but one request would be to add tests for joins with some complex predicates that can't be optimized (e.g. variables from both sides of the join on both sides of the equality, or variables from only one side of the join is in the predicate at all). Also, nested joins that have multiple such predicates. I see you have some example tests like this in TestJoinEnumerator, but would be good to have some full SQL tests to to ensure correct handling for plan and result correctness.
fe72d1f to
69d60cc
Compare
@rschlussel I've added the tests for non-optimizable predicates and a nested query with a mix of optimizable and non-optimizable predicates now in TestReorderJoins. I do have one full SQL test added as well. While I could clone the plan tests here, I think plan equivalence tests gives us good coverage (A secondary reason is that H2 based result verification for large joins is prohibitively slow) |
presto-expressions/src/main/java/com/facebook/presto/expressions/LogicalRowExpressions.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/ReorderJoins.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/iterative/rule/ReorderJoins.java
Outdated
Show resolved
Hide resolved
...to-main/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestJoinEnumerator.java
Outdated
Show resolved
Hide resolved
...to-main/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestJoinEnumerator.java
Outdated
Show resolved
Hide resolved
69d60cc to
84fd0a4
Compare
...to-main/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestJoinEnumerator.java
Outdated
Show resolved
Hide resolved
|
Do we have a session param to control this optimization? I don't see it. I say we have to do that if not already there before we merge this PR. |
@kaikalur I chose not to add a session property since IMO we have a bug here - we get different query plans for a fully connected join graph depending upon how the query is written. This shouldn't happen if join re-ordering is turned on. Can you elaborate on your concerns on why we need a session flag ? |
|
Is the plan incorrect or just (potentially) inefficient? If it's the latter, I say we still need to add the flag |
|
The plan is incorrect since we expect join-reordering to work for fully connected graphs, but it doesn't. With join reordering turned on, we shouldn't get a CrossJoin node added to the plan because the join graph is fully connected. But the join space of the query gets broken up because of the current implementation of the ReorderJoins rule, which IMO is wrong (not just inefficient). Are you concerned that there are user's relying on this broken behavior who would be surprised because of this change ? |
Yes :) someone's bug is someone elses's feature :) For large tables, breaking up plan can help sometimes. |
84fd0a4 to
ee95aec
Compare
|
@kaikalur I've added a session property |
|
Test failure is unrelated. See #20840 for details |
Join predicates like `left.key = right1.key1 + right2.key2` can reduce the join space by appearing as Project nodes or Join noes with no equi-join clauses in the join graph. This commit fixes this behavior by removing any intermediate Projects in the join graph and only creating them on-the-fly while choosing the join order
ee95aec to
df705ef
Compare
|
@kaikalur / @rschlussel Can you please merge this PR ? The last remaining test failure is unrelated. I've created #20850 for this; we would need to figure out a proper fix for it |
ajaygeorge
left a comment
There was a problem hiding this comment.
Stamping since this is already reviewed by multiple folks.
|
@aaneja This PR has a bug, you can try to run query |
|
@feilong-liu Working on a fix |
|
@feilong-liu @ajaygeorge I think we should revert while I make a fix; I created - #20943 |
Description
Enhance join-reordering to work with non-simple equi-join predicates
Motivation and Context
Join predicates like
left.key = right0.key1 + right1.key2can reduce the join space by appearing as Project nodes or Join noes with no equi-join clauses in the join graph. This commit fixes this behavior by removing any intermediate Projects in the join graph and only creating them on-the-fly while choosing the join orderImpact
Queries with completely connected join graphs would have CrossJoin's in them, depending on the order of specification of the FROM clause
This is fixed after this PR, and a cheaper plan with only inner joins is chosen :
Test Plan
New unit tests added
Contributor checklist
Release Notes