-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-22211][SQL] Remove incorrect FOJ limit pushdown #19647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
## What changes were proposed in this pull request? It's not safe in all cases to push down a LIMIT below a FULL OUTER JOIN. If the limit is pushed to one side of the FOJ, the physical join operator can not tell if a row in the non-limited side would have a match in the other side. *If* the join operator guarantees that unmatched tuples from the limited side are emitted before any unmatched tuples from the other side, pushing down the limit is safe. But this is impractical for some join implementations, e.g. SortMergeJoin. For now, disable limit pushdown through a FULL OUTER JOIN, and we can evaluate whether a more complicated solution is necessary in the future. ## How was this patch tested? Ran org.apache.spark.sql.* tests. Altered full outer join tests in LimitPushdownSuite.
| val newJoin = joinType match { | ||
| case RightOuter => join.copy(right = maybePushLocalLimit(exp, right)) | ||
| case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left)) | ||
| case FullOuter => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on it! We should still keep it.
Let me fix it based on my original PR: #10454
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We decide to disable it and merge it to master and the previous release
|
retest this please |
1 similar comment
|
retest this please |
|
Test build #83417 has started for PR 19647 at commit |
|
retest this please |
1 similar comment
|
retest this please |
|
Test build #83432 has finished for PR 19647 at commit
|
|
retest this please |
|
ok to test |
|
test this please |
|
Test build #83454 has finished for PR 19647 at commit
|
It's not safe in all cases to push down a LIMIT below a FULL OUTER JOIN. If the limit is pushed to one side of the FOJ, the physical join operator can not tell if a row in the non-limited side would have a match in the other side. *If* the join operator guarantees that unmatched tuples from the limited side are emitted before any unmatched tuples from the other side, pushing down the limit is safe. But this is impractical for some join implementations, e.g. SortMergeJoin. For now, disable limit pushdown through a FULL OUTER JOIN, and we can evaluate whether a more complicated solution is necessary in the future. Ran org.apache.spark.sql.* tests. Altered full outer join tests in LimitPushdownSuite. Author: Henry Robinson <henry@cloudera.com> Closes #19647 from henryr/spark-22211. (cherry picked from commit 6c66266) Signed-off-by: gatorsmile <gatorsmile@gmail.com>
|
Thanks! Merged to master/2.2 |
| comparePlans(optimized, correctAnswer) | ||
| } | ||
|
|
||
| test("full outer join where neither side is limited and both sides have same statistics") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @gatorsmile and @henryr .
This breaks branch-2.2.
[info] - full outer join where neither side is limited and both sides have same statistics *** FAILED *** (43 milliseconds)
[info] - full outer join where neither side is limited and left side has larger statistics *** FAILED *** (12 milliseconds)
[info] - full outer join where neither side is limited and right side has larger statistics *** FAILED *** (9 milliseconds)
cc @felixcheung for 2.2.1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you submit a fix to resolve the test failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'm taking a look now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, all!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #19701. I think something got dropped in the backport to branch-2.2, so thankfully it's an easy fix.
It's not safe in all cases to push down a LIMIT below a FULL OUTER JOIN. If the limit is pushed to one side of the FOJ, the physical join operator can not tell if a row in the non-limited side would have a match in the other side. *If* the join operator guarantees that unmatched tuples from the limited side are emitted before any unmatched tuples from the other side, pushing down the limit is safe. But this is impractical for some join implementations, e.g. SortMergeJoin. For now, disable limit pushdown through a FULL OUTER JOIN, and we can evaluate whether a more complicated solution is necessary in the future. Ran org.apache.spark.sql.* tests. Altered full outer join tests in LimitPushdownSuite. Author: Henry Robinson <henry@cloudera.com> Closes apache#19647 from henryr/spark-22211. (cherry picked from commit 6c66266) Signed-off-by: gatorsmile <gatorsmile@gmail.com>
What changes were proposed in this pull request?
It's not safe in all cases to push down a LIMIT below a FULL OUTER
JOIN. If the limit is pushed to one side of the FOJ, the physical
join operator can not tell if a row in the non-limited side would have a
match in the other side.
If the join operator guarantees that unmatched tuples from the limited
side are emitted before any unmatched tuples from the other side,
pushing down the limit is safe. But this is impractical for some join
implementations, e.g. SortMergeJoin.
For now, disable limit pushdown through a FULL OUTER JOIN, and we can
evaluate whether a more complicated solution is necessary in the future.
How was this patch tested?
Ran org.apache.spark.sql.* tests. Altered full outer join tests in
LimitPushdownSuite.