fix: Fix semi join result mismatch with filter and multi duplicated matched rows #13121
Branch updated from 4018c2d to 9370b0a.
@rui-mo @jinchengchenghh @zhli1142015 Could you help review this PR? Thanks.
@pedroerp @xiaoxmeng Could you help review this PR? Thanks.
xiaoxmeng left a comment:
@JkSelf thanks for the fix. LGTM % one minor.
@xiaoxmeng Thanks for your review. I have resolved all your comments. Could you help review again? Thanks.
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@xiaoxmeng Thank you for helping review and land this PR!
@xiaoxmeng merged this pull request in 27299ef.
…atched rows (facebookincubator#13121)

Summary:
In a semi join, only the first matching record is retained when there are multiple matching rows on the other side. [PR#13096](facebookincubator#13096) addresses this by selecting only the last matched row for the final output, which resolves the result mismatch when no filter expression is present. However, this approach can still produce mismatched results when a filter is applied. For example:

```
Left Record    Right Record
a  b           c  d
2  5           2  4
               2  5
```

With the join condition `a == c` and the filter `b > d`, selecting only the last matched row yields the right-side row `(2, 5)`. That row does not satisfy the filter, so the result is empty. The correct result is `(2, 5, 2, 4)`.

This PR ensures that all matched rows are retained when a filter expression exists. During filtering, only the first matched row that passes the filter is kept.

Pull Request resolved: facebookincubator#13121

Reviewed By: kevinwilfong

Differential Revision: D77203870

Pulled By: xiaoxmeng

fbshipit-source-id: 0dcbf57148309347770aa981dec27fd148525d44
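The difference between the two strategies in the example can be illustrated with a small Python sketch. This is not Velox code, which is C++ and structured differently. The function names `last_match_then_filter` and `first_passing_match` are hypothetical, and rows are modeled as plain tuples `(a, b)` and `(c, d)` for a single left row. The output concatenates left and right columns only to mirror the `(2, 5, 2, 4)` result stated above.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical row model: a left row is (a, b), a right row is (c, d).
Row = Tuple[int, int]
Filter = Callable[[Row, Row], bool]


def last_match_then_filter(left: Row, right: List[Row], filt: Filter) -> Optional[tuple]:
    """Keep only the last right row whose key matches, then apply the filter."""
    last = None
    for r in right:
        if left[0] == r[0]:  # join condition a == c
            last = r  # earlier matches are overwritten and never filtered
    if last is not None and filt(left, last):
        return left + last
    return None


def first_passing_match(left: Row, right: List[Row], filt: Filter) -> Optional[tuple]:
    """Consider every key match; emit only the first one that passes the filter."""
    for r in right:
        if left[0] == r[0] and filt(left, r):
            return left + r  # semi join: at most one output row per left row
    return None


def b_gt_d(left: Row, right: Row) -> bool:
    return left[1] > right[1]  # filter b > d


left_row = (2, 5)
right_rows = [(2, 4), (2, 5)]
print(last_match_then_filter(left_row, right_rows, b_gt_d))  # None: (2, 5) fails b > d
print(first_passing_match(left_row, right_rows, b_gt_d))  # (2, 5, 2, 4)
```

Without a filter, both functions return exactly one row for the left row, which is why keeping only the last match is sufficient in that case. The results diverge only when the filter can reject some of the key matches, as `(2, 5)` is rejected here.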