fix: Fix the semi merge join with duplicate match vectors #13096
This pull request was exported from Phabricator. Differential Revision: D73397297
kevinwilfong
left a comment
Thanks for the fix!
…cubator#13096)

Summary: When a semi join has multiple matched vectors, it produces redundant matched rows, one per matched vector, but it only needs to produce exactly one matched row. This PR fixes that and verifies the fix with a unit test.

Reviewed By: kevinwilfong
Differential Revision: D73397297
905283b to
3a02b38
This pull request has been merged in e11e394.
@xiaoxmeng @kevinwilfong This issue has been fixed in #11771, which also fixes the anti and full outer join issues. Could you help review it when you have time? Thanks.
…atched rows (#13121)

Summary: In a semi join, only the first matching record is retained when there are multiple matching rows on the other side. [PR#13096](#13096) addresses this issue by selecting only the last matched row for the final output, which resolves the result mismatch when no filter expression is present. However, this approach can produce result mismatches if a filter is applied. For example:

```
Left Record    Right Record
a  b           c  d
2  5           2  4
               2  5
```

With the join condition `a == c` and the filter `b > d`, selecting only the last matched row yields the right-side row `(2, 5)`, which does not satisfy the filter, so the result is empty. The correct result should be `(2, 5, 2, 4)`.

This PR ensures that all matched rows are retained when a filter expression exists. During the filtering process, only the first matched row that passes the filter is kept.

Pull Request resolved: #13121
Reviewed By: kevinwilfong
Differential Revision: D77203870
Pulled By: xiaoxmeng
fbshipit-source-id: 0dcbf57148309347770aa981dec27fd148525d44
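The example in the #13121 summary can be reproduced with a small standalone sketch. This is not Velox's merge join code; the function names `semi_join_last_match_then_filter` and `semi_join_first_passing_match` are hypothetical, and rows are modeled as plain tuples. The output pairs the left row with its matching right row only to mirror the `(2, 5, 2, 4)` notation used in the summary.

```python
# Illustrative sketch only: models the behavior described in the summary,
# not the actual Velox implementation. All names here are hypothetical.

def semi_join_last_match_then_filter(left, right, key_match, row_filter):
    """Keep only the last key match per left row, then apply the filter."""
    output = []
    for l in left:
        matches = [r for r in right if key_match(l, r)]
        if matches and row_filter(l, matches[-1]):
            output.append((l, matches[-1]))
    return output


def semi_join_first_passing_match(left, right, key_match, row_filter):
    """Keep all key matches, then retain the first one that passes the filter."""
    output = []
    for l in left:
        for r in right:
            if key_match(l, r) and row_filter(l, r):
                output.append((l, r))
                break  # a semi join emits at most one row per left row
    return output


left = [(2, 5)]            # columns (a, b)
right = [(2, 4), (2, 5)]   # columns (c, d)


def key_match(l, r):
    return l[0] == r[0]    # join condition: a == c


def row_filter(l, r):
    return l[1] > r[1]     # filter: b > d


before = semi_join_last_match_then_filter(left, right, key_match, row_filter)
after = semi_join_first_passing_match(left, right, key_match, row_filter)
print(before)
print(after)
```

With the data from the summary, the first function drops the left row because the last match `(2, 5)` fails `b > d`, while the second finds `(2, 4)` and keeps the row.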
Summary:
When a semi join has multiple matched vectors, it produces redundant matched rows, one per matched vector, but it only needs
to produce exactly one matched row. This PR fixes that and verifies the fix with a unit test.
Differential Revision: D73397297
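The duplicate-row problem described in the summary can be illustrated with a minimal sketch, assuming the matches for one left row arrive split across several batches (standing in for the "matched vectors"). The function names are hypothetical and the code does not reflect Velox's internal data structures.

```python
# Illustrative sketch only: batches of right-side matches stand in for the
# "matched vectors" in the summary. All names here are hypothetical.

def semi_join_one_row_per_batch(left_row, match_batches):
    """Behavior described as the bug: one output row per non-empty batch."""
    return [left_row for batch in match_batches if len(batch) > 0]


def semi_join_exactly_one_row(left_row, match_batches):
    """Intended behavior: one output row if any batch contains a match."""
    has_match = any(len(batch) > 0 for batch in match_batches)
    return [left_row] if has_match else []


left_row = (2, 5)
# Two right-side rows with the same key, split across two batches.
match_batches = [[(2, 4)], [(2, 5)]]

duplicated = semi_join_one_row_per_batch(left_row, match_batches)
deduplicated = semi_join_exactly_one_row(left_row, match_batches)
print(len(duplicated), len(deduplicated))
```

The first function emits the left row once per batch, which is the redundancy the summary describes; the second emits it once regardless of how the matches are split.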