Skip to content

Conversation

@JkSelf
Copy link
Collaborator

@JkSelf JkSelf commented Apr 23, 2025

In a semi join, only the first matching record is retained when there are multiple matching rows on the other side. PR#13096 addresses this issue by selecting only the last matched row for the final output, which resolves result mismatch issue when no filter expression is present. However, this approach can not result mismatches if a filter is applied.
For example:

Left Record	    Right Record
a      b	       c      d
2      5	       2      4
                       2      5

With the join condition a == c and the filter b > d, selecting only the last matched row results in right side (2, 5) which does not satisfy the filter, leading to an empty result. The correct result should be (2, 5, 2, 4).

This PR ensures that all matched rows are retained when a filter expression exists. During the filtering process, only the first matched row is kept.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 23, 2025
@JkSelf JkSelf requested review from kevinwilfong and xiaoxmeng and removed request for xiaoxmeng April 23, 2025 08:45
@netlify
Copy link

netlify bot commented Apr 23, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 04240a2
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/685a3373208bed0008f8ccfa

@JkSelf JkSelf force-pushed the semi-bug-fix branch 2 times, most recently from 4018c2d to 9370b0a Compare April 23, 2025 09:17
@JkSelf JkSelf requested a review from xiaoxmeng April 23, 2025 09:17
@JkSelf JkSelf changed the title fix: Fix semi join result mismatch with filter and duplicated matched rows fix: Fix semi join result mismatch with filter and multi duplicated matched rows Apr 23, 2025
@JkSelf
Copy link
Collaborator Author

JkSelf commented Jun 11, 2025

@rui-mo @jinchengchenghh @zhli1142015 Can you help to review this PR? Thanks.

@JkSelf
Copy link
Collaborator Author

JkSelf commented Jun 11, 2025

@pedroerp @xiaoxmeng Can you help to review this PR? Thanks.

Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JkSelf thanks for the fix. LGTM % one minor.

@JkSelf
Copy link
Collaborator Author

JkSelf commented Jun 23, 2025

@xiaoxmeng Thanks for your review. Have resolved all your comments. Can you help to review again? Thanks.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@JkSelf
Copy link
Collaborator Author

JkSelf commented Jun 24, 2025

@xiaoxmeng Thank you for helping review and land this PR!

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng merged this pull request in 27299ef.

@FelixYBW
Copy link

@JkSelf with this PR merged, do we still need #11771

Can we remove oap/velox fix? a575535

@JkSelf
Copy link
Collaborator Author

JkSelf commented Jun 25, 2025

@JkSelf with this PR merged, do we still need #11771

Can we remove oap/velox fix? a575535

@FelixYBW Yes. We still need #11771 to fix the anti and full outer join result mismatch issue. And I have rebased #11771 and resolved the conflicts.

nimesh1601 pushed a commit to nimesh1601/velox that referenced this pull request Jul 7, 2025
…atched rows (facebookincubator#13121)

Summary:
In a semi join, only the first matching record is retained when there are multiple matching rows on the other side. [ PR#13096](facebookincubator#13096) addresses this issue by selecting only the last matched row for the final output, which resolves result mismatch issue when no filter expression is present. However, this approach can not result mismatches if a filter is applied.
For example:
```
Left Record	    Right Record
a      b	       c      d
2      5	       2      4
                       2      5
```
With the join condition `a == c `and the filter `b > d`, selecting only the last matched row results in right side `(2, 5)` which does not satisfy the filter, leading to an empty result. The correct result should be `(2, 5, 2, 4)`.

This PR ensures that all matched rows are retained when a filter expression exists. During the filtering process, only the first matched row is kept.

Pull Request resolved: facebookincubator#13121

Reviewed By: kevinwilfong

Differential Revision: D77203870

Pulled By: xiaoxmeng

fbshipit-source-id: 0dcbf57148309347770aa981dec27fd148525d44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants