Skip to content

Conversation

@xiaoxmeng
Copy link
Contributor

Summary:
When semi-join with multiple matched vectors, it produce redundant matched rows with one per each matched vector but we actually just
need to produce exact one matched row. This PR fixes this and verified with unit test

Differential Revision: D73397297

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 21, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73397297

@netlify
Copy link

netlify bot commented Apr 21, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 3a02b38
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6806faf7487ae400084b733a

Copy link
Contributor

@kevinwilfong kevinwilfong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

…cubator#13096)

Summary:

When semi-join with multiple matched vectors, it produce redundant matched rows with one per each matched vector but we actually just
need to produce exact one matched row. This PR fixes this and verified with unit test

Reviewed By: kevinwilfong

Differential Revision: D73397297
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73397297

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in e11e394.

@JkSelf
Copy link
Collaborator

JkSelf commented Apr 23, 2025

@xiaoxmeng @kevinwilfong This issue has fixed in #11771. And #11771 also fixed the anti and full outer join issues. Can you help to review if you have time? Thanks.

JkSelf added a commit to JkSelf/velox that referenced this pull request Apr 23, 2025
zhouyuan added a commit to oap-project/velox that referenced this pull request Apr 23, 2025
facebook-github-bot pushed a commit that referenced this pull request Jun 24, 2025
…atched rows (#13121)

Summary:
In a semi join, only the first matching record is retained when there are multiple matching rows on the other side. [ PR#13096](#13096) addresses this issue by selecting only the last matched row for the final output, which resolves result mismatch issue when no filter expression is present. However, this approach can not result mismatches if a filter is applied.
For example:
```
Left Record	    Right Record
a      b	       c      d
2      5	       2      4
                       2      5
```
With the join condition `a == c `and the filter `b > d`, selecting only the last matched row results in right side `(2, 5)` which does not satisfy the filter, leading to an empty result. The correct result should be `(2, 5, 2, 4)`.

This PR ensures that all matched rows are retained when a filter expression exists. During the filtering process, only the first matched row is kept.

Pull Request resolved: #13121

Reviewed By: kevinwilfong

Differential Revision: D77203870

Pulled By: xiaoxmeng

fbshipit-source-id: 0dcbf57148309347770aa981dec27fd148525d44
nimesh1601 pushed a commit to nimesh1601/velox that referenced this pull request Jul 7, 2025
…atched rows (facebookincubator#13121)

Summary:
In a semi join, only the first matching record is retained when there are multiple matching rows on the other side. [ PR#13096](facebookincubator#13096) addresses this issue by selecting only the last matched row for the final output, which resolves result mismatch issue when no filter expression is present. However, this approach can not result mismatches if a filter is applied.
For example:
```
Left Record	    Right Record
a      b	       c      d
2      5	       2      4
                       2      5
```
With the join condition `a == c `and the filter `b > d`, selecting only the last matched row results in right side `(2, 5)` which does not satisfy the filter, leading to an empty result. The correct result should be `(2, 5, 2, 4)`.

This PR ensures that all matched rows are retained when a filter expression exists. During the filtering process, only the first matched row is kept.

Pull Request resolved: facebookincubator#13121

Reviewed By: kevinwilfong

Differential Revision: D77203870

Pulled By: xiaoxmeng

fbshipit-source-id: 0dcbf57148309347770aa981dec27fd148525d44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants