Skip to content

Conversation

@kevinwilfong
Copy link
Contributor

Summary:
While working on another change I discovered that JoinFuzzer does not test MergeJoin with
RIGHT OUTER JOINs. Enabling it locally (I'll publish a change to enable it separately) I
discovered a bug. While trying to add a unit test for it, I discovered a few more.

They fit into two classes:

  1. Skipping over rows on the right side with NULL keys. This is the correct thing to do for
    INNER and LEFT OUTER JOINs but we need to output misses for these rows in RIGHT OUTER
    JOINs (they can't hit given our NULL semantics).
  2. Writing off the end of the output buffer trying to write out this misses. We need to make
    sure the size of output_ hasn't yet reached outputBatchSize_ before writing misses to it.

This diff fixes the bugs I found and adds unit tests covering NULL keys (I didn't see any prior
to this change).

Differential Revision: D73077550

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 16, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73077550

@netlify
Copy link

netlify bot commented Apr 16, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit f5ab1bb
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/68002b99f8de120008fdc1df

@kevinwilfong kevinwilfong marked this pull request as draft April 16, 2025 17:06
kevinwilfong added a commit to kevinwilfong/velox that referenced this pull request Apr 16, 2025
…t in the keys (facebookincubator#13039)

Summary:

While working on another change I discovered that JoinFuzzer does not test MergeJoin with
RIGHT OUTER JOINs. Enabling it locally (I'll publish a change to enable it separately) I
discovered a bug. While trying to add a unit test for it, I discovered a few more.

They fit into two classes:
1) Skipping over rows on the right side with NULL keys. This is the correct thing to do for 
INNER and LEFT OUTER JOINs but we need to output misses for these rows in RIGHT OUTER
JOINs (they can't hit given our NULL semantics).
2) Writing off the end of the output buffer trying to write out this misses. We need to make
sure the size of output_ hasn't yet reached outputBatchSize_ before writing misses to it.

This diff fixes the bugs I found and adds unit tests covering NULL keys (I didn't see any prior
to this change).

Differential Revision: D73077550
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73077550

@kevinwilfong kevinwilfong marked this pull request as ready for review April 16, 2025 19:06
…t in the keys (facebookincubator#13039)

Summary:

While working on another change I discovered that JoinFuzzer does not test MergeJoin with
RIGHT OUTER JOINs. Enabling it locally (I'll publish a change to enable it separately) I
discovered a bug. While trying to add a unit test for it, I discovered a few more.

They fit into two classes:
1) Skipping over rows on the right side with NULL keys. This is the correct thing to do for 
INNER and LEFT OUTER JOINs but we need to output misses for these rows in RIGHT OUTER
JOINs (they can't hit given our NULL semantics).
2) Writing off the end of the output buffer trying to write out this misses. We need to make
sure the size of output_ hasn't yet reached outputBatchSize_ before writing misses to it.

This diff fixes the bugs I found and adds unit tests covering NULL keys (I didn't see any prior
to this change).

Reviewed By: xiaoxmeng

Differential Revision: D73077550
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73077550

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 53ea8a6.

@zhouyuan
Copy link
Collaborator

@kevinwilfong @xiaoxmeng would you please also help to review below two merge join fixes? There were discovered in Spark/Gluten tests and fixed by @JkSelf
#11771
#11772

Thanks, -yuan

@pedroerp
Copy link
Contributor

Right outer join is indeed not very stable. We discussed that ideally we should only implement a left outer join, and for right outer join we just swap the inputs and run the same left outer join path. So at least we would have a single code path to test.

@JkSelf
Copy link
Collaborator

JkSelf commented Apr 18, 2025

@pedroerp @xiaoxmeng @kevinwilfong
The internal testing for Gluten has already enabled sort merge join and incorporated these two PRs #11771
#11772 into our internal code branch oap-project@5888209. Recently, there have been numerous refactors and fixes related to merge join in the community, which have caused issues with our internal testing and rebasing. Therefore, if possible, could you please prioritize reviewing these two PRs? Thanks for your help.

zhanglistar pushed a commit to bigo-sg/velox that referenced this pull request Apr 22, 2025
…t in the keys (facebookincubator#13039)

Summary:
Pull Request resolved: facebookincubator#13039

While working on another change I discovered that JoinFuzzer does not test MergeJoin with
RIGHT OUTER JOINs. Enabling it locally (I'll publish a change to enable it separately) I
discovered a bug. While trying to add a unit test for it, I discovered a few more.

They fit into two classes:
1) Skipping over rows on the right side with NULL keys. This is the correct thing to do for
INNER and LEFT OUTER JOINs but we need to output misses for these rows in RIGHT OUTER
JOINs (they can't hit given our NULL semantics).
2) Writing off the end of the output buffer trying to write out this misses. We need to make
sure the size of output_ hasn't yet reached outputBatchSize_ before writing misses to it.

This diff fixes the bugs I found and adds unit tests covering NULL keys (I didn't see any prior
to this change).

Reviewed By: xiaoxmeng

Differential Revision: D73077550

fbshipit-source-id: 82d914d38835b51f52676cfa2317fdce164ee0fc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants