fix: MergeJoin is buggy with RIGHT OUTER JOINs where NULLs are present in the keys #13039
This pull request was exported from Phabricator. Differential Revision: D73077550 |
…t in the keys (facebookincubator#13039)
Summary: While working on another change I discovered that JoinFuzzer does not test MergeJoin with RIGHT OUTER JOINs. Enabling it locally (I'll publish a change to enable it separately) I discovered a bug. While trying to add a unit test for it, I discovered a few more. They fit into two classes:
1) Skipping over rows on the right side with NULL keys. This is the correct thing to do for INNER and LEFT OUTER JOINs but we need to output misses for these rows in RIGHT OUTER JOINs (they can't hit given our NULL semantics).
2) Writing off the end of the output buffer trying to write out these misses. We need to make sure the size of output_ hasn't yet reached outputBatchSize_ before writing misses to it.
This diff fixes the bugs I found and adds unit tests covering NULL keys (I didn't see any prior to this change).
Differential Revision: D73077550
|
This pull request has been merged in 53ea8a6. |
|
@kevinwilfong @xiaoxmeng would you please also help review the two merge join fixes below? They were discovered in Spark/Gluten tests and fixed by @JkSelf. Thanks, -yuan |
|
Right outer join is indeed not very stable. We discussed that ideally we should only implement a left outer join, and for right outer join we just swap the inputs and run the same left outer join path. So at least we would have a single code path to test. |
|
@pedroerp @xiaoxmeng @kevinwilfong |
…t in the keys (facebookincubator#13039) Summary: Pull Request resolved: facebookincubator#13039 Reviewed By: xiaoxmeng Differential Revision: D73077550 fbshipit-source-id: 82d914d38835b51f52676cfa2317fdce164ee0fc
Summary:
While working on another change I discovered that JoinFuzzer does not test MergeJoin with
RIGHT OUTER JOINs. Enabling it locally (I'll publish a change to enable it separately) I
discovered a bug. While trying to add a unit test for it, I discovered a few more.
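They fit into two classes:
1) Skipping over rows on the right side with NULL keys. This is the correct thing to do for
INNER and LEFT OUTER JOINs but we need to output misses for these rows in RIGHT OUTER
JOINs (they can't hit given our NULL semantics).
2) Writing off the end of the output buffer trying to write out these misses. We need to make
sure the size of output_ hasn't yet reached outputBatchSize_ before writing misses to it.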
This diff fixes the bugs I found and adds unit tests covering NULL keys (I didn't see any prior
to this change).
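
For illustration, the two classes of bug can be sketched on a toy right outer merge join. This is not the MergeJoin operator itself: `Key`, `OutRow`, `rightOuterMergeJoin` and the `emit` helper are hypothetical names, each side is a single key column with non-NULL keys sorted ascending, and `batchSize` only plays the role that outputBatchSize_ plays in the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; std::nullopt models a NULL key.
using Key = std::optional<int>;
struct OutRow {
  Key left;  // NULL when the right row had no match (a "miss")
  Key right;
};

// Toy right outer merge join that returns its output in batches of at most
// batchSize rows.
std::vector<std::vector<OutRow>> rightOuterMergeJoin(
    const std::vector<Key>& left,
    const std::vector<Key>& right,
    std::size_t batchSize) {
  assert(batchSize > 0);
  std::vector<std::vector<OutRow>> batches;
  std::vector<OutRow> current;

  // Class 2: check capacity before every write, misses included, so a row is
  // never written past the end of a full batch.
  auto emit = [&](Key l, Key r) {
    if (current.size() >= batchSize) {
      batches.push_back(std::move(current));
      current.clear();
    }
    current.push_back({l, r});
  };

  std::size_t i = 0;
  for (std::size_t j = 0; j < right.size(); ++j) {
    // Class 1: a NULL right key can never hit, but a RIGHT OUTER JOIN must
    // still output the row as a miss instead of skipping it.
    if (!right[j]) {
      emit(std::nullopt, right[j]);
      continue;
    }
    // Left rows with NULL keys or smaller keys can be skipped safely.
    while (i < left.size() && (!left[i] || *left[i] < *right[j])) {
      ++i;
    }
    if (i == left.size() || *left[i] > *right[j]) {
      emit(std::nullopt, right[j]);  // no matching left key: miss
      continue;
    }
    // Hit: pair the right row with the whole run of equal left keys.
    for (std::size_t k = i; k < left.size() && left[k] && *left[k] == *right[j]; ++k) {
      emit(left[k], right[j]);
    }
  }
  if (!current.empty()) {
    batches.push_back(std::move(current));
  }
  return batches;
}
```

Routing hits and misses through the same `emit` helper is one way to keep the capacity check from being forgotten on the miss path.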
Differential Revision: D73077550