[SPARK-26138][SQL] Cross join requires push LocalLimit in LimitPushDown rule #23104
Closed
4 commits
deb1809
[SPARK-26138][SQL] LimitPushDown cross join requires maybePushLocalLimit
a0b9cb4
[SPARK-26138][SQL] add three UTs
588c151
[SPARK-26138][SQL] Push limit to the right side
e173962
[SPARK-26138][SQL] Fix the scala style. File line length exceeds 100 …
how about inner join without condition?
maybe we can match InnerLike when condition is empty.
A = {(a, 0), (b, 1), (c, 2), (d, 0), (e, 1), (f, 2)}
B = {(e, 1), (f, 2)}
A inner join B limit 2
Suppose the join condition matches on the first column. If the limit 2 is pushed down to A, only (a, 0), (b, 1) are joined with {(e, 1), (f, 2)}, and the result is empty. But the real result is not empty.
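The example above can be checked with a small simulation. This is only a sketch on plain Python lists, not Spark code: the helper names `inner_join` and `limit` are hypothetical, the join condition (equality on the first column) is an assumption, and `limit` is assumed to keep the first n rows in list order.

```python
from itertools import islice

def inner_join(left, right, cond):
    """Nested-loop inner join: keep every (l, r) pair that satisfies cond."""
    return [(l, r) for l in left for r in right if cond(l, r)]

def limit(rows, n):
    """Keep at most the first n rows (assumed limit behaviour for this sketch)."""
    return list(islice(rows, n))

A = [("a", 0), ("b", 1), ("c", 2), ("d", 0), ("e", 1), ("f", 2)]
B = [("e", 1), ("f", 2)]

# Assumed join condition: rows match when their first columns are equal.
def on_first_column(l, r):
    return l[0] == r[0]

# Limit applied after the join, as the query is written.
correct = limit(inner_join(A, B, on_first_column), 2)

# Limit pushed below the join onto A: the matching rows are cut off first.
pushed = limit(inner_join(limit(A, 2), B, on_first_column), 2)

print(correct)
print(pushed)
```

With a join condition, the rows that survive the pushed-down limit are not guaranteed to be the rows that match, which is why the pushdown is unsafe here.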
inner join without condition is literally cross join.
When spark.sql.crossJoin.enabled=true is set, are inner join without condition, LeftOuter without condition, RightOuter without condition, and FullOuter without condition all literally cross joins?
@cloud-fan
@guoxiaolongzte nope.
I think that when spark.sql.crossJoin.enabled=true is set, for Inner join without condition, LeftOuter join without condition, RightOuter join without condition, and FullOuter join without condition, the limit should be pushed down on both sides, just like the cross join limit in this PR.
Is this correct?
@cloud-fan
Please give me some advice. Thank you.
if there is no join condition, I think join type doesn't matter and we can always push down limits. We may need to look into left anti join though.
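A rough way to see why the condition-free case is safe: every pair of rows is part of the output, so n rows from each side are always enough to produce n output rows. The sketch below models only the inner/cross case on plain Python lists; `cross_join` and `limit` are hypothetical helpers, not Spark APIs, and outer joins with an empty side are not covered.

```python
from itertools import islice, product

def limit(rows, n):
    """Keep at most the first n rows (assumed limit behaviour for this sketch)."""
    return list(islice(rows, n))

def cross_join(left, right):
    """Join without a condition: every left row pairs with every right row."""
    return list(product(left, right))

left = [1, 2, 3, 4, 5]
right = ["x", "y", "z"]
n = 4

full = cross_join(left, right)

# Limit applied on top of the full cross join.
baseline = limit(full, n)

# Limit pushed to both sides first, then applied again on top.
pushed = limit(cross_join(limit(left, n), limit(right, n)), n)

print(len(baseline), len(pushed))
```

Both plans return the same number of rows, and every row from the pushed-down plan is a valid row of the full cross join, which is all a limit without ordering requires.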
There are two tables as follows:
CREATE TABLE test1(id int, name int);
CREATE TABLE test2(id int, name int);
test1 table data:
2,2
1,1
test2 table data:
2,2
3,3
4,4
Execute the SQL select * from test1 t1 left anti join test2 t2 on t1.id=t2.id limit 1; The result:
1,1
But:
If we push the limit 1 to the left side, the result is not correct: when the limit keeps only 2,2, that row matches test2 and the result is empty.
If we push the limit 1 to the right side, the result can also be incorrect: when the limit keeps a row other than 2,2 (for example 3,3), row 2,2 of test1 no longer has a match and can be returned instead of 1,1.
So a left anti join should not push down the limit. The same logic applies to left semi join.
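The left anti join case can be reproduced on plain Python lists. This is a sketch rather than Spark code: `left_anti_join` is a hypothetical helper that joins on the first column (id), the left-side limit is assumed to keep the first row in list order, and the right-side limit is assumed to keep 3,3, since a limit without ordering may keep any row.

```python
def left_anti_join(left, right):
    """Keep the left rows whose id (first column) has no match on the right."""
    right_ids = {r[0] for r in right}
    return [l for l in left if l[0] not in right_ids]

test1 = [(2, 2), (1, 1)]
test2 = [(2, 2), (3, 3), (4, 4)]

# Limit applied after the join, as the query is written.
correct = left_anti_join(test1, test2)[:1]

# Limit 1 pushed to the left side: only (2, 2) survives, and it has a match.
pushed_left = left_anti_join(test1[:1], test2)[:1]

# Limit 1 pushed to the right side, assuming the kept row is (3, 3):
# (2, 2) loses its match and is returned by mistake.
pushed_right = left_anti_join(test1, [(3, 3)])[:1]

print(correct, pushed_left, pushed_right)
```

Neither pushed-down plan is guaranteed to agree with the original query, because an anti join depends on seeing all rows of the right side and on which left rows remain unmatched.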