Preserve the order of right table in NestedLoopJoinExec #12504
@@ -238,6 +238,19 @@ impl NestedLoopJoinExec {

        PlanProperties::new(eq_properties, output_partitioning, mode)
    }

    fn maintains_input_order(join_type: JoinType) -> Vec<bool> {
I think it would be great to have some docs on what this method does and returns, specifically what false/true means and what the benefit of preserving the order is.
Addressed in 09a6cee
LGTM. Let's apply what @comphead suggested, and then it will be ready to be merged.
This PR seems to introduce a performance regression, for reasons similar to those in #9830 (comment) -- now
and for the single partition execution I've got the following results:
Any thoughts on it?
Thanks @korowa for checking this, the join time is 9x now @berkaysynnada @alihan-synnada
Filed #12528
If this PR enforces ordering at the price of a slowdown for all types of inputs (the example shows a fairly small build-side input, but even that may be enough), it should perhaps be reverted and reimplemented. I suppose there are better compromise solutions for this issue, since, if the root cause of the slowdown demonstrated in the example is the increased number of calls to
Thanks for reporting this @korowa. I believe it’s possible to maintain the order without affecting performance. We will work on fixing this shortly.
Which issue does this PR close?
Closes #11332.
Rationale for this change
See the linked issue.
What changes are included in this PR?
NestedLoopJoinExec now preserves the order of the right table for Inner, Right, RightAnti and RightSemi join types.
Are these changes tested?
Tested with join_maintains_right_order and relevant SQL logic tests.
Are there any user-facing changes?
This PR changes the output ordering of some queries with JOIN, as reflected in SQL logic tests. The output values and the API remain unchanged.
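To make the behaviour described above concrete, here is a minimal, self-contained sketch; it is not the code merged in this PR. The JoinType enum below is a hypothetical stand-in for DataFusion's own type. The [left, right] ordering of the returned flags and the assumption that the left side never reports order preservation are illustrative guesses, not taken from the diff. The toy inner_nested_loop_join function is also hypothetical; it only shows why iterating the right side in the outer loop keeps right-side order in the output.

```rust
/// Hypothetical stand-in for DataFusion's `JoinType`, defined locally so the
/// sketch compiles on its own.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    LeftAnti,
    RightSemi,
    RightAnti,
}

/// Returns one flag per child, assumed here to be in `[left, right]` order.
/// `true` means the join output keeps the relative row order of that child,
/// so a sort already present on that input does not need to be redone.
/// Assumption: the left side is never reported as order-preserving.
pub fn maintains_input_order(join_type: JoinType) -> Vec<bool> {
    vec![
        false,
        matches!(
            join_type,
            JoinType::Inner
                | JoinType::Right
                | JoinType::RightAnti
                | JoinType::RightSemi
        ),
    ]
}

/// Toy inner nested-loop join over integer rows (illustrative only).
/// The right side drives the outer loop, so output rows appear in the same
/// relative order as the right input.
pub fn inner_nested_loop_join(
    left: &[i32],
    right: &[i32],
    filter: impl Fn(i32, i32) -> bool,
) -> Vec<(i32, i32)> {
    let mut out = Vec::new();
    for &r in right {
        for &l in left {
            if filter(l, r) {
                out.push((l, r));
            }
        }
    }
    out
}
```

In this sketch, visiting the right side one row at a time is the simplest way to keep its order, though a real implementation would need to work on batches to avoid per-row overhead.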