-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-23172][SQL] Expand the ReorderJoin rule to handle Project nodes #20345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #86449 has finished for PR 20345 at commit
|
|
Test build #86485 has finished for PR 20345 at commit
|
|
retest this please |
|
This is not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a little while to understand that this can handle (a join b) join c versus a join (b join c) correctly. Would be great if we can explain how it works in the function comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
def testExtractInnerJoins(
plan: LogicalPlan,
expected: Option[(Seq[(LogicalPlan, InnerLike)], Seq[Expression])]) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this ignore the plans sequence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: adds -> may add
|
Thanks! @jiangxb1987 I'll address your comments and check again? |
|
Test build #86499 has finished for PR 20345 at commit
|
|
retest this please. |
|
Test build #86504 has finished for PR 20345 at commit
|
|
Test build #86511 has finished for PR 20345 at commit
|
|
Also cc @wzhfy Do you have a bandwidth to review PRs? |
|
ping |
|
NVM, I understood: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to make sure the project has attributes only, should it be p.projectList.forall(_.isInstanceOf[Attribute])?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: when projects having attributes only => when the project has attributes only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skip projections with attributes only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check necessary? I think check originalPlan.output != orderedJoins.output is enough, and faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't have this check, operatorOptimizationRuleSet reaches fixedPoint because ReorderJoin is re-applied in the same join trees every time the optimization rule batch invoked. This case does not happen in the master because reordered joins have Project in internal nodes (Project added by following optimization rules, e.g., ColumnPruning) and this plan structure guards this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, right, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test case which would fail to reorder joins before the fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
@wzhfy Thanks for the review and I'll update in a few days! |
|
Test build #88442 has finished for PR 20345 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wzhfy Added this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case can also happen without star schema enabled, right? Is it possible to use a simpler case like the one in pr description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC join reorder only happens when star schema enabled now? I think this test checks the simper case?
|
Test build #88464 has finished for PR 20345 at commit
|
|
Test build #88465 has finished for PR 20345 at commit
|
|
ping @gatorsmile @wzhfy |
|
Test build #88503 has finished for PR 20345 at commit
|
|
retest this please |
|
Test build #88511 has finished for PR 20345 at commit
|
|
kindly ping |
|
ping |
|
retest this please |
|
Test build #95065 has finished for PR 20345 at commit
|
|
retest this please |
|
Test build #95087 has finished for PR 20345 at commit
|
|
retest this please |
|
Test build #95104 has finished for PR 20345 at commit
|
|
retest this please |
|
Test build #95131 has finished for PR 20345 at commit
|
39462fb to
025c540
Compare
|
Test build #114189 has finished for PR 20345 at commit
|
025c540 to
1df7d2f
Compare
1df7d2f to
f7f3451
Compare
|
Test build #115350 has finished for PR 20345 at commit
|
|
retest this please |
|
Test build #115354 has finished for PR 20345 at commit
|
f7f3451 to
f63bee3
Compare
|
Test build #115369 has finished for PR 20345 at commit
|
|
retest this please |
|
Test build #116762 has finished for PR 20345 at commit
|
f63bee3 to
37e5fe2
Compare
|
Test build #117188 has finished for PR 20345 at commit
|
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
The current
ReorderJoinoptimizer rule cannot flatten a patternJoin -> Project -> JoinbecauseExtractFiltersAndInnerJoinsdoesn't handleProjectnodes. So, the current master cannot reorder joins in a query below;To reorder the query, this pr added code to handle
ProjectinExtractFiltersAndInnerJoins.This pr also fixed an output attribute reorder problem when joins reordered; it checks if a join reordered plan and an original plan have the same output attribute order with each other. If not,
ReorderJoinaddsProjectin the top of the join reordered plan.How was this patch tested?
This pr added new tests in
JoinOptimizationSuiteand modified some existing tests inStarJoinReorderSuiteto check ifReorderJoincan handleProjectnodes correctly. Also, it modified the existing tests inJoinReorderSuitefor the output attribute reorder issue.