[SPARK-25278][SQL] Avoid duplicated Exec nodes when the same logical plan appears in the query #22284
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -81,7 +81,7 @@ abstract class QueryPlanner[PhysicalPlan <: TreeNode[PhysicalPlan]] { | |
| childPlans.map { childPlan => | ||
| // Replace the placeholder by the child plan | ||
| candidateWithPlaceholders.transformUp { | ||
| - case p if p == placeholder => childPlan | ||
| + case p if p.eq(placeholder) => childPlan | ||
|
Member
Good catch. This fix is simple and OK to me in this case, though I think we'd be better off comparing only placeholder nodes here, right?
Contributor
Author
I'm not sure exactly what you mean here. Could you elaborate a bit more, please? Thanks.
Member
Never mind, just a question. I was wondering why we couldn't write here like; then,
Contributor
Author
I think we could do that only in 3.0, as it would be a breaking change for those who are using a custom QueryPlanner.
Member
Yeah, thanks. Anyway, this fix looks pretty OK.
Contributor
Author
Probably we can also move
Member
If we do ExprID dedup for the children of UNION in the Analyzer stage, is the problem fixed?
Contributor
we don't have
Member
This condition will always be false after we dedup the expression IDs. Please let me know if any of you can find another test case to break it. Thanks!
Contributor
We can reproduce the bug with any plan that has more than one child but doesn't dedup the expr IDs.
Contributor
Author
Moreover, I am not sure if there are other cases which result in
Member
Union is the last one for which we are not doing the dedup. I believe we need to fix it. If we dedup Union children, we do not have a valid test case for this PR. @mgaido91 Do you have any test case?
Contributor
Author
I don't have other examples, but I cannot exclude that there are some. And I don't see any benefit in going back to the previous solution. So I think the current code is safer.
Contributor
If we dedup expr IDs for Union, then I think this patch becomes a code cleanup instead of a bug fix, and we can remove this test.
Contributor
Author
I agree, the test can then be used for testing the other patch, or removed. |
||
| } | ||
| } | ||
| } | ||
|
|
||
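To make the difference between `==` and `eq` in the change above concrete, here is a minimal sketch in plain Scala. It is a toy model, not Spark's `QueryPlanner`: the names `Later`, `Exec`, `UnionNode`, and `fill` are hypothetical stand-ins, and the one-level `map` over children only approximates what `transformUp` does on a real tree. The assumption it illustrates is that placeholders are case classes, so two placeholders wrapping equal logical plans compare equal with `==` even though they are distinct objects.

```scala
object PlaceholderDemo {
  sealed trait Plan
  // Hypothetical placeholder node: a case class, so == is structural.
  final case class Later(logical: String) extends Plan
  // Hypothetical physical node: a plain class, so == falls back to reference identity.
  final class Exec(val logical: String) extends Plan
  final case class UnionNode(children: List[Plan]) extends Plan

  // Replace each placeholder in turn with a freshly planned child.
  // `matches` decides whether a node counts as the current placeholder.
  def fill(root: UnionNode, matches: (Plan, Plan) => Boolean): UnionNode = {
    val placeholders = root.children.collect { case l: Later => l }
    placeholders.foldLeft(root) { (candidate, placeholder) =>
      val childPlan = new Exec(placeholder.logical) // one new instance per placeholder
      UnionNode(candidate.children.map {
        case p if matches(p, placeholder) => childPlan
        case other => other
      })
    }
  }
}
```

In this sketch, calling `fill` with `_ == _` on a union of two equal placeholders replaces both of them during the first pass, so both children end up as the same `Exec` instance. Calling it with `_ eq _` replaces exactly one placeholder per pass and yields two distinct instances.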
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -704,6 +704,23 @@ class PlannerSuite extends SharedSQLContext { | |
| df.queryExecution.executedPlan.execute() | ||
| } | ||
|
|
||
| test("SPARK-25278: physical nodes should be different instances for same logical nodes") { | ||
|
Contributor
Can we demonstrate the metrics problem in this test? I think that's a real bug.
Contributor
Never mind, I saw it in the next test suite. |
||
| val range = Range(1, 1, 1, 1) | ||
| val df = Union(range, range) | ||
| val ranges = df.queryExecution.optimizedPlan.collect { | ||
| case r: Range => r | ||
| } | ||
| assert(ranges.length == 2) | ||
| // Ensure the two Range instances are equal according to their equals method | ||
| assert(ranges.head == ranges.last) | ||
| val execRanges = df.queryExecution.sparkPlan.collect { | ||
| case r: RangeExec => r | ||
| } | ||
| assert(execRanges.length == 2) | ||
| // Ensure the two RangeExec instances are different instances | ||
| assert(!execRanges.head.eq(execRanges.last)) | ||
|
Member
Is it difficult to test the
Contributor
Author
Hmm, what do you mean exactly?
Member
e.g.,
Contributor
Author
I think it is doable, but I don't see the advantage. Could you please explain why that is better? Thanks.
Member
Just a question to check if we can do that. The current test is OK to me.
Contributor
Author
Cool, thanks. Anyway, yes, I think we can do that. Thanks. |
||
| } | ||
|
|
||
| test("SPARK-24556: always rewrite output partitioning in ReusedExchangeExec " + | ||
| "and InMemoryTableScanExec") { | ||
| def checkOutputPartitioningRewrite( | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -497,6 +497,19 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared | |
| } | ||
| } | ||
|
|
||
| test("SPARK-25278: output metrics are wrong for plans repeated in the query") { | ||
| val name = "demo_view" | ||
|
Member
Wrap the code with
Contributor
Author
Doing it, thanks! |
||
| withView(name) { | ||
| sql(s"CREATE OR REPLACE VIEW $name AS VALUES 1,2") | ||
| val view = spark.table(name) | ||
| val union = view.union(view) | ||
| testSparkPlanMetrics(union, 1, Map( | ||
| 0L -> ("Union" -> Map()), | ||
| 1L -> ("LocalTableScan" -> Map("number of output rows" -> 2L)), | ||
| 2L -> ("LocalTableScan" -> Map("number of output rows" -> 2L)))) | ||
| } | ||
| } | ||
|
|
||
| test("writing data out metrics: parquet") { | ||
| testMetricsNonDynamicPartition("parquet", "t1") | ||
| } | ||
|
|
||
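The metrics test above expects each of the two scans under the union to report its own 2 output rows. A rough way to see why a shared physical node breaks that is the toy model below. It is only a sketch under stated assumptions: `Scan`, `numOutputRows`, and `runUnion` are hypothetical names, and the mutable counter merely stands in for a per-node metric; it does not reproduce how Spark's SQL metrics are implemented or what values Spark actually reported before the fix.

```scala
object MetricsDemo {
  // Hypothetical scan node that owns one mutable "number of output rows" counter.
  final class Scan(rows: Seq[Int]) {
    var numOutputRows: Long = 0L
    def execute(): Seq[Int] = {
      numOutputRows += rows.size // every execution adds to this node's own counter
      rows
    }
  }

  // Hypothetical union: executes both children and concatenates their rows.
  def runUnion(left: Scan, right: Scan): Seq[Int] =
    left.execute() ++ right.execute()
}
```

In this model, passing the same `Scan` instance as both children makes its single counter absorb the rows of both branches, while two separate instances each keep an independent count. That is the property the test on `execRanges` in `PlannerSuite` guards with its `eq` check.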
As the placeholders are collected from candidateWithPlaceholders, I think we will definitely have a matched child plan by reference equality here, right?
Yes, right.