[SPARK-25278][SQL] Avoid duplicated Exec nodes when the same logical plan appears in the query #22284

mgaido91 · 2018-08-30T14:37:49Z

What changes were proposed in this pull request?

In the Planner, we collect the placeholders that need to be substituted in the query execution plan and, once we plan them, we substitute each placeholder with the effective plan.

In this second phase, we rely on the == comparison, i.e., the equals method. This means that if two placeholder plans, which are different instances, have the same attributes (so that they are equal according to the equals method), they both match whenever either one of them is being substituted. So, in such a situation, the first substitution replaces both of them with the first of the two newly generated plans, and the second substitution replaces nothing.

This is usually harmless for the execution of the query itself, as the two plans are identical. But since both positions now hold the same instance, their local variables are shared, which is unexpected. This breaks the collected metrics: the same node is executed twice, so its metrics are wrongly accumulated twice.
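To make the metrics problem concrete, here is a minimal, self-contained Scala sketch that does not use Spark. `ToyScan` and `runUnion` are hypothetical stand-ins: a scan node carrying its own row-count metric (loosely like an operator's SQLMetric) and a union that executes each child once. It only illustrates why sharing one node instance between two plan positions double-counts.

```scala
// Hypothetical toy scan node with a per-instance metric, like an operator-local variable.
final class ToyScan(val rows: Seq[Int]) {
  var numOutputRows: Long = 0L
  def execute(): Seq[Int] = {
    numOutputRows += rows.size // every execution adds to this instance's metric
    rows
  }
}

// Hypothetical toy union: executes each child exactly once.
def runUnion(children: Seq[ToyScan]): Seq[Int] = children.flatMap(_.execute())

val data = Seq(1, 2)

// Buggy shape: the same instance sits in both positions of the plan.
val shared = new ToyScan(data)
runUnion(Seq(shared, shared))

// Fixed shape: one instance per occurrence of the logical plan.
val left  = new ToyScan(data)
val right = new ToyScan(data)
runUnion(Seq(left, right))
```

With the shared instance, its single metric ends up at 4 rows instead of two metrics of 2 rows each.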

The PR proposes to use the eq method (reference equality) when checking which placeholder needs to be substituted; thus, in the previous situation, both of the physical nodes that are created (one for each occurrence of the logical plan in the query plan) are actually used, and the metrics are collected properly for each of them.
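The difference between the two comparisons can be sketched with a toy tree, assuming simplified stand-ins for the planner's types: `Placeholder`, `Scan`, `Union`, and the helpers below are hypothetical, not Spark's API. `substitute` follows the same shape as the loop in `QueryPlanner` (collect the placeholders, plan each into a fresh node, replace matches with a bottom-up transform); only the matching predicate differs between the two runs.

```scala
// Toy plan nodes; case classes, so == compares structure, not identity.
sealed trait Node
final case class Placeholder(logical: String) extends Node // subtree still to be planned
final case class Scan(table: String) extends Node          // stand-in for a physical scan
final case class Union(children: Seq[Node]) extends Node

// Minimal bottom-up transform: rewrite the children first, then try the rule on the node.
def transformUp(n: Node)(rule: PartialFunction[Node, Node]): Node = {
  val withNewChildren = n match {
    case Union(cs) => Union(cs.map(c => transformUp(c)(rule)))
    case leaf      => leaf // leaves are returned as the same instance
  }
  rule.applyOrElse(withNewChildren, (x: Node) => x)
}

// Collect the placeholders of a tree, left to right.
def placeholders(n: Node): Seq[Placeholder] = n match {
  case p: Placeholder => Seq(p)
  case Union(cs)      => cs.flatMap(placeholders)
  case _              => Seq.empty
}

// Plan each placeholder into a fresh Scan and replace the nodes selected by `matches`.
def substitute(candidate: Node, matches: (Node, Placeholder) => Boolean): Node =
  placeholders(candidate).foldLeft(candidate) { (plan, placeholder) =>
    val childPlan = Scan(placeholder.logical) // a new instance per placeholder
    transformUp(plan) { case p if matches(p, placeholder) => childPlan }
  }

def childrenOf(n: Node): Seq[Node] = n match {
  case Union(cs) => cs
  case _         => Seq.empty
}

// The same logical subtree appears twice: two distinct placeholder objects that are ==.
val candidate = Union(Seq(Placeholder("demo_view"), Placeholder("demo_view")))

val byEquals = childrenOf(substitute(candidate, (p, ph) => p == ph)) // old behaviour
val byEq     = childrenOf(substitute(candidate, (p, ph) => p eq ph)) // patched behaviour
```

With `==`, the first iteration replaces both equal placeholders with the same `Scan` instance and the second iteration finds nothing left to replace; with `eq`, each placeholder receives its own instance, even though the two results are still `==` to each other.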

How was this patch tested?

added UT


SparkQA · 2018-08-30T18:28:51Z

Test build #95471 has finished for PR 22284 at commit e945bf1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2018-08-31T01:52:39Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/QueryPlanner.scala

                // Replace the placeholder by the child plan
                candidateWithPlaceholders.transformUp {
-                  case p if p == placeholder => childPlan
+                  case p if p.eq(placeholder) => childPlan


As the placeholders are collected from candidateWithPlaceholders, I think we will definitely have a matched child plan by reference equality here, right?

yes, right.

viirya · 2018-08-31T01:54:35Z

sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala

  }

+  test("SPARK-25278: output metrics are wrong for plans repeated in the query") {
+    val name = "demo_view"


Wrap the code with withView?

Doing, thanks!
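For reference, a helper like `withView` follows the loan pattern: run the test body, then clean up in a `finally` block so that a failing assertion does not leak the view into later tests. The sketch below assumes a hypothetical in-memory `catalog` set in place of a Spark session catalog; in a real Spark test the cleanup would drop the view instead.

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical in-memory catalog standing in for the session catalog of a test.
val catalog = mutable.Set.empty[String]

// Loan pattern: run `body`, then always drop the named views, even if `body` throws.
def withView(viewNames: String*)(body: => Unit): Unit =
  try body
  finally viewNames.foreach(name => catalog.remove(name))

withView("demo_view") {
  catalog += "demo_view" // a real test would create the view with Spark here
  assert(catalog.contains("demo_view"))
}

// Cleanup also runs when the test body fails.
val failed = Try {
  withView("other_view") {
    catalog += "other_view"
    throw new IllegalStateException("simulated assertion failure")
  }
}
```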

maropu · 2018-08-31T01:56:43Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/QueryPlanner.scala

                // Replace the placeholder by the child plan
                candidateWithPlaceholders.transformUp {
-                  case p if p == placeholder => childPlan
+                  case p if p.eq(placeholder) => childPlan


good catch... this fix is simple and OK to me in this case, though I think we'd better compare only placeholder nodes here, right?

not sure what you mean exactly here; could you elaborate a bit more, please? Thanks.

nvm, just a question. I was wondering why we couldn't write it like this:

trait PlaceHolder; case class PlanLater(plan: LogicalPlan) extends LeafExecNode with PlaceHolder

then,

candidateWithPlaceholders.transformUp { case p: PlaceHolder if p.eq(placeholder) => childPlan }

I think we could do that only in 3.0, as it would be a breaking change for those who are using a custom QueryPlanner.

yea, thanks. anyway, this fix looks pretty ok.

Probably we can also move PlanLater to catalyst and use that instead of introducing a new trait. I think it can be proposed for 3.0. Thanks.
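A minimal sketch of the type-restricted match, assuming a hypothetical `PlaceHolder` marker trait (not an existing Spark type) and toy node classes. The type pattern limits the identity check to placeholder nodes, so every other node is skipped without a comparison and the intent of the rule is visible in its type. Moving a concrete placeholder class into catalyst, as suggested above, would play the same role as the trait.

```scala
// Hypothetical marker trait for placeholder nodes; not an existing Spark type.
trait PlaceHolder

sealed trait Node
final case class Later(logical: String) extends Node with PlaceHolder
final case class Scan(table: String) extends Node

// Only nodes typed as PlaceHolder are considered; everything else is returned untouched.
def replacePlaceholder(node: Node, placeholder: PlaceHolder, childPlan: Node): Node =
  node match {
    case p: PlaceHolder if p.eq(placeholder) => childPlan
    case other                               => other
  }

val target  = Later("demo_view")
val planned = Scan("demo_view")
// The second Later is equal to `target` but is a different object, so it is kept.
val result = Seq(target, Later("demo_view"), Scan("other"))
  .map(n => replacePlaceholder(n, target, planned))
```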

maropu · 2018-08-31T01:58:59Z

sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala

+      1L -> ("LocalTableScan" -> Map("number of output rows" -> 2L)),
+      2L -> ("LocalTableScan" -> Map("number of output rows" -> 2L))))
+  }
+


In addition to this end-to-end test, can we add fine-grained tests for the scenario you described in the PR description?

yes, I am adding a test to the PlannerSuite which ensures that plans are different instances. Thanks.
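A fine-grained check of this kind boils down to collecting the nodes of interest and asserting that they are pairwise different objects, which is what the PlannerSuite assertion `!execRanges.head.eq(execRanges.last)` does for two nodes. The sketch below uses hypothetical toy nodes (`RangeNode`, `UnionNode`) rather than Spark's `RangeExec`, and generalizes the idea to any number of nodes using reference equality.

```scala
sealed trait Node { def children: Seq[Node] }
final case class RangeNode(start: Long, end: Long) extends Node {
  def children: Seq[Node] = Nil
}
final case class UnionNode(children: Seq[Node]) extends Node

// Pre-order collection of the nodes matched by `pf`, similar in spirit to TreeNode.collect.
def collectNodes[A](n: Node)(pf: PartialFunction[Node, A]): Seq[A] =
  pf.lift(n).toSeq ++ n.children.flatMap(c => collectNodes(c)(pf))

// True when no two elements are the same object (reference identity, not ==).
def allDistinctInstances(xs: Seq[AnyRef]): Boolean =
  xs.indices.forall(i => xs.indices.forall(j => i == j || !(xs(i) eq xs(j))))

val shared = RangeNode(0L, 2L)
val buggy  = UnionNode(Seq(shared, shared))                       // one instance, referenced twice
val fixed  = UnionNode(Seq(RangeNode(0L, 2L), RangeNode(0L, 2L))) // equal, but separate instances

val buggyRanges = collectNodes(buggy) { case r: RangeNode => r }
val fixedRanges = collectNodes(fixed) { case r: RangeNode => r }
```

Checking identity rather than `==` matters here: the two nodes in `fixed` are equal, so an equality-based check could not distinguish the buggy plan from the fixed one.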

mgaido91 · 2018-08-31T07:52:34Z

cc @cloud-fan @ueshin

cloud-fan · 2018-08-31T08:16:18Z

sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala

    df.queryExecution.executedPlan.execute()
  }

+  test("SPARK-25278: physical nodes should be different instances for same logical nodes") {


can we demonstrate the metrics problem in this test? I think that's a real bug.

nvm, I saw it in the next test suite

cloud-fan · 2018-08-31T08:17:40Z

good catch! cc @zsxwing to see if this is a problem for streaming source metrics.

SparkQA · 2018-08-31T11:39:24Z

Test build #95528 has finished for PR 22284 at commit 193d7b3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mgaido91 · 2018-09-04T10:55:58Z

any more comments @cloud-fan @maropu @viirya ?

mgaido91 · 2018-09-06T12:07:04Z

@cloud-fan shall we consider this for 2.4? I don't see any real concern/comment about it, so I think it would be great if we can include it as it is a bug.

maropu · 2018-09-06T12:30:10Z

sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala

+    }
+    assert(execRanges.length == 2)
+    // Ensure the two RangeExec instances are different instances
+    assert(!execRanges.head.eq(execRanges.last))


Is it difficult to test the eq behaviour by using the QueryPlanner class?

mmmh, what do you mean exactly?

e.g.,

case class DummyPlanner() extends QueryPlanner[LogicalPlan] { ... }
assert(DummyPlanner().plan(eqTestPlan).next() === expectedPlan)

I think it is doable, but I don't see the advantage. Could you please explain to me why that is better? Thanks.

just a question to check if we can do that. The current test is ok to me.

cool, thanks. Anyway, yes, I think we can do that. Thanks.

cloud-fan · 2018-09-06T13:00:45Z

retest this please

cloud-fan · 2018-09-06T13:01:36Z

This is a bug for sql metrics, let's include it in Spark 2.4.

maropu · 2018-09-06T13:47:44Z

no more comment, LGTM

SparkQA · 2018-09-06T16:56:17Z

Test build #95759 has finished for PR 22284 at commit 193d7b3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mgaido91 · 2018-09-10T07:45:20Z

kindly ping @cloud-fan

cloud-fan · 2018-09-10T11:42:21Z

thanks, merging to master/2.4!

Closes #22284 from mgaido91/SPARK-25278.
Authored-by: Marco Gaido
Signed-off-by: Wenchen Fan
(cherry picked from commit 12e3e9f)
Signed-off-by: Wenchen Fan

What changes were proposed in this pull request?

It turns out to be a bug that a `DataSourceV2ScanExec` instance may be referred to multiple times in the execution plan. This bug is fixed by #22284, and SQL metrics for batch queries are now correct. Thus we no longer need the hack in `ProgressReporter`, which fixed the same metrics problem for streaming queries.

How was this patch tested?

existing tests

Closes #22380 from cloud-fan/followup.
Authored-by: Wenchen Fan
Signed-off-by: Wenchen Fan


gatorsmile · 2018-12-25T02:29:10Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/QueryPlanner.scala

                // Replace the placeholder by the child plan
                candidateWithPlaceholders.transformUp {
-                  case p if p == placeholder => childPlan
+                  case p if p.eq(placeholder) => childPlan


If we dedup the ExprIds of the children of UNION in the Analyzer stage, is the problem fixed?

we don't have an exprId for query plans.

This condition will always be false after we dedup the expression IDs. Please let me know if any of you can find another test case to break it. Thanks!

We can reproduce the bug with any plan which has more than one child but doesn't dedup the expr id. Union is one of them. I'm not sure if Union is the only one though...

Moreover, I am not sure whether there are other cases that result in == being true when eq isn't, and I'd argue that it is very hard to ensure there are none. So I think this fix would be needed anyway.

Union is the last one we are not doing the dedup. I believe we need to fix it. If we dedup Union children, we do not have a valid test case for this PR. @mgaido91 Do you have any test case?

I don't have other examples, but I cannot exclude that there are. And I don't see any benefit in getting back to the previous solution. So I think the current code is safer.

if we dedup expr ids for union, then I think this patch becomes a code cleanup instead of bug fix, and we can remove this test.

I agree, the test can then be used for testing the other patch, or removed.
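Why deduplicating expression IDs would make `==` and `eq` agree can be sketched with toy types. Catalyst attribute equality includes the expression ID, so when both sides of a `UNION` reuse the same IDs, the two children compare equal. `Attr`, `Relation`, `UnionPlan`, and `dedupRight` are hypothetical; `dedupRight` is one possible shape of such a rewrite, not the Analyzer's actual rule.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy attribute: like Catalyst's attribute references, equality includes the exprId.
final case class Attr(name: String, exprId: Long)
// Toy leaf relation whose equality depends only on its output attributes.
final case class Relation(output: Seq[Attr])
final case class UnionPlan(left: Relation, right: Relation)

// Hypothetical generator for fresh expression IDs.
val nextExprId = new AtomicLong(100L)

// One possible dedup: re-id right-side attributes whose exprId clashes with the left side.
def dedupRight(u: UnionPlan): UnionPlan = {
  val leftIds = u.left.output.map(_.exprId).toSet
  val newRight = Relation(u.right.output.map { a =>
    if (leftIds.contains(a.exprId)) a.copy(exprId = nextExprId.getAndIncrement()) else a
  })
  u.copy(right = newRight)
}

// Both sides of a self-union analyzed without dedup reuse exprId 1.
val before = UnionPlan(Relation(Seq(Attr("id", 1L))), Relation(Seq(Attr("id", 1L))))
val after  = dedupRight(before)
```

In this toy model, once the children differ by exprId, anything built from them also differs under `==`, so `==` and `eq` select the same node; that is why the patch would then become a code cleanup rather than a bug fix.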

[SPARK-25278][SQL] Avoid duplicated Exec nodes when the same logical plan appears in the query

e945bf1

viirya reviewed Aug 31, 2018


maropu reviewed Aug 31, 2018


address comments

193d7b3

cloud-fan reviewed Aug 31, 2018


maropu reviewed Sep 6, 2018


asfgit closed this in 12e3e9f Sep 10, 2018

cloud-fan mentioned this pull request Sep 10, 2018

[SPARK-25278][SQL][followup] remove the hack in ProgressReporter #22380

Closed

gatorsmile reviewed Dec 25, 2018

