Skip to content

Conversation

@mgaido91
Copy link
Contributor

What changes were proposed in this pull request?

In the Planner, we collect the placeholder which need to be substituted in the query execution plan and once we plan them, we substitute the placeholder with the effective plan.

In this second phase, we rely on the == comparison, ie. the equals method. This means that if two placeholder plans - which are different instances - have the same attributes (so that they are equal, according to the equal method) they are both substituted with their corresponding new physical plans. So, in such a situation, the first time we substitute both them with the first of the 2 new generated plan and the second time we substitute nothing.

This is usually of no harm for the execution of the query itself, as the 2 plans are identical. But since they are the same instance, now, the local variables are shared (which is unexpected). This causes issues for the metrics collected, as the same node is executed 2 times, so the metrics are accumulated 2 times, wrongly.

The PR proposes to use the eq method in checking which placeholder needs to be substituted,; thus in the previous situation, actually both the two different physical nodes which are created (one for each time the logical plan appears in the query plan) are used and the metrics are collected properly for each of them.

How was this patch tested?

added UT

@SparkQA
Copy link

SparkQA commented Aug 30, 2018

Test build #95471 has finished for PR 22284 at commit e945bf1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Replace the placeholder by the child plan
candidateWithPlaceholders.transformUp {
case p if p == placeholder => childPlan
case p if p.eq(placeholder) => childPlan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the placeholders are collected from candidateWithPlaceholders, I think we will definitely have a matched child plan by reference equality here, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, right.

}

test("SPARK-25278: output metrics are wrong for plans repeated in the query") {
val name = "demo_view"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap the code with withView?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing, thanks!

// Replace the placeholder by the child plan
candidateWithPlaceholders.transformUp {
case p if p == placeholder => childPlan
case p if p.eq(placeholder) => childPlan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch... this fix is simple and ok to me in this case though, I think we'd be better to compare placeholder nodes only here, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what you mean exactly here, may you elaborate a bit more please? Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, just a quesion. I was thinking why we couldn't write here like;

trait PlaceHolder;
case class PlanLater extends LeafExecNode with PlaceHolder;

then,

candidateWithPlaceholders.transformUp {
  case p: PlaceHolder if p.eq(placeholder) => childPlan
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could do that only in 3.0, as it would be a breaking change for those who are using a custom QueryPlanner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, thanks. anyway, this fix looks pretty ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we can also move PlanLater to catalyst and use that instead of introducing a new trait. I think it can be proposed for 3.0. Thanks.

1L -> ("LocalTableScan" -> Map("number of output rows" -> 2L)),
2L -> ("LocalTableScan" -> Map("number of output rows" -> 2L))))
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to this end-to-end test, can we add fine-grained tests for the scenario you described in the PR description?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I am adding a test to the PlannerSuite which ensures that plans are different instances. Thanks.

@mgaido91
Copy link
Contributor Author

cc @cloud-fan @ueshin

df.queryExecution.executedPlan.execute()
}

test("SPARK-25278: physical nodes should be different instances for same logical nodes") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we demonstrate the metrics problem in this test? I think that's a real bug.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, I saw it in the next test suite

@cloud-fan
Copy link
Contributor

good catch! cc @zsxwing to see if this is a problem for streaming source metrics.

@SparkQA
Copy link

SparkQA commented Aug 31, 2018

Test build #95528 has finished for PR 22284 at commit 193d7b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor Author

mgaido91 commented Sep 4, 2018

any more comments @cloud-fan @maropu @viirya ?

@mgaido91
Copy link
Contributor Author

mgaido91 commented Sep 6, 2018

@cloud-fan shall we consider this for 2.4? I don't see any real concern/comment about it, so I think it would be great if we can include it as it is a bug.

}
assert(execRanges.length == 2)
// Ensure the two RangeExec instances are different instances
assert(!execRanges.head.eq(execRanges.last))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it difficult to test the eq behaivour by using the QueryPlanner class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmmh, what do you mean exactly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g.,

case class DummyPlanner() extends QueryPlanner[LogicalPlan] { ... }

assert(DummyPlanner().plan(eqTestPlan) === expectedPlan))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is doable, but I don't see the advantage. May you please explain me why that is better? Thanks.

Copy link
Member

@maropu maropu Sep 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a question to check if we can do that. The current test is ok to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, thanks. Anyway, yes, I think we can do that. Thanks.

@cloud-fan
Copy link
Contributor

retest this please

@cloud-fan
Copy link
Contributor

This is a bug for sql metrics, let's include it in Spark 2.4.

@maropu
Copy link
Member

maropu commented Sep 6, 2018

no more comment, LGTM

@SparkQA
Copy link

SparkQA commented Sep 6, 2018

Test build #95759 has finished for PR 22284 at commit 193d7b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor Author

kindly ping @cloud-fan

@cloud-fan
Copy link
Contributor

thanks, merging to master/2.4!

asfgit pushed a commit that referenced this pull request Sep 10, 2018
…plan appears in the query

## What changes were proposed in this pull request?

In the Planner, we collect the placeholder which need to be substituted in the query execution plan and once we plan them, we substitute the placeholder with the effective plan.

In this second phase, we rely on the `==` comparison, ie. the `equals` method. This means that if two placeholder plans - which are different instances - have the same attributes (so that they are equal, according to the equal method) they are both substituted with their corresponding new physical plans. So, in such a situation, the first time we substitute both them with the first of the 2 new generated plan and the second time we substitute nothing.

This is usually of no harm for the execution of the query itself, as the 2 plans are identical. But since they are the same instance, now, the local variables are shared (which is unexpected). This causes issues for the metrics collected, as the same node is executed 2 times, so the metrics are accumulated 2 times, wrongly.

The PR proposes to use the `eq` method in checking which placeholder needs to be substituted,; thus in the previous situation, actually both the two different physical nodes which are created (one for each time the logical plan appears in the query plan) are used and the metrics are collected properly for each of them.

## How was this patch tested?

added UT

Closes #22284 from mgaido91/SPARK-25278.

Authored-by: Marco Gaido <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 12e3e9f)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in 12e3e9f Sep 10, 2018
asfgit pushed a commit that referenced this pull request Sep 11, 2018
## What changes were proposed in this pull request?

It turns out it's a bug that a `DataSourceV2ScanExec` instance may be referred to in the execution plan multiple times. This bug is fixed by #22284 and now we have corrected SQL metrics for batch queries.

Thus we don't need the hack in `ProgressReporter` anymore, which fixes the same metrics problem for streaming queries.

## How was this patch tested?

existing tests

Closes #22380 from cloud-fan/followup.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
fjh100456 pushed a commit to fjh100456/spark that referenced this pull request Sep 13, 2018
## What changes were proposed in this pull request?

It turns out it's a bug that a `DataSourceV2ScanExec` instance may be referred to in the execution plan multiple times. This bug is fixed by apache#22284 and now we have corrected SQL metrics for batch queries.

Thus we don't need the hack in `ProgressReporter` anymore, which fixes the same metrics problem for streaming queries.

## How was this patch tested?

existing tests

Closes apache#22380 from cloud-fan/followup.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
// Replace the placeholder by the child plan
candidateWithPlaceholders.transformUp {
case p if p == placeholder => childPlan
case p if p.eq(placeholder) => childPlan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do ExprID dedup for the children of UNION in the Analyzer stage, Is the problem fixed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have exprId for query plan.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition will be always false after we dedup the expression id. Please let me know if yoany of you can find another test case to break it. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can reproduce the bug with any plan which has more than one child but doesn't dedup the expr id. Union is one of them. I'm not sure if Union is the only one though...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moreover, I am not sure if there are other cases which result in == being true when eq isn't and I'd argue that it is very hard to ensure such a thing. So I think this fix would be anyway needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Union is the last one we are not doing the dedup. I believe we need to fix it. If we dedup Union children, we do not have a valid test case for this PR. @mgaido91 Do you have any test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have other examples, but I cannot exclude that there are. And I don't see any benefit in getting back to the previous solution. So I think the current code is safer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we dedup expr ids for union, then I think this patch becomes a code cleanup instead of bug fix, and we can remove this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, the test can be used for testing the other patch then or removed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants