-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-25278][SQL] Avoid duplicated Exec nodes when the same logical plan appears in the query #22284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…plan appears in the query
|
Test build #95471 has finished for PR 22284 at commit
|
| // Replace the placeholder by the child plan | ||
| candidateWithPlaceholders.transformUp { | ||
| case p if p == placeholder => childPlan | ||
| case p if p.eq(placeholder) => childPlan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the placeholders are collected from candidateWithPlaceholders, I think we will definitely have a matched child plan by reference equality here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, right.
| } | ||
|
|
||
| test("SPARK-25278: output metrics are wrong for plans repeated in the query") { | ||
| val name = "demo_view" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrap the code with withView?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing, thanks!
| // Replace the placeholder by the child plan | ||
| candidateWithPlaceholders.transformUp { | ||
| case p if p == placeholder => childPlan | ||
| case p if p.eq(placeholder) => childPlan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch... this fix is simple and ok to me in this case though, I think we'd be better to compare placeholder nodes only here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what you mean exactly here, may you elaborate a bit more please? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, just a quesion. I was thinking why we couldn't write here like;
trait PlaceHolder;
case class PlanLater extends LeafExecNode with PlaceHolder;
then,
candidateWithPlaceholders.transformUp {
case p: PlaceHolder if p.eq(placeholder) => childPlan
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could do that only in 3.0, as it would be a breaking change for those who are using a custom QueryPlanner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, thanks. anyway, this fix looks pretty ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably we can also move PlanLater to catalyst and use that instead of introducing a new trait. I think it can be proposed for 3.0. Thanks.
| 1L -> ("LocalTableScan" -> Map("number of output rows" -> 2L)), | ||
| 2L -> ("LocalTableScan" -> Map("number of output rows" -> 2L)))) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to this end-to-end test, can we add fine-grained tests for the scenario you described in the PR description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I am adding a test to the PlannerSuite which ensures that plans are different instances. Thanks.
| df.queryExecution.executedPlan.execute() | ||
| } | ||
|
|
||
| test("SPARK-25278: physical nodes should be different instances for same logical nodes") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we demonstrate the metrics problem in this test? I think that's a real bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, I saw it in the next test suite
|
good catch! cc @zsxwing to see if this is a problem for streaming source metrics. |
|
Test build #95528 has finished for PR 22284 at commit
|
|
any more comments @cloud-fan @maropu @viirya ? |
|
@cloud-fan shall we consider this for 2.4? I don't see any real concern/comment about it, so I think it would be great if we can include it as it is a bug. |
| } | ||
| assert(execRanges.length == 2) | ||
| // Ensure the two RangeExec instances are different instances | ||
| assert(!execRanges.head.eq(execRanges.last)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it difficult to test the eq behaivour by using the QueryPlanner class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmmh, what do you mean exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g.,
case class DummyPlanner() extends QueryPlanner[LogicalPlan] { ... }
assert(DummyPlanner().plan(eqTestPlan) === expectedPlan))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is doable, but I don't see the advantage. May you please explain me why that is better? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a question to check if we can do that. The current test is ok to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, thanks. Anyway, yes, I think we can do that. Thanks.
|
retest this please |
|
This is a bug for sql metrics, let's include it in Spark 2.4. |
|
no more comment, LGTM |
|
Test build #95759 has finished for PR 22284 at commit
|
|
kindly ping @cloud-fan |
|
thanks, merging to master/2.4! |
…plan appears in the query ## What changes were proposed in this pull request? In the Planner, we collect the placeholder which need to be substituted in the query execution plan and once we plan them, we substitute the placeholder with the effective plan. In this second phase, we rely on the `==` comparison, ie. the `equals` method. This means that if two placeholder plans - which are different instances - have the same attributes (so that they are equal, according to the equal method) they are both substituted with their corresponding new physical plans. So, in such a situation, the first time we substitute both them with the first of the 2 new generated plan and the second time we substitute nothing. This is usually of no harm for the execution of the query itself, as the 2 plans are identical. But since they are the same instance, now, the local variables are shared (which is unexpected). This causes issues for the metrics collected, as the same node is executed 2 times, so the metrics are accumulated 2 times, wrongly. The PR proposes to use the `eq` method in checking which placeholder needs to be substituted,; thus in the previous situation, actually both the two different physical nodes which are created (one for each time the logical plan appears in the query plan) are used and the metrics are collected properly for each of them. ## How was this patch tested? added UT Closes #22284 from mgaido91/SPARK-25278. Authored-by: Marco Gaido <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 12e3e9f) Signed-off-by: Wenchen Fan <[email protected]>
## What changes were proposed in this pull request? It turns out it's a bug that a `DataSourceV2ScanExec` instance may be referred to in the execution plan multiple times. This bug is fixed by #22284 and now we have corrected SQL metrics for batch queries. Thus we don't need the hack in `ProgressReporter` anymore, which fixes the same metrics problem for streaming queries. ## How was this patch tested? existing tests Closes #22380 from cloud-fan/followup. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
## What changes were proposed in this pull request? It turns out it's a bug that a `DataSourceV2ScanExec` instance may be referred to in the execution plan multiple times. This bug is fixed by apache#22284 and now we have corrected SQL metrics for batch queries. Thus we don't need the hack in `ProgressReporter` anymore, which fixes the same metrics problem for streaming queries. ## How was this patch tested? existing tests Closes apache#22380 from cloud-fan/followup. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
| // Replace the placeholder by the child plan | ||
| candidateWithPlaceholders.transformUp { | ||
| case p if p == placeholder => childPlan | ||
| case p if p.eq(placeholder) => childPlan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do ExprID dedup for the children of UNION in the Analyzer stage, Is the problem fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't have exprId for query plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition will be always false after we dedup the expression id. Please let me know if yoany of you can find another test case to break it. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can reproduce the bug with any plan which has more than one child but doesn't dedup the expr id. Union is one of them. I'm not sure if Union is the only one though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moreover, I am not sure if there are other cases which result in == being true when eq isn't and I'd argue that it is very hard to ensure such a thing. So I think this fix would be anyway needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Union is the last one we are not doing the dedup. I believe we need to fix it. If we dedup Union children, we do not have a valid test case for this PR. @mgaido91 Do you have any test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have other examples, but I cannot exclude that there are. And I don't see any benefit in getting back to the previous solution. So I think the current code is safer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we dedup expr ids for union, then I think this patch becomes a code cleanup instead of bug fix, and we can remove this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the test can be used for testing the other patch then or removed
What changes were proposed in this pull request?
In the Planner, we collect the placeholder which need to be substituted in the query execution plan and once we plan them, we substitute the placeholder with the effective plan.
In this second phase, we rely on the
==comparison, ie. theequalsmethod. This means that if two placeholder plans - which are different instances - have the same attributes (so that they are equal, according to the equal method) they are both substituted with their corresponding new physical plans. So, in such a situation, the first time we substitute both them with the first of the 2 new generated plan and the second time we substitute nothing.This is usually of no harm for the execution of the query itself, as the 2 plans are identical. But since they are the same instance, now, the local variables are shared (which is unexpected). This causes issues for the metrics collected, as the same node is executed 2 times, so the metrics are accumulated 2 times, wrongly.
The PR proposes to use the
eqmethod in checking which placeholder needs to be substituted,; thus in the previous situation, actually both the two different physical nodes which are created (one for each time the logical plan appears in the query plan) are used and the metrics are collected properly for each of them.How was this patch tested?
added UT