-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL #24706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #105776 has finished for PR 24706 at commit
|
viirya
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds a little weird that the stages are independent and executing them in order by their dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying the wording is no good? Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just remove independent? As they are run by some dependencies, sounds like they're not really independent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not currentPhysicalPlan.canonicalized but initialPlan.canonicalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currentPhysicalPlan can change after each iteration, this is means we cannot use it as a stable identifier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to @hvanhovell's comment. Moreover, one important use of the "canonicalized" plan is for adaptive sub-query re-use, in which we for sure want to compare the initial plans and initial plans only.
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need to update the doc for this config. It isn't enabled when runtime query re-optimization is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave it now and see if we should use the existing config spark.sql.adaptive.enabled instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the physical plan has been updated with the newly created stages, doesn't this new node (a newly created stage) always match one p?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the most complicated part here:
Normally we only need to replace an Exchange with a logical "Stage", simple straightforward logic. But when it comes to several physical nodes generated from a single logical node and there's an Exchange is in the middle of these nodes, we need to apply a little "trick" here, by replacing the top node (e.g., the final agg) with a logical "Stage" which wraps the whole sub-tree starting from the top node and of course containing the exchange stage.
The code comment for this method explains this logic. Alternatively, we could have logical aggregate that represents "final" and "local" aggregate in order not to go this route. Let's have a follow-up on that though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is hacky, how about we explicitly match the physical aggregate (hash and sort) here? Except for aggregate, we should always look for condition if p.eq(newNode)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we set inherited logical plan into children, we can't set logical plan into them? So only the top SparkPlan has its logical plan, all its children just have inherited logical plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's true. And you could always "force set" this logical link if need be.
It's not necessary to draw a line between "logical plan" and "inherited logical plan" for the use of adaptive execution, so this is simply to make sure any future use of it can tell a top node from the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the plan node is updated by:
val newLogicalPlan = currentLogicalPlan.transformDown {
case p if p.eq(logicalNode) => newLogicalNode
}The logicalNode should be logicalLink of oldNode and oldNode is Xchg. Why isn't child be transformed, but the whole Agg-child?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is simple: Which logical node does Xchg correspond to? None. And in case of an Aggregate, you can't even find a whole sub-tree in the logical plan that corresponds to the Xchg.
Please also refer to #24706 (comment)
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did an initial round. Will try to do a second one later this week.
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain when this happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a non-correlated sub-query itself fails the sanity check for being converted into a AdaptiveSparkPlanExec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my curiosity, why a TrieMap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a concurrent map that supports getOrElseUpdate.
I thought we should change this "schema-to-a-list-of-plans" map to a "canonicalized-plan-to-plan" map so we can use a simple concurrent map and do not need to put a lock on the list. If we'll fix it, we'll fix it with the compile-time reuse maps together.
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala
Outdated
Show resolved
Hide resolved
| // Run preparation rules. | ||
| val preparations = AdaptiveSparkPlanExec.createQueryStagePreparationRules( | ||
| session.sessionState.conf, subqueryMap) | ||
| val newPlan = AdaptiveSparkPlanExec.applyPhysicalRules(plan, preparations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to do this? This already seems to be done in AdaptiveSparkPlanExec when we submit the stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This already seems to be done in AdaptiveSparkPlanExec when we submit the stage.
No.
The physical transformations we used in QueryExecution.preparations have now been split into two groups in adaptive execution here (also noted in the code comment):
- Rules that add or remove exchanges.
- Rules that are independent within each exchange, or say, stage.
InsertAdaptiveSparkPlan is now the first in QueryExecution.preparations, which means neither of these two groups has been applied yet. It is this way so that we do not need to manipulate (modify) the rule application order in QueryExecution.preparations for AQE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it makes more sense to this in the AdaptiveSparkPlanExec. The AdaptiveSparkPlanExec is now expecting a plan that can only be produced by this rule, and not any physical plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In one of the commits I had tried refactoring this into AdaptiveSparkPlanExec, but later found out that this would cause a problem in serializing/deserializing initialPlan in AdaptiveSparkPlanExec, for the initialPlan before applying the sub-query planning rule contains instances of expression.ScalarSubquery.
| * This rule wraps the query plan with an [[AdaptiveSparkPlanExec]], which executes the query plan | ||
| * and re-optimize the plan during execution based on runtime data statistics. | ||
| */ | ||
| case class InsertAdaptiveSparkPlan(session: SparkSession) extends Rule[SparkPlan] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of general remarks:
- As fair as I understand this code subqueries for a given stage are now executed before the stage. This used to be that all the subqueries for a query were executed before the main query.
- We may need to consider moving this into the
AdaptiveSparkPlanExecto put most state in one place and make this stateless again. You could turn this into a mix-in if this adds too much LOC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As fair as I understand this code subqueries for a given stage are now executed before the stage. This used to be that all the subqueries for a query were executed before the main query.
Agreed. But this has nothing to do with what this rule does. It's just we don't call execute higher in the tree before stages below get to finish. So we may need to refactor the original SparkPlan.executeQuery logic to make this "wait-for-all-subqueries-to-finish" thing more explicit.
We may need to consider moving this into the AdaptiveSparkPlanExec to put most state in one place and make this stateless again. You could turn this into a mix-in if this adds too much LOC.
Agreed. Not the prettiest solution to put a stateful object into a rule.
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
Outdated
Show resolved
Hide resolved
|
Test build #105869 has finished for PR 24706 at commit
|
|
Test build #105868 has finished for PR 24706 at commit
|
|
Test build #105870 has finished for PR 24706 at commit
|
|
Test build #105880 has finished for PR 24706 at commit
|
| // Wait on the next completed stage, which indicates new stats are available and probably | ||
| // new stages can be created. There might be other stages that finish at around the same | ||
| // time, so we process those stages too in order to reduce re-planning. | ||
| val nextMsg = events.take() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is kind of weird that we are defining a nextMsg and that we are not really using it. How about:
val newStageEvents = new util.ArrayList[StageMaterializationEvent]
newStageEvents.add(events.take())
events.drainTo(newStageEvents)
newStageEvents.asScala.foreach {
...
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'll work too. But nextMsg is used in the current impl, otherwise the whole thing wouldn't work:
https://github.com/apache/spark/pull/24706/files/1c665d2e616a7c6ffc79c8be8307c9f3ff503b23#diff-6954dd8020a9ca298f1fb9602c0e831cR98.
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
Outdated
Show resolved
Hide resolved
| var result = createQueryStages(currentPhysicalPlan) | ||
| currentPhysicalPlan.synchronized { | ||
| val events = new LinkedBlockingQueue[StageMaterializationEvent]() | ||
| val errors = new mutable.ArrayBuffer[SparkException]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could move this into the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue it'll have to be instantiated with every loop, yet used exactly once (when there's sth. wrong).
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala
Outdated
Show resolved
Hide resolved
|
Test build #105924 has finished for PR 24706 at commit
|
|
Test build #105932 has finished for PR 24706 at commit
|
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
Outdated
Show resolved
Hide resolved
| private val conf = session.sessionState.conf | ||
|
|
||
| // Exchange-reuse is shared across the entire query, including sub-queries. | ||
| private val stageCache = new TrieMap[SparkPlan, QueryStageExec]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This stage cache is passed to the created AdaptiveSparkPlanExec directly. Can we create the stage cache in AdaptiveSparkPlanExec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Problem is we want to make all AdaptiveSparkPlanExecs of the main query and the subqueries share the same stageCache.
|
|
||
| private var currentStageId = 0 | ||
|
|
||
| @volatile private var currentPhysicalPlan = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we move these variable definitions before the doExecute method? To make the code a little easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had moved currentLogicalPlan into the doExecute method, but now it doesn't look like we can do it with any other variables. We need to provide access to currentPhysicalPlan (thru method executedPlan). Plus, if the plan ever gets executed again, we'll have the currentPhysicalPlan as the final plan, and the stages will not be created or run again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move these variables before doExecute, not inside doExecute...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, got you.
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of smallish comments. This is almost good to go!
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LogicalQueryStageStrategy.scala
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| override def cancel(): Unit = { | ||
| mapOutputStatisticsFuture match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This forces materialization right? It would be better to if we can check whether it is already running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, all the existing QueryStageExec nodes in the plan has been called "materialize" already, so it should not be a concern.
| } | ||
|
|
||
| override def cancel(): Unit = { | ||
| if (!plan.relationFuture.isDone) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also forces materialization right?
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
Outdated
Show resolved
Hide resolved
| case other => other | ||
| } | ||
| def mapChild(child: Any): Any = child match { | ||
| case arg: TreeNode[_] if applyToAll || containsChild(arg) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not super intuitive to use the same flag to allow transformations on all element, and to force a copy. Can we maybe split them for documentation purposes?
I am also not entirely convinced that we need to transform all elements. This change now also transforms expressions in a plan node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the expressions, I think it totally depends on the usage, and right now we don't need to copy expressions for AQE I believe.
On the other hand, though, there's "fake leaf nodes" that derive from LeafNode but do have children nodes not declared as children, e.g., ReusedExchange. Again, right now for AQE usage, we only care about the logical plans, so we are probably OK? (can't think of any logical plan fake leaf nodes so far).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked and there's no "fake LeafNode" in the logical plan space. So I removed the "applyToAll" from the condition for transforming the elements and renamed it to "forceCopy". I've also changed the method name back to "mapChildren" since it's only for children nodes.
| } | ||
| case other => other | ||
| } | ||
| def mapChild(child: Any): Any = child match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mapElement?
| * on children nodes. Also, when this is true, a copy of this node will be | ||
| * returned even if no elements have been changed. | ||
| */ | ||
| private def mapProductElements( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can then we should spin this off in a separate PR.
| // Run preparation rules. | ||
| val preparations = AdaptiveSparkPlanExec.createQueryStagePreparationRules( | ||
| session.sessionState.conf, subqueryMap) | ||
| val newPlan = AdaptiveSparkPlanExec.applyPhysicalRules(plan, preparations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it makes more sense to this in the AdaptiveSparkPlanExec. The AdaptiveSparkPlanExec is now expecting a plan that can only be produced by this rule, and not any physical plan.
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
Outdated
Show resolved
Hide resolved
|
@hvanhovell Just submitted #24876 for the TreeNode changes. Please take a look. |
|
Test build #106524 has finished for PR 24706 at commit
|
|
Test build #106526 has finished for PR 24706 at commit
|
|
Test build #106536 has finished for PR 24706 at commit
|
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Merging to master. @maryannxue @carsonwang thanks for all the hard work! |
## What changes were proposed in this pull request?
Implemented a new SparkPlan that executes the query adaptively. It splits the query plan into independent stages and executes them in order according to their dependencies. The query stage materializes its output at the end. When one stage completes, the data statistics of the materialized output will be used to optimize the remainder of the query.
The adaptive mode is off by default, when turned on, user can see "AdaptiveSparkPlan" as the top node of a query or sub-query. The inner plan of "AdaptiveSparkPlan" is subject to change during query execution but becomes final once the execution is complete. Whether the inner plan is final is included in the EXPLAIN string. Below is an example of the EXPLAIN plan before and after execution:
Query:
```
SELECT * FROM testData JOIN testData2 ON key = a WHERE value = '1'
```
Before execution:
```
== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=false)
+- SortMergeJoin [key#13], [a#23], Inner
:- Sort [key#13 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(key#13, 5)
: +- Filter (isnotnull(value#14) AND (value#14 = 1))
: +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false) AS value#14]
: +- Scan[obj#12]
+- Sort [a#23 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(a#23, 5)
+- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
+- Scan[obj#22]
```
After execution:
```
== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=true)
+- *(1) BroadcastHashJoin [key#13], [a#23], Inner, BuildLeft
:- BroadcastQueryStage 2
: +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
: +- ShuffleQueryStage 0
: +- Exchange hashpartitioning(key#13, 5)
: +- *(1) Filter (isnotnull(value#14) AND (value#14 = 1))
: +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false) AS value#14]
: +- Scan[obj#12]
+- ShuffleQueryStage 1
+- Exchange hashpartitioning(a#23, 5)
+- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
+- Scan[obj#22]
```
Credit also goes to carsonwang and cloud-fan
## How was this patch tested?
Added new UT.
Closes apache#24706 from maryannxue/aqe.
Authored-by: maryannxue <[email protected]>
Signed-off-by: herman <[email protected]>
| case class CollapseCodegenStages(conf: SQLConf) extends Rule[SparkPlan] { | ||
| case class CollapseCodegenStages( | ||
| conf: SQLConf, | ||
| codegenStageCounter: AtomicInteger = new AtomicInteger(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where do we not use the default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we could have made that part of the class body. OTOH this also works :)...
Implemented a new SparkPlan that executes the query adaptively. It splits the query plan into independent stages and executes them in order according to their dependencies. The query stage materializes its output at the end. When one stage completes, the data statistics of the materialized output will be used to optimize the remainder of the query.
The adaptive mode is off by default, when turned on, user can see "AdaptiveSparkPlan" as the top node of a query or sub-query. The inner plan of "AdaptiveSparkPlan" is subject to change during query execution but becomes final once the execution is complete. Whether the inner plan is final is included in the EXPLAIN string. Below is an example of the EXPLAIN plan before and after execution:
Query:
```
SELECT * FROM testData JOIN testData2 ON key = a WHERE value = '1'
```
Before execution:
```
== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=false)
+- SortMergeJoin [key#13], [a#23], Inner
:- Sort [key#13 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(key#13, 5)
: +- Filter (isnotnull(value#14) AND (value#14 = 1))
: +- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false) AS value#14]
: +- Scan[obj#12]
+- Sort [a#23 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(a#23, 5)
+- SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
+- Scan[obj#22]
```
After execution:
```
== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=true)
+- *(1) BroadcastHashJoin [key#13], [a#23], Inner, BuildLeft
:- BroadcastQueryStage 2
: +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
: +- ShuffleQueryStage 0
: +- Exchange hashpartitioning(key#13, 5)
: +- *(1) Filter (isnotnull(value#14) AND (value#14 = 1))
: +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false) AS value#14]
: +- Scan[obj#12]
+- ShuffleQueryStage 1
+- Exchange hashpartitioning(a#23, 5)
+- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#23, knownnotnull(assertnotnull(input[0, org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#24]
+- Scan[obj#22]
```
Credit also goes to carsonwang and cloud-fan
Added new UT.
Closes apache#24706 from maryannxue/aqe.
Authored-by: maryannxue <[email protected]>
Signed-off-by: herman <[email protected]>
…xecution This is to implement a ReduceNumShufflePartitions rule in the new adaptive execution framework introduced in apache#24706. This rule is used to adjust the post shuffle partitions based on the map output statistics. Added ReduceNumShufflePartitionsSuite Closes apache#24978 from carsonwang/reduceNumShufflePartitions. Authored-by: Carson Wang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Implemented a new SparkPlan that executes the query adaptively. It splits the query plan into independent stages and executes them in order according to their dependencies. The query stage materializes its output at the end. When one stage completes, the data statistics of the materialized output will be used to optimize the remainder of the query.
The adaptive mode is off by default, when turned on, user can see "AdaptiveSparkPlan" as the top node of a query or sub-query. The inner plan of "AdaptiveSparkPlan" is subject to change during query execution but becomes final once the execution is complete. Whether the inner plan is final is included in the EXPLAIN string. Below is an example of the EXPLAIN plan before and after execution:
Query:
Before execution:
After execution:
Credit also goes to @carsonwang and @cloud-fan
How was this patch tested?
Added new UT.