[SPARK-32430][SQL] Extend SparkSessionExtensions to inject rules into AQE query stage preparation #29224

andygrove · 2020-07-24T14:58:39Z

What changes were proposed in this pull request?

Provide a generic mechanism for plugins to inject rules into the AQE "query prep" stage that happens before query stage creation.

This goes along with https://issues.apache.org/jira/browse/SPARK-32332, where the current AQE implementation doesn't allow users to properly extend it for columnar processing.

Why are the changes needed?

The issue here is that we create new query stages, but we do not have access to the parent plan of the new query stage, so certain things cannot be determined because you have to know what the parent did. With this change, you can add tags to the plan nodes during query stage preparation to figure out what is going on.
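A minimal sketch of how a plugin might register such a rule, assuming Spark 3.0.1 or later (where `SparkSessionExtensions.injectQueryStagePrepRule` is available) and `spark-sql` on the classpath. `CountingPrepRule`, `PrepRuleCounter`, and `QueryStagePrepRuleExample` are hypothetical names; the rule only counts its invocations, where a real plugin would inspect or tag the plan. AQE is enabled explicitly because it is off by default in Spark 3.0.

```scala
import java.util.concurrent.atomic.AtomicInteger

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.functions.col

// Hypothetical counter used only to observe that AQE invoked the injected rule.
object PrepRuleCounter {
  val invocations = new AtomicInteger(0)
}

// Hypothetical no-op rule injected into AQE query stage preparation.
// A real plugin would inspect the plan here (e.g. attach TreeNodeTags).
case class CountingPrepRule(session: SparkSession) extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = {
    PrepRuleCounter.invocations.incrementAndGet()
    plan
  }
}

object QueryStagePrepRuleExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .config("spark.sql.adaptive.enabled", "true")
      // The builder has type SparkSession => Rule[SparkPlan] (QueryStagePrepRuleBuilder).
      .withExtensions(_.injectQueryStagePrepRule(session => CountingPrepRule(session)))
      .getOrCreate()
    try {
      // groupBy requires an Exchange, so AQE wraps the plan in AdaptiveSparkPlanExec,
      // which runs the query stage preparation rules (including ours).
      spark.range(100).groupBy(col("id") % 10).count().collect()
      println(s"prep rule invocations: ${PrepRuleCounter.invocations.get()}")
    } finally {
      spark.stop()
    }
  }
}
```

Injecting through `withExtensions` keeps the rule scoped to sessions built with that builder; the same builder function can also be supplied from a class listed in `spark.sql.extensions`.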

Does this PR introduce any user-facing change?

No.

How was this patch tested?

A new unit test is included in the PR.

tgravescs · 2020-07-24T15:08:43Z

ok to test

tgravescs · 2020-07-24T15:10:00Z

@cloud-fan this is fixing another issue with AQE (#29134) and overriding; would your PR cover something like this as well?

SparkQA · 2020-07-24T15:14:54Z

Test build #126503 has finished for PR 29224 at commit 630e9f8.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

tgravescs · 2020-07-24T15:19:56Z

sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala

+case class MyNewQueryStageRule() extends Rule[SparkPlan] {
+  override def apply(plan: SparkPlan): SparkPlan = plan.transformDown {
+    case plan if !plan.isInstanceOf[AdaptiveSparkPlanExec] =>
+      assert(plan.getTagValue(QueryPrepRuleHelper.myPrepTag).get == QueryPrepRuleHelper.myPrepTagValue)


need to wrap this line
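For reference, a sketch of the tag-checking rule with the long line wrapped, paired with a hypothetical tagging counterpart (`MyQueryStagePrepRule`). The tag name and value in `QueryPrepRuleHelper` are assumptions, not taken from the PR. The check uses `Option.contains` so that a missing tag fails the assertion with a message instead of throwing `NoSuchElementException` from `.get`.

```scala
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.trees.TreeNodeTag
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec

// Assumed tag definition shared by the two rules.
object QueryPrepRuleHelper {
  val myPrepTag: TreeNodeTag[String] = TreeNodeTag[String]("myPrepTag")
  val myPrepTagValue: String = "myPrepTagValue"
}

// Hypothetical rule for injectQueryStagePrepRule: tags every node it visits.
case class MyQueryStagePrepRule() extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan.transformDown {
    case p =>
      p.setTagValue(QueryPrepRuleHelper.myPrepTag, QueryPrepRuleHelper.myPrepTagValue)
      p
  }
}

// Runs later and checks that every node other than the AQE wrapper carries the tag.
case class MyNewQueryStageRule() extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan.transformDown {
    case p if !p.isInstanceOf[AdaptiveSparkPlanExec] =>
      assert(
        p.getTagValue(QueryPrepRuleHelper.myPrepTag)
          .contains(QueryPrepRuleHelper.myPrepTagValue),
        s"missing prep tag on ${p.nodeName}")
      p
  }
}
```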

tgravescs

Changes look good, pending Jenkins. Would be great for others to take a look.

dongjoon-hyun · 2020-07-24T15:35:18Z

Thank you, @andygrove and @tgravescs . I'll take a look, too.

dongjoon-hyun · 2020-07-24T15:38:39Z

sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala

  type ParserBuilder = (SparkSession, ParserInterface) => ParserInterface
  type FunctionDescription = (FunctionIdentifier, ExpressionInfo, FunctionBuilder)
  type ColumnarRuleBuilder = SparkSession => ColumnarRule
+  type QueryStagePrepRuleBuilder = SparkSession => Rule[SparkPlan]


Please update the document (at line 47) accordingly.

Thanks. I pushed an update for this.

dongjoon-hyun

According to the PR content, this introduces a new API and is a kind of improvement. Can we consider SPARK-32430 an Improvement instead of a Bug? Or do you think this should be in branch-3.0, @andygrove and @tgravescs?

tgravescs · 2020-07-24T16:15:03Z

Thanks for looking, @dongjoon-hyun. I went back and forth on improvement vs. bug and can see it both ways. I decided to file it as a bug because without it we can't properly override AQE as intended by the original change. It would be nice to pull these back to branch-3.0 as well, since many people are starting to use AQE there, and I was thinking the change is pretty isolated to that code path.

dongjoon-hyun · 2020-07-24T17:46:12Z

Got it. Thank you for explanation, @tgravescs .
Since SparkSessionExtensions is marked as Experimental and this change only adds new logic, it sounds reasonable to me.

cc @cloud-fan and @gatorsmile since this aims to land on branch-3.0 for Apache Spark 3.0.1.

dongjoon-hyun

+1, LGTM. Thank you, @andygrove and @tgravescs .
GitHub Action passed. Merged to master/branch-3.0.

… AQE query stage preparation

### What changes were proposed in this pull request?
Provide a generic mechanism for plugins to inject rules into the AQE "query prep" stage that happens before query stage creation.

This goes along with https://issues.apache.org/jira/browse/SPARK-32332 where the current AQE implementation doesn't allow for users to properly extend it for columnar processing.

### Why are the changes needed?
The issue here is that we create new query stages but we do not have access to the parent plan of the new query stage so certain things can not be determined because you have to know what the parent did. With this change it would allow you to add TAGs to be able to figure out what is going on.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
A new unit test is included in the PR.

Closes #29224 from andygrove/insert-aqe-rule.

Authored-by: Andy Grove
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 64a01c0)
Signed-off-by: Dongjoon Hyun

SparkQA · 2020-07-24T20:06:26Z

Test build #126504 has finished for PR 29224 at commit 68709ea.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-07-24T21:03:44Z

Test build #126505 has finished for PR 29224 at commit 182d4f9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-07-27T02:54:31Z

late LGTM. cc @maryannxue @JkSelf as well.

gatorsmile · 2020-08-17T02:58:15Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

  // A list of physical plan rules to be applied before creation of query stages. The physical
  // plan should reach a final status of query stages (i.e., no more addition or removal of
  // Exchange nodes) after running these rules.
  private def queryStagePreparationRules: Seq[Rule[SparkPlan]] = Seq(


It sounds like this function should be moved out of the physical node AdaptiveSparkPlanExec? cc @maryannxue @cloud-fan

gatorsmile · 2020-08-17T03:00:42Z

sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala

    createClone: (SparkSession, SessionState) => SessionState,
-    val columnarRules: Seq[ColumnarRule]) {
+    val columnarRules: Seq[ColumnarRule],
+    val queryStagePrepRules: Seq[Rule[SparkPlan]]) {


SessionState is a critical concept/class in Spark SQL. Adding queryStagePrepRules into SessionState looks weird to me. WDYT?

cc @hvanhovell

Preferably both columnarRules and queryStagePrepRules should be passed directly to the thing that is using them. This is not entirely easy because of how we deal with post-planning rules in QueryExecution. It would be great if someone could move those into the builder.

It's very hard. The extension rules are injected in the BaseSessionStateBuilder, and we need to carry them in SessionState to propagate them further. The custom columnarRules and queryStagePrepRules are no different from custom analyzer/optimizer/planner rules.

It is just refactoring, right (famous last words)? You pull the rules out of QueryExecution, put them in the session state builder, and manipulate them there.

Maybe I missed something. The session state builder is used to build SessionState, but neither SparkSession nor QueryExecution holds a session state builder instance anymore.

Are you suggesting we create something like Analyzer, Optimizer, etc. to wrap the post-planning rules in BaseSessionStateBuilder, so that we just pass that wrapper around?

gatorsmile · 2020-08-17T03:06:36Z

sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala

  type ParserBuilder = (SparkSession, ParserInterface) => ParserInterface
  type FunctionDescription = (FunctionIdentifier, ExpressionInfo, FunctionBuilder)
  type ColumnarRuleBuilder = SparkSession => ColumnarRule
+  type QueryStagePrepRuleBuilder = SparkSession => Rule[SparkPlan]


This is a public API. I think we also need to add version information.

It is marked as unstable and experimental. Do we need to add version information in that case? I am fine with doing so, but then we should add it everywhere.

gatorsmile · 2020-08-17T03:07:18Z

sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala

  }

+  /**
+   * Inject a rule that can override the the query stage preparation phase of adaptive query


Nit: the the => the

hvanhovell · 2020-08-17T08:03:41Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

  private def queryStagePreparationRules: Seq[Rule[SparkPlan]] = Seq(
    ensureRequirements
-  )
+  ) ++ context.session.sessionState.queryStagePrepRules


Should we do this after ensure requirements? The queryStagePrepRules might break requirements.

The custom rules may need to see exchange nodes as well. We can do validation at the end to make sure the exchange requirements are still satisfied.
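To make the ordering question concrete, here is a toy, Spark-free Scala model of the approach described in the reply: run the built-in `ensureRequirements` first so injected rules can see Exchange nodes, then validate at the end that every exchange requirement is still met. `Plan`, `Scan`, `Exchange`, `Agg`, and `PrepPipeline` are hypothetical stand-ins, not Spark classes, and the validation pass is an assumption about how such a check could look.

```scala
// Hypothetical, Spark-free model of AQE query stage preparation ordering.
sealed trait Plan
final case class Scan(name: String) extends Plan
final case class Exchange(child: Plan) extends Plan
// Like an aggregate, requires its input to be clustered by an Exchange.
final case class Agg(child: Plan) extends Plan

object PrepPipeline {
  // Stand-in for EnsureRequirements: every Agg must read from an Exchange.
  def ensureRequirements(plan: Plan): Plan = plan match {
    case Agg(Exchange(child)) => Agg(Exchange(ensureRequirements(child)))
    case Agg(child)           => Agg(Exchange(ensureRequirements(child)))
    case Exchange(child)      => Exchange(ensureRequirements(child))
    case s: Scan              => s
  }

  // Validation pass: true when every Agg still reads from an Exchange.
  def requirementsSatisfied(plan: Plan): Boolean = plan match {
    case Agg(Exchange(child)) => requirementsSatisfied(child)
    case Agg(_)               => false
    case Exchange(child)      => requirementsSatisfied(child)
    case _: Scan              => true
  }

  // A misbehaving custom rule that strips every Exchange.
  def dropExchanges(plan: Plan): Plan = plan match {
    case Exchange(child) => dropExchanges(child)
    case Agg(child)      => Agg(dropExchanges(child))
    case s: Scan         => s
  }

  // Built-in rule first so custom rules see Exchange nodes, then validate the result.
  def prepare(plan: Plan, customRules: Seq[Plan => Plan]): Plan = {
    val rules: Seq[Plan => Plan] = Seq[Plan => Plan](ensureRequirements) ++ customRules
    val prepared = rules.foldLeft(plan)((p, rule) => rule(p))
    require(requirementsSatisfied(prepared),
      s"custom rule broke exchange requirements: $prepared")
    prepared
  }
}
```

With a well-behaved rule the prepared plan keeps its exchanges; a rule such as `dropExchanges` fails the final `require` instead of silently producing a plan with unmet distribution requirements.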

andygrove added 4 commits July 23, 2020 13:38

allow plugins to inject rules into AQE query stage preparation (d08003c)
remove blank line (3a8309c)
unit test (dcfabc5)
address feedback (630e9f8)

probot-autolabeler bot added the SQL label Jul 24, 2020

tgravescs reviewed Jul 24, 2020


andygrove added 2 commits July 24, 2020 09:20

fix compilation error (e884516)
scalastyle (68709ea)

tgravescs approved these changes Jul 24, 2020


dongjoon-hyun reviewed Jul 24, 2020

documentation update (182d4f9)

dongjoon-hyun changed the title ~~[SPARK-32430] [SQL] Allow plugins to inject rules into AQE query stage preparation~~ [SPARK-32430] [SQL] Extend SparkSessionExtensions to inject rules into AQE query stage preparation Jul 24, 2020

dongjoon-hyun changed the title ~~[SPARK-32430] [SQL] Extend SparkSessionExtensions to inject rules into AQE query stage preparation~~ [SPARK-32430][SQL] Extend SparkSessionExtensions to inject rules into AQE query stage preparation Jul 24, 2020

dongjoon-hyun approved these changes Jul 24, 2020


dongjoon-hyun closed this in 64a01c0 Jul 24, 2020

andygrove deleted the insert-aqe-rule branch July 24, 2020 18:04

gatorsmile reviewed Aug 17, 2020


hvanhovell reviewed Aug 17, 2020


revans2 mentioned this pull request Sep 28, 2020

[FEA] AQE and DPP work together in rapids NVIDIA/spark-rapids#863

Closed
