[SPARK-29552][SQL] Execute the "OptimizeLocalShuffleReader" rule when creating new query stage and then can optimize the shuffle reader to local shuffle reader as much as possible. #26207

JkSelf · 2019-10-22T11:21:58Z (description edited by cloud-fan)

What changes were proposed in this pull request?

The OptimizeLocalShuffleReader rule is very conservative: it gives up the optimization entirely as soon as any extra shuffle is introduced. Yet it is quite likely that most of the added local shuffle readers are fine and only one of them introduces an extra shuffle.

However, it's very hard to make OptimizeLocalShuffleReader optimal. A simple workaround is to run this rule again right before executing a query stage.
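To make the difference concrete, here is a minimal toy model in Scala. It assumes a plan is just a list of query stages, each flagged by whether turning its readers into local readers would force an extra shuffle. `LocalReaderSketch`, `Stage`, `wholePlanPass` and `perStagePass` are hypothetical names for illustration only, not Spark classes; the real rule operates on `SparkPlan` trees.

```scala
object LocalReaderSketch {
  // Hypothetical toy types; these are NOT Spark classes.
  sealed trait Reader
  case object ShuffleReader extends Reader
  case object LocalReader extends Reader

  // A query stage whose readers could become local readers.
  // `extraShuffleIfLocal` marks a stage where doing so would introduce an extra shuffle.
  final case class Stage(readers: Seq[Reader], extraShuffleIfLocal: Boolean)

  private def toLocal(s: Stage): Stage = s.copy(readers = s.readers.map(_ => LocalReader))

  // All-or-nothing pass over the whole plan: if any stage would need an
  // extra shuffle, revert every conversion.
  def wholePlanPass(stages: Seq[Stage]): Seq[Stage] =
    if (stages.exists(_.extraShuffleIfLocal)) stages else stages.map(toLocal)

  // Re-run the same check stage by stage, keeping the conversions that are safe.
  def perStagePass(stages: Seq[Stage]): Seq[Stage] =
    stages.map(s => if (s.extraShuffleIfLocal) s else toLocal(s))

  def numLocal(stages: Seq[Stage]): Int =
    stages.map(_.readers.count(_ == LocalReader)).sum

  // Three stages, only the last of which is problematic.
  val demo: Seq[Stage] = Seq(
    Stage(Seq(ShuffleReader), extraShuffleIfLocal = false),
    Stage(Seq(ShuffleReader), extraShuffleIfLocal = false),
    Stage(Seq(ShuffleReader), extraShuffleIfLocal = true)
  )
}
```

With the `demo` plan, the whole-plan pass keeps no local readers because one stage is problematic, while re-running the check per stage afterwards recovers the two safe ones.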

Why are the changes needed?

To convert more shuffle readers into local shuffle readers.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests.

JkSelf · 2019-10-22T11:24:36Z

@cloud-fan Please help me review. Also thanks for your offline help.

cloud-fan · 2019-10-22T12:07:48Z

ok to test

cloud-fan · 2019-10-22T12:07:57Z

add to whitelist

cloud-fan · 2019-10-22T12:16:29Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

  @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
    ReuseAdaptiveSubquery(conf, subqueryCache),
+    // Here we need to put the OptimizeLocalShuffleReader rule before
+    // ReduceNumShufflePartitions rule to avoid the further optimization.


I think the comment needs to explain 2 things:

why this rule is executed twice

why it must be run before ReduceNumShufflePartitions

SparkQA · 2019-10-22T15:54:30Z

Test build #112459 has finished for PR 26207 at commit 997e994.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HeartSaVioR · 2019-10-22T21:05:05Z

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala

      assert(bhj.size == 3)
      // additional shuffle exchange introduced, only one shuffle reader to local shuffle reader.
-      checkNumLocalShuffleReaders(adaptivePlan, 1)
+      checkNumLocalShuffleReaders(adaptivePlan, 2)


Just to confirm: does this change make the value consistently 2? The value has now been changed to 2, but it was actually flaky before (sometimes 1, sometimes 2), depending on the situation or possibly on randomness.

You may want to verify it the same way I discovered the flakiness: 1) run the test alone in local dev, 2) run the whole test suite in local dev, and 3) trigger CI five times or so.

@HeartSaVioR With this patch, the value will consistently be 2, because we now convert every shuffle reader that can be converted into a local shuffle reader. I ran the test both alone in local dev and as part of the test suite, and the value was 2 every time. Thanks.

OK, thanks for confirming.
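The test helper `checkNumLocalShuffleReaders` presumably counts local shuffle reader nodes in the final adaptive plan. A hedged sketch of that kind of count, over a toy plan tree with hypothetical node types (`Join`, `ShuffleRead`, `LocalShuffleRead`) rather than Spark's real operators; the `collect` helper mimics the pre-order traversal of Spark's `TreeNode.collect`.

```scala
object LocalReaderCount {
  // Hypothetical plan nodes for illustration only; Spark's real operators differ.
  sealed trait Plan { def children: Seq[Plan] }
  final case class Join(left: Plan, right: Plan) extends Plan {
    def children: Seq[Plan] = Seq(left, right)
  }
  final case class ShuffleRead(stageId: Int) extends Plan { def children: Seq[Plan] = Nil }
  final case class LocalShuffleRead(stageId: Int) extends Plan { def children: Seq[Plan] = Nil }

  // Pre-order traversal collecting every node matched by `pf`.
  def collect[A](p: Plan)(pf: PartialFunction[Plan, A]): Seq[A] =
    pf.lift(p).toList ++ p.children.flatMap(c => collect(c)(pf))

  // Count the local shuffle readers anywhere in the tree.
  def numLocalReads(p: Plan): Int =
    collect[LocalShuffleRead](p) { case r: LocalShuffleRead => r }.size

  // A multi-join plan where two of the three shuffle reads became local.
  val demo: Plan = Join(Join(LocalShuffleRead(0), ShuffleRead(1)), LocalShuffleRead(2))
}
```

A deterministic count like this is only meaningful if the rule itself is deterministic, which is the property the reviewers were checking above.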

viirya · 2019-10-22T23:18:16Z

I think this PR title is not accurate, as this is not just a fix for a flaky test, right?

JkSelf · 2019-10-23T05:04:36Z

@viirya updated the title. Thanks.

cloud-fan · 2019-10-23T05:23:34Z

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

  // optimizations should be stage-independent.
  @transient private val queryStageOptimizerRules: Seq[Rule[SparkPlan]] = Seq(
    ReuseAdaptiveSubquery(conf, subqueryCache),
+    // We will revert all the local shuffle reader nodes in OptimizeLocalShuffleReader rule


To polish it a little bit:

When adding local shuffle readers in `OptimizeLocalShuffleReader`, we revert all the local readers if additional shuffles are introduced. This may be too conservative: maybe only one local reader introduces a shuffle, and we could still keep the other local readers. Here we re-execute this rule with the sub-plan-tree of a query stage, to make sure the necessary local readers are added before executing the query stage. This rule must be executed before `ReduceNumShufflePartitions`, as local shuffle readers can't change the number of partitions.
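The ordering constraint can be illustrated with a small Scala sketch. For illustration only, it assumes that `ReduceNumShufflePartitions` replaces a plain shuffle reader with a coalesced one, and that only an untouched, eligible shuffle reader can become a local reader. `RuleOrderSketch`, `ShuffleRd`, `CoalescedRd`, `LocalRd` and the two rule functions are hypothetical stand-ins, not Spark classes.

```scala
object RuleOrderSketch {
  // Hypothetical reader states; these are NOT Spark classes.
  sealed trait Rd
  final case class ShuffleRd(partitions: Int, localEligible: Boolean) extends Rd
  final case class CoalescedRd(partitions: Int) extends Rd
  final case class LocalRd(partitions: Int) extends Rd

  type Rule = Seq[Rd] => Seq[Rd]

  // Stand-in for OptimizeLocalShuffleReader: only an untouched, eligible
  // shuffle reader can become a local reader.
  val optimizeLocal: Rule = _.map {
    case ShuffleRd(n, true) => LocalRd(n)
    case other              => other
  }

  // Stand-in for ReduceNumShufflePartitions: coalesces plain shuffle readers;
  // local readers keep their original partitioning.
  def reducePartitions(target: Int): Rule = _.map {
    case ShuffleRd(n, _) if n > target => CoalescedRd(target)
    case other                         => other
  }

  // Apply rules in order, like a Seq of query-stage optimizer rules.
  def run(rules: Seq[Rule], readers: Seq[Rd]): Seq[Rd] =
    rules.foldLeft(readers)((rs, rule) => rule(rs))

  def numLocal(rs: Seq[Rd]): Int = rs.count(_.isInstanceOf[LocalRd])

  val demo: Seq[Rd] = Seq(ShuffleRd(200, localEligible = true), ShuffleRd(200, localEligible = false))
}
```

Running the local-reader rule first keeps one local reader and coalesces the other; running the coalescing rule first leaves nothing for the local-reader rule to convert.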


SparkQA · 2019-10-23T07:05:01Z

Test build #112511 has finished for PR 26207 at commit 1bc418e.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-10-23T07:05:02Z

Test build #112516 has finished for PR 26207 at commit b372636.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-10-23T07:09:40Z

retest this please

SparkQA · 2019-10-23T11:15:55Z

Test build #112517 has finished for PR 26207 at commit b372636.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-10-23T17:18:18Z

thanks, merging to master!

fix the flaky test in multi-joins with local shuffle reader

997e994

JkSelf changed the title ~~fix the flaky test in multi-joins with local shuffle reader~~ [SPARK-29552][SQL] fix the flaky test in multi-joins with local shuffle reader Oct 22, 2019

cloud-fan reviewed Oct 22, 2019


HeartSaVioR reviewed Oct 22, 2019


dongjoon-hyun added the SQL label Oct 22, 2019

update the comments

1bc418e

cloud-fan reviewed Oct 23, 2019


polish the comment for why execute the OptimizaLocalShuffleReader rule twice and before ReduceNumShufflePartitions

b372636

cloud-fan closed this in 7e8e4c0 Oct 23, 2019


6 participants
