
Conversation

@manuzhang
Member

What changes were proposed in this pull request?

As suggested by @cloud-fan in #28916 (comment), apply CoalesceShufflePartitions with partitionSpecs of Nil when ShuffleQueryStageExec#mapStats is None.
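
A minimal sketch of the idea, reusing the names from the diff excerpt quoted later in this thread (the surrounding rule structure is assumed, not the exact Spark source):

// Inside CoalesceShufflePartitions (sketch). mapStats is None when a
// shuffle's input RDD has 0 partitions, so flatMap drops such stages.
val validMetrics = shuffleStages.flatMap(_.mapStats)
if (validMetrics.isEmpty) {
  // All inputs are empty: rewrite the shuffle reads with an empty
  // partition-spec list instead of skipping the rule and launching
  // spark.sql.shuffle.partitions empty tasks.
  updatePlan(Nil)
}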

Why are the changes needed?

For SQL like

SELECT b, COUNT(t1.a) as cnt
FROM t1
INNER JOIN t2
ON t1.id = t2.id
WHERE t1.id > 10
GROUP BY b

when all ids of t1 are smaller than 10, many unnecessary tasks are launched for the final shuffle stage, because CoalesceShufflePartitions is skipped when the input RDD has 0 partitions.

Before

[screenshot: final shuffle stage before the change]

After

[screenshot: final shuffle stage after the change]

Does this PR introduce any user-facing change?

No

How was this patch tested?

Updated tests.

}
}

if (validMetrics.isEmpty) {
Contributor

If a query stage has multiple leaf shuffles and only one of them has a 0-partition input RDD, what shall we do?

Member Author

I think it's like coalescing one fewer shuffle, and that is handled by the nonEmpty code path.
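
For illustration only (nonEmptyStageStats is a hypothetical value, not code from this PR): collecting stats through Option silently drops the stage that has no map stats, so the remaining shuffles still get coalesced together:

// Two leaf shuffles, one of them with a 0-partition input (no map stats):
val stats = Seq(None, Some(nonEmptyStageStats))
val validMetrics = stats.flatten  // keeps only the stage that has stats
assert(validMetrics.nonEmpty)     // so the existing nonEmpty path coalesces it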

@cloud-fan
Contributor

cc @maryannxue @JkSelf @viirya

}

if (validMetrics.isEmpty) {
  updatePlan(Nil)
@viirya
Member

viirya commented Jun 30, 2020

Can you add a comment for the 0-partition case?

@SparkQA

SparkQA commented Jun 30, 2020

Test build #124648 has finished for PR 28954 at commit 5bf6de5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

cloud-fan commented Jul 1, 2020

After more thought, maybe a better way is to add a new rule in AdaptiveSparkPlanExec.optimizer that converts a LogicalQueryStage to an empty LocalRelation if its size is 0.

This is not really "coalescing partitions", so we'd better not do it in CoalesceShufflePartitions.
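
A rough sketch of what such a rule could look like (the rule name and the zero-size check are assumptions for illustration, not Spark's eventual implementation):

import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.adaptive.LogicalQueryStage

// Hypothetical rule for AdaptiveSparkPlanExec.optimizer: once a query stage
// has materialized and its output is known to be empty, replace it with an
// empty LocalRelation so downstream operators can be simplified away.
object ConvertEmptyStageToLocalRelation extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan.transform {
    case stage: LogicalQueryStage if stage.stats.sizeInBytes == 0 =>
      LocalRelation(stage.output)
  }
}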

@JkSelf
Contributor

JkSelf commented Jul 1, 2020

@manuzhang I ran test("Empty stage coalesced to 0-partition RDD") in AdaptiveQueryExecSuite. It seems there are no unnecessary tasks for the empty partitions; the Spark UI is shown below. Can you give a simple example to reproduce this issue? Thanks.
[screenshot: Spark UI]

The related code is:

spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
val df1 = spark.range(10).withColumn("a", 'id)
val df2 = spark.range(10).withColumn("b", 'id)
val result = df1.where('a > 10).join(df2.where('b > 10), "id").groupBy('a).count()
result.show()

@manuzhang
Member Author

@JkSelf Try changing result.show() to result.collect()

@JkSelf
Contributor

JkSelf commented Jul 1, 2020

@manuzhang It seems there is still only one stage and no unnecessary tasks for the empty partitions.
[screenshot: Spark UI]

related code:

spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
val df1 = spark.range(10).withColumn("a", 'id)
val df2 = spark.range(10).withColumn("b", 'id)
val result = df1.where('a > 10).join(df2.where('b > 10), "id").groupBy('a).count()
result.collect()
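
As a generic cross-check (not from this thread), the final adaptive plan can also be inspected directly instead of relying on the UI:

// With the fix, the shuffle read after the join should show 0 (or very few)
// partitions rather than the full spark.sql.shuffle.partitions count.
println(result.queryExecution.executedPlan)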

@manuzhang
Member Author

@JkSelf Could you check the jobs tab?

@JkSelf
Contributor

JkSelf commented Jul 2, 2020

@manuzhang Here is the jobs tab.
[screenshot: jobs tab]

@viirya
Member

viirya commented Jul 2, 2020

When the input RDD has 0 partitions, why would we launch unnecessary tasks even though CoalesceShufflePartitions is skipped?

@manuzhang
Member Author

manuzhang commented Jul 3, 2020

@JkSelf @viirya
Here is part of the SQL UI from running the same example with the default number of shuffle partitions (the binary is built from the master branch as of June 29). You can see SortMergeJoin is followed by an Exchange with 200 partitions.

[screenshot: SQL UI showing SortMergeJoin followed by a 200-partition Exchange]

@manuzhang
Member Author

@JkSelf
I get the same result as you when I set the numPartitions of the source to 1 via the fourth argument of spark.range (by default, it's 16 on my Mac), i.e.

val df1 = spark.range(0, 10, 1, 1).withColumn("a", 'id)
val df2 = spark.range(0, 10, 1, 1).withColumn("b", 'id)
val testDf = df1.where('a > 10).join(df2.where('b > 10), "id").groupBy('a).count()
testDf.collect()

Compare this execution plan with the one above.
[screenshot: execution plan with a single-partition source]

@manuzhang
Member Author

@cloud-fan I found an issue with updating metrics when converting a LogicalQueryStage to a LocalRelation if its child's mapStats is empty.

java.util.NoSuchElementException: key not found: 514
	at scala.collection.immutable.Map$Map1.apply(Map.scala:114)
	at org.apache.spark.sql.execution.ui.SQLAppStatusListener.$anonfun$aggregateMetrics$11(SQLAppStatusListener.scala:257)

The missing key comes from the metrics of a RangeExec that were sent to SQLAppStatusListener earlier, presumably because the replaced nodes (and their metric ids) no longer appear in the final plan.

@manuzhang closed this Aug 14, 2020