
Conversation

@LuciferYang
Contributor

@LuciferYang LuciferYang commented Sep 10, 2020

What changes were proposed in this pull request?

After #29660 and #29689, 13 test cases in the sql/core module still fail with Scala 2.13.

The remaining failures occur because CostBasedJoinReorder can produce different optimization results for the same input under Scala 2.12 and Scala 2.13 when there is more than one candidate plan with the same cost.

This PR makes the optimization result as deterministic as possible so that all remaining failed cases of the sql/core module pass in Scala 2.13. The main changes are as follows:

  • Use LinkedHashMap instead of Map to store foundPlans in the JoinReorderDP.search method, so that iteration order matches insertion order; the iteration order of Map differs between Scala 2.12 and 2.13 (see the sketch after this list).

  • Fix StarJoinCostBasedReorderSuite, which is affected by the above change.

  • Regenerate golden files affected by the above change.
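
For context, a minimal sketch (not part of the patch; the keys and plan values are made up) of the iteration-order difference that the first bullet refers to:

import scala.collection.mutable

object IterationOrderSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical "item ids -> plan" entries, inserted in this order.
    val entries = (1 to 5).map(i => Set(i) -> s"plan$i")

    // The default immutable Map is hash-based: its iteration order is an
    // implementation detail, is not guaranteed to follow insertion order,
    // and can differ between Scala 2.12 and 2.13.
    val hashed: Map[Set[Int], String] = entries.toMap
    println(hashed.keys.mkString(", "))

    // LinkedHashMap always iterates in insertion order, so any tie-breaking
    // among equal-cost plans that depends on which plan was seen first
    // becomes deterministic across Scala versions.
    val linked = mutable.LinkedHashMap[Set[Int], String]()
    entries.foreach { case (k, v) => linked.put(k, v) }
    println(linked.keys.mkString(", "))
  }
}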

Why are the changes needed?

We need to support a Scala 2.13 build.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Scala 2.12: passes the Jenkins / GitHub Actions checks.

  • Scala 2.13: all tests pass after running the following:

dev/change-scala-version.sh 2.13
mvn clean install -DskipTests -pl sql/core -Pscala-2.13 -am
mvn test -pl sql/core -Pscala-2.13

Before

Tests: succeeded 8485, failed 13, canceled 1, ignored 52, pending 0
*** 13 TESTS FAILED ***

After

Tests: succeeded 8498, failed 0, canceled 1, ignored 52, pending 0
All tests passed.

@LuciferYang
Contributor Author

LuciferYang commented Sep 10, 2020

cc @srowen, this patch fixes the remaining failed cases of the sql/core module with Scala 2.13; after this patch all tests pass.

@LuciferYang LuciferYang changed the title [SPARK-32808][SQL] Let CostBasedJoinReorder produce deterministic result with Scala 2.12 and 2.13 [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13 Sep 10, 2020
Member

@srowen srowen left a comment

I'd like @cloud-fan or @gatorsmile to review just because it changes so many of the expected results.

// SPARK-32687: Change to use `LinkedHashMap` to make sure that items are
// inserted and iterated in the same order.
val ret = new mutable.LinkedHashMap[Set[Int], JoinPlan]()
idToJoinPlanSeq.foreach(v => ret.put(v._1, v._2))
Member

I think you can just construct this above, with a .foreach instead of .map? or just call .addAll on the result of the .map.
I guess LinkedHashMap takes more memory, but that shouldn't matter much here.
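
A rough sketch of the two alternatives the comment suggests, with a placeholder standing in for Spark's internal JoinPlan type (the variable names are only illustrative):

import scala.collection.mutable

object BuildLinkedHashMapSketch {
  // Placeholder for Spark's internal JoinPlan; only the map construction matters here.
  final case class JoinPlan(cost: Double)

  def main(args: Array[String]): Unit = {
    val idToJoinPlanSeq: Seq[(Set[Int], JoinPlan)] =
      Seq(Set(1) -> JoinPlan(1.0), Set(2) -> JoinPlan(2.0))

    // Option 1: build the LinkedHashMap directly with foreach, instead of
    // mapping to pairs first and copying them into a new map afterwards.
    val direct = new mutable.LinkedHashMap[Set[Int], JoinPlan]()
    idToJoinPlanSeq.foreach { case (ids, plan) => direct.put(ids, plan) }

    // Option 2: append all pairs in one call. `++=` works on both Scala 2.12
    // and 2.13; `addAll` is the Scala 2.13 name for the same operation.
    val appended = mutable.LinkedHashMap.empty[Set[Int], JoinPlan] ++= idToJoinPlanSeq

    println(direct == appended) // both preserve insertion order
  }
}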

Contributor Author

Got it ~ Addressed in 10d4953.


/** Map[set of item ids, join plan for these items] */
type JoinPlanMap = Map[Set[Int], JoinPlan]
type JoinPlanMap = mutable.LinkedHashMap[Set[Int], JoinPlan]
Member

I think this is OK, though it's not strictly necessary to make the type def this specific

Member

Change to use LinkedHashMap instead of Map to store foundPlans in JoinReorderDP.search method to ensure same iteration order with same insert order because iteration order of Map behave differently under Scala 2.12 and 2.13

Can we make it a separate PR? The plan changes need more review.

Contributor Author

Do you mean to use Map[Set[Int], JoinPlan], or to not define the JoinPlanMap type at all?

Contributor Author

@gatorsmile Should I use a separate JIRA number?

Member

Same JIRA I think, just a separate PR. I'm not sure it matters that much though, it's a tiny part of the change really? we just need more eyes on the plan changes.

Contributor Author

Same JIRA I think, just a separate PR. I'm not sure it matters that much though, it's a tiny part of the change really? we just need more eyes on the plan changes.

It seems there is no way to split this into two PRs, because if we only change CostBasedJoinReorder, the test cases in the sql/core module will fail.

Contributor Author

@LuciferYang LuciferYang Sep 14, 2020

So all expected plan changes are due to the use of LinkedHashMap instead of Map.

@SparkQA

SparkQA commented Sep 10, 2020

Test build #128507 has finished for PR 29711 at commit 8a0bb43.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LuciferYang
Contributor Author

LuciferYang commented Sep 10, 2020

@gatorsmile @srowen @cloud-fan I have already made this a separate PR: SPARK-32848, #29717

@LuciferYang LuciferYang reopened this Sep 10, 2020
@SparkQA

SparkQA commented Sep 10, 2020

Test build #128530 has finished for PR 29711 at commit 10d4953.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

: : : +- * Filter (8)
: : : +- * ColumnarToRow (7)
: : : +- Scan parquet default.store_sales (6)
: : : +- * BroadcastHashJoin Inner BuildRight (9)
Member

I'm skimming the changes, and this seems non-trivial? but then again I am not so sure how to read these plans expertly enough to evaluate. The plan below starts by scanning a different table, for example.

We may just have to accept changes like this, if they're equivalent, to make sure they do not depend on implementation details of hash maps. But @gatorsmile @cloud-fan et al do these look like plausible equivalent plans for example?

Exchange [ss_customer_sk] #2
WholeStageCodegen (4)
Project [i_brand_id,i_brand,i_manufact_id,i_manufact,ss_customer_sk,ss_ext_sales_price,s_zip]
Project [ss_customer_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact,s_zip]
Member

Would this kind of thing possibly affect the order of columns from a select * or is that accounted for elsewhere?

Contributor Author

val newJoin = Join(left, right, Inner, joinConds.reduceOption(And), JoinHint.NONE)
val collectedJoinConds = joinConds ++ oneJoinPlan.joinConds ++ otherJoinPlan.joinConds
val remainingConds = conditions -- collectedJoinConds
val neededAttr = AttributeSet(remainingConds.flatMap(_.references)) ++ topOutput
val neededFromNewJoin = newJoin.output.filter(neededAttr.contains)
val newPlan =
  if ((newJoin.outputSet -- neededFromNewJoin).nonEmpty) {
    Project(neededFromNewJoin, newJoin)
  } else {
    newJoin
  }

The Project node is added by the code above in the JoinReorderDP#buildJoin method; I think the Project output order is decided by the join order.
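
A toy illustration (plain Scala, not Spark code) of that point, using column names taken from the plan snippet above: a join's output is its left child's output followed by its right child's, so swapping the join order reorders the columns that the pruning Project keeps, without changing which columns survive.

object ProjectOrderSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical outputs of the two join children, named after the quoted plan.
    val itemSide = Seq("i_brand_id", "i_brand", "i_manufact_id", "i_manufact")
    val salesSide = Seq("ss_customer_sk", "ss_ext_sales_price", "s_zip")

    // "A join B" vs "B join A": same columns, different order.
    val aJoinB = itemSide ++ salesSide
    val bJoinA = salesSide ++ itemSide

    // The pruning Project keeps whichever order the join produced, which is
    // why the golden-file Project nodes list the same columns in different
    // orders while the plans stay semantically equivalent.
    val needed = (itemSide ++ salesSide).toSet
    println(aJoinB.filter(needed.contains).mkString(", "))
    println(bJoinA.filter(needed.contains).mkString(", "))
  }
}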

Contributor

@cloud-fan cloud-fan left a comment

The change is safe, as it just switches to LinkedHashMap. I looked at a few plan changes and they are indeed no-op, e.g. change from "A join B" to "B join A" without changing the build side.

@LuciferYang
Contributor Author

@srowen @gatorsmile Is there any other problem in this PR that needs to be fixed? It seems that @cloud-fan thinks the change is safe.

@srowen
Member

srowen commented Sep 17, 2020

I'll merge this tomorrow if there are no more objections.

@srowen
Member

srowen commented Sep 18, 2020

Merged to master

@srowen srowen closed this in 2128c4f Sep 18, 2020
@LuciferYang
Contributor Author

thx~ @srowen @cloud-fan @gatorsmile

@LuciferYang LuciferYang deleted the SPARK-32808-3 branch June 6, 2022 03:44