
Conversation

@LuciferYang (Contributor) commented Sep 7, 2020

What changes were proposed in this pull request?

The purpose of this PR is to partially resolve SPARK-32808. A total of 26 failed test cases are fixed; the related suites are as follows:

  • StreamingAggregationSuite related test cases (2 FAILED -> Pass)

  • GeneratorFunctionSuite related test cases (2 FAILED -> Pass)

  • UDFSuite related test cases (2 FAILED -> Pass)

  • SQLQueryTestSuite related test cases (5 FAILED -> Pass)

  • WholeStageCodegenSuite related test cases (1 FAILED -> Pass)

  • DataFrameSuite related test cases (3 FAILED -> Pass)

  • OrcV1QuerySuite / OrcV2QuerySuite related test cases (4 FAILED -> Pass)

  • ExpressionsSchemaSuite related test cases (1 FAILED -> Pass)

  • DataFrameStatSuite related test cases (1 FAILED -> Pass)

  • JsonV1Suite / JsonV2Suite / JsonLegacyTimeParserSuite related test cases (6 FAILED -> Pass)

The main changes in this PR are as follows:

  • Fix Scala 2.13 compilation problems in ShuffleBlockFetcherIterator and Analyzer

  • Specify Seq as scala.collection.Seq in objects.scala and GenericArrayData, because the Seq used internally may be a mutable.ArraySeq on which it is not easy to call .toSeq

  • Specify Seq as scala.collection.Seq when calling Row.getAs[Seq] and Row.get(i).asInstanceOf[Seq], because the data may be a mutable.ArraySeq while Seq means immutable.Seq in Scala 2.13 (see the sketch after this list)

  • Use a compatible approach so that the + and - methods of Decimal behave the same in Scala 2.12 and Scala 2.13

  • Call toList in the RelationalGroupedDataset.toDF method when groupingExprs is a Stream, because Stream cannot be serialized in Scala 2.13

  • Add a manual sort to classFunsMap in ExpressionsSchemaSuite, because Iterable.groupBy in Scala 2.13 returns a different result than TraversableLike.groupBy in Scala 2.12
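
As a minimal sketch of the Row-related change above (my own illustration, not code from this PR; firstElement is a hypothetical helper), callers reading a sequence-valued column should request scala.collection.Seq so the code works whether the backing value is a mutable.ArraySeq (Scala 2.13) or a WrappedArray (Scala 2.12):

```
import org.apache.spark.sql.Row

def firstElement(row: Row): Any = {
  // Ask for the common supertype scala.collection.Seq: under Scala 2.13 the
  // backing value may be a mutable.ArraySeq, which is not an immutable.Seq,
  // so row.getAs[Seq[Any]](0) would throw a ClassCastException instead.
  val values = row.getAs[scala.collection.Seq[Any]](0)
  values.head
}
```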

Why are the changes needed?

We need to support a Scala 2.13 build.

Does this PR introduce any user-facing change?

Yes. Callers should specify Seq as scala.collection.Seq when calling Row.getAs[Seq] or Row.get(i).asInstanceOf[Seq], because the data may be a mutable.ArraySeq while Seq means immutable.Seq in Scala 2.13.

How was this patch tested?

  • Scala 2.12: Pass the Jenkins or GitHub Action

  • Scala 2.13: Do the following:

dev/change-scala-version.sh 2.13
mvn clean install -DskipTests  -pl sql/core -Pscala-2.13 -am
mvn test -pl sql/core -Pscala-2.13

Before

Tests: succeeded 8166, failed 319, canceled 1, ignored 52, pending 0
*** 319 TESTS FAILED ***

After

Tests: succeeded 8204, failed 286, canceled 1, ignored 52, pending 0
*** 286 TESTS FAILED ***

@LuciferYang (Author)

cc @srowen, this PR tries to pass some test cases of the sql/core module in Scala 2.13; the other failures may be related to CostBasedJoinReorder. I will check it and report back later.

- case seq: Seq[Any] => seq.toArray
+ // Specify this as `scala.collection.Seq` because seqOrArray can be
+ // `mutable.ArraySeq` in Scala 2.13
+ case seq: scala.collection.Seq[Any] => seq.toArray
Contributor Author

The entry point is

object ArrayData {
  def toArrayData(input: Any): ArrayData = input match {
    case a: Array[Boolean] => UnsafeArrayData.fromPrimitiveArray(a)
    case a: Array[Byte] => UnsafeArrayData.fromPrimitiveArray(a)
    case a: Array[Short] => UnsafeArrayData.fromPrimitiveArray(a)
    case a: Array[Int] => UnsafeArrayData.fromPrimitiveArray(a)
    case a: Array[Long] => UnsafeArrayData.fromPrimitiveArray(a)
    case a: Array[Float] => UnsafeArrayData.fromPrimitiveArray(a)
    case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
    case other => new GenericArrayData(other)
  }
}

It is not easy to call toSeq here, so I changed it at this point.
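
A small standalone sketch (my own demonstration, not PR code) of why the broader type is needed in the match under Scala 2.13:

```
import scala.collection.mutable

object SeqMatchDemo extends App {
  val input: Any = mutable.ArraySeq(1, 2, 3)

  input match {
    // In Scala 2.13 the default Seq is scala.collection.immutable.Seq,
    // so a mutable.ArraySeq does not match this case.
    case s: Seq[_] => println(s"immutable Seq: $s")
    // scala.collection.Seq covers mutable and immutable sequences alike.
    case s: scala.collection.Seq[_] => println(s"collection.Seq: $s")
  }
  // Under Scala 2.13 this prints: collection.Seq: ArraySeq(1, 2, 3)
}
```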

if (decimalVal.eq(null) && that.decimalVal.eq(null) && scale == that.scale) {
  Decimal(longVal + that.longVal, Math.max(precision, that.precision), scale)
} else {
  Decimal(toBigDecimal + that.toBigDecimal)
}
Contributor Author

In Scala 2.13, the + method is

def +  (that: BigDecimal): BigDecimal = new BigDecimal(this.bigDecimal.add(that.bigDecimal, mc), mc)

and in Scala 2.12 the + method is

def +  (that: BigDecimal): BigDecimal = new BigDecimal(this.bigDecimal add that.bigDecimal, mc)

There are some differences in precision: the 2.13 version passes the MathContext to java.math.BigDecimal.add, so the result may be rounded, while the 2.12 version computes the exact sum.
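
To make the difference concrete, here is a small sketch (my own example, not from the PR) using java.math.BigDecimal directly; MathContext(4) stands in for the mc carried by the Scala wrapper:

```
import java.math.{BigDecimal => JBigDecimal, MathContext}

object BigDecimalAddDemo extends App {
  val a = new JBigDecimal("1.2345")
  val b = new JBigDecimal("6.7891")

  // Exact sum, like Scala 2.12's + (no MathContext passed to add):
  println(a.add(b))                     // 8.0236
  // Sum rounded to 4 significant digits, like Scala 2.13's +:
  println(a.add(b, new MathContext(4))) // 8.024
}
```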

Member

I don't think we want to set a MathContext here anyway?

Contributor Author

Do you mean we should change to use the methods that take a MathContext, like BigDecimal.add(BigDecimal augend, MathContext mc)?

Contributor Author

Sorry, I don't think I fully understand this comment...

Member

I think the change is OK here, because we actually do not want to modify the rounding, right?

Contributor Author

Yes, you are right ~

@LuciferYang (Author)

add executors default profile *** FAILED *** (82 milliseconds) failed in GitHub Actions, but passed in a local test.

@SparkQA commented Sep 7, 2020

Test build #128352 has finished for PR 29660 at commit 1fa24b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LuciferYang (Author)

> cc @srowen, this PR tries to pass some test cases of the sql/core module in Scala 2.13; the other failures may be related to CostBasedJoinReorder. I will check it and report back later.

There are other reasons. I'm working on it.


@srowen (Member) commented Sep 7, 2020

Do you want to add more changes here? We can merge it whenever it gets big and continue in another PR if desired.

@LuciferYang (Author)

@srowen Maybe we can merge this first; the other failures are related to PlanStabilitySuite, and I will continue to fix them in another PR.

@LuciferYang (Author)

Commit 454b53c merges upstream master and resolves the conflicted file.

@SparkQA commented Sep 8, 2020

Test build #128381 has finished for PR 29660 at commit 454b53c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ExecutorDecommissionInfo(message: String, workerHost: Option[String] = None)
  • throw new AnalysisException(s"Can not load class '$className' when registering " +

@srowen (Member) commented Sep 8, 2020

Jenkins retest this please

@SparkQA commented Sep 8, 2020

Test build #128409 has finished for PR 29660 at commit 454b53c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ExecutorDecommissionInfo(message: String, workerHost: Option[String] = None)
  • throw new AnalysisException(s"Can not load class '$className' when registering " +

@LuciferYang (Author) commented Sep 9, 2020

Local test of mvn clean test -pl core -DwildcardSuites=org.apache.spark.scheduler.BarrierTaskContextSuite -Dtest=none: all tests passed.

Discovery starting.
Discovery completed in 3 seconds, 740 milliseconds.
Run starting. Expected test count is: 11
BarrierTaskContextSuite:
- global sync by barrier() call
- share messages with allGather() call
- throw exception if we attempt to synchronize with different blocking calls
- successively sync with allGather and barrier
- support multiple barrier() call within a single task
- throw exception on barrier() call timeout
- throw exception if barrier() call doesn't happen on every task
- throw exception if the number of barrier() calls are not the same on every task
- barrier task killed, no interrupt
- barrier task killed, interrupt
- SPARK-31485: barrier stage should fail if only partial tasks are launched
Run completed in 4 minutes, 51 seconds.
Total number of tests run: 11
Suites: completed 2, aborted 0
Tests: succeeded 11, failed 0, canceled 0, ignored 0, pending 0

@LuciferYang (Author) commented Sep 9, 2020

Commit 9185a95 re-syncs master; I ran the core module tests locally and all passed.

@SparkQA commented Sep 9, 2020

Test build #128430 has finished for PR 29660 at commit 9185a95.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LuciferYang (Author) commented Sep 9, 2020

org.apache.spark.sql.hive.thriftserver.CliSuite.* failed because "Database clitestdb already exists".

@LuciferYang (Author)

Local test of org.apache.spark.sql.hive.thriftserver.CliSuite: all 28 cases succeeded.

@xuanyuanking (Member)

retest this please

@SparkQA commented Sep 9, 2020

Test build #128446 has finished for PR 29660 at commit 9185a95.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@LuciferYang (Author)

@srowen Jenkins and GitHub Actions all passed ~

@srowen closed this in 513d51a Sep 9, 2020

@srowen (Member) commented Sep 9, 2020

Merged to master. I have left the JIRA open though.

@LuciferYang (Author)

thx @srowen @xuanyuanking

srowen pushed a commit that referenced this pull request Sep 18, 2020
### What changes were proposed in this pull request?

After #29660 and #29689 there are 13 remaining failed cases in the sql/core module with Scala 2.13.

The reason for the remaining failures is that the optimization result of `CostBasedJoinReorder` may differ for the same input in Scala 2.12 and Scala 2.13 when there is more than one candidate plan with the same cost.

This PR makes the optimization result as deterministic as possible in order to pass all remaining failed cases of the `sql/core` module in Scala 2.13. The main changes are as follows:

- Use `LinkedHashMap` instead of `Map` to store `foundPlans` in the `JoinReorderDP.search` method, so that the iteration order follows the insertion order, because the iteration order of `Map` behaves differently under Scala 2.12 and 2.13 (see the sketch after this list)

- Fix `StarJoinCostBasedReorderSuite`, which is affected by the above change

- Regenerate golden files affected by the above change.
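
As a toy illustration of the iteration-order point above (my own sketch, not the Spark code):

```
import scala.collection.mutable

object MapOrderDemo extends App {
  val entries = Seq("planA" -> 1, "planB" -> 2, "planC" -> 3)

  // LinkedHashMap iterates in insertion order on both 2.12 and 2.13 ...
  val linked = mutable.LinkedHashMap(entries: _*)
  println(linked.keys.mkString(", ")) // always: planA, planB, planC

  // ... while HashMap's iteration order is an implementation detail that
  // changed between Scala 2.12 and 2.13, so equal-cost plans could be
  // visited, and therefore chosen, in a different order.
  val hashed = mutable.HashMap(entries: _*)
  println(hashed.keys.mkString(", ")) // order not guaranteed
}
```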

### Why are the changes needed?
We need to support a Scala 2.13 build.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Scala 2.12: Pass the Jenkins or GitHub Action

- Scala 2.13: All tests passed.

Do the following:

```
dev/change-scala-version.sh 2.13
mvn clean install -DskipTests  -pl sql/core -Pscala-2.13 -am
mvn test -pl sql/core -Pscala-2.13
```

**Before**
```
Tests: succeeded 8485, failed 13, canceled 1, ignored 52, pending 0
*** 13 TESTS FAILED ***

```

**After**

```
Tests: succeeded 8498, failed 0, canceled 1, ignored 52, pending 0
All tests passed.
```

Closes #29711 from LuciferYang/SPARK-32808-3.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
a0x8o added a commit to a0x8o/spark that referenced this pull request Sep 18, 2020
@LuciferYang deleted the SPARK-32808 branch June 6, 2022 03:44