Skip to content

Commit 6c05459

Browse files
LuciferYangdongjoon-hyun
authored andcommitted
[SPARK-35838][BUILD][TESTS] Ensure all modules can be maven test independently in Scala 2.13
### What changes were proposed in this pull request? Similar to SPARK-35532, the main change of this pr is add `scala-2.13` profile to external/kafka-0-10-sql/pom.xml, external/avro/pom.xml and sql/hive-thriftserver/pom.xml, the `scala-2.13` profile include dependency on `scala-parallel-collections_2.13`, then all(34) spark modules can maven test independently. ### Why are the changes needed? Ensure alll(34) spark modules can be maven test independently in Scala 2.13 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass the GitHub Action Scala 2.13 job - Manual test: 1. Execute ``` dev/change-scala-version.sh 2.13 mvn clean install -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 ``` 2. maven test `external/kafka-0-10-sql` module ``` mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl external/kafka-0-10-sql ``` **before** ``` Discovery starting. Discovery completed in 857 milliseconds. Run starting. Expected test count is: 464 ... KafkaRelationSuiteV2: - explicit earliest to latest offsets - default starting and ending offsets - explicit offsets - default starting and ending offsets with headers - timestamp provided for starting and ending - timestamp provided for starting, offset provided for ending - timestamp provided for ending, offset provided for starting - timestamp provided for starting, ending not provided - timestamp provided for ending, starting not provided - global timestamp provided for starting and ending - no matched offset for timestamp - startingOffsets - preferences on offset related options - no matched offset for timestamp - endingOffsets *** RUN ABORTED *** java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1411) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.SparkContext.withScope(SparkContext.scala:788) at org.apache.spark.SparkContext.union(SparkContext.scala:1405) at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:697) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:182) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:220) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:217) ... Cause: java.lang.ClassNotFoundException: scala.collection.parallel.TaskSupport at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1411) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.SparkContext.withScope(SparkContext.scala:788) at org.apache.spark.SparkContext.union(SparkContext.scala:1405) at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:697) ... ``` **After** ``` Run completed in 33 minutes, 51 seconds. Total number of tests run: 464 Suites: completed 31, aborted 0 Tests: succeeded 464, failed 0, canceled 0, ignored 0, pending 0 All tests passed. ``` 3. maven test `external/avro` module ``` mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl external/avro ``` **before** ``` Discovery starting. Discovery completed in 2 seconds, 765 milliseconds. Run starting. Expected test count is: 255 AvroReadSchemaSuite: - append column at the end - hide column at the end - append column into middle - hide column in the middle - add a nested column at the end of the leaf struct column - add a nested column in the middle of the leaf struct column - add a nested column at the end of the middle struct column - add a nested column in the middle of the middle struct column - hide a nested column at the end of the leaf struct column - hide a nested column in the middle of the leaf struct column - hide a nested column at the end of the middle struct column - hide a nested column in the middle of the middle struct column *** RUN ABORTED *** java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1411) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.SparkContext.withScope(SparkContext.scala:788) at org.apache.spark.SparkContext.union(SparkContext.scala:1405) at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:697) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:182) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:220) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:217) ... Cause: java.lang.ClassNotFoundException: scala.collection.parallel.TaskSupport at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1411) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.SparkContext.withScope(SparkContext.scala:788) at org.apache.spark.SparkContext.union(SparkContext.scala:1405) at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:697) ... ``` **After** ``` Run completed in 1 minute, 42 seconds. Total number of tests run: 255 Suites: completed 12, aborted 0 Tests: succeeded 255, failed 0, canceled 0, ignored 2, pending 0 All tests passed. ``` 4. maven test `sql/hive-thriftserver` module ``` mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl sql/hive-thriftserver ``` **before** ``` - union.sql *** FAILED *** "1 a 1 a 2 b 2 b" did not contain "Exception" Exception did not match for query #2 SELECT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t1), expected: 1 a 1 a 2 b 2 b, but got: java.sql.SQLException org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport at org.apache.spark.sql.hive.thriftserver.HiveThriftServerErrors$.runningQueryError(HiveThriftServerErrors.scala:38) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:324) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:229) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:79) at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:63) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:229) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:224) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:238) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport at org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1411) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.SparkContext.withScope(SparkContext.scala:788) at org.apache.spark.SparkContext.union(SparkContext.scala:1405) at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:697) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:182) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:220) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:217) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:178) at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:323) at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:389) at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3719) at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2987) at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3710) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:774) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3708) at org.apache.spark.sql.Dataset.collect(Dataset.scala:2987) at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:299) ... 16 more Caused by: java.lang.ClassNotFoundException: scala.collection.parallel.TaskSupport at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ... 40 more (ThriftServerQueryTestSuite.scala:209) ``` **After** ``` Run completed in 29 minutes, 17 seconds. Total number of tests run: 535 Suites: completed 20, aborted 0 Tests: succeeded 535, failed 0, canceled 0, ignored 17, pending 0 All tests passed. ``` Closes apache#32994 from LuciferYang/SPARK-35838. Authored-by: YangJie <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent 5a510cf commit 6c05459

File tree

3 files changed

+33
-1
lines changed

3 files changed

+33
-1
lines changed

external/avro/pom.xml

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -75,6 +75,17 @@
7575
<artifactId>xz</artifactId>
7676
</dependency>
7777
</dependencies>
78+
<profiles>
79+
<profile>
80+
<id>scala-2.13</id>
81+
<dependencies>
82+
<dependency>
83+
<groupId>org.scala-lang.modules</groupId>
84+
<artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
85+
</dependency>
86+
</dependencies>
87+
</profile>
88+
</profiles>
7889
<build>
7990
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
8091
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>

external/kafka-0-10-sql/pom.xml

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -169,7 +169,17 @@
169169
</dependency>
170170

171171
</dependencies>
172-
172+
<profiles>
173+
<profile>
174+
<id>scala-2.13</id>
175+
<dependencies>
176+
<dependency>
177+
<groupId>org.scala-lang.modules</groupId>
178+
<artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
179+
</dependency>
180+
</dependencies>
181+
</profile>
182+
</profiles>
173183
<build>
174184
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
175185
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>

sql/hive-thriftserver/pom.xml

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -138,6 +138,17 @@
138138
<artifactId>commons-cli</artifactId>
139139
</dependency>
140140
</dependencies>
141+
<profiles>
142+
<profile>
143+
<id>scala-2.13</id>
144+
<dependencies>
145+
<dependency>
146+
<groupId>org.scala-lang.modules</groupId>
147+
<artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
148+
</dependency>
149+
</dependencies>
150+
</profile>
151+
</profiles>
141152
<build>
142153
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
143154
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>

0 commit comments

Comments
 (0)