
Conversation

Contributor
@vicennial vicennial commented Jun 29, 2023

What changes were proposed in this pull request?

This PR is a follow-up of #41701 and addresses the comments mentioned here. The summary is:

  • pythonIncludes is now fetched directly from the ArtifactManager via SessionHolder instead of being propagated through the Spark conf.
  • SessionHolder#withContext is renamed to SessionHolder#withContextClassLoader to reduce ambiguity.
  • Test coverage for isolated classloading is increased (a new unit test in ArtifactManagerSuite and a new suite, ClassLoaderIsolationSuite).
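The rename in the second bullet points at a common pattern: run a block of code with a session-specific classloader installed as the thread's context classloader, then restore the previous one. A minimal, hypothetical sketch of that pattern (`ContextClassLoaderSketch` is a stand-in, not the real `SessionHolder`):

```scala
// Hypothetical miniature of the withContextClassLoader pattern: swap in the
// given loader for the duration of the block, restoring the old one even if
// the block throws.
object ContextClassLoaderSketch {
  def withContextClassLoader[T](loader: ClassLoader)(block: => T): T = {
    val previous = Thread.currentThread().getContextClassLoader
    Thread.currentThread().setContextClassLoader(loader)
    try block
    finally Thread.currentThread().setContextClassLoader(previous)
  }
}
```

The `try`/`finally` ensures the thread never leaks a session's classloader to unrelated work, which is the point of scoping isolation this way.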

Why are the changes needed?

General follow-ups from here.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New test suite and unit tests.

@vicennial vicennial changed the title [WIP [WIP][CONNECT][SPARK-44246] Follow-ups for Spark Connect Jar/Classfile Isolation Jun 29, 2023
@vicennial vicennial changed the title [WIP][CONNECT][SPARK-44246] Follow-ups for Spark Connect Jar/Classfile Isolation [CONNECT][SPARK-44246] Follow-ups for Spark Connect Jar/Classfile Isolation Jun 30, 2023
@vicennial vicennial marked this pull request as ready for review June 30, 2023 09:03
  // Empty environment variables
  envVars = Maps.newHashMap(),
- pythonIncludes = pythonIncludes,
+ pythonIncludes = sessionHolder.artifactManager.getSparkConnectPythonIncludes.asJava,
Contributor Author
@HyukjinKwon With directly fetching the Python Includes, I've removed the private lazy val pythonIncludes and SessionHolder#withSessionBasedPythonPaths since we don't need to propagate them via confs anymore
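This simplification can be sketched in miniature: the Python includes are read straight off the session's artifact manager instead of being round-tripped through a conf entry. The class and method names below mirror the PR but are simplified stand-ins, not the real Spark Connect API:

```scala
// Hypothetical stand-ins for SessionHolder/ArtifactManager, showing the shape
// of the call site after the change.
import scala.collection.JavaConverters._

class ArtifactManagerSketch(includes: Seq[String]) {
  // In the real code this reflects artifacts added to the session.
  def getSparkConnectPythonIncludes: Seq[String] = includes
}

class SessionHolderSketch(val artifactManager: ArtifactManagerSketch)

val sessionHolder = new SessionHolderSketch(
  new ArtifactManagerSketch(Seq("deps.zip", "helpers.egg")))

// What the worker-factory call site now does, in spirit: fetch directly and
// convert to a Java list for the Python worker.
val pythonIncludes =
  sessionHolder.artifactManager.getSparkConnectPythonIncludes.asJava
```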

Member
Gotcha. This is exactly the change I planned to make this week :-). Thank you for doing this here.

@HyukjinKwon HyukjinKwon changed the title [CONNECT][SPARK-44246] Follow-ups for Spark Connect Jar/Classfile Isolation [SPARK-44246][CONNECT] Follow-ups for Spark Connect Jar/Classfile Isolation Jul 3, 2023
@HyukjinKwon HyukjinKwon changed the title [SPARK-44246][CONNECT] Follow-ups for Spark Connect Jar/Classfile Isolation [SPARK-44246][CONNECT][FOLLOW-UP] Miscellaneous cleanups for Spark Connect Jar/Classfile Isolation Jul 3, 2023
@HyukjinKwon
Member

Merged to master.

import org.apache.spark.{JobArtifactSet, LocalSparkContext, SparkConf, SparkContext, SparkFunSuite}
import org.apache.spark.util.Utils

class ClassLoaderIsolationSuite extends SparkFunSuite with LocalSparkContext {
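The mechanism this suite exercises can be shown in miniature, without Spark: a `URLClassLoader` created with a `null` parent delegates only to the bootstrap classloader, so classes on the application classpath (here, the Scala library) stay invisible to it. This is what keeps per-session jars from leaking into one another. A small, self-contained demonstration (this is an illustrative sketch, not code from the suite):

```scala
import java.net.{URL, URLClassLoader}

// An isolated loader: no URLs of its own, and a null parent, meaning only
// bootstrap classes are reachable through delegation.
val isolated = new URLClassLoader(Array.empty[URL], null)

// Bootstrap classes such as java.lang.String still resolve.
val stringClass = Class.forName("java.lang.String", false, isolated)

// A class from the application classpath does not resolve through it.
val scalaVisible =
  try { Class.forName("scala.Option", false, isolated); true }
  catch { case _: ClassNotFoundException => false }
```

In the real suite the isolated loaders are additionally given session-specific jar URLs, so each session resolves its own version of a class.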
Contributor

@LuciferYang LuciferYang Jul 4, 2023


Perhaps we should also upload the TestHelloV2 and TestHelloV3 jars compiled with Scala 2.13 at the same time?

On the other hand, I think we should provide the source code of `TestHelloV2` and `TestHelloV3` (such as in comments?). Otherwise, when upgrading to a new major version of Scala (for example, Scala 3.x), we will not know how to generate the new jar packages for testing.

Sorry, I found the source code in comments.

Contributor

@LuciferYang LuciferYang Jul 4, 2023


Trying a fix: #41852

Member

Thanks @LuciferYang

LuciferYang added a commit that referenced this pull request Jul 4, 2023
… with Scala 2.13

### What changes were proposed in this pull request?
The main changes of this PR are as follows:

1. Rename `TestHelloV2.jar` and `TestHelloV3.jar`, added in #41789, to `TestHelloV2_2.12.jar` and `TestHelloV3_2.12.jar`.
2. Add the corresponding `TestHelloV2_2.13.jar` and `TestHelloV3_2.13.jar`, compiled with Scala 2.13.
3. Make `ClassLoaderIsolationSuite` use the correct jar in testing.
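Point 3 implies selecting the jar whose name matches the running Scala binary version. A hypothetical sketch of how that selection might look (`helloV2Jar`/`helloV3Jar` are illustrative names, not necessarily what the suite uses):

```scala
// Derive the Scala binary version (e.g. "2.12" or "2.13") from the full
// runtime version string, then build the matching test-jar file names.
val binaryVersion = scala.util.Properties.versionNumberString
  .split('.')
  .take(2)
  .mkString(".")

val helloV2Jar = s"TestHelloV2_$binaryVersion.jar"
val helloV3Jar = s"TestHelloV3_$binaryVersion.jar"
```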

### Why are the changes needed?
Make the `ClassLoaderIsolationSuite` test pass with Scala 2.13. The test jars were compiled with Scala 2.12 and reference `scala/Serializable`, which in Scala 2.13 is only a deprecated type alias for `java.io.Serializable` rather than a real class, hence the `NoClassDefFoundError` below.

The Scala 2.13 daily test failed after #41789
- https://github.com/apache/spark/actions/runs/5447771717/jobs/9910185372

```
[info] - Executor classloader isolation with JobArtifactSet *** FAILED *** (83 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (localhost executor driver): java.lang.NoClassDefFoundError: scala/Serializable
[info] 	at java.lang.ClassLoader.defineClass1(Native Method)
[info] 	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
[info] 	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
[info] 	at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
[info] 	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
[info] 	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
[info] 	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
[info] 	at java.security.AccessController.doPrivileged(Native Method)
[info] 	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
[info] 	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
[info] 	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
[info] 	at java.lang.Class.forName0(Native Method)
[info] 	at java.lang.Class.forName(Class.java:348)
[info] 	at org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:35)
[info] 	at org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:30)
[info] 	at org.apache.spark.util.Utils$.classForName(Utils.scala:94)
[info] 	at org.apache.spark.executor.ClassLoaderIsolationSuite.$anonfun$new$3(ClassLoaderIsolationSuite.scala:53)
[info] 	at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18)
[info] 	at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
[info] 	at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
[info] 	at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
[info] 	at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1028)
[info] 	at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1028)
[info] 	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2406)
[info] 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
[info] 	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
[info] 	at org.apache.spark.scheduler.Task.run(Task.scala:141)
[info] 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:593)
[info] 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1478)
[info] 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:596)
[info] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info] 	at java.lang.Thread.run(Thread.java:750)
[info] Caused by: java.lang.ClassNotFoundException: scala.Serializable
[info] 	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
[info] 	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
[info] 	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
[info] 	... 33 more
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manual check:

**Scala 2.12**
```
build/sbt "core/testOnly *ClassLoaderIsolationSuite"
```

```
[info] ClassLoaderIsolationSuite:
[info] - Executor classloader isolation with JobArtifactSet (1 second, 394 milliseconds)
[info] Run completed in 2 seconds, 437 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

**Scala 2.13**

```
dev/change-scala-version.sh 2.13
build/sbt "core/testOnly *ClassLoaderIsolationSuite" -Pscala-2.13
```

```
[info] ClassLoaderIsolationSuite:
[info] - Executor classloader isolation with JobArtifactSet (1 second, 355 milliseconds)
[info] Run completed in 2 seconds, 264 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

Closes #41852 from LuciferYang/SPARK-44297.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
LuciferYang added a commit that referenced this pull request Sep 26, 2023
…ry files

### What changes were proposed in this pull request?
The purpose of this PR is to clean up the binary files used to assist with Scala 2.12 testing.

They include:
- `core/src/test/resources/TestHelloV3_2.12.jar` and `core/src/test/resources/TestHelloV2_2.12.jar` added by SPARK-44246(#41789).
- `connector/connect/client/jvm/src/test/resources/udf2.12` and `connector/connect/client/jvm/src/test/resources/udf2.12.jar` added by SPARK-43744(#42069)
- `connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar` added by SPARK-44293(#41844)
- `sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar` added by SPARK-25304(#22308)

### Why are the changes needed?
Spark 4.0 no longer supports Scala 2.12.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43106 from LuciferYang/SPARK-45321.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
