

@hvanhovell
Contributor

What changes were proposed in this pull request?

This PR makes a bunch of changes to connect testing for the scala client:

  • We do not start the connect server with the SPARK_DIST_CLASSPATH environment variable. This is set by the build system, but its value for SBT and Maven is different. For SBT it also contained the client code.
  • We use dependency upload to add the dependencies needed for the tests. Currently this entails: the compiled test classes (class files), scalatest jars, and scalactic jars.
  • The use of classfile sync unearthed an issue with stubbing and the ExecutorClassLoader. If both load classes in the same namespace, stubbing will generate stubs for classes that the ExecutorClassLoader could have loaded. Since this is mostly a testing issue, I decided to move the test code to a different namespace. We should definitely fix this later on.
  • A bunch of tiny fixes.
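To make the namespace issue in the third bullet concrete, here is a toy model in Scala. It is a sketch under an assumption, not Spark's code: it assumes that a class whose name matches a configured stub prefix gets stubbed before the ExecutorClassLoader is consulted. `StubbingModel`, `resolve`, and the `org.example` package names are hypothetical.

```scala
object StubbingModel {
  // Hypothetical outcome of resolving a class name.
  sealed trait Resolution
  final case class Real(loader: String) extends Resolution
  case object Stub extends Resolution
  case object NotFound extends Resolution

  /** Toy lookup: names matching a stub prefix are stubbed first (an assumption
   *  made for illustration); otherwise the executor loader is tried. */
  def resolve(
      className: String,
      stubPrefixes: Seq[String],
      executorLoadable: Set[String]): Resolution =
    if (stubPrefixes.exists(p => className.startsWith(p))) Stub
    else if (executorLoadable.contains(className)) Real("executor")
    else NotFound
}
```

In this model the same class is stubbed when it lives under the stubbed prefix and loaded for real once it moves to another package, which mirrors the workaround of moving the test code rather than fixing the underlying ordering.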

Why are the changes needed?

SBT testing for connect leaked client-side code into the server. This is a problem because tests pass and we sign off on features that do not work well in a normal environment. Stubbing was an example of this. Maven did not have this problem and was therefore more correct.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The changes are mostly tests.

Was this patch authored or co-authored using generative AI tooling?

No. I write my own code, thank you...

f.foreach(v => sum.addAndGet(v))
// The value should be 45
assert(sum.get() == -1)
throw new Exception("Success, processed records: " + sum.get())
Contributor Author


This is a fun one. This was throwing a scalatest-specific exception. The problem is that this error is decoded by the ResultHandler in Spark Core, which does not have a class loader on its thread that can see the scalatest classes, so things fail with a ClassNotFoundException instead of the expected exception.
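The failure mode can be reproduced in isolation with plain Java serialization. The sketch below approximates the situation rather than reproducing what the ResultHandler does: it simulates a decoder that cannot see a test-framework class by overriding `ObjectInputStream.resolveClass`. `ExceptionDecoding` and `FrameworkSpecificException` are hypothetical stand-ins; a plain `java.lang.Exception`, like the one the test now throws, decodes fine.

```scala
import java.io._

object ExceptionDecoding {
  // Hypothetical stand-in for a test-framework-specific exception type.
  class FrameworkSpecificException(msg: String) extends Exception(msg)

  def serialize(t: Throwable): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(t) finally out.close()
    bytes.toByteArray
  }

  /** Deserializes while treating any class whose name contains `hidden` as
   *  invisible, approximating a decoder that lacks the right class loader. */
  def deserialize(data: Array[Byte], hidden: String): Throwable = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data)) {
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        if (desc.getName.contains(hidden)) throw new ClassNotFoundException(desc.getName)
        else super.resolveClass(desc)
    }
    try in.readObject().asInstanceOf[Throwable] finally in.close()
  }
}
```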

Utils.getContextOrSparkClassLoader)
val prefixes = SparkEnv.get.conf.get(CONNECT_SCALA_UDF_STUB_PREFIXES)
val userClasspathFirst = SparkEnv.get.conf.get(EXECUTOR_USER_CLASS_PATH_FIRST)
val loader = if (prefixes.nonEmpty) {
Contributor Author

@hvanhovell hvanhovell Aug 21, 2023


For the record, this fixes nothing. It just makes sure we respect the userClasspathFirst setting.
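`EXECUTOR_USER_CLASS_PATH_FIRST` corresponds to `spark.executor.userClassPathFirst`, which gives user-added jars precedence over Spark's own jars when loading classes. A toy model of what respecting it means for lookup order follows; `LookupOrder`, `Loader`, and the class names are hypothetical, and real class loaders delegate in more involved ways.

```scala
object LookupOrder {
  // Hypothetical model: a "loader" is a name plus the class names it can define.
  final case class Loader(name: String, classes: Set[String])

  /** Returns the name of the loader that would supply `className`, consulting
   *  the user loader first only when `userClasspathFirst` is set. */
  def resolve(
      className: String,
      user: Loader,
      parent: Loader,
      userClasspathFirst: Boolean): Option[String] = {
    val order = if (userClasspathFirst) Seq(user, parent) else Seq(parent, user)
    order.find(_.classes.contains(className)).map(_.name)
  }
}
```

The flag only matters for classes that both sides can define; a class only one side knows about is found either way.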

val builder = new ProcessBuilder(command.result(): _*)
builder.directory(new File(sparkHome))
val environment = builder.environment()
environment.remove("SPARK_DIST_CLASSPATH")
Contributor Author


This is the important bit, here we nuke the SPARK_DIST_CLASSPATH env variable.
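The same idea in a self-contained form: `ProcessBuilder.environment()` returns a mutable copy of the current process environment, so removing a key there only affects the child process. `ServerLauncher` and `builderWithoutDistClasspath` are hypothetical names; the sketch builds the process but does not start it.

```scala
import java.io.File

object ServerLauncher {
  /** Builds (but does not start) a process whose environment no longer carries
   *  SPARK_DIST_CLASSPATH, so the child cannot inherit build-tool classpath entries. */
  def builderWithoutDistClasspath(sparkHome: String, command: Seq[String]): ProcessBuilder = {
    val builder = new ProcessBuilder(command: _*)
    builder.directory(new File(sparkHome))
    // environment() is a copy of this JVM's environment; edits only affect the child.
    builder.environment().remove("SPARK_DIST_CLASSPATH")
    builder
  }
}
```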

class StubbingTestSuite extends RemoteSparkSession {
private def eval[T](f: => T): T = f

test("capture of to-be stubbed class") {
Contributor Author


This was a working reproduction for the issue. Now that SparkResult has moved to common, I guess we need to find a new problem...

Contributor

@LuciferYang LuciferYang Aug 22, 2023


Yeah, this test case can't reproduce the problem. The new test requires a class like SparkResult.

Contributor


I manually reverted the changes from SPARK-44806 on top of this PR and verified that the test cases in SparkSessionE2ESuite pass with Maven. However, it would be best if we could find a new target case.

Contributor Author


I revamped the test.

Contributor Author


I can confirm it is working with Maven, and that it triggers stubbing on the server side.


val command = Seq.newBuilder[String]
command += "bin/spark-submit"
command += "--driver-class-path" += connectJar
Contributor Author


This is needed to make the test work with Maven. The assembly might produce slightly different results.
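`--driver-class-path` is a standard spark-submit option that adds extra entries to the driver's class path. A runnable version of the command-building fragment above might look like this; `SubmitCommand`, `build`, and `extraArgs` are hypothetical, and the sketch only assembles the argument list.

```scala
object SubmitCommand {
  /** Assembles a spark-submit invocation that puts `connectJar` on the driver
   *  class path, followed by any further (illustrative) arguments. */
  def build(connectJar: String, extraArgs: Seq[String]): Seq[String] = {
    val command = Seq.newBuilder[String]
    command += "bin/spark-submit"
    // += returns the builder itself, so the flag and its value can be chained.
    command += "--driver-class-path" += connectJar
    command ++= extraArgs
    command.result()
  }
}
```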

.build())
.create()

// Execute an RPC which will get retried until the server is up.
Contributor Author


Retries do not seem to work well for artifact upload, so I am falling back to this one.
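A generic retry-until-available helper in the spirit of that comment. This is a sketch with exponential backoff, not the Spark Connect client's retry policy; `RetryUntilUp` and its parameters are hypothetical.

```scala
import scala.util.{Failure, Success, Try}

object RetryUntilUp {
  /** Calls `rpc` until it succeeds, sleeping `backoffMs` (doubled after each
   *  failure) between attempts; rethrows the last error once attempts run out. */
  @annotation.tailrec
  def retry[T](attemptsLeft: Int, backoffMs: Long)(rpc: () => T): T =
    Try(rpc()) match {
      case Success(value) => value
      case Failure(e) if attemptsLeft <= 1 => throw e
      case Failure(_) =>
        Thread.sleep(backoffMs)
        retry(attemptsLeft - 1, backoffMs * 2)(rpc)
    }
}
```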

@hvanhovell hvanhovell requested a review from LuciferYang August 22, 2023 03:19
@LuciferYang
Contributor

A Java code style issue needs to be fixed.

src/test/java/org/apache/spark/sql/JavaEncoderSuite.java:[31,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.connect.client.SparkConnectClient.

@github-actions github-actions bot added the CORE label Aug 25, 2023
@hvanhovell
Contributor Author

@LuciferYang PTAL

Contributor

@LuciferYang LuciferYang left a comment


LGTM

.getProperty("java.class.path")
.split(File.pathSeparatorChar)
.filter(_.endsWith(".jar"))
.filter(e => e.contains("scalatest") || e.contains("scalactic"))
Contributor


Nit:

/Users/yangjie01/.m2/repository/org/scalatestplus/scalacheck-1-17_2.12/3.2.16.0/scalacheck-1-17_2.12-3.2.16.0.jar
/Users/yangjie01/.m2/repository/org/scalatestplus/mockito-4-11_2.12/3.2.16.0/mockito-4-11_2.12-3.2.16.0.jar
/Users/yangjie01/.m2/repository/org/scalatestplus/selenium-4-9_2.12/3.2.16.0/selenium-4-9_2.12-3.2.16.0.jar

The jars also include scalacheck, mockito and selenium, because their full paths contain scalatestplus. Is this expected?

Contributor Author


LOL - I can make it more restrictive.
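One way to make the filter stricter is to match on the jar's file name instead of the full path, so that entries under `org/scalatestplus/` no longer match. This is a sketch, not necessarily the change made in the PR; it takes the class path as a string (the original reads the `java.class.path` system property) so it can be tested, and `TestJarFilter` is a hypothetical name.

```scala
import java.io.File

object TestJarFilter {
  /** Keeps only scalatest/scalactic jars, matching on the file name rather than
   *  the whole path, so e.g. .../org/scalatestplus/mockito-4-11_2.12/... is skipped. */
  def scalatestJars(classPath: String): Seq[String] =
    classPath
      .split(File.pathSeparatorChar)
      .toSeq
      .filter(_.endsWith(".jar"))
      .filter { entry =>
        val name = new File(entry).getName
        name.startsWith("scalatest") || name.startsWith("scalactic")
      }
}
```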

Contributor

@LuciferYang LuciferYang left a comment


LGTM

LuciferYang pushed a commit that referenced this pull request Aug 27, 2023

Closes #42591 from hvanhovell/investigate-stubbing.

Authored-by: Herman van Hovell
Signed-off-by: yangjie01
(cherry picked from commit 9326615)
Signed-off-by: yangjie01
@LuciferYang
Contributor

GA passed. Merged into master and 3.5. Thanks @hvanhovell

