[SPARK-44784][CONNECT] Make SBT testing hermetic. #42591
| f.foreach(v => sum.addAndGet(v)) | ||
| // The value should be 45 | ||
| assert(sum.get() == -1) | ||
| throw new Exception("Success, processed records: " + sum.get()) |
This is a fun one. The test was throwing a scalatest-specific exception. The problem is that this error is decoded by the ResultHandler in Spark Core, which does not have the thread, so things fail with a ClassNotFoundException instead of the expected exception.
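For readers who want to see this failure mode in isolation, here is a minimal sketch. It is not Spark's ResultHandler code: `DecodeDemo`, `FrameworkOnlyException`, and `decodeWith` are hypothetical names, and the assumption is simply that the decoding side uses a class loader that cannot see the class of the thrown exception.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

object DecodeDemo {
  // Hypothetical stand-in for an exception type that only exists on the test classpath.
  class FrameworkOnlyException(msg: String) extends Exception(msg)

  // Java-serialize a throwable, as a task result carrying an error might be.
  def serialize(t: Throwable): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(t)
    out.close()
    bytes.toByteArray
  }

  // Deserialize while resolving every class through the given loader only.
  def decodeWith(
      loader: ClassLoader,
      data: Array[Byte]): Either[ClassNotFoundException, Throwable] = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data)) {
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, loader)
    }
    try Right(in.readObject().asInstanceOf[Throwable])
    catch { case e: ClassNotFoundException => Left(e) }
    finally in.close()
  }
}
```

With a loader that only sees JDK classes (for example a `URLClassLoader` with no URLs and a `null` parent), decoding a `FrameworkOnlyException` yields the `ClassNotFoundException`, while a plain `java.lang.Exception` decodes fine. That is the reason a plain `Exception` is the safer thing to throw from the test.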
| Utils.getContextOrSparkClassLoader) | ||
| val prefixes = SparkEnv.get.conf.get(CONNECT_SCALA_UDF_STUB_PREFIXES) | ||
| val userClasspathFirst = SparkEnv.get.conf.get(EXECUTOR_USER_CLASS_PATH_FIRST) | ||
| val loader = if (prefixes.nonEmpty) { |
For the record, this fixes nothing. It just makes sure we respect the userClasspathFirst setting.
| val builder = new ProcessBuilder(command.result(): _*) | ||
| builder.directory(new File(sparkHome)) | ||
| val environment = builder.environment() | ||
| environment.remove("SPARK_DIST_CLASSPATH") |
This is the important bit: here we nuke the SPARK_DIST_CLASSPATH env variable.
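As a standalone illustration of the pattern in the snippet above, the following sketch builds a `ProcessBuilder` and drops selected variables from the child's environment before launch. `HermeticLaunch` and `hermeticBuilder` are hypothetical names, not part of this PR; only the `ProcessBuilder` calls mirror the diff.

```scala
import java.io.File

object HermeticLaunch {
  // Returns a builder whose child process will not inherit the variables in `drop`.
  def hermeticBuilder(
      command: Seq[String],
      workDir: File,
      drop: Set[String]): ProcessBuilder = {
    val builder = new ProcessBuilder(command: _*)
    builder.directory(workDir)
    // environment() starts as a copy of the parent's environment and is mutable.
    val environment = builder.environment()
    drop.foreach(name => environment.remove(name))
    builder
  }
}
```

Removing a key from `builder.environment()` only affects the process started from that builder, so the test JVM's own environment is left untouched.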
| class StubbingTestSuite extends RemoteSparkSession { | ||
| private def eval[T](f: => T): T = f | ||
|
|
||
| test("capture of to-be stubbed class") { |
This was a working reproduction for the issue. Now that SparkResult has moved to common, I guess we need to find a new problem...
Yeah, this test case can't reproduce the problem anymore. The new test requires a class like SparkResult.
I manually reverted the changes from SPARK-44806 on top of this PR and verified that the test cases in SparkSessionE2ESuite pass with Maven. However, it would be best if we could find a new target case.
I revamped the test.
I can confirm it is working with Maven, and that it triggers stubbing on the server side.
|
|
||
| val command = Seq.newBuilder[String] | ||
| command += "bin/spark-submit" | ||
| command += "--driver-class-path" += connectJar |
This is needed to make the test work with Maven. Assembly might produce slightly different results.
| .build()) | ||
| .create() | ||
|
|
||
| // Execute an RPC which will get retried until the server is up. |
Retries do not seem to work well for artifact upload, so I am falling back to this one.
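For context, the "retry a cheap RPC until the server is up" idea can be sketched generically as below. This is not the Connect client's retry mechanism: `RetryUntilUp.retry` is a hypothetical helper, and the attempt count and delay are assumed parameters.

```scala
import scala.util.{Failure, Success, Try}

object RetryUntilUp {
  // Calls `rpc` until it succeeds or the attempts run out, sleeping between tries.
  @annotation.tailrec
  def retry[T](attemptsLeft: Int, delayMs: Long)(rpc: () => T): T =
    Try(rpc()) match {
      case Success(value) => value
      // Out of attempts: surface the last failure to the caller.
      case Failure(error) if attemptsLeft <= 1 => throw error
      case Failure(_) =>
        Thread.sleep(delayMs)
        retry(attemptsLeft - 1, delayMs)(rpc)
    }
}
```

The call being retried should be idempotent and cheap, which is presumably why a simple RPC is a better readiness probe than an artifact upload.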
A Java code style issue needs to be fixed.
@LuciferYang PTAL
LuciferYang
left a comment
LGTM
| .getProperty("java.class.path") | ||
| .split(File.pathSeparatorChar) | ||
| .filter(_.endsWith(".jar")) | ||
| .filter(e => e.contains("scalatest") || e.contains("scalactic")) |
Nit:
/Users/yangjie01/.m2/repository/org/scalatestplus/scalacheck-1-17_2.12/3.2.16.0/scalacheck-1-17_2.12-3.2.16.0.jar
/Users/yangjie01/.m2/repository/org/scalatestplus/mockito-4-11_2.12/3.2.16.0/mockito-4-11_2.12-3.2.16.0.jar
/Users/yangjie01/.m2/repository/org/scalatestplus/selenium-4-9_2.12/3.2.16.0/selenium-4-9_2.12-3.2.16.0.jar
The jars also include scalacheck, mockito and selenium because their full paths contain scalatestplus. Is this expected?
LOL - I can make it more restrictive.
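One possible way to make the filter more restrictive is to match on the jar's file name instead of its full path. The sketch below is only an assumption about how that could look, not necessarily what the PR ended up doing; `TestJarFilter`, `isWanted`, and `select` are hypothetical names.

```scala
import java.io.File

object TestJarFilter {
  // Match on the file name only, so a `scalatestplus` directory in the path does not count.
  def isWanted(path: String): Boolean = {
    val name = new File(path).getName
    name.endsWith(".jar") &&
      (name.startsWith("scalatest") || name.startsWith("scalactic"))
  }

  // Split a classpath string and keep only the scalatest and scalactic jars.
  def select(classpath: String): Seq[String] =
    classpath.split(File.pathSeparatorChar).toSeq.filter(isWanted)
}
```

With this check, the three `scalatestplus` jars listed above are skipped because their file names start with `scalacheck`, `mockito`, and `selenium`.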
LuciferYang
left a comment
LGTM
### What changes were proposed in this pull request?
This PR makes a bunch of changes to Connect testing for the Scala client:
- We no longer start the Connect server with the `SPARK_DIST_CLASSPATH` environment variable set. This variable is set by the build system, but its value differs between SBT and Maven. For SBT it also contained the client code.
- We use dependency upload to add the dependencies needed for the tests. Currently this entails the compiled test classes (class files), the scalatest jars, and the scalactic jars.
- The use of classfile sync unearthed an issue with stubbing and the `ExecutorClassLoader`. If they load classes in the same namespace, then stubbing will generate stubs for classes that can be loaded by the `ExecutorClassLoader`. Since this is mostly a testing issue I decided to move the test code to a different namespace. We should definitely fix this later on.
- A bunch of tiny fixes.

### Why are the changes needed?
SBT testing for Connect leaked client-side code into the server. This is a problem because tests pass and we sign off on features that do not work well in a normal environment. Stubbing was an example of this. Maven did not have this problem and was therefore more correct.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
It is mostly tests.

### Was this patch authored or co-authored using generative AI tooling?
No. I write my own code thank you...

Closes #42591 from hvanhovell/investigate-stubbing.

Authored-by: Herman van Hovell <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
(cherry picked from commit 9326615)
Signed-off-by: yangjie01 <[email protected]>
GA passed. Merged into master and 3.5. Thanks @hvanhovell