[SPARK-28701][test-hadoop3.2][test-java11][k8s] adding java11 support for pull request builds #25423

shaneknapp · 2019-08-12T17:29:03Z

What changes were proposed in this pull request?

we need to add the ability to test PRBs against java11.

see comments here: #25405

How was this patch tested?

the build system will test this.

SparkQA · 2019-08-12T17:31:05Z

Test build #108987 has started for PR 25423 at commit 69e77e2.

shaneknapp · 2019-08-12T17:32:25Z

from the console output: Using java executable: /usr/java/jdk-11.0.1/bin/java

looks like we should be good! i'll let the build run and double-check everything when it's done.

shaneknapp · 2019-08-12T17:33:22Z

@HyukjinKwon @wangyum @alanfgates

shaneknapp · 2019-08-12T17:40:27Z

test this please

shaneknapp · 2019-08-12T17:42:15Z

Merged build finished. Test FAILed.

i manually killed the initial test because i wanted to make sure that any k8s-based tests won't be affected by this change... the prb and k8s tests run on different machines (centos vs ubuntu) and while i am certain they won't have any JAVA_HOME collisions it's cheaper to test and be sure.

SparkQA · 2019-08-12T17:54:45Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/14056/

dongjoon-hyun · 2019-08-12T17:57:26Z

Thank you, @shaneknapp !

SparkQA · 2019-08-12T18:01:18Z

Test build #108988 has finished for PR 25423 at commit 69e77e2.

This patch fails to generate documentation.
This patch merges cleanly.
This patch adds no public classes.

shaneknapp · 2019-08-12T18:04:41Z

oof... javadoc is quite unhappy about java11:

[error] (spark/javaunidoc:doc) javadoc returned nonzero exit code
[error] Total time: 122 s, completed Aug 12, 2019, 11:01:18 AM
[error] running /home/jenkins/workspace/SparkPullRequestBuilder/build/sbt -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos unidoc ; received return code 1

this is definitely out of scope of this particular PR, but needs to be addressed. since my java skills are rather basic i could really use some help here.

SparkQA · 2019-08-12T18:11:03Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/14056/

shaneknapp · 2019-08-12T18:11:15Z

test this please

SparkQA · 2019-08-12T18:28:15Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/14057/

SparkQA · 2019-08-12T18:31:45Z

Test build #108989 has finished for PR 25423 at commit 69e77e2.

This patch fails to generate documentation.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2019-08-12T18:46:14Z

Seems scala 2.12.8 is not completely compatible to JDK 11?

https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html#jdk-11-compatibility-notes

shaneknapp · 2019-08-12T18:47:28Z

need to run the k8s test again...

shaneknapp · 2019-08-12T18:47:33Z

test this please

shaneknapp · 2019-08-12T18:50:48Z

Seems scala 2.12.8 is not completely compatible to JDK 11?

https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html#jdk-11-compatibility-notes

ugh. :\

shaneknapp · 2019-08-12T18:52:59Z

Seems scala 2.12.8 is not completely compatible to JDK 11?

https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html#jdk-11-compatibility-notes

@srowen any ideas here?

srowen · 2019-08-12T18:57:26Z

2.12.8 should work fine enough for Spark's purposes, or at least, we aren't seeing any test failures in all but one module, for a long time now. The java module system thing is not something Spark uses.
2.12.9 advertises better support, but it has an important bug.
We'll update to 2.12.10 eventually, but I don't think there's a known issue here.

SparkQA · 2019-08-12T18:58:41Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/14059/

shaneknapp · 2019-08-12T19:03:12Z

2.12.8 should work fine enough for Spark's purposes, or at least, we aren't seeing any test failures in all but one module, for a long time now. The java module system thing is not something Spark uses.
2.12.9 advertises better support, but it has an important bug.
We'll update to 2.12.10 eventually, but I don't think there's a known issue here.

well, it might be blocking this PR (SPARK-27365), and every single java11 build on jenkins is currently broken and failing hive integration tests.

SparkQA · 2019-08-12T19:04:41Z

Test build #108991 has finished for PR 25423 at commit 69e77e2.

This patch fails to generate documentation.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2019-08-12T19:07:06Z

Yes, Spark does not yet pass tests on Java 11, because of Hive-related issues. That's the last big chunk of work. It's not a scala issue though.

shaneknapp · 2019-08-12T19:42:15Z

Yes, Spark does not yet pass tests on Java 11, because of Hive-related issues. That's the last big chunk of work. It's not a scala issue though.

@srowen -- it doesn't seem to be just Hive-related issues... testing this PRB against java11 also shows that it's both failing the java/scala unidoc section and the k8s integration tests.

i guess the TL;DR here is twofold:

spark built w/java11 will continue to be broken for the near future.
we should definitely merge in these changes to master to set ourselves up for better java11 testing moving forward.

SparkQA · 2019-08-12T19:49:44Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/14059/

srowen · 2019-08-12T19:54:37Z

It's possible; I am not sure for example https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/252/console reaches the scaladoc phase.

Hm, I wasn't aware that one wasn't checking K8S. Let me at least add these to the umbrella.

I bet we can solve both without too much trouble.

HyukjinKwon · 2019-08-25T03:07:53Z

retest this please

HyukjinKwon · 2019-08-25T03:10:35Z

retest this please

SparkQA · 2019-08-25T03:23:01Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/14733/

SparkQA · 2019-08-25T03:39:07Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/14733/

SparkQA · 2019-08-25T04:41:33Z

Test build #109685 has finished for PR 25423 at commit bae6524.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-08-25T04:54:36Z

Test build #109686 has finished for PR 25423 at commit bae6524.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-08-25T05:20:45Z

dev/run-tests.py

+    if "test-java11" in os.environ["ghprbPullTitle"].lower():
+        os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
+        os.environ["PATH"] = "%s/bin:%s" % (os.environ["JAVA_HOME"], os.environ["PATH"])
+        test_profiles += ['-Djava.version=11']


Can we try to set this in python tests too? Seems like Java gateway has to use JDK 11 as well.

It should use Java 11 if the path provides Java 11 and the test harness that runs Python tests does too. At least I don't know how else one would tell pyspark what to use!

In fact I'm pretty sure the test failure here shows that it is using JDK 11. From JPMML: java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory This would be caused by JDK 11 changes. However, I don't get why all the other non-Python tests don't fail.

Given the weird problem in #24651 I am wondering if we have some subtle classpath issues with how the Pyspark tests are run.

This one however might be more directly solvable by figuring out what is suggesting to use this old Sun JAXB implementation. I'll start digging around META-INF

Hm, and why does https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/ pass then? it is doing the same thing in the Jenkins config. (OK I think I answered my own question below)

EDIT: Oh, because it doesn't run Pyspark tests?

No, actually you're right. Yes, seems after Scala tests here, the PATH and JAVA_HOME still set as are.

I thought:

spark/python/pyspark/java_gateway.py

Lines 45 to 60 in 209b936

SPARK_HOME = _find_spark_home()

# Launch the Py4j gateway using Spark's run command so that we pick up the

# proper classpath and settings from spark-env.sh

on_windows = platform.system() == "Windows"

script = "./bin/spark-submit.cmd" if on_windows else "./bin/spark-submit"

command = [os.path.join(SPARK_HOME, script)]

if conf:

for k, v in conf.getAll():

command += ['--conf', '%s=%s' % (k, v)]

submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")

if os.environ.get("SPARK_TESTING"):

submit_args = ' '.join([

"--conf spark.ui.enabled=false",

submit_args

])

command = command + shlex.split(submit_args)

spark/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala

Line 425 in 3cb8204

args.mainClass = "org.apache.spark.api.python.PythonGatewayServer"

Here somehow happened to use JDK 8.

Actually the PySpark tests and SparkR tests passed at #25443 (comment)

So, the issue persists here .. but I guess yes we can do it separately since at least this PR seems setting JDK 11 correctly, and it virtually doesn't affect any main or test code (if this title is not used).

It's interesting. Thank you for the investigation, @srowen and @HyukjinKwon

Do we have a JIRA issue for this?

We probably need one, yeah, regardless of the cause. I'll file one to track.

https://issues.apache.org/jira/browse/SPARK-28877

srowen · 2019-08-26T15:19:56Z

I personally think this is OK to merge simply because we need a way to test JDK 11, and this seems to do that. The rest of the error is orthogonal.

So, in order to use this in a JDK 11 Jenkins build, how would one configure the Jenkins job? it is only triggering off the PR title (which is also useful). OK if that's a future step.

HyukjinKwon · 2019-08-26T15:47:20Z

So, in order to use this in a JDK 11 Jenkins build, how would one configure the Jenkins job? it is only triggering off the PR title (which is also useful). OK if that's a future step.

Yes, same conclusion

HyukjinKwon · 2019-08-26T15:47:38Z

Merged to master.

…nviron ### What changes were proposed in this pull request? i broke run-tests.py for non-PRB builds in this PR: #25423 ### Why are the changes needed? to fix what i broke ### Does this PR introduce any user-facing change? no ### How was this patch tested? the build system will test this Closes #25585 from shaneknapp/fix-run-tests. Authored-by: shane knapp <[email protected]> Signed-off-by: shane knapp <[email protected]>

… for JDK 11  ### What changes were proposed in this pull request?  This PR proposes to increase the tolerance for the exact value comparison in `spark.mlp` test. I don't know the root cause but some tolerance is already expected. I suspect it is not a big deal considering all other tests pass. The values are fairly close: JDK 8: ``` -24.28415, 107.8701, 16.86376, 1.103736, 9.244488 ``` JDK 11: ``` -24.33892, 108.0316, 16.89082, 1.090723, 9.260533 ``` ### Why are the changes needed?  To fully support JDK 11. See, for instance, apache#25443 and apache#25423 for ongoing efforts. ### Does this PR introduce any user-facing change?  No ### How was this patch tested?  Manually tested on the top of apache#25472 with JDK 11 ```bash ./build/mvn -DskipTests -Psparkr -Phadoop-3.2 package ./bin/sparkR ``` ```R absoluteSparkPath <- function(x) { sparkHome <- sparkR.conf("spark.home") file.path(sparkHome, x) } df <- read.df(absoluteSparkPath("data/mllib/sample_multiclass_classification_data.txt"), source = "libsvm") model <- spark.mlp(df, label ~ features, blockSize = 128, layers = c(4, 5, 4, 3), solver = "l-bfgs", maxIter = 100, tol = 0.00001, stepSize = 1, seed = 1) summary <- summary(model) head(summary$weights, 5) ``` Closes apache#25478 from HyukjinKwon/SPARK-28755. Authored-by: HyukjinKwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>

adding java11 support for pull request builds

69e77e2

shaneknapp changed the title ~~[SPARK-28701][test-java11] adding java11 support for pull request builds~~ [SPARK-28701][test-java11][k8s] adding java11 support for pull request builds Aug 12, 2019

dongjoon-hyun added the BUILD label Aug 12, 2019

remove debugging, lower-case the title of the PRB

e608ada

HyukjinKwon reviewed Aug 25, 2019

View reviewed changes

HyukjinKwon approved these changes Aug 26, 2019

View reviewed changes

HyukjinKwon closed this in 13fd32c Aug 26, 2019

shaneknapp deleted the spark-prb-java11 branch August 26, 2019 16:35

shaneknapp mentioned this pull request Aug 26, 2019

[SPARK-28701][INFRA][FOLLOWUP] Fix the key error when looking in os.environ #25585

Closed

	SPARK_HOME = _find_spark_home()
	# Launch the Py4j gateway using Spark's run command so that we pick up the
	# proper classpath and settings from spark-env.sh
	on_windows = platform.system() == "Windows"
	script = "./bin/spark-submit.cmd" if on_windows else "./bin/spark-submit"
	command = [os.path.join(SPARK_HOME, script)]
	if conf:
	for k, v in conf.getAll():
	command += ['--conf', '%s=%s' % (k, v)]
	submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
	if os.environ.get("SPARK_TESTING"):
	submit_args = ' '.join([
	"--conf spark.ui.enabled=false",
	submit_args
	])
	command = command + shlex.split(submit_args)

[SPARK-28701][test-hadoop3.2][test-java11][k8s] adding java11 support for pull request builds #25423

[SPARK-28701][test-hadoop3.2][test-java11][k8s] adding java11 support for pull request builds #25423

Uh oh!

Conversation

shaneknapp commented Aug 12, 2019

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

SparkQA commented Aug 12, 2019

Uh oh!

shaneknapp commented Aug 12, 2019

Uh oh!

shaneknapp commented Aug 12, 2019

Uh oh!

shaneknapp commented Aug 12, 2019

Uh oh!

shaneknapp commented Aug 12, 2019

Uh oh!

SparkQA commented Aug 12, 2019

Uh oh!

dongjoon-hyun commented Aug 12, 2019

Uh oh!

SparkQA commented Aug 12, 2019

Uh oh!

shaneknapp commented Aug 12, 2019

Uh oh!

SparkQA commented Aug 12, 2019

Uh oh!

shaneknapp commented Aug 12, 2019

Uh oh!

SparkQA commented Aug 12, 2019

Uh oh!

SparkQA commented Aug 12, 2019

Uh oh!

viirya commented Aug 12, 2019

Uh oh!

shaneknapp commented Aug 12, 2019

Uh oh!

shaneknapp commented Aug 12, 2019

Uh oh!

shaneknapp commented Aug 12, 2019

Uh oh!

shaneknapp commented Aug 12, 2019

Uh oh!

srowen commented Aug 12, 2019

Uh oh!

SparkQA commented Aug 12, 2019

Uh oh!

shaneknapp commented Aug 12, 2019

Uh oh!

SparkQA commented Aug 12, 2019

Uh oh!

srowen commented Aug 12, 2019

Uh oh!

shaneknapp commented Aug 12, 2019

Uh oh!

SparkQA commented Aug 12, 2019

Uh oh!

srowen commented Aug 12, 2019

Uh oh!

HyukjinKwon commented Aug 25, 2019

Uh oh!

HyukjinKwon commented Aug 25, 2019

Uh oh!

SparkQA commented Aug 25, 2019

Uh oh!

SparkQA commented Aug 25, 2019

Uh oh!

SparkQA commented Aug 25, 2019

Uh oh!

SparkQA commented Aug 25, 2019

Uh oh!

HyukjinKwon Aug 25, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

srowen Aug 26, 2019

Choose a reason for hiding this comment

Uh oh!

srowen Aug 26, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

HyukjinKwon Aug 25, 2019 •

edited

Loading

srowen Aug 26, 2019 •

edited

Loading

HyukjinKwon Aug 26, 2019 •

edited

Loading

HyukjinKwon commented Aug 26, 2019 •

edited

Loading