
Conversation

@shaneknapp (Contributor)

What changes were proposed in this pull request?

We need to add the ability to test pull request builds (PRBs) against Java 11.

See the comments here: #25405

How was this patch tested?

The build system will test this.
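For contributors: based on the diff discussed further down in this thread, the opt-in appears to be a tag in the PR title, e.g. a title like `[SPARK-XXXXX][test-java11] ...`, which the PRB scripts match case-insensitively.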

@SparkQA commented Aug 12, 2019

Test build #108987 has started for PR 25423 at commit 69e77e2.

@shaneknapp (Contributor, Author)

From the console output: `Using java executable: /usr/java/jdk-11.0.1/bin/java`

Looks like we should be good! I'll let the build run and double-check everything when it's done.

@shaneknapp changed the title from "[SPARK-28701][test-java11] adding java11 support for pull request builds" to "[SPARK-28701][test-java11][k8s] adding java11 support for pull request builds" on Aug 12, 2019.
@shaneknapp (Contributor, Author)

test this please

@shaneknapp (Contributor, Author)

Merged build finished. Test FAILed.

I manually killed the initial test because I wanted to make sure that any k8s-based tests won't be affected by this change. The PRB and k8s tests run on different machines (CentOS vs. Ubuntu), and while I'm certain they won't have any JAVA_HOME collisions, it's cheaper to test and be sure.

@SparkQA commented Aug 12, 2019

@dongjoon-hyun (Member)

Thank you, @shaneknapp!

@SparkQA commented Aug 12, 2019

Test build #108988 has finished for PR 25423 at commit 69e77e2.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp (Contributor, Author)

Oof... javadoc is quite unhappy about Java 11:


```
[error] (spark/javaunidoc:doc) javadoc returned nonzero exit code
[error] Total time: 122 s, completed Aug 12, 2019, 11:01:18 AM
[error] running /home/jenkins/workspace/SparkPullRequestBuilder/build/sbt -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos unidoc ; received return code 1
```

This is definitely out of scope for this particular PR, but it needs to be addressed. Since my Java skills are rather basic, I could really use some help here.

@SparkQA commented Aug 12, 2019

@shaneknapp (Contributor, Author)

test this please

@SparkQA commented Aug 12, 2019

@SparkQA commented Aug 12, 2019

Test build #108989 has finished for PR 25423 at commit 69e77e2.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya (Member) commented Aug 12, 2019

It seems Scala 2.12.8 is not completely compatible with JDK 11?

https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html#jdk-11-compatibility-notes

@shaneknapp (Contributor, Author)

Need to run the k8s test again...

@shaneknapp (Contributor, Author)

test this please

@shaneknapp (Contributor, Author)

> It seems Scala 2.12.8 is not completely compatible with JDK 11?
>
> https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html#jdk-11-compatibility-notes

ugh. :\

@shaneknapp (Contributor, Author)

> It seems Scala 2.12.8 is not completely compatible with JDK 11?
>
> https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html#jdk-11-compatibility-notes

@srowen, any ideas here?

@srowen (Member) commented Aug 12, 2019

2.12.8 should work well enough for Spark's purposes; at least, we haven't seen any test failures in all but one module for a long time now. The Java module system is not something Spark uses.
2.12.9 advertises better support, but it has an important bug.
We'll update to 2.12.10 eventually, but I don't think there's a known issue here.

@SparkQA commented Aug 12, 2019

@shaneknapp (Contributor, Author)

> 2.12.8 should work well enough for Spark's purposes; at least, we haven't seen any test failures in all but one module for a long time now. The Java module system is not something Spark uses.
> 2.12.9 advertises better support, but it has an important bug.
> We'll update to 2.12.10 eventually, but I don't think there's a known issue here.

Well, it might be blocking this PR (SPARK-27365), and every single Java 11 build on Jenkins is currently broken and failing Hive integration tests.

@SparkQA commented Aug 12, 2019

Test build #108991 has finished for PR 25423 at commit 69e77e2.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Aug 12, 2019

Yes, Spark does not yet pass tests on Java 11 because of Hive-related issues. That's the last big chunk of work. It's not a Scala issue, though.

@shaneknapp (Contributor, Author)

> Yes, Spark does not yet pass tests on Java 11 because of Hive-related issues. That's the last big chunk of work. It's not a Scala issue, though.

@srowen -- it doesn't seem to be just Hive-related issues... testing this PRB against Java 11 also shows it failing both the java/scala unidoc step and the k8s integration tests.

I guess the TL;DR here is twofold:

  1. Spark built with Java 11 will continue to be broken for the near future.
  2. We should definitely merge these changes to master to set ourselves up for better Java 11 testing moving forward.

@SparkQA commented Aug 12, 2019

@srowen (Member) commented Aug 12, 2019

It's possible; I am not sure, for example, whether https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/252/console reaches the scaladoc phase.

Hm, I wasn't aware that one wasn't checking K8S. Let me at least add these to the umbrella.

I bet we can solve both without too much trouble.

@HyukjinKwon (Member)

retest this please

1 similar comment
@HyukjinKwon (Member)

retest this please

@SparkQA commented Aug 25, 2019

@SparkQA commented Aug 25, 2019

@SparkQA commented Aug 25, 2019

Test build #109685 has finished for PR 25423 at commit bae6524.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 25, 2019

Test build #109686 has finished for PR 25423 at commit bae6524.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if "test-java11" in os.environ["ghprbPullTitle"].lower():
os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
os.environ["PATH"] = "%s/bin:%s" % (os.environ["JAVA_HOME"], os.environ["PATH"])
test_profiles += ['-Djava.version=11']
@HyukjinKwon (Member) commented Aug 25, 2019

Can we try to set this in the Python tests too? It seems the Java gateway has to use JDK 11 as well.
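A minimal sketch of what that could look like, assuming the same environment-override approach as the hunk above (the subprocess call and script path here are illustrative, not the PR's actual code):

```python
import os
import subprocess

# Mirror the PRB title check, then hand the Python test runner an
# environment whose JAVA_HOME already points at JDK 11.
env = dict(os.environ)
if "test-java11" in env.get("ghprbPullTitle", "").lower():
    env["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
    env["PATH"] = "%s/bin:%s" % (env["JAVA_HOME"], env["PATH"])

# python/run-tests is the script that drives the PySpark test suites.
subprocess.check_call(["./python/run-tests"], env=env)
```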

@srowen (Member)

It should use Java 11 if the PATH provides Java 11 and the test harness that runs the Python tests does too. At least, I don't know how else one would tell PySpark what to use!

In fact, I'm pretty sure the test failure here shows that it is using JDK 11. From JPMML: `java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory`. This would be caused by the JDK 11 changes. However, I don't get why all the other non-Python tests don't fail.

Given the weird problem in #24651, I am wondering if we have some subtle classpath issues with how the PySpark tests are run.

This one, however, might be more directly solvable by figuring out what is suggesting the use of this old Sun JAXB implementation. I'll start digging around META-INF.

@srowen (Member) commented Aug 26, 2019

Hm, and why does https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/ pass, then? It is doing the same thing in the Jenkins config. (OK, I think I answered my own question below.)

EDIT: Oh, because it doesn't run the PySpark tests?

@HyukjinKwon (Member) commented Aug 26, 2019

No, actually you're right. It seems that after the Scala tests run here, PATH and JAVA_HOME are still set as they were.

I thought:

```python
SPARK_HOME = _find_spark_home()
# Launch the Py4j gateway using Spark's run command so that we pick up the
# proper classpath and settings from spark-env.sh
on_windows = platform.system() == "Windows"
script = "./bin/spark-submit.cmd" if on_windows else "./bin/spark-submit"
command = [os.path.join(SPARK_HOME, script)]
if conf:
    for k, v in conf.getAll():
        command += ['--conf', '%s=%s' % (k, v)]
submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
if os.environ.get("SPARK_TESTING"):
    submit_args = ' '.join([
        "--conf spark.ui.enabled=false",
        submit_args
    ])
command = command + shlex.split(submit_args)
```

```
args.mainClass = "org.apache.spark.api.python.PythonGatewayServer"
```

This path somehow happened to use JDK 8.

Actually, the PySpark and SparkR tests passed at #25443 (comment).

So the issue persists here... but I guess we can handle it separately, since this PR at least sets JDK 11 correctly and it virtually doesn't affect any main or test code (when the title tag is not used).
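As a sanity check, one quick way to confirm which JDK the Py4J gateway JVM actually picked up (a debugging sketch, not part of this PR):

```python
# Ask the gateway JVM for its java.version property; java.lang is
# auto-imported by Py4J, so System is reachable off the JVM view.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
print(spark.sparkContext._jvm.System.getProperty("java.version"))
# Expect something like "11.0.1" when JAVA_HOME points at JDK 11.
spark.stop()
```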

(Member)

It's interesting. Thank you for the investigation, @srowen and @HyukjinKwon.

(Member)

Do we have a JIRA issue for this?

(Member)

We probably need one, yeah, regardless of the cause. I'll file one to track.

@srowen (Member) commented Aug 26, 2019

I personally think this is OK to merge simply because we need a way to test JDK 11, and this seems to do that. The rest of the error is orthogonal.

So, in order to use this in a JDK 11 Jenkins build, how would one configure the Jenkins job? It is only triggering off the PR title (which is also useful). OK if that's a future step.

@HyukjinKwon (Member) commented Aug 26, 2019

> So, in order to use this in a JDK 11 Jenkins build, how would one configure the Jenkins job? It is only triggering off the PR title (which is also useful). OK if that's a future step.

Yes, same conclusion.
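For that future step, one hypothetical way a dedicated Jenkins job could opt in without relying on the PR title (`SPARK_TEST_JAVA11` is an invented name for illustration; this PR only checks the title):

```python
import os

# A Jenkins job could export SPARK_TEST_JAVA11=1 in its config to force
# JDK 11 even when there is no PR title to inspect.
want_java11 = (
    os.environ.get("SPARK_TEST_JAVA11") == "1"
    or "test-java11" in os.environ.get("ghprbPullTitle", "").lower()
)
if want_java11:
    os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
    os.environ["PATH"] = "%s/bin:%s" % (os.environ["JAVA_HOME"], os.environ["PATH"])
```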

@HyukjinKwon (Member)

Merged to master.

@shaneknapp deleted the spark-prb-java11 branch on August 26, 2019 at 16:35.
shaneknapp added a commit that referenced this pull request Aug 26, 2019
…nviron

### What changes were proposed in this pull request?

I broke run-tests.py for non-PRB builds in this PR:
#25423

### Why are the changes needed?

To fix what I broke.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
The build system will test this.

Closes #25585 from shaneknapp/fix-run-tests.

Authored-by: shane knapp <[email protected]>
Signed-off-by: shane knapp <[email protected]>
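For readers following along: the breakage was presumably the unguarded environment lookup, since non-PRB builds never define `ghprbPullTitle`. A hedged sketch of the defensive form (see #25585 for the actual fix):

```python
import os

# os.environ["ghprbPullTitle"] raises KeyError on non-PRB builds;
# looking it up with a default keeps the check safe everywhere.
if "test-java11" in os.environ.get("ghprbPullTitle", "").lower():
    os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
```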
rshkv pushed a commit to palantir/spark that referenced this pull request Jun 18, 2020
… for JDK 11


### What changes were proposed in this pull request?

This PR proposes to increase the tolerance for the exact value comparison in the `spark.mlp` test. I don't know the root cause, but some tolerance is already expected. I suspect it is not a big deal, considering all other tests pass.

The values are fairly close:

JDK 8:

```
-24.28415, 107.8701, 16.86376, 1.103736, 9.244488
```

JDK 11:

```
-24.33892, 108.0316, 16.89082, 1.090723, 9.260533
```
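For reference, the largest relative difference between the two runs is in the fourth value: |1.103736 - 1.090723| / 1.103736 ≈ 1.2%, so a tolerance on roughly that order should absorb the JDK difference.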

### Why are the changes needed?

To fully support JDK 11. See, for instance, apache#25443 and apache#25423 for ongoing efforts.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Manually tested on top of apache#25472 with JDK 11:

```bash
./build/mvn -DskipTests -Psparkr -Phadoop-3.2 package
./bin/sparkR
```

```R
absoluteSparkPath <- function(x) {
  sparkHome <- sparkR.conf("spark.home")
  file.path(sparkHome, x)
}
df <- read.df(absoluteSparkPath("data/mllib/sample_multiclass_classification_data.txt"),
              source = "libsvm")
model <- spark.mlp(df, label ~ features, blockSize = 128, layers = c(4, 5, 4, 3),
                   solver = "l-bfgs", maxIter = 100, tol = 0.00001, stepSize = 1, seed = 1)
summary <- summary(model)
head(summary$weights, 5)
```

Closes apache#25478 from HyukjinKwon/SPARK-28755.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>