
Conversation

@dongjoon-hyun (Member) commented Aug 26, 2019

What changes were proposed in this pull request?

mllib has jaxb-runtime-2.3.2 as a runtime dependency. This PR makes it a compile-time dependency. This doesn't change our dependency manifests or LICENSE files. Instead, it adds the following three jars to our pre-built artifacts.

  • jaxb-runtime-2.3.2.jar
  • jakarta.xml.bind-api-2.3.2.jar
  • istack-commons-runtime-3.0.8.jar
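
The change itself amounts to a one-line scope switch in `mllib/pom.xml`. The sketch below is illustrative, not quoted from the diff; the `org.glassfish.jaxb` coordinates are the usual ones for jaxb-runtime 2.3.2, and the surrounding pom content is assumed:

```xml
<!-- Before: runtime scope, which kept the jar out of the assembly. -->
<dependency>
  <groupId>org.glassfish.jaxb</groupId>
  <artifactId>jaxb-runtime</artifactId>
  <scope>runtime</scope>
</dependency>

<!-- After: default (compile) scope, so the jar and its transitive
     dependencies (jakarta.xml.bind-api, istack-commons-runtime)
     land in the pre-built artifacts. -->
<dependency>
  <groupId>org.glassfish.jaxb</groupId>
  <artifactId>jaxb-runtime</artifactId>
</dependency>
```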

Why are the changes needed?

We need to use the following classes.

  • JDK8: com.sun.xml.internal.bind.v2.ContextFactory
  • JDK11: com.sun.xml.bind.v2.ContextFactory

com.sun.xml.bind.v2.ContextFactory is inside jaxb-runtime-2.3.2.

$ javap -cp jaxb-runtime-2.3.2.jar com.sun.xml.bind.v2.ContextFactory
Compiled from "ContextFactory.java"
public class com.sun.xml.bind.v2.ContextFactory {
  public static final java.lang.String USE_JAXB_PROPERTIES;
  public com.sun.xml.bind.v2.ContextFactory();
  public static javax.xml.bind.JAXBContext createContext(java.lang.Class[], java.util.Map<java.lang.String, java.lang.Object>) throws javax.xml.bind.JAXBException;
  public static com.sun.xml.bind.api.JAXBRIContext createContext(java.lang.Class[], java.util.Collection<com.sun.xml.bind.api.TypeReference>, java.util.Map<java.lang.Class, java.lang.Class>, java.lang.String, boolean, com.sun.xml.bind.v2.model.annotation.RuntimeAnnotationReader, boolean, boolean, boolean) throws javax.xml.bind.JAXBException;
  public static com.sun.xml.bind.api.JAXBRIContext createContext(java.lang.Class[], java.util.Collection<com.sun.xml.bind.api.TypeReference>, java.util.Map<java.lang.Class, java.lang.Class>, java.lang.String, boolean, com.sun.xml.bind.v2.model.annotation.RuntimeAnnotationReader, boolean, boolean, boolean, boolean) throws javax.xml.bind.JAXBException;
  public static javax.xml.bind.JAXBContext createContext(java.lang.String, java.lang.ClassLoader, java.util.Map<java.lang.String, java.lang.Object>) throws javax.xml.bind.JAXBException;
}
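
The lookup the two classes above take part in can be probed with a small runtime check. This is a hedged sketch: the two `ContextFactory` class names come from this PR; the class and method names below are otherwise illustrative. On JDK 8 the JDK-internal copy is present; on JDK 11, where the java.xml.bind module was removed (JEP 320), neither class resolves unless jaxb-runtime is on the classpath.

```java
// Sketch: report which JAXB ContextFactory classes are visible to this JVM.
public class JaxbFactoryCheck {
    // Returns true if the named class can be loaded on the current classpath.
    static boolean present(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "com.sun.xml.internal.bind.v2.ContextFactory", // JDK 8 internal copy
            "com.sun.xml.bind.v2.ContextFactory"           // from jaxb-runtime-2.3.2.jar
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + (present(name) ? "present" : "absent"));
        }
    }
}
```

Running this on a JDK 11 distribution without the three jars from this PR should report both classes absent, which is exactly the failure mode the scope change addresses.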

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the Jenkins with test-java11.

For manual testing, do the following with JDK11.

$ java -version
openjdk version "11.0.3" 2019-04-16
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.3+7)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.3+7, mixed mode)

$ build/sbt -Pyarn -Phadoop-3.2 -Phadoop-cloud -Phive -Phive-thriftserver -Psparkr test:package

$ python/run-tests.py --python-executables python --modules pyspark-ml
...
Finished test(python): pyspark.ml.recommendation (65s)
Tests passed in 174 seconds

@HyukjinKwon (Member)

This is nice!!

@HyukjinKwon (Member) left a comment

LGTM if tests pass

@HyukjinKwon HyukjinKwon changed the title [SPARK-28877][PYSPARK][test-hadoop3.2][test-java11] Make jaxb-runtime compile-time dependency [SPARK-28877][PYSPARK] Make jaxb-runtime compile-time dependency Aug 27, 2019
@HyukjinKwon (Member)

retest this please

@SparkQA commented Aug 27, 2019

Test build #109762 has finished for PR 25587 at commit 278de83.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-28877][PYSPARK] Make jaxb-runtime compile-time dependency [SPARK-28877][PYSPARK][test-hadoop3.2][test-java11] Make jaxb-runtime compile-time dependency Aug 27, 2019
@HyukjinKwon (Member)

retest this please

@dongjoon-hyun (Member Author)

Thank you for the review, approval, and retriggering, @HyukjinKwon .
Thank you, @shaneknapp . It's nice that I can use test-java11. :)

@srowen (Member) left a comment

Yeah, I think that's probably the right answer here. We need to bundle some (possibly many) JAXB implementations for JDK 11, and it shouldn't hurt JDK 8. We might need to slip an extra entry into one of the license files; otherwise, I can't think of anything else.

EDIT: No, it's already there; it's already a runtime dependency. Hm, wait, then why isn't it present for the tests already? We should already be bundling this. I suspect this wasn't the real issue; it was already fine, and the problem is in how PySpark tests form the classpath from the assembly jars/ dir. Let's take a look.

@dongjoon-hyun (Member Author) commented Aug 27, 2019

Thank you for the review, @srowen . However, our assembly jars don't include it without this PR, do they? So this is an issue not only in the test classpath, but also in the pre-built artifacts. Without this PR, the pre-built artifacts will raise the same exception on JDK11. What do you think?

BEFORE THIS PR

$ ls assembly/target/scala-2.12/jars/*runtime*
assembly/target/scala-2.12/jars/antlr-runtime-3.5.2.jar
assembly/target/scala-2.12/jars/antlr4-runtime-4.7.1.jar
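
For contrast, with this PR applied, the same listing should additionally show the two `*runtime*` jars named in the description (a sketch of the expected state, not captured build output; jakarta.xml.bind-api-2.3.2.jar is also added but does not match this glob):

```
$ ls assembly/target/scala-2.12/jars/*runtime*
assembly/target/scala-2.12/jars/antlr-runtime-3.5.2.jar
assembly/target/scala-2.12/jars/antlr4-runtime-4.7.1.jar
assembly/target/scala-2.12/jars/istack-commons-runtime-3.0.8.jar
assembly/target/scala-2.12/jars/jaxb-runtime-2.3.2.jar
```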

@srowen (Member) commented Aug 27, 2019

Yeah, if it causes them to be in the assembly, then it's the right fix. They sort of should already be there, but the rules for transitive dependency resolution are complex. Making it compile scope isn't strictly correct, but if it causes the resolution to work out, that seems fine and does little harm (we won't be writing code that directly depends on it).

@dongjoon-hyun (Member Author)

Ah, transitive dependencies. Since this passes the dependency manifest test, I thought we were safe, but I didn't check the others. I'll check the full diff and update the PR description. Thanks.

@srowen (Member) commented Aug 27, 2019

I bet you will find it does not change the transitive dependencies as reported by Maven or SBT; it's already a runtime dependency. That was the reason to be suspicious that this isn't the fix. But the fact that it's not in the assembly is the issue, and the reasons are arcane (maybe even a more general problem) in how the assembly plugin treats transitive dependencies of various scopes. We need not get into that here; if it resolves the issue, it's OK as a patch.

@dongjoon-hyun (Member Author) commented Aug 27, 2019

I updated the PR description. jakarta.xml.bind-api-2.3.2.jar and istack-commons-runtime-3.0.8.jar are required and are added to our pre-built artifacts. Regarding transitive dependencies, what I was worried about was the increase in our artifact size.

@dongjoon-hyun (Member Author)

I'll take a look once more.

@SparkQA commented Aug 27, 2019

Test build #109764 has finished for PR 25587 at commit 278de83.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 27, 2019

Test build #109768 has finished for PR 25587 at commit 278de83.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member Author)

Technically, the UT failures here are unrelated to this PR. In particular, the last two failures seem to be due to flakiness in a new test suite. (#25567 (comment))

@SparkQA commented Aug 27, 2019

Test build #4842 has finished for PR 25587 at commit 278de83.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member Author)

This PR passed the Jenkins with JDK8 building and JDK11 testing.

========================================================================
Building Spark
========================================================================
[info] Building Spark using SBT with these arguments:  -Phadoop-3.2 -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos test:package streaming-kinesis-asl-assembly/assembly
Using /usr/java/jdk1.8.0_191 as default JAVA_HOME.
...
[info] Done packaging.
[success] Total time: 21 s, completed Aug 26, 2019 8:54:41 PM

========================================================================
Running Spark unit tests
========================================================================
[info] Running Spark tests using SBT with these arguments:  -Phadoop-3.2 -Pkubernetes -Phadoop-cloud -Phive -Phive-thriftserver -Pyarn -Pspark-ganglia-lgpl -Pkinesis-asl -Pmesos -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest -Djava.version=11 test
Using /usr/java/jdk-11.0.1 as default JAVA_HOME.
...
[success] Total time: 5438 s, completed Aug 26, 2019, 10:30:04 PM

========================================================================
Running PySpark tests
========================================================================
Running PySpark tests. Output is in /home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'python3.6', 'pypy']
...
Tests passed in 1222 seconds

========================================================================
Running SparkR tests
========================================================================
WARNING: An illegal reflective access operation has occurred
...
DONE ===========================================================================

@dongjoon-hyun (Member Author)

After this PR, we can use test-java11 freely, and we can move forward with optimizing our build.

Merged to master. Thank you all.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-28877 branch August 27, 2019 06:23
@HyukjinKwon (Member)

This is nice!

@liangrui1988

Awesome!
