
Conversation

@elmer-garduno
Contributor

Fixes problems with incorrect method signatures related to shaded classes. For discussion, see the JIRA issue.

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Jan 2, 2015

You're right that I think this is too broad. I think I misspoke earlier. Isn't the theory here that you can bring a later version of Optional with you in your app? Spark barely uses its API. If your copy of Optional hides the one in Spark, which is only there to keep the signature the same, is that OK?

@elmer-garduno
Contributor Author

I tried that before, using spark.files.userClassPathFirst, but it resulted in a java.lang.NoClassDefFoundError: org/apache/spark/Partition (full stack trace), which seemed bad enough not to go that way. Maybe someone else here knows the correct way to achieve it.
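
For reference, a minimal sketch of setting that flag programmatically (the app name here is hypothetical; this mirrors passing --conf spark.files.userClassPathFirst=true to spark-submit):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class UserClassPathFirstExample {
  public static void main(String[] args) {
    // Ask Spark to prefer classes from the user's jars over its own
    // classpath; this option is experimental in Spark 1.x.
    SparkConf conf = new SparkConf()
        .setAppName("UserClassPathFirstExample")
        .set("spark.files.userClassPathFirst", "true");
    JavaSparkContext sc = new JavaSparkContext(conf);
    sc.stop();
  }
}
```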

@elmer-garduno
Contributor Author

The problem seems to be fixed when using spark-submit instead of spark-class; spark-submit sets the appropriate precedence on the classpaths, so there is no need for any special flags or changes here.

Thanks

@elmer-garduno
Contributor Author

Reopening: the problem is still there when running on the standalone cluster.

@AmplabJenkins

Can one of the admins verify this patch?

@elmer-garduno
Contributor Author

@vanzin, any input here? Running with the patched code seems to solve the problem. Running on a 1.2.0 standalone cluster generates the following errors:

a) java.lang.NoSuchMethodError: com.google.common.base.Optional.transform(Lcom/google/common/base/Function;)Lcom/google/common/base/Optional;, as explained in the issue.
b) java.lang.NoClassDefFoundError: org/apache/spark/Partition, when running with spark.files.userClassPathFirst=true.
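
For illustration, a minimal, hypothetical sketch of app code, compiled against stock Guava, that triggers error (a):

```java
import com.google.common.base.Function;
import com.google.common.base.Optional;

public class OptionalTransformRepro {
  public static void main(String[] args) {
    // Compiled against stock Guava, this call site resolves to
    // transform(Lcom/google/common/base/Function;). If the Optional
    // bundled with Spark was built against a relocated Function, its
    // transform has a different descriptor, hence NoSuchMethodError.
    Optional<Integer> length = Optional.of("value")
        .transform(new Function<String, Integer>() {
          @Override
          public Integer apply(String s) {
            return s.length();
          }
        });
    System.out.println(length.get());
  }
}
```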

@srowen
Member

srowen commented Jan 5, 2015

This one always confuses me, but here's what I think I know:

The compiled Optional in Spark won't have the "correct" (meaning, matching the Google Guava Optional) signatures on its methods, since other Guava classes are shaded. It's there so that the Spark API presents the Guava class in the package that the user app expects.

Apps that use the API methods that involve Optional will be bundling Guava. Spark uses Guava 14, although in theory you can use any version... any version that still has the very few methods on Optional that Spark actually calls, I suppose, because Spark will be using the user app's copy of Optional at runtime.
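
(As a concrete illustration, hypothetical app code; leftOuterJoin is one of the few Spark 1.x Java API methods whose signature exposes Guava's Optional:)

```java
import com.google.common.base.Optional;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

public class LeftOuterJoinExample {
  // leftOuterJoin's return type exposes Guava's Optional directly, so
  // the Optional the app compiles against and the one Spark serves up
  // at runtime must agree.
  static JavaPairRDD<String, Tuple2<Integer, Optional<Integer>>> join(
      JavaPairRDD<String, Integer> left,
      JavaPairRDD<String, Integer> right) {
    return left.leftOuterJoin(right);
  }
}
```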

You say you tried it and got java.lang.NoClassDefFoundError: org/apache/spark/Partition, though. That's weird. Optional will be in a different classloader (?) but shouldn't refer to Spark classes. Right? If there's a problem, it's somewhere in there, since that's where my picture of how this works stops matching your experience. Or else, there's something else subtly not-quite-right about how the user app is run here.

@vanzin
Contributor

vanzin commented Jan 5, 2015

Damn, exposing Optional in the API turned out to be a minefield... I see the problem and I don't see a good solution other than adding Function to the list of exclusions.

That being said, I think only Function should be added. The source indicates it has no references to other types, so hopefully it shouldn't cause any other issues...

@elmer-garduno
Contributor Author

I tried adding Function before, but it had some runtime dependencies. Please see the last part of the third comment on https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-5052.


@vanzin
Contributor

vanzin commented Jan 5, 2015

What runtime dependencies does Function have? I linked to its source above. It doesn't reference any other Guava types at all.

Also, because of the or() method, Supplier should be added to the exclusions as well. As with Function, it doesn't reference any other Guava type, so those two extra exclusions should be enough. If they fail your test for some reason, please post more information about what exactly is being referenced and causing problems.
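
(For context, a minimal hypothetical sketch of how Supplier leaks through Optional's public signature via or():)

```java
import com.google.common.base.Optional;
import com.google.common.base.Supplier;

public class OptionalOrExample {
  public static void main(String[] args) {
    Optional<String> maybe = Optional.absent();
    // or(Supplier) is part of Optional's public API, so if Optional
    // ships un-relocated, Supplier has to ship un-relocated too.
    String value = maybe.or(new Supplier<String>() {
      @Override
      public String get() {
        return "default";
      }
    });
    System.out.println(value);
  }
}
```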

@srowen
Member

srowen commented Jan 5, 2015

The only risk here is that this increases surface area for Guava version conflict, which was the whole point of the shading. Elmer has some analysis of what happens when you do this in the JIRA: https://issues.apache.org/jira/browse/SPARK-5052?focusedCommentId=14262740&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14262740

I'm still not sure why the heck the Partition class isn't found if you just bundle your own Guava, which is how it's "supposed" to be done (right?). That seems like the path of least resistance. I haven't had a chance to look at that more.

@elmer-garduno
Contributor Author

In fact, what happens is the following:

If you embed Guava in your jar you still get java.lang.NoSuchMethodError: com.google.common.base.Optional.transform(Lcom/google/common/base/Function;)Lcom/google/common/base/Optional;, as it seems to load the Optional from Spark first.

It is when you specify spark.files.userClassPathFirst=true that you get java.lang.NoClassDefFoundError: org/apache/spark/Partition. Given that the option is marked as experimental, I don't know to what extent it is usable.

@vanzin
Contributor

vanzin commented Jan 5, 2015

Yeah, that's why I'd like to restrict the exclusions as much as possible. Exposing more Guava types makes it harder and harder for people to use a different Guava version in their apps.

I'm still trying to wrap my head around the VerifyError in the bug. It doesn't really make a lot of sense. But it might be related to the fact that, if you add <include>com/google/common/base/Function*</include> to the exclusions, it's picking up more than just Function.class (i.e. it's also picking up FunctionalEquivalence.class).

@elmer-garduno
Contributor Author

Let me try just adding Function, without the Function* wildcard.

@elmer-garduno
Contributor Author

Thanks, that worked. I updated the PR to reflect those changes. Here is a list of the actual classes that get included in the jar:

```
$ jar tf assembly/target/scala-2.10/spark-assembly-1.3.0-SNAPSHOT-hadoop1.2.1.jar | grep com.google.common.base
com/google/common/base/
com/google/common/base/Absent.class
com/google/common/base/Function.class
com/google/common/base/Optional$1$1.class
com/google/common/base/Optional$1.class
com/google/common/base/Optional.class
com/google/common/base/Present.class
com/google/common/base/Supplier.class
```
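
(Roughly, the corresponding shade-plugin filter in the assembly pom would look something like the following; the exact patterns are in the PR diff:)

```xml
<filter>
  <artifact>com.google.guava:guava</artifact>
  <includes>
    <include>com/google/common/base/Absent*</include>
    <include>com/google/common/base/Function.class</include>
    <include>com/google/common/base/Optional*</include>
    <include>com/google/common/base/Present*</include>
    <include>com/google/common/base/Supplier.class</include>
  </includes>
</filter>
```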

@vanzin
Contributor

vanzin commented Jan 5, 2015

Latest version LGTM. Thanks!

@srowen
Member

srowen commented Jan 5, 2015

Although further creep of the unshading-of-the-shading feels risky, it seems to resolve a problem, and is in principle OK on the same grounds that unshading Optional is. I'm still puzzled about why using Guava 14 isn't working but wouldn't argue with solving a problem this way.

@elmer-garduno
Contributor Author

Thanks all, anything else left to be done here?

@vanzin
Contributor

vanzin commented Jan 6, 2015

Not from your side; you just need a committer to notice this PR. :-) @pwendell?

vanzin pushed a commit to vanzin/spark that referenced this pull request Jan 7, 2015
@elmer-garduno
Contributor Author

Ping?

@elmer-garduno
Contributor Author

Jenkins test this please.

@pwendell
Contributor

Jenkins, test this please.

@pwendell
Contributor

LGTM although it's now been rolled into #3658 (@vanzin - maybe we can merge this one first just so we give credit correctly to @elmer-garduno).

@SparkQA

SparkQA commented Jan 26, 2015

Test build #26121 has started for PR 3874 at commit aa5d8e0.

  • This patch merges cleanly.

@vanzin
Contributor

vanzin commented Jan 26, 2015

Yeah, no problem. I just didn't want to cause a regression later when my changes go in.

@SparkQA

SparkQA commented Jan 26, 2015

Test build #26121 has finished for PR 3874 at commit aa5d8e0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26121/

@pwendell
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Jan 26, 2015

Test build #26126 has started for PR 3874 at commit aa5d8e0.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 27, 2015

Test build #26126 has finished for PR 3874 at commit aa5d8e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26126/

@elmer-garduno
Contributor Author

Thanks for merging!

asfgit pushed a commit that referenced this pull request Jan 28, 2015
The current way of shading Guava is a little problematic. Code that
depends on "spark-core" does not see the transitive dependency, yet
classes in "spark-core" actually depend on Guava. So it's a little
tricky to run unit tests that use spark-core classes, since you need
a compatible version of Guava in your dependencies when running the
tests. This can become cumbersome, and is kind of a bad user
experience.

This change modifies the way Guava is shaded so that it's applied
uniformly across the Spark build. This means Guava is shaded inside
spark-core itself, so that the dependency issues above are solved.
Aside from that, all Spark sub-modules have their Guava references
relocated, so that they refer to the relocated classes now packaged
inside spark-core. Before, this was only done by the time the assembly
was built, so projects that did not end up inside the assembly (such
as streaming backends) could still reference the original location
of Guava classes.

The Guava classes are added to the "first" artifact Spark generates
(network-common), so that all downstream modules have the needed
classes available. Since "network-common" is a dependency of spark-core,
all Spark apps should get the relocated classes automatically.

Author: Marcelo Vanzin <[email protected]>

Closes #3658 from vanzin/SPARK-4809 and squashes the following commits:

3c93e42 [Marcelo Vanzin] Shade Guava in the network-common artifact.
5d69ec9 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
b3104fc [Marcelo Vanzin] Add comment.
941848f [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
f78c48a [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
8053dd4 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
107d7da [Marcelo Vanzin] Add fix for SPARK-5052 (PR #3874).
40b8723 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809
4a4ed42 [Marcelo Vanzin] [SPARK-4809] Rework Guava library shading.
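
(Illustratively, the relocation that change applies looks something like this maven-shade-plugin snippet; the real configuration lives in the Spark poms and may differ in detail:)

```xml
<relocation>
  <pattern>com.google</pattern>
  <shadedPattern>org.spark-project.guava</shadedPattern>
  <includes>
    <include>com.google.common.**</include>
  </includes>
  <excludes>
    <!-- Left un-relocated so user apps see stock Guava signatures. -->
    <exclude>com.google.common.base.Absent</exclude>
    <exclude>com.google.common.base.Function</exclude>
    <exclude>com.google.common.base.Optional**</exclude>
    <exclude>com.google.common.base.Present</exclude>
    <exclude>com.google.common.base.Supplier</exclude>
  </excludes>
</relocation>
```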
sasansamit pushed a commit to sasansamit/spark that referenced this pull request May 13, 2015
