-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-54049][BUILD] Shade com.google.thirdparty package to fix Guava class conflicts in spark 4.0 #52767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ed41206 to
17f1274
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good catch! I checked the jar, and com.google.thirdparty is the only missing package.
$ jar -tf spark-network-common_2.13-4.1.0-preview3.jar | grep -v 'org/apache/spark' | grep -v 'org/sparkproject' | grep -v 'META-INF'
org/
org/apache/
com/
com/google/
com/google/thirdparty/
com/google/thirdparty/publicsuffix/
com/google/thirdparty/publicsuffix/PublicSuffixPatterns.class
com/google/thirdparty/publicsuffix/PublicSuffixType.class
com/google/thirdparty/publicsuffix/TrieParser.class
|
cc @LuciferYang |
|
Is it a regression in spark master branch or it's a long standing issue? |
|
@cloud-fan it's a regression since 4.0, because we upgraded Guava from 14.0.1 to 30+ in 4.0 |
|
@cloud-fan , @pan3793 , Can we backport this change to 4.0.0 as well? |
|
Any changes necessary to SBT build? |
@vrozov IIRC, the current sbt building script does not process shading and relocation properly. |
|
@vinodkc Please update the PR title to more clearly explain what this PR is intended to do. The current description is somewhat misleading to me, leading me to mistakenly assume that after this PR, the Spark-network-common module would no longer shade Guava. Thanks |
@vrozov I think no, so far, sbt is dev only, the developer is likely to only do improvements/fixes when something goes wrong, e.g., wrong dependency version resolution causes CI failure. |
|
@LuciferYang , I've updated the PR title and description. |
|
Merged to master and branch-4.1. |
… class conflicts in spark 4.0
We upgraded Guava from 14.0.1 to 30+ in spark 4.0 . Guava 33.4.0 used in Spark 4 consists of two main packages:
- `com.google.common`
- `com.google.thirdparty`
Prior to this PR, only the `com.google.common` package was shaded into the spark-network-common jar, while classes under `com.google.thirdparty` remained unshaded in the spark-network-common jar. This partial shading causes classloading conflicts and runtime errors when a downstream project depends on both Spark and its own version of Guava.
Eg: calls to guava class `com.google.common.net.InternetDomainName` fails with the following error:
```
Caused by: java.lang.NoSuchFieldError: EXACT
at com.google.common.net.InternetDomainName.findSuffixOfType(InternetDomainName.java:226)
at com.google.common.net.InternetDomainName.publicSuffixIndex(InternetDomainName.java:185)
at com.google.common.net.InternetDomainName.hasPublicSuffix(InternetDomainName.java:400)
at com.eadx.Domain$.printDomainInfo(Domain.scala:16)
at com.eadx.TestApp$.main(TestApp.scala:16)
```
**Root Cause**:
`com.google.common.net.InternetDomainName` uses classes from `com.google.thirdparty.publicsuffix`.
The classloader resolves `com.google.common.net.InternetDomainName` from the downstream Guava jar, while `com.google.thirdparty.publicsuffix.PublicSuffixPatterns` is loaded from Spark 4.x Guava classes, leading to binary incompatibility.
Example diagnostic:
```
InternetDomainName → guava-32.0.0-jre.jar
(target/.../guava-32.0.0-jre.jar)
PublicSuffixPatterns → spark-network-common_2.13-4.0.0.jar
(target/.../spark-network-common_2.13-4.0.0.jar)
```
### What changes were proposed in this pull request?
This PR ensures package `com.google.thirdparty` is also shaded and isolated under the sparkproject namespace in Spark, preventing downstream class conflicts and runtime errors.
### Why are the changes needed?
These changes are necessary to prevent runtime errors and class conflicts for downstream projects that depend on both Spark and Guava by restoring proper isolation of shaded Guava classes in spark
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
No new test cases added; used existing UT and IT.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #52767 from vinodkc/br_shade_guava_thirdparty.
Authored-by: vinodkc <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit c8cb3e7)
Signed-off-by: Hyukjin Kwon <[email protected]>
|
@HyukjinKwon this should go branch-4.0 too, @vinodkc could you please open a 4.0 backport PR? |
…Guava class conflicts in spark 4.0 Backport #52767 to Spark 4.0 branch We upgraded Guava from 14.0.1 to 30+ in spark 4.0 . Guava 33.4.0 used in Spark 4 consists of two main packages: - `com.google.common` - `com.google.thirdparty` Prior to this PR, only the `com.google.common` package was shaded into the spark-network-common jar, while classes under `com.google.thirdparty` remained unshaded in the spark-network-common jar. This partial shading causes classloading conflicts and runtime errors when a downstream project depends on both Spark and its own version of Guava. Eg: calls to guava class `com.google.common.net.InternetDomainName` fails with the following error: ``` Caused by: java.lang.NoSuchFieldError: EXACT at com.google.common.net.InternetDomainName.findSuffixOfType(InternetDomainName.java:226) at com.google.common.net.InternetDomainName.publicSuffixIndex(InternetDomainName.java:185) at com.google.common.net.InternetDomainName.hasPublicSuffix(InternetDomainName.java:400) at com.eadx.Domain$.printDomainInfo(Domain.scala:16) at com.eadx.TestApp$.main(TestApp.scala:16) ``` **Root Cause**: `com.google.common.net.InternetDomainName` uses classes from `com.google.thirdparty.publicsuffix`. The classloader resolves `com.google.common.net.InternetDomainName` from the downstream Guava jar, while `com.google.thirdparty.publicsuffix.PublicSuffixPatterns` is loaded from Spark 4.x Guava classes, leading to binary incompatibility. Example diagnostic: ``` InternetDomainName → guava-32.0.0-jre.jar (target/.../guava-32.0.0-jre.jar) PublicSuffixPatterns → spark-network-common_2.13-4.0.0.jar (target/.../spark-network-common_2.13-4.0.0.jar) ``` ### What changes were proposed in this pull request? This PR ensures package `com.google.thirdparty` is also shaded and isolated under the sparkproject namespace in Spark, preventing downstream class conflicts and runtime errors. ### Why are the changes needed? These changes are necessary to prevent runtime errors and class conflicts for downstream projects that depend on both Spark and Guava by restoring proper isolation of shaded Guava classes in spark ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No new test cases added; used existing UT and IT. ### Was this patch authored or co-authored using generative AI tooling? No Closes #52869 from vinodkc/br_shade_guava_thirdparty_4.0. Authored-by: vinodkc <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
We upgraded Guava from 14.0.1 to 30+ in spark 4.0 . Guava 33.4.0 used in Spark 4 consists of two main packages:
com.google.commoncom.google.thirdpartyPrior to this PR, only the
com.google.commonpackage was shaded into the spark-network-common jar, while classes undercom.google.thirdpartyremained unshaded in the spark-network-common jar. This partial shading causes classloading conflicts and runtime errors when a downstream project depends on both Spark and its own version of Guava.Eg: calls to guava class
com.google.common.net.InternetDomainNamefails with the following error:Root Cause:
com.google.common.net.InternetDomainNameuses classes fromcom.google.thirdparty.publicsuffix.The classloader resolves
com.google.common.net.InternetDomainNamefrom the downstream Guava jar, whilecom.google.thirdparty.publicsuffix.PublicSuffixPatternsis loaded from Spark 4.x Guava classes, leading to binary incompatibility.Example diagnostic:
What changes were proposed in this pull request?
This PR ensures package
com.google.thirdpartyis also shaded and isolated under the sparkproject namespace in Spark, preventing downstream class conflicts and runtime errors.Why are the changes needed?
These changes are necessary to prevent runtime errors and class conflicts for downstream projects that depend on both Spark and Guava by restoring proper isolation of shaded Guava classes in spark
Does this PR introduce any user-facing change?
No
How was this patch tested?
No new test cases added; used existing UT and IT.
Was this patch authored or co-authored using generative AI tooling?
No