@vinodkc
Contributor

@vinodkc vinodkc commented Oct 28, 2025

We upgraded Guava from 14.0.1 to 30+ in Spark 4.0. Guava 33.4.0, the version used in Spark 4, consists of two main packages:

  • `com.google.common`
  • `com.google.thirdparty`

Prior to this PR, only the `com.google.common` package was shaded (relocated) in the spark-network-common jar; classes under `com.google.thirdparty` were bundled into the same jar without relocation. This partial shading causes classloading conflicts and runtime errors when a downstream project depends on both Spark and its own version of Guava.

For example, calls to the Guava class `com.google.common.net.InternetDomainName` fail with the following error:

```
Caused by: java.lang.NoSuchFieldError: EXACT
        at com.google.common.net.InternetDomainName.findSuffixOfType(InternetDomainName.java:226)
        at com.google.common.net.InternetDomainName.publicSuffixIndex(InternetDomainName.java:185)
        at com.google.common.net.InternetDomainName.hasPublicSuffix(InternetDomainName.java:400)
        at com.eadx.Domain$.printDomainInfo(Domain.scala:16)
        at com.eadx.TestApp$.main(TestApp.scala:16)
```

Root Cause:
`com.google.common.net.InternetDomainName` uses classes from `com.google.thirdparty.publicsuffix`.
The classloader resolves `com.google.common.net.InternetDomainName` from the downstream Guava jar, while `com.google.thirdparty.publicsuffix.PublicSuffixPatterns` is loaded from the unrelocated copy bundled in Spark 4.x's spark-network-common jar. The two classes come from different Guava builds, which leads to binary incompatibility.

Example diagnostic:

```
InternetDomainName → guava-32.0.0-jre.jar
(target/.../guava-32.0.0-jre.jar)

PublicSuffixPatterns → spark-network-common_2.13-4.0.0.jar
(target/.../spark-network-common_2.13-4.0.0.jar)
```
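A diagnostic like the one above can be produced by asking the JVM which jar each class was resolved from. The following is a minimal sketch, assuming it runs on the affected application's classpath (downstream Guava plus Spark); it uses only JDK APIs (`Class.forName`, `Class.getProtectionDomain`, `ProtectionDomain.getCodeSource`), and the class name `ClassOrigin` and its `locate` helper are hypothetical, not part of Spark or Guava.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Spark or Guava): reports the jar or
// directory from which this class's loader resolves a given class name.
public final class ClassOrigin {

    private ClassOrigin() {}

    /** Returns the code-source location of the named class, or a marker string. */
    public static String locate(String className) {
        try {
            // initialize = false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // JDK classes loaded by the bootstrap loader report no code source.
                return "<no code source>";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or its superclass/interfaces could not be linked.
            return "<not loadable: " + e.getClass().getSimpleName() + ">";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "com.google.common.net.InternetDomainName",
            "com.google.thirdparty.publicsuffix.PublicSuffixPatterns"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the two printed locations point at different jars, as in the diagnostic above, the classes were resolved from mismatched Guava copies.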

What changes were proposed in this pull request?

This PR also shades the `com.google.thirdparty` package, relocating it under the sparkproject namespace in Spark just like `com.google.common`, which prevents downstream class conflicts and runtime errors.
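To illustrate why a relocation rule for `com.google.common` alone left `com.google.thirdparty` exposed, here is a toy model of prefix-based package relocation in the style of the Maven Shade Plugin. The rule targets (`org.sparkproject.guava`, `org.sparkproject.guava.thirdparty`) are illustrative assumptions, not copied from Spark's `pom.xml`, and `RelocationModel` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of shade-style package relocation. The rule targets used in main()
// are illustrative assumptions, not Spark's actual build configuration.
public final class RelocationModel {

    private RelocationModel() {}

    /** Applies the first matching package-prefix rule to a fully qualified class name. */
    public static String relocate(String className, Map<String, String> rules) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            // Match whole package segments, so "com.google.commonx" is not caught by "com.google.common".
            String prefix = rule.getKey() + ".";
            if (className.startsWith(prefix)) {
                return rule.getValue() + "." + className.substring(prefix.length());
            }
        }
        return className; // no rule matched: the class keeps its original, collision-prone name
    }

    public static void main(String[] args) {
        // Before the fix (modeled): only com.google.common is relocated.
        Map<String, String> before = new LinkedHashMap<>();
        before.put("com.google.common", "org.sparkproject.guava");

        // After the fix (modeled): com.google.thirdparty is relocated too.
        Map<String, String> after = new LinkedHashMap<>(before);
        after.put("com.google.thirdparty", "org.sparkproject.guava.thirdparty");

        String cls = "com.google.thirdparty.publicsuffix.PublicSuffixPatterns";
        System.out.println("before: " + relocate(cls, before));
        System.out.println("after:  " + relocate(cls, after));
    }
}
```

With only the `com.google.common` rule, `PublicSuffixPatterns` keeps its original name and can be picked up in place of a downstream Guava's copy; adding a `com.google.thirdparty` rule moves it into Spark's private namespace. The real shade plugin also rewrites bytecode references to relocated classes, which this sketch does not model.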

Why are the changes needed?

These changes restore proper isolation of the shaded Guava classes in Spark, which prevents runtime errors and class conflicts in downstream projects that depend on both Spark and Guava.

Does this PR introduce any user-facing change?

No

How was this patch tested?

No new test cases were added; the existing unit tests (UT) and integration tests (IT) were used.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the BUILD label Oct 28, 2025
@vinodkc vinodkc force-pushed the br_shade_guava_thirdparty branch from ed41206 to 17f1274 October 28, 2025 18:47
Member

@pan3793 pan3793 left a comment

This is a good catch! I checked the jar, and `com.google.thirdparty` is the only package left unrelocated.

```
$ jar -tf spark-network-common_2.13-4.1.0-preview3.jar | grep -v 'org/apache/spark' | grep -v 'org/sparkproject' | grep -v 'META-INF'
org/
org/apache/
com/
com/google/
com/google/thirdparty/
com/google/thirdparty/publicsuffix/
com/google/thirdparty/publicsuffix/PublicSuffixPatterns.class
com/google/thirdparty/publicsuffix/PublicSuffixType.class
com/google/thirdparty/publicsuffix/TrieParser.class
```
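The same jar check can be scripted in Java, for example as a build-time audit. This sketch mirrors the `grep -v` pipeline above using `java.util.jar.JarFile`; the class name `UnrelocatedEntries` is hypothetical, and the exclusion list is copied from the command rather than from any Spark build rule.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical audit mirroring:
//   jar -tf <jar> | grep -v 'org/apache/spark' | grep -v 'org/sparkproject' | grep -v 'META-INF'
public final class UnrelocatedEntries {

    // grep -v drops every line that *contains* a pattern, so contains() mirrors it.
    private static final String[] EXCLUDED = {"org/apache/spark", "org/sparkproject", "META-INF"};

    private UnrelocatedEntries() {}

    /** Keeps only entry names that contain none of the excluded substrings. */
    public static List<String> filter(Iterable<String> entryNames) {
        List<String> kept = new ArrayList<>();
        for (String name : entryNames) {
            boolean excluded = false;
            for (String pattern : EXCLUDED) {
                if (name.contains(pattern)) {
                    excluded = true;
                    break;
                }
            }
            if (!excluded) {
                kept.add(name);
            }
        }
        return kept;
    }

    /** Reads every entry name from a jar on disk, then filters them. */
    public static List<String> scan(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                names.add(entry.getName());
            }
        }
        return filter(names);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java UnrelocatedEntries <path-to-jar>");
            return;
        }
        for (String name : scan(args[0])) {
            System.out.println(name);
        }
    }
}
```

Like the shell pipeline, it also keeps parent directory entries such as `org/` and `org/apache/`, since those names do not contain any excluded substring.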

@pan3793
Member

pan3793 commented Oct 29, 2025

cc @LuciferYang

@cloud-fan
Contributor

Is this a regression in the Spark master branch, or is it a long-standing issue?

@pan3793
Member

pan3793 commented Oct 29, 2025

@cloud-fan it's a regression since 4.0, because we upgraded Guava from 14.0.1 to 30+ in 4.0

@vinodkc
Contributor Author

vinodkc commented Oct 29, 2025

@cloud-fan, @pan3793, can we backport this change to 4.0.0 as well?
If so, should I create a separate follow-up PR?

@vrozov
Member

vrozov commented Oct 29, 2025

Any changes necessary to SBT build?

@pan3793
Member

pan3793 commented Oct 30, 2025

> Any changes necessary to SBT build?

@vrozov IIRC, the current sbt building script does not process shading and relocation properly.

@LuciferYang
Contributor

LuciferYang commented Oct 30, 2025

@vinodkc Please update the PR title to explain more clearly what this PR is intended to do. The current title is somewhat misleading: it led me to assume that after this PR, the spark-network-common module would no longer shade Guava. Thanks

@vrozov
Member

vrozov commented Oct 30, 2025

> @vrozov IIRC, the current sbt building script does not process shading and relocation properly.

@pan3793 Is there a JIRA that explains what is wrong with shading in SBT build?

@pan3793
Member

pan3793 commented Oct 30, 2025

> > @vrozov IIRC, the current sbt building script does not process shading and relocation properly.
>
> @pan3793 Is there a JIRA that explains what is wrong with shading in SBT build?

@vrozov I don't think so. So far, sbt is used for development only, so developers tend to make improvements or fixes only when something goes wrong, e.g., when wrong dependency version resolution causes a CI failure.

@vinodkc vinodkc changed the title [SPARK-54049][BUILD] Spark-network-common no longer shades all of Guava [SPARK-54049][BUILD] Shade com.google.thirdparty Package to Fix Guava Class Conflicts in Spark 4.0 Oct 30, 2025
@vinodkc vinodkc changed the title [SPARK-54049][BUILD] Shade com.google.thirdparty Package to Fix Guava Class Conflicts in Spark 4.0 [SPARK-54049][BUILD] Shade com.google.thirdparty package to fix Guava class conflicts in spark 4.0 Oct 30, 2025
@vinodkc
Contributor Author

vinodkc commented Nov 1, 2025

@LuciferYang, I've updated the PR title and description.
Thanks

@HyukjinKwon
Member

Merged to master and branch-4.1.

HyukjinKwon pushed a commit that referenced this pull request Nov 4, 2025
… class conflicts in spark 4.0

Closes #52767 from vinodkc/br_shade_guava_thirdparty.

Authored-by: vinodkc <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit c8cb3e7)
Signed-off-by: Hyukjin Kwon <[email protected]>
@pan3793
Member

pan3793 commented Nov 4, 2025

@HyukjinKwon this should go to branch-4.0 too. @vinodkc, could you please open a 4.0 backport PR?

HyukjinKwon pushed a commit that referenced this pull request Nov 4, 2025
…Guava class conflicts in spark 4.0

Backport #52767 to Spark 4.0 branch

Closes #52869 from vinodkc/br_shade_guava_thirdparty_4.0.

Authored-by: vinodkc <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>