alexeykudinkin (Contributor) commented Feb 8, 2023

Change Logs

Removes the following relocation config, which was left stranded in the MR and Spark bundles after Guava was removed from them:

<relocation>
  <pattern>com.google.common.</pattern>
  <shadedPattern>org.apache.hudi.com.google.common.</shadedPattern>
</relocation> 

With this relocation in place, every Guava reference in any class included in the Hudi bundle was rewritten to the shaded package, even though Hudi no longer packages Guava. Such classes could then fail when they tried to access the Guava provided by Spark, for example:

Caused by: java.lang.NoClassDefFoundError: org/apache/hudi/com/google/common/base/Preconditions
	at org.apache.curator.ensemble.fixed.FixedEnsembleProvider.<init>(FixedEnsembleProvider.java:39)
	at org.apache.curator.framework.CuratorFrameworkFactory$Builder.connectString(CuratorFrameworkFactory.java:193)
	at org.apache.kyuubi.ha.client.zookeeper.ZookeeperClientProvider$.buildZookeeperClient(ZookeeperClientProvider.scala:62)
	at org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient.<init>(ZookeeperDiscoveryClient.scala:65)
	... 45 more 

Impact

See above

Risk level (write none, low medium or high below)

Low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@alexeykudinkin changed the title from "[MINOR] Cleaning up unnecessary relocation for com.google.common packages" to "[HUDI-5731] Cleaning up unnecessary relocation for com.google.common packages" Feb 8, 2023
@alexeykudinkin added the labels dependencies (Dependency updates) and priority:critical (Production degraded; pipelines stalled) Feb 8, 2023
pan3793 (Member) commented Feb 9, 2023

Thanks for fixing this issue. I think curator should be relocated or removed as well.

The issue shows up in the Kyuubi integration tests because:

  1. The Kyuubi engine (a Spark job) invokes curator to access ZooKeeper.
  2. During testing, the Kyuubi engine loads the vanilla curator classes from the Maven test-scope classpath, which includes both the vanilla curator jars and the Hudi bundle jars.
  3. Since the Hudi bundle jar contains vanilla curator classes, the Kyuubi engine may load them from there.
  4. In production workloads, Kyuubi uses the shaded and relocated curator classes (shipped in the Kyuubi fat jar), so there is no problem.

Given the above, I think the Hudi bundle jar should either exclude curator, because the Spark binary distribution already ships it, or relocate it to avoid class conflicts.

At a quick glance, rocksdb and some other libraries are in the same position as curator; please consider excluding or relocating them to avoid potential class conflicts.
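
For curator, the two options (excluding or relocating) might look roughly like the fragment below in a bundle's maven-shade-plugin configuration. This is only a sketch, not the actual Hudi bundle config: the shaded package name `org.apache.hudi.org.apache.curator.` is an assumption that mirrors the naming of the removed Guava relocation, and a bundle that lists its artifacts via `<includes>` would drop the curator entries there instead of adding an `<excludes>`.

```xml
<!-- Option 1: keep curator out of the bundle entirely
     (the Spark binary distribution already ships it) -->
<artifactSet>
  <excludes>
    <exclude>org.apache.curator:*</exclude>
  </excludes>
</artifactSet>

<!-- Option 2: keep curator in the bundle, but under a relocated package.
     The shadedPattern below is a hypothetical name chosen for illustration. -->
<relocations>
  <relocation>
    <pattern>org.apache.curator.</pattern>
    <shadedPattern>org.apache.hudi.org.apache.curator.</shadedPattern>
  </relocation>
</relocations>
```

Either way, the relocation is only safe when the relocated classes are actually packaged in the jar; a relocation without the classes is what caused the `NoClassDefFoundError` this PR fixes.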

@alexeykudinkin alexeykudinkin merged commit 60dfe4d into apache:master Feb 9, 2023
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
…ache#7900)
