
@codope
Member

@codope codope commented Jan 28, 2023

Change Logs

After a70355f, kryo was explicitly shaded in a few bundles, but hudi-hive-sync-bundle was missed. As a result, hive sync using run_sync_tool would fail with java.lang.NoClassDefFoundError: com/esotericsoftware/kryo/KryoSerializable. This PR fixes that. Note that kryo-shaded has to be added explicitly in the bundle pom because the parent pom declares it in provided scope.
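
For anyone who wants to reproduce the symptom without running the whole sync tool, here is a minimal sketch of a classpath probe. `KryoClasspathProbe` is a hypothetical helper and not part of Hudi; it only checks whether a class name resolves on the current classpath, which is the condition behind the NoClassDefFoundError above.

```java
// Hypothetical helper (not part of Hudi): reports whether a class can be
// resolved on the classpath this program was started with.
final class KryoClasspathProbe {

    static boolean isLoadable(String className) {
        try {
            // initialize=false: resolve the class only, do not run static initializers.
            Class.forName(className, false, KryoClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or something it links against) is missing from the classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the error message; pass another name to probe it.
        String name = args.length > 0 ? args[0] : "com.esotericsoftware.kryo.KryoSerializable";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Running it with only the bundle jar on the classpath should print `false` for a bundle that does not package the class. If a bundle relocates kryo under another package, the relocated name would have to be passed instead.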

Impact

Fix hive sync.

Risk level (write none, low medium or high below)

low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@codope codope added component:catalog-sync Catalog-sync related priority:blocker Production down; release blocker labels Jan 28, 2023
@apache apache deleted a comment from hudi-bot Jan 28, 2023
@apache apache deleted a comment from hudi-bot Jan 29, 2023
@apache apache deleted a comment from hudi-bot Jan 29, 2023
@apache apache deleted a comment from hudi-bot Jan 29, 2023

@xushiyan
Member

xushiyan commented Jan 30, 2023

@alexeykudinkin @codope don't we want to include kryo in the aws, gcp, datahub-sync, and timeline-server bundles too? All of these can run as standalone apps. In #7702, we mainly wanted to exclude kryo from the spark and utilities bundles, right?

@codope
Member Author

codope commented Jan 30, 2023

Once again, the failing test is the deltastreamer multi-writer test testUpsertsContinuousModeWithMultipleWritersForConflicts. It is known to be flaky and is under investigation, so I am going to land this PR without spending another CI cycle.

@codope codope merged commit e6c0bd6 into apache:master Jan 30, 2023
@codope
Member Author

codope commented Jan 30, 2023

> @alexeykudinkin @codope don't we want to include kryo in the aws, gcp, datahub-sync, and timeline-server bundles too? All of these can run as standalone apps. In #7702, we mainly wanted to exclude kryo from the spark and utilities bundles, right?

I verified hudi-aws-bundle by running glue sync in standalone mode. I was able to run the sync and query the table through Athena. This suggests that AwsGlueCatalogSyncTool does not touch HoodieRecord, which is where KryoSerializable is needed. That said, we should test the other bundles and put up a fix if needed.
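
As a quick first pass when testing the other bundles, something like the sketch below could check whether a given class is packaged inside a bundle jar at all. `BundleJarScanner` is a hypothetical helper, not part of Hudi, and the jar path and class name are supplied by the caller. It does not replace running the sync tool end to end, since a bundle may never touch the class at runtime.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Hudi): checks whether a jar packages a class.
final class BundleJarScanner {

    static boolean containsClass(String jarPath, String className) throws IOException {
        // Class files are stored in a jar under their slash-separated binary name.
        String entry = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entry) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: BundleJarScanner <path-to-bundle-jar> <fully.qualified.ClassName>");
            return;
        }
        System.out.println(args[1] + " packaged: " + containsClass(args[0], args[1]));
    }
}
```

If a bundle relocates kryo under a different package prefix, the relocated class name is the one to look for.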

Screenshot 2023-01-30 at 11 15 57 AM

@alexeykudinkin
Contributor

@xushiyan we should not add it to bundles that are mixed in with bundles that do NOT shade it (Spark)

fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Jan 31, 2023
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
