@yihua
Contributor

@yihua yihua commented Jan 30, 2023

Change Logs

Hudi CLI commands that require launching Spark cannot be executed in the Hudi CLI shell when using hudi-cli-bundle. Affected commands include:

savepoint create --commit <latest-commit-timestamp> --sparkMaster local
savepoint delete --commit <latest-commit-timestamp> --sparkMaster local
savepoint create --commit <latest-commit-timestamp> --sparkMaster local
downgrade table --toVersion 3 --sparkMaster local
upgrade table --toVersion 5 --sparkMaster local
compaction schedule --hoodieConfigs hoodie.compact.inline.max.delta.commits=1

Sample error message:

30977 [Thread-4] INFO  org.apache.hudi.cli.utils.InputStreamConsumer [] - Error: Failed to load org.apache.hudi.cli.commands.SparkMain: org/apache/hudi/common/engine/HoodieEngineContext

The root cause is twofold: hudi-cli-bundle excludes classes that are already packaged in hudi-spark*-bundle (for example, those from the hudi-common module), and hudi-spark*-bundle is not added to the Spark launcher. As a result, the Spark job fails because the class is not found.

This PR fixes the problem by adding the hudi-spark*-bundle jar specified by the environment variable SPARK_BUNDLE_JAR to the Spark launcher. Note that SPARK_BUNDLE_JAR must be set when using hudi-cli-bundle.
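
To illustrate the idea, here is a minimal, self-contained sketch of resolving the jar list from SPARK_BUNDLE_JAR before handing it to a Spark launcher. This is not the code from this PR: the class name SparkBundleJarResolver, the method jarsForLauncher, and the example paths are hypothetical, and the sketch deliberately avoids any Spark dependency so it can run on its own. In a real launcher one would presumably pass each returned path to something like SparkLauncher.addJar.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Hudi): builds the list of jars that
// a Spark launcher would need when the CLI runs from hudi-cli-bundle.
class SparkBundleJarResolver {
    // Environment variable named in the PR description.
    static final String SPARK_BUNDLE_JAR_ENV = "SPARK_BUNDLE_JAR";

    // Returns the CLI jar followed by the Spark bundle jar taken from env.
    // Fails fast when the variable is unset or blank, since the Spark job
    // would otherwise fail later because a class is not found.
    static List<String> jarsForLauncher(Map<String, String> env, String cliJar) {
        String bundle = env.get(SPARK_BUNDLE_JAR_ENV);
        if (bundle == null || bundle.trim().isEmpty()) {
            throw new IllegalStateException(
                SPARK_BUNDLE_JAR_ENV + " must be set when using hudi-cli-bundle");
        }
        List<String> jars = new ArrayList<>();
        jars.add(cliJar);
        jars.add(bundle.trim());
        return jars;
    }

    public static void main(String[] args) {
        // Example paths are placeholders, not real artifact names.
        Map<String, String> env = new HashMap<>();
        env.put(SPARK_BUNDLE_JAR_ENV, "/path/to/hudi-spark-bundle.jar");
        for (String jar : jarsForLauncher(env, "/path/to/hudi-cli-bundle.jar")) {
            // Assumption: a real launcher would register each jar here,
            // for example via SparkLauncher.addJar(jar).
            System.out.println("would add jar: " + jar);
        }
    }
}
```

Failing fast on a missing variable is a design choice of this sketch; it surfaces the misconfiguration in the CLI shell rather than inside the launched Spark process.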

Impact

Ensures that Hudi CLI commands which require launching Spark can be executed with hudi-cli-bundle. The CLI commands listed above were tested locally and work with this fix.

Risk level

low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@yihua yihua added the cli and priority:blocker (Production down; release blocker) labels Jan 30, 2023
@yihua
Contributor Author

yihua commented Jan 30, 2023

@rahil-c could you review this?

@nsivabalan nsivabalan merged commit 22eab39 into apache:master Jan 30, 2023
yihua added a commit that referenced this pull request Jan 30, 2023
)

- Ensures that Hudi CLI commands which require launching Spark can be executed with hudi-cli-bundle
@rahil-c
Collaborator

rahil-c commented Jan 30, 2023

Thanks for helping with this @yihua

fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Jan 31, 2023
…ache#7790)

- Ensures that Hudi CLI commands which require launching Spark can be executed with hudi-cli-bundle
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
…ache#7790)

- Ensures that Hudi CLI commands which require launching Spark can be executed with hudi-cli-bundle