@yihua
Contributor

@yihua yihua commented Mar 20, 2022

What is the purpose of the pull request

For Deltastreamer and the Spark datasource, SimpleKeyGenerator is used by default when the user does not configure a key generator class. However, the table upgrade from v2 to v3 still expects an explicit key generator class config for Spark and fails if it is not set. This PR sets the key generator class to org.apache.hudi.keygen.SimpleKeyGenerator for the Spark client if it is not set in the write configs, so that the table upgrade succeeds instead of failing with the following validation error:

22/03/14 12:28:10 ERROR HoodieDeltaStreamer: Shutting down delta-sync due to exception
java.lang.IllegalStateException: Missing config: Key: 'hoodie.table.keygenerator.class' , default: null description: Key Generator class property for the hoodie table since version: version is not defined deprecated after: version is not defined) or Key: 'hoodie.datasource.write.keygenerator.class' , default: null description: Key generator class, that implements `org.apache.hudi.keygen.KeyGenerator` extract a key out of incoming records. since version: version is not defined deprecated after: version is not defined)
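The fallback described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code from this PR: the class KeyGenDefaultingSketch, the Engine enum, and the method resolveKeyGenClass are hypothetical, and java.util.Properties stands in for Hudi's write config. Only the two config keys and the SimpleKeyGenerator class name come from the description and error message above.

```java
import java.util.Properties;

/**
 * Hypothetical sketch (NOT Hudi's actual TwoToThreeUpgradeHandler code):
 * resolve the key generator class during a v2 -> v3 upgrade, defaulting to
 * SimpleKeyGenerator for the Spark client when neither config key is set.
 */
public class KeyGenDefaultingSketch {

  // Config keys and default class name taken from the PR description / error message.
  static final String TABLE_KEYGEN_KEY = "hoodie.table.keygenerator.class";
  static final String WRITE_KEYGEN_KEY = "hoodie.datasource.write.keygenerator.class";
  static final String SIMPLE_KEYGEN = "org.apache.hudi.keygen.SimpleKeyGenerator";

  // Hypothetical stand-in for the engine type of the write client.
  enum Engine { SPARK, FLINK, JAVA }

  static String resolveKeyGenClass(Properties writeProps, Engine engine) {
    // Prefer the table-level key, then the datasource write key.
    String keyGen = writeProps.getProperty(TABLE_KEYGEN_KEY);
    if (keyGen == null) {
      keyGen = writeProps.getProperty(WRITE_KEYGEN_KEY);
    }
    // Assumption: only the Spark client gets the implicit default,
    // matching the Deltastreamer / Spark datasource behavior.
    if (keyGen == null && engine == Engine.SPARK) {
      keyGen = SIMPLE_KEYGEN;
    }
    // Other clients still fail validation when nothing is configured.
    if (keyGen == null) {
      throw new IllegalStateException(
          "Missing config: " + TABLE_KEYGEN_KEY + " or " + WRITE_KEYGEN_KEY);
    }
    return keyGen;
  }

  public static void main(String[] args) {
    Properties empty = new Properties();
    // Spark client with no key generator configured falls back to the default.
    System.out.println(resolveKeyGenClass(empty, Engine.SPARK));

    Properties explicit = new Properties();
    explicit.setProperty(WRITE_KEYGEN_KEY, "org.apache.hudi.keygen.ComplexKeyGenerator");
    // An explicitly configured class takes precedence over the default.
    System.out.println(resolveKeyGenClass(explicit, Engine.SPARK));

    try {
      resolveKeyGenClass(empty, Engine.FLINK);
    } catch (IllegalStateException e) {
      System.out.println("non-Spark client still fails: " + e.getMessage());
    }
  }
}
```

The key design point is that the default applies only when no key generator is configured at all, so a user's explicit choice always wins and the upgrade never silently changes an existing setting.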

Brief change log

  • Adds logic to set the default key generator class for the Spark client in TwoToThreeUpgradeHandler
  • Adds tests in TestTwoToThreeUpgradeHandler (hudi-client-common) and TestUpgradeDowngrade (hudi-spark-client)

Verify this pull request

This change adds tests in TestTwoToThreeUpgradeHandler (hudi-client-common) and TestUpgradeDowngrade (hudi-spark-client) for the Spark client to verify that the upgrade logic behaves as expected.

I also tested this PR by first writing a MOR table with the Hudi 0.9.0 Deltastreamer without setting hoodie.datasource.write.keygenerator.class, and then using the build from this PR to continue writing to the same table with the same configs. Ingestion continues and the table upgrade succeeds (before this change, the table upgrade failed).

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@yihua yihua force-pushed the HUDI-3640-2to3-key-gen branch from c4ad59e to fff3082 March 20, 2022 06:58
@yihua yihua force-pushed the HUDI-3640-2to3-key-gen branch from fff3082 to bd25c00 March 20, 2022 07:01
@nsivabalan nsivabalan self-assigned this Mar 20, 2022
@nsivabalan nsivabalan added the priority:blocker Production down; release blocker label Mar 20, 2022
@nsivabalan nsivabalan merged commit 9b6e138 into apache:master Mar 22, 2022
vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
stayrascal pushed a commit to stayrascal/hudi that referenced this pull request Apr 12, 2022