[HUDI-3383] Sync column comments while syncing a hive table, especially using spark datasource api #4960
Desc: Add a Hive sync config (hoodie.datasource.hive_sync.sync_comment), which defaults to false. When a data source is written to Hudi, its column comments are added to the source Avro schema; if sync_comment is true, those column comments are then synced to the Hive table.
@nsivabalan
@hudi-bot run azure
@xiarixiaoyao the CI run has succeeded. Please help me review again when you are free. Thank you very much.
@MrSleeping123 thanks, I will review it tomorrow.
LGTM
@nsivabalan if you have free time, could you please review again? Thanks.
@MrSleeping123 thanks for your contribution, will merge it tomorrow.
@xiarixiaoyao thank you very much.
What is the purpose of the pull request
This PR syncs column comments from the source table to the Hive table during Hudi-to-Hive sync, for cases where users have added column comments to the datasource schema.
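To make the intent concrete, here is a minimal sketch of the comparison step such a sync might perform: given the comments declared in the source schema and the comments currently stored on the Hive table, work out which columns need their comment updated. This is not the code in this PR; the class `CommentSyncSketch` and the method `commentsToUpdate` are hypothetical names used only for illustration, and the sketch assumes Hive column names are already lowercased.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; not part of Hudi or of this PR.
public class CommentSyncSketch {

    // Returns column -> new comment for every column that already exists in
    // the Hive table and whose source comment differs from the Hive comment.
    static Map<String, String> commentsToUpdate(Map<String, String> sourceComments,
                                                Map<String, String> hiveComments) {
        Map<String, String> updates = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : sourceComments.entrySet()) {
            // Assumption: Hive keys are lowercase, so normalize the source name.
            String column = entry.getKey().toLowerCase(Locale.ROOT);
            String wanted = entry.getValue();
            if (wanted == null || !hiveComments.containsKey(column)) {
                continue; // nothing to sync for this column
            }
            if (!Objects.equals(wanted, hiveComments.get(column))) {
                updates.put(column, wanted);
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("key", "comment");
        source.put("ts", "event time");

        Map<String, String> hive = new LinkedHashMap<>();
        hive.put("key", "");
        hive.put("ts", "event time");

        // Only "key" differs, so only "key" is reported.
        System.out.println(commentsToUpdate(source, hive));
    }
}
```

Computing a diff first, rather than rewriting every column comment, keeps the number of metastore updates proportional to what actually changed; whether the real sync tool does this is an assumption of the sketch.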
Brief change log
(for example: using spark datasource)
StructType schema = new StructType().add("key", "string", false, "comment");
sparkSession.createDataFrame(rdd, schema).write().format("org.apache.hudi").......option("hoodie.datasource.hive_sync.sync_comment","true").......save("/xxxx");
Verify this pull request
Run TestHiveSyncTool#testUpdateTableComments and TestHiveSyncTool#testSyncWithCommentedSchema successfully.
This pull request is a trivial rework / code cleanup without any test coverage.
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.