[HUDI-3383] Sync column comments while syncing a hive table, especially using spark datasource api #4960

MrSleeping123 · 2022-03-06T03:13:19Z

What is the purpose of the pull request

This PR syncs the source table's column comments to the Hive table during Hudi-to-Hive sync, when users add column comments to the datasource schema.

Brief change log

Add a Hive sync config, hoodie.datasource.hive_sync.sync_comment, which is false by default.
While syncing a data source to Hudi, carry the table's column comments into the data source's Avro schema; if sync_comment is true, sync those column comments to the Hive table as well.
(For example, using the Spark datasource:)
StructType schema = new StructType().add("key", "string", false, "comment");
sparkSession.createDataFrame(rdd, schema)
.write().format("org.apache.hudi")
......
.option("hoodie.datasource.hive_sync.sync_comment", "true")
......
.save("/xxxx");
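To make the effect of the flag concrete, here is a minimal, dependency-free sketch of the kind of decision the sync step has to make: given the comments carried by the source schema and the comments already on the Hive table, emit DDL only for columns whose comment differs, and only when sync_comment is enabled. This is not the code in this PR. The class name `CommentSyncSketch`, the method `buildCommentDdl`, and the choice to skip columns that have no source comment are all assumptions made for illustration; only the HiveQL `ALTER TABLE ... CHANGE COLUMN ... COMMENT` statement form is standard Hive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the implementation merged in this PR.
public class CommentSyncSketch {

    // Quote a comment as a HiveQL string literal, escaping backslashes and single quotes.
    static String quote(String comment) {
        return "'" + comment.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Build one ALTER TABLE statement per column whose source comment is
    // non-empty and differs from the comment currently stored in Hive.
    // Returns an empty list when syncComment is false (the default).
    static List<String> buildCommentDdl(String table,
                                        Map<String, String> columnTypes,
                                        Map<String, String> sourceComments,
                                        Map<String, String> hiveComments,
                                        boolean syncComment) {
        List<String> ddl = new ArrayList<>();
        if (!syncComment) {
            return ddl;
        }
        for (Map.Entry<String, String> col : columnTypes.entrySet()) {
            String name = col.getKey();
            String wanted = sourceComments.getOrDefault(name, "");
            String current = hiveComments.getOrDefault(name, "");
            // Assumption: columns without a source comment are left untouched.
            if (wanted.isEmpty() || wanted.equals(current)) {
                continue;
            }
            ddl.add("ALTER TABLE " + table + " CHANGE COLUMN `" + name + "` `" + name + "` "
                    + col.getValue() + " COMMENT " + quote(wanted));
        }
        return ddl;
    }

    public static void main(String[] args) {
        Map<String, String> types = new LinkedHashMap<>();
        types.put("key", "string");
        types.put("ts", "bigint");
        Map<String, String> source = new LinkedHashMap<>();
        source.put("key", "comment");
        Map<String, String> hive = new LinkedHashMap<>();
        for (String stmt : buildCommentDdl("hudi_tbl", types, source, hive, true)) {
            System.out.println(stmt);
        }
    }
}
```

With the flag left at false the method returns no statements, which mirrors the default-off behaviour described in the change log.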

Verify this pull request

Run TestHiveSyncTool#testUpdateTableComments and TestHiveSyncTool#testSyncWithCommentedSchema successfully.

This pull request is a trivial rework / code cleanup without any test coverage.

Committer checklist

Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

Desc: Add a Hive sync config (hoodie.datasource.hive_sync.sync_comment), which defaults to false. While syncing a data source to Hudi, add column comments to the source Avro schema and, if sync_comment is true, sync the column comments to the Hive table.

MrSleeping123 · 2022-03-06T03:23:10Z

@nsivabalan
@xiarixiaoyao
Please help me review this PR again. I have updated my branch to the latest master and modified the code according to the suggestions. Thanks for your patient guidance.

xiarixiaoyao · 2022-03-06T08:02:53Z

@hudi-bot run azure

hudi-bot · 2022-03-06T10:15:41Z

CI report:

9185fb4 Azure: FAILURE Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

MrSleeping123 · 2022-03-06T10:20:49Z

@xiarixiaoyao the CI run has succeeded. Please help me review again when you are free. Thank you very much.

xiarixiaoyao · 2022-03-06T12:24:07Z

@MrSleeping123 thanks, I will review it tomorrow.

xiarixiaoyao · 2022-03-07T03:40:01Z

LGTM

xiarixiaoyao · 2022-03-07T03:44:00Z

@nsivabalan if you have free time, could you please review again? Thanks.

xiarixiaoyao · 2022-03-09T01:36:54Z

@MrSleeping123 thanks for your contribution, will merge it tomorrow.

MrSleeping123 · 2022-03-10T00:16:27Z

@MrSleeping123 thanks for your contribution, will merge it tomorrow.

@xiarixiaoyao thank you very much.

Merge branch 'apache:master' into master

9185fb4

xiarixiaoyao self-assigned this Mar 6, 2022

xiarixiaoyao approved these changes Mar 7, 2022

xiarixiaoyao merged commit 8859b48 into apache:master Mar 10, 2022

nsivabalan mentioned this pull request Apr 27, 2022

[SUPPORT] Hudi don't propagate column comments into hive metastore / parquet files #5363

Closed

parisni mentioned this pull request Jul 29, 2023

[HUDI-5533] Support spark columns comments #8683

Closed

4 tasks
