@MrSleeping123
Contributor

What is the purpose of the pull request

This PR syncs source table column comments to the Hive table during Hudi-to-Hive sync, for cases where users add column comments to the datasource schema.

Brief change log

  • Add a Hive sync config, hoodie.datasource.hive_sync.sync_comment. This config is false by default.
  • When writing a datasource to Hudi, add the table column comments to the datasource Avro schema; if sync_comment is true, also sync the column comments to the Hive table.
    (For example, using the Spark datasource:)
    StructType schema = new StructType().add("key", "string", false, "comment");
    sparkSession.createDataFrame(rdd, schema)
    .write().format("org.apache.hudi")
    ......
    .option("hoodie.datasource.hive_sync.sync_comment","true")
    ......
    .save("/xxxx");
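To illustrate the behaviour described in the change log, here is a minimal, self-contained sketch of how a sync step might turn column comments into Hive DDL when the flag is on. It is not the code from this PR: the class CommentSyncSketch, the Column holder, and the buildCommentDdl method are hypothetical names made up for illustration. Only the config key and its false default come from the description above, and the statement shape follows Hive's standard ALTER TABLE ... CHANGE COLUMN ... COMMENT syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CommentSyncSketch {
    // Config key named in this PR; the description says it defaults to false.
    public static final String SYNC_COMMENT_KEY = "hoodie.datasource.hive_sync.sync_comment";

    /** Hypothetical holder for one column: name, Hive type, and an optional comment. */
    public static final class Column {
        final String name;
        final String hiveType;
        final String comment; // may be null when the source schema has no comment

        public Column(String name, String hiveType, String comment) {
            this.name = name;
            this.hiveType = hiveType;
            this.comment = comment;
        }
    }

    /** Reads the flag, treating a missing value as false. */
    public static boolean syncCommentEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(SYNC_COMMENT_KEY, "false"));
    }

    /**
     * Builds one ALTER TABLE statement per column whose comment is set and
     * differs from the comment already present in Hive. Returns an empty
     * list when the flag is off.
     */
    public static List<String> buildCommentDdl(String table, List<Column> columns,
                                               Map<String, String> existingComments,
                                               Properties props) {
        List<String> ddl = new ArrayList<>();
        if (!syncCommentEnabled(props)) {
            return ddl;
        }
        for (Column c : columns) {
            if (c.comment == null || c.comment.isEmpty()) {
                continue; // nothing to sync for this column
            }
            if (c.comment.equals(existingComments.get(c.name))) {
                continue; // Hive already has this comment
            }
            // Assumption: backslash-escaping is enough for the comment literal.
            String escaped = c.comment.replace("\\", "\\\\").replace("'", "\\'");
            ddl.add("ALTER TABLE `" + table + "` CHANGE COLUMN `" + c.name + "` `"
                    + c.name + "` " + c.hiveType + " COMMENT '" + escaped + "'");
        }
        return ddl;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(SYNC_COMMENT_KEY, "true");
        List<Column> columns = Arrays.asList(
                new Column("key", "string", "comment"),
                new Column("ts", "bigint", null));
        for (String stmt : buildCommentDdl("demo", columns, new HashMap<>(), props)) {
            System.out.println(stmt);
        }
    }
}
```

With the flag left unset, buildCommentDdl returns an empty list, which mirrors the opt-in default described above. Comparing against the comments already in Hive keeps repeated syncs from re-issuing identical statements; whether the actual change does the same is not stated in this conversation.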

Verify this pull request

Run TestHiveSyncTool#testUpdateTableComments and TestHiveSyncTool#testSyncWithCommentedSchema successfully.

This pull request is a trivial rework / code cleanup without any test coverage.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

Desc: Add a Hive sync config, hoodie.datasource.hive_sync.sync_comment. This config defaults to false.
When writing a datasource to Hudi, add column comments to the source Avro schema; if sync_comment is true, also sync the column comments to the Hive table.
@MrSleeping123
Contributor Author

@nsivabalan
@xiarixiaoyao
Please help me review this PR again. I have updated my branch to the latest master and modified the code according to the suggestions. Thanks for your patient guidance.

@xiarixiaoyao
Contributor

@hudi-bot run azure

@hudi-bot
Collaborator

hudi-bot commented Mar 6, 2022

CI report:

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@MrSleeping123
Contributor Author

@xiarixiaoyao the CI run has passed. Please help me review again when you are free. Thank you very much.

@xiarixiaoyao
Contributor

@MrSleeping123 thanks, I will review it tomorrow.

@xiarixiaoyao self-assigned this on Mar 6, 2022
@xiarixiaoyao
Contributor

LGTM

@xiarixiaoyao
Contributor

@nsivabalan if you have free time, could you please review again? Thanks.

@xiarixiaoyao
Contributor

@MrSleeping123 thanks for your contribution, will merge it tomorrow.

@MrSleeping123
Contributor Author

@MrSleeping123 thanks for your contribution, will merge it tomorrow.

@xiarixiaoyao thank you very much.

@xiarixiaoyao merged commit 8859b48 into apache:master on Mar 10, 2022
vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
stayrascal pushed a commit to stayrascal/hudi that referenced this pull request Apr 12, 2022