[HUDI-2682] Spark schema not updated with new columns on hive sync #4533
This PR is to solve this issue.
@xushiyan @nsivabalan Hello, could you please review this?
@xiarixiaoyao: Can you please review this patch? Thanks.
@dongkelun: Can you please check whether HUDI-3192 and https://issues.apache.org/jira/browse/HUDI-2682 are duplicates? If so, please mark one of them as a duplicate and close it.
I think it's a duplicate. HUDI-3192 has been closed.
LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
}
hoodieHiveClient.updateTableProperties(tableName, tableProperties);
LOG.info("Sync table properties for " + tableName + ", table properties is: " + cfg.tableProperties);
cfg.tableProperties? I think it should be tableProperties.
How about changing this code to:
if (cfg.tableProperties != null || cfg.syncAsSparkDataSourceTable) {
  hoodieHiveClient.updateTableProperties(tableName, tableProperties);
  LOG.info("Sync table properties for " + tableName + ", table properties is: "
      + (cfg.tableProperties == null ? "" : cfg.tableProperties));
}
If new columns are added and cfg.tableProperties is null, updateTableProperties is not executed, so Spark SQL will not see the new columns.
I'm not sure whether deleting and updating columns behave the same way.
If not, I think it can be decided by checking schemaDiff.getAddColumnTypes().isEmpty().
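To illustrate why a skipped update leaves Spark SQL with the old columns: as far as I understand, Spark reads a DataSource table's schema back from table properties such as `spark.sql.sources.schema.numParts` and `spark.sql.sources.schema.part.N`, not from the Hive column list. Below is a minimal sketch of that round trip. The class name, method names, and chunk size are hypothetical and this is not Hudi's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the Hudi or Spark implementation.
final class SparkSchemaPropsSketch {
    // Property keys used for DataSource table schemas kept in the metastore.
    static final String NUM_PARTS = "spark.sql.sources.schema.numParts";
    static final String PART_PREFIX = "spark.sql.sources.schema.part.";

    // Splits the schema JSON into chunks of at most maxLen characters
    // and stores each chunk under its own property key.
    static Map<String, String> toTableProperties(String schemaJson, int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        Map<String, String> props = new LinkedHashMap<>();
        int parts = 0;
        for (int start = 0; start < schemaJson.length(); start += maxLen) {
            int end = Math.min(start + maxLen, schemaJson.length());
            props.put(PART_PREFIX + parts, schemaJson.substring(start, end));
            parts++;
        }
        props.put(NUM_PARTS, String.valueOf(parts));
        return props;
    }

    // Reassembles the schema JSON the way a reader of these properties would.
    static String readSchema(Map<String, String> props) {
        int parts = Integer.parseInt(props.get(NUM_PARTS));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts; i++) {
            sb.append(props.get(PART_PREFIX + i));
        }
        return sb.toString();
    }
}
```

If the properties written for the old schema are never replaced, `readSchema` keeps returning the old schema even after the Hive columns have changed, which matches the symptom described above.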
How about changing this code to: if (cfg.tableProperties != null || cfg.syncAsSparkDataSourceTable) { hoodieHiveClient.updateTableProperties(tableName, tableProperties); LOG.info("Sync table properties for " + tableName + ", table properties is: " + (cfg.tableProperties == null ? "" : cfg.tableProperties)); }
Sorry, I only just saw this comment. Let me think about it first.
No need to use schemaDiff.getAddColumnTypes().isEmpty(). Your change is OK; just
note that cfg.tableProperties may be null, and we only need this logic when syncing as a DataSource table.
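The guard discussed in this thread can be sketched in isolation as follows. `Config` and `FakeHiveClient` are hypothetical stand-ins for the sync config and Hive client so the snippet compiles on its own, and the method returns the log message instead of calling a logger. It only mirrors the suggestion above and is not the merged code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained sketch of the suggested guard.
final class TablePropertiesSyncSketch {
    // Stand-in for the sync config; field names mirror those used in the thread.
    static final class Config {
        String tableProperties;             // user-supplied properties, may be null
        boolean syncAsSparkDataSourceTable; // whether Spark DataSource properties are synced
    }

    // Stand-in for the Hive client; it only records calls.
    static final class FakeHiveClient {
        int updates = 0;
        Map<String, String> lastProperties;

        void updateTableProperties(String tableName, Map<String, String> props) {
            updates++;
            lastProperties = new HashMap<>(props);
        }
    }

    // Updates table properties when the user supplied some or when syncing as a
    // Spark DataSource table. Returns the log message, or null if nothing was synced.
    static String syncTableProperties(Config cfg, FakeHiveClient client,
                                      String tableName, Map<String, String> tableProperties) {
        if (cfg.tableProperties != null || cfg.syncAsSparkDataSourceTable) {
            client.updateTableProperties(tableName, tableProperties);
            // Null-safe: cfg.tableProperties may be null on this path.
            return "Sync table properties for " + tableName + ", table properties is: "
                + (cfg.tableProperties == null ? "" : cfg.tableProperties);
        }
        return null;
    }
}
```

With `syncAsSparkDataSourceTable` set and `cfg.tableProperties` left null, the update still runs, so the schema-related properties get refreshed whenever columns are added.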
OK, I see. Thank you for your reminder. Your idea is better
How about changing the log like this?
LOG.info("Sync table properties for " + tableName + ", table properties is: " + tableProperties);
I have submitted the newly modified code
Hi @xiarixiaoyao, thanks for looking at this. I'm not sure we can solve this from Hudi; the problem also happens with vanilla Spark. See my explanation here: https://lists.apache.org/thread/9mmrnc5o7w42z723s2yqgcrdpwwtts3x
Hello, I think this PR can explain why it is necessary.
I built and verified it today. It should solve this problem. However, adding columns with Hive SQL is not supported.
@parisni We want Spark SQL to treat Hudi tables as DataSource tables for better performance.
@dongkelun We have no way to control the behavior of Hive, so I think this PR is OK. Thanks for your contribution.
does this mean the hive_sync shall be equal to |