[HUDI-3112] Fix KafkaConnect can not sync to Hive Problem #4458
yihua merged 1 commit into apache:master
@yihua Thanks for reviewing this.
/**
 * Build Hive Sync Config
 */
public HiveSyncConfig buildSyncConfig(TypedProperties props, String tableBasePath) {
If the problem is caused by DataSourceUtils importing classes that are irrelevant to Kafka Connect, would it be possible to move DataSourceUtils::buildHiveSyncConfig to a different or new util class, so that buildHiveSyncConfig() can still be reused here instead of duplicating the code in the hudi-kafka-connect module? Would that solve the problem?
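One way to check the diagnosis in the question above is to probe the runtime classpath for a Spark class. This is only a sketch: it assumes the failure shows up as a missing Spark class when Kafka Connect runs, which the PR description suggests but does not state outright. `ClasspathProbeSketch` and `isClassPresent` are hypothetical names, and `org.apache.spark.sql.SparkSession` is used purely as a representative Spark class.

```java
/** Hypothetical helper for illustration; not part of Hudi. */
final class ClasspathProbeSketch {
  private ClasspathProbeSketch() {}

  /** Returns true if the named class can be located, without initializing it. */
  static boolean isClassPresent(String className) {
    try {
      Class.forName(className, false, ClasspathProbeSketch.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      // The class itself, or something it links against, is not available.
      return false;
    }
  }

  public static void main(String[] args) {
    // Assumption: a Kafka Connect worker normally has no Spark jars on its classpath.
    System.out.println("SparkSession present: "
        + isClassPresent("org.apache.spark.sql.SparkSession"));
  }
}
```

If the probe reports that the class is absent inside the Kafka Connect worker, any util class that references Spark types would be expected to fail in the same environment.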
@yihua
I checked the Hudi project when I modified the code. Besides Hive sync in Spark, Flink has the same problem. Flink, however, solved it by redeclaring a new set of variables:
hudi/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
Lines 576 to 610 in e9efbdb
Unifying these would be a relatively large change, so it may be better handled in a new issue. Because Hive sync contains some Scala logic, it cannot be split out directly.
@cdmikechen Understood. I'm thinking about moving only the util methods related to Hive sync configs, not the Hive sync logic, to a separate util class. My worry is that the Hive sync configs are now spread across different places and may diverge if we forget to update all of them consistently.
We can keep this PR as is for now. @cdmikechen could you create a Jira ticket to track the Hive sync config unification, which will be done in a separate PR in the future?
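For the follow-up ticket, the idea of a separate util class could be sketched roughly as below. This is not the code in this PR or in Hudi: `HiveSyncConfigUtilSketch` and its nested `SyncConfig` are hypothetical stand-ins for a shared helper and for `HiveSyncConfig`, `java.util.Properties` stands in for `TypedProperties`, and the default values are assumptions chosen for illustration. The point is only that the builder depends on nothing beyond `java.util`, so Spark, Flink, and Kafka Connect could all call it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

/** Hypothetical Spark-free helper; it depends only on java.util. */
final class HiveSyncConfigUtilSketch {
  private HiveSyncConfigUtilSketch() {}

  /** Hypothetical stand-in for HiveSyncConfig; only a few fields are modelled. */
  static final class SyncConfig {
    String basePath;
    String databaseName;
    String tableName;
    String jdbcUrl;
    List<String> partitionFields;
  }

  /** Builds the config from plain properties; defaults here are illustrative assumptions. */
  static SyncConfig buildSyncConfig(Properties props, String tableBasePath) {
    SyncConfig cfg = new SyncConfig();
    cfg.basePath = tableBasePath;
    cfg.databaseName = props.getProperty("hoodie.datasource.hive_sync.database", "default");
    cfg.tableName = props.getProperty("hoodie.datasource.hive_sync.table", "unknown");
    cfg.jdbcUrl = props.getProperty("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://localhost:10000");
    // Comma-separated list; blank entries are dropped, so an unset key yields an empty list.
    String fields = props.getProperty("hoodie.datasource.hive_sync.partition_fields", "");
    cfg.partitionFields = Arrays.stream(fields.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
    return cfg;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("hoodie.datasource.hive_sync.database", "analytics");
    props.setProperty("hoodie.datasource.hive_sync.partition_fields", "year, month");
    SyncConfig cfg = buildSyncConfig(props, "/tmp/hudi/trips");
    System.out.println(cfg.databaseName + " " + cfg.tableName + " " + cfg.partitionFields);
  }
}
```

Keeping the property keys and defaults in one such class would also address the divergence concern, since each engine would read the same definitions rather than redeclaring them.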
@cdmikechen @yihua: We are targeting this patch for 0.10.1. We have a code freeze planned for this Monday, so it would be nice to get this in by then. Just wanted to send out a reminder.
/**
 * Build Hive Sync Config
 */
public HiveSyncConfig buildSyncConfig(TypedProperties props, String tableBasePath) {
Let's move this util method to the KafkaConnectUtils class.
What is the purpose of the pull request
KafkaConnect currently uses org.apache.hudi.DataSourceUtils to build HiveSyncConfig, but DataSourceUtils imports some Spark dependencies. As a result, Hive sync fails when the related classes are referenced.
Brief change log
Verify this pull request
A Hive sync test still needs to be added under https://issues.apache.org/jira/browse/HUDI-2673
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.