[HUDI-5275] Fix reading data using the HoodieHiveCatalog causing Spark writes to fail #7295
Closed
Changes from all commits (`TableOptionProperties`):

```diff
@@ -50,6 +50,7 @@
 import static org.apache.flink.table.factories.FactoryUtil.CONNECTOR;
 import static org.apache.hudi.common.table.HoodieTableMetaClient.AUXILIARYFOLDER_NAME;
+import static org.apache.hudi.common.table.HoodieTableMetaClient.METAFOLDER_NAME;

@@ -64,6 +65,7 @@ public class TableOptionProperties {
   static final Map<String, String> KEY_MAPPING = new HashMap<>();

   private static final String FILE_NAME = "table_option.properties";
+  private static final String HOODUE_PROP_FILE_NAME = "hoodie.properties";

   public static final String PK_CONSTRAINT_NAME = "pk.constraint.name";
   public static final String PK_COLUMNS = "pk.columns";

@@ -89,6 +91,7 @@ public class TableOptionProperties {
     KEY_MAPPING.put(FlinkOptions.RECORD_KEY_FIELD.key(), "primaryKey");
     KEY_MAPPING.put(FlinkOptions.PRECOMBINE_FIELD.key(), "preCombineField");
     KEY_MAPPING.put(FlinkOptions.PAYLOAD_CLASS_NAME.key(), "payloadClass");
+    KEY_MAPPING.put(FlinkOptions.HIVE_STYLE_PARTITIONING.key(), FlinkOptions.HIVE_STYLE_PARTITIONING.key());
   }

@@ -113,6 +116,18 @@ public static void createProperties(String basePath,
    */
   public static Map<String, String> loadFromProperties(String basePath, Configuration hadoopConf) {
     Path propertiesFilePath = getPropertiesFilePath(basePath);
+    return getPropsFromFile(basePath, hadoopConf, propertiesFilePath);
+  }
+
+  /**
+   * Read table options map from the given table base path.
+   */
+  public static Map<String, String> loadFromHoodiePropertieFile(String basePath, Configuration hadoopConf) {
+    Path propertiesFilePath = getHoodiePropertiesFilePath(basePath);
+    return getPropsFromFile(basePath, hadoopConf, propertiesFilePath);
+  }
+
+  private static Map<String, String> getPropsFromFile(String basePath, Configuration hadoopConf, Path propertiesFilePath) {
     Map<String, String> options = new HashMap<>();
     Properties props = new Properties();

@@ -134,6 +149,11 @@ private static Path getPropertiesFilePath(String basePath) {
     return new Path(auxPath, FILE_NAME);
   }

+  private static Path getHoodiePropertiesFilePath(String basePath) {
+    String auxPath = basePath + Path.SEPARATOR + METAFOLDER_NAME;
+    return new Path(auxPath, HOODUE_PROP_FILE_NAME);
+  }
+
   public static String getPkConstraintName(Map<String, String> options) {
     return options.get(PK_CONSTRAINT_NAME);
   }
```

Inline review comment on `loadFromHoodiePropertieFile`:

Contributor
Hi, gentle ping :)

Contributor (Author)
Sorry, just saw it. Is it better to replace `loadFromHoodiePropertieFile` with `StreamerUtil.getTableConfig` here?
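The body of `getPropsFromFile` is mostly elided by the hunk above. For context, a minimal sketch of what such a helper typically does, assuming the standard Hadoop `FileSystem` API (the actual Hudi implementation may differ in logging and error handling):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: read a java.util.Properties file from the table's filesystem
// and copy its entries into a plain String map.
class PropsFileReader {
  static Map<String, String> getPropsFromFile(String basePath, Configuration hadoopConf, Path propertiesFilePath) {
    Map<String, String> options = new HashMap<>();
    Properties props = new Properties();
    try {
      FileSystem fs = FileSystem.get(propertiesFilePath.toUri(), hadoopConf);
      try (InputStream in = fs.open(propertiesFilePath)) {
        props.load(in); // parse key=value pairs
        for (String key : props.stringPropertyNames()) {
          options.put(key, props.getProperty(key));
        }
      }
    } catch (IOException e) {
      throw new RuntimeException(
          "Could not load table options for table at " + basePath + " from " + propertiesFilePath, e);
    }
    return options;
  }
}
```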
Contributor
Do we need to put these explicit table options here? Which function needs this config, and where is it missing today?
Contributor
We can read the table config through `StreamerUtil.createMetaClient(xx).getTableConfig()`. We actually get the original Hive table params first here: `parameters = hiveTable.getParameters()`. Do you mean there are options missing for Spark?
Contributor (Author)
Yes, after Spark creates the table, some table properties end up in `.hoodie/hoodie.properties`, such as `hoodie.datasource.write.hive_style_partitioning`. These properties are not included in the Hive table properties.
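For illustration, a `hoodie.properties` file written by Spark for such a table might contain entries like these (values are illustrative, not taken from this PR):

```properties
hoodie.table.name=my_table
hoodie.table.type=COPY_ON_WRITE
hoodie.datasource.write.hive_style_partitioning=true
```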
Contributor
But we only append Hive options here; curious why this can cause an error?
Contributor (Author)
Because Spark sets `hoodie.datasource.write.hive_style_partitioning` to true when it creates a non-partitioned Hudi table, and records `hoodie.datasource.write.hive_style_partitioning=true` in `hoodie.properties`. HoodieHiveCatalog never sees this property, so it tries to infer the value, infers it incorrectly, and writes `hoodie.datasource.write.hive_style_partitioning=false` into the Hive table options. As a result, Spark sees two different values for `hoodie.datasource.write.hive_style_partitioning` when it reads the table, and reports an error.
Contributor (Author)

```java
if (!parameters.containsKey(FlinkOptions.HIVE_STYLE_PARTITIONING.key())) {
  Path hoodieTablePath = new Path(path);
  boolean hiveStyle = Arrays.stream(FSUtils.getFs(hoodieTablePath, hiveConf).listStatus(hoodieTablePath))
      .map(fileStatus -> fileStatus.getPath().getName())
      .filter(f -> !f.equals(".hoodie") && !f.equals("default"))
      .anyMatch(FilePathUtils::isHiveStylePartitioning);
  parameters.put(FlinkOptions.HIVE_STYLE_PARTITIONING.key(), String.valueOf(hiveStyle));
}
```

The parameters in this code are not derived from the `hoodie.properties` file, which leads to the incorrect inference.
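For illustration only, here is one hypothetical way the PR's new helper could be wired into that inference, preferring the writer-recorded value and keeping the directory-name check as a fallback. This is a sketch of the discussion, not the patch the reviewer ultimately applied:

```java
// Hypothetical sketch: consult hoodie.properties first, infer only as a fallback.
if (!parameters.containsKey(FlinkOptions.HIVE_STYLE_PARTITIONING.key())) {
  Map<String, String> hoodieProps = TableOptionProperties.loadFromHoodiePropertieFile(path, hiveConf);
  String recorded = hoodieProps.get(FlinkOptions.HIVE_STYLE_PARTITIONING.key());
  if (recorded != null) {
    // Spark (or another writer) recorded the flag explicitly; trust it.
    parameters.put(FlinkOptions.HIVE_STYLE_PARTITIONING.key(), recorded);
  } else {
    // Fall back to the original inference from partition directory names.
    Path hoodieTablePath = new Path(path);
    boolean hiveStyle = Arrays.stream(FSUtils.getFs(hoodieTablePath, hiveConf).listStatus(hoodieTablePath))
        .map(fileStatus -> fileStatus.getPath().getName())
        .filter(f -> !f.equals(".hoodie") && !f.equals("default"))
        .anyMatch(FilePathUtils::isHiveStylePartitioning);
    parameters.put(FlinkOptions.HIVE_STYLE_PARTITIONING.key(), String.valueOf(hiveStyle));
  }
}
```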
Contributor
Got you. I think we should not put extra options like `FlinkOptions.HIVE_STYLE_PARTITIONING` here; the right fix is to supplement the table config options for the Flink read/write path. I have actually applied a patch.