[HUDI-3825] Fixing non-partitioned table Partition Records persistence in MT #5259
…_all_partitions__" record
if (partitionTypes.contains(MetadataPartitionType.FILES)) {
// Record which saves the list of all partitions
HoodieRecord allPartitionRecord = HoodieMetadataPayload.createPartitionListRecord(partitions);
if (partitions.isEmpty()) {
This is just pure duplication
HoodieTableMetadataUtil.getPartition(partitionInfo.getRelativePath()), Option.of(validFileNameToSizeMap), Option.empty());
});
filesPartitionRecords = filesPartitionRecords.union(fileListRecords);
if (partitionInfoList.isEmpty()) {
Just inverted conditional to simplify control flow
codope left a comment:
LGTM.
…e in MT (#5259)
* Filter out empty string (for non-partitioned table) being added to "__all_partitions__" record
* Instead of filtering, transform empty partition-id to `NON_PARTITIONED_NAME`
* Cleaned up `HoodieBackedTableMetadataWriter`
* Make sure REPLACE_COMMITS are handled as well
What is the purpose of the pull request
Fixing non-partitioned table Partition Records persistence in MT.
Currently, for a non-partitioned table, both "" (the empty string) and "." are persisted as partitions in the metadata table (MT), which is incorrect. Instead, we now always map the actual partition path (empty) to its partition-id (".") before persisting it to MT.
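The mapping described above can be illustrated with a minimal, standalone sketch. It does not use Hudi classes: `PartitionIdSketch`, `toPartitionId`, and `toPartitionIds` are hypothetical names, and the constant values ("" for the physical path, "." for the partition-id) are taken only from the description in this PR, not from the actual implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: hypothetical class, not part of Hudi.
public class PartitionIdSketch {
    // Assumed values, taken from the PR description:
    // "" is the relative path of a non-partitioned table, "." is its partition-id.
    static final String EMPTY_PARTITION_PATH = "";
    static final String NON_PARTITIONED_NAME = ".";

    // Map a relative partition path to the identifier that gets persisted.
    static String toPartitionId(String relativePartitionPath) {
        return EMPTY_PARTITION_PATH.equals(relativePartitionPath)
            ? NON_PARTITIONED_NAME
            : relativePartitionPath;
    }

    // Normalize a whole partition list; distinct() ensures "" and "." collapse
    // into a single entry instead of being persisted as two partitions.
    static List<String> toPartitionIds(List<String> relativePaths) {
        return relativePaths.stream()
            .map(PartitionIdSketch::toPartitionId)
            .distinct()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(toPartitionIds(Arrays.asList("", ".")));
        System.out.println(toPartitionIds(Arrays.asList("2022/04/07", "2022/04/08")));
    }
}
```

Transforming the empty path rather than filtering it out keeps the non-partitioned table's single partition present in the list under one consistent identifier.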
Brief change log
See the commit list above.
Verify this pull request
This pull request is already covered by existing tests, such as (please describe tests).
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.