[HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table #8900
Conversation
Force-pushed fe74a9a to f9e3b8d
// test metadata table compaction
// write another 4 commits
for (int i = 1; i < 5; i++) {
// write another 9 commits to trigger compaction twice. Since default clean version to retain is 2.
@danny0405: can you review the changes in the Flink classes?
I would rather set the clean versions to retain to 1. Doing 9 commits is going to add to the test time, which is already quite high.
Why do we need to change the clean strategy then?
Clean's retainFileVersions value is changed to 2, and it is actually hardcoded in HoodieMetadataWriteUtils; both Spark and Flink get the writeConfig from there, so this cannot be overridden.
The reason for making the change is to support restore of the main table timeline for at least 10 commits. A restore on the main table also causes a restore on the metadata table. If retainFileVersions is less than 2, then the next cleaner job after compaction would remove the previous file slice, thereby blocking restore on the metadata table or losing data.
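The interaction described above can be sketched with a toy model (this is a standalone illustration, not the Hudi cleaner; the class and method names are hypothetical): a KEEP_LATEST_FILE_VERSIONS-style cleaner keeps only the newest N slices per file group, so with N = 1 the slice a restore would need to roll back past the last compaction is already gone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a KEEP_LATEST_FILE_VERSIONS cleaner:
// given file-slice commit times for one file group (oldest first),
// return the slices the cleaner would delete when retaining N versions.
public class FileVersionCleanerSketch {
  public static List<String> slicesToClean(List<String> slicesOldestFirst, int retainVersions) {
    int toDelete = Math.max(0, slicesOldestFirst.size() - retainVersions);
    return new ArrayList<>(slicesOldestFirst.subList(0, toDelete));
  }

  public static void main(String[] args) {
    List<String> slices = List.of("001", "005", "009");
    // Retaining 2 versions deletes only the oldest slice, so a restore
    // targeting instant 005 can still find the slice written at 005.
    System.out.println(slicesToClean(slices, 2)); // prints [001]
    // Retaining 1 version would also delete 005, blocking that restore.
    System.out.println(slicesToClean(slices, 1)); // prints [001, 005]
  }
}
```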
The reason for making the change is to support restore
First of all, I'm confused why this change is related to restore. The change is for MDT log compaction, right? Can we address the restore issue in another PR?
then next cleaner job after compaction would remove previous file slice there by blocking restore on metadata table or loosing data.
Does the new cleaning strategy actually solve the issue? Even if we keep at least 2 versions for each file group, it does not ensure we can restore to a commit far back in the history.
Discussed offline, reverting this change.
Resolved review threads (now outdated) on:
- ...nt/src/test/java/org/apache/hudi/client/functional/TestHoodieClientOnMergeOnReadStorage.java
- ...di-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
- ...ient/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java
- hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/ArchiveExecutorUtils.java
- hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandleFactory.java
- hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
- ...ent/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataWriter.java
- ...nt/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/CompactHelpers.java
- .../org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java
- ...t/java/org/apache/hudi/client/functional/TestDataValidationCheckForLogCompactionActions.java
});

// SOLO_COMMIT_TIMESTAMP is used during bootstrap so it is a valid timestamp
validInstantTimestamps.add(SOLO_COMMIT_TIMESTAMP);
But any MDT partition initialization instant timestamp is of the format SOLO_COMMIT_TIMESTAMP plus a 3-digit suffix, e.g. 00000000000000010 for the FILES partition. How do we account for the suffix?
Yeah, good catch. We need to account for that.
Yeah, updated it.
There are a couple of other places too. I've fixed them in #8915.
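The suffix handling discussed in this thread can be sketched as follows (a standalone illustration, not the actual Hudi code; the constant value below is an assumption about SOLO_COMMIT_TIMESTAMP's all-zeros base form):

```java
// Hypothetical sketch: treat an instant timestamp as valid if it is either
// SOLO_COMMIT_TIMESTAMP itself or SOLO_COMMIT_TIMESTAMP plus a 3-digit
// partition-initialization suffix (e.g. 00000000000000010 for FILES).
public class MdtInstantCheckSketch {
  static final String SOLO_COMMIT_TIMESTAMP = "00000000000000"; // assumed value

  public static boolean isValidMdtInstant(String ts) {
    if (SOLO_COMMIT_TIMESTAMP.equals(ts)) {
      return true;
    }
    // base timestamp plus exactly three trailing digits
    return ts.length() == SOLO_COMMIT_TIMESTAMP.length() + 3
        && ts.startsWith(SOLO_COMMIT_TIMESTAMP)
        && ts.substring(SOLO_COMMIT_TIMESTAMP.length()).chars().allMatch(Character::isDigit);
  }

  public static void main(String[] args) {
    System.out.println(isValidMdtInstant("00000000000000010")); // prints true
    System.out.println(isValidMdtInstant("20230601120000123")); // prints false
  }
}
```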
.key("hoodie.log.compaction.enable")
.defaultValue("false")
.sinceVersion("0.14")
.withDocumentation("By enabling log compaction through this config, log compaction will also get enabled for the metadata table.");
It seems only the MDT uses log compaction. If that is true, I think this belongs in an MDT config.
Log compaction can also be done on a main table, if it is of MOR table type.
I think a flag to enable log compaction is still required, so I created another config in the HoodieMetadataConfig class.
…various bugfixes to metadata table

The following changes are also included as part of this commit:
1. Change cleaner policy to KEEP_LATEST_FILE_VERSIONS for the metadata table
2. Add log statements
3. Fix rollback of inflight deltacommits on the metadata table
4. Add log statements to the LogCompaction planner
5. Integrate the logcompaction table service on the metadata table
6. Include checks to block logcompaction in the presence of a compaction with a greater timestamp
7. Use the instant range option for major and minor compaction on the metadata table
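Item 6 in the commit message reduces to a timestamp comparison, which can be sketched as follows (a simplified standalone illustration; the class and method names are hypothetical, and instants are compared as plain strings, which works because Hudi instant timestamps are fixed-width):

```java
import java.util.List;

// Hypothetical sketch: skip scheduling a log compaction at
// `logCompactionInstant` if any compaction exists with a greater
// timestamp, since that compaction already covers the same log files.
public class LogCompactionGuardSketch {
  public static boolean canScheduleLogCompaction(String logCompactionInstant,
                                                 List<String> compactionInstants) {
    // Fixed-width instant timestamps compare correctly as strings.
    return compactionInstants.stream()
        .noneMatch(c -> c.compareTo(logCompactionInstant) > 0);
  }

  public static void main(String[] args) {
    System.out.println(canScheduleLogCompaction("005", List.of("003"))); // prints true
    System.out.println(canScheduleLogCompaction("005", List.of("003", "007"))); // prints false
  }
}
```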
…onfig/HoodieCompactionConfig.java Co-authored-by: Sagar Sumit <[email protected]>
…able/action/compact/CompactHelpers.java Co-authored-by: Sagar Sumit <[email protected]>
codope left a comment
LGTM. Can land once the CI is green.
TaskContextSupplier taskContextSupplier,
Option<BaseKeyGenerator> keyGeneratorOpt) {
LOG.info("Create update handle for fileId {} and partition path {} at commit {}", fileId, partitionPath, instantTime);
if (table.requireSortedRecords()) {
Can we just remove this log? It's verbose.
TaskContextSupplier taskContextSupplier,
Option<BaseKeyGenerator> keyGeneratorOpt) {
LOG.info("Get updateHandle for fileId {} and partitionPath {} at commit {}", fileId, partitionPath, instantTime);
if (table.requireSortedRecords()) {
Can we just remove this log? It's verbose.
Sure, will change it to debug level in a subsequent patch.

Change Logs
Apart from the logcompaction table service, this change also contains other bug fixes to the metadata table.
The following changes are present.
Impact
The changes contain bug fixes to the metadata table and a new feature integrating it with logcompaction, so the impact is around the metadata table.
Risk level (write none, low, medium, or high below)
Medium
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. Put the ticket number here and follow the instructions to make changes to the website.
Contributor's checklist