-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-3178] Fixing metadata table compaction so as to not include uncommitted data #4530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
nsivabalan
merged 1 commit into
apache:master
from
nsivabalan:metadataCompactionIgnoreFailedCommits
Jan 8, 2022
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -128,6 +128,13 @@ protected void commit(HoodieData<HoodieRecord> hoodieDataRecords, String partiti | |
| JavaRDD<HoodieRecord> recordRDD = prepRecords(records, partitionName, 1); | ||
|
|
||
| try (SparkRDDWriteClient writeClient = new SparkRDDWriteClient(engineContext, metadataWriteConfig, true)) { | ||
| if (canTriggerTableService) { | ||
| // trigger compaction before doing the delta commit. this is to ensure, if this delta commit succeeds in metadata table, but failed in data table, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is it the case that MT commit could succeed, while Data Table commit could fail? |
||
| // we would have compacted metadata table and so could have included uncommitted data which will never be ignored while reading from metadata | ||
| // table (since reader will filter out only from delta commits) | ||
| compactIfNecessary(writeClient, instantTime); | ||
| } | ||
|
|
||
| if (!metadataMetaClient.getActiveTimeline().filterCompletedInstants().containsInstant(instantTime)) { | ||
| // if this is a new commit being applied to metadata for the first time | ||
| writeClient.startCommitWithTime(instantTime); | ||
|
|
@@ -153,7 +160,6 @@ protected void commit(HoodieData<HoodieRecord> hoodieDataRecords, String partiti | |
| // reload timeline | ||
| metadataMetaClient.reloadActiveTimeline(); | ||
| if (canTriggerTableService) { | ||
| compactIfNecessary(writeClient, instantTime); | ||
| cleanIfNecessary(writeClient, instantTime); | ||
| writeClient.archive(); | ||
| } | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compactionInstantTimeat line 703 has to be based offinstantTimeand notlatestDeltaCommitTime. Latest delta commit time is not part of the compaction yet. Otherwise we are changing the meaning of the current compaction timeline with this change.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me try to explain.
lets say we have 10 commits, C1, C2 -> C10.
Prior to this patch, we will compact immediately after C10 and so compaction commit will be C10 + "001".
With this patch, we will be compacting just before C11 starts getting applied to MDT.
And so, I am basing the compaction commit of latest delta commit time which is C10 and not instant time which is C11.
And so, its C10 + "001". but if I go with instantTime, then we might change the behavior. In fact, we can't do that, since compaction time will be greater than the delta commit which will be eventually created when we apply C11 to MDT.
Let me know if this makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I was actually asking the compaction time to be C10 and not C11. I misread line 689. Look good then.