-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-5950] Fixing pending instant deduction to trigger compaction in MDT #8223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
yihua
approved these changes
Mar 17, 2023
Contributor
yihua
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
danny0405
reviewed
Mar 18, 2023
| .getTimestamp(); | ||
| List<HoodieInstant> pendingInstants = dataMetaClient.reloadActiveTimeline().filterInflightsAndRequested() | ||
| .findInstantsBefore(latestDeltaCommitTimeInMetadataTable).getInstants(); | ||
| // ignore pending indexing action |
Contributor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we add some documents to help the understanding of the use case, sth like this:
delta_c1 (F3, F4) (MDT)
delta_c1 (F1, F2) (DT)
//
c2.inflight (compaction triggers in DT)
c2 (F7, F8) (compaction complete in MDT)
c2 fails to commit to DT
//
delta_c4 (F9, F10) (MDT)
-- can we trigger MDT compaction here? The answer is no!
1. we have no instant filtering for HFile reader
2. the avro log merge reader has instant filtering, but it can only filter out the invalid instants, not rollback
delta_c4 (F11, F12) (DT)
//
r5 (to rollback c2) (MDT)
-F7, -F8
r5 (to rollback c2) (DT)
//
delta_c6 (F13, F14) (MDT) -- now the compaction can be triggered
delta_c6 (F15, F16) (DT)20827b4 to
8dcd6dc
Compare
danny0405
approved these changes
Mar 20, 2023
Collaborator
Contributor
Author
nsivabalan
added a commit
to nsivabalan/hudi
that referenced
this pull request
Mar 22, 2023
… MDT (apache#8223) - Fixing a corner case bug where compaction in MDT could get triggered w/ partially failed commit in DT.
nsivabalan
added a commit
to nsivabalan/hudi
that referenced
this pull request
Mar 23, 2023
… MDT (apache#8223) - Fixing a corner case bug where compaction in MDT could get triggered w/ partially failed commit in DT.
fengjian428
pushed a commit
to fengjian428/hudi
that referenced
this pull request
Apr 5, 2023
… MDT (apache#8223) - Fixing a corner case bug where compaction in MDT could get triggered w/ partially failed commit in DT.
stayrascal
pushed a commit
to stayrascal/hudi
that referenced
this pull request
Apr 20, 2023
… MDT (apache#8223) - Fixing a corner case bug where compaction in MDT could get triggered w/ partially failed commit in DT.
flashJd
pushed a commit
to flashJd/hudi
that referenced
this pull request
May 5, 2023
… MDT (apache#8223) - Fixing a corner case bug where compaction in MDT could get triggered w/ partially failed commit in DT. (cherry picked from commit f7c0d31)
KnightChess
pushed a commit
to KnightChess/hudi
that referenced
this pull request
Jan 2, 2024
… MDT (apache#8223) - Fixing a corner case bug where compaction in MDT could get triggered w/ partially failed commit in DT.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.

Change Logs
Fixing a corner case bug where compaction in MDT could get triggered w/ partially failed commit in DT.
Currently the logic to deduce pending instants in MDT is as below
a = we get latest completed delta commit from MDT.
Find any inflights in DT timeline before {a}
and if we don't find any such inflights, we will go ahead and may be compact MDT.
But what incase the latest delta commit in MDT succeeded in MDT, but failed in DT. so, it could potentially result in triggering compaction in MDT which should not happen.
So, the right fix is
a = we get latest completed delta commit from MDT.
Find any inflights in DT timeline before or equals to {a}
This should take care of not triggering compaction in MDT when here are inflights in DT which is committed to MDT.
Impact
Stabilizes metadata table.
Risk level (write none, low medium or high below)
medium
Documentation Update
N/A
Contributor's checklist