-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-6678] Fix the acquisition of clean&rollback instants to archive #9416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java
Outdated
Show resolved
Hide resolved
|
@suryaprasanna @yihua @prashantwason can you help with a revew~ |
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java
Outdated
Show resolved
Hide resolved
|
Looks good from my side, cc @yihua for the final review. |
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java
Outdated
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java
Outdated
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java
Outdated
Show resolved
Hide resolved
yihua
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks ok to me. Will take a second pass to inspect details.
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java
Outdated
Show resolved
Hide resolved
| // beyond savepoint) and the earliest inflight instant (all actions). | ||
| // This is required by metadata table, see HoodieTableMetadataUtil#processRollbackMetadata | ||
| // for details. | ||
| // Todo: Remove #7580 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on @danny0405 's comment. Before this PR, the logic is to keep at least one clean and rollback instant on the active timeline. Archiving all rollback instants in the active timeline are ok, but there's caveat on archiving all clean instants (e.g., all completed clean instants are before the earliest commit to retain; one of the cases could be the user turns off cleaning for some time and turns it on again). The incremental cleaning may not get the latest clean instant.
However, it's not the right way either to keep at least one clean instant in the active timeline in the new logic, because that can block the archival of commits for the sake of a contiguous block of instants.
@Zouxxyy could you double check if there is any issue wrt to different cleaning modes under the new archival logic? e.g., incremental cleaning should fall back to full cleaning if there's no clean instant on the active timeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yihua see #9416 (comment), It explains why incremental cleaning always works even with this PR.
|
@Zouxxyy could you also rebase the PR on the master? |
Thanks for the review, will rebase at night, the conflict is a bit big |
|
Taking a final pass now. |
yihua
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Finally, the archival logic is intuitive and easy to follow. Thanks for improving it :)
| List<HoodieInstant> instantsToArchive = getCommitInstantsToArchive(); | ||
| if (!instantsToArchive.isEmpty()) { | ||
| HoodieInstant latestCommitInstantToArchive = instantsToArchive.get(instantsToArchive.size() - 1); | ||
| // Then get clean and rollback instants to archive. | ||
| List<HoodieInstant> cleanAndRollbackInstantsToArchive = | ||
| getCleanAndRollbackInstantsToArchive(latestCommitInstantToArchive); | ||
| instantsToArchive.addAll(cleanAndRollbackInstantsToArchive); | ||
| instantsToArchive.sort(HoodieInstant.COMPARATOR); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think the logic can be further simplified by treating all instants to archive the same, i.e., getting all instants before the earliest commit to retain, without differentiating the action types. We can follow up in a separate PR.
|
@Zouxxyy the changes have not been cherry-picked to branch-0.x. It would be great to have this feature in the upcoming release 0.15.0. Do you have time to do that? |
@yihua OK, but I don’t seem to find the 0.15.0 branch |
@Zouxxyy there is no 0.15.0 branch yet. 0.15.0 release branch will be forked out of |

Change Logs
The current
getCleanInstantsToArchivefilters clean and rollback instants according to maxInstantsToKeep and minInstantsToKeep respectively. There are two disadvantages:eg:
eg:
A continuous active timeline is our goal.
Therefore, modifying the logic of
getCleanAndRollbackInstantsToArchiveas follows:Impact
getCleanAndRollbackInstantsToArchive, fix two disadvantages abovegetCommitInstantsToArchive, divide it into two parts: 1) handle oldestInstantToRetain, 2) handle savepoint.Risk level (write none, low medium or high below)
medium
Documentation Update
None
Contributor's checklist