-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-6068] Improve logic of getOldestInstantToRetainForClustering wh… #8443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-6068] Improve logic of getOldestInstantToRetainForClustering wh… #8443
Conversation
| if (!replaceTimeline.empty()) { | ||
| Option<HoodieInstant> cleanInstantOpt = | ||
| activeTimeline.getCleanerTimeline().filter(instant -> !instant.isCompleted()).firstInstant(); | ||
| activeTimeline.getCleanerTimeline().filterCompletedInstants().lastInstant(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, using completed clean instants is more reliable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good, cc @bvaradar for another round of review~
|
@hbgstc123, which border situation could cause that there maybe a moment when the last clean is complete and the next clean plan not generated, if timeline archive execute at this moment, no replace commit will be retained? |
For example in flink pipeline, clean is scheduled and executed in class |
…en archive timeline
996608e to
0dff5a6
Compare
|
@bvaradar pls cc when convenient, thanks |
bvaradar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Thanks for the fix.
|
The test is failing continously after the fix: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=16792&view=logs&j=dcedfe73-9485-5cc5-817a-73b61fc5dcb0&t=746585d8-b50a-55c3-26c5-517d93af9934&l=33925, can you take a look ~ |
…en archive timeline (apache#8443) Co-authored-by: hbg <[email protected]>
…en archive timeline (apache#8443) Co-authored-by: hbg <[email protected]>
…en archive timeline (apache#8443) Co-authored-by: hbg <[email protected]>
Change Logs
Original
ClusteringUtils::getOldestInstantToRetainForClusteringis based on inflight clean instant, but there maybe a moment when the last clean is complete and the next clean plan not generated, if timeline archive execute at this moment, no replace commit will be retained.This pr propose to decide OldestInstantToRetainForClustering based on latest completed clean instant, return the first replace commit after the
earliestInstantToRetainof last complete clean or first replace commit after last clean instant ifearliestInstantToRetainis empty, and return the first replace commit in active timeline if there is no clean instant.Impact
no
Risk level (write none, low medium or high below)
low
Documentation Update
no
Contributor's checklist