[HUDI-2925] Fix duplicate cleaning of same files when unfinished clean operations are present. #4212
951d3b4 to
c75df82
Compare
...lient/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
Hey @prashantwason: Can you respond to our comments above? We are looking to get this into 0.10.1 and would appreciate a response, so we can see how to proceed and get this in.
Responded to the comment above. Are there specific objections to the way this patch is implemented or the issue itself? I am open to any way to fix this issue.
Since we don't have a clear, concrete plan yet, I am removing it from 0.10.1, but let's brainstorm and try to get closure.
@prashantwason: We are looking to get this in for 0.11. Would appreciate it if you can follow up on this and let us know once it's ready to review again.
@nsivabalan Your plan to not schedule clean when another is in progress works. Can you please make that change and I can close this PR? Happy to review the other PR too. |
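The plan mentioned here (do not schedule a new clean while another is still in progress) can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the class `CleanScheduleGuardSketch`, its methods, and the instant numbering are hypothetical and are not Hudi APIs. Only the idea of a flag that allows or disallows multiple clean schedules comes from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model of a timeline that tracks clean instants.
// None of these names are Hudi APIs; they only illustrate the guard.
class CleanScheduleGuardSketch {
    private final boolean allowMultipleCleans;
    private final List<String> inflightCleans = new ArrayList<>();
    private int nextInstant = 1;

    CleanScheduleGuardSketch(boolean allowMultipleCleans) {
        this.allowMultipleCleans = allowMultipleCleans;
    }

    /** Schedules a clean unless one is already pending and multiple cleans are disallowed. */
    Optional<String> scheduleClean() {
        if (!allowMultipleCleans && !inflightCleans.isEmpty()) {
            return Optional.empty(); // an unfinished clean exists: skip scheduling
        }
        String instant = String.format("%03d", nextInstant++);
        inflightCleans.add(instant);
        return Optional.of(instant);
    }

    /** Marks a previously scheduled clean as completed. */
    void completeClean(String instant) {
        inflightCleans.remove(instant);
    }

    int inflightCount() {
        return inflightCleans.size();
    }

    public static void main(String[] args) {
        CleanScheduleGuardSketch timeline = new CleanScheduleGuardSketch(false);
        Optional<String> first = timeline.scheduleClean();
        Optional<String> second = timeline.scheduleClean(); // skipped while the first is pending
        System.out.println("first scheduled: " + first.isPresent());
        System.out.println("second scheduled: " + second.isPresent());
    }
}
```

With the flag disabled, the second call returns an empty result until the first clean completes, so the same files cannot end up in two overlapping clean plans.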
a14bc80 to
98c3c09
Compare
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
    }
    LOG.info("Cleaner started");
    final Timer.Context timerContext = metrics.getCleanCtx();
    LOG.info("Cleaned failed attempts if any");
This log is not very useful and prints each time. Probably move it to within rollbackFailedWrites() if any writes are actually found to be cleaned.
I have moved it to CleanerUtils.rollbackFailedWrites().
Note that rollbackFailedWrites() is also called during regular rollbacks (single writer), not only in the context of the cleaner.
     * Tests no more than 1 clean is scheduled/executed if HoodieCompactionConfig.allowMultipleCleanSchedule config is disabled.
     */
    @Test
    public void testMultiClean() throws Exception {
Is there a file that holds cleaner-specific tests? This does not seem to be a metadata-table-related test.
Yeah, I just picked up from where you left off :) Will try to move it to TestCleaner.
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataFileSystemView.java
prashantwason
left a comment
Looks good. A few minor comments.
@prashantwason: addressed all feedback.
prashantwason
left a comment
LGTM
@yihua: do you want to review? Prashant has given a ship-it.
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
nsivabalan
left a comment
addressed all comments
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
…n operations are present.
3ab4015 to
8c8eeed
Compare
@yihua: if you are good, can I go ahead and merge this in?
yihua
left a comment
LGTM. For the config naming, let's have a consensus.
…n operations are present using a config. (apache#4212) Co-authored-by: sivabalan <n.siva.b@gmail.com>
What is the purpose of the pull request
Fixed duplicate cleans, which led to issues with the metadata table.
Brief change log
Perform unfinished clean operations before attempting to generate a new clean plan.
Refresh fsview after each unfinished clean operation.
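The two changes above can be sketched with a toy in-memory model: finish every pending clean first, drop the deleted files from the view, and only then plan a new clean. Everything in this sketch is an assumption made for illustration. `PendingCleanFirstSketch`, its fields, and the "retain the newest N files" planning rule are hypothetical and do not reflect Hudi's actual classes or cleaning policies.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model; none of these names are Hudi APIs.
class PendingCleanFirstSketch {
    /** Files currently visible on storage (stands in for the file-system view). */
    final Set<String> visibleFiles = new LinkedHashSet<>();
    /** Clean plans that were scheduled but never completed. */
    final List<List<String>> pendingCleanPlans = new ArrayList<>();
    /** Every delete that was issued, kept so duplicates can be detected. */
    final List<String> deletesIssued = new ArrayList<>();

    /** Executes one plan and removes its files from the view (the "refresh" step). */
    void executePlan(List<String> plan) {
        for (String file : plan) {
            deletesIssued.add(file);
            visibleFiles.remove(file);
        }
    }

    /** Toy planning rule (an assumption): clean everything except the newest `retain` files. */
    List<String> planNewClean(int retain) {
        List<String> files = new ArrayList<>(visibleFiles);
        int cut = Math.max(0, files.size() - retain);
        return new ArrayList<>(files.subList(0, cut));
    }

    /** Finishes pending cleans first, then plans against the refreshed view. */
    List<String> clean(int retain) {
        for (List<String> plan : pendingCleanPlans) {
            executePlan(plan);
        }
        pendingCleanPlans.clear();
        List<String> newPlan = planNewClean(retain);
        executePlan(newPlan);
        return newPlan;
    }

    public static void main(String[] args) {
        PendingCleanFirstSketch table = new PendingCleanFirstSketch();
        table.visibleFiles.addAll(List.of("f1", "f2", "f3", "f4"));
        table.pendingCleanPlans.add(List.of("f1", "f2")); // an unfinished clean
        List<String> newPlan = table.clean(1);
        System.out.println("new plan: " + newPlan);
        System.out.println("deletes issued: " + table.deletesIssued);
    }
}
```

Because the new plan is computed only after the pending plan has run and the view has been refreshed, files already covered by the unfinished clean never appear in the new plan.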
Verify this pull request
Added unit test TestHoodieBackedMetadata::testMultiClean
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.