-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[Hudi-3376] Add an option to skip under deletion files for HoodieMetadataTableValidator #4994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@hudi-bot run azure |
|
@hudi-bot run azure |
|
Azure Passed. Hi @yihua really appreciate it if you could help me review this patch :) |
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieMetadataTableValidator.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieMetadataTableValidator.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieMetadataTableValidator.java
Outdated
Show resolved
Hide resolved
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieMetadataTableValidator.java
Outdated
Show resolved
Hide resolved
| } catch (HoodieIOException ex) { | ||
|
|
||
| if (ex.getIOException() instanceof FileNotFoundException) { | ||
| // cleaner instant could be deleted by archive and FileNotFoundException could be threw during getInstantDetails function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inflight cleaning instants shouldn't be archived. Is this considering the inflight cleaning instant deleted due to failed cleaning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually no need to check FileNotFoundException here. removed.
| .collect(Collectors.toList()); | ||
| HoodieMetadataValidationContext fsBasedContext, | ||
| String partitionPath, | ||
| List<String> baseFilesUnderDeletion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar here for variable naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chanegd.
| LOG.info("Validation of getLatestFileSlices succeeded for partition " + partitionPath); | ||
| } | ||
|
|
||
| private List<FileSlice> filterFileSliceBasedOnUnderDeletionFiles(List<FileSlice> sortedLatestFileSliceList, List<String> baseFilesUnderDeletion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could baseFilesUnderDeletion be a Set<String> to speed up lookup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense! changed. Thanks a lot for your reviewing!
|
@hudi-bot run azure |
yihua
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…dataTableValidator (apache#4994) Co-authored-by: yuezhang <yuezhang@freewheel.tv>
…dataTableValidator (apache#4994) Co-authored-by: yuezhang <yuezhang@freewheel.tv>
https://issues.apache.org/jira/browse/HUDI-3376
What is the purpose of the pull request
add an option named
--skip-data-files-for-cleaningwhich will skip to compare the data files which are under deletion by cleanerBrief change log
When enabled, HoodieMetadataTableValidator will read the clean plan for pending cleaning action and get under deletion data files. Then hoodie will skip to compare these files.
This patch is tested on our local env.
There 're pending cleaning actions during performing HoodieMetadataTableValidator.
Before this patch validator failed with
After this patch
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.