HDDS-10059. TestOMRatisSnapshots.testInstallSnapshot should only validate live files. #6069
adoroszlai
left a comment
Thanks a lot @xBis7 for debugging this problem and @GeorgeJahad for working on the fix.
Some minor code improvements suggested, otherwise LGTM.
| // Skip if not hard link on the leader | ||
| if (!leaderActiveSST.toFile().exists()) { | ||
| // First confirm it is live | ||
| if (!liveSstFiles.stream().anyMatch(s -> s.equals(fileName))) { |
nit: can be simplified?
- if (!liveSstFiles.stream().anyMatch(s -> s.equals(fileName))) {
+ if (!liveSstFiles.contains(fileName)) {
| // timeouts have to be increased. | ||
| @Unhealthy("HDDS-10059") | ||
| void testInstallSnapshot(int numSnapshotsToCreate, @TempDir Path tempDir) throws Exception { | ||
| private static int numSnapshotsToCreate = 100; |
nit: can be final
- private static int numSnapshotsToCreate = 100;
+ private static final int NUM_SNAPSHOTS_TO_CREATE = 100;
(needs corresponding change in testInstallSnapshot)
| List<String> liveSstFiles = new ArrayList<>(); | ||
| // strip the leading "/". | ||
| liveSstFiles.addAll(activeRocksDB.getLiveFiles().files.stream().map(s -> s.substring(1)).collect( | ||
| Collectors.toList())); |
nit: can use the result of collect() directly, no need for two lists.
- List<String> liveSstFiles = new ArrayList<>();
- // strip the leading "/".
- liveSstFiles.addAll(activeRocksDB.getLiveFiles().files.stream().map(s -> s.substring(1)).collect(
-     Collectors.toList()));
+ // strip the leading "/".
+ List<String> liveSstFiles = activeRocksDB.getLiveFiles().files.stream()
+     .map(s -> s.substring(1))
+     .collect(Collectors.toList());
(also, it can be a Set, since we don't need to consider duplicates)
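For illustration, a minimal sketch of the Set variant combined with the `contains()` check suggested earlier. The class name, helper name, and sample file names are made up for this example; in the test the input would be `activeRocksDB.getLiveFiles().files`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the patch.
class LiveSstNames {

  // Strip the leading "/" from each name and collect straight into a Set,
  // so later membership checks are a plain contains() call.
  public static Set<String> toLiveSet(List<String> rawNames) {
    return rawNames.stream()
        .map(s -> s.substring(1))
        .collect(Collectors.toSet());
  }

  public static void main(String[] args) {
    // Made-up names standing in for the live file list.
    List<String> raw = Arrays.asList("/000012.sst", "/000015.sst", "/MANIFEST-000005");
    Set<String> live = toLiveSet(raw);

    System.out.println(live.contains("000012.sst")); // prints "true"
    System.out.println(live.contains("000009.sst")); // prints "false"
  }
}
```

With a `HashSet` behind `Collectors.toSet()`, each lookup is a hash probe rather than a scan of the list, which is the "better search" point raised in the review.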
hemantk-12
left a comment
Thanks @GeorgeJahad for the patch and @xBis7 for the investigation.
| List<String> liveSstFiles = new ArrayList<>(); | ||
| // strip the leading "/". | ||
| liveSstFiles.addAll(activeRocksDB.getLiveFiles().files.stream().map(s -> s.substring(1)).collect( | ||
| Collectors.toList())); |
I also think it should be a Set, for faster lookup.
@GeorgeJahad do you mind if I make these changes and update the PR?
@adoroszlai I would appreciate it if you could do that. I apologize for not doing it myself. I just haven't had the time.
hemantk-12
left a comment
LGTM
Thanks @GeorgeJahad and @adoroszlai for the contribution.
LGTM! Thanks @GeorgeJahad and @adoroszlai.
Thanks @GeorgeJahad for the patch, @hemantk-12, @xBis7 for the review.
…date live files. (apache#6069) (cherry picked from commit b90d109)
What changes were proposed in this pull request?
Modify TestOMRatisSnapshots.testInstallSnapshot() to ignore SST files that are not live.
The test is flaky because it ignores the fact that some of the RocksDB SST files are not "live".
What is a "live" file?
After a compaction, a compacted file may not be deleted immediately if it is still in use. In that case the file is no longer considered "live", and no hard link is created for it during a RocksDB checkpoint operation. The file still exists on the filesystem, however, and testInstallSnapshot was incorrectly including such files in its validation.
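As a rough sketch of the idea (not the actual test code), the check boils down to keeping only the on-disk SST names that also appear in the live set. The class name, method name, and file names below are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration, not part of the patch.
class LiveFileFilter {

  // Keep only the SST files that are both present on disk and reported as live.
  // Files left behind by a compaction are on disk but not live, so they drop out.
  public static List<String> filesToValidate(List<String> sstOnDisk, Set<String> liveNames) {
    return sstOnDisk.stream()
        .filter(liveNames::contains)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Assumption: 000009.sst was compacted away but has not been deleted yet.
    List<String> onDisk = Arrays.asList("000009.sst", "000012.sst", "000015.sst");
    Set<String> live = new HashSet<>(Arrays.asList("000012.sst", "000015.sst"));

    System.out.println(filesToValidate(onDisk, live)); // prints "[000012.sst, 000015.sst]"
  }
}
```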
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10059
How was this patch tested?
Ran it 400 times here: https://github.com/GeorgeJahad/ozone/actions/runs/7619254060
It still fails about 1% of the time, compared with around 50% without the fix. I think there may be another problem, but this is an important test that needs to be re-enabled now.