Skip to content

Conversation

@GeorgeJahad
Copy link
Contributor

What changes were proposed in this pull request?

Modify TestOMRatisSnapshots.testInstallSnapshot() to ignore sst files that are not live.

The test is flakey because it ignores the fact that some of the rocksdb sst files are not "live".

What is a "live" file?

After a compaction, a compacted file may not be immediately deleted if it is still in use. In those cases, the file is no longer considered "live" and no hard link is created for it during a rocksdb checkpoint operation. But the file still exists on the filesystem and testInstallSnapshot was incorrectly considering them.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10059

How was this patch tested?

Ran it 400 times here: https://github.com/GeorgeJahad/ozone/actions/runs/7619254060

Still fails 1% of the time, but without the fix it was around 50%. I think there maybe another problem, but this is an important test that needs to be re-enabled now.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @xBis7 for debugging this problem and @GeorgeJahad for working on the fix.

Some minor code improvements suggested, otherwise LGTM.

// Skip if not hard link on the leader
if (!leaderActiveSST.toFile().exists()) {
// First confirm it is live
if (!liveSstFiles.stream().anyMatch(s -> s.equals(fileName))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can be simplified?

Suggested change
if (!liveSstFiles.stream().anyMatch(s -> s.equals(fileName))) {
if (!liveSstFiles.contains(fileName)) {

// timeouts have to be increased.
@Unhealthy("HDDS-10059")
void testInstallSnapshot(int numSnapshotsToCreate, @TempDir Path tempDir) throws Exception {
private static int numSnapshotsToCreate = 100;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can be final

Suggested change
private static int numSnapshotsToCreate = 100;
private static final int NUM_SNAPSHOTS_TO_CREATE = 100;

(needs corresponding change in testInstallSnapshot)

Comment on lines 354 to 357
List<String> liveSstFiles = new ArrayList<>();
// strip the leading "/".
liveSstFiles.addAll(activeRocksDB.getLiveFiles().files.stream().map(s -> s.substring(1)).collect(
Collectors.toList()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can use the result of collect() directly, no need for two lists.

Suggested change
List<String> liveSstFiles = new ArrayList<>();
// strip the leading "/".
liveSstFiles.addAll(activeRocksDB.getLiveFiles().files.stream().map(s -> s.substring(1)).collect(
Collectors.toList()));
// strip the leading "/".
List<String> liveSstFiles = activeRocksDB.getLiveFiles().files.stream()
.map(s -> s.substring(1))
.collect(Collectors.toList());

(also, it can be a Set, since we don't need to consider duplicates)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think it should be set for better search.

Copy link
Contributor

@hemantk-12 hemantk-12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GeorgeJahad for the patch and @xBis7 for the investigation.

Comment on lines 354 to 357
List<String> liveSstFiles = new ArrayList<>();
// strip the leading "/".
liveSstFiles.addAll(activeRocksDB.getLiveFiles().files.stream().map(s -> s.substring(1)).collect(
Collectors.toList()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think it should be set for better search.

@adoroszlai
Copy link
Contributor

@GeorgeJahad do you mind if I make these changes and update the PR?

@GeorgeJahad
Copy link
Contributor Author

@GeorgeJahad do you mind if I make these changes and update the PR?

@adoroszlai I would appreciate it if you could that. I apologize for not doing it myself. I just haven't had the time.

Copy link
Contributor

@hemantk-12 hemantk-12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks @GeorgeJahad and @adoroszlai for the contribution.

Copy link
Contributor

@xBis7 xBis7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @GeorgeJahad and @adoroszlai.

@adoroszlai adoroszlai merged commit b90d109 into apache:master Jan 29, 2024
@adoroszlai
Copy link
Contributor

Thanks @GeorgeJahad for the patch, @hemantk-12, @xBis7 for the review.

xichen01 pushed a commit to xichen01/ozone that referenced this pull request Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants