-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-8207. [Snapshot] Fix bugs and add tests for SnapshotDeletingService. #4571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
It looks like checkDirReclaimable() returns true if the dir is reclaimable, but checkKeyReclaimable() returns false if the key is reclaimable. Is that correct? Lines 452 to 459 in 9cd92f4
Lines 474 to 485 in 9cd92f4
|
|
Do we need to change the snapshotInfo status to SNAPSHOT_RECLAIMED here? Line 282 in 9cd92f4
or maybe here: Lines 182 to 184 in 9cd92f4
|
|
I'd like to see unit tests that excercise all the if/else paths in the following methods: checkSnapshotReclaimable In addition, I'd like a unit test that confirms that we are correctly starting and stopping within the bucket scope. |
Yes you are correct. I'm going to make this consistent in the following patches or in this patch. |
Do we need to store intermediate result here? Because the Snapshot is removed from the
Sure I will add more tests to this patch. |
That is a good question. So maybe we can remove the RECLAIMED status from the enum, if it isn't used anywhere? |
Thanks, I found it very confusing. |
...e/integration-test/src/test/java/org/apache/hadoop/fs/ozone/TestSnapshotDeletingService.java
Show resolved
Hide resolved
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java
Outdated
Show resolved
Hide resolved
| // If both next global and path snapshot are same, it may overwrite | ||
| // nextPathSnapInfo.setPathPreviousSnapshotID(), adding this check | ||
| // will prevent it. | ||
| if (nextGlobalSnapInfo != null && nextGlobalSnapInfo.getSnapshotID() | ||
| .equals(nextPathSnapInfo.getSnapshotID())) { | ||
| nextPathSnapInfo.setGlobalPreviousSnapshotID( | ||
| snapInfo.getPathPreviousSnapshotID()); | ||
| metadataManager.getSnapshotInfoTable().putWithBatch(batchOperation, | ||
| nextPathSnapInfo.getTableKey(), nextPathSnapInfo); | ||
| } else if (nextGlobalSnapInfo != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case described in the comment, we need to update both nextPathSnapInfo and nextGlobalSnapInfo right? if so we shouldn't use else if imo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for using else if because nextGlobalSnapInfo can be null if we are deleting the last snapshot, so we need that check.
|
|
||
| public static void checkSnapshotActive(SnapshotInfo snapInfo) | ||
| public static void checkSnapshotActive(SnapshotInfo snapInfo, | ||
| boolean override) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| boolean override) | |
| boolean skipCheck) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tho it sounds a bit weird to have a param to skip the check for the check method. If we fix the caller, how many places do we have to change? If not many we should do that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially thought about this, but it looks like we should be changing in many places. I kept on getting Exceptions from multiple request.
.../ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/SnapshotDeletingService.java
Show resolved
Hide resolved
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java
Show resolved
Hide resolved
...e/integration-test/src/test/java/org/apache/hadoop/fs/ozone/TestSnapshotDeletingService.java
Show resolved
Hide resolved
| BucketArgs bucketArgs = new BucketArgs.Builder() | ||
| .setBucketLayout(BucketLayout.LEGACY) | ||
| .build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should works for all 3 types of buckets? Parameterize this later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For FSO there is a separate test. Maybe we can parameterize it later. Anyways I'm planning on adding more tests in a separate JIRA which George requested for.
smengcl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aswinshakil for the fixes and new tests. Looks good mostly.
|
Looks like it's timing out on a relevant test: https://github.com/apache/ozone/actions/runs/5060501663/jobs/9088775175 This could be related to the bug @hemantk-12 just pointed out: |
|
@smengcl and @hemantk-12 for pointing out, I have fixed it. The tests doesn't look like it's failing on this bug. I checked the log and it is failing on checkpoint creation. I'm not sure why the directory exists in the first place. Checking this out. |
|
Checkpoint creation is not an idempotent operation. In case of log replay also it can fail. We should figure this out but we need to make sure that the checkpoint directory doesn't exist. If it exists that means checkpoint must have been created in previous incarnation of log replay, |
|
The test timeout issue seems fixed now. Thanks @aswinshakil . Some test failures still looks relevant (flaky?): https://github.com/apache/ozone/actions/runs/5075772329/jobs/9117315098 This might be irrelevant: |
|
Thanks for pointing this out @smengcl, I updated PR. |
|
|
||
| assertTableRowCount(snapshotInfoTable, 2); | ||
| GenericTestUtils.waitFor(() -> snapshotDeletingService | ||
| .getSuccessfulRunCount() >= 1, 1000, 10000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation. Can fix this in the other PR: https://github.com/apache/ozone/pull/4571/files#r1199182257
| .getSuccessfulRunCount() >= 1, 1000, 10000); | |
| .getSuccessfulRunCount() >= 1, 1000, 10000); |
|
Thanks @aswinshakil for the fixes and tests. Thanks @GeorgeJahad for reviewing this as well. |
What changes were proposed in this pull request?
Add tests for
SnapshotDeletingServiceto cover all the Snapshot GC code paths.What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8207
How was this patch tested?
This is a test change.