HDDS-13776. Fail DirectoryPurge requests if previous snapshot ID validation fails. #9130
Pull Request Overview
This PR addresses a concurrency issue between directory deletion and snapshot creation by failing DirectoryPurge requests when snapshot ID validation fails. When a snapshot is created for a bucket while DirectoryDeletingService processes deleted directories from the same bucket, the DirectoryPurge requests should fail and be reprocessed later during snapshot deep cleaning.
- Changes the previous snapshot ID validation from void to boolean return type
- Adds conditional logic to fail the request when validation returns false
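The two changes listed above can be illustrated with a minimal, self-contained sketch. It is not the Ozone implementation: `PurgeValidationSketch`, `PurgeResult`, and `purgeDirectories` are hypothetical stand-ins, and the comparison of an expected against an actual previous snapshot ID is an assumption about what the validation checks.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical, simplified stand-ins; none of these are the real Ozone classes.
final class PurgeValidationSketch {

  /** Minimal result type standing in for an OM response. */
  static final class PurgeResult {
    final boolean success;
    final String message;

    PurgeResult(boolean success, String message) {
      this.success = success;
      this.message = message;
    }
  }

  /**
   * Assumed check: the previous snapshot ID recorded when the purge request
   * was built must still match the current one. Both may be null, meaning
   * there is no previous snapshot. Returning a boolean (rather than void)
   * lets the caller react to a mismatch.
   */
  static boolean validatePreviousSnapshotId(UUID expected, UUID actual) {
    return Objects.equals(expected, actual);
  }

  /** Fails the request when validation returns false; otherwise proceeds. */
  static PurgeResult purgeDirectories(UUID expected, UUID actual) {
    if (!validatePreviousSnapshotId(expected, actual)) {
      return new PurgeResult(false, "Snapshot validation failed");
    }
    return new PurgeResult(true, "Directories purged");
  }

  public static void main(String[] args) {
    UUID before = UUID.randomUUID();
    // Simulates a snapshot created while the purge request was in flight.
    UUID created = UUID.randomUUID();
    System.out.println(purgeDirectories(before, before).message);
    System.out.println(purgeDirectories(before, created).message);
  }
}
```

In this sketch a failed request simply returns an error result; per the PR description, the skipped directories would then be handled later during snapshot deep cleaning.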
return new OMDirectoriesPurgeResponseWithFSO(createErrorOMResponse(omResponse,
    new OMException("Snapshot validation failed", OMException.ResultCodes.INVALID_REQUEST)));
}
Copilot AI, Oct 9, 2025
The error message 'Snapshot validation failed' is too generic and doesn't provide enough context for debugging. Consider adding more specific information about what aspect of validation failed or why the request should be retried.
Suggested change, replacing:

return new OMDirectoriesPurgeResponseWithFSO(createErrorOMResponse(omResponse,
    new OMException("Snapshot validation failed", OMException.ResultCodes.INVALID_REQUEST)));
}

with:

String actualPrevSnapshotId = (fromSnapshotInfo != null && fromSnapshotInfo.getPreviousSnapshot() != null)
    ? fromSnapshotInfo.getPreviousSnapshot().toString() : "null";
String errorMsg = String.format(
    "Snapshot validation failed for fromSnapshot='%s': expected previousSnapshotId=%s, actual previousSnapshotId=%s",
    fromSnapshot,
    expectedPreviousSnapshotId != null ? expectedPreviousSnapshotId.toString() : "null",
    actualPrevSnapshotId
);
return new OMDirectoriesPurgeResponseWithFSO(createErrorOMResponse(omResponse,
    new OMException(errorMsg, OMException.ResultCodes.INVALID_REQUEST)));
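The message format in the suggestion can be tried in isolation. The sketch below is a hedged, standalone version: `SnapshotErrorMessageSketch` and `buildMessage` are hypothetical names, and plain `UUID` parameters stand in for the snapshot objects used in the real request class. It relies on `String.format` rendering a null `%s` argument as `null`, which makes the explicit null checks unnecessary here.

```java
import java.util.UUID;

// Hypothetical helper; not part of the Ozone code base.
final class SnapshotErrorMessageSketch {

  /**
   * Builds a descriptive validation error. Null IDs are printed as "null"
   * because String.format renders a null %s argument that way.
   */
  static String buildMessage(String fromSnapshot, UUID expectedPrev, UUID actualPrev) {
    return String.format(
        "Snapshot validation failed for fromSnapshot='%s': "
            + "expected previousSnapshotId=%s, actual previousSnapshotId=%s",
        fromSnapshot, expectedPrev, actualPrev);
  }

  public static void main(String[] args) {
    UUID expected = UUID.randomUUID();
    // No previous snapshot on the actual side, so the message shows "null".
    System.out.println(buildMessage("snap1", expected, null));
  }
}
```

Including both IDs in the message makes it clear from the log alone that the snapshot chain changed between request creation and execution.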
jojochuang left a comment
Good catch @SaketaChalamchala
This is caused by HDDS-12982
So we have two options
(1) work on this PR, or
(2) revert HDDS-12982
Regardless, it looks like we need a test to validate this scenario; with one, we could have caught the bug.
Thanks @SaketaChalamchala for the fix. +1 that we'd better add a test case.

I'll go ahead and merge it. If we want to add a test, we can add it later.

@jojochuang @SaketaChalamchala This seems to break the build on master.

Reverted due to compile error, which happened because the corresponding change in Please do not merge with outdated CI run. Approving pending CI run after 2 weeks still tests the change against

@SaketaChalamchala @jojochuang we don't need this since HDDS-13799 already reverts the initial change. Maybe a PR with a test case would be better.
What changes were proposed in this pull request?
If a snapshot is created for a bucket while DirectoryDeletingService is processing deleted directories from the same bucket, the DirectoryPurge requests submitted by DirectoryDeletingService should not execute and should instead be processed when the snapshot is deep cleaned.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13776
How was this patch tested?