Skip to content

Conversation

@SaketaChalamchala
Copy link
Contributor

What changes were proposed in this pull request?

If a snapshot is created for a bucket while DirectoryDeletingService is processing deleted directories from the same bucket then, the DirectoryPurge requests submitted by DirectoryDeletingService should not execute and should be processed when snapshot is deep cleaned.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13776

How was this patch tested?

@jojochuang jojochuang added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Oct 9, 2025
@smengcl smengcl requested a review from Copilot October 9, 2025 15:22
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses a concurrency issue between directory deletion and snapshot creation by failing DirectoryPurge requests when snapshot ID validation fails. When a snapshot is created for a bucket while DirectoryDeletingService processes deleted directories from the same bucket, the DirectoryPurge requests should fail and be reprocessed later during snapshot deep cleaning.

  • Changes the previous snapshot ID validation from void to boolean return type
  • Adds conditional logic to fail the request when validation returns false

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +116 to +118
return new OMDirectoriesPurgeResponseWithFSO(createErrorOMResponse(omResponse,
new OMException("Snapshot validation failed", OMException.ResultCodes.INVALID_REQUEST)));
}
Copy link

Copilot AI Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message 'Snapshot validation failed' is too generic and doesn't provide enough context for debugging. Consider adding more specific information about what aspect of validation failed or why the request should be retried.

Suggested change
return new OMDirectoriesPurgeResponseWithFSO(createErrorOMResponse(omResponse,
new OMException("Snapshot validation failed", OMException.ResultCodes.INVALID_REQUEST)));
}
String actualPrevSnapshotId = (fromSnapshotInfo != null && fromSnapshotInfo.getPreviousSnapshot() != null)
? fromSnapshotInfo.getPreviousSnapshot().toString() : "null";
String errorMsg = String.format(
"Snapshot validation failed for fromSnapshot='%s': expected previousSnapshotId=%s, actual previousSnapshotId=%s",
fromSnapshot,
expectedPreviousSnapshotId != null ? expectedPreviousSnapshotId.toString() : "null",
actualPrevSnapshotId
);
return new OMDirectoriesPurgeResponseWithFSO(createErrorOMResponse(omResponse,
new OMException(errorMsg, OMException.ResultCodes.INVALID_REQUEST)));

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @SaketaChalamchala
This is caused by HDDS-12982

So we have two options
(1) work on this RP, or
(2) revert HDDS-12982

Regardless, it looks like we need a test to validate this scenario, otherwise we could have caught the bug.

@smengcl
Copy link
Contributor

smengcl commented Oct 9, 2025

Good catch @SaketaChalamchala This is caused by HDDS-12982

So we have two options (1) work on this RP, or (2) revert HDDS-12982

Regardless, it looks like we need a test to validate this scenario, otherwise we could have caught the bug.

Thanks @SaketaChalamchala for the fix.

+1 that we'd better add a test case.

@jojochuang jojochuang marked this pull request as ready for review October 23, 2025 02:59
@jojochuang
Copy link
Contributor

I'll go ahead and merge it. If we want to add a test we can add later.

@jojochuang jojochuang merged commit 62682ac into apache:master Oct 23, 2025
14 checks passed
@ivandika3
Copy link
Contributor

@jojochuang @SaketaChalamchala This seems to break build in master.

adoroszlai added a commit that referenced this pull request Oct 23, 2025
… ID validation fails. (#9130)"

This reverts commit 62682ac.

Reason for revert: compile error
@adoroszlai
Copy link
Contributor

adoroszlai commented Oct 23, 2025

Reverted due to compile error, which happened because the corresponding change in SnapshotUtils was reverted in #9162.

Please do not merge with outdated CI run. Approving pending CI run after 2 weeks still tests the change against master at the time of PR creation.

@swamirishi
Copy link
Contributor

swamirishi commented Oct 23, 2025

@SaketaChalamchala @jojochuang we don't need this since HDDS-13799 is already reverts the initial change. Maybe a PR with test case would be better

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants