Skip to content

Conversation

@aswinshakil
Copy link
Member

What changes were proposed in this pull request?

When a previous snapshot purge request has already purged the snapshot. There will be a race condition between SnapshotDeletingService and OMSnapshotPurgeRequest where we resend the same request which causes a NPE when getting the snapshotInfo from the snapshotInfoTable. We can ignore this request as this snapshot is already purged.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10408

How was this patch tested?

Existing tests.

@aswinshakil aswinshakil added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Feb 21, 2024
@aswinshakil aswinshakil self-assigned this Feb 21, 2024
Copy link
Contributor

@hemantk-12 hemantk-12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix @aswinshakil.

LGTM.

Please create a follow task, if needed, as discussed offline to use cache when iterating over snapshot info table in snapshot delete service.

Copy link
Contributor

@swamirishi swamirishi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aswinshakil thanks for the patch. Overall the patch looks good to me. Wondering if you can add a testcase for the same, which recreates the duplicate request scenario.

SnapshotInfo fromSnapshot = omMetadataManager.getSnapshotInfoTable()
.get(snapTableKey);

if (fromSnapshot == null) {
Copy link
Contributor

@swamirishi swamirishi Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to create a unit test case for this in TestSnapshotDeletingService?

@swamirishi
Copy link
Contributor

swamirishi commented Feb 22, 2024

Thanks for the quick fix @aswinshakil.

LGTM.

Please create a follow task, if needed, as discussed offline to use cache when iterating over snapshot info table in snapshot delete service.

When further thinking about it, we can still end up having duplicate entries even when we check the cache since there could be a case where the first request could be in pre execute stage when the second request is sent. The first request's validateAndUpdateCache method could be execute when the request is inflight. We need to make this operation idempotent, there is no other way going about it.

@swamirishi
Copy link
Contributor

swamirishi commented Feb 22, 2024

@aswinshakil Please create a follow up task for the test case & iterating through the snapshot info table cache to reduce the occurance. @hemantk-12 thanks for the review

@swamirishi swamirishi merged commit 6dfd7d4 into apache:master Feb 22, 2024
myskov pushed a commit to myskov/ozone that referenced this pull request Apr 4, 2024
swamirishi pushed a commit to swamirishi/ozone that referenced this pull request Jun 10, 2024
…apache#6250)

(cherry picked from commit 6dfd7d4)
Change-Id: I1d602ad904b48e342b7aeb6d5aa925232e03486f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants