HDDS-10408. NPE causes OM crash in Snapshot Purge request. #6250

aswinshakil · 2024-02-21T22:14:18Z

What changes were proposed in this pull request?

If a previous snapshot purge request has already purged a snapshot, a race between SnapshotDeletingService and OMSnapshotPurgeRequest can cause the same request to be resent. The resent request then hits an NPE when it reads the snapshotInfo from the snapshotInfoTable, because the entry no longer exists. We can ignore such a request, since the snapshot is already purged.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10408

How was this patch tested?

Existing tests.

hemantk-12

Thanks for the quick fix @aswinshakil.

LGTM.

Please create a follow-up task, if needed, as discussed offline, to use the cache when iterating over the snapshot info table in the snapshot deleting service.

swamirishi

@aswinshakil thanks for the patch. Overall it looks good to me. Could you add a test case that recreates the duplicate request scenario?

swamirishi · 2024-02-22T04:28:11Z

...anager/src/main/java/org/apache/hadoop/ozone/om/request/snapshot/OMSnapshotPurgeRequest.java

        SnapshotInfo fromSnapshot = omMetadataManager.getSnapshotInfoTable()
            .get(snapTableKey);

+        if (fromSnapshot == null) {


Is it possible to create a unit test case for this in TestSnapshotDeletingService?

swamirishi · 2024-02-22T04:46:05Z

Thanks for the quick fix @aswinshakil.

LGTM.

Please create a follow-up task, if needed, as discussed offline, to use the cache when iterating over the snapshot info table in the snapshot deleting service.

On further thought, we can still end up with duplicate requests even when we check the cache: the first request could still be in the pre-execute stage when the second request is sent, so the first request's validateAndUpdateCache method could execute while the second request is in flight. We need to make this operation idempotent; there is no other way around it.
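
A minimal sketch of the idempotency point above, not the actual Ozone code. It assumes a plain HashMap in place of the snapshot info table; the class IdempotentPurgeSketch, the methods existsAtPreExecute and applyPurge, and the table key are all hypothetical names used for illustration only. It shows two identical purge requests that both pass an early existence check and are then applied one after the other, with the second one skipped by a null check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these names are Ozone APIs. */
class IdempotentPurgeSketch {
    // Hypothetical stand-in for the snapshot info table (table key -> snapshot name).
    private final Map<String, String> snapshotInfoTable = new HashMap<>();

    void addSnapshot(String tableKey, String name) {
        snapshotInfoTable.put(tableKey, name);
    }

    // Stand-in for an early existence check made before the request is applied.
    boolean existsAtPreExecute(String tableKey) {
        return snapshotInfoTable.containsKey(tableKey);
    }

    // Stand-in for applying a purge request. Returns the keys this call purged.
    List<String> applyPurge(List<String> tableKeys) {
        List<String> purged = new ArrayList<>();
        for (String tableKey : tableKeys) {
            String fromSnapshot = snapshotInfoTable.get(tableKey);
            if (fromSnapshot == null) {
                // Already purged by an earlier request: skip instead of dereferencing null.
                continue;
            }
            snapshotInfoTable.remove(tableKey);
            purged.add(tableKey);
        }
        return purged;
    }

    public static void main(String[] args) {
        IdempotentPurgeSketch om = new IdempotentPurgeSketch();
        String key = "/vol1/bucket1/snap1"; // hypothetical table key
        om.addSnapshot(key, "snap1");

        // Both requests pass the early check before either one is applied.
        boolean firstSeesSnapshot = om.existsAtPreExecute(key);
        boolean secondSeesSnapshot = om.existsAtPreExecute(key);

        List<String> first = om.applyPurge(List.of(key));
        List<String> second = om.applyPurge(List.of(key)); // duplicate request

        System.out.println("early checks: " + firstSeesSnapshot + ", " + secondSeesSnapshot);
        System.out.println("first purged " + first + ", duplicate purged " + second);
    }
}
```

In this toy model an early check cannot rule out the duplicate, because both checks run before the first purge is applied; only the null check at apply time makes the second request harmless.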

swamirishi · 2024-02-22T18:44:48Z

@aswinshakil Please create a follow-up task for the test case and for iterating through the snapshot info table cache to reduce the occurrence. @hemantk-12 thanks for the review.

HDDS-10408. NPE causes OM crash in Snapshot Purge request

039e9bf

aswinshakil added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Feb 21, 2024

aswinshakil requested review from hemantk-12 and swamirishi February 21, 2024 22:14

aswinshakil self-assigned this Feb 21, 2024

hemantk-12 approved these changes Feb 21, 2024

swamirishi approved these changes Feb 22, 2024

swamirishi merged commit 6dfd7d4 into apache:master Feb 22, 2024

myskov pushed a commit to myskov/ozone that referenced this pull request Apr 4, 2024

HDDS-10408. NPE causes OM crash in Snapshot Purge request (apache#6250)

42e2de4

(cherry picked from commit 6dfd7d4)

myskov mentioned this pull request Apr 4, 2024

[DO NOT MERGE] Backport some fixes from master to ozone-1.4 #6479

Merged

swamirishi pushed a commit to swamirishi/ozone that referenced this pull request Jun 10, 2024

CDPD-66963. HDDS-10408. NPE causes OM crash in Snapshot Purge request (…

54c6fa8

…apache#6250) (cherry picked from commit 6dfd7d4) Change-Id: I1d602ad904b48e342b7aeb6d5aa925232e03486f
