HDDS-10408. NPE causes OM crash in Snapshot Purge request. #6250
Thanks for the quick fix @aswinshakil.
LGTM.
Please create a follow-up task, if needed, as discussed offline, to use the cache when iterating over the snapshot info table in the snapshot delete service.
swamirishi
left a comment
@aswinshakil thanks for the patch. Overall the patch looks good to me. Could you add a test case that recreates the duplicate request scenario?
    SnapshotInfo fromSnapshot = omMetadataManager.getSnapshotInfoTable()
        .get(snapTableKey);

    if (fromSnapshot == null) {
Is it possible to create a unit test case for this in TestSnapshotDeletingService?
On further thought, we can still end up with duplicate entries even when we check the cache, since the first request could still be in the pre-execute stage when the second request is sent. The first request's validateAndUpdateCache method could execute while the request is in flight. We need to make this operation idempotent; there is no other way around it.
@aswinshakil Please create a follow-up task for the test case and for iterating through the snapshot info table cache to reduce the occurrence. @hemantk-12 thanks for the review.
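The idempotency point raised in this thread can be illustrated with a minimal, self-contained sketch. It is not Ozone code: `IdempotentPurgeSketch`, `snapshotTable`, `inFlight`, `scanAndSubmit` and `applyPurge` are hypothetical stand-ins, and a plain `HashMap` and queue replace the real table and request pipeline. It only shows why a second scan that runs before the first purge is applied produces a duplicate request, and how a null check turns that duplicate into a no-op.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-ins for illustration; none of these are Ozone classes.
public class IdempotentPurgeSketch {
    // Stand-in for the snapshot info table: table key -> snapshot name.
    static final Map<String, String> snapshotTable = new HashMap<>();
    // Stand-in for purge requests that were submitted but not yet applied.
    static final Queue<String> inFlight = new ArrayDeque<>();

    // The deleting service scans the table and submits one purge per entry.
    static void scanAndSubmit() {
        for (String key : snapshotTable.keySet()) {
            inFlight.add(key);
        }
    }

    // Applies one purge. Returns true if it purged, false if it was a no-op.
    static boolean applyPurge(String key) {
        String info = snapshotTable.get(key);
        if (info == null) {
            // Already purged by an earlier request: ignore the duplicate.
            return false;
        }
        snapshotTable.remove(key);
        return true;
    }

    // Runs the duplicate scenario and returns {purged, ignored}.
    static int[] run() {
        snapshotTable.clear();
        inFlight.clear();
        snapshotTable.put("/vol/bucket/snap1", "snap1");

        scanAndSubmit();
        // Second scan happens before the first request is applied,
        // so the same key is submitted again.
        scanAndSubmit();

        int purged = 0;
        int ignored = 0;
        while (!inFlight.isEmpty()) {
            if (applyPurge(inFlight.poll())) {
                purged++;
            } else {
                ignored++;
            }
        }
        return new int[] {purged, ignored};
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("purged=" + result[0] + " ignored=" + result[1]);
    }
}
```

In this sketch, checking the table at scan time does not help, because the entry is still present while the first request is in flight. Only the check at apply time makes the duplicate harmless.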
(cherry picked from commit 6dfd7d4)
…apache#6250) (cherry picked from commit 6dfd7d4) Change-Id: I1d602ad904b48e342b7aeb6d5aa925232e03486f
What changes were proposed in this pull request?
If a previous snapshot purge request has already purged the snapshot, a race condition between SnapshotDeletingService and OMSnapshotPurgeRequest can cause the same request to be resent, which triggers an NPE when getting the snapshotInfo from the snapshotInfoTable. We can ignore this request because the snapshot is already purged.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10408
How was this patch tested?
Existing tests.
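A check for the duplicate request scenario requested in the review might look roughly like the sketch below. It is an assumption-laden illustration, not the test that was added to the project: `DuplicatePurgeCheck`, `SnapshotRecord`, `purgeUnguarded` and `purgeGuarded` are hypothetical names, and a `HashMap` stands in for the snapshot info table. It contrasts the pre-fix behavior (dereferencing a missing entry) with the guarded behavior (ignoring it).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical harness for illustration; not part of the Ozone test suite.
public class DuplicatePurgeCheck {
    // Stand-in for a snapshot info record.
    static final class SnapshotRecord {
        final String name;

        SnapshotRecord(String name) {
            this.name = name;
        }
    }

    // Pre-fix shape: dereferences the lookup result without a null check.
    static String purgeUnguarded(Map<String, SnapshotRecord> table, String key) {
        SnapshotRecord rec = table.get(key);
        String name = rec.name; // throws NullPointerException if already purged
        table.remove(key);
        return name;
    }

    // Guarded shape: a missing entry means the snapshot is already purged.
    static String purgeGuarded(Map<String, SnapshotRecord> table, String key) {
        SnapshotRecord rec = table.get(key);
        if (rec == null) {
            return null; // duplicate request, nothing to do
        }
        table.remove(key);
        return rec.name;
    }

    static Map<String, SnapshotRecord> newTable() {
        Map<String, SnapshotRecord> table = new HashMap<>();
        table.put("snap1-key", new SnapshotRecord("snap1"));
        return table;
    }

    // Sends the same purge twice through the unguarded path.
    static boolean unguardedThrowsOnDuplicate() {
        Map<String, SnapshotRecord> table = newTable();
        purgeUnguarded(table, "snap1-key");
        try {
            purgeUnguarded(table, "snap1-key");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Sends the same purge twice through the guarded path.
    static boolean guardedIgnoresDuplicate() {
        Map<String, SnapshotRecord> table = newTable();
        String first = purgeGuarded(table, "snap1-key");
        String second = purgeGuarded(table, "snap1-key");
        return "snap1".equals(first) && second == null && table.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("unguarded throws: " + unguardedThrowsOnDuplicate());
        System.out.println("guarded ignores: " + guardedIgnoresDuplicate());
    }
}
```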