-
Notifications
You must be signed in to change notification settings - Fork 587
HDDS-9636. Fix NoSuchElement in OMSnapshotPurgeRequest and exit loop early in SnapshotDeletingService #5554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…early in SnapshotDeletingService
hemantk-12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @aswinshakil.
Just curious how you found the issue. It would be great if we can add unit or integration test for it.
| String deletedDirKey = deletedDir.getKey(); | ||
|
|
||
| // Exit for dirs out of snapshot scope. | ||
| if (!deletedDirKey.startsWith(dbBucketKeyForDir)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use Table#iterator(prefix)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not aware this existed. Yes, we can do it, but we need to change this in multiple places and test it. I can do it as a part of a larger separate patch.
| String tableKey = chainManager.getTableKey(nextPathSnapshot); | ||
| SnapshotInfo nextSnapshotInfo = | ||
| omSnapshotManager.getSnapshotInfo(tableKey); | ||
| UUID nextPathSnapshot = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aswinshakil When would getNextActiveSnapshot be called and the current snapshot also deleted in parallel. I guess this should not happen right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is called when the current snapshot is deleted. This is to set the deep clean flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are deleting the current snapshot from the chain then why call this function at the first place, it would return null anyhow right? So do you mean to say we are deleting snapshots which are not deep cleaned? So are we not taking any lock on the portion of the chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #5339, KeyDeletingService runs every 100ms. So even before the 1st request is flushed by the OMDoubleBuffer, The second request(same) comes again. At that time the chain already removed the current deleted snapshot (from the 1st request). When the 2nd request (same as 1st) comes this will throw NoSuchElementException
hemantk-12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch @aswinshakil
It would be great if you can add this in description. Or explain how batch delete/purge is caused this failure.
prashantpogde
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Thanks for the review @hemantk-12 @swamirishi @prashantpogde |
…d exit loop early in SnapshotDeletingService (apache#5554) (cherry picked from commit ad24146) Change-Id: I4dbe9a51b4383730e5146266b1bcf82e3d5ba8b7
What changes were proposed in this pull request?
SnapshotDeletingServiceshould exit early if the bucket is out of scope and avoid expanding buckets that are out of scope.SnapshotUtils.getNextActiveSnapshot()throws NoSuchElement exception if the snapshot is deleted in the previous run andOMDoubleBufferis not flushed yet, so the transactions are not yet persisted.What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9636
How was this patch tested?
Existing Tests.