-
Notifications
You must be signed in to change notification settings - Fork 614
HDDS-10076. SnapshotCache closes RocksDB instance with Reference. #5934
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -186,6 +186,7 @@ public ReferenceCounted<IOmMetadataReader, SnapshotCache> get(String key, | |
| throw new IllegalStateException(ex); | ||
| } | ||
| } | ||
| LOG.debug("Got snapshot {} from SnapshotCache", k); | ||
| return v; | ||
| }); | ||
|
|
||
|
|
@@ -304,30 +305,30 @@ private void cleanupInternal() { | |
| OmSnapshot omSnapshot = (OmSnapshot) rcOmSnapshot.get(); | ||
| LOG.debug("Evicting OmSnapshot instance {} with table key {}", | ||
| rcOmSnapshot, omSnapshot.getSnapshotTableKey()); | ||
| // Sanity check | ||
| Preconditions.checkState(rcOmSnapshot.getTotalRefCount() == 0L, | ||
| "Illegal state: OmSnapshot reference count non-zero (" | ||
| + rcOmSnapshot.getTotalRefCount() + ") but shows up in the " | ||
| + "clean up list"); | ||
|
|
||
| final String key = omSnapshot.getSnapshotTableKey(); | ||
| final ReferenceCounted<IOmMetadataReader, SnapshotCache> result = | ||
| dbMap.remove(key); | ||
| // Sanity check | ||
| Preconditions.checkState(rcOmSnapshot == result, | ||
| "Cache map entry removal failure. The cache is in an inconsistent " | ||
| + "state. Expected OmSnapshot instance: " + rcOmSnapshot | ||
| + ", actual: " + result + " for key: " + key); | ||
|
|
||
| pendingEvictionList.remove(result); | ||
|
|
||
| // Close the instance, which also closes its DB handle. | ||
| try { | ||
| ((OmSnapshot) rcOmSnapshot.get()).close(); | ||
| } catch (IOException ex) { | ||
| throw new IllegalStateException("Error while closing snapshot DB", ex); | ||
| } | ||
| dbMap.computeIfPresent(key, (k, v) -> { | ||
| // Sanity check | ||
| Preconditions.checkState(rcOmSnapshot == v, | ||
| "Cache map entry removal failure. The cache is in an inconsistent " | ||
| + "state. Expected OmSnapshot instance: " + rcOmSnapshot | ||
| + ", actual: " + v + " for key: " + key); | ||
|
|
||
| // Close the instance, which also closes its DB handle. | ||
| if (rcOmSnapshot.getTotalRefCount() == 0L) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This particular check is not exactly thread safe there would still be a race condition, this doesn't really take a ref count lock. We should probably put this function check inside ReferenceCounted class which would take a ref count lock when doing the check. Some function like
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. GetKey increaments the ref count after the dbMap.compute() function so the lock is already out of scope. I don't see any change in the get function so this should be the way going forward for the cleanup.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see a need to get
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah every time we do a refCount.get() if it is going to impact the cache yes we need to take a lock
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess separating out the referenceCounted and cacheLoader is not the right way to implement the cache. From what I know this has go hand in hand in a single lock. Increasing the reference count or decreasing the reference count or when evicting an instance from the cache which would check if there are any references. BTW even decrementRefCount has this problem. decrement is called with referenceCountLock, and it checks if the reference count is zero and adds it to the pending eviction list which checks the count is 0. But in case of a race condition a get call could have a closed rocksdb instance, cleaner thread could have closed the instance in the background.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. refCount increment should happen when things are loaded in the cache and within the same lock. i.e. the same db.compute method
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. refCount increment cannot happen in the same db.compute() because it will cause a deadlock.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about a new atomic operation like
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess it comes to the same point when we call |
||
| try { | ||
| ((OmSnapshot) rcOmSnapshot.get()).close(); | ||
| } catch (IOException ex) { | ||
| throw new IllegalStateException("Error while closing snapshot DB", ex); | ||
| } | ||
| return null; | ||
| } | ||
|
aswinshakil marked this conversation as resolved.
|
||
| LOG.warn("Snapshot {} is still being referenced ({}), skipping its clean up", | ||
| k, rcOmSnapshot.getTotalRefCount()); | ||
| return v; | ||
| }); | ||
|
|
||
| pendingEvictionList.remove(rcOmSnapshot); | ||
|
smengcl marked this conversation as resolved.
|
||
| --numEntriesToEvict; | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that we should use
computeand log/throw an error if dbMap doesn't contain the key. In case when key is missing from dbMap, it is possible that object was not closed properly or some other inconsistency issue.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out. Updated the patch.