Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,7 @@ public ReferenceCounted<IOmMetadataReader, SnapshotCache> get(String key,
throw new IllegalStateException(ex);
}
}
LOG.debug("Got snapshot {} from SnapshotCache", k);
return v;
});

Expand Down Expand Up @@ -304,30 +305,35 @@ private void cleanupInternal() {
OmSnapshot omSnapshot = (OmSnapshot) rcOmSnapshot.get();
LOG.debug("Evicting OmSnapshot instance {} with table key {}",
rcOmSnapshot, omSnapshot.getSnapshotTableKey());
// Sanity check
Preconditions.checkState(rcOmSnapshot.getTotalRefCount() == 0L,
"Illegal state: OmSnapshot reference count non-zero ("
+ rcOmSnapshot.getTotalRefCount() + ") but shows up in the "
+ "clean up list");

final String key = omSnapshot.getSnapshotTableKey();
final ReferenceCounted<IOmMetadataReader, SnapshotCache> result =
dbMap.remove(key);
// Sanity check
Preconditions.checkState(rcOmSnapshot == result,
"Cache map entry removal failure. The cache is in an inconsistent "
+ "state. Expected OmSnapshot instance: " + rcOmSnapshot
+ ", actual: " + result + " for key: " + key);

pendingEvictionList.remove(result);
dbMap.compute(key, (k, v) -> {
if (v == null) {
throw new IllegalStateException("Key '" + k + "' does not exist in cache. The RocksDB " +
"instance of the Snapshot may not be closed properly.");
}

// Close the instance, which also closes its DB handle.
try {
((OmSnapshot) rcOmSnapshot.get()).close();
} catch (IOException ex) {
throw new IllegalStateException("Error while closing snapshot DB", ex);
}
// Sanity check
Preconditions.checkState(rcOmSnapshot == v,
"Cache map entry removal failure. The cache is in an inconsistent "
+ "state. Expected OmSnapshot instance: " + rcOmSnapshot
+ ", actual: " + v + " for key: " + key);

// Close the instance, which also closes its DB handle.
if (rcOmSnapshot.getTotalRefCount() == 0L) {
Copy link
Contributor

@swamirishi swamirishi Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular check is not exactly thread safe there would still be a race condition, this doesn't really take a ref count lock. We should probably put this function check inside ReferenceCounted class which would take a ref count lock when doing the check. Some function like
boolean isTotalRefCountEqual(long expectedValue) which would return if the value is equal by taking a ref count lock. You can refer incrementRefCount function for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetKey increaments the ref count after the dbMap.compute() function so the lock is already out of scope. I don't see any change in the get function so this should be the way going forward for the cleanup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a need to get refCountLock unless you are actually updating it. By that logic then we would also need to get a refCountLock everytime we do refCount.get(). @smengcl Do you see any problem with this? AFAIK I don't think this is problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah every time we do a refCount.get() if it is going to impact the cache yes we need to take a lock

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess separating out the referenceCounted and cacheLoader is not the right way to implement the cache. From what I know this has go hand in hand in a single lock. Increasing the reference count or decreasing the reference count or when evicting an instance from the cache which would check if there are any references. BTW even decrementRefCount has this problem. decrement is called with referenceCountLock, and it checks if the reference count is zero and adds it to the pending eviction list which checks the count is 0. But in case of a race condition a get call could have a closed rocksdb instance, cleaner thread could have closed the instance in the background.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refCount increment should happen when things are loaded in the cache and within the same lock. i.e. the same db.compute method

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refCount increment cannot happen in the same db.compute() because it will cause a deadlock.

Copy link
Contributor

@smengcl smengcl Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a new atomic operation like rcOmSnapshot.closeObjSafely(). It only closes the internal obj if refcount == 0, and it returns true when the internal object is successfully closed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it comes to the same point when we call rcOmSnapshot.closeObjSafely() it returns true but when actually closing it, object is referenced again.

try {
((OmSnapshot) rcOmSnapshot.get()).close();
} catch (IOException ex) {
throw new IllegalStateException("Error while closing snapshot DB", ex);
}
return null;
}
LOG.warn("Snapshot {} is still being referenced ({}), skipping its clean up",
k, rcOmSnapshot.getTotalRefCount());
return v;
});

pendingEvictionList.remove(rcOmSnapshot);
--numEntriesToEvict;
}

Expand Down