Conversation

@aswinshakil
Member

What changes were proposed in this pull request?

When accessing the snapshot cache, multiple threads can fall into a deadlock between the ConcurrentHashMap (dbMap) and the SynchronizedSet (pendingEvictionList).

Scenario:

  1. dbMap.compute() holds the ConcurrentHashMap lock while trying to acquire the lock on the pendingEvictionList set.
  2. The cleanup() method holds the lock on the pendingEvictionList set while trying to acquire the lock on the ConcurrentHashMap (dbMap).
"snapshot-diff-job-thread-id-8":
  waiting to lock monitor 0x00007fa7964788d8 (object 0x00000006d658eed0, a java.util.Collections$SynchronizedSet),
  which is held by "KeyDeletingService#0"
"KeyDeletingService#0":
  waiting to lock monitor 0x0000000000e99e68 (object 0x00000006db6bcba8, a java.util.concurrent.ConcurrentHashMap$Node),
  which is held by "SstFilteringService#0"
"SstFilteringService#0":
  waiting to lock monitor 0x00007fa7964788d8 (object 0x00000006d658eed0, a java.util.Collections$SynchronizedSet),
  which is held by "KeyDeletingService#0"

One way to avoid this is to follow a consistent lock ordering. dbMap.compute() is removed so that we never acquire the locks in the first order.
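As a minimal illustration of the two lock orders (a made-up toy program, not the actual SnapshotCache code; only the dbMap and pendingEvictionList names mirror the description above), with unlucky timing the two threads block each other exactly as in the thread dump:

import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy repro of the lock-ordering problem; not Ozone code.
public class LockOrderDeadlockDemo {
  private static final ConcurrentHashMap<String, String> dbMap = new ConcurrentHashMap<>();
  private static final Set<String> pendingEvictionList =
      Collections.synchronizedSet(new HashSet<>());

  public static void main(String[] args) {
    dbMap.put("snap1", "db-handle");
    pendingEvictionList.add("snap1");

    // Order 1 (release path): compute() holds the map's bin lock for "snap1",
    // then tries to take the set's monitor.
    Thread release = new Thread(() -> dbMap.compute("snap1", (k, v) -> {
      synchronized (pendingEvictionList) {
        pendingEvictionList.add(k);
      }
      return v;
    }));

    // Order 2 (cleanup path): holds the set's monitor, then tries to take the
    // map's bin lock for the same key via remove().
    Thread cleanup = new Thread(() -> {
      synchronized (pendingEvictionList) {
        for (Iterator<String> it = pendingEvictionList.iterator(); it.hasNext();) {
          dbMap.remove(it.next());   // blocks if compute() holds the same bin
          it.remove();
        }
      }
    });

    release.start();
    cleanup.start();
  }
}

Removing the compute() wrapper from the release path means the set's monitor is never requested while a dbMap lock is held, so only the second ordering remains.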

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9871

How was this patch tested?

Existing Test.

@aswinshakil aswinshakil self-assigned this Dec 8, 2023
@aswinshakil aswinshakil added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Dec 8, 2023
@smengcl smengcl added the bug Something isn't working label Dec 9, 2023
@smengcl
Contributor

smengcl commented Dec 9, 2023

Good catch, @aswinshakil.

Is it possible to fix this by adjusting the lock order in cleanup(), so that the dbMap object lock is acquired first?

@aswinshakil
Member Author

In cleanup(), we iterate over pendingEvictionList to remove entries from dbMap with dbMap.remove() (which takes a lock), so we cannot change that order. I don't think we need that level of atomicity for incrementing and decrementing the reference count while holding the dbMap lock. Please let me know your thoughts on this. Thanks! @smengcl
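Roughly, reusing the toy String-keyed fields from the sketch in the description (this is not the actual SnapshotCache#cleanup() code), the ordering in question looks like:

// Simplified sketch only: the set monitor necessarily comes first because the
// set is iterated under it, and dbMap.remove() then takes a map lock per entry.
private void cleanup() {
  synchronized (pendingEvictionList) {
    for (Iterator<String> it = pendingEvictionList.iterator(); it.hasNext();) {
      String key = it.next();
      dbMap.remove(key);
      it.remove();
    }
  }
}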

Contributor

@hemantk-12 hemantk-12 left a comment

Thanks @aswinshakil for catching this nasty bug.

Changes look good to me.

Comment on lines +241 to +246
    if (rcOmSnapshot.decrementRefCount() == 0L) {
      synchronized (pendingEvictionList) {
        // v is eligible to be evicted and closed
        pendingEvictionList.add(rcOmSnapshot);
      }
    }

    return v;
  });
}
Contributor

I figured I would outline two possible race condition scenarios without the dbMap lock here. Though I believe both are already protected against by my existing implementation.

Case 1:

Since pendingEvictionList is a Set, it should be fine if multiple threads call release(key) at the same time while some of those threads observe rcOmSnapshot's ref count reaching 0 at the same time. The logic here would still be correct in this case.

Case 2: Multiple threads interleaving get() and release()

  1. Thread 1 executes get(key) just past rcOmSnapshot.incrementRefCount() (after the change in this PR). The ref count is 1 at this point.
  2. Thread 2 executes release(key) past rcOmSnapshot.decrementRefCount(). The ref count is 0 at this point. (*)
  3. Thread 2 continues to execute pendingEvictionList.add(rcOmSnapshot). pendingEvictionList now contains the snapshot, and if cacheSizeLimit is exceeded, cleanup() will pick up the snapshot immediately.
  4. Thread 1 executes pendingEvictionList.remove(rcOmSnapshot). The snapshot is (supposed to be) removed from the eviction list while it has already been removed from dbMap (cleaned up) by cleanup().

(*) In practice, Step 2 in Case 2 would be prevented because decrementRefCount() throws if a thread tries to release() a snapshot it has not get() before, as a safety mechanism. This safety check, together with the atomic ref count, should also prevent Case 1 from happening in the first place (where multiple threads would observe the ref count reaching 0 at the same time).
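A rough sketch of that safety check, assuming an AtomicLong total plus per-thread bookkeeping (the real ReferenceCounted class may differ in detail):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only; not the actual ReferenceCounted implementation.
class RefCountSketch {
  private final AtomicLong totalRefCount = new AtomicLong();
  private final ConcurrentHashMap<Long, Long> perThreadCount = new ConcurrentHashMap<>();

  long incrementRefCount() {
    perThreadCount.merge(Thread.currentThread().getId(), 1L, Long::sum);
    return totalRefCount.incrementAndGet();
  }

  long decrementRefCount() {
    long tid = Thread.currentThread().getId();
    perThreadCount.compute(tid, (k, v) -> {
      if (v == null || v == 0L) {
        // release() without a matching get() on this thread fails fast
        throw new IllegalStateException("Thread " + tid + " never acquired this snapshot");
      }
      return v - 1L;
    });
    return totalRefCount.decrementAndGet();
  }
}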

cleanup(), as of this time, is only called at the end of each get() and release(). Since it doesn't run in a separate thread, it should not pose an extra correctness issue, at least for now.

Member Author

@smengcl From the code, I don't see release() being used anywhere. How do we decrement the reference count if we don't use release()? The only other place we decrement the reference count is in get().

Contributor

@smengcl smengcl Dec 13, 2023

@aswinshakil Indeed, SnapshotCache#release is not called directly right now and is only kept to conform to the Guava cache interface (and just in case we need it). The ref count is currently decremented by ReferenceCounted#close because we encourage using try-with-resources on snapshot instances.
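For illustration, the usage pattern looks roughly like this (snapshotCache, snapshotKey and the get() accessors are stand-ins, not the exact Ozone API):

// Hypothetical usage sketch of try-with-resources on a ref-counted snapshot.
try (ReferenceCounted<OmSnapshot> rcOmSnapshot = snapshotCache.get(snapshotKey)) {
  OmSnapshot omSnapshot = rcOmSnapshot.get();  // ref count was incremented by get()
  // ... read from the snapshot; the non-zero ref count keeps it out of eviction ...
}  // close() decrements the ref count, so no explicit release() is needed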

@smengcl smengcl merged commit a02a8ed into apache:master Dec 12, 2023
@smengcl
Contributor

smengcl commented Dec 12, 2023

Thanks @aswinshakil for the patch. Thanks @hemantk-12 for the review.
