HDDS-10103. Simplified snapshotCache by using single reentrant lock instead multiple locks #5968
Thanks for fixing this @hemantk-12. There can still be a case where this results in a deadlock. For example, … Now both threads are doing the … Let me know if this is possible.
Currently, cleanup is synchronized at the class level and should prevent this: only one thread will be doing cleanup at a time. I think we can move the cleanup here out of the lock.
Even with synchronized, we can still hit this scenario.
@aswinshakil Actually, you are right (good catch). Even with the synchronized cleanup, a thread can still be waiting for a lock that is held by another thread. For example, … Now thread 1 acquires the cleanup() lock and starts the cleanup, but it gets stuck waiting for the snap2 lock because that lock has not been released yet. Moving the cleanup call in the get function out of the lock should fix it. After the change, only one of the threads will acquire the cleanup lock, because cleanup() is synchronized. Let's say thread 1 gets it. …
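The fix agreed on above (call cleanup() only after the per-snapshot lock has been released) can be sketched as follows. This is a hypothetical, simplified cache, not the real SnapshotCache: the class name `SketchSnapshotCache`, the map of per-key `ReentrantLock`s, and the `Integer` reference counts standing in for open snapshot DB handles are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical, simplified cache (not the actual SnapshotCache). get() holds the
 * per-snapshot lock only while it updates that entry, and calls the synchronized
 * cleanup() after the lock has been released.
 */
class SketchSnapshotCache {
  private final int cacheSizeLimit;
  // Hypothetical per-snapshot locks; never removed, so a key always maps to one lock.
  private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
  // Snapshot key -> reference count (stand-in for an open snapshot DB handle).
  private final Map<String, Integer> refCounts = new ConcurrentHashMap<>();

  SketchSnapshotCache(int cacheSizeLimit) {
    this.cacheSizeLimit = cacheSizeLimit;
  }

  private ReentrantLock lockFor(String key) {
    return locks.computeIfAbsent(key, k -> new ReentrantLock());
  }

  void get(String key) {
    ReentrantLock lock = lockFor(key);
    lock.lock();
    try {
      refCounts.merge(key, 1, Integer::sum); // open or re-reference the snapshot
    } finally {
      lock.unlock();
    }
    // No per-snapshot lock is held here, so this thread can never block inside
    // cleanup() while owning a lock that the cleaning thread is waiting for.
    cleanup();
  }

  void release(String key) {
    ReentrantLock lock = lockFor(key);
    lock.lock();
    try {
      refCounts.computeIfPresent(key, (k, v) -> v - 1);
    } finally {
      lock.unlock();
    }
  }

  /** One thread at a time evicts unreferenced entries while over the limit. */
  synchronized void cleanup() {
    for (String key : refCounts.keySet()) {
      if (refCounts.size() <= cacheSizeLimit) {
        return;
      }
      ReentrantLock lock = lockFor(key);
      lock.lock();
      try {
        refCounts.remove(key, 0); // evict only if nobody holds a reference
      } finally {
        lock.unlock();
      }
    }
  }

  int size() {
    return refCounts.size();
  }
}
```

Because get() never waits to enter cleanup() while holding a per-snapshot lock, the cycle described above (the cleaning thread waiting for snap2's lock while snap2's holder waits to enter cleanup()) cannot form.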
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java (resolved)
LGTM +1. It would be great if someone else also took a look at it. Thanks @hemantk-12 for working on this.
```java
// Soft-limit of the total number of snapshot DB instances allowed to be
// opened on the OM.
private final int cacheSizeLimit;
private final Striped<ReadWriteLock> striped;
```
Adding a javadoc paragraph above this line to explain the intended usage of this for other maintainers would be appreciated.
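One possible shape for such a javadoc is sketched below. To stay dependency-free, the sketch uses a hypothetical hand-rolled `StripedReadWriteLock` class instead of Guava's `Striped`; with Guava, `Striped.readWriteLock(stripes)` and `get(key)` play the same roles. The class name and the wording of the usage contract are assumptions, not taken from the PR.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical stdlib stand-in for Guava's Striped<ReadWriteLock>.
 *
 * <p>Intended usage: obtain the lock for a snapshot key with {@link #get(Object)};
 * take readLock() for lookups and writeLock() to open or evict that snapshot.
 * Equal keys always map to the same lock, so operations on one snapshot are
 * serialized, while snapshots in different stripes proceed in parallel. Distinct
 * keys may share a stripe, and holding one stripe while acquiring another can
 * deadlock if two threads take them in opposite orders.
 */
final class StripedReadWriteLock {
  private final ReadWriteLock[] stripes;

  StripedReadWriteLock(int stripeCount) {
    if (stripeCount <= 0) {
      throw new IllegalArgumentException("stripeCount must be positive");
    }
    stripes = new ReadWriteLock[stripeCount];
    for (int i = 0; i < stripeCount; i++) {
      stripes[i] = new ReentrantReadWriteLock();
    }
  }

  /** Returns the lock guarding {@code key}; the same key always yields the same lock. */
  ReadWriteLock get(Object key) {
    // floorMod keeps the index non-negative even for negative hash codes.
    return stripes[Math.floorMod(key.hashCode(), stripes.length)];
  }
}
```

A fixed number of stripes bounds memory regardless of how many snapshots exist, at the cost of occasional false sharing between unrelated keys.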
```diff
- this.dbMap = new ConcurrentHashMap<>();
- this.pendingEvictionList =
-     Collections.synchronizedSet(new LinkedHashSet<>());
+ this.dbMap = new HashMap<>();
```
Could thread visibility be an issue here with a plain HashMap, since dbMap can be mutated by multiple threads and we are not synchronizing on this particular map?
https://stackoverflow.com/a/11050613
We likely still need ConcurrentHashMap here.
You made a good point. I had never heard before that a HashMap could end up in an infinite loop if multiple threads update it.
But using ConcurrentHashMap with locks could lead to the same problem as #5751.
Another approach is to use only the ConcurrentHashMap. I created draft PR #5986 based on that. My only concern is whether it will throw ConcurrentModificationException while iterating over the map when another thread is updating or adding entries. I read a couple of blogs, and it seems this should not be a problem if I go with the compute function: https://stackoverflow.com/questions/37127285/iterate-over-concurrenthashmap-while-deleting-entries and https://www.digitalocean.com/community/tutorials/concurrenthashmap-in-java
Let me know what you think about this approach.
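A minimal sketch of the ConcurrentHashMap-only idea, assuming reference counts stored as `Integer` values. The class and method names are hypothetical and not taken from #5986. Per-key updates go through `compute()`/`computeIfPresent()`, which ConcurrentHashMap performs atomically, and eviction uses the atomic conditional `remove(key, value)`. On the concern above: ConcurrentHashMap iterators are weakly consistent and do not throw ConcurrentModificationException when the map is modified concurrently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical reference-counted cache built only on ConcurrentHashMap. */
final class RefCountedCache {
  // Snapshot key -> number of active users of that snapshot.
  private final Map<String, Integer> refCounts = new ConcurrentHashMap<>();

  /** Atomically increments the count for key, creating the entry if absent. */
  int acquire(String key) {
    return refCounts.compute(key, (k, v) -> v == null ? 1 : v + 1);
  }

  /** Atomically decrements the count; the entry stays cached at 0 until evicted. */
  void release(String key) {
    refCounts.computeIfPresent(key, (k, v) -> v > 0 ? v - 1 : 0);
  }

  /** Removes every entry whose count is 0 and returns how many were removed. */
  int evictUnreferenced() {
    int evicted = 0;
    // Weakly consistent iteration: no ConcurrentModificationException even if
    // other threads acquire or release entries while this loop runs.
    for (String key : refCounts.keySet()) {
      // Atomic check-and-remove: skipped if another thread re-acquired the
      // entry (count no longer 0) after this loop read the key.
      if (refCounts.remove(key, 0)) {
        evicted++;
      }
    }
    return evicted;
  }

  int size() {
    return refCounts.size();
  }
}
```

The ConcurrentHashMap Javadoc asks that the remapping functions be short and not update other mappings of the same map, which the lambdas above respect.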
smengcl left a comment
Thanks @hemantk-12. Overall looks fine. I have some comments inline.
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java (resolved)
@hemantk-12 thanks for the patch. I believe there is still an issue with the locking.
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java (resolved)
Closing the pull request in favour of #5986.
What changes were proposed in this pull request?
Currently, we use different locks to provide a consistent view of the snapshot cache, which causes multiple problems, e.g. deadlocks and race conditions.
In this change, a reentrant lock is used for synchronization, which gives a consistent view of the cache. Guava's Striped is used to distribute the lock so that multiple threads can access the cache at the same time.
What is the link to the Apache JIRA
HDDS-10103
How was this patch tested?
Existing unit and integration tests.