-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-10156. Optimize Snapshot Cache get and eviction #6024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
hemantk-12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @swamirishi for working on this.
I looked at this briefly and overall looks good to me.
Left some inline comments.
...-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/ReferenceCounted.java
Outdated
Show resolved
Hide resolved
| // TODO: [SNAPSHOT] Add OzoneManager.isLeaderReady() check along with | ||
| // suspended. `isLeaderReady` check was removed because some unit tests | ||
| // were failing due to Mockito limitation. Remove this once unit tests | ||
| // or mocking are fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment valid for this? Or just copy-paste from SnapshotDiffCleanupService?
...ne-manager/src/main/java/org/apache/hadoop/ozone/om/service/SnapshotCacheCleanupService.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java
Outdated
Show resolved
Hide resolved
|
Why is this a Draft? |
...op-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/freon/TestOMSnapshotDAG.java
Outdated
Show resolved
Hide resolved
| dbMap.compute(entry.getKey(), (k, v) -> { | ||
| for (String evictionKey : pendingEvictionQueue) { | ||
| dbMap.compute(evictionKey, (k, v) -> { | ||
| pendingEvictionQueue.remove(k); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this. Iterating and removing elements, while get() is adding to the same pendingEvictionQueue can cause ConcurrentModificationException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be fine, since this is a concurrent hashset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add test for it? Or may be use iterator.
| */ | ||
| public void invalidateAll() { | ||
| Iterator<Map.Entry<String, ReferenceCounted<IOmMetadataReader, SnapshotCache>>> | ||
| Iterator<Map.Entry<String, ReferenceCounted<IOmMetadataReader, SnapshotCache, String>>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
invalidateAll() and invalidate() should also cleanup the if entries already exist in pendingEvictionQueue.
I haven't added the unit test yet. I am working on it. |
| private ReferenceCounted<IOmMetadataReader, SnapshotCache> | ||
| rcOmMetadataReader; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it was indented better before.
|
|
||
| public ReferenceCounted< | ||
| IOmMetadataReader, SnapshotCache> getOmMetadataReader() { | ||
| public ReferenceCounted<IOmMetadataReader, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it could be in a single line since 120 chars are allowed per line now.
| if (!skipActiveCheck && !omSnapshotManager.isSnapshotStatus(key, SNAPSHOT_ACTIVE)) { | ||
| // Ref count was incremented. Need to decrement on exception here. | ||
| rcOmSnapshot.decrementRefCount(); | ||
| rcOmSnapshot.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be decrementRefCount() and not close(). Some of the background services for exmaple SnapshotDeletingService works on non-Active snapshot. Snapshot is opened by SnapshotDeletingService, it will close the currently used snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a offline discussion, close() internal calls decrementRefCount().
But I think decrementRefCount() is better to use from readability perspective otherwise someone may have the same doubt I had.
| dbMap.compute(entry.getKey(), (k, v) -> { | ||
| for (String evictionKey : pendingEvictionQueue) { | ||
| dbMap.compute(evictionKey, (k, v) -> { | ||
| pendingEvictionQueue.remove(k); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add test for it? Or may be use iterator.
| // Reset cache for each test case | ||
| snapshotCache = new SnapshotCache( | ||
| omSnapshotManager, cacheLoader, CACHE_SIZE_LIMIT); | ||
| omSnapshotManager, cacheLoader, CACHE_SIZE_LIMIT, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How setting the clean up interval to 0 is working for the test? I don't think it is correct to set clean-up to 0 or tests are more correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want the cleanup service to run here, since everything is mocked here.
# Conflicts: # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/ReferenceCounted.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java # hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotCache.java # hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java
|
@swamirishi can you please fix the check-style and unit tests? |
Change-Id: I79ba52de34a144e5e1d21c6a8c9f794ce8172e78
Change-Id: I3ea7fe97f0e6a5fc76387894ab1f56c87c6a78b1
Change-Id: Ic50b54d77a20c22ca4b55ec2e4ea647453ce87e5
Change-Id: I7d842af27926a434bd9b7da4197bc5e0330e3de6 # Conflicts: # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java # hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotCache.java # hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java
Change-Id: If95fd3509b261f8ec5a1c72f7e8349742d17c6dd
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java
Show resolved
Hide resolved
Change-Id: I3c59fc898deb9a95198cf383a5abde2b5d1f5f9f
Change-Id: I594b27798c9d7be9682ddaa8f0c3454592a0d8f3
hemantk-12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @swamirishi for the patch.
Left a comment about closing the scheduler otherwise looks good to me.
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java
Outdated
Show resolved
Hide resolved
Change-Id: I9eeb49fe60851d6d58d55aa18682d63585320e7c
Change-Id: Id9bda88e9c79934f69712a8a93fb04c3fe74e3a4
hemantk-12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
|
Thank you for the review @hemantk-12 @aswinshakil |
(cherry picked from commit 3e97d8f)
(cherry picked from commit 3e97d8f)
(cherry picked from commit 3e97d8f)
(cherry picked from commit 3e97d8f)
(cherry picked from commit 3e97d8f)
(cherry picked from commit 3e97d8f)
What changes were proposed in this pull request?
Currently SnapshotCache does a cleanup on every operation performed on the cache which is synchronized operation which overall makes the get,release operation non performant. A better approach would be introduce a background service which will periodically come up and perform a cleanup of all the unreferenced snapshot instances.
HDDS-10103 removed pending eviction list to avoid deadlock issues which means we have to check each and every snapshot instance in the cache if it has any references. If we use the concurrentHashSet for the pendingEvictionSet instead we can avoid such deadlock issues from occuring and it would be optimized at the same time.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10156
How was this patch tested?
Adding unit tests and integration tests and existing tests should also help