@swamirishi
Contributor

What changes were proposed in this pull request?

Currently, SnapshotCache performs a cleanup on every operation on the cache. Because the cleanup is synchronized, it slows down the get and release operations. A better approach is to introduce a background service that periodically wakes up and cleans up all unreferenced snapshot instances.

HDDS-10103 removed the pending eviction list to avoid deadlock issues, which means we have to check every snapshot instance in the cache for references. If we use a concurrent hash set for the pendingEvictionSet instead, we can avoid such deadlocks and keep the cleanup efficient at the same time.
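
As an illustration of the idea described above, here is a minimal sketch of a cache that tracks unreferenced keys in a concurrent set, so that a cleanup pass only visits those keys instead of scanning the whole cache. It is not the code from this patch: the class `PendingEvictionSketch`, its `Entry` type, and the methods `get`, `release`, and `cleanup` are hypothetical names chosen for the example, and the real SnapshotCache also loads and closes RocksDB-backed snapshots, which is omitted here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified stand-in for a snapshot cache; not the Ozone implementation. */
final class PendingEvictionSketch {
  /** Hypothetical cached entry that only tracks a reference count. */
  static final class Entry {
    final AtomicLong refCount = new AtomicLong();
  }

  private final ConcurrentHashMap<String, Entry> dbMap = new ConcurrentHashMap<>();
  // Keys whose reference count dropped to zero and that may be evicted.
  private final Set<String> pendingEviction = ConcurrentHashMap.newKeySet();

  /** Creates the entry if absent and takes a reference on it. */
  Entry get(String key) {
    return dbMap.compute(key, (k, v) -> {
      Entry e = (v == null) ? new Entry() : v;
      e.refCount.incrementAndGet();
      pendingEviction.remove(k); // referenced again, so no longer evictable
      return e;
    });
  }

  /** Drops one reference and marks the key for eviction when none remain. */
  void release(String key) {
    dbMap.computeIfPresent(key, (k, v) -> {
      if (v.refCount.decrementAndGet() == 0) {
        pendingEviction.add(k);
      }
      return v;
    });
  }

  /** One cleanup pass over the pending keys only; returns the number of evicted entries. */
  int cleanup() {
    int evicted = 0;
    for (String key : pendingEviction) {
      boolean[] removed = {false};
      // compute() is atomic per key, so a racing get() on the same key
      // runs either entirely before or entirely after this check.
      dbMap.compute(key, (k, v) -> {
        pendingEviction.remove(k);
        if (v != null && v.refCount.get() == 0) {
          removed[0] = true;
          return null; // returning null removes the mapping
        }
        return v;
      });
      if (removed[0]) {
        evicted++;
      }
    }
    return evicted;
  }

  int size() {
    return dbMap.size();
  }
}
```

In this sketch a background service would simply call `cleanup()` on a timer, while `get` and `release` never scan the cache themselves.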

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-10156

How was this patch tested?

Unit tests and integration tests are being added; the existing tests should also help.

@swamirishi swamirishi added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Jan 18, 2024
Contributor

@hemantk-12 hemantk-12 left a comment

Thanks @swamirishi for working on this.

I looked at this briefly and overall it looks good to me.

Left some inline comments.

Comment on lines 87 to 90
// TODO: [SNAPSHOT] Add OzoneManager.isLeaderReady() check along with
// suspended. `isLeaderReady` check was removed because some unit tests
// were failing due to Mockito limitation. Remove this once unit tests
// or mocking are fixed.
Contributor

Is this comment valid here, or was it just copy-pasted from SnapshotDiffCleanupService?

@swagle
Contributor

swagle commented Jan 18, 2024

Why is this a Draft?

- dbMap.compute(entry.getKey(), (k, v) -> {
+ for (String evictionKey : pendingEvictionQueue) {
+ dbMap.compute(evictionKey, (k, v) -> {
+ pendingEvictionQueue.remove(k);
Member

I'm not sure about this. Iterating over and removing elements while get() is adding to the same pendingEvictionQueue can cause a ConcurrentModificationException.

Contributor Author

This should be fine, since this is a concurrent hash set.

Contributor

Can we add a test for it? Or maybe use an iterator.
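
For reference, the behaviour discussed in this thread can be checked with a small, self-contained sketch. It assumes the set is created with `ConcurrentHashMap.newKeySet()` (the patch may construct it differently); the class `IterationSketch` and the keys are made up for the example. The iterators of `ConcurrentHashMap` views are documented as weakly consistent and never throw `ConcurrentModificationException`, whereas a plain `HashSet` iterator fails fast.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical demo class; the names and keys are illustrative only. */
final class IterationSketch {
  /**
   * Removes every visited element and adds one new element during iteration.
   * Returns true if the loop finished without a ConcurrentModificationException.
   */
  static boolean mutateWhileIterating(Set<String> set) {
    set.add("snap-1");
    set.add("snap-2");
    set.add("snap-3");
    boolean first = true;
    try {
      for (String key : set) {
        set.remove(key);
        if (first) {
          // Simulates get() adding a key while the cleanup loop is running.
          set.add("snap-late");
          first = false;
        }
      }
      return true;
    } catch (ConcurrentModificationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("HashSet survived: " + mutateWhileIterating(new HashSet<>()));
    System.out.println("Concurrent set survived: "
        + mutateWhileIterating(ConcurrentHashMap.newKeySet()));
  }
}
```

Weak consistency also means the loop may or may not see an element added during iteration, so a key added late can be left for the next cleanup pass. A test for that case, as requested above, would still be worthwhile.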

  */
  public void invalidateAll() {
- Iterator<Map.Entry<String, ReferenceCounted<IOmMetadataReader, SnapshotCache>>>
+ Iterator<Map.Entry<String, ReferenceCounted<IOmMetadataReader, SnapshotCache, String>>>
Member

invalidateAll() and invalidate() should also clean up the entries if they already exist in pendingEvictionQueue.

@swamirishi
Contributor Author

Why is this a Draft?

I haven't added the unit test yet. I am working on it.

Comment on lines 488 to 489
private ReferenceCounted<IOmMetadataReader, SnapshotCache>
rcOmMetadataReader;
Contributor

nit: it was indented better before.


public ReferenceCounted<
IOmMetadataReader, SnapshotCache> getOmMetadataReader() {
public ReferenceCounted<IOmMetadataReader,
Contributor

nit: it could be on a single line, since 120 chars are allowed per line now.

  if (!skipActiveCheck && !omSnapshotManager.isSnapshotStatus(key, SNAPSHOT_ACTIVE)) {
  // Ref count was incremented. Need to decrement on exception here.
- rcOmSnapshot.decrementRefCount();
+ rcOmSnapshot.close();
Contributor

This should be decrementRefCount() and not close(). Some of the background services, for example SnapshotDeletingService, work on non-active snapshots. If the snapshot was opened by SnapshotDeletingService, this will close the snapshot that is currently in use.

Contributor

@hemantk-12 hemantk-12 Feb 7, 2024

Had an offline discussion: close() internally calls decrementRefCount().
But I think decrementRefCount() is better from a readability perspective; otherwise someone may have the same doubt I had.
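
To make the point above concrete, here is a minimal sketch of a reference-counted handle whose close() only releases the caller's reference. It is a simplified, hypothetical class (`RefCountedSketch`), not the actual ReferenceCounted from this patch, and it leaves out everything except the counting.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified reference-counted handle; not the Ozone ReferenceCounted class. */
final class RefCountedSketch<T> implements AutoCloseable {
  private final T resource;
  private final AtomicLong refCount = new AtomicLong();

  RefCountedSketch(T resource) {
    this.resource = resource;
  }

  T get() {
    return resource;
  }

  long incrementRefCount() {
    return refCount.incrementAndGet();
  }

  long decrementRefCount() {
    return refCount.decrementAndGet();
  }

  long getRefCount() {
    return refCount.get();
  }

  /** Releases this caller's reference only; the wrapped resource stays open. */
  @Override
  public void close() {
    decrementRefCount();
  }
}
```

With this shape, a caller can hold the handle in a try-with-resources block and the count drops by one on exit, which is why close() and decrementRefCount() behave the same here even though the names suggest otherwise.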

  // Reset cache for each test case
  snapshotCache = new SnapshotCache(
- omSnapshotManager, cacheLoader, CACHE_SIZE_LIMIT);
+ omSnapshotManager, cacheLoader, CACHE_SIZE_LIMIT, 0);
Contributor

How does setting the cleanup interval to 0 work for the test? I don't think setting the cleanup interval to 0 is correct, or that it makes the tests more correct.

Contributor Author

We don't want the cleanup service to run here, since everything is mocked here.
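
One way an interval of 0 could disable the background cleanup is sketched below. This is an assumption about the design rather than the code in this patch: the class `CleanupSchedulerSketch`, the thread name, and the rule that a non-positive interval means "no scheduler" are all hypothetical. It also shows the scheduler being shut down in close(), which is the usual way to avoid leaking the thread.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper around a periodic cleanup task; not the Ozone implementation. */
final class CleanupSchedulerSketch implements AutoCloseable {
  // Null when the periodic cleanup is disabled.
  private final ScheduledExecutorService scheduler;

  CleanupSchedulerSketch(Runnable cleanupTask, long intervalMillis) {
    if (intervalMillis > 0) {
      scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "cache-cleanup-sketch");
        t.setDaemon(true); // do not keep the JVM alive for cleanup
        return t;
      });
      scheduler.scheduleWithFixedDelay(
          cleanupTask, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    } else {
      // Assumed convention: a non-positive interval disables the service,
      // e.g. for unit tests where everything else is mocked.
      scheduler = null;
    }
  }

  boolean isRunning() {
    return scheduler != null && !scheduler.isShutdown();
  }

  /** Stops the background thread, if one was started. */
  @Override
  public void close() {
    if (scheduler != null) {
      scheduler.shutdownNow();
    }
  }
}
```

Under this convention a test can construct the cache with interval 0 and call the cleanup method directly when it wants to exercise eviction.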

swamirishi and others added 3 commits February 20, 2024 12:33
# Conflicts:
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/ReferenceCounted.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java
#	hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotCache.java
#	hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java
@hemantk-12
Contributor

@swamirishi can you please fix the checkstyle and unit test failures?

Change-Id: I79ba52de34a144e5e1d21c6a8c9f794ce8172e78
Change-Id: I3ea7fe97f0e6a5fc76387894ab1f56c87c6a78b1
Change-Id: Ic50b54d77a20c22ca4b55ec2e4ea647453ce87e5
Change-Id: I7d842af27926a434bd9b7da4197bc5e0330e3de6

# Conflicts:
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotCache.java
#	hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotCache.java
#	hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java
Change-Id: If95fd3509b261f8ec5a1c72f7e8349742d17c6dd
Change-Id: I3c59fc898deb9a95198cf383a5abde2b5d1f5f9f
Change-Id: I594b27798c9d7be9682ddaa8f0c3454592a0d8f3
Contributor

@hemantk-12 hemantk-12 left a comment

Thanks @swamirishi for the patch.
Left a comment about closing the scheduler; otherwise it looks good to me.

Change-Id: I9eeb49fe60851d6d58d55aa18682d63585320e7c
@swamirishi swamirishi marked this pull request as ready for review April 16, 2024 22:22
Change-Id: Id9bda88e9c79934f69712a8a93fb04c3fe74e3a4
Contributor

@hemantk-12 hemantk-12 left a comment

LGTM.

@swamirishi
Contributor Author

Thank you for the review @hemantk-12 @aswinshakil

@swamirishi swamirishi merged commit 3e97d8f into apache:master Apr 17, 2024
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request May 29, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Jul 17, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Jul 17, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Jul 17, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Jul 18, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Jul 18, 2024
swamirishi added a commit to swamirishi/ozone that referenced this pull request Dec 3, 2025