
Conversation

@smengcl
Contributor

@smengcl smengcl commented Apr 14, 2023

What changes were proposed in this pull request?

This is a follow-up to HDDS-7935.

Opening this as a draft for suggestions. @GeorgeJahad @hemantk-12 could also take a look at this. I'd like to get some opinions and feedback on the overall approach.

The core of this PR is the SnapshotCache class and the ReferenceCounted interface. One can start from there. SnapshotCache is intended to be thread-safe.

With this approach, every time an OmSnapshot instance is retrieved using snapshotCache.get(), a corresponding snapshotCache.close() has to be called at the end of its usage (e.g. using try-with-resources). Otherwise the reference count is never decremented and the entry leaks. I have identified all existing usages of snapshotCache.get() on the latest master branch, whether wrapped in a helper method or not, and added the logic there for how they should be properly handled.
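The acquire/release contract described above can be sketched roughly as follows. This is a simplified illustration, not the actual SnapshotCache code; the Entry class and the refCountOf helper are made-up names for the sake of the sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class SnapshotCacheSketch {
  // Stand-in for a cached OmSnapshot plus its reference count.
  static final class Entry {
    final String snapshot;
    final AtomicLong refCount = new AtomicLong(0);
    Entry(String snapshot) { this.snapshot = snapshot; }
  }

  private final Map<String, Entry> dbMap = new ConcurrentHashMap<>();

  // get() increments the ref count; every call MUST be paired with close().
  Entry get(String key) {
    Entry e = dbMap.computeIfAbsent(key, k -> new Entry("snapshot-" + k));
    e.refCount.incrementAndGet();
    return e;
  }

  // close() decrements; an entry only becomes evictable once its count hits zero.
  void close(String key) {
    Entry e = dbMap.get(key);
    if (e != null) {
      e.refCount.decrementAndGet();
    }
  }

  long refCountOf(String key) {
    Entry e = dbMap.get(key);
    return e == null ? -1L : e.refCount.get();
  }
}
```

A caller that forgets the close() half of the pairing leaves the count above zero forever, which is exactly the leak described above.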

In the meantime, I will explore if CacheBuilder#softValues() would be a viable alternative (potential approach 2), where we could probably rely on JVM itself to do the "reference counting" on the instances. Done in HDDS-7935.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8528

How was this patch tested?

  • Add unit tests.
  • Tests that cover the existing snapshot cache use cases, i.e. all snapshot operations that read or write the snapshot DB, should all pass.

…LoadingCache

Change-Id: Id28c09eae42d3ba5e90117bbea37891c725b32da
@GeorgeJahad
Contributor

GeorgeJahad commented Apr 14, 2023

I tend to prefer approach 1.

The guava doc has specific warnings about using weak/soft values: https://guava.dev/releases/19.0/api/docs/com/google/common/cache/CacheBuilder.html#weakValues()

"Weak values will be garbage collected once they are weakly reachable. This makes them a poor candidate for caching; consider softValues() instead. "

"Warning: in most circumstances it is better to set a per-cache maximum size instead of using soft references. You should only use this method if you are well familiar with the practical consequences of soft references. "

So I am hesitant about approach 2.

For me the main problem with approach 1 is that the implementation doesn't yet support try-with-resources.

Your concern about encapsulation is valid: As you say, "putting the decrement ref count logic in OmSnapshot#close (so that we can wrap OmSnapshot inside a try-with-resources) would also break the encapsulation thus considered hacky by myself."

Would it be better if, instead of the refCount interface, we had a cacheEntry wrapper class that manages the reference counting and also implements close()? The cacheEntry constructor would take an omSnapshot as a parameter and would have a getter method to allow the end user to access the actual omSnapshot methods.

So users would always do a try-with-resources to get a cache entry and then unwrap it to get the omSnapshot? Or maybe we could make the cache entry implement the IOMetadataReader interface as well, so there would usually be no unwrapping needed.
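The wrapper idea can be sketched as below. This is a rough illustration, not the PR's code; the CacheEntry name and the AtomicLong-based counting are assumptions made for the sketch. The point is that the wrapper, not OmSnapshot itself, owns the ref count and the close(), so encapsulation is preserved.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper: acquires a reference on construction, releases on close(),
// and never touches the wrapped object's own lifecycle.
final class CacheEntry<T> implements AutoCloseable {
  private final T value;                 // the wrapped omSnapshot
  private final AtomicLong refCount;

  CacheEntry(T value, AtomicLong refCount) {
    this.value = value;
    this.refCount = refCount;
    refCount.incrementAndGet();          // acquire on construction
  }

  // Unwrap to reach the actual omSnapshot methods.
  T get() {
    return value;
  }

  @Override
  public void close() {                  // release; value itself stays open
    refCount.decrementAndGet();
  }
}
```

Usage would then be `try (CacheEntry<OmSnapshot> e = cache.get(key)) { use(e.get()); }`, and having the wrapper also implement IOMetadataReader would remove the `e.get()` step in most call sites.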

@GeorgeJahad
Contributor

I also think some of the snapshot user code should be modified to support try-with-resources. For example, this code:

omFromSnapshot = (OmSnapshot) omSnapshotManager
.checkForSnapshot(snapshotInfo.getVolumeName(),
snapshotInfo.getBucketName(),
getSnapshotPrefix(snapshotInfo.getName()));
}
omClientResponse = new OMKeyPurgeResponse(omResponse.build(),
keysToBePurgedList, omFromSnapshot);
} catch (IOException ex) {

does the checkForSnapshot() call and then passes a reference to the omSnapshot into the double buffer thread.

I think those sorts of calls should be modified just to pass the snapshot key into the double buffer thread and have it get the snapshot from the cache, (with try-with-resources.)

The way it is implemented now, the reference is passed between two threads, and try-with-resources requires it all happen in the same thread.
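The single-thread restriction above can be sketched as follows: only the snapshot key crosses the thread boundary, and the consuming thread acquires and releases the snapshot locally. All names here are illustrative, not the real Ozone API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class CrossThreadSketch {
  static final AtomicLong refCount = new AtomicLong(0);

  // Stand-in for a ref-counted snapshot handle acquired from the cache.
  static final class Handle implements AutoCloseable {
    final String key;
    Handle(String key) { this.key = key; refCount.incrementAndGet(); }
    @Override public void close() { refCount.decrementAndGet(); }
  }

  public static void main(String[] args) {
    String snapshotKey = "/vol/bucket/snap1";   // only the key crosses threads
    ExecutorService doubleBuffer = Executors.newSingleThreadExecutor();
    doubleBuffer.submit(() -> {
      try (Handle h = new Handle(snapshotKey)) { // acquire locally
        // ... read from the snapshot DB here ...
      }                                          // release locally, same thread
    });
    doubleBuffer.shutdown();
    try {
      doubleBuffer.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    System.out.println("leaked refs: " + refCount.get());  // leaked refs: 0
  }
}
```

Because the try-with-resources scope opens and closes inside the worker thread, there is no window where one thread holds a reference that another thread is responsible for releasing.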

@smengcl
Contributor Author

smengcl commented Apr 14, 2023

Thanks @GeorgeJahad for the comment.

I tend to prefer approach 1.

The guava doc has specific warnings about using weak/soft values: guava.dev/releases/19.0/api/docs/com/google/common/cache/CacheBuilder.html#weakValues()

"Weak values will be garbage collected once they are weakly reachable. This makes them a poor candidate for caching; consider softValues() instead. "

"Warning: in most circumstances it is better to set a per-cache maximum size instead of using soft references. You should only use this method if you are well familiar with the practical consequences of soft references. "

So I am hesitant about approach 2.

Yes, I read the doc before I went through the implementation. Though I should have used softValues() in approach 2. Also, I just found an unbounded cache usage in the Hive code base here:

https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java#L76-L80

For me the main problem with approach 1 is that the implementation doesn't yet support try-with-resources.

Your concern about encapsulation is valid: As you say, "putting the decrement ref count logic in OmSnapshot#close (so that we can wrap OmSnapshot inside a try-with-resources) would also break the encapsulation thus considered hacky by myself."

Would it be better if, instead of the refCount interface, we had a cacheEntry wrapper class that manages the reference counting and also implements close()? The cacheEntry constructor would take an omSnapshot as a parameter and would have a getter method to allow the end user to access the actual omSnapshot methods.

So users would always do a try-with-resources to get a cache entry and then unwrap it to get the omSnapshot? Or maybe we could make the cache entry implement the IOMetadataReader interface as well, so there would usually be no unwrapping needed.

That could work. I had the same idea briefly but I am not too fond of the extra layer either. However, this would preserve the encapsulation.

@smengcl
Contributor Author

smengcl commented Apr 14, 2023

I also think some of the snapshot user code should be modified to support try-with-resources. For example, this code:

omFromSnapshot = (OmSnapshot) omSnapshotManager
.checkForSnapshot(snapshotInfo.getVolumeName(),
snapshotInfo.getBucketName(),
getSnapshotPrefix(snapshotInfo.getName()));
}
omClientResponse = new OMKeyPurgeResponse(omResponse.build(),
keysToBePurgedList, omFromSnapshot);
} catch (IOException ex) {

does the checkForSnapshot() call and then passes a reference to the omSnapshot into the double buffer thread.

I think those sorts of calls should be modified just to pass the snapshot key into the double buffer thread and have it get the snapshot from the cache, (with try-with-resources.)

The way it is implemented now, the reference is passed between two threads, and try-with-resources requires it all happen in the same thread.

Back when @aswinshakil was implementing this in #4486 I asked the same question of why we didn't just pass the snapshot table key to the Response and get the instance there. The answer was also encapsulation: a Response only gets the OMMetadataManager instance (active or snapshot DB) it is operating on. Though I think passing in the extra omSnapshotManager alone is fine.

@hemantk-12
Contributor

hemantk-12 commented Apr 14, 2023

Only for the snapshot use case, I am inclined towards approach #2.

Reason:

  1. The way SnapshotCache is implemented, it is not a bounded cache. I don't see cacheSize being used in an eviction policy when the cache is full. Yes, it does reclaim some entries, but not proactively, and only when they are not used by anyone. If a bounded cache is needed, this doesn't solve that problem.
  2. If I understood it properly, the cache is used in snapshots to reuse the RocksDB instance instead of creating a new one every time. In that case, I think it is OK to rely on GC for that.

I was thinking of using WeakReference with WeakHashMap, which I guess would be similar to CacheBuilder#weakValues() and CacheBuilder#softValues().
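For illustration, here is roughly how a WeakHashMap-backed entry behaves, using only the JDK (no Guava). Nothing about reclamation timing is guaranteed by the JVM, which is the crux of the concern with approach 2:

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class WeakCacheSketch {
  public static void main(String[] args) {
    Map<Object, String> cache = new WeakHashMap<>();
    Object key = new Object();           // strong reference keeps the entry alive
    cache.put(key, "snapshot-handle");

    // While a strong reference to the key exists, the entry is retained.
    System.out.println(cache.get(key));  // snapshot-handle

    WeakReference<Object> weakKey = new WeakReference<>(key);
    key = null;                          // drop the strong reference
    System.gc();                         // only a hint; reclamation not guaranteed

    // After GC the key MAY have been cleared; neither outcome can be relied on.
    System.out.println(weakKey.get() == null ? "collected" : "still alive");
  }
}
```

The second println is deliberately nondeterministic: depending on the collector, the entry may or may not still be there, which is why the Guava docs warn against using soft/weak values in place of an explicit size bound.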

@GeorgeJahad
Contributor

@smengcl I have no experience with "softValues()" but my understanding is that it won't do any reclamation until the heap is completely full. Is that correct?

In a degenerate case where OM JVM heap usage is pushed to the absolute limit,

My fear is that this is not a degenerate case but will rather be common once enough snapshot operations have been performed.

@GeorgeJahad
Contributor

@hemantk-12

I don't see cacheSize being used in an eviction policy when the cache is full. Yes, it does reclaim some entries, but not proactively, and only when they are not used by anyone. If a bounded cache is needed, this doesn't solve that problem.

It does seem to be implemented here, or am I misunderstanding your point?

* If cache size exceeds soft limit, attempt to clean up and close the
* instances that has zero reference count.
* TODO: [SNAPSHOT] Add new ozone debug CLI command to trigger this directly.
*/
private void cleanup() {
long numOfInstToClose = (long) dbMap.size() - cacheSize;
while (numOfInstToClose > 0L) {
// Get the first instance in the clean up list

@hemantk-12
Contributor

@hemantk-12

I don't see cacheSize being used in an eviction policy when the cache is full. Yes, it does reclaim some entries, but not proactively, and only when they are not used by anyone. If a bounded cache is needed, this doesn't solve that problem.

It does seem to be implemented here, or am I misunderstanding your point?

* If cache size exceeds soft limit, attempt to clean up and close the
* instances that has zero reference count.
* TODO: [SNAPSHOT] Add new ozone debug CLI command to trigger this directly.
*/
private void cleanup() {
long numOfInstToClose = (long) dbMap.size() - cacheSize;
while (numOfInstToClose > 0L) {
// Get the first instance in the clean up list

I had the same impression when I looked at it the first time, but it is not bounded because cleanup() is called only by release().

@smengcl smengcl changed the title HDDS-7935. [Snapshot] Custom SnapshotCache implementation to replace LoadingCache HDDS-8528. [Snapshot] Custom SnapshotCache implementation to replace LoadingCache May 3, 2023
@smengcl
Contributor Author

smengcl commented May 4, 2023

@hemantk-12

I don't see cacheSize being used in an eviction policy when the cache is full. Yes, it does reclaim some entries, but not proactively, and only when they are not used by anyone. If a bounded cache is needed, this doesn't solve that problem.

It does seem to be implemented here, or am I misunderstanding your point?

* If cache size exceeds soft limit, attempt to clean up and close the
* instances that has zero reference count.
* TODO: [SNAPSHOT] Add new ozone debug CLI command to trigger this directly.
*/
private void cleanup() {
long numOfInstToClose = (long) dbMap.size() - cacheSize;
while (numOfInstToClose > 0L) {
// Get the first instance in the clean up list

I had the same impression when I looked at it the first time, but it is not bounded because cleanup() is called only by release().

Thanks for the reminder. cleanup() needs to be called by get() as well. I will push the change later. (This doesn't change the fact that the cache is still unbounded.)
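The soft-limit behavior being discussed can be sketched as below (an illustration with made-up names, not the SnapshotCache code): entries beyond cacheSize are closed only when their ref count is zero, so in-use entries are skipped and the map can stay above the limit indefinitely.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class SoftLimitSketch {
  // Removes ("closes") entries beyond the soft limit, but only those whose
  // ref count (the map value here) is zero; referenced entries are skipped.
  static void cleanup(Map<String, Integer> dbMap, int cacheSize) {
    long numOfInstToClose = (long) dbMap.size() - cacheSize;
    Iterator<Map.Entry<String, Integer>> it = dbMap.entrySet().iterator();
    while (numOfInstToClose > 0L && it.hasNext()) {
      Map.Entry<String, Integer> entry = it.next();
      if (entry.getValue() == 0) {
        it.remove();                     // close the zero-ref instance
        numOfInstToClose--;
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> dbMap = new LinkedHashMap<>();
    dbMap.put("snap1", 0);
    dbMap.put("snap2", 2);               // still referenced: cannot be evicted
    dbMap.put("snap3", 0);
    cleanup(dbMap, 1);
    System.out.println(dbMap.keySet()); // [snap2]
  }
}
```

Calling this from get() as well as release() makes cleanup opportunities more frequent, but as noted above it still does not make the cache hard-bounded while references are held.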

smengcl added 4 commits May 4, 2023 15:41
…se `ReferenceCounted<>` to the public and replace every single instance of `OmSnapshot` with `ReferenceCounted<OmSnapshot>` so as they could be auto-closed with try-with-resources.
@smengcl
Contributor Author

smengcl commented May 5, 2023

@GeorgeJahad In the latest commit posted above (8b57f1f), I wrapped OmSnapshot inside ReferenceCounted<> so it becomes ReferenceCounted<OmSnapshot> in the cache dbMap.

The plan is to replace every instance of OmSnapshot with ReferenceCounted<OmSnapshot> in the rest of the Ozone code so that they can be auto-closed with try-with-resources. This will involve quite a few changes:

  1. IOmMetadataReader would need to be wrapped as ReferenceCounted<IOmMetadataReader> as well. (10 usages in 6 files)
  2. Every single call to OmSnapshotManager#checkForSnapshot would now need to be wrapped inside try-with-resources. (10 usages in 6 files excluding tests)
  3. Need to wrap active DB's OmMetadataReader in ReferenceCounted<> as well due to its usage in checkForSnapshot(), although reference counting wouldn't be useful to active OM DB.
  4. snapshotCache.get() usages would need to be checked now that it returns ReferenceCounted<OmSnapshot>. (5 usages in 2 files)
  5. The SnapDiff thread executor will not be able to use try-with-resources. Rather, it has to call referenceCountedOmSnapshot.close() manually upon completion of the SnapDiff job on both snapshot DB instances (fromDB and toDB).

I will postpone further work on this until #4642 is merged to avoid stepping on each other's toes.

In the meantime, ReferenceCounted and the rest of the skeleton should not change any more (apart from the TODOs listed above). You could take another look.

@GeorgeJahad
Contributor

@smengcl I took a quick look at ReferenceCounted.java and SnapshotCache.java. They look good to me. (The cleanup() method probably requires some synchronization.) I'll take a closer look when the PR comes out of draft state.

smengcl added 3 commits May 24, 2023 18:02
…etadataReader>` so as they could be auto-closed with try-with-resources.
Conflicts:
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOmSnapshot.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMDirectoriesPurgeRequestWithFSO.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyPurgeRequest.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/response/snapshot/OMSnapshotMoveDeletedKeysResponse.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/SnapshotDeletingService.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java
@smengcl
Contributor Author

smengcl commented May 25, 2023

This PR is ready for review at this point.

The overall structure would not receive any big changes from me without further comments.

I will mark the PR ready for review once I resolve the test timeout (caused by strict thread ID checking in ReferenceCounted) in a bit.

smengcl added 3 commits May 26, 2023 16:07
Conflicts:
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOmSnapshot.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMDirectoriesPurgeRequestWithFSO.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyPurgeRequest.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/snapshot/OMSnapshotMoveDeletedKeysRequest.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/DirectoryDeletingService.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/SnapshotDeletingService.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotUtils.java
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestOmSnapshotManager.java
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/request/key/TestOMKeyPurgeRequestAndResponse.java
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/service/TestSnapshotDeletingService.java
smengcl added 13 commits June 21, 2023 16:19
Conflicts:
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java
Conflicts:
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java
Conflicts:
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/snapshot/OMSnapshotMoveDeletedKeysRequest.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/response/key/OMKeyPurgeResponse.java
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/request/snapshot/TestOMSnapshotPurgeRequestAndResponse.java
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java
@GeorgeJahad
Contributor

Hey @smengcl It looks like this comment got marked as resolved just as I was responding to it: #4567 (comment)

Please take another look

}
}

return v;
Contributor

Returning v here will put it back in the dbMap, but if it gets removed in cleanup() above, I don't think that is what we want.

Contributor Author

Good point. Replaced this line with return dbMap.get(k);

Might look hacky, but it works. Tested with a snippet:

import java.util.concurrent.ConcurrentHashMap;

class Untitled {
  public static void main(String[] args) {
    ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    String k1 = "k1";
    map.put(k1, "val1");
    map.compute(k1, (k, v) -> {
      System.out.format("k=%s, v=%s\n", k, v);
      map.remove(k);
      return map.get(k);
    });
    System.out.format("k=%s, v=%s\n", k1, map.get(k1));
  }
}

Output:

k=k1, v=val1
k=k1, v=null

Contributor Author

@smengcl smengcl Jun 28, 2023

nvm. It turns out it is a bad idea to update other mappings inside map.compute() anyway, according to its javadoc. In this case it corrupts the map's size count, and this is caught by my UT:

https://github.com/apache/ozone/actions/runs/5396424029/jobs/9800041580?pr=4567#step:6:2594

Repro snippet:

import java.util.concurrent.ConcurrentHashMap;

class Untitled {
  public static void main(String[] args) {
    ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    String k1 = "k1";
    map.put(k1, "v1");
    map.put("k2", "v2");
    System.out.println("- Map dump 1");
    map.forEach((k, v) -> {
      System.out.format("k=%s, v=%s\n", k, v);
    });
    System.out.println();

    System.out.println("- compute()");
    map.compute(k1, (k, v) -> {
      System.out.format("k=%s, v=%s, size=%d\n", k, v, map.size());
      map.remove(k);
      System.out.format("k=%s, v=%s, size=%d\n", k, v, map.size());
      return map.get(k);
    });
    System.out.println();
    // Map size becomes incorrect at this point
    System.out.format("k=%s, v=%s, size=%d\n", k1, map.get(k1), map.size());

    System.out.println("- Map dump 2");
    map.forEach((k, v) -> {
      System.out.format("k=%s, v=%s\n", k, v);
    });
  }
}

Output:

- Map dump 1
k=k1, v=v1
k=k2, v=v2

- compute()
k=k1, v=v1, size=2
k=k1, v=v1, size=1

k=k1, v=null, size=0
- Map dump 2
k=k2, v=v2

You can see the map's size is 0 while it obviously still has the k2 entry in it.

I have moved cleanup() out of compute() and added additional syncing.
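For contrast, a variant of the snippet above that stays within the ConcurrentHashMap.compute() contract (the remapping function must not update other mappings): returning null from the remapping function removes the entry atomically, so the size bookkeeping stays correct.

```java
import java.util.concurrent.ConcurrentHashMap;

class SafeComputeSketch {
  public static void main(String[] args) {
    ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    map.put("k1", "v1");
    map.put("k2", "v2");

    // No map.remove() inside the remapping function; returning null
    // removes the mapping for k1 atomically.
    map.compute("k1", (k, v) -> null);

    System.out.println(map.size());    // 1
    System.out.println(map.get("k2")); // v2
  }
}
```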

Contributor Author

Since the ref counting is still done inside dbMap's monitor lock, the previously mentioned issues here and here should not resurface.

And cleanup() is now wrapped inside synchronized.

smengcl added 4 commits June 27, 2023 18:56
Conflicts:
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java
Conflicts:
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java
@GeorgeJahad
Contributor

That looks like it fixes the issues I found. Thanks!

@GeorgeJahad GeorgeJahad self-requested a review June 28, 2023 21:41
@smengcl
Contributor Author

smengcl commented Jun 28, 2023

Thanks @GeorgeJahad and @hemantk-12 for reviewing this! I will merge this shortly.

@smengcl smengcl merged commit 5a6f08c into apache:master Jun 28, 2023
vtutrinov pushed a commit to Cyrill/ozone that referenced this pull request Jul 3, 2023
vtutrinov pushed a commit to Cyrill/ozone that referenced this pull request Jul 3, 2023

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517
