Conversation

@szetszwo
Contributor

What changes were proposed in this pull request?

Ensure that the RocksObjects are closed properly.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6323

How was this patch tested?

Existing unit tests.

@szetszwo
Contributor Author

szetszwo commented Feb 15, 2022

This change triggered an existing bug: ColumnFamilyOptions objects were shared and never closed. With the change, some unit tests failed since the new code closes ColumnFamilyOptions.
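The shared-options bug can be pictured with a plain-Java sketch (a hypothetical `SharedOptions` class, not the real RocksDB `ColumnFamilyOptions`): once any one user closes a shared instance, every other user is left holding a dead handle.

```java
// Plain-Java sketch (not the real RocksDB ColumnFamilyOptions) of why a
// shared options object needs a single, well-defined owner that closes it.
public class SharedOptions implements AutoCloseable {
  private boolean closed = false;

  public void use() {
    if (closed) {
      throw new IllegalStateException("options already closed");
    }
  }

  @Override
  public void close() {
    closed = true; // in real RocksJava this would free native memory
  }

  public static void main(String[] args) {
    SharedOptions shared = new SharedOptions();
    // Two "column families" share the same options instance.
    try (SharedOptions owned = shared) {
      owned.use(); // fine while open
    }
    // The second user now sees a closed object.
    try {
      shared.use();
    } catch (IllegalStateException e) {
      System.out.println("second user failed: " + e.getMessage());
    }
  }
}
```

This is why closing a previously shared-and-leaked options object exposes every other code path that was silently reusing it.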

@szetszwo
Contributor Author

This change triggered an existing bug: ColumnFamilyOptions objects were shared and never closed. With the change, some unit tests failed since the new code closes ColumnFamilyOptions.

The bug mentioned above is now fixed in this pull request. "All checks have passed", as shown in https://github.com/apache/ozone/runs/5202916023

@adoroszlai
Contributor

@kerneltime please review

@kerneltime
Contributor

Wrapping the underlying DB to ensure it gets closed is a step in the right direction, though I am not sure whether the expectation is that the DB should be closed on all exceptions. Also, what is the expectation for the LRU cache that holds references to the reference-counted DBs?

@szetszwo
Contributor Author

szetszwo commented Mar 1, 2022

... the DB should be closed on all exceptions?

@kerneltime , if we find any cases where the DB should not be closed, we can change the code later.

... Also, what is the expectation for the LRU cache that holds references to the reference-counted DBs?

Once an object is closed, it should not be accessed anymore. We probably won't have to care about the LRU cache, no?

@szetszwo
Contributor Author

szetszwo commented Mar 4, 2022

@kerneltime , do you have any more comments?

@kerneltime
Contributor

... the DB should be closed on all exceptions?

@kerneltime , if we find any cases where the DB should not be closed, we can change the code later.

... Also, what is the expectation for the LRU cache that holds references to the reference-counted DBs?

Once an object is closed, it should not be accessed anymore. We probably won't have to care about the LRU cache, no?

I was referring to the cache of open DBs maintained by the Datanode and the closing via reference counting that is implemented.

@szetszwo
Contributor Author

szetszwo commented Mar 6, 2022

I was referring to the cache of open DBs maintained by the Datanode ...

@kerneltime , I see. Then, further accesses to the cached DB should see an exception since the DB is already closed. This seems okay.
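The close-then-fail-fast behavior being discussed can be sketched in plain Java (a hypothetical `RefCountedDb`, not Ozone's actual ReferenceCountedDB class): once the last reference is released and the store is closed, further acquisitions fail with an exception instead of touching a freed native handle.

```java
// Minimal sketch (hypothetical names, not the actual Ozone classes) of
// reference-counted DB handles: the store closes when the count drops to
// zero, and any later access fails fast rather than using a dead handle.
public class RefCountedDb {
  private int refCount = 0;
  private boolean closed = false;

  public synchronized RefCountedDb acquire() {
    if (closed) {
      throw new IllegalStateException("DB is already closed");
    }
    refCount++;
    return this;
  }

  public synchronized void release() {
    refCount--;
    if (refCount <= 0 && !closed) {
      closed = true; // last reference gone: close the native handle here
    }
  }

  public synchronized boolean isClosed() {
    return closed;
  }

  public static void main(String[] args) {
    RefCountedDb db = new RefCountedDb();
    db.acquire();
    db.acquire();
    db.release();
    System.out.println(db.isClosed()); // false: still referenced
    db.release();
    System.out.println(db.isClosed()); // true: last reference released
  }
}
```

Under this model, a cache that hands out a closed entry gets an immediate `IllegalStateException` rather than a native crash, which matches the "further accesses should see an exception" expectation above.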

@szetszwo
Contributor Author

@kerneltime , any more comments?

@arp7 arp7 requested a review from hanishakoneru March 31, 2022 02:40
@kerneltime
Contributor

@szetszwo I am not sure how this code is expected to work overall.

  1. ReferenceCountedDB lives in the ContainerCache and currently the logic to open and close a DB is tied to the lifecycle of the cache entry.
  2. Can you point out the documentation recommending that the DB should be closed on any exception?
  3. The main problem we have in our code is that RocksDB expects any Java class that inherits RocksObject to be manually closed when done, instead of depending on the JVM calling the finalizer, which then frees up the memory allocated by the underlying JNI code. This code addresses some of the code paths, but there are other places where we need to reason about our memory management. We currently do not have a leak but are depending on the JVM to eventually clean up the memory.
  4. The segfaults witnessed in the code were related to the open and close path for RocksDB, and this code change would not solve the problems we encountered.
    I think it is a worthwhile effort to fix our usage of RocksDB, but we need to complete the model, including cache lifecycle management. If the underlying DB is closed, it should be evicted from the cache as well after the reference count has gone down to 0. Also, if not all exceptions need the DB to be closed, we should only close the DB for the exceptions that cannot be recovered.
  public enum Code {
    Ok(                 (byte)0x0),
    NotFound(           (byte)0x1),
    Corruption(         (byte)0x2),
    NotSupported(       (byte)0x3),
    InvalidArgument(    (byte)0x4),
    IOError(            (byte)0x5),
    MergeInProgress(    (byte)0x6),
    Incomplete(         (byte)0x7),
    ShutdownInProgress( (byte)0x8),
    TimedOut(           (byte)0x9),
    Aborted(            (byte)0xA),
    Busy(               (byte)0xB),
    Expired(            (byte)0xC),
    TryAgain(           (byte)0xD),
    Undefined(          (byte)0x7F);

NotFound shows up as an error code that can be sent up as part of a RocksDBException; is the normal key get impacted by closing the DB on exception?

  public byte[] get(ColumnFamily family, byte[] key) throws IOException {
    try {
      return db.get(family.getHandle(), key);
    } catch (RocksDBException e) {
      close();
      throw toIOException("get " + bytes2String(key) + " from " + family, e);
    }
  }
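As far as I can tell, RocksJava's `RocksDB.get` signals a plain missing key by returning `null` rather than throwing, so an ordinary miss should not reach the `close()` above. Still, if only unrecoverable statuses should trigger a close, a classification along these lines could be used. The split below between recoverable and fatal codes is an illustrative assumption for discussion, not RocksDB guidance, and `StatusPolicy`/`shouldClose` are hypothetical names.

```java
// Illustrative sketch only: deciding which status codes are fatal enough to
// warrant closing the DB. The Code enum mirrors the values quoted above; the
// recoverable/fatal split here is an assumption, not RocksDB's own guidance.
public class StatusPolicy {
  public enum Code {
    Ok, NotFound, Corruption, NotSupported, InvalidArgument, IOError,
    MergeInProgress, Incomplete, ShutdownInProgress, TimedOut, Aborted,
    Busy, Expired, TryAgain, Undefined
  }

  /** Returns true if the DB should be closed after seeing this status. */
  public static boolean shouldClose(Code code) {
    switch (code) {
      case Ok:
      case NotFound:       // a plain miss is not a DB failure
      case Busy:
      case TimedOut:
      case TryAgain:       // transient: the caller can retry
      case MergeInProgress:
        return false;
      default:             // Corruption, IOError, ShutdownInProgress, ...
        return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(shouldClose(Code.NotFound));   // false
    System.out.println(shouldClose(Code.Corruption)); // true
  }
}
```

The `get` wrapper quoted above could then consult such a policy (via `RocksDBException.getStatus()`) before deciding to close, instead of closing unconditionally.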

@szetszwo
Contributor Author

szetszwo commented Apr 5, 2022

  1. Can you point out the documentation recommending that the DB should be closed on any exception.

Unfortunately, there does not seem to be any RocksDB documentation specifically saying that the DB should be closed on any exception. (Please let me know if you can find one.) However, in the example provided by https://github.com/facebook/rocksdb/wiki/RocksJava-Basics (copied below), they do use try-with-resources to close the db and the RocksObjects on any exception.

  try (final Options options = new Options().setCreateIfMissing(true)) {
    
    // a factory method that returns a RocksDB instance
    try (final RocksDB db = RocksDB.open(options, "path/to/db")) {
    
        // do something
    }
  } catch (RocksDBException e) {
    // do some error handling
    ...
  }

@szetszwo
Contributor Author

szetszwo commented Apr 5, 2022

  1. The segfaults witnessed by the code ...

@kerneltime , I agree that this change does not fix segfaults at all. This change is to close the underlying resources.

I think it is a worthwhile effort to fix our usage of RocksDB but we need to complete the model including cache lifecycle management. If the underlying DB is closed it should be evicted from the cache as well after the reference count has gone down to 0. ...

There may be possible improvements to cache management. However, they are outside the scope of this change. The current change is already not small.

... Also, if not all Exceptions need the DB to be closed, we should then only close the DB for the Exceptions that cannot be recovered.

What exceptions are recoverable so that the db should not be closed? Could you give some examples? As shown in the example in https://github.com/facebook/rocksdb/wiki/RocksJava-Basics , they do suggest closing the db on any exception.

In the current code, an exception can lead to silent data loss since the db and other RocksObjects are not closed. The usual exception handling is to close these objects so that the underlying files/objects can be closed and the underlying resources released.

@szetszwo szetszwo removed the request for review from swagle April 5, 2022 09:14
@arp7
Contributor

arp7 commented Apr 5, 2022

If there is doubt about the exception handling, we should check with the RocksDB team (slack or mailing list?). They should be able to give a quick response. I do agree the documentation is ambiguous. Personally, closing sounds safer: how can I trust the state of the handle after an exception?

@szetszwo
Contributor Author

szetszwo commented Apr 7, 2022

If there is doubt about exception handling we should check with the RocksDB team (slack or mailing list?). ...

@arp7 , that's a good idea. I just posted a question in facebook/rocksdb#8617 (comment)

@szetszwo
Contributor Author

szetszwo commented Apr 7, 2022

@arp7 , @kerneltime , got the response below; see facebook/rocksdb#8617 (comment)

You should close any objects you have acquired

@kerneltime
Contributor

@arp7 , @kerneltime , got the response below; see facebook/rocksdb#8617 (comment)

You should close any objects you have acquired

I have asked a follow-on question; I am not sure if he is saying that we must close RocksDB on any exception. Also, how do you see a reference-counted DB cache entry reopening a DB after the underlying DB is closed due to an exception?

@kerneltime
Contributor

@arp7 , @kerneltime , got the response below; see facebook/rocksdb#8617 (comment)

You should close any objects you have acquired

@szetszwo he elaborated on his comment. I am still not sure this is the right patch.

@szetszwo
Contributor Author

... . I am still not sure this is the right patch.

@kerneltime , could you provide the right patch then?

BTW, you have not answered the questions below

What exceptions are recoverable so that the db should not be closed? Could you give some examples?

@kerneltime
Contributor

... . I am still not sure this is the right patch.

@kerneltime , could you provide the right patch then?

BTW, you have not answered the questions below

What exceptions are recoverable so that the db should not be closed? Could you give some examples?

@szetszwo let us sync up. I think this PR is a step in the right direction but needs more work; we should look into the entire lifecycle of a RocksDB, from open to close, including caching.

@szetszwo
Contributor Author

... I think this PR is a step in the right direction but needs more work; we should look into the entire lifecycle of a RocksDB, from open to close, including caching.

@kerneltime , thanks for repeating the generic comments. It would be much better if you could

  1. answer my questions;
  2. give some concrete suggestions; and/or
  3. provide the right patch.

@szetszwo
Contributor Author

@arp7 , thanks for looking into this. Unfortunately, we are not able to make any progress due to @kerneltime 's comments. This pull request has already consumed an unexpected amount of my time, so I won't be able to continue working on the code change. It seems that Ozone is working fine without this change. I would suggest simply closing this pull request instead of closing the rocks objects in Ozone. What do you think?

@szetszwo
Contributor Author

@kerneltime , In our discussion on Apr 28, you said that you would work on a pull request for the ContainerCache. Any updates?

@kerneltime
Contributor

@kerneltime , In our discussion on Apr 28, you said that you would work on a pull request for the ContainerCache. Any updates?

I will try to get a patch out by Monday. Other priorities came up as well. It looks like all the RocksDB code needs a major refactor now that, after the major version upgrade, there is no finalize() to clean up allocated memory during GC.
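With finalize() gone, the usual replacement pattern is java.lang.ref.Cleaner: explicit close() frees the native resource promptly, and the Cleaner acts only as a last-resort safety net. The sketch below (a hypothetical `NativeHandle` class, not RocksDB's actual implementation) shows the shape of that pattern; requires Java 9+.

```java
import java.lang.ref.Cleaner;

// Sketch (not RocksDB's actual code) of replacing finalize() with Cleaner:
// close() frees the resource deterministically; the Cleaner only catches
// handles the application forgot to close.
public class NativeHandle implements AutoCloseable {
  private static final Cleaner CLEANER = Cleaner.create();

  // The cleanup state must not reference the owning object, or the owner
  // never becomes unreachable; a static nested class avoids that trap.
  private static final class State implements Runnable {
    volatile boolean freed = false;

    @Override
    public void run() {
      freed = true; // a real binding would call the JNI free function here
    }
  }

  private final State state = new State();
  private final Cleaner.Cleanable cleanable = CLEANER.register(this, state);

  public boolean isFreed() {
    return state.freed;
  }

  @Override
  public void close() {
    cleanable.clean(); // idempotent: runs State.run() at most once
  }

  public static void main(String[] args) {
    NativeHandle h = new NativeHandle();
    System.out.println(h.isFreed()); // false: still open
    h.close();
    System.out.println(h.isFreed()); // true: freed deterministically
  }
}
```

The key design point is that explicit close() remains the primary mechanism; relying on the Cleaner (or, before, on finalize()) delays freeing the JNI-allocated memory until some later GC cycle.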

@kerneltime
Contributor

If I get the time, I will see if I can add additional tests, but this should address my concern about leaving entries in the LRU cache that never get evicted.

diff --git a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
index af0958a205..2377bb0b3a 100644
--- a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
+++ b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
@@ -143,7 +143,7 @@ public ReferenceCountedDB getDB(long containerID, String containerDBType,
       lock.lock();
       try {
         db = (ReferenceCountedDB) this.get(containerDBPath);
-        if (db != null) {
+        if (db != null && !db.isClosed()) {
           metrics.incNumCacheHits();
           db.incrementReference();
           return db;
@@ -170,7 +170,7 @@ public ReferenceCountedDB getDB(long containerID, String containerDBType,
       try {
         ReferenceCountedDB currentDB =
             (ReferenceCountedDB) this.get(containerDBPath);
-        if (currentDB != null) {
+        if (currentDB != null && !currentDB.isClosed()) {
           // increment the reference before returning the object
           currentDB.incrementReference();
           // clean the db created in previous step
diff --git a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ReferenceCountedDB.java b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ReferenceCountedDB.java
index 5fe61a85b0..d20dd6fd80 100644
--- a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ReferenceCountedDB.java
+++ b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ReferenceCountedDB.java
@@ -95,4 +95,8 @@ public DatanodeStore getStore() {
   public void close() {
     decrementReference();
   }
+
+  public boolean isClosed() {
+    return store.isClosed();
+  }
 }
\ No newline at end of file
diff --git a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractDatanodeStore.java b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractDatanodeStore.java
index f9f794df7d..56c44455c7 100644
--- a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractDatanodeStore.java
+++ b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractDatanodeStore.java
@@ -189,6 +189,11 @@ public BatchOperationHandler getBatchHandler() {
             blockDataTableWithIterator.iterator(), filter);
   }
 
+  @Override
+  public boolean isClosed() {
+    return store.isClosed();
+  }
+
   @Override
   public void flushDB() throws IOException {
     store.flushDB();
diff --git a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/DatanodeStore.java b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/DatanodeStore.java
index 5a0ce7a646..2bc3b7a46b 100644
--- a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/DatanodeStore.java
+++ b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/DatanodeStore.java
@@ -91,4 +91,6 @@
 
   BlockIterator<BlockData>
       getBlockIterator(MetadataKeyFilters.KeyPrefixFilter filter);
+
+  boolean isClosed();
 }
diff --git a/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/DBStore.java b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/DBStore.java
index 2ac2bdc730..e4b8bd02b6 100644
--- a/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/DBStore.java
+++ b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/DBStore.java
@@ -197,4 +197,6 @@ DBUpdatesWrapper getUpdatesSince(long sequenceNumber)
    */
   DBUpdatesWrapper getUpdatesSince(long sequenceNumber, long limitCount)
       throws SequenceNumberNotFoundException;
+
+  boolean isClosed();
 }
diff --git a/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStore.java b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStore.java
index 467cf4462d..baaa17e49a 100644
--- a/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStore.java
+++ b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStore.java
@@ -327,6 +327,11 @@ public DBUpdatesWrapper getUpdatesSince(long sequenceNumber, long limitCount)
     return dbUpdatesWrapper;
   }
 
+  @Override
+  public boolean isClosed() {
+    return db.isClosed();
+  }
+
   @VisibleForTesting
   public RocksDatabase getDb() {
     return db;
diff --git a/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RocksDatabase.java b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RocksDatabase.java
index 4457551602..963d894562 100644
--- a/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RocksDatabase.java
+++ b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RocksDatabase.java
@@ -62,6 +62,7 @@
   static final Logger LOG = LoggerFactory.getLogger(RocksDatabase.class);
 
   static final String ESTIMATE_NUM_KEYS = "rocksdb.estimate-num-keys";
+  private final AtomicBoolean isClosed = new AtomicBoolean();
 
   /**
    * Read DB and return existing column families.
@@ -172,6 +173,10 @@ private static void runWithTryCatch(Runnable runnable, Object name) {
     };
   }
 
+  public boolean isClosed() {
+    return isClosed.get();
+  }
+
   /**
    * Represents a checkpoint of the db.
    *

@kerneltime
Contributor

@szetszwo please go ahead and merge this PR and I can add the logic on top of your change.

LGTM

@szetszwo
Contributor Author

@szetszwo please go ahead and merge this PR and I can add the logic on top of your change.

LGTM

@arp7 , could you review this?

@arp7
Contributor

arp7 commented May 20, 2022

@kerneltime if there are correctness issues with removing closed handles from the RocksDB cache then shouldn't we fix it prior to commit?

@kerneltime
Contributor

kerneltime commented May 20, 2022

@kerneltime if there are correctness issues with removing closed handles from the RocksDB cache then shouldn't we fix it prior to commit?

Yes, PR #3426 (linked above) includes this change as well as logic to remove closed DB references from the cache.
We can either merge just #3426, or merge this, then rebase and merge the second PR.

@adoroszlai
Contributor

PR #3426 (linked above) includes this change as well as logic to remove closed DB references from the cache.
We can either merge just #3426, or merge this, then rebase and merge the second PR.

I don't think we should merge a PR with two related, but separate patches from two different authors. That's why I marked #3426 as draft.

@kerneltime
Contributor

Sure, I can wait for this one to get merged and then rebase.

@kerneltime
Contributor

@szetszwo , I will take a look at the update you pushed tomorrow.

@szetszwo
Contributor Author

@kerneltime , there were conflicts again, so I had to push again.

Contributor

@jojochuang jojochuang left a comment


Thanks for the tremendous amount of effort @szetszwo @kerneltime @arp7

I've taken a look at this patch which looks good to me. I would personally love to take the subclasses of RocksDatabase out to their own source files but we can address that later.

This PR serves as the important first step in addressing the resource leakage problems in RocksDB 7, and there are multiple PRs (#3426 and #3400) lined up on top of it for further improvements.

So I am +1 and will merge to unblock the rest of the work.

Thanks all!

@jojochuang jojochuang merged commit 5ed0e0a into apache:master May 24, 2022
@szetszwo
Contributor Author

@jojochuang , thanks a lot for reviewing and merging this!
