Conversation

@szetszwo
Contributor

What changes were proposed in this pull request?

Ensure that the RocksObjects are closed properly.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6323

How was this patch tested?

Existing unit tests.

@szetszwo
Contributor Author

szetszwo commented Feb 15, 2022

This change triggered an existing bug: ColumnFamilyOptions objects were shared and never closed. With the change, some unit tests failed since the new code closes ColumnFamilyOptions.
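The shared-options bug can be pictured with a plain-Java sketch (a hypothetical `SharedOptions` class, not the real RocksDB `ColumnFamilyOptions`): once any one user closes a shared instance, every other user is left holding a dead handle.

```java
// Plain-Java sketch (not the real RocksDB ColumnFamilyOptions) of why a
// shared options object needs a single, well-defined owner that closes it.
public class SharedOptions implements AutoCloseable {
  private boolean closed = false;

  public void use() {
    if (closed) {
      throw new IllegalStateException("options already closed");
    }
  }

  @Override
  public void close() {
    closed = true; // in real RocksJava this would free native memory
  }

  public static void main(String[] args) {
    SharedOptions shared = new SharedOptions();
    // Two "column families" share the same options instance.
    try (SharedOptions owned = shared) {
      owned.use(); // fine while open
    }
    // The second user now sees a closed object.
    try {
      shared.use();
    } catch (IllegalStateException e) {
      System.out.println("second user failed: " + e.getMessage());
    }
  }
}
```

This is why closing a previously shared-and-leaked options object exposes every other code path that was silently reusing it.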

@szetszwo
Contributor Author

This change triggered an existing bug: ColumnFamilyOptions objects were shared and never closed. With the change, some unit tests failed since the new code closes ColumnFamilyOptions.

The bug mentioned above is now fixed in this pull request. "All checks have passed", as shown in https://github.com/apache/ozone/runs/5202916023

@adoroszlai
Contributor

@kerneltime please review

@kerneltime
Contributor

Wrapping the underlying DB to ensure it gets closed is a step in the right direction, though I am not sure whether the expectation is that the DB should be closed on all exceptions. Also, what is the expectation for the LRU cache that holds references to the reference-counted DBs?

@szetszwo
Contributor Author

szetszwo commented Mar 1, 2022

... the DB should be closed on all exceptions?

@kerneltime , if we find any cases where the DB should not be closed, we can change the code later.

... Also, what is the expectation for the LRU cache that holds references to the reference-counted DBs?

Once an object is closed, it should not be accessed anymore. We probably won't have to care about the LRU cache, no?

@szetszwo
Contributor Author

szetszwo commented Mar 4, 2022

@kerneltime , do you have any more comments?

@kerneltime
Contributor

... the DB should be closed on all exceptions?

@kerneltime , if we find any cases where the DB should not be closed, we can change the code later.

... Also, what is the expectation for the LRU cache that holds references to the reference-counted DBs?

Once an object is closed, it should not be accessed anymore. We probably won't have to care about the LRU cache, no?

I was referring to the cache of open DBs maintained by the Datanode and the closing via reference counting that is implemented.

@szetszwo
Contributor Author

szetszwo commented Mar 6, 2022

I was referring to the cache of open DBs maintained by the Datanode ...

@kerneltime , I see. Then, further accesses to the cached DB should see an exception since the DB is already closed. This seems okay.
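The close-then-fail-fast behavior being discussed can be sketched in plain Java (a hypothetical `RefCountedDb`, not Ozone's actual ReferenceCountedDB class): once the last reference is released and the store is closed, further acquisitions fail with an exception instead of touching a freed native handle.

```java
// Minimal sketch (hypothetical names, not the actual Ozone classes) of
// reference-counted DB handles: the store closes when the count drops to
// zero, and any later access fails fast rather than using a dead handle.
public class RefCountedDb {
  private int refCount = 0;
  private boolean closed = false;

  public synchronized RefCountedDb acquire() {
    if (closed) {
      throw new IllegalStateException("DB is already closed");
    }
    refCount++;
    return this;
  }

  public synchronized void release() {
    refCount--;
    if (refCount <= 0 && !closed) {
      closed = true; // last reference gone: close the native handle here
    }
  }

  public synchronized boolean isClosed() {
    return closed;
  }

  public static void main(String[] args) {
    RefCountedDb db = new RefCountedDb();
    db.acquire();
    db.acquire();
    db.release();
    System.out.println(db.isClosed()); // false: still referenced
    db.release();
    System.out.println(db.isClosed()); // true: last reference released
  }
}
```

Under this model, a cache that hands out a closed entry gets an immediate `IllegalStateException` rather than a native crash, which matches the "further accesses should see an exception" expectation above.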

@szetszwo
Contributor Author

@kerneltime , any more comments?

@arp7 arp7 requested a review from hanishakoneru March 31, 2022 02:40
@kerneltime
Contributor

@szetszwo I am not sure how this code is expected to work overall.

  1. ReferenceCountedDB lives in the ContainerCache and currently the logic to open and close a DB is tied to the lifecycle of the cache entry.
  2. Can you point out the documentation recommending that the DB should be closed on any exception?
  3. The main problem we have in our code is that RocksDB expects any Java class that inherits RocksObject to be manually closed when done, instead of depending on the JVM calling the finalizer, which then frees up the memory allocated by the underlying JNI code. This code addresses some of the code paths, but there are other places where we need to reason about our memory management. We currently do not have a leak but are depending on the JVM to eventually clean up the memory.
  4. The segfaults witnessed in the code were related to the open and close path for RocksDB, and this code change would not solve the problems we encountered.
    I think it is a worthwhile effort to fix our usage of RocksDB, but we need to complete the model, including cache lifecycle management. If the underlying DB is closed, it should be evicted from the cache as well after the reference count has gone down to 0. Also, if not all exceptions need the DB to be closed, we should only close the DB for the exceptions that cannot be recovered.
  public enum Code {
    Ok(                 (byte)0x0),
    NotFound(           (byte)0x1),
    Corruption(         (byte)0x2),
    NotSupported(       (byte)0x3),
    InvalidArgument(    (byte)0x4),
    IOError(            (byte)0x5),
    MergeInProgress(    (byte)0x6),
    Incomplete(         (byte)0x7),
    ShutdownInProgress( (byte)0x8),
    TimedOut(           (byte)0x9),
    Aborted(            (byte)0xA),
    Busy(               (byte)0xB),
    Expired(            (byte)0xC),
    TryAgain(           (byte)0xD),
    Undefined(          (byte)0x7F);

NotFound shows up as an error code that can be sent up as part of a RocksDBException; is the normal key get impacted by closing the DB on exception?

  public byte[] get(ColumnFamily family, byte[] key) throws IOException {
    try {
      return db.get(family.getHandle(), key);
    } catch (RocksDBException e) {
      close();
      throw toIOException("get " + bytes2String(key) + " from " + family, e);
    }
  }
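As far as I can tell, RocksJava's `RocksDB.get` signals a plain missing key by returning `null` rather than throwing, so an ordinary miss should not reach the `close()` above. Still, if only unrecoverable statuses should trigger a close, a classification along these lines could be used. The split below between recoverable and fatal codes is an illustrative assumption for discussion, not RocksDB guidance, and `StatusPolicy`/`shouldClose` are hypothetical names.

```java
// Illustrative sketch only: deciding which status codes are fatal enough to
// warrant closing the DB. The Code enum mirrors the values quoted above; the
// recoverable/fatal split here is an assumption, not RocksDB's own guidance.
public class StatusPolicy {
  public enum Code {
    Ok, NotFound, Corruption, NotSupported, InvalidArgument, IOError,
    MergeInProgress, Incomplete, ShutdownInProgress, TimedOut, Aborted,
    Busy, Expired, TryAgain, Undefined
  }

  /** Returns true if the DB should be closed after seeing this status. */
  public static boolean shouldClose(Code code) {
    switch (code) {
      case Ok:
      case NotFound:       // a plain miss is not a DB failure
      case Busy:
      case TimedOut:
      case TryAgain:       // transient: the caller can retry
      case MergeInProgress:
        return false;
      default:             // Corruption, IOError, ShutdownInProgress, ...
        return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(shouldClose(Code.NotFound));   // false
    System.out.println(shouldClose(Code.Corruption)); // true
  }
}
```

The `get` wrapper quoted above could then consult such a policy (via `RocksDBException.getStatus()`) before deciding to close, instead of closing unconditionally.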

@szetszwo
Contributor Author

szetszwo commented Apr 5, 2022

  1. Can you point out the documentation recommending that the DB should be closed on any exception.

Unfortunately, there does not seem to be any RocksDB documentation specifically saying that the DB should be closed on any exception. (Please let me know if you can find one.) However, in the example provided by https://github.com/facebook/rocksdb/wiki/RocksJava-Basics (copied below), they do use try-with-resources to close the db and the RocksObjects on any exception.

  try (final Options options = new Options().setCreateIfMissing(true)) {
    
    // a factory method that returns a RocksDB instance
    try (final RocksDB db = RocksDB.open(options, "path/to/db")) {
    
        // do something
    }
  } catch (RocksDBException e) {
    // do some error handling
    ...
  }

@szetszwo
Contributor Author

szetszwo commented Apr 5, 2022

  1. The segfaults witnessed by the code ...

@kerneltime , I agree that this change does not fix segfaults at all. This change is to close the underlying resources.

I think it is a worthwhile effort to fix our usage of RocksDB but we need to complete the model including cache lifecycle management. If the underlying DB is closed it should be evicted from the cache as well after the reference count has gone down to 0. ...

There may be possible improvements to cache management. However, they are outside the scope of this change. The current change is already not small.

... Also, if not all Exceptions need the DB to be closed, we should then only close the DB for the Exceptions that cannot be recovered.

What exceptions are recoverable so that the db should not be closed? Could you give some examples? As shown in the example in https://github.com/facebook/rocksdb/wiki/RocksJava-Basics , they do suggest closing the db on any exception.

In the current code, an exception can lead to silent data loss since the db and other RocksObjects are not closed. The usual exception handling is to close these objects so that the underlying files/objects can be closed and the underlying resources released.

@szetszwo szetszwo removed the request for review from swagle April 5, 2022 09:14
@arp7
Contributor

arp7 commented Apr 5, 2022

If there is doubt about the exception handling, we should check with the RocksDB team (slack or mailing list?). They should be able to give a quick response. I do agree the documentation is ambiguous. Personally, closing sounds safer: how can I trust the state of the handle after an exception?

@szetszwo
Contributor Author

szetszwo commented Apr 7, 2022

If there is doubt about exception handling we should check with the RocksDB team (slack or mailing list?). ...

@arp7 , that's a good idea. I just posted a question in facebook/rocksdb#8617 (comment)

@szetszwo
Contributor Author

szetszwo commented Apr 7, 2022

@arp7 , @kerneltime , got the response below; see facebook/rocksdb#8617 (comment)

You should close any objects you have acquired

@kerneltime
Contributor

@arp7 , @kerneltime , got the response below; see facebook/rocksdb#8617 (comment)

You should close any objects you have acquired

I have asked a follow-on question; I am not sure if he is saying that we must close RocksDB on any exception. Also, how do you see a reference-counted DB cache entry reopening a DB after the underlying DB is closed due to an exception?

@kerneltime
Contributor

@arp7 , @kerneltime , got the response below; see facebook/rocksdb#8617 (comment)

You should close any objects you have acquired

@szetszwo he elaborated on his comment. I am still not sure this is the right patch.

@szetszwo
Contributor Author

... . I am still not sure this is the right patch.

@kerneltime , could you provide the right patch then?

BTW, you have not answered the questions below

What exceptions are recoverable so that the db should not be closed? Could you give some examples?

@kerneltime
Contributor

... . I am still not sure this is the right patch.

@kerneltime , could you provide the right patch then?

BTW, you have not answered the questions below

What exceptions are recoverable so that the db should not be closed? Could you give some examples?

@szetszwo let us sync up. I think this PR is a step in the right direction but needs more work; we should look into the entire lifecycle of a RocksDB, from open to close, including caching.

@szetszwo
Contributor Author

... I think this PR is a step in the right direction but needs more work; we should look into the entire lifecycle of a RocksDB, from open to close, including caching.

@kerneltime , thanks for repeating the generic comments. It would be much better if you could

  1. answer my questions;
  2. give some concrete suggestions; and/or
  3. provide the right patch.

@szetszwo
Contributor Author

@arp7 , thanks for looking into this. Unfortunately, we are not able to make any progress due to @kerneltime 's comments. This pull request has already consumed an unexpected amount of my time, so I won't be able to continue working on the code change. It seems that Ozone is working fine without this change. I would suggest simply closing this pull request instead of closing the rocks objects in Ozone. What do you think?

@szetszwo
Contributor Author

@kerneltime , In our discussion on Apr 28, you said that you would work on a pull request for the ContainerCache. Any updates?

@kerneltime
Contributor

@kerneltime , In our discussion on Apr 28, you said that you would work on a pull request for the ContainerCache. Any updates?

I will try to get a patch out by Monday. Other priorities came up as well. It looks like all the RocksDB code needs a major refactor now that, after the major version upgrade, there is no finalize() to clean up allocated memory during GC.
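With finalize() gone, the usual replacement pattern is java.lang.ref.Cleaner: explicit close() frees the native resource promptly, and the Cleaner acts only as a last-resort safety net. The sketch below (a hypothetical `NativeHandle` class, not RocksDB's actual implementation) shows the shape of that pattern; requires Java 9+.

```java
import java.lang.ref.Cleaner;

// Sketch (not RocksDB's actual code) of replacing finalize() with Cleaner:
// close() frees the resource deterministically; the Cleaner only catches
// handles the application forgot to close.
public class NativeHandle implements AutoCloseable {
  private static final Cleaner CLEANER = Cleaner.create();

  // The cleanup state must not reference the owning object, or the owner
  // never becomes unreachable; a static nested class avoids that trap.
  private static final class State implements Runnable {
    volatile boolean freed = false;

    @Override
    public void run() {
      freed = true; // a real binding would call the JNI free function here
    }
  }

  private final State state = new State();
  private final Cleaner.Cleanable cleanable = CLEANER.register(this, state);

  public boolean isFreed() {
    return state.freed;
  }

  @Override
  public void close() {
    cleanable.clean(); // idempotent: runs State.run() at most once
  }

  public static void main(String[] args) {
    NativeHandle h = new NativeHandle();
    System.out.println(h.isFreed()); // false: still open
    h.close();
    System.out.println(h.isFreed()); // true: freed deterministically
  }
}
```

The key design point is that explicit close() remains the primary mechanism; relying on the Cleaner (or, before, on finalize()) delays freeing the JNI-allocated memory until some later GC cycle.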

@kerneltime
Contributor

If I get the time, I will see if I can add additional tests, but this should address my concern about leaving entries in the LRU cache that never get evicted.

diff --git a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
index af0958a205..2377bb0b3a 100644
--- a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
+++ b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ContainerCache.java
@@ -143,7 +143,7 @@ public ReferenceCountedDB getDB(long containerID, String containerDBType,
       lock.lock();
       try {
         db = (ReferenceCountedDB) this.get(containerDBPath);
-        if (db != null) {
+        if (db != null && !db.isClosed()) {
           metrics.incNumCacheHits();
           db.incrementReference();
           return db;
@@ -170,7 +170,7 @@ public ReferenceCountedDB getDB(long containerID, String containerDBType,
       try {
         ReferenceCountedDB currentDB =
             (ReferenceCountedDB) this.get(containerDBPath);
-        if (currentDB != null) {
+        if (currentDB != null && !currentDB.isClosed()) {
           // increment the reference before returning the object
           currentDB.incrementReference();
           // clean the db created in previous step
diff --git a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ReferenceCountedDB.java b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ReferenceCountedDB.java
index 5fe61a85b0..d20dd6fd80 100644
--- a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ReferenceCountedDB.java
+++ b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/ReferenceCountedDB.java
@@ -95,4 +95,8 @@ public DatanodeStore getStore() {
   public void close() {
     decrementReference();
   }
+
+  public boolean isClosed() {
+    return store.isClosed();
+  }
 }
\ No newline at end of file
diff --git a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractDatanodeStore.java b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractDatanodeStore.java
index f9f794df7d..56c44455c7 100644
--- a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractDatanodeStore.java
+++ b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/AbstractDatanodeStore.java
@@ -189,6 +189,11 @@ public BatchOperationHandler getBatchHandler() {
             blockDataTableWithIterator.iterator(), filter);
   }
 
+  @Override
+  public boolean isClosed() {
+    return store.isClosed();
+  }
+
   @Override
   public void flushDB() throws IOException {
     store.flushDB();
diff --git a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/DatanodeStore.java b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/DatanodeStore.java
index 5a0ce7a646..2bc3b7a46b 100644
--- a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/DatanodeStore.java
+++ b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/metadata/DatanodeStore.java
@@ -91,4 +91,6 @@
 
   BlockIterator<BlockData>
       getBlockIterator(MetadataKeyFilters.KeyPrefixFilter filter);
+
+  boolean isClosed();
 }
diff --git a/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/DBStore.java b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/DBStore.java
index 2ac2bdc730..e4b8bd02b6 100644
--- a/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/DBStore.java
+++ b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/DBStore.java
@@ -197,4 +197,6 @@ DBUpdatesWrapper getUpdatesSince(long sequenceNumber)
    */
   DBUpdatesWrapper getUpdatesSince(long sequenceNumber, long limitCount)
       throws SequenceNumberNotFoundException;
+
+  boolean isClosed();
 }
diff --git a/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStore.java b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStore.java
index 467cf4462d..baaa17e49a 100644
--- a/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStore.java
+++ b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RDBStore.java
@@ -327,6 +327,11 @@ public DBUpdatesWrapper getUpdatesSince(long sequenceNumber, long limitCount)
     return dbUpdatesWrapper;
   }
 
+  @Override
+  public boolean isClosed() {
+    return db.isClosed();
+  }
+
   @VisibleForTesting
   public RocksDatabase getDb() {
     return db;
diff --git a/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RocksDatabase.java b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RocksDatabase.java
index 4457551602..963d894562 100644
--- a/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RocksDatabase.java
+++ b/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/RocksDatabase.java
@@ -62,6 +62,7 @@
   static final Logger LOG = LoggerFactory.getLogger(RocksDatabase.class);
 
   static final String ESTIMATE_NUM_KEYS = "rocksdb.estimate-num-keys";
+  private final AtomicBoolean isClosed = new AtomicBoolean();
 
   /**
    * Read DB and return existing column families.
@@ -172,6 +173,10 @@ private static void runWithTryCatch(Runnable runnable, Object name) {
     };
   }
 
+  public boolean isClosed() {
+    return isClosed.get();
+  }
+
   /**
    * Represents a checkpoint of the db.
    *

@kerneltime
Contributor

@szetszwo please go ahead and merge this PR and I can add the logic on top of your change.

LGTM

@szetszwo
Contributor Author

@szetszwo please go ahead and merge this PR and I can add the logic on top of your change.

LGTM

@arp7 , could you review this?

@arp7
Contributor

arp7 commented May 20, 2022

@kerneltime if there are correctness issues with removing closed handles from the RocksDB cache then shouldn't we fix it prior to commit?

@kerneltime
Contributor

kerneltime commented May 20, 2022

@kerneltime if there are correctness issues with removing closed handles from the RocksDB cache then shouldn't we fix it prior to commit?

Yes, PR #3426 (linked above) includes this change as well as logic to remove closed DB references from the cache.
We can either merge just #3426, or merge this, then rebase and merge the second PR.

@adoroszlai
Contributor

PR #3426 (linked above) includes this change as well as logic to remove closed DB references from the cache.
We can either merge just #3426, or merge this, then rebase and merge the second PR.

I don't think we should merge a PR with two related, but separate patches from two different authors. That's why I marked #3426 as draft.

@kerneltime
Contributor

Sure, I can wait for this one to get merged and then rebase.

@kerneltime
Contributor

@szetszwo , I will take a look at the update you pushed tomorrow.

@szetszwo
Contributor Author

@kerneltime , there were conflicts again, so I had to push again.

Contributor

@jojochuang jojochuang left a comment


Thanks for the tremendous amount of effort @szetszwo @kerneltime @arp7

I've taken a look at this patch which looks good to me. I would personally love to take the subclasses of RocksDatabase out to their own source files but we can address that later.

This PR serves as the important first step in addressing the resource leakage problems in RocksDB 7, and there are multiple PRs (#3426 and #3400) lined up on top of it for further improvements.

So I am +1 and will merge to unblock the rest of the work.

Thanks all!

@jojochuang jojochuang merged commit 5ed0e0a into apache:master May 24, 2022
@szetszwo
Contributor Author

@jojochuang , thanks a lot for reviewing and merging this!
