HDDS-6323. Close RocksObject(s). #3091
|
This change exposed an existing bug: ColumnFamilyOptions were shared and never closed. With the change, some unit tests failed because the new code closes ColumnFamilyOptions. |
The bug mentioned above is now fixed in this pull request. "All checks have passed", as shown in https://github.com/apache/ozone/runs/5202916023 |
|
@kerneltime please review |
|
Wrapping the underlying DB to ensure that it gets closed is a step in the right direction, though I am not sure whether the expectation is that the DB should be closed on all exceptions. Also, what is the expectation for the LRU cache that holds references to the reference-counted DBs? |
@kerneltime , if we find any cases where the DB should not be closed, we can change the code later.
When an object is closed, it won't be accessed anymore, so we probably won't have to care about the LRU cache. No? |
|
@kerneltime , do you have any more comments? |
I was referring to the cache of open DBs maintained by the Datanode and the closing via reference counting that is implemented. |
@kerneltime , I see. Then, further accesses to the cached DB should see an exception since the DB is already closed. This seems okay. |
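The behaviour described here (a cached entry whose DB was closed throws on further access) can be illustrated with a minimal sketch. Everything in it is hypothetical: `Db` is a stand-in backed by a `HashMap`, not `RocksDatabase` or the Datanode's container cache, and the `IllegalStateException` is an assumption about what "see an exception" could look like.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in types; not Ozone or RocksDB classes.
class ClosedHandleSketch {
  static class Db implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean();
    private final Map<String, String> data = new HashMap<>();

    void put(String key, String value) { assertOpen(); data.put(key, value); }
    String get(String key) { assertOpen(); return data.get(key); }

    // Every operation checks the flag first, so a closed handle fails loudly.
    private void assertOpen() {
      if (closed.get()) {
        throw new IllegalStateException("DB is already closed");
      }
    }

    @Override public void close() { closed.set(true); }
  }

  static String accessAfterClose() {
    Map<String, Db> cache = new HashMap<>();
    Db db = new Db();
    cache.put("container-1", db);
    db.put("k", "v");
    db.close(); // for example, closed after an exception elsewhere
    try {
      // The cache still holds the reference, but the handle refuses to serve it.
      return cache.get("container-1").get("k");
    } catch (IllegalStateException e) {
      return "exception: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(accessAfterClose());
  }
}
```

The stale entry stays in the map until something removes it, which is the cache concern raised in the comment above.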
|
@kerneltime , any more comments? |
|
@szetszwo I am not sure how this code is expected to work overall.
NotFound shows up as an error code that can be sent up as part of the RocksDBException; is a normal key get impacted by closing the DB on an exception? |
Unfortunately, there does not seem to be any RocksDB documentation that specifically says the DB should be closed on any exception. (Please let me know if you can find one.) However, in the example provided by https://github.com/facebook/rocksdb/wiki/RocksJava-Basics (copied below), they do use try-with-resources to close the db and the RocksObject on any exception. |
@kerneltime , I agree that this change does not fix segfaults at all. This change is to close the underlying resources.
There may be possible improvements on cache management. However, it is outside the scope of this change. The current change is already not small.
What exceptions are recoverable so that the db should not be closed? Could you give some examples? As shown in the example in https://github.com/facebook/rocksdb/wiki/RocksJava-Basics , they do suggest closing the db on any exception. In the current code, an exception can lead to silent data loss since the db and other RocksObjects are not closed. The usual exception handling is to close these objects so that the underlying files/objects can be closed and the underlying resources can be released. |
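For readers unfamiliar with the pattern, here is a minimal sketch of closing resources on any exception with try-with-resources. It is not the wiki's example: `FakeOptions` and `FakeDb` are hypothetical stand-ins so that the code runs without the RocksDB native library. With RocksJava, the resources would be objects such as `Options` and `RocksDB`, which are `AutoCloseable`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for native-backed objects; not RocksDB classes.
class CloseOnExceptionSketch {
  static final List<String> LOG = new ArrayList<>();

  static class FakeOptions implements AutoCloseable {
    @Override public void close() { LOG.add("options closed"); }
  }

  static class FakeDb implements AutoCloseable {
    void put(String key, String value) throws Exception {
      throw new Exception("simulated write failure");
    }
    @Override public void close() { LOG.add("db closed"); }
  }

  static List<String> run() {
    LOG.clear();
    // Resources are closed in reverse declaration order, and before the
    // catch block runs, whether or not the body throws.
    try (FakeOptions options = new FakeOptions();
         FakeDb db = new FakeDb()) {
      db.put("k", "v");
    } catch (Exception e) {
      LOG.add("caught: " + e.getMessage());
    }
    return new ArrayList<>(LOG);
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

The sketch only shows that nothing is left open after a failure; whether every RocksDBException should lead to a close is the open question in this thread.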
|
If there is doubt about exception handling, we should check with the RocksDB team (slack or mailing list?). They should be able to give a quick response. I do agree the documentation is ambiguous. Personally, closing sounds safer: how can I trust the state of the handle after an exception? |
@arp7 , that's a good idea. I have just posted a question in facebook/rocksdb#8617 (comment) |
|
@arp7 , @kerneltime , got the response below; see facebook/rocksdb#8617 (comment)
|
I have asked a follow-up question; I am not sure if he is saying that we must close RocksDB on any exception. Also, how do you see a reference-counted DB cache entry reopening a DB after the underlying DB is closed due to an exception? |
@szetszwo he elaborated on his comment. I am still not sure this is the right patch. |
@kerneltime , could you provide the right patch then? BTW, you have not answered the questions below
|
@szetszwo let us sync up. I think this PR is a step in the right direction but needs more work; we should look into the entire lifecycle of a RocksDB, from open to close, and caching. |
@kerneltime , thanks for repeating the generic comments. It would be much better if you could
|
|
@arp7 , thanks for looking into this. Unfortunately, we are not able to make any progress due to @kerneltime 's comments. This pull request has already unexpectedly consumed a lot of my time, so I won't be able to continue working on the code change. It seems that Ozone is working fine without this change. I would suggest simply closing this pull request instead of closing the rocks objects in Ozone. What do you think? |
|
@kerneltime , In our discussion on Apr 28, you said that you would work on a pull request for the ContainerCache. Any updates? |
|
|
If I get the time, I will see if I can add additional tests, but this should address my concern about leaving entries in the LRU cache that never get evicted. |
|
@szetszwo please go ahead and merge this PR and I can add the logic on top of your change. LGTM |
|
@kerneltime if there are correctness issues with removing closed handles from the RocksDB cache then shouldn't we fix it prior to commit? |
Yes, PR #3426 (linked above) includes this change as well as logic to remove closed DB references from cache |
I don't think we should merge a PR with two related, but separate patches from two different authors. That's why I marked #3426 as draft. |
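As a rough illustration of "remove closed DB references from cache", the sketch below drops a closed handle on lookup and opens a fresh one. It is not the logic in #3426: `Handle` and `HandleCache` are hypothetical names, "opening" a DB is simulated by constructing a new `Handle`, and reference counting is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the Ozone ContainerCache implementation.
class EvictClosedSketch {
  static class Handle implements AutoCloseable {
    final int generation; // which simulated "open" produced this handle
    private boolean closed;
    Handle(int generation) { this.generation = generation; }
    synchronized boolean isClosed() { return closed; }
    @Override public synchronized void close() { closed = true; }
  }

  static class HandleCache {
    private final int capacity;
    private final LinkedHashMap<String, Handle> map;
    private int opened;

    HandleCache(int capacity) {
      this.capacity = capacity;
      // accessOrder = true makes the map iterate in least-recently-used order.
      this.map = new LinkedHashMap<String, Handle>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Handle> eldest) {
          if (size() > HandleCache.this.capacity) {
            eldest.getValue().close(); // close on LRU eviction
            return true;
          }
          return false;
        }
      };
    }

    synchronized Handle get(String key) {
      Handle handle = map.get(key);
      if (handle != null && handle.isClosed()) {
        map.remove(key); // a closed handle must not be served again
        handle = null;
      }
      if (handle == null) {
        handle = new Handle(++opened); // simulated reopen
        map.put(key, handle);
      }
      return handle;
    }

    synchronized int size() { return map.size(); }
  }

  public static void main(String[] args) {
    HandleCache cache = new HandleCache(2);
    Handle first = cache.get("db1");
    first.close();
    System.out.println(cache.get("db1").generation);
  }
}
```

A real cache would also need to coordinate eviction with callers that still hold a handle, which is what the reference counting mentioned earlier in the thread is for.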
|
Sure I can wait for this one to get merged and then rebase. |
|
@szetszwo will take a look at the update you pushed tomorrow. |
|
@kerneltime , there were conflicts again, so I had to push again. |
jojochuang
left a comment
Thanks for the tremendous amount of effort @szetszwo @kerneltime @arp7
I've taken a look at this patch which looks good to me. I would personally love to take the subclasses of RocksDatabase out to their own source files but we can address that later.
This PR serves as an important first step in addressing the resource leakage problems in RocksDB 7, and there are multiple PRs (#3426 and #3400) lining up on top of it to improve further.
So I am +1 and will merge to unblock the rest of the work.
Thanks all!
|
@jojochuang , thanks a lot for reviewing and merging this! |
What changes were proposed in this pull request?
Ensure that the RocksObjects are closed properly.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-6323
How was this patch tested?
Existing unit tests.