-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-26850 Optimize the implementation of LRUCache in LRUDictionary #4233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to discuss in the email, if here always use the previous node, then how can it ensure that the previous node is a completed one?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't understand what "completed one" means.
But this patch does not actually change the logic of the previous use of this LRUCache:
RingBufferEventHandler#onEvent -> RingBufferEventHandler#append -> ProtobufLogWriter#append -> CompressedKvEncoder#write -> LRUDictionary#findEntryFor this actually used write link, findEntry uses the previously existing node.
We did find NPE on the read link, but the root reason is not the implementation of this LRUCache (this patch), but that the LRUCache is polluted.
I just unified the logic of addEntry with the actual logic on the write link, which I think is more elegant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call trace is WALEntryStream#tryAdvanceEntry->ProtobufLogReader#readNext->CompressedKVDecoder#readIntoArray->LRUDirectory#addEntry->then here the changed BidirectionalLRUMap#put.
Your change only makes the newly some node will not be added to the directory, but the old same node may have uncompleted data, e.g. tailing the WAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh i got your point. As we discussed in email, to solve the problem you mentioned, we need to rebuild the LRUCache every time when we re-seek to somewhere(will done it in 26849), and in the future we could try to implement a 'versioned' cache for replication.
This patch is just for code optimization, not to solve the problem. So it's an "Improvement", not a "bug".
Could this answer your question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the new value maybe a completed one, this improvement can not prove using the old value is always better than the new value, except the performance improvement.
I think an umbrella should be created to track the problem mentioned in the email, and this issue can be a child of it. So before the umbrella issue is completed, all the child codes can be tested together.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a stability or performance standpoint, I don't think it's a good or bad/right or wrong question since it doesn't change the existing logic.
But from a code architecture point of view, I think this way is better. The original implementation is to put the logic of "find the existing node and return" into findEntry, and directly expose addEntry to the outside, which leads to the possibility of inconsistent behavior between the two. So I think we can completely encapsulate the same logic in addEntry (although this does not bring any stability improvement for now).
But if you would like to wait for 26849 to finish and watch it together, I think it's OK~
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I follow the discussion because in the proposed improvement the old node is not reused if the contents being stored are different.
containsKey will use hashcode of Node, which is Bytes.hashCode over the contents. A previous short read and a current full read will have different contents so different hashcode, right? If so, this just reuses an entry that has equivalent data, which I agree is an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely correct.
This patch only reuse SAME node.
Actually, In the previous implementation, if the nodes are same, the existing nodes will also be reused too, the only difference is this logic were in findIdx:
For the write link:
But for the read link:
We could see, on the read link, it just addEntry directly, without findIdx(reuse the existing same node).
So, I just thought it would be more beautiful to write this way.