-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-26850 Optimize the implementation of LRUCache in LRUDictionary #4233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
🎊 +1 overall
This message was automatically generated. |
hbase-common/src/test/java/org/apache/hadoop/hbase/io/util/TestLRUDictionary.java
Outdated
Show resolved
Hide resolved
|
🎊 +1 overall
This message was automatically generated. |
| if (nodeToIndex.containsKey(node)) { | ||
| short index = nodeToIndex.get(node); | ||
| node = indexToNode[index]; | ||
| moveToHead(node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to discuss in the email, if here always use the previous node, then how can it ensure that the previous node is a completed one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't understand what "completed one" means.
But this patch does not actually change the logic of the previous use of this LRUCache:
RingBufferEventHandler#onEvent -> RingBufferEventHandler#append -> ProtobufLogWriter#append -> CompressedKvEncoder#write -> LRUDictionary#findEntry
For this actually used write link, findEntry uses the previously existing node.
We did find NPE on the read link, but the root reason is not the implementation of this LRUCache (this patch), but that the LRUCache is polluted.
I just unified the logic of addEntry with the actual logic on the write link, which I think is more elegant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call trace is WALEntryStream#tryAdvanceEntry->ProtobufLogReader#readNext->CompressedKVDecoder#readIntoArray->LRUDirectory#addEntry->then here the changed BidirectionalLRUMap#put.
Your change only makes the newly some node will not be added to the directory, but the old same node may have uncompleted data, e.g. tailing the WAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh i got your point. As we discussed in email, to solve the problem you mentioned, we need to rebuild the LRUCache every time when we re-seek to somewhere(will done it in 26849), and in the future we could try to implement a 'versioned' cache for replication.
This patch is just for code optimization, not to solve the problem. So it's an "Improvement", not a "bug".
Could this answer your question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the new value maybe a completed one, this improvement can not prove using the old value is always better than the new value, except the performance improvement.
I think an umbrella should be created to track the problem mentioned in the email, and this issue can be a child of it. So before the umbrella issue is completed, all the child codes can be tested together.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a stability or performance standpoint, I don't think it's a good or bad/right or wrong question since it doesn't change the existing logic.
But from a code architecture point of view, I think this way is better. The original implementation is to put the logic of "find the existing node and return" into findEntry, and directly expose addEntry to the outside, which leads to the possibility of inconsistent behavior between the two. So I think we can completely encapsulate the same logic in addEntry (although this does not bring any stability improvement for now).
But if you would like to wait for 26849 to finish and watch it together, I think it's OK~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I follow the discussion because in the proposed improvement the old node is not reused if the contents being stored are different.
Node node = new Node();
node.setContents(stored, 0, stored.length);
if (nodeToIndex.containsKey(node)) {
// new logic reusing existing entry and index
// ...
} else {
// original logic adding new entry
// ...
}
containsKey will use hashcode of Node, which is Bytes.hashCode over the contents. A previous short read and a current full read will have different contents so different hashcode, right? If so, this just reuses an entry that has equivalent data, which I agree is an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the old node is not reused if the contents being stored are different.
Completely correct.
This patch only reuse SAME node.
Actually, In the previous implementation, if the nodes are same, the existing nodes will also be reused too, the only difference is this logic were in findIdx:
private short findIdx(byte[] array, int offset, int length) {
Short s;
final Node comparisonNode = new Node();
comparisonNode.setContents(array, offset, length);
if ((s = nodeToIndex.get(comparisonNode)) != null) {
moveToHead(indexToNode[s]);
return s;
} else {
return -1;
}
}
For the write link:
CompressedKvEncoder#write
->
LRUDictionary#findEntry (LRUDictionary#findIdx)
->
LRUDictionary#addEntry
But for the read link:
CompressedKVDecoder#readIntoArray
->
LRUDirectory#addEntry
We could see, on the read link, it just addEntry directly, without findIdx(reuse the existing same node).
So, I just thought it would be more beautiful to write this way.
No description provided.