HBASE-26850 Optimize the implementation of LRUCache in LRUDictionary #4233

thangTang · 2022-03-16T09:23:05Z

No description provided.

Apache-HBase · 2022-03-16T10:12:29Z

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	1m 14s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 0s	No case conflicting files found.
+1 💚	hbaseanti	0m 0s	Patch does not have any anti-patterns.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s	The patch appears to include 1 new or modified test files.
		_ branch-1 Compile Tests _
+1 💚	mvninstall	4m 17s	branch-1 passed
+1 💚	compile	0m 21s	branch-1 passed with JDK Azul Systems, Inc.-1.8.0_262-b19
+1 💚	compile	0m 22s	branch-1 passed with JDK Azul Systems, Inc.-1.7.0_272-b10
+1 💚	checkstyle	0m 27s	branch-1 passed
+1 💚	shadedjars	2m 52s	branch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 23s	branch-1 passed with JDK Azul Systems, Inc.-1.8.0_262-b19
+1 💚	javadoc	0m 22s	branch-1 passed with JDK Azul Systems, Inc.-1.7.0_272-b10
+0 🆗	spotbugs	1m 7s	Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚	findbugs	1m 4s	branch-1 passed
		_ Patch Compile Tests _
+1 💚	mvninstall	2m 8s	the patch passed
+1 💚	compile	0m 21s	the patch passed with JDK Azul Systems, Inc.-1.8.0_262-b19
+1 💚	javac	0m 21s	the patch passed
+1 💚	compile	0m 24s	the patch passed with JDK Azul Systems, Inc.-1.7.0_272-b10
+1 💚	javac	0m 24s	the patch passed
+1 💚	checkstyle	0m 27s	the patch passed
+1 💚	whitespace	0m 0s	The patch has no whitespace issues.
+1 💚	shadedjars	2m 48s	patch has no errors when building our shaded downstream artifacts.
+1 💚	hadoopcheck	4m 53s	Patch does not cause any errors with Hadoop 2.8.5 2.9.2.
+1 💚	javadoc	0m 21s	the patch passed with JDK Azul Systems, Inc.-1.8.0_262-b19
+1 💚	javadoc	0m 22s	the patch passed with JDK Azul Systems, Inc.-1.7.0_272-b10
+1 💚	findbugs	1m 14s	the patch passed
		_ Other Tests _
+1 💚	unit	2m 30s	hbase-common in the patch passed.
+1 💚	asflicense	0m 14s	The patch does not generate ASF License warnings.
		31m 31s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4233/1/artifact/out/Dockerfile
GITHUB PR	#4233
Optional Tests	dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
uname	Linux f89a2a8fc107 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	/home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4233/out/precommit/personality/provided.sh
git revision	branch-1 / `70e695b`
Default Java	Azul Systems, Inc.-1.7.0_272-b10
Multi-JDK versions	/usr/lib/jvm/zulu-8-amd64:Azul Systems, Inc.-1.8.0_262-b19 /usr/lib/jvm/zulu-7-amd64:Azul Systems, Inc.-1.7.0_272-b10
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4233/1/testReport/
Max. process+thread count	143 (vs. ulimit of 10000)
modules	C: hbase-common U: hbase-common
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4233/1/console
versions	git=2.17.1 maven=3.6.0 findbugs=3.0.1
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

hbase-common/src/test/java/org/apache/hadoop/hbase/io/util/TestLRUDictionary.java

Apache-HBase · 2022-03-17T09:20:21Z

🎊 +1 overall

Vote	Subsystem	Runtime	Comment
+0 🆗	reexec	1m 4s	Docker mode activated.
		_ Prechecks _
+1 💚	dupname	0m 0s	No case conflicting files found.
+1 💚	hbaseanti	0m 0s	Patch does not have any anti-patterns.
+1 💚	@author	0m 0s	The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s	The patch appears to include 1 new or modified test files.
		_ branch-1 Compile Tests _
+1 💚	mvninstall	2m 53s	branch-1 passed
+1 💚	compile	0m 13s	branch-1 passed with JDK Azul Systems, Inc.-1.8.0_262-b19
+1 💚	compile	0m 17s	branch-1 passed with JDK Azul Systems, Inc.-1.7.0_272-b10
+1 💚	checkstyle	0m 19s	branch-1 passed
+1 💚	shadedjars	1m 45s	branch has no errors when building our shaded downstream artifacts.
+1 💚	javadoc	0m 16s	branch-1 passed with JDK Azul Systems, Inc.-1.8.0_262-b19
+1 💚	javadoc	0m 16s	branch-1 passed with JDK Azul Systems, Inc.-1.7.0_272-b10
+0 🆗	spotbugs	0m 44s	Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚	findbugs	0m 42s	branch-1 passed
		_ Patch Compile Tests _
+1 💚	mvninstall	1m 14s	the patch passed
+1 💚	compile	0m 13s	the patch passed with JDK Azul Systems, Inc.-1.8.0_262-b19
+1 💚	javac	0m 13s	the patch passed
+1 💚	compile	0m 16s	the patch passed with JDK Azul Systems, Inc.-1.7.0_272-b10
+1 💚	javac	0m 16s	the patch passed
+1 💚	checkstyle	0m 18s	the patch passed
+1 💚	whitespace	0m 0s	The patch has no whitespace issues.
+1 💚	shadedjars	1m 43s	patch has no errors when building our shaded downstream artifacts.
+1 💚	hadoopcheck	2m 55s	Patch does not cause any errors with Hadoop 2.8.5 2.9.2.
+1 💚	javadoc	0m 13s	the patch passed with JDK Azul Systems, Inc.-1.8.0_262-b19
+1 💚	javadoc	0m 16s	the patch passed with JDK Azul Systems, Inc.-1.7.0_272-b10
+1 💚	findbugs	0m 47s	the patch passed
		_ Other Tests _
+1 💚	unit	1m 53s	hbase-common in the patch passed.
+1 💚	asflicense	0m 12s	The patch does not generate ASF License warnings.
		20m 11s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4233/2/artifact/out/Dockerfile
GITHUB PR	#4233
Optional Tests	dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile
uname	Linux 7c6178f9c255 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	/home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4233/out/precommit/personality/provided.sh
git revision	branch-1 / `aa9cba3`
Default Java	Azul Systems, Inc.-1.7.0_272-b10
Multi-JDK versions	/usr/lib/jvm/zulu-8-amd64:Azul Systems, Inc.-1.8.0_262-b19 /usr/lib/jvm/zulu-7-amd64:Azul Systems, Inc.-1.7.0_272-b10
Test Results	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4233/2/testReport/
Max. process+thread count	161 (vs. ulimit of 10000)
modules	C: hbase-common U: hbase-common
Console output	https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4233/2/console
versions	git=2.17.1 maven=3.6.0 findbugs=3.0.1
Powered by	Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

sunhelly · 2022-03-18T05:22:40Z

hbase-common/src/main/java/org/apache/hadoop/hbase/io/util/LRUDictionary.java

+      if (nodeToIndex.containsKey(node)) {
+        short index = nodeToIndex.get(node);
+        node = indexToNode[index];
+        moveToHead(node);


According to the discussion in the email thread, if this always uses the previous node, how can it ensure that the previous node is a complete one?

Sorry, I don't understand what "completed one" means.
But this patch does not actually change how this LRUCache was previously used:
RingBufferEventHandler#onEvent -> RingBufferEventHandler#append -> ProtobufLogWriter#append -> CompressedKvEncoder#write -> LRUDictionary#findEntry
On this write path, which is the one actually in use, findEntry already uses the previously existing node.
We did find an NPE on the read path, but the root cause is not the implementation of this LRUCache (this patch); it is that the LRUCache is polluted.
I just unified the logic of addEntry with the actual logic on the write path, which I think is more elegant.

The call trace is WALEntryStream#tryAdvanceEntry -> ProtobufLogReader#readNext -> CompressedKvDecoder#readIntoArray -> LRUDictionary#addEntry -> the changed BidirectionalLRUMap#put.
Your change only ensures that a new node equal to an existing one is not added to the dictionary, but the old node may hold incomplete data, e.g. when tailing the WAL.

Oh, I get your point. As we discussed by email, to solve the problem you mentioned we need to rebuild the LRUCache every time we re-seek (this will be done in 26849), and in the future we could try to implement a 'versioned' cache for replication.
This patch is just a code optimization; it does not solve that problem. So it's an "Improvement", not a "Bug".
Does this answer your question?
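As a rough illustration of "rebuild the cache on re-seek", here is a minimal sketch. It is not HBase code: `ReplayingReader`, its `L:`/`R:` record format, and `reset()` are hypothetical, and the dictionary is a plain list with no eviction. It only shows the idea that dictionary state built before a re-seek is discarded and rebuilt by replaying the stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader over a toy "compressed" stream (assumption, not an HBase class).
// "L:<text>" is a literal that is also added to the dictionary;
// "R:<index>" is a reference to an earlier dictionary entry.
class ReplayingReader {
  private final List<String> stream;
  private final List<String> dictionary = new ArrayList<>();
  private int position;

  ReplayingReader(List<String> stream) {
    this.stream = stream;
  }

  /** Decodes the next record, or returns null at end of stream. */
  String next() {
    if (position >= stream.size()) {
      return null;
    }
    String record = stream.get(position++);
    String payload = record.substring(2);
    if (record.startsWith("L:")) {
      dictionary.add(payload);
      return payload;
    }
    return dictionary.get(Integer.parseInt(payload));
  }

  /** Re-seeks to the start and drops dictionary state so the replay rebuilds it. */
  void reset() {
    position = 0;
    dictionary.clear();
  }

  int dictionarySize() {
    return dictionary.size();
  }

  public static void main(String[] args) {
    ReplayingReader reader = new ReplayingReader(Arrays.asList("L:cf", "R:0", "L:q"));
    reader.next();
    reader.reset();
    System.out.println(reader.dictionarySize());
    System.out.println(reader.next());
  }
}
```

Clearing on every reset trades some repeated work for a dictionary that never holds entries from an earlier, possibly partial, pass.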

Since the new value may be a complete one, this improvement cannot show that using the old value is always better than using the new value, apart from the performance gain.
I think an umbrella issue should be created to track the problem mentioned in the email, and this issue can be a child of it. That way all the child changes can be tested together before the umbrella issue is completed.
Thanks.

From a stability or performance standpoint, I don't think this is a question of good or bad, right or wrong, since it doesn't change the existing logic.
But from a code architecture point of view, I think this way is better. The original implementation puts the "find the existing node and return it" logic into findEntry and exposes addEntry directly to callers, which makes inconsistent behavior between the two possible. So I think we can fully encapsulate that logic in addEntry (although this brings no stability improvement for now).
But if you would like to wait for 26849 to finish and review them together, I think that's OK~

I am not sure I follow the discussion because in the proposed improvement the old node is not reused if the contents being stored are different.

    Node node = new Node();
    node.setContents(stored, 0, stored.length);
    if (nodeToIndex.containsKey(node)) {
      // new logic reusing existing entry and index
      // ...
    } else {
      // original logic adding new entry
      // ...
    }

containsKey will use the hashCode of Node, which is Bytes.hashCode over the contents. A previous short read and a current full read will have different contents and so different hash codes, right? If so, this just reuses an entry that has equivalent data, which I agree is an improvement.
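A small sketch of that reasoning, under stated assumptions: `ContentKey` and `ContentKeyDemo` are hypothetical stand-ins for a node whose identity is its byte contents, with `java.util.Arrays` used in place of the `Bytes` helpers. It also assumes equality is content-based, so even if two different contents happened to collide on hash code, `equals` would still tell them apart.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type (assumption): identity is defined purely by the copied bytes.
final class ContentKey {
  private final byte[] bytes;

  ContentKey(byte[] src, int offset, int length) {
    this.bytes = Arrays.copyOfRange(src, offset, offset + length);
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(bytes);
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof ContentKey && Arrays.equals(bytes, ((ContentKey) other).bytes);
  }
}

class ContentKeyDemo {
  /** Returns true when the first aLen bytes of a and the first bLen bytes of b map to the same entry. */
  static boolean sameEntry(byte[] a, int aLen, byte[] b, int bLen) {
    Map<ContentKey, Short> nodeToIndex = new HashMap<>();
    nodeToIndex.put(new ContentKey(a, 0, aLen), (short) 0);
    return nodeToIndex.containsKey(new ContentKey(b, 0, bLen));
  }

  public static void main(String[] args) {
    byte[] full = {1, 2, 3, 4};
    // A short read (first 2 bytes) versus the full read (all 4 bytes).
    System.out.println(sameEntry(full, 4, full, 2));
    // Two full reads of equal bytes.
    System.out.println(sameEntry(full, 4, full.clone(), 4));
  }
}
```

In this sketch a truncated read never matches the entry stored for the full read; only byte-for-byte equal contents do.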

the old node is not reused if the contents being stored are different.

Completely correct.
This patch only reuses the SAME node.
Actually, in the previous implementation an existing node was also reused when the nodes were the same; the only difference is that this logic lived in findIdx:

    private short findIdx(byte[] array, int offset, int length) {
      Short s;
      final Node comparisonNode = new Node();
      comparisonNode.setContents(array, offset, length);
      if ((s = nodeToIndex.get(comparisonNode)) != null) {
        moveToHead(indexToNode[s]);
        return s;
      } else {
        return -1;
      }
    }

For the write path:

CompressedKvEncoder#write -> LRUDictionary#findEntry (LRUDictionary#findIdx) -> LRUDictionary#addEntry

But for the read path:

CompressedKvDecoder#readIntoArray -> LRUDictionary#addEntry

As we can see, the read path calls addEntry directly, without findIdx (which reuses the existing same node).
So I just thought it would be cleaner to write it this way.
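To make the difference between the two paths concrete, here is a minimal sketch (not the HBase code) in which the add method performs the lookup itself, so a caller that looks up first and a caller that adds directly end up with the same result. `ReuseOnAddDictionary`, `indexOf`, and `add` are hypothetical names, and the sketch assumes no eviction and no LRU ordering, both of which the real LRUDictionary has.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified dictionary (assumption): content-keyed, never evicts.
class ReuseOnAddDictionary {
  private final Map<ByteBuffer, Short> contentToIndex = new HashMap<>();
  private final List<byte[]> indexToContent = new ArrayList<>();

  /** Returns the index of an entry with equal bytes, or -1 when there is none. */
  short indexOf(byte[] data, int offset, int length) {
    // ByteBuffer equality and hashing cover only the wrapped range's bytes.
    Short idx = contentToIndex.get(ByteBuffer.wrap(data, offset, length));
    return idx == null ? -1 : idx;
  }

  /** Adds the bytes unless an equal entry exists; returns the entry's index either way. */
  short add(byte[] data, int offset, int length) {
    short existing = indexOf(data, offset, length);
    if (existing != -1) {
      return existing; // same bytes, same index: no duplicate entry
    }
    byte[] copy = Arrays.copyOfRange(data, offset, offset + length);
    // The sketch ignores overflow of the short index.
    short idx = (short) indexToContent.size();
    indexToContent.add(copy);
    contentToIndex.put(ByteBuffer.wrap(copy), idx);
    return idx;
  }

  int size() {
    return indexToContent.size();
  }

  public static void main(String[] args) {
    ReuseOnAddDictionary dict = new ReuseOnAddDictionary();
    byte[] row = {10, 20, 30};
    short first = dict.add(row, 0, row.length);
    short second = dict.add(row.clone(), 0, row.length); // direct add, no prior lookup
    System.out.println(first == second);
    System.out.println(dict.size());
  }
}
```

With the lookup folded into `add`, a caller cannot create a second entry for bytes that are already present, whichever path it arrives by.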

sunhelly reviewed Mar 16, 2022

hbase-common/src/test/java/org/apache/hadoop/hbase/io/util/TestLRUDictionary.java (outdated)

HBASE-26850 Optimize the implementation of LRUCache in LRUDictionary

9e82780

thangTang force-pushed the HBASE-26850 branch from a078a2b to 9e82780 Compare March 17, 2022 08:47

sunhelly reviewed Mar 18, 2022

Apache9 closed this Aug 9, 2022

thangTang mentioned this pull request Feb 9, 2023

HBASE-27621 Also clear the Dictionary when resetting when reading compressed WAL file #5016

Merged

thangTang commented Mar 16, 2022
Apache-HBase commented Mar 16, 2022
Apache-HBase commented Mar 17, 2022
sunhelly Mar 18, 2022
thangTang Mar 18, 2022 (edited)
sunhelly Mar 18, 2022
thangTang Mar 18, 2022
sunhelly Mar 18, 2022
thangTang Mar 18, 2022
apurtell Apr 17, 2022 (edited)
thangTang Apr 18, 2022

5 participants
