-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-10288. Checksum to support direct buffers #6162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| private static final Logger LOG = | ||
| LoggerFactory.getLogger(ChecksumByteBufferImpl.class); | ||
|
|
||
| private final Checksum checksum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@duongkame , it seems that we could change this field to ChecksumByteBuffer and update the callers of the ChecksumByteBufferImpl constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. ChecksumByteBufferImpl is a decoration of an actual Checksum to simulate (and now bridge) the update(ByteBuffer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, my previous comment does not make sense. My suggestion is not to use CRC32 and PureJavaCrc32C.
ozone/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChecksumByteBufferFactory.java
Lines 74 to 76 in 5dd14ac
| public static ChecksumByteBuffer crc32Impl() { | |
| return new ChecksumByteBufferImpl(new CRC32()); | |
| } |
We may
return new PureJavaCrc32ByteBuffer();Similarly,
ozone/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChecksumByteBufferFactory.java
Line 88 in 5dd14ac
| return new ChecksumByteBufferImpl(new PureJavaCrc32C()); |
We may
return new PureJavaCrc32CByteBuffer();There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PureJavaCrc32ByteBuffer impl involves reading buffers, while the CRC32 implementation in JDK9+ (e.g. in OpenJdk11) calculates checksum using native code on the native buffer directly and that probably indicates performance advantages.
Probably we should only replace the return new ChecksumByteBufferImpl(new PureJavaCrc32C()); by return new PureJavaCrc32CByteBuffer();, but I'm not sure what's the advatages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... implementation in JDK9+ (e.g. in OpenJdk11) calculates using native code on the native buffer directly and that probably indicates performance advantages.
It is likely. We should benchmark it.
... but I'm not sure what's the advatages.
The advantage is to avoid creating an array and copying for (slow path) direct buffers in Java 8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think we should keep this PR as an optimization for DirectBuffer by calling a buffer-friendly API.
Actual CrC implementation/algo change can be done by another JIRA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I'll merge it.
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChecksumByteBufferImpl.java
Outdated
Show resolved
Hide resolved
| MethodHandle byteBufferUpdate = null; | ||
| if (JavaUtils.isJavaVersionAtLeast(9)) { | ||
| try { | ||
| byteBufferUpdate = MethodHandles.publicLookup().findVirtual(Checksum.class, "update", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not NativeCheckSumCRC32? This reflection is to lock the implementation to be used if the Java version is recent?
Also, not sure if the unit test will fail if I skip using the locked implementation.
─⠠⠵ java -version
openjdk version "1.8.0_392"
OpenJDK Runtime Environment (Zulu 8.74.0.17-CA-macos-aarch64) (build 1.8.0_392-b08)
OpenJDK 64-Bit Server VM (Zulu 8.74.0.17-CA-macos-aarch64) (build 25.392-b08, mixed mode)
|
Previous benchmarks ran against various checksum implementations in #1910 These are JMH throughput numbers - higher is better. The Java zipCRC* implementations are 10x faster than the pure CRC32. So please take care if we switch any implementations to benchmark again and ensure the new one is actually faster! |
|
I merged the PR. We can change the checksum later if we find the alternatives prove to be more efficient. |
This reverts commit 2348784.
(cherry picked from commit b019bfdf3abf01b8192a44c0a9f3af28dcce2de8)
What changes were proposed in this pull request?
While we accept Java 8 as the minimum supported JVM, most Ozone production clusters are running on a newer platform like JDK11.
We can leverage Java 9+ interface using reflection to checksum direct buffers directly in native memory.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10288
How was this patch tested?
Added unit test. Ran it in JDK 11 to ensure same checksum output between on-heap and off-heap.
Perf test is done with HDDS-9843.