@duongkame duongkame commented Feb 4, 2024

What changes were proposed in this pull request?

While Java 8 remains the minimum supported JVM, most Ozone production clusters run on a newer platform such as JDK 11.
We can use reflection to call the Java 9+ Checksum.update(ByteBuffer) interface method, which lets us checksum direct buffers in native memory without copying them onto the heap first.
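
As a rough illustration of the idea (not the code in this patch), the sketch below looks up Checksum.update(ByteBuffer) through a MethodHandle and falls back to an array copy when the method is missing. The class name ReflectiveChecksumSketch and its update helper are hypothetical. The patch itself gates the lookup on JavaUtils.isJavaVersionAtLeast(9), while this sketch simply treats a failed lookup as "running on Java 8".

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical helper, for illustration only; not part of Ozone.
final class ReflectiveChecksumSketch {
  // Handle for Checksum.update(ByteBuffer), or null when the interface lacks it (Java 8).
  private static final MethodHandle BYTE_BUFFER_UPDATE = lookupByteBufferUpdate();

  private static MethodHandle lookupByteBufferUpdate() {
    try {
      return MethodHandles.publicLookup().findVirtual(
          Checksum.class, "update", MethodType.methodType(void.class, ByteBuffer.class));
    } catch (NoSuchMethodException | IllegalAccessException e) {
      return null; // Assumption: a failed lookup means we are on a pre-Java 9 runtime.
    }
  }

  // Feeds the remaining bytes of the buffer into the checksum and consumes the buffer.
  static void update(Checksum checksum, ByteBuffer buffer) {
    if (BYTE_BUFFER_UPDATE != null) {
      try {
        // Java 9+: the implementation may read a direct buffer without an on-heap copy.
        BYTE_BUFFER_UPDATE.invoke(checksum, buffer);
        return;
      } catch (Throwable t) {
        throw new IllegalStateException("Checksum.update(ByteBuffer) failed", t);
      }
    }
    if (buffer.hasArray()) {
      // Heap buffer: checksum the backing array in place.
      checksum.update(buffer.array(), buffer.arrayOffset() + buffer.position(), buffer.remaining());
      buffer.position(buffer.limit());
    } else {
      // Slow path: copy the direct buffer into a temporary array first.
      byte[] copy = new byte[buffer.remaining()];
      buffer.get(copy);
      checksum.update(copy, 0, copy.length);
    }
  }

  public static void main(String[] args) {
    byte[] data = "hello ozone".getBytes(StandardCharsets.UTF_8);
    ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
    direct.put(data);
    direct.flip();
    Checksum crc = new CRC32();
    update(crc, direct);
    System.out.println(Long.toHexString(crc.getValue()));
  }
}
```

Resolving the handle once in a static initializer keeps the reflective lookup off the hot path; each call then only pays for the handle invocation.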

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-10288

How was this patch tested?

Added a unit test and ran it on JDK 11 to ensure that on-heap and off-heap buffers produce the same checksum.
Perf testing is done with HDDS-9843.
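
For readers who want to reproduce the on-heap versus off-heap check by hand, here is a minimal standalone sketch. It is not the unit test added in this PR: the class HeapVsDirectChecksumCheck and its helpers are hypothetical, and it calls java.util.zip.CRC32 directly rather than going through ChecksumByteBufferImpl.

```java
import java.nio.ByteBuffer;
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical standalone check, for illustration only; not the test from this PR.
final class HeapVsDirectChecksumCheck {
  // Computes CRC32 over the remaining bytes without disturbing the caller's position.
  static long crc32Of(ByteBuffer buffer) {
    CRC32 crc = new CRC32();
    crc.update(buffer.duplicate()); // CRC32.update(ByteBuffer) is available since Java 8
    return crc.getValue();
  }

  // Copies the bytes into a heap or direct buffer that is ready for reading.
  static ByteBuffer copyOf(byte[] data, boolean direct) {
    ByteBuffer buffer = direct
        ? ByteBuffer.allocateDirect(data.length)
        : ByteBuffer.allocate(data.length);
    buffer.put(data);
    buffer.flip();
    return buffer;
  }

  public static void main(String[] args) {
    byte[] data = new byte[1 << 20];
    new Random(42).nextBytes(data); // fixed seed so runs are repeatable
    long onHeap = crc32Of(copyOf(data, false));
    long offHeap = crc32Of(copyOf(data, true));
    if (onHeap != offHeap) {
      throw new AssertionError("on-heap and off-heap checksums differ");
    }
    System.out.println("checksums match: " + Long.toHexString(onHeap));
  }
}
```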

private static final Logger LOG =
    LoggerFactory.getLogger(ChecksumByteBufferImpl.class);

private final Checksum checksum;
Contributor

@duongkame, it seems that we could change this field to ChecksumByteBuffer and update the callers of the ChecksumByteBufferImpl constructor.

Contributor Author

I don't understand. ChecksumByteBufferImpl is a decorator around an actual Checksum that simulates (and now bridges to) update(ByteBuffer).

Contributor

Sorry, my previous comment does not make sense. My suggestion is not to use CRC32 and PureJavaCrc32C here. Instead of

  public static ChecksumByteBuffer crc32Impl() {
    return new ChecksumByteBufferImpl(new CRC32());
  }

we may

  return new PureJavaCrc32ByteBuffer();

Similarly, instead of

  return new ChecksumByteBufferImpl(new PureJavaCrc32C());

we may

  return new PureJavaCrc32CByteBuffer();

Contributor Author

The PureJavaCrc32ByteBuffer implementation reads the buffer in Java code, while the CRC32 implementation in JDK 9+ (e.g. OpenJDK 11) computes the checksum in native code directly on the native buffer, which probably gives it a performance advantage.

We should probably only replace return new ChecksumByteBufferImpl(new PureJavaCrc32C()); with return new PureJavaCrc32CByteBuffer();, but I'm not sure what the advantages are.

Contributor

... implementation in JDK9+ (e.g. in OpenJdk11) calculates using native code on the native buffer directly and that probably indicates performance advantages.

It is likely. We should benchmark it.

... but I'm not sure what the advantages are.

The advantage is that it avoids creating an array and copying into it, which is what direct buffers go through on the slow path in Java 8.

Contributor Author

I think we should keep this PR scoped to optimizing direct buffers by calling a buffer-friendly API.
Any change to the actual CRC implementation or algorithm can be done in a separate JIRA.

Contributor

+1 I'll merge it.

@duongkame duongkame marked this pull request as ready for review February 8, 2024 01:47
MethodHandle byteBufferUpdate = null;
if (JavaUtils.isJavaVersionAtLeast(9)) {
  try {
    byteBufferUpdate = MethodHandles.publicLookup().findVirtual(Checksum.class, "update",
Contributor

Why not NativeCheckSumCRC32? Is this reflection meant to lock in the implementation to use when the Java version is recent enough?
Also, I'm not sure whether the unit test would fail if I skipped the locked-in implementation.

$ java -version
openjdk version "1.8.0_392"
OpenJDK Runtime Environment (Zulu 8.74.0.17-CA-macos-aarch64) (build 1.8.0_392-b08)
OpenJDK 64-Bit Server VM (Zulu 8.74.0.17-CA-macos-aarch64) (build 25.392-b08, mixed mode)

@sodonnel
Contributor

Benchmarks were previously run against various checksum implementations in #1910:

BPC      Impl          J11-1    J11-2    J8-1     J8-2
--------------------------------------------------------
1048576  pureCRC32     11.469   11.03    11.673   11.27
1048576  pureCRC32C    11.111   10.98    11.955   11.395
1048576  hadoopCRC32C  15.926   16.686   17.126   18.884
1048576  hadoopCRC32   21.064   20.656   19.65    19.343
1048576  zipCRC32      118.338  116.067  113.645  111.888
1048576  zipCRC32C     117.705  131.284  0        0

These are JMH throughput numbers - higher is better. The Java zipCRC* implementations are roughly 10x faster than the pure-Java CRC32. So if we switch any implementation, please benchmark again and make sure the new one is actually faster!

@jojochuang jojochuang merged commit 2348784 into apache:master Feb 15, 2024
@jojochuang
Contributor

I merged the PR. We can change the checksum implementation later if the alternatives prove to be more efficient.

smengcl added a commit to smengcl/hadoop-ozone that referenced this pull request Mar 4, 2024
@duongkame duongkame deleted the HDDS-10288 branch April 12, 2025 00:10
Cyrill pushed a commit to Cyrill/ozone that referenced this pull request Sep 22, 2025
(cherry picked from commit b019bfdf3abf01b8192a44c0a9f3af28dcce2de8)