@zorrorffm

ARM64 provides CRC32 instructions for accelerating CRC32C calculation.
This patch adds an optimization for Linux on AArch64.
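A minimal sketch of how the CRC32C instructions are typically reached from C++ through the ACLE intrinsics `__crc32cd` (8 bytes per instruction) and `__crc32cb` (1 byte) from `<arm_acle.h>`. The function name `Crc32cExtend`, the macro `CRC32C_SKETCH_USE_ARM`, the compile-time `__ARM_FEATURE_CRC32` guard (set when building with `-march=armv8-a+crc`), and the bitwise fallback are illustrative assumptions, not the code in this patch. The patch also uses PMULL (`vmull_p64`, per the discussion below) and selects the path at run time. Both paths use the usual invert-on-entry/invert-on-exit CRC32C convention, so they should produce the same values.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical switch: use the hardware path only when the compiler was told
// the target has the CRC extension and the target is little-endian AArch64.
#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32) && !defined(__ARM_BIG_ENDIAN)
#include <arm_acle.h>
#define CRC32C_SKETCH_USE_ARM 1
#endif

// Hypothetical helper: extends a CRC32C (Castagnoli) value over n bytes.
// Invert on entry and on exit so that calls can be chained.
uint32_t Crc32cExtend(uint32_t init_crc, const uint8_t* data, size_t n) {
  uint32_t crc = init_crc ^ 0xFFFFFFFFu;
#if defined(CRC32C_SKETCH_USE_ARM)
  // Hardware path: CRC32CX consumes 8 bytes per instruction.
  while (n >= 8) {
    uint64_t word;
    std::memcpy(&word, data, sizeof(word));  // alignment-safe load
    crc = __crc32cd(crc, word);
    data += 8;
    n -= 8;
  }
  // Tail: CRC32CB consumes the remaining bytes one at a time.
  while (n > 0) {
    crc = __crc32cb(crc, *data++);
    --n;
  }
#else
  // Portable fallback: bitwise CRC over the reflected polynomial 0x82F63B78.
  while (n > 0) {
    crc ^= *data++;
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    --n;
  }
#endif
  return crc ^ 0xFFFFFFFFu;
}
```

For example, `Crc32cExtend(0, reinterpret_cast<const uint8_t*>("123456789"), 9)` should return the standard CRC-32C check value `0xE3069283` on either path.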

Performance comparison:

Before:
crc32c : 5.670 micros/op; 688.9 MB/s (4K per op)

After:
crc32c : 0.451 micros/op; ~8663 MB/s (4K per op)
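As a quick consistency check on the two benchmark lines, a small sketch that converts micros/op into the MB/s figure, assuming (as LevelDB's db_bench does) that "MB" means 1,048,576 bytes. The function names are hypothetical.

```cpp
// Converts a per-op latency into throughput, with 1 MB = 1,048,576 bytes
// (the db_bench convention assumed here).
double MiBPerSecond(double bytes_per_op, double micros_per_op) {
  const double mib_per_op = bytes_per_op / 1048576.0;
  return mib_per_op / (micros_per_op * 1e-6);
}

// Speedup implied by two per-op latencies.
double Speedup(double old_micros, double new_micros) {
  return old_micros / new_micros;
}
```

`MiBPerSecond(4096, 5.670)` is about 688.9, matching the old line. `MiBPerSecond(4096, 0.451)` is about 8661, consistent with the new line once the rounding of 0.451 micros/op is taken into account. `Speedup(5.670, 0.451)` is about 12.6x.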

Change-Id: I51d25ca19688fc95a57b84ee0c1493c18a288087

The name I signed the CLA with is [email protected]
The issue number is #478

@pwnall self-assigned this Aug 1, 2017
@pwnall (Member) commented Aug 1, 2017

@zorrorffm Thank you very much for the contribution!

Due to some infrastructure issues, I won't be able to act on this very quickly. That being said, I am very grateful for your PR, and I look forward to improving the ARM performance.

@zorrorffm (Author)

@pwnall Thank you.

If you need an ARM machine to run some tests, I would be happy to do that.

@pwnall (Member) commented Aug 15, 2017

@zorrorffm I ran tests on a Google Pixel C tablet, and the optimized version seems to be 4-10x faster. If this is reasonably representative hardware, I don't think I'll need another machine. I'll take you up on your offer if it turns out I really need a machine that can easily run a GNU/Linux system.

@pwnall (Member) commented Aug 30, 2017

@zorrorffm The LevelDB work hasn't happened yet, but I used your code in https://github.com/google/crc32c/blob/master/src/crc32c_arm64.cc -- thank you very much!

@pwnall (Member) commented Sep 11, 2017

@zorrorffm We deployed the version of this patch at https://github.com/google/crc32c/blob/master/src/crc32c_arm64.cc in Chrome, and we got crashes when trying to execute the vmull_p64 instruction on MSM8916 boards.

I have a hunch that we should also check that the hwcap flags include HWCAP_PMULL, which is (1 << 4), before enabling the AArch64 accelerated code. Is this correct?

@pwnall (Member) commented Sep 11, 2017

@zorrorffm On the issue above, can you please take a look and comment on google/crc32c#6?

@zorrorffm (Author)

@pwnall Yes, your fix is correct. Thank you for your findings.

@zorrorffm (Author)

Updated the patch to detect CPU capabilities. Acceleration is enabled only if both the pmull and crc32c instructions are supported.
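A minimal sketch of this kind of run-time check on Linux/AArch64, assuming the hwcap word is read with `getauxval(AT_HWCAP)`. It requires both the PMULL bit (1 << 4, as noted above) and the CRC32 bit, which a CPU like the one behind the MSM8916 crash would fail if it reports CRC32 without PMULL. The constants mirror the bit positions in Linux's arm64 `<asm/hwcap.h>` (HWCAP_PMULL = 1 << 4, HWCAP_CRC32 = 1 << 7) but are redefined locally so the sketch also compiles on other hosts. All function names are hypothetical.

```cpp
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP
#endif

// Local copies of the Linux arm64 hwcap bits (verify against <asm/hwcap.h>).
constexpr unsigned long kHwcapPmull = 1UL << 4;  // HWCAP_PMULL
constexpr unsigned long kHwcapCrc32 = 1UL << 7;  // HWCAP_CRC32

// Pure check: acceleration needs BOTH the PMULL and CRC32 capabilities.
bool HwcapAllowsCrc32cAcceleration(unsigned long hwcap) {
  const unsigned long required = kHwcapPmull | kHwcapCrc32;
  return (hwcap & required) == required;
}

// Reads the kernel-provided hwcap word; reports no capabilities elsewhere.
unsigned long ReadHwcap() {
#if defined(__linux__) && defined(__aarch64__)
  return getauxval(AT_HWCAP);
#else
  return 0;
#endif
}

// Decide once (e.g. at startup) whether to use the accelerated path.
bool CanUseArm64Crc32c() {
  return HwcapAllowsCrc32cAcceleration(ReadHwcap());
}
```

Keeping the bit test separate from the `getauxval` call makes the decision logic testable on any machine, while `CanUseArm64Crc32c()` is what a dispatcher would call on real hardware.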

@cmumford (Contributor)

@googlebot rescan

@cmumford (Contributor)

Thx for the PR @zorrorffm. The optimized crc32c code was moved out into https://github.com/google/crc32c (which supports armv8-a+crc+crypto) in 5c39524.

@cmumford closed this Apr 12, 2019
@zorrorffm deleted the crc32_arm64 branch April 15, 2019 02:20
maochongxin pushed a commit to maochongxin/leveldb that referenced this pull request Jul 21, 2022