Add support for AArch64 CRC32 instructions #6
srijs
left a comment
Hi, thanks for this! The benchmarks certainly look promising.
I've left comments in-line to be addressed.
|
Forgot to say this, but if you wanted to use LLVM intrinsics instead of inline assembly, you may be able to use an extern declaration like this:
    extern {
        #[link_name = "llvm.aarch64.crc32x"]
        pub unsafe fn crc32x(a: i32, b: i64) -> i32;
    }
|
188eab0 to 0fa7925
|
Now using intrinsics and detection via stdsimd, which should be added by rust-lang/stdarch#612 :) So waiting on that. |
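For readers following along, here is a minimal sketch of what "intrinsics plus runtime detection" can look like. It is not the code from this PR: the function names `crc32`, `crc32_hardware`, and `crc32_software` are made up for illustration, the hardware path goes one byte at a time for brevity, and it assumes a toolchain where `core::arch::aarch64::__crc32b` and `std::arch::is_aarch64_feature_detected!` are available. On other architectures only the software fallback is compiled.

```rust
/// Bitwise software CRC-32 (IEEE, reflected polynomial 0xEDB88320).
/// Used as the fallback when the hardware path is unavailable.
fn crc32_software(init: u32, data: &[u8]) -> u32 {
    let mut crc = !init;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All-ones mask if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hardware path (hypothetical helper): one CRC32B instruction per byte.
/// The `target_feature` attribute lets the intrinsic inline into this function.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "crc")]
unsafe fn crc32_hardware(init: u32, data: &[u8]) -> u32 {
    use core::arch::aarch64::__crc32b;
    let mut crc = !init;
    for &byte in data {
        crc = __crc32b(crc, byte);
    }
    !crc
}

/// Picks the hardware path when the CPU reports the `crc` feature at runtime.
pub fn crc32(init: u32, data: &[u8]) -> u32 {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("crc") {
            // SAFETY: the `crc` feature was detected just above.
            return unsafe { crc32_hardware(init, data) };
        }
    }
    crc32_software(init, data)
}
```

Both paths compute the same standard CRC-32, so a caller cannot tell which one ran. A real implementation would feed the hardware path 8 bytes at a time rather than single bytes.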
src/specialized/aarch64.rs
Outdated
    let mut ptr4;
    let mut ptr8;

    if len != 0 && ((ptr as usize) & 1) != 0 {
Could this perhaps use the recently stabilized align_to method on slices to do the heavy lifting for the alignment logic here?
ooh, this is a very nice method! (and the chunks_exact iterator too)
A quick attempt at using it here though made performance significantly worse. I'll investigate that later.
oh, it just wasn't inlining the intrinsics' wrappers because I removed the target_feature attr. lol.
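A rough sketch of the `align_to` idea discussed in this thread, under stated assumptions: `step_byte` and `step_u64` are hypothetical software stand-ins for the 1-byte and 8-byte hardware CRC steps, so the example runs on any architecture. In a real aarch64 version the wrappers around the intrinsics would also carry `#[target_feature(enable = "crc")]`, which is what allows them to inline, as noted above.

```rust
/// Software stand-in for a 1-byte CRC step (reflected polynomial 0xEDB88320).
fn step_byte(mut crc: u32, byte: u8) -> u32 {
    crc ^= u32::from(byte);
    for _ in 0..8 {
        let mask = (crc & 1).wrapping_neg();
        crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
    }
    crc
}

/// Software stand-in for an 8-byte CRC step. The word was loaded in native
/// byte order, so `to_ne_bytes` recovers the bytes in their memory order.
fn step_u64(crc: u32, word: u64) -> u32 {
    word.to_ne_bytes().iter().fold(crc, |c, &b| step_byte(c, b))
}

/// Plain byte-at-a-time reference, used to check the aligned version.
pub fn crc32_bytewise(init: u32, data: &[u8]) -> u32 {
    !data.iter().fold(!init, |c, &b| step_byte(c, b))
}

/// Lets `align_to` split the input into an unaligned head, an aligned run
/// of u64 words, and a tail, instead of doing pointer arithmetic by hand.
pub fn crc32_aligned(init: u32, data: &[u8]) -> u32 {
    // SAFETY: every bit pattern is a valid u64, so viewing the aligned
    // middle of a byte slice as u64 words is sound.
    let (head, words, tail) = unsafe { data.align_to::<u64>() };
    let mut crc = !init;
    for &b in head {
        crc = step_byte(crc, b);
    }
    for &w in words {
        crc = step_u64(crc, w);
    }
    for &b in tail {
        crc = step_byte(crc, b);
    }
    !crc
}
```

Because `align_to` is free to choose how long the head and tail are, the result has to be identical for any split, which is why the check against the byte-wise reference is worth keeping around.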
|
@alexcrichton What would the timeline look like to get the crc intrinsics change shipped to nightly? |
|
@myfreeweb It looks like the |
|
That's all for this project of course. |
|
Rebased and updated for the new intrinsic names (rust-lang/stdarch#626). Let's wait for them to land in nightly. |
|
It looks like the change to the intrinsic names has landed in nightly 🎉 Let me know if you want any help pushing this over the finish line! |
|
Cool. Removed the temporary |
|
Excellent, thanks for all your effort! |
This should eventually be done using intrinsics, but core::arch::aarch64 doesn't have any crc32 intrinsics right now.
Ideally, CPU capabilities should be checked too, but stdsimd doesn't do that on FreeBSD on non-x86 CPUs (elf_aux_info) yet. (And all my machines run FreeBSD :D) CRC is mandatory in ARMv8.1 anyway, and there are very few v8.0 chips without it.
See comments.
Some fun bench runs!
tfw a humble ARM Cortex-A72 @ 2.18GHz (Rockchip RK3399, cpuset -l4-5):
matches a Ryzen 7 1700 @ 3.85GHz (well, in one test)
while the Cortex-A53 (@ 1.6GHz, Rockchip RK3399, cpuset -l0-3) is that much worse than the A72:
and Cavium ThunderX (Scaleway's KVM VPS) has terrible CRC32 units in particular:
upd: my phone: Qualcomm Snapdragon 660 (Kryo V2, 2.2GHz, weird big.little management?):
upd: Amazon EC2 a1 instance (Graviton, also A72) — looks like more cache than RK3399
upd: Packet c2.large.arm (Ampere eMAG)
upd: Marvell MACCHIATObin (A72 @ 1.6GHz)
upd: Marvell MACCHIATObin (A72 @ 2.0GHz)
upd: Amazon EC2 m6g (Graviton2, Neoverse N1)
upd: Apple M1 Max (MacBook Pro) thanks weatherlight — impressive baseline, but unimpressive HW crc32 units
upd: Ryzen 9 5950X @ PBO for comparison
upd: Mediatek MT8167 (A35 @ 1.3GHz)
upd: Qualcomm Snapdragon X Elite (Oryon @ 3.4GHz, no boost)