Implement avx512 masked load and store intrinsics #1254
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Amanieu (or someone else) soon. Please see the contribution instructions for more information.
LGTM! Just a small style nit: please indent the contents of the
I believe in the past we avoided defining functions inside macros because it interacts poorly with our intrinsic checking tools.
Was just about to update the description to mention this. I saw the I think the macro approach is worth it since it reduces code for the load intrinsics by about 30x and reduces the chance of copy-paste mistakes.
Specifically the I think it would be better to avoid using macros for now. The ARM code avoids this issue by using a code generator, but it is probably not worth the effort in this case since AVX512 is mostly complete already.
Could you also mark the intrinsics as implemented in crates/core_arch/avx512f.md? We should be able to start stabilizing avx512 once it is complete.
Should be ready for review now. The GitHub diff view looks confusing; the individual commits might be clearer. I ended up using a "poor man's" code generator by expanding the macros from the earlier commit and postprocessing the output with some small regular expressions. It's a bit manual and probably not worth checking in. More time was spent writing all the tests. The avx512vl functions required adding
LGTM! I'm just waiting on rust-lang/rust#91381, which is causing the Android CI to fail.
Did this commit break the rollup merge? :)
let mut dst: __m512i = src;
asm!(
    "vmovdqu32 {2}{{{1}}}, [{0}]",
    in(reg) mem_addr,
   Compiling std v0.0.0 (/checkout/library/std)
error: formatting may not be suitable for sub-register argument
     --> library/core/src/../../stdarch/crates/core_arch/src/x86/avx512f.rs:30336:34
      |
30336 | "vmovdqu32 {2}{{{1}}}, [{0}]",
      |                         ^^^
30337 | in(reg) mem_addr,
      |         -------- for this argument
      |
      = note: `-D asm-sub-register` implied by `-D warnings`
      = help: use the `e` modifier to have the register formatted as `eax`
      = help: or use the `r` modifier to keep the default formatting of `rax`
(from rollup CI Result)
Yes. The issue is that on x32 (x86_64 with 32-bit pointers) the address operand is inserted into the asm as
Oh, my bad. I'll keep this in mind when I start working on the remaining intrinsics.
@jhorstmann I submitted a fix at: #1264
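For readers unfamiliar with the lint in the CI log: `asm_sub_register` fires when an operand narrower than the chosen register is formatted with the default full-width register name. The sketch below shows the general remedy the compiler suggests, an explicit `e` modifier, on a 32-bit integer operand. It is not the change made in #1264; the function `add_one` and the portable fallback are hypothetical and exist only for illustration.

```rust
/// Adds 1 to a 32-bit value using inline assembly (hypothetical example).
/// The operand is a `u32` held in a 64-bit-capable `reg`, so the template
/// names the 32-bit sub-register explicitly with the `e` modifier.
#[cfg(target_arch = "x86_64")]
fn add_one(x: u32) -> u32 {
    use std::arch::asm;

    let mut v = x;
    // SAFETY: the instruction only touches the register bound to `v`.
    unsafe {
        asm!(
            // `{v:e}` formats the register as e.g. `eax` instead of `rax`.
            // Writing plain `{v}` here would trigger `asm_sub_register`.
            "add {v:e}, 1",
            v = inout(reg) v,
            options(pure, nomem, nostack)
        );
    }
    v
}

/// Portable fallback so the example also builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
fn add_one(x: u32) -> u32 {
    x.wrapping_add(1)
}
```

In the failing snippet the narrow operand is the pointer itself: on x32 a pointer is 32 bits wide, so `in(reg) mem_addr` formatted as `{0}` hits the same lint.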
Implement avx512 masked load and store intrinsics using inline assembly.
The same approach also works for masked gather/scatter and compress/expand intrinsics. Probably makes sense to split these into their own PR.
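As a reference for what these intrinsics compute, here is a scalar model of a merge-masked load and a masked store over 16 lanes of 32 bits, which is what the `vmovdqu32` snippet in the review appears to target. It is a sketch of the semantics only, not the inline-assembly implementation from this PR; the names `mask_loadu_i32x16` and `mask_storeu_i32x16` are hypothetical.

```rust
/// Scalar model of a merge-masked load: lane `i` is read from `mem` when
/// bit `i` of `k` is set, otherwise the lane from `src` is kept.
/// Masked-off lanes never touch `mem`, so `mem` may be shorter than 16
/// elements as long as every selected lane is in bounds.
fn mask_loadu_i32x16(src: [i32; 16], k: u16, mem: &[i32]) -> [i32; 16] {
    let mut dst = src;
    for lane in 0..16 {
        if (k >> lane) & 1 == 1 {
            dst[lane] = mem[lane];
        }
    }
    dst
}

/// Scalar model of a masked store: lane `i` of `a` is written to `mem`
/// when bit `i` of `k` is set; all other memory is left untouched.
fn mask_storeu_i32x16(mem: &mut [i32], k: u16, a: [i32; 16]) {
    for lane in 0..16 {
        if (k >> lane) & 1 == 1 {
            mem[lane] = a[lane];
        }
    }
}
```

A model like this can serve as an oracle in tests: run the real intrinsic and the scalar version on the same inputs and compare the results lane by lane.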