x64: Add most remaining AVX lowerings#5819
x64: Add most remaining AVX lowerings#5819alexcrichton merged 5 commits intobytecodealliance:mainfrom
Conversation
This commit goes through `inst.isle` and adds a corresponding AVX lowering for most SSE lowerings. I opted to skip instructions where the SSE lowering didn't read/modify a register, such as `roundps`. I think that AVX will benefit these instructions when there's load-merging since AVX doesn't require alignment, but I've deferred that work to a future PR. Otherwise though in this PR I think all (or almost all) of the 3-operand forms of AVX instructions are supported with their SSE counterparts. This should ideally improve codegen slightly by removing register pressure and the need for `movdqa` between registers. I've attempted to ensure that there's at least one codegen test for all the new instructions. As a side note, the recent capstone integration into `precise-output` tests helped me catch a number of encoding bugs much earlier than otherwise, so I've found that incredibly useful in tests!
cfallin
left a comment
There was a problem hiding this comment.
Thank you for the really tedious and careful work here -- this will undoubtedly be a nice improvement for FP and SIMD code!
Two thoughts below: the first I think I do want before this PR goes in, the second is a note about a pattern needing a longer-term refactor that this PR extends (but it's pre-existing so I won't block on it here necessarily, unless you want to address).
Use true `XmmMem` and `GprMem` types in the instruction as well to get more type-level safety for what goes where.
Instead of conditionally defining regalloc and various other operations instead add dedicated `MInst` variants for operations which are intended to produce a constant to have more clear interactions with regalloc and printing and such.
|
Ok I think the two new commits should address the |
Subscribe to Label ActionDetailsThis issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
cfallin
left a comment
There was a problem hiding this comment.
Fantastic, thanks for tackling both cleanups; getting rid of produces_const is really nice. LGTM!
|
Local fuzzing has found an issue so taking this out of the merge queue. |
This adds a missing `add_trap` to encoding of VEX instructions with memory operands to ensure that if they cause a segfault that there's appropriate metadata for Wasmtime to understand that the instruction could in fact trap. This fixes a fuzz test case found locally where v8 trapped and Wasmtime didn't catch the signal and crashed the fuzzer.
|
Turned out to be a benign mistake where I forgot to call I've intentionally introduced an encoding bug where |
|
Well score one for fuzzing. It took a few hours but the minimal test case was (module
(type (;0;) (func (param v128 i32) (result v128)))
(func (;0;) (type 0) (param v128 i32) (result v128)
local.get 0
local.get 1
i32x4.shr_s
)
(export "test" (func 0))
)where the differential difference was between wasmtime with avx and wasmtime without avx. That's exactly the bug I wanted the fuzzer to find, so yay confidence that fuzzing can find real bugs! I'll still let it run overnight to make sure nothing else crops up. |
This commit goes through
inst.isleand adds a corresponding AVX lowering for most SSE lowerings. I opted to skip instructions where the SSE lowering didn't read/modify a register, such asroundps. I think that AVX will benefit these instructions when there's load-merging since AVX doesn't require alignment, but I've deferred that work to a future PR.Otherwise though in this PR I think all (or almost all) of the 3-operand forms of AVX instructions are supported with their SSE counterparts. This should ideally improve codegen slightly by removing register pressure and the need for
movdqabetween registers. I've attempted to ensure that there's at least one codegen test for all the new instructions.As a side note, the recent capstone integration into
precise-outputtests helped me catch a number of encoding bugs much earlier than otherwise, so I've found that incredibly useful in tests!