x64: Add most remaining AVX lowerings by alexcrichton · Pull Request #5819 · bytecodealliance/wasmtime

alexcrichton · 2023-02-17T17:10:59Z

This commit goes through `inst.isle` and adds a corresponding AVX lowering for most SSE lowerings. I opted to skip instructions where the SSE lowering didn't read/modify a register, such as `roundps`. I think that AVX will benefit these instructions when there's load-merging since AVX doesn't require alignment, but I've deferred that work to a future PR.

Otherwise, I think this PR supports all (or almost all) of the 3-operand forms of AVX instructions alongside their SSE counterparts. This should ideally improve codegen slightly by reducing register pressure and removing the need for `movdqa` between registers. I've attempted to ensure that there's at least one codegen test for all the new instructions.
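To illustrate why the 3-operand forms can save a register move, here is a minimal sketch of lowering `dst = a + b` on 32-bit lanes when `a` and `b` must both stay live. The helper names and register choices are hypothetical and are not taken from Cranelift; the output uses Intel operand order.

```rust
/// SSE `paddd` is destructive: its first operand is both a source and the
/// destination, so a still-live `a` has to be copied into `dst` first.
fn lower_add_sse(dst: &str, a: &str, b: &str) -> Vec<String> {
    vec![
        format!("movdqa {dst}, {a}"),
        format!("paddd {dst}, {b}"),
    ]
}

/// AVX `vpaddd` names its destination separately, so no copy is needed.
fn lower_add_avx(dst: &str, a: &str, b: &str) -> Vec<String> {
    vec![format!("vpaddd {dst}, {a}, {b}")]
}

fn main() {
    for line in lower_add_sse("xmm2", "xmm0", "xmm1") {
        println!("sse: {line}");
    }
    for line in lower_add_avx("xmm2", "xmm0", "xmm1") {
        println!("avx: {line}");
    }
}
```

In this toy model the SSE path emits two instructions and the AVX path one, which is the kind of saving the description refers to.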

As a side note, the recent capstone integration into `precise-output` tests helped me catch a number of encoding bugs much earlier than otherwise, so I've found that incredibly useful in tests!

cfallin

Thank you for the really tedious and careful work here -- this will undoubtedly be a nice improvement for FP and SIMD code!

Two thoughts below: the first I do want addressed before this PR goes in; the second is a note about a pattern that needs a longer-term refactor and that this PR extends (it's pre-existing, so I won't necessarily block on it here unless you want to address it).

cranelift/codegen/src/isa/x64/inst.isle

cranelift/codegen/src/isa/x64/inst/emit.rs

alexcrichton · 2023-02-17T21:51:32Z

Ok, I think the two new commits should address the `produces_const` expansion in addition to the Gpr-vs-Xmm types. There are still more instructions that do conditional register allocation, but I can try to tackle those separately.

cfallin

Fantastic, thanks for tackling both cleanups; getting rid of `produces_const` is really nice. LGTM!

alexcrichton · 2023-02-17T23:18:09Z

Local fuzzing has found an issue so taking this out of the merge queue.

This adds a missing `add_trap` to encoding of VEX instructions with memory operands to ensure that if they cause a segfault that there's appropriate metadata for Wasmtime to understand that the instruction could in fact trap. This fixes a fuzz test case found locally where v8 trapped and Wasmtime didn't catch the signal and crashed the fuzzer.
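The idea behind that fix can be sketched with a miniature code sink. Everything here (`Sink`, `RegMem`, `TrapCode`, `emit_vex`, and the placeholder bytes) is a hypothetical stand-in rather than Cranelift's actual API or a real VEX encoding; only the shape of the fix follows the text: record trap metadata at the instruction's offset whenever the operand is a memory access that may fault.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrapCode {
    HeapOutOfBounds,
}

#[derive(Debug, Clone, Copy)]
enum RegMem {
    Reg(u8),
    /// Memory operand; `can_trap` marks accesses whose faults must be reported as traps.
    Mem { base: u8, can_trap: bool },
}

#[derive(Default)]
struct Sink {
    bytes: Vec<u8>,
    /// (code offset, trap code) pairs that a signal handler could look up.
    traps: Vec<(usize, TrapCode)>,
}

impl Sink {
    fn add_trap(&mut self, code: TrapCode) {
        // Record the offset of the instruction that is about to be emitted.
        self.traps.push((self.bytes.len(), code));
    }

    fn emit_vex(&mut self, opcode: u8, src: RegMem) {
        // The step that was missing: register metadata before emitting a
        // memory-accessing instruction.
        if let RegMem::Mem { can_trap: true, .. } = src {
            self.add_trap(TrapCode::HeapOutOfBounds);
        }
        // Placeholder bytes only, not a real VEX encoding.
        let operand = match src {
            RegMem::Reg(r) => r,
            RegMem::Mem { base, .. } => 0x80 | base,
        };
        self.bytes.extend_from_slice(&[0xC5, 0xF9, opcode, operand]);
    }
}

fn main() {
    let mut sink = Sink::default();
    sink.emit_vex(0xFE, RegMem::Reg(1));
    sink.emit_vex(0xFE, RegMem::Mem { base: 2, can_trap: true });
    println!("{:?}", sink.traps);
}
```

Without the `add_trap` call, a fault in the second instruction would have no matching metadata entry, which is the situation the fuzz test case exposed.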

alexcrichton · 2023-02-17T23:30:11Z

Turned out to be a benign mistake where I forgot to call `add_trap` for VEX-encoded instructions with memory operands. Before merging this though I'm going to let the fuzzer run overnight.

I've intentionally introduced an encoding bug where `vpsrad` is encoded as `vpsraw`, and I'm hoping to see the differential fuzzer find this bug eventually with differential execution against v8.

alexcrichton · 2023-02-18T05:16:49Z

Well score one for fuzzing. It took a few hours but the minimal test case was

(module
  (type (;0;) (func (param v128 i32) (result v128)))
  (func (;0;) (type 0) (param v128 i32) (result v128)
    local.get 0
    local.get 1
    i32x4.shr_s
  )
  (export "test" (func 0))
)

where the differing results came from Wasmtime with AVX versus Wasmtime without AVX. That's exactly the bug I wanted the fuzzer to find, so yay confidence that fuzzing can find real bugs! I'll still let it run overnight to make sure nothing else crops up.
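To see why that test case exposes the injected bug, the following sketch models both behaviours in plain Rust. It assumes the intended instruction is the 32-bit-lane arithmetic shift (`vpsrad`) and the mis-encoded one is the 16-bit-lane variant (`vpsraw`); the function names are made up for illustration.

```rust
/// `i32x4.shr_s`: arithmetic right shift of four 32-bit lanes, with the
/// shift amount taken modulo 32.
fn shr_s_i32x4(v: [i32; 4], amt: u32) -> [i32; 4] {
    v.map(|lane| lane >> (amt % 32))
}

/// The same 128 bits shifted as eight 16-bit lanes, which is what a
/// 16-bit-lane shift would compute on this input.
fn shr_s_as_i16x8(v: [i32; 4], amt: u32) -> [i32; 4] {
    // Counts above 15 fill each 16-bit lane with its sign bit, which is the
    // same result as shifting by 15.
    let count = (amt % 32).min(15);
    v.map(|lane| {
        let lo = (lane as i16) >> count;
        let hi = ((lane >> 16) as i16) >> count;
        (((hi as u16 as u32) << 16) | (lo as u16 as u32)) as i32
    })
}

fn main() {
    let input = [0x0001_0000, -8, 0, 1];
    println!("32-bit lanes: {:?}", shr_s_i32x4(input, 1));
    println!("16-bit lanes: {:?}", shr_s_as_i16x8(input, 1));
}
```

Any lane whose shifted bits cross the 16-bit boundary, such as `0x0001_0000` shifted by 1, gives different results under the two interpretations, so a differential run only needs one such input to diverge.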

cfallin reviewed Feb 17, 2023

alexcrichton added 2 commits February 17, 2023 12:41

Move vpinsr* instructions to their own variant

5ec26f5

Use true `XmmMem` and `GprMem` types in the instruction as well to get more type-level safety for what goes where.
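As a rough illustration of the type-level safety this commit message describes, the sketch below gives register-or-memory operands distinct types per register class. All definitions are simplified, hypothetical stand-ins (the real Cranelift types carry much more detail), and the printed register names are only illustrative.

```rust
#[derive(Debug, Clone, Copy)]
struct Xmm(u8);
#[derive(Debug, Clone, Copy)]
struct Gpr(u8);
#[derive(Debug, Clone, Copy)]
struct Addr(i32);

/// Operand that is either an XMM register or memory.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum XmmMem {
    Xmm(Xmm),
    Mem(Addr),
}

/// Operand that is either a general-purpose register or memory.
#[derive(Debug, Clone, Copy)]
enum GprMem {
    Gpr(Gpr),
    Mem(Addr),
}

/// A lane insert in the style of `vpinsrd`: the inserted value comes from a
/// GPR or from memory, never from an XMM register.
struct VexPinsr {
    src1: Xmm,
    src2: GprMem,
    dst: Xmm,
    lane: u8,
}

fn print_pinsr(i: &VexPinsr) -> String {
    let src2 = match i.src2 {
        GprMem::Gpr(Gpr(n)) => format!("r{n}d"),
        GprMem::Mem(Addr(off)) => format!("[rbp{off:+}]"),
    };
    format!("vpinsrd xmm{}, xmm{}, {}, {}", i.dst.0, i.src1.0, src2, i.lane)
}

fn main() {
    let insert = VexPinsr { src1: Xmm(0), src2: GprMem::Gpr(Gpr(8)), dst: Xmm(1), lane: 2 };
    println!("{}", print_pinsr(&insert));
    // An XMM operand has a different type, so this would be rejected at compile time:
    // let bad = VexPinsr { src2: XmmMem::Xmm(Xmm(3)), ..insert };
    let _xmm_operand = XmmMem::Xmm(Xmm(3));
}
```

With a single untyped register-or-memory operand, the mistake in the commented-out line would only show up at emission time or in generated code; with separate types the compiler rejects it.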

Remove Inst::produces_const accessor

4ce4158

Instead of conditionally defining regalloc and various other operations instead add dedicated `MInst` variants for operations which are intended to produce a constant to have more clear interactions with regalloc and printing and such.
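A minimal sketch of that refactor, under the assumption that the constant-producing operations are idioms such as xor-ing a register with itself: rather than one variant plus a boolean-style accessor that every consumer must remember to check, the constant case gets its own variant with no source operands. The names `Inst`, `XmmRmR`, `XmmConstOp`, and `Reg` below are illustrative and simplified, not the actual `MInst` definitions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Reg(u8);

enum Inst {
    /// Ordinary three-operand operation: reads both sources, writes `dst`.
    XmmRmR { op: &'static str, src1: Reg, src2: Reg, dst: Reg },
    /// Operation whose result does not depend on any input register.
    XmmConstOp { op: &'static str, dst: Reg },
}

impl Inst {
    /// Registers the allocator must treat as read.
    fn uses(&self) -> Vec<Reg> {
        match self {
            Inst::XmmRmR { src1, src2, .. } => vec![*src1, *src2],
            Inst::XmmConstOp { .. } => Vec::new(),
        }
    }

    /// Registers the allocator must treat as written.
    fn defs(&self) -> Vec<Reg> {
        match self {
            Inst::XmmRmR { dst, .. } | Inst::XmmConstOp { dst, .. } => vec![*dst],
        }
    }

    fn print(&self) -> String {
        match self {
            Inst::XmmRmR { op, src1, src2, dst } => {
                format!("{op} xmm{}, xmm{}, xmm{}", dst.0, src1.0, src2.0)
            }
            // The destination doubles as both sources, but no value is read from it.
            Inst::XmmConstOp { op, dst } => format!("{op} xmm{0}, xmm{0}, xmm{0}", dst.0),
        }
    }
}

fn main() {
    let zero = Inst::XmmConstOp { op: "vpxor", dst: Reg(3) };
    println!("{} uses={:?} defs={:?}", zero.print(), zero.uses(), zero.defs());
}
```

Because each `match` must handle every variant, register collection and printing cannot silently forget the constant case the way a conditional check could.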

Fix tests

3482b61

github-actions bot added labels cranelift (Issues related to the Cranelift code generator), cranelift:area:x64 (Issues related to x64 codegen), and isle (Related to the ISLE domain-specific language) Feb 17, 2023

cfallin approved these changes Feb 17, 2023

cfallin added this pull request to the merge queue Feb 17, 2023

alexcrichton removed this pull request from the merge queue due to a manual request Feb 17, 2023

alexcrichton added this pull request to the merge queue Feb 20, 2023

Merged via the queue into bytecodealliance:main with commit c26a65a Feb 20, 2023

alexcrichton deleted the more-avx branch February 20, 2023 16:19

alexcrichton mentioned this pull request Feb 27, 2023

x64: Sink constant loads into xmm instructions #5880

Merged

alexcrichton merged 5 commits into bytecodealliance:main from alexcrichton:more-avx
