Skip to content

x64: Add most remaining AVX lowerings#5819

Merged
alexcrichton merged 5 commits intobytecodealliance:mainfrom
alexcrichton:more-avx
Feb 20, 2023
Merged

x64: Add most remaining AVX lowerings#5819
alexcrichton merged 5 commits intobytecodealliance:mainfrom
alexcrichton:more-avx

Conversation

@alexcrichton
Copy link
Member

This commit goes through inst.isle and adds a corresponding AVX lowering for most SSE lowerings. I opted to skip instructions where the SSE lowering didn't read/modify a register, such as roundps. I think that AVX will benefit these instructions when there's load-merging since AVX doesn't require alignment, but I've deferred that work to a future PR.

Otherwise though in this PR I think all (or almost all) of the 3-operand forms of AVX instructions are supported with their SSE counterparts. This should ideally improve codegen slightly by removing register pressure and the need for movdqa between registers. I've attempted to ensure that there's at least one codegen test for all the new instructions.

As a side note, the recent capstone integration into precise-output tests helped me catch a number of encoding bugs much earlier than otherwise, so I've found that incredibly useful in tests!

This commit goes through `inst.isle` and adds a corresponding AVX
lowering for most SSE lowerings. I opted to skip instructions where the
SSE lowering didn't read/modify a register, such as `roundps`. I think
that AVX will benefit these instructions when there's load-merging since
AVX doesn't require alignment, but I've deferred that work to a future
PR.

Otherwise though in this PR I think all (or almost all) of the 3-operand
forms of AVX instructions are supported with their SSE counterparts.
This should ideally improve codegen slightly by removing register
pressure and the need for `movdqa` between registers. I've attempted to
ensure that there's at least one codegen test for all the new instructions.

As a side note, the recent capstone integration into `precise-output`
tests helped me catch a number of encoding bugs much earlier than
otherwise, so I've found that incredibly useful in tests!
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the really tedious and careful work here -- this will undoubtedly be a nice improvement for FP and SIMD code!

Two thoughts below: the first I think I do want before this PR goes in, the second is a note about a pattern needing a longer-term refactor that this PR extends (but it's pre-existing so I won't block on it here necessarily, unless you want to address).

Use true `XmmMem` and `GprMem` types in the instruction as well to get
more type-level safety for what goes where.
Instead of conditionally defining regalloc and various other operations
instead add dedicated `MInst` variants for operations which are intended
to produce a constant to have more clear interactions with regalloc and
printing and such.
@alexcrichton
Copy link
Member Author

Ok I think the two new commits should address the produces_const expansion in addition to the Gpr-vs-Xmm types. There's still more instructions that do conditional register allocation but I can try to tackle those separately.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen isle Related to the ISLE domain-specific language labels Feb 17, 2023
@github-actions
Copy link

Subscribe to Label Action

cc @cfallin, @fitzgen

Details This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic, thanks for tackling both cleanups; getting rid of produces_const is really nice. LGTM!

@cfallin cfallin added this pull request to the merge queue Feb 17, 2023
@alexcrichton
Copy link
Member Author

Local fuzzing has found an issue so taking this out of the merge queue.

@alexcrichton alexcrichton removed this pull request from the merge queue due to a manual request Feb 17, 2023
This adds a missing `add_trap` to encoding of VEX instructions with
memory operands to ensure that if they cause a segfault that there's
appropriate metadata for Wasmtime to understand that the instruction
could in fact trap. This fixes a fuzz test case found locally where v8
trapped and Wasmtime didn't catch the signal and crashed the fuzzer.
@alexcrichton
Copy link
Member Author

Turned out to be a benign mistake where I forgot to call add_trap for VEX-encoded instructions with memory operands. Before merging this though I'm going to let the fuzzer run overnight.

I've intentionally introduced an encoding bug where vsprad is encoded as vspraw and I'm hoping to see the differential fuzzer find this bug eventually with differential execution against v8.

@alexcrichton
Copy link
Member Author

Well score one for fuzzing. It took a few hours but the minimal test case was

(module
  (type (;0;) (func (param v128 i32) (result v128)))
  (func (;0;) (type 0) (param v128 i32) (result v128)
    local.get 0
    local.get 1
    i32x4.shr_s
  )
  (export "test" (func 0))
)

where the differential difference was between wasmtime with avx and wasmtime without avx. That's exactly the bug I wanted the fuzzer to find, so yay confidence that fuzzing can find real bugs! I'll still let it run overnight to make sure nothing else crops up.

@alexcrichton alexcrichton added this pull request to the merge queue Feb 20, 2023
Merged via the queue into bytecodealliance:main with commit c26a65a Feb 20, 2023
@alexcrichton alexcrichton deleted the more-avx branch February 20, 2023 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants