x64: Enable load-coalescing for SSE/AVX instructions by alexcrichton · Pull Request #5841 · bytecodealliance/wasmtime

alexcrichton · 2023-02-20T16:20:47Z

This commit unlocks the ability to fold loads into operands of SSE and AVX instructions. This is beneficial for both function size when it happens in addition to being able to reduce register pressure. Previously this was not done because most SSE instructions require memory to be aligned. AVX instructions, however, do not have alignment requirements.

The solution implemented here is one recommended by Chris which is to add a new XmmMemAligned newtype wrapper around XmmMem. All SSE instructions are now annotated as requiring an XmmMemAligned operand except for a new new instruction styles used specifically for instructions that don't require alignment (e.g. movdqu, *sd, and *ss instructions). All existing instruction helpers continue to take XmmMem, however. This way if an AVX lowering is chosen it can be used as-is. If an SSE lowering is chosen, however, then an automatic conversion from XmmMem to XmmMemAligned kicks in. This automatic conversion only fails for unaligned addresses in which case a load instruction is emitted and the operand becomes a temporary register instead. A number of prior Xmm arguments have now been converted to XmmMem as well.

One change from this commit is that loading an unaligned operand for an SSE instruction previously would use the "correct type" of load, e.g. movups for f32x4 or movup for f64x2, but now the loading happens in a context without type information so the movdqu instruction is generated. According to this stack overflow question it looks like modern processors won't penalize this "wrong" choice of type when the operand is then used for f32 or f64 oriented instructions.

Finally this commit improves some reuse of logic in the put_in_*_mem* helper to share code with sinkable_load and avoid duplication. With this in place some various ISLE rules have been updated as well.

In the tests it can be seen that AVX-instructions are now automatically load-coalesced and use memory operands in a few cases.

This commit unlocks the ability to fold loads into operands of SSE and AVX instructions. This is beneficial for both function size when it happens in addition to being able to reduce register pressure. Previously this was not done because most SSE instructions require memory to be aligned. AVX instructions, however, do not have alignment requirements. The solution implemented here is one recommended by Chris which is to add a new `XmmMemAligned` newtype wrapper around `XmmMem`. All SSE instructions are now annotated as requiring an `XmmMemAligned` operand except for a new new instruction styles used specifically for instructions that don't require alignment (e.g. `movdqu`, `*sd`, and `*ss` instructions). All existing instruction helpers continue to take `XmmMem`, however. This way if an AVX lowering is chosen it can be used as-is. If an SSE lowering is chosen, however, then an automatic conversion from `XmmMem` to `XmmMemAligned` kicks in. This automatic conversion only fails for unaligned addresses in which case a load instruction is emitted and the operand becomes a temporary register instead. A number of prior `Xmm` arguments have now been converted to `XmmMem` as well. One change from this commit is that loading an unaligned operand for an SSE instruction previously would use the "correct type" of load, e.g. `movups` for f32x4 or `movup` for f64x2, but now the loading happens in a context without type information so the `movdqu` instruction is generated. According to [this stack overflow question][question] it looks like modern processors won't penalize this "wrong" choice of type when the operand is then used for f32 or f64 oriented instructions. Finally this commit improves some reuse of logic in the `put_in_*_mem*` helper to share code with `sinkable_load` and avoid duplication. With this in place some various ISLE rules have been updated as well. In the tests it can be seen that AVX-instructions are now automatically load-coalesced and use memory operands in a few cases. [question]: https://stackoverflow.com/questions/40854819/is-there-any-situation-where-using-movdqu-and-movupd-is-better-than-movups

alexcrichton · 2023-02-20T16:22:19Z

cc #4767 and the original suggestion

github-actions · 2023-02-20T16:49:15Z

Subscribe to Label Action

cc @cfallin, @fitzgen

Details

This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

cfallin: isle
fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

These don't have alignment requirements like other xmm instructions as well. Additionally add some ISA tests to ensure that their output is tested.

cfallin

Looks generally good to me, with a few comment requests below and one question. Overall I think this approach is quite nice (... I might be biased though!) and will hopefully get us decent speedups on some SIMD workloads.

cfallin · 2023-02-21T18:19:31Z

cranelift/codegen/src/isa/x64/inst.isle


+;; Convert an `XmmMem` into an `XmmMemAligned`.
+(decl xmm_mem_to_xmm_mem_aligned (XmmMem) XmmMemAligned)
+(extern constructor xmm_mem_to_xmm_mem_aligned xmm_mem_to_xmm_mem_aligned)


Noting here also a comment I left below where this is designated an implicit conversion: it definitely looks like something else at first sight (my comment on my first pass through this PR here was "can we call this assert_xmm_mem_aligned to make it apparent what this conversion is doing?"). It implicitly emits a load if the arg is actually in memory, which is why it can trivially guarantee alignment of any returned memory op (because it never returns a memory op). We should leave a comment about this I think.

Good point! I've updated the comment here and on the (convert ...) items added too.

cfallin · 2023-02-21T18:21:32Z

cranelift/codegen/src/isa/x64/inst.isle

+(convert Xmm XmmMemAligned xmm_to_xmm_mem_aligned)
 (convert XmmMem XmmMemImm xmm_mem_to_xmm_mem_imm)
 (convert XmmMem RegMem xmm_mem_to_reg_mem)
+(convert XmmMem XmmMemAligned xmm_mem_to_xmm_mem_aligned)


Here we should have a comment I think -- it looks wrong at first sight, as in general we shouldn't be able to take an unaligned thing and coerce it to aligned. The key is that the constructor actually emits a load and returns a register-kind value instead, so we (trivially) avoid unalignment by just never carrying through a memory operand.

cranelift/codegen/src/isa/x64/inst/args.rs

cfallin

LGTM, thanks!

This commit fixes an issue where a `Value` was both load-sunk and used as-is, meaning it was both sunk and not. That triggered a panic in the backend since this isn't valid. The reason for this is due to how some ISLE rules were written where a `Value` was both implicitly coerced into an `XmmMem` and an `Xmm`. This issue is similar to bytecodealliance#4815 for example. The fix in this commit is to force the operands into registers which prevents load sinking which wouldn't work here anyway. This panic was introduced in bytecodealliance#5841 which is quite old at this point. This bug does not affect WebAssembly translation due to how the `v128` type maps to `i8x16` in Cranelift by default. Closes bytecodealliance#9143

This commit fixes an issue where a `Value` was both load-sunk and used as-is, meaning it was both sunk and not. That triggered a panic in the backend since this isn't valid. The reason for this is due to how some ISLE rules were written where a `Value` was both implicitly coerced into an `XmmMem` and an `Xmm`. This issue is similar to #4815 for example. The fix in this commit is to force the operands into registers which prevents load sinking which wouldn't work here anyway. This panic was introduced in #5841 which is quite old at this point. This bug does not affect WebAssembly translation due to how the `v128` type maps to `i8x16` in Cranelift by default. Closes #9143

github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen isle Related to the ISLE domain-specific language labels Feb 20, 2023

alexcrichton added 2 commits February 20, 2023 08:50

Fix tests

6027c04

Fix move-and-extend to be unaligned

1fd55b6

These don't have alignment requirements like other xmm instructions as well. Additionally add some ISA tests to ensure that their output is tested.

cfallin reviewed Feb 21, 2023

View reviewed changes

Review comments

37d388b

cfallin approved these changes Feb 21, 2023

View reviewed changes

cfallin enabled auto-merge February 21, 2023 18:57

cfallin added this pull request to the merge queue Feb 21, 2023

Merged via the queue into bytecodealliance:main with commit d82ebcc Feb 21, 2023

alexcrichton deleted the avx-unalign branch February 21, 2023 20:30

alexcrichton mentioned this pull request Feb 27, 2023

x64: Sink constant loads into xmm instructions #5880

Merged

alexcrichton mentioned this pull request Aug 19, 2024

Cranelift: Lowering fails for fmax and fmin of f32x4 on x64 backend #9143

Closed

alexcrichton mentioned this pull request Aug 19, 2024

x64: Prevent load sinking in simd f{min,max} #9144

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

x64: Enable load-coalescing for SSE/AVX instructions#5841

x64: Enable load-coalescing for SSE/AVX instructions#5841
cfallin merged 4 commits intobytecodealliance:mainfrom
alexcrichton:avx-unalign

alexcrichton commented Feb 20, 2023

Uh oh!

alexcrichton commented Feb 20, 2023

Uh oh!

github-actions bot commented Feb 20, 2023

Uh oh!

cfallin left a comment

Uh oh!

cfallin Feb 21, 2023

Uh oh!

alexcrichton Feb 21, 2023

Uh oh!

cfallin Feb 21, 2023

Uh oh!

Uh oh!

cfallin left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

alexcrichton commented Feb 20, 2023

Uh oh!

alexcrichton commented Feb 20, 2023

Uh oh!

github-actions bot commented Feb 20, 2023

Subscribe to Label Action

Uh oh!

cfallin left a comment

Choose a reason for hiding this comment

Uh oh!

cfallin Feb 21, 2023

Choose a reason for hiding this comment

Uh oh!

alexcrichton Feb 21, 2023

Choose a reason for hiding this comment

Uh oh!

cfallin Feb 21, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cfallin left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants