x64: Sink constant loads into xmm instructions by alexcrichton · Pull Request #5880 · bytecodealliance/wasmtime

alexcrichton · 2023-02-24T21:28:51Z

A number of places in the x64 backend make use of 128-bit constants for various wasm SIMD-related instructions although most of them currently use the x64_xmm_load_const helper to load the constant into a register. Almost all xmm instructions, however, enable using a memory operand which means that these loads can be folded into instructions to help reduce register pressure. Automatic conversions were added for a VCodeConstant into an XmmMem value and then explicit loads were all removed in favor of forwarding the XmmMem value directly to the underlying instruction. Note that some instances of x64_xmm_load_const remain since they're used in contexts where load sinking won't work (e.g. they're the first operand, not the second for non-commutative instructions).

A number of places in the x64 backend make use of 128-bit constants for various wasm SIMD-related instructions although most of them currently use the `x64_xmm_load_const` helper to load the constant into a register. Almost all xmm instructions, however, enable using a memory operand which means that these loads can be folded into instructions to help reduce register pressure. Automatic conversions were added for a `VCodeConstant` into an `XmmMem` value and then explicit loads were all removed in favor of forwarding the `XmmMem` value directly to the underlying instruction. Note that some instances of `x64_xmm_load_const` remain since they're used in contexts where load sinking won't work (e.g. they're the first operand, not the second for non-commutative instructions).

jameysharp

I've read through both the code and test changes here and it all looks correct to me. But I'd like @elliottt to take a look too because I'm not super confident about spotting issues with load-sinking.

It's worth noting that we don't yet have SIMD support in the Cranelift fuzzer, so we can't rely on that to catch issues with this PR.

alexcrichton · 2023-02-27T15:07:54Z

True! Wasm-based fuzzing, however, should extensively test the simd proposal with differential execution against v8.

cfallin · 2023-02-27T19:44:41Z

To the fuzzing point, one request (if you haven't already planned it this way): could we hold off merging this until Mar 5, when the next release branch is cut, so we maximize fuzz time at 6 weeks rather than 2? I think it's great we're getting this but given history with SIMD loads I too am a bit apprehensive :-)

alexcrichton · 2023-02-27T20:06:41Z

I don't mind, but I'd also push back a bit in that #5849, #5841, and #5819 were sort of all much larger/riskier. I had good success uncovering semantic issues through fuzzing in not much time as well.

Would you prefer, though, to back out those PRs and re-land after the cutoff?

cfallin · 2023-02-27T20:12:08Z

Ah, yeah, that's a fair point that my request isn't really consistent, and I hadn't connected the dots; given those are already in, I retract my request as I don't think this is any more risk in comparison.

elliottt

This seems good to me!

The problems we had last time with aligned loads weren't related to constants, but arbitrary loads that we weren't guaranteed would be aligned. Given that and the requirements we put on function alignment, I'm pretty confident that this won't have the same problems that my changes last summer did :)

This catches a case that wasn't handled previously by bytecodealliance#5880 to allow a constant load to be folded into an instruction rather than forcing it to be loaded into a temporary register.

This catches a case that wasn't handled previously by #5880 to allow a constant load to be folded into an instruction rather than forcing it to be loaded into a temporary register.

github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Feb 24, 2023

jameysharp reviewed Feb 25, 2023

View reviewed changes

alexcrichton requested a review from elliottt February 27, 2023 18:44

elliottt approved these changes Feb 27, 2023

View reviewed changes

alexcrichton added this pull request to the merge queue Feb 27, 2023

Merged via the queue into bytecodealliance:main with commit f2dce81 Feb 27, 2023

alexcrichton deleted the sink-const-loads-in-sse-insts branch February 27, 2023 22:41

alexcrichton mentioned this pull request Mar 1, 2023

x64: Improve codegen for i8x16.shr_u #5906

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

x64: Sink constant loads into xmm instructions#5880

x64: Sink constant loads into xmm instructions#5880
alexcrichton merged 1 commit intobytecodealliance:mainfrom
alexcrichton:sink-const-loads-in-sse-insts

alexcrichton commented Feb 24, 2023

Uh oh!

jameysharp left a comment

Uh oh!

alexcrichton commented Feb 27, 2023

Uh oh!

cfallin commented Feb 27, 2023

Uh oh!

alexcrichton commented Feb 27, 2023

Uh oh!

cfallin commented Feb 27, 2023

Uh oh!

elliottt left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

alexcrichton commented Feb 24, 2023

Uh oh!

jameysharp left a comment

Choose a reason for hiding this comment

Uh oh!

alexcrichton commented Feb 27, 2023

Uh oh!

cfallin commented Feb 27, 2023

Uh oh!

alexcrichton commented Feb 27, 2023

Uh oh!

cfallin commented Feb 27, 2023

Uh oh!

elliottt left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants