@tannergooding
Member

This resolves #117794

When doing the embedded masking/broadcast checks, we sometimes allow changing the base type on the presumption that we'd pick a different instruction. Even when we didn't get a new instruction, we still changed the base type, which caused a codegen bug.
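To illustrate why the base type matters even for bitwise operations, here is a minimal scalar model of merge-masking. It is not the repro from the issue and not JIT code; `Vec128` and `mergeMasked` are hypothetical names made up for this sketch. It only shows that the same mask selects different bytes depending on the element size implied by the base type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of a 128-bit vector (not a JIT type).
using Vec128 = std::array<std::uint8_t, 16>;

// Models merge-masking: mask bit i selects lane i of `result`;
// lanes whose mask bit is clear keep the bytes from `passthrough`.
// `elemSize` is the lane width in bytes implied by the base type.
Vec128 mergeMasked(const Vec128& result, const Vec128& passthrough,
                   unsigned mask, std::size_t elemSize)
{
    assert(elemSize == 4 || elemSize == 8); // only 32-bit and 64-bit lanes modeled

    Vec128 out = passthrough;
    for (std::size_t lane = 0; lane < out.size() / elemSize; ++lane)
    {
        if ((mask >> lane) & 1u)
        {
            std::memcpy(out.data() + lane * elemSize,
                        result.data() + lane * elemSize, elemSize);
        }
    }
    return out;
}
```

With mask `0x1`, treating the lanes as 32-bit copies only the low 4 bytes, while treating them as 64-bit copies the low 8 bytes, so a base type that disagrees with the selected instruction changes the result.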

To resolve this, there is now an assert to validate we got a different instruction for the alternative base type. This is achieved by ensuring the existing lookup support returns the instruction optimistically.

We then peephole this back to the original instruction in codegen if we don't end up using any of the embedded features that would require it, since that gives us the smaller emitter output.
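The flow described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual JIT code: the enums `Ins` and `BaseType` and the functions `lookupIns`, `retypeForEmbeddedFeature`, and `selectForEmit` are hypothetical stand-ins invented for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins for the JIT's instruction and base-type enums.
enum class Ins { And, AndD, AndQ }; // And = short form with no element size
enum class BaseType { Int, Long };

// Optimistic lookup: always report the element-size-specific form,
// so that different base types map to different instructions.
Ins lookupIns(BaseType baseType)
{
    return (baseType == BaseType::Long) ? Ins::AndQ : Ins::AndD;
}

// Changing the base type is only valid if it also changes the instruction.
BaseType retypeForEmbeddedFeature(BaseType oldType, BaseType newType)
{
    assert(lookupIns(oldType) != lookupIns(newType));
    return newType;
}

// Codegen-time peephole: fall back to the short form when no embedded
// feature (masking or broadcast) needs the size-specific instruction.
Ins selectForEmit(Ins ins, bool usesEmbeddedMask, bool usesEmbeddedBroadcast)
{
    if (usesEmbeddedMask || usesEmbeddedBroadcast)
    {
        return ins;
    }

    switch (ins)
    {
        case Ins::AndD:
        case Ins::AndQ:
            return Ins::And;
        default:
            return ins;
    }
}
```

In this model, `selectForEmit(Ins::AndQ, false, false)` yields `Ins::And`, while the same call with either flag set keeps `Ins::AndQ`. Keeping the lookup optimistic and deferring the downgrade to emit time is what lets the assert hold without giving up the shorter form.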

@Copilot Copilot AI review requested due to automatic review settings July 18, 2025 17:44
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 18, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR resolves a codegen bug (#117794) related to embedded masking/broadcast scenarios in hardware intrinsics. The issue occurred when the embedded masking/broadcast checks would change the base type to pick a different instruction, but sometimes the same instruction would be returned, leading to incorrect base type usage in code generation.

  • Introduces a new lookupIns method in CodeGen that handles instruction selection with embedded features and peephole optimization
  • Adds assertions to validate that instruction changes occur when expected for embedded features
  • Refactors instruction lookup calls throughout the codebase to use the new method

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| Runtime_117794.cs | Regression test case for the bug fix |
| Runtime_117794.csproj | Project file for the regression test |
| hwintrinsiccodegenxarch.cpp | Main implementation of new lookupIns method and refactored codegen |
| hwintrinsic.cpp | Updated instruction lookup to support optimistic EVEX instruction selection |
| hwintrinsic.h | Modified lookupIns signature to remove compiler parameter |
| instr.cpp | Removed embedded broadcast instruction mapping logic |
| gentree.cpp | Updated calls to use new instruction lookup method |
| codegenxarch.cpp | Updated instruction lookup calls |
| codegen.h | Updated method signatures |

Comments suppressed due to low confidence (2)

src/coreclr/jit/hwintrinsiccodegenxarch.cpp:433

  • The instruction name INS_movdqa32 is inconsistent with the EVEX naming pattern. It should be INS_vmovdqa32 to match the pattern used for other EVEX instructions in this switch statement.
                ins = INS_movdqa32;

src/coreclr/jit/hwintrinsiccodegenxarch.cpp:439

  • The instruction name INS_movdqu32 is inconsistent with the EVEX naming pattern. It should be INS_vmovdqu32 to match the pattern used for other EVEX instructions in this switch statement.
                ins = INS_movdqu32;

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding
Member Author

CC. @dotnet/jit-contrib, @jakobbotsch

@tannergooding
Member Author

tannergooding commented Jul 19, 2025

(Trying to minimize the throughput impact; the overall code isn't really changing.)

@tannergooding
Member Author

tannergooding commented Jul 19, 2025

Happy with the changes now. TP impact is minimized (+0.01% in the worst case), no regressions for any disasm output (only improvements), and the tests are passing as expected.

@tannergooding
Member Author

/azp run Fuzzlyn

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@saucecontrol saucecontrol left a comment

Nice change! I'll take another look at optimizing for broadcast size after this lands.

@tannergooding
Copy link
Member Author

/ba-g unrelated arm64 timeouts

@tannergooding tannergooding merged commit 9e606b5 into dotnet:main Jul 22, 2025
107 of 116 checks passed
@tannergooding tannergooding deleted the fix-117794 branch July 22, 2025 03:54
@github-actions github-actions bot locked and limited conversation to collaborators Aug 21, 2025

Development

Successfully merging this pull request may close these issues.

Bad codegen when EVEX embedded broadcast is combined with embedded masking on some bitwise instructions

3 participants