-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Ensure that we lookup the correct instruction for embedded masking/broadcast scenarios #117828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR resolves a codegen bug (#117794) related to embedded masking/broadcast scenarios in hardware intrinsics. The issue occurred when the embedded masking/broadcast checks would change the base type to pick a different instruction, but sometimes the same instruction would be returned, leading to incorrect base type usage in code generation.
- Introduces a new
lookupInsmethod inCodeGenthat handles instruction selection with embedded features and peephole optimization - Adds assertions to validate that instruction changes occur when expected for embedded features
- Refactors instruction lookup calls throughout the codebase to use the new method
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| Runtime_117794.cs | Regression test case for the bug fix |
| Runtime_117794.csproj | Project file for the regression test |
| hwintrinsiccodegenxarch.cpp | Main implementation of new lookupIns method and refactored codegen |
| hwintrinsic.cpp | Updated instruction lookup to support optimistic EVEX instruction selection |
| hwintrinsic.h | Modified lookupIns signature to remove compiler parameter |
| instr.cpp | Removed embedded broadcast instruction mapping logic |
| gentree.cpp | Updated calls to use new instruction lookup method |
| codegenxarch.cpp | Updated instruction lookup calls |
| codegen.h | Updated method signatures |
Comments suppressed due to low confidence (2)
src/coreclr/jit/hwintrinsiccodegenxarch.cpp:433
- The instruction name INS_movdqa32 is inconsistent with the EVEX naming pattern. It should be INS_vmovdqa32 to match the pattern used for other EVEX instructions in this switch statement.
ins = INS_movdqa32;
src/coreclr/jit/hwintrinsiccodegenxarch.cpp:439
- The instruction name INS_movdqu32 is inconsistent with the EVEX naming pattern. It should be INS_vmovdqu32 to match the pattern used for other EVEX instructions in this switch statement.
ins = INS_movdqu32;
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
ad988b1 to
4c3aaa6
Compare
|
CC. @dotnet/jit-contrib, @jakobbotsch |
69215a9 to
9b64f49
Compare
1851a4f to
cd70825
Compare
cd70825 to
bbedc8b
Compare
…oadcast scenarios
|
(Trying to minimize the throughput impact, the overall code isn't really changing) |
bbedc8b to
0df238f
Compare
|
Happy with the changes now. TP impact is minimized (+0.01% in the worst case), no regressions for any disasm output (only improvements), and the tests are passing as expected. |
|
/azp run Fuzzlyn |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice change! I'll take another look at optimizing for broadcast size after this lands.
|
/ba-g unrelated arm64 timeouts |
This resolves #117794
When doing the embedded masking/broadcast checks, we sometimes allow for changing the base type under the presumption we'd pick a different instruction. Even if we didn't get a new instruction, we'd still end up changing the base type which caused a codegen bug.
To resolve this, there is now an assert to validate we got a different instruction for the alternative base type. This is achieved by ensuring the existing lookup support returns the instruction optimistically.
We then peephole this back to the original instruction in codegen if we don't end up using any of the embedded features that would require it, since that allows us to have the smaller emitter output.