-
Notifications
You must be signed in to change notification settings - Fork 5.2k
JIT: Refine assert for RMW check in crc32 and hw intrinsic codegen #108750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The crc32 HW intrinsic is a 2 operand node that produces a register; however, the underlying instruction is an RMW instruction that destructively modifies `op1`. For that reason the codegen needs to issue a move of op1 into the target register before `crc32` is issued. There is an assert that this `mov` does not overwrite op2 in the process. When LSRA builds uses for the operands it needs to mark op2 as delay freed to ensure op2 and the target do not end up in the same register. However, due to optimizations LSRA may skip this marking when op1 is a last use. Thus we can end up hitting the assert. The most logical solution seems to be to actually ensure that op2 is always delay freed, such that it never shares the register with the target. However, the handling for other RMW nodes (like `GT_SUB`) does not have similar logic to this; instead it relies on the situation to not happen except for one particular case where it is ok: when `op1` and `op2` are actually the same local. This PR enhances the assert to match the checks that happen for other RMW nodes.
This matches the logic in runtime/src/coreclr/jit/codegenxarch.cpp Lines 1150 to 1167 in e32148a
To be honest I do not see the constraint properly encoded in LSRA for the common RMW nodes or for crc32, which is a bit scary. In other words, if we have IR like /- op1 LCL_VAR V01 last use
/- op2 LCL_VAR V02
GT_SUB then I think /- op1 LCL_VAR V01 last use rdx
/- op2 LCL_VAR V02 rcx
GT_SUB rcx however, this assignment would hit the assert in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refines an assertion in the CRC32 hardware intrinsic code generation to handle edge cases where LSRA optimizations may cause operand registers to overlap with the target register. The fix aligns the CRC32 assert with the handling used for other RMW (read-modify-write) operations.
Key changes:
- Enhanced the assertion in CRC32 codegen to allow the case where both operands are the same local variable
- Extracted and reused common logic for checking if two trees represent the same local variable
- Simplified existing duplicate assertion logic in binary operation codegen
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
src/coreclr/jit/hwintrinsiccodegenxarch.cpp | Enhanced CRC32 assertion to include same-local-variable check |
src/coreclr/jit/codegenxarch.cpp | Added genIsSameLocalVar helper function and simplified binary operation assertion |
src/coreclr/jit/codegen.h | Added declaration for the new genIsSameLocalVar helper function |
Co-authored-by: Copilot <[email protected]>
@tannergooding can you please take another look at this? Also cc @dotnet/jit-contrib and PTAL @amanasifkhalid |
/ba-g Out of space infra failure |
The crc32 HW intrinsic is a 2 operand node that produces a register; however, the underlying instruction is an RMW instruction that destructively modifies
op1
. For that reason the codegen needs to issue a move of op1 into the target register beforecrc32
is issued.There is an assert that this
mov
does not overwrite op2 in the process. When LSRA builds uses for the operands it needs to mark op2 as delay freed to ensure op2 and the target do not end up in the same register. However, due to optimizations LSRA may skip this marking when op1 is a last use. Thus we can end up hitting the assert.The most logical solution seems to be to actually ensure that op2 is always delay freed, such that it never shares the register with the target. However, the handling for other RMW nodes (like
GT_SUB
) does not have similar logic to this; instead it relies on the situation to not happen except for one particular case where it is ok: whenop1
andop2
are actually the same local. This PR enhances the assert to match the checks that happen for other RMW nodes.Also enhance a similar assert for arm64 HW intrinsics.
Fix #108601
Fix #117612