Skip to content

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Oct 10, 2024

The crc32 HW intrinsic is a 2 operand node that produces a register; however, the underlying instruction is an RMW instruction that destructively modifies op1. For that reason the codegen needs to issue a move of op1 into the target register before crc32 is issued.

There is an assert that this mov does not overwrite op2 in the process. When LSRA builds uses for the operands it needs to mark op2 as delay freed to ensure op2 and the target do not end up in the same register. However, due to optimizations LSRA may skip this marking when op1 is a last use. Thus we can end up hitting the assert.

The most logical solution seems to be to actually ensure that op2 is always delay freed, such that it never shares the register with the target. However, the handling for other RMW nodes (like GT_SUB) does not have similar logic to this; instead it relies on the situation to not happen except for one particular case where it is ok: when op1 and op2 are actually the same local. This PR enhances the assert to match the checks that happen for other RMW nodes.

Also enhance a similar assert for arm64 HW intrinsics.

Fix #108601
Fix #117612

The crc32 HW intrinsic is a 2 operand node that produces a register;
however, the underlying instruction is an RMW instruction that
destructively modifies `op1`. For that reason the codegen needs to issue
a move of op1 into the target register before `crc32` is issued.

There is an assert that this `mov` does not overwrite op2 in the
process. When LSRA builds uses for the operands it needs to mark op2 as
delay freed to ensure op2 and the target do not end up in the same
register. However, due to optimizations LSRA may skip this marking when
op1 is a last use. Thus we can end up hitting the assert.

The most logical solution seems to be to actually ensure that op2 is
always delay freed, such that it never shares the register with the
target. However, the handling for other RMW nodes (like `GT_SUB`) does
not have similar logic to this; instead it relies on the situation to
not happen except for one particular case where it is ok: when `op1` and
`op2` are actually the same local. This PR enhances the assert to match
the checks that happen for other RMW nodes.
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 10, 2024
@jakobbotsch
Copy link
Member Author

This matches the logic in genCodeForBinary from here:

#ifdef DEBUG
unsigned lclNum1 = (unsigned)-1;
unsigned lclNum2 = (unsigned)-2;
GenTree* op1Skip = op1->gtSkipReloadOrCopy();
GenTree* op2Skip = op2->gtSkipReloadOrCopy();
if (op1Skip->OperIsLocalRead())
{
lclNum1 = op1Skip->AsLclVarCommon()->GetLclNum();
}
if (op2Skip->OperIsLocalRead())
{
lclNum2 = op2Skip->AsLclVarCommon()->GetLclNum();
}
assert(GenTree::OperIsCommutative(oper) || (lclNum1 == lclNum2));
#endif

To be honest I do not see the constraint properly encoded in LSRA for the common RMW nodes or for crc32, which is a bit scary. In other words, if we have IR like

/- op1 LCL_VAR V01 last use
/- op2 LCL_VAR V02
GT_SUB

then I think op2 will have a use created that is not marked delay freed. That means the following register assignment is perfectly allowable:

/- op1 LCL_VAR V01 last use rdx
/- op2 LCL_VAR V02 rcx
GT_SUB rcx

however, this assignment would hit the assert in genCodeForBinary. The only reason this does not happen seems to be due to preferencing making it so that LSRA ends up not picking this assignment.

@Copilot Copilot AI review requested due to automatic review settings August 6, 2025 15:17
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refines an assertion in the CRC32 hardware intrinsic code generation to handle edge cases where LSRA optimizations may cause operand registers to overlap with the target register. The fix aligns the CRC32 assert with the handling used for other RMW (read-modify-write) operations.

Key changes:

  • Enhanced the assertion in CRC32 codegen to allow the case where both operands are the same local variable
  • Extracted and reused common logic for checking if two trees represent the same local variable
  • Simplified existing duplicate assertion logic in binary operation codegen

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/coreclr/jit/hwintrinsiccodegenxarch.cpp Enhanced CRC32 assertion to include same-local-variable check
src/coreclr/jit/codegenxarch.cpp Added genIsSameLocalVar helper function and simplified binary operation assertion
src/coreclr/jit/codegen.h Added declaration for the new genIsSameLocalVar helper function

@jakobbotsch jakobbotsch changed the title JIT: Refine assert for RMW check in crc32 codegen JIT: Refine assert for RMW check in crc32 and hw intrinsic codegen Aug 6, 2025
@jakobbotsch
Copy link
Member Author

@tannergooding can you please take another look at this?

Also cc @dotnet/jit-contrib and PTAL @amanasifkhalid

@jakobbotsch
Copy link
Member Author

/ba-g Out of space infra failure

@jakobbotsch jakobbotsch merged commit c993ba2 into dotnet:main Aug 7, 2025
103 of 105 checks passed
@jakobbotsch jakobbotsch deleted the fix-108601 branch August 7, 2025 14:19
@github-actions github-actions bot locked and limited conversation to collaborators Sep 7, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

3 participants