-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT ARM32: Fix passing odd sized structs from arbitrary sources #70075
Conversation
ARM32 ABI allows passing structs in register even when their sizes are not divisible by 4. This means we sometimes need to pass 3 bytes in the last register. The JIT would not handle this when the source was an arbitrary memory location (this would require multiple loads and shifts). The fix is to just force a copy into the local stack frame for this case. Fix dotnet#61168
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsARM32 ABI allows passing structs in register even when their sizes are Fix #61168
|
src/coreclr/jit/morph.cpp
Outdated
// ARM32 has an edge case where we need to pass 3 bytes in | ||
// the last register, which would require two loads and a | ||
// shift to avoid loading too much if the source is a | ||
// potentially arbitrary address. Handle the edge case here | ||
// by copying into the local stack frame first so we can | ||
// use a register wide load. | ||
if (!arg.AbiInfo.IsSplit() && (((arg.AbiInfo.NumRegs * REGSIZE_BYTES) - passingSize) == 1)) | ||
{ | ||
makeOutArgCopy = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I would have expected this code to be in ArgsComplete
, where we force copies for all other multi-reg enabled platforms for the same reason (of fgMorphMultiregStructArg
not being able to handle them):
runtime/src/coreclr/jit/morph.cpp
Lines 958 to 961 in f7a32c0
// TODO-Arm: This optimization is not implemented for ARM32 | |
// so we skip this for ARM32 until it is ported to use RyuJIT backend | |
// | |
else if (argx->OperGet() == GT_OBJ) |
The code talks about this being an "optimization", not sure why that is (it is required for correctness in a number of cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can move it there, though I don't see why this has anything to do with multi-reg. I suspect this is handled for the single-reg case by
if (structSize < TARGET_POINTER_SIZE)
{
makeOutArgCopy = true;
}
right below the code I added. I just did not want to remove that for this surgical fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me get this fix in and then look at generalizing the logic in ArgsComplete
to work for ARM as well in a follow-up. The current code there looks needlessly conservative (for all platforms).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I suppose it would only be conservative for ARM, that's probably why it's not enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me get this fix in and then look at generalizing the logic in ArgsComplete to work for ARM as well in a follow-up. The current code there looks needlessly conservative (for all platforms).
I would be fine with that. I agree the logic there is too conservative for platforms that support precise PUTARG_STK
s (which is ARM and ARM64 only right now, plus Unix x64 after #68611).
Edit: note I tried the ArgsComplete
fix a few days ago, there were no diffs.
cc @dotnet/jit-contrib PTAL @BruceForstall |
ARM32 ABI allows passing structs in register even when their sizes are
not divisible by 4. This means we sometimes need to pass 3 bytes in the
last register. The JIT would not handle this when the source was an
arbitrary memory location (this would require multiple loads and
shifts). The fix is to just force a copy into the local stack frame for
this case.
Fix #61168
Fix #68837
Fix #70042