Use push for 8/12 byte struct args on x86
#65387
Tagging subscribers to this area: @JulieLeeMSFT
Issue Details
Testing CI. Highlight:
d386313 to a1a8988
Two changes:
1) Outline cases for GC pointers from "genPutStructArgStk" into their own functions. Helps with readability, will be used in the substantive portion of the change.
2) Do not use "Kind::Push" for a case of a struct less than 16 bytes in size, that does not have GC pointers, on x86. With this, we will now always call "genStructPutArgUnroll" for the "Unroll" kind. This means "genAdjustStackForPutArgStk" has to special-case that. This will go away in the final version of the change.
No diffs on x86 or Unix-x64 (the only two affected platforms).
We're not using XMMs on the "push" path.
a1a8988 to afee491
push for 8/12 byte struct args on x86
It is smaller and a little faster than using an XMM register to first load and then store the struct.
afee491 to 1b59ae9
Not sure what's up with SPMI in CI. Will assume the Windows x64 timeout is not related (it does not use @dotnet/jit-contrib some nice CQ improvements for x86 (as always, because I need them for another change).
From unix-arm logs, seems some errors:
Can you double check on your local machine? For x86, they likely have timed out while producing .dasm files.
Could you paste the sample diff?
Nice refactoring. Changes LGTM except the failure on superpmi linux-arm that you want to double check.
@kunalspathak Thank you for pointing that failure out, I did unfortunately miss it in the logs. It does not seem related: no ARM code paths were modified, and I can reproduce it both with and without this change. It sure does look curious though: we're OOMing the Jit (2GB+ memory consumption and then an error in the CRT is hit).
They are all like this:
  sub esp, 20
- vzeroupper
  xor eax, eax
I suppose the better wording for this would have been that "LSRA was over-reporting the use of XMM registers".
Where do you see that? In the logs?
No, from locally replaying the
Does it repro without your changes too?
As I noted, yes. Edit: I suppose in the CI log it also says both the baseline and diff Jits encountered the same error:
/azp run Fuzzlyn, Antigen
Azure Pipelines successfully started running 2 pipeline(s).
Categorizing fuzzer failures. It seems Fuzzlyn is timing out pretty consistently on Linux ARM, and we have one failure there:
That's pre-existing - seen in an earlier run (logs link). Antigen on different targets has quite a few "uncategorized" issues (guessing these are for "wrong result" cases?), but the only x86 failure looks like #64787, and is an inlining assert (that this backend-only change could not have caused, I suppose).
LGTM. Thanks!
@SingleAccretion - Keep an eye on the superpmi-replay run that will kick off on your commit to make sure there are no errors.
Improvements - dotnet/perf-autofiling-issues#3651
Currently, for code like the following:
On x86 we generate the following:
While we could generate:
Which is both a little faster (ad hoc benchmarking on my machine shows ~10% improvement) and smaller. This change does just that, using push [addr] for 8 and 12 byte structs (the latter made the code simple).

To achieve that, some code refactoring was undertaken: we already had a path that did what we needed, but it was only used for structs with GC pointers. I deleted the double meaning of Kind::Push - for TYP_STRUCT it now means only one thing, that is genStructPutArgPush, while before this change it could also go down the "unroll" path (for the very specific cases of 8 and 12 byte structs that we are moving to using pushes).

As a cleanup item, PartialRepInstr kind was added, to represent the algorithm we use for storing structs with GC pointers on x64 Unix.

There will be two types of diffs with this change: first the one shown above, second - removal of unnecessary vzeroupper instructions that were being generated because LSRA was allocating an unnecessary XMM register for Kind::Push PUTARG_STKs with GC pointers.

Diffs - x86 only (no diffs on other targets as expected). All regressions are alignment artifacts.
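
For readers unfamiliar with the JIT, the selection rule described above can be sketched roughly as follows. This is a simplified illustration and not the actual JIT code: the enum PutArgStkKind and the functions chooseKind and pushOffsets are hypothetical, and only the kind names Push, Unroll and PartialRepInstr come from the description. It assumes struct sizes are multiples of 4 bytes and that any non-x86 target stands for x64 Unix.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the PUTARG_STK "kind"; the real JIT has more
// kinds (for example for large structs) that are not modeled here.
enum class PutArgStkKind { Push, Unroll, PartialRepInstr };

// Assumed selection rule for a struct argument passed on the stack.
PutArgStkKind chooseKind(std::size_t size, bool hasGCPointers, bool isX86)
{
    if (hasGCPointers)
    {
        // Per the description: x86 pushes structs with GC pointers, while
        // x64 Unix uses the algorithm named PartialRepInstr.
        return isX86 ? PutArgStkKind::Push : PutArgStkKind::PartialRepInstr;
    }
    if (isX86 && (size == 8 || size == 12))
    {
        // The new case: 8 and 12 byte structs without GC pointers.
        return PutArgStkKind::Push;
    }
    // Simplification: everything else is treated as the "unroll" path.
    return PutArgStkKind::Unroll;
}

// Byte offsets of the 4-byte slots in the order they would be pushed.
// The stack grows downward, so the highest slot goes first and the
// struct ends up laid out in memory order at the new stack pointer.
std::vector<std::size_t> pushOffsets(std::size_t size)
{
    std::vector<std::size_t> offsets;
    for (std::size_t end = size; end >= 4; end -= 4)
    {
        offsets.push_back(end - 4);
    }
    return offsets;
}
```

Under these assumptions, chooseKind(12, false, true) selects Push, and pushOffsets(12) yields the offsets 8, 4, 0, which corresponds to three push [addr + offset] instructions.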