-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: emit vzeroupper before calls to the bulk write barrier helper #106908
Conversation
The helper uses SSE2, so we need to take care to avoid AVX-SSE transition penalties. Fixes dotnet#106679.
@tannergooding @EgorBo PTAL Will backport to .Net 9. Will also look for other helpers that need similar treatement. |
src/coreclr/jit/gentree.cpp
Outdated
@@ -2050,6 +2050,13 @@ bool GenTreeCall::NeedsVzeroupper(Compiler* comp) | |||
} | |||
} | |||
|
|||
// Other special cases | |||
// | |||
if (!needsVzeroupper && IsHelperCall(comp, CORINFO_HELP_BULK_WRITEBARRIER)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be moved under CT_HELPER
since IsHelperCall(...)
requires gtCallType == CT_HELPER
?
-- Not necessary, just makes it slightly more "pay for play"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That switch is guarded with IsPInvoke
but this helper call is not a pinvoke -- it has no managed counterpart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, that makes sense.
I think we may need to ensure this is guarded under if (comp->canUseVexEncoding())
then; otherwise lsraxarch
will mark us as having a call needing vzeroupper
(see SetContainsCallNeedingVzeroupper
) and that will cause a vzeroupper
to be emitted when preserving callee saved float registers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this
meaning this new bit, or the whole method...?
If we can't use vex then it seems like we can just immediately return false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can't use vex then it seems like we can just immediately return false?
Right.
this meaning this new bit, or the whole method...?
The new bit in particular, but we already have the existing if (IsPInvoke() && comp->canUseVexEncoding())
, so we can probably just do something like this:
if (!comp->canUseVexEncoding())
{
return false;
}
if (IsPInvoke())
{
// ...
}
if (!needsVzeroupper && IsHelperCall(comp, CORINFO_HELP_BULK_WRITEBARRIER))
{
// ...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI diffs show a lot more additions than I'd expect, will need to generate them locally and review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These all look expected at a glance. vzeroupper
is 3 bytes for any method that uses such a CORINFO_HELP_BULK_WRITEBARRIER
In the case that the method contains a call needing vzeroupper
(i.e. contains a CORINFO_HELP_BULK_WRITEBARRIER
call) then we have two scenarios:
A) The method uses 256-bit or greater AVX instructions
B) The method uses 128-bit AVX or no AVX instructions
For scenario A, we need to emit a vzeroupper
prior to each CORINFO_HELP_BULK_WRITEBARRIER
call
For scenario B, we optimize to only emitting a single vzeroupper
call in the function prologue
Looking at the diffs, the vast majority of cases are those with just 1 new vzeroupper
. There's a few larger functions that have multiple bulk write barriers and use floating-point/simd, and so have many new vzeroupper
. -- 17 write barriers is the most for a non-test scenario, 67 is then an outlier for a ComInterfaceGenerator.Unit.Tests.CompileFails
iterator method
For .NET 9, this is probably the "right" fix and while it increases size, vzeroupper
is supposed to be "zero-cost" and handled by the register renamer. So it shouldn't negatively impact things.
In .NET 10, we could look at adding an explicit AVX path to CORINFO_HELP_BULK_WRITEBARRIER
in order to avoid the need to emit vzeroupper
, which may be desirable.
/backport to release/9.0 |
Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/10548313343 |
The helper uses SSE2, so we need to take care to avoid AVX-SSE transition penalties.
Fixes #106679.