-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move return buffer handling from interop to the JIT #39294
Conversation
…tbuf arg after the `this` parameter, not before.
…hiscall on required platforms. Still need to reorder the emit so the arguments are emitted correctly.
… JIT can handle it.
…primitive-wrapper struct returning thiscall working.
…eems to optimize them correctly now. Also this enables the JIT to correctly handle native instance method calls that return single primitive field structs.
… since the arguments have already been reversed) on x86.
…a test in interop to validate.
…n for thiscall Reverse P/Invoke stubs, update the stub linker to handle the case where a return value would be an enregistered return on stdcall but is returned via a return buffer for thiscall.
…s a value type. Use a type handle instead of a method table. Don't check size for enums.
Looks like there's more work I need to do to support the EAX/EDX multireg return on Windows x86. I'm locally seeing asserts about how the GCInfo emitting doesn't know how to handle multireg struct returns on Windows x86 Debug (which is possibly causing the libraries test failures). And there's the other assert on Windows x86 Checked when crossgenning, so I need to figure that out as well. cc: @CarolEidt @echesakovMSFT if you could take a look when you have a chance and see if you can figure out what I'm missing/what I need to add or change to get this working, that would be great! |
I would avoid changing managed calling convention in this change, in particular for x86. I would limit this change to just changing handling of unmanaged calling conventions in the JIT. |
Ok. I'll try to limit this change to unmanaged calls only. |
I've tried to move this to only being applied to unmanaged calls, but there are a few places where I don't think I can easily ask "is this an unmanaged call?" when the JIT asks "how do I return this struct?" and as a result I'm ending up with the runtime in a bad state. Since I also don't know how to correctly emit the GCInfo for the alternative case of multireg returns on x86 on all platforms, I'm going to ask for help. @dotnet/jit-contrib any suggestions on how to handle the fact that unmanaged calls on x86 need to support multireg returns for 8-byte structs without causing the JIT to fail because we don't support multireg returns on x86 in the GCInfo encoder? We can't just have interop tell the JIT that the 8-byte struct is a long since in the ThisCall case, 8 byte structs are returned in a return buffer but longs are returned in a register. |
Signed-off-by: Jeremy Koritzinsky <[email protected]>
In #39294 (comment) we were discussing
|
For historic reasons, there are multiple (at least 4) partially overlapping calling convention enum definitions. Notice that
It does not. UNKNOWN is rejected by the JIT inside I do not believe that
Other runtime parts need to distinguish between unspecified unmanaged calling convention that is substituted with platform default and explicitly specified unmanaged calling convention. They use other calling convention enums for this some of the time (e.g.
If we want to switch the JIT/EE interface to use enum classes, it should be done as a separate style-only PR, accross all enums on JIT/EE interface. This PR is big enough as it is. |
@jkotas got it, thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is hard to see the whole picture for such a big change, the small parts look close to a merge state.
I left a few comments/questions.
src/coreclr/src/jit/compiler.cpp
Outdated
@@ -693,7 +742,7 @@ var_types Compiler::getArgTypeForStruct(CORINFO_CLASS_HANDLE clsHnd, | |||
assert(structSize != 0); | |||
|
|||
// Determine if we can pass the struct as a primitive type. | |||
// Note that on x86 we never pass structs as primitive types (unless the VM unwraps them for us). | |||
// Note that on x86 we only pass specific pointer-sized structs as primitives that the VM used to unwrap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will soon forget what VM used to unwrap, maybe delete an outdated reference and say something like:
// Note that on Windows we only pass specific pointer-sized structs that satisfy isTrivialPointerSizedStruct
checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the body says:
On Unix x86, always use a return buffer for structures.
Is it still true? What does this method return for a trivial struct on x86 unix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The native calling conventions always uses a return buffer for structures, even non-trivial ones, on x86 unix. For the managed calling convention, we'll continue to support the current system (4-byte structs returned in registers).
// Skip any user args that we've already processed. | ||
assert(userArgsToSkip <= argSigLen); | ||
argSigLen -= userArgsToSkip; | ||
for (unsigned i = 0; i < userArgsToSkip; i++, argLst = info.compCompHnd->getArgNext(argLst)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I see userArgsToSkip
can be only 0 or 1, do we need a loop here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop is moreso to future proof this so it works for any input, even though we currently only use 0 or 1. If you prefer, I can change this to be an if condition and assert that the value is 0 or 1.
@@ -2837,10 +2846,11 @@ void Compiler::fgInitArgInfo(GenTreeCall* call) | |||
{ | |||
maxRegArgs = 0; | |||
} | |||
|
|||
#ifdef UNIX_X86_ABI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was to preserve the previous behavior. We used to do this on both x86 platforms. For consistency with the previous behavior (since I don't have a unix x86 test setup and we don't run it in CI), I wanted to keep this block active for Unix x86.
If we use |
Signed-off-by: Jeremy Koritzinsky <[email protected]>
I've addressed all of the feedback excluding the conversation about "union vs side table". Does anyone have any more feedback for this PR before approval? |
Once all the PR feedback has been addressed and the outerloop test run is clean, I suggest you trigger basically every JIT and GC stress AzDO job. Have you analyzed JIT throughput impact (if any is expected)? Verified no asm diffs (if none are expected), or expected asm diffs? |
I don't expect any JIT throughput impact. I've verified that the only asm diffs are the expected asm diffs (mentioned above that the only ones are on x86 and due to the change in struct normalization). I'll trigger the stress jobs once the PR jobs are done to try to not overload the queues. |
/azp run runtime-coreclr outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
I've run the outerloop pipelines. I've validated that the jitstress failures match the same failures in master. The gcstress+jitstress failures are a little harder to validate, but it looks to me like they're the same failures that exist in master as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for your patience during this massive change.
I have one more request, that could be done as a separate PR, I can open an issue if you don't have time for it right now:
https://github.com/dotnet/runtime/blob/master/docs/design/coreclr/botr/clr-abi.md does not have a description (or link) for x86/arm abi, so it would not be fair to ask you to write it fully, but could you please expand your examples from the PR header and put them there with additional comments and examples with accent on new managed/native differences?
Managed->Native
On Windows non-ARM, the JIT will generate a struct return buffer argument passed as per the platform ABI for thiscall.
please add what happens for windows arm and not for thiscall.
I think I would prefer just to see a banch of example, with different struct sizes, with/without special arguments/profiler attached etc with comments where each argument goes on each platform.
I'll work on another change to update that doc with my learnings in another PR so I don't slow down merging this one in. Thanks! |
So far, when specific calling conventions have required different rules around return buffers than the managed calling conventions, the JIT has relied on the interop space to munge the IL stubs so the JIT doesn't have to handle the return buffer logic itself. Through the work for supporting
thiscall
in .NET Core 3.x and .NET 5, we've found that handling the return buffer logic in the interop space for anything more than the most trivial cases is prone to bugs and makes the logic hard to follow. In addition, handling the logic in the IL stubs precludesUnmanagedCallersOnly
methods that might need return buffers that would be provided by the interop system from not needing an IL stub.This PR moves the return buffer handling into the JIT so the interop space doesn't need to know when to create a return buffer nor does it need to know how to munge the IL to correctly create return buffers in the managed->native and native->managed directions.
Fixes #12375
The general logic of this PR goes as follows:
General return buffer handling
On Unix x86, always use a return buffer for structures. On Windows x86, use the JITs support for unwrapping trivial structs to enregister <=4-byte structs for non-thiscall scenarios. Add in EAX/EDX multireg return support for 8-byte structures on Windows x86 when not in the thiscall scenario.
Managed->Native
On Windows non-ARM, the JIT will generate a struct return buffer argument passed as per the platform ABI for
thiscall
.Native->Managed
On platforms where we have a standard UMThunkStub (non-Windows x86), we pass the unmanaged calling convention to the JIT.
On Windows non-ARM, the JIT will generate a return buffer for
thiscall
methods.Since Windows x86 has a special stub linker, we update the
thiscall
handling there. Since the current implementation tries to surfacethiscall
asstdcall
for the majority of the thunk by swapping around arguments, I augment that support by adding support for copying an EAX/EDX multireg return value from the managed call to the native return buffer.