Skip to content

Conversation

MichalStrehovsky
Copy link
Member

@MichalStrehovsky MichalStrehovsky commented Feb 11, 2025

This changes interface calls when CFG is enabled to use a resolve helper instead of the stub dispatcher. I've also extended the CFG testing a bit to cover more of interface calls in the smoke test.

static int Call(IFoo f, int x, int y) => f.Call(x, y);

Before this change:

00007FF605971D50  push        rbx  
00007FF605971D51  sub         rsp,20h  
00007FF605971D55  mov         qword ptr [rsp+30h],rcx  
00007FF605971D5A  mov         r10d,edx  
00007FF605971D5D  mov         ebx,r8d  
00007FF605971D60  mov         rcx,qword ptr [__InterfaceDispatchCell_repro_Program_IFoo__Call_repro_Program__Call (07FF6059BEBD0h)]  
00007FF605971D67  call        qword ptr [__guard_check_icall_fptr (07FF6059CD558h)]  
00007FF605971D6D  mov         rax,rcx  
00007FF605971D70  mov         rcx,qword ptr [rsp+30h]  
00007FF605971D75  mov         r8d,ebx  
00007FF605971D78  mov         edx,r10d  
00007FF605971D7B  lea         r10,[__InterfaceDispatchCell_repro_Program_IFoo__Call_repro_Program__Call (07FF6059BEBD0h)]  
00007FF605971D82  call        rax  
00007FF605971D84  nop  
00007FF605971D85  add         rsp,20h  
00007FF605971D89  pop         rbx  
00007FF605971D8A  ret  

After this change:

00007FF704181CF0  sub         rsp,28h  
00007FF704181CF4  lea         r10,[__InterfaceDispatchCell_repro_Program_IFoo__Call_repro_Program__Call (07FF7041CEBD0h)]  
00007FF704181CFB  call        RhpResolveInterfaceMethodFast (07FF703FFF5A0h)  
00007FF704181D00  call        qword ptr [__guard_dispatch_icall_fptr (07FF7041DD560h)]  
00007FF704181D06  nop  
00007FF704181D07  add         rsp,28h  
00007FF704181D0B  ret  

Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@MichalStrehovsky
Copy link
Member Author

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky
Copy link
Member Author

For this not to be a throughput regression, we need to figure out a clean way to implement the hack in last two commits.

I ran JSON serialization with/without this PR. Having to preserve argument registers before calling the resolve helper costs us about 1% in throughput.

PR-hack is the hack where we allow RyuJIT to assume argument registers are not clobbered. We better not take a GC in the slow path.
PR is this PR without that hack.
Baseline is the vanilla CFG build.

Style Attemp1 ms Attemp2 ms Attemp3 ms Attemp4 ms Attemp5 ms
PR-hack 1103 1105 1096 1102 1103
PR 1125 1126 1139 1127 1134
Baseline 1112 1105 1111 1104 1106

Do we have a way to do this hack cleanly that doesn't involve building a new universal transition?

@MichalStrehovsky
Copy link
Member Author

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky
Copy link
Member Author

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky MichalStrehovsky marked this pull request as ready for review February 18, 2025 07:17
@Copilot Copilot AI review requested due to automatic review settings February 18, 2025 07:17
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 3 out of 22 changed files in this pull request and generated no comments.

Files not reviewed (19)
  • src/coreclr/inc/corinfo.h: Language not supported
  • src/coreclr/inc/jiteeversionguid.h: Language not supported
  • src/coreclr/inc/jithelpers.h: Language not supported
  • src/coreclr/jit/codegencommon.cpp: Language not supported
  • src/coreclr/jit/gentree.cpp: Language not supported
  • src/coreclr/jit/gentree.h: Language not supported
  • src/coreclr/jit/lower.cpp: Language not supported
  • src/coreclr/jit/morph.cpp: Language not supported
  • src/coreclr/jit/targetamd64.h: Language not supported
  • src/coreclr/jit/targetarm64.h: Language not supported
  • src/coreclr/nativeaot/Runtime/AsmOffsets.h: Language not supported
  • src/coreclr/nativeaot/Runtime/CMakeLists.txt: Language not supported
  • src/coreclr/nativeaot/Runtime/EHHelpers.cpp: Language not supported
  • src/coreclr/nativeaot/Runtime/StackFrameIterator.cpp: Language not supported
  • src/coreclr/nativeaot/Runtime/amd64/DispatchResolve.asm: Language not supported
  • src/coreclr/nativeaot/Runtime/amd64/UniversalTransition.asm: Language not supported
  • src/coreclr/nativeaot/Runtime/arm64/DispatchResolve.asm: Language not supported
  • src/coreclr/nativeaot/Runtime/arm64/UniversalTransition.asm: Language not supported
  • src/coreclr/nativeaot/Runtime/inc/rhbinder.h: Language not supported
Comments suppressed due to low confidence (1)

src/coreclr/tools/Common/JitInterface/CorInfoHelpFunc.cs:267

  • Ensure that the new helper CORINFO_HELP_INTERFACELOOKUP_FOR_SLOT is covered by tests to verify its behavior.
CORINFO_HELP_INTERFACELOOKUP_FOR_SLOT,  // Resolve a non-generic interface method from this pointer and dispatch cell

@MichalStrehovsky
Copy link
Member Author

@dotnet/ilc-contrib this is ready for review

@jakobbotsch who would be the best to review the JIT side?

@jakobbotsch
Copy link
Member

/azp run jit-cfg

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch
Copy link
Member

There's some more JIT work needed to properly report non callee saves after a call. I opened #113071 for it.
Still unsure if it will "just work" once the JIT reporting is right.

@jkotas
Copy link
Member

jkotas commented Mar 3, 2025

My general wondering is if the runtime would know to report and relocate the GC refs in the argument registers of the transition frame if GC happens while inside the new helper.

The runtime knows how to do it for the existing VSD helper that tailcalls, so it should be able to do it for helper that returns as well. The new helper uses the same GC reporting setup as the existing VSD helper that tailcalls.

In order for this to work, the JIT has to setup all arguments before calling the new helper. It does not seem to be happening in all cases. For example, I see the following codegen for a call from Dictionary.FindValue to System.Collections.Generic.IEqualityComparer<string>.Equals(string,string):

...
00007ff8`f8b2d475 4c8bd8          mov     r11,rax
00007ff8`f8b2d478 498b1424        mov     rdx,qword ptr [r12]
00007ff8`f8b2d47c 488bcf          mov     rcx,rdi
00007ff8`f8b2d47f e8bcd6825f      call    coreclr!JIT_InterfaceLookupForSlot (00007ff9`5835ab40)
00007ff8`f8b2d484 488bcf          mov     rcx,rdi <--- unnecessary duplication, but it does not hurt
00007ff8`f8b2d487 4c8bc6          mov     r8,rsi <--- this needs to be before JIT_InterfaceLookupForSlot 
00007ff8`f8b2d48a ff156008de5f    call    qword ptr [00007ff9`5890dcf0] // JIT_DispatchIndirectCall
00007ff8`f8b2d490 85c0            test    eax,eax
...

@jakobbotsch
Copy link
Member

In order for this to work, the JIT has to setup all arguments before calling the new helper. It does not seem to be happening in all cases. For example, I see the following codegen for a call from Dictionary.FindValue to System.Collections.Generic.IEqualityComparer<string>.Equals(string,string):

That's expected with how this is implemented in the JIT. The JIT just considers this helper to not trash the argument registers. It is free to leave any GC ref in those registers across the call. It is not considering the "resolve + dispatch" sequence as one unit that takes the arguments of the final call. (I am not sure if this would be feasible on arm64 that uses "resolve + validate + call" sequence).
What the JIT needs from the VM side here is that the VM side uses its reported GC information to update the argument registers of the transition frame (like it would already be doing for callee saves). Is it possible to make it do that?

@jkotas
Copy link
Member

jkotas commented Mar 3, 2025

What the JIT needs from the VM side here is that the VM side uses its reported GC information to update the argument registers of the transition frame (like it would already be doing for callee saves). Is it possible to make it do that?

Yes, I have pushed a commit to do that (some of the changes in that commit are quick hack that needs cleaning up if it works).

Track all integer registers for calls in `regPtrDsc`. This does not cost
any extra memory and it saves us from going back and forth between an
intermediate format. It also unblocks proper GC reporting for helper
calls that are GC reported with non-standard calling convention.
@dotnet dotnet deleted a comment Mar 3, 2025
@jakobbotsch jakobbotsch closed this Mar 3, 2025
@jakobbotsch jakobbotsch reopened this Mar 3, 2025
@jakobbotsch
Copy link
Member

/azp run jit-cfg

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@risc-vv
Copy link

risc-vv commented Jul 4, 2025

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

1 similar comment
@risc-vv
Copy link

risc-vv commented Jul 29, 2025

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

@jakobbotsch
Copy link
Member

Should we just take a simpler but more obviously correct fix here instead of trying to make the calling convention change work? It does not appear to be that simple.

I still wonder if it wouldn't be simpler/less risk to change the cached interface dispatch stubs to emit the call through the CFG dispatcher directly and avoid the more substantial changes in how the scheme works.

@MichalStrehovsky
Copy link
Member Author

I still wonder if it wouldn't be simpler/less risk to change the cached interface dispatch stubs to emit the call through the CFG dispatcher directly and avoid the more substantial changes in how the scheme works.

Would that help? Right now the interface call sequence is:

lea someReg, [DispatchCellInMutableMemory]
call qword ptr [DispatchCellInMutableMemory]

The call is indirected within the codegen so doing this in the stub that DispatchCellInMutableMemory points to would be too late because we already made a unverified call.

@jakobbotsch
Copy link
Member

Would that help? Right now the interface call sequence is:

lea someReg, [DispatchCellInMutableMemory]
call qword ptr [DispatchCellInMutableMemory]

The call is indirected within the codegen so doing this in the stub that DispatchCellInMutableMemory points to would be too late because we already made a unverified call.

Really? That would surprise me. JIT should make sure this goes through CFG. That should definitely be fixed.
What's the repro for this?

@MichalStrehovsky
Copy link
Member Author

Really? That would surprise me. JIT should make sure this goes through CFG. That should definitely be fixed.
What's the repro for this?

Ah, yeah, this goes through a CFG check. But your suggestion is to go through two CFG checks for a single interface call?

@jakobbotsch
Copy link
Member

Ah, yeah, this goes through a CFG check. But your suggestion is to go through two CFG checks for a single interface call?

Yes, it would end up going through two CFG checks. I don't know how bad the impact on performance would be.

@MichalStrehovsky
Copy link
Member Author

Ah, yeah, this goes through a CFG check. But your suggestion is to go through two CFG checks for a single interface call?

Yes, it would end up going through two CFG checks. I don't know how bad the impact on performance would be.

I can have a look at that next week. However, it looks like the resolve helper method would actually be a perf improvement from where we're now, so it's a bit more attractive (#112406 (comment)).

@jakobbotsch
Copy link
Member

Ah, yeah, this goes through a CFG check. But your suggestion is to go through two CFG checks for a single interface call?

Yes, it would end up going through two CFG checks. I don't know how bad the impact on performance would be.

I can have a look at that next week. However, it looks like the resolve helper method would actually be a perf improvement from where we're now, so it's a bit more attractive (#112406 (comment)).

I agree that resolve helper with different calling convention is likely the best way to do the dispatch when CFG is enabled, but at this point in the release cycle and based on the kind of issues we saw trying to make that work it seems like a lot of risk to me.

Maybe the resolve helper with standard calling convention will turn out to be better than double CFG approach. That should be less risky (but not quite as low risk as changing the CID stubs).

@jkotas
Copy link
Member

jkotas commented Aug 12, 2025

it seems like a lot of risk to me.

This change looks risky to me for .NET 10 at this point regardless of the implementation strategy.

@jkotas
Copy link
Member

jkotas commented Aug 12, 2025

Could you please resolve the conflicts and get this build again so that we can start iterating on it? (It can wait after .NET 10 snap.)

@risc-vv
Copy link

risc-vv commented Aug 17, 2025

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants