Add debug information for runtime async methods #120303

jakobbotsch · 2025-10-01T19:35:35Z

Add new JIT-EE API to report back debug information about the generated state machine and continuations
Refactor debug info storage on VM side to be more easily extensible. The new format has either a thin or fat header. The fat header is used when we have either uninstrumented bounds, patchpoint info, rich debug info or async debug info, and stores the blob sizes of all of those components in addition to the bounds and vars. It is indicated by the first field (size of bounds) having value 0, which is an uncommon value for this field.
Add new async debug information to the storage on the VM side
Set target method desc for async resumption stubs, to be used for mapping from continuations back to the async IL function that it will resume.
Implement new format in R2R as well, bump R2R major version (might as well do this now as we expect to need to store async debug info in R2R during .NET 11 anyway)

dotnet-policy-service · 2025-10-01T19:36:36Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

src/coreclr/inc/cordebuginfo.h

jakobbotsch · 2025-10-02T10:20:48Z

src/coreclr/vm/debuginfostore.cpp

    CONTRACTL
    {
-        NOTHROW;
+        THROWS;


RestorePatchpointInfo is called from JitPatchpointWorker, which has STANDARD_VM_CONTRACT. I think throwing should be ok, since we only throw here on an internal inconsistency in the compressed data when encountered by NibbleReader. That's consistent with the other Restore routines and saves us having to write separate decoding routines for the patchpoint info.

src/coreclr/vm/debuginfostore.h

jakobbotsch · 2025-10-02T12:58:03Z

src/coreclr/vm/codeman.cpp

+    CompressDebugInfo::RestoreRichDebugInfo(
+        fpNew, pNewData,
+        pDebugInfo,
+        ppInlineTree, pNumInlineTree,
+        ppRichMappings, pNumRichMappings);
+
+    return TRUE;
+}


I figured I might as well hook this one up in ReadyToRunJitManager too since the format now technically allows for R2R images that contain the rich debug info, even if crossgen2 doesn't produce it.

…mental APIs in async tests

jkotas · 2025-10-07T04:00:54Z

src/coreclr/inc/cordebuginfo.h

+        uint32_t Offset;
+        // Index in continuation's object[] data where this variable's GC pointers are stored, or 0xFFFFFFFF
+        // if the variable does not have any GC pointers
+        uint32_t GCIndex;


Since we are emitting custom methodtables for the continuations, should we rather make the field layout flat and avoid these extra layers?

Right now Continuation is the same single type for all the uses:

runtime/src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs

Lines 75 to 104 in 5811908

internal sealed unsafe class Continuation
{
    public Continuation? Next;
    public delegate*<Continuation, Continuation?> Resume;
    public uint State;
    public CorInfoContinuationFlags Flags;

    // Data and GCData contain the state of the continuation.
    // Note: The JIT is ultimately responsible for laying out these arrays.
    // However, other parts of the system depend on the layout to
    // know where to locate or place various pieces of data:
    //
    // 1. Resumption stubs need to know where to place the return value
    // inside the next continuation. If the return value has GC references
    // then it is boxed and placed at GCData[0]; otherwise, it is placed
    // inside Data at offset 0 if
    // CORINFO_CONTINUATION_OSR_IL_OFFSET_IN_DATA is NOT set and otherwise
    // at offset 4.
    //
    // 2. Likewise, Finalize[Value]TaskReturningThunk needs to know from
    // where to extract the return value.
    //
    // 3. The dispatcher needs to know where to place the exception inside
    // the next continuation with a handler. Continuations with handlers
    // have CORINFO_CONTINUATION_NEEDS_EXCEPTION set. The exception is
    // placed at GCData[0] if CORINFO_CONTINUATION_RESULT_IN_GCDATA is NOT
    // set, and otherwise at GCData[1].
    //
    public byte[]? Data;
    public object?[]? GCData;

Have we done any modeling whether it is worth it to have the data in separate blocks from the continuation? The extra object headers and indirections are not free.

The design makes Continuation shape mostly opaque. Most helpers and the infrastructure do not dig into internals of Continuation. As long as JIT is self-consistent in terms of serialization/deserialization of locals, most other things are not concerned with the shape thus JIT can change the continuation shape, modulo R2R and debug interfaces.

There are just a few things that are meaningful to the infrastructure: the link to the Next continuation, the flags, and the locations of the return value or an exception, which the infrastructure needs when, upon completion of the callee, it places results into the caller's continuation before resuming it.
How most of the locals are stored is an opaque agreement between the method who serializes locals into a continuation and the method who deserializes - that is the same method.

Currently we use two arrays (array of bytes + array of objects) to serialize/deserialize locals. The biggest advantage is "time-to-market", obviously.
There are other advantages - like there is no type or GC layout generation. The shape of continuation is different for every await, so it is useful to only emit site-specific code, as we must do anyways, but not types/layouts. There could be more than one await per method so it could add up. We optimize for suspensions never happening though, statistically speaking, so slightly less efficient format which has fewer upfront/static requirements is attractive.

There are disadvantages:

it is not the most compact format. Think of capturing just one int and one object.

return values that happen to be object-containing structs, need to be boxed.

Also it could be difficult for external introspection, such as a debugger.
For example structs containing a mix of int and object fields get their fields stored in different arrays accordingly. It is not a problem for JIT to reconstitute such struct, but it could be an inconvenience for other observers.

Anyways. The API here tries to support the current format, but leave the door open for future changes.
I think it is good to not have too many parts changing at once, unless it is blocking or costly to change later, so I think it is a good approach even if we think of tweaking the continuation format.

Since we are emitting custom methodtables for the continuations, should we rather make the field layout flat and avoid these extra layers?

Yes, once/if #120411 makes it in, we only need Offset here.

Originally I didn't add GCIndex for precisely the reason that I expected we would do that. However @tommcdon was playing around with the debug info earlier and wondered about the data he was seeing, and since it's not a large change I decided to just add it for now, with the catch that it might change in the future.

I agree with all of @VSadov's points, but we probably need to measure it. I also am somewhat worried about creating custom MethodTable for every continuation up front. Although the actual creation in #120411 is lightweight so maybe it will not be a problem.
In the end we could even take a hybrid approach with the byte[]/object[] version used for tier0/debug and the flat versions used for tier1. Of course that makes it more complicated for everyone since now they need to be aware of two possible formats.

For example structs containing a mix of int and object fields get their fields stored in different arrays accordingly. It is not a problem for JIT to reconstitute such struct, but it could be an inconvenience for other observers.

Is there a design for how the debugger could reconstitute such a struct? It doesn't seem like the current debug info is sufficiently powerful to represent that.

The API here tries to support the current format, but leave the door open for future changes.

If you'd like to leave the door open to store locals inline within the Continuation rather than indirected, perhaps represent the variable location as:

enum Base
{
    ContinuationObj = 1, // variable stored at continuation + offset
    DataArray = 2,       // variable stored at continuation->Data + offset
    GCDataArray = 3      // variable stored at continuation->GCData + offset
}
Base OffsetBase;
uint32_t Offset;
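A minimal sketch of how a consumer might resolve a variable's address from such a (base, offset) pair. `ContinuationView` and `ResolveVarAddress` are hypothetical names introduced only for illustration, and the three pointers are assumed to point at the start of the continuation object, the Data payload and the GCData payload respectively. This is not the runtime's actual API.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the proposed enum; values taken from the suggestion above.
enum class OffsetBase : uint32_t
{
    ContinuationObj = 1,
    DataArray = 2,
    GCDataArray = 3,
};

// Hypothetical view of a continuation: start addresses of the three storage regions.
struct ContinuationView
{
    const uint8_t* Object;
    const uint8_t* Data;
    const uint8_t* GCData;
};

// Returns the address of the variable, or nullptr for an unknown base or missing region.
const uint8_t* ResolveVarAddress(const ContinuationView& view, OffsetBase base, uint32_t offset)
{
    const uint8_t* start = nullptr;
    switch (base)
    {
        case OffsetBase::ContinuationObj: start = view.Object; break;
        case OffsetBase::DataArray:       start = view.Data;   break;
        case OffsetBase::GCDataArray:     start = view.GCData; break;
        default:                          return nullptr;
    }
    return (start != nullptr) ? start + offset : nullptr;
}
```

With this shape, moving locals inline into the continuation would only change which base the JIT reports, not how consumers compute the address.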

Alternately changing the contract sometime in the next 6 months probably isn't too costly. Doing it close to .NET 11 ship or after .NET 11 ship would have additional burdens.

Is there a design for how the debugger could reconstitute such a struct? It doesn't seem like the current debug info is sufficiently powerful to represent that.

To replicate the way the JIT currently stores/restores these values, for a type of size S:

If Offset != UINT_MAX, take S bytes from Data starting from Offset

If GCIndex != UINT_MAX, fill in the GC pointers in ascending order of offset in the type, starting with the GC pointer at index GCIndex
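The two steps above can be sketched roughly as follows. This is an illustration only: `AsyncVarLocation`, `ReconstructValue` and the `gcOffsets` parameter are hypothetical, object references are modeled as plain `void*`, and the caller is assumed to already know the type's size and the offsets of its GC pointers from the type layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical mirror of the per-variable debug info discussed above.
struct AsyncVarLocation
{
    uint32_t Offset;   // offset into Data, or UINT32_MAX if nothing is stored there
    uint32_t GCIndex;  // first slot in GCData, or UINT32_MAX if no GC pointers
};

constexpr uint32_t NoLocation = UINT32_MAX;

// Rebuilds a value of `size` bytes. `gcOffsets` lists the byte offsets of the
// GC pointers inside the type in ascending order (assumed known from the type layout).
std::vector<uint8_t> ReconstructValue(
    const AsyncVarLocation& loc,
    size_t size,
    const std::vector<size_t>& gcOffsets,
    const std::vector<uint8_t>& data,
    const std::vector<void*>& gcData)
{
    std::vector<uint8_t> value(size, 0);

    // Step 1: copy the raw bytes from Data.
    if (loc.Offset != NoLocation && size > 0)
    {
        assert(loc.Offset + size <= data.size());
        std::memcpy(value.data(), data.data() + loc.Offset, size);
    }

    // Step 2: fill each GC pointer slot from consecutive GCData entries.
    if (loc.GCIndex != NoLocation)
    {
        for (size_t i = 0; i < gcOffsets.size(); i++)
        {
            assert(loc.GCIndex + i < gcData.size());
            assert(gcOffsets[i] + sizeof(void*) <= size);
            void* ref = gcData[loc.GCIndex + i];
            std::memcpy(value.data() + gcOffsets[i], &ref, sizeof(void*));
        }
    }
    return value;
}
```

Note that the sketch needs the GC pointer offsets from somewhere other than the debug info, which is the gap the question above is pointing at.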

If debuggers hard-code that algorithm it becomes a breaking change if the JIT ever wants to lay out the data differently. Is this an algorithm we want to set in stone or just a stop-gap for now? I haven't had a chance to get a good look at #120411 yet but it suggests our plans for field layout are still in flux.

In that algorithm above, do we have to worry about alignment padding or all the fields will be packed?

Is this an algorithm we want to set in stone or just a stop-gap for now?

At this point I am expecting/hopeful that #120411 makes it in. It makes things simpler -- a single offset and the type is just stored in the normal way at that offset in the continuation. But it has other implications for various components as @davidwrighton pointed out in that PR.

The current storage mechanism exists almost unchanged since the original prototype in 2023. It was the simplest thing I could think of that didn't require boxing all structs with object fields in them.

In that algorithm above, do we have to worry about alignment padding or all the fields will be packed?

Do you mean for step 2? The GCIndex encodes an index into the GCData array. If the value has N object refs in it, then GCData will store N object refs starting at that index. However, it will be up to the reconstruction code to figure out where those N object refs are stored in the value and to copy the GC refs from GCData across.

There can be alignment padding in Data to make sure value types are aligned properly but reconstruction doesn't need to worry about that.

I still have a little confusion about the current algorithm, but since it doesn't look like we'll be keeping it much longer no need for me to suss out the details :) I'm assuming the new continuation types will trigger a new round of updates here.

jkotas · 2025-10-07T05:48:42Z

it is not the most compact format. Think of capturing just one int and one object.

Right. It is kind of similar to how Async v1 used to work in .NET Framework where the state and Task were separate objects. We got a nice improvement in .NET Core by getting rid of the indirections and stuffing everything into one object.

costly to change later,

This is starting to establish contracts for diagnostics. Those are always costly to change later.

VSadov · 2025-10-07T07:22:28Z

Prior to Roslyn, if I remember correctly, the capture of some locals was into an array of objects, because display types were created too early when not all locals were known. And native CSC was capturing everything visible within containing { } scopes.
Roslyn moved async much further in lowering, so locals are known, and started using liveness analysis to see what possibly can live across an await (the rest can be ordinary locals).
It is a more compact design, although some scenarios had concerns with producing numerous display struct shapes.

Runtime async could tailor continuation shape to the corresponding await as well.

I think one possibility (just a rough example) - we could have ContinuationBase that has only what infrastructure needs - Next, Resume, Flags. And the local state would be stored in the derived Continuaton_SomeDisambiguationNumber. The locals could be stored as flat fields and fields could have name-mangling after the locals they represent - kind of like Roslyn does.
For such format storing just Offset for every IL local would be sufficient.

If we have something like this by the end of net11 GCIndex could be dropped.

I do not think we have a lot of choices right now, if we want to make progress. GCIndex is needed to parse the current format.

Other parts are less likely to change.

jakobbotsch · 2025-10-07T07:57:09Z

I think one possibility (just a rough example) - we could have ContinuationBase that has only what infrastructure needs - Next, Resume, Flags. And the local state would be stored in the derived Continuaton_SomeDisambiguationNumber. The locals could be stored as flat fields and fields could have name-mangling after the locals they represent - kind of like Roslyn does.
For such format storing just Offset for every IL local would be sufficient.

This is essentially what #120411 is doing, although there is no dynamic description of fields or anything like that. The JIT just hands the VM a map of the fields that have object refs in them.

noahfalk · 2025-10-08T00:28:45Z

src/coreclr/vm/codeman.cpp

+    return TRUE;
+}
+
+BOOL EEJitManager::GetAsyncDebugInfo(


In order to invoke GetAsyncDebugInfo we'd first need to create a DebugInfoRequest which requires knowing the starting address of the method. Presumably an async stackwalking algorithm is starting from a Continuation object to represent an async frame. Is there a plan yet for how we resolve from Continuation to code start address in order to get at any of this other method data?

Currently, the plan is that looking up the call address goes something like:

Continuation.Resume contains a pointer to an IL stub. Resolve that function pointer into the MethodDesc*.

The IL stub's resolver contains a pointer to the original async IL method's MethodDesc*

The MethodDesc* has the compressed async debug info, so access that (requires decompression)

Now use Continuation.State to map back to the IL offset of the call that resulted in the suspension point with that State

Finally use IL -> Native mappings to get a native IP
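The last two steps of this lookup can be sketched as below. It assumes the async debug info has already been decompressed into a table of IL offsets indexed by Continuation.State, plus ordinary IL -> native bounds. `MethodDebugInfoSketch` and `TryResolveSuspensionIP` are hypothetical names, and the first three steps (function pointer to MethodDesc to target method) are VM-specific and not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ILToNativeEntry
{
    uint32_t ILOffset;
    uint32_t NativeOffset;
};

// Hypothetical decompressed view of one method's debug info.
struct MethodDebugInfoSketch
{
    uintptr_t CodeStart = 0;                    // start address of this code version
    std::vector<uint32_t> SuspensionILOffsets;  // indexed by Continuation.State (assumption)
    std::vector<ILToNativeEntry> Bounds;        // IL -> native mappings
};

// Maps a state number to a native IP: State -> IL offset -> native offset -> IP.
bool TryResolveSuspensionIP(const MethodDebugInfoSketch& info, uint32_t state, uintptr_t* pIP)
{
    assert(pIP != nullptr);
    if (state >= info.SuspensionILOffsets.size())
        return false;

    uint32_t ilOffset = info.SuspensionILOffsets[state];
    for (const ILToNativeEntry& entry : info.Bounds)
    {
        if (entry.ILOffset == ilOffset)
        {
            *pIP = info.CodeStart + entry.NativeOffset;
            return true;
        }
    }
    return false;
}
```

The linear scan over the bounds, on top of the decompression that has to happen first, is the kind of cost that makes this path unattractive for regular stackwalking.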

There are concerns being raised about the performance of this if we are making the async stackwalking part of regular stackwalking. I think we can make the improvement you suggested above to store native IP instead. I also think we can store the State -> IP mapping in the compressed debug info in a way that it can be accessed in constant time without fully decompressing it. Then the process looks something like:

Same as above

Same as above

Map Continuation.State -> IP by accessing the debug info directly

That hopefully puts this resolution process in the same realm as normal unwinding data processing of the standard synchronous stackwalking. Or faster than that.

What does it mean to do the first two steps, going from the Continuation.Resume -> MethodDesc * of the IL stub, some map funcPtr -> MethodDesc ? Once having the MethodDesc to the stub do AsDynamicMethodDesc()->GetILStubResolver()->GetStubTargetMethodDesc() to get to the MethodDesc * of the underlying async method? Last step looks straightforward and fast, not sure about the first step or maybe there is a quick path to go from function pointer to MethodDesc*?

I think NonVirtualEntry2MethodDesc can be used to do the first mapping.

OK, there was some discussion above about storing the native IPs on the continuation for the suspension point and potentially the resumption point. I guess it has its pros and cons, like size increase, additional bookkeeping and state that potentially needs to be updated if the method gets changed in any way while there are continuations still around. Could an alternative be to store the MethodDesc representing the underlying async method directly in the continuation? If so, it would be fast to get hold of the MethodDesc * when having a continuation, and then use the state to get the native IP of the suspension and potentially resumption points representing that state. Having said that, maybe NonVirtualEntry2MethodDesc will be fast enough to go from a continuation to a native IP during stackwalk. We benchmark against a normal unwinder step getting to the next frame, so given that the continuation chain will be quicker to walk (like a shadow stack), it would be nice not to waste that perf doing extensive lookups to get the native IP of the suspension point.

I think you can construct benchmarks that show benefits of each scheme. It looks like a variant of the optimize for size vs. speed tradeoff.

What would it take to allow both schemes to co-exist? Obviously, the JIT would have to pass down whether to create the continuation optimized for size or speed - that's "just work". It may be interesting to think about the impact that this would have on async stackwalking and diagnostics.

(I am not asking to build this as part of this PR.)

I made a different comment below on a related topic around the impact on stackwalking and diagnostics, #120303 (comment). At least for EventPipe, we have external APIs and tooling that expect native IPs representing each frame in that stackwalk, meaning we would need to resolve the native IP for each async frame we report into an EventPipe callstack. I see a lot of value in being able to quickly get to the native IP represented by a continuation frame during normal stackwalk in a way that works with the high volume of events that could be generated by EventPipe. If we can't do that then we would be stuck using sync frames for all existing EventPipe, or doing whatever is needed to resolve async frames into native IPs, with an impact on stackwalk performance. If we would like to change any of this I imagine we would need a new version of the nettrace format that could encode stack traces differently, and tools would need to update to support the new nettrace version with updated stack metadata. Totally doable of course, but with a longer adaption tail and lots of more work to do. @noahfalk thoughts?

What would it take to allow both schemes to co-exist? Obviously, the JIT would have to pass down whether to create the continuation optimized for size or speed - that's "just work". It may be interesting to think about the impact that this would have on async stackwalking and diagnostics.

At a minimum, the places that dynamically need to access these things need to learn about the two ways to do that. That's during the main continuation dispatch loop where we access Continuation.Resume and Continuation.Flags. I think we would introduce

class ContinuationShared : Continuation
{
    public delegate*<Continuation, Continuation?> Resume;
    public uint State;
    public CorInfoContinuationFlags Flags;
}

class ContinuationUnique : Continuation
{
}

and either use virtual methods or (more likely, given our experience with GDV) guard on the base MT being either ContinuationShared or ContinuationUnique. Accessing the IP in the ContinuationUnique case would be simple (some indirections through the method table) while accessing the IP for the shared case needs some lookup scheme like we are discussing here. It will add some cost during dispatch, but hard to say if that would be noticeable.

Totally doable of course, but with a longer adaption tail and lots of more work to do.

Yeah, looking at the complications that this optimization would introduce, it does not seem to be worth it.

Totally doable of course, but with a longer adaption tail and lots of more work to do. @noahfalk thoughts?

Agreed. EventPipe expects stack traces are represented by an array of IPs + optionally some extra symbolic data in the trace that helps resolve those IPs into method names, IL offsets, source, etc. I am hoping we do a Continuation object -> IP conversion inside the runtime for each frame. Representing the stack trace differently from that would require a breaking change to the trace format and requires users to update their profilers which is not only work for us but also for the entire ecosystem.

It may be interesting to think about the impact that this would have on async stackwalking and diagnostics.

Using multiple discrete encoding schemes in different situations adds some complexity to the runtime code that has to process them + makes the data contracts more complex. It sounds like modest extra work that we could do if the performance gain was substantial enough to justify it.

src/coreclr/tools/Common/JitInterface/CorInfoTypes.cs

src/coreclr/inc/cordebuginfo.h

src/coreclr/jit/async.cpp

…ostics

rcj1 · 2025-10-14T02:32:29Z

The native offsets appear fine, however now pMD->GetNativeCode() doesn’t work, giving me an address that is about 0x1000 different from method start. However, in Windbg I am able to get the proper IP by going through the DAC, specifically by going through NativeCodeVersion::GetNativeCode().

Do you know why this is?

jkotas · 2025-10-14T03:21:42Z

The native offsets appear fine, however now pMD->GetNativeCode() doesn’t work anymore

One method can have multiple copies of native code due to code versioning (tiered compilation, etc.). pMD->GetNativeCode() will give you the most recent instance of the native code, but it may not match the native code that you are trying to map the offset for.

The correct way to do this is to go from IP to debug info like what DebugInfoManager::GetBoundariesAndVars does, and never roundtrip through MethodDesc since it may give you mismatched debug info.

(The root cause of the problem you are hitting may be something else, but this would become a problem eventually as well.)
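To illustrate why an IP-keyed lookup avoids the mismatch, here is a small sketch of a map from code ranges to per-version info. `CodeVersionInfo` and `CodeRangeMap` are hypothetical and do not reflect the runtime's actual code manager structures; the point is only that an IP identifies exactly one code version, while a method can own several.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical record for one native code version of a method.
struct CodeVersionInfo
{
    uintptr_t Start;
    uint32_t Size;
    int Tier;  // stands in for "the debug info that belongs to this version"
};

class CodeRangeMap
{
    std::map<uintptr_t, CodeVersionInfo> m_byStart;

public:
    void Add(const CodeVersionInfo& version)
    {
        assert(version.Size > 0);
        m_byStart[version.Start] = version;
    }

    // Finds the code version whose [Start, Start + Size) range contains ip.
    const CodeVersionInfo* FindByIP(uintptr_t ip) const
    {
        auto it = m_byStart.upper_bound(ip);
        if (it == m_byStart.begin())
            return nullptr;
        --it;
        const CodeVersionInfo& version = it->second;
        return (ip - version.Start < version.Size) ? &version : nullptr;
    }
};
```

If the same method has a tier-0 body and a tier-1 body registered, an IP inside the tier-0 body resolves to the tier-0 entry even after tier-1 has become the most recent version.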

rcj1 · 2025-10-14T13:09:00Z

The native offsets appear fine, however now pMD->GetNativeCode() doesn’t work anymore

One method can have multiple copies of native code due to code versioning (tiered compilation, etc.). pMD->GetNativeCode() will give you the most recent instance of the native code, but it may not match the native code that you are trying to map the offset for.

The correct way to do this is to go from IP to debug info like what DebugInfoManager::GetBoundariesAndVars does, and never roundtrip through MethodDesc since it may give you mismatched debug info.

(The root cause of the problem you are hitting may be something else, but this would become a problem eventually as well.)

Ultimately I am trying to find the IP in the first place from the resume, which is a fix up precode stub that jumps to the stub to the actual method. I need it to do the native -> IL mapping.

The way I see to get this information now is through the code versions, as you mentioned. What do you think about the perf implications of this? This is another reason to have the IP directly in the continuation, or at least to have a state -> IP mapping available as you suggest with as little overhead as possible @jakobbotsch

jakobbotsch · 2025-10-14T13:33:10Z

Ultimately I am trying to find the IP in the first place from the resume, which is a fix up precode stub that jumps to the stub to the actual method. I need it to do the native -> IL mapping.

The way I see to get this information now is through the code versions, as you mentioned. What do you think about the perf implications of this? This is another reason to have the IP directly in the continuation, or at least to have a state -> IP mapping available as you suggest with as little overhead as possible @jakobbotsch

Good point -- to get from the resumption stub back to the original code we need a lookup that gives back the IP of the exact code version we resume in. Today that gets allocated here while we JIT:

runtime/src/coreclr/vm/jitinterface.cpp

Lines 14674 to 14676 in 2787545

    
    {
        m_finalCodeAddressSlot = (PCODE*)amTracker.Track(m_pMethodBeingCompiled->GetLoaderAllocator()->GetHighFrequencyHeap()->AllocMem(S_SIZE_T(sizeof(PCODE))));
    }

I think we can subclass ILStubResolver and keep it there for the resumption IL stubs, then get rid of that loader heap allocation. Let me look into this.

jakobbotsch · 2025-10-14T14:11:58Z

@rcj1 I pushed a commit that adds a new AsyncResumeILStubResolver, and the async resumption stubs will have this resolver. There is an AsyncResumeILStubResolver::GetFinalResumeMethodStartAddress() that can be used to retrieve the start address of the method that resumption is going to end up in.

lateralusX · 2025-10-14T15:29:08Z

src/coreclr/vm/debuginfostore.h

+    static BOOL GetAsyncDebugInfo(
+        const DebugInfoRequest & request,
+        IN FP_IDS_NEW fpNew, IN void * pNewData,
+        OUT ICorDebugInfo::AsyncInfo* pAsyncInfo,


Any reason why we keep the number of suspension points in AsyncInfo and number of vars as an out parameter? Since we have the AsyncInfo struct, wouldn't it make sense to put all the out parameters inside that struct?

It is just the fact that the length of the async vars array is not an interesting piece of semantic information about the async method, while the number of suspension points is. So I included the length of that array in the normal "API hygienic" way, while I put the number of suspension points inside ICorDebugInfo::AsyncInfo which contains the semantically interesting method-level information.

I also considered duplicating the length of the suspension points array in the API signature, for API hygiene/consistency, but it feels redundant/confusing to have the same number twice.

lateralusX · 2025-10-14T15:35:29Z

src/coreclr/vm/debuginfostore.h

        OUT ICorDebugInfo::RichOffsetMapping** ppRichMappings,
        OUT ULONG32*                           pNumRichMappings);

+    static BOOL GetAsyncDebugInfo(


Would it be possible to get an optimized version of this function? It will be a common scenario when stackwalking to just request data for a specific continuation resume state index and we are only interested in the suspension point data and not local vars. If we could scope it down to just one item, then I could have a custom fpNew and pNewData using a stack allocated ICorDebugInfo::AsyncSuspensionPoint, meaning there is no need for any dynamic memory allocation, and we could skip to the requested index in async debug info and only extract requested information.
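A rough sketch of the stack-allocated callback idea. The callback shape (user data plus byte count, returning a buffer) is an assumption about what fpNew looks like, and `AllocCallback`, `SuspensionPointSketch`, `SingleItemBuffer` and `AllocFromStackBuffer` are hypothetical names, not runtime definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed shape of an allocation callback: (user data, byte count) -> buffer.
typedef uint8_t* (*AllocCallback)(void* pData, size_t cBytes);

// Hypothetical stand-in for a single suspension point record.
struct SuspensionPointSketch
{
    uint32_t NativeOffset;
};

// Caller-owned storage for exactly one record; lives on the caller's stack.
struct SingleItemBuffer
{
    alignas(SuspensionPointSketch) uint8_t Storage[sizeof(SuspensionPointSketch)];
    bool Used = false;
};

// Hands out the caller's buffer once; fails on oversized or repeated requests,
// so no heap allocation ever happens.
uint8_t* AllocFromStackBuffer(void* pData, size_t cBytes)
{
    assert(pData != nullptr);
    SingleItemBuffer* buffer = static_cast<SingleItemBuffer*>(pData);
    if (buffer->Used || cBytes > sizeof(buffer->Storage))
        return nullptr;
    buffer->Used = true;
    return buffer->Storage;
}
```

This only works if the decoder can be told to extract a single item, which is the scoping the comment above asks for.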

I will look into a way to extract the native offset of a particular state number in constant time.
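One possible way to get constant-time access, sketched under the assumption that the state -> native offset table is stored as fixed-width entries rather than nibble-compressed. The layout (a 32-bit count followed by 32-bit native offsets in host byte order) and the name `TryGetNativeOffsetForState` are hypothetical and are not the format this PR implements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed layout: [uint32 count][count x uint32 native offsets], host byte order.
// Reads one entry directly, without decoding any of the other entries.
bool TryGetNativeOffsetForState(const uint8_t* blob, size_t blobSize,
                                uint32_t state, uint32_t* pNativeOffset)
{
    assert(pNativeOffset != nullptr);

    uint32_t count = 0;
    if (blob == nullptr || blobSize < sizeof(count))
        return false;
    std::memcpy(&count, blob, sizeof(count));

    if (state >= count)
        return false;

    size_t entryPos = sizeof(count) + static_cast<size_t>(state) * sizeof(uint32_t);
    if (entryPos + sizeof(uint32_t) > blobSize)
        return false;

    std::memcpy(pNativeOffset, blob + entryPos, sizeof(uint32_t));
    return true;
}
```

The trade-off is size: fixed-width entries give up the compactness of the compressed encoding in exchange for indexing by state number.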

stackwalking to just request data for a specific continuation resume state index

It still feels wrong for stackwalking to parse the debug info.

Perhaps we should be treating the state index <-> native IP mapping as new unwind data rather than new debug data? Alternately if the Continuations aren't shared across different async methods then putting the info directly in the MethodTable is an option.

To make sure that we are using the same terminology, there are two steps:

1. Stack walking: Populates Exception._stackTrace with raw data. For async methods, the raw data is a (Resume, State) pair and a potential keep alive root. We should not need debug info to find the (Resume, State) pair. Is that correct?

2. Stack trace formatting: Converting Exception._stackTrace to a string, like what Exception.ToString() does. This is several orders of magnitude more expensive than (1). It can use metadata, debug info, etc.

(There is similar two-step process with other diagnostic scenarios, e.g. CPU profiling.)

just to be clear: this would help find which final method we resume in, but we would still keep the state -> IP mapping that takes us to actual internal place within the function that control resumes at after state has been restored, and stack walking would still do this mapping? I.e. we would not try to duplicate IL mappings at each of the trampolines.

It sounds workable and should unify the first mapping steps for all the runtimes. I assume it won't make things faster, so it is just about moving the work into the JIT and out of the runtimes.

we would still keep the state -> IP mapping that takes us to actual internal place within the function

Yes, I think it is fine to keep the state and switch-resume model.

stack walking would still do this mapping

Nit: Stack trace formatting would do this mapping (#120303 (comment) )

we would not try to duplicate IL mappings at each of the trampolines

Yes, I think we can start with one shared trampoline that has no IL mapping. We can keep the non-shared trampolines with IL mappings in our back-pocket in case we run into troubles with propagating the state in addition to the IP in all scenarios.

I assume it won't make things faster

Stack trace formatting is slow by design. Stack trace formatting is about gathering and combining data that come from different data sources. I think the main benefit is that the stack trace formatting does not need IL stub -> method translation and the associated data source(s) anymore.

Nit: Stack trace formatting would do this mapping (#120303 (comment) )

I see. So we do expect to capture (Resume, State) pair as part of the stack walking. Does it mean e.g. ETW stack traces will need a separate post processing step to turn this into a unique IP? (Maybe it already has something like that.)

My main confusion stems from trying to understand what kind of stack traces diagnostics will see. Do we always expect that the stack traces we make available have been processed in a way that makes the IP more friendly, before they make it out of the runtime?

For EventPipe stack traces we will need a native IP representing the suspension point for each continuation when writing an event, see:

https://github.com/microsoft/perfview/blob/main/src/TraceEvent/EventPipe/NetTraceFormat.md#stackblock-object

We will need a quick way to go from continuation -> native IP inside runtime stackwalking. The "formatting" is taken care of by the tools, native IP -> method + IL offset -> source line.

ETW/User Events is a different story: callstacks are captured by the OS, so currently we have no control over the stackwalk, meaning ETW/User Events callstacks won't be able to capture any async frames as part of the underlying OS API implementation. There are different routes we could take to mitigate this: emit async frame data in the event payload (hidden), emit an extra event handled by tooling to stitch together complete stacks, or use a side channel of events to recreate continuation chains externally. Regardless of the scenario, they would all benefit from the ability to efficiently go from continuation -> native IP inside the runtime before emitting the events.

To make sure that we are using the same terminology, there are two steps:

Stack walking: Populates Exception._stackTrace with raw data. For async methods, the raw data is (Resume, State) pair and potential keep alive root.

Stack trace formatting ...

The distinction of two phases is good, though (Resume,State) isn't the data which is exchanged across that boundary today. Converting Exception/StackTrace to operate that way in the future would take some modest work but doing the same for EventPipe would be onerous. I am treating all the code that runs prior to serializing an event into the NetTrace file as the 'stack walking' phase of EventPipe and everything that happens in the profiler parsing the NetTrace file as the 'Stack trace formatting' phase. For EventPipe the exchange format across those two phases is an array of IPs serialized in the file and the formatting phase will do IP -> IL offset -> source line conversion. Changing to put (Resume,State) in the file is a breaking change in the format and requires updates to all profilers. I really hope to avoid that.

Yes, I think we can start with one shared trampoline that has no IL mapping. We can keep the non-shared trampolines with IL mappings in our back pocket in case we run into trouble propagating the state in addition to the IP in all scenarios.

For EventPipe the IP that we serialize in the NetTrace file needs to be one with a usable native->IL mapping. We'd wind up converting between these different representations during the 'stackwalking' phase to produce one.

1. state index + async method trampoline IP

2. state index + async method start IP

3. async method IP that has usable IL offset mapping

Getting from (1) to (2) probably takes fewer indirections than starting with the resume stub IP, but moving from (2) to (3) still pulls in the debug data if that is where we are storing the state index -> native offset mapping.
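As a rough illustration of the (2) -> (3) step, the C++ sketch below assumes a per-method table indexed by state that stores the native offset of each suspension point. `AsyncMethodDebugModel` and `TryResolveResumeIp` are hypothetical names, and the table layout is an assumption rather than the format this PR actually stores.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-method debug data: entry i holds the native offset of the
// suspension point with state index i. The layout is assumed for illustration.
struct AsyncMethodDebugModel
{
    std::uintptr_t methodStart;               // async method start IP
    std::vector<std::uint32_t> resumeOffsets; // state index -> native offset
};

// Converts (state index, async method start IP) into an IP inside the async
// method body, which is the representation with a usable IL offset mapping.
// Returns false when the state index is not covered by the table.
inline bool TryResolveResumeIp(const AsyncMethodDebugModel& method,
                               std::uint32_t state,
                               std::uintptr_t* ip)
{
    if (state >= method.resumeOffsets.size())
    {
        return false;
    }
    *ip = method.methodStart + method.resumeOffsets[state];
    return true;
}
```

If the table lives in the debug data, this lookup is the part that would have to decode it during the stack walk.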

Hopefully a minor detail, but if we do the funclet approach we also need to ensure that we have a mechanism for the stackwalker to distinguish the funclet call from other recursive calls so that we can filter it out of any synchronous stackwalks, the same as we want to do with the resume stub.

jakobbotsch added 3 commits October 1, 2025 14:54

Set target method desc for resumption stubs

b7bb968

Add JIT-EE boilerplate

d5d0864

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 1, 2025

jakobbotsch changed the title ~~Add debug information for runtime async information~~ Add debug information for runtime async methods Oct 1, 2025

dotnet-policy-service bot assigned jakobbotsch Oct 1, 2025

jakobbotsch commented Oct 1, 2025

src/coreclr/inc/cordebuginfo.h

jakobbotsch added 8 commits October 1, 2025 21:42

Update managed view

6addd0e

Remove TODOs

6a5bf19

Leaf transition

262260a

Comment

4e2a829

Delete unused enum

820afa9

Restore comment

3621eae

Fix osx build

f597424

Fix GCC build

6bb6cee

am11 added the runtime-async label Oct 1, 2025

build-analysis bot mentioned this pull request Oct 2, 2025

"We stopped hearing from agent Azure Pipelines 32. Verify the agent machine is running and has a healthy network connection" dotnet/dnceng#1886

Open

3 tasks

Change sentinel value, fix contract

f580e3f

jakobbotsch commented Oct 2, 2025

src/coreclr/vm/debuginfostore.h

jakobbotsch added 3 commits October 2, 2025 13:41

Bump R2R

55b9522

Clean up

c9c64be

Expose async debug info accessor APIs

db77446

jakobbotsch commented Oct 2, 2025

jakobbotsch added 5 commits October 2, 2025 16:18

Missed bumping R2R version for naot

c808390

Fix reverse mapping to IL local nums

3ec5160

Fix monotonicity for async vars

bb12c77

Code style

bf3364b

Fix JIT-EE prompt tools from Egor's instructions, allow use of experimental APIs in async tests

b49f38f

agocke requested review from eduardo-vp and jtschuster October 3, 2025 17:17

Add AsyncContinuationVarInfo.GCIndex

579714e

build-analysis bot mentioned this pull request Oct 7, 2025

Test failure: baseservices/exceptions/stackoverflow/stackoverflowtester/stackoverflowtester.cmd #110173

Open

jkotas reviewed Oct 7, 2025

noahfalk reviewed Oct 8, 2025

lateralusX reviewed Oct 8, 2025

src/coreclr/jit/async.cpp

jakobbotsch added 6 commits October 8, 2025 18:03

Publish NextContinuation in TLS

c056793

Rename ThunkTask -> RuntimeAsyncTask

9679f91

Merge branch 'main' of github.com:dotnet/runtime into jit-async-diagnostics

7a34475

Report native offsets instead

e839409

Run jit-format

d8ad0c1

Print reported async debug info, always report it

9fdd8a7

This was referenced Oct 9, 2025

Checkout failure: "Git fetch failed with exit code 128" dotnet/arcade#9009

Open

[android] Android.Device_Emulator.JIT.Test failing on emulators with CoreCLR #112633

Open

Store target IPs in AsyncResumeILStubResolver

69e02db

Fix bug

68c245b

lateralusX reviewed Oct 14, 2025

Remove BBF_INTERNAL from rethrow BB to avoid broken mappings

2031f08

max-charlamb mentioned this pull request Oct 16, 2025

[cDAC] Runtime Change Backlog #120797

Open

2 tasks

jakobbotsch mentioned this pull request Oct 16, 2025

Follow-up work for new profiler ClassLoad events in .NET 11 #120799

Open

	internal sealed unsafe class Continuation
	{
	    public Continuation? Next;
	    public delegate*<Continuation, Continuation?> Resume;
	    public uint State;
	    public CorInfoContinuationFlags Flags;

	    // Data and GCData contain the state of the continuation.
	    // Note: The JIT is ultimately responsible for laying out these arrays.
	    // However, other parts of the system depend on the layout to
	    // know where to locate or place various pieces of data:
	    //
	    // 1. Resumption stubs need to know where to place the return value
	    // inside the next continuation. If the return value has GC references
	    // then it is boxed and placed at GCData[0]; otherwise, it is placed
	    // inside Data at offset 0 if
	    // CORINFO_CONTINUATION_OSR_IL_OFFSET_IN_DATA is NOT set and otherwise
	    // at offset 4.
	    //
	    // 2. Likewise, Finalize[Value]TaskReturningThunk needs to know from
	    // where to extract the return value.
	    //
	    // 3. The dispatcher needs to know where to place the exception inside
	    // the next continuation with a handler. Continuations with handlers
	    // have CORINFO_CONTINUATION_NEEDS_EXCEPTION set. The exception is
	    // placed at GCData[0] if CORINFO_CONTINUATION_RESULT_IN_GCDATA is NOT
	    // set, and otherwise at GCData[1].
	    //
	    public byte[]? Data;
	    public object?[]? GCData;
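
The layout rules in the comment above can be restated as two small lookups. The following C++ sketch does that under loud assumptions: the flag bit values are illustrative only (the real values are defined in the JIT-EE interface and are not reproduced here), and `ContinuationFlagsModel`, `ReturnValueOffsetInData`, and `ExceptionIndexInGCData` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative flag bits only. These do NOT claim to match the real
// CorInfoContinuationFlags values; they just model the three flags that the
// comment on Continuation refers to.
enum ContinuationFlagsModel : std::uint32_t
{
    kResultInGCData    = 1u << 0, // models CORINFO_CONTINUATION_RESULT_IN_GCDATA
    kNeedsException    = 1u << 1, // models CORINFO_CONTINUATION_NEEDS_EXCEPTION
    kOsrILOffsetInData = 1u << 2, // models CORINFO_CONTINUATION_OSR_IL_OFFSET_IN_DATA
};

// Byte offset inside Data where a return value without GC references goes:
// offset 0 unless the OSR IL offset occupies the first 4 bytes.
inline std::size_t ReturnValueOffsetInData(std::uint32_t flags)
{
    return (flags & kOsrILOffsetInData) != 0 ? 4 : 0;
}

// Index inside GCData where the dispatcher places the exception, or -1 when
// the continuation has no handler (NEEDS_EXCEPTION is not set). A boxed
// result at GCData[0] pushes the exception to GCData[1].
inline int ExceptionIndexInGCData(std::uint32_t flags)
{
    if ((flags & kNeedsException) == 0)
    {
        return -1;
    }
    return (flags & kResultInGCData) != 0 ? 1 : 0;
}
```

Keeping these rules in one place matters because resumption stubs, the return thunks, and the dispatcher all have to agree with the layout the JIT chose.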
