-
Notifications
You must be signed in to change notification settings - Fork 359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable stacktrace collection per EventPipeSession #3696
Comments
I've just realized that |
Hi @ezsilmar, RequestRundown won't have any affect on stack collection. There is not currently a way to disable stack collection and it causes pain for some scenarios, we are in the process of adding options to disable stack collection. PR dotnet/runtime#82470 is about to be merged, and long term I would like to do the approach you describe where we can disable stack collection per session. |
Hi @davmason, thanks for the clarification! There's not much info on rundown events on the internet, as far as I understand it it's basically symbol information? I was confused with RequestRundown because of these lines: if (event.NeedStack() && !session.RundownEnabled() && pStack == NULL)
{
EventPipe::WalkManagedStackForCurrentThread(s);
pStack = &s;
} I read them incorrectly at first, assuming that we only collect stacks when we emit Rundown which would make sense to me. However I'm a bit lost as the logic is opposite, meaning probably there are two different ways to do a stackwalk? I couldn't find it in the code. Anyway, I'm interested in contributing to this subject as I'm going to implement something fast for our custom-built runtime in the coming days. |
Rundown is something we do to make sure everything in a trace can be resolved. Jitted methods are dynamically generated at runtime so their IPs are not contained in the symbol file, and we also can potentially have dynamically generated types, in memory modules, and other things that are not known in the symbol file. When rundown is triggered we go through all the dynamically generated things and emit the proper information so anyone reading the trace can properly map everything. It is effectively iterating through all the method bodies, types, etc and emitting an event for each one. We turn off stack information for rundown events as an optimization because it's not relevant, there is no reason for anyone to care what stack is emitting rundown events since it is always the same one. |
Ah ok I see there's |
Client-side of dotnet/runtime#84077 and the implementation of #3696. To simplify the interface I made `EventPipeSessionConfiguration` public and introduced a new method in the DiagnosticsClient: `Task<EventPipeSession> StartEventPipeSessionAsync(EventPipeSessionConfiguration configuration, CancellationToken token)`. This is the only method that supports disabling the stackwalk so no additional overloads with a new bool parameter and no synchronous counterpart. I believe it'd be easier to use and maintain a single async method with the options rather than creating more overloads or default parameters but I may not have all the context here so please correct me if you think it's a bad idea. To deal with the backward compatibility I only use `CollectTracingV3` when necessary i.e. when `RequestStackwalk` option is set to false. I think it's a good compromise between the added complexity and potentially surprising behavior: * when the client is old and the runtime is new everything works because the runtime supports `CollectTracingV2` * when the client is new but the runtime is old everything works until the new option is used. When it's used the session won't start as `CollectTracingV3` doesn't exist server side: there'd be no clear error message but it's documented in the option summary. * when both the client and the runtime are new either `CollectTracingV2` or `CollectTracingV3` may be used transparently for the user * we may use the same trick when we introduce `CollectTracingV4` The alternative is to implement version negotiation of some sort but I'd like to have your opinion before attempting this as handling the errors correctly wouldn't be easy (f.e. in [my current fork](criteo-forks@3946b4a#diff-e8365039cd36eae3dec611784fc7076be7dadeda1007733412aaaa63f40a748fR39) I just hide the exception) The testing turned out to be a bit complex as I needed to convert EventPipe stream to `TraceLog` to be able to read the stacktraces. I couldn't achieve that without writing data to a file. Afaiu the stackwalk may not work correctly without the rundown that only happens at the end of the session so I wonder if looking at the stacktraces with a live session is even possible (though iirc netfw+ETW could do that back in the days) ? Thanks for your time !
I've just realized this should be closed as completed as the implementation was merged. |
Background
Currently, all events emitted from the managed space by
EventSource
will do a stackwalk, see eventpipeinternal.cpp:121. The obligatory stackwalk leads to significant performance penalty for frequent events while in some cases the stacktrace is not required.As an example, I'm working on the association between CPU samples from
perf record
and spans ofSystem.Diagnostics.Activity
. The idea is to bring the profiling data on the application level, showing the cpu usage grouped by activity name. To make it work, I record all the activities withMicrosoft-Diagnostics-DiagnosticSource
and trackTraceSyncrhonousWorkStart
andTraceOperationStart
events fromSystem.Threading.Tasks.TplEventSource
to follow the asynchronous flow.With the stacktraces, the idea is not viable because the application spends more than half of the cpu time doing stackwalks. By disabling the stacktrace recording (just put
false
in the file linked above), the overhead drops down to around 3% showing really promising results.Proposed Feature
I propose to introduce the
bool collectStacks
parameter when creating an EventPipe session. The change is required both in the runtime implementation and theDiagnosticsClient
. For managed events the effect ofsession.collectStacks
would be straightforward as there's no way to control it in EventSource definition. For the native we could checksession.collectStacks && event.needStack
to respect ClrEtwAllMeta.lstWhy I think introducing this parameter on the session level is a good idea:
Usage Examples
Starting the session:
The text was updated successfully, but these errors were encountered: