[UserEvents] Add end-to-end runtime test #121316
Pull Request Overview
This PR adds a new test for UserEvents tracing on Linux that validates the runtime's ability to emit trace events through the user_events subsystem. The test uses the Microsoft.OneCollect.RecordTrace tool to capture events from a tracee process and validates that GC events were properly recorded.
Key changes include:
- Addition of a new test infrastructure for UserEvents tracing
- Upgrade of TraceEvent library from version 3.1.16 to 3.1.28
- Implementation of multi-process test orchestration with native signal handling
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/tests/tracing/eventpipe/userevents/usereventstracee.cs | Implements tracee process that generates GC events for validation |
| src/tests/tracing/eventpipe/userevents/userevents.csproj | Project configuration including NuGet package references and build targets |
| src/tests/tracing/eventpipe/userevents/userevents.cs | Main test orchestration: spawns processes, collects traces, validates events |
| src/tests/tracing/eventpipe/userevents/dotnet-common.script | Configuration script for record-trace tool specifying provider and flags |
| eng/Versions.props | Updates TraceEvent package version to 3.1.28 |
I like this approach.
}

public static int TestEntryPoint()
{
We should add checks for:
- process is elevated
- OS is Linux
- user_events are supported
It's likely that at some point this test will be run in the wrong environment, and the logs should make that trivial to diagnose.
Maybe we should not even build the test on non-Linux platforms. The CLRTestTargetUnsupported msbuild property could be used to exclude a test on specific platforms.
CLRTestTargetUnsupported is set in the csproj, so it should hopefully prevent this test from running on platforms other than linux-x64/linux-arm64. Then again, I think more logic is needed to check for Alpine.
Added a geteuid check and a check for whether /sys/kernel/tracing/user_events_data exists.
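The three checks requested above could be sketched roughly as follows. This is not the code from the PR: the class name `UserEventsPreconditions`, the method `GetSkipReason`, and the message strings are hypothetical, and the sketch assumes tracefs is mounted at `/sys/kernel/tracing`. The decision logic is kept separate from the probing so that the log messages can be exercised on any platform.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class UserEventsPreconditions
{
    // Path taken from the discussion above; assumes tracefs is mounted here.
    private const string UserEventsDataPath = "/sys/kernel/tracing/user_events_data";

    // POSIX geteuid(): returns the effective user ID (0 means root).
    [DllImport("libc")]
    private static extern uint geteuid();

    // Pure decision logic: returns null when the test can run,
    // otherwise a message explaining which precondition failed.
    public static string GetSkipReason(bool isLinux, bool isElevated, bool hasUserEvents)
    {
        if (!isLinux)
            return "OS is not Linux";
        if (!isElevated)
            return "process is not elevated (geteuid() != 0)";
        if (!hasUserEvents)
            return "user_events not supported: " + UserEventsDataPath + " does not exist";
        return null;
    }

    // Probes the current environment. geteuid is only called on Linux,
    // so the P/Invoke is never resolved on other platforms.
    public static string GetSkipReason()
    {
        bool isLinux = OperatingSystem.IsLinux();
        bool isElevated = isLinux && geteuid() == 0;
        bool hasUserEvents = isLinux && File.Exists(UserEventsDataPath);
        return GetSkipReason(isLinux, isElevated, hasUserEvents);
    }
}
```

A test entry point could call `GetSkipReason()` first and write the returned message to the console before bailing out, which would make a run in the wrong environment easy to spot in the logs.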
private static bool ValidateTraceeEvents(string traceFilePath)
{
    string etlxPath = TraceLog.CreateFromEventPipeDataFile(traceFilePath);
You can parse the .nettrace file directly using EventPipeEventSource in TraceEvent. This avoids creating a 2nd file that the test also needs to clean up.
For some reason it didn't work when I last tried it (pilot error); I have now switched to using EventPipeEventSource. Right now the events show up as "unknown" with an ID. Maybe it's because I'm using the Dynamic parser? I'll look into TraceEvent more closely.
recordTraceStartInfo.RedirectStandardError = true;

using Process traceeProcess = Process.Start(traceeStartInfo);
using Process recordTraceProcess = Process.Start(recordTraceStartInfo);
To ensure the tracer observes the tracee, we should start the tracer process first.
Switched. Originally I was wondering if we should pass --pid {traceePid}, but the process isolation should prevent this test from tracing another runtime test and mistaking the events should the tracee process crash. But I guess even in that case... it showed that user_events were collected.
public static void Run()
{
    long startTimestamp = Stopwatch.GetTimestamp();
    long targetTicks = Stopwatch.Frequency * 10; // 10s
How about 1 second instead? Ideally we want tests to run quickly whenever possible.
Changed
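For illustration, a tracee loop bounded by a configurable duration might look like the sketch below. The PR's actual loop body is not shown in this thread, so the class name `TraceeWorkload`, the `duration` parameter, the use of `GC.Collect()` to induce GC events, and the 10 ms pause are all assumptions.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class TraceeWorkload
{
    // Induces garbage collections until the requested duration has elapsed.
    // Returns the number of collections that were triggered.
    public static int Run(TimeSpan duration)
    {
        long startTimestamp = Stopwatch.GetTimestamp();
        // Stopwatch.Frequency is ticks per second, so scale by the duration in seconds.
        long targetTicks = (long)(Stopwatch.Frequency * duration.TotalSeconds);

        int collections = 0;
        while (Stopwatch.GetTimestamp() - startTimestamp < targetTicks)
        {
            GC.Collect(); // each induced GC should emit GC start/stop events
            collections++;
            Thread.Sleep(10); // short pause (assumed) to avoid a tight spin
        }
        return collections;
    }
}
```

Calling `TraceeWorkload.Run(TimeSpan.FromSeconds(1))` would correspond to the one-second budget suggested above; taking the duration as a parameter keeps it easy to adjust if one second turns out to be too short for the tracer to attach.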
bool startEventFound = false;
bool stopEventFound = false;

source.AllEvents += (TraceEvent e) =>
Is the plan to add more tests here that check for metadata/fields, rundown events, callstacks, etc.? Or should we add some basic verification to this test, or plan to extend existing EventPipe tests to also work over UserEvents?
This was mainly to add a basic verification that the end-to-end runtime side works (from accepting the IPC message to writing to the tracepoints). Since user_events is built on EventPipe, my initial thought is that duplicating the existing EventPipe tests for user_events wouldn't add anything. I think we can add more tests later on, but I'm not sure what coverage is good and reasonable. I'm not even sure yet whether our CI machines have user_events, or whether they run with elevated privileges, so this was mainly to see if we can get a basic E2E test going.
We should definitely not duplicate, but we could look at extending the existing EventPipe tests to run over user_events with additional validation logic. I agree that is something we could look at later.
So, if this is mainly a smoke test, then maybe we should make sure we at least hit the things we know are handled in the one-collect library. During the work we hit a number of things that needed special attention, like activity IDs, custom metadata, and potential stack traces. Right now, this only tests one runtime start/stop event fired under very unique circumstances. Maybe we should do some short multi-threading scenario as well, to make sure we won't hit any races in the code path unique to user_events?
Looks like the reason the .NET runtime events aren't being captured in the .nettrace is because a session isn't actually being started.
With user_events support added in #115265, this PR looks to test a basic end-to-end user_events scenario.
Alternative testing approaches considered
Existing EventPipe runtime tests
Existing EventPipe tests under src/tests/tracing/eventpipe are incompatible with testing the user_events scenario due to:
Starting EventPipeSessions through DiagnosticClient ❌
DiagnosticClient does not have the support to send the IPC command to start a user_events based EventPipe session, because it requires the user_events_data file descriptor to be sent using SCM_RIGHTS (see https://github.com/dotnet/diagnostics/blob/main/documentation/design-docs/ipc-protocol.md#passing_file_descriptor).
Using an EventPipeEventSource to validate events streamed through EventPipe ❌
User_events based EventPipe sessions do not stream events. Instead, events are written to configured TraceFS tracepoints, and currently only RecordTrace from https://github.com/microsoft/one-collect/ is capable of generating .nettrace traces from tracepoint user_events.
Native EventPipe Unit Tests
There are Mono Native EventPipe tests under src/mono/mono/eventpipe/test that are not hooked up to CI. These unit tests are built through linking the shared EventPipe interface library against Mono's EventPipe runtime shims and using Mono's test runner. To update these unit tests into the standard runtime tests structure, a larger investment is needed to either migrate EventPipe from using runtime shims to a OS Pal source shared by coreclr/nativeaot/mono (see #118874 (comment)) or build an EventPipe shared library specifically for the runtime test using a runtime-agnostic shim.
As existing mono unit tests don't currently test IPC commands, coupled with no existing runtime infrastructure to read events from tracepoints, there would be even more work on top of updating mono native eventpipe unit tests to even test the user_events scenario.
End-to-End Testing Added
A low-cost approach to testing .NET Runtime's user_events functionality leverages RecordTrace from https://github.com/microsoft/one-collect/, which is already capable of starting user_events based EventPipe sessions and generating .nettraces. (Note: dotnet-trace wraps around RecordTrace)
Despite adding an external dependency which allows RecordTrace failures to fail the end-to-end test, user_events was initially added with the intent to depend on RecordTrace for the end-to-end scenario, and there are no other ways to functionally test a user_events based eventpipe session.
Approach
.nettrace for particular events from Tracee app
Dependencies:
user_events_data.