Member

@mdh1418 mdh1418 commented Nov 3, 2025

With user_events support added in #115265, this PR looks to test a basic end-to-end user_events scenario.

Alternative testing approaches considered

Existing EventPipe runtime tests

Existing EventPipe tests under src/tests/tracing/eventpipe are incompatible with testing the user_events scenario due to:

  1. Starting EventPipeSessions through DiagnosticsClient ❌
    DiagnosticsClient cannot send the IPC command that starts a user_events-based EventPipe session, because that command requires the user_events_data file descriptor to be passed using SCM_RIGHTS (see https://github.com/dotnet/diagnostics/blob/main/documentation/design-docs/ipc-protocol.md#passing_file_descriptor).

  2. Using an EventPipeEventSource to validate events streamed through EventPipe ❌
    Sessions based on user_events do not stream events. Instead, events are written to the configured tracefs tracepoints, and currently only RecordTrace from https://github.com/microsoft/one-collect/ can generate .nettrace traces from user_events tracepoints.

Native EventPipe Unit Tests

There are Mono native EventPipe tests under src/mono/mono/eventpipe/test that are not hooked up to CI. These unit tests are built by linking the shared EventPipe interface library against Mono's EventPipe runtime shims and are run with Mono's test runner. Moving them into the standard runtime test structure would require a larger investment: either migrating EventPipe from runtime shims to an OS PAL source shared by coreclr/nativeaot/mono (see #118874 (comment)), or building an EventPipe shared library specifically for the runtime tests using a runtime-agnostic shim.
The existing Mono unit tests also don't test IPC commands, and there is no runtime infrastructure for reading events from tracepoints, so testing the user_events scenario would take even more work on top of that migration.

End-to-End Testing Added

A low-cost approach to testing the .NET runtime's user_events functionality is to use RecordTrace from https://github.com/microsoft/one-collect/, which can already start user_events-based EventPipe sessions and generate .nettrace files. (Note: dotnet-trace wraps RecordTrace.)
This adds an external dependency, so a RecordTrace failure can fail the end-to-end test. However, user_events was added with the intent to depend on RecordTrace for the end-to-end scenario, and there is no other way to functionally test a user_events-based EventPipe session.

Approach

  1. Start Tracee app
  2. Start tracing with RecordTrace + dotnet-common profile script
  3. Stop RecordTrace (triggers .nettrace generation) and Tracee app
  4. Validate the .nettrace for particular events from Tracee app

Dependencies:

  • CI runs the runtime test in an environment that supports user_events
  • CI runs the runtime test with permissions to access user_events_data
  • Microsoft.OneCollect.RecordTrace (transitively resolved through a dotnet diagnostics public feed)
  • Microsoft.Diagnostics.Tracing.TraceEvent 3.1.24+ (to read NetTrace V6)

Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds a new test for UserEvents tracing on Linux that validates the runtime's ability to emit trace events through the user_events subsystem. The test uses the Microsoft.OneCollect.RecordTrace tool to capture events from a tracee process and validates that GC events were properly recorded.

Key changes include:

  • Addition of a new test infrastructure for UserEvents tracing
  • Upgrade of TraceEvent library from version 3.1.16 to 3.1.28
  • Implementation of multi-process test orchestration with native signal handling

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/tests/tracing/eventpipe/userevents/usereventstracee.cs | Implements tracee process that generates GC events for validation |
| src/tests/tracing/eventpipe/userevents/userevents.csproj | Project configuration including NuGet package references and build targets |
| src/tests/tracing/eventpipe/userevents/userevents.cs | Main test orchestration: spawns processes, collects traces, validates events |
| src/tests/tracing/eventpipe/userevents/dotnet-common.script | Configuration script for record-trace tool specifying provider and flags |
| eng/Versions.props | Updates TraceEvent package version to 3.1.28 |

Member

jkotas commented Nov 3, 2025

a basic end-to-end user_events scenario

I like this approach.

}

public static int TestEntryPoint()
{
Member

We should add checks for:

  1. process is elevated
  2. OS is Linux
  3. user_events are supported

It's likely that at some point this test will run in the wrong environment, and the logs should make that trivial to diagnose.

Member

@lateralusX lateralusX Nov 4, 2025

Maybe we should not even build the test on non-Linux platforms. The CLRTestTargetUnsupported MSBuild property can be used to exclude a test on specific platforms.

Member Author

The CLRTestTargetUnsupported is in the csproj, so it should hopefully prevent this test from running on non linux-x64/linux-arm64 platforms. Then again, I think more logic is needed to check for Alpine.

Added a geteuid check and a check that /sys/kernel/tracing/user_events_data exists.
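
For illustration, the three checks discussed in this thread could be sketched roughly as follows. This is a minimal sketch, not the code in the PR: the class and method names are hypothetical, and the messages are only examples of what a diagnosable log line might look like. The decision logic is kept separate from the probing so it can be exercised on any machine.

```csharp
#nullable enable
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper; names are illustrative only.
public static class UserEventsPreconditions
{
    // tracefs file that must exist for user_events to be usable.
    public const string UserEventsDataPath = "/sys/kernel/tracing/user_events_data";

    // geteuid(2) from libc; returns 0 when the effective user is root.
    [DllImport("libc")]
    private static extern uint geteuid();

    // Pure decision logic: returns null when the environment is usable,
    // otherwise a message naming the first failed precondition.
    public static string? Describe(bool isLinux, uint effectiveUserId, bool userEventsDataExists)
    {
        if (!isLinux)
            return "Skipping: user_events is a Linux-only feature.";
        if (effectiveUserId != 0)
            return $"Skipping: test must run elevated (effective uid is {effectiveUserId}, expected 0).";
        if (!userEventsDataExists)
            return $"Skipping: {UserEventsDataPath} not found.";
        return null;
    }

    // Probes the real environment. geteuid is only called on Linux.
    public static string? DescribeCurrentEnvironment()
    {
        bool isLinux = OperatingSystem.IsLinux();
        uint euid = isLinux ? geteuid() : uint.MaxValue;
        return Describe(isLinux, euid, isLinux && File.Exists(UserEventsDataPath));
    }
}
```

A test entry point could call DescribeCurrentEnvironment() first and print the returned message before exiting when it is non-null, so a run in the wrong environment is obvious from the log.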


private static bool ValidateTraceeEvents(string traceFilePath)
{
string etlxPath = TraceLog.CreateFromEventPipeDataFile(traceFilePath);
Member

You can parse the .nettrace file directly using EventPipeEventSource in TraceEvent. This avoids creating a 2nd file that the test also needs to clean up.
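
A rough sketch of that direct-parsing approach is below. It assumes the Microsoft.Diagnostics.Tracing.TraceEvent package is referenced and that the events of interest are the runtime's GC start/stop events, as the review summary suggests; the class and method names are hypothetical. Subscribing through the CLR parser rather than only through AllEvents is a choice made here so that runtime events are decoded by a registered parser.

```csharp
using System;
using Microsoft.Diagnostics.Tracing;

// Hypothetical helper; not the validation code in the PR.
public static class NetTraceValidator
{
    // Reads the .nettrace file directly and reports whether at least one
    // GC start and one GC stop event were seen. No .etlx file is created.
    public static bool ContainsGcStartAndStop(string traceFilePath)
    {
        bool startEventFound = false;
        bool stopEventFound = false;

        using var source = new EventPipeEventSource(traceFilePath);

        // Assumption: GCStart/GCStop are the events the test cares about.
        source.Clr.GCStart += _ => startEventFound = true;
        source.Clr.GCStop += _ => stopEventFound = true;

        // Blocks until the whole file has been read.
        source.Process();

        return startEventFound && stopEventFound;
    }
}
```

Because the source is disposed when the method returns, the only file left to clean up is the .nettrace itself.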

Member Author

When I last tried it, it didn't work (pilot error). I've now switched to EventPipeEventSource. Right now the events show up as "unknown" with an ID. Maybe it's because I'm using the Dynamic parser? I'll look into TraceEvent more closely.

recordTraceStartInfo.RedirectStandardError = true;

using Process traceeProcess = Process.Start(traceeStartInfo);
using Process recordTraceProcess = Process.Start(recordTraceStartInfo);
Member

To ensure the tracer observes the tracee, we should start the tracer process first.

Member Author

Switched. Originally I was wondering whether we should pass --pid {traceePid}, but process isolation should prevent this test from tracing another runtime test and mistaking its events for the tracee's if the tracee process crashes. But I guess even in that case... it would show that user_events were collected.
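
The tracer-first ordering could look roughly like the sketch below. It is not the orchestration code in the PR: the class name is hypothetical, the start infos are supplied by the caller so no RecordTrace arguments are assumed, and stopping the tracer with SIGINT is an assumption about how it is asked to finalize its output.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Hypothetical helper; names are illustrative only.
public static class TraceOrchestration
{
    private const int SIGINT = 2;

    // kill(2) from libc, used here to deliver SIGINT to the tracer.
    [DllImport("libc", SetLastError = true)]
    private static extern int kill(int pid, int sig);

    // Starts the tracer before the tracee, waits for the tracee to finish,
    // then asks the tracer to stop. Returns the tracee's exit code.
    public static int Run(ProcessStartInfo tracerStartInfo, ProcessStartInfo traceeStartInfo, TimeSpan tracerStopTimeout)
    {
        using Process tracer = Process.Start(tracerStartInfo)
            ?? throw new InvalidOperationException("Failed to start tracer.");
        using Process tracee = Process.Start(traceeStartInfo)
            ?? throw new InvalidOperationException("Failed to start tracee.");

        tracee.WaitForExit();

        // Assumption: the tracer writes its output when it receives SIGINT.
        if (!tracer.HasExited)
            kill(tracer.Id, SIGINT);

        // Fall back to a hard kill if the tracer does not stop in time.
        if (!tracer.WaitForExit((int)tracerStopTimeout.TotalMilliseconds))
        {
            tracer.Kill();
            tracer.WaitForExit();
        }

        return tracee.ExitCode;
    }
}
```

If either process redirects standard output or error, the caller still needs to drain those streams, otherwise a chatty child can block on a full pipe.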

public static void Run()
{
long startTimestamp = Stopwatch.GetTimestamp();
long targetTicks = Stopwatch.Frequency * 10; // 10s
Member

How about 1 second instead? Ideally we want tests to run quickly whenever possible.

Member Author

Changed
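
As a sketch of what a time-bounded tracee loop along these lines might look like, with the duration as a parameter so it can be set to one second: the class name and the choice to force collections with GC.Collect are assumptions for illustration, not the tracee code in the PR.

```csharp
using System;
using System.Diagnostics;

// Hypothetical workload; names are illustrative only.
public static class TraceeWorkload
{
    // Induces garbage collections until the requested duration has elapsed
    // and returns how many collections were triggered.
    public static int Run(TimeSpan duration)
    {
        long startTimestamp = Stopwatch.GetTimestamp();
        long targetTicks = (long)(Stopwatch.Frequency * duration.TotalSeconds);
        int collections = 0;

        do
        {
            // Allocate a little so each collection has work to do, then force a GC.
            GC.KeepAlive(new byte[1024]);
            GC.Collect();
            collections++;
        }
        while (Stopwatch.GetTimestamp() - startTimestamp < targetTicks);

        return collections;
    }
}
```

Calling TraceeWorkload.Run(TimeSpan.FromSeconds(1)) keeps the test short while still performing at least one collection, since the loop body runs before the elapsed time is checked.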

bool startEventFound = false;
bool stopEventFound = false;

source.AllEvents += (TraceEvent e) =>
Member

@lateralusX lateralusX Nov 4, 2025

Is the plan to add more tests here that check metadata/fields, rundown events, callstacks, etc.? Or should we add some basic verification to this test, or plan to extend the existing EventPipe tests to also work over UserEvents?

Member Author

This was mainly to add basic verification that the runtime side works end to end (from accepting the IPC message to writing to the tracepoints). Since user_events is built on EventPipe, my initial thought is that duplicating the existing EventPipe tests for user_events wouldn't add anything. I think we can add more tests later on, but I'm not sure what coverage is good and reasonable. I'm not even sure yet whether our CI machines have user_events, or whether they run with elevated privileges, so this was mainly to see if we can get a basic E2E test going.

Member

We should definitely not duplicate, but maybe look at extending the existing EventPipe tests to run over user_events with additional validation logic. I agree that is something we could look at later.

So, if this is mainly a smoke test, then maybe we should make sure we at least hit the things we know are handled in the one-collect library. During that work we hit a number of things that needed special attention, like activity IDs, custom metadata, and potential stack traces. Right now, this only tests one runtime start/stop event fired under very unique circumstances. Maybe we should add a short multi-threaded scenario as well, to make sure we don't hit any races in the code path unique to user_events?
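
One possible shape for such a short multi-threaded scenario is sketched below. Everything in it is hypothetical: the provider name, the event, and the thread and iteration counts are placeholders, and whether a custom EventSource is the right event source for this test is an open question in the thread. The method counts write calls; events are only emitted when a session has enabled the provider.

```csharp
using System.Diagnostics.Tracing;
using System.Threading;

// Hypothetical provider used only for this sketch.
[EventSource(Name = "Demo-UserEvents-MultiThread")]
public sealed class DemoEventSource : EventSource
{
    public static readonly DemoEventSource Log = new DemoEventSource();

    [Event(1, Level = EventLevel.Informational)]
    public void WorkItem(int threadIndex, int iteration) => WriteEvent(1, threadIndex, iteration);
}

public static class MultiThreadedTracee
{
    // Writes eventsPerThread events from each of threadCount threads concurrently
    // and returns the total number of WorkItem calls made.
    public static int Run(int threadCount, int eventsPerThread)
    {
        int written = 0;
        using var startGate = new ManualResetEventSlim(false);
        var threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++)
        {
            int threadIndex = t;
            threads[t] = new Thread(() =>
            {
                // Release all threads together to maximize contention.
                startGate.Wait();
                for (int i = 0; i < eventsPerThread; i++)
                {
                    DemoEventSource.Log.WorkItem(threadIndex, i);
                    Interlocked.Increment(ref written);
                }
            });
            threads[t].Start();
        }

        startGate.Set();
        foreach (Thread thread in threads)
            thread.Join();

        return written;
    }
}
```

A validator could then compare the number of WorkItem events found in the trace against the returned count, which would surface dropped or duplicated events on the concurrent write path.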

@mdh1418 mdh1418 force-pushed the user_events_functional_runtime_test branch 5 times, most recently from bb6923e to b1d4234 Compare November 13, 2025 15:31
@mdh1418 mdh1418 force-pushed the user_events_functional_runtime_test branch from 0286358 to 8c28d62 Compare November 18, 2025 18:35
Member Author

mdh1418 commented Nov 26, 2025

It looks like the .NET runtime events aren't being captured in the .nettrace because a session isn't actually being started.
On Helix machines, the diagnostic port is created under the temp directory of Helix's provisioned environment, which is of the form /datadisks/disk1/work/<workID>/t/. RecordTrace currently only scans /tmp/ for these diagnostic ports. I'm planning to add a config value for EventPipe/user_events debugging that enables more stress logs, to better diagnose whether the point of failure is on the runtime side or external.
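
To illustrate the mismatch, here is a small sketch of how the expected port directory could be computed and listed. It relies on my understanding that the runtime creates its default diagnostic port in the directory named by TMPDIR, falling back to /tmp, with socket names of the form dotnet-diagnostic-{pid}-{key}-socket; the class and method names are hypothetical, and this is not how RecordTrace itself is implemented.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical diagnostic helper; names are illustrative only.
public static class DiagnosticPortLocator
{
    // Directory in which a runtime launched with the given TMPDIR value is
    // expected to create its default diagnostic port. Falls back to /tmp.
    public static string GetPortDirectory(string? tmpDir) =>
        string.IsNullOrEmpty(tmpDir) ? "/tmp" : tmpDir;

    // Lists candidate diagnostic port sockets in the given directory.
    public static IReadOnlyList<string> FindPorts(string directory) =>
        Directory.Exists(directory)
            ? Directory.GetFiles(directory, "dotnet-diagnostic-*-socket")
            : Array.Empty<string>();
}
```

Logging FindPorts(GetPortDirectory(Environment.GetEnvironmentVariable("TMPDIR"))) next to FindPorts("/tmp") from the test would show directly whether the tracee's port is somewhere a /tmp-only scan cannot see.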
