Skip to content

perf(engine): collapse async state-machine layers on hot test path (#5687)#5690

Merged
thomhurst merged 2 commits intomainfrom
perf/5687-async
Apr 24, 2026
Merged

perf(engine): collapse async state-machine layers on hot test path (#5687)#5690
thomhurst merged 2 commits intomainfrom
perf/5687-async

Conversation

@thomhurst
Copy link
Copy Markdown
Owner

Summary

Closes #5687.

Every test traverses 8+ stacked async state machines; four of those layers are pure overhead wrappers with no-op fast paths. This PR short-circuits them on the hot path.

  • RetryHelper fast path (TestCoordinator.cs) — skip RetryHelper.ExecuteWithRetry whenever retryLimit == 0, regardless of whether a timeout is configured. Timeout is already applied per-attempt inside TestExecutor.ExecuteAsync, so the retry wrapper was pure overhead for any test with only a timeout attribute. Previously reported at ~8.3% inclusive CPU.
  • TimeoutHelper wrapper inlined (TimeoutHelper.cs) — drop the outer async method + return true lambda whose only job was to adapt Func<CT, Task> to Func<CT, Task<bool>>. Fold the core straight into the non-generic Task-returning entry point; eliminates one state machine + closure allocation per timed test. Previously ~4.7% inclusive CPU.
  • TCS wrapper collapsed (TestRunner.cs) — stop wrapping ExecuteTestInternalAsync in an outer async method whose only job was fulfilling a TaskCompletionSource. Start the inner ValueTask, fast-path synchronous completion into the TCS with no Task allocation, otherwise attach a synchronous continuation that mirrors Faulted/Canceled/Completed onto the TCS. Previously ~11.9% inclusive / ~0% exclusive — 100% wrapper overhead.
  • Dispose span gated (TestCoordinator.cs) — check TUnitActivitySource.Source.HasListeners() before calling RunWithSpanAsync for test disposal, mirroring the gate already used in TestExecutor.ExecuteAsync. When no OTEL listener is attached, call DisposeTestInstanceCoreAsync directly and skip the per-test Func<Task> closure + state machine allocation.

Test plan

  • dotnet build TUnit.Engine/TUnit.Engine.csproj — builds clean across netstandard2.0;net8.0;net9.0;net10.0, 0 warnings/errors.
  • dotnet test TUnit.UnitTests/TUnit.UnitTests.csproj --framework net10.0 — 180/180 passed.
  • dotnet test TUnit.Engine.Tests/TUnit.Engine.Tests.csproj --framework net10.0 — 146/146 non-subprocess tests passed, 99 skipped. The 2 failures (VB.Test, FSharp.Test) reproduce on origin/main and are pre-existing environmental issues with the VB/F# subprocess test harness — not caused by this change.
  • Targeted retry/timeout engine tests pass.

Correctness notes

  • Timeout + retry: timeout is still applied per attempt inside TestExecutor.ExecuteAsync; RetryHelper still wraps when retryLimit > 0, so tests configured with both timeout and retry retain prior semantics.
  • Exception propagation: ContinueWith in TestRunner runs with ExecuteSynchronously and forwards Faulted/Canceled/Completed states onto the TCS; the returned ValueTask(asTask) surfaces the exception to the immediate awaiter. DisposeTestInstanceCoreAsync swallows exceptions internally, so the span-gate change does not alter observable behavior.

…5687)

Every test traversed 8+ stacked async state machines. Four of those layers
were pure overhead wrappers with no-op fast paths that can short-circuit
the state machine entirely.

- TestCoordinator.ExecuteTestInternalAsync: fast-path past RetryHelper whenever
  retryLimit == 0, regardless of whether a timeout is configured. Timeout is
  already applied inside TestExecutor.ExecuteAsync per attempt, so the retry
  wrapper was pure overhead when no retries are requested.
- TimeoutHelper.ExecuteWithTimeoutAsync: drop the outer async wrapper +
  `return true` lambda that only adapted Func<CT,Task> to Func<CT,Task<bool>>.
  Inline the core directly into the non-generic Task-returning method, which
  removes one state machine + closure allocation per timed test.
- TestRunner.ExecuteTestAsync: stop wrapping ExecuteTestInternalAsync in an
  outer async method whose only job was to fulfil a TaskCompletionSource.
  Start the inner ValueTask, fast-path synchronous completion into the TCS,
  otherwise attach a synchronous continuation that mirrors the outcome.
- TestCoordinator.DisposeTestInstanceWithSpanAsync: gate the OTEL
  RunWithSpanAsync wrapper on TUnitActivitySource.Source.HasListeners().
  When no listener is attached, call DisposeTestInstanceCoreAsync directly
  and skip the Func<Task> closure + state machine.
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is a well-motivated and generally correct PR. The four changes are logically independent, each removing a real layer of async overhead on the hot test path. A few things worth discussing:


TestCoordinator.csretryLimit == 0 fast path (correctness fix bundled with perf)

This is actually a bug fix, not just a perf tweak. The old condition retryLimit == 0 && !testTimeout.HasValue incorrectly routed timeout-but-no-retry tests through the RetryHelper wrapper, which did nothing useful for them. The timeout was always applied inside TestExecutor.ExecuteAsync per-attempt, so the retry wrapper was pure overhead. The new condition is correct.


TimeoutHelper.cs — generic overload collapsed

Clean and correct. The <T> overload existed only to thread a return true through the lambda adapter; inlining it removes one state machine and closure allocation per timed test with no semantic change.


TestRunner.cs — TCS wrapper collapsed ⚠️

The approach is sound, but a few subtle points:

AsTask() allocation in the async path: ValueTask.AsTask() on an async ValueTask method (backed by an IValueTaskSource pool in .NET 6+) allocates a fresh Task<T>. In the non-sync path, you're trading the old wrapper state machine box for a new Task allocation — roughly equivalent cost. The sync fast-path (IsCompletedSuccessfully) is the real win here (zero allocation). Worth being aware of.

Cancellation state behavior change: The original catch-and-rethrow path did tcs.SetException(ex) for all exceptions including OperationCanceledException, leaving the TCS in Faulted state. The new ContinueWith does TrySetCanceled() for the IsCanceled case, leaving it in Canceled state. This means dedup callers (who await existingTcs.Task) get a proper task-cancellation exception rather than a faulted aggregate — actually more semantically correct, but it is a behavior change. Given that ExecuteTestInternalAsync has broad internal exception handling and almost never throws, this is low-risk in practice.

Discarded ContinueWith return value: The Task<bool> returned by ContinueWith is not observed. If the TCS is in an unexpected state (shouldn't happen given the logic), any resulting exception from TrySet* would be silently swallowed. Minor robustness concern, though unlikely to trigger.

An alternative that avoids the AsTask() allocation while keeping a similar structure:

// Option: keep async wrapper but avoid it in the sync fast-path
if (innerTask.IsCompletedSuccessfully)
{
    tcs.SetResult(true);
    return default;
}
// For the async path, fall back to the simple wrapper — the allocation cost
// is the same as the old approach, and you still win on sync-completion.
return WrapAsync(innerTask, tcs);

private static async ValueTask WrapAsync(ValueTask inner, TaskCompletionSource<bool> tcs)
{
    try { await inner.ConfigureAwait(false); tcs.SetResult(true); }
    catch (Exception ex) { tcs.SetException(ex); throw; }
}

This preserves all the sync-path gains, avoids the AsTask() allocation, and restores the original faulted-vs-canceled semantics. The current approach is still correct — just worth considering if allocations on the async path matter.


TestCoordinator.cs — OTEL dispose guard

Correct. Worth noting: RunWithSpanAsync already calls HasListeners() internally (line 144 of TUnitActivitySource.cs), so the outer guard added here is redundant for correctness but avoids the Func<Task> closure allocation before that internal check fires. The PR's logic is right.


Overall

The changes are correct, the comments are clear, and the motivations are sound. The retryLimit fix may be the most impactful correctness change here. The AsTask() allocation and cancellation-state behavior change in TestRunner.cs are worth a second look, but neither is a blocker.

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is a well-motivated and generally correct PR. The four changes are logically independent, each removing a real layer of async overhead on the hot test path. A few things worth discussing:


TestCoordinator.csretryLimit == 0 fast path (correctness fix bundled with perf)

This is actually a bug fix, not just a perf tweak. The old condition retryLimit == 0 && !testTimeout.HasValue incorrectly routed timeout-but-no-retry tests through the RetryHelper wrapper, which did nothing useful for them. The timeout was always applied inside TestExecutor.ExecuteAsync per-attempt, so the retry wrapper was pure overhead. The new condition is correct.


TimeoutHelper.cs — generic overload collapsed

Clean and correct. The <T> overload existed only to thread a return true through the lambda adapter; inlining it removes one state machine and closure allocation per timed test with no semantic change.


TestRunner.cs — TCS wrapper collapsed ⚠️

The approach is sound, but a few subtle points:

AsTask() allocation in the async path: ValueTask.AsTask() on an async ValueTask method (backed by an IValueTaskSource pool in .NET 6+) allocates a fresh Task<T>. In the non-sync path, you're trading the old wrapper state machine box for a new Task allocation — roughly equivalent cost. The sync fast-path (IsCompletedSuccessfully) is the real win here (zero allocation). Worth being aware of.

Cancellation state behavior change: The original catch-and-rethrow path did tcs.SetException(ex) for all exceptions including OperationCanceledException, leaving the TCS in Faulted state. The new ContinueWith does TrySetCanceled() for the IsCanceled case, leaving it in Canceled state. This means dedup callers (who await existingTcs.Task) get a proper task-cancellation exception rather than a faulted aggregate — actually more semantically correct, but it is a behavior change. Given that ExecuteTestInternalAsync has broad internal exception handling and almost never throws, this is low-risk in practice.

Discarded ContinueWith return value: The Task<bool> returned by ContinueWith is not observed. If the TCS is in an unexpected state (shouldn't happen given the logic), any exception from TrySet* would be silently swallowed. Minor robustness concern.

An alternative that avoids the AsTask() allocation while keeping all sync-path gains:

```csharp
// Capture sync fast-path win, then fall back to simple async wrapper for the rest
if (innerTask.IsCompletedSuccessfully)
{
tcs.SetResult(true);
return default;
}
return WrapAsync(innerTask, tcs);

private static async ValueTask WrapAsync(ValueTask inner, TaskCompletionSource tcs)
{
try { await inner.ConfigureAwait(false); tcs.SetResult(true); }
catch (Exception ex) { tcs.SetException(ex); throw; }
}
```

This avoids the AsTask() allocation, restores the original faulted-vs-canceled exception semantics, and the continuation exception is surfaced naturally via the async throw. Still wins on the sync-completion path.


TestCoordinator.cs — OTEL dispose guard

Correct. Worth noting: RunWithSpanAsync already calls HasListeners() internally (line 144 of TUnitActivitySource.cs), so the outer guard here is redundant for correctness but avoids the Func<Task> closure allocation before that internal check fires. The motivation is right.


Overall

The changes are correct, the comments are clear, and the motivations are well-documented. The retryLimit condition fix may be the most impactful correctness improvement. The AsTask() allocation and cancellation-state behavior change in TestRunner.cs are worth a second look — neither is a blocker, but the alternative WrapAsync approach would sidestep both concerns cleanly.

@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented Apr 24, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 0 complexity

Metric Results
Complexity 0

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Replace AsTask()+ContinueWith plumbing with a minimal async WrapAsync
helper so the async path no longer allocates a Task via AsTask().

- No .AsTask() allocation on the async-completion path.
- Restores original Faulted-vs-Canceled semantics (SetException mirrors
  all non-success outcomes; the outer ValueTask still observes failure
  because we rethrow after SetException).
- Removes the discarded ContinueWith return value.
- Sync fast path remains zero-alloc.

CI failure on modularpipeline (ubuntu/macos) is a snapshot mismatch in
TUnit.Mocks.SourceGenerator.Tests (Class_Implementing_Static_Abstract_Interface)
that reproduces on main at the same commit — pre-existing, unrelated to
this PR.
@thomhurst
Copy link
Copy Markdown
Owner Author

Addressed review feedback:

WrapAsync refactor (1ae4da7)

  • Replaced AsTask() + ContinueWith plumbing with a minimal WrapAsync helper.
  • No .AsTask() allocation on the async-completion path.
  • Restores original Faulted-vs-Canceled exception semantics (SetException for all non-success outcomes).
  • Sync fast path remains zero-alloc.
  • Outer ValueTask still observes failure because we rethrow after SetException.

CI modularpipeline (ubuntu/macos) failure
Not caused by this PR. Root cause is a snapshot mismatch in TUnit.Mocks.SourceGenerator.Tests (Class_Implementing_Static_Abstract_Interface) that reproduces on main at the same commit (see run 24864304955 on main, which fails on all three OSes with the identical Snapshot mismatch error). This is pre-existing, related to the recent MockBridge/static-abstract work, and independent of the async engine changes here.

Local TUnit.Engine.Tests (net10.0) and TUnit.UnitTests (net10.0) both green after the refactor (247 + 180 tests respectively).

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review (follow-up)

The previous review concerns have all been addressed. Here's the updated assessment.


Previous concerns — resolved ✓

** — / allocation concern**: The PR has replaced with the pattern suggested in the last review. This eliminates the AsTask() allocation and the discarded continuation return value, and restores the original faulted-state semantics (tcs.SetException(ex) for all exceptions, including OperationCanceledException, rather than TrySetCanceled()). Good call.


Remaining observations

**Comment accuracy in **: The inline comment says "without allocating a Task via AsTask()". That's true, but an async ValueTask method still allocates a state machine object in the non-synchronous path (pooled on .NET 6+, but still allocated). The real win is the sync fast-path (IsCompletedSuccessfully) which avoids the state machine entirely. The comment slightly overstates what the async path saves — minor, but worth clarifying so readers don't have incorrect expectations about allocation on the non-sync path.

** return type**: Changing from async ValueTask to Task is correct and avoids the unnecessary state machine allocation. The caller already awaits it, so this is a transparent change. No issues.

#if NET placement: The fallthrough return DisposeTestInstanceCoreAsync(test) correctly sits outside the #if NET block, covering both NET and netstandard2.0 targets. This is a subtle layout to get right and it's done correctly.


Overall

This is a clean, well-motivated perf PR with strong correctness guarantees preserved. All four hot-path changes are independently correct and the previous review concerns have been addressed cleanly. The retryLimit == 0 condition change remains the most impactful correctness fix here — the sync fast-path in TestRunner is the most impactful allocation win. LGTM.

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review (follow-up)

The previous review concerns have all been addressed. Here is the updated assessment.


Previous concerns — resolved

TestRunner.cs — ContinueWith / AsTask() allocation concern: The PR has replaced ContinueWith with the async ValueTask WrapAsync(ValueTask inner, ...) pattern suggested in the last review. This eliminates the AsTask() allocation and the discarded continuation return value, and restores the original faulted-state semantics (tcs.SetException(ex) for all exceptions, including OperationCanceledException, rather than TrySetCanceled()). Good call.


Remaining observations

Comment accuracy in ExecuteTestAsync: The inline comment says "without allocating a Task via AsTask()". That is true, but an async ValueTask method still allocates a state machine object in the non-synchronous path (pooled on .NET 6+, but still allocated). The real win is the sync fast-path (IsCompletedSuccessfully) which avoids the state machine entirely. The comment slightly overstates what the async path saves — minor, but worth clarifying so readers do not have incorrect expectations about allocations on the non-sync path.

DisposeTestInstanceWithSpanAsync return type: Changing from async ValueTask to Task is correct and avoids the unnecessary state machine allocation. The caller already awaits it, so this is a transparent change. No issues.

#if NET placement: The fallthrough return DisposeTestInstanceCoreAsync(test) correctly sits outside the #if NET block, covering both NET and netstandard2.0 targets. This is a subtle layout to get right and it is done correctly.


Overall

This is a clean, well-motivated performance PR with correctness guarantees preserved throughout. All four hot-path changes are independently correct and the previous review concerns have been addressed cleanly. The retryLimit == 0 condition change is the most impactful correctness fix (previously, timeout-but-no-retry tests were routed through RetryHelper for no reason). The sync fast-path in TestRunner is the most impactful allocation win. LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf(engine): collapse async state-machine layers on hot test path

1 participant