perf: reduce async state machine overhead in test execution pipeline by thomhurst · Pull Request #5214 · thomhurst/TUnit

thomhurst · 2026-03-22T12:14:53Z

Summary

TestCoordinator: Replace async lambda wrapper with direct method group reference for ExecuteWithRetry, eliminating state machine + closure allocation
RetryHelper: Change ExecuteWithRetry to accept Func<ValueTask> instead of Func<Task> to avoid adapter allocation; cache Task.FromResult results as static readonly fields in ShouldRetry
TestBuilder / TestBuilderPipeline: Cache DateTimeOffset.UtcNow where two adjacent reads occurred to ensure consistent Start/End timestamps and eliminate redundant syscalls
DiscoveryCircuitBreaker: Replace Stopwatch instance with Stopwatch.GetTimestamp() / Stopwatch.GetElapsedTime() on .NET 8+ to avoid object allocation
TestExecutor: Remove async/await in discovery hook methods, returning tasks directly

Rationale

Profiling shows ~3.8% exclusive CPU in AsyncMethodBuilderCore.Start (2.59%) and AsyncLocalValueMap.Set (1.21%) from deep async call chains, plus 1.16% in DateTime.get_UtcNow. These changes reduce async state machine overhead and unnecessary clock reads.

Test plan

Verify all existing tests pass (no behavioral changes)
Confirm build succeeds across all target frameworks

- Elide async/await in forwarding methods (TestExecutor discovery hooks, TestMethodInvoker, RetryHelper.ShouldRetry/ApplyBackoffDelay) to avoid unnecessary state machine allocations - Replace async lambda wrappers with direct ValueTask construction in TestExecutor.ExecuteTestAsync and TestCoordinator retry path - Change RetryHelper.ExecuteWithRetry to accept Func<ValueTask> instead of Func<Task> to avoid ValueTask-to-Task conversion on the retry path - Cache DateTimeOffset.UtcNow in TestStateManager.MarkFailed, TestBuilder, and TestBuilderPipeline to avoid redundant clock reads - Replace Stopwatch instance with Stopwatch.GetTimestamp() / Stopwatch.GetElapsedTime() in DiscoveryCircuitBreaker on .NET 8+ (falls back to Stopwatch on netstandard2.0)

claude

Code Review

Well-structured performance PR. The changes are correct and the overall direction is good — reducing state machine allocations in hot paths is the right focus. A few observations worth discussing.

Solid wins

****: Cleanest change in the PR. Replacing with is a genuine allocation improvement (eliminates state machine + closure), and it works perfectly now that ExecuteWithRetry accepts Func<ValueTask>.
** / **: Caching DateTimeOffset.UtcNow where two adjacent reads occurred is correct — it ensures consistent Start/End timestamps for failed tests and eliminates a syscall.
DiscoveryCircuitBreaker.cs: The Stopwatch.GetTimestamp() / Stopwatch.GetElapsedTime() approach on #if NET avoids the Stopwatch object allocation. The #if NET symbol is the right choice here (covers net8.0+, not netstandard2.0).

Issues / Questions

1. — no-op caching

The PR description says "where the same timestamp was read twice in adjacent lines", but there's only one call in (the line). The cached variable is only used once, so this change has no effect on performance. The in the initializer reads , not . This could be removed to keep the diff clean, or the PR description should be corrected.

2. discovery hooks — allocation concern

is and iterates hooks — it's virtually never synchronously complete. Calling on an incomplete allocates a object internally. The previous wrapper allocated a state machine instead. These are roughly allocation-equivalent for the common (non-trivially-complete) case, so this is more of a style change than a perf win for discovery hooks. If these callers were themselves, the step could be avoided entirely. Not a blocker, but worth being clear about what's actually saved here.

3. — Task→ValueTask→Task round-trip

returns . Wrapping it in is correct (no copy, just a wrapper). Then on an incomplete allocates again. The original wrapper path also allocated a state machine for this branch. So this is allocation-neutral in the common case — not a regression, but the PR description's claim of "avoiding a state machine + closure allocation" applies more to than here.

The non-executor path () is a genuine win — no allocation at all.

4. — allocates on every retry

allocates a new each call. In the retry path this is invoked on each exception. A cached alternative would avoid this:

This is a micro-optimization, but since the original was (which also allocated a state machine), neither approach is particularly allocation-heavy. Low priority.

Minor

and both contain identical logic for the branch. This pre-existing duplication is unrelated to this PR but could be a future consolidation target.
The null-check removal ( → guarded by ) is correct since is always initialized in that branch.

Overall this is a clean, well-motivated PR. The and caching changes are the clearest wins. The async elision changes are correct but the allocation benefit is more nuanced than described for the discovery hook and paths.

claude

Code Review

Well-structured performance PR. The changes are correct and the overall direction is good — reducing state machine allocations in hot paths is the right focus. A few observations worth discussing.

Solid wins

TestCoordinator.cs: Cleanest change in the PR. Replacing the async lambda wrapper with a direct method group reference is a genuine allocation improvement (eliminates state machine + closure), and works perfectly now that ExecuteWithRetry accepts Func<ValueTask>.
TestBuilder.cs / TestBuilderPipeline.cs: Caching DateTimeOffset.UtcNow where two adjacent reads occurred is correct — ensures consistent Start/End timestamps for failed tests and eliminates a redundant syscall.
DiscoveryCircuitBreaker.cs: The Stopwatch.GetTimestamp() / Stopwatch.GetElapsedTime() approach on #if NET avoids the Stopwatch object allocation. The #if NET symbol is the right choice (covers net8.0+, not netstandard2.0).

Issues / Questions

1. TestStateManager.MarkFailed — no-op caching

The PR description says "where the same timestamp was read twice in adjacent lines", but there is only one DateTimeOffset.UtcNow call in MarkFailed (the ??= line). The cached now variable is only used once, and test.Result uses test.EndTime (not another UtcNow call). This change has zero performance impact and could be removed to keep the diff clean, or the PR description should be corrected.

2. TestExecutor.cs discovery hooks — .AsTask() allocation concern

HookExecutor.ExecuteBeforeTestDiscoveryHooksAsync is async ValueTask and iterates hooks — it is virtually never synchronously complete. Calling .AsTask() on an incomplete ValueTask allocates a Task object internally. The previous async Task wrapper allocated a state machine instead. These are roughly allocation-equivalent in the common case, so this is more of a style change than a clear perf win for discovery hooks. If the callers were changed to async ValueTask, the .AsTask() step could be eliminated entirely. Not a blocker, but worth being clear about what is actually saved here.

3. TestMethodInvoker.cs — Task→ValueTask→Task round-trip

test.InvokeTestAsync(...) returns Task. Wrapping it in new ValueTask(task) is correct (just a struct wrapper, no copy). Then .AsTask() on an incomplete ValueTask allocates again. The original async wrapper path also allocated a state machine for this branch. This is allocation-neutral in the common case — the PR description's claim of "avoiding a state machine + closure allocation" applies more to TestCoordinator.cs than here.

The non-executor fast path (return test.InvokeTestAsync(...) directly) is a genuine win — zero overhead.

4. ShouldRetry — Task.FromResult allocates on every call

Task.FromResult<T> allocates a new Task<bool> on each invocation. Since the original async Task<bool> also allocated a state machine, neither is zero-cost. A simple improvement:

private static readonly Task<bool> s_shouldRetryTrue = Task.FromResult(true);
private static readonly Task<bool> s_shouldRetryFalse = Task.FromResult(false);

Low priority micro-optimization, but it is cheap to add and eliminates the allocation on the retry hot path.

Minor

TestMethodInvoker and TestExecutor.ExecuteTestAsync contain identical logic for the testExecutor branch. Pre-existing duplication, not introduced here, but a future consolidation target.
The DiscoveryCircuitBreaker.Dispose() null-check removal (_stopwatch?.Stop() → _stopwatch.Stop() guarded by #if !NET) is correct since _stopwatch is always initialized in that branch.

Overall a clean, well-motivated PR. The TestCoordinator.cs simplification and DateTimeOffset caching changes in TestBuilder/TestBuilderPipeline are the clearest wins. The async-elision changes are correct but the allocation benefit is more nuanced than described for the discovery hook and TestMethodInvoker paths. Looks good to merge.

claude

Code Review (Follow-up)

This review compares against the previous Claude review to check which concerns have been addressed.

What's been addressed ✅

Issue #4 — ShouldRetry caching: The suggestion to cache Task.FromResult(true/false) as static readonly fields has been implemented:

private static readonly Task<bool> s_shouldRetryTrue = Task.FromResult(true);
private static readonly Task<bool> s_shouldRetryFalse = Task.FromResult(false);

This is a real improvement — eliminates one Task<bool> allocation per retry on both early-exit paths. Good addition.

Still open from previous review

Issue #1 — PR description inaccuracy: The description still says "Cache DateTimeOffset.UtcNow in TestStateManager.MarkFailed", but TestStateManager.cs is not in the diff. Worth correcting the description to avoid confusing reviewers.

Issue #2 — .AsTask() allocation for discovery hooks: ExecuteBeforeTestDiscoveryHooksAsync / ExecuteAfterTestDiscoveryHooksAsync in TestExecutor.cs convert a (nearly always incomplete) ValueTask to Task via .AsTask(), which allocates a Task object. The original async Task wrappers also allocated a state machine. These are roughly allocation-equivalent — this is a style change, not a perf win. Not a blocker, but the PR description shouldn't count this as a reduction in state machine overhead.

Issue #3 — TestMethodInvoker Task→ValueTask→Task round-trip: The new ValueTask(task).AsTask() path is still present. Same concern as above — the fast path (no executor) is a real win; the executor path is allocation-neutral, not an improvement.

New observations

DiscoveryCircuitBreaker.Dispose() — null-check removal correctness: The change from _stopwatch?.Stop() to _stopwatch.Stop() (under #if !NET) is safe because _stopwatch is assigned in the constructor on that branch. However, DiscoveryCircuitBreaker is sealed, so there's no risk of a derived class calling Dispose() before the constructor runs. Correct as written.

GetResourceUsage — minor improvement: Caching elapsed to avoid calling GetElapsed() twice in the same method is a small but clean win, independent of the #if NET optimization.

Summary

The s_shouldRetryTrue/False caching addition is a concrete improvement over the previous version. The TestCoordinator.cs lambda-to-method-group change and TestBuilder/TestBuilderPipeline timestamp caching remain the clearest wins in the PR. The remaining open items (#1, #2, #3) from the prior review are non-blocking. This looks good to merge as-is, with a minor suggestion to correct the PR description.

claude

Code Review (Follow-up on latest commit)

This reviews the latest commit 74e63ecc7b33 perf: eliminate unnecessary Task/ValueTask round-trips against the previous Claude review.

All prior issues addressed ✅

Issue #2 — .AsTask() allocation for discovery hooks: Now fixed. ExecuteBeforeTestDiscoveryHooksAsync and ExecuteAfterTestDiscoveryHooksAsync in TestExecutor.cs now return ValueTask directly (pass-through from _hookExecutor), eliminating the .AsTask() heap allocation. This is a genuine perf improvement over the previous version.

Issue #3 — TestMethodInvoker Task→ValueTask→Task round-trip: Now fixed. The return type is changed to ValueTask and the .AsTask() call in the executor branch is gone. The fast path wraps Task in new ValueTask(task) (struct, no heap allocation). Clean.

TestExecutor.cs — new ValueTask<List<Exception>>(task.AsTask()) wrappers: All four occurrences are gone. _hookExecutor.ExecuteAfterTestSessionHooksAsync and ExecuteAfterAssemblyHooksAsync now return ValueTask<List<Exception>> directly to their callers, skipping the unnecessary Task conversion. This is the most impactful change in the latest commit.

Remaining minor items (non-blocking)

Missing newline at EOF in TestMethodInvoker.cs: The No newline at end of file marker is still present. Minor, but worth a one-character fix.
Pre-existing duplication: TestMethodInvoker.InvokeTestAsync and TestExecutor.ExecuteTestAsync still both contain the testExecutor.ExecuteTest(...) pattern independently. Not introduced by this PR — a future consolidation opportunity.

Summary

The three-commit arc of this PR is well-structured: initial optimization → fix no-op + cache Task.FromResult → eliminate .AsTask() round-trips. Each commit responds directly to review feedback. The final state is meaningfully better than the original in all targeted paths. Ready to merge.

thomhurst had a problem deploying to Pull Requests March 22, 2026 12:14 — with GitHub Actions Failure

thomhurst temporarily deployed to Pull Requests March 22, 2026 12:14 — with GitHub Actions Inactive

claude bot reviewed Mar 22, 2026

View reviewed changes

fix: revert no-op caching and cache Task.FromResult in ShouldRetry

437d1f2

thomhurst had a problem deploying to Pull Requests March 22, 2026 12:22 — with GitHub Actions Failure

thomhurst temporarily deployed to Pull Requests March 22, 2026 12:22 — with GitHub Actions Inactive

claude bot reviewed Mar 22, 2026

View reviewed changes

perf: eliminate unnecessary Task/ValueTask round-trips

74e63ec

thomhurst temporarily deployed to Pull Requests March 22, 2026 12:43 — with GitHub Actions Inactive

thomhurst had a problem deploying to Pull Requests March 22, 2026 12:43 — with GitHub Actions Failure

claude bot reviewed Mar 22, 2026

View reviewed changes

thomhurst enabled auto-merge (squash) March 22, 2026 12:45

thomhurst merged commit 3b7399e into main Mar 22, 2026
14 of 15 checks passed

thomhurst deleted the perf/reduce-async-overhead branch March 22, 2026 13:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf: reduce async state machine overhead in test execution pipeline#5214

perf: reduce async state machine overhead in test execution pipeline#5214
thomhurst merged 3 commits intomainfrom
perf/reduce-async-overhead

thomhurst commented Mar 22, 2026 •

edited

Loading

Uh oh!

claude bot left a comment

Uh oh!

claude bot left a comment

Uh oh!

claude bot left a comment

Uh oh!

claude bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

thomhurst commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Rationale

Test plan

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Code Review

Solid wins

Issues / Questions

Minor

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Code Review

Solid wins

Issues / Questions

Minor

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Code Review (Follow-up)

What's been addressed ✅

Still open from previous review

New observations

Summary

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Code Review (Follow-up on latest commit)

All prior issues addressed ✅

Remaining minor items (non-blocking)

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

thomhurst commented Mar 22, 2026 •

edited

Loading