perf: reduce async state machine overhead in test execution pipeline#5214
perf: reduce async state machine overhead in test execution pipeline#5214
Conversation
- Elide async/await in forwarding methods (TestExecutor discovery hooks, TestMethodInvoker, RetryHelper.ShouldRetry/ApplyBackoffDelay) to avoid unnecessary state machine allocations - Replace async lambda wrappers with direct ValueTask construction in TestExecutor.ExecuteTestAsync and TestCoordinator retry path - Change RetryHelper.ExecuteWithRetry to accept Func<ValueTask> instead of Func<Task> to avoid ValueTask-to-Task conversion on the retry path - Cache DateTimeOffset.UtcNow in TestStateManager.MarkFailed, TestBuilder, and TestBuilderPipeline to avoid redundant clock reads - Replace Stopwatch instance with Stopwatch.GetTimestamp() / Stopwatch.GetElapsedTime() in DiscoveryCircuitBreaker on .NET 8+ (falls back to Stopwatch on netstandard2.0)
There was a problem hiding this comment.
Code Review
Well-structured performance PR. The changes are correct and the overall direction is good — reducing state machine allocations in hot paths is the right focus. A few observations worth discussing.
Solid wins
-
****: Cleanest change in the PR. Replacing with is a genuine allocation improvement (eliminates state machine + closure), and it works perfectly now that
ExecuteWithRetryacceptsFunc<ValueTask>. -
** / **: Caching
DateTimeOffset.UtcNowwhere two adjacent reads occurred is correct — it ensures consistentStart/Endtimestamps for failed tests and eliminates a syscall. -
DiscoveryCircuitBreaker.cs: TheStopwatch.GetTimestamp()/Stopwatch.GetElapsedTime()approach on#if NETavoids theStopwatchobject allocation. The#if NETsymbol is the right choice here (covers net8.0+, not netstandard2.0).
Issues / Questions
1. — no-op caching
The PR description says "where the same timestamp was read twice in adjacent lines", but there's only one call in (the line). The cached variable is only used once, so this change has no effect on performance. The in the initializer reads , not . This could be removed to keep the diff clean, or the PR description should be corrected.
2. discovery hooks — allocation concern
is and iterates hooks — it's virtually never synchronously complete. Calling on an incomplete allocates a object internally. The previous wrapper allocated a state machine instead. These are roughly allocation-equivalent for the common (non-trivially-complete) case, so this is more of a style change than a perf win for discovery hooks. If these callers were themselves, the step could be avoided entirely. Not a blocker, but worth being clear about what's actually saved here.
3. — Task→ValueTask→Task round-trip
returns . Wrapping it in is correct (no copy, just a wrapper). Then on an incomplete allocates again. The original wrapper path also allocated a state machine for this branch. So this is allocation-neutral in the common case — not a regression, but the PR description's claim of "avoiding a state machine + closure allocation" applies more to than here.
The non-executor path () is a genuine win — no allocation at all.
4. — allocates on every retry
allocates a new each call. In the retry path this is invoked on each exception. A cached alternative would avoid this:
This is a micro-optimization, but since the original was (which also allocated a state machine), neither approach is particularly allocation-heavy. Low priority.
Minor
- and both contain identical logic for the branch. This pre-existing duplication is unrelated to this PR but could be a future consolidation target.
- The null-check removal ( → guarded by ) is correct since is always initialized in that branch.
Overall this is a clean, well-motivated PR. The and caching changes are the clearest wins. The async elision changes are correct but the allocation benefit is more nuanced than described for the discovery hook and paths.
There was a problem hiding this comment.
Code Review
Well-structured performance PR. The changes are correct and the overall direction is good — reducing state machine allocations in hot paths is the right focus. A few observations worth discussing.
Solid wins
-
TestCoordinator.cs: Cleanest change in the PR. Replacing the async lambda wrapper with a direct method group reference is a genuine allocation improvement (eliminates state machine + closure), and works perfectly now that
ExecuteWithRetryacceptsFunc<ValueTask>. -
TestBuilder.cs / TestBuilderPipeline.cs: Caching
DateTimeOffset.UtcNowwhere two adjacent reads occurred is correct — ensures consistent Start/End timestamps for failed tests and eliminates a redundant syscall. -
DiscoveryCircuitBreaker.cs: The
Stopwatch.GetTimestamp()/Stopwatch.GetElapsedTime()approach on#if NETavoids theStopwatchobject allocation. The#if NETsymbol is the right choice (covers net8.0+, not netstandard2.0).
Issues / Questions
1. TestStateManager.MarkFailed — no-op caching
The PR description says "where the same timestamp was read twice in adjacent lines", but there is only one DateTimeOffset.UtcNow call in MarkFailed (the ??= line). The cached now variable is only used once, and test.Result uses test.EndTime (not another UtcNow call). This change has zero performance impact and could be removed to keep the diff clean, or the PR description should be corrected.
2. TestExecutor.cs discovery hooks — .AsTask() allocation concern
HookExecutor.ExecuteBeforeTestDiscoveryHooksAsync is async ValueTask and iterates hooks — it is virtually never synchronously complete. Calling .AsTask() on an incomplete ValueTask allocates a Task object internally. The previous async Task wrapper allocated a state machine instead. These are roughly allocation-equivalent in the common case, so this is more of a style change than a clear perf win for discovery hooks. If the callers were changed to async ValueTask, the .AsTask() step could be eliminated entirely. Not a blocker, but worth being clear about what is actually saved here.
3. TestMethodInvoker.cs — Task→ValueTask→Task round-trip
test.InvokeTestAsync(...) returns Task. Wrapping it in new ValueTask(task) is correct (just a struct wrapper, no copy). Then .AsTask() on an incomplete ValueTask allocates again. The original async wrapper path also allocated a state machine for this branch. This is allocation-neutral in the common case — the PR description's claim of "avoiding a state machine + closure allocation" applies more to TestCoordinator.cs than here.
The non-executor fast path (return test.InvokeTestAsync(...) directly) is a genuine win — zero overhead.
4. ShouldRetry — Task.FromResult allocates on every call
Task.FromResult<T> allocates a new Task<bool> on each invocation. Since the original async Task<bool> also allocated a state machine, neither is zero-cost. A simple improvement:
private static readonly Task<bool> s_shouldRetryTrue = Task.FromResult(true);
private static readonly Task<bool> s_shouldRetryFalse = Task.FromResult(false);Low priority micro-optimization, but it is cheap to add and eliminates the allocation on the retry hot path.
Minor
TestMethodInvokerandTestExecutor.ExecuteTestAsynccontain identical logic for thetestExecutorbranch. Pre-existing duplication, not introduced here, but a future consolidation target.- The
DiscoveryCircuitBreaker.Dispose()null-check removal (_stopwatch?.Stop()→_stopwatch.Stop()guarded by#if !NET) is correct since_stopwatchis always initialized in that branch.
Overall a clean, well-motivated PR. The TestCoordinator.cs simplification and DateTimeOffset caching changes in TestBuilder/TestBuilderPipeline are the clearest wins. The async-elision changes are correct but the allocation benefit is more nuanced than described for the discovery hook and TestMethodInvoker paths. Looks good to merge.
There was a problem hiding this comment.
Code Review (Follow-up)
This review compares against the previous Claude review to check which concerns have been addressed.
What's been addressed ✅
Issue #4 — ShouldRetry caching: The suggestion to cache Task.FromResult(true/false) as static readonly fields has been implemented:
private static readonly Task<bool> s_shouldRetryTrue = Task.FromResult(true);
private static readonly Task<bool> s_shouldRetryFalse = Task.FromResult(false);This is a real improvement — eliminates one Task<bool> allocation per retry on both early-exit paths. Good addition.
Still open from previous review
Issue #1 — PR description inaccuracy: The description still says "Cache DateTimeOffset.UtcNow in TestStateManager.MarkFailed", but TestStateManager.cs is not in the diff. Worth correcting the description to avoid confusing reviewers.
Issue #2 — .AsTask() allocation for discovery hooks: ExecuteBeforeTestDiscoveryHooksAsync / ExecuteAfterTestDiscoveryHooksAsync in TestExecutor.cs convert a (nearly always incomplete) ValueTask to Task via .AsTask(), which allocates a Task object. The original async Task wrappers also allocated a state machine. These are roughly allocation-equivalent — this is a style change, not a perf win. Not a blocker, but the PR description shouldn't count this as a reduction in state machine overhead.
Issue #3 — TestMethodInvoker Task→ValueTask→Task round-trip: The new ValueTask(task).AsTask() path is still present. Same concern as above — the fast path (no executor) is a real win; the executor path is allocation-neutral, not an improvement.
New observations
DiscoveryCircuitBreaker.Dispose() — null-check removal correctness: The change from _stopwatch?.Stop() to _stopwatch.Stop() (under #if !NET) is safe because _stopwatch is assigned in the constructor on that branch. However, DiscoveryCircuitBreaker is sealed, so there's no risk of a derived class calling Dispose() before the constructor runs. Correct as written.
GetResourceUsage — minor improvement: Caching elapsed to avoid calling GetElapsed() twice in the same method is a small but clean win, independent of the #if NET optimization.
Summary
The s_shouldRetryTrue/False caching addition is a concrete improvement over the previous version. The TestCoordinator.cs lambda-to-method-group change and TestBuilder/TestBuilderPipeline timestamp caching remain the clearest wins in the PR. The remaining open items (#1, #2, #3) from the prior review are non-blocking. This looks good to merge as-is, with a minor suggestion to correct the PR description.
There was a problem hiding this comment.
Code Review (Follow-up on latest commit)
This reviews the latest commit 74e63ecc7b33 perf: eliminate unnecessary Task/ValueTask round-trips against the previous Claude review.
All prior issues addressed ✅
Issue #2 — .AsTask() allocation for discovery hooks: Now fixed. ExecuteBeforeTestDiscoveryHooksAsync and ExecuteAfterTestDiscoveryHooksAsync in TestExecutor.cs now return ValueTask directly (pass-through from _hookExecutor), eliminating the .AsTask() heap allocation. This is a genuine perf improvement over the previous version.
Issue #3 — TestMethodInvoker Task→ValueTask→Task round-trip: Now fixed. The return type is changed to ValueTask and the .AsTask() call in the executor branch is gone. The fast path wraps Task in new ValueTask(task) (struct, no heap allocation). Clean.
TestExecutor.cs — new ValueTask<List<Exception>>(task.AsTask()) wrappers: All four occurrences are gone. _hookExecutor.ExecuteAfterTestSessionHooksAsync and ExecuteAfterAssemblyHooksAsync now return ValueTask<List<Exception>> directly to their callers, skipping the unnecessary Task conversion. This is the most impactful change in the latest commit.
Remaining minor items (non-blocking)
- Missing newline at EOF in
TestMethodInvoker.cs: TheNo newline at end of filemarker is still present. Minor, but worth a one-character fix. - Pre-existing duplication:
TestMethodInvoker.InvokeTestAsyncandTestExecutor.ExecuteTestAsyncstill both contain thetestExecutor.ExecuteTest(...)pattern independently. Not introduced by this PR — a future consolidation opportunity.
Summary
The three-commit arc of this PR is well-structured: initial optimization → fix no-op + cache Task.FromResult → eliminate .AsTask() round-trips. Each commit responds directly to review feedback. The final state is meaningfully better than the original in all targeted paths. Ready to merge.
Summary
ExecuteWithRetry, eliminating state machine + closure allocationExecuteWithRetryto acceptFunc<ValueTask>instead ofFunc<Task>to avoid adapter allocation; cacheTask.FromResultresults as static readonly fields inShouldRetryDateTimeOffset.UtcNowwhere two adjacent reads occurred to ensure consistent Start/End timestamps and eliminate redundant syscallsStopwatchinstance withStopwatch.GetTimestamp()/Stopwatch.GetElapsedTime()on .NET 8+ to avoid object allocationRationale
Profiling shows ~3.8% exclusive CPU in
AsyncMethodBuilderCore.Start(2.59%) andAsyncLocalValueMap.Set(1.21%) from deep async call chains, plus 1.16% inDateTime.get_UtcNow. These changes reduce async state machine overhead and unnecessary clock reads.Test plan