perf(engine): reduce message-bus contention on test start (#5685)#5695
perf(engine): reduce message-bus contention on test start (#5685)#5695
Conversation
Two fixes targeting the 12.95% exclusive CPU cost in WaitHandle.WaitOneNoCheck observed in ~1000-test profiles: 1. InProgress is now awaited in TestCoordinator.ExecuteTestInternalAsync instead of fire-and-forget. Back-pressure from MTP's bounded AsynchronousMessageBus channel now spreads publishes across each test task rather than fanning 1000+ concurrent PublishAsync writers into the channel at once. Ordering is preserved (InProgress still published before the terminal Passed/Failed/Skipped/Cancelled message per test). 2. EventReceiverOrchestrator gains RegisterClassInstanceReceiver, a targeted single-object registration used by TestInitializer.PrepareTest in place of the previous full-iteration RegisterReceivers call. Only the freshly created ClassInstance can newly become an event receiver between the initial registration (pre-instance) and PrepareTest (post-instance), so the second iteration over attributes/arguments/etc. was pure overhead.
There was a problem hiding this comment.
Code Review
Good, well-targeted performance fix. The profiling motivation (12.95% exclusive CPU on WaitHandle.WaitOneNoCheck) is clearly stated and the changes are appropriately surgical. A few things to flag:
Fix 1: Awaiting InProgress ✅
The logic is correct. The old fire-and-forget pattern was hammering MTP's bounded channel with 1000+ concurrent writers at startup; awaiting provides natural back-pressure. Worth acknowledging the trade-off though: individual test tasks now block momentarily on the InProgress publish before executing, so wall-clock time for the first few tests could be slightly higher when the channel is congested. That's the correct trade-off for overall throughput, but it's worth a note in the summary since the original comment explicitly said "it doesn't need to block test execution."
Fix 2: RegisterClassInstanceReceiver ✅ (with one style concern)
The behavioral parity is correct — the dedup invariants (_initializedObjects, _registeredFirstEventReceiverTypes) are preserved, and both data structures are ConcurrentHashSet<> so the new code is thread-safe. The single-return vs continue difference from RegisterReceivers is semantically identical here since there's only one object.
One subtle correctness note: The method silently succeeds without registering if classInstance is a IFirstTest* receiver but the type is already in _registeredFirstEventReceiverTypes. This matches the existing RegisterReceivers contract, but it means a shared class instance that implements one of those interfaces will only get registered once across tests in the same class — which is presumably the intended dedup behaviour.
Comment style concern
Per project conventions (CLAUDE.md), code comments should explain the why when it's non-obvious, but should not reference issue numbers or the current task/fix since those belong in the PR description and go stale. Several of the new comments violate this:
// in TestCoordinator.cs
// A second, targeted registration for the freshly created ClassInstance happens
// in TestInitializer.PrepareTest (#5685).
// in RegisterClassInstanceReceiver XML doc
/// ... See #5685.These cross-references are valuable in the PR description (where they already live), but in the code they'll become meaningless noise once the issue is closed. The XML <summary> on RegisterClassInstanceReceiver is also unusually verbose for this codebase — the why here is that ClassInstance is null on the first RegisterReceivers call, which a single line captures:
// ClassInstance is null at the initial RegisterReceivers call, so only it needs registering here.
public void RegisterClassInstanceReceiver(TestContext context)The inline comment in TestCoordinator.cs can simply be removed — the code is self-explanatory once RegisterClassInstanceReceiver has a reasonable name.
Minor: dead parameter in RegisterReceivers
Not introduced by this PR, but cancellationToken in RegisterReceivers(TestContext context, CancellationToken cancellationToken) is never used in the method body. Worth a follow-up cleanup to remove it (and the corresponding _eventReceiverOrchestrator.RegisterReceivers(test.Context, cancellationToken) call site).
Summary: The logic is sound and the fix is correct. The only actionable feedback is trimming the issue-reference comments to match project conventions. The performance reasoning and test coverage are solid.
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 6 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Good performance work here. Both changes are clearly motivated by real profiling data and the PR description is exemplary. A few observations:
Fix 1 — Awaiting InProgress (TestCoordinator.cs)
This is sound. The fire-and-forget was the correct culprit for the WaitHandle.WaitOneNoCheck spike: 1000+ concurrent tasks hammering a single bounded channel at startup. Awaiting it gives each task natural back-pressure and spreads the write load across the run duration. The comment is accurate and the change is minimal.
One thing worth confirming: ITUnitMessageBus.InProgress is documented as informational, so the ordering guarantee (InProgress completes before terminal message) was already implicit. The await makes it explicit. No issue here, just good to call out that this is a correctness preservation, not a correctness fix.
Fix 2 — RegisterClassInstanceReceiver (EventReceiverOrchestrator.cs)
The logic is correct for the happy path and the dedup invariants (_initializedObjects, _registeredFirstEventReceiverTypes) are properly honored. A few things to think about:
Implicit assumption about "what can be new"
The new method hard-codes the assumption that only ClassInstance is new between the first RegisterReceivers call and PrepareTest. That assumption is documented in the comment, but it's not enforced anywhere structurally — if a future refactoring moves instance creation earlier, or adds another eligible object that is created post-RegisterReceivers, this method will silently miss it. The old RegisterReceivers call was naturally defensive because it iterated the full set and relied on dedup; the new targeted method is fragile to changes in the eligible-object lifecycle.
Consider whether a Debug.Assert(context.Metadata.TestDetails.ClassInstance != null) (to document the contract), or at minimum an XML doc comment on RegisterClassInstanceReceiver describing the invariant, would make this safer.
ILastTestIn* receivers on ClassInstance are unaffected
RegisterReceiverInternal inside EventReceiverRegistry handles ILastTestInTestSessionEventReceiver, ILastTestInAssemblyEventReceiver, and ILastTestInClassEventReceiver. The new RegisterClassInstanceReceiver calls _registry.RegisterReceiver(classInstance) which ultimately calls RegisterReceiverInternal, so those are covered correctly — but the type-gate in RegisterClassInstanceReceiver only checks for the "First" variants when deciding to _registeredFirstEventReceiverTypes.Add. That is intentional and correct (Last receivers don't use _registeredFirstEventReceiverTypes), but the asymmetry is non-obvious. A brief inline note would help future readers.
ITestStartEventReceiver / ITestEndEventReceiver / ITestSkippedEventReceiver on ClassInstance
If a test class implements ITestStartEventReceiver or any of the non-First/Last receivers, RegisterClassInstanceReceiver still handles them correctly via _registry.RegisterReceiver — but again, the check at the top of the method (IFirstTestIn* gate) only guards the "First" dedup path. Everything else falls through to the registry. This is fine but worth a second read for anyone maintaining this code.
Missing ITestDiscoveryEventReceiver on ClassInstance
GetEligibleEventObjects includes ClassInstance in the eligible set, and RegisterReceivers would therefore handle it if the class implements ITestDiscoveryEventReceiver. The new RegisterClassInstanceReceiver also goes through the same _registry.RegisterReceiver, so this is covered. Fine.
Minor: RegisterReceivers CancellationToken parameter is unused
RegisterReceivers(TestContext, CancellationToken) accepts a CancellationToken but doesn't use it. This was pre-existing, but since TestInitializer.PrepareTest no longer calls it, the CancellationToken that was being passed there is now just silently discarded. Worth tidying up RegisterReceivers's signature or adding _ = cancellationToken in a follow-up.
Summary
Both fixes are well-reasoned and the test coverage (especially FirstEventReceiversRegressionTest.EventReceiversCalledOncePerScope) validates the dedup invariant that the second change depends on. The main architectural concern is that RegisterClassInstanceReceiver encodes a lifecycle assumption that could become a subtle bug if the eligible-object lifecycle changes in future. Adding a short doc comment stating the precondition ("call only after ClassInstance has been assigned, and only once per test — the initial RegisterReceivers call already covers all other eligible objects") would make the contract explicit without requiring any structural changes.
There was a problem hiding this comment.
Code Review
Two previous review passes have already covered the high-level correctness of both fixes, so I'll focus on the remaining gaps and a couple of points not yet raised.
Fix 1 — Awaiting InProgress (TestCoordinator.cs)
The change is correct and well-motivated. One nuance worth being explicit about: this makes InProgress latency directly visible on the wall-clock time of each test task. If MTP's bounded channel is full, a test will sit in await _messageBus.InProgress(...) before _contextRestorer.RestoreContext or any event receivers run. That's the intended trade-off for overall throughput, but it means a hung or slow MTP consumer could now stall individual tests longer than before, whereas previously they would have queued and continued executing. This is almost certainly fine for normal use, but worth documenting in the method-level comment so a future reader doesn't wonder why a test appears to take longer than its body implies.
Fix 2 — RegisterClassInstanceReceiver (EventReceiverOrchestrator.cs)
Cache invalidation interplay: TestCoordinator.ExecuteTestLifecycleAsync explicitly sets test.Context.CachedEligibleEventObjects = null immediately after assigning ClassInstance (line 296), but RegisterClassInstanceReceiver accesses context.Metadata.TestDetails.ClassInstance directly rather than going through GetEligibleEventObjects. This is correct — the new method intentionally bypasses the full eligible-object cache — but the null-assignment on line 296 is now a no-op with respect to the registration path, since registration no longer calls GetEligibleEventObjects after that point. The cache is still needed for the invocation-side (GetTestStartReceivers, etc.), so the null-assignment is still useful, but a reader who traces back to why the cache is invalidated here might be confused by the fact that RegisterClassInstanceReceiver no longer benefits from it. A brief comment at line 296 (// Invalidate so invocation-side caches rebuild with the new instance) would remove the ambiguity.
SkippedTestInstance guard missing in RegisterClassInstanceReceiver: ExecuteTestLifecycleAsync checks classInstance is SkippedTestInstance immediately after creation (lines 300–310) and returns early, so PrepareTest (and therefore RegisterClassInstanceReceiver) is only called for non-skipped instances. This is safe today, but it relies on call-ordering between three different methods. If RegisterClassInstanceReceiver were ever called from a different site, passing a SkippedTestInstance into _registry.RegisterReceiver would silently register a sentinel object as an event receiver. A cheap guard (if (classInstance is SkippedTestInstance) return;) inside RegisterClassInstanceReceiver would make it self-defending at negligible cost.
CancellationToken parameter on PrepareTest: The CancellationToken cancellationToken parameter on TestInitializer.PrepareTest was previously forwarded to RegisterReceivers. Now that RegisterClassInstanceReceiver (and _objectLifecycleService.PrepareTest) don't take a token, the parameter is dead. This is pre-existing in spirit but the PR makes it fully dead for the first time. Worth a follow-up cleanup.
Minor: Debug.Assert + if (classInstance is null) redundancy
The Debug.Assert on line 88 and the if (classInstance is null) return; guard on lines 89–92 serve different purposes (the assert fires in debug builds; the null-guard defends release builds), which is a legitimate pattern. The comment // Precondition: ClassInstance has just been assigned... correctly documents this. No action needed, just confirming it's intentional.
Overall
Both fixes are well-reasoned, surgically scoped, and backed by real profiling data. The test coverage includes the key regression scenario (FirstEventReceiversRegressionTest.EventReceiversCalledOncePerScope). The primary actionable suggestions are:
- Add a
SkippedTestInstanceguard inRegisterClassInstanceReceiverfor defense-in-depth. - Clarify the comment at the
CachedEligibleEventObjects = nullassignment inExecuteTestLifecycleAsync. - Follow-up cleanup of the now-dead
CancellationTokenparameter onPrepareTest.
None of these block merging — the logic is correct as-is.
There was a problem hiding this comment.
Code Review — Round 4
Three prior review rounds have covered the correctness and design of both fixes thoroughly. This pass focuses on what's been addressed since round 3, and one remaining item.
What was addressed ✅
All three actionable items from round 3 are resolved in the latest commit:
-
** guard in ** — Added (lines 97–100). The defense-in-depth guard is clean and the accompanying comment correctly explains the caller already short-circuits, making this belt-and-suspenders rather than load-bearing.
-
**Dead param on ** — Removed. now takes only , which matches actual usage.
-
** comment** — Updated at . The new comment (
Drop the cached eligible-objects list so any later consumer rebuilds it with the new ClassInstance included) correctly scopes the reason to the invocation side, clarifying that no longer depends on this cache rebuild.
One remaining item
at still accepts a that is never used in the method body. This was flagged in every prior review round. The parameter is now doubly dead: (the last caller that forwarded a live token to ) no longer passes one, and the only remaining call site in passes into a no-op. This is a straightforward one-liner cleanup — remove the parameter from both the method signature and the call site — but it can be a follow-up if preferred.
Overall
The logic is sound across all three changed files. The back-pressure fix is well-reasoned and correctly preserves ordering guarantees. faithfully replicates the dedup invariants of for the single-object case, is thread-safe (both writes are atomic), and is now self-defending against misuse. The test coverage — especially — validates the key invariant this change depends on.
Ready to merge pending the cleanup (can be a follow-up).
There was a problem hiding this comment.
Code Review — Round 4
Three prior review rounds have covered the correctness and design of both fixes thoroughly. This pass focuses on what has been addressed since round 3, and one remaining item.
What was addressed
All three actionable items from round 3 are resolved in the latest commit:
1. SkippedTestInstance guard in RegisterClassInstanceReceiver — Added (lines 97–100). The defense-in-depth guard is clean and the accompanying comment correctly explains the caller already short-circuits, making this belt-and-suspenders rather than load-bearing.
2. Dead CancellationToken param on PrepareTest — Removed. TestInitializer.PrepareTest now takes only AbstractExecutableTest, which matches actual usage.
3. CachedEligibleEventObjects = null comment — Updated at TestCoordinator.cs:295. The new comment ("Drop the cached eligible-objects list so any later consumer rebuilds it with the new ClassInstance included") correctly scopes the reason to the invocation side, clarifying that RegisterClassInstanceReceiver no longer depends on this cache rebuild.
One remaining item
RegisterReceivers(TestContext context, CancellationToken cancellationToken) at EventReceiverOrchestrator.cs:46 still accepts a CancellationToken that is never used in the method body. This was flagged in every prior review round. The parameter is now doubly dead: TestInitializer.PrepareTest (the last caller that forwarded a live token into RegisterReceivers) no longer passes one, and the remaining call site in TestCoordinator.cs:67 passes cancellationToken into a no-op. Straightforward one-liner cleanup — remove the parameter from the method signature and the call site — but fine as a follow-up.
Overall
The logic is sound across all three changed files. The InProgress back-pressure fix is well-reasoned and correctly preserves ordering guarantees. RegisterClassInstanceReceiver faithfully replicates the dedup invariants of RegisterReceivers for the single-object case, is thread-safe (both ConcurrentHashSet writes are atomic TryAdd operations), and is now self-defending against misuse. The test coverage — especially FirstEventReceiversRegressionTest.EventReceiversCalledOncePerScope — validates the key invariant this change depends on.
Ready to merge pending the CancellationToken cleanup (can be a follow-up).
There was a problem hiding this comment.
Review — perf(engine): reduce message-bus contention on test start
This is a well-motivated, surgical performance fix. The profiling data (12.95% exclusive CPU on WaitHandle.WaitOneNoCheck) gives a solid baseline, and both changes directly address the root causes rather than papering over them.
Tracking previous review items
Previous rounds raised five items total. All are resolved in the current diff:
| Item | Status |
|---|---|
Remove unused CancellationToken from RegisterReceivers |
✅ Done |
Remove unused CancellationToken from PrepareTest |
✅ Done |
Add SkippedTestInstance guard to RegisterClassInstanceReceiver |
✅ Done |
| Clarify cache-invalidation comment scope | ✅ Done |
Add precondition doc comment to RegisterClassInstanceReceiver |
✅ Done |
Fix 1 — Await InProgress (TestCoordinator.cs:58)
The change from _ = _messageBus.InProgress(test.Context) to await _messageBus.InProgress(test.Context).ConfigureAwait(false) is correct. Fire-and-forget on a bounded channel is a classic "thundering herd" antipattern: all 1000+ test tasks race to write into the same channel simultaneously, causing OS-level WaitHandle contention. Awaiting is the right back-pressure mechanism — each test task naturally yields when the channel is full, spreading the load across the run.
Ordering invariant (InProgress before terminal message) is preserved.
Fix 2 — RegisterClassInstanceReceiver (EventReceiverOrchestrator.cs:85-118)
The new method correctly mirrors the dedup invariants of RegisterReceivers:
- Instance dedup via
_initializedObjects.Add— identical to the full method. - Type dedup for
IFirstTestIn*— theor-pattern checks match the||checks inRegisterReceiversexactly. Only the first encountered instance per type gets registered for these events, which is the intended semantics. ILastTestIn*types — deliberately not type-deduped, which is correct: the "last test" check is counter-based, not receiver-count-based.
The Debug.Assert + defensive null guard combination is appropriate here: the assert fires in debug builds if the call order is violated, while the null guard prevents silent failures in production. The SkippedTestInstance sentinel guard is also correctly placed before the _initializedObjects.Add — the sentinel never pollutes the initialized-objects set.
Using _registry.RegisterReceiver(classInstance) (singular) rather than batching through ValueListBuilder is appropriate since there is exactly one object to register.
One non-blocking note
The line-81 blank line between RegisterReceivers and the new method was removed (the diff shows +\n then the new method starts). This leaves no blank line between the two methods. Not a correctness issue, but worth a whitespace pass before merge if you care about vertical breathing room.
Overall
Both fixes are correct, well-scoped, and properly tested (including the FirstEventReceiversRegressionTest that exercises the exact new code path). No issues remaining from prior reviews.
Approved — ready to merge.
Closes #5685.
Summary
CPU profile of ~1000-test runs showed
WaitHandle.WaitOneNoCheckat 12.95% exclusive CPU — the largest non-idle cost in the trace — traceable to the per-test messaging layer. Two surgical fixes:1.
InProgressis now awaited instead of fire-and-forgetTestCoordinator.ExecuteTestInternalAsyncpreviously discarded theInProgressValueTask, which fanned 1000+ concurrentPublishAsyncwriters into MTP's boundedAsynchronousMessageBuschannel at startup. Awaiting the call lets each test task back-pressure naturally against the channel, spreading publishes across the run instead of clustering them.Ordering is preserved:
InProgressstill completes before the terminalPassed/Failed/Skipped/Cancelledmessage for each test. Published message count is unchanged.2.
RegisterReceiversno longer called twice per testTestInitializer.PrepareTestwas re-iterating the full eligible-event-object set even though only the freshly createdClassInstancecan newly become a receiver between the first call (pre-instance creation, inTestCoordinator) and the second call (post-instance creation, inPrepareTest).Added
EventReceiverOrchestrator.RegisterClassInstanceReceiver(TestContext)— a targeted single-object variant that honors the same dedup invariants (_initializedObjects,_registeredFirstEventReceiverTypes) used by the full method.TestInitializer.PrepareTestnow calls the targeted variant.Verification
dotnet build TUnit.Engine/TUnit.Engine.csproj— 0 warnings, 0 errors across net8.0/net9.0/net10.0/netstandard2.0.dotnet test TUnit.Engine.Tests/TUnit.Engine.Tests.csproj --framework net10.0— 148 passed, 0 failed, 99 skipped (environmental AOT/OS skips).dotnet test TUnit.UnitTests/TUnit.UnitTests.csproj --framework net10.0— 180 passed, 0 failed.dotnet test TUnit.Core.SourceGenerator.Tests/... --framework net10.0— 116 passed, 0 failed, 1 skipped (pre-existing).FirstEventReceiversRegressionTest.EventReceiversCalledOncePerScope(which exercises theClassInstance-as-receiver path) still passes.Test plan