perf(engine): reduce lock contention in scheduling and hook caches (#5686)#5693
Three hot-path contention points observed in CPU profiles of ~1000-test runs (`Monitor.Enter_Slowpath` at 4.33% exclusive, large `GetQueuedCompletionStatus` idle stalls):

1. `BeforeHookTaskCache.GetOrCreateBeforeClassTask` and `AfterHookPairTracker.GetOrCreateAfterClassTask` guarded all classes behind one shared `Lock`, serialising unrelated classes. Replaced with the existing `ThreadSafeDictionary<,>.GetOrAdd` pattern (already used for `GetOrCreateBeforeAssemblyTask`), which internally uses `Lazy<T>` with `ExecutionAndPublication` to guarantee single execution per key without a shared monitor. Keeps a lock-free `TryGetValue` fast path so the common cache-hit case doesn't allocate a closure.
2. The `TestScheduler.ExecuteTestsAsync` unlimited branch called `Parallel.ForEachAsync` without `MaxDegreeOfParallelism`, which defaults to `ProcessorCount` but still queues near-simultaneous continuations on large test sets, saturating the IOCP thread. An explicit cap at `Environment.ProcessorCount * 2` converges with the already-bounded limited-parallelism path.

The `EventReceiverRegistry` single lock (item 3 in the issue) is left for a follow-up, as it's lower priority and the two changes above cover the bulk of the observed contention.
Up to standards ✅

| Metric | Results |
|---|---|
| Complexity | -7 |
Code Review
Good performance-focused PR addressing real contention hotspots identified via CPU profiling. The hook cache change is solid. One behavioral concern worth discussing before merging.
TestScheduler.cs — Behavioral Regression: Unlimited < Default
Concern: UnlimitedPathMaxDop = ProcessorCount * 2 creates a counterintuitive inversion.
The default path (no --maximum-parallel-tests flag) returns ProcessorCount * 4 (line 492). But a user who explicitly opts for unlimited (--maximum-parallel-tests 0) now gets ProcessorCount * 2 — which is more restrictive than doing nothing. This is a silent regression for anyone who has opted in to unlimited parallelism.
```csharp
// Default path (no flag): ProcessorCount * 4
var defaultLimit = Environment.ProcessorCount * 4;

// Explicit "unlimited" (--maximum-parallel-tests 0): ProcessorCount * 2 ← now more restrictive
private static readonly int UnlimitedPathMaxDop = Environment.ProcessorCount * 2;
```

The intent is correct (prevent IOCP saturation), but the cap should be at least as high as the default (`ProcessorCount * 4`), or ideally configurable. A better approach would be to unify the concept: if the goal is a sensible upper bound for async I/O-bound workloads, apply it consistently in `GetMaxParallelism` rather than as a second cap in `ExecuteTestsAsync`. For instance, mapping `int.MaxValue` to `ProcessorCount * 4` in `GetMaxParallelism` itself would mean users see it in one place and `ExecuteTestsAsync` stays DOP-agnostic.
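A minimal sketch of that unification, with an assumed `GetMaxParallelism` signature (the method name comes from this review; the parameter shape is hypothetical):

```csharp
using System;

static class ParallelismDefaults
{
    // Hypothetical sketch: normalise the DOP in one place so "unlimited"
    // (0 or int.MaxValue) maps to the same async-friendly upper bound as
    // the default path, and ExecuteTestsAsync stays DOP-agnostic.
    public static int GetMaxParallelism(int? requested)
    {
        // Unset, 0, or int.MaxValue all mean "no explicit limit": clamp to
        // the default upper bound so opting into unlimited is never more
        // restrictive than passing no flag at all.
        if (requested is null or 0 or int.MaxValue)
        {
            return Environment.ProcessorCount * 4;
        }

        return requested.Value; // an explicit user value wins unchanged
    }
}
```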
Incorrect comment: The current comment says "Parallel.ForEachAsync defaults to ProcessorCount" — this is wrong. When MaxDegreeOfParallelism is -1 (the .NET default), Parallel.ForEachAsync is truly unbounded, not capped at ProcessorCount. The actual problem is the unbounded depth, which the fix correctly addresses, but the comment should say "truly unlimited" not "defaults to ProcessorCount".
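To illustrate the distinction (not the project's actual code): an explicit `ParallelOptions.MaxDegreeOfParallelism` bounds concurrent body invocations, whereas the `-1` default does not.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class DopDemo
{
    static async Task Main()
    {
        int running = 0, peak = 0;
        int cap = Environment.ProcessorCount * 2;
        var options = new ParallelOptions { MaxDegreeOfParallelism = cap };

        await Parallel.ForEachAsync(Enumerable.Range(0, 500), options, async (i, ct) =>
        {
            // Track the high-water mark of concurrently running bodies.
            int now = Interlocked.Increment(ref running);
            int seen;
            while (now > (seen = Volatile.Read(ref peak)))
            {
                Interlocked.CompareExchange(ref peak, now, seen);
            }

            await Task.Delay(5, ct);
            Interlocked.Decrement(ref running);
        });

        // With the explicit cap, the observed peak never exceeds it; with the
        // -1 default the peak can climb far higher on async-yielding bodies.
        Console.WriteLine(peak <= cap); // True
    }
}
```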
BeforeHookTaskCache / AfterHookPairTracker — Solid Improvement
Removing the shared _classLock and delegating to ThreadSafeDictionary.GetOrAdd is the right call. LazyThreadSafetyMode.ExecutionAndPublication inside ThreadSafeDictionary guarantees single-execution per key without serializing unrelated classes. The pattern is already established for the assembly-level tasks, so this is a welcome consistency fix.
The outer TryGetValue fast path before calling GetOrAdd is justified: ThreadSafeDictionary.GetOrAdd allocates a Lazy<T> instance before calling ConcurrentDictionary.GetOrAdd, so the outer check avoids that allocation plus the closure on the common cache-hit path. The comment accurately explains this.
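The shape of that pattern, sketched against a plain `ConcurrentDictionary` (the real code goes through `ThreadSafeDictionary`; type names here are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

sealed class HookTaskCache
{
    private readonly ConcurrentDictionary<Type, Lazy<Task>> _tasks = new();

    public Task GetOrCreate(Type key, Func<Task> factory)
    {
        // Fast path: the common cache hit allocates neither a Lazy<T>
        // nor a closure.
        if (_tasks.TryGetValue(key, out var existing))
        {
            return existing.Value;
        }

        // Slow path: racing threads may each build a Lazy<Task>, but only
        // one lands in the dictionary, and ExecutionAndPublication ensures
        // the factory runs at most once per key, with no monitor shared
        // across keys.
        return _tasks.GetOrAdd(key,
            _ => new Lazy<Task>(factory, LazyThreadSafetyMode.ExecutionAndPublication)).Value;
    }
}
```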
One minor note: the closure passed to GetOrAdd captures cancellationToken from the calling thread. In a race where two threads both miss the TryGetValue fast path, the winning factory uses whichever thread's token got there first. This mirrors the old lock-based behaviour and is fine — just worth knowing if debugging a cancellation edge case.
The [UnconditionalSuppressMessage("Trimming", "IL2077")] suppressions are valid: the [DynamicallyAccessedMembers] annotation is present at the public method boundary, so the trimmer has the information it needs even though it doesn't flow into the lambda. The justification comments make this clear enough for future maintainers.
Summary
| Area | Assessment |
|---|---|
| Hook caches (Before/After class) | ✅ Correct, clear, consistent with existing patterns |
| TestScheduler UnlimitedPathMaxDop | ⚠️ Discuss: "unlimited" is now more restrictive than the default path |
| AOT / trimming annotations | ✅ Suppressions are valid and well-justified |
The hook cache change can merge as-is. I'd suggest revisiting the DOP cap value (or moving the cap into GetMaxParallelism) before this lands to avoid surprising users who opted into unlimited.
- Align unlimited path with default path: both now cap at ProcessorCount * 4 so opting into unlimited isn't a regression.
- Correct comment about Parallel.ForEachAsync default behaviour.
Follow-up Review
Previous feedback has been fully addressed in the second commit. Here's the updated assessment.
Previous Concerns — All Resolved ✅
"Unlimited < Default" inversion: Fixed. The 0-means-unlimited case now maps to defaultLimit = ProcessorCount * 4 (same as the default path) via the refactored GetMaxParallelism. The separate UnlimitedPathMaxDop = ProcessorCount * 2 constant is gone.
Incorrect Parallel.ForEachAsync comment: Fixed. The comment now correctly states "truly unbounded" rather than "defaults to ProcessorCount".
Architecture After Second Commit
The simplification is good. ExecuteTestsAsync is now:
```csharp
private Task ExecuteTestsAsync(AbstractExecutableTest[] tests, CancellationToken cancellationToken)
    => ExecuteWithGlobalLimitAsync(tests, _maxParallelismSemaphore.Value, cancellationToken);
```

Single code path, no conditional branching, no dead netstandard2.0 fallback path lingering. The `async` keyword removal is also correct — no need for a state machine when you're just forwarding a task.
One Remaining Consideration (Non-blocking)
True "unlimited" is no longer possible. Users who set --maximum-parallel-tests 0 or TUNIT_MAX_PARALLEL_TESTS=0 explicitly to opt into unlimited parallelism (e.g., network-bound test suites with high latency where more concurrency is beneficial) now silently get ProcessorCount * 4. The log message tells them this happened, which is good. But if any user genuinely needs beyond ProcessorCount * 4, the escape hatch is now to pass an explicit large number (e.g., --maximum-parallel-tests 64). Worth a brief mention in release notes or docs, but not a blocker.
The SemaphoreSlim is now always allocated (previously null for the unlimited case). The overhead of WaitAsync/Release at ProcessorCount * 4 slots is negligible in practice — it will rarely block on typical machines — so this trade-off is fine.
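A sketch of what semaphore-gated execution looks like in general (the actual `ExecuteWithGlobalLimitAsync` signature and internals are not shown in this PR; everything here is illustrative):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class LimitedRunner
{
    // Illustrative: a shared SemaphoreSlim bounds how many items run at once.
    // A slot is acquired before each item starts and always released after.
    public static async Task ExecuteWithGlobalLimitAsync<T>(
        T[] items,
        SemaphoreSlim limiter,
        Func<T, CancellationToken, Task> runner,
        CancellationToken ct)
    {
        var tasks = new Task[items.Length];
        for (var i = 0; i < items.Length; i++)
        {
            await limiter.WaitAsync(ct); // blocks only when all slots are taken
            var item = items[i];
            tasks[i] = Task.Run(async () =>
            {
                try
                {
                    await runner(item, ct);
                }
                finally
                {
                    limiter.Release(); // return the slot even on failure
                }
            }, ct);
        }

        await Task.WhenAll(tasks);
    }
}
```

With `ProcessorCount * 4` slots, `WaitAsync` completes synchronously in the common case, which is why the always-allocated semaphore costs little.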
Hook Caches — No Changes Needed
The BeforeHookTaskCache and AfterHookPairTracker changes from the first commit are correct and clean. The per-key Lazy pattern via ThreadSafeDictionary is the right approach and consistent with the assembly-level tasks already in the codebase.
Summary
| Area | Status |
|---|---|
| DOP cap inversion (unlimited < default) | ✅ Fixed |
| Incorrect ForEachAsync comment | ✅ Fixed |
| Code path unification | ✅ Cleaner than before |
| Hook caches (Before/After class) | ✅ No change needed |
| True unlimited escape hatch gone | ℹ️ Acknowledged tradeoff, not a blocker |
This is in good shape. The profiling-driven changes are well-motivated and the second commit addressed all the structural concerns cleanly.
Summary
Closes #5686.
CPU profile of ~1000-test runs showed `Monitor.Enter_Slowpath` at 4.33% exclusive and a 48.5% `GetQueuedCompletionStatus` idle cliff. Two structural issues addressed:

1. `BeforeHookTaskCache.GetOrCreateBeforeClassTask` and `AfterHookPairTracker.GetOrCreateAfterClassTask` guarded all classes behind a single `Lock`, serialising unrelated classes. Replaced with the existing `ThreadSafeDictionary<,>.GetOrAdd` pattern (already used for `GetOrCreateBeforeAssemblyTask`), which internally uses `Lazy<T>` with `ExecutionAndPublication` and guarantees single execution per key with no shared monitor. A lock-free `TryGetValue` fast path keeps cache-hit calls closure-allocation-free.
2. `TestScheduler.ExecuteTestsAsync` called `Parallel.ForEachAsync` without `MaxDegreeOfParallelism` on the branch where `--maximum-parallel-tests` is unset. That leaves the loop effectively unbounded, so large test sets queue near-simultaneous continuations and saturate IOCP. Added an explicit `MaxDegreeOfParallelism = Environment.ProcessorCount * 2`, converging with the already-bounded limited path.

Item 3 in the issue (`EventReceiverRegistry` single lock) was left for a follow-up as it's marked lower priority.

Test plan

- `dotnet build TUnit.Engine/TUnit.Engine.csproj` across net8.0/net9.0/net10.0/netstandard2.0: 0 warnings, 0 errors (no trimmer complaints on IL2077 annotation flow through the new closure).
- `TUnit.UnitTests` (net10.0): 180/180 pass.
- `TUnit.Engine.Tests` (net10.0): 245/247 pass; the 2 failures are pre-existing environmental (FSharp/VB integration tests require those sub-projects to be built locally and are unrelated to this change).
- `TUnit.TestProject/HookExecutorTests` (net10.0): 5/5 pass, verifying Before/After class hooks still execute correctly through the new pattern.