
perf: defer per-class JIT via lazy test registration + parallel resolution#5395

Merged
thomhurst merged 6 commits into main from perf/lazy-test-registration
Apr 5, 2026
Conversation

@thomhurst
Owner

Summary

  • Lazy registration: Source generator now emits RegisterLazy<T>(static () => Source.Entries) instead of RegisterEntries(Source.Entries). The lambda captures a function pointer without triggering the target class's static constructor — deferring all per-class JIT from module init to discovery time.
  • Parallel resolution: Engine resolves lazy sources in parallel via Parallel.ForEach before filtering, turning sequential per-class JIT into parallel JIT on multi-core machines.
  • Filter-aware: For filtered runs (common during development), only type-matched classes are resolved at all — unmatched classes never JIT their .cctor.
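The deferral hinges on a C# detail: creating a delegate over a static member does not run that type's static constructor; only invoking the delegate does. A minimal sketch of the before/after behavior, using hypothetical names (`TestSource` and `SourceRegistrar.Factories` stand in for the generated types):

```csharp
using System;
using System.Collections.Generic;

public sealed class TestEntry { public string Name = ""; }

public static class TestSource
{
    // Running this static ctor is the per-class cost being deferred.
    static TestSource() => Console.WriteLine("TestSource .cctor ran");
    public static TestEntry[] Entries { get; } = new[] { new TestEntry { Name = "MyTest" } };
}

public static class SourceRegistrar
{
    public static readonly List<Func<TestEntry[]>> Factories = new();

    // Before: RegisterEntries(TestSource.Entries) — evaluating the argument
    // touches TestSource and runs its .cctor during module init.
    // After: the factory delegate is stored without being invoked, so the
    // .cctor does not run until discovery calls it.
    public static void RegisterLazy(Func<TestEntry[]> factory) => Factories.Add(factory);
}

public static class Demo
{
    public static void Main()
    {
        SourceRegistrar.RegisterLazy(static () => TestSource.Entries);
        Console.WriteLine("registered (no .cctor yet)");
        _ = SourceRegistrar.Factories[0]();   // discovery: .cctor runs here
    }
}
```

Running this prints "registered (no .cctor yet)" before "TestSource .cctor ran", confirming the ordering the summary describes.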

Startup JIT cost (1000 test classes)

| Phase | Before (Eager) | After (Lazy) |
| --- | --- | --- |
| Module init | JIT 1000 .cctors sequentially | JIT 1 .cctor + 1000 lambda allocs |
| Discovery (filtered to 1 class) | Already done | JIT 1 .cctor |
| Discovery (unfiltered) | Already done | JIT 1000 .cctors in parallel |

At module init, only 4 methods are JIT'd regardless of N (the shared .cctor, RegisterLazy<__Canon>, LazyTestEntrySource<__Canon>..ctor, Func<__Canon>..ctor) — O(1) due to .NET's reference-type generic code sharing.

No impact on incremental generation

Each test class still generates its own file with its own partial contribution to TUnit_TestRegistration. No Collect needed. The only change per file is the registration expression.

Files changed

| File | Change |
| --- | --- |
| TUnit.Core/LazyTestEntrySource.cs | New — lazy wrapper with factory list support for generic merging |
| TUnit.Core/SourceRegistrar.cs | New RegisterLazy<T>() overload |
| TUnit.Core.SourceGenerator/.../TestMetadataGenerator.cs | Both registration call sites emit RegisterLazy |
| TUnit.Engine/.../AotTestDataCollector.cs | Parallel pre-resolution before filtering |
| 122 .verified.txt files | Updated registration lines in snapshots |

Context: dotnet/runtime#126541
Follow-up: #5394

Test plan

  • Snapshot tests pass on all 4 TFMs (net8.0, net9.0, net10.0, net472)
  • End-to-end: BasicTests, DataDrivenTests, GenericMethodTests, PropertySetterTests all pass
  • Build succeeds on netstandard2.0 and net10.0
  • CI passes

…ution

Source-generated test registration previously forced all per-class static
constructors to run eagerly during module initialization, causing O(N) JIT
compilations at startup. For large test suites (1000+ classes), this cold-start
cost exceeded reflection-based discovery.

Changes:
- Add LazyTestEntrySource<T> that wraps Func<TestEntry<T>[]> factories and
  resolves lazily on first access (Count, GetFilterData, or Materialize)
- Add SourceRegistrar.RegisterLazy<T>() that stores factories without invoking
  them, reducing module init to O(1) JIT (shared generic code for ref types)
- Generator now emits RegisterLazy<T>(static () => Source.Entries) instead of
  RegisterEntries(Source.Entries) — lambda creation does NOT trigger target .cctor
- Engine resolves lazy sources in parallel via Parallel.ForEach before filtering,
  turning sequential per-class JIT into parallel JIT on multi-core machines
- For filtered runs, only type-matched classes are resolved at all

Preserves incremental generation (no Collect needed) and AOT compatibility.

Context: dotnet/runtime#126541
Follow-up: #5394 (split filter data into separate lightweight class)
@codacy-production

codacy-production bot commented Apr 5, 2026

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: Complexity 10

Merge the two-type approach (TestEntrySource + LazyTestEntrySource) into
a single TestEntrySource<T> that natively wraps factories with lazy resolution.

- Delete LazyTestEntrySource<T> entirely
- TestEntrySource<T> now takes Func<TestEntry<T>[]> and resolves on first access
- Consolidate RegisterEntries + RegisterLazy into a single RegisterEntries<T>(Func<>)
- Generator emits RegisterEntries<T>(static () => Source.Entries) — same method name
  as before, just with a factory wrapper and explicit type parameter
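Under this consolidation, the single type could look roughly like the following sketch (illustrative member names; thread-safety elided here for brevity — the actual implementation uses double-checked locking, as discussed in the review below is not assumed):

```csharp
using System;

public sealed class TestEntry<T> { }

// Rough sketch of a natively lazy TestEntrySource<T>. Not the actual TUnit
// code: member names are hypothetical and synchronization is omitted.
public sealed class TestEntrySource<T>
{
    private Func<TestEntry<T>[]>? _factory;
    private TestEntry<T>[]? _entries;

    public TestEntrySource(Func<TestEntry<T>[]> factory) => _factory = factory;

    // First access to Count (or an entry) triggers resolution.
    public int Count => Resolve().Length;

    public TestEntry<T> this[int index] => Resolve()[index];

    private TestEntry<T>[] Resolve()
    {
        if (_entries is null)
        {
            _entries = _factory!();
            _factory = null;   // release the delegate once resolved
        }
        return _entries;
    }
}
```

The key property is that constructing the source stores only the delegate; the target class's static constructor runs on first `Count` access, not at registration.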
@claude
Contributor

claude bot commented Apr 5, 2026

Code Review

This PR introduces lazy test registration to defer per-class JIT compilation from module initialization to discovery time. The design goals are sound, and the implementation is largely correct. I have identified a few architectural concerns worth addressing.

Critical: Phase 1's Sum() call defeats the filter-aware laziness claim

The PR description states: "For filtered runs (common during development), only type-matched classes are resolved at all — unmatched classes never JIT their .cctor." This claim is not accurate.

In AotTestDataCollector.CollectTestsFromTestEntries, Phase 0 correctly limits parallel resolution to type-matched sources. However, immediately after, Phase 1 starts with:

// Phase 1: Filter using pure data (no JIT of test-specific methods)
var totalEntries = Sources.TestEntries.Sum(static kvp => kvp.Value.Count);  // Line 87

This Sum call iterates all registered sources and accesses .Count on each one. For LazyTestEntrySource<T>, accessing .Count triggers Resolve(), which invokes the factory and runs the TestSource static constructor. So even in a filtered run targeting a single class out of 1000, all 1000 lazy sources get resolved sequentially during this Sum.

The Sum is only used to pre-size the matching list as a capacity hint — it is not functionally necessary. The fix is to avoid the eager full resolution here:

// Option A: Use a conservative estimate (avoids full resolution, minor over-allocation)
var matchingCapacity = filterHints.HasHints
    ? Math.Min(Sources.TestEntries.Count * 4, 1024)  // rough estimate
    : Sources.TestEntries.Sum(static kvp => kvp.Value.Count);  // ok when no filter

// Option B: Just remove pre-sizing entirely (list grows dynamically, small perf cost)
var matching = new List<(ITestEntrySource Source, int Index)>();

The penalty of skipping pre-sizing is minor (a few list reallocations) compared to the cost of JIT-compiling every TestSource class for a filtered run.


Architectural concern: Phase 0 is redundant when Phase 1 resolves everything anyway

Even for the unfiltered case, Phase 0's benefit over simply letting Phase 1 resolve lazily is only the parallelism. But the Sum on line 87 is still sequential. You could consolidate both phases:

// Combined parallel resolve + count for unfiltered runs
if (!filterHints.HasHints && Sources.TestEntries.Count > 1)
{
    Parallel.ForEach(Sources.TestEntries, kvp =>
    {
        // Outer guard already ensures there are no filter hints,
        // so every source can be resolved unconditionally here.
        _ = kvp.Value.Count;
    });
}

Then remove the Sum entirely and just use new List<...>() without pre-sizing. This keeps the parallel speedup for unfiltered runs while not eagerly resolving unmatched sources in filtered runs.


Minor: No ParallelOptions — unbounded thread pool usage

Parallel.ForEach(sourcesToResolve, static source => _ = source.Count);

The Parallel.ForEach has no ParallelOptions configured, so it uses Environment.ProcessorCount as the default degree of parallelism. This is generally fine for CPU-bound work like JIT compilation. However, it also has no cancellation support. For very large suites or constrained environments, consider:

var parallelOptions = new ParallelOptions
{
    MaxDegreeOfParallelism = Math.Min(sourcesToResolve.Count, Environment.ProcessorCount)
};
Parallel.ForEach(sourcesToResolve, parallelOptions, static source => _ = source.Count);

This is a minor point since the work is intentionally CPU-bound and unbounded parallelism is a Parallel.ForEach default, but making intent explicit avoids confusion.


Minor: RegisterEntries is kept as a public API with no callers

RegisterEntries remains a public static method on SourceRegistrar, but the source generator now exclusively emits RegisterLazy. The handling for the mixed case ("someone called RegisterEntries before RegisterLazy") eagerly invokes factory(), defeating the lazy benefit. If there are no known scenarios where user code calls RegisterEntries directly, it could be marked [Obsolete], or at least [EditorBrowsable(EditorBrowsableState.Never)], to discourage use. This prevents the mixed-mode path from being hit accidentally by user-authored module initializers.
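A sketch of what that deprecation could look like (hypothetical signatures; the message text is illustrative):

```csharp
using System;
using System.ComponentModel;

public static class SourceRegistrar
{
    // Discourage direct calls without breaking existing binaries:
    // [Obsolete] warns at compile time, [EditorBrowsable] hides the
    // member from IntelliSense in referencing projects.
    [Obsolete("Call RegisterLazy<T> so per-class static constructors stay deferred.")]
    [EditorBrowsable(EditorBrowsableState.Never)]
    public static void RegisterEntries<T>(T[] entries) { /* eager path */ }

    public static void RegisterLazy<T>(Func<T[]> factory) { /* lazy path */ }
}
```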


Good aspects of this PR

  • The double-check locking in LazyTestEntrySource.Resolve() is correctly implemented with a volatile field and proper lock ordering.
  • The AddFactory post-resolution fallback (eagerly calling _resolved.AddEntries(factory())) correctly handles race conditions between module initializers and early resolution.
  • Using static () => ... lambdas is correct: the compiler caches these as static delegate instances, eliminating per-call heap allocation.
  • The filter-aware Phase 0 (using filterHints.CouldTypeMatch) correctly avoids adding non-matching types to the parallel work list.
  • The BFS dependency expansion in Phase 2 is unchanged and unaffected by this refactor.
  • Snapshot tests were correctly regenerated and committed.

RegisterEntries<T> now takes Func<TestEntry<T>[]> instead of TestEntry<T>[].
TestEntrySource<T> constructor matches.
Also picks up unrelated Assertions API snapshot for net472.
Contributor

@claude claude bot left a comment


Code Review: perf: defer per-class JIT via lazy test registration + parallel resolution

The core optimization here is well-conceived and correctly implemented. Deferring per-class JIT from module init to discovery time is a meaningful improvement for large test suites, and the parallel warm-up in AotTestDataCollector compounds the benefit on multi-core machines.

Architecture / Design

The lazy factory pattern is the right abstraction. Converting TestEntry<T>[] to Func<TestEntry<T>[]> cleanly separates registration (cheap: store a pointer to the generated array property) from resolution (expensive: trigger the class's static ctor). The generated code diff confirms the key invariant: the lambda captures TUnit_TestProject_X__TestSource.Entries (the generated companion class, not the user's test class), so the user's .cctor is deferred correctly.

Double-checked locking in Resolve() is correctly implemented. volatile TestEntry<T>[]? _entries + the outer null check before acquiring the lock is the standard DCL pattern and is correct here. The _factories = null teardown after resolution is a nice memory hygiene touch.
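The pattern being reviewed can be sketched as follows (illustrative names, not the actual TUnit source; the essential parts are the volatile field, the lock-free first check, and the re-check under the lock):

```csharp
using System;

// Sketch of the double-checked locking pattern described above.
public sealed class LazySource<T>
{
    private readonly object _lock = new();
    private Func<T[]>? _factory;
    private volatile T[]? _entries;   // volatile: safe unsynchronized fast-path read

    public LazySource(Func<T[]> factory) => _factory = factory;

    public T[] Resolve()
    {
        var entries = _entries;       // first check, no lock
        if (entries is not null) return entries;

        lock (_lock)
        {
            if (_entries is null)     // second check, under the lock
            {
                _entries = _factory!();
                _factory = null;      // teardown: drop the delegate after resolution
            }
            return _entries;
        }
    }
}
```

Even under concurrent first access, the factory runs exactly once, which is the invariant the review is verifying.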

Specific Concerns

1. AggregateException vs. sequential exceptions (behavioral change)

In AotTestDataCollector.cs:

Parallel.ForEach(sourcesToResolve, static source => _ = source.Count);

If any factory throws (e.g. a source generator bug produces a class with empty entries), Parallel.ForEach wraps all exceptions in an AggregateException. Previously, the engine would surface one InvalidOperationException from Resolve() at a time. This isn't a showstopper, but it's worth ensuring the engine's error handling upstream unwraps AggregateException properly so error messages remain actionable.
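A small repro of the wrapping behavior, with Flatten() shown as one way to keep messages actionable (illustrative only; not the engine's actual error handler):

```csharp
using System;
using System.Threading.Tasks;

public static class Demo
{
    public static void Main()
    {
        try
        {
            // Simulate one of three "factories" failing during parallel resolution.
            Parallel.ForEach(new[] { 1, 2, 3 }, i =>
            {
                if (i == 2) throw new InvalidOperationException("source 2 has no entries");
            });
        }
        catch (AggregateException ae)
        {
            // Flatten() collapses nested aggregates; surfacing each inner
            // exception keeps individual error messages readable.
            foreach (var inner in ae.Flatten().InnerExceptions)
                Console.WriteLine(inner.Message);
        }
    }
}
```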

2. Validation timing regression

Previously, empty-entries validation happened at module init via RegisterEntries. Now it happens lazily in Resolve(). This means a source-gen bug that produces empty entries won't be caught until discovery — which may be mid-run after other setup has occurred. The new InvalidOperationException message in Resolve() is good, but it might be worth adding a comment explaining this tradeoff so future maintainers don't accidentally "fix" it back to eager.

3. ClassName lazy initialization is a benign race, but worth noting

public string ClassName => _className ??= TUnit.Core.Extensions.TestContextExtensions.GetNestedTypeName(typeof(T));

This is not lock-protected. Two threads can both read null, both compute, and both write. In practice this is fine (pure/idempotent, always same result, string writes are atomic), but if the codebase has a strict "no benign races" policy, a simple Lazy<string> or moving the assignment into Resolve() would eliminate it entirely. At minimum, a comment acknowledging the intentional benign race would help.
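The Lazy<string> alternative mentioned above could look like this sketch (illustrative type; typeof(T).FullName stands in for the actual GetNestedTypeName helper):

```csharp
using System;

public sealed class SourceInfo<T>
{
    // Lazy<T>'s default mode (ExecutionAndPublication) guarantees the value
    // factory runs exactly once, eliminating the benign race entirely.
    private readonly Lazy<string> _className =
        new(static () => typeof(T).FullName ?? typeof(T).Name);

    public string ClassName => _className.Value;
}
```

The tradeoff is one extra allocation per source versus a comment documenting the intentional race.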

4. Unrelated assertion API changes bundled in

The TUnit.PublicAPI snapshot changes — removal of CountWrapper, LengthWrapper, HasCount(), HasLength() overloads, and addition of StringLengthWithInlineAssertionAssertion — appear to be from the follow-up PR #5394, not from this lazy JIT work. Bundling them here makes the diff harder to review and means this PR carries assertion API risk alongside the engine change. If this is intentional (rebased for convenience), it's fine, but it's worth a note in the PR description.

5. Reflection path

The PR description mentions both source-gen and reflection modes. The AotTestDataCollector pre-warm pass is great for the source-gen path. Is there a corresponding change for the reflection-based collector? Or does reflection-mode discovery not suffer the same sequential JIT problem because it uses Assembly.GetTypes() rather than pre-registered factories? If so, it's worth documenting that assumption explicitly.

Minor Notes

  • The if (sourcesToResolve.Count > 1) guard in AotTestDataCollector correctly avoids Parallel.ForEach overhead for the common single-class case. Good.
  • The outer if (Sources.TestEntries.Count > 1) guard is also good — avoids building the sourcesToResolve list entirely for trivially small suites.
  • The snapshot BOM character changes (// <auto-generated/>// <auto-generated/>) are pre-existing whitespace noise, not introduced by this PR's logic.

Summary

The lazy registration + parallel warm-up design is solid. The implementation is correct and the thread-safety model is well-considered. The main actionable items are:

  • Verify AggregateException handling upstream in the engine
  • Consider a note on the validation timing tradeoff
  • Clarify whether the reflection path needs similar treatment
  • Separate or document the bundled assertion API changes

Sources.TestEntries.Sum(kvp => kvp.Value.Count) was forcing resolution of
ALL lazy sources just for a list capacity hint — including sources that
don't match the filter. Use a growing list instead.

Also remove self-evident WHAT comment.
@thomhurst
Owner Author

Thanks for the thorough review! Here's the status of each item:

Critical: Sum() defeating laziness — Fixed in a500ef5. The Sum() has been removed; matching now uses a growing list.

RegisterEntries kept with no callers — Addressed in c55d700. The refactor consolidated RegisterEntries + RegisterLazy into a single RegisterEntries<T>(Func<TestEntry<T>[]>). There is no longer a separate RegisterLazy method or LazyTestEntrySource<T> type. TestEntrySource<T> itself is now natively lazy.

AggregateException from Parallel.ForEach — The engine already unwraps AggregateException in TUnitMessageBus and FailureCategorizer. No action needed.

Validation timing — Intentional tradeoff. Empty-entries validation now happens at resolution time rather than module init. This only catches source-gen bugs, which are caught the moment the engine discovers the affected class.

ClassName benign race — Intentional. GetNestedTypeName(typeof(T)) is pure/idempotent, string writes are atomic. No correctness risk.

Bundled assertion API changes — The Tests.Assertions_Library_Has_No_API_Changes.Net4_7.verified.txt change is pre-existing snapshot staleness on net472, not introduced by this PR. It just happened to get picked up when regenerating snapshots.

Reflection path — Not affected. Reflection-based discovery uses Assembly.GetTypes() / MethodInfo scanning, not pre-registered factories. The JIT cold-start issue is specific to the source-gen path where [ModuleInitializer] eagerly triggers all class static constructors.

ParallelOptions — Default Parallel.ForEach uses Environment.ProcessorCount which is appropriate for CPU-bound JIT work. Adding explicit options would be over-specification.

@claude
Contributor

claude bot commented Apr 5, 2026

Code Review (Follow-up)

This is a follow-up review comparing the current state against the previous review comments.

Previous Issues: Status

Fixed: The Sum() call that eagerly resolved all lazy sources in Phase 1 has been removed. The matching list is now initialized without pre-sizing (new List<...>()), which was the suggested fix. This is the most important change.

Not addressed (still present): The other issue from the previous review regarding Phase 1 resolving non-type-matched sources is still present in a slightly different form.


Remaining Issue: Phase 1 Still Resolves All Sources in Filtered Runs

The Sum() was removed, but Phase 1 still calls source.Count for every registered source, including those that don't match the type filter:

foreach (var kvp in Sources.TestEntries)
{
    var classType = kvp.Key;
    var source = kvp.Value;
    var typeMatches = !filterHints.HasHints || filterHints.CouldTypeMatch(classType);

    for (var i = 0; i < source.Count; i++)   // <-- Resolve() called here for ALL sources
    {
        var filterData = source.GetFilterData(i);

        if (typeMatches && ...)  // <-- type check happens AFTER resolution

source.Count invokes Resolve() which calls the factory. In a filtered run targeting 1 class out of 1000, Phase 0 correctly resolves only the 1 matching source, but Phase 1 then resolves all 999 non-matching sources sequentially in the source.Count call.

The fix is straightforward — skip iteration entirely for non-type-matched sources:

foreach (var kvp in Sources.TestEntries)
{
    var classType = kvp.Key;
    var source = kvp.Value;

    if (filterHints.HasHints && !filterHints.CouldTypeMatch(classType))
        continue;  // Skip: don't even call source.Count, preserving laziness

    for (var i = 0; i < source.Count; i++)
    {
        var filterData = source.GetFilterData(i);
        if (!filterHints.HasHints || filterHints.CouldMatch(filterData.ClassName, filterData.MethodName))
        {
            matching.Add((source, i));
            if (filterData.DependsOn.Length > 0)
                hasDependencies = true;
        }
    }
}

This matters significantly for the filtered case the PR description highlights as a key benefit: "For filtered runs (common during development), only type-matched classes are resolved at all." Without this fix, the per-class JIT work is still O(N) in filtered runs, just shifted from module init to discovery time.


Minor: ParallelOptions Still Not Configured

The Parallel.ForEach at line 82 still has no ParallelOptions. This was raised as a minor point in the previous review. The default degree-of-parallelism is Environment.ProcessorCount, which is acceptable. The absence of a cancellation token means the parallel resolution cannot be cancelled early (e.g., if the user hits Ctrl+C before discovery completes). This is low priority.


Good: What Has Been Addressed

  • The Sum() pre-sizing that defeated the filter-aware laziness is removed.
  • Double-check locking in Resolve() with volatile _entries is correct.
  • The AddFactory post-resolution fallback correctly handles race conditions.
  • static () => lambda captures are correct for zero-allocation delegate caching.
  • Snapshot tests were correctly regenerated and committed.

The PR is in good shape overall. The remaining Phase 1 issue is meaningful for the filtered-run performance claim in the PR description and should be addressed before merge.

…iltered runs

Previously, Phase 1 called source.Count (triggering Resolve/JIT) for ALL
registered sources before checking the type filter. Now non-matching types
are skipped via early continue, so filtered runs only JIT matching classes.
@thomhurst
Owner Author

Thanks for catching the Phase 1 issue! Fixed in 16d3e34.

The typeMatches variable was computed before the loop but source.Count was still called unconditionally inside it, triggering Resolve() for non-matching types. Now non-type-matched sources are skipped via early continue before source.Count is ever accessed.

Re: ParallelOptions — Agreed this is low priority. The default Environment.ProcessorCount is appropriate for CPU-bound JIT work, and cancellation during discovery isn't a supported scenario.
