
perf: defer per-class JIT via lazy test registration + parallel resolution#5395

Merged
thomhurst merged 6 commits into main from perf/lazy-test-registration
Apr 5, 2026
Conversation

@thomhurst
Owner

Summary

  • Lazy registration: Source generator now emits RegisterLazy<T>(static () => Source.Entries) instead of RegisterEntries(Source.Entries). The lambda captures a function pointer without triggering the target class's static constructor — deferring all per-class JIT from module init to discovery time.
  • Parallel resolution: Engine resolves lazy sources in parallel via Parallel.ForEach before filtering, turning sequential per-class JIT into parallel JIT on multi-core machines.
  • Filter-aware: For filtered runs (common during development), only type-matched classes are resolved at all — unmatched classes never JIT their .cctor.
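The deferral hinges on a C# detail: creating a delegate over a static member does not run that type's static constructor; only invoking the delegate does. A minimal sketch of the before/after behavior, using hypothetical names (`TestSource` and `SourceRegistrar.Factories` stand in for the generated types):

```csharp
using System;
using System.Collections.Generic;

public sealed class TestEntry { public string Name = ""; }

public static class TestSource
{
    // Running this static ctor is the per-class cost being deferred.
    static TestSource() => Console.WriteLine("TestSource .cctor ran");
    public static TestEntry[] Entries { get; } = new[] { new TestEntry { Name = "MyTest" } };
}

public static class SourceRegistrar
{
    public static readonly List<Func<TestEntry[]>> Factories = new();

    // Before: RegisterEntries(TestSource.Entries) — evaluating the argument
    // touches TestSource and runs its .cctor during module init.
    // After: the factory delegate is stored without being invoked, so the
    // .cctor does not run until discovery calls it.
    public static void RegisterLazy(Func<TestEntry[]> factory) => Factories.Add(factory);
}

public static class Demo
{
    public static void Main()
    {
        SourceRegistrar.RegisterLazy(static () => TestSource.Entries);
        Console.WriteLine("registered (no .cctor yet)");
        _ = SourceRegistrar.Factories[0]();   // discovery: .cctor runs here
    }
}
```

Running this prints "registered (no .cctor yet)" before "TestSource .cctor ran", confirming the ordering the summary describes.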

Startup JIT cost (1000 test classes)

| Phase | Before (Eager) | After (Lazy) |
| --- | --- | --- |
| Module init | JIT 1000 .cctors sequentially | JIT 1 .cctor + 1000 lambda allocs |
| Discovery (filtered to 1 class) | Already done | JIT 1 .cctor |
| Discovery (unfiltered) | Already done | JIT 1000 .cctors in parallel |

At module init, only 4 methods are JIT'd regardless of N (the shared .cctor, RegisterLazy<__Canon>, LazyTestEntrySource<__Canon>..ctor, Func<__Canon>..ctor) — O(1) due to .NET's reference-type generic code sharing.

No impact on incremental generation

Each test class still generates its own file with its own partial contribution to TUnit_TestRegistration. No Collect needed. The only change per file is the registration expression.

Files changed

| File | Change |
| --- | --- |
| TUnit.Core/LazyTestEntrySource.cs | New — lazy wrapper with factory list support for generic merging |
| TUnit.Core/SourceRegistrar.cs | New RegisterLazy<T>() overload |
| TUnit.Core.SourceGenerator/.../TestMetadataGenerator.cs | Both registration call sites emit RegisterLazy |
| TUnit.Engine/.../AotTestDataCollector.cs | Parallel pre-resolution before filtering |
| 122 .verified.txt files | Updated registration lines in snapshots |

Context: dotnet/runtime#126541
Follow-up: #5394

Test plan

  • Snapshot tests pass on all 4 TFMs (net8.0, net9.0, net10.0, net472)
  • End-to-end: BasicTests, DataDrivenTests, GenericMethodTests, PropertySetterTests all pass
  • Build succeeds on netstandard2.0 and net10.0
  • CI passes

…ution

Source-generated test registration previously forced all per-class static
constructors to run eagerly during module initialization, causing O(N) JIT
compilations at startup. For large test suites (1000+ classes), this cold-start
cost exceeded reflection-based discovery.

Changes:
- Add LazyTestEntrySource<T> that wraps Func<TestEntry<T>[]> factories and
  resolves lazily on first access (Count, GetFilterData, or Materialize)
- Add SourceRegistrar.RegisterLazy<T>() that stores factories without invoking
  them, reducing module init to O(1) JIT (shared generic code for ref types)
- Generator now emits RegisterLazy<T>(static () => Source.Entries) instead of
  RegisterEntries(Source.Entries) — lambda creation does NOT trigger target .cctor
- Engine resolves lazy sources in parallel via Parallel.ForEach before filtering,
  turning sequential per-class JIT into parallel JIT on multi-core machines
- For filtered runs, only type-matched classes are resolved at all

Preserves incremental generation (no Collect needed) and AOT compatibility.

Context: dotnet/runtime#126541
Follow-up: #5394 (split filter data into separate lightweight class)
@codacy-production

codacy-production bot commented Apr 5, 2026

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: Complexity 10

Merge the two-type approach (TestEntrySource + LazyTestEntrySource) into
a single TestEntrySource<T> that natively wraps factories with lazy resolution.

- Delete LazyTestEntrySource<T> entirely
- TestEntrySource<T> now takes Func<TestEntry<T>[]> and resolves on first access
- Consolidate RegisterEntries + RegisterLazy into a single RegisterEntries<T>(Func<>)
- Generator emits RegisterEntries<T>(static () => Source.Entries) — same method name
  as before, just with a factory wrapper and explicit type parameter
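Under this consolidation, the single type could look roughly like the following sketch (illustrative member names; thread-safety elided here for brevity — the actual implementation uses double-checked locking, as discussed in the review below is not assumed):

```csharp
using System;

public sealed class TestEntry<T> { }

// Rough sketch of a natively lazy TestEntrySource<T>. Not the actual TUnit
// code: member names are hypothetical and synchronization is omitted.
public sealed class TestEntrySource<T>
{
    private Func<TestEntry<T>[]>? _factory;
    private TestEntry<T>[]? _entries;

    public TestEntrySource(Func<TestEntry<T>[]> factory) => _factory = factory;

    // First access to Count (or an entry) triggers resolution.
    public int Count => Resolve().Length;

    public TestEntry<T> this[int index] => Resolve()[index];

    private TestEntry<T>[] Resolve()
    {
        if (_entries is null)
        {
            _entries = _factory!();
            _factory = null;   // release the delegate once resolved
        }
        return _entries;
    }
}
```

The key property is that constructing the source stores only the delegate; the target class's static constructor runs on first `Count` access, not at registration.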
@claude
Contributor

claude bot commented Apr 5, 2026

Code Review

This PR introduces lazy test registration to defer per-class JIT compilation from module initialization to discovery time. The design goals are sound, and the implementation is largely correct. I have identified a few architectural concerns worth addressing.

Critical: Phase 1's Sum() call defeats the filter-aware laziness claim

The PR description states: "For filtered runs (common during development), only type-matched classes are resolved at all — unmatched classes never JIT their .cctor." This claim is not accurate.

In AotTestDataCollector.CollectTestsFromTestEntries, Phase 0 correctly limits parallel resolution to type-matched sources. However, immediately after, Phase 1 starts with:

// Phase 1: Filter using pure data (no JIT of test-specific methods)
var totalEntries = Sources.TestEntries.Sum(static kvp => kvp.Value.Count);  // Line 87

This Sum call iterates all registered sources and accesses .Count on each one. For LazyTestEntrySource<T>, accessing .Count triggers Resolve(), which invokes the factory and runs the TestSource static constructor. So even in a filtered run targeting a single class out of 1000, all 1000 lazy sources get resolved sequentially during this Sum.

The Sum is only used to pre-size the matching list as a capacity hint — it is not functionally necessary. The fix is to avoid the eager full resolution here:

// Option A: Use a conservative estimate (avoids full resolution, minor over-allocation)
var matchingCapacity = filterHints.HasHints
    ? Math.Min(Sources.TestEntries.Count * 4, 1024)  // rough estimate
    : Sources.TestEntries.Sum(static kvp => kvp.Value.Count);  // ok when no filter

// Option B: Just remove pre-sizing entirely (list grows dynamically, small perf cost)
var matching = new List<(ITestEntrySource Source, int Index)>();

The penalty of skipping pre-sizing is minor (a few list reallocations) compared to the cost of JIT-compiling every TestSource class for a filtered run.


Architectural concern: Phase 0 is redundant when Phase 1 resolves everything anyway

Even for the unfiltered case, Phase 0's benefit over simply letting Phase 1 resolve lazily is only the parallelism. But the Sum on line 87 is still sequential. You could consolidate both phases:

// Combined parallel resolve + count for unfiltered runs
if (!filterHints.HasHints && Sources.TestEntries.Count > 1)
{
    Parallel.ForEach(Sources.TestEntries, kvp =>
    {
        // Outer guard already ensures there are no filter hints,
        // so every source can be resolved unconditionally here.
        _ = kvp.Value.Count;
    });
}

Then remove the Sum entirely and just use new List<...>() without pre-sizing. This keeps the parallel speedup for unfiltered runs while not eagerly resolving unmatched sources in filtered runs.


Minor: No ParallelOptions — unbounded thread pool usage

Parallel.ForEach(sourcesToResolve, static source => _ = source.Count);

The Parallel.ForEach has no ParallelOptions configured, so it uses Environment.ProcessorCount as the default degree of parallelism. This is generally fine for CPU-bound work like JIT compilation. However, it also has no cancellation support. For very large suites or constrained environments, consider:

var parallelOptions = new ParallelOptions
{
    MaxDegreeOfParallelism = Math.Min(sourcesToResolve.Count, Environment.ProcessorCount)
};
Parallel.ForEach(sourcesToResolve, parallelOptions, static source => _ = source.Count);

This is a minor point since the work is intentionally CPU-bound and unbounded parallelism is a Parallel.ForEach default, but making intent explicit avoids confusion.


Minor: RegisterEntries is kept as a public API with no callers

RegisterEntries remains a public static method on SourceRegistrar, but the source generator now exclusively emits RegisterLazy. The handling for the mixed case ("someone called RegisterEntries before RegisterLazy") eagerly invokes factory(), defeating the lazy benefit. If there are no known scenarios where user code calls RegisterEntries directly, it could be marked [Obsolete], or at least [EditorBrowsable(EditorBrowsableState.Never)], to discourage use. This prevents the mixed-mode path from being hit accidentally by user-authored module initializers.
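A sketch of what that deprecation could look like (hypothetical signatures; the message text is illustrative):

```csharp
using System;
using System.ComponentModel;

public static class SourceRegistrar
{
    // Discourage direct calls without breaking existing binaries:
    // [Obsolete] warns at compile time, [EditorBrowsable] hides the
    // member from IntelliSense in referencing projects.
    [Obsolete("Call RegisterLazy<T> so per-class static constructors stay deferred.")]
    [EditorBrowsable(EditorBrowsableState.Never)]
    public static void RegisterEntries<T>(T[] entries) { /* eager path */ }

    public static void RegisterLazy<T>(Func<T[]> factory) { /* lazy path */ }
}
```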


Good aspects of this PR

  • The double-check locking in LazyTestEntrySource.Resolve() is correctly implemented with a volatile field and proper lock ordering.
  • The AddFactory post-resolution fallback (eagerly calling _resolved.AddEntries(factory())) correctly handles race conditions between module initializers and early resolution.
  • Using static () => ... lambdas is correct: the compiler caches these as static delegate instances, eliminating per-call heap allocation.
  • The filter-aware Phase 0 (using filterHints.CouldTypeMatch) correctly avoids adding non-matching types to the parallel work list.
  • The BFS dependency expansion in Phase 2 is unchanged and unaffected by this refactor.
  • Snapshot tests were correctly regenerated and committed.

RegisterEntries<T> now takes Func<TestEntry<T>[]> instead of TestEntry<T>[].
TestEntrySource<T> constructor matches.
Also picks up unrelated Assertions API snapshot for net472.
Contributor

@claude claude bot left a comment


Code Review: perf: defer per-class JIT via lazy test registration + parallel resolution

The core optimization here is well-conceived and correctly implemented. Deferring per-class JIT from module init to discovery time is a meaningful improvement for large test suites, and the parallel warm-up in AotTestDataCollector compounds the benefit on multi-core machines.

Architecture / Design

The lazy factory pattern is the right abstraction. Converting TestEntry<T>[] to Func<TestEntry<T>[]> cleanly separates registration (cheap: store a pointer to the generated array property) from resolution (expensive: trigger the class's static ctor). The generated code diff confirms the key invariant: the lambda captures TUnit_TestProject_X__TestSource.Entries (the generated companion class, not the user's test class), so the user's .cctor is deferred correctly.

Double-checked locking in Resolve() is correctly implemented. volatile TestEntry<T>[]? _entries + the outer null check before acquiring the lock is the standard DCL pattern and is correct here. The _factories = null teardown after resolution is a nice memory hygiene touch.
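The pattern being reviewed can be sketched as follows (illustrative names, not the actual TUnit source; the essential parts are the volatile field, the lock-free first check, and the re-check under the lock):

```csharp
using System;

// Sketch of the double-checked locking pattern described above.
public sealed class LazySource<T>
{
    private readonly object _lock = new();
    private Func<T[]>? _factory;
    private volatile T[]? _entries;   // volatile: safe unsynchronized fast-path read

    public LazySource(Func<T[]> factory) => _factory = factory;

    public T[] Resolve()
    {
        var entries = _entries;       // first check, no lock
        if (entries is not null) return entries;

        lock (_lock)
        {
            if (_entries is null)     // second check, under the lock
            {
                _entries = _factory!();
                _factory = null;      // teardown: drop the delegate after resolution
            }
            return _entries;
        }
    }
}
```

Even under concurrent first access, the factory runs exactly once, which is the invariant the review is verifying.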

Specific Concerns

1. AggregateException vs. sequential exceptions (behavioral change)

In AotTestDataCollector.cs:

Parallel.ForEach(sourcesToResolve, static source => _ = source.Count);

If any factory throws (e.g. a source generator bug produces a class with empty entries), Parallel.ForEach wraps all exceptions in an AggregateException. Previously, the engine would surface one InvalidOperationException from Resolve() at a time. This isn't a showstopper, but it's worth ensuring the engine's error handling upstream unwraps AggregateException properly so error messages remain actionable.
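A small repro of the wrapping behavior, with Flatten() shown as one way to keep messages actionable (illustrative only; not the engine's actual error handler):

```csharp
using System;
using System.Threading.Tasks;

public static class Demo
{
    public static void Main()
    {
        try
        {
            // Simulate one of three "factories" failing during parallel resolution.
            Parallel.ForEach(new[] { 1, 2, 3 }, i =>
            {
                if (i == 2) throw new InvalidOperationException("source 2 has no entries");
            });
        }
        catch (AggregateException ae)
        {
            // Flatten() collapses nested aggregates; surfacing each inner
            // exception keeps individual error messages readable.
            foreach (var inner in ae.Flatten().InnerExceptions)
                Console.WriteLine(inner.Message);
        }
    }
}
```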

2. Validation timing regression

Previously, empty-entries validation happened at module init via RegisterEntries. Now it happens lazily in Resolve(). This means a source-gen bug that produces empty entries won't be caught until discovery — which may be mid-run after other setup has occurred. The new InvalidOperationException message in Resolve() is good, but it might be worth adding a comment explaining this tradeoff so future maintainers don't accidentally "fix" it back to eager.

3. ClassName lazy initialization is a benign race, but worth noting

public string ClassName => _className ??= TUnit.Core.Extensions.TestContextExtensions.GetNestedTypeName(typeof(T));

This is not lock-protected. Two threads can both read null, both compute, and both write. In practice this is fine (pure/idempotent, always same result, string writes are atomic), but if the codebase has a strict "no benign races" policy, a simple Lazy<string> or moving the assignment into Resolve() would eliminate it entirely. At minimum, a comment acknowledging the intentional benign race would help.
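The Lazy<string> alternative mentioned above could look like this sketch (illustrative type; typeof(T).FullName stands in for the actual GetNestedTypeName helper):

```csharp
using System;

public sealed class SourceInfo<T>
{
    // Lazy<T>'s default mode (ExecutionAndPublication) guarantees the value
    // factory runs exactly once, eliminating the benign race entirely.
    private readonly Lazy<string> _className =
        new(static () => typeof(T).FullName ?? typeof(T).Name);

    public string ClassName => _className.Value;
}
```

The tradeoff is one extra allocation per source versus a comment documenting the intentional race.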

4. Unrelated assertion API changes bundled in

The TUnit.PublicAPI snapshot changes — removal of CountWrapper, LengthWrapper, HasCount(), HasLength() overloads, and addition of StringLengthWithInlineAssertionAssertion — appear to be from the follow-up PR #5394, not from this lazy JIT work. Bundling them here makes the diff harder to review and means this PR carries assertion API risk alongside the engine change. If this is intentional (rebased for convenience), it's fine, but it's worth a note in the PR description.

5. Reflection path

The PR description mentions both source-gen and reflection modes. The AotTestDataCollector pre-warm pass is great for the source-gen path. Is there a corresponding change for the reflection-based collector? Or does reflection-mode discovery not suffer the same sequential JIT problem because it uses Assembly.GetTypes() rather than pre-registered factories? If so, it's worth documenting that assumption explicitly.

Minor Notes

  • The if (sourcesToResolve.Count > 1) guard in AotTestDataCollector correctly avoids Parallel.ForEach overhead for the common single-class case. Good.
  • The outer if (Sources.TestEntries.Count > 1) guard is also good — avoids building the sourcesToResolve list entirely for trivially small suites.
  • The snapshot BOM character changes (// <auto-generated/>// <auto-generated/>) are pre-existing whitespace noise, not introduced by this PR's logic.

Summary

The lazy registration + parallel warm-up design is solid. The implementation is correct and the thread-safety model is well-considered. The main actionable items are:

  • Verify AggregateException handling upstream in the engine
  • Consider a note on the validation timing tradeoff
  • Clarify whether the reflection path needs similar treatment
  • Separate or document the bundled assertion API changes

Sources.TestEntries.Sum(kvp => kvp.Value.Count) was forcing resolution of
ALL lazy sources just for a list capacity hint — including sources that
don't match the filter. Use a growing list instead.

Also remove self-evident WHAT comment.
@thomhurst
Owner Author

Thanks for the thorough review! Here's the status of each item:

Critical: Sum() defeating laziness — Fixed in a500ef5. The Sum() has been removed; matching now uses a growing list.

RegisterEntries kept with no callers — Addressed in c55d700. The refactor consolidated RegisterEntries + RegisterLazy into a single RegisterEntries<T>(Func<TestEntry<T>[]>). There is no longer a separate RegisterLazy method or LazyTestEntrySource<T> type. TestEntrySource<T> itself is now natively lazy.

AggregateException from Parallel.ForEach — The engine already unwraps AggregateException in TUnitMessageBus and FailureCategorizer. No action needed.

Validation timing — Intentional tradeoff. Empty-entries validation now happens at resolution time rather than module init. This only catches source-gen bugs, which are caught the moment the engine discovers the affected class.

ClassName benign race — Intentional. GetNestedTypeName(typeof(T)) is pure/idempotent, string writes are atomic. No correctness risk.

Bundled assertion API changes — The Tests.Assertions_Library_Has_No_API_Changes.Net4_7.verified.txt change is pre-existing snapshot staleness on net472, not introduced by this PR. It just happened to get picked up when regenerating snapshots.

Reflection path — Not affected. Reflection-based discovery uses Assembly.GetTypes() / MethodInfo scanning, not pre-registered factories. The JIT cold-start issue is specific to the source-gen path where [ModuleInitializer] eagerly triggers all class static constructors.

ParallelOptions — Default Parallel.ForEach uses Environment.ProcessorCount which is appropriate for CPU-bound JIT work. Adding explicit options would be over-specification.

@claude
Contributor

claude bot commented Apr 5, 2026

Code Review (Follow-up)

This is a follow-up review comparing the current state against the previous review comments.

Previous Issues: Status

Fixed: The Sum() call that eagerly resolved all lazy sources in Phase 1 has been removed. The matching list is now initialized without pre-sizing (new List<...>()), which was the suggested fix. This is the most important change.

Not addressed (still present): The other issue from the previous review regarding Phase 1 resolving non-type-matched sources is still present in a slightly different form.


Remaining Issue: Phase 1 Still Resolves All Sources in Filtered Runs

The Sum() was removed, but Phase 1 still calls source.Count for every registered source, including those that don't match the type filter:

foreach (var kvp in Sources.TestEntries)
{
    var classType = kvp.Key;
    var source = kvp.Value;
    var typeMatches = !filterHints.HasHints || filterHints.CouldTypeMatch(classType);

    for (var i = 0; i < source.Count; i++)   // <-- Resolve() called here for ALL sources
    {
        var filterData = source.GetFilterData(i);

        if (typeMatches && ...)  // <-- type check happens AFTER resolution

source.Count invokes Resolve() which calls the factory. In a filtered run targeting 1 class out of 1000, Phase 0 correctly resolves only the 1 matching source, but Phase 1 then resolves all 999 non-matching sources sequentially in the source.Count call.

The fix is straightforward — skip iteration entirely for non-type-matched sources:

foreach (var kvp in Sources.TestEntries)
{
    var classType = kvp.Key;
    var source = kvp.Value;

    if (filterHints.HasHints && !filterHints.CouldTypeMatch(classType))
        continue;  // Skip: don't even call source.Count, preserving laziness

    for (var i = 0; i < source.Count; i++)
    {
        var filterData = source.GetFilterData(i);
        if (!filterHints.HasHints || filterHints.CouldMatch(filterData.ClassName, filterData.MethodName))
        {
            matching.Add((source, i));
            if (filterData.DependsOn.Length > 0)
                hasDependencies = true;
        }
    }
}

This matters significantly for the filtered case the PR description highlights as a key benefit: "For filtered runs (common during development), only type-matched classes are resolved at all." Without this fix, the per-class JIT work is still O(N) in filtered runs, just shifted from module init to discovery time.


Minor: ParallelOptions Still Not Configured

The Parallel.ForEach at line 82 still has no ParallelOptions. This was raised as a minor point in the previous review. The default degree-of-parallelism is Environment.ProcessorCount, which is acceptable. The absence of a cancellation token means the parallel resolution cannot be cancelled early (e.g., if the user hits Ctrl+C before discovery completes). This is low priority.


Good: What Has Been Addressed

  • The Sum() pre-sizing that defeated the filter-aware laziness is removed.
  • Double-check locking in Resolve() with volatile _entries is correct.
  • The AddFactory post-resolution fallback correctly handles race conditions.
  • static () => lambda captures are correct for zero-allocation delegate caching.
  • Snapshot tests were correctly regenerated and committed.

The PR is in good shape overall. The remaining Phase 1 issue is meaningful for the filtered-run performance claim in the PR description and should be addressed before merge.

…iltered runs

Previously, Phase 1 called source.Count (triggering Resolve/JIT) for ALL
registered sources before checking the type filter. Now non-matching types
are skipped via early continue, so filtered runs only JIT matching classes.
@thomhurst
Owner Author

Thanks for catching the Phase 1 issue! Fixed in 16d3e34.

The typeMatches variable was computed before the loop but source.Count was still called unconditionally inside it, triggering Resolve() for non-matching types. Now non-type-matched sources are skipped via early continue before source.Count is ever accessed.

Re: ParallelOptions — Agreed this is low priority. The default Environment.ProcessorCount is appropriate for CPU-bound JIT work, and cancellation during discovery isn't a supported scenario.
