perf: defer per-class JIT via lazy test registration + parallel resolution#5395
perf: defer per-class JIT via lazy test registration + parallel resolution#5395
Conversation
…ution Source-generated test registration previously forced all per-class static constructors to run eagerly during module initialization, causing O(N) JIT compilations at startup. For large test suites (1000+ classes), this cold-start cost exceeded reflection-based discovery. Changes: - Add LazyTestEntrySource<T> that wraps Func<TestEntry<T>[]> factories and resolves lazily on first access (Count, GetFilterData, or Materialize) - Add SourceRegistrar.RegisterLazy<T>() that stores factories without invoking them, reducing module init to O(1) JIT (shared generic code for ref types) - Generator now emits RegisterLazy<T>(static () => Source.Entries) instead of RegisterEntries(Source.Entries) — lambda creation does NOT trigger target .cctor - Engine resolves lazy sources in parallel via Parallel.ForEach before filtering, turning sequential per-class JIT into parallel JIT on multi-core machines - For filtered runs, only type-matched classes are resolved at all Preserves incremental generation (no Collect needed) and AOT compatibility. Context: dotnet/runtime#126541 Follow-up: #5394 (split filter data into separate lightweight class)
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 10 |
TIP This summary will be updated as you push new changes. Give us feedback
Merge the two-type approach (TestEntrySource + LazyTestEntrySource) into a single TestEntrySource<T> that natively wraps factories with lazy resolution. - Delete LazyTestEntrySource<T> entirely - TestEntrySource<T> now takes Func<TestEntry<T>[]> and resolves on first access - Consolidate RegisterEntries + RegisterLazy into a single RegisterEntries<T>(Func<>) - Generator emits RegisterEntries<T>(static () => Source.Entries) — same method name as before, just with a factory wrapper and explicit type parameter
Code ReviewThis PR introduces lazy test registration to defer per-class JIT compilation from module initialization to discovery time. The design goals are sound, and the implementation is largely correct. I have identified a few architectural concerns worth addressing. Critical: Phase 1's
|
RegisterEntries<T> now takes Func<TestEntry<T>[]> instead of TestEntry<T>[]. TestEntrySource<T> constructor matches. Also picks up unrelated Assertions API snapshot for net472.
There was a problem hiding this comment.
Code Review: perf: defer per-class JIT via lazy test registration + parallel resolution
The core optimization here is well-conceived and correctly implemented. Deferring per-class JIT from module init to discovery time is a meaningful improvement for large test suites, and the parallel warm-up in AotTestDataCollector compounds the benefit on multi-core machines.
Architecture / Design
The lazy factory pattern is the right abstraction. Converting TestEntry<T>[] to Func<TestEntry<T>[]> cleanly separates registration (cheap: store a pointer to the generated array property) from resolution (expensive: trigger the class's static ctor). The generated code diff confirms the key invariant: the lambda captures TUnit_TestProject_X__TestSource.Entries (the generated companion class, not the user's test class), so the user's .cctor is deferred correctly.
Double-checked locking in Resolve() is correctly implemented. volatile TestEntry<T>[]? _entries + the outer null check before acquiring the lock is the standard DCL pattern and is correct here. The _factories = null teardown after resolution is a nice memory hygiene touch.
Specific Concerns
1. AggregateException vs. sequential exceptions (behavioral change)
In AotTestDataCollector.cs:
Parallel.ForEach(sourcesToResolve, static source => _ = source.Count);If any factory throws (e.g. a source generator bug produces a class with empty entries), Parallel.ForEach wraps all exceptions in an AggregateException. Previously, the engine would surface one InvalidOperationException from Resolve() at a time. This isn't a showstopper, but it's worth ensuring the engine's error handling upstream unwraps AggregateException properly so error messages remain actionable.
2. Validation timing regression
Previously, empty-entries validation happened at module init via RegisterEntries. Now it happens lazily in Resolve(). This means a source-gen bug that produces empty entries won't be caught until discovery — which may be mid-run after other setup has occurred. The new InvalidOperationException message in Resolve() is good, but it might be worth adding a comment explaining this tradeoff so future maintainers don't accidentally "fix" it back to eager.
3. ClassName lazy initialization is a benign race, but worth noting
public string ClassName => _className ??= TUnit.Core.Extensions.TestContextExtensions.GetNestedTypeName(typeof(T));This is not lock-protected. Two threads can both read null, both compute, and both write. In practice this is fine (pure/idempotent, always same result, string writes are atomic), but if the codebase has a strict "no benign races" policy, a simple Lazy<string> or moving the assignment into Resolve() would eliminate it entirely. At minimum, a comment acknowledging the intentional benign race would help.
4. Unrelated assertion API changes bundled in
The TUnit.PublicAPI snapshot changes — removal of CountWrapper, LengthWrapper, HasCount(), HasLength() overloads, and addition of StringLengthWithInlineAssertionAssertion — appear to be from the follow-up PR #5394, not from this lazy JIT work. Bundling them here makes the diff harder to review and means this PR carries assertion API risk alongside the engine change. If this is intentional (rebased for convenience), it's fine, but it's worth a note in the PR description.
5. Reflection path
The PR description mentions both source-gen and reflection modes. The AotTestDataCollector pre-warm pass is great for the source-gen path. Is there a corresponding change for the reflection-based collector? Or does reflection-mode discovery not suffer the same sequential JIT problem because it uses Assembly.GetTypes() rather than pre-registered factories? If so, it's worth documenting that assumption explicitly.
Minor Notes
- The
if (sourcesToResolve.Count > 1)guard inAotTestDataCollectorcorrectly avoidsParallel.ForEachoverhead for the common single-class case. Good. - The outer
if (Sources.TestEntries.Count > 1)guard is also good — avoids building thesourcesToResolvelist entirely for trivially small suites. - The snapshot BOM character changes (
// <auto-generated/>→// <auto-generated/>) are pre-existing whitespace noise, not introduced by this PR's logic.
Summary
The lazy registration + parallel warm-up design is solid. The implementation is correct and the thread-safety model is well-considered. The main actionable items are:
- Verify
AggregateExceptionhandling upstream in the engine - Consider a note on the validation timing tradeoff
- Clarify whether the reflection path needs similar treatment
- Separate or document the bundled assertion API changes
Sources.TestEntries.Sum(kvp => kvp.Value.Count) was forcing resolution of ALL lazy sources just for a list capacity hint — including sources that don't match the filter. Use a growing list instead. Also remove self-evident WHAT comment.
|
Thanks for the thorough review! Here's the status of each item: Critical:
Validation timing — Intentional tradeoff. Empty-entries validation now happens at resolution time rather than module init. This only catches source-gen bugs, which are caught the moment the engine discovers the affected class.
Bundled assertion API changes — The Reflection path — Not affected. Reflection-based discovery uses
|
Code Review (Follow-up)This is a follow-up review comparing the current state against the previous review comments. Previous Issues: StatusFixed: The Not addressed (still present): The other issue from the previous review regarding Phase 1 resolving non-type-matched sources is still present in a slightly different form. Remaining Issue: Phase 1 Still Resolves All Sources in Filtered RunsThe foreach (var kvp in Sources.TestEntries)
{
var classType = kvp.Key;
var source = kvp.Value;
var typeMatches = !filterHints.HasHints || filterHints.CouldTypeMatch(classType);
for (var i = 0; i < source.Count; i++) // <-- Resolve() called here for ALL sources
{
var filterData = source.GetFilterData(i);
if (typeMatches && ...) // <-- type check happens AFTER resolution
The fix is straightforward — skip iteration entirely for non-type-matched sources: foreach (var kvp in Sources.TestEntries)
{
var classType = kvp.Key;
var source = kvp.Value;
if (filterHints.HasHints && !filterHints.CouldTypeMatch(classType))
continue; // Skip: don't even call source.Count, preserving laziness
for (var i = 0; i < source.Count; i++)
{
var filterData = source.GetFilterData(i);
if (!filterHints.HasHints || filterHints.CouldMatch(filterData.ClassName, filterData.MethodName))
{
matching.Add((source, i));
if (filterData.DependsOn.Length > 0)
hasDependencies = true;
}
}
}This matters significantly for the filtered case the PR description highlights as a key benefit: "For filtered runs (common during development), only type-matched classes are resolved at all." Without this fix, the per-class JIT work is still O(N) in filtered runs, just shifted from module init to discovery time. Minor:
|
…iltered runs Previously, Phase 1 called source.Count (triggering Resolve/JIT) for ALL registered sources before checking the type filter. Now non-matching types are skipped via early continue, so filtered runs only JIT matching classes.
|
Thanks for catching the Phase 1 issue! Fixed in 16d3e34. The Re: |
Summary
RegisterLazy<T>(static () => Source.Entries)instead ofRegisterEntries(Source.Entries). The lambda captures a function pointer without triggering the target class's static constructor — deferring all per-class JIT from module init to discovery time.Parallel.ForEachbefore filtering, turning sequential per-class JIT into parallel JIT on multi-core machines..cctor.Startup JIT cost (1000 test classes)
.cctors sequentially.cctor+ 1000 lambda allocs.cctor.cctors in parallelAt module init, only 4 methods are JIT'd regardless of N (the shared
.cctor,RegisterLazy<__Canon>,LazyTestEntrySource<__Canon>..ctor,Func<__Canon>..ctor) — O(1) due to .NET's reference-type generic code sharing.No impact on incremental generation
Each test class still generates its own file with its own partial contribution to
TUnit_TestRegistration. NoCollectneeded. The only change per file is the registration expression.Files changed
TUnit.Core/LazyTestEntrySource.csTUnit.Core/SourceRegistrar.csRegisterLazy<T>()overloadTUnit.Core.SourceGenerator/.../TestMetadataGenerator.csRegisterLazyTUnit.Engine/.../AotTestDataCollector.cs.verified.txtfilesContext: dotnet/runtime#126541
Follow-up: #5394
Test plan