
Perf: Generate typed HandleCall<T1,...> overloads to eliminate argument boxing #5399

Merged
thomhurst merged 8 commits into main from perf/typed-handlecall-overloads on Apr 5, 2026

@thomhurst
Owner

Summary

Closes #5389

  • Generates arity-specific HandleCall<T1,...,TN> overloads (up to 8 type params) that pass arguments without boxing, instead of allocating new object?[] { ... } on every mock invocation
  • Adds ArgumentStore<T1,...> structs that defer boxing until Arguments is actually accessed (verification/diagnostics — rare path)
  • Adds typed MethodSetup.Matches<T1,...> overloads that call IArgumentMatcher<T>.Matches(T) directly
  • Methods with >8 params or ref-struct params fall back to the existing object?[] path
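The deferred-boxing idea in the bullets above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the member names on `IArgumentStore`, the two-argument `ArgumentStore<T1, T2>` shape, and the `ArgumentStoreDemo` helper are all assumptions made for the example.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the PR's interface; the real members may differ.
public interface IArgumentStore
{
    int Count { get; }
    object?[] ToArray(); // the only place where value-type arguments get boxed
}

// Arity-2 store: keeps the arguments in typed fields until someone asks for object?[].
public readonly struct ArgumentStore<T1, T2> : IArgumentStore
{
    private readonly T1 _arg1;
    private readonly T2 _arg2;

    public ArgumentStore(T1 arg1, T2 arg2)
    {
        _arg1 = arg1;
        _arg2 = arg2;
    }

    public int Count => 2;

    // Boxing is deferred to this call (verification/diagnostics path).
    public object?[] ToArray() => new object?[] { _arg1, _arg2 };
}

public static class ArgumentStoreDemo
{
    // Illustrative helper: builds a typed store, then materializes it on demand.
    public static string Describe<T1, T2>(T1 a, T2 b)
    {
        var store = new ArgumentStore<T1, T2>(a, b); // no object?[] allocated here
        return string.Join(", ", store.ToArray());   // array and boxes created only now
    }
}
```

For example, `ArgumentStoreDemo.Describe(1, "x")` yields `"1, x"`; the `int` is boxed only inside `ToArray()`.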

Benchmark Results

| Benchmark | Before | After | Improvement |
|---|---|---|---|
| Invocation (single, int) | 387 ns / 176 B | 231 ns / 208 B | ~40% faster |
| Invocation (single, string) | - | 220 ns / 144 B | Reference types benefit most |
| Invocation (100x) | 37.7 μs / 18 KB | 22.8 μs / 20.7 KB | ~40% faster |

The performance gap versus Imposter narrowed from ~35% to ~7%.

Note: Allocation for value-type args is slightly higher than the issue target because IBehavior.Execute(object?[]) still requires boxing when a behavior is found. The string case (no boxing needed) hits 144 B. This is inherent to the IBehavior interface and could be addressed separately with typed behavior overloads.

Test plan

  • All 748 TUnit.Mocks.Tests pass on net10.0
  • All 100 snapshot tests pass (18 snapshots updated for typed dispatch)
  • Invocation benchmarks show ~40% improvement
  • CI passes on all target frameworks

- Refactor CallRecord from record to class with IArgumentStore support
- Add ICapturingMatcher<T> for zero-boxing argument capture
- Add typed Matches<T1..T8> and ApplyCaptures<T1..T8> to MethodSetup
- Make MockEngine<T> partial, add MockEngine.Typed.cs
- 32 typed dispatch methods (4 patterns x 8 arities)
- Typed FindMatchingSetup<T1..T8> and RecordCall with IArgumentStore
- Zero boxing on hot path: matching, recording, capture all typed
- Behavior.Execute still boxes via store.ToArray() (cold path)
…ty 1-8

Methods with 1-8 non-out, non-ref-struct parameters now call typed
HandleCall<T1,...> overloads, eliminating object?[] allocation and boxing.
Methods with >8 params or ref-struct params fall back to existing path.
Contributor

@claude claude bot left a comment


Code Review: Typed HandleCall Overloads for Boxing Elimination

The goal here is clear and valuable — eliminate per-invocation object?[] allocation by generating arity-specific typed overloads. The benchmark numbers are real and the approach is sound. That said, there are several architectural concerns worth discussing before this lands.


1. Massive code duplication that will compound over time

MockEngine.Typed.cs is 1,279 lines of near-identical code — 8 arities × 4 method variants (HandleCall, HandleCallWithReturn, TryHandleCall, TryHandleCallWithReturn) = 32 near-identical methods. Every bug fix or enhancement to the dispatch logic must now be applied in 32 places instead of 4. The same pattern repeats in MockEngine.cs (8 more FindMatchingSetup overloads), MethodSetup.cs (8 Matches + 8 ApplyCaptures overloads = 16 more methods).

This file should itself be source-generated. The source generator already exists in this codebase. Consider a T4 template, a [GenerateArityOverloads]-style generator, or at minimum a .tt / Scriban template committed alongside the output. Right now if you find a subtle bug in arity-1's HandleCallWithReturn, you need to remember to fix it in arities 2–8 too.

Compare to how System.Action, Func, ValueTuple, and similar framework types handle this — they use T4 templates and commit the generated output, making the generation source the truth.


2. IBehavior.Execute(object?[]) negates most of the gain on the hot path

Every typed HandleCall<T1> path that finds a matching behavior immediately calls:

var behaviorResult = behavior.Execute(store.ToArray()); // allocates here anyway

The PR summary acknowledges this: "Allocation for value-type args is slightly higher than the issue target because IBehavior.Execute(object?[]) still requires boxing when a behavior is found."

But this is actually the common case — when you've set up a behavior and the mock is called, the behavior fires. The fallthrough (no setup) path that avoids boxing is typically the rarer path in a real test.

The 40% speedup in the benchmarks likely reflects the case where no behavior matches (unmatched call path). For the happy path (behavior executes), the gain disappears. Consider extending IBehavior with a typed variant, or at least measuring the benchmark with a behavior installed.
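One possible shape for the "typed variant" suggested above is an opt-in generic interface that the dispatcher probes before falling back to the array path. The sketch below is hypothetical: the return type of `IBehavior.Execute`, the `IBehavior<T1>` interface, `ReturnsBehavior<T1>`, and `BehaviorDispatch` are assumptions for illustration and are not taken from the PR.

```csharp
#nullable enable
using System;

// Assumed shape of the existing interface (return type is a guess).
public interface IBehavior
{
    object? Execute(object?[] arguments);
}

// Hypothetical typed variant: behaviors that implement it receive the argument unboxed.
public interface IBehavior<T1>
{
    object? Execute(T1 arg1);
}

// Example behavior that supports both the array path and the typed path.
public sealed class ReturnsBehavior<T1> : IBehavior, IBehavior<T1>
{
    private readonly Func<T1, object?> _callback;

    public ReturnsBehavior(Func<T1, object?> callback) => _callback = callback;

    // Existing path: the argument arrives boxed inside the array.
    public object? Execute(object?[] arguments) => _callback((T1)arguments[0]!);

    // Typed path: no array, no argument boxing.
    public object? Execute(T1 arg1) => _callback(arg1);
}

public static class BehaviorDispatch
{
    // Prefer the typed interface when available; otherwise materialize the array.
    public static object? Invoke<T1>(IBehavior behavior, T1 arg1)
        => behavior is IBehavior<T1> typed
            ? typed.Execute(arg1)
            : behavior.Execute(new object?[] { arg1 });
}
```

Note that the result is still returned as `object?` in this sketch, so a value-type return would box; only the argument side is addressed.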


3. ArgumentStore<T> boxing occurs in FormatCall on every strict-mode failure

private static string FormatCall(string memberName, IArgumentStore store)
{
    var formattedArgs = string.Join(", ", store.ToArray().Select(a => a?.ToString() ?? "null"));
    return $"{memberName}({formattedArgs})";
}

store.ToArray() is called every time a strict-mode violation is formatted. Because exceptions are rare, this is acceptable. FormatCall could format the arguments directly from the store without boxing, but on such a rare path this is only a minor nit.


4. CanUseTypedDispatch recomputes LINQ queries three times per method

private static bool CanUseTypedDispatch(MockMemberModel method)
{
    if (method.HasRefStructParams) return false;
    var nonOutParams = method.Parameters.Where(p => p.Direction != ParameterDirection.Out).ToList();
    return nonOutParams.Count is >= 1 and <= 8;
}

private static string GetTypedArgsGenericParams(MockMemberModel method)
{
    var nonOutParams = method.Parameters.Where(p => p.Direction != ParameterDirection.Out); // recomputed
    ...
}

private static string GetTypedArgsList(MockMemberModel method)
{
    var nonOutParams = method.Parameters.Where(p => p.Direction != ParameterDirection.Out); // recomputed again
    ...
}

These three methods are always called together (check, then both getters). The Where filter is evaluated three times. In a source generator this runs at compile time so it's not a hot path, but it's still avoidable. Consider returning a tuple from a single helper:

private static (bool CanUse, string TypeArgs, string ArgsList) GetTypedDispatchInfo(MockMemberModel method)
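A body for the combined helper might look like the sketch below. It filters the parameters once and derives both strings from the same list. The model types here (`MockParameterModel`, its `Name` and `FullyQualifiedType` properties, and the `ParameterDirection` members) are simplified stand-ins invented for the example; only `Parameters`, `Direction`, and `HasRefStructParams` appear in the review's snippets.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical stand-ins for the generator's model types.
public enum ParameterDirection { In, Out, Ref }

public sealed record MockParameterModel(string Name, string FullyQualifiedType, ParameterDirection Direction);

public sealed record MockMemberModel(IReadOnlyList<MockParameterModel> Parameters, bool HasRefStructParams);

public static class TypedDispatch
{
    // Single pass: decide eligibility and build both strings from one filtered list.
    public static (bool CanUse, string TypeArgs, string ArgsList) GetTypedDispatchInfo(MockMemberModel method)
    {
        if (method.HasRefStructParams) return (false, "", "");

        var nonOut = method.Parameters
            .Where(p => p.Direction != ParameterDirection.Out)
            .ToList();

        // Typed overloads exist only for arities 1 through 8.
        if (nonOut.Count is < 1 or > 8) return (false, "", "");

        return (true,
            string.Join(", ", nonOut.Select(p => p.FullyQualifiedType)),
            string.Join(", ", nonOut.Select(p => p.Name)));
    }
}
```

Callers can then deconstruct the result, e.g. `var (canUse, typeArgs, argsList) = TypedDispatch.GetTypedDispatchInfo(method);`.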

5. ArgumentStore<T> structs stored via IArgumentStore interface defeats stack allocation

var store = new ArgumentStore<T1>(arg1);
var callRecord = RecordCall(memberId, memberName, store); // IArgumentStore parameter — boxes the struct

RecordCall(int, string, IArgumentStore) accepts IArgumentStore — passing a struct through an interface parameter boxes it. So ArgumentStore<T1> is immediately heap-allocated anyway when RecordCall is called.

The deferred boxing goal (avoid object?[]) is still achieved — store boxes once (struct → interface) instead of allocating a new object?[] — but the struct itself isn't stack-resident after this call. The _store field in CallRecord holds an IArgumentStore reference, meaning the store lives on the heap as an interface-boxed struct.

For the no-behavior path this is still a win (one allocation instead of array + boxing), but consider whether CallRecord could be made generic, or whether the IArgumentStore field could be replaced with a fixed-size inline struct using InlineArray (net8+).
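The "generic CallRecord" option could be sketched like this, assuming a constrained type parameter so the struct is stored inline in the record rather than boxed behind the interface. All names (`CallRecord<TStore>`, `Recorder`, the single-member `IArgumentStore`) are illustrative assumptions, not the PR's types.

```csharp
#nullable enable
using System.Threading;

// Hypothetical minimal interface for the example.
public interface IArgumentStore
{
    object?[] ToArray();
}

public readonly struct ArgumentStore<T1> : IArgumentStore
{
    private readonly T1 _arg1;
    public ArgumentStore(T1 arg1) => _arg1 = arg1;
    public object?[] ToArray() => new object?[] { _arg1 };
}

// Non-generic base so call histories can hold records of any arity.
public abstract class CallRecord
{
    public abstract object?[] Arguments { get; }
}

// The struct lives inline as a field of the record: one allocation (the record), no separate box.
public sealed class CallRecord<TStore> : CallRecord where TStore : struct, IArgumentStore
{
    private readonly TStore _store;
    private object?[]? _arguments;

    public CallRecord(TStore store) => _store = store;

    public override object?[] Arguments
    {
        get
        {
            if (_arguments is not null) return _arguments;
            var arr = _store.ToArray(); // constrained call on the struct, no boxing of the store
            return Interlocked.CompareExchange(ref _arguments, arr, null) ?? arr;
        }
    }
}

public static class Recorder
{
    // Generic over the store type, so the argument is never converted to IArgumentStore.
    public static CallRecord RecordCall<TStore>(TStore store) where TStore : struct, IArgumentStore
        => new CallRecord<TStore>(store);
}
```

The trade-off in this sketch is a virtual `Arguments` property and one generic instantiation of `CallRecord<TStore>` per store type.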


6. MockImplBuilder.cs duplication across three method generators

The pattern:

var useTypedDispatch = CanUseTypedDispatch(method);
string? argsArray = null;
string? typeArgs = null;
string? argsList = null;

if (useTypedDispatch)
{
    typeArgs = GetTypedArgsGenericParams(method);
    argsList = GetTypedArgsList(method);
}
else
{
    argsArray = EmitArgsArrayVariable(writer, method);
}

...is copy-pasted verbatim into GenerateWrapMethodBody, GeneratePartialMethodBody, and GenerateEngineDispatchBody. Extract this into a helper that returns a discriminated union or a simple record:

private record DispatchStrategy(bool IsTyped, string? TypeArgs, string? ArgsList, string? ArgsArray);

private static DispatchStrategy GetDispatchStrategy(CodeWriter writer, MockMemberModel method)
{
    if (CanUseTypedDispatch(method))
    {
        return new(true, GetTypedArgsGenericParams(method), GetTypedArgsList(method), null);
    }
    return new(false, null, null, EmitArgsArrayVariable(writer, method));
}

Similarly, the if (useTypedDispatch) ... else ... pattern appears ~10 times in each of the three generators = ~30 branch pairs total. A helper EmitDispatchCall(writer, strategy, ...) would centralize the template string selection.


7. Stateful setups fall back to boxing unnecessarily

if (_hasStatefulSetups)
    return FindMatchingSetupLocked(memberId, [arg1]); // allocates object?[] here

All 8 typed FindMatchingSetup<T1,...> overloads contain this fallback. This means that as soon as any stateful setup is added to the mock (even for a different member), all typed dispatch reverts to boxing. This is a significant regression risk: a developer adds one state machine setup and suddenly all their previously-optimized mocks box again.

At minimum, document this in a comment. Better: filter stateful setups per-member rather than using a global _hasStatefulSetups flag, so only the members that actually have state-conditional setups fall back.
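A per-member flag could replace the global `_hasStatefulSetups` along the lines of the sketch below. It assumes member IDs are small non-negative integers and uses a copy-on-write array so the read on the dispatch path stays lock-free. `StatefulSetupTracker` and its methods are hypothetical names, not part of the PR.

```csharp
#nullable enable
using System;

// Hypothetical tracker: records which member IDs have state-conditional setups.
public sealed class StatefulSetupTracker
{
    private readonly object _gate = new();

    // Replaced wholesale on write; readers take a snapshot without locking.
    private volatile bool[] _stateful = Array.Empty<bool>();

    // Assumes memberId >= 0 (an assumption of this sketch).
    public void MarkStateful(int memberId)
    {
        lock (_gate)
        {
            var current = _stateful;
            var next = new bool[Math.Max(current.Length, memberId + 1)];
            Array.Copy(current, next, current.Length);
            next[memberId] = true;
            _stateful = next; // publish the new snapshot
        }
    }

    // Hot-path check: only members that were marked fall back to the locked/boxing path.
    public bool IsStateful(int memberId)
    {
        var snapshot = _stateful;
        return (uint)memberId < (uint)snapshot.Length && snapshot[memberId];
    }
}
```

With something like this, a typed `FindMatchingSetup` overload would test `IsStateful(memberId)` instead of the global flag, so a stateful setup on one member would not force boxing for the others.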


8. MethodSetup.Matches length check inconsistency with typed ApplyCaptures

public bool Matches<T1>(T1 arg1)
{
    if (_matchers.Length != 1) return false; // strict equality
    ...
}

public void ApplyCaptures<T1>(T1 arg1)
{
    if (_matchers.Length >= 1) CaptureSingle(_matchers[0], arg1); // >= not ==
}

Matches requires exactly N matchers, but ApplyCaptures uses >=. If a setup somehow had 2 matchers but was matched via a 1-arg typed path (which the Matches check prevents), ApplyCaptures would still run on _matchers[0]. This is probably harmless in practice since ApplyCaptures is only called after Matches returns true, but the defensive check in Matches vs. the permissive check in ApplyCaptures is an inconsistency that could confuse future maintainers.


Summary

| Concern | Severity |
|---|---|
| MockEngine.Typed.cs should itself be T4/source-generated | High (maintainability) |
| IBehavior.Execute(object?[]) still boxes on the common path (behavior hit) | Medium (partially defeats the purpose) |
| Stateful setup fallback boxes all typed args globally | Medium (correctness surprise) |
| ArgumentStore<T> struct is boxed via IArgumentStore on RecordCall | Medium (partially defeats stack-allocation intent) |
| CanUseTypedDispatch/GetTypedArgs* triple-computation | Low (compile-time only) |
| Copy-paste across three method generators in MockImplBuilder | Medium (maintainability) |
| Matches vs ApplyCaptures length check inconsistency | Low |

The performance goal is right and the approach is directionally sound. The main ask would be: generate MockEngine.Typed.cs from a template rather than hand-authoring it, and document (or fix) the stateful-setup boxing regression. Those two changes would make this much safer to maintain long-term.

@codacy-production

codacy-production bot commented Apr 5, 2026

Not up to standards ⛔

🔴 Issues 8 medium · 46 minor

Alerts:
⚠ 54 issues (≤ 0 issues of at least minor severity)

Results:
54 new issues

| Category | Results |
|---|---|
| UnusedCode | 1 minor |
| CodeStyle | 45 minor |
| Performance | 8 medium |


🟢 Metrics 547 complexity

| Metric | Results |
|---|---|
| Complexity | 547 |


- Extract StoreCallRecord to deduplicate lock/append logic in RecordCall
- Remove redundant "The Nth argument" doc comments from ArgumentStore fields
Contributor

@claude claude bot left a comment


Code Review: Typed HandleCall overloads to eliminate argument boxing

This is a well-motivated performance PR with solid benchmark numbers (~40% invocation speedup). The core design is sound. A few concerns worth addressing:


Significant: Duplicated if (useTypedDispatch) ... else ... pattern in MockImplBuilder.cs

The typed/legacy dispatch branching is repeated ~16 times across GenerateWrapMethodBody, GeneratePartialMethodBody, and GenerateEngineDispatchBody. Every call site looks like:

if (useTypedDispatch)
    writer.AppendLine($"if (_engine.TryHandleCall<{typeArgs}>({method.MemberId}, \"{method.Name}\", {argsList}))");
else
    writer.AppendLine($"if (_engine.TryHandleCall({method.MemberId}, \"{method.Name}\", {argsArray}))");

This bloats all three body-generation methods significantly and makes future changes error-prone (you'd need to update both branches in many places). A helper like:

private static string BuildHandleCall(MockMemberModel method, bool useTyped, string? typeArgs, string? argsList, string? argsArray)
    => useTyped
        ? $"_engine.HandleCall<{typeArgs}>({method.MemberId}, \"{method.Name}\", {argsList})"
        : $"_engine.HandleCall({method.MemberId}, \"{method.Name}\", {argsArray})";

(with similar helpers for HandleCallWithReturn, TryHandleCall, TryHandleCallWithReturn) would centralize the branching and make the body-generation code read almost identically to the pre-PR version.


Medium: CallRecord.Arguments lazy init is not thread-safe

public object?[] Arguments => _arguments ??= _store?.ToArray() ?? [];

If Arguments is read concurrently by multiple threads (e.g., one thread verifying while another invocation is being processed), two threads could both observe _arguments == null and both call ToArray(). The result is still correct (idempotent), but the double allocation may be surprising in a framework that uses Volatile.Read/Write elsewhere in this same class. Consider either:

  • Documenting the benign race explicitly, or
  • Using Interlocked.CompareExchange to ensure a single allocation:
public object?[] Arguments
{
    get
    {
        if (_arguments is not null) return _arguments;
        var arr = _store?.ToArray() ?? [];
        return Interlocked.CompareExchange(ref _arguments, arr, null) ?? arr;
    }
}

This is consistent with how the rest of CallRecord handles thread safety.


Medium: MockEngine.Typed.cs maintainability at 1,279 lines of boilerplate

The 32 handwritten overloads (4 dispatch patterns × 8 arities) share near-identical logic. If the hot-path dispatch ever changes (new hook, property auto-tracking, strict-mode behavior, capture application), all 32 methods need updating. The current PR already shows this pattern is hard to keep in sync — AutoTrackProperties is handled in the typed overloads but the phrasing slightly differs across arities.

A few options to consider:

  1. T4 / Roslyn-emitted source — Generate MockEngine.Typed.cs itself from a template. This makes "these 32 methods are all the same pattern" explicit and ensures they stay in sync.
  2. Extract shared logic to a helper — Each typed overload could call a private CoreHandleCall(int memberId, string memberName, IArgumentStore store, ...) that contains the lock/record/find/dispatch logic, with only the ArgumentStore construction and typed FindMatchingSetup call being per-arity. This would reduce each 30-line typed overload to ~5 lines.

Option 2 is lower effort and directly addresses the maintainability concern without requiring a build-time code gen step.
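Option 2 might be sketched as below: each typed overload only builds its store and delegates to one shared core method. The sketch departs from the review's suggestion in one respect, which is an assumption of this example: the core method is generic over the store type instead of taking `IArgumentStore`, so the struct is not boxed at the call. `MiniEngine`, its `Log`, and the store types are illustrative stand-ins, and the typed `FindMatchingSetup` step is omitted.

```csharp
#nullable enable
using System.Collections.Generic;

public interface IArgumentStore
{
    object?[] ToArray();
}

public readonly struct ArgumentStore<T1> : IArgumentStore
{
    private readonly T1 _arg1;
    public ArgumentStore(T1 arg1) => _arg1 = arg1;
    public object?[] ToArray() => new object?[] { _arg1 };
}

public readonly struct ArgumentStore<T1, T2> : IArgumentStore
{
    private readonly T1 _arg1;
    private readonly T2 _arg2;
    public ArgumentStore(T1 arg1, T2 arg2) { _arg1 = arg1; _arg2 = arg2; }
    public object?[] ToArray() => new object?[] { _arg1, _arg2 };
}

// Hypothetical engine: per-arity overloads are one-liners over a shared core.
public sealed class MiniEngine
{
    public List<string> Log { get; } = new();

    public void HandleCall<T1>(int memberId, string memberName, T1 arg1)
        => CoreHandleCall(memberId, memberName, new ArgumentStore<T1>(arg1));

    public void HandleCall<T1, T2>(int memberId, string memberName, T1 arg1, T2 arg2)
        => CoreHandleCall(memberId, memberName, new ArgumentStore<T1, T2>(arg1, arg2));

    // Shared record/dispatch logic lives here once; this sketch just logs the call.
    private void CoreHandleCall<TStore>(int memberId, string memberName, TStore store)
        where TStore : struct, IArgumentStore
    {
        var formattedArgs = string.Join(", ", store.ToArray());
        Log.Add($"{memberId}:{memberName}({formattedArgs})");
    }
}
```

A change to the dispatch logic then touches `CoreHandleCall` once rather than every arity.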


Minor: BOM introduced in all snapshot files

All 18 snapshot files changed their first line from // <auto-generated/> to the same text preceded by a UTF-8 BOM (the BOM is invisible, so the two lines look identical in the diff). This is unrelated to the feature. If intentional (e.g., a source generator StreamWriter change), it should be called out explicitly. If unintentional, it's noise that could confuse future snapshot diffs.


What's done well

  • ArgumentStore<T1..T8> as readonly struct: correct choice — avoids a heap allocation, and deferring boxing to ToArray() (cold/error path) is the right design.
  • MatchSingle<T> with IArgumentMatcher<T> fast path and [AggressiveInlining]: clean and avoids an allocation per argument check.
  • _hasStatefulSetups fallback: falling back to the array-based path for stateful/sequential setups is pragmatic and correct. The limitation is well-documented in the PR description.
  • Volatile.Read/Write for IsVerified/IsUnmatched: appropriate for concurrent test verification.
  • Snapshot tests updated: all 18 snapshots correctly reflect the new typed dispatch signatures.

Overall: the performance goal is clearly achieved and the design is solid. The main asks before merging are to reduce the MockImplBuilder.cs duplication and consider the CallRecord.Arguments thread-safety edge case.

- Make CallRecord.Arguments thread-safe with Interlocked.CompareExchange
- Extract Emit* helpers in MockImplBuilder to eliminate 25 if/else branches
- Combine CanUseTypedDispatch + GetTypedArgs* into single GetTypedDispatchInfo
- Strip unintentional BOM from snapshot files
@claude
Contributor

claude bot commented Apr 5, 2026

Code Review

This is a well-targeted performance optimization with clear benchmarks backing the ~40% throughput improvement. The architecture is sound overall. Below are observations ranging from correctness/design concerns to minor notes.


Issues Worth Discussing

1. ArgumentStore<T1> is still boxed on every invocation via RecordCall

In MockEngine.Typed.cs, every arity overload does:

var store = new ArgumentStore<T1>(arg1);
var callRecord = RecordCall(memberId, memberName, store);  // boxes to IArgumentStore

RecordCall accepts IArgumentStore — a reference-type interface — so the ArgumentStore<T1> struct is boxed on every single invocation, regardless of whether matching succeeds. This is one heap allocation per call (the box), just shaped differently from the old object?[]. The benchmark confirms this: single-int invocations now use 208 B vs 176 B — slightly more memory.

The real win is that FindMatchingSetup(memberId, arg1) runs first without touching the store, so most invocations avoid the array+boxing of the arguments during matching. But the RecordCall boxing happens unconditionally before matching.

Suggestion: If the goal is zero allocation on the hot path, consider deferring the CallRecord creation to after a setup is found (or making RecordCall generic so the struct isn't boxed). For the current scope, this is an acceptable trade-off, but the PR description framing could be clearer — the improvement is "less allocation during matching", not "zero allocation".


2. Stateful setups silently opt out of typed dispatch

In each typed FindMatchingSetup<T1,...> overload:

if (_hasStatefulSetups)
    return FindMatchingSetupLocked(memberId, [arg1]);  // Creates object?[] — boxes T1

Any mock that uses state-machine features falls back to object?[] array allocation and boxing, silently. This is expected but deserves a comment (even just // stateful path must materialize args for state guard evaluation), since it means the ~40% improvement doesn't apply to stateful mocks.


3. Behavior execution always boxes with store.ToArray()

var behaviorResult = behavior.Execute(store.ToArray());  // allocates + boxes on every matched call

The PR description acknowledges this but buries it in a footnote. Worth being explicit in the code with a // TODO: typed IBehavior overloads would eliminate this comment, otherwise future readers will see the ArgumentStore and assume boxing is fully avoided.


4. Code duplication in MockEngine.Typed.cs (1279 lines)

The file is 8 arities × 4 method flavors × nearly-identical bodies. The duplication is mechanical and will cause maintenance friction: a bug or behavioral change in one arity must be applied to all 8.

Better approach: Generate this file using T4 (.tt) templates or a Roslyn-based source generator — the same way .NET's BCL generates Action<T1,...> and Func<T1,...>. The template would be ~100 lines producing the 1279. A T4 file could sit alongside MockEngine.Typed.cs and be committed alongside its output.

This doesn't need to block the PR, but is worth tracking.


5. ApplyCaptures bounds checks are redundant after a typed Matches

public void ApplyCaptures<T1>(T1 arg1)
{
    if (_matchers.Length >= 1) CaptureSingle(_matchers[0], arg1);
}

ApplyCaptures<T1> is only called after Matches<T1> confirmed _matchers.Length == 1, so the >= 1 guard is always true. The bounds check adds noise but is harmless. A defensive comment or Debug.Assert(_matchers.Length == 1) would make the invariant explicit rather than relying on call-site discipline.


Positive Notes

  • The ArgumentStore<T1,...> deferred-boxing pattern for CallRecord.Arguments is elegant: the Interlocked.CompareExchange lazy materialization is correct and thread-safe.
  • CaptureMatcher<T> gaining a typed ICapturingMatcher<T>.ApplyCapture(T?) path is a clean way to avoid boxing on the capture side.
  • GetTypedDispatchInfo correctly excludes out parameters (write-only) while including ref parameters (read+write) when building the typed arg list.
  • MockImplBuilder.cs helper methods (EmitHandleCall, EmitTryHandleCall, etc.) keep the fallback/typed selection in one place — easy to audit.
  • Snapshot tests for all 18 affected cases are updated, which is exactly right.

Summary

The PR achieves its goal for the common reference-type and no-match paths. The main concerns are:

  1. The int/value-type allocation story is slightly worse (acknowledged in benchmarks but the code doesn't reflect this clearly)
  2. MockEngine.Typed.cs is a long-term maintenance risk without codegen
  3. Stateful mocks silently don't benefit — worth documenting

All three are reasonable to address in follow-ups. The correctness looks solid, tests pass, and the performance improvement is real and well-measured.

@thomhurst thomhurst merged commit 2daa56b into main Apr 5, 2026
13 of 14 checks passed
@thomhurst thomhurst deleted the perf/typed-handlecall-overloads branch April 5, 2026 07:30