Perf: Generate typed HandleCall<T1,...> overloads to eliminate argument boxing#5399
Perf: Generate typed HandleCall<T1,...> overloads to eliminate argument boxing#5399
Conversation
- Refactor CallRecord from record to class with IArgumentStore support - Add ICapturingMatcher<T> for zero-boxing argument capture - Add typed Matches<T1..T8> and ApplyCaptures<T1..T8> to MethodSetup
- Make MockEngine<T> partial, add MockEngine.Typed.cs - 32 typed dispatch methods (4 patterns x 8 arities) - Typed FindMatchingSetup<T1..T8> and RecordCall with IArgumentStore - Zero boxing on hot path: matching, recording, capture all typed - Behavior.Execute still boxes via store.ToArray() (cold path)
…ty 1-8 Methods with 1-8 non-out, non-ref-struct parameters now call typed HandleCall<T1,...> overloads, eliminating object?[] allocation and boxing. Methods with >8 params or ref-struct params fall back to existing path.
There was a problem hiding this comment.
Code Review: Typed HandleCall Overloads for Boxing Elimination
The goal here is clear and valuable — eliminate per-invocation object?[] allocation by generating arity-specific typed overloads. The benchmark numbers are real and the approach is sound. That said, there are several architectural concerns worth discussing before this lands.
1. Massive code duplication that will compound over time
MockEngine.Typed.cs is 1,279 lines of near-identical code — 8 arities × 4 method variants (HandleCall, HandleCallWithReturn, TryHandleCall, TryHandleCallWithReturn) = 32 near-identical methods. Every bug fix or enhancement to the dispatch logic must now be applied in 32 places instead of 4. The same pattern repeats in MockEngine.cs (8 more FindMatchingSetup overloads), MethodSetup.cs (8 Matches + 8 ApplyCaptures overloads = 16 more methods).
This file should itself be source-generated. The source generator already exists in this codebase. Consider a T4 template, a [GenerateArityOverloads]-style generator, or at minimum a .tt / Scriban template committed alongside the output. Right now if you find a subtle bug in arity-1's HandleCallWithReturn, you need to remember to fix it in arities 2–8 too.
Compare to how System.Action, Func, ValueTuple, and similar framework types handle this — they use T4 templates and commit the generated output, making the generation source the truth.
2. IBehavior.Execute(object?[]) negates most of the gain on the hot path
Every typed HandleCall<T1> path that finds a matching behavior immediately calls:
var behaviorResult = behavior.Execute(store.ToArray()); // allocates here anywayThe PR summary acknowledges this: "Allocation for value-type args is slightly higher than the issue target because IBehavior.Execute(object?[]) still requires boxing when a behavior is found."
But this is actually the common case — when you've set up a behavior and the mock is called, the behavior fires. The fallthrough (no setup) path that avoids boxing is typically the rarer path in a real test.
The 40% speedup in the benchmarks likely reflects the case where no behavior matches (unmatched call path). For the happy path (behavior executes), the gain disappears. Consider extending IBehavior with a typed variant, or at least measuring the benchmark with a behavior installed.
3. ArgumentStore<T> boxing occurs in FormatCall on every strict-mode failure
private static string FormatCall(string memberName, IArgumentStore store)
{
var formattedArgs = string.Join(", ", store.ToArray().Select(a => a?.ToString() ?? "null"));
return $"{memberName}({formattedArgs})";
}store.ToArray() is called every time a strict-mode violation is formatted. This is fine (exceptions are rare), but FormatCall could accept the args directly from store without boxing — though it's truly a rare path so this is a minor nit.
4. CanUseTypedDispatch recomputes LINQ queries three times per method
private static bool CanUseTypedDispatch(MockMemberModel method)
{
if (method.HasRefStructParams) return false;
var nonOutParams = method.Parameters.Where(p => p.Direction != ParameterDirection.Out).ToList();
return nonOutParams.Count is >= 1 and <= 8;
}
private static string GetTypedArgsGenericParams(MockMemberModel method)
{
var nonOutParams = method.Parameters.Where(p => p.Direction != ParameterDirection.Out); // recomputed
...
}
private static string GetTypedArgsList(MockMemberModel method)
{
var nonOutParams = method.Parameters.Where(p => p.Direction != ParameterDirection.Out); // recomputed again
...
}These three methods are always called together (check, then both getters). The Where filter is evaluated three times. In a source generator this runs at compile time so it's not a hot path, but it's still avoidable. Consider returning a tuple from a single helper:
private static (bool CanUse, string TypeArgs, string ArgsList) GetTypedDispatchInfo(MockMemberModel method)5. ArgumentStore<T> structs stored via IArgumentStore interface defeats stack allocation
var store = new ArgumentStore<T1>(arg1);
var callRecord = RecordCall(memberId, memberName, store); // IArgumentStore parameter — boxes the structRecordCall(int, string, IArgumentStore) accepts IArgumentStore — passing a struct through an interface parameter boxes it. So ArgumentStore<T1> is immediately heap-allocated anyway when RecordCall is called.
The deferred boxing goal (avoid object?[]) is still achieved — store boxes once (struct → interface) instead of allocating a new object?[] — but the struct itself isn't stack-resident after this call. The _store field in CallRecord holds an IArgumentStore reference, meaning the store lives on the heap as an interface-boxed struct.
For the no-behavior path this is still a win (one allocation instead of array + boxing), but consider whether CallRecord could be made generic, or whether the IArgumentStore field could be replaced with a fixed-size inline struct using InlineArray (net8+).
6. MockImplBuilder.cs duplication across three method generators
The pattern:
var useTypedDispatch = CanUseTypedDispatch(method);
string? argsArray = null;
string? typeArgs = null;
string? argsList = null;
if (useTypedDispatch)
{
typeArgs = GetTypedArgsGenericParams(method);
argsList = GetTypedArgsList(method);
}
else
{
argsArray = EmitArgsArrayVariable(writer, method);
}...is copy-pasted verbatim into GenerateWrapMethodBody, GeneratePartialMethodBody, and GenerateEngineDispatchBody. Extract this into a helper that returns a discriminated union or a simple record:
private record DispatchStrategy(bool IsTyped, string? TypeArgs, string? ArgsList, string? ArgsArray);
private static DispatchStrategy GetDispatchStrategy(CodeWriter writer, MockMemberModel method)
{
if (CanUseTypedDispatch(method))
{
return new(true, GetTypedArgsGenericParams(method), GetTypedArgsList(method), null);
}
return new(false, null, null, EmitArgsArrayVariable(writer, method));
}Similarly, the if (useTypedDispatch) ... else ... pattern appears ~10 times in each of the three generators = ~30 branch pairs total. A helper EmitDispatchCall(writer, strategy, ...) would centralize the template string selection.
7. Stateful setups fall back to boxing unnecessarily
if (_hasStatefulSetups)
return FindMatchingSetupLocked(memberId, [arg1]); // allocates object?[] hereAll 8 typed FindMatchingSetup<T1,...> overloads contain this fallback. This means that as soon as any stateful setup is added to the mock (even for a different member), all typed dispatch reverts to boxing. This is a significant regression risk: a developer adds one state machine setup and suddenly all their previously-optimized mocks box again.
At minimum, document this in a comment. Better: filter stateful setups per-member rather than using a global _hasStatefulSetups flag, so only the members that actually have state-conditional setups fall back.
8. MethodSetup.Matches length check inconsistency with typed ApplyCaptures
public bool Matches<T1>(T1 arg1)
{
if (_matchers.Length != 1) return false; // strict equality
...
}
public void ApplyCaptures<T1>(T1 arg1)
{
if (_matchers.Length >= 1) CaptureSingle(_matchers[0], arg1); // >= not ==
}Matches requires exactly N matchers, but ApplyCaptures uses >=. If a setup somehow had 2 matchers but was matched via a 1-arg typed path (which the Matches check prevents), ApplyCaptures would still run on _matchers[0]. This is probably harmless in practice since ApplyCaptures is only called after Matches returns true, but the defensive check in Matches vs. the permissive check in ApplyCaptures is an inconsistency that could confuse future maintainers.
Summary
| Concern | Severity |
|---|---|
MockEngine.Typed.cs should itself be T4/source-generated |
High — maintainability |
IBehavior.Execute(object?[]) still boxes on the common path (behavior hit) |
Medium — partially defeats the purpose |
| Stateful setup fallback boxes all typed args globally | Medium — correctness surprise |
ArgumentStore<T> struct is boxed via IArgumentStore on RecordCall |
Medium — partially defeats stack-allocation intent |
CanUseTypedDispatch/GetTypedArgs* triple-computation |
Low — compile-time only |
Copy-paste across three method generators in MockImplBuilder |
Medium — maintainability |
Matches vs ApplyCaptures length check inconsistency |
Low |
The performance goal is right and the approach is directionally sound. The main ask would be: generate MockEngine.Typed.cs from a template rather than hand-authoring it, and document (or fix) the stateful-setup boxing regression. Those two changes would make this much safer to maintain long-term.
Not up to standards ⛔🔴 Issues
|
| Category | Results |
|---|---|
| UnusedCode | 1 minor |
| CodeStyle | 45 minor |
| Performance | 8 medium |
🟢 Metrics 547 complexity
Metric Results Complexity 547
TIP This summary will be updated as you push new changes. Give us feedback
- Extract StoreCallRecord to deduplicate lock/append logic in RecordCall - Remove redundant "The Nth argument" doc comments from ArgumentStore fields
There was a problem hiding this comment.
Code Review: Typed HandleCall overloads to eliminate argument boxing
This is a well-motivated performance PR with solid benchmark numbers (~40% invocation speedup). The core design is sound. A few concerns worth addressing:
Significant: Duplicated if (useTypedDispatch) ... else ... pattern in MockImplBuilder.cs
The typed/legacy dispatch branching is repeated ~16 times across GenerateWrapMethodBody, GeneratePartialMethodBody, and GenerateEngineDispatchBody. Every call site looks like:
if (useTypedDispatch)
writer.AppendLine($"if (_engine.TryHandleCall<{typeArgs}>({method.MemberId}, \"{method.Name}\", {argsList}))");
else
writer.AppendLine($"if (_engine.TryHandleCall({method.MemberId}, \"{method.Name}\", {argsArray}))");This bloats all three body-generation methods significantly and makes future changes error-prone (you'd need to update both branches in many places). A helper like:
private static string BuildHandleCall(MockMemberModel method, bool useTyped, string? typeArgs, string? argsList, string? argsArray)
=> useTyped
? $"_engine.HandleCall<{typeArgs}>({method.MemberId}, \"{method.Name}\", {argsList})"
: $"_engine.HandleCall({method.MemberId}, \"{method.Name}\", {argsArray})";(with similar helpers for HandleCallWithReturn, TryHandleCall, TryHandleCallWithReturn) would centralize the branching and make the body-generation code read almost identically to the pre-PR version.
Medium: CallRecord.Arguments lazy init is not thread-safe
public object?[] Arguments => _arguments ??= _store?.ToArray() ?? [];If Arguments is read concurrently by multiple threads (e.g., one thread verifying while another invocation is being processed), two threads could both observe _arguments == null and both call ToArray(). The result is still correct (idempotent), but the double allocation may be surprising in a framework that uses Volatile.Read/Write elsewhere in this same class. Consider either:
- Documenting the benign race explicitly, or
- Using
Interlocked.CompareExchangeto ensure a single allocation:
public object?[] Arguments
{
get
{
if (_arguments is not null) return _arguments;
var arr = _store?.ToArray() ?? [];
return Interlocked.CompareExchange(ref _arguments, arr, null) ?? arr;
}
}This is consistent with how the rest of CallRecord handles thread safety.
Medium: MockEngine.Typed.cs maintainability at 1,279 lines of boilerplate
The 32 handwritten overloads (4 dispatch patterns × 8 arities) share near-identical logic. If the hot-path dispatch ever changes (new hook, property auto-tracking, strict-mode behavior, capture application), all 32 methods need updating. The current PR already shows this pattern is hard to keep in sync — AutoTrackProperties is handled in the typed overloads but the phrasing slightly differs across arities.
A few options to consider:
- T4 / Roslyn-emitted source — Generate
MockEngine.Typed.csitself from a template. This makes "these 32 methods are all the same pattern" explicit and ensures they stay in sync. - Extract shared logic to a helper — Each typed overload could call a private
CoreHandleCall(int memberId, string memberName, IArgumentStore store, ...)that contains the lock/record/find/dispatch logic, with only theArgumentStoreconstruction and typedFindMatchingSetupcall being per-arity. This would reduce each 30-line typed overload to ~5 lines.
Option 2 is lower effort and directly addresses the maintainability concern without requiring a build-time code gen step.
Minor: BOM introduced in all snapshot files
All 18 snapshot files changed from // <auto-generated/> to // <auto-generated/> (UTF-8 BOM added). This is unrelated to the feature. If intentional (e.g., a source generator StreamWriter change), it should be called out explicitly. If unintentional, it's noise that could confuse future snapshot diffs.
What's done well
ArgumentStore<T1..T8>asreadonly struct: correct choice — avoids a heap allocation, and deferring boxing toToArray()(cold/error path) is the right design.MatchSingle<T>withIArgumentMatcher<T>fast path and[AggressiveInlining]: clean and avoids an allocation per argument check._hasStatefulSetupsfallback: falling back to the array-based path for stateful/sequential setups is pragmatic and correct. The limitation is well-documented in the PR description.Volatile.Read/WriteforIsVerified/IsUnmatched: appropriate for concurrent test verification.- Snapshot tests updated: all 18 snapshots correctly reflect the new typed dispatch signatures.
Overall: the performance goal is clearly achieved and the design is solid. The main asks before merging are to reduce the MockImplBuilder.cs duplication and consider the CallRecord.Arguments thread-safety edge case.
- Make CallRecord.Arguments thread-safe with Interlocked.CompareExchange - Extract Emit* helpers in MockImplBuilder to eliminate 25 if/else branches - Combine CanUseTypedDispatch + GetTypedArgs* into single GetTypedDispatchInfo - Strip unintentional BOM from snapshot files
Code ReviewThis is a well-targeted performance optimization with clear benchmarks backing the ~40% throughput improvement. The architecture is sound overall. Below are observations ranging from correctness/design concerns to minor notes. Issues Worth Discussing1.
|
Summary
Closes #5389
HandleCall<T1,...,TN>overloads (up to 8 type params) that pass arguments without boxing, instead of allocatingnew object?[] { ... }on every mock invocationArgumentStore<T1,...>structs that defer boxing untilArgumentsis actually accessed (verification/diagnostics — rare path)MethodSetup.Matches<T1,...>overloads that callIArgumentMatcher<T>.Matches(T)directlyobject?[]pathBenchmark Results
vs Imposter gap narrowed from ~35% to ~7%.
Note: Allocation for value-type args is slightly higher than the issue target because
IBehavior.Execute(object?[])still requires boxing when a behavior is found. The string case (no boxing needed) hits 144 B. This is inherent to theIBehaviorinterface and could be addressed separately with typed behavior overloads.Test plan