Performance tweaks across HTTP/1.1, H2, and rules engine #619

Merged: haga-rak merged 27 commits into main from dev/tweak-perf on Apr 13, 2026

@haga-rak
Owner

Summary

  • Optimize HTTP/1.1 header parsing and writing hot paths
  • Reduce allocations and contention in H2 (HPACK decode, header pooling, lock-free pool reuse, deferred flush)
  • Pre-partition rules by FilterScope and replace VariableBuildingContext dictionary with allocation-free TryEvaluate
  • Add benchmarking, allocation, and contention analyzer tooling

haga-rak added 27 commits April 9, 2026 21:40
HPACK dynamic-table mutation must be serialized per H2 connection, and
the single lock that enforced this dominated blocked-wait time under H2
proxy load (99.5% of contention, ~30s of wait per 13s capture in the
0-byte body benchmark at 56 concurrent streams).

Instead of locking, hand off encoding to the single-threaded WriteLoop,
which already owns the wire. WriteResponseHeader becomes a fire-and-
forget enqueue onto a new PendingHeaderWrite channel; the WriteLoop
drains it into the ring buffer, with no synchronization around the
shared HPACK state.

Trailers take the same path: DataFrameEntry gains an inline trailer-job
variant that the WriteLoop encodes on the fly, keeping per-stream wire
order (trailers after DATA) via the existing FIFO channel.

Phase 2 of the WriteLoop re-drains pending headers at the top of each
data iteration to guarantee the HEADERS frame for a stream always
precedes that stream's first DATA frame on the wire, even when a header
is enqueued while data for another stream is being written.
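
The hand-off described above can be sketched with `System.Threading.Channels`. This is a minimal illustration, not the code from this PR: `PendingHeader`, `SingleWriterEncoder`, and the list standing in for the HPACK dynamic table are hypothetical names, and the Phase 2 re-drain for DATA frames is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical job type; the real PendingHeaderWrite entry carries more state.
public sealed record PendingHeader(int StreamId, string Name, string Value);

public sealed class SingleWriterEncoder
{
    private readonly Channel<PendingHeader> _pending =
        Channel.CreateUnbounded<PendingHeader>(new UnboundedChannelOptions
        {
            SingleReader = true,  // only the write loop drains the channel
            SingleWriter = false  // any stream may enqueue concurrently
        });

    // Stand-in for the HPACK dynamic table: mutable state that only the
    // write loop touches, so it needs no lock.
    private readonly List<string> _dynamicTable = new();

    // Stand-in for the wire: frames in the order the write loop emitted them.
    public List<string> Wire { get; } = new();

    // Fire-and-forget: callers never touch the shared encoder state.
    public bool Enqueue(PendingHeader header) => _pending.Writer.TryWrite(header);

    public void Complete() => _pending.Writer.Complete();

    public async Task RunWriteLoopAsync()
    {
        await foreach (var header in _pending.Reader.ReadAllAsync())
        {
            _dynamicTable.Add(header.Name); // unsynchronized by design
            Wire.Add($"HEADERS stream={header.StreamId} {header.Name}: {header.Value}");
        }
    }
}
```

Because a single consumer owns both the table and the wire, serialization falls out of the channel's FIFO order rather than from a lock.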
…d LINQ allocations

- Add ReadStringPrefix to read wire length without Huffman decoding, replace
  GetStringLength calls in HPackDecoder to eliminate double/triple tree walks
- Remove redundant GetDecodedLength from ReadString, use upper-bound sizing
- Replace Encoding.ASCII with direct byte<->char widening loops
- Replace LINQ (Sum, Where, ToDictionary, Select) in Http11Parser.InternalWrite
  with foreach loops and inline cookie joining

…oncurrentDictionary

- Hoist pool-reuse lookup before the per-authority semaphore so the common
  case (pool already exists) skips Synchronizer overhead entirely
- Move Init() before storing pools in the dictionary, closing a race window
  where an uninitialised pool was briefly visible to other threads
- Replace Dictionary+lock with ConcurrentDictionary for lock-free reads
- Skip redundant lock(_locks) in Synchronizer when preserve=true

EnforceRules is called 4+ times per exchange and previously iterated
every rule, performing scope-matching (including a type check for
MultipleScopeAction) per rule per call. Rules are now partitioned into
per-scope arrays once at Init/UpdateRules time, so EnforceRules just
looks up the scope bucket and iterates only the rules that apply.

PartitionedRules is immutable and swapped atomically via volatile +
Interlocked.Exchange, preserving the existing hot-reload guarantees.
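
As a rough sketch of the partition-once, swap-atomically idea: the `Scope` enum, `Rule` record, and `RuleEngine` class below are simplified, hypothetical stand-ins (the real FilterScope and rule types are richer), and `Enforce` only counts the rules it would run.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-ins for FilterScope and the rule type.
public enum Scope { OnAuthorityReceived, RequestHeaderReceived, ResponseHeaderReceived }

public sealed record Rule(string Name, Scope[] Scopes);

// Immutable after construction: one array of applicable rules per scope.
public sealed class PartitionedRules
{
    private readonly Rule[][] _byScope;

    public PartitionedRules(IEnumerable<Rule> rules)
    {
        var all = new List<Rule>(rules);
        var scopes = Enum.GetValues<Scope>();
        _byScope = new Rule[scopes.Length][];
        foreach (var scope in scopes)
        {
            // Scope matching happens here, once, instead of on every Enforce call.
            _byScope[(int)scope] =
                all.FindAll(r => Array.IndexOf(r.Scopes, scope) >= 0).ToArray();
        }
    }

    public ReadOnlySpan<Rule> For(Scope scope) => _byScope[(int)scope];
}

public sealed class RuleEngine
{
    private volatile PartitionedRules _rules = new(Array.Empty<Rule>());

    // Hot reload: build the new partition off to the side, then publish it in one step.
    public void UpdateRules(IEnumerable<Rule> rules)
        => Interlocked.Exchange(ref _rules, new PartitionedRules(rules));

    public int Enforce(Scope scope)
    {
        var snapshot = _rules; // single read; a concurrent swap cannot change this bucket
        var applied = 0;
        foreach (var rule in snapshot.For(scope))
            applied++; // a real engine would evaluate the rule's filter and action here
        return applied;
    }
}
```

Readers take one snapshot per call, so an update that lands mid-iteration affects only subsequent calls.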
- Drop Encoding.ASCII dispatches for constant literals, use u8 spans
- Sum byte length directly from char length in GetHttp11LengthOnly
- Add int-keyed status line byte map, format status code via
  Utf8Formatter to remove StatusCode.ToString() allocation per response
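
To illustrate the last two points, here is a small sketch that writes a status line using u8 literals and `Utf8Formatter`, with no intermediate string. `StatusLineSketch` is a hypothetical helper; the PR additionally uses a precomputed int-keyed byte map, which this sketch does not reproduce.

```csharp
using System;
using System.Buffers.Text;

public static class StatusLineSketch
{
    // Writes "HTTP/1.1 <code> <reason>\r\n" into destination and returns the byte count.
    // Throws ArgumentException if destination is too small.
    public static int Write(int statusCode, ReadOnlySpan<byte> reason, Span<byte> destination)
    {
        ReadOnlySpan<byte> prefix = "HTTP/1.1 "u8; // constant literal, no Encoding call
        prefix.CopyTo(destination);
        var offset = prefix.Length;

        // Formats the integer straight into the byte buffer, avoiding ToString().
        if (!Utf8Formatter.TryFormat(statusCode, destination.Slice(offset), out var written))
            throw new ArgumentException("Destination too small.", nameof(destination));
        offset += written;

        destination[offset++] = (byte)' ';
        reason.CopyTo(destination.Slice(offset));
        offset += reason.Length;

        "\r\n"u8.CopyTo(destination.Slice(offset));
        return offset + 2;
    }
}
```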
Parallel to the existing --contention flag, add an opt-in FLUXZY_BENCH_ALLOC
path that captures CLR GC/AllocationTick events with managed stacks into a
.nettrace per benchmark case. Ships with TraceAllocationAnalyzer, a small
TraceEvent-based CLI that aggregates the events by type, top frame, and
first Fluxzy frame — the allocation view BenchmarkDotNet's speedscope
export doesn't provide.

…rker

Both workers allocated a fresh byte[MaxHeaderSize] per stream (16 KB
default) to accumulate HEADERS/CONTINUATION fragments before HPACK decode.
Allocation sampling showed this was the #1 byte[] allocator across all
four throughput benchmark cases (28-48% of bytes, even on the H1
downstream path since upstream is ALPN-negotiated to H2).

Route through ArrayPool<byte>.Shared: rent on first fragment (with grow
for oversize responses in StreamWorker), return on Dispose. The decoded
headers (H2Helper.DecodeAndAllocate, DecodeTrailerFields) produce fresh
char buffers and HeaderField lists — no aliasing back to the pooled
bytes, so returning at stream end is safe.

Regression test H2LargeHeaderTests exercises the grow path (~30 KB
response headers across 20 sequential requests) and the fast-path
rent/return churn (50 sequential requests) to catch double-return or
prefix-copy bugs.
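
The rent, grow, and return lifecycle might look roughly like the following. `HeaderBlockAccumulator` is a hypothetical class written for illustration, not the actual StreamWorker code, and the 16 KB initial size simply mirrors the default mentioned above.

```csharp
#nullable enable
using System;
using System.Buffers;

public sealed class HeaderBlockAccumulator : IDisposable
{
    private const int InitialSize = 16 * 1024;
    private byte[]? _buffer;
    private int _length;

    // The bytes accumulated so far; valid only until the next Append or Dispose.
    public ReadOnlySpan<byte> Written => _buffer.AsSpan(0, _length);

    public void Append(ReadOnlySpan<byte> fragment)
    {
        var required = _length + fragment.Length;

        if (_buffer is null)
        {
            // Rent lazily on the first fragment.
            _buffer = ArrayPool<byte>.Shared.Rent(Math.Max(required, InitialSize));
        }
        else if (required > _buffer.Length)
        {
            // Grow path: rent a larger array, copy the prefix, return the old one exactly once.
            var larger = ArrayPool<byte>.Shared.Rent(Math.Max(required, _buffer.Length * 2));
            _buffer.AsSpan(0, _length).CopyTo(larger);
            ArrayPool<byte>.Shared.Return(_buffer);
            _buffer = larger;
        }

        fragment.CopyTo(_buffer.AsSpan(_length));
        _length = required;
    }

    public void Dispose()
    {
        var buffer = _buffer;
        if (buffer is null) return; // already returned; guards against double-return
        _buffer = null;
        _length = 0;
        ArrayPool<byte>.Shared.Return(buffer);
    }
}
```

Returning at dispose is only safe when nothing decoded from the buffer aliases it, which is the property the commit message calls out for the decoded headers.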
…aluate

Rule.Enforce creates one VariableBuildingContext per exchange per rule per
scope. The constructor allocated a Dictionary<string, Func<string>>, nine
Func<string> closure instances, a <>c__DisplayClass capturing state, and
a resized Entry[] — roughly 1 KB/call. Allocation sampling showed this
ctor as 6-12% of total bytes across all four benchmark cases, with
Dictionary.Resize as a separate 3-6% line item.

The only reader (VariableContext.EvaluateVariable) did
  dict.TryGetValue(name, out var func); return func();
so the delegate indirection was pure waste.

Replace the dictionary with a switch-based TryEvaluate(string, out string)
method. Each of the nine built-in variable names returns its value
directly from the captured fields. Semantics are a byte-for-byte port of
the prior lambda bodies, including the null-exchange fallback to
string.Empty and the StatusCode > 0 guard for exchange.status.
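
A reduced sketch of the switch-based shape, covering only the three variable names mentioned in this message. `ExchangeInfo` and `VariableBuildingContextSketch` are hypothetical stand-ins, and returning `string.Empty` when the status code is not positive is an assumption about the guard's fallback.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the exchange state the context captures.
public sealed record ExchangeInfo(string Path, int StatusCode);

public sealed class VariableBuildingContextSketch
{
    private readonly ExchangeInfo? _exchange;
    private readonly bool _secure;

    public VariableBuildingContextSketch(ExchangeInfo? exchange, bool secure)
    {
        _exchange = exchange;
        _secure = secure;
    }

    // No dictionary and no delegates: each known name reads its value from a field.
    public bool TryEvaluate(string name, out string value)
    {
        switch (name)
        {
            case "authority.secure":
                value = _secure.ToString(); // "True" or "False"
                return true;
            case "exchange.path":
                value = _exchange?.Path ?? string.Empty; // null-exchange fallback
                return true;
            case "exchange.status":
                // Assumption: a non-positive status code yields an empty string.
                value = _exchange is { StatusCode: > 0 } e
                    ? e.StatusCode.ToString()
                    : string.Empty;
                return true;
            default:
                value = string.Empty;
                return false; // unknown name
        }
    }
}
```

The C# compiler is free to lower a string switch to a hash-based jump, so the lookup cost stays comparable to the dictionary while the per-call allocations disappear.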

Public API break: LazyVariableEvaluations property removed. Custom
variables should go through VariableContext.Set, which is the existing
extensibility surface.

Tests:
- VariableBuildingContextTests: 17 unit tests covering each built-in,
  Boolean.ToString() capitalisation for authority.secure, null-exchange
  fallback for all exchange-scoped names, unknown-name returning false,
  and end-to-end interpolation via VariableContext.EvaluateVariable.
- Self_Generated_Context_Variables Theory extended with exchange.path
  (previously uncovered).
haga-rak merged commit 8941c68 into main on Apr 13, 2026
1 of 2 checks passed