Performance tweaks across HTTP/1.1, H2, and rules engine #619
Merged
HPACK dynamic-table mutation must be serialized per H2 connection, and the single lock that enforced this dominated blocked-wait time under H2 proxy load (99.5% of contention, ~30s of wait per 13s capture in the 0-byte body benchmark at 56 concurrent streams).

Instead of locking, hand off encoding to the single-threaded WriteLoop, which already owns the wire. WriteResponseHeader becomes a fire-and-forget enqueue onto a new PendingHeaderWrite channel; the WriteLoop drains it into the ring buffer, with no synchronization around the shared HPACK state. Trailers take the same path: DataFrameEntry gains an inline trailer-job variant that the WriteLoop encodes on the fly, keeping per-stream wire order (trailers after DATA) via the existing FIFO channel.

Phase 2 of the WriteLoop re-drains pending headers at the top of each data iteration to guarantee the HEADERS frame for a stream always precedes that stream's first DATA frame on the wire, even when a header is enqueued while data for another stream is being written.
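The handoff described above can be illustrated with a minimal sketch. It assumes `System.Threading.Channels` as the queue; `HeaderJob`, `SingleWriterSketch`, and the list standing in for the HPACK dynamic table are hypothetical names, not the project's PendingHeaderWrite or WriteLoop types. The Phase 2 re-drain and the ring buffer are not modeled.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for a queued header write; not the project's type.
public readonly record struct HeaderJob(int StreamId, string Name, string Value);

public sealed class SingleWriterSketch
{
    // Many producers, exactly one consumer (the write loop).
    private readonly Channel<HeaderJob> _pending =
        Channel.CreateUnbounded<HeaderJob>(
            new UnboundedChannelOptions { SingleReader = true });

    // Stands in for the HPACK dynamic table. Only the write loop touches it,
    // so it needs no lock.
    private readonly List<string> _dynamicTable = new();

    // Stands in for the wire: records frames in the order they were emitted.
    public List<string> Wire { get; } = new();

    // Fire-and-forget: callable from any thread, never encodes.
    public bool EnqueueHeader(HeaderJob job) => _pending.Writer.TryWrite(job);

    public void Complete() => _pending.Writer.Complete();

    // The single consumer: encodes and writes in dequeue order.
    public async Task RunWriteLoopAsync()
    {
        await foreach (var job in _pending.Reader.ReadAllAsync())
        {
            _dynamicTable.Add(job.Name + ": " + job.Value);
            Wire.Add($"HEADERS stream={job.StreamId} index={_dynamicTable.Count - 1}");
        }
    }
}
```

The point of the design is that the shared encoder state has a single owner, so serialization comes from the queue rather than from a lock that every stream contends on.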
…d LINQ allocations
- Add ReadStringPrefix to read wire length without Huffman decoding, replace GetStringLength calls in HPackDecoder to eliminate double/triple tree walks
- Remove redundant GetDecodedLength from ReadString, use upper-bound sizing
- Replace Encoding.ASCII with direct byte<->char widening loops
- Replace LINQ (Sum, Where, ToDictionary, Select) in Http11Parser.InternalWrite with foreach loops and inline cookie joining
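A byte<->char widening loop of the kind mentioned above might look like the following sketch. `AsciiWidenSketch` and its method names are hypothetical. Note one behavioral difference from `Encoding.ASCII`: the encoder replaces bytes above 0x7F with `?`, whereas a plain cast maps them to the corresponding Latin-1 code point, so this is only equivalent when the input is known to be ASCII.

```csharp
using System;

// Hypothetical helper; assumes callers only pass 7-bit ASCII data.
public static class AsciiWidenSketch
{
    // Widen each byte to a char without going through an Encoding instance.
    public static void Widen(ReadOnlySpan<byte> source, Span<char> destination)
    {
        if (destination.Length < source.Length)
            throw new ArgumentException("Destination too small.", nameof(destination));

        for (int i = 0; i < source.Length; i++)
            destination[i] = (char)source[i];
    }

    // Narrow each char to a byte; chars above 0x7F would be truncated,
    // so the caller must guarantee ASCII input.
    public static void Narrow(ReadOnlySpan<char> source, Span<byte> destination)
    {
        if (destination.Length < source.Length)
            throw new ArgumentException("Destination too small.", nameof(destination));

        for (int i = 0; i < source.Length; i++)
            destination[i] = (byte)source[i];
    }
}
```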
…oncurrentDictionary
- Hoist pool-reuse lookup before the per-authority semaphore so the common case (pool already exists) skips Synchronizer overhead entirely
- Move Init() before storing pools in the dictionary, closing a race window where an uninitialised pool was briefly visible to other threads
- Replace Dictionary+lock with ConcurrentDictionary for lock-free reads
- Skip redundant lock(_locks) in Synchronizer when preserve=true
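The first three points can be sketched together. This is an illustration under assumptions, not the project's code: `PoolSketch`, `PoolRegistrySketch`, and the per-authority `SemaphoreSlim` gate are hypothetical stand-ins for the real pool types and Synchronizer.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical pool type with an explicit initialisation step.
public sealed class PoolSketch
{
    public bool Initialized { get; private set; }
    public void Init() => Initialized = true;
}

public sealed class PoolRegistrySketch
{
    private readonly ConcurrentDictionary<string, PoolSketch> _pools = new();
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _gates = new();

    public async Task<PoolSketch> GetOrCreateAsync(string authority)
    {
        // Fast path: lock-free read, no semaphore when the pool already exists.
        if (_pools.TryGetValue(authority, out var existing))
            return existing;

        // Slow path: serialize creation per authority.
        var gate = _gates.GetOrAdd(authority, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync();
        try
        {
            // Re-check: another caller may have created it while we waited.
            if (_pools.TryGetValue(authority, out existing))
                return existing;

            var pool = new PoolSketch();
            pool.Init();              // initialise first ...
            _pools[authority] = pool; // ... then publish, so readers never see a half-built pool
            return pool;
        }
        finally
        {
            gate.Release();
        }
    }
}
```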
… async in most path
EnforceRules is called 4+ times per exchange and previously iterated every rule, performing scope-matching (including a type check for MultipleScopeAction) per rule per call. Rules are now partitioned into per-scope arrays once at Init/UpdateRules time, so EnforceRules just looks up the scope bucket and iterates only the rules that apply. PartitionedRules is immutable and swapped atomically via volatile + Interlocked.Exchange, preserving the existing hot-reload guarantees.
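A minimal sketch of the partition-once, swap-atomically idea follows. The enum values, `SketchRule`, and the class names are hypothetical; only the overall shape (an immutable per-scope array held in a volatile field and replaced via `Interlocked.Exchange`) is taken from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical scopes and rule shape, for illustration only.
public enum SketchScope { Request, Response }

public sealed record SketchRule(string Name, SketchScope[] Scopes);

// Immutable after construction: one array of applicable rules per scope.
public sealed class PartitionedRulesSketch
{
    private readonly SketchRule[][] _byScope;

    public PartitionedRulesSketch(IEnumerable<SketchRule> rules)
    {
        var all = rules.ToArray();
        var scopes = Enum.GetValues<SketchScope>();
        _byScope = new SketchRule[scopes.Length][];
        foreach (var scope in scopes)
            _byScope[(int)scope] = all.Where(r => r.Scopes.Contains(scope)).ToArray();
    }

    public SketchRule[] For(SketchScope scope) => _byScope[(int)scope];
}

public sealed class RuleEngineSketch
{
    private volatile PartitionedRulesSketch _rules =
        new(Array.Empty<SketchRule>());

    // Hot reload: build the new partition off to the side, then swap it in.
    public void UpdateRules(IEnumerable<SketchRule> rules) =>
        Interlocked.Exchange(ref _rules, new PartitionedRulesSketch(rules));

    // Hot path: one bucket lookup, then only the rules for that scope.
    public int Enforce(SketchScope scope, Action<SketchRule> apply)
    {
        var bucket = _rules.For(scope); // snapshot; a concurrent swap cannot tear it
        foreach (var rule in bucket)
            apply(rule);
        return bucket.Length;
    }
}
```

Because each snapshot is immutable, a caller that started iterating before a reload simply finishes with the old rule set.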
- Drop Encoding.ASCII dispatches for constant literals, use u8 spans
- Sum byte length directly from char length in GetHttp11LengthOnly
- Add int-keyed status line byte map, format status code via Utf8Formatter to remove StatusCode.ToString() allocation per response
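As a rough illustration of the u8 literal and `Utf8Formatter` points, the sketch below writes a status line straight into a byte buffer without creating a string. `StatusLineSketch` is a hypothetical helper, the reason phrase is left empty for brevity, and the int-keyed cache of prebuilt status lines is not shown. It requires C# 11 or later for `u8` literals.

```csharp
using System;
using System.Buffers.Text;

// Hypothetical helper; not the project's status-line writer.
public static class StatusLineSketch
{
    // Writes "HTTP/1.1 <code> \r\n" into destination and returns the byte count.
    public static int WriteStatusLine(int statusCode, Span<byte> destination)
    {
        ReadOnlySpan<byte> prefix = "HTTP/1.1 "u8;  // constant bytes, no Encoding call
        ReadOnlySpan<byte> suffix = " \r\n"u8;      // SP, empty reason phrase, CRLF

        prefix.CopyTo(destination);
        int written = prefix.Length;

        // Formats the integer as UTF-8 digits directly, avoiding ToString().
        if (!Utf8Formatter.TryFormat(statusCode, destination.Slice(written), out int digits))
            throw new ArgumentException("Destination too small.", nameof(destination));
        written += digits;

        suffix.CopyTo(destination.Slice(written));
        return written + suffix.Length;
    }
}
```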
Parallel to the existing --contention flag, add an opt-in FLUXZY_BENCH_ALLOC path that captures CLR GC/AllocationTick events with managed stacks into a .nettrace per benchmark case. Ships with TraceAllocationAnalyzer, a small TraceEvent-based CLI that aggregates the events by type, top frame, and first Fluxzy frame — the allocation view BenchmarkDotNet's speedscope export doesn't provide.
…rker

Both workers allocated a fresh byte[MaxHeaderSize] per stream (16 KB default) to accumulate HEADERS/CONTINUATION fragments before HPACK decode. Allocation sampling showed this was the #1 byte[] allocator across all four throughput benchmark cases (28-48% of bytes, even on the H1 downstream path since upstream is ALPN-negotiated to H2).

Route through ArrayPool<byte>.Shared: rent on first fragment (with grow for oversize responses in StreamWorker), return on Dispose. The decoded headers (H2Helper.DecodeAndAllocate, DecodeTrailerFields) produce fresh char buffers and HeaderField lists with no aliasing back to the pooled bytes, so returning at stream end is safe.

Regression test H2LargeHeaderTests exercises the grow path (~30 KB response headers across 20 sequential requests) and the fast-path rent/return churn (50 sequential requests) to catch double-return or prefix-copy bugs.
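The rent-on-first-fragment, grow, and return-on-Dispose pattern can be sketched as follows. `HeaderAccumulatorSketch` is a hypothetical class written for illustration, and the doubling growth policy is an assumption; the project's workers may size and grow differently.

```csharp
using System;
using System.Buffers;

// Hypothetical accumulator for HEADERS/CONTINUATION fragments.
public sealed class HeaderAccumulatorSketch : IDisposable
{
    private byte[]? _buffer;
    private int _length;

    public ReadOnlySpan<byte> Written =>
        _buffer is null ? ReadOnlySpan<byte>.Empty : _buffer.AsSpan(0, _length);

    public void Append(ReadOnlySpan<byte> fragment, int initialSize = 16 * 1024)
    {
        if (_buffer is null)
        {
            // Rent lazily, on the first fragment only.
            _buffer = ArrayPool<byte>.Shared.Rent(Math.Max(initialSize, fragment.Length));
        }
        else if (_length + fragment.Length > _buffer.Length)
        {
            // Grow path: rent a bigger array, copy the prefix, return the old one.
            var bigger = ArrayPool<byte>.Shared.Rent(
                Math.Max(_buffer.Length * 2, _length + fragment.Length));
            _buffer.AsSpan(0, _length).CopyTo(bigger);
            ArrayPool<byte>.Shared.Return(_buffer);
            _buffer = bigger;
        }

        fragment.CopyTo(_buffer.AsSpan(_length));
        _length += fragment.Length;
    }

    public void Dispose()
    {
        // Clear the field before returning so a second Dispose cannot double-return.
        var buffer = _buffer;
        _buffer = null;
        _length = 0;
        if (buffer is not null)
            ArrayPool<byte>.Shared.Return(buffer);
    }
}
```

Anything that must outlive the stream has to be copied out of `Written` before `Dispose`, since the array may be handed to another renter afterwards.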
…aluate

Rule.Enforce creates one VariableBuildingContext per exchange per rule per scope. The constructor allocated a Dictionary<string, Func<string>>, nine Func<string> closure instances, a <>c__DisplayClass capturing state, and a resized Entry[], roughly 1 KB/call. Allocation sampling showed this ctor as 6-12% of total bytes across all four benchmark cases, with Dictionary.Resize as a separate 3-6% line item.

The only reader (VariableContext.EvaluateVariable) did dict.TryGetValue(name, out var func); return func(); so the delegate indirection was pure waste.

Replace the dictionary with a switch-based TryEvaluate(string, out string) method. Each of the nine built-in variable names returns its value directly from the captured fields. Semantics are a byte-for-byte port of the prior lambda bodies, including the null-exchange fallback to string.Empty and the StatusCode > 0 guard for exchange.status.

Public API break: LazyVariableEvaluations property removed. Custom variables should go through VariableContext.Set, which is the existing extensibility surface.

Tests:
- VariableBuildingContextTests: 17 unit tests covering each built-in, Boolean.ToString() capitalisation for authority.secure, null-exchange fallback for all exchange-scoped names, unknown-name returning false, and end-to-end interpolation via VariableContext.EvaluateVariable.
- Self_Generated_Context_Variables Theory extended with exchange.path (previously uncovered).
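The switch-based lookup might look roughly like the sketch below. It covers only the three variable names mentioned in this description (authority.secure, exchange.path, exchange.status); `ExchangeSketch` and `VariableBuildingContextSketch` are hypothetical types, and returning string.Empty when StatusCode is not positive is an assumption about what the guard falls back to.

```csharp
// Hypothetical exchange shape, for illustration only.
public sealed record ExchangeSketch(string Path, int StatusCode);

public sealed class VariableBuildingContextSketch
{
    private readonly ExchangeSketch? _exchange;
    private readonly bool _secure;

    public VariableBuildingContextSketch(ExchangeSketch? exchange, bool secure)
    {
        _exchange = exchange;
        _secure = secure;
    }

    // No dictionary, no delegates: each name maps straight to a field read.
    public bool TryEvaluate(string name, out string value)
    {
        switch (name)
        {
            case "authority.secure":
                value = _secure.ToString(); // Boolean.ToString() yields "True"/"False"
                return true;

            case "exchange.path":
                value = _exchange?.Path ?? string.Empty; // null-exchange fallback
                return true;

            case "exchange.status":
                // Assumption: a non-positive status code evaluates to an empty string.
                value = _exchange is { StatusCode: > 0 }
                    ? _exchange.StatusCode.ToString()
                    : string.Empty;
                return true;

            default:
                value = string.Empty;
                return false; // unknown built-in name
        }
    }
}
```

The C# compiler lowers a string switch of this size to a length/hash-based dispatch, so the lookup stays cheap while the per-call allocations disappear.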
This was referenced Apr 13, 2026
Summary