prototype(protobuf): template-based per-schema encoder (H3A)#5
Closed
intech wants to merge 1 commit into feat/prototype-estimator-map-oneof from
Conversation
Replaces switch-by-fieldKind dispatch in `toBinaryFast` with a pre-built array of closures per `DescMessage`. Each closure pre-captures its tag bytes (as `Uint8Array`), `localName`, and the scalar-specific writer, eliminating dispatch overhead and tag re-encoding on the hot path. CSP-safe: no `eval`, no `new Function()`, no dynamic source generation. Step arrays are built on first touch of a schema and cached in `WeakMap<DescMessage, Step[]>`, so the walk of `descriptor.fields` / `descriptor.oneofs` runs exactly once per schema for the lifetime of the process.

Measurements (Node 25.8, `benchmarks/`, 1 iteration average):

| Workload / variant                  |    ops/s | Δ vs H2 |
|-------------------------------------|---------:|--------:|
| SimpleMessage toBinary              |    1.16M |       — |
| SimpleMessage toBinaryFast (H3)     |    1.93M |    +66% |
| OTel 100-span toBinary              |      463 |       — |
| OTel 100-span toBinaryFast (H3)     |      472 |    wash |
| OTel 100-span protobufjs (ref)      |    2,413 |    5.1x |

The big win lands on flat-scalar schemas (SimpleMessage), where eliminating the `switch(fieldKind)` / `switch(scalar)` hops dominates. On the OTel shape the bottleneck has moved off dispatch entirely; the profile now points at bigint/varint64 work for ns timestamps, UTF-8 encoding for attribute strings, `Uint8Array.set()` for trace/span IDs, and SizeMap bookkeeping for the deep nesting. Closing the remaining gap vs the pbjs-generated encoder would need specialization of *those* paths (ASCII-known branches inlined per field, a per-entry size cache avoiding Map allocation, or a switch to a pre-scanned descriptor plan), not further dispatch removal.

The H3 cache allocation itself shows up in the memory benchmark: per 1000 iterations, the heap delta for `toBinaryFast` rose from ~19.5 MB (H2) to ~54 MB. Most of that is transient ADT/value objects retained by SizeMap during an encode; the per-schema step arrays are amortized and stable.
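The closure-per-field idea can be reduced to a minimal sketch. The `Desc`, `FieldDesc`, and `Step` shapes below are hypothetical stand-ins, not the PR's actual types, and the "writer" is collapsed to a single push so the structure stays visible:

```typescript
// Minimal sketch of per-schema closure steps cached in a WeakMap.
// Desc/FieldDesc/Step are illustrative shapes, not the PR's real types.
type Step = (msg: Record<string, unknown>, out: number[]) => void;

interface FieldDesc { localName: string; tag: number }
interface Desc { fields: FieldDesc[] }

// Built once per schema, held weakly so dropped schemas can be collected.
const stepCache = new WeakMap<Desc, Step[]>();

function buildSteps(desc: Desc): Step[] {
  return desc.fields.map((f) => {
    // Pre-capture everything that would otherwise be re-derived per encode:
    // the encoded tag and the field's local name.
    const tagByte = f.tag;
    const name = f.localName;
    return (msg, out) => {
      const v = msg[name];
      if (v === undefined) return; // unset field: emit nothing
      out.push(tagByte, v as number); // scalar-specific writer would go here
    };
  });
}

function encode(desc: Desc, msg: Record<string, unknown>): number[] {
  let steps = stepCache.get(desc);
  if (!steps) {
    steps = buildSteps(desc);
    stepCache.set(desc, steps); // descriptor walk runs once per schema
  }
  const out: number[] = [];
  // The hot loop: no switch on field kind, no tag re-encoding.
  for (const step of steps) step(msg, out);
  return out;
}
```

The cache key is the descriptor object itself, which is why `WeakMap` works: schema identity is object identity, and the step array lives exactly as long as the schema does.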
Follow-ups: (1) replace `Object.keys()` in map steps with an iterator-free loop; (2) shrink SizeMap footprint by keying body sizes on a per-call integer handle instead of the object identity.

Existing 2,839 tests pass. Byte-identical output maintained on both the OTel 100-span fixture (32,926 bytes) and SimpleMessage (19 bytes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
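Both follow-ups can be sketched briefly. `countMapEntries` and `SizeTable` are hypothetical names for illustration, not code from this PR:

```typescript
// Follow-up (1): walk a map field's own keys without allocating the
// intermediate string[] that Object.keys() returns on every encode.
function countMapEntries(mapValue: Record<string, unknown>): number {
  let n = 0;
  for (const key in mapValue) {
    // Guard against inherited keys; plain message maps normally have none.
    if (Object.prototype.hasOwnProperty.call(mapValue, key)) n++;
  }
  return n;
}

// Follow-up (2): key body sizes on a per-call integer handle instead of
// object identity, so the size pass fills a plain array and the write
// pass reads it back by index (no Map allocation, no per-object hashing).
class SizeTable {
  private sizes: number[] = [];
  /** Size pass: record one nested message's body size, return its handle. */
  reserve(size: number): number {
    this.sizes.push(size);
    return this.sizes.length - 1;
  }
  /** Write pass: a plain array read replaces the Map lookup. */
  get(handle: number): number {
    return this.sizes[handle];
  }
  /** Reset between encodes so the backing array is reused, not reallocated. */
  reset(): void {
    this.sizes.length = 0;
  }
}
```

The handle scheme relies on the size pass and write pass visiting nested messages in the same order, so handles can simply be assigned sequentially during the size pass.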
Summary
Replaces the `switch(fieldKind)` / `switch(scalar)` dispatch inside `toBinaryFast` with a pre-built array of closures per `DescMessage`. Each closure pre-captures `tagBytes: Uint8Array`, `localName`, and the scalar-specific writer, so the inner encode loop becomes `for (const step of steps) step(c, msg, sizes)` with no branch tables and no tag re-encoding on the hot path. CSP-safe: no `eval`, no `new Function()`, no dynamic source generation.

Step arrays are built on first touch of a schema and cached in `WeakMap<DescMessage, Step[]>`, amortized for the lifetime of the process. Stacked on #4.

Approach
- `buildSizeSteps(desc)` walks `desc.fields` and `desc.oneofs` once and produces `SizeStep[]`; each element is a closure specialized to one field (scalar variant, enum, message, list-of-T, map<K,V>, oneof dispatch table)
- `buildEncodeSteps(desc)` does the same for writes, with pre-encoded `tagBytes: Uint8Array`
- `computeMessageSize(desc, msg, sizes)` and `writeMessageInto(c, desc, msg, sizes)` become 3-line iterators over the cached step array
- `Map<string, Step>` built at compile time; no linear scan at runtime

Results
Node 25.8, `benchmarks/`, run locally:

| Workload / variant                    |    ops/s | Δ vs H2 |
|---------------------------------------|---------:|--------:|
| SimpleMessage toBinary                |    1.16M |       — |
| SimpleMessage toBinaryFast (H3)       |    1.93M |    +66% |
| OTel 100-span toBinary                |      463 |       — |
| OTel 100-span toBinaryFast (H3)       |      472 |    wash |
| OTel 100-span protobufjs (reference)  |    2,413 |    5.1x |

The big win lands on flat-scalar schemas (SimpleMessage), where eliminating the dispatch hops dominates. On the OTel shape the bottleneck has moved off dispatch entirely; the remaining ~5x gap vs the pbjs-generated encoder is now dominated by:
- `protoInt64.enc(...)` for every `startTimeUnixNano` / `endTimeUnixNano` and every int64 attribute (100 spans × 2 timestamps = 200 bigint→(lo,hi) conversions per encode, plus 100 × ~1 int attribute)
- `Uint8Array.set()` for 16-byte trace IDs and 8-byte span IDs
- `SizeMap` bookkeeping (`new Map()` + many `sizes.set()` calls) during the deep nesting

Closing that gap would need specialization of those paths, not more dispatch removal.
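The bigint cost called out above can be illustrated with a minimal sketch. `splitUint64` is a hypothetical helper assuming unsigned 64-bit input; it shows the shape of the work, not protoInt64's actual implementation, which also has to detect and handle other input types:

```typescript
// Sketch of the bigint -> (lo, hi) split that varint64 encoding needs.
// Hypothetical helper, unsigned input assumed; not protoInt64's real code.
function splitUint64(value: bigint): { lo: number; hi: number } {
  // Two BigInt bit ops plus two Number conversions per int64 field; at
  // 200+ conversions per 100-span encode this is visible in the profile.
  const lo = Number(value & 0xffffffffn);
  const hi = Number((value >> 32n) & 0xffffffffn);
  return { lo, hi };
}
```

Specializing the common `bigint` path would remove the input-type detection, but the BigInt-to-Number conversions themselves remain per-field work that only a different timestamp representation could avoid.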
Memory
Per 1000 iterations on the 100-span OTel fixture:
The compared variants were toBinary (baseline), toBinaryFast (H3), and protobufjs (reference); the toBinaryFast heap delta rose from ~19.5 MB (H2) to ~54 MB (H3).

The rise in `toBinaryFast` is transient object retention by `SizeMap` and ADT wrappers during one encode; the per-schema step arrays themselves are stable.

Follow-ups (not in this PR)
- `protoInt64.enc` callsite specialization for the common `bigint` input (avoid the type-detect branch)
- replace `SizeMap` object-identity keying with an integer handle pre-assigned during the size pass

Test plan
- Existing tests pass (`packages/protobuf-testsuite`)
- `ExportTraceRequest` with 100 spans: 32,926 B identical
- `SimpleMessage`: 19 B identical
- `benchmarks/src/verify-correctness.ts` green
- `biome lint` clean
- `tsc --noEmit` clean

Scope
Internal PR within Connectum-Framework fork. Not proposed upstream.
🤖 Generated with Claude Code