
prototype(protobuf): template-based per-schema encoder (H3A)#5

Closed
intech wants to merge 1 commit into feat/prototype-estimator-map-oneof from feat/prototype-per-schema-codegen

Conversation


@intech intech commented Apr 19, 2026

Summary

Replaces the switch(fieldKind) / switch(scalar) dispatch inside toBinaryFast with a pre-built array of closures per DescMessage. Each closure pre-captures tagBytes: Uint8Array, localName, and the scalar-specific writer, so the inner encode loop becomes for (const step of steps) step(c, msg, sizes) with no branch tables and no tag re-encoding on the hot path. CSP-safe — no eval, no new Function(), no dynamic source generation.

Step arrays are built on first touch of a schema and cached in WeakMap<DescMessage, Step[]>, amortized for the lifetime of the process. Stacked on #4.
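A minimal sketch of that cache-and-replay shape, assuming hypothetical names (`Step`, `getEncodeSteps`, and the `Writer`/`SizeMap` aliases are illustrative stand-ins, not the identifiers in the branch):

```ts
import type { DescMessage } from "@bufbuild/protobuf";

type Writer = { buf: number[] };                 // stand-in for the real byte writer
type SizeMap = Map<object, number>;              // body sizes keyed by object identity
type Step = (c: Writer, msg: Record<string, unknown>, sizes: SizeMap) => void;

declare function buildEncodeSteps(desc: DescMessage): Step[]; // one descriptor walk

const stepCache = new WeakMap<DescMessage, Step[]>();

function getEncodeSteps(desc: DescMessage): Step[] {
  let steps = stepCache.get(desc);
  if (steps === undefined) {
    steps = buildEncodeSteps(desc);   // first touch of this schema
    stepCache.set(desc, steps);       // cached for the process lifetime
  }
  return steps;
}

// The entire hot path: no switch on fieldKind, no tag re-encoding.
function writeMessageInto(
  c: Writer,
  desc: DescMessage,
  msg: Record<string, unknown>,
  sizes: SizeMap,
): void {
  for (const step of getEncodeSteps(desc)) step(c, msg, sizes);
}
```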

Approach

  • buildSizeSteps(desc) walks desc.fields and desc.oneofs once, produces SizeStep[] — each element a closure specialized to one field (scalar variant, enum, message, list-of-T, map<K,V>, oneof dispatch table)
  • buildEncodeSteps(desc) does the same for writes, with pre-encoded tagBytes: Uint8Array (see the sketch after this list)
  • computeMessageSize(desc, msg, sizes) and writeMessageInto(c, desc, msg, sizes) become 3-line iterators over the cached step array
  • Oneof dispatch is table-driven via a per-case Map<string, Step> built at compile time; no linear scan at runtime
  • Map field steps pre-compute outer tag bytes, key tag bytes, and value tag bytes; per-entry body size is still recomputed inline (caching it would require a second identity-keyed cache)
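For concreteness, a hypothetical sketch of two step builders, reusing the `Step`/`Writer` stand-ins from the earlier sketch; the helpers (`encodeTag`, `writeBytes`, `writeVarint32`) are illustrative, not the branch's real ones:

```ts
declare function encodeTag(fieldNo: number, wireType: number): Uint8Array;
declare function writeBytes(c: Writer, bytes: Uint8Array): void;
declare function writeVarint32(c: Writer, value: number): void;

const WIRE_VARINT = 0;

// One specialized closure for a singular int32 field: tag bytes are
// encoded once at build time, then copied verbatim on every encode.
function int32Step(fieldNo: number, localName: string): Step {
  const tagBytes = encodeTag(fieldNo, WIRE_VARINT);
  return (c, msg) => {
    const value = msg[localName] as number;
    if (value === 0) return;       // proto3 implicit default: omit
    writeBytes(c, tagBytes);
    writeVarint32(c, value);
  };
}

// Oneof dispatch: a per-case Map built once at step-construction time;
// at runtime a single Map lookup selects the step for the populated case.
function oneofStep(localName: string, cases: Map<string, Step>): Step {
  return (c, msg, sizes) => {
    const sel = msg[localName] as { case: string } | undefined;
    if (sel !== undefined) cases.get(sel.case)?.(c, msg, sizes);
  };
}
```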

Results

Node 25.8, benchmarks/, run locally:

| Workload                              | ops/s | Δ vs H2 |
|---------------------------------------|------:|--------:|
| SimpleMessage toBinary                | 1.16M | —       |
| SimpleMessage toBinaryFast (H3)       | 1.93M | +66%    |
| OTel 100-span toBinary                |   463 | —       |
| OTel 100-span toBinaryFast (H3)       |   472 | wash    |
| OTel 100-span protobufjs (reference)  | 2,413 | 5.1x    |

The big win lands on flat-scalar schemas (SimpleMessage), where eliminating the dispatch hops dominates. On the OTel shape the bottleneck has moved off dispatch entirely — the remaining ~5x gap vs the pbjs-generated encoder is now dominated by:

  1. protoInt64.enc(...) for every startTimeUnixNano / endTimeUnixNano and every int64 attribute (100 spans × 2 timestamps = 200 bigint→(lo,hi) conversions per encode, plus 100 × ~1 int attribute)
  2. UTF-8 encoding for attribute strings (~1000 strings per 100-span payload)
  3. Uint8Array.set() for 16-byte trace IDs and 8-byte span IDs
  4. SizeMap bookkeeping (new Map() + many sizes.set() calls) during the deep nesting

Closing that gap would need specialization of those paths, not more dispatch removal.
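To illustrate item 1: the unavoidable per-value work is splitting a bigint into 32-bit halves. A bigint-only fast path (hypothetical; the library's protoInt64.enc must also handle string and number inputs, hence the type-detection branch mentioned in the follow-ups) would look roughly like:

```ts
// Hypothetical bigint-only split into (lo, hi) 32-bit halves.
// BigInt bitwise ops use two's-complement semantics, so negative
// 64-bit values round-trip correctly through the | 0 coercion.
function encBigInt(value: bigint): { lo: number; hi: number } {
  const lo = Number(value & 0xffffffffn) | 0;          // low 32 bits, as int32
  const hi = Number((value >> 32n) & 0xffffffffn) | 0; // high 32 bits, as int32
  return { lo, hi };
}
```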

Memory

Per 1000 iterations on the 100-span OTel fixture:

| Variant                 | Heap delta (MB) |
|-------------------------|----------------:|
| toBinary (baseline)     |            14.2 |
| toBinaryFast (H3)       |            54.2 |
| protobufjs (reference)  |            47.5 |

The rise in toBinaryFast comes from transient objects retained by SizeMap and ADT wrappers during a single encode; the per-schema step arrays themselves are stable.

Follow-ups (not in this PR)

  • Specialize protoInt64.enc callsite for the common bigint input (avoid the type-detect branch)
  • Replace SizeMap object-identity keying with an integer handle pre-assigned during the size pass (see the sketch after this list)
  • Inline an ASCII-known branch for string fields whose values the schema marks as canonically ASCII
  • Package-level: emit a codegen variant that bakes the step array at build time (pbjs-style), skipping even the first-touch walk
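A rough sketch of the integer-handle idea (hypothetical `SizePlan` name): the size pass appends body sizes in visit order, and the write pass, which traverses messages in the same order, replays them by cursor, so no Map is allocated per encode:

```ts
// Hypothetical replacement for identity-keyed SizeMap bookkeeping.
class SizePlan {
  private sizes: number[] = [];
  private cursor = 0;

  record(size: number): void {     // size pass: push in visit order
    this.sizes.push(size);
  }

  next(): number {                 // write pass: replay in the same order
    return this.sizes[this.cursor++];
  }

  reset(): void {                  // reuse across encodes, no reallocation
    this.sizes.length = 0;
    this.cursor = 0;
  }
}
```

This trades the per-call Map allocation and identity hashing for a deterministic traversal order shared by the two passes.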

Test plan

  • 2,839 existing tests pass (packages/protobuf-test suite)
  • Byte-identical output on both fixtures:
    • ExportTraceRequest 100 spans: 32,926 B identical
    • SimpleMessage: 19 B identical
  • benchmarks/src/verify-correctness.ts green
  • biome lint clean
  • tsc --noEmit clean

Scope

Internal PR within Connectum-Framework fork. Not proposed upstream.

🤖 Generated with Claude Code

Replaces switch-by-fieldKind dispatch in toBinaryFast with a pre-built
array of closures per DescMessage. Each closure pre-captures
(tagBytes as Uint8Array, localName, scalar-specific writer), eliminating
dispatch overhead and tag re-encoding on the hot path. CSP-safe — no
eval, no new Function(), no dynamic source generation.

Step arrays are built on first touch of a schema and cached in
WeakMap<DescMessage, Step[]>, so the walk of descriptor.fields /
descriptor.oneofs runs exactly once per schema for the lifetime of the
process.

Measurements (Node 25.8, benchmarks/, 1 iteration average):

| Workload / variant                  | ops/s    | Δ vs H2 |
|-------------------------------------|---------:|--------:|
| SimpleMessage toBinary              |    1.16M | —       |
| SimpleMessage toBinaryFast (H3)     |    1.93M | +66%    |
| OTel 100-span toBinary              |      463 | —       |
| OTel 100-span toBinaryFast (H3)     |      472 | wash    |
| OTel 100-span protobufjs (ref)      |    2,413 | 5.1x    |

The big win lands on flat-scalar schemas (SimpleMessage), where
eliminating the switch(fieldKind)/switch(scalar) hops dominates. On the
OTel shape the bottleneck has moved off dispatch entirely — the profile
now points at bigint/varint64 work for ns timestamps, UTF-8 encoding for
attribute strings, Uint8Array.set() for trace/span IDs, and SizeMap
bookkeeping for the deep nesting. Closing the remaining gap vs the
pbjs-generated encoder would need specialization of *those* paths
(ASCII-known branches inlined per field, per-entry size cache avoiding
Map allocation, or a switch to a pre-scanned descriptor plan), not
further dispatch removal.

The H3 cache allocation itself shows up in the memory benchmark — per
1000 iterations the heap delta for toBinaryFast rose from ~19.5 MB
(H2) to ~54 MB. Most of that is transient ADT/value objects retained
by SizeMap during an encode; the per-schema step arrays are amortized
and stable. Follow-ups: (1) replace Object.keys() in map steps with an
iterator-free loop, (2) shrink SizeMap footprint by keying body sizes
on a per-call integer handle instead of the object identity.

Existing 2,839 tests pass. Byte-identical output maintained on both
the OTel 100-span fixture (32,926 bytes) and SimpleMessage (19 bytes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@intech intech self-assigned this Apr 19, 2026
@intech intech mentioned this pull request Apr 19, 2026

intech commented Apr 19, 2026

Superseded by L0 (#8) + L1+L2 (#10). The H3 approach hit V8's megamorphic cliff.

@intech intech closed this Apr 19, 2026
@intech intech deleted the feat/prototype-per-schema-codegen branch April 21, 2026 11:11