
prototype(protobuf): two-pass size estimator (+6.28x encode, -54% memory)#3

Closed
intech wants to merge 1 commit into feat/add-benchmark-suite from feat/prototype-size-estimator

Conversation

@intech

@intech intech commented Apr 19, 2026

Summary

Adds an opt-in fast path, toBinaryFast, using two-pass size estimation and a
pre-allocated buffer. Ports opentelemetry-js#6390
(ProtobufLogsSerializer) to the protobuf-es reflective encode.

Hypothesis

The existing reflective toBinary uses BinaryWriter with fork/join for
every length-delimited field — each nested message and every packed
repeated scalar pushes its own chunks: Uint8Array[] and buf: number[]
state, then re-emits a varint length prefix and concatenates. On
OTel-shaped workloads (ResourceSpans -> ScopeSpans -> Span -> KeyValue)
this produces many small allocations and a final double-copy in finish().
A two-pass variant (size estimate -> pre-allocate -> single straight-line
write) should eliminate this.
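The two-pass pattern can be sketched for a single nested field. This is an illustrative toy, not the protobuf-es internals: `sizeOfInner`/`encodeOuter` are hypothetical names, and the real implementation walks the DescMessage reflectively.

```typescript
// Standard protobuf varint helpers.
function varintSize(v: number): number {
  let n = 1;
  while (v >= 0x80) { v >>>= 7; n++; }
  return n;
}

function writeVarint(buf: Uint8Array, pos: number, v: number): number {
  while (v >= 0x80) { buf[pos++] = (v & 0x7f) | 0x80; v >>>= 7; }
  buf[pos++] = v;
  return pos;
}

// Pass 1: exact payload size of the inner message (one uint32 field, tag 1).
function sizeOfInner(value: number): number {
  return 1 /* tag byte */ + varintSize(value);
}

// Pass 2: write tag, length prefix from pass 1, and payload at fixed
// offsets into one pre-allocated buffer — no fork/join, no chunk arrays.
function encodeOuter(value: number): Uint8Array {
  const innerSize = sizeOfInner(value);
  const total = 1 /* outer tag */ + varintSize(innerSize) + innerSize;
  const buf = new Uint8Array(total);
  let pos = 0;
  buf[pos++] = (1 << 3) | 2;              // field 1, wire type 2 (LEN)
  pos = writeVarint(buf, pos, innerSize); // length prefix from pass 1
  buf[pos++] = (1 << 3) | 0;              // inner field 1, varint
  pos = writeVarint(buf, pos, value);
  return buf;
}
```

With fork/join, the inner message would have been serialized into its own chunk list and copied again on join; here both passes touch the single output buffer's size and bytes directly.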

Results

Node 25.8.1, x86_64, tinybench, 100-span OTel-like payload.

Throughput (create + encode combined):

| Variant | ops/s | vs toBinary |
| --- | ---: | --- |
| toBinary | 353 | 1.00x |
| toBinaryFast | 1,758 | +397% (4.98x) |

Throughput (encode-only, pre-built message):

| Variant | ops/s | vs toBinary |
| --- | ---: | --- |
| toBinary | 385 | 1.00x |
| toBinaryFast | 2,417 | +528% (6.28x) |

Cross-library (encode-only, pre-built):

| Library | ops/s |
| --- | ---: |
| protobuf-es toBinary | 428 |
| protobufjs encode | 3,259 |
| protobuf-es toBinaryFast | 3,868 |

Fast path now beats protobufjs by +19% on the encode-only path for
this workload.
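The throughput numbers above come from tinybench; the same ops/s measurement can be sketched without the dependency (`opsPerSec` is a hypothetical helper, and `encodeFn` stands in for toBinary/toBinaryFast bound to a pre-built message):

```typescript
// Dependency-free ops/s measurement in the spirit of the tinybench runs.
function opsPerSec(encodeFn: () => void, durationMs = 200): number {
  // Warm up so JIT compilation settles before timing.
  for (let i = 0; i < 50; i++) encodeFn();
  const start = process.hrtime.bigint();
  let iters = 0;
  while (Number(process.hrtime.bigint() - start) / 1e6 < durationMs) {
    encodeFn();
    iters++;
  }
  const elapsedSec = Number(process.hrtime.bigint() - start) / 1e9;
  return iters / elapsedSec;
}
```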

Memory (1000 iters, forced GC, heapUsed delta):

| Case | B/op | vs toBinary |
| --- | ---: | --- |
| create + toBinary | 10,211 | 1.00x |
| create + toBinaryFast | 4,670 | -54% |
| protobufjs create + encode | 7,450 | |

Fast path uses less heap than protobufjs on the same workload.
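The B/op figures were taken as a heapUsed delta around forced GC; a sketch of that methodology (`bytesPerOp` is a hypothetical helper; run node with `--expose-gc` for stable numbers — without it the delta is noisy but the mechanics are the same):

```typescript
// Heap-delta measurement in the style of the memory benchmark.
function bytesPerOp(fn: () => void, iters = 1000): number {
  const gc = (globalThis as { gc?: () => void }).gc;
  gc?.(); // settle the heap before the baseline reading
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < iters; i++) fn();
  gc?.(); // collect garbage so only retained allocations remain
  const after = process.memoryUsage().heapUsed;
  return (after - before) / iters;
}
```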

Scope (MVP)

Supported:

  • All 15 scalar types (including zigzag and 64-bit integers)
  • Enum (singular and repeated)
  • Repeated scalar (packed and unpacked)
  • Nested messages (arbitrary depth)
  • Repeated messages

Fallback to toBinary when schema uses:

  • Map fields
  • Oneof groups
  • Extensions
  • Delimited (group) encoding

The support decision is cached per DescMessage in a WeakMap, so the
fallback check does not dominate the hot path after the first call per
schema.
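The per-schema caching can be sketched like this (descriptor shape deliberately simplified — the real check inspects the DescMessage from @bufbuild/protobuf, and the actual criteria are those listed above):

```typescript
// Simplified per-schema support cache keyed by descriptor identity.
interface Desc {
  fields: { kind: "scalar" | "enum" | "message" | "map"; oneof?: boolean }[];
}

const supportCache = new WeakMap<Desc, boolean>();

function isFastPathSupported(desc: Desc): boolean {
  const cached = supportCache.get(desc);
  if (cached !== undefined) return cached; // hot path after first call
  // First call per schema: walk the fields once and memoize the verdict.
  const ok = desc.fields.every((f) => f.kind !== "map" && !f.oneof);
  supportCache.set(desc, ok);
  return ok;
}
```

A WeakMap keyed on the descriptor means cached verdicts are released together with the schema object, so the cache never pins descriptors in memory.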

Unknown fields are dropped by the fast path. Callers that must round-trip
unknowns should continue to use toBinary.

Correctness

  • benchmarks/src/verify-correctness.ts exercises the OTel
    ExportTraceRequest fixture and SimpleMessage fixture. Both produce
    byte-identical output compared to toBinary (stricter than the claimed
    semantic-identity guarantee).
  • Existing @bufbuild/protobuf-test suite: 2,823 / 2,823 passing.
  • decode(toBinaryFast(msg)) structurally equal to decode(toBinary(msg))
    on OTel fixtures.
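The byte-identity check reduces to comparing the two encodings directly; a generic helper in that spirit (comparison logic mine, not lifted from verify-correctness.ts):

```typescript
// Byte-for-byte comparison of two encodings, e.g. toBinary(msg) vs
// toBinaryFast(msg) on the same fixture.
function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}
```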

Trade-offs

  • Opt-in API. toBinary is untouched — no behaviour change for
    existing callers.
  • Exact size estimate costs one full descriptor walk. For scalar-only
    flat messages the combined create + toBinaryFast path is only ~1.4x
    faster than baseline (vs ~5x on the nested OTel payload) because the
    estimator walk is a larger share of the budget when there is little to
    serialize.
  • MVP surface. Maps/oneofs fall back transparently, but complex
    schemas with many unsupported fields will not benefit.

Scope of this PR

Internal PR within the Connectum-Framework/protobuf-es fork. Upstream
submission to bufbuild/protobuf-es is gated on further review; the
point of this stacked PR is to measure the pattern and decide whether to
fold it into the default toBinary path or keep it as a separate
export.

Test plan

  • Existing 2,823 tests pass (@bufbuild/protobuf-test)
  • Round-trip decode(toBinaryFast(x)) == decode(toBinary(x)) on OTel
    and SimpleMessage fixtures
  • Benchmarks show the measured improvement
  • Memory benchmark shows the measured memory reduction

Base: feat/add-benchmark-suite (stacked on the benchmark infrastructure PR).

Generated with Claude Code

Adds toBinaryFast() — opt-in fast path using two-pass size estimation
and a pre-allocated buffer for write. Ports the pattern from
open-telemetry/opentelemetry-js#6390 (ProtobufLogsSerializer) to the
protobuf-es reflective encode.

Motivation
----------
The existing toBinary uses BinaryWriter with fork/join per length-
delimited field — every nested message and every packed repeated
scalar pushes chunk/buf state onto a stack, serializes into its own
chunk list, then re-emits a varint length prefix and concatenates.
On OTel-shaped workloads (ResourceSpans -> ScopeSpans -> Span ->
KeyValue) that produces many small Uint8Array/number[] allocations
and a final double-copy in finish(). The two-pass variant walks the
descriptor once to compute the exact encoded size, allocates a
single Uint8Array of that size, then writes bytes into it at fixed
offsets. Length prefixes computed in pass 1 are cached per submessage
object and reused in pass 2, so pass 2 is a straight-line write loop.
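The per-submessage size cache described above can be sketched as follows (shapes hypothetical — `Msg` stands in for a reflected message value, and the real pass 1 walks the descriptor):

```typescript
// Pass 1 computes sizes bottom-up and caches them per submessage object,
// so pass 2 can emit each length prefix without recomputing.
interface Msg { children: Msg[]; payloadBytes: number }

const sizeCache = new WeakMap<Msg, number>();

function varintSize(v: number): number {
  let n = 1;
  while (v >= 0x80) { v >>>= 7; n++; }
  return n;
}

// Size of a message = its own payload plus, for each child message,
// tag byte + varint length prefix + the child's cached size.
function computeSize(m: Msg): number {
  let size = m.payloadBytes;
  for (const child of m.children) {
    const childSize = computeSize(child);
    size += 1 + varintSize(childSize) + childSize;
  }
  sizeCache.set(m, size); // reused verbatim by the pass-2 write loop
  return size;
}
```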

Results (Node 25.8.1, x86_64, tinybench, OTel-like 100-span payload)
--------------------------------------------------------------------

create() + toBinary() combined workload:
  create + toBinary       353 ops/s  baseline
  create + toBinaryFast  1758 ops/s  +397% (4.98x)

toBinary() on pre-built message:
  toBinary                385 ops/s  baseline
  toBinaryFast           2417 ops/s  +528% (6.28x)

Cross-library (vs protobufjs pbjs static-module):
  protobuf-es toBinary pre-built      428 ops/s
  protobuf-es toBinaryFast pre-built 3868 ops/s
  protobufjs encode pre-built        3259 ops/s
  -> toBinaryFast beats protobufjs by +19% on encode path.

Memory (1000 iters, forced GC, heapUsed delta):
  protobuf-es create+toBinary      10,211 B/op
  protobuf-es create+toBinaryFast   4,670 B/op   -54%
  protobufjs  create+encode         7,450 B/op
  -> toBinaryFast now uses less heap than protobufjs.

Scope (MVP)
-----------
Supported: all 15 scalar types, enum, repeated scalar (packed and
unpacked), nested messages, repeated messages. Correctness verified
with semantic round-trip (decode(toBinaryFast) structurally-equal to
decode(toBinary)) on the OTel ExportTraceRequest fixture and on
SimpleMessage; both fixtures in fact produce byte-identical output
in the current code path.

Fallback: schemas using maps, oneofs, extensions, or delimited/group
encoding fall back to toBinary. The decision is cached per
DescMessage in a WeakMap, so the support check does not dominate the
hot path after the first call.

Unknown fields are dropped by the fast path. Callers that must
round-trip unknown fields should continue to use toBinary.

Testing
-------
- Existing protobuf-test suite: 2823/2823 passing.
- Correctness verification: benchmarks/src/verify-correctness.ts
  exercises ExportTraceRequest and SimpleMessage fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@intech
Author

intech commented Apr 19, 2026

Superseded by L0 (#8) + L1+L2 (#10). Kept branch as experimental reference.

@intech intech closed this Apr 19, 2026
@intech intech deleted the feat/prototype-size-estimator branch April 21, 2026 11:11
