
prototype(protobuf): two-pass size estimator (+6.28x encode, -54% memory)#3

Closed
intech wants to merge 1 commit into feat/add-benchmark-suite from feat/prototype-size-estimator

Conversation

@intech

@intech intech commented Apr 19, 2026

Summary

Adds an opt-in fast path, toBinaryFast, using two-pass size estimation and a
pre-allocated buffer. Ports opentelemetry-js#6390
(ProtobufLogsSerializer) to the protobuf-es reflective encode.

Hypothesis

The existing reflective toBinary uses BinaryWriter with fork/join for
every length-delimited field — each nested message and every packed
repeated scalar pushes its own chunks: Uint8Array[] and buf: number[]
state, then re-emits a varint length prefix and concatenates. On
OTel-shaped workloads (ResourceSpans -> ScopeSpans -> Span -> KeyValue)
this produces many small allocations and a final double-copy in finish().
A two-pass variant (size estimate -> pre-allocate -> single straight-line
write) should eliminate this.
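The two-pass pattern can be sketched for a single nested field. This is an illustrative toy, not the protobuf-es internals: `sizeOfInner`/`encodeOuter` are hypothetical names, and the real implementation walks the DescMessage reflectively.

```typescript
// Standard protobuf varint helpers.
function varintSize(v: number): number {
  let n = 1;
  while (v >= 0x80) { v >>>= 7; n++; }
  return n;
}

function writeVarint(buf: Uint8Array, pos: number, v: number): number {
  while (v >= 0x80) { buf[pos++] = (v & 0x7f) | 0x80; v >>>= 7; }
  buf[pos++] = v;
  return pos;
}

// Pass 1: exact payload size of the inner message (one uint32 field, tag 1).
function sizeOfInner(value: number): number {
  return 1 /* tag byte */ + varintSize(value);
}

// Pass 2: write tag, length prefix from pass 1, and payload at fixed
// offsets into one pre-allocated buffer — no fork/join, no chunk arrays.
function encodeOuter(value: number): Uint8Array {
  const innerSize = sizeOfInner(value);
  const total = 1 /* outer tag */ + varintSize(innerSize) + innerSize;
  const buf = new Uint8Array(total);
  let pos = 0;
  buf[pos++] = (1 << 3) | 2;              // field 1, wire type 2 (LEN)
  pos = writeVarint(buf, pos, innerSize); // length prefix from pass 1
  buf[pos++] = (1 << 3) | 0;              // inner field 1, varint
  pos = writeVarint(buf, pos, value);
  return buf;
}
```

With fork/join, the inner message would have been serialized into its own chunk list and copied again on join; here both passes touch the single output buffer's size and bytes directly.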

Results

Node 25.8.1, x86_64, tinybench, 100-span OTel-like payload.

Throughput (create + encode combined):

| Variant | ops/s | vs toBinary |
| --- | ---: | --- |
| toBinary | 353 | 1.00x |
| toBinaryFast | 1,758 | +397% (4.98x) |

Throughput (encode-only, pre-built message):

| Variant | ops/s | vs toBinary |
| --- | ---: | --- |
| toBinary | 385 | 1.00x |
| toBinaryFast | 2,417 | +528% (6.28x) |

Cross-library (encode-only, pre-built):

| Library | ops/s |
| --- | ---: |
| protobuf-es toBinary | 428 |
| protobufjs encode | 3,259 |
| protobuf-es toBinaryFast | 3,868 |

Fast path now beats protobufjs by +19% on the encode-only path for
this workload.
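The throughput numbers above come from tinybench; the same ops/s measurement can be sketched without the dependency (`opsPerSec` is a hypothetical helper, and `encodeFn` stands in for toBinary/toBinaryFast bound to a pre-built message):

```typescript
// Dependency-free ops/s measurement in the spirit of the tinybench runs.
function opsPerSec(encodeFn: () => void, durationMs = 200): number {
  // Warm up so JIT compilation settles before timing.
  for (let i = 0; i < 50; i++) encodeFn();
  const start = process.hrtime.bigint();
  let iters = 0;
  while (Number(process.hrtime.bigint() - start) / 1e6 < durationMs) {
    encodeFn();
    iters++;
  }
  const elapsedSec = Number(process.hrtime.bigint() - start) / 1e9;
  return iters / elapsedSec;
}
```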

Memory (1000 iters, forced GC, heapUsed delta):

| Case | B/op | vs toBinary |
| --- | ---: | --- |
| create + toBinary | 10,211 | 1.00x |
| create + toBinaryFast | 4,670 | -54% |
| protobufjs create + encode | 7,450 | |

Fast path uses less heap than protobufjs on the same workload.
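The B/op figures were taken as a heapUsed delta around forced GC; a sketch of that methodology (`bytesPerOp` is a hypothetical helper; run node with `--expose-gc` for stable numbers — without it the delta is noisy but the mechanics are the same):

```typescript
// Heap-delta measurement in the style of the memory benchmark.
function bytesPerOp(fn: () => void, iters = 1000): number {
  const gc = (globalThis as { gc?: () => void }).gc;
  gc?.(); // settle the heap before the baseline reading
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < iters; i++) fn();
  gc?.(); // collect garbage so only retained allocations remain
  const after = process.memoryUsage().heapUsed;
  return (after - before) / iters;
}
```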

Scope (MVP)

Supported:

  • All 15 scalar types (including zigzag and 64-bit integers)
  • Enum (singular and repeated)
  • Repeated scalar (packed and unpacked)
  • Nested messages (arbitrary depth)
  • Repeated messages

Fallback to toBinary when schema uses:

  • Map fields
  • Oneof groups
  • Extensions
  • Delimited (group) encoding

The support decision is cached per DescMessage in a WeakMap, so the
fallback check does not dominate the hot path after the first call per
schema.
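The per-schema caching can be sketched like this (descriptor shape deliberately simplified — the real check inspects the DescMessage from @bufbuild/protobuf, and the actual criteria are those listed above):

```typescript
// Simplified per-schema support cache keyed by descriptor identity.
interface Desc {
  fields: { kind: "scalar" | "enum" | "message" | "map"; oneof?: boolean }[];
}

const supportCache = new WeakMap<Desc, boolean>();

function isFastPathSupported(desc: Desc): boolean {
  const cached = supportCache.get(desc);
  if (cached !== undefined) return cached; // hot path after first call
  // First call per schema: walk the fields once and memoize the verdict.
  const ok = desc.fields.every((f) => f.kind !== "map" && !f.oneof);
  supportCache.set(desc, ok);
  return ok;
}
```

A WeakMap keyed on the descriptor means cached verdicts are released together with the schema object, so the cache never pins descriptors in memory.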

Unknown fields are dropped by the fast path. Callers that must round-trip
unknowns should continue to use toBinary.

Correctness

  • benchmarks/src/verify-correctness.ts exercises the OTel
    ExportTraceRequest fixture and SimpleMessage fixture. Both produce
    byte-identical output compared to toBinary (stricter than the claimed
    semantic-identity guarantee).
  • Existing @bufbuild/protobuf-test suite: 2,823 / 2,823 passing.
  • decode(toBinaryFast(msg)) structurally equal to decode(toBinary(msg))
    on OTel fixtures.
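The byte-identity check reduces to comparing the two encodings directly; a generic helper in that spirit (comparison logic mine, not lifted from verify-correctness.ts):

```typescript
// Byte-for-byte comparison of two encodings, e.g. toBinary(msg) vs
// toBinaryFast(msg) on the same fixture.
function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}
```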

Trade-offs

  • Opt-in API. toBinary is untouched — no behaviour change for
    existing callers.
  • Exact size estimate costs one full descriptor walk. For scalar-only
    flat messages the combined create + toBinaryFast path is only ~1.4x
    faster than baseline (vs ~5x on the nested OTel payload) because the
    estimator walk is a larger share of the budget when there is little to
    serialize.
  • MVP surface. Maps/oneofs fall back transparently, but complex
    schemas with many unsupported fields will not benefit.

Scope of this PR

Internal PR within the Connectum-Framework/protobuf-es fork. Upstream
submission to bufbuild/protobuf-es is gated on further review; the
point of this stacked PR is to measure the pattern and decide whether to
fold it into the default toBinary path or keep it as a separate
export.

Test plan

  • Existing 2,823 tests pass (@bufbuild/protobuf-test)
  • Round-trip decode(toBinaryFast(x)) == decode(toBinary(x)) on OTel
    and SimpleMessage fixtures
  • Benchmarks show the measured improvement
  • Memory benchmark shows the measured memory reduction

Base: feat/add-benchmark-suite (stacked on the benchmark infrastructure PR).

Generated with Claude Code

Adds toBinaryFast() — opt-in fast path using two-pass size estimation
and a pre-allocated buffer for write. Ports the pattern from
open-telemetry/opentelemetry-js#6390 (ProtobufLogsSerializer) to the
protobuf-es reflective encode.

Motivation
----------
The existing toBinary uses BinaryWriter with fork/join per length-
delimited field — every nested message and every packed repeated
scalar pushes chunk/buf state onto a stack, serializes into its own
chunk list, then re-emits a varint length prefix and concatenates.
On OTel-shaped workloads (ResourceSpans -> ScopeSpans -> Span ->
KeyValue) that produces many small Uint8Array/number[] allocations
and a final double-copy in finish(). The two-pass variant walks the
descriptor once to compute the exact encoded size, allocates a
single Uint8Array of that size, then writes bytes into it at fixed
offsets. Length prefixes computed in pass 1 are cached per submessage
object and reused in pass 2, so pass 2 is a straight-line write loop.
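The per-submessage size cache described above can be sketched as follows (shapes hypothetical — `Msg` stands in for a reflected message value, and the real pass 1 walks the descriptor):

```typescript
// Pass 1 computes sizes bottom-up and caches them per submessage object,
// so pass 2 can emit each length prefix without recomputing.
interface Msg { children: Msg[]; payloadBytes: number }

const sizeCache = new WeakMap<Msg, number>();

function varintSize(v: number): number {
  let n = 1;
  while (v >= 0x80) { v >>>= 7; n++; }
  return n;
}

// Size of a message = its own payload plus, for each child message,
// tag byte + varint length prefix + the child's cached size.
function computeSize(m: Msg): number {
  let size = m.payloadBytes;
  for (const child of m.children) {
    const childSize = computeSize(child);
    size += 1 + varintSize(childSize) + childSize;
  }
  sizeCache.set(m, size); // reused verbatim by the pass-2 write loop
  return size;
}
```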

Results (Node 25.8.1, x86_64, tinybench, OTel-like 100-span payload)
--------------------------------------------------------------------

create() + toBinary() combined workload:
  create + toBinary       353 ops/s  baseline
  create + toBinaryFast  1758 ops/s  +397% (4.98x)

toBinary() on pre-built message:
  toBinary                385 ops/s  baseline
  toBinaryFast           2417 ops/s  +528% (6.28x)

Cross-library (vs protobufjs pbjs static-module):
  protobuf-es toBinary pre-built      428 ops/s
  protobuf-es toBinaryFast pre-built 3868 ops/s
  protobufjs encode pre-built        3259 ops/s
  -> toBinaryFast beats protobufjs by +19% on encode path.

Memory (1000 iters, forced GC, heapUsed delta):
  protobuf-es create+toBinary      10,211 B/op
  protobuf-es create+toBinaryFast   4,670 B/op   -54%
  protobufjs  create+encode         7,450 B/op
  -> toBinaryFast now uses less heap than protobufjs.

Scope (MVP)
-----------
Supported: all 15 scalar types, enum, repeated scalar (packed and
unpacked), nested messages, repeated messages. Correctness verified
with semantic round-trip (decode(toBinaryFast) structurally-equal to
decode(toBinary)) on the OTel ExportTraceRequest fixture and on
SimpleMessage; both fixtures in fact produce byte-identical output
in the current code path.

Fallback: schemas using maps, oneofs, extensions, or delimited/group
encoding fall back to toBinary. The decision is cached per
DescMessage in a WeakMap, so the support check does not dominate the
hot path after the first call.

Unknown fields are dropped by the fast path. Callers that must
round-trip unknown fields should continue to use toBinary.

Testing
-------
- Existing protobuf-test suite: 2823/2823 passing.
- Correctness verification: benchmarks/src/verify-correctness.ts
  exercises ExportTraceRequest and SimpleMessage fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@intech
Author

intech commented Apr 19, 2026

Superseded by L0 (#8) + L1+L2 (#10). Kept branch as experimental reference.

@intech intech closed this Apr 19, 2026
@intech intech deleted the feat/prototype-size-estimator branch April 21, 2026 11:11
