Add correctness test harness for encode-path optimizations#6

Merged
intech merged 8 commits into main from feat/benchmark-matrix
Apr 19, 2026
Conversation


intech commented on Apr 19, 2026

Summary

Adds a comprehensive correctness test harness that guarantees wire-format compatibility across any future encode-path optimization (L0 contiguous writer, schema plans, specialized field writers).

What's included

  • correctness-matrix.test.ts — matrix test across encoders × fixtures (10 fixtures: SimpleMessage, Packed/Unpacked repeated, Nested chains, Map, Oneof variants, AnyValue, Struct)
  • round-trip-property.test.ts — property-based round-trip (100 random scalars, 50 packed, 50 nested, oneof variants, maps, 8 edge cases; seeded mulberry32 for determinism)
  • byte-identity.test.ts — strict byte-identical assertions (proto3 defaults, packed/unpacked, oneof, UTF-8 multi-byte, varint boundaries, determinism)
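The property tests above are seeded with mulberry32 so a failing case can be replayed exactly. A minimal sketch of that PRNG and a fixture helper (`randInt` is an illustrative name, not necessarily the helper used in the harness):

```typescript
// mulberry32: a tiny 32-bit seeded PRNG; the same seed always yields the
// same sequence, so randomized fixtures are reproducible across runs.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Deterministic integer in [min, max] for building random scalar fixtures.
function randInt(rng: () => number, min: number, max: number): number {
  return min + Math.floor(rng() * (max - min + 1));
}
```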

Encoder registry

The ENCODERS array currently contains only toBinary (the main-branch baseline) and is documented as the extension point: new encoders (toBinaryFast from the prototype stack, future L0/L1 variants) are added in one line, and the matrix automatically expands to N×N encoder pairs.
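A hypothetical sketch of that registry-driven expansion; the `Encoder` shape and the stub `encode` body are assumptions for illustration, while the `ENCODERS`/`toBinary` names follow the PR text:

```typescript
// One entry per encoder under test; the matrix cross-checks every pair.
type Encoder = {
  name: string;
  encode: (msg: unknown) => Uint8Array;
};

// Baseline registry. Adding toBinaryFast or an L0/L1 variant is a single
// extra element here; every fixture then runs against all N×N pairs.
const ENCODERS: Encoder[] = [
  { name: "toBinary", encode: () => new Uint8Array() }, // stub for the sketch
];

// Expand the registry into ordered encoder pairs for byte-equality checks.
function matrixPairs(encoders: Encoder[]): [Encoder, Encoder][] {
  const pairs: [Encoder, Encoder][] = [];
  for (const a of encoders) {
    for (const b of encoders) pairs.push([a, b]);
  }
  return pairs;
}
```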

Test results

  • 54 new tests across 26 suites
  • Full protobuf-test suite: 2877 pass / 0 fail (was 2823; additive only)
  • Runtime: 551ms isolated, 21.1s full suite
  • Typecheck + biome lint clean

Scope

Internal PR within the Connectum-Framework fork. Upstream submission to bufbuild/protobuf-es is gated on user approval.

🤖 Generated with Claude Code

intech and others added 4 commits April 19, 2026 04:50
Adds a new `benchmarks` workspace with tinybench-based microbenchmarks
for create(), toBinary(), and combined/JSON-intermediate paths on small
and OTLP-like nested message schemas.

Motivation: addresses the reproducibility gap discussed in bufbuild#333 and
bufbuild#1035, where performance arguments have relied on ad-hoc user-provided
numbers without a suite living alongside the library. The nested
fixture mirrors the shape of opentelemetry.proto ExportTraceServiceRequest
so the workload that produced open-telemetry/opentelemetry-js#6221
can be exercised in a controlled environment.

Suites:
- bench-create.ts         isolates create(Schema, init) cost
- bench-toBinary.ts       isolates toBinary() on pre-built messages
- bench-create-toBinary.ts combined workload (OTel-like, 100 spans)
- bench-fromJson-path.ts  fromJsonString/fromJson + toBinary paths

Additive only: new `benchmarks/` workspace, new entry in root
`workspaces`, new entries in package-lock.json. No existing files
modified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Adds protobufjs comparison on identical nested.proto fixture using
  pbjs static-module codegen, covering create+encode, encode-only
  (pre-built), and decode paths.
- Adds memory benchmark measuring bytes/op via heapUsed delta after
  forced GC (1000 iterations per case, requires --expose-gc).
- Adds fromBinary() parsing benchmarks symmetric to bench-toBinary.ts.
- Extends README with new result tables, methodology notes for the
  heapUsed approach, and run-to-run variance caveats.

Observations on the OTLP-like 100-span workload:
- protobuf-es is ~5-7x slower than protobufjs on encode (create+encode
  or pre-built), ~4-6x slower on decode.
- protobuf-es allocates ~3x more heap per op on encode; decode-side
  allocations are within jitter between the two libraries.
- Run-to-run variance on an unpinned host moves ratios by roughly +/-20%.

The observed gap is consistent with the OTel #6221 regression report's
direction but smaller than the externally-cited 13-30x because this suite
deliberately isolates only the encoder/decoder walks (pbjs static mode
vs protobuf-es reflective path), without app-level traversal, JSON
conversion, or BigInt handling overhead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds toBinaryFast() — opt-in fast path using two-pass size estimation
and a pre-allocated buffer for write. Ports the pattern from
open-telemetry/opentelemetry-js#6390 (ProtobufLogsSerializer) to the
protobuf-es reflective encode.

Motivation
----------
The existing toBinary uses BinaryWriter with fork/join per length-
delimited field — every nested message and every packed repeated
scalar pushes chunk/buf state onto a stack, serializes into its own
chunk list, then re-emits a varint length prefix and concatenates.
On OTel-shaped workloads (ResourceSpans -> ScopeSpans -> Span ->
KeyValue) that produces many small Uint8Array/number[] allocations
and a final double-copy in finish(). The two-pass variant walks the
descriptor once to compute the exact encoded size, allocates a
single Uint8Array of that size, then writes bytes into it at fixed
offsets. Length prefixes computed in pass 1 are cached per submessage
object and reused in pass 2, so pass 2 is a straight-line write loop.
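The two-pass pattern described above, reduced to a single packed-varint field as an illustrative sketch (the real implementation walks the full descriptor and caches submessage sizes; these helper names are not from the PR):

```typescript
// Pass-1 helper: exact encoded size of one unsigned varint.
function varintSize(v: number): number {
  let n = 1;
  while (v > 0x7f) { v >>>= 7; n++; }
  return n;
}

// Pass-2 helper: write a varint at a fixed offset, return the new offset.
function writeVarint(buf: Uint8Array, pos: number, v: number): number {
  while (v > 0x7f) { buf[pos++] = (v & 0x7f) | 0x80; v >>>= 7; }
  buf[pos++] = v;
  return pos;
}

// Encode field #1 as a packed varint list: size first, then one straight
// write loop into a single exactly-sized allocation (no fork/join chunks).
function encodePacked(values: number[]): Uint8Array {
  const payload = values.reduce((s, v) => s + varintSize(v), 0); // pass 1
  const buf = new Uint8Array(1 + varintSize(payload) + payload);
  let pos = writeVarint(buf, 0, 0x0a);          // tag: field 1, wire type 2
  pos = writeVarint(buf, pos, payload);          // length prefix from pass 1
  for (const v of values) pos = writeVarint(buf, pos, v); // pass 2
  return buf;
}
```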

Results (Node 25.8.1, x86_64, tinybench, OTel-like 100-span payload)
--------------------------------------------------------------------

create() + toBinary() combined workload:
  create + toBinary       353 ops/s  baseline
  create + toBinaryFast  1758 ops/s  +397% (4.98x)

toBinary() on pre-built message:
  toBinary                385 ops/s  baseline
  toBinaryFast           2417 ops/s  +528% (6.28x)

Cross-library (vs protobufjs pbjs static-module):
  protobuf-es toBinary pre-built      428 ops/s
  protobuf-es toBinaryFast pre-built 3868 ops/s
  protobufjs encode pre-built        3259 ops/s
  -> toBinaryFast beats protobufjs by +19% on encode path.

Memory (1000 iters, forced GC, heapUsed delta):
  protobuf-es create+toBinary      10,211 B/op
  protobuf-es create+toBinaryFast   4,670 B/op   -54%
  protobufjs  create+encode         7,450 B/op
  -> toBinaryFast now uses less heap than protobufjs.

Scope (MVP)
-----------
Supported: all 15 scalar types, enum, repeated scalar (packed and
unpacked), nested messages, repeated messages. Correctness verified
with semantic round-trip (decode(toBinaryFast) structurally-equal to
decode(toBinary)) on the OTel ExportTraceRequest fixture and on
SimpleMessage; both fixtures in fact produce byte-identical output
in the current code path.

Fallback: schemas using maps, oneofs, extensions, or delimited/group
encoding fall back to toBinary. The decision is cached per
DescMessage in a WeakMap, so the support check does not dominate the
hot path after the first call.
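A minimal sketch of that cached support check, assuming a simplified `DescMessage` stand-in (the real field-kind predicate in the PR is richer than this single `group` test):

```typescript
// Simplified stand-in for the runtime message descriptor.
type DescMessage = { fields: { kind: string }[] };

// Decision cache: one entry per descriptor, garbage-collected with it.
const fastPathSupport = new WeakMap<DescMessage, boolean>();

function supportsFastPath(desc: DescMessage): boolean {
  const cached = fastPathSupport.get(desc);
  if (cached !== undefined) return cached; // hot path: one WeakMap read
  // Cold path: walk the descriptor once. Unsupported shapes (here,
  // delimited/group encoding) force a fallback to the reflective toBinary.
  const ok = desc.fields.every((f) => f.kind !== "group");
  fastPathSupport.set(desc, ok);
  return ok;
}
```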

Unknown fields are dropped by the fast path. Callers that must
round-trip unknown fields should continue to use toBinary.

Testing
-------
- Existing protobuf-test suite: 2823/2823 passing.
- Correctness verification: benchmarks/src/verify-correctness.ts
  exercises ExportTraceRequest and SimpleMessage fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes map fields and oneof groups from the fast-path fallback blacklist
and encodes both directly using the same two-pass size-estimate-then-write
pattern as the rest of the fast path. The only remaining fallback to the
reflective toBinary is proto2 delimited (group) encoding.

Map fields iterate Object.keys on the runtime plain-object representation
and parse integer/bool string keys back to their typed value before
running the scalar-size / scalar-write helpers. Entry bodies are not
cached: recomputing the key+value size per entry is cheap, since only
the submessage value branch already has a cached size in the sizes map.

Oneof groups are dispatched via desc.oneofs after the regular-field loop.
The ADT shape (message[oneof.localName] = { case, value }) is read
directly; fields with `field.oneof !== undefined` are skipped in the
regular loop so they can't be encoded twice. Crucially, zero-valued
oneof cases are always emitted because presence is carried by the
discriminator, not by the value (new test covers this).
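The ADT dispatch described above, as a hedged sketch (member names are illustrative; the `{ case, value }` shape follows the commit text):

```typescript
// The runtime's oneof representation: a discriminated union where an
// unset oneof has case === undefined.
type OneofValue =
  | { case: "intValue"; value: number }
  | { case: "stringValue"; value: string }
  | { case: undefined; value?: undefined };

// Returns the member to encode, or null when the oneof is unset.
// Presence is carried by the discriminator, so a set member is always
// emitted, even when its value is the proto3 zero value (0, "").
function selectOneofMember(
  o: OneofValue,
): { field: string; value: number | string } | null {
  if (o.case === undefined) return null; // unset: emit nothing
  return { field: o.case, value: o.value };
}
```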

Benchmark fixture updated to the full OTel shape so the measurements
reflect real workload:
  - KeyValue.value is now an AnyValue oneof (string / bool / int / bytes /
    double), matching opentelemetry.proto.common.v1.AnyValue
  - Resource.labels is a map<string,string>, exercising the new map path
  - fixture AnyValue distribution: mostly string, some int, some bool,
    matching what a real OTLP exporter batches

Measurements (Node 25.8, OTel 100-span full-shape fixture):

| Variant                        | ops/s | bytes/op |
|--------------------------------|-------|----------|
| create+toBinary (reflective)   |  436  |  21,465  |
| create+toBinaryFast            |  455  |  19,501  |
| protobufjs create+encode       | 2,570 |  47,457  |

| Pre-built encode              | ops/s |
|-------------------------------|-------|
| toBinary                      |  488  |
| toBinaryFast                  |  494  |
| protobufjs                    | 2,689 |

Correctness: byte-identical output verified against toBinary on the full
OTel fixture (32,926 bytes). 2,823 existing tests pass plus 16 new
tests covering every legal map K/V combination and every oneof-member
kind (scalar, message, enum) including the zero-valued case.

The throughput gap vs protobufjs on this shape (~5x) is larger than on
the simpler pre-H2 fixture. The richer shape exposes per-entry map
overhead and oneof dispatch that protobufjs amortizes in codegen. Next
hypothesis: codegen an encoder per schema so the field walk disappears
from the hot path. Tracked separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intech self-assigned this on Apr 19, 2026
intech changed the title from "feat(test): correctness test harness for encode-path optimizations" to "Add correctness test harness for encode-path optimizations" on Apr 19, 2026
intech and others added 3 commits April 20, 2026 01:34
The package compiles with target ES2017; BigInt literal syntax (`0n`,
`-9007199254740993n`) requires ES2020 and triggers TS2737. Materialize
the bigint zero once at module load with a /*@__PURE__*/ annotation so
tree-shakers can drop it, and construct the 64-bit test literal via
`BigInt("...")` string parse.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…logs, stress fixtures

Prototype series measured only the OTLP traces fixture plus a 3-field
SimpleMessage. Phase 2 of the investigation needs performance readings
across the payload classes protobuf-es actually sees in production, so
that a regression (or improvement) can be attributed to a class of
workload rather than lumped into a single OTLP traces number.

New .proto fixtures under benchmarks/proto/:

- otel-metrics.proto — OTLP metrics export: Gauge/Sum/Histogram oneof,
  NumberDataPoint, HistogramDataPoint with explicit bucket bounds.
- otel-logs.proto — OTLP logs export: LogRecord batch with severity,
  string body, trace/span correlation IDs.
- k8s-pod.proto — Kubernetes Pod list subset: ObjectMeta with
  labels/annotations maps, containers with env/ports/resources,
  container statuses. Map-dominant config payload.
- graphql.proto — GraphQL request/response envelope: long query string,
  map<string,bytes> variables, JSON-in-bytes data + structured errors
  with paths.
- rpc-simple.proto — baseline RPC envelope: routing fields, small
  headers map, opaque bytes payload. Lower bound on per-call overhead.
- stress.proto — synthetic payload: depth-8 self-nesting + 200-wide
  scalar/string/message arrays + 4 KB blob + every proto3 scalar type
  exactly once. Surfaces type-specific regressions that OTLP-only
  fixtures don't hit.

New runner in benchmarks/src/bench-matrix.ts:

- Runs toBinary + fromBinary across all 10 fixtures (including the
  existing SimpleMessage and ExportTraceRequest).
- Emits the standard tinybench table plus a machine-readable JSON
  summary on stdout so CI can diff runs without scraping ANSI output.
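A hypothetical shape for that machine-readable summary; the actual field names are not specified in this PR, so these are assumptions for illustration:

```typescript
// One row per fixture × operation from the matrix run.
interface BenchResult {
  fixture: string;
  op: "toBinary" | "fromBinary";
  opsPerSec: number;
}

// Emit a single compact JSON document so CI can diff two runs with a
// plain JSON comparison instead of scraping ANSI-colored tables.
function emitSummary(results: BenchResult[]): string {
  return JSON.stringify({ results });
}
```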

Extends benchmarks/src/fixtures.ts with realistic builders for each
shape. Scale knobs (METRICS_SERIES_COUNT, LOGS_RECORD_COUNT,
K8S_POD_COUNT, STRESS_DEPTH, STRESS_ARRAY_WIDTH) are exported so a
follow-up pass can parameterize across multiple sizes without editing
the fixture code.

No changes to existing bench-*.ts files or to the protobuf-es runtime.
The matrix runner is opt-in via `npm run bench:matrix` and the existing
`npm run bench` aggregate is unchanged.

Smoke run confirms every fixture encodes and decodes successfully on
Node v25.8.1 / linux/x64. Encoded sizes span 19 B (SimpleMessage) to
35 KB (OTLP traces).
Adds report.ts entry + helpers for generating comparative benchmark
reports. Inspired by packages/bundle-size/report.ts pattern.

- benchmarks/src/report.ts - runs matrix, generates outputs
- benchmarks/src/report-helpers.ts - SVG chart + markdown table + README injector
- benchmarks/chart.svg - generated comparative chart
- benchmarks/README.md - injected table section with markers

Run via: npm run bench:report

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intech force-pushed the feat/benchmark-matrix branch from 0852010 to 63b86ef on April 19, 2026
- Apply biome format fixes to benchmarks and protobuf files
- Prepend `mkdir -p src/gen-protobufjs` to generate:protobufjs so the
  output directory exists on fresh CI checkouts (pbjs does not create
  parent directories and failed with ENOENT)

Unblocks format, license-header, lint, and attw jobs which all depend
on @bufbuild/protobuf-benchmarks#generate succeeding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intech merged commit 6848d22 into main on Apr 19, 2026
27 checks passed
intech added a commit that referenced this pull request Apr 19, 2026
Regenerated after merge of #6 (benchmark matrix), #8 (L0 contiguous
writer), #10 (L1+L2 schema plans + specialized writers), #11
(correctness tests).

Key results (Node 25.8, log-scale chart):
- OTel 100 spans:    525 -> 2,501 ops/s (+376%), 0.80x pbjs (3,110)
- OTel Metrics 50:   891 -> 4,773 ops/s (+435%)
- OTel Logs 100:     880 -> 3,772 ops/s (+329%)
- K8sPodList 20:     712 -> 3,510 ops/s (+393%)
- Stress d=8 w=200:  2,568 -> 14,378 ops/s (+460%)
- SimpleMessage:     1.39M -> 1.81M ops/s (+30%)

Memory allocations per encode reduced proportionally via L0 contiguous
buffer + L1 schema-plan opcode interpreter + L2 specialized field writers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intech deleted the feat/benchmark-matrix branch on April 21, 2026