Add correctness test harness for encode-path optimizations #6
Merged
Conversation
Adds a new `benchmarks` workspace with tinybench-based microbenchmarks for create(), toBinary(), and combined/JSON-intermediate paths on small and OTLP-like nested message schemas.

Motivation: addresses the reproducibility gap discussed in bufbuild#333 and bufbuild#1035, where performance arguments have relied on ad-hoc user-provided numbers without a suite living alongside the library. The nested fixture mirrors the shape of opentelemetry.proto ExportTraceServiceRequest so the workload that produced open-telemetry/opentelemetry-js#6221 can be exercised in a controlled environment.

Suites:
- bench-create.ts isolates create(Schema, init) cost
- bench-toBinary.ts isolates toBinary() on pre-built messages
- bench-create-toBinary.ts combined workload (OTel-like, 100 spans)
- bench-fromJson-path.ts fromJsonString/fromJson + toBinary paths

Additive only: new `benchmarks/` workspace, new entry in root `workspaces`, new entries in package-lock.json. No existing files modified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Adds protobufjs comparison on an identical nested.proto fixture using pbjs static-module codegen, covering create+encode, encode-only (pre-built), and decode paths.
- Adds memory benchmark measuring bytes/op via heapUsed delta after forced GC (1000 iterations per case, requires --expose-gc).
- Adds fromBinary() parsing benchmarks symmetric to bench-toBinary.ts.
- Extends README with new result tables, methodology notes for the heapUsed approach, and run-to-run variance caveats.

Observations on the OTLP-like 100-span workload:
- protobuf-es is ~5-7x slower than protobufjs on encode (create+encode or pre-built), ~4-6x slower on decode.
- protobuf-es allocates ~3x more heap per op on encode; decode-side allocations are within jitter between the two libraries.
- Run-to-run variance on an unpinned host moves ratios by roughly +/-20%.

The observed gap matches the direction of the OTel #6221 regression report but is smaller than the externally cited 13-30x, because this suite deliberately isolates only the encoder/decoder walks (pbjs static mode vs the protobuf-es reflective path), without app-level traversal, JSON conversion, or BigInt handling overhead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
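The heapUsed-delta methodology above can be sketched as follows. This is a hypothetical helper, not the actual benchmark code; it samples `process.memoryUsage().heapUsed` around an iteration loop, with a forced GC first when Node runs with `--expose-gc`:

```typescript
// Sketch of the bytes/op measurement: force a GC to settle the heap,
// run the workload N times, and average the heapUsed delta.
const gc: (() => void) | undefined = (globalThis as any).gc;

function bytesPerOp(fn: () => void, iterations = 1000): number {
  gc?.(); // only available when node is started with --expose-gc
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < iterations; i++) {
    fn();
  }
  const after = process.memoryUsage().heapUsed;
  // If GC fires mid-loop this under-counts allocations, which is
  // one source of the run-to-run variance noted above.
  return (after - before) / iterations;
}
```

Without `--expose-gc` the pre-sample GC is skipped and the numbers get noticeably noisier, hence the hard requirement in the benchmark suite.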
Adds toBinaryFast() — an opt-in fast path using two-pass size estimation and a pre-allocated buffer for the write. Ports the pattern from open-telemetry/opentelemetry-js#6390 (ProtobufLogsSerializer) to the protobuf-es reflective encode.

Motivation
----------
The existing toBinary uses BinaryWriter with fork/join per length-delimited field — every nested message and every packed repeated scalar pushes chunk/buf state onto a stack, serializes into its own chunk list, then re-emits a varint length prefix and concatenates. On OTel-shaped workloads (ResourceSpans -> ScopeSpans -> Span -> KeyValue) that produces many small Uint8Array/number[] allocations and a final double-copy in finish().

The two-pass variant walks the descriptor once to compute the exact encoded size, allocates a single Uint8Array of that size, then writes bytes into it at fixed offsets. Length prefixes computed in pass 1 are cached per submessage object and reused in pass 2, so pass 2 is a straight-line write loop.

Results (Node 25.8.1, x86_64, tinybench, OTel-like 100-span payload)
--------------------------------------------------------------------
create() + toBinary() combined workload:

    create + toBinary        353 ops/s   baseline
    create + toBinaryFast   1758 ops/s   +397% (4.98x)

toBinary() on pre-built message:

    toBinary       385 ops/s   baseline
    toBinaryFast  2417 ops/s   +528% (6.28x)

Cross-library (vs protobufjs pbjs static-module):

    protobuf-es toBinary pre-built        428 ops/s
    protobuf-es toBinaryFast pre-built   3868 ops/s
    protobufjs encode pre-built          3259 ops/s

-> toBinaryFast beats protobufjs by +19% on the encode path.

Memory (1000 iters, forced GC, heapUsed delta):

    protobuf-es create+toBinary       10,211 B/op
    protobuf-es create+toBinaryFast    4,670 B/op   -54%
    protobufjs create+encode           7,450 B/op

-> toBinaryFast now uses less heap than protobufjs.

Scope (MVP)
-----------
Supported: all 15 scalar types, enum, repeated scalar (packed and unpacked), nested messages, repeated messages.
Correctness verified with semantic round-trip (decode(toBinaryFast) structurally equal to decode(toBinary)) on the OTel ExportTraceRequest fixture and on SimpleMessage; both fixtures in fact produce byte-identical output in the current code path.

Fallback: schemas using maps, oneofs, extensions, or delimited/group encoding fall back to toBinary. The decision is cached per DescMessage in a WeakMap, so the support check does not dominate the hot path after the first call.

Unknown fields are dropped by the fast path. Callers that must round-trip unknown fields should continue to use toBinary.

Testing
-------
- Existing protobuf-test suite: 2823/2823 passing.
- Correctness verification: benchmarks/src/verify-correctness.ts exercises ExportTraceRequest and SimpleMessage fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
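The two-pass pattern described in this commit can be sketched roughly as below. All names here (`Msg`, `measure`, `encode`) are illustrative stand-ins for the real descriptor walk, not the actual toBinaryFast code; the sketch keeps only the essential shape: pass 1 computes exact sizes, pass 2 writes into a single pre-allocated buffer:

```typescript
// Two-pass encode sketch: pass 1 measures, pass 2 writes at fixed
// offsets into one Uint8Array. Only varint fields are shown.
function varintSize(v: number): number {
  let n = 1;
  while (v > 0x7f) { v >>>= 7; n++; }
  return n;
}

interface Msg { fields: Array<{ tag: number; value: number }>; }

function measure(msg: Msg, sizes: Map<object, number>): number {
  let size = 0;
  for (const f of msg.fields) {
    size += varintSize(f.tag << 3) + varintSize(f.value);
  }
  sizes.set(msg, size); // cached so pass 2 never re-measures
  return size;
}

function encode(msg: Msg): Uint8Array {
  const sizes = new Map<object, number>();
  const total = measure(msg, sizes);  // pass 1
  const buf = new Uint8Array(total);  // single allocation, exact size
  let pos = 0;
  const writeVarint = (v: number) => {
    while (v > 0x7f) { buf[pos++] = (v & 0x7f) | 0x80; v >>>= 7; }
    buf[pos++] = v;
  };
  for (const f of msg.fields) {       // pass 2: straight-line writes
    writeVarint(f.tag << 3);          // wire type 0 (varint)
    writeVarint(f.value);
  }
  return buf;
}
```

In the real implementation the `sizes` cache matters most for nested messages, where the length prefix of each submessage computed in pass 1 is reused verbatim in pass 2.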
Removes map fields and oneof groups from the fast-path fallback blacklist
and encodes both directly using the same two-pass size-estimate-then-write
pattern as the rest of the fast path. The only remaining fallback to the
reflective toBinary is proto2 delimited (group) encoding.
Map fields iterate Object.keys on the runtime plain-object representation
and parse integer/bool string keys back to their typed value before
running the scalar-size / scalar-write helpers. Entry bodies are not
cached: recomputing key+value size per entry is cheap, and only
message-typed values already have a cached size in the sizes map, via
the submessage branch.
Oneof groups are dispatched via desc.oneofs after the regular-field loop.
The ADT shape (message[oneof.localName] = { case, value }) is read
directly; fields with `field.oneof !== undefined` are skipped in the
regular loop so they can't be encoded twice. Crucially, zero-valued
oneof cases are always emitted because presence is carried by the
discriminator, not by the value (new test covers this).
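The presence rule for oneof members can be shown with a minimal sketch (the `shouldEmit` helper is hypothetical; the `{ case, value }` ADT shape is the one protobuf-es generates):

```typescript
// Sketch: why zero-valued oneof cases are still emitted. A regular
// proto3 scalar is skipped when it equals its default (0, "", false),
// but a oneof member's presence is carried by the `case` discriminator.
type OneofValue =
  | { case: "intValue"; value: number }
  | { case: "stringValue"; value: string }
  | { case: undefined; value?: undefined };

function shouldEmit(oneof: OneofValue): boolean {
  // Emit whenever any case is selected, even for 0 or "".
  return oneof.case !== undefined;
}
```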
Benchmark fixture updated to the full OTel shape so the measurements
reflect real workload:
- KeyValue.value is now an AnyValue oneof (string / bool / int / bytes /
double), matching opentelemetry.proto.common.v1.AnyValue
- Resource.labels is a map<string,string>, exercising the new map path
- fixture AnyValue distribution: mostly string, some int, some bool,
matching what a real OTLP exporter batches
Measurements (Node 25.8, OTel 100-span full-shape fixture):
| Variant | ops/s | bytes/op |
|--------------------------------|-------|----------|
| create+toBinary (reflective) | 436 | 21,465 |
| create+toBinaryFast | 455 | 19,501 |
| protobufjs create+encode | 2,570 | 47,457 |
| Pre-built encode | ops/s |
|-------------------------------|-------|
| toBinary | 488 |
| toBinaryFast | 494 |
| protobufjs | 2,689 |
Correctness: byte-identical output verified against toBinary on the full
OTel fixture (32,926 bytes). 2,823 existing tests pass plus 16 new
tests covering every legal map K/V combination and every oneof-member
kind (scalar, message, enum) including the zero-valued case.
The throughput gap vs protobufjs on this shape (~5x) is larger than on
the simpler pre-H2 fixture. The richer shape exposes per-entry map
overhead and oneof dispatch that protobufjs amortizes in codegen. Next
hypothesis: codegen an encoder per schema so the field walk disappears
from the hot path. Tracked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The package compiles with target ES2017; BigInt literal syntax (`0n`,
`-9007199254740993n`) requires ES2020 and triggers TS2737. Materialize
the bigint zero once at module load with a /*@__PURE__*/ annotation so
tree-shakers can drop it, and construct the 64-bit test literal via
`BigInt("...")` string parse.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
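The workaround described in this commit amounts to something like the following (constant names are illustrative):

```typescript
// BigInt literal syntax (0n) is ES2020 and triggers TS2737 under
// target ES2017, so materialize constants via the BigInt() function.
// The /*@__PURE__*/ annotation lets tree-shakers drop the constant
// when it is unused.
const BIGINT_ZERO = /*@__PURE__*/ BigInt(0);

// 2^53 + 1 cannot be represented exactly as a JS number, so the
// 64-bit test value is built from a string instead of a literal:
const BEYOND_SAFE = /*@__PURE__*/ BigInt("-9007199254740993");
```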
…logs, stress fixtures

Prototype series measured only the OTLP traces fixture plus a 3-field SimpleMessage. Phase 2 of the investigation needs performance readings across the payload classes protobuf-es actually sees in production, so that a regression (or improvement) can be attributed to a class of workload rather than lumped into a single OTLP traces number.

New .proto fixtures under benchmarks/proto/:
- otel-metrics.proto — OTLP metrics export: Gauge/Sum/Histogram oneof, NumberDataPoint, HistogramDataPoint with explicit bucket bounds.
- otel-logs.proto — OTLP logs export: LogRecord batch with severity, string body, trace/span correlation IDs.
- k8s-pod.proto — Kubernetes Pod list subset: ObjectMeta with labels/annotations maps, containers with env/ports/resources, container statuses. Map-dominant config payload.
- graphql.proto — GraphQL request/response envelope: long query string, map<string,bytes> variables, JSON-in-bytes data + structured errors with paths.
- rpc-simple.proto — baseline RPC envelope: routing fields, small headers map, opaque bytes payload. Lower bound on per-call overhead.
- stress.proto — synthetic payload: depth-8 self-nesting + 200-wide scalar/string/message arrays + 4 KB blob + every proto3 scalar type exactly once. Surfaces type-specific regressions that OTLP-only fixtures don't hit.

New runner in benchmarks/src/bench-matrix.ts:
- Runs toBinary + fromBinary across all 10 fixtures (including the existing SimpleMessage and ExportTraceRequest).
- Emits the standard tinybench table plus a machine-readable JSON summary on stdout so CI can diff runs without scraping ANSI output.

Extends benchmarks/src/fixtures.ts with realistic builders for each shape. Scale knobs (METRICS_SERIES_COUNT, LOGS_RECORD_COUNT, K8S_POD_COUNT, STRESS_DEPTH, STRESS_ARRAY_WIDTH) are exported so a follow-up pass can parameterize across multiple sizes without editing the fixture code.

No changes to existing bench-*.ts files or to the protobuf-es runtime.
The matrix runner is opt-in via `npm run bench:matrix` and the existing `npm run bench` aggregate is unchanged. Smoke run confirms every fixture encodes and decodes successfully on Node v25.8.1 / linux/x64. Encoded sizes span 19 B (SimpleMessage) to 35 KB (OTLP traces).
Adds report.ts entry + helpers for generating comparative benchmark reports. Inspired by the packages/bundle-size/report.ts pattern.

- benchmarks/src/report.ts — runs the matrix, generates outputs
- benchmarks/src/report-helpers.ts — SVG chart + markdown table + README injector
- benchmarks/chart.svg — generated comparative chart
- benchmarks/README.md — injected table section with markers

Run via: npm run bench:report

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed 0852010 to 63b86ef
- Apply biome format fixes to benchmarks and protobuf files
- Prepend `mkdir -p src/gen-protobufjs` to generate:protobufjs so the output directory exists on fresh CI checkouts (pbjs does not create parent directories and failed with ENOENT)

Unblocks format, license-header, lint, and attw jobs, which all depend on @bufbuild/protobuf-benchmarks#generate succeeding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 19, 2026
intech added a commit that referenced this pull request Apr 19, 2026
Regenerated after merge of #6 (benchmark matrix), #8 (L0 contiguous writer), #10 (L1+L2 schema plans + specialized writers), #11 (correctness tests).

Key results (Node 25.8, log-scale chart):
- OTel 100 spans: 525 -> 2,501 ops/s (+376%), 0.80x pbjs (3,110)
- OTel Metrics 50: 891 -> 4,773 ops/s (+435%)
- OTel Logs 100: 880 -> 3,772 ops/s (+329%)
- K8sPodList 20: 712 -> 3,510 ops/s (+393%)
- Stress d=8 w=200: 2,568 -> 14,378 ops/s (+460%)
- SimpleMessage: 1.39M -> 1.81M ops/s (+30%)

Memory allocations per encode reduced proportionally via L0 contiguous buffer + L1 schema-plan opcode interpreter + L2 specialized field writers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
intech added a commit that referenced this pull request Apr 19, 2026
Summary
Adds a comprehensive correctness test harness to guarantee wire-format compatibility across any future encode-path optimization (L0 contiguous writer, schema plans, specialized field writers).
What's included
- `correctness-matrix.test.ts` — matrix test across encoders × fixtures (10 fixtures: SimpleMessage, Packed/Unpacked repeated, Nested chains, Map, Oneof variants, AnyValue, Struct)
- `round-trip-property.test.ts` — property-based round-trip (100 random scalars, 50 packed, 50 nested, oneof variants, maps, 8 edge cases; seeded mulberry32 for determinism)
- `byte-identity.test.ts` — strict byte-identical assertions (proto3 defaults, packed/unpacked, oneof, UTF-8 multi-byte, varint boundaries, determinism)

Encoder registry

The ENCODERS array currently contains only `toBinary` (the main-branch baseline). It is documented as the extension point: new encoders (toBinaryFast from the prototype stack, future L0/L1 variants) are added in one line, and the matrix auto-expands to N×N pairs.

Test results
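For reference, the seeded mulberry32 generator that makes the property-based round-trip runs deterministic is a standard public-domain construction; a typical implementation looks like this (variable names are illustrative):

```typescript
// mulberry32: a small, fast 32-bit PRNG. The same seed always yields
// the same sequence, so a failing random test case can be replayed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    // Map the mixed 32-bit state to [0, 1).
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
```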
Scope
Internal PR within the Connectum-Framework fork. Upstream submission to bufbuild/protobuf-es is gated on user approval.
🤖 Generated with Claude Code