Add upstream protobuf-es as honest baseline in benchmark chart #20

Closed
intech wants to merge 1 commit into main from feat/upstream-baseline-in-benchmarks
Conversation


@intech intech commented Apr 20, 2026

Summary

The benchmark delta chart compared `toBinaryFast` against the fork's
in-tree `toBinary`, but that baseline already ships the L0
contiguous-writer optimisation from #8. That hid the cumulative
improvement over the original protobuf-es and understated the stacked
L0+L1+L2 gain.

This PR installs `@bufbuild/protobuf@2.11.0` under the npm alias
`upstream-protobuf-es` so the unmodified upstream encoder runs alongside
the fork's in-tree copy in the same Node process. Since both fork and
upstream are on v2.11.0 and the descriptor protocol is wire-compatible,
upstream's `toBinary` accepts fork-generated schemas and produces
byte-identical output — no second codegen pass required.
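
The parity claim can be spot-checked in a few lines. This is a sketch only, assuming the npm alias from the test plan; `SimpleMessageSchema` is a hypothetical stand-in for any fork-generated schema:

```ts
// Sketch of the byte-for-byte parity check described above (not the
// actual test wiring). "upstream-protobuf-es" is @bufbuild/protobuf@2.11.0
// installed under an npm alias.
import { create, toBinary } from "@bufbuild/protobuf"; // the fork's in-tree copy
import { toBinary as upstreamToBinary } from "upstream-protobuf-es";
import { SimpleMessageSchema } from "./gen/simple_pb.js"; // hypothetical fixture

const msg = create(SimpleMessageSchema, {});
const forkBytes = toBinary(SimpleMessageSchema, msg);
const upstreamBytes = upstreamToBinary(SimpleMessageSchema, msg);

// Upstream accepts the fork-generated schema, so one codegen pass drives both.
if (Buffer.compare(Buffer.from(forkBytes), Buffer.from(upstreamBytes)) !== 0) {
  throw new Error("fork and upstream toBinary diverged");
}
```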

Encoder matrix

The report now emits four encoder columns per fixture:

| column | what it measures |
| --- | --- |
| `upstream` | `@bufbuild/protobuf@2.11.0` unmodified — honest original baseline |
| `toBinary` | fork reflective encoder (includes L0 from #8) |
| `toBinaryFast` | fork codegen-generated fast path (L1 + L2) |
| `protobufjs` | cross-library reference |
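
As a sketch of how the four columns share one process (the exact wiring in the bench report differs; the fixture import is a placeholder):

```ts
// Hypothetical tinybench setup: fork and upstream encoders side by side.
import { Bench } from "tinybench";
import { create, toBinary } from "@bufbuild/protobuf";    // fork's in-tree copy
import { toBinary as upstreamToBinary } from "upstream-protobuf-es";
import { SimpleMessageSchema } from "./gen/simple_pb.js"; // hypothetical fixture

const msg = create(SimpleMessageSchema, {});
const bench = new Bench({ time: 600 }); // the 600 ms window used by the report

bench
  .add("upstream", () => void upstreamToBinary(SimpleMessageSchema, msg))
  .add("toBinary", () => void toBinary(SimpleMessageSchema, msg));
// toBinaryFast and the protobufjs encoder register the same way.

await bench.run();
for (const task of bench.tasks) {
  console.log(task.name, Math.round(task.result?.hz ?? 0), "ops/sec");
}
```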

Delta chart

`chart-delta.svg` now gains a third baseline row per fixture:

  1. vs upstream (red-orange) — honest cumulative gain from original protobuf-es
  2. vs toBinary (fork, L0) (grey) — incremental gain on top of L0
  3. vs protobufjs (blue) — cross-library reference

Bar width in the main chart shrinks from 20px to 18px so the four-bar
group stays legible in the same layout.

Sample numbers

Local run on Node v25.8.1, linux/x64 (same 600ms tinybench setting as
before, so variance is similar). ExportTraceRequest (100 spans) row:

| encoder | ops/sec |
| --- | --- |
| upstream | 445 |
| toBinary (fork, L0) | 1,633 |
| toBinaryFast | 2,094 |
| protobufjs | 2,262 |

StressMessage (depth=8, width=200):

| encoder | ops/sec |
| --- | --- |
| upstream | 2,083 |
| toBinary (fork, L0) | 10,017 |
| toBinaryFast | 11,321 |
| protobufjs | 16,655 |

On the 19-byte SimpleMessage the upstream/`toBinary` ordering flips at
the noise floor (800K vs 480K ops/sec) — expected for sub-microsecond
operations in a 600 ms measurement window. Realistic fixtures show the
full L0+L1+L2 stack clearly.

Test plan

  • upstream-protobuf-es alias installed via npm install --save-dev "upstream-protobuf-es@npm:@bufbuild/protobuf@2.11.0"
  • Byte-for-byte parity verified between fork toBinary and upstream toBinary across SimpleMessage, ExportTraceRequest, StressMessage fixtures
  • npx tsc --noEmit in benchmarks/ — clean
  • npm run lint -w @bufbuild/protobuf-benchmarks — clean
  • npm run format -w @bufbuild/protobuf-benchmarks — no fixes applied
  • npm run bench:report -w @bufbuild/protobuf-benchmarks — emits 40 results = 10 fixtures × 4 encoders
  • BENCH_REPORT_READ_ONLY=1 npm run bench:report — re-renders charts from the new JSON shape
  • chart-delta.svg renders three stacked sub-bars per fixture row; fixtures without a pbjs stub still render two rows (vs upstream, vs toBinary) correctly

Draft — not for merge until a full pinned-core run on the CI host replaces the local-desktop numbers currently committed to the chart SVGs.

The delta chart previously compared `toBinaryFast` against the fork's
`toBinary`, but that baseline already ships the L0 contiguous-writer
optimisation from PR #8. The comparison therefore hid the cumulative
improvement relative to the original protobuf-es and understated the
stacked L0+L1+L2 gain.

Install `@bufbuild/protobuf@2.11.0` under the `upstream-protobuf-es`
npm alias so the unmodified upstream encoder runs alongside the fork's
in-tree copy in the same process. Verified that upstream's `toBinary`
accepts fork-generated schemas and emits byte-identical output, so a
single schema drives both bars without a second codegen pass.

The bench report now emits four encoder columns per fixture
(`upstream`, `toBinary`, `toBinaryFast`, `protobufjs`). The delta chart
grows a third baseline row: `vs upstream` (honest cumulative gain),
`vs toBinary` (incremental gain on top of L0), and `vs protobufjs`
(cross-library reference). Bar width in the main chart shrinks from
20px to 18px to keep the four-bar group legible at the same layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Benchmark: 7 regression(s)

Thresholds: throughput regression >5%, memory regression >10%. Runner pinned to CPU 0 via taskset. Current run on linux/x64, Node v22.22.2, captured 2026-04-20T20:34:31.464Z.
Baseline captured 2026-04-20T17:34:45.007Z on linux/x64, Node v22.22.2.

Summary: 7 regressed, 0 improved, 0 new, 13 unchanged.

| Fixture | Baseline ops/s | PR ops/s | Δ ops | Baseline B/op | PR B/op | Δ mem | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SimpleMessage :: toBinary (pre-built, 19 B) | 849,817 | 868,228 | +2.2% | | | | ok |
| ExportTraceRequest (100 spans) :: toBinary (pre-built, 32926 B) | 1,231 | 1,123 | -8.7% | | | | REGRESSION |
| ExportMetricsRequest (50 series) :: toBinary (pre-built, 17696 B) | 2,168 | 1,970 | -9.1% | | | | REGRESSION |
| ExportLogsRequest (100 records) :: toBinary (pre-built, 21319 B) | 2,171 | 1,968 | -9.3% | | | | REGRESSION |
| K8sPodList (20 pods) :: toBinary (pre-built, 28900 B) | 2,342 | 2,162 | -7.7% | | | | REGRESSION |
| GraphQLRequest :: toBinary (pre-built, 624 B) | 176,305 | 173,269 | -1.7% | | | | ok |
| GraphQLResponse :: toBinary (pre-built, 1366 B) | 236,876 | 239,713 | +1.2% | | | | ok |
| RpcRequest :: toBinary (pre-built, 501 B) | 296,046 | 294,321 | -0.6% | | | | ok |
| RpcResponse :: toBinary (pre-built, 602 B) | 434,888 | 437,922 | +0.7% | | | | ok |
| StressMessage (depth=8, width=200) :: toBinary (pre-built, 12868 B) | 7,860 | 7,290 | -7.3% | | | | REGRESSION |
| SimpleMessage :: fromBinary (19 B) | 1,020,891 | 1,038,777 | +1.8% | | | | ok |
| ExportTraceRequest (100 spans) :: fromBinary (32926 B) | 599.8 | 610.5 | +1.8% | | | | ok |
| ExportMetricsRequest (50 series) :: fromBinary (17696 B) | 1,149 | 1,171 | +2.0% | | | | ok |
| ExportLogsRequest (100 records) :: fromBinary (21319 B) | 1,073 | 1,091 | +1.7% | | | | ok |
| K8sPodList (20 pods) :: fromBinary (28900 B) | 1,398 | 1,312 | -6.1% | | | | REGRESSION |
| GraphQLRequest :: fromBinary (624 B) | 300,513 | 287,857 | -4.2% | | | | ok |
| GraphQLResponse :: fromBinary (1366 B) | 265,540 | 269,060 | +1.3% | | | | ok |
| RpcRequest :: fromBinary (501 B) | 269,405 | 256,502 | -4.8% | | | | ok |
| RpcResponse :: fromBinary (602 B) | 378,014 | 369,598 | -2.2% | | | | ok |
| StressMessage (depth=8, width=200) :: fromBinary (12868 B) | 4,046 | 3,752 | -7.3% | | | | REGRESSION |

Produced by benchmarks/scripts/compare-results.ts. Artifacts: bench-results-<pr> (current), bench-baseline-main (baseline).
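
The gate itself reduces to a per-fixture percentage diff against the stated thresholds. A minimal sketch, assuming a simple result shape (the real compare-results.ts may structure this differently):

```ts
// Hypothetical shape of one fixture's measurement in the results JSON.
interface BenchResult {
  fixture: string;      // e.g. "SimpleMessage :: toBinary (pre-built, 19 B)"
  opsPerSec: number;
  bytesPerOp?: number;  // memory column, absent in this run
}

// Thresholds from the comment header: throughput regression >5%, memory >10%.
function status(baseline: BenchResult, pr: BenchResult): "ok" | "REGRESSION" {
  const dOps = (pr.opsPerSec - baseline.opsPerSec) / baseline.opsPerSec;
  const dMem =
    baseline.bytesPerOp !== undefined && pr.bytesPerOp !== undefined
      ? (pr.bytesPerOp - baseline.bytesPerOp) / baseline.bytesPerOp
      : 0;
  return dOps < -0.05 || dMem > 0.1 ? "REGRESSION" : "ok";
}
```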

@intech intech self-assigned this Apr 20, 2026
@intech intech closed this Apr 21, 2026
intech added a commit that referenced this pull request Apr 21, 2026
* Drop toBinaryFast from main, archive L1+L2 to branch, add upstream baseline

Previous state on main: toBinary (L0 contiguous writer, PR #8) +
toBinaryFast (L1+L2 schema plans, opt-in). Merging L1+L2 into toBinary
broke extension conformance (PR #21). Keeping two encoders required
users to change their call sites to see L1+L2 gains — also unacceptable.

This change:

- Removes toBinaryFast export and L1+L2 source files from main
  (packages/protobuf/src/to-binary-fast.ts, its unit test, and all
  benchmark wiring that referenced it).
- Preserves the full L1+L2 implementation on branch
  archive/l1-l2-schema-plans-experimental for future iteration, not
  discarded. Closes PR #21.
- Adds upstream-protobuf-es (npm alias for @bufbuild/protobuf@latest)
  as a third baseline column in benchmarks (re-applying the
  infrastructure from the closed PR #20), so chart.svg and
  chart-delta.svg honestly show "this fork's toBinary vs original
  upstream vs protobufjs" instead of fork-internal self-comparison.
- Regenerates chart.svg and chart-delta.svg with the three-way layout
  and rewrites the README narrative around the new encoders and the
  "Current state" archival note.
- Updates correctness-matrix.test.ts comments to point at the archive
  branch so contributors know where L1+L2 lives.

Local run (Node v25.8.1, unpinned host — CI pinned numbers will differ):
- upstream toBinary (OTel 100 spans):     331 ops/s  (baseline)
- fork    toBinary (OTel 100 spans):    1,012 ops/s  (+206% vs upstream)
- protobufjs       (OTel 100 spans):    1,680 ops/s

L0 alone gets the fork past 3x original upstream on the real OTel
workload that drove this whole investigation, with byte-identical
wire output.

* Use median-of-5 in bench:report to stabilize per-fixture numbers

Single-run benchmark output from report.ts was vulnerable to host
jitter on small/fast fixtures — a SimpleMessage measurement spread
2-8x across back-to-back runs on the same machine. A committed chart
snapshot could therefore show fork's toBinary 19% slower than
upstream in one run and 15-20% faster in the next four.

Wraps the benchmark loop inside report.ts with a BENCH_REPORT_RUNS
counter (default 5) and takes the per-fixture per-encoder median.
Override via the env var for faster iteration or longer sweeps.

Fresh median-of-5 numbers (local unpinned host, taskset -c 0):
- SimpleMessage toBinary: upstream 1.27M, fork 1.35M (+6.3%)
- OTel 100 spans toBinary: upstream 536, fork 1634 (+205%)
- K8sPodList toBinary: upstream 837, fork 3140 (+275%)
- StressMessage toBinary: upstream 2873, fork 10476 (+265%)
- RpcResponse toBinary: fork 630K vs protobufjs 532K (fork ahead on this shape)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Remove leftover L3 / toBinaryFast references from test harness

Files deleted:
- packages/protobuf-test/src/schema-plan-adaptive.test.ts — L3 test
  file, imports toBinaryFast + @bufbuild/protobuf/wire/schema-plan-adaptive
  both of which were already removed from main
- packages/protobuf/src/wire/schema-plan-adaptive.ts — L3
  implementation, orphaned after its only consumer was removed
- benchmarks/src/bench-multishape.ts — L3-only benchmark, imports
  toBinaryFast

These files survived the initial cleanup because they were only
exercised by L3/L1+L2 code paths that were themselves removed. The
full L1+L2 + L3 implementation remains on archive/l1-l2-schema-plans-experimental.

Full test suite now passes: 2909 / 0 fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update benchmarks README to match L0-only main

Removed stale L1+L2 descriptions and the toBinaryFast 0.80x claim from
'Reading the results'. Replaced with L0 description, median-of-5 note,
and fresh numbers (OTel 100 spans 0.88x protobufjs, cumulative vs
upstream +6% to +275%).

Dropped bench-multishape reference from 'Future work' (file deleted in
2dfe2fa). Added 'Archived work' section pointing at
archive/l1-l2-schema-plans-experimental for L1+L2 and L3 prototypes.

* Drop stale wire/schema-plan-adaptive export from package.json

The subpath export + typesVersions alias survived the L3 cleanup even
though wire/schema-plan-adaptive.ts was deleted in 2442fd7. attw was
reporting 'Resolution failed' on the orphaned entry, failing PR #22 CI.

* Bench CI: same-runner baseline instead of cross-runner artifact

Previously the bench-matrix job downloaded the baseline JSON from the
most recent push-to-main workflow artifact. That baseline was captured
on whatever ubuntu-latest runner GitHub happened to assign at that time;
the PR run then happened on a different physical host. Even with
taskset -c 0 + median-of-5 on both sides, cross-host variance (different
P/E-core topologies, SMT neighbours, thermal state) remained 5-7%,
producing chronic false-positive regressions on PRs that did not touch
the encode/decode hot path at all.

Switch the PR job to benchmark origin/main and the PR merge commit in
sequence on the SAME runner within the SAME workflow invocation. Every
factor except the code under test is now held constant. The
'bench-baseline-main' artifact upload on push-to-main is preserved for
external trend consumers but is no longer read by PR comparison.

Mechanics:
  - Checkout PR as usual.
  - Record current + origin/main SHAs.
  - git checkout origin/main, install, build, generate, run matrix into
    baseline-results.json.
  - git checkout back to PR head, rerun install/build/generate (PR may
    have changed package-lock), run matrix into bench-results.json.
  - compare-results.ts diffs the two local JSONs, posts the sticky
    comment, flags regressions.

Doubled bench work pushes the job from ~9 min to ~18 min; timeout raised
25 -> 40 min with buffer for slow runners.

Also: compare-results.ts header text now notes 'baseline and current
are benchmarked on the same runner' so the PR comment reflects the new
guarantee, and .gitignore covers baseline-results.json / bench-report.md
to keep the PR working tree clean after a local bench pass.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@intech intech deleted the feat/upstream-baseline-in-benchmarks branch April 21, 2026 11:11