
Add L3 runtime monomorphization over L1+L2 fast path #18

Closed
intech wants to merge 3 commits into main from feat/l3-runtime-monomorphization

Conversation


intech commented Apr 20, 2026

Summary

Adds L3 runtime monomorphization as an opt-in overlay on top of the L1+L2 fast path in toBinaryFast. Implements the design committed in analysis/p1-t6-l3-design-spec.md (12 pinned decisions D1-D12), adapted to the actual L1+L2 surface on main (direct estimate/write helpers rather than the opcode interpreter described by the spec).

Default behaviour is unchanged. Opt in per call with toBinaryFast(schema, msg, { adaptive: true }) or globally via PROTOBUF_ES_L3=1.
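
For illustration, a minimal opt-in sketch (import paths and the fixture schema below are placeholders; only the `toBinaryFast(schema, msg, { adaptive: true })` signature and the `PROTOBUF_ES_L3` variable come from this PR):

```ts
// Hypothetical usage sketch; import paths and the fixture schema are placeholders.
import { create } from "@bufbuild/protobuf";
import { toBinaryFast } from "./to-binary-fast.js"; // assumed internal path
import { SimpleMessageSchema } from "./gen/simple_pb.js"; // placeholder schema

const msg = create(SimpleMessageSchema, { name: "example" });

// Default: L1+L2 fast path, behaviour unchanged.
const plain = toBinaryFast(SimpleMessageSchema, msg);

// Per-call opt-in to L3 adaptive monomorphization.
const adaptive = toBinaryFast(SimpleMessageSchema, msg, { adaptive: true });

// Global opt-in via the environment: PROTOBUF_ES_L3=1 node app.js
```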

What L3 does

  • Observes message shapes per DescMessage via a slot-presence bitmap (bigint).
  • After L3_WARMUP = 10 observations of the same shape, graduates a specialized plan variant that skips the generic isFieldSet presence gate for known-present fields and drops opcodes for known-absent slots.
  • 4-variant cap (D3): the fifth unique shape seals the record; further novel shapes flow through the generic plan, while already-graduated shapes keep being served.
  • Shape drift after seal remains byte-parity correct (falls back to generic).
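
A minimal sketch of the observation and graduation flow, under simplified types: only `L3_WARMUP`, the bigint presence bitmap, the 4-variant cap, and the seal-on-breach rule come from this PR; `ShapeRecord`, `observe`, and the slot layout are hypothetical.

```ts
// Sketch of the per-DescMessage shape observation described above.
const L3_WARMUP = 10;
const VARIANT_CAP = 4;

interface ShapeRecord {
  count: number;                           // observations of this exact shape
  variant?: (msg: unknown) => Uint8Array;  // graduated specialized encoder
}

interface AdaptiveState {
  shapes: Map<bigint, ShapeRecord>;
  sealed: boolean;                         // set once a fifth unique shape appears
}

// One bit per field slot: set when the field is present on this message.
function shapeHash(presence: boolean[]): bigint {
  let bits = 0n;
  for (let i = 0; i < presence.length; i++) {
    if (presence[i]) bits |= 1n << BigInt(i);
  }
  return bits;
}

function observe(state: AdaptiveState, bits: bigint): ShapeRecord | undefined {
  let rec = state.shapes.get(bits);
  if (rec === undefined) {
    // Novel shape after the seal, or the shape that breaches the cap:
    // it flows through the generic plan.
    if (state.sealed) return undefined;
    if (state.shapes.size >= VARIANT_CAP) {
      state.sealed = true;
      return undefined;
    }
    rec = { count: 0 };
    state.shapes.set(bits, rec);
  }
  rec.count++;
  if (rec.count >= L3_WARMUP && rec.variant === undefined) {
    // Graduate: compile a variant that skips isFieldSet for known-present
    // slots and drops work for known-absent slots (compile step elided).
  }
  return rec;
}
```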

Two execution modes

  • Mode A (CSP-safe, default) — variant = pre-computed VariantStep[]; the executor is a statically-imported interpreter that delegates to the L1+L2 estimate*/write* helpers. Safe under strict CSP.
  • Mode B (CSP-unsafe, opt-in) — per-variant new Function() executor with fully template-generated source (no user data in the source). Enabled by globalThis[Symbol.for('@bufbuild/protobuf.adaptive-codegen')] = true.
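
To make the contrast concrete, a sketch of the two executor strategies; `VariantStep` and the helper signature here are assumptions, not the PR's exact types.

```ts
// Illustrative contrast between the two executors; types are assumptions.
type WriteHelper = (out: number[], value: unknown) => void;

interface VariantStep {
  slot: number;       // field slot known to be present for this shape
  write: WriteHelper; // one of the L1+L2 write* helpers
}

// Mode A: statically-imported interpreter over the precomputed step list.
// No dynamic code generation, so it works under a strict CSP.
function runModeA(steps: VariantStep[], values: unknown[], out: number[]): void {
  for (const step of steps) {
    // No isFieldSet gate: the step list only contains known-present slots.
    step.write(out, values[step.slot]);
  }
}

// Mode B: per-variant generated source with unrolled call sites, giving each
// variant its own monomorphic inline caches. Requires new Function(), hence
// the explicit CSP-unsafe opt-in.
function buildModeB(
  steps: VariantStep[],
): (values: unknown[], out: number[], helpers: WriteHelper[]) => void {
  const body = steps
    .map((s, i) => `helpers[${i}](out, values[${s.slot}]);`) // template only, no user data
    .join("\n");
  return new Function("values", "out", "helpers", body) as (
    values: unknown[],
    out: number[],
    helpers: WriteHelper[],
  ) => void;
}
```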

Gates (5-run median, pinned CPU)

| Fixture | Delta vs L1+L2 | Target |
| --- | --- | --- |
| SimpleMessage multi-shape | +55.5% | >= +10% |
| SimpleMessage single-shape | +40.8% | regression <= 3% |
| Span multi-shape | +19.0% | >= +10% |
| Span single-shape | +12.2% | regression <= 3% |

Byte-parity preserved across:

  • 11 new schema-plan-adaptive tests (shape hashing, graduation, cap, drift, oneof, Mode B)
  • All 16 pre-existing toBinaryFast feature-coverage tests
  • correctness-matrix.test.ts + byte-identity.test.ts

Spec adaptations for current main

The L1+L2 reference implementation on main is the direct estimator/writer function set in to-binary-fast.ts, not the opcode-based schema-plan.ts assumed by the spec (that lives on feat/l1-l2-schema-plans and was not merged to main). The adaptation:

  • VariantPlan carries a compact VariantStep[] rather than a trimmed Int32Array opcode stream. Same monomorphization effect (known-present slot list + unrolled dispatch), fewer moving parts.
  • buildVariantExecutor is replaced by two closures inside compileVariantPlan (Mode A static / Mode B codegen).

All 12 pinned decisions (D1-D12) are honoured. D7 side-table sharing is achieved implicitly because variants delegate back into the generic helpers (no table duplication).

Files changed

  • packages/protobuf/src/wire/schema-plan-adaptive.ts — new (+503 LOC)
  • packages/protobuf/src/to-binary-fast.ts — adaptive routing (+86 LOC)
  • packages/protobuf/package.json — internal subpath export for tests
  • packages/protobuf-test/src/schema-plan-adaptive.test.ts — new (+369 LOC, 11 tests)
  • benchmarks/src/bench-multishape.ts — new (+258 LOC)

Test plan

  • node_modules/.bin/tsc --noEmit --project packages/protobuf/tsconfig.json
  • node_modules/.bin/tsc --noEmit --project packages/protobuf-test/tsconfig.json
  • tsx --test src/schema-plan-adaptive.test.ts — 11/11 pass
  • tsx --test src/to-binary-fast.test.ts — 16/16 pass (no regression)
  • tsx --test src/correctness-matrix.test.ts src/byte-identity.test.ts — all pass
  • taskset -c 0 npx tsx src/bench-multishape.ts x 5 runs — gates pass on median

Draft status

Keeping as draft for per-PR user review before merge. Internal fork only.

intech and others added 2 commits April 20, 2026 23:16
Per design spec (analysis/p1-t6-l3-design-spec.md). Observes message
shapes across first 10 encodes per schema; graduates frequent shapes
to specialized plan variants that skip the generic `isFieldSet`
presence gate for known-present fields and known-absent slots.
4-variant cap with seal-on-breach prevents cache explosion.

Two execution modes:
- Mode A (CSP-safe, default): variant = pre-computed `VariantStep[]`
  list of known-present slots; executor is a statically-imported
  interpreter that delegates to the L1+L2 `estimate*/write*`
  helpers. Safe under strict CSP.
- Mode B (CSP-unsafe, opt-in): per-variant `new Function()` executor
  with unrolled call sites for per-variant IC isolation. Enabled
  via `globalThis[Symbol.for('@bufbuild/protobuf.adaptive-codegen')] = true`.

Spec adaptations for current main:
- The L1+L2 reference implementation on main is the direct estimate/
  write function set in `to-binary-fast.ts` rather than the opcode-
  based `schema-plan.ts` assumed by the spec. The variant plan shape
  therefore drops the opcode trim/filter step and instead carries a
  compact `VariantStep[]` — same monomorphization effect, fewer
  moving parts for this code base.
- `buildVariantExecutor` is replaced by the two closures (Mode A
  static / Mode B codegen) in `compileVariantPlan` with identical
  semantics.

Gates (5-run median on pinned CPU):
- Byte-parity: preserved across 11 new L3 tests + 16 pre-existing
  toBinaryFast tests + correctness-matrix.
- SimpleMessage multi-shape:   +55.5%  (spec target: >= +10%)
- SimpleMessage single-shape:  +40.8%  (spec target: regression <= 3%)
- Span multi-shape:            +19.0%  (spec target: >= +10%)
- Span single-shape:           +12.2%  (spec target: regression <= 3%)
- Memory overhead: bounded by D3 + D7 (shared side tables).

Opt-in: `toBinaryFast(schema, msg, { adaptive: true })` or
`PROTOBUF_ES_L3=1`. Default behaviour unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…chains

- schema-plan-adaptive.ts: remove suppressions for a rule biome isn't
  enforcing in this project config
- schema-plan-adaptive.ts + to-binary-fast.ts: collapse defensive
  globalThis process env lookups into optional chain form
- biome.json: ignore gen/gen-protobufjs/.tmp from root scope so turbo
  lint doesn't catch pbjs-generated files and scratch dirs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot commented Apr 20, 2026

Benchmark: no regressions

Thresholds: throughput regression >5%, memory regression >10%. Runner pinned to CPU 0 via taskset. Current run on linux/x64, Node v22.22.2, captured 2026-04-20T19:55:39.913Z.
Baseline captured 2026-04-20T17:34:45.007Z on linux/x64, Node v22.22.2.

Summary: 0 regressed, 3 improved, 0 new, 17 unchanged.

| Fixture | Baseline ops/s | PR ops/s | Δ ops | Baseline B/op | PR B/op | Δ mem | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SimpleMessage :: toBinary (pre-built, 19 B) | 849,817 | 891,706 | +4.9% | | | | ok |
| ExportTraceRequest (100 spans) :: toBinary (pre-built, 32926 B) | 1,231 | 1,282 | +4.1% | | | | ok |
| ExportMetricsRequest (50 series) :: toBinary (pre-built, 17696 B) | 2,168 | 2,242 | +3.4% | | | | ok |
| ExportLogsRequest (100 records) :: toBinary (pre-built, 21319 B) | 2,171 | 2,235 | +3.0% | | | | ok |
| K8sPodList (20 pods) :: toBinary (pre-built, 28900 B) | 2,342 | 2,498 | +6.6% | | | | improved |
| GraphQLRequest :: toBinary (pre-built, 624 B) | 176,305 | 184,415 | +4.6% | | | | ok |
| GraphQLResponse :: toBinary (pre-built, 1366 B) | 236,876 | 245,317 | +3.6% | | | | ok |
| RpcRequest :: toBinary (pre-built, 501 B) | 296,046 | 314,074 | +6.1% | | | | improved |
| RpcResponse :: toBinary (pre-built, 602 B) | 434,888 | 448,977 | +3.2% | | | | ok |
| StressMessage (depth=8, width=200) :: toBinary (pre-built, 12868 B) | 7,860 | 8,299 | +5.6% | | | | improved |
| SimpleMessage :: fromBinary (19 B) | 1,020,891 | 1,046,590 | +2.5% | | | | ok |
| ExportTraceRequest (100 spans) :: fromBinary (32926 B) | 599.8 | 621.6 | +3.6% | | | | ok |
| ExportMetricsRequest (50 series) :: fromBinary (17696 B) | 1,149 | 1,187 | +3.4% | | | | ok |
| ExportLogsRequest (100 records) :: fromBinary (21319 B) | 1,073 | 1,106 | +3.0% | | | | ok |
| K8sPodList (20 pods) :: fromBinary (28900 B) | 1,398 | 1,410 | +0.8% | | | | ok |
| GraphQLRequest :: fromBinary (624 B) | 300,513 | 303,615 | +1.0% | | | | ok |
| GraphQLResponse :: fromBinary (1366 B) | 265,540 | 271,515 | +2.2% | | | | ok |
| RpcRequest :: fromBinary (501 B) | 269,405 | 273,347 | +1.5% | | | | ok |
| RpcResponse :: fromBinary (602 B) | 378,014 | 385,593 | +2.0% | | | | ok |
| StressMessage (depth=8, width=200) :: fromBinary (12868 B) | 4,046 | 4,033 | -0.3% | | | | ok |

Produced by benchmarks/scripts/compare-results.ts. Artifacts: bench-results-<pr> (current), bench-baseline-main (baseline).


intech commented Apr 20, 2026

Closing as not-ready after CI evaluation.

CI bench-matrix (single-shape repeated encode, pinned median-of-5) showed L3 net negative on realistic workloads:

| Fixture | Δ ops/s |
| --- | --- |
| ExportTraceRequest toBinary | -5.9% |
| ExportLogsRequest toBinary | -5.3% |
| K8sPodList fromBinary | -5.3% |
| StressMessage fromBinary | -6.6% |
| SimpleMessage toBinary | +8.0% |
| RpcResponse toBinary | +5.1% |

Root cause: L3 adds `shapeHash()` + `variants.get()` overhead per encode. On single-shape workloads (which dominate real traffic for a given schema) this overhead is not amortized by the variant specialization.
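
A rough illustration of that per-encode cost (not the PR's code; names are placeholders):

```ts
// On a single-shape workload, every call still pays these two steps before
// it can delegate to the unchanged L1+L2 helpers.
function adaptiveOverhead(presence: boolean[], variants: Map<bigint, unknown>): unknown {
  // 1. bigint shape hash: one shift/or per field slot, plus BigInt allocation
  let bits = 0n;
  for (let i = 0; i < presence.length; i++) {
    if (presence[i]) bits |= 1n << BigInt(i);
  }
  // 2. Map lookup keyed by the bigint
  return variants.get(bits);
  // Only after both steps does the call reach the variant or generic encoder.
}
```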

Custom multi-shape bench on my host showed +19-55%, but that's a niche scenario and the matrix — which mirrors typical deployments — regressed on nested-message fixtures.

Deferred, not abandoned

  • Cheaper shape hashing (Int32Array of field-presence bits, no BigInt, no Map lookup) could eliminate the per-encode overhead; a sketch follows this list
  • Conditional activation only after observing ≥3 distinct shapes (not on first encode)
  • bench-multishape.ts should land in CI matrix separately before another L3 attempt
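
A sketch of the cheaper-hashing idea from the first bullet, assuming an Int32Array scratch buffer sized to the schema's field count; the mixing constant and folding scheme are assumptions, not a committed design.

```ts
// Illustrative only: pack field presence into a reusable Int32Array instead
// of a bigint, then fold it into a plain 32-bit key.
function presenceHash32(presence: boolean[], scratch: Int32Array): number {
  scratch.fill(0);
  for (let i = 0; i < presence.length; i++) {
    if (presence[i]) scratch[i >> 5] |= 1 << (i & 31); // 32 field slots per word
  }
  // Fold the words into one 32-bit key that could index a small flat variant
  // table, avoiding both the BigInt allocation and the Map lookup per encode.
  let h = 0;
  for (let w = 0; w < scratch.length; w++) {
    h = Math.imul(h ^ scratch[w], 0x9e3779b1) | 0;
  }
  return h >>> 0;
}
```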

Next steps

Keeping the L0+L1+L2 stack on main as the shipped improvement. Current status vs pbjs: 0.80x on OTel traces without codegen (up from the 0.18x baseline before PR #8). L3 micro-optimizations are not worth the additional scope given this floor.

@intech intech closed this Apr 20, 2026
@intech intech deleted the feat/l3-runtime-monomorphization branch April 21, 2026 11:10