Add L3 runtime monomorphization over L1+L2 fast path #18
Per design spec (analysis/p1-t6-l3-design-spec.md). Observes message
shapes across first 10 encodes per schema; graduates frequent shapes
to specialized plan variants that skip the generic `isFieldSet`
presence gate for known-present fields and known-absent slots. 4-
variant cap with seal-on-breach prevents cache explosion.
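A minimal sketch of the observe-then-graduate flow described above, not the PR's actual code. Only `L3_WARMUP = 10`, the 4-variant cap, and the bigint slot-presence bitmap come from this PR; `ShapeState`, `newShapeState`, `observe`, and `MAX_VARIANTS` are hypothetical names, and the exact seal-on-breach behaviour (a fifth shape reaching warmup seals the state) is an assumption.

```typescript
// Hypothetical sketch: names other than L3_WARMUP are illustrative only.
const L3_WARMUP = 10;   // observations of one shape before graduation (from the PR)
const MAX_VARIANTS = 4; // variant cap (from the PR)

interface ShapeState {
  counts: Map<bigint, number>;     // observations per not-yet-graduated shape
  variants: Map<bigint, number[]>; // graduated shape -> known-present slot list
  sealed: boolean;                 // assumed: set on cap breach, stops graduation
}

function newShapeState(): ShapeState {
  return { counts: new Map(), variants: new Map(), sealed: false };
}

// Slot-presence bitmap: bit i is set when slot i is present.
function shapeHash(present: readonly boolean[]): bigint {
  let bits = BigInt(0);
  for (let i = 0; i < present.length; i++) {
    if (present[i]) bits |= BigInt(1) << BigInt(i);
  }
  return bits;
}

// Returns the known-present slot list when a variant exists for this shape,
// otherwise undefined (the caller falls back to the generic path).
function observe(state: ShapeState, present: readonly boolean[]): number[] | undefined {
  const hash = shapeHash(present);
  const hit = state.variants.get(hash);
  if (hit !== undefined || state.sealed) return hit;

  const seen = (state.counts.get(hash) ?? 0) + 1;
  if (seen < L3_WARMUP) {
    state.counts.set(hash, seen);
    return undefined;
  }

  // Warmup reached: graduate unless the cap would be breached.
  state.counts.delete(hash);
  if (state.variants.size >= MAX_VARIANTS) {
    state.sealed = true; // seal-on-breach: no further shapes graduate
    state.counts.clear();
    return undefined;
  }
  const slots: number[] = [];
  present.forEach((p, i) => { if (p) slots.push(i); });
  state.variants.set(hash, slots);
  return slots;
}
```

In this sketch the hash and the map lookup run on every call, including calls that hit an already-graduated variant.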
Two execution modes:
- Mode A (CSP-safe, default): variant = pre-computed `VariantStep[]`
list of known-present slots; executor is a statically-imported
interpreter that delegates to the L1+L2 `estimate*/write*`
helpers. Safe under strict CSP.
- Mode B (CSP-unsafe, opt-in): per-variant `new Function()` executor
with unrolled call sites for per-variant IC isolation. Enabled
via `globalThis[Symbol.for('@bufbuild/protobuf.adaptive-codegen')] = true`.
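To illustrate the Mode A idea, here is a hedged sketch of a generic encoder with a presence gate next to an interpreter over a pre-computed step list. It is a toy: the fields are uint32-only, `Field`, `writeUint32Field`, `encodeGeneric`, `runVariant`, and `codegenEnabled` are hypothetical names, and the `isFieldSet` stand-in below is not the library's function. Only the `VariantStep[]` concept and the `Symbol.for` flag key come from this PR. Mode B codegen is not shown.

```typescript
// Hypothetical toy model: uint32 fields only, presence means "not undefined".
type Msg = Record<string, number | undefined>;

interface Field { no: number; name: string }
interface VariantStep { field: Field } // one known-present slot

function writeVarint(out: number[], n: number): void {
  while (n > 0x7f) {
    out.push((n & 0x7f) | 0x80);
    n >>>= 7;
  }
  out.push(n);
}

function writeUint32Field(out: number[], f: Field, value: number): void {
  writeVarint(out, f.no << 3); // tag with wire type 0 (varint)
  writeVarint(out, value);
}

// Stand-in for the generic presence gate.
function isFieldSet(msg: Msg, f: Field): boolean {
  return msg[f.name] !== undefined;
}

// Generic path: every field passes through the presence gate.
function encodeGeneric(fields: readonly Field[], msg: Msg): number[] {
  const out: number[] = [];
  for (const f of fields) {
    if (isFieldSet(msg, f)) writeUint32Field(out, f, msg[f.name] as number);
  }
  return out;
}

// Mode A style: interpret a step list of known-present slots, no gate,
// delegating to the same write helper as the generic path.
function runVariant(steps: readonly VariantStep[], msg: Msg): number[] {
  const out: number[] = [];
  for (const s of steps) writeUint32Field(out, s.field, msg[s.field.name] as number);
  return out;
}

// Mode B opt-in check, using the flag key quoted in this PR.
const CODEGEN_FLAG = Symbol.for("@bufbuild/protobuf.adaptive-codegen");
function codegenEnabled(): boolean {
  return (globalThis as unknown as Record<symbol, unknown>)[CODEGEN_FLAG] === true;
}
```

Because both paths call the same write helper, the variant can only differ from the generic path in which fields it visits, which is what makes byte-parity straightforward to test.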
Spec adaptations for current main:
- The L1+L2 reference implementation on main is the direct estimate/
write function set in `to-binary-fast.ts` rather than the opcode-
based `schema-plan.ts` assumed by the spec. The variant plan shape
therefore drops the opcode trim/filter step and instead carries a
compact `VariantStep[]` — same monomorphization effect, fewer
moving parts for this code base.
- `buildVariantExecutor` is replaced by the two closures (Mode A
static / Mode B codegen) in `compileVariantPlan` with identical
semantics.
Gates (5-run median on pinned CPU):
- Byte-parity: preserved across 11 new L3 tests + 16 pre-existing
toBinaryFast tests + correctness-matrix.
- SimpleMessage multi-shape: +55.5% (spec target: >= +10%)
- SimpleMessage single-shape: +40.8% (spec target: regression <= 3%)
- Span multi-shape: +19.0% (spec target: >= +10%)
- Span single-shape: +12.2% (spec target: regression <= 3%)
- Memory overhead: bounded by D3 + D7 (shared side tables).
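For reference, a small sketch of how a median-of-N gate like the ones above can be evaluated. The helper names (`median`, `percentDelta`, `passesGate`) are hypothetical and the real bench harness may compute this differently; the thresholds in the usage comment mirror the spec targets listed above.

```typescript
// Hypothetical helpers: not taken from the PR's bench harness.
function median(xs: readonly number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = s.length >> 1;
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Throughput change of candidate relative to baseline, in percent.
function percentDelta(baseline: number, candidate: number): number {
  return ((candidate - baseline) / baseline) * 100;
}

// A gate passes when the median-to-median delta is at least minDeltaPct.
// Multi-shape target: minDeltaPct = +10. Single-shape target: minDeltaPct = -3.
function passesGate(
  baselineRuns: readonly number[],
  candidateRuns: readonly number[],
  minDeltaPct: number,
): boolean {
  return percentDelta(median(baselineRuns), median(candidateRuns)) >= minDeltaPct;
}
```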
Opt-in: `toBinaryFast(schema, msg, { adaptive: true })` or
`PROTOBUF_ES_L3=1`. Default behaviour unchanged.
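A hedged sketch of how the two opt-in paths could be resolved. `FastOptions` and `adaptiveEnabled` are hypothetical names, and the precedence shown (an explicit per-call `adaptive` value wins over the environment variable) is an assumption that this PR does not state.

```typescript
// Hypothetical sketch: option shape and precedence are assumptions.
interface FastOptions { adaptive?: boolean }

type EnvHost = { process?: { env?: Record<string, string | undefined> } };

function adaptiveEnabled(options?: FastOptions): boolean {
  // Assumed precedence: an explicit per-call choice overrides the environment.
  if (options?.adaptive !== undefined) return options.adaptive;
  // Optional chaining keeps this safe where `process` does not exist (browsers).
  const env = (globalThis as unknown as EnvHost).process?.env;
  return env?.PROTOBUF_ES_L3 === "1";
}
```

With neither opt-in set, the function returns `false`, which matches the "default behaviour unchanged" guarantee.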
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…chains
- schema-plan-adaptive.ts: remove suppressions for a rule biome isn't enforcing in this project config
- schema-plan-adaptive.ts + to-binary-fast.ts: collapse defensive globalThis process env lookups into optional chain form
- biome.json: ignore gen/gen-protobufjs/.tmp from root scope so turbo lint doesn't catch pbjs-generated files and scratch dirs
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benchmark: no regressions
Thresholds: throughput regression
Closing as not-ready after CI evaluation. CI bench-matrix (single-shape repeated encode, pinned median-of-5) showed L3 net negative on realistic workloads:
Root cause: L3 adds `shapeHash()` + `variants.get()` overhead to every encode. On single-shape workloads (which dominate real traffic for a given schema), variant specialization does not amortize this overhead. A custom multi-shape bench on my host showed +19-55%, but that is a niche scenario, and the matrix, which mirrors typical deployments, regressed on nested-message fixtures.
Deferred, not abandoned
Next steps
Keeping the L0+L1+L2 stack on main as the shipped improvement. Current status vs pbjs: 0.80x on OTel traces without codegen (from a baseline of 0.18x before PR #8). L3 micro-optimizations are not worth the additional scope given this floor.
Summary
Adds L3 runtime monomorphization as an opt-in overlay on top of the L1+L2 fast path in `toBinaryFast`. Implements the design committed in `analysis/p1-t6-l3-design-spec.md` (12 pinned decisions D1-D12), adapted to the actual L1+L2 surface on `main` (direct estimate/write helpers rather than the opcode interpreter described by the spec).
Default behaviour is unchanged. Opt in per call with `toBinaryFast(schema, msg, { adaptive: true })` or globally via `PROTOBUF_ES_L3=1`.
What L3 does
- Observes message shapes per `DescMessage` via a slot-presence bitmap (`bigint`).
- After `L3_WARMUP = 10` observations of the same shape, graduates a specialized plan variant that skips the generic `isFieldSet` presence gate for known-present fields and drops opcodes for known-absent slots.
Two execution modes
- Mode A (CSP-safe, default): pre-computed `VariantStep[]`; the executor is a statically-imported interpreter that delegates to the L1+L2 `estimate*/write*` helpers. Safe under strict CSP.
- Mode B (CSP-unsafe, opt-in): per-variant `new Function()` executor with fully template-generated source (no user data in the source). Enabled by `globalThis[Symbol.for('@bufbuild/protobuf.adaptive-codegen')] = true`.
Gates (5-run median, pinned CPU)
Byte-parity preserved across:
- 11 new `schema-plan-adaptive` tests (shape hashing, graduation, cap, drift, oneof, Mode B)
- 16 pre-existing `toBinaryFast` feature-coverage tests
- `correctness-matrix.test.ts` + `byte-identity.test.ts`
Spec adaptations for current `main`
The L1+L2 reference implementation on `main` is the direct estimator/writer function set in `to-binary-fast.ts`, not the opcode-based `schema-plan.ts` assumed by the spec (that lives on `feat/l1-l2-schema-plans` and was not merged to `main`). The adaptation:
- `VariantPlan` carries a compact `VariantStep[]` rather than a trimmed `Int32Array` opcode stream. Same monomorphization effect (known-present slot list + unrolled dispatch), fewer moving parts.
- `buildVariantExecutor` is replaced by two closures inside `compileVariantPlan` (Mode A static / Mode B codegen).
All 12 pinned decisions (D1-D12) are honoured. D7 side-table sharing is achieved implicitly because variants delegate back into the generic helpers (no table duplication).
Files changed
- `packages/protobuf/src/wire/schema-plan-adaptive.ts`: new (+503 LOC)
- `packages/protobuf/src/to-binary-fast.ts`: adaptive routing (+86 LOC)
- `packages/protobuf/package.json`: internal subpath export for tests
- `packages/protobuf-test/src/schema-plan-adaptive.test.ts`: new (+369 LOC, 11 tests)
- `benchmarks/src/bench-multishape.ts`: new (+258 LOC)
Test plan
- `node_modules/.bin/tsc --noEmit --project packages/protobuf/tsconfig.json`
- `node_modules/.bin/tsc --noEmit --project packages/protobuf-test/tsconfig.json`
- `tsx --test src/schema-plan-adaptive.test.ts`: 11/11 pass
- `tsx --test src/to-binary-fast.test.ts`: 16/16 pass (no regression)
- `tsx --test src/correctness-matrix.test.ts src/byte-identity.test.ts`: all pass
- `taskset -c 0 npx tsx src/bench-multishape.ts` x 5 runs: gates pass on median
Draft status
Keeping as draft for per-PR user review before merge. Internal fork only.