Add L1 schema plan and L2 specialized writer #10
Merged
intech merged 2 commits into feat/l0-contiguous-writer on Apr 19, 2026
Conversation
Introduce a compiled-plan fast path for protobuf encoding. Each
`DescMessage` compiles to a flat `Int32Array` opcode stream plus a
handful of side tables (field names, pre-encoded tag bytes, sub-plans,
map-entry plans, oneof case tables). A single dense switch in
`executeSchemaPlan` interprets the stream with every `BinaryWriter`
call inlined into the hot loop, eliminating the reflective dispatch
that dominated the previous encoder.
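A minimal sketch of the idea, assuming simplified opcodes, a fixed two-slot stride, and a plain-array writer in place of the real `BinaryWriter`; the actual `schema-plan.ts` opcodes and plan shape differ:

```typescript
// Hypothetical opcodes (the real plan uses many more, with variable stride).
const OP_VARINT = 0;
const OP_STRING = 1;

interface MiniPlan {
  ops: Int32Array;     // flat stream: [opcode, slot, opcode, slot, ...]
  names: string[];     // side table: field name per slot
  tags: Uint8Array[];  // side table: pre-encoded tag bytes per slot
}

function writeVarint(out: number[], v: number): void {
  while (v > 0x7f) {
    out.push((v & 0x7f) | 0x80);
    v >>>= 7;
  }
  out.push(v);
}

// One dense switch; every write is inlined into the loop so the engine
// sees a single monomorphic call site per opcode.
function executePlan(plan: MiniPlan, msg: Record<string, unknown>, out: number[]): void {
  const { ops, names, tags } = plan;
  for (let pc = 0; pc < ops.length; pc += 2) {
    const slot = ops[pc + 1];
    const v = msg[names[slot]];
    if (v === undefined) continue;
    out.push(...tags[slot]); // tag bytes were pre-encoded at compile time
    switch (ops[pc]) {
      case OP_VARINT:
        writeVarint(out, v as number);
        break;
      case OP_STRING: {
        const bytes = new TextEncoder().encode(v as string);
        writeVarint(out, bytes.length);
        out.push(...bytes);
        break;
      }
    }
  }
}

// Example plan: field 1 = varint "id", field 2 = string "name".
const plan: MiniPlan = {
  ops: new Int32Array([OP_VARINT, 0, OP_STRING, 1]),
  names: ["id", "name"],
  tags: [new Uint8Array([0x08]), new Uint8Array([0x12])],
};
const out: number[] = [];
executePlan(plan, { id: 150, name: "hi" }, out);
// out: [0x08, 0x96, 0x01, 0x12, 0x02, 0x68, 0x69]
```

The compile step pays for tag encoding and field-name lookup once per schema; every subsequent encode is a straight walk over the `Int32Array`.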
Implementation follows the 20 pinned decisions in the L1/L2 design
spec:
- P1-P10 (L1): flat Int32Array opcodes with variable stride, side
tables indexed by slot, `WeakMap` plan cache, cycle-safe two-phase
compile, reflective fallback for groups / messages carrying
unknown fields, `toBinaryFast` as the sole public entry point.
- P11-P20 (L2): inherits the ASCII fast path and int64 tri-dispatch
from the L0 `BinaryWriter`, emits pre-encoded tag bytes via
`writer.raw`, packs repeated scalars inline inside `fork/join`,
keeps element writes monomorphic via a small `writeScalarByOp`
dispatcher, skips a separate field-writers module so V8 sees one
call site per writer method.
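As an illustration of the pre-encoded tags and packed-scalar points, a packed repeated field can be emitted with one raw tag write, one fork/join, and a tight inline element loop. The `MiniWriter` below is a hypothetical stand-in for the L0 `BinaryWriter`, not its actual API:

```typescript
// Simplified length-delimiting writer: fork() marks the start of a
// payload, join() splices it out and prepends its varint length.
class MiniWriter {
  private buf: number[] = [];
  private stack: number[] = [];
  raw(bytes: Uint8Array): this {
    this.buf.push(...bytes);
    return this;
  }
  varint(v: number): this {
    while (v > 0x7f) {
      this.buf.push((v & 0x7f) | 0x80);
      v >>>= 7;
    }
    this.buf.push(v);
    return this;
  }
  fork(): this {
    this.stack.push(this.buf.length);
    return this;
  }
  join(): this {
    const start = this.stack.pop()!;
    const body = this.buf.splice(start); // remove the payload bytes
    this.varint(body.length);            // write the length prefix
    this.buf.push(...body);              // re-append the payload
    return this;
  }
  finish(): Uint8Array {
    return Uint8Array.from(this.buf);
  }
}

// Packed repeated int32 field 4: one pre-encoded tag byte, one
// fork/join, and an inline loop with no per-element dispatch.
const tag = new Uint8Array([0x22]); // field 4, wire type 2
const w = new MiniWriter();
w.raw(tag).fork();
for (const v of [3, 270, 86942]) w.varint(v);
w.join();
// w.finish(): [0x22, 0x06, 0x03, 0x8e, 0x02, 0x9e, 0xa7, 0x05]
```

The element values and expected bytes match the packed-field example in the protobuf wire-format documentation.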
Correctness:
- 2,889 tests green (2,875 pre-existing + 14 new schema-plan parity
tests covering scalars, strings ASCII / UTF-8, packed scalars,
singular and repeated messages, string- and int-keyed maps, oneof
arms, and unset oneofs).
- Byte-parity verified against the reflective `toBinary` on the
16,132-byte `PerfMessage` fixture (100-message payload: scalars,
lists, maps, oneofs, nested messages).
Measurements (Node.js 25.8.1, 2s windows):
- PerfMessage throughput: 8,961 -> 15,861 ops/s (+77%)
- ScalarValuesMessage: 453,597 -> 505,119 ops/s (+11%)
- MessageFieldMessage: 737,761 -> 725,993 ops/s (-1.6%, noise)
- PerfMessage heapUsed/op: 664 B -> 119 B (-82%)
Files:
- packages/protobuf/src/wire/schema-plan.ts (compiler + interpreter)
- packages/protobuf/src/to-binary-fast.ts (public entry + fallback)
- packages/protobuf/src/index.ts (export toBinaryFast)
- packages/protobuf-test/src/wire/schema-plan.test.ts (parity suite)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove the biome-ignore for noDoubleEquals in schema-plan.ts: `v` is typed as `unknown` at the comparison site, so the rule never fires and the suppression triggers `suppressions/unused`.
- Re-run `biome format` on the two files touched by the L1/L2 commit so CI's gh-diffcheck stays green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from 6e276e6 to 85f4dd7
This was referenced Apr 19, 2026
intech added a commit that referenced this pull request on Apr 19, 2026
Regenerated after merge of #6 (benchmark matrix), #8 (L0 contiguous writer), #10 (L1+L2 schema plans + specialized writers), #11 (correctness tests).

Key results (Node 25.8, log-scale chart):
- OTel 100 spans: 525 -> 2,501 ops/s (+376%), 0.80x pbjs (3,110)
- OTel Metrics 50: 891 -> 4,773 ops/s (+435%)
- OTel Logs 100: 880 -> 3,772 ops/s (+329%)
- K8sPodList 20: 712 -> 3,510 ops/s (+393%)
- Stress d=8 w=200: 2,568 -> 14,378 ops/s (+460%)
- SimpleMessage: 1.39M -> 1.81M ops/s (+30%)

Memory allocations per encode reduced proportionally via L0 contiguous buffer + L1 schema-plan opcode interpreter + L2 specialized field writers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary

Replaces reflective encoding with a compiled-plan interpreter on the `toBinaryFast` entry point. Each `DescMessage` is compiled once into a flat `Int32Array` opcode stream plus side tables (field names, pre-encoded tag bytes, sub-plans, map-entry plans, oneof case tables); a single dense switch in `executeSchemaPlan` walks the stream with every `BinaryWriter` call inlined so V8 keeps monomorphic receivers on the hot path.

Implementation follows the 20 pinned decisions in the L1/L2 design spec (analysis/p1-t4-l1-l2-design-spec.md, P1-P20).

Approach
- `Int32Array` opcodes per schema, variable stride (2 for singular scalars, 3 for message/list/map/oneof), side tables indexed by slot. Compiled once, cached in `WeakMap<DescMessage, SchemaPlan | null>`. Cycle-safe two-phase compile.
- ASCII fast path and int64 tri-dispatch inherited from L0 (no duplication, P11/P12). Pre-encoded tag bytes emitted via `writer.raw(tags[slot])` (P13). Packed repeated scalars use a tight inline loop inside a single `fork`/`join` (P14). Element dispatch factored through a small `writeScalarByOp` helper so V8 sees one call site per writer method (P19).
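The plan-cache and two-phase-compile points can be sketched as follows; `Desc` and `Plan` here are hypothetical stand-ins for `DescMessage` and `SchemaPlan`, and the `null` sentinel records the reflective fallback so that decision is made once per schema:

```typescript
// Minimal stand-ins for the real descriptor and plan types.
interface Desc {
  typeName: string;
  refs: Desc[];        // message-typed fields (may include the schema itself)
  compilable: boolean; // false models "has groups / must fall back"
}
interface Plan {
  desc: Desc;
  subPlans: (Plan | null)[];
}

// null is a cached value, not a miss: "this schema uses the
// reflective encoder" is remembered just like a compiled plan.
const cache = new WeakMap<Desc, Plan | null>();

function getPlan(desc: Desc): Plan | null {
  if (cache.has(desc)) return cache.get(desc) as Plan | null;
  if (!desc.compilable) {
    cache.set(desc, null); // remember the reflective fallback
    return null;
  }
  // Phase 1: register the plan object before compiling its body so a
  // recursive reference resolves to this same object instead of looping.
  const plan: Plan = { desc, subPlans: [] };
  cache.set(desc, plan);
  // Phase 2: compile sub-plans; a cycle now hits the cache entry above.
  for (const ref of desc.refs) plan.subPlans.push(getPlan(ref));
  return plan;
}

// A self-referential schema compiles to a single shared plan:
const node: Desc = { typeName: "Node", refs: [], compilable: true };
node.refs.push(node);
const p = getPlan(node);
// p !== null, and p.subPlans[0] === p
```

`WeakMap` keying by the descriptor object means plans are garbage-collected together with their schemas, with no explicit eviction.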
- … lists, and messages carrying unknown fields with `writeUnknownFields: true` all transparently delegate to the reflective `toBinary` (P5, P10).

Measurements

Node.js 25.8.1, 2-second windows, warmup + 50-iter outer loop.
`toBinaryFast` (L1+L2)

Byte-parity verified against reflective on all three fixtures and on the 14-case parity test suite.
Gates

(… src/, 192 LOC tests)

Stacked on

PR #8 (L0 contiguous-buffer writer). Cannot merge until L0 merges; rebases automatically once L0 lands on main.

🤖 Generated with Claude Code