
Add L1 schema plan and L2 specialized writer #10

Merged
intech merged 2 commits into feat/l0-contiguous-writer from
feat/l1-l2-schema-plans
Apr 19, 2026

Conversation


@intech intech commented Apr 19, 2026

Summary

Replaces reflective encoding with a compiled-plan interpreter on the
toBinaryFast entry point. Each DescMessage is compiled once into a
flat Int32Array opcode stream plus side tables (field names,
pre-encoded tag bytes, sub-plans, map-entry plans, oneof case tables);
a single dense switch in executeSchemaPlan walks the stream with
every BinaryWriter call inlined so V8 keeps monomorphic receivers
on the hot path.

Implementation follows the 20 pinned decisions in the L1/L2 design
spec (analysis/p1-t4-l1-l2-design-spec.md, P1-P20).
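For orientation, here is a minimal sketch of what a compiled plan and its cache could look like. The `SchemaPlan`, `executeSchemaPlan`, and `WeakMap<DescMessage, SchemaPlan | null>` names come from this PR; the exact field layout and the `compileSchemaPlan` / `getSchemaPlan` helpers are illustrative, not the merged code.

```ts
import type { DescMessage } from "@bufbuild/protobuf";

// Illustrative plan shape: a flat opcode stream plus side tables
// indexed by slot, matching the structure described above.
interface SchemaPlan {
  ops: Int32Array;                 // opcode stream, variable stride (2 or 3)
  names: string[];                 // field names, indexed by slot
  tags: Uint8Array[];              // pre-encoded tag bytes, indexed by slot
  subPlans: (SchemaPlan | null)[]; // sub-plans for message-typed fields
}

// Hypothetical compiler entry point; `null` marks messages that must
// fall back to the reflective toBinary (groups, unknown fields, ...).
declare function compileSchemaPlan(desc: DescMessage): SchemaPlan | null;

// Compiled once per DescMessage, then reused for every encode.
const planCache = new WeakMap<DescMessage, SchemaPlan | null>();

function getSchemaPlan(desc: DescMessage): SchemaPlan | null {
  let plan = planCache.get(desc);
  if (plan === undefined) {
    // Cycle safety is two-phase in the real compiler: a placeholder is
    // registered before sub-plans compile, so recursive message types
    // terminate instead of looping.
    plan = compileSchemaPlan(desc);
    planCache.set(desc, plan);
  }
  return plan;
}
```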

Approach

  • L1: flat Int32Array opcodes per schema, variable stride (2 for
    singular scalars, 3 for message/list/map/oneof), side-tables indexed
    by slot. Compiled once, cached in WeakMap<DescMessage, SchemaPlan | null>. Cycle-safe two-phase compile.
  • L2: specialized writers inlined into the interpreter switch.
    ASCII fast path and int64 tri-dispatch inherited from L0 (no
    duplication, P11/P12). Pre-encoded tag bytes emitted via
    writer.raw(tags[slot]) (P13). Packed repeated scalars use a tight
    inline loop inside a single fork/join (P14). Element dispatch
    factored through a small writeScalarByOp helper so V8 sees one
    call site per writer method (P19). A sketch of the interpreter
    loop follows this list.
  • Fallback: proto2 groups, delimited-encoded messages inside
    lists, and messages carrying unknown fields with
    writeUnknownFields: true all transparently delegate to the
    reflective toBinary (P5, P10).

Measurements

Node.js 25.8.1, 2-second windows, warmup + 50-iter outer loop.

| Workload | Reflective | toBinaryFast (L1+L2) | Delta |
| --- | --- | --- | --- |
| PerfMessage (nested, 16,132 B) | 8,961 ops/s | 15,861 ops/s | +77.0% |
| ScalarValuesMessage (97 B) | 453,597 ops/s | 505,119 ops/s | +11.4% |
| MessageFieldMessage (9 B) | 737,761 ops/s | 725,993 ops/s | -1.6% (noise) |
| PerfMessage heapUsed/op | 664 B | 119 B | -82% |

Byte-parity verified against reflective on all three fixtures and on
the 14-case parity test suite.
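Each parity case boils down to comparing the bytes from both entry points. A minimal sketch of such a check, assuming the `toBinaryFast` export this PR adds and a hypothetical generated `PerfMessageSchema` fixture import:

```ts
import { create, toBinary, toBinaryFast } from "@bufbuild/protobuf";
// Hypothetical fixture path; the real suite lives in
// packages/protobuf-test/src/wire/schema-plan.test.ts.
import { PerfMessageSchema } from "./gen/perf_pb.js";

const msg = create(PerfMessageSchema, {
  // ...populate scalars, lists, maps, oneofs, nested messages
});

const fast = toBinaryFast(PerfMessageSchema, msg);
const reflective = toBinary(PerfMessageSchema, msg);

// Byte-for-byte parity: identical length and identical content.
console.assert(
  fast.length === reflective.length &&
    fast.every((b, i) => b === reflective[i]),
  "toBinaryFast must match reflective toBinary",
);
```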

Note: the external benchmarks/ folder used for the L0 OTel
100-span measurements is not merged into the L0 branch. The
PerfMessage fixture (a 100-message nested payload bundling lists,
maps, oneofs, and strings) is the closest representative workload
available in-tree.

Gates

  • Byte-parity on all fixtures (14 new parity tests + PerfMessage round-trip)
  • 2,889 tests pass (2,875 pre-existing + 14 new)
  • Throughput improvement on realistic workload (+77%)
  • Memory improvement on realistic workload (-82%)
  • No regression on SimpleMessage-style workload (within noise)
  • Diff under 1,500 LOC (971 LOC in src/, 192 LOC tests)

Stacked on

PR #8 (L0 contiguous-buffer writer). Cannot merge until L0 merges;
rebases automatically once L0 lands on main.

🤖 Generated with Claude Code

@intech intech changed the title feat(protobuf): L1 schema plans + L2 specialized writers feat(protobuf): Add L1 schema plan and L2 specialized writer Apr 19, 2026
@intech intech changed the title feat(protobuf): Add L1 schema plan and L2 specialized writer Add L1 schema plan and L2 specialized writer Apr 19, 2026
intech and others added 2 commits April 20, 2026 01:39
Introduce a compiled-plan fast path for protobuf encoding. Each
`DescMessage` compiles to a flat `Int32Array` opcode stream plus a
handful of side tables (field names, pre-encoded tag bytes, sub-plans,
map-entry plans, oneof case tables). A single dense switch in
`executeSchemaPlan` interprets the stream with every `BinaryWriter`
call inlined into the hot loop, eliminating the reflective dispatch
that dominated the previous encoder.

Implementation follows the 20 pinned decisions in the L1/L2 design
spec:
  - P1-P10 (L1): flat Int32Array opcodes with variable stride, side
    tables indexed by slot, `WeakMap` plan cache, cycle-safe two-phase
    compile, reflective fallback for groups / messages carrying
    unknown fields, `toBinaryFast` as the sole public entry point.
  - P11-P20 (L2): inherits the ASCII fast path and int64 tri-dispatch
    from the L0 `BinaryWriter`, emits pre-encoded tag bytes via
    `writer.raw`, packs repeated scalars inline inside `fork/join`,
    keeps element writes monomorphic via a small `writeScalarByOp`
    dispatcher, skips a separate field-writers module so V8 sees one
    call site per writer method.

Correctness:
  - 2,889 tests green (2,875 pre-existing + 14 new schema-plan parity
    tests covering scalars, strings ASCII / UTF-8, packed scalars,
    singular and repeated messages, string- and int-keyed maps, oneof
    arms, and unset oneofs).
  - Byte-parity verified against the reflective `toBinary` on the
    16,132-byte `PerfMessage` fixture (100-message payload: scalars,
    lists, maps, oneofs, nested messages).

Measurements (Node.js 25.8.1, 2s windows):
  - PerfMessage throughput: 8,961 -> 15,861 ops/s (+77%)
  - ScalarValuesMessage:    453,597 -> 505,119 ops/s (+11%)
  - MessageFieldMessage:    737,761 -> 725,993 ops/s (-1.6%, noise)
  - PerfMessage heapUsed/op: 664 B -> 119 B (-82%)

Files:
  - packages/protobuf/src/wire/schema-plan.ts      (compiler + interpreter)
  - packages/protobuf/src/to-binary-fast.ts        (public entry + fallback)
  - packages/protobuf/src/index.ts                 (export toBinaryFast)
  - packages/protobuf-test/src/wire/schema-plan.test.ts (parity suite)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove the biome-ignore for noDoubleEquals in schema-plan.ts: `v` is
  typed as `unknown` at the comparison site, so the rule never fires
  and the suppression triggers `suppressions/unused`.
- Re-run `biome format` on the two files touched by the L1/L2 commit
  so CI's gh-diffcheck stays green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@intech intech force-pushed the feat/l1-l2-schema-plans branch from 6e276e6 to 85f4dd7 on April 19, 2026 21:42
@intech intech self-assigned this Apr 19, 2026
@intech intech merged commit 57c12eb into feat/l0-contiguous-writer Apr 19, 2026
1 check passed
intech added a commit that referenced this pull request Apr 19, 2026
Regenerated after merge of #6 (benchmark matrix), #8 (L0 contiguous
writer), #10 (L1+L2 schema plans + specialized writers), #11
(correctness tests).

Key results (Node 25.8, log-scale chart):
- OTel 100 spans:    525 -> 2,501 ops/s (+376%), 0.80x pbjs (3,110)
- OTel Metrics 50:   891 -> 4,773 ops/s (+435%)
- OTel Logs 100:     880 -> 3,772 ops/s (+329%)
- K8sPodList 20:     712 -> 3,510 ops/s (+393%)
- Stress d=8 w=200:  2,568 -> 14,378 ops/s (+460%)
- SimpleMessage:     1.39M -> 1.81M ops/s (+30%)

Memory allocations per encode reduced proportionally via L0 contiguous
buffer + L1 schema-plan opcode interpreter + L2 specialized field writers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@intech intech deleted the feat/l1-l2-schema-plans branch April 21, 2026 11:10