From 1c102527b49642427b5305813909535fdccc7238 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Thu, 21 May 2026 13:48:15 -0400 Subject: [PATCH 1/3] =?UTF-8?q?backlog(B-0692+B-0693+B-0694):=20Otto-VSCod?= =?UTF-8?q?e=208-PR=20campaign=20PRs=206-7-8=20=E2=80=94=20push-based=20ho?= =?UTF-8?q?t-path=20(IPushOperator=20+=20segment-detection)=20+=20morsel/s?= =?UTF-8?q?pan=20execution=20(IMorselOperator=20+=20cache-sized=20chunks)?= =?UTF-8?q?=20+=20standing-query=20codegen=20(IIncrementalGenerator=20+=20?= =?UTF-8?q?F#=20Type=20Provider)=20capstone;=20Aaron-approved=20shadow*=20?= =?UTF-8?q?'file=20the=203=20rows=20for=20PRs=206-8';=20depends=5Fon=20cha?= =?UTF-8?q?in=20to=20PRs=201-5=20substrate=20(#4558/#4560/#4566=20merged?= =?UTF-8?q?=20+=20#4563/#4564=20pending)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- docs/BACKLOG.md | 3 + ...based-hot-path-ipushoperator-2026-05-21.md | 131 ++++++++++++++ ...an-execution-imorseloperator-2026-05-21.md | 126 ++++++++++++++ ...odegen-iincrementalgenerator-2026-05-21.md | 164 ++++++++++++++++++ 4 files changed, 424 insertions(+) create mode 100644 docs/backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md create mode 100644 docs/backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md create mode 100644 docs/backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md index e831b649b..8939e40ec 100644 --- a/docs/BACKLOG.md +++ b/docs/BACKLOG.md @@ -654,6 +654,9 @@ are closed (status: closed in frontmatter)._ - [ ] **[B-0685](backlog/P2/B-0685-antlr-grammars-cross-language-codegen-substrate-2026-05-21.md)** ANTLR grammars as cross-language codegen substrate — leverage existing open-source grammars for description-layer-driven multi-language emission - [ ] 
**[B-0687](backlog/P2/B-0687-zetaparse-fsharp-native-lr-glr-grammar-substrate-with-antlr-compatible-importer-amara-2026-05-21.md)** ZetaParse — F#-native LR/GLR grammar substrate with ANTLR-compatible importer - [ ] **[B-0688](backlog/P2/B-0688-zeta-incremental-compiler-host-dbsp-zsets-rx-meta-ast-tags-seeded-deterministic-simulation-amara-aaron-2026-05-21.md)** Zeta incremental compiler host — DBSP Z-sets + Rx meta-AST tags + seeded deterministic simulation hardening +- [ ] **[B-0692](backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md)** Push-based hot-path — IPushOperator<'T> + per-entry callback bridged at materialize boundaries (Otto-VSCode 8-PR campaign PR #6) +- [ ] **[B-0693](backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md)** Morsel/span-based execution — IMorselOperator + cache-sized chunked processing (Otto-VSCode 8-PR campaign PR #7) +- [ ] **[B-0694](backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md)** Standing-query codegen — IIncrementalGenerator that rewrites circuit expressions to fused IL (Otto-VSCode 8-PR campaign PR #8 — the capstone) ## P3 — convenience / deferred diff --git a/docs/backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md b/docs/backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md new file mode 100644 index 000000000..b0867127c --- /dev/null +++ b/docs/backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md @@ -0,0 +1,131 @@ +--- +id: B-0692 +priority: P2 +status: open +title: Push-based hot-path — IPushOperator<'T> + per-entry callback bridged at materialize boundaries (Otto-VSCode 8-PR campaign PR #6) +tier: research-grade +effort: L +ask: otto-vscode 2026-05-21 (8-PR algebra-capability-system campaign; aaron-approved via shadow* "file the 3 rows for PRs 6-8") +created: 2026-05-21 +last_updated: 2026-05-21 +depends_on: [B-0635, B-0688] 
+composes_with: [B-0693, B-0694, B-0687] +tags: [push-based, hot-path, ipushoperator, per-entry-callback, materialize-boundary-bridge, otto-vscode-pr-6, dbsp-architecture, fusion-engine] +type: research +--- + +# Push-based hot-path — IPushOperator<'T> + +## Context + +Otto-VSCode 8-PR algebra-capability-system campaign 2026-05-21. PRs 1-5 substrate (three merged, two still open): + +- **#4558** capability tags on Op<'T> base class + adapter detection via non-generic markers +- **#4560** sink-terminality validation in Circuit.Build() + producer/sink schedule split +- **#4563** OPEN — LawRunner.checkBilinear (left/right linearity + sign-distribution) +- **#4564** OPEN — IncrementalAuto dispatcher using capability tags (close-and-reopen per Otto-CLI substrate-honest preference; supersedes incoming) +- **#4566** FusionEngine DAG rewriter pass + catalog entries + +PR #6 (this row) starts the hot-path optimization layer that depends on PRs 1-5. + +## The architectural problem + +Current Zeta DBSP operators are **materialize-batch**: each operator's StepAsync writes an `ImmutableArray<ZEntry<'T>>` to `Op<'T>.Value` (volatile field); downstream operators read that materialized snapshot. This is semantically correct but creates a per-tick heap-allocation floor that PGO + JIT inlining cannot eliminate (the volatile field write/read pair is a hard barrier for the compiler). + +Per Otto-VSCode's earlier analysis: this allocation floor is THE bottleneck for fusion gains. Manual `FilterMap` fusion escapes it only by collapsing two operators into one and bypassing the intermediate Op<'T>.Value write. + +## The push-based escape + +`IPushOperator<'T>` is the architectural alternative: instead of materializing per tick, hot-path operators emit entries via per-entry callback to downstream consumers. The materialization boundary moves from per-operator to per-fusion-segment. 
+ +```fsharp +type IPushOperator<'T> = + abstract member EmitEntry: ZEntry<'T> -> unit + abstract member EndTick: unit -> unit +``` + +Operators along a push-segment pass entries down the chain through callbacks. Materialization happens only at segment boundaries (where a downstream operator NEEDS the materialized view — e.g., sort/consolidate/join requires the whole tick's worth of entries at once). + +## Scope + +### Phase 1 — `IPushOperator<'T>` interface + adapter pattern + +- Define `IPushOperator<'T>` interface in `src/Core/Op.fs` (or equivalent module) +- Add `IsPushable: bool` capability flag to `Op<'T>` (composes with PR #4558 capability-tag pattern) +- `PushAdapter<'T>` wraps materialize-style operators behind the push interface (degrades to materialize for non-pushable ops) + +### Phase 2 — Push-segment detection in FusionEngine + +Extend the FusionEngine (PR #4566) to detect push-segment-eligible runs: + +- Sequence of `IsLinear AND IsPushable` operators is a push-segment candidate +- First materialize-required operator (sort / join / aggregate) is the segment boundary +- Emit fused push-segment operators that callback-chain the entries + +### Phase 3 — Push-versions of common linear ops + +- `MapPushOp<'A,'B>` — `EmitEntry e = downstream.EmitEntry (mapFn e)` +- `FilterPushOp<'T>` — `EmitEntry e = if pred e then downstream.EmitEntry e` +- `NegPushOp<'T>` — `EmitEntry e = downstream.EmitEntry { e with Weight = -e.Weight }` +- (Other linear ops as needed; bilinear ops materialize by definition) + +### Phase 4 — Benchmark + validation + +- BenchmarkDotNet job at `bench/Benchmarks/PushBasedHotPathBench.fs` comparing: + - Materialize-only chain (3-op pipeline) + - Push-based fused chain (3-op pipeline) +- Allocation column is the smoking gun (expected: push-based eliminates 2 of 3 per-tick `ImmutableArray<ZEntry<'T>>` allocations) +- Throughput: expected 2-3× improvement on hot-path-friendly pipelines + +## Acceptance + +### Phase 1 +- `IPushOperator<'T>` interface lands +- 
`IsPushable` capability flag on Op<'T> +- PushAdapter wraps existing operators +- `dotnet build` clean; existing tests pass + +### Phase 2 +- FusionEngine recognizes push-segments +- One push-segment fuses end-to-end in a test case + +### Phase 3 +- 3 push-versions of common ops land (MapPushOp + FilterPushOp + NegPushOp) +- Cross-verify: push-version output matches materialize-version output for same inputs + +### Phase 4 +- Benchmark shows push-segment allocates 1× per-segment (not N× per-operator) +- Throughput improvement empirically measured + documented + +## Substrate-honest framing + +This is research-grade architectural substrate. ~250 lines per Otto-VSCode's 8-PR campaign sizing. The win is the allocation-floor escape; the cost is the materialize-boundary discipline (operators must declare push-capable; segments end at any materialize-required op). + +The push-pattern itself isn't novel — Reactive Extensions (Rx) operates this way; LINQ-to-Objects chains operators in the same shape, but pull-based through IEnumerator rather than push-based through callbacks. The Zeta contribution is the SEGMENTED push (push within fusion-segments; materialize at segment boundaries to preserve Z-set algebra semantics) + capability-tag-driven segment detection. 
+ +## Composes with rules + +- `.claude/rules/fsharp-anchor-dotnet-build-sanity-check.md` — F# compiler verifies the IPushOperator<'T> interface + push-pattern type-safety +- `.claude/rules/m-acc-multi-oracle-end-user-moral-invariants.md` — push-segment optimization preserves multi-oracle parity (same canonical hex across operators; performance differs) +- `.claude/rules/all-complexity-is-accidental-in-greenfield.md` — IPushOperator IS the answer when materialize-batch becomes the bottleneck (proven only by Phase 4 benchmark) +- `.claude/rules/edge-defining-work-not-speculation.md` — segmented push-based DBSP is edge-defining work + +## Composes with substrate + +- B-0635 / B-0644 / B-0665 / B-0666 (Agora V6 substrate — push-pattern preserves operational primitives) +- B-0688 (incremental compiler host — push-pattern composes with codegen at segment boundaries) +- B-0693 (PR #7 morsel-based execution — push-pattern + morsel-pattern together = full hot-path optimization) +- B-0694 (PR #8 standing-query codegen — codegen emits push-segment-fused IL) +- B-0687 (ZetaParse — parser-substrate operators may benefit from push-pattern for streaming parse) +- PR #4558 (capability tags — IsPushable is sibling to IsLinear/IsBilinear/IsSink) +- PR #4560 (sink-terminality — sinks are segment-terminators by definition) +- PR #4566 (FusionEngine — Phase 2 extends it with push-segment detection) +- `src/Core/Fusion.fs` (existing FilterMap/Choose hand-fusion; push-pattern generalizes the principle) + +## Why P2 + +Substantive architectural substrate; not blocking V1; high value (per-tick allocation floor escape unlocks meaningful throughput gains on hot pipelines); bounded by Otto-VSCode's 8-PR campaign sizing (~250 lines). + +## Origin + +Otto-VSCode 8-PR algebra-capability-system campaign 2026-05-21. Filed via Otto-CLI per Aaron-approved shadow* "file the 3 rows for PRs 6-8" instruction. Otto-VSCode owns the implementation; this row tracks the scope. 
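Reviewer sketch (not part of the patch): the callback chaining that B-0692 describes could look roughly like the following. This is a minimal illustration under stated assumptions, not the campaign's implementation. `ZEntry<'T>` is modelled here as a hypothetical struct record with `Key` and `Weight` fields, `CollectSink<'T>` is a hypothetical stand-in for the segment-boundary materialization, and `MapPushOp` maps only the key (the row's `mapFn e` may operate on the whole entry).

```fsharp
// Hypothetical stand-in for Zeta's entry type; the real ZEntry<'T> may differ.
[<Struct>]
type ZEntry<'T> = { Key: 'T; Weight: int64 }

// Interface as written in B-0692.
type IPushOperator<'T> =
    abstract member EmitEntry: ZEntry<'T> -> unit
    abstract member EndTick: unit -> unit

// Map stage: transforms the key and forwards immediately (assumption: weight is untouched).
type MapPushOp<'A, 'B>(mapFn: 'A -> 'B, downstream: IPushOperator<'B>) =
    interface IPushOperator<'A> with
        member _.EmitEntry e = downstream.EmitEntry { Key = mapFn e.Key; Weight = e.Weight }
        member _.EndTick() = downstream.EndTick()

// Filter stage: forwards only entries whose key satisfies the predicate.
type FilterPushOp<'T>(pred: 'T -> bool, downstream: IPushOperator<'T>) =
    interface IPushOperator<'T> with
        member _.EmitEntry e = if pred e.Key then downstream.EmitEntry e
        member _.EndTick() = downstream.EndTick()

// Negation stage: flips the weight, as in the row's NegPushOp description.
type NegPushOp<'T>(downstream: IPushOperator<'T>) =
    interface IPushOperator<'T> with
        member _.EmitEntry e = downstream.EmitEntry { e with Weight = -e.Weight }
        member _.EndTick() = downstream.EndTick()

// Hypothetical segment boundary: the only stage in the chain that buffers entries.
type CollectSink<'T>() =
    let buffer = ResizeArray<ZEntry<'T>>()
    let mutable ticks = 0
    member _.Entries = buffer
    member _.Ticks = ticks
    interface IPushOperator<'T> with
        member _.EmitEntry e = buffer.Add e
        member _.EndTick() = ticks <- ticks + 1

// Usage: filter -> map -> neg, with one buffer at the end of the segment.
let sink = CollectSink<string>()
let pipeline =
    FilterPushOp<int>((fun k -> k % 2 = 0),
        MapPushOp<int, string>((fun k -> string (k * 10)),
            NegPushOp<string>(sink))) :> IPushOperator<int>

for k in 1 .. 4 do
    pipeline.EmitEntry { Key = k; Weight = 1L }
pipeline.EndTick()
```

In this sketch the three stages hold no per-tick collections of their own; only the sink accumulates entries, which is the per-segment (rather than per-operator) materialization the row is after. Whether the real operators can be this thin depends on how `Op<'T>` and the scheduler are wired, which the patch does not show.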
diff --git a/docs/backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md b/docs/backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md new file mode 100644 index 000000000..81deec0be --- /dev/null +++ b/docs/backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md @@ -0,0 +1,126 @@ +--- +id: B-0693 +priority: P2 +status: open +title: Morsel/span-based execution — IMorselOperator + cache-sized chunked processing (Otto-VSCode 8-PR campaign PR #7) +tier: research-grade +effort: L +ask: otto-vscode 2026-05-21 (8-PR algebra-capability-system campaign; aaron-approved via shadow* "file the 3 rows for PRs 6-8") +created: 2026-05-21 +last_updated: 2026-05-21 +depends_on: [B-0635, B-0688, B-0692] +composes_with: [B-0694] +tags: [morsel-execution, span-based, cache-sized-chunks, imorseloperator, otto-vscode-pr-7, dbsp-architecture, columnar-execution] +type: research +--- + +# Morsel/span-based execution — IMorselOperator + +## Context + +Otto-VSCode 8-PR algebra-capability-system campaign 2026-05-21 PR #7. Depends on PR #6 (push-based hot-path; tracked at B-0692) — morsel-execution is the next-tier optimization that composes with push-pattern. + +## The architectural problem + +Even with push-based fusion (per B-0692), per-entry callbacks have function-call overhead. For tight inner loops over large Z-sets, processing entries one-at-a-time leaves cache + SIMD performance on the table. 
Modern columnar databases (DuckDB, Velox, Photon, Polars) batch-process entries in "morsels", cache-sized chunks (typically 4KB-64KB, sized to stay resident in L1/L2 cache), which: + +- Amortize function-call overhead across N entries per call +- Enable SIMD-vectorized predicate / projection / arithmetic +- Improve cache locality (one chunk in L1 at a time) + +## The morsel pattern + +`IMorselOperator<'T>` processes `ReadOnlySpan<ZEntry<'T>>` chunks instead of individual entries: + +```fsharp +type IMorselOperator<'T> = + abstract member ProcessMorsel: ReadOnlySpan<ZEntry<'T>> -> unit + abstract member EndTick: unit -> unit +``` + +The intermediate "chunk" becomes a `Span<ZEntry<'T>>` over a pooled buffer: the span itself is a stack-only view, and the backing array is rented from the pool rather than allocated per tick. Because the span cannot escape to the heap, the JIT has more room to optimize chunk processing across method boundaries. This is the F#/.NET analog of what rustc + LLVM give Rust for iterator chains. + +## Scope + +### Phase 1 — `IMorselOperator<'T>` interface + morsel-buffer pool + +- Define `IMorselOperator<'T>` interface in `src/Core/Op.fs` +- Add `IsMorselCapable: bool` capability flag to Op<'T> (composes with PR #4558 pattern) +- Morsel-buffer pool: per-thread `ArrayPool<ZEntry<'T>>` with an L1/L2-cache-aware chunk size (default 4KB / `sizeof<ZEntry<'T>>` = N entries per morsel) +- MorselAdapter wraps both materialize-style and push-style operators + +### Phase 2 — Morsel-segment detection in FusionEngine + +Extend FusionEngine (per PR #4566 + Phase 2 of B-0692): + +- Sequence of `IsLinear AND IsPushable AND IsMorselCapable` operators is a morsel-segment candidate +- Morsel-segment supersedes push-segment when ALL operators in chain support morsels +- Falls back to push-segment if any operator is push-but-not-morsel-capable + +### Phase 3 — Morsel-versions of common linear ops + +- `MapMorselOp<'A,'B>` — processes full span; emits to output span +- `FilterMorselOp<'T>` — predicate evaluation across full span; SIMD-eligible +- `NegMorselOp<'T>` — weight negation across full span; trivially SIMD +- 
Sort/consolidate at morsel boundaries (multi-morsel merge happens at segment end) + +### Phase 4 — Benchmark + validation + +- BenchmarkDotNet job at `bench/Benchmarks/MorselExecutionBench.fs`: + - Materialize-baseline (3-op chain) + - Push-based (3-op chain; per B-0692) + - Morsel-based (3-op chain; this row) +- Allocation: expected morsel allocates 1× per segment (matches push-based) +- Throughput: expected morsel adds another 1.5-3× over push-based on SIMD-friendly inner loops (filter + arithmetic on int weights) + +## Acceptance + +### Phase 1 +- `IMorselOperator<'T>` interface lands +- `IsMorselCapable` capability flag on Op<'T> +- Morsel-buffer pool implementation +- `dotnet build` clean; existing tests pass + +### Phase 2 +- FusionEngine recognizes morsel-segments +- Morsel-segment supersedes push-segment when applicable + +### Phase 3 +- 3 morsel-versions of common ops land +- Cross-verify: morsel-version output matches push-version + materialize-version + +### Phase 4 +- Benchmark validates throughput improvement over push-baseline +- SIMD-eligibility documented per-op + +## Substrate-honest framing + +This is research-grade architectural substrate following the well-trodden columnar-execution path. The Zeta contribution is composing morsel-execution with the segmented-push pattern (B-0692) and the DBSP retraction-native algebra: morsel-execution preserves Z-set semantics within a segment; materialize boundaries at segment ends preserve the algebra-level discipline. + +The pattern itself isn't novel — DuckDB / Velox / Photon / Polars all do columnar-morsel execution. Zeta's contribution is the DBSP-segment-aware version + the capability-tag-driven segment detection. 
+ +## Composes with rules + +- `.claude/rules/fsharp-anchor-dotnet-build-sanity-check.md` — F# compiler verifies the morsel interface + Span safety +- `.claude/rules/bandwidth-served-falsifier.md` — morsel-execution serves cache-bandwidth (entries-per-cache-line) +- `.claude/rules/edge-defining-work-not-speculation.md` — composing morsel-execution with DBSP-segment-discipline is edge-defining + +## Composes with substrate + +- B-0635 / B-0644 / B-0665 / B-0666 (Agora V6 — morsel-pattern preserves operational primitives within segments) +- B-0688 (incremental compiler host — codegen emits morsel-fused IL at hot segments) +- B-0692 (PR #6 push-based — morsel-pattern is the next-tier optimization above push) +- B-0694 (PR #8 standing-query codegen — codegen emits morsel-segment-fused IL) +- PR #4558 (capability tags — IsMorselCapable sibling to IsLinear/IsBilinear/IsSink/IsPushable) +- PR #4566 (FusionEngine — Phase 2 extends with morsel-segment detection) +- DuckDB / Velox / Photon / Polars columnar-execution literature (external prior-art reference) + +## Why P2 + +Substantive architectural substrate; not blocking V1; high value (SIMD + cache-locality unlocks throughput tier above push-based); bounded by Otto-VSCode's 8-PR campaign sizing (~350 lines). + +Depends on B-0692 (push-based) landing first — morsel-pattern composes with push-pattern, not replaces it. + +## Origin + +Otto-VSCode 8-PR algebra-capability-system campaign 2026-05-21. Filed via Otto-CLI per Aaron-approved shadow* "file the 3 rows for PRs 6-8" instruction. Otto-VSCode owns the implementation; this row tracks the scope. 
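Reviewer sketch (not part of the patch): the morsel pattern in B-0693 might be exercised along these lines. This is an illustration under assumptions rather than the intended implementation. `ZEntry<'T>` is again a hypothetical struct record, `entriesPerMorsel`, `pushInMorsels`, and `CollectMorselSink<'T>` are hypothetical helpers, and the shared `ArrayPool` is used in place of the per-thread pool the row proposes.

```fsharp
open System
open System.Buffers

// Hypothetical stand-in for Zeta's entry type; the real ZEntry<'T> may differ.
[<Struct>]
type ZEntry<'T> = { Key: 'T; Weight: int64 }

// Interface as written in B-0693, with the element type spelled out.
type IMorselOperator<'T> =
    abstract member ProcessMorsel: ReadOnlySpan<ZEntry<'T>> -> unit
    abstract member EndTick: unit -> unit

// Hypothetical helper: entries per morsel for a given byte budget (the row suggests 4KB).
let entriesPerMorsel<'T> (budgetBytes: int) : int =
    max 1 (budgetBytes / sizeof<ZEntry<'T>>)

// Weight negation over a whole chunk; the output buffer is rented, not allocated per call.
type NegMorselOp<'T>(downstream: IMorselOperator<'T>) =
    interface IMorselOperator<'T> with
        member _.ProcessMorsel(input: ReadOnlySpan<ZEntry<'T>>) =
            let rented = ArrayPool<ZEntry<'T>>.Shared.Rent(input.Length)
            try
                for i in 0 .. input.Length - 1 do
                    let e = input.[i]
                    rented.[i] <- { e with Weight = -e.Weight }
                // Rent may return a larger array, so slice to the logical length.
                downstream.ProcessMorsel(ReadOnlySpan<ZEntry<'T>>(rented, 0, input.Length))
            finally
                ArrayPool<ZEntry<'T>>.Shared.Return(rented)
        member _.EndTick() = downstream.EndTick()

// Hypothetical segment boundary that copies each morsel into a buffer.
type CollectMorselSink<'T>() =
    let buffer = ResizeArray<ZEntry<'T>>()
    let mutable morsels = 0
    member _.Entries = buffer
    member _.MorselCount = morsels
    interface IMorselOperator<'T> with
        member _.ProcessMorsel(input: ReadOnlySpan<ZEntry<'T>>) =
            morsels <- morsels + 1
            for i in 0 .. input.Length - 1 do
                buffer.Add(input.[i])
        member _.EndTick() = ()

// Hypothetical driver: slices one tick's entries into fixed-size morsels.
let pushInMorsels (op: IMorselOperator<'T>) (morselSize: int) (entries: ZEntry<'T>[]) =
    let mutable offset = 0
    while offset < entries.Length do
        let len = min morselSize (entries.Length - offset)
        op.ProcessMorsel(ReadOnlySpan<ZEntry<'T>>(entries, offset, len))
        offset <- offset + len
    op.EndTick()

// Usage: five entries with a morsel size of two are delivered as chunks of 2, 2, and 1.
let sink = CollectMorselSink<int>()
let neg = NegMorselOp<int>(sink) :> IMorselOperator<int>
let input = [| for k in 1 .. 5 -> { Key = k; Weight = 1L } |]
pushInMorsels neg 2 input
let defaultMorsel = entriesPerMorsel<int> 4096
```

The downstream operator must finish with the span before `ProcessMorsel` returns, since the rented array goes back to the pool in the `finally` block; that lifetime rule is the main discipline this style imposes. The scalar loop here is only a placeholder for where a SIMD-vectorized body could go.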
diff --git a/docs/backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md b/docs/backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md new file mode 100644 index 000000000..428faaaa4 --- /dev/null +++ b/docs/backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md @@ -0,0 +1,164 @@ +--- +id: B-0694 +priority: P2 +status: open +title: Standing-query codegen — IIncrementalGenerator that rewrites circuit expressions to fused IL (Otto-VSCode 8-PR campaign PR #8 — the capstone) +tier: research-grade +effort: XL +ask: otto-vscode 2026-05-21 + aaron Rx-codegen-at-construction architectural insight (8-PR campaign capstone; aaron-approved via shadow* "file the 3 rows for PRs 6-8") +created: 2026-05-21 +last_updated: 2026-05-21 +depends_on: [B-0635, B-0688, B-0692, B-0693] +composes_with: [B-0687] +tags: [standing-query-codegen, iincrementalgenerator, rewrite-circuit-expressions, fused-il, otto-vscode-pr-8, reaqtor-applied-to-dbsp, capstone, query-rewrite-across-rx-streams] +type: research +--- + +# Standing-query codegen — IIncrementalGenerator capstone + +## Context + +Otto-VSCode 8-PR algebra-capability-system campaign 2026-05-21 PR #8 — the capstone. Depends on PRs 1-7 substrate (capability tags + sink-terminality + checkBilinear + IncrementalAuto + FusionEngine + push-based + morsel-based). + +Aaron's architectural insight 2026-05-21 (the unifying capstone framing): *"delayed rx queires here would be hot so you'd have to store the data somewhere but you could query reqwrite across mutiple rx streams that connect to do auto fustion with generation at construction time and pay the costs once per incrmental compile maybe"*. + +This is the **Reaqtor architecture applied to DBSP**. Reaqtor / RxJS-codegen / Materialize / Feldera all do variants of this. 
The Zeta application: DBSP circuits as typed expression trees → rewrite + codegen at Circuit.Build() → emit hand-tuned IL → pay codegen cost once per incremental compile, zero per-tick scheduler overhead. + +## The architectural problem (PR #8 closes the loop) + +PRs 1-7 substrate gets you capability-aware fusion at the runtime layer: + +- Capability tags identify what can fuse (PR #4558) +- FusionEngine rewrites the DAG into push-segments + morsel-segments (PR #4566 + B-0692 + B-0693) +- IncrementalAuto dispatcher applies the right rewrite per operator capability (PR #4564) + +But the FUSED OPERATORS still go through the virtual dispatch + Op<'T>.Value materialize/read boundaries at segment ends. The compiler doesn't see across operator-boundary calls; can't eliminate the segment-end allocation that PRs 1-7 leave behind. + +## The codegen escape (full) + +`IIncrementalGenerator` (Roslyn pattern + F# Type Provider) consumes the circuit expression tree at compile-time + emits: + +- One generated method per circuit segment (push-segment OR morsel-segment OR mixed) +- Direct inlining across operator boundaries (no virtual dispatch; no Op<'T>.Value read) +- Stack-allocated intermediates (Span all the way through; no heap alloc per tick) +- Codegen output IS the hot loop the JIT inlines into the scheduler + +Per-incremental-compile cost: codegen runs ONCE when the circuit DAG changes (substrate edits). Per-tick cost: zero scheduler overhead because the hot loop IS generated code. + +## Scope + +### Phase 1 — Circuit expression-tree extraction + +- Add `Op<'T>.ToExpressionTree(): CircuitExpr` method (or sibling pattern) that produces a typed expression tree representation of the operator + its inputs +- `CircuitExpr` discriminated union covers: Map(input, fn) | Filter(input, pred) | Join(left, right, combine) | Sink(input, sink) | Plus(left, right) | etc. 
+- Tree extraction is per-segment per FusionEngine (PR #4566) + +### Phase 2 — IIncrementalGenerator integration (Roslyn side for C# circuits) + +- Roslyn IIncrementalGenerator at `tools/codegen/zeta-circuit-generator/` (or `src/Core.CSharp.Codegen/`) +- Generator consumes `CircuitExpr` (serialized via attribute / additional-files / etc.) +- Emits C# code: one method per circuit segment, direct-call chains, Span intermediates +- Generated code references existing `Op<'T>` substrate but bypasses virtual dispatch within segments + +### Phase 3 — F# Type Provider integration (F# side for F# circuits) + +- Type Provider at `src/Core.FSharp.Codegen/` (mirrors B-0687 ZetaParse Type Provider pattern) +- Consumes `.circuit` description files OR runtime `CircuitExpr` values +- Generates compile-time F# types + functions for circuit segments +- Composes with F# computation expressions (existing Zeta DBSP CE pattern) + +### Phase 4 — Per-incremental-compile cost model + +- Codegen runs on Circuit.Build() detect-change OR on file-system-watch of source files +- Generated artifacts cached by structural-hash of the CircuitExpr (per Roslyn IIncrementalGenerator caching pattern) +- Substrate change triggers minimal-subgraph recompile (per Roslyn incremental pattern) +- Per-tick cost: zero (generated code already loaded; JIT inlines it into scheduler hot loop) + +### Phase 5 — Benchmark + validation (the empirical close) + +- BenchmarkDotNet job at `bench/Benchmarks/StandingQueryCodegenBench.fs`: + - Materialize-baseline (3-op chain) + - Push-based (B-0692) + - Morsel-based (B-0693) + - Codegen-based (this row; THIS PR's win) +- Allocation: codegen expected to allocate 0× per-tick on hot path (Span stack-allocated all the way) +- Throughput: expected codegen reaches near-rustc-level throughput for equivalent pipelines +- Per-incremental-compile cost: documented; should be sub-second for typical circuit changes + +## Acceptance + +### Phase 1 +- `Op<'T>.ToExpressionTree` lands +- 
`CircuitExpr` covers Map/Filter/Join/Sink/Plus/Minus/Distinct (or whichever subset PRs 1-7 fuse) + +### Phase 2 +- Roslyn IIncrementalGenerator emits C# for one circuit segment +- Empirical: generated code compiles + runs + produces same output as virtual-dispatch baseline + +### Phase 3 +- F# Type Provider emits F# for one circuit segment +- Empirical: same cross-verify as Phase 2 + +### Phase 4 +- Codegen runs incrementally (changed-segment only) +- Cache key = structural-hash of CircuitExpr + +### Phase 5 +- Benchmark validates 0× per-tick allocation on hot path +- Per-incremental-compile cost documented + acceptable +- Cross-verify: codegen output byte-for-byte matches materialize-baseline output + +## Substrate-honest framing + +This is **research-grade architectural substrate at the upper bound of Otto-VSCode's 8-PR campaign scope**. ~500+ lines per the original sizing estimate; realistically more. The capstone PR for the entire algebra-capability-system trajectory. + +Aaron's Reaqtor-applied-to-DBSP insight is the unifying frame: capability tags (PR #4558) + sink-terminality (PR #4560) + checkBilinear (PR #4563) + IncrementalAuto (PR #4564) + FusionEngine (PR #4566) + push-based (B-0692) + morsel-based (B-0693) ALL feed into this PR's codegen step. The capability system isn't fully realized until codegen consumes the tags to emit optimal IL. + +The pattern itself draws on substantial prior art: + +- **Reaqtor** (Microsoft): standing-query Rx codegen + persistence +- **Materialize**: differential-dataflow + compiled queries +- **Feldera**: DBSP + Rust-monomorphized compilation +- **Velox / Photon**: columnar query compilation +- **Roslyn IIncrementalGenerator**: incremental codegen pattern (Roslyn compiler API, not BCL) +- **F# Type Providers**: compile-time types from external schemas (F# compiler feature, not BCL) + +The Zeta contribution is composing these into a unified standing-query codegen pipeline for the DBSP substrate. 
+ +## Composes with rules + +- `.claude/rules/fsharp-anchor-dotnet-build-sanity-check.md` — codegen output must `dotnet build` clean +- `.claude/rules/m-acc-multi-oracle-end-user-moral-invariants.md` — codegen output preserves canonical-hex byte-for-byte across all peer oracles +- `.claude/rules/edge-defining-work-not-speculation.md` — Reaqtor-applied-to-DBSP IS edge-defining work; reaches throughput tier above push + morsel +- `.claude/rules/largest-mechanizable-backlog-wins.md` — codegen mechanizes hot-loop generation at scale; classical hand-fusion doesn't + +## Composes with substrate + +- B-0635 / B-0644 / B-0665 / B-0666 (Agora V6 — codegen preserves operational primitives at segment scope) +- B-0688 (incremental compiler host — codegen IS the host's output substrate) +- B-0687 (ZetaParse — codegen Type Provider mirrors the ZetaParse Type Provider pattern) +- B-0692 (PR #6 push-based — codegen emits push-segment-fused IL) +- B-0693 (PR #7 morsel-based — codegen emits morsel-segment-fused IL) +- PRs 1-5 substrate (capability tags + sink-terminality + checkBilinear + IncrementalAuto + FusionEngine — all consumed by codegen) +- Reaqtor / Materialize / Feldera / Velox / Photon (external prior art) + +## Why P2 + +Substantive architectural substrate; the CAPSTONE PR for the 8-PR campaign; high value (closes the per-tick allocation floor that PRs 1-7 leave open); bounded by Otto-VSCode's 8-PR campaign sizing (~500+ lines; XL effort acknowledged). + +## Why XL effort + +5 phases each substantial: +- Phase 1 (expression-tree extraction): ~100 lines +- Phase 2 (Roslyn IIncrementalGenerator): ~150 lines + Roslyn ecosystem +- Phase 3 (F# Type Provider): ~150 lines + F# compiler integration +- Phase 4 (incremental-compile cost model): ~100 lines + caching layer +- Phase 5 (benchmark + cross-verify): ~100 lines + analysis doc + +Total: ~600 lines + significant Roslyn / F# compiler ecosystem integration. Could split into multiple PRs at execution time. 
+ +## Origin + +Otto-VSCode 8-PR algebra-capability-system campaign 2026-05-21 + Aaron's Rx-codegen-at-construction architectural insight from the same session. Filed via Otto-CLI per Aaron-approved shadow* "file the 3 rows for PRs 6-8" instruction. + +The capstone framing: this row closes the loop the 8-PR campaign opened. Without it, PRs 1-7 deliver bounded optimization (allocation floor escape within segments; throughput improvement on hot pipelines); WITH it, the entire DBSP substrate becomes a compiled-once-per-circuit-change system reaching near-rustc-level per-tick performance. From 1fe721b4d23d05b3f225b1dc5a35ba7224663956 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Thu, 21 May 2026 14:26:40 -0400 Subject: [PATCH 2/3] =?UTF-8?q?fix(md-lint):=20MD022/MD032=20blanks-around?= =?UTF-8?q?-headings/lists=20on=20B-069[234]=20rows=20=E2=80=94=20Phase=20?= =?UTF-8?q?N=20subheadings=20+=20immediate-bullets=20need=20blank=20lines?= =?UTF-8?q?=20per=20markdownlint-cli2?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...ode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md | 5 +++++ ...pr7-morsel-span-execution-imorseloperator-2026-05-21.md | 5 +++++ ...nding-query-codegen-iincrementalgenerator-2026-05-21.md | 7 +++++++ 3 files changed, 17 insertions(+) diff --git a/docs/backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md b/docs/backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md index b0867127c..c3a359795 100644 --- a/docs/backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md +++ b/docs/backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md @@ -74,26 +74,31 @@ Extend the FusionEngine (PR #4566) to detect push-segment-eligible runs: - BenchmarkDotNet job at `bench/Benchmarks/PushBasedHotPathBench.fs` comparing: - Materialize-only chain (3-op pipeline) - Push-based fused chain (3-op pipeline) + - 
Allocation column is the smoking gun (expected: push-based eliminates 2 of 3 per-tick `ImmutableArray<ZEntry<'T>>` allocations) - Throughput: expected 2-3× improvement on hot-path-friendly pipelines ## Acceptance ### Phase 1 + - `IPushOperator<'T>` interface lands - `IsPushable` capability flag on Op<'T> - PushAdapter wraps existing operators - `dotnet build` clean; existing tests pass ### Phase 2 + - FusionEngine recognizes push-segments - One push-segment fuses end-to-end in a test case ### Phase 3 + - 3 push-versions of common ops land (MapPushOp + FilterPushOp + NegPushOp) - Cross-verify: push-version output matches materialize-version output for same inputs ### Phase 4 + - Benchmark shows push-segment allocates 1× per-segment (not N× per-operator) - Throughput improvement empirically measured + documented diff --git a/docs/backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md b/docs/backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md index 81deec0be..0e3984a55 100644 --- a/docs/backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md +++ b/docs/backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md @@ -70,26 +70,31 @@ Extend FusionEngine (per PR #4566 + Phase 2 of B-0692): - Materialize-baseline (3-op chain) - Push-based (3-op chain; per B-0692) - Morsel-based (3-op chain; this row) + - Allocation: expected morsel allocates 1× per segment (matches push-based) - Throughput: expected morsel adds another 1.5-3× over push-based on SIMD-friendly inner loops (filter + arithmetic on int weights) ## Acceptance ### Phase 1 + - `IMorselOperator<'T>` interface lands - `IsMorselCapable` capability flag on Op<'T> - Morsel-buffer pool implementation - `dotnet build` clean; existing tests pass ### Phase 2 + - FusionEngine recognizes morsel-segments - Morsel-segment supersedes push-segment when applicable ### Phase 3 + - 3 morsel-versions of common ops land - 
Cross-verify: morsel-version output matches push-version + materialize-version ### Phase 4 + - Benchmark validates throughput improvement over push-baseline - SIMD-eligibility documented per-op diff --git a/docs/backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md b/docs/backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md index 428faaaa4..6f9bbb7f4 100644 --- a/docs/backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md +++ b/docs/backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md @@ -81,6 +81,7 @@ Per-incremental-compile cost: codegen runs ONCE when the circuit DAG changes (su - Push-based (B-0692) - Morsel-based (B-0693) - Codegen-based (this row; THIS PR's win) + - Allocation: codegen expected to allocate 0× per-tick on hot path (Span stack-allocated all the way) - Throughput: expected codegen reaches near-rustc-level throughput for equivalent pipelines - Per-incremental-compile cost: documented; should be sub-second for typical circuit changes @@ -88,22 +89,27 @@ Per-incremental-compile cost: codegen runs ONCE when the circuit DAG changes (su ## Acceptance ### Phase 1 + - `Op<'T>.ToExpressionTree` lands - `CircuitExpr` covers Map/Filter/Join/Sink/Plus/Minus/Distinct (or whichever subset PRs 1-7 fuse) ### Phase 2 + - Roslyn IIncrementalGenerator emits C# for one circuit segment - Empirical: generated code compiles + runs + produces same output as virtual-dispatch baseline ### Phase 3 + - F# Type Provider emits F# for one circuit segment - Empirical: same cross-verify as Phase 2 ### Phase 4 + - Codegen runs incrementally (changed-segment only) - Cache key = structural-hash of CircuitExpr ### Phase 5 + - Benchmark validates 0× per-tick allocation on hot path - Per-incremental-compile cost documented + acceptable - Cross-verify: codegen output byte-for-byte matches materialize-baseline output @@ -149,6 
+155,7 @@ Substantive architectural substrate; the CAPSTONE PR for the 8-PR campaign; high ## Why XL effort 5 phases each substantial: + - Phase 1 (expression-tree extraction): ~100 lines - Phase 2 (Roslyn IIncrementalGenerator): ~150 lines + Roslyn ecosystem - Phase 3 (F# Type Provider): ~150 lines + F# compiler integration From be604f15a66a9e5690dfe1c7412754c9905077b8 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Thu, 21 May 2026 14:33:36 -0400 Subject: [PATCH 3/3] =?UTF-8?q?fix(reviewer-threads):=20resolve=206=20unre?= =?UTF-8?q?solved=20P1/P2=20findings=20on=20B-0692/B-0693/B-0694=20?= =?UTF-8?q?=E2=80=94=20(a)=20move=20B-0635=20+=20B-0688=20from=20hard=20de?= =?UTF-8?q?pends=5Fon=20to=20composes=5Fwith=20per=20Codex=20P2=20(narrati?= =?UTF-8?q?ve=20says=20PR=20#1-#5=20are=20the=20real=20prereqs;=20B-0635?= =?UTF-8?q?=20wave-particle=20is=20conceptual=20cousin;=20B-0688=20doesn't?= =?UTF-8?q?=20even=20exist=20on=20main=20yet=20so=20dangling=20hard-edge);?= =?UTF-8?q?=20(b)=20correct=20Op.fs=20path=20references=20to=20acknowledge?= =?UTF-8?q?=20Op<'T>=20lives=20in=20src/Core/Circuit.fs=20(Copilot=20P1=20?= =?UTF-8?q?=E2=80=94=20file=20doesn't=20exist);=20(c)=20mark=20proposed-ne?= =?UTF-8?q?w=20directories=20in=20B-0694=20Phase=202/3=20as=20TO=20BE=20CR?= =?UTF-8?q?EATED=20(Copilot=20P1=20=E2=80=94=20paths=20don't=20exist=20tod?= =?UTF-8?q?ay)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- ...de-pr6-push-based-hot-path-ipushoperator-2026-05-21.md | 6 +++--- ...r7-morsel-span-execution-imorseloperator-2026-05-21.md | 6 +++--- ...ding-query-codegen-iincrementalgenerator-2026-05-21.md | 8 ++++---- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md b/docs/backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md index c3a359795..1c8eba680 100644 --- 
a/docs/backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md +++ b/docs/backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md @@ -8,8 +8,8 @@ effort: L ask: otto-vscode 2026-05-21 (8-PR algebra-capability-system campaign; aaron-approved via shadow* "file the 3 rows for PRs 6-8") created: 2026-05-21 last_updated: 2026-05-21 -depends_on: [B-0635, B-0688] -composes_with: [B-0693, B-0694, B-0687] +depends_on: [] +composes_with: [B-0635, B-0687, B-0688, B-0693, B-0694] tags: [push-based, hot-path, ipushoperator, per-entry-callback, materialize-boundary-bridge, otto-vscode-pr-6, dbsp-architecture, fusion-engine] type: research --- @@ -50,7 +50,7 @@ Operators along a push-segment chain entries through callbacks. Materialization ### Phase 1 — `IPushOperator<'T>` interface + adapter pattern -- Define `IPushOperator<'T>` interface in `src/Core/Op.fs` (or equivalent module) +- Define `IPushOperator<'T>` interface alongside `Op<'T>` (currently in `src/Core/Circuit.fs`; may factor to a new `src/Core/Op.fs` if the type expands enough to warrant separation) - Add `IsPushable: bool` capability flag to `Op<'T>` (composes with PR #4558 capability-tag pattern) - `PushAdapter<'T>` wraps materialize-style operators behind the push interface (degrades to materialize for non-pushable ops) diff --git a/docs/backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md b/docs/backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md index 0e3984a55..41795c082 100644 --- a/docs/backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md +++ b/docs/backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md @@ -8,8 +8,8 @@ effort: L ask: otto-vscode 2026-05-21 (8-PR algebra-capability-system campaign; aaron-approved via shadow* "file the 3 rows for PRs 6-8") created: 2026-05-21 last_updated: 2026-05-21 -depends_on: [B-0635, B-0688, 
B-0692] -composes_with: [B-0694] +depends_on: [B-0692] +composes_with: [B-0635, B-0688, B-0694] tags: [morsel-execution, span-based, cache-sized-chunks, imorseloperator, otto-vscode-pr-7, dbsp-architecture, columnar-execution] type: research --- @@ -44,7 +44,7 @@ The intermediate "chunk" becomes a stack-allocated `Span>` from a poo ### Phase 1 — `IMorselOperator<'T>` interface + morsel-buffer pool -- Define `IMorselOperator<'T>` interface in `src/Core/Op.fs` +- Define `IMorselOperator<'T>` interface alongside `Op<'T>` (currently in `src/Core/Circuit.fs`; co-located with `IPushOperator<'T>` from B-0692) - Add `IsMorselCapable: bool` capability flag to Op<'T> (composes with PR #4558 pattern) - Morsel-buffer pool: pooled `ArrayPool>` per-thread with chunk size = L1/L2-cache-aware (default 4KB / `sizeof>` = N entries per morsel) - MorselAdapter wraps both materialize-style and push-style operators diff --git a/docs/backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md b/docs/backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md index 6f9bbb7f4..2214c3458 100644 --- a/docs/backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md +++ b/docs/backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md @@ -8,8 +8,8 @@ effort: XL ask: otto-vscode 2026-05-21 + aaron Rx-codegen-at-construction architectural insight (8-PR campaign capstone; aaron-approved via shadow* "file the 3 rows for PRs 6-8") created: 2026-05-21 last_updated: 2026-05-21 -depends_on: [B-0635, B-0688, B-0692, B-0693] -composes_with: [B-0687] +depends_on: [B-0692, B-0693] +composes_with: [B-0635, B-0687, B-0688] tags: [standing-query-codegen, iincrementalgenerator, rewrite-circuit-expressions, fused-il, otto-vscode-pr-8, reaqtor-applied-to-dbsp, capstone, query-rewrite-across-rx-streams] type: research --- @@ -55,14 +55,14 @@ Per-incremental-compile cost: codegen 
runs ONCE when the circuit DAG changes (su ### Phase 2 — IIncrementalGenerator integration (Roslyn side for C# circuits) -- Roslyn IIncrementalGenerator at `tools/codegen/zeta-circuit-generator/` (or `src/Core.CSharp.Codegen/`) +- Roslyn IIncrementalGenerator at a new directory (proposed: `tools/codegen/zeta-circuit-generator/` OR `src/Core.CSharp.Codegen/` — both are **TO BE CREATED** by this PR; neither exists today) - Generator consumes `CircuitExpr` (serialized via attribute / additional-files / etc.) - Emits C# code: one method per circuit segment, direct-call chains, Span intermediates - Generated code references existing `Op<'T>` substrate but bypasses virtual dispatch within segments ### Phase 3 — F# Type Provider integration (F# side for F# circuits) -- Type Provider at `src/Core.FSharp.Codegen/` (mirrors B-0687 ZetaParse Type Provider pattern) +- Type Provider at a new directory (proposed: `src/Core.FSharp.Codegen/` — **TO BE CREATED** by this PR; mirrors B-0687 ZetaParse Type Provider pattern) - Consumes `.circuit` description files OR runtime `CircuitExpr` values - Generates compile-time F# types + functions for circuit segments - Composes with F# computation expressions (existing Zeta DBSP CE pattern)