Lucent-Financial-Group · AceHack · May 21, 2026
diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
@@ -654,6 +654,9 @@ are closed (status: closed in frontmatter)._
 - [ ] **[B-0685](backlog/P2/B-0685-antlr-grammars-cross-language-codegen-substrate-2026-05-21.md)** ANTLR grammars as cross-language codegen substrate — leverage existing open-source grammars for description-layer-driven multi-language emission
 - [ ] **[B-0687](backlog/P2/B-0687-zetaparse-fsharp-native-lr-glr-grammar-substrate-with-antlr-compatible-importer-amara-2026-05-21.md)** ZetaParse — F#-native LR/GLR grammar substrate with ANTLR-compatible importer
 - [ ] **[B-0688](backlog/P2/B-0688-zeta-incremental-compiler-host-dbsp-zsets-rx-meta-ast-tags-seeded-deterministic-simulation-amara-aaron-2026-05-21.md)** Zeta incremental compiler host — DBSP Z-sets + Rx meta-AST tags + seeded deterministic simulation hardening
+- [ ] **[B-0692](backlog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md)** Push-based hot-path — IPushOperator<'T> + per-entry callback bridged at materialize boundaries (Otto-VSCode 8-PR campaign PR #6)
+- [ ] **[B-0693](backlog/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md)** Morsel/span-based execution — IMorselOperator + cache-sized chunked processing (Otto-VSCode 8-PR campaign PR #7)
+- [ ] **[B-0694](backlog/P2/B-0694-otto-vscode-pr8-standing-query-codegen-iincrementalgenerator-2026-05-21.md)** Standing-query codegen — IIncrementalGenerator that rewrites circuit expressions to fused IL (Otto-VSCode 8-PR campaign PR #8 — the capstone)
 
 ## P3 — convenience / deferred
 

diff --git a/...cklog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md b/...cklog/P2/B-0692-otto-vscode-pr6-push-based-hot-path-ipushoperator-2026-05-21.md
@@ -0,0 +1,136 @@
+---
+id: B-0692
+priority: P2
+status: open
+title: Push-based hot-path — IPushOperator<'T> + per-entry callback bridged at materialize boundaries (Otto-VSCode 8-PR campaign PR #6)
+tier: research-grade
+effort: L
+ask: otto-vscode 2026-05-21 (8-PR algebra-capability-system campaign; aaron-approved via shadow* "file the 3 rows for PRs 6-8")
+created: 2026-05-21
+last_updated: 2026-05-21
+depends_on: []
+composes_with: [B-0635, B-0687, B-0688, B-0693, B-0694]
+tags: [push-based, hot-path, ipushoperator, per-entry-callback, materialize-boundary-bridge, otto-vscode-pr-6, dbsp-architecture, fusion-engine]
+type: research
+---
+
+# Push-based hot-path — IPushOperator<'T>
+
+## Context
+
+Otto-VSCode 8-PR algebra-capability-system campaign 2026-05-21. PRs 1-5 substrate (landed unless marked OPEN):
+
+- **#4558** capability tags on Op<'T> base class + adapter detection via non-generic markers
+- **#4560** sink-terminality validation in Circuit.Build() + producer/sink schedule split
+- **#4563** OPEN — LawRunner.checkBilinear (left/right linearity + sign-distribution)
+- **#4564** OPEN — IncrementalAuto dispatcher using capability tags (close-and-reopen per Otto-CLI substrate-honest preference; supersedes incoming)
+- **#4566** FusionEngine DAG rewriter pass + catalog entries
+
+PR #6 (this row) starts the hot-path optimization layer that depends on PRs 1-5.
+
+## The architectural problem
+
+Current Zeta DBSP operators are **materialize-batch**: each operator's StepAsync writes an `ImmutableArray<ZEntry<'T>>` to `Op<'T>.Value` (volatile field); downstream operators read that materialized snapshot. This is semantically correct but creates a per-tick heap-allocation floor that PGO + JIT inlining cannot eliminate (the volatile field write/read pair is a hard barrier for the compiler).
+
+Per Otto-VSCode's earlier analysis: this allocation floor is THE bottleneck for fusion gains. Manual `FilterMap` fusion only escapes it by collapsing two operators into one + bypassing the intermediate Op<'T>.Value write.
+
+## The push-based escape
+
+`IPushOperator<'T>` is the architectural alternative: instead of materializing per tick, hot-path operators emit entries via per-entry callback to downstream consumers. The materialization boundary moves from per-operator to per-fusion-segment.
+
+```fsharp
+type IPushOperator<'T> =
+    abstract member EmitEntry: ZEntry<'T> -> unit
+    abstract member EndTick: unit -> unit
+```
+
+Operators along a push-segment chain entries through callbacks. Materialization happens only at segment boundaries (where a downstream operator NEEDS the materialized view — e.g., sort/consolidate/join requires the whole tick's worth of entries at once).
+
+## Scope
+
+### Phase 1 — `IPushOperator<'T>` interface + adapter pattern
+
+- Define `IPushOperator<'T>` interface alongside `Op<'T>` (currently in `src/Core/Circuit.fs`; may factor to a new `src/Core/Op.fs` if the type expands enough to warrant separation)
+- Add `IsPushable: bool` capability flag to `Op<'T>` (composes with PR #4558 capability-tag pattern)
+- `PushAdapter<'T>` wraps materialize-style operators behind the push interface (degrades to materialize for non-pushable ops)
+
+### Phase 2 — Push-segment detection in FusionEngine
+
+Extend the FusionEngine (PR #4566) to detect push-segment-eligible runs:
+
+- Sequence of `IsLinear AND IsPushable` operators is a push-segment candidate
+- First materialize-required operator (sort / join / aggregate) is the segment boundary
+- Emit fused push-segment operators that callback-chain the entries
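+
+The detection rule above can be sketched as a single pass over a pipeline. This is a minimal illustration under stated assumptions, not the FusionEngine's actual API: `OpInfo`, `Segment`, and `detectSegments` are hypothetical names, and the pipeline is simplified to a linear chain rather than the DAG the real rewriter walks.
+
+```fsharp
+// Hypothetical descriptor; the real capability tags live on Op<'T> (PR #4558).
+type OpInfo = { Name: string; IsLinear: bool; IsPushable: bool }
+
+type Segment =
+    | PushSegment of OpInfo list // run of linear + pushable ops, fused via callbacks
+    | Materialize of OpInfo      // boundary op that needs the whole tick at once
+
+/// Split a linear pipeline into push-segments and materialize boundaries.
+let detectSegments (ops: OpInfo list) : Segment list =
+    // Close the current run (kept in reverse order) if it is non-empty.
+    let flush run acc =
+        match run with
+        | [] -> acc
+        | _ -> PushSegment(List.rev run) :: acc
+    let rec go run acc remaining =
+        match remaining with
+        | [] -> List.rev (flush run acc)
+        | op :: rest when op.IsLinear && op.IsPushable -> go (op :: run) acc rest
+        | op :: rest -> go [] (Materialize op :: flush run acc) rest
+    go [] [] ops
+
+// Usage: map and filter fuse; sort is a boundary; neg starts a new segment.
+let mapOp = { Name = "map"; IsLinear = true; IsPushable = true }
+let filterOp = { Name = "filter"; IsLinear = true; IsPushable = true }
+let sortOp = { Name = "sort"; IsLinear = false; IsPushable = false }
+let negOp = { Name = "neg"; IsLinear = true; IsPushable = true }
+let segments = detectSegments [ mapOp; filterOp; sortOp; negOp ]
+```
+
+Keeping the boundary operator as its own `Materialize` case makes the segment end explicit, which matches the row's framing that materialization moves from per-operator to per-fusion-segment.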
+
+### Phase 3 — Push-versions of common linear ops
+
+- `MapPushOp<'A,'B>` — `EmitEntry e = downstream.EmitEntry (mapFn e)`
+- `FilterPushOp<'T>` — `EmitEntry e = if pred e then downstream.EmitEntry e`
+- `NegPushOp<'T>` — `EmitEntry e = downstream.EmitEntry { e with Weight = -e.Weight }`
+- (Other linear ops as needed; bilinear ops materialize by definition)
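+
+A minimal sketch of the three push-versions above chained into one segment that materializes only at its end. Assumptions are labeled in the comments: this row implies only the `Weight` field of `ZEntry<'T>`, so the `Key` field, the `int64` weight type, and the `CollectSink` boundary operator are hypothetical stand-ins.
+
+```fsharp
+// Assumed entry shape: `Key` and the int64 weight type are hypothetical.
+[<Struct>]
+type ZEntry<'T> = { Key: 'T; Weight: int64 }
+
+type IPushOperator<'T> =
+    abstract member EmitEntry: ZEntry<'T> -> unit
+    abstract member EndTick: unit -> unit
+
+type MapPushOp<'A, 'B>(mapFn: ZEntry<'A> -> ZEntry<'B>, downstream: IPushOperator<'B>) =
+    interface IPushOperator<'A> with
+        member _.EmitEntry e = downstream.EmitEntry(mapFn e)
+        member _.EndTick() = downstream.EndTick()
+
+type FilterPushOp<'T>(pred: ZEntry<'T> -> bool, downstream: IPushOperator<'T>) =
+    interface IPushOperator<'T> with
+        member _.EmitEntry e = if pred e then downstream.EmitEntry e
+        member _.EndTick() = downstream.EndTick()
+
+type NegPushOp<'T>(downstream: IPushOperator<'T>) =
+    interface IPushOperator<'T> with
+        member _.EmitEntry e = downstream.EmitEntry { e with Weight = -e.Weight }
+        member _.EndTick() = downstream.EndTick()
+
+/// Hypothetical segment boundary: the only place a tick is materialized.
+type CollectSink<'T>() =
+    let pending = ResizeArray<ZEntry<'T>>()
+    let mutable lastTick: ZEntry<'T>[] = [||]
+    member _.LastTick = lastTick
+    interface IPushOperator<'T> with
+        member _.EmitEntry e = pending.Add e
+        member _.EndTick() =
+            lastTick <- pending.ToArray()
+            pending.Clear()
+
+// Usage: map -> filter -> neg, built back to front so each op knows its downstream.
+let sink = CollectSink<int>()
+let neg = NegPushOp<int>(sink :> IPushOperator<int>) :> IPushOperator<int>
+let filter = FilterPushOp<int>((fun (e: ZEntry<int>) -> e.Key > 10), neg) :> IPushOperator<int>
+let segment =
+    MapPushOp<int, int>((fun (e: ZEntry<int>) -> { e with Key = e.Key * 10 }), filter) :> IPushOperator<int>
+
+for e in [ { Key = 1; Weight = 1L }; { Key = 2; Weight = 2L }; { Key = 3; Weight = -1L } ] do
+    segment.EmitEntry e
+segment.EndTick()
+// sink.LastTick now holds keys 20 and 30 with weights -2 and 1.
+```
+
+Only `CollectSink` allocates per tick in this sketch; the three upstream operators forward entries without writing an intermediate snapshot, which is the allocation pattern Phase 4 is meant to measure.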
+
+### Phase 4 — Benchmark + validation
+
+- BenchmarkDotNet job at `bench/Benchmarks/PushBasedHotPathBench.fs` comparing:
+  - Materialize-only chain (3-op pipeline)
+  - Push-based fused chain (3-op pipeline)
+
+- Allocation column is the smoking gun (expected: push-based eliminates 2 of 3 per-tick `ImmutableArray<ZEntry<'T>>` allocations)
+- Throughput: expected 2-3× improvement on hot-path-friendly pipelines
+
+## Acceptance
+
+### Phase 1
+
+- `IPushOperator<'T>` interface lands
+- `IsPushable` capability flag on Op<'T>
+- PushAdapter wraps existing operators
+- `dotnet build` clean; existing tests pass
+
+### Phase 2
+
+- FusionEngine recognizes push-segments
+- One push-segment fuses end-to-end in a test case
+
+### Phase 3
+
+- 3 push-versions of common ops land (MapPushOp + FilterPushOp + NegPushOp)
+- Cross-verify: push-version output matches materialize-version output for same inputs
+
+### Phase 4
+
+- Benchmark shows push-segment allocates 1× per-segment (not N× per-operator)
+- Throughput improvement empirically measured + documented
+
+## Substrate-honest framing
+
+This is research-grade architectural substrate. ~250 lines per Otto-VSCode's 8-PR campaign sizing. The win is the allocation-floor escape; the cost is the materialize-boundary discipline (operators must declare push-capable; segments end at any materialize-required op).
+
+The push-pattern itself isn't novel: Reactive Extensions (Rx) operates this way, pushing each element to `IObserver<'T>.OnNext`; LINQ-to-Objects is its pull-based dual, chaining `IEnumerator.MoveNext` calls rather than callbacks. The Zeta contribution is the SEGMENTED push (push within fusion-segments; materialize at segment boundaries to preserve Z-set algebra semantics) + capability-tag-driven segment detection.
+
+## Composes with rules
+
+- `.claude/rules/fsharp-anchor-dotnet-build-sanity-check.md` — F# compiler verifies the IPushOperator<'T> interface + push-pattern type-safety
+- `.claude/rules/m-acc-multi-oracle-end-user-moral-invariants.md` — push-segment optimization preserves multi-oracle parity (same canonical hex across operators; performance differs)
+- `.claude/rules/all-complexity-is-accidental-in-greenfield.md` — IPushOperator IS the answer when materialize-batch becomes the bottleneck (proven only by Phase 4 benchmark)
+- `.claude/rules/edge-defining-work-not-speculation.md` — segmented push-based DBSP is edge-defining work
+
+## Composes with substrate
+
+- B-0635 / B-0644 / B-0665 / B-0666 (Agora V6 substrate — push-pattern preserves operational primitives)
+- B-0688 (incremental compiler host — push-pattern composes with codegen at segment boundaries)
+- B-0693 (PR #7 morsel-based execution — push-pattern + morsel-pattern together = full hot-path optimization)
+- B-0694 (PR #8 standing-query codegen — codegen emits push-segment-fused IL)
+- B-0687 (ZetaParse — parser-substrate operators may benefit from push-pattern for streaming parse)
+- PR #4558 (capability tags — IsPushable is sibling to IsLinear/IsBilinear/IsSink)
+- PR #4560 (sink-terminality — sinks are segment-terminators by definition)
+- PR #4566 (FusionEngine — Phase 2 extends it with push-segment detection)
+- `src/Core/Fusion.fs` (existing FilterMap/Choose hand-fusion; push-pattern generalizes the principle)
+
+## Why P2
+
+Substantive architectural substrate; not blocking V1; high value (per-tick allocation floor escape unlocks meaningful throughput gains on hot pipelines); bounded by Otto-VSCode's 8-PR campaign sizing (~250 lines).
+
+## Origin
+
+Otto-VSCode 8-PR algebra-capability-system campaign 2026-05-21. Filed via Otto-CLI per Aaron-approved shadow* "file the 3 rows for PRs 6-8" instruction. Otto-VSCode owns the implementation; this row tracks the scope.
diff --git a/...g/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md b/...g/P2/B-0693-otto-vscode-pr7-morsel-span-execution-imorseloperator-2026-05-21.md
@@ -0,0 +1,131 @@
+---
+id: B-0693
+priority: P2
+status: open
+title: Morsel/span-based execution — IMorselOperator + cache-sized chunked processing (Otto-VSCode 8-PR campaign PR #7)
+tier: research-grade
+effort: L
+ask: otto-vscode 2026-05-21 (8-PR algebra-capability-system campaign; aaron-approved via shadow* "file the 3 rows for PRs 6-8")
+created: 2026-05-21
+last_updated: 2026-05-21
+depends_on: [B-0692]
+composes_with: [B-0635, B-0688, B-0694]
+tags: [morsel-execution, span-based, cache-sized-chunks, imorseloperator, otto-vscode-pr-7, dbsp-architecture, columnar-execution]
+type: research
+---
+
+# Morsel/span-based execution — IMorselOperator
+
+## Context
+
+Otto-VSCode 8-PR algebra-capability-system campaign 2026-05-21 PR #7. Depends on PR #6 (push-based hot-path; tracked at B-0692) — morsel-execution is the next-tier optimization that composes with push-pattern.
+
+## The architectural problem
+
+Even with push-based fusion (per B-0692), per-entry callbacks carry function-call overhead. For tight inner loops over large Z-sets, processing entries one at a time leaves cache + SIMD performance on the table. Modern columnar engines (DuckDB, Velox, Photon, Polars) batch-process entries in "morsels": cache-sized chunks (typically 4KB-64KB, sized to stay resident in L1/L2 cache). Batching:
+
+- Amortizes function-call overhead across N entries per call
+- Enables SIMD-vectorized predicate / projection / arithmetic
+- Improves cache locality (one chunk in L1 at a time)
+
+## The morsel pattern
+
+`IMorselOperator` processes `ReadOnlySpan<ZEntry<'T>>` chunks instead of individual entries:
+
+```fsharp
+type IMorselOperator<'T> =
+    abstract member ProcessMorsel: ReadOnlySpan<ZEntry<'T>> -> unit
+    abstract member EndTick: unit -> unit
+```
+
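+The intermediate "chunk" becomes a `Span<ZEntry<'T>>` over a pooled buffer: the span is a stack-only view (a byref-like struct that cannot be boxed or stored in a heap object), while the backing array is rented and reused rather than allocated per tick. Because the span cannot escape to the heap, the JIT has a better chance of inlining and optimizing chunk processing across method boundaries, though interface dispatch can still limit this. The intent is an F#/.NET analog of what rustc + LLVM give Rust for iterator chains.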
+
+## Scope
+
+### Phase 1 — `IMorselOperator<'T>` interface + morsel-buffer pool
+
+- Define `IMorselOperator<'T>` interface alongside `Op<'T>` (currently in `src/Core/Circuit.fs`; co-located with `IPushOperator<'T>` from B-0692)
+- Add `IsMorselCapable: bool` capability flag to Op<'T> (composes with PR #4558 pattern)
+- Morsel-buffer pool: per-thread buffers rented from `ArrayPool<ZEntry<'T>>`, with an L1/L2-cache-aware chunk size (default: N = 4KB / `sizeof<ZEntry<'T>>` entries per morsel)
+- MorselAdapter wraps both materialize-style and push-style operators
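+
+The default chunk-size arithmetic and the rent/return discipline can be sketched as follows. This is an illustration under assumptions, not the planned pool: `ZEntry<'T>` is given a hypothetical `Key` + `int64` weight layout, `entriesPerMorsel`, `rentMorsel`, and `returnMorsel` are made-up helper names, and `ArrayPool<T>.Shared` stands in for the per-thread pool this row describes.
+
+```fsharp
+open System.Buffers
+
+// Assumed entry shape: `Key` and the int64 weight type are hypothetical.
+[<Struct>]
+type ZEntry<'T> = { Key: 'T; Weight: int64 }
+
+/// Default morsel budget from this row: 4KB.
+let morselBytes = 4096
+
+/// N = 4KB / sizeof<ZEntry<'T>>, clamped so a morsel always holds at least one entry.
+let entriesPerMorsel<'T> () : int =
+    max 1 (morselBytes / sizeof<ZEntry<'T>>)
+
+/// Rent a buffer with room for one morsel. Rent may hand back a LARGER array
+/// than requested, so callers must track the used length themselves.
+let rentMorsel<'T> () : ZEntry<'T>[] =
+    ArrayPool<ZEntry<'T>>.Shared.Rent(entriesPerMorsel<'T> ())
+
+/// Hand the buffer back so the next morsel can reuse it.
+let returnMorsel (buffer: ZEntry<'T>[]) : unit =
+    ArrayPool<ZEntry<'T>>.Shared.Return(buffer)
+
+// Usage: rent, use the first N slots, always return in a finally block.
+let buffer = rentMorsel<int> ()
+try
+    printfn "requested %d entries, rented %d" (entriesPerMorsel<int> ()) buffer.Length
+finally
+    returnMorsel buffer
+```
+
+Because `sizeof` depends on the entry's payload type, the entries-per-morsel count differs per `'T`; a wide payload shrinks N, which is why the sketch clamps it to at least one.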
+
+### Phase 2 — Morsel-segment detection in FusionEngine
+
+Extend FusionEngine (per PR #4566 + Phase 2 of B-0692):
+
+- Sequence of `IsLinear AND IsPushable AND IsMorselCapable` operators is a morsel-segment candidate
+- Morsel-segment supersedes push-segment when ALL operators in chain support morsels
+- Falls back to push-segment if any operator is push-but-not-morsel-capable
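+
+The supersede-or-fall-back rule above amounts to a small classification over a candidate run. A minimal sketch, assuming hypothetical `OpCaps` and `SegmentKind` types (the real flags are members of `Op<'T>`):
+
+```fsharp
+// Hypothetical capability record mirroring the three flags named in this row.
+type OpCaps = { IsLinear: bool; IsPushable: bool; IsMorselCapable: bool }
+
+type SegmentKind =
+    | MorselSegment   // every op in the run supports morsels
+    | PushSegment     // at least one op is push-but-not-morsel-capable
+    | MaterializeOnly // the run is not push-eligible at all
+
+/// Classify a candidate run of operators.
+let classifyRun (run: OpCaps list) : SegmentKind =
+    let pushEligible = run |> List.forall (fun op -> op.IsLinear && op.IsPushable)
+    if List.isEmpty run || not pushEligible then MaterializeOnly
+    elif run |> List.forall (fun op -> op.IsMorselCapable) then MorselSegment
+    else PushSegment
+
+// Usage: one push-only op in the chain downgrades the whole run to a push-segment.
+let morselOp = { IsLinear = true; IsPushable = true; IsMorselCapable = true }
+let pushOnlyOp = { IsLinear = true; IsPushable = true; IsMorselCapable = false }
+let kind = classifyRun [ morselOp; pushOnlyOp; morselOp ]
+```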
+
+### Phase 3 — Morsel-versions of common linear ops
+
+- `MapMorselOp<'A,'B>` — processes full span; emits to output span
+- `FilterMorselOp<'T>` — predicate evaluation across full span; SIMD-eligible
+- `NegMorselOp<'T>` — weight negation across full span; trivially SIMD
+- Sort/consolidate at morsel boundaries (multi-morsel merge happens at segment end)
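+
+A minimal sketch of span-at-a-time filter and negation operators feeding a materializing sink. It is scalar on purpose: the loops are the shape a SIMD version would vectorize, but no vectorization is claimed here. The `Key` field, the `int64` weight type, `CollectMorselSink`, and the use of `ArrayPool<T>.Shared` for output buffers are all assumptions for illustration.
+
+```fsharp
+open System
+open System.Buffers
+
+// Assumed entry shape: `Key` and the int64 weight type are hypothetical.
+[<Struct>]
+type ZEntry<'T> = { Key: 'T; Weight: int64 }
+
+type IMorselOperator<'T> =
+    abstract member ProcessMorsel: ReadOnlySpan<ZEntry<'T>> -> unit
+    abstract member EndTick: unit -> unit
+
+type FilterMorselOp<'T>(pred: ZEntry<'T> -> bool, downstream: IMorselOperator<'T>) =
+    interface IMorselOperator<'T> with
+        member _.ProcessMorsel(morsel: ReadOnlySpan<ZEntry<'T>>) =
+            let pool = ArrayPool<ZEntry<'T>>.Shared
+            let buffer = pool.Rent(max 1 morsel.Length)
+            try
+                // Compact the surviving entries to the front of the rented buffer.
+                let mutable kept = 0
+                for i in 0 .. morsel.Length - 1 do
+                    let e = morsel.[i]
+                    if pred e then
+                        buffer.[kept] <- e
+                        kept <- kept + 1
+                downstream.ProcessMorsel(ReadOnlySpan<ZEntry<'T>>(buffer, 0, kept))
+            finally
+                pool.Return(buffer)
+        member _.EndTick() = downstream.EndTick()
+
+type NegMorselOp<'T>(downstream: IMorselOperator<'T>) =
+    interface IMorselOperator<'T> with
+        member _.ProcessMorsel(morsel: ReadOnlySpan<ZEntry<'T>>) =
+            let pool = ArrayPool<ZEntry<'T>>.Shared
+            let buffer = pool.Rent(max 1 morsel.Length)
+            try
+                for i in 0 .. morsel.Length - 1 do
+                    let e = morsel.[i]
+                    buffer.[i] <- { e with Weight = -e.Weight }
+                downstream.ProcessMorsel(ReadOnlySpan<ZEntry<'T>>(buffer, 0, morsel.Length))
+            finally
+                pool.Return(buffer)
+        member _.EndTick() = downstream.EndTick()
+
+/// Hypothetical segment boundary: copies each morsel out before the buffer is returned.
+type CollectMorselSink<'T>() =
+    let pending = ResizeArray<ZEntry<'T>>()
+    let mutable lastTick: ZEntry<'T>[] = [||]
+    member _.LastTick = lastTick
+    interface IMorselOperator<'T> with
+        member _.ProcessMorsel(morsel: ReadOnlySpan<ZEntry<'T>>) =
+            for i in 0 .. morsel.Length - 1 do
+                let e = morsel.[i]
+                pending.Add e
+        member _.EndTick() =
+            lastTick <- pending.ToArray()
+            pending.Clear()
+
+// Usage: keep even keys, then negate weights, one morsel per call.
+let sink = CollectMorselSink<int>()
+let neg = NegMorselOp<int>(sink :> IMorselOperator<int>) :> IMorselOperator<int>
+let segment =
+    FilterMorselOp<int>((fun (e: ZEntry<int>) -> e.Key % 2 = 0), neg) :> IMorselOperator<int>
+let input = [| for k in 1 .. 4 -> { Key = k; Weight = 1L } |]
+segment.ProcessMorsel(ReadOnlySpan<ZEntry<int>>(input))
+segment.EndTick()
+```
+
+Each operator returns its rented buffer only after the downstream call completes, so the span it passes on never outlives its backing array; the sink must therefore copy entries out rather than hold the span.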
+
+### Phase 4 — Benchmark + validation
+
+- BenchmarkDotNet job at `bench/Benchmarks/MorselExecutionBench.fs`:
+  - Materialize-baseline (3-op chain)
+  - Push-based (3-op chain; per B-0692)
+  - Morsel-based (3-op chain; this row)
+
+- Allocation: expected morsel allocates 1× per segment (matches push-based)
+- Throughput: expected morsel adds another 1.5-3× over push-based on SIMD-friendly inner loops (filter + arithmetic on int weights)
+
+## Acceptance
+
+### Phase 1
+
+- `IMorselOperator<'T>` interface lands
+- `IsMorselCapable` capability flag on Op<'T>
+- Morsel-buffer pool implementation
+- `dotnet build` clean; existing tests pass
+
+### Phase 2
+
+- FusionEngine recognizes morsel-segments
+- Morsel-segment supersedes push-segment when applicable
+
+### Phase 3
+
+- 3 morsel-versions of common ops land
+- Cross-verify: morsel-version output matches push-version + materialize-version
+
+### Phase 4
+
+- Benchmark validates throughput improvement over push-baseline
+- SIMD-eligibility documented per-op
+
+## Substrate-honest framing
+
+This is research-grade architectural substrate following the well-trodden columnar-execution path. The Zeta contribution is composing morsel-execution with the segmented-push pattern (B-0692) and the DBSP retraction-native algebra: morsel-execution preserves Z-set semantics within a segment; materialize boundaries at segment ends preserve the algebra-level discipline.
+
+The pattern itself isn't novel — DuckDB / Velox / Photon / Polars all do columnar-morsel execution. Zeta's contribution is the DBSP-segment-aware version + the capability-tag-driven segment detection.
+
+## Composes with rules
+
+- `.claude/rules/fsharp-anchor-dotnet-build-sanity-check.md` — F# compiler verifies the morsel interface + Span<T> safety
+- `.claude/rules/bandwidth-served-falsifier.md` — morsel-execution serves cache-bandwidth (entries-per-cache-line)
+- `.claude/rules/edge-defining-work-not-speculation.md` — composing morsel-execution with DBSP-segment-discipline is edge-defining
+
+## Composes with substrate
+
+- B-0635 / B-0644 / B-0665 / B-0666 (Agora V6 — morsel-pattern preserves operational primitives within segments)
+- B-0688 (incremental compiler host — codegen emits morsel-fused IL at hot segments)
+- B-0692 (PR #6 push-based — morsel-pattern is the next-tier optimization above push)
+- B-0694 (PR #8 standing-query codegen — codegen emits morsel-segment-fused IL)
+- PR #4558 (capability tags — IsMorselCapable sibling to IsLinear/IsBilinear/IsSink/IsPushable)
+- PR #4566 (FusionEngine — Phase 2 extends with morsel-segment detection)
+- DuckDB / Velox / Photon / Polars columnar-execution literature (external prior-art reference)
+
+## Why P2
+
+Substantive architectural substrate; not blocking V1; high value (SIMD + cache-locality unlocks throughput tier above push-based); bounded by Otto-VSCode's 8-PR campaign sizing (~350 lines).
+
+Depends on B-0692 (push-based) landing first: the morsel-pattern composes with the push-pattern rather than replacing it.
+
+## Origin
+
+Otto-VSCode 8-PR algebra-capability-system campaign 2026-05-21. Filed via Otto-CLI per Aaron-approved shadow* "file the 3 rows for PRs 6-8" instruction. Otto-VSCode owns the implementation; this row tracks the scope.