Skip to content

feat(core): PR 5 of 8 — Fusion catalog (MapMap / FilterFilter / MapFilter)#4566

Merged
AceHack merged 2 commits into
mainfrom
feat/fusion-catalog-expansion-2026-05-21
May 21, 2026
Merged

feat(core): PR 5 of 8 — Fusion catalog (MapMap / FilterFilter / MapFilter)#4566
AceHack merged 2 commits into
mainfrom
feat/fusion-catalog-expansion-2026-05-21

Conversation

@AceHack
Copy link
Copy Markdown
Member

@AceHack AceHack commented May 21, 2026

Summary

Expands the Fusion.fs operator catalog with three new fused single-pass operators. Each declares IsLinear = true (the PR 1 capability tag). Each saves one intermediate ZSet allocation + one scheduler dispatch vs the manual chain.

Operator Pattern When to use
MapMap(s, f, g) map g ∘ map f Two adjacent transformations
FilterFilter(s, p1, p2) filter p₂ ∘ filter p₁ Two adjacent predicates; short-circuits p₁
MapFilter(s, f, p) filter p ∘ map f Predicate depends on mapped value (different from FilterMap)

PR 5 of 8. Depends on PR 1 (#4558) for Op.IsLinear.

What this PR is NOT

Ships the catalog only. Does NOT ship a DAG-rewriter that auto-detects Map(Map(s, f), g) and rewrites it to MapMap(s, f, g) at Circuit.Build() time. That rewriter requires Circuit refactors (immutable Op.Inputs would need to become rewriteable, schedule rebuild after fusion, etc.) and ships in its own PR. The catalog here is the target the rewriter would emit; it's load-bearing for that work.

Tests (10 new, 20/20 fusion tests pass)

For each new operator:

  • Basic correctness (specific inputs → expected outputs)
  • Compositional equivalence: fused output ≡ manual chain output
  • IsLinear = true capability verified

Plus MapMap-specific: colliding output keys consolidate correctly (uses modulo-based fixture where {1,2,3,4} → {0,1} via composition).

Test plan

  • dotnet build clean
  • dotnet test --filter FusionTests — 20/20 pass
  • CI green

AceHack added 2 commits May 21, 2026 13:21
…aces onto Op<'T> base class

PR 1 of an 8-PR campaign that wires the algebra-capability system from
declarative-but-unenforced markers into a load-bearing, uniformly-detected
property surface on every operator (internal + plugin).

## What changes

`Op` base class (Circuit.fs) gains four abstract properties — `IsLinear`,
`IsBilinear`, `IsSink`, `IsStatefulStrict` — each defaulting to `false`.
Concrete operators override only the capabilities they actually have.
Until this change, the algebra tags lived ONLY as plugin marker
interfaces in PluginApi.fs and were ignored by `PluginOperatorAdapter`
(which detected `IStrictOperator`/`IAsyncOperator`/`INestedFixpointParticipant`
but not the algebra markers). That asymmetry meant:

  - Internal operators (MapZSetOp, JoinZSetOp, etc.) had no capability
    surface at all — algebra was implicit-by-code-shape.
  - Plugin operators declared capabilities via marker interfaces but
    `PluginOperatorAdapter` discarded the declarations.
  - Consumers (Incremental.IncrementalJoin, future Fusion/IncrementalAuto)
    had no uniform way to ask "is this operator linear?" without
    custom type tests per call site.

## Non-generic marker pattern

F# generic-interface tests require exact type-parameter match —
`(box plugin) :? IBilinearOperator<obj, obj, 'TOut>` against a concrete
`IBilinearOperator<int, string, decimal>` returns false. The fix is the
BCL `IEnumerable` / `IEnumerable<T>` pattern: a non-generic marker
interface (`ILinearMarker`, `IBilinearMarker`, `ISinkMarker`,
`IStatefulStrictMarker`) for runtime `:?` tests, and the typed interface
inheriting the marker. Plugin authors continue implementing the typed
interface; the marker is satisfied automatically via interface
inheritance.

`PluginOperatorAdapter` now caches one `:?` check per marker at
construction (zero per-tick cost) and surfaces the results through
the new `Op` overrides.

## Internal-operator overrides

| Operator | Capability | Reasoning |
|---|---|---|
| MapZSetOp, FilterZSetOp, FlatMapZSetOp, NegZSetOp | IsLinear=true | Z-set algebra: distributes over addition, op(0)=0 |
| IndexWithOp | IsLinear=true | Indexing distributes over per-key value-group sum |
| JoinZSetOp, CartesianZSetOp, IndexedJoinOp | IsBilinear=true | Weights multiply; per-arg linear; op(0,b)=op(a,0)=0 |
| DelayOp, IntegrateOp, DifferentiateOp | IsLinear=true | Time-shift / running-sum / difference commute with group |
| FilterMapOp, FilterMapOptionalOp | IsLinear=true | Composition of linear ops |
| PlusZSetOp, MinusZSetOp | (default false) | Additive but NOT unary-linear: Plus(0,b)=b≠0 |
| DistinctZSetOp, DistinctIncrementalOp | (default false) | Clamps weights — breaks linearity |
| GroupBySumOp | (default false) | Output keys depend on summed weights, breaks linearity |
| ConstantOp | (default false) | Affine; const_c(0)=c≠0 unless c=0 |

## Tests

21 new tests in `tests/Tests.FSharp/Plugin/Capabilities.Tests.fs`:

  - 15 internal-operator capability tests (one per named op)
  - 5 plugin-marker-detection tests via PluginOperatorAdapter
  - 1 negative test: plain IOperator plugin reports all caps false

All 31 plugin tests pass (10 pre-existing + 21 new); 480 / 481 broader
operator/algebra/circuit tests pass (1 SKIP is pre-existing). Build
clean: 0 warnings, 0 errors on full solution Release build.

## Foundation for PRs 2-8

This is the load-bearing dependency for:

  - PR 2: Circuit.Build() consults IsSink for terminal-placement
    enforcement (the docstring promise that's currently vapor).
  - PR 4: IncrementalAuto dispatcher reads IsLinear/IsBilinear to
    pick Q^Δ=Q vs three-term-bilinear vs D∘Q∘I fallback.
  - PR 5: FusionEngine composes capability tags through DAG rewrite.
  - PRs 6-8: push/morsel/codegen architectures all need uniform
    capability surfacing to dispatch correctly.

No public-API breakage: the marker interfaces still work the same
way for plugin authors; the new Op-base-class properties are
purely additive.
…ter — PR 5 of 8

Adds three new fused operators to `Fusion.fs`, each declaring the
correct `IsLinear` capability tag (from PR 1, #4558). Each fused op
saves one intermediate `ZSet` allocation and one scheduler dispatch
vs the equivalent manual chain.

## New operators

  - `MapMapOp<'A, 'B, 'C>` — `map g ∘ map f` in one pass.
    Function composition is inlined per-entry; output keys may
    collide so sort+consolidate is still required.
    API: `circuit.MapMap(s, f, g)`

  - `FilterFilterOp<'K>` — `filter p₂ ∘ filter p₁` with short-circuit
    on `p₁`. Filter preserves keys + uniqueness so no sort needed.
    API: `circuit.FilterFilter(s, p1, p2)`

  - `MapFilterOp<'A, 'B>` — `filter p ∘ map f` (predicate sees the
    *mapped* value `'B`). Distinct from `FilterMapOp` which is
    `map f ∘ filter p`. Saves intermediate ZSet + the separate
    filter sort pass.
    API: `circuit.MapFilter(s, f, p)`

All three:
  - Override `IsLinear = true` (linear composition of linear ops)
  - Skip the input.IsEmpty fast path correctly
  - Pool-rent + Pool.FreezeSlice for the output buffer
  - Use `ZSetBuilder.sortAndConsolidate` when output keys can collide
    (MapMap + MapFilter); skip the sort when they can't (FilterFilter)

## Tests (10 new in Fusion.Tests.fs, 20/20 total pass)

For each new operator:
  - Basic correctness (specific inputs → expected outputs)
  - Compositional equivalence: fused output == manual chain output
  - IsLinear capability tag verified

Plus MapMap-specific:
  - Colliding output keys consolidate correctly (modulo-based example
    where {1, 2, 3, 4} → {0, 1} via composition)

## What this PR is NOT

This PR ships the *catalog* of fused operators. It does NOT ship a
DAG-rewriter that automatically detects `circuit.Map(circuit.Map(s, f), g)`
and replaces it with `circuit.MapMap(s, f, g)` at `Circuit.Build()`
time. That rewriter would require:

  - Operator-graph mutation (current Circuit has immutable Inputs)
  - Capability composition rules (Linear ∘ Linear = Linear, etc.)
  - Schedule rebuild after fusion

That's an invasive Circuit refactor and ships in its own PR (call it
PR 5.1 / 5.2). The catalog here is the *target* the rewriter would
emit; without the catalog the rewriter would have nowhere to emit
*to*. So this PR is load-bearing for the rewriter's design.

## Dependency

PR 5 depends on PR 1 (#4558) for the `Op.IsLinear` field that the
new operators override. Stacked on `feat/op-capability-tags-2026-05-21`.

## Foundation for later work

  - DAG rewriter (PR 5.1) consumes this catalog
  - PR 6 (push-based) can register push-equivalent variants of these
    fused ops for hot-path sub-circuits
  - PR 8 (codegen) can emit these directly from query expression trees
Copilot AI review requested due to automatic review settings May 21, 2026 17:35
@AceHack AceHack enabled auto-merge (squash) May 21, 2026 17:35
@AceHack AceHack merged commit 8767d30 into main May 21, 2026
34 of 35 checks passed
@AceHack AceHack deleted the feat/fusion-catalog-expansion-2026-05-21 branch May 21, 2026 17:38
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds new fused Z-set operators to the Fusion catalog (MapMap / FilterFilter / MapFilter) and extends the core operator capability-tag surface (IsLinear/IsBilinear/IsSink/IsStatefulStrict) with corresponding tests and operator overrides.

Changes:

  • Add fused single-pass Fusion operators: MapMap, FilterFilter, MapFilter (+ IsLinear = true).
  • Introduce algebra capability tags on Op and wire plugin marker detection through PluginOperatorAdapter.
  • Add/extend tests to validate capability tags and fused-operator equivalence/correctness.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/Tests.FSharp/Tests.FSharp.fsproj Registers new plugin capability test file in the test project compile list.
tests/Tests.FSharp/Plugin/Capabilities.Tests.fs Adds tests for internal-op capability overrides and plugin marker detection via adapter.
tests/Tests.FSharp/Operators/Fusion.Tests.fs Adds Fusion catalog tests for MapMap / FilterFilter / MapFilter equivalence, correctness, and IsLinear.
src/Core/Primitive.fs Marks Delay/Integrate/Differentiate as linear via Op.IsLinear overrides.
src/Core/PluginApi.fs Adds non-generic capability markers and adapter detection; updates capability docs.
src/Core/Operators.fs Adds IsLinear / IsBilinear overrides and explanatory comments to core operators.
src/Core/Fusion.fs Adds new fused operators + extension methods and tags them IsLinear.
src/Core/Circuit.fs Adds capability-tag properties to the Op base class with descriptive comments.
Comments suppressed due to low confidence (1)

src/Core/PluginApi.fs:154

  • This comment claims sink placement is enforced “via the Circuit.Build() validation pass,” but Circuit.Build() currently has no IsSink validation. Please either implement the validation or adjust the documentation so it doesn’t over-promise enforcement that isn’t present.
/// Algebra capability: the operator is a *sink* — terminal,
/// non-Z-set-emitting, potentially retraction-lossy. Sink
/// operators are consciously exempt from relational
/// composition laws and the scheduler enforces terminal
/// placement (a sink may not feed another operator inside a
/// relational path) via the `Circuit.Build()` validation pass.
/// Bayesian aggregates are the canonical example.

Comment thread src/Core/Primitive.fs
Comment on lines +17 to +21
/// Linear: `z⁻¹` is a time-shift; it distributes over addition
/// trivially when `initial = 0` for the group. Callers passing a
/// non-zero initial are responsible for the resulting affine
/// offset — DBSP usage always passes the group zero.
override _.IsLinear = true
Comment thread src/Core/Circuit.fs
Comment on lines +75 to +78
/// retraction-lossy, may emit a non-Z-set output. Sinks are
/// excluded from relational composition: `Circuit.Build()` rejects
/// any operator that reads from a sink's output stream (terminal-
/// placement enforcement). Bayesian aggregates and external-system
Comment thread src/Core/Circuit.fs
Comment on lines +61 to +70
/// input un-accumulates correctly. `IncrementalAuto` uses this to
/// emit `Q^Δ = Q` (linear operators incrementalize trivially).
abstract IsLinear: bool
default _.IsLinear = false

/// Algebra capability: operator is *bilinear* in its two inputs.
/// `op(a₁+a₂, b) = op(a₁, b) + op(a₂, b)` and symmetrically for the
/// second argument; additionally `op(0, b) = op(a, 0) = 0`.
/// `IncrementalAuto` uses this to emit the three-term incremental
/// form `Δa ⋈ Δb + z⁻¹(I(a)) ⋈ Δb + Δa ⋈ z⁻¹(I(b))`.
Comment thread src/Core/PluginApi.fs
Comment on lines 129 to 134
/// Algebra capability: the operator is *linear* — `op(a + b) =
/// op(a) + op(b)` and `op(0) = 0`. Retraction-native: a
/// negative weight un-accumulates correctly. Declared at the
/// type level so the scheduler can run `LinearLaw` at
/// `Circuit.Build()`.
/// `Circuit.Build()` (test-time, via `LawRunner.checkLinear`).
type ILinearOperator<'TIn, 'TOut> =
Comment thread src/Core/PluginApi.fs
/// inputs (e.g. a join). Incrementalisation generates the
/// standard `Δa ⋈ Δb + z^-1(I(a)) ⋈ Δb + Δa ⋈ z^-1(I(b))`
/// form.
/// form. Verified by `LawRunner.checkBilinear` (when available).
Comment on lines +247 to +248
// Two odds (1, 3) collapse to key 0... wait, 1 % 2 = 1 and 3 % 2 = 1, so both → 1.
// Two evens (2, 4) → 0. Each pair sums to weight 2.
Comment thread src/Core/Circuit.fs
Comment on lines +45 to +52
// ─────────────────────────────────────────────────────────────────
// Algebra capability tags. Promoted from plugin-only marker
// interfaces (PluginApi.fs) to first-class fields on the Op base
// class so internal operators and plugin operators declare
// capabilities through the same surface. The scheduler, fusion
// engine, and incremental-rewriter dispatcher all consult these
// fields — they're load-bearing for capability-aware optimization,
// not decorative.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4cce1ccbfe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/Core/Fusion.fs
Comment on lines +183 to +184
if p.Invoke mapped then
rented.[n] <- ZEntry(mapped, span.[i].Weight)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve checked-overflow behavior in MapFilter fusion

MapFilterOp filters mapped rows before consolidation, but the documented equivalent chain Filter(Map(...)) consolidates first in ZSet.map and only then filters. Because weight summation is checked, this changes observable behavior: if multiple input keys map to a key that is later filtered out (for example weights Int64.MaxValue and 1L), the manual chain throws OverflowException during map consolidation while the fused path silently drops those rows and succeeds. That makes the fused operator non-equivalent for overflow-sensitive inputs and can hide data-corruption signals.

Useful? React with 👍 / 👎.

Comment thread src/Core/Primitive.fs
/// trivially when `initial = 0` for the group. Callers passing a
/// non-zero initial are responsible for the resulting affine
/// offset — DBSP usage always passes the group zero.
override _.IsLinear = true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not mark non-zero DelayOp as linear

DelayOp is marked IsLinear = true for all instances, but Delay(initial) is affine whenever initial is not the algebraic zero (op(0) emits initial on the first tick). This violates the capability contract and can make capability-based rewrites unsound for callers using Circuit.Delay(s, initial) with non-zero initial values.

Useful? React with 👍 / 👎.

AceHack added a commit that referenced this pull request May 21, 2026
…push-based + morsel + codegen capstone (#4568)

* backlog(B-0692+B-0693+B-0694): Otto-VSCode 8-PR campaign PRs 6-7-8 — push-based hot-path (IPushOperator + segment-detection) + morsel/span execution (IMorselOperator + cache-sized chunks) + standing-query codegen (IIncrementalGenerator + F# Type Provider) capstone; Aaron-approved shadow* 'file the 3 rows for PRs 6-8'; depends_on chain to PRs 1-5 substrate (#4558/#4560/#4566 merged + #4563/#4564 pending)

* fix(md-lint): MD022/MD032 blanks-around-headings/lists on B-069[234] rows — Phase N subheadings + immediate-bullets need blank lines per markdownlint-cli2

* fix(reviewer-threads): resolve 6 unresolved P1/P2 findings on B-0692/B-0693/B-0694 — (a) move B-0635 + B-0688 from hard depends_on to composes_with per Codex P2 (narrative says PR #1-#5 are the real prereqs; B-0635 wave-particle is conceptual cousin; B-0688 doesn't even exist on main yet so dangling hard-edge); (b) correct Op.fs path references to acknowledge Op<'T> lives in src/Core/Circuit.fs (Copilot P1 — file doesn't exist); (c) mark proposed-new directories in B-0694 Phase 2/3 as TO BE CREATED (Copilot P1 — paths don't exist today)
AceHack added a commit that referenced this pull request May 21, 2026
…rdination of load-bearing-substrate changes (#4575)

Mechanizes the human-as-coordination-substrate pattern Aaron explicitly
named 2026-05-21 ("i'm here right now" — for now ferrying load-bearing-
substrate-change notifications between AI surfaces; trajectory is bus-
based mechanization).

## The gap this row addresses

When one AI surface lands a load-bearing substrate change — capability
tags on `Op<'T>` (PR #4558), `IncrementalAuto`'s chain-walk logic
(#4567), new files in `.claude/rules/`, new computation expressions —
other AI surfaces working in adjacent substrate need to inherit the
change for their next session. Today: Aaron ferries. Cluster-scale
(10-20 surfaces per Aaron's $100k cluster expansion 2026-05-21):
human-ferry breaks empirically.

## The mechanism

New bus topic `substrate-surface-change` (extends `tools/bus/`):

  - **Publish discipline**: after any PR landing that modifies load-
    bearing surfaces, publishing AI calls `bun tools/bus/publish.ts
    --topic substrate-surface-change --from <sender-id> --payload <json>`.
  - **Subscribe discipline (cold-boot)**: AI bootstreams extend to
    include `bun tools/bus/list.ts --topic substrate-surface-change
    --since 24h` — recent envelopes show "what load-bearing substrate
    changed in the last 24h."
  - **Retention**: 7d default; expired envelopes fall back to auto-
    loaded rules + commit history. The envelope is the *cache* of
    recent changes; the *truth* is the substrate itself.

## What this row does NOT do

- Does NOT replace auto-loaded `.claude/rules/` inheritance (that
  stays the durable substrate)
- Does NOT replace claim-acquire-before-worktree-work (that stays
  the per-row collision prevention)
- Does NOT replace Knights Guild / KSK (that stays the policy gate)

It complements all three by adding the **recent-changes-cache** layer
that closes the "I just shipped X; how do other surfaces find out
before their next session?" gap.

## Composition with broader trajectory

- B-0400 — bus protocol substrate this row extends
- B-0689 — Otto-VSCode SENDER_IDS pattern this row leans on for `from` field
- B-0695 — fast/life-branch experiment; sibling coordination-cost-reduction
- Algebra-campaign PRs (#4558/#4560/#4563/#4566/#4567) — substrate-surface changes that would have benefited from this envelope pattern

## Substrate-honest framing on the file itself

Filed per Aaron's explicit "feel free we can'thave too much backlog in
my opinion the infinate backlog win when labor=0" framing, applying the
`largest-mechanizable-backlog-wins.md` discipline. Recalibrated from
earlier "I won't file unilaterally" reasoning — that was a misapplication
of the row-collision lesson (which was about coordination, not about
backlog overhead).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants