Skip to content

feat(core): PR 3 of 8 — LawRunner.checkBilinear + PluginHarness.runTwoInputs#4563

Merged
AceHack merged 2 commits into
mainfrom
feat/lawrunner-checkbilinear-2026-05-21
May 21, 2026
Merged

feat(core): PR 3 of 8 — LawRunner.checkBilinear + PluginHarness.runTwoInputs#4563
AceHack merged 2 commits into
mainfrom
feat/lawrunner-checkbilinear-2026-05-21

Conversation

@AceHack
Copy link
Copy Markdown
Member

@AceHack AceHack commented May 21, 2026

Summary

Closes the algebra-tag verification gap on IBilinearOperator. Until this PR, LawRunner had checkLinear and checkRetractionCompleteness but no checkBilinear — meaning the 3-term incremental-join rewrite in Incremental.IncrementalJoin trusted the bilinear tag without any test-time way to verify it.

PR 3 of 8. Independent of PR 1 (#4558) and PR 2 (#4560) — ships off main directly.

New surface

PluginHarness.runTwoInputs

Drives a two-input plugin operator through paired input sequences in lock-step (Seq.zip semantics — shorter sequence wins). Mirrors runSingleInput with two HarnessSourceOp sources + adapter; same exactly-one-Publish discipline.

LawRunner.checkBilinear

Tests three sub-properties per sample:

  • L1op(a₁+a₂, b) ≡ op(a₁, b) + op(a₂, b) (left-linearity)
  • L2op(a, b₁+b₂) ≡ op(a, b₁) + op(a, b₂) (right-linearity)
  • L3op(-a, b) ≡ -op(a, b) (sign-distribution)

Generates four traces (A₁, A₂, B₁, B₂), runs the operator through six combinations (A₁B₁, A₂B₁, ASumB₁, A₁B₂, A₁BSum, ANegB₁), checks the three laws per tick. (seed, sampleIndex)-reproducibility and Result<unit, LawViolation> return shape match checkLinear.

Math note on L3

Over an abelian group with standard addition (the int and ZSet<'K> case), L1 + L2 imply L3 — setting a₁=a, a₂=-a in L1 gives op(0, b) = op(a, b) + op(-a, b), so op(0, b) = 0 collapses to L3. In that regime L3 is the cleanup law; the affine offset that breaks L3 also breaks L1.

L3 becomes load-bearing when the user supplies a non-abelian-group (addOut, negOut) pair (e.g. a monoid where negOut isn't truly inverse). Checking all three keeps checkBilinear correct across the full range of 'TOut algebras.

Tests (6 new, 15/15 LawRunner pass)

  • BilinearMultOp (integer multiplication) — passes
  • LinearOffsetLiar ((a+b)*2) — catches L1
  • AffineBilinearLiar (a*b + 7) — catches L1 (with docstring explaining why L3-only failure doesn't exist over int)
  • Reproducibility on same seed
  • Bad-args validation
  • runTwoInputs lock-step + truncation behavior

Foundation for later PRs

PR 4's IncrementalAuto dispatcher will read Op.IsBilinear to pick the three-term-bilinear rewrite. A debug-mode Circuit.Build() hook can run checkBilinear against any operator claiming the tag, catching tag-vs-implementation drift early.

Test plan

  • dotnet build clean
  • dotnet test --filter LawRunnerTests — 15/15 pass (9 pre-existing + 6 new)
  • CI green

… 3 of 8

Closes the algebra-tag verification gap on `IBilinearOperator`. Until
this PR, `LawRunner` had `checkLinear` and `checkRetractionCompleteness`
but no `checkBilinear` — meaning the 3-term incremental-join rewrite in
`Incremental.IncrementalJoin` trusted the bilinear tag without any
test-time way to verify it.

## New surface

### `PluginHarness.runTwoInputs`

Drives a two-input plugin operator through paired input sequences in
lock-step (`Seq.zip` semantics — shorter sequence wins). Two
`HarnessSourceOp` sources at synthetic ids 0+1, adapter at id 2;
mirrors the `runSingleInput` shape so the per-tick publish-counter
discipline is identical.

### `LawRunner.checkBilinear`

Tests three sub-properties per sample:

  L1 — `op(a₁+a₂, b) ≡ op(a₁, b) + op(a₂, b)`   (left-linearity)
  L2 — `op(a, b₁+b₂) ≡ op(a, b₁) + op(a, b₂)`   (right-linearity)
  L3 — `op(-a, b) ≡ -op(a, b)`                   (sign-distribution)

The check generates four independent traces (A₁, A₂, B₁, B₂), runs the
operator through six combinations (A₁B₁, A₂B₁, ASumB₁, A₁B₂, A₁BSum,
ANegB₁), and checks the three laws per tick. Same `(seed,
sampleIndex)`-reproducibility shape as `checkLinear`; same `Result<unit,
LawViolation>` return type.

## Math note on L3 (corrected from initial sketch)

Over an abelian group with standard addition (the case for `int` and
`ZSet<'K>`), L1 + L2 *imply* L3: setting `a₁ = a, a₂ = -a` in L1 gives
`op(0, b) = op(a, b) + op(-a, b)`, so the classical bilinear condition
`op(0, b) = 0` collapses to L3. In that regime L3 is the cleanup law —
an affine offset like `op(a, b) = a*b + c` breaks L1 first (the
constant lands once on LHS, twice on RHS), so the L3-only failure mode
doesn't exist over `int`.

L3 becomes load-bearing — not redundant — when the user supplies a
non-abelian-group `(addOut, negOut)` pair (e.g. a monoid where
`negOut` isn't truly inverse). Checking all three keeps
`checkBilinear` correct across the full range of `'TOut` algebras a
plugin author might supply, not just `int` / `ZSet<_>`.

## Tests (6 new in LawRunner.Tests.fs, 15/15 total pass)

- `BilinearMultOp` (genuine integer multiplication): passes
- `LinearOffsetLiar` (`op(a,b) = (a+b)*2`): catches L1 violation
- `AffineBilinearLiar` (`op(a,b) = a*b + 7`): catches L1 (the L3
  failure that I'd originally claimed was unique to this fixture
  doesn't exist over `int`; the docstring + test docstring now
  explain the math)
- Reproducibility on same seed
- Bad-args validation (samples and scheduleLength)
- `runTwoInputs` lock-step behavior + truncation to shorter input

15/15 LawRunner tests pass, no regressions in broader plugin tests.
Build clean.

## Independence

PR 3 does NOT depend on PR 1 (#4558) or PR 2 (#4560) — `checkBilinear`
verifies bilinearity by trace-comparison, not by reading `Op.IsBilinear`.
This means PR 3 can ship orthogonally; targets `main` directly.

## Foundation for later PRs

PR 4's `IncrementalAuto` dispatcher will read `Op.IsBilinear` (from PR
1) to pick the three-term-bilinear rewrite. A debug-mode build hook
(planned for PR 4 or later) can run `checkBilinear` against any
operator claiming the tag, catching tag-vs-implementation drift early.
Copilot AI review requested due to automatic review settings May 21, 2026 17:28
@AceHack AceHack enabled auto-merge (squash) May 21, 2026 17:28
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e410a651b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/Core/PluginHarness.fs
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds missing bilinearity verification support for plugin operators by extending the test harness to drive two-input plugins and introducing a new LawRunner.checkBilinear check, with accompanying unit tests to validate both passing and failing fixtures.

Changes:

  • Introduces PluginHarness.runTwoInputs to run two-input IOperator<'TOut> plugins over paired input sequences (zip/lock-step semantics) with the existing exactly-one-Publish discipline.
  • Adds LawRunner.checkBilinear to validate left-linearity (L1), right-linearity (L2), and sign distribution (L3) across generated traces with deterministic (seed, sampleIndex) reproducibility.
  • Adds new LawRunner/PluginHarness tests covering bilinear success, failure modes, bad-args validation, and runTwoInputs truncation behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tests/Tests.FSharp/Plugin/LawRunner.Tests.fs Adds bilinearity fixtures/tests and runTwoInputs behavior tests.
src/Core/PluginHarness.fs Adds runTwoInputs helper mirroring runSingleInput behavior for two-input plugins.
src/Core/LawRunner.fs Adds checkBilinear law runner for IBilinearOperator-tag verification.
Comments suppressed due to low confidence (4)

tests/Tests.FSharp/Plugin/LawRunner.Tests.fs:250

  • This docstring says the affine-constant fixture "Satisfies L1 + L2" and only fails L3, but with standard integer addition the constant offset causes L1 to fail first (as noted later in the test). Update the wording here so the fixture description matches the actual algebra (and the expected failure ordering).
/// **L3 liar** — the load-bearing failure case. Satisfies L1 + L2
/// (because the constant offset is independent of inputs), but
/// fails L3:

src/Core/LawRunner.fs:168

  • The rationale for only checking L3 on the first argument is currently not well-supported: "L2 + L3-on-first + abelian-group structure imply L3-on-second" does not follow (sign distribution on the second input is about the input2 additive inverse, which isn't part of the current parameters). Either accept a negIn2 parameter and check sign-distribution on the second argument too, or adjust the comment to avoid claiming an implication that isn't justified by the provided operations.
    /// - `negIn1` — required for the L3 sign-distribution check on
    ///   the first argument. We test L3 only on the first argument
    ///   because L2 + L3-on-first + abelian-group structure imply
    ///   L3-on-second; the asymmetry is intentional, mirroring what
    ///   `IncrementalJoin` actually requires.

src/Core/LawRunner.fs:163

  • Doc inconsistency: the scheduleLength bullet says longer traces exercise stateful bilinear ops, but the implementation later says the law is for "STATELESS bilinearity per tick". Please reconcile these comments so it's clear whether stateful operators are intended to be in-scope for this check, and (if not) avoid implying longer schedules help exercise stateful behavior.
    /// - `scheduleLength` — ticks per trace. Each per-argument trace
    ///   uses one fresh draw per tick, so longer schedules exercise
    ///   stateful bilinear ops (e.g. windowed joins).

src/Core/LawRunner.fs:204

  • This comment says the bilinear law here applies to "STATELESS bilinearity per tick" and that stateful ops need a separate formulation. Given the check compares full output traces from fresh op instances (similar to checkLinear), this reads misleading—stateful stream operators can still be linear/bilinear as functions of their input histories. Please clarify the intended scope and adjust the wording accordingly.
                // Run all the per-arg cases through the same operator
                // factory. Each run gets a fresh op instance — the
                // bilinear law applies to STATELESS bilinearity per
                // tick; stateful bilinear ops need a stronger
                // formulation (a separate law, not in scope here).

Comment thread tests/Tests.FSharp/Plugin/LawRunner.Tests.fs Outdated
Comment thread src/Core/LawRunner.fs Outdated
AceHack added a commit that referenced this pull request May 21, 2026
…ed; transitive coverage via cross-verify; unblocks 4 stale PRs (#4570)

* fix(stryker): retarget from F# Core.fsproj to C# Core.CSharp.ZetaId.csproj — Stryker.NET 4.x F# unsupported (NotSupportedException Language not supported Fsharp); C# mutation coverage transitively constrains F# substrate via cross-verify harness (PRs #4517 + #4522 + #4548 byte-identical hex agreement); F# coverage remains rich via FsCheck + TLA+ + Z3 + Lean + Alloy + CodeQL; unblocks 4 stale PRs (#4563/#4567/#4568/#4569 BLOCKED on pre-existing F# Stryker bug)

* fix(stryker-pr-4570): expand path filter to include src/Core/** (Tests.CSharp.csproj transitive F# reference per Codex P2) + tests/cross-verification/zeta-id/** (CrossVerifyTests reads vectors.yaml per Copilot P1)
AceHack added a commit that referenced this pull request May 21, 2026
…push-based + morsel + codegen capstone (#4568)

* backlog(B-0692+B-0693+B-0694): Otto-VSCode 8-PR campaign PRs 6-7-8 — push-based hot-path (IPushOperator + segment-detection) + morsel/span execution (IMorselOperator + cache-sized chunks) + standing-query codegen (IIncrementalGenerator + F# Type Provider) capstone; Aaron-approved shadow* 'file the 3 rows for PRs 6-8'; depends_on chain to PRs 1-5 substrate (#4558/#4560/#4566 merged + #4563/#4564 pending)

* fix(md-lint): MD022/MD032 blanks-around-headings/lists on B-069[234] rows — Phase N subheadings + immediate-bullets need blank lines per markdownlint-cli2

* fix(reviewer-threads): resolve 6 unresolved P1/P2 findings on B-0692/B-0693/B-0694 — (a) move B-0635 + B-0688 from hard depends_on to composes_with per Codex P2 (narrative says PR #1-#5 are the real prereqs; B-0635 wave-particle is conceptual cousin; B-0688 doesn't even exist on main yet so dangling hard-edge); (b) correct Op.fs path references to acknowledge Op<'T> lives in src/Core/Circuit.fs (Copilot P1 — file doesn't exist); (c) mark proposed-new directories in B-0694 Phase 2/3 as TO BE CREATED (Copilot P1 — paths don't exist today)
@AceHack
Copy link
Copy Markdown
Member Author

AceHack commented May 21, 2026

Coordination signal from Otto-CLI:

Stryker workflow blocker: I just shipped PR #4571 (iteration-2 retarget). #4570's iteration-1 landed with 0%-kill-rate target (Core.CSharp.ZetaId.csproj had no .NET test projects directly referencing it). #4571 retargets to Core.CSharp.csproj + drops break-threshold to 0 (observational mode; still records signal + uploads HTML). Once #4571 merges, the cached Stryker.NET mutation test (Core.fsproj) failure on this PR becomes irrelevant (the workflow re-runs against the new C# target on the next push to this branch).

Remaining unresolved review threads: these are substantive P1 findings on your LawRunner / IncrementalAuto F# substrate — they're not Stryker-blocked, they're correctness-concerned. Codex flagged PluginHarness.runTwoInputs post-step-hook invocation + IncrementalAuto probe semantics; Copilot flagged docstring claims about monoid/group inverses + test naming mismatches. These need your domain attention, not workflow fixes. Surfacing so the path forward is visible: Stryker bottleneck is on me; substantive threads are yours.

…docstring math corrections

Addresses 3 reviewer findings on PR #4563:

1. **Codex P1** (PluginHarness.fs): `runTwoInputs` (and `runSingleInput`,
   pre-existing same gap) advanced plugin operators with `StepAsync` but
   never called `AfterStepAsync` — so any plugin implementing
   `IStrictOperator` exercised semantics differing from
   `Circuit.Step/StepAsync`, with strict state never committing between
   ticks. `LawRunner.checkBilinear` on strict ops would silently validate
   against incorrect state. **Fix**: mirror the Circuit's post-step
   hook — call `AfterStepAsync` after each tick's `StepAsync` completes.
   Same fix applied to both `runSingleInput` and `runTwoInputs` (pattern
   parity).

2. **Copilot** (LawRunner.fs:158): docstring described the
   `(addOut, negOut)` pair as a "monoid" with non-inverse `negOut` —
   monoids by definition lack inverses, so the example was malformed.
   **Fix**: rephrase to "caller-supplied pair that doesn't actually form
   an abelian group" (negOut might not be a true inverse; operations
   might not be associative/commutative; hidden state might creep in).
   Same point, correct algebra.

3. **Copilot** (LawRunner.Tests.fs:218 + 248): the comment block claimed
   *"The interesting failure case is L3: a plugin can satisfy L1+L2 while
   smuggling an additive offset (op(a,b) = a*b + c)"* — but over the
   abelian-group output type (`int`), any constant offset breaks L1
   first (constant lands once on LHS, twice on RHS). The framing was
   mathematically incorrect. **Fix**: reword both the bilinearity-
   fixtures intro comment AND the `AffineBilinearLiar` fixture docstring
   to reflect that L1 trips first over abelian groups; L3 is the
   load-bearing law only over non-abelian-group output types. Aligns
   with the math note already in `LawRunner.checkBilinear`'s docstring.

## Tests

18/18 LawRunner + Harness tests pass after fixes. Build clean. No
behavioral changes to test outputs — the AfterStepAsync addition is
load-bearing for strict ops but no existing test exercises a strict
bilinear op (BilinearMultOp / LinearOffsetLiar / AffineBilinearLiar
are all non-strict).

## Review-thread resolution

These 3 threads on #4563 close with this commit. Remaining unresolved
state on #4563 (Stryker.NET F# unsupported error) is workflow-side and
addressed by Otto-CLI's PR #4570 + #4571 retarget work.
@AceHack AceHack merged commit 9103622 into main May 21, 2026
31 checks passed
@AceHack AceHack deleted the feat/lawrunner-checkbilinear-2026-05-21 branch May 21, 2026 19:04
AceHack added a commit that referenced this pull request May 21, 2026
…rdination of load-bearing-substrate changes (#4575)

Mechanizes the human-as-coordination-substrate pattern Aaron explicitly
named 2026-05-21 ("i'm here right now" — for now ferrying load-bearing-
substrate-change notifications between AI surfaces; trajectory is bus-
based mechanization).

## The gap this row addresses

When one AI surface lands a load-bearing substrate change — capability
tags on `Op<'T>` (PR #4558), `IncrementalAuto`'s chain-walk logic
(#4567), new files in `.claude/rules/`, new computation expressions —
other AI surfaces working in adjacent substrate need to inherit the
change for their next session. Today: Aaron ferries. Cluster-scale
(10-20 surfaces per Aaron's $100k cluster expansion 2026-05-21):
human-ferry breaks empirically.

## The mechanism

New bus topic `substrate-surface-change` (extends `tools/bus/`):

  - **Publish discipline**: after any PR landing that modifies load-
    bearing surfaces, publishing AI calls `bun tools/bus/publish.ts
    --topic substrate-surface-change --from <sender-id> --payload <json>`.
  - **Subscribe discipline (cold-boot)**: AI bootstreams extend to
    include `bun tools/bus/list.ts --topic substrate-surface-change
    --since 24h` — recent envelopes show "what load-bearing substrate
    changed in the last 24h."
  - **Retention**: 7d default; expired envelopes fall back to auto-
    loaded rules + commit history. The envelope is the *cache* of
    recent changes; the *truth* is the substrate itself.

## What this row does NOT do

- Does NOT replace auto-loaded `.claude/rules/` inheritance (that
  stays the durable substrate)
- Does NOT replace claim-acquire-before-worktree-work (that stays
  the per-row collision prevention)
- Does NOT replace Knights Guild / KSK (that stays the policy gate)

It complements all three by adding the **recent-changes-cache** layer
that closes the "I just shipped X; how do other surfaces find out
before their next session?" gap.

## Composition with broader trajectory

- B-0400 — bus protocol substrate this row extends
- B-0689 — Otto-VSCode SENDER_IDS pattern this row leans on for `from` field
- B-0695 — fast/life-branch experiment; sibling coordination-cost-reduction
- Algebra-campaign PRs (#4558/#4560/#4563/#4566/#4567) — substrate-surface changes that would have benefited from this envelope pattern

## Substrate-honest framing on the file itself

Filed per Aaron's explicit "feel free we can'thave too much backlog in
my opinion the infinate backlog win when labor=0" framing, applying the
`largest-mechanizable-backlog-wins.md` discipline. Recalibrated from
earlier "I won't file unilaterally" reasoning — that was a misapplication
of the row-collision lesson (which was about coordination, not about
backlog overhead).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants