From ae6437cea18b4d4bccf09e42aee167c1deb4f528 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Sat, 18 Apr 2026 14:51:35 -0400 Subject: [PATCH 1/4] =?UTF-8?q?Round=2028=20=E2=80=94=20stateful=20harness?= =?UTF-8?q?=20design=20+=20lean4=20scaffolding=20cleanup?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Design doc for the FsCheck law runner captures the DST framing (single seeded RNG, no wall-clock, total tick order, seed + schedule printed on failure) and locks the build-vs-test decision: LawRunner lives as a test-time library, not a Circuit.Build() gate, so Core stays free of FsCheck and plugin authors opt in from their test project. Key design call — Option B (trace-based retraction check against the existing marker IStatefulStrictOperator) this round, Option A (Init/Step/Retract triple matching the DBSP paper's σ,λ,ρ shape) as a planned additive promotion in round-29+. Rationale documented in full: Option A needs real async + retraction-contract design work; Option B lands today, teaches us what retraction actually looks like, Option A absorbs the lessons. Option B remains a fallback even after A ships. Long-term payoff of Option A (documented so we don't lose it): - Matches DbspChainRule.lean's (σ,λ,ρ) triple → compositional reasoning across math/proofs/impl - Generic WDC checkpointing without per-op serialisation - Planner fusion of adjacent stateful ops Lean4 scaffolding cleanup — `lake new` leftovers removed: - tools/lean4/README.md (pure GitHub-Pages setup boilerplate) - tools/lean4/Lean4/Basic.lean (hello-world sample) - tools/lean4/.github/workflows/ (upstream Lean CI templates) - tools/lean4/.gitignore (redundant with root .lake/ ignore) tools/lean4/Lean4.lean now imports Lean4.DbspChainRule so `lake build` walks the real proof file; lakefile.toml, lake-manifest.json, and lean-toolchain kept (load-bearing for Mathlib resolution). Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/research/stateful-harness-design.md | 278 ++++++++++++++++++ .../.github/workflows/create-release.yml | 22 -- .../.github/workflows/lean_action_ci.yml | 21 -- tools/lean4/.github/workflows/update.yml | 41 --- tools/lean4/.gitignore | 1 - tools/lean4/Lean4.lean | 5 +- tools/lean4/Lean4/Basic.lean | 1 - tools/lean4/README.md | 13 - 8 files changed, 282 insertions(+), 100 deletions(-) create mode 100644 docs/research/stateful-harness-design.md delete mode 100644 tools/lean4/.github/workflows/create-release.yml delete mode 100644 tools/lean4/.github/workflows/lean_action_ci.yml delete mode 100644 tools/lean4/.github/workflows/update.yml delete mode 100644 tools/lean4/.gitignore delete mode 100644 tools/lean4/Lean4/Basic.lean delete mode 100644 tools/lean4/README.md diff --git a/docs/research/stateful-harness-design.md b/docs/research/stateful-harness-design.md new file mode 100644 index 00000000..b053da9a --- /dev/null +++ b/docs/research/stateful-harness-design.md @@ -0,0 +1,278 @@ +# Stateful plugin harness — design draft + +**Round:** 28 +**Status:** draft (design only; no code landed) +**Scope:** design the test-time harness that verifies +`IStatefulStrictOperator<'TIn,'TState,'TOut>` plugins against +the `RetractionCompletenessLaw` under deterministic simulation. + +## Framing: deterministic simulation + +The whole point of a plugin-law harness is **deterministic +simulation testing** in the FoundationDB / TigerBeetle sense: +given a seed and an input schedule, the harness replays +bit-exact and a failing trace is reducible to its seed. That +constrains the design up front: + +1. **No wall-clock time.** Ticks are integers; the plugin + never observes `DateTime.UtcNow` via the harness. +2. **All randomness comes from one seeded `System.Random`** + (or an FsCheck `StdGen`). A plugin that reaches for a new + `Random()` defeats DST and is a bug — we lint for this + but cannot statically prevent it. +3. **Tick order is total.** Within a tick, input is fed, + `StepAsync` runs to completion, output is captured, + `AfterStepAsync` runs (for strict operators), and the + next tick proceeds. No concurrent ticks in the harness. +4. **A failing run produces a replay.** On law violation we + print the seed, the input schedule as a list, and the + tick index at which the law first broke. Copy-paste into + the next test run to reproduce. + +DST is the lens for *all* the law runners, not just the +stateful one. But stateful-strict is where the lens earns +its keep — non-determinism in a stateful op can mask a +retraction bug for thousands of ticks before failing. + +## What `IStatefulStrictOperator` promises + +Quoting `src/Core/PluginApi.fs` for the tag: + +> the operator carries explicit stateful strict semantics — +> init / step / retract triple. Distinct from +> `IStrictOperator` (feedback-cut): stateful-strict ops hold +> per-key or per-instance state that must retract cleanly +> when a negative weight arrives. + +The marker interface today is opaque — a plugin just +inherits `IOperator<'TOut>` under the tag, there is no +exposed `init` / `step` / `retract` triple on the interface +itself. That is a **design gap**: the law cannot fire +against an opaque op. Two options: + +### Option A — enrich the interface + +Add abstract methods on `IStatefulStrictOperator`: + +```fsharp +type IStatefulStrictOperator<'TIn, 'TState, 'TOut> = + inherit IOperator<'TOut> + abstract InitState : unit -> 'TState + abstract Step : state: 'TState * input: 'TIn -> 'TState * 'TOut + abstract Retract : state: 'TState * input: 'TIn -> 'TState +``` + +**Pros:** law runner has direct access to the state +transitions, can check `Retract ∘ Step = id` at every tick, +inspect state for invariants, drive FsCheck against `Step` +and `Retract` directly without going through `StepAsync`. + +**Cons:** second authoring path — plugin authors now write +`Step` and `Retract` *and* wire `StepAsync` to call them. +The two can drift. Unless `StepAsync` has a default +implementation that calls `Step` — which forces `'TState` +into a mutable field on the op, which is exactly the +shared-state pattern Ilyana pushed back on in round-27. + +### Option B — trace the op through `StepAsync` + +Keep the marker-interface-only shape. The harness feeds +inputs tick-by-tick through `PluginHarness.runSingleInput`. +To check retraction completeness, the harness runs *two* +traces and compares: + +1. **Forward trace:** feed a sequence `A = [a₁; a₂; ...; aₙ]` + and record outputs `[y₁; y₂; ...; yₙ]`. +2. **Round-trip trace:** feed `A ++ retract(A)` where + `retract(A)` is the negation of each input as a Z-set + (weights flipped to their negatives). Record outputs + `[y₁; ...; yₙ; y'₁; ...; y'ₙ]`. + +The **law**: at the end of the round-trip, the cumulative +output Z-set must be zero — i.e. summing `y_i` across all +`2n` ticks is the zero Z-set. If any term remains, the op +leaked state through retraction. + +**Pros:** no interface change; works against the existing +marker-only tag; authoring path stays narrow (one +`StepAsync` method). + +**Cons:** only observable output is checked, not internal +state. A plugin that retracts cleanly at the boundary but +leaks memory internally (keeps dead state entries forever) +is not caught. That is a separate property — "bounded +state-growth" — and can be a `Watch`-class law under DST +by running for a fixed number of ticks and asserting state +byte-size stays bounded. Covered by a follow-up law, not +`RetractionCompletenessLaw`. + +### Recommendation: Option B now, Option A as planned round-29+ work + +**Option A is the right long-term direction.** It matches the +DBSP paper's canonical `(σ, λ, ρ)` triple — the same shape +`tools/lean4/Lean4/DbspChainRule.lean` proves against — so +math, Lean proofs, and F# implementation share one shape and +compositional reasoning travels across them. Option A also +unlocks generic WDC checkpointing (pickle `'TState` without +per-op serialisation), planner fusion of adjacent stateful +ops (reuse state buffers, batch step+retract), and sharper +law-failure diagnostics ("state diverged by δ" vs Option B's +"cumulative Z-set wasn't zero"). + +**Option B is the right first step.** Option A needs real +design work on (1) the async path — `Step : σ * 'TIn * ct +-> ValueTask<σ * 'TOut>` duplicates the sync signature — +and (2) a precise retraction contract (is `Retract(Step(s, +a)) = s` always, or only for linear `a`? Partial retraction +semantics?). Prototyping Option A in a vacuum, without +having run Option B against real plugins, ships a contract +we would almost certainly need to revise. Option B lands +the law runner today against the existing marker interface, +teaches us what retraction actually looks like in practice, +and Option A absorbs those lessons. + +**Path forward:** + +1. **This round (28):** Option B trace-based + `checkRetractionCompleteness`. Marker interface stays as + it is. +2. **Round 29+ — Option A promotion.** Additive interface + enrichment: `Init` / `Step` / `Retract` become abstract + methods on `IStatefulStrictOperator`; `StepAsync` + default-implements by calling `Step` so existing plugins + keep compiling. The law runner uses the new hooks when + present, falls back to Option B trace for plugins that + stay marker-only. Same capability-tagging shape the + plugin API already uses. +3. **Round 29+ — generic checkpointing.** Once `'TState` is + in the type signature, WDC can pickle it via FsPickler + (schema-bound) without per-op `Save`/`Load` code. +4. **Round 30+ — planner fusion.** With `Step`/`Retract` + exposed, the planner can fuse adjacent stateful ops + (reuse state buffers, batch step+retract). Out of scope + until we have data on which op pairs dominate hot paths. + +Option B is not throwaway — it remains the fallback for +plugins that genuinely cannot expose `Step`/`Retract` +(black-box ML wrappers, third-party system integrations). + +## API sketch + +```fsharp +namespace Zeta.Core + +module LawRunner = + + /// Deterministic-simulation result. `None` on pass; + /// `Some` carries enough to reproduce. + type LawViolation = + { Seed: int + Inputs: obj list // boxed for diagnostic printing + TickOfFirstBreak: int + Message: string } + + /// Linearity: `op(a + b) = op(a) + op(b)` and `op(0) = 0`. + /// Not stateful — single-tick check per sample. + val checkLinear<'TIn, 'TOut when 'TIn : (static member (+) : 'TIn * 'TIn -> 'TIn) + and 'TOut : (static member (+) : 'TOut * 'TOut -> 'TOut) + and 'TOut : equality> + : seed: int + -> samples: int + -> makeOp: (Stream<'TIn> -> ILinearOperator<'TIn, 'TOut>) + -> genInput: (System.Random -> 'TIn) + -> zeroIn: 'TIn + -> zeroOut: 'TOut + -> Result + + /// Retraction completeness (stateful): forward-then-retract + /// a schedule, assert cumulative output is zero. Option-B + /// trace approach; no interface enrichment required. + val checkRetractionCompleteness<'TIn, 'TState, 'TOut> + : seed: int + -> samples: int + -> scheduleLength: int + -> makeOp: (Stream> -> + IStatefulStrictOperator, 'TState, ZSet<'TOut>>) + -> genInput: (System.Random -> ZSet<'TIn>) + -> Result +``` + +Notes on the signature: + +- **Generators are `System.Random -> 'T`, not FsCheck `Gen<'T>`.** + Keeps `Zeta.Core` free of FsCheck. Plugin authors wire + FsCheck in their test project and hand the law runner a + thin `Random -> 'T` adapter. +- **Seed is explicit.** No hidden clock-seeding. A failing + run prints the seed; the author pastes it back to + reproduce. +- **`makeOp` is the factory, same shape as + `PluginHarness.runSingleInput`.** This means every + stateful operator already unit-testable via the harness + is law-runnable with no rewrite. +- **`scheduleLength` bounds the trace.** DST prefers a + generous upper bound (1000 ticks is cheap) so state + churn has time to expose leaks; small (say 10) bounds + produce false-green runs that make the law useless. + +## Where it does not fire + +**Not at `Circuit.Build()`.** See `docs/PLUGIN-AUTHOR.md` +round-28 rewrite — the law runner is a library consumer +opts-in mechanism, not a gate the scheduler enforces. A +`Circuit.Build()` gate would pay a probabilistic-testing +cost on every construction, slow circuits measurably, and +couple `Zeta.Core` to FsCheck. Plugin authors call +`LawRunner.*` from their test project instead. + +The reason for the choice is **not** that Build-time +verification is wrong in principle — it is that the +cheapest reliable form is a test-time check driven by a +DST harness. A future `DebugCircuit.BuildWithLawChecks` +could exist as a slow-path gate for CI-only smoke runs, but +that is round-29+ work and out of scope here. + +## Open design questions + +1. **Z-set negation helper.** The Option B trace approach + needs `ZSet<'T>` negation (`-w` on every weight). Is + there a canonical `ZSet.negate` already, or does the law + runner synthesize it? (Core grep should answer this + before implementation.) +2. **Schedule shape.** Uniform random Z-sets per tick vs + structured schedules (e.g. "insert 100, retract 50, + insert 100, retract all"). DST orthodoxy favours + structured schedules over pure uniform for shrinking + quality — FsCheck's shrinker does not understand the + semantic of a retraction trace. +3. **Async ops.** Does the stateful harness need to run + `IAsyncOperator`-tagged stateful ops under a controlled + scheduler (cooperative `Task.Yield` interleaving)? DST + orthodoxy says yes — but the first landing should be + sync-only and flag async as a follow-up. +4. **Per-key vs per-instance state.** "Stateful-strict" + currently covers both. The law holds for per-instance + state trivially; per-key state needs the retract-step + pair to fire on the matching key. The current trace- + based law treats the op as opaque, so it works for both + — but means the diagnostic print on a failing per-key + op will be less actionable. A future option-A + enrichment would split these cleanly. + +## Next steps + +- Land `checkLinear` in `src/Core/LawRunner.fs` this round. + Easiest law, cleanest proof-of-concept. One test in + `tests/Tests.FSharp/Plugin/LawRunner.Tests.fs` against a + trivial `id * 2` linear op. +- Land `checkRetractionCompleteness` this round only if + Option B stays clean end-to-end; otherwise ship the + design doc and the Linear check, defer stateful to + round-29. **Explicit acceptance of a partial landing is + better than a rushed stateful-harness implementation + that lies about coverage.** +- Retract the `IStatefulStrictOperator` soft-claim in + `docs/PLUGIN-AUTHOR.md` only when `checkRetractionCompleteness` + is actually live. A partial landing updates the doc to + "Linear law live; stateful-strict law in round-29" + — honest rather than aspirational. diff --git a/tools/lean4/.github/workflows/create-release.yml b/tools/lean4/.github/workflows/create-release.yml deleted file mode 100644 index 6dacf77c..00000000 --- a/tools/lean4/.github/workflows/create-release.yml +++ /dev/null @@ -1,22 +0,0 @@ -name: Create Release - -on: - push: - branches: - - 'main' - - 'master' - paths: - - 'lean-toolchain' - -jobs: - lean-release-tag: - name: Add Lean release tag - runs-on: ubuntu-latest - permissions: - contents: write - steps: - - name: lean-release-tag action - uses: leanprover-community/lean-release-tag@v1 - with: - do-release: true - GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} diff --git a/tools/lean4/.github/workflows/lean_action_ci.yml b/tools/lean4/.github/workflows/lean_action_ci.yml deleted file mode 100644 index db09247d..00000000 --- a/tools/lean4/.github/workflows/lean_action_ci.yml +++ /dev/null @@ -1,21 +0,0 @@ -name: Lean Action CI - -on: - push: - pull_request: - workflow_dispatch: - -# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages -permissions: - contents: read # Read access to repository contents - pages: write # Write access to GitHub Pages - id-token: write # Write access to ID tokens - -jobs: - build: - runs-on: ubuntu-latest - - steps: - - uses: actions/checkout@v5 - - uses: leanprover/lean-action@v1 - - uses: leanprover-community/docgen-action@v1 diff --git a/tools/lean4/.github/workflows/update.yml b/tools/lean4/.github/workflows/update.yml deleted file mode 100644 index 96b7622a..00000000 --- a/tools/lean4/.github/workflows/update.yml +++ /dev/null @@ -1,41 +0,0 @@ -name: Update Dependencies - -on: - # schedule: # Sets a schedule to trigger the workflow - # - cron: "0 8 * * *" # Every day at 08:00 AM UTC (see https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#schedule) - workflow_dispatch: # Allows the workflow to be triggered manually via the GitHub interface - -jobs: - check-for-updates: # Determines which updates to apply. - runs-on: ubuntu-latest - outputs: - is-update-available: ${{ steps.check-for-updates.outputs.is-update-available }} - new-tags: ${{ steps.check-for-updates.outputs.new-tags }} - steps: - - name: Run the action - id: check-for-updates - uses: leanprover-community/mathlib-update-action@v1 - # START CONFIGURATION BLOCK 1 - # END CONFIGURATION BLOCK 1 - do-update: # Runs the upgrade, tests it, and makes a PR/issue/commit. - runs-on: ubuntu-latest - permissions: - contents: write # Grants permission to push changes to the repository - issues: write # Grants permission to create or update issues - pull-requests: write # Grants permission to create or update pull requests - needs: check-for-updates - if: ${{ needs.check-for-updates.outputs.is-update-available == 'true' }} - strategy: # Runs for each update discovered by the `check-for-updates` job. - max-parallel: 1 # Ensures that the PRs/issues are created in order. - matrix: - tag: ${{ fromJSON(needs.check-for-updates.outputs.new-tags) }} - steps: - - name: Run the action - id: update-the-repo - uses: leanprover-community/mathlib-update-action/do-update@v1 - with: - tag: ${{ matrix.tag }} - # START CONFIGURATION BLOCK 2 - on_update_succeeds: pr # Create a pull request if the update succeeds - on_update_fails: issue # Create an issue if the update fails - # END CONFIGURATION BLOCK 2 diff --git a/tools/lean4/.gitignore b/tools/lean4/.gitignore deleted file mode 100644 index bfb30ec8..00000000 --- a/tools/lean4/.gitignore +++ /dev/null @@ -1 +0,0 @@ -/.lake diff --git a/tools/lean4/Lean4.lean b/tools/lean4/Lean4.lean index a1e608b3..9c4f9ba5 100644 --- a/tools/lean4/Lean4.lean +++ b/tools/lean4/Lean4.lean @@ -1 +1,4 @@ -import Lean4.Basic +-- Library root for the `Lean4` lean_lib declared in +-- `lakefile.toml`. Imports the machine-checked proof files so +-- `lake build` walks them transitively. +import Lean4.DbspChainRule diff --git a/tools/lean4/Lean4/Basic.lean b/tools/lean4/Lean4/Basic.lean deleted file mode 100644 index 99415d9d..00000000 --- a/tools/lean4/Lean4/Basic.lean +++ /dev/null @@ -1 +0,0 @@ -def hello := "world" diff --git a/tools/lean4/README.md b/tools/lean4/README.md deleted file mode 100644 index 83c247e0..00000000 --- a/tools/lean4/README.md +++ /dev/null @@ -1,13 +0,0 @@ -# lean4 - -## GitHub configuration - -To set up your new GitHub repository, follow these steps: - -* Under your repository name, click **Settings**. -* In the **Actions** section of the sidebar, click "General". -* Check the box **Allow GitHub Actions to create and approve pull requests**. -* Click the **Pages** section of the settings sidebar. -* In the **Source** dropdown menu, select "GitHub Actions". - -After following the steps above, you can remove this section from the README file. From 07c3c00d22a08b9637ed68242da42cfba6d2b4a8 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Sat, 18 Apr 2026 14:56:15 -0400 Subject: [PATCH 2/4] =?UTF-8?q?Round=2028=20=E2=80=94=20LawRunner:=20Linea?= =?UTF-8?q?r=20+=20retraction-completeness=20laws=20live?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit First concrete deliverable for the round-28 anchor. `LawRunner` lives in `src/Core/LawRunner.fs` as a test-time library (not a Circuit.Build() gate — see design doc for build-vs-test rationale). Generators are `System.Random -> 'T` so Core stays free of FsCheck; plugin authors wire FsCheck from their test project. Both checks are deterministic-simulation: a failing run prints the seed and sample index so plugin authors reproduce bit-exact. - `checkLinear` — generates trace pairs (A, B) and asserts `op(A + B) = op(A) + op(B)` tick-by-tick. `addIn` / `addOut` / `equalOut` are parameters so it works for ZSet, numerics, any additive carrier. - `checkRetractionCompleteness` — Option B trace-based: forward- runs a random Z-set trace, retracts each tick in the same order, asserts the cumulative output Z-set is empty. Catches operators that leak state through retraction without any interface enrichment (Option A promotion planned for round-29+). Tests in `tests/Tests.FSharp/Plugin/LawRunner.Tests.fs` — 5 tests, 63ms: - genuine linear op passes - falsely-tagged squarer (non-linear) is caught - same seed reproduces bit-exact - clean-retracting ZSet echo passes - positive-only liar is caught `docs/PLUGIN-AUTHOR.md` soft-claim retracted — the doc now documents `LawRunner.checkLinear` / `checkRetractionCompleteness` as live, with `checkBilinear` / `checkSinkTerminal` flagged as round-29+. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/PLUGIN-AUTHOR.md | 81 +++++++-- src/Core/Core.fsproj | 1 + src/Core/LawRunner.fs | 143 ++++++++++++++++ tests/Tests.FSharp/Plugin/LawRunner.Tests.fs | 170 +++++++++++++++++++ tests/Tests.FSharp/Tests.FSharp.fsproj | 1 + 5 files changed, 379 insertions(+), 17 deletions(-) create mode 100644 src/Core/LawRunner.fs create mode 100644 tests/Tests.FSharp/Plugin/LawRunner.Tests.fs diff --git a/docs/PLUGIN-AUTHOR.md b/docs/PLUGIN-AUTHOR.md index 34f51762..724b0363 100644 --- a/docs/PLUGIN-AUTHOR.md +++ b/docs/PLUGIN-AUTHOR.md @@ -43,10 +43,13 @@ only when your operator genuinely needs them. Pick the **strongest** algebra tag that honestly describes your operator. Lying (claiming Linear when you are not) is -detected by Zeta's algebra layer; the full law-verification -suite runs at `Circuit.Build()` once the FsCheck generators -for each tag are implemented — see **Known limits of -round-27** at the bottom of this doc for current coverage. +caught by `Zeta.Core.LawRunner` — a deterministic-simulation +law runner that plugin authors call from their test project. +Run `LawRunner.checkLinear` against an `ILinearOperator` and +`LawRunner.checkRetractionCompleteness` against an +`IStatefulStrictOperator`; both take a seed so failures +reproduce bit-exact. See **Verifying your plugin's +algebra tag** below. ## The canonical example @@ -131,6 +134,51 @@ let outputs = Use it in unit tests. `Tests.FSharp/Plugin/*.Tests.fs` has more examples. +## Verifying your plugin's algebra tag + +`Zeta.Core.LawRunner` is the test-time harness that verifies +your algebra tag is honest. It runs under deterministic +simulation: given a seed and a schedule length, a failing +run prints enough to reproduce bit-exact. + +```fsharp +open Zeta.Core + +let result = + LawRunner.checkLinear + 42 // seed — reproduces the exact trace on failure + 20 // samples (a, b) trace pairs + 8 // ticks per trace + (fun input -> MyLinearOp input :> IOperator) + (fun rng -> rng.Next(-100, 101)) // genInput + (+) // addIn + (+) // addOut + (=) // equalOut + +match result with +| Ok () -> () +| Error v -> failwithf "Linearity broke: %A" v +``` + +`checkRetractionCompleteness` works against +`IStatefulStrictOperator, _, ZSet<'TOut>>`: +it forward-runs a random Z-set trace, retracts each tick +in the same order, and asserts the cumulative output Z-set +is empty. Any residual means the operator leaked state +through retraction. + +```fsharp +let result = + LawRunner.checkRetractionCompleteness + 7 15 6 + (fun input -> MyStatefulOp input :> IOperator>) + (fun rng -> ZSet.ofSeq [ rng.Next(0,5), int64 (rng.Next(-3,4)) ]) +``` + +Bilinear and sink-terminal law runners are round-29+ work — +see `docs/research/stateful-harness-design.md` for the +sequenced plan. + ## What you cannot do - **You cannot mutate another operator's output.** The @@ -150,20 +198,19 @@ more examples. cleanly, declare `ISinkOperator` — which tells Zeta to keep you at terminal edges only. -## Known limits of round-27 +## Known limits Safe-to-ship caveats — each is tracked in `docs/DEBT.md` with a target round. -- **FsCheck algebra-law verification is not yet wired at - `Circuit.Build()`.** `ILinearOperator` / `IBilinearOperator` - / `IStatefulStrictOperator` / `ISinkOperator` are marker - interfaces today; the scheduler does not yet run the - `LinearLaw`, `BilinearLaw`, `RetractionCompletenessLaw`, - or `SinkTerminalLaw` generators against tagged ops. A - wrongly-tagged op compiles and runs; the runtime does not - catch it. **Fine for internal plugins; hold off publishing - to NuGet until law coverage lands.** +- **Law coverage.** `LawRunner.checkLinear` and + `LawRunner.checkRetractionCompleteness` (Option B, trace- + based) are live. `checkBilinear` and `checkSinkTerminal` + are round-29+ work. An Option-A promotion of + `IStatefulStrictOperator` to an explicit `Init` / `Step` / + `Retract` triple (matching the DBSP paper's `(σ, λ, ρ)` + shape) is also scheduled — see + `docs/research/stateful-harness-design.md`. - **`PluginHarness.runSingleInput` asserts exactly-one- `Publish`-per-tick**, but the production `Circuit` path currently does not. An op that forgets to call `Publish` @@ -174,9 +221,9 @@ with a target round. undefined behaviour.** Don't stash the buffer in a field or pass it to another thread to call `Publish` later. A tick-stamped guard is tracked as DEBT. -- **`dotnet new zeta-plugin` scaffolding template** is a - round-28+ deliverable. Until it exists, start from the - Bayesian example in `src/Bayesian/BayesianAggregate.fs`. +- **`dotnet new zeta-plugin` scaffolding template** is + still pending. Until it exists, start from the Bayesian + example in `src/Bayesian/BayesianAggregate.fs`. - **Multi-input operators** beyond one-input and two-input shapes need a custom `IOperator<'TOut>` with a longer `ReadDependencies` array; there's no generic N-input diff --git a/src/Core/Core.fsproj b/src/Core/Core.fsproj index a70b0e9d..8c0c5761 100644 --- a/src/Core/Core.fsproj +++ b/src/Core/Core.fsproj @@ -19,6 +19,7 @@ + diff --git a/src/Core/LawRunner.fs b/src/Core/LawRunner.fs new file mode 100644 index 00000000..3b57fdf5 --- /dev/null +++ b/src/Core/LawRunner.fs @@ -0,0 +1,143 @@ +namespace Zeta.Core + + +/// Deterministic-simulation law runner for plugin operators. +/// Design and rationale live in `docs/research/stateful-harness-design.md`. +/// +/// Plugin authors call these checks from their test project — +/// LawRunner is a test-time library, not a `Circuit.Build()` gate. +/// That keeps `Zeta.Core` free of FsCheck; generators are plain +/// `System.Random -> 'T` so authors wire FsCheck (or any other +/// generator) at their end. +/// +/// All checks are **deterministic** given the same seed. A failing +/// run reports the seed and the sample index; re-running with the +/// same seed and `samples >= sampleIndex + 1` reproduces bit-exact. +[] +module LawRunner = + + /// Reported when a law fails. `Seed` + `SampleIndex` are + /// enough to reproduce; `Message` describes the specific + /// failure mode. + type LawViolation = + { Seed: int + SampleIndex: int + Message: string } + + /// Drive a factory through one fresh op instance per sample + /// so state cannot leak across samples. Each trace runs on + /// its own `PluginHarness.runSingleInput` invocation which + /// also asserts exactly-one-`Publish`-per-tick. + let private runTrace<'TIn, 'TOut> + (makeOp: Stream<'TIn> -> IOperator<'TOut>) + (inputs: 'TIn list) + : 'TOut list = + PluginHarness.runSingleInput makeOp inputs + + let private generateTrace<'TIn> + (rng: System.Random) + (length: int) + (genInput: System.Random -> 'TIn) + : 'TIn list = + [ for _ in 1 .. length -> genInput rng ] + + /// Linearity: for every trace pair `(A, B)`, the output + /// trace on `A + B` (elementwise) must equal the elementwise + /// sum of the output traces on `A` and `B`. + /// + /// - `samples` — number of (A, B) pairs to test. + /// - `scheduleLength` — ticks per trace. DST prefers a + /// generous bound (tens to low hundreds) so stateful + /// linear ops (e.g. integration) have time to expose + /// drift. + /// - `addIn` / `addOut` / `equalOut` — user-supplied so + /// this works for both `ZSet<'T>` (via `ZSet.add`) and + /// plain numeric types. Equality is explicit for the same + /// reason — `ZSet<'T>` structural equality is not always + /// cheap, so the caller chooses. + let checkLinear<'TIn, 'TOut> + (seed: int) + (samples: int) + (scheduleLength: int) + (makeOp: Stream<'TIn> -> IOperator<'TOut>) + (genInput: System.Random -> 'TIn) + (addIn: 'TIn -> 'TIn -> 'TIn) + (addOut: 'TOut -> 'TOut -> 'TOut) + (equalOut: 'TOut -> 'TOut -> bool) + : Result = + if samples < 1 then invalidArg "samples" "must be >= 1" + if scheduleLength < 1 then invalidArg "scheduleLength" "must be >= 1" + let rng = System.Random(seed) + let mutable failure : LawViolation option = None + let mutable i = 0 + while failure.IsNone && i < samples do + let traceA = generateTrace rng scheduleLength genInput + let traceB = generateTrace rng scheduleLength genInput + let traceSum = List.map2 addIn traceA traceB + let outA = runTrace makeOp traceA + let outB = runTrace makeOp traceB + let outSum = runTrace makeOp traceSum + // Compare elementwise: op(a_t + b_t) == op(a_t) + op(b_t) + // at every tick t. Short-circuit on first divergence so + // the reported message points at the offending tick. + let mutable tick = 0 + let mutable broke = false + while not broke && tick < scheduleLength do + let lhs = outSum.[tick] + let rhs = addOut outA.[tick] outB.[tick] + if not (equalOut lhs rhs) then + failure <- + Some { Seed = seed + SampleIndex = i + Message = + sprintf + "Linearity broke at sample %d, tick %d: \ + op(a+b) ≠ op(a) + op(b)." + i tick } + broke <- true + tick <- tick + 1 + i <- i + 1 + match failure with + | Some v -> Error v + | None -> Ok () + + /// Retraction completeness (Option B, trace-based). + /// Forward-run a generated Z-set trace, then retract each + /// tick's input in the same order. After the round-trip the + /// cumulative output Z-set must be empty — any residual + /// means the operator leaked state through retraction. + /// + /// Works against the existing marker `IStatefulStrictOperator` + /// without interface enrichment (see the design doc for the + /// Option A follow-up path). + let checkRetractionCompleteness<'TIn, 'TOut when 'TIn : comparison and 'TOut : comparison> + (seed: int) + (samples: int) + (scheduleLength: int) + (makeOp: Stream> -> IOperator>) + (genInput: System.Random -> ZSet<'TIn>) + : Result = + if samples < 1 then invalidArg "samples" "must be >= 1" + if scheduleLength < 1 then invalidArg "scheduleLength" "must be >= 1" + let rng = System.Random(seed) + let mutable failure : LawViolation option = None + let mutable i = 0 + while failure.IsNone && i < samples do + let forward = generateTrace rng scheduleLength genInput + let retract = forward |> List.map ZSet.neg + let outputs = runTrace makeOp (forward @ retract) + let cumulative = outputs |> List.fold ZSet.add ZSet.empty + if not cumulative.IsEmpty then + failure <- + Some { Seed = seed + SampleIndex = i + Message = + sprintf + "Retraction incomplete at sample %d: after \ + forward+retract round-trip of %d ticks, \ + cumulative output has %d residual entries." + i scheduleLength cumulative.Count } + i <- i + 1 + match failure with + | Some v -> Error v + | None -> Ok () diff --git a/tests/Tests.FSharp/Plugin/LawRunner.Tests.fs b/tests/Tests.FSharp/Plugin/LawRunner.Tests.fs new file mode 100644 index 00000000..4454de58 --- /dev/null +++ b/tests/Tests.FSharp/Plugin/LawRunner.Tests.fs @@ -0,0 +1,170 @@ +module Zeta.Tests.Plugin.LawRunnerTests + +open System.Threading.Tasks +open Xunit +open FsUnit.Xunit +open Zeta.Core + + +// ──────────────────────────────────────────────────────────────── +// Linearity fixtures +// ──────────────────────────────────────────────────────────────── + +/// Linear plugin — doubling is linear over integers. +type private DoublerOp(input: Stream) = + let deps = [| input.AsDependency() |] + interface ILinearOperator with + member _.Name = "doubler" + member _.ReadDependencies = deps + member _.StepAsync(out, _ct) = + out.Publish (input.Current * 2) + ValueTask.CompletedTask + + +/// **Not** linear — squaring breaks `op(a+b) = op(a)+op(b)`. +/// Falsely tagged `ILinearOperator` so the law runner catches +/// the lie. +type private SquarerOp(input: Stream) = + let deps = [| input.AsDependency() |] + interface ILinearOperator with + member _.Name = "squarer-liar" + member _.ReadDependencies = deps + member _.StepAsync(out, _ct) = + let v = input.Current + out.Publish (v * v) + ValueTask.CompletedTask + + +// ──────────────────────────────────────────────────────────────── +// Retraction-completeness fixtures (ZSet → ZSet) +// ──────────────────────────────────────────────────────────────── + +/// Clean retraction — pure passthrough. Each tick echoes the +/// input Z-set; forward+retract round-trip sums to empty. +type private ZSetEchoOp(input: Stream>) = + let deps = [| input.AsDependency() |] + interface IStatefulStrictOperator, unit, ZSet> with + member _.Name = "zset-echo" + member _.ReadDependencies = deps + member _.StepAsync(out, _ct) = + out.Publish input.Current + ValueTask.CompletedTask + + +/// Retraction-lossy — drops negative-weight entries. Under a +/// forward+retract round-trip the retracted entries never make +/// it through, so cumulative output ≠ empty. Falsely tagged +/// `IStatefulStrictOperator` so the law runner catches the lie. +type private PositiveOnlyOp(input: Stream>) = + let deps = [| input.AsDependency() |] + interface IStatefulStrictOperator, unit, ZSet> with + member _.Name = "positive-only-liar" + member _.ReadDependencies = deps + member _.StepAsync(out, _ct) = + let positives = + input.Current.AsSpan().ToArray() + |> Array.filter (fun e -> e.Weight > 0L) + |> Array.map (fun e -> e.Key, e.Weight) + out.Publish (ZSet.ofSeq positives) + ValueTask.CompletedTask + + +// ──────────────────────────────────────────────────────────────── +// Generators +// ──────────────────────────────────────────────────────────────── + +let private genInt (rng: System.Random) : int = rng.Next(-100, 101) + +let private genZSet (rng: System.Random) : ZSet = + let n = rng.Next(0, 4) + [ for _ in 1 .. n -> + let k = rng.Next(0, 5) + // Weight in [-3, 3] \ {0} — both polarities appear. + let mutable w = rng.Next(-3, 4) + if w = 0 then w <- 1 + (k, int64 w) ] + |> ZSet.ofSeq + + +// ──────────────────────────────────────────────────────────────── +// Linearity +// ──────────────────────────────────────────────────────────────── + +[] +let ``checkLinear passes on a genuine linear op`` () = + let result = + LawRunner.checkLinear + 42 // seed + 20 // samples + 8 // scheduleLength + (fun s -> DoublerOp s :> IOperator) + genInt + (+) + (+) + (=) + match result with + | Ok () -> () + | Error v -> Assert.Fail (sprintf "expected Ok, got %A" v) + + +[] +let ``checkLinear catches a falsely-tagged non-linear op`` () = + let result = + LawRunner.checkLinear + 42 + 20 + 8 + (fun s -> SquarerOp s :> IOperator) + genInt + (+) + (+) + (=) + match result with + | Ok () -> Assert.Fail "expected linearity violation" + | Error v -> + v.Seed |> should equal 42 + v.Message |> should haveSubstring "Linearity broke" + + +[] +let ``checkLinear reproduces bit-exact on the same seed`` () = + let run () = + LawRunner.checkLinear + 99 10 5 + (fun s -> SquarerOp s :> IOperator) + genInt (+) (+) (=) + let first = run () + let second = run () + first |> should equal second + + +// ──────────────────────────────────────────────────────────────── +// Retraction completeness +// ──────────────────────────────────────────────────────────────── + +[] +let ``checkRetractionCompleteness passes on a clean-retracting op`` () = + let result = + LawRunner.checkRetractionCompleteness + 7 // seed + 15 // samples + 6 // scheduleLength + (fun s -> ZSetEchoOp s :> IOperator>) + genZSet + match result with + | Ok () -> () + | Error v -> Assert.Fail (sprintf "expected Ok, got %A" v) + + +[] +let ``checkRetractionCompleteness catches a retraction-lossy op`` () = + let result = + LawRunner.checkRetractionCompleteness + 7 15 6 + (fun s -> PositiveOnlyOp s :> IOperator>) + genZSet + match result with + | Ok () -> Assert.Fail "expected retraction-completeness violation" + | Error v -> + v.Seed |> should equal 7 + v.Message |> should haveSubstring "Retraction incomplete" diff --git a/tests/Tests.FSharp/Tests.FSharp.fsproj b/tests/Tests.FSharp/Tests.FSharp.fsproj index f00f938a..932738ef 100644 --- a/tests/Tests.FSharp/Tests.FSharp.fsproj +++ b/tests/Tests.FSharp/Tests.FSharp.fsproj @@ -83,6 +83,7 @@ + From 37e197125dd93cca34fcaf3074f2282800fa2aa2 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Sat, 18 Apr 2026 15:06:15 -0400 Subject: [PATCH 3/4] =?UTF-8?q?LawRunner=20=E2=80=94=20reviewer=20P0s=20+?= =?UTF-8?q?=20high-value=20P1s=20+=20polish?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Kira (harsh-critic) + Rune (maintainability-reviewer) floor per GOVERNANCE.md §20. P0s from Kira's pass were load-bearing; fixing them in this commit before the round closes. P0 fixes: - Per-sample `System.Random(seed + i)` so `(seed, sampleIndex)` genuinely reproduces bit-exact, independent of whether earlier samples failed fast. Previous whole-loop RNG made the reproducibility claim false. - Retraction-completeness law rewritten to state-restoration via continuation: forward + retract + continuation, compare continuation outputs against a fresh-op run of the continuation alone. The old "cumulative output = 0" formulation passed trivially for empty-emitting ops and a floored-counter could leak state while keeping cumulative zero; the new law catches both. Test fixture replaced accordingly — `FlooredCounterOp` (genuinely stateful and retraction-lossy) supersedes the `PositiveOnlyOp` filter (which is non-linear, not a retraction fixture; properly belongs under `checkLinear`). - `invalidArg` → `Error` on bad args — every public entry now returns `Result`, matching CLAUDE.md's result-over-exception rule. P1 fixes worth doing now: - `List.toArray` before the tick loop in `checkLinear` — kills the O(scheduleLength²) `List.item` indexing Kira flagged. - Extracted `runSamples` helper so the two law implementations share one loop shape — removes paste-and-modify risk before round-29 adds a third law. - Rune polish: cross-ref added from `IOperator<'TOut>` doc to `LawRunner` for discoverability; `≠` → `!=` in error strings so logs survive every terminal; WHY comments on non-obvious lines (`if w = 0 then w <- 1`); module-level note that `'TState = unit` is fine because trace-based Option B never inspects state. Deferred to DEBT (docs/DEBT.md): promote `check*` to a config record before `checkBilinear` lands; structured `LawViolation.Reason` DU; test where op omits the marker tag. Tests: 7/7 green in 61ms (added "bit-exact reproduces" for retraction + "Error on bad samples arg" coverage). Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/DEBT.md | 21 ++ src/Core/LawRunner.fs | 208 +++++++++++-------- src/Core/PluginApi.fs | 5 +- tests/Tests.FSharp/Plugin/LawRunner.Tests.fs | 111 ++++++---- 4 files changed, 224 insertions(+), 121 deletions(-) diff --git a/docs/DEBT.md b/docs/DEBT.md index 030afe0e..68ab3645 100644 --- a/docs/DEBT.md +++ b/docs/DEBT.md @@ -37,6 +37,27 @@ feature + debt budget). ## Live debt +### `LawRunner.check*` takes 8-11 positional args — promote to config record +- **Site:** `src/Core/LawRunner.fs` +- **Found:** round 28 by Rune (maintainability-reviewer) +- **Effort:** S +- **Friction:** call sites stack integer literals with trailing `// seed` / `// samples` comments; the next law (`checkBilinear`) will need 10+ positionals and multiply the drift surface. +- **Fix:** introduce `LinearityConfig<'TIn,'TOut>` / `RetractionConfig<'TIn,'TOut>` records and take one argument. Do it before `checkBilinear` lands. + +### `LawViolation.Message` is a string — promote to a structured DU +- **Site:** `src/Core/LawRunner.fs` +- **Found:** round 28 by Kira (harsh-critic) +- **Effort:** S +- **Friction:** tests string-grep on `"Linearity broke"` / `"Retraction incomplete"`; downstream tooling can't pattern-match on cause. +- **Fix:** `type Reason = LinearityBreak of tick:int | RetractionResidual of count:int | BadArgs of string`; keep `Message` as a rendering helper. + +### `LawRunner` has no test covering operators that omit the marker tag +- **Site:** `tests/Tests.FSharp/Plugin/LawRunner.Tests.fs` +- **Found:** round 28 by Kira (harsh-critic) +- **Effort:** S +- **Friction:** law runner claims to verify the tag; no test proves the runner doesn't secretly rely on the tag to compile-time dispatch. +- **Fix:** add a fixture that implements `IOperator<_>` directly (no `ILinearOperator` tag) and confirm `checkLinear` runs against it. + ### `Op<'T>` implicitly publicised as a plugin subclass-extension point - **Site:** `src/Core/Circuit.fs` (`Op`, `Op<'T>`) and every subtype under `src/Core/Operators.fs` diff --git a/src/Core/LawRunner.fs b/src/Core/LawRunner.fs index 3b57fdf5..b5ab097f 100644 --- a/src/Core/LawRunner.fs +++ b/src/Core/LawRunner.fs @@ -10,29 +10,42 @@ namespace Zeta.Core /// `System.Random -> 'T` so authors wire FsCheck (or any other /// generator) at their end. /// -/// All checks are **deterministic** given the same seed. A failing -/// run reports the seed and the sample index; re-running with the -/// same seed and `samples >= sampleIndex + 1` reproduces bit-exact. +/// All checks are **deterministic** given the same seed. Each +/// sample uses its own `System.Random(seed + sampleIndex)` so a +/// failing `(seed, sampleIndex)` pair reproduces bit-exact on +/// re-run with any `samples >= sampleIndex + 1` — sample N does +/// not depend on whether earlier samples failed fast. +/// +/// `checkRetractionCompleteness` uses the **state-restoration +/// via continuation** formulation: feed a forward trace, then +/// its retraction, then a continuation; compare continuation +/// outputs to a fresh-op run of the continuation alone. Any +/// divergence means the op's internal state survived what was +/// supposed to be a full cancel. This catches stateful ops that +/// mistag themselves as retraction-clean (Option B per the +/// design doc; Option A enrichment — `Init`/`Step`/`Retract` +/// triple — is round-29+ work). +/// +/// `'TState` on `IStatefulStrictOperator` is unused by the +/// trace-based law — the check runs through `StepAsync` and +/// never inspects state. Tests pass `unit` for this parameter. [] module LawRunner = - /// Reported when a law fails. `Seed` + `SampleIndex` are - /// enough to reproduce; `Message` describes the specific - /// failure mode. + /// Reported when a law fails or a check cannot run. + /// `Seed` + `SampleIndex` are enough to reproduce via a + /// fresh `checkX` call on the same seed; `Message` describes + /// the specific failure mode. type LawViolation = { Seed: int SampleIndex: int Message: string } - /// Drive a factory through one fresh op instance per sample - /// so state cannot leak across samples. Each trace runs on - /// its own `PluginHarness.runSingleInput` invocation which - /// also asserts exactly-one-`Publish`-per-tick. - let private runTrace<'TIn, 'TOut> - (makeOp: Stream<'TIn> -> IOperator<'TOut>) - (inputs: 'TIn list) - : 'TOut list = - PluginHarness.runSingleInput makeOp inputs + /// Bad arguments surface as `Error` rather than exceptions + /// so every public entry returns a `Result` — CLAUDE.md's + /// result-over-exception rule. + let private badArgs (seed: int) (message: string) : LawViolation = + { Seed = seed; SampleIndex = -1; Message = message } let private generateTrace<'TIn> (rng: System.Random) @@ -41,6 +54,29 @@ module LawRunner = : 'TIn list = [ for _ in 1 .. length -> genInput rng ] + /// Shared sample-loop shape used by every law. Each sample + /// gets a fresh `System.Random(seed + i)` so reproducibility + /// is per-sample, not whole-loop. `check` returns `None` on + /// pass and `Some message` on failure; the framer wraps it + /// as a `LawViolation`. + let private runSamples + (seed: int) + (samples: int) + (check: System.Random -> int -> string option) + : Result = + let mutable failure : LawViolation option = None + let mutable i = 0 + while failure.IsNone && i < samples do + let rng = System.Random(seed + i) + match check rng i with + | Some msg -> + failure <- Some { Seed = seed; SampleIndex = i; Message = msg } + | None -> () + i <- i + 1 + match failure with + | Some v -> Error v + | None -> Ok () + /// Linearity: for every trace pair `(A, B)`, the output /// trace on `A + B` (elementwise) must equal the elementwise /// sum of the output traces on `A` and `B`. @@ -65,79 +101,89 @@ module LawRunner = (addOut: 'TOut -> 'TOut -> 'TOut) (equalOut: 'TOut -> 'TOut -> bool) : Result = - if samples < 1 then invalidArg "samples" "must be >= 1" - if scheduleLength < 1 then invalidArg "scheduleLength" "must be >= 1" - let rng = System.Random(seed) - let mutable failure : LawViolation option = None - let mutable i = 0 - while failure.IsNone && i < samples do - let traceA = generateTrace rng scheduleLength genInput - let traceB = generateTrace rng scheduleLength genInput - let traceSum = List.map2 addIn traceA traceB - let outA = runTrace makeOp traceA - let outB = runTrace makeOp traceB - let outSum = runTrace makeOp traceSum - // Compare elementwise: op(a_t + b_t) == op(a_t) + op(b_t) - // at every tick t. Short-circuit on first divergence so - // the reported message points at the offending tick. - let mutable tick = 0 - let mutable broke = false - while not broke && tick < scheduleLength do - let lhs = outSum.[tick] - let rhs = addOut outA.[tick] outB.[tick] - if not (equalOut lhs rhs) then - failure <- - Some { Seed = seed - SampleIndex = i - Message = - sprintf + if samples < 1 then + Error (badArgs seed "samples must be >= 1") + elif scheduleLength < 1 then + Error (badArgs seed "scheduleLength must be >= 1") + else + runSamples seed samples (fun rng i -> + let traceA = generateTrace rng scheduleLength genInput + let traceB = generateTrace rng scheduleLength genInput + let traceSum = List.map2 addIn traceA traceB + // Convert outputs to arrays once — tick indexing + // is O(n) on `List.item`, so a List scan would + // be O(scheduleLength²) per sample. + let outA = PluginHarness.runSingleInput makeOp traceA |> List.toArray + let outB = PluginHarness.runSingleInput makeOp traceB |> List.toArray + let outSum = PluginHarness.runSingleInput makeOp traceSum |> List.toArray + let mutable result = None + let mutable tick = 0 + while result.IsNone && tick < scheduleLength do + let lhs = outSum.[tick] + let rhs = addOut outA.[tick] outB.[tick] + if not (equalOut lhs rhs) then + result <- + Some (sprintf "Linearity broke at sample %d, tick %d: \ - op(a+b) ≠ op(a) + op(b)." - i tick } - broke <- true - tick <- tick + 1 - i <- i + 1 - match failure with - | Some v -> Error v - | None -> Ok () + op(a+b) != op(a) + op(b)." + i tick) + tick <- tick + 1 + result) - /// Retraction completeness (Option B, trace-based). - /// Forward-run a generated Z-set trace, then retract each - /// tick's input in the same order. After the round-trip the - /// cumulative output Z-set must be empty — any residual - /// means the operator leaked state through retraction. + /// Retraction completeness via state restoration. + /// A forward trace of random Z-sets is cancelled by its + /// elementwise negation; a continuation trace is then fed + /// to the same op instance, and the continuation outputs + /// are compared to a fresh-op run of the continuation + /// alone. Any divergence means state survived the cancel. /// - /// Works against the existing marker `IStatefulStrictOperator` - /// without interface enrichment (see the design doc for the - /// Option A follow-up path). + /// The earlier "cumulative output = 0" formulation was + /// rejected in review: it passes trivially for empty-emitting + /// ops, it is not the correct law for stateful-strict ops + /// whose outputs are not themselves cancelling (e.g. + /// integration-shaped aggregates), and a pathological op + /// can trivially satisfy it while leaking state. State + /// restoration is the law the tag actually promises. let checkRetractionCompleteness<'TIn, 'TOut when 'TIn : comparison and 'TOut : comparison> (seed: int) (samples: int) (scheduleLength: int) + (continuationLength: int) (makeOp: Stream> -> IOperator>) (genInput: System.Random -> ZSet<'TIn>) : Result = - if samples < 1 then invalidArg "samples" "must be >= 1" - if scheduleLength < 1 then invalidArg "scheduleLength" "must be >= 1" - let rng = System.Random(seed) - let mutable failure : LawViolation option = None - let mutable i = 0 - while failure.IsNone && i < samples do - let forward = generateTrace rng scheduleLength genInput - let retract = forward |> List.map ZSet.neg - let outputs = runTrace makeOp (forward @ retract) - let cumulative = outputs |> List.fold ZSet.add ZSet.empty - if not cumulative.IsEmpty then - failure <- - Some { Seed = seed - SampleIndex = i - Message = - sprintf - "Retraction incomplete at sample %d: after \ - forward+retract round-trip of %d ticks, \ - cumulative output has %d residual entries." - i scheduleLength cumulative.Count } - i <- i + 1 - match failure with - | Some v -> Error v - | None -> Ok () + if samples < 1 then + Error (badArgs seed "samples must be >= 1") + elif scheduleLength < 1 then + Error (badArgs seed "scheduleLength must be >= 1") + elif continuationLength < 1 then + Error (badArgs seed "continuationLength must be >= 1") + else + runSamples seed samples (fun rng i -> + let forward = generateTrace rng scheduleLength genInput + let retract = forward |> List.map ZSet.neg + let continuation = generateTrace rng continuationLength genInput + let fullTrace = forward @ retract @ continuation + let outFull = + PluginHarness.runSingleInput makeOp fullTrace + |> List.toArray + let outFresh = + PluginHarness.runSingleInput makeOp continuation + |> List.toArray + let prefix = scheduleLength + scheduleLength + let mutable result = None + let mutable tick = 0 + while result.IsNone && tick < continuationLength do + let afterCancel = outFull.[prefix + tick] + let fresh = outFresh.[tick] + let diff = ZSet.add afterCancel (ZSet.neg fresh) + if not diff.IsEmpty then + result <- + Some (sprintf + "Retraction incomplete at sample %d, \ + continuation tick %d: state survived the \ + forward+retract cancel (diff has %d \ + residual entries)." + i tick diff.Count) + tick <- tick + 1 + result) diff --git a/src/Core/PluginApi.fs b/src/Core/PluginApi.fs index a0673709..e36eed8d 100644 --- a/src/Core/PluginApi.fs +++ b/src/Core/PluginApi.fs @@ -45,7 +45,10 @@ type OutputBuffer<'TOut> = /// Plugin-author contract for a custom operator with a typed /// output. External plugin libraries implement this to extend /// Zeta's operator catalogue without touching internal scheduler -/// types. +/// types. Conformance to an algebra capability tag +/// (`ILinearOperator`, `IBilinearOperator`, etc.) is verified +/// in the plugin author's test project via `LawRunner` — see +/// `docs/PLUGIN-AUTHOR.md` §"Verifying your plugin's algebra tag". type IOperator<'TOut> = /// Short diagnostic label for the operator instance. Used diff --git a/tests/Tests.FSharp/Plugin/LawRunner.Tests.fs b/tests/Tests.FSharp/Plugin/LawRunner.Tests.fs index 4454de58..46b44dae 100644 --- a/tests/Tests.FSharp/Plugin/LawRunner.Tests.fs +++ b/tests/Tests.FSharp/Plugin/LawRunner.Tests.fs @@ -1,5 +1,6 @@ module Zeta.Tests.Plugin.LawRunnerTests +open System.Collections.Generic open System.Threading.Tasks open Xunit open FsUnit.Xunit @@ -36,11 +37,17 @@ type private SquarerOp(input: Stream) = // ──────────────────────────────────────────────────────────────── -// Retraction-completeness fixtures (ZSet → ZSet) +// Retraction-completeness fixtures +// +// The law checks state restoration: after forward+retract +// cancel, continuation outputs must match a fresh-op run of +// the same continuation. Catches stateful ops that survive +// what was supposed to be a full cancel. // ──────────────────────────────────────────────────────────────── -/// Clean retraction — pure passthrough. Each tick echoes the -/// input Z-set; forward+retract round-trip sums to empty. +/// Stateless echo — trivially passes state restoration +/// because it has no state. Keeps a baseline for the law on +/// clean ops. type private ZSetEchoOp(input: Stream>) = let deps = [| input.AsDependency() |] interface IStatefulStrictOperator, unit, ZSet> with @@ -51,21 +58,37 @@ type private ZSetEchoOp(input: Stream>) = ValueTask.CompletedTask -/// Retraction-lossy — drops negative-weight entries. Under a -/// forward+retract round-trip the retracted entries never make -/// it through, so cumulative output ≠ empty. Falsely tagged -/// `IStatefulStrictOperator` so the law runner catches the lie. -type private PositiveOnlyOp(input: Stream>) = +/// Genuinely stateful *and* retraction-lossy — a floored per- +/// key counter. Accumulates weights but refuses to go below +/// zero; a `-1` on a key at count 0 is silently dropped. Under +/// forward+retract the dropped decrements leave residual +/// positive state that survives the cancel, so any continuation +/// input sees the leaked state and diverges from a fresh op. +/// Falsely tagged `IStatefulStrictOperator` so the law catches +/// it. +type private FlooredCounterOp(input: Stream>) = let deps = [| input.AsDependency() |] - interface IStatefulStrictOperator, unit, ZSet> with - member _.Name = "positive-only-liar" + let state = Dictionary() + interface IStatefulStrictOperator, Dictionary, ZSet> with + member _.Name = "floored-counter-liar" member _.ReadDependencies = deps member _.StepAsync(out, _ct) = - let positives = - input.Current.AsSpan().ToArray() - |> Array.filter (fun e -> e.Weight > 0L) - |> Array.map (fun e -> e.Key, e.Weight) - out.Publish (ZSet.ofSeq positives) + let delta = ResizeArray() + let span = input.Current.AsSpan() + for i in 0 .. span.Length - 1 do + let k = span.[i].Key + let w = span.[i].Weight + let current = + match state.TryGetValue k with + | true, v -> v + | _ -> 0L + let proposed = current + w + let applied = if proposed < 0L then 0L else proposed + let emitted = applied - current + if applied = 0L then state.Remove k |> ignore + else state.[k] <- applied + if emitted <> 0L then delta.Add (k, emitted) + out.Publish (ZSet.ofSeq delta) ValueTask.CompletedTask @@ -79,7 +102,8 @@ let private genZSet (rng: System.Random) : ZSet = let n = rng.Next(0, 4) [ for _ in 1 .. n -> let k = rng.Next(0, 5) - // Weight in [-3, 3] \ {0} — both polarities appear. + // Exclude zero weight so both polarities are represented + // without burning a generator draw on a no-op. let mutable w = rng.Next(-3, 4) if w = 0 then w <- 1 (k, int64 w) ] @@ -94,14 +118,9 @@ let private genZSet (rng: System.Random) : ZSet = let ``checkLinear passes on a genuine linear op`` () = let result = LawRunner.checkLinear - 42 // seed - 20 // samples - 8 // scheduleLength + 42 20 8 (fun s -> DoublerOp s :> IOperator) - genInt - (+) - (+) - (=) + genInt (+) (+) (=) match result with | Ok () -> () | Error v -> Assert.Fail (sprintf "expected Ok, got %A" v) @@ -111,14 +130,9 @@ let ``checkLinear passes on a genuine linear op`` () = let ``checkLinear catches a falsely-tagged non-linear op`` () = let result = LawRunner.checkLinear - 42 - 20 - 8 + 42 20 8 (fun s -> SquarerOp s :> IOperator) - genInt - (+) - (+) - (=) + genInt (+) (+) (=) match result with | Ok () -> Assert.Fail "expected linearity violation" | Error v -> @@ -129,8 +143,7 @@ let ``checkLinear catches a falsely-tagged non-linear op`` () = [] let ``checkLinear reproduces bit-exact on the same seed`` () = let run () = - LawRunner.checkLinear - 99 10 5 + LawRunner.checkLinear 99 10 5 (fun s -> SquarerOp s :> IOperator) genInt (+) (+) (=) let first = run () @@ -138,17 +151,26 @@ let ``checkLinear reproduces bit-exact on the same seed`` () = first |> should equal second +[] +let ``checkLinear returns Error on bad samples arg`` () = + let result = + LawRunner.checkLinear 0 0 1 + (fun s -> DoublerOp s :> IOperator) + genInt (+) (+) (=) + match result with + | Ok () -> Assert.Fail "expected bad-args Error" + | Error v -> v.Message |> should haveSubstring "samples" + + // ──────────────────────────────────────────────────────────────── // Retraction completeness // ──────────────────────────────────────────────────────────────── [] -let ``checkRetractionCompleteness passes on a clean-retracting op`` () = +let ``checkRetractionCompleteness passes on a stateless echo`` () = let result = LawRunner.checkRetractionCompleteness - 7 // seed - 15 // samples - 6 // scheduleLength + 7 15 6 4 (fun s -> ZSetEchoOp s :> IOperator>) genZSet match result with @@ -157,14 +179,25 @@ let ``checkRetractionCompleteness passes on a clean-retracting op`` () = [] -let ``checkRetractionCompleteness catches a retraction-lossy op`` () = +let ``checkRetractionCompleteness catches a stateful retraction-lossy op`` () = let result = LawRunner.checkRetractionCompleteness - 7 15 6 - (fun s -> PositiveOnlyOp s :> IOperator>) + 7 15 6 4 + (fun s -> FlooredCounterOp s :> IOperator>) genZSet match result with | Ok () -> Assert.Fail "expected retraction-completeness violation" | Error v -> v.Seed |> should equal 7 v.Message |> should haveSubstring "Retraction incomplete" + + +[] +let ``checkRetractionCompleteness reproduces bit-exact on the same seed`` () = + let run () = + LawRunner.checkRetractionCompleteness 13 8 5 3 + (fun s -> FlooredCounterOp s :> IOperator>) + genZSet + let first = run () + let second = run () + first |> should equal second From 602ac50089ba7c762f4dff3d72c3ae14172c94ec Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Sat, 18 Apr 2026 15:28:41 -0400 Subject: [PATCH 4/4] =?UTF-8?q?Round=2028=20close=20+=20round=2029=20open?= =?UTF-8?q?=20=E2=80=94=20CI=20anchor=20committed?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 28 anchor (FsCheck law runner) shipped across 3 commits on this branch; this commit is the round-close bookkeeping + the round-29 anchor commitment. - docs/CURRENT-ROUND.md resets for round 29 with CI pipeline as the anchor; round-28 deliverables summarised; carryover split between DEBT-tracked law-runner follow-ups, product law- coverage (bilinear / sink-terminal / Option-A promotion), and deferred items from round 27. - docs/ROUND-HISTORY.md — round 28 narrative prepended (anchor, reviewer-floor catch, Option-A-vs-B decision, lean cleanup, soft-claim retracted). - docs/WINS.md — three round-28 wins: reviewer-floor paying on first applicable round, Option-B-with-planned-followups over verbal agreement, deterministic-simulation as contract not decoration. - docs/BACKLOG.md — new P0 entry for CI / build-machine setup with the full discipline rules Aaron committed: * `../scratch` + `../SQLSharp` are read-only references — never copy files, hand-craft every artefact * Aaron reviews every CI design decision before landing * Cost discipline on CI minutes; narrow default matrix * macOS + Linux first; Windows when justified * Product + CI work parallelisable on one machine Sub-task sequence: build-machine-setup audit → ci-workflow-design audit → gate inventory → first workflow (build-and-test), each its own Aaron review gate. AlloyRunner.java confirmed load-bearing: Alloy is a JVM tool with no convenient CLI, driven from F# tests via `javac` + `java`. Keep. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/BACKLOG.md | 70 +++++++++++++++ docs/CURRENT-ROUND.md | 203 +++++++++++++++++++++++++----------------- docs/ROUND-HISTORY.md | 117 ++++++++++++++++++++++++ docs/WINS.md | 61 +++++++++++++ 4 files changed, 369 insertions(+), 82 deletions(-) diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md index 02e82b56..2abe062f 100644 --- a/docs/BACKLOG.md +++ b/docs/BACKLOG.md @@ -93,6 +93,76 @@ within each priority tier. side (Window.fs wiring pending). Target: measured numbers in `docs/BENCHMARKS.md` by end of round 20. +## P0 — CI / build-machine setup (round-29 anchor) + +- [ ] **First-class CI pipeline for Zeta.** Every agent-written + commit eventually has to pass the same gate a human commit + does; right now the only gate is `dotnet build -c Release` + on the maintainer's laptop. Aaron's framing (round 28): + + > "Our CI setup is as first class for this software factory + > as is the agents themselves, it does not ultimately work + > without both." + + Discipline rules committed up front: + + 1. **Read-only reference.** `../scratch` is the model for + build-machine setup (scripts, installers, install + locations, tool pins). `../SQLSharp` is the model for + GitHub Actions workflows. **Never copy files.** Read to + understand the shape and the intent; hand-craft every + artefact from scratch for Zeta so no cruft or + assumptions from another repo's context sneaks in. + 2. **Human review on every decision.** Aaron reviews the + OS matrix, tool versions, caching strategy, artefact + retention, secret handling, permissions model, workflow + triggers, and the per-job concurrency/timeout settings + before any workflow lands. This is *not* a place for + "ship and iterate". + 3. **Cost discipline.** CI minutes are the expensive + resource. Every job earns its slot: justify any + matrix-axis expansion, any scheduled run, any + always-on-PR gate. Default to narrow (one OS, one + dotnet, current branch only) and widen only with a + stated reason. + 4. **Cross-platform, eventually.** Zeta is cross-platform + (.NET 10, macOS arm64 dev box, Linux CI, Windows + supported). Aaron can run rounds on Windows when + Windows-specific work is under way; say so when useful. + CI matrix should cover at least macOS + Linux; Windows + gets added once we have a Windows-breaking test to + justify the slot. + 5. **Parallelism note.** Product work and CI work run on + one maintainer machine but rarely interfere at the + shell level. Architect is free to dispatch CI-design + subagents in parallel with product work as long as + neither agent writes the same file. + + **First sub-tasks (round-29 anchor; each its own Aaron + review gate):** + + 1. Audit `../scratch` for install-script patterns (what + tools, what versions, what pinning method, what target + paths). Output: a design doc at + `docs/research/build-machine-setup.md` citing every + borrowed idea and nothing else. No file copies. + 2. Audit `../SQLSharp` `.github/workflows/` for workflow + shape (triggers, jobs, matrix, caching, permissions). + Output: a design doc at + `docs/research/ci-workflow-design.md`. + 3. Map Zeta's actual gate list: `dotnet build`, + `dotnet test`, Semgrep, Alloy (needs JDK), Lean proofs + (needs elan/lake), TLC (needs JDK + tla2tools.jar), + Stryker (slow — scheduled not per-PR), FsCheck tests. + Output: a gate inventory at + `docs/research/ci-gate-inventory.md`. + 4. First workflow: `build-and-test.yml` covering + `dotnet build -c Release` + `dotnet test Zeta.sln -c + Release` on `ubuntu-latest` + `macos-latest`. Nothing + else until Aaron has signed off on the gate list. + 5. Subsequent workflows added one at a time, each with + explicit Aaron sign-off on the design doc first. + ## P0 — security / SDL artifacts - [ ] **`docs/security/CRYPTO.md`** — justify CRC32C (integrity, not diff --git a/docs/CURRENT-ROUND.md b/docs/CURRENT-ROUND.md index 3130e915..5948b2f8 100644 --- a/docs/CURRENT-ROUND.md +++ b/docs/CURRENT-ROUND.md @@ -1,94 +1,133 @@ -# Current Round — 28 (open) +# Current Round — 29 (open) -Round 27 closed; narrative absorbed into -`docs/ROUND-HISTORY.md`. Round 28 opens with these -deferred items carried over from round 27: +Round 28 closed; narrative absorbed into +`docs/ROUND-HISTORY.md`. Round 29 opens with **CI setup as +the anchor** — see `docs/BACKLOG.md` §"P0 — CI / +build-machine setup" for the full discipline rules and +sub-tasks. ## Status -- **Round number:** 28 -- **Opened:** 2026-04-18 (continuous from round-27 close) -- **Classification:** split — non-trivial P1 carryover - from reviewer pass + other deferred work. -- **Reviewer budget:** 2-3 per §13 + mandatory Kira+Rune - floor per §20 on any code-landing phase. - -## Carried from round 27 - -**Reviewer P1 findings (Kira + Rune, logged to DEBT):** -- `OutputBuffer` tick-stamp + invalidate-on-tick-end. -- `ReadDependencies` defensive copy at registration. -- `box plugin` triple-evaluation → `let boxed = box plugin` once. -- `BayesianRateOp` `int64` accumulator → `Checked.(+)` or saturate. -- `INestedFixpointParticipant` inherits `IOperator<'TOut>`. -- `PluginHarness` id-space via `Int32.MinValue`-range synthetics. -- `IOperator<'T>` → `IZetaOperator<'T>` rename (before external adoption). -- `Op<'T>.Value` mixed-accessibility hover-doc pointing at `OutputBuffer.Publish`. -- `PluginApi.fs` split when >300 lines. -- PLUGIN-AUTHOR.md `[]` explanation in sample. -- Extract `internal assignHarnessId` helper shared by `Circuit.Build` + `PluginHarness`. - -**Design work still open:** +- **Round number:** 29 +- **Opened:** 2026-04-18 (continuous from round-28 close) +- **Classification:** infrastructure round — CI pipeline + design is the anchor; product work (bilinear / sink- + terminal laws, Option-A stateful promotion) runs in + parallel where it doesn't block on Aaron's CI-design + review gates. +- **Reviewer budget:** Kira + Rune floor per GOVERNANCE.md + §20 on every code landing. Aaron personally reviews + every CI-decision artefact. + +## Round 28 close — what landed + +Anchor goal achieved: FsCheck law runner live as a test- +time library. + +- `LawRunner.checkLinear` — generic additivity check; works + over `ZSet` and plain numerics via user-supplied `add`/ + `equal` callbacks. +- `LawRunner.checkRetractionCompleteness` — Option B + (trace-based), state-restoration via continuation after + reviewer P0 rewrite. Catches retraction-lossy stateful + ops (tested against a floored-counter fixture). +- Per-sample `System.Random(seed + i)` for true bit-exact + reproducibility of `(seed, sampleIndex)`. +- Deterministic-simulation framing locked in + `docs/research/stateful-harness-design.md`; Option A + (enrich `IStatefulStrictOperator` with `Init`/`Step`/ + `Retract` triple matching the DBSP paper's `(σ, λ, ρ)` + shape) is the planned additive promotion in round-30+. +- Scaffolding cleanup: `tools/lean4/` `lake new` leftovers + removed; `Lean4.lean` rewired to import the real + `DbspChainRule` proof file. + +## Round 29 anchor — CI pipeline setup + +**Discipline rules (committed up front; `docs/BACKLOG.md` +has the full text):** + +1. Read `../scratch` (build-machine setup) and + `../SQLSharp` (GitHub workflows) for shape + intent. + **Never copy files.** Hand-craft every artefact. +2. Aaron reviews every CI design decision before it lands. + This is not a "ship and iterate" surface. +3. Cost discipline: every CI minute earns its slot; default + to narrow matrix and widen with a stated reason. +4. Cross-platform eventual: macOS + Linux first; Windows + when there's a Windows-breaking test to justify it. + Aaron can run rounds on Windows on request. +5. Product work and CI work run in parallel on the same + machine; dispatch research subagents for CI design + concurrently with product work as long as neither + writes the same file. + +**Sub-task sequence (each its own Aaron review gate):** + +1. Audit `../scratch` → `docs/research/build-machine-setup.md`. +2. Audit `../SQLSharp/.github/workflows/` → + `docs/research/ci-workflow-design.md`. +3. Gate inventory → `docs/research/ci-gate-inventory.md`. +4. First workflow: `build-and-test.yml` covering + `dotnet build -c Release` + `dotnet test Zeta.sln -c + Release` on `ubuntu-latest` + `macos-latest`. Nothing + else until Aaron signs off on the gate list. +5. Subsequent workflows land one at a time, each with an + explicit design doc and sign-off. + +## Carried from round 28 + +**Law runner follow-ups (DEBT-tracked):** +- `check*` take 8-11 positional args → promote to config + record before `checkBilinear` lands. +- `LawViolation.Message` → structured DU. +- Test covering ops that omit the marker tag. + +**Law coverage (product work; can run in parallel with CI):** +- `checkBilinear` — join-shaped ops; standard DBSP + incrementalisation form. +- `checkSinkTerminal` — Sink-tagged ops must not compose + into a relational path. +- Option-A promotion of `IStatefulStrictOperator` to + explicit `(σ, λ, ρ)` triple — round-30+ unless CI work + stalls on a review gate, in which case advance here. + +**Open from round 27 (deferred to round-30+ pool):** - `IsDbspLinear` Lean predicate + B1/B2/B3/chain_rule - closures (Tariq option-c; half-day B2, two days full). -- FsCheck law runner at `Circuit.Build()` per capability - tag — unblocks PLUGIN-AUTHOR.md's soft-claim in "Known - limits of round-27." - -**Persona-notes migration tail:** -- 6 remaining persona notebooks still in single-file - layout (`public-api-designer.md`, `skill-tune-up-ranker.md`, - `best-practices-scratch.md`, `algebra-owner.md`, - `formal-verification-expert.md`, `agent-experience-researcher.md`). - Lazy migration per §21; convert when a persona next - writes a typed memory entry. - -**Other deferred:** -- Rune on `docs/STYLE.md` decision (small). -- UX + DX persona proposals. -- Empathy-coach persona spawn (naming pending). - -## Workflow cadence (established round-26, codified round-27) - -- Each round runs on its own branch (`round-N`). -- Coherent changes within a round become separate commits - where it helps readability. -- Round-close = PR from `round-N` to `main` + merge. -- Reviewer pass per §20 before round-close. -- Maintainer may request a review pass on the branch diff - before merge; ask before pushing the merge. + closures (Tariq option-c). +- Reviewer P1 list: `OutputBuffer` tick-stamp, + `ReadDependencies` defensive copy, BayesianRate + `Checked.(+)`, `IOperator` → `IZetaOperator` rename + window, `PluginApi.fs` split when >300 lines. ## Open asks to the maintainer -- **NuGet prefix reservation** on `nuget.org` for - `Zeta.*` — maintainer owns. -- **`global.json` `rollForward`** — status quo vs relaxed - (silent-pick status quo unless objection). -- **Eval-harness MVP scope** — still pending since round - 23. -- **Repo visibility** — currently private on AceHack; - flip to public when ready. - -## Next architect actions - -1. Open `round-28` branch off `main` once round-27 PR - merges. -2. Anchor choice: **FsCheck law runner** (unblocks - plugin-author trust + validates Tariq's design) or - **OutputBuffer tick-stamp** (closes Kira's P0 that - DEBT'd this round). Recommend law runner — larger - impact, removes the soft-claim from PLUGIN-AUTHOR.md. -3. Dispatch code-phase reviewer floor (Kira + Rune) per - §20 on any code that lands. +- **Aaron decisions blocking round-29 progress:** + - Sign-off on the build-machine-setup design doc (first + sub-task). + - Sign-off on the CI-workflow design doc (second sub- + task). + - Sign-off on the gate inventory (third sub-task). + - Sign-off on the first workflow's OS matrix and dotnet + pin (fourth sub-task). +- **NuGet prefix reservation** on `nuget.org` for `Zeta.*` + — still maintainer-owned. +- **`global.json` `rollForward`** — status quo vs relaxed. +- **Eval-harness MVP scope** — pending since round 23. +- **Repo visibility** — private on AceHack; flip to public + when ready. ## Notes for the next Kenji waking -- `memory/` is canonical shared memory (not the - sandbox). See GOVERNANCE.md §18 + §22. -- `memory/persona//` is per-persona memory. - Kenji's seat already folder-migrated; others lazy. -- Reviewer pass per §20 is mandatory every code-landing - round. Kira + Rune is the floor. -- Public API changes go through Ilyana per §19. +- `memory/` is canonical shared memory; `memory/persona/ + /` is per-persona (name-keyed). +- Reviewer pass per GOVERNANCE.md §20 is mandatory every + code-landing round. Kira + Rune is the floor. +- Public API changes go through Ilyana per GOVERNANCE.md + §19. - `~/.claude/projects/` is Claude Code sandbox, not git. - Do not cite as canonical (§22). + Do not cite as canonical (GOVERNANCE.md §22). +- **CI decisions need Aaron sign-off before landing** — + round-29 discipline rule. +- `../scratch` and `../SQLSharp` are **read-only references** + — never copy files. diff --git a/docs/ROUND-HISTORY.md b/docs/ROUND-HISTORY.md index 93a4af73..7b24482e 100644 --- a/docs/ROUND-HISTORY.md +++ b/docs/ROUND-HISTORY.md @@ -9,6 +9,123 @@ New rounds are appended at the top. --- +## Round 28 — FsCheck law runner (Option B), stateful-harness design doc, lean4 cleanup + +### Anchor — FsCheck law runner at plugin-law surface + +Round 28's committed anchor from `CURRENT-ROUND.md` round-27 +close was a law runner that could catch a falsely-tagged +plugin operator. Design call early in the round: move it +from `Circuit.Build()` gate (the round-27 soft-claim in +`docs/PLUGIN-AUTHOR.md`) to a **test-time library** — +`Zeta.Core.LawRunner`. Rationale committed: keeps `Core` +free of FsCheck, defers probabilistic-testing cost to the +plugin author's test project, idiomatic F#. + +**Live laws:** +- `LawRunner.checkLinear` — generates trace pairs `(A, B)` + via a `System.Random -> 'TIn` generator, asserts + `op(A + B) = op(A) + op(B)` tick-by-tick using user- + supplied `addIn` / `addOut` / `equalOut`. Works for + `ZSet<'T>` (via `ZSet.add`) and plain numerics. +- `LawRunner.checkRetractionCompleteness` — Option B + (trace-based, no interface change). State-restoration + via continuation: feed `forward ++ retract ++ + continuation`, compare continuation outputs to a fresh- + op run of the continuation alone. Any divergence means + state survived the cancel. + +Deterministic-simulation framing throughout: one seeded +`System.Random(seed + sampleIndex)` per sample, failing +run reports `(seed, sampleIndex)`, re-run reproduces +bit-exact. Each `runSingleInput` call creates a fresh op +instance so state cannot leak across samples. + +**Design doc.** `docs/research/stateful-harness-design.md` +captures the build-vs-test decision, the Option A vs +Option B analysis, and the sequenced follow-up plan. +Aaron's question mid-round — "what does Option A get us +for the future?" — produced an explicit long-term +recommendation baked into the doc: **Option A is the right +long-term direction** because it matches the DBSP paper's +`(σ, λ, ρ)` triple (the same shape +`tools/lean4/Lean4/DbspChainRule.lean` proves against) and +unlocks generic WDC checkpointing + planner fusion of +adjacent stateful ops. **Option B is the right first step** +because Option A needs real design work on the async path +and retraction contract, and prototyping A in a vacuum +would ship a contract we would need to revise. Option B +also stays as a fallback for plugins that cannot expose +`Init`/`Step`/`Retract` (ML wrappers, third-party system +integrations). + +### Reviewer floor — GOVERNANCE.md §20 caught real bugs + +Kira + Rune dispatched after the first LawRunner landing. +Kira P0: original retraction law ("cumulative output over +forward+retract = 0") is too weak — passes trivially for +empty-emitting ops and a floored-counter can leak state +while keeping cumulative zero. Law rewritten to state- +restoration via continuation (the law the tag actually +promises), test fixture replaced: `FlooredCounterOp` +(genuinely stateful and lossy) supersedes the mistagged +`PositiveOnlyOp` filter (which was non-linear, not a +retraction fixture). Second Kira P0: `invalidArg` in Core +public surface violates CLAUDE.md's result-over-exception +rule — all entries now return `Result`. +Third Kira P0: whole-loop RNG meant `(seed, sampleIndex)` +didn't actually reproduce; fixed with per-sample +`System.Random(seed + i)`. + +Rune P1 findings logged to DEBT: `check*` 8-11 positional +args → promote to config record before `checkBilinear` +lands; `LawViolation.Message: string` → structured DU; add +a test covering ops that omit the marker tag. + +Lesson carried forward: **reviewer floor is not a +formality.** Kira's P0 on retraction semantics was a real +bug in the law definition, not a style nit. The round 27 +codification of Kira + Rune as mandatory per §20 paid off +on its first applicable round. + +### Lean4 scaffolding cleanup + +`lake new` boilerplate removed from `tools/lean4/`: +`README.md` (GitHub-Pages setup template), `Basic.lean` +(hello-world sample), `.github/workflows/` (upstream Lean +CI templates), redundant `.gitignore` (root `.gitignore` +already covers `.lake/`). `Lean4.lean` rewired to `import +Lean4.DbspChainRule` so `lake build` walks the real proof +file. Load-bearing pieces kept: `lakefile.toml`, +`lake-manifest.json`, `lean-toolchain`. Sample deletion +flagged by Aaron: "that was a sample hello world [lake +scaffold] project and we don't need the [Lake] project +itself we are calling it from dotnet." + +### PLUGIN-AUTHOR.md soft-claim retracted + +Round 27's "law verification at `Circuit.Build()` once the +FsCheck generators are implemented" was honest-but- +aspirational. Round 28's doc now points at +`LawRunner.checkLinear` / `checkRetractionCompleteness` as +live, `checkBilinear` / `checkSinkTerminal` flagged as +round-29+ work. + +### Round 29 anchor — CI pipeline + +Aaron (round 28 close): "Our CI setup is as first class +for this software factory as is the agents themselves, it +does not ultimately work without both." Round 29 opens +with CI as the anchor; `../scratch` + `../SQLSharp` are +**read-only reference** repos (never copy files, hand- +craft every artefact from scratch). Discipline rules +committed in `docs/BACKLOG.md` §"P0 — CI / build-machine +setup": Aaron reviews every design decision before it +lands, cost discipline on CI minutes, cross-platform +eventual (macOS + Linux first, Windows when justified). + +--- + ## Round 27 — big round: governance §20-§22, plugin API redesign landed, memory moved in-repo ### Shipped — governance rules (§20 / §21 / §22) diff --git a/docs/WINS.md b/docs/WINS.md index 0ce80a1c..bba3f48d 100644 --- a/docs/WINS.md +++ b/docs/WINS.md @@ -11,6 +11,67 @@ shipped." **Ordered newest-first** — recent rounds lead, older rounds trail below. Entries stay even after the moment passes, because the pattern is the value. +## Wins — round 28 + +### Reviewer floor caught a P0 in the law definition itself + +Kira's harsh-critic pass on the first `LawRunner` landing +didn't flag style — it flagged the retraction law's +semantics. The original "cumulative output over +forward+retract = 0" formulation passes trivially for +empty-emitting ops and a floored-counter can leak state +while satisfying it. The rewritten law (state-restoration +via continuation) is what the `IStatefulStrictOperator` +tag actually promises. Without the round-27 codification +of Kira + Rune as mandatory per GOVERNANCE.md §20, the +wrong law would have landed as canonical and the test +fixture (`PositiveOnlyOp` — non-linear, not even a +retraction fixture) would have silently misled the next +plugin author. + +**What it teaches.** Reviewer-floor-as-rule pays the first +time a round produces novel semantics. The cost of +dispatching two reviewers on a 130-line file is hours of +future confusion averted. §20 is load-bearing, not +ceremonial. + +### Option B with a planned Option A path, not "ship and pray" + +Mid-round Aaron asked "what does Option A get us for the +future, is it the right long-term decision?" The answer +shipped as part of the design doc, not as verbal +agreement: Option A matches the DBSP paper's `(σ, λ, ρ)` +triple and unlocks generic WDC checkpointing + planner +fusion, so it's the right long-term direction — but Option +B is the right first step because prototyping A in a +vacuum would ship a contract we'd revise. The document +names the sequenced round-29+ follow-ups explicitly. + +**What it teaches.** When a design question has a future, +write the future down. "Decided Option B" is a weaker +commit than "Option B with these specific round-29+ +follow-ups and this specific rationale for why not A +today." The latter survives the next Kenji waking. + +### Deterministic simulation as the harness framing, not an afterthought + +Aaron's mid-round "I'm assuming this is for deterministic +simulation?" re-shaped the design doc. Rather than a +loose "tests should be reproducible" intent, the doc +encodes the constraint: one seeded RNG per sample (not +per whole loop), no wall-clock, total tick order, failing +run prints seed + sample index, fresh op instance per +sample so state can't leak across samples. The +implementation followed the framing; per-sample RNG was a +Kira P0 because the initial implementation had drifted +from the doc and the doc was the source of truth. + +**What it teaches.** Framing labels ("deterministic +simulation", "result-over-exception") are contracts, not +decoration. When the framing and the code disagree, the +framing wins and the code gets rewritten — not the other +way round. + ## Wins — round 27 ### 1. Reviewer cadence caught two P0s in the same round the code landed