fix(stryker-iteration-2): retarget to Core.CSharp.csproj (real C# tests) — PR #4570 merged with 0% kill-rate target#4571
Conversation
…ts) — supersedes #4570's 0%-kill-rate target on Core.CSharp.ZetaId.csproj which had zero direct .NET test references; cross-verify harness is TS not xUnit so doesn't count as Stryker test-project; make-due now with what has tests + big plan author C# ZetaId xUnit later
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 60da640386
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
60da640 to
d369aa6
Compare
There was a problem hiding this comment.
Pull request overview
Retargets Stryker.NET mutation testing from the previously selected C# project that had no direct xUnit oracle to the actively-tested C# Core surface, so mutation runs can produce a meaningful kill-rate in CI.
Changes:
- Update
stryker-config.jsonto targetsrc/Core.CSharp/Core.CSharp.csprojand usetests/Core.CSharp.Tests/Core.CSharp.Tests.csprojas the test oracle. - Update the Stryker GitHub Actions workflow docstring, job name, and path filters to reflect the new target.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| stryker-config.json | Retarget Stryker project + test-projects to the C# Core surface with active xUnit tests. |
| .github/workflows/stryker-mutation.yml | Update workflow documentation and triggers for the new Stryker target/oracle. |
Comments suppressed due to low confidence (2)
.github/workflows/stryker-mutation.yml:30
- The workflow comment mentions "Soraya loop"; persona names shouldn’t appear in current-state repo surfaces like workflows. Please replace the persona name with the appropriate role-reference (e.g., formal-verification loop) while keeping the B-0691 reference.
#
# F# coverage remains rich via other formal-verification tools: FsCheck
# (property invariants), TLA+/TLC (concurrency + safety via tools/tla/specs/
# + Soraya loop B-0691), Lean 4 (theorem-grade), Z3 (constraint satisfaction),
.github/workflows/stryker-mutation.yml:88
- push.paths looks out of sync with the new Stryker target/test oracle: it includes src/Core.CSharp.ZetaId/** and tests/Tests.CSharp/, but the Stryker config now targets src/Core.CSharp/ and tests/Core.CSharp.Tests/**. As written, pushes to main that only change Core.CSharp or Core.CSharp.Tests may not trigger this workflow.
- ".github/workflows/stryker-mutation.yml"
push:
branches: [main]
paths:
- "src/Core.CSharp.ZetaId/**"
- "src/Core/**"
- "tests/Tests.CSharp/**"
- "tests/cross-verification/zeta-id/**"
…(Codex P1 — was still on iteration-1 paths); (b) clarify 'directly' not 'transitively' for Core.CSharp.Tests→Core.fsproj (Copilot nit); (c) drop break-threshold to 0 (observational mode) — empirical kill-rate 0% on Core.CSharp.csproj (4 mutants; 8 tests don't exercise mutation surface); workflow still runs + records signal + uploads HTML; threshold auto-reactivates when tests land + break is raised; NOT continue-on-error suppression
…docstring math corrections Addresses 3 reviewer findings on PR #4563: 1. **Codex P1** (PluginHarness.fs): `runTwoInputs` (and `runSingleInput`, pre-existing same gap) advanced plugin operators with `StepAsync` but never called `AfterStepAsync` — so any plugin implementing `IStrictOperator` exercised semantics differing from `Circuit.Step/StepAsync`, with strict state never committing between ticks. `LawRunner.checkBilinear` on strict ops would silently validate against incorrect state. **Fix**: mirror the Circuit's post-step hook — call `AfterStepAsync` after each tick's `StepAsync` completes. Same fix applied to both `runSingleInput` and `runTwoInputs` (pattern parity). 2. **Copilot** (LawRunner.fs:158): docstring described the `(addOut, negOut)` pair as a "monoid" with non-inverse `negOut` — monoids by definition lack inverses, so the example was malformed. **Fix**: rephrase to "caller-supplied pair that doesn't actually form an abelian group" (negOut might not be a true inverse; operations might not be associative/commutative; hidden state might creep in). Same point, correct algebra. 3. **Copilot** (LawRunner.Tests.fs:218 + 248): the comment block claimed *"The interesting failure case is L3: a plugin can satisfy L1+L2 while smuggling an additive offset (op(a,b) = a*b + c)"* — but over the abelian-group output type (`int`), any constant offset breaks L1 first (constant lands once on LHS, twice on RHS). The framing was mathematically incorrect. **Fix**: reword both the bilinearity- fixtures intro comment AND the `AffineBilinearLiar` fixture docstring to reflect that L1 trips first over abelian groups; L3 is the load-bearing law only over non-abelian-group output types. Aligns with the math note already in `LawRunner.checkBilinear`'s docstring. ## Tests 18/18 LawRunner + Harness tests pass after fixes. Build clean. No behavioral changes to test outputs — the AfterStepAsync addition is load-bearing for strict ops but no existing test exercises a strict bilinear op (BilinearMultOp / LinearOffsetLiar / AffineBilinearLiar are all non-strict). ## Review-thread resolution These 3 threads on #4563 close with this commit. Remaining unresolved state on #4563 (Stryker.NET F# unsupported error) is workflow-side and addressed by Otto-CLI's PR #4570 + #4571 retarget work.
…oInputs (#4563) * feat(core): LawRunner.checkBilinear + PluginHarness.runTwoInputs — PR 3 of 8 Closes the algebra-tag verification gap on `IBilinearOperator`. Until this PR, `LawRunner` had `checkLinear` and `checkRetractionCompleteness` but no `checkBilinear` — meaning the 3-term incremental-join rewrite in `Incremental.IncrementalJoin` trusted the bilinear tag without any test-time way to verify it. ## New surface ### `PluginHarness.runTwoInputs` Drives a two-input plugin operator through paired input sequences in lock-step (`Seq.zip` semantics — shorter sequence wins). Two `HarnessSourceOp` sources at synthetic ids 0+1, adapter at id 2; mirrors the `runSingleInput` shape so the per-tick publish-counter discipline is identical. ### `LawRunner.checkBilinear` Tests three sub-properties per sample: L1 — `op(a₁+a₂, b) ≡ op(a₁, b) + op(a₂, b)` (left-linearity) L2 — `op(a, b₁+b₂) ≡ op(a, b₁) + op(a, b₂)` (right-linearity) L3 — `op(-a, b) ≡ -op(a, b)` (sign-distribution) The check generates four independent traces (A₁, A₂, B₁, B₂), runs the operator through six combinations (A₁B₁, A₂B₁, ASumB₁, A₁B₂, A₁BSum, ANegB₁), and checks the three laws per tick. Same `(seed, sampleIndex)`-reproducibility shape as `checkLinear`; same `Result<unit, LawViolation>` return type. ## Math note on L3 (corrected from initial sketch) Over an abelian group with standard addition (the case for `int` and `ZSet<'K>`), L1 + L2 *imply* L3: setting `a₁ = a, a₂ = -a` in L1 gives `op(0, b) = op(a, b) + op(-a, b)`, so the classical bilinear condition `op(0, b) = 0` collapses to L3. In that regime L3 is the cleanup law — an affine offset like `op(a, b) = a*b + c` breaks L1 first (the constant lands once on LHS, twice on RHS), so the L3-only failure mode doesn't exist over `int`. L3 becomes load-bearing — not redundant — when the user supplies a non-abelian-group `(addOut, negOut)` pair (e.g. a monoid where `negOut` isn't truly inverse). Checking all three keeps `checkBilinear` correct across the full range of `'TOut` algebras a plugin author might supply, not just `int` / `ZSet<_>`. ## Tests (6 new in LawRunner.Tests.fs, 15/15 total pass) - `BilinearMultOp` (genuine integer multiplication): passes - `LinearOffsetLiar` (`op(a,b) = (a+b)*2`): catches L1 violation - `AffineBilinearLiar` (`op(a,b) = a*b + 7`): catches L1 (the L3 failure that I'd originally claimed was unique to this fixture doesn't exist over `int`; the docstring + test docstring now explain the math) - Reproducibility on same seed - Bad-args validation (samples and scheduleLength) - `runTwoInputs` lock-step behavior + truncation to shorter input 15/15 LawRunner tests pass, no regressions in broader plugin tests. Build clean. ## Independence PR 3 does NOT depend on PR 1 (#4558) or PR 2 (#4560) — `checkBilinear` verifies bilinearity by trace-comparison, not by reading `Op.IsBilinear`. This means PR 3 can ship orthogonally; targets `main` directly. ## Foundation for later PRs PR 4's `IncrementalAuto` dispatcher will read `Op.IsBilinear` (from PR 1) to pick the three-term-bilinear rewrite. A debug-mode build hook (planned for PR 4 or later) can run `checkBilinear` against any operator claiming the tag, catching tag-vs-implementation drift early. * fix(PR #4563 review threads): AfterStepAsync hook in PluginHarness + docstring math corrections Addresses 3 reviewer findings on PR #4563: 1. **Codex P1** (PluginHarness.fs): `runTwoInputs` (and `runSingleInput`, pre-existing same gap) advanced plugin operators with `StepAsync` but never called `AfterStepAsync` — so any plugin implementing `IStrictOperator` exercised semantics differing from `Circuit.Step/StepAsync`, with strict state never committing between ticks. `LawRunner.checkBilinear` on strict ops would silently validate against incorrect state. **Fix**: mirror the Circuit's post-step hook — call `AfterStepAsync` after each tick's `StepAsync` completes. Same fix applied to both `runSingleInput` and `runTwoInputs` (pattern parity). 2. **Copilot** (LawRunner.fs:158): docstring described the `(addOut, negOut)` pair as a "monoid" with non-inverse `negOut` — monoids by definition lack inverses, so the example was malformed. **Fix**: rephrase to "caller-supplied pair that doesn't actually form an abelian group" (negOut might not be a true inverse; operations might not be associative/commutative; hidden state might creep in). Same point, correct algebra. 3. **Copilot** (LawRunner.Tests.fs:218 + 248): the comment block claimed *"The interesting failure case is L3: a plugin can satisfy L1+L2 while smuggling an additive offset (op(a,b) = a*b + c)"* — but over the abelian-group output type (`int`), any constant offset breaks L1 first (constant lands once on LHS, twice on RHS). The framing was mathematically incorrect. **Fix**: reword both the bilinearity- fixtures intro comment AND the `AffineBilinearLiar` fixture docstring to reflect that L1 trips first over abelian groups; L3 is the load-bearing law only over non-abelian-group output types. Aligns with the math note already in `LawRunner.checkBilinear`'s docstring. ## Tests 18/18 LawRunner + Harness tests pass after fixes. Build clean. No behavioral changes to test outputs — the AfterStepAsync addition is load-bearing for strict ops but no existing test exercises a strict bilinear op (BilinearMultOp / LinearOffsetLiar / AffineBilinearLiar are all non-strict). ## Review-thread resolution These 3 threads on #4563 close with this commit. Remaining unresolved state on #4563 (Stryker.NET F# unsupported error) is workflow-side and addressed by Otto-CLI's PR #4570 + #4571 retarget work.
Follow-up to merged PR #4570. The iteration-1 target
Core.CSharp.ZetaId.csprojhad no .NET test projects directly referencing it → Stryker reported 0% kill-rate (78 mutants survived, 0 killed). The cross-verify harness attests/cross-verification/zeta-id/validates byte-identical hex agreement across oracles but is not a Stryker-compatible xUnit project — TS-side test, not .NET.What this PR does
src/Core.CSharp/Core.CSharp.csproj(Variance.cs + ZetaCircuitBuilder.cs)tests/Core.CSharp.Tests/Core.CSharp.Tests.csprojas the test-projects oracle (8 xUnit tests with direct ProjectReference)Substrate-honest framing
Aaron's bidirectional informing pattern operating in real time: make-due (use what has tests today) informs big plan (which C# surfaces need test authoring next). The retarget shape is correct; the specific target needs the test surface to match.