Skip to content

fix(stryker-iteration-2): retarget to Core.CSharp.csproj (real C# tests) — PR #4570 merged with 0% kill-rate target#4571

Merged
AceHack merged 2 commits into
mainfrom
fix/stryker-retarget-to-csharp-peer-oracle-multi-oracle-transitive-coverage-2026-05-21
May 21, 2026
Merged

fix(stryker-iteration-2): retarget to Core.CSharp.csproj (real C# tests) — PR #4570 merged with 0% kill-rate target#4571
AceHack merged 2 commits into
mainfrom
fix/stryker-retarget-to-csharp-peer-oracle-multi-oracle-transitive-coverage-2026-05-21

Conversation

@AceHack
Copy link
Copy Markdown
Member

@AceHack AceHack commented May 21, 2026

Follow-up to merged PR #4570. The iteration-1 target Core.CSharp.ZetaId.csproj had no .NET test projects directly referencing it → Stryker reported 0% kill-rate (78 mutants survived, 0 killed). The cross-verify harness at tests/cross-verification/zeta-id/ validates byte-identical hex agreement across oracles but is not a Stryker-compatible xUnit project — TS-side test, not .NET.

What this PR does

  • Retargets Stryker to src/Core.CSharp/Core.CSharp.csproj (Variance.cs + ZetaCircuitBuilder.cs)
  • Uses tests/Core.CSharp.Tests/Core.CSharp.Tests.csproj as the test-projects oracle (8 xUnit tests with direct ProjectReference)
  • Updates workflow docstring + job name + paths filter for the new target
  • Reframes the make-due rationale: ZetaId C# xUnit tests don't exist yet; targeting where real tests are now; big plan = author C# ZetaId xUnit tests later then re-extend Stryker target

Substrate-honest framing

Aaron's bidirectional informing pattern operating in real time: make-due (use what has tests today) informs big plan (which C# surfaces need test authoring next). The retarget shape is correct; the specific target needs the test surface to match.

Copilot AI review requested due to automatic review settings May 21, 2026 18:30
@AceHack AceHack enabled auto-merge (squash) May 21, 2026 18:30
…ts) — supersedes #4570's 0%-kill-rate target on Core.CSharp.ZetaId.csproj which had zero direct .NET test references; cross-verify harness is TS not xUnit so doesn't count as Stryker test-project; make-due now with what has tests + big plan author C# ZetaId xUnit later
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 60da640386

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/workflows/stryker-mutation.yml Outdated
@AceHack AceHack force-pushed the fix/stryker-retarget-to-csharp-peer-oracle-multi-oracle-transitive-coverage-2026-05-21 branch from 60da640 to d369aa6 Compare May 21, 2026 18:32
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Retargets Stryker.NET mutation testing from the previously selected C# project that had no direct xUnit oracle to the actively-tested C# Core surface, so mutation runs can produce a meaningful kill-rate in CI.

Changes:

  • Update stryker-config.json to target src/Core.CSharp/Core.CSharp.csproj and use tests/Core.CSharp.Tests/Core.CSharp.Tests.csproj as the test oracle.
  • Update the Stryker GitHub Actions workflow docstring, job name, and path filters to reflect the new target.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
stryker-config.json Retarget Stryker project + test-projects to the C# Core surface with active xUnit tests.
.github/workflows/stryker-mutation.yml Update workflow documentation and triggers for the new Stryker target/oracle.
Comments suppressed due to low confidence (2)

.github/workflows/stryker-mutation.yml:30

  • The workflow comment mentions "Soraya loop"; persona names shouldn’t appear in current-state repo surfaces like workflows. Please replace the persona name with the appropriate role-reference (e.g., formal-verification loop) while keeping the B-0691 reference.
#
# F# coverage remains rich via other formal-verification tools: FsCheck
# (property invariants), TLA+/TLC (concurrency + safety via tools/tla/specs/
# + Soraya loop B-0691), Lean 4 (theorem-grade), Z3 (constraint satisfaction),

.github/workflows/stryker-mutation.yml:88

  • push.paths looks out of sync with the new Stryker target/test oracle: it includes src/Core.CSharp.ZetaId/** and tests/Tests.CSharp/, but the Stryker config now targets src/Core.CSharp/ and tests/Core.CSharp.Tests/**. As written, pushes to main that only change Core.CSharp or Core.CSharp.Tests may not trigger this workflow.
      - ".github/workflows/stryker-mutation.yml"
  push:
    branches: [main]
    paths:
      - "src/Core.CSharp.ZetaId/**"
      - "src/Core/**"
      - "tests/Tests.CSharp/**"
      - "tests/cross-verification/zeta-id/**"

Comment thread .github/workflows/stryker-mutation.yml Outdated
…(Codex P1 — was still on iteration-1 paths); (b) clarify 'directly' not 'transitively' for Core.CSharp.Tests→Core.fsproj (Copilot nit); (c) drop break-threshold to 0 (observational mode) — empirical kill-rate 0% on Core.CSharp.csproj (4 mutants; 8 tests don't exercise mutation surface); workflow still runs + records signal + uploads HTML; threshold auto-reactivates when tests land + break is raised; NOT continue-on-error suppression
@AceHack AceHack merged commit 8770cf2 into main May 21, 2026
32 checks passed
@AceHack AceHack deleted the fix/stryker-retarget-to-csharp-peer-oracle-multi-oracle-transitive-coverage-2026-05-21 branch May 21, 2026 18:39
AceHack added a commit that referenced this pull request May 21, 2026
…docstring math corrections

Addresses 3 reviewer findings on PR #4563:

1. **Codex P1** (PluginHarness.fs): `runTwoInputs` (and `runSingleInput`,
   pre-existing same gap) advanced plugin operators with `StepAsync` but
   never called `AfterStepAsync` — so any plugin implementing
   `IStrictOperator` exercised semantics differing from
   `Circuit.Step/StepAsync`, with strict state never committing between
   ticks. `LawRunner.checkBilinear` on strict ops would silently validate
   against incorrect state. **Fix**: mirror the Circuit's post-step
   hook — call `AfterStepAsync` after each tick's `StepAsync` completes.
   Same fix applied to both `runSingleInput` and `runTwoInputs` (pattern
   parity).

2. **Copilot** (LawRunner.fs:158): docstring described the
   `(addOut, negOut)` pair as a "monoid" with non-inverse `negOut` —
   monoids by definition lack inverses, so the example was malformed.
   **Fix**: rephrase to "caller-supplied pair that doesn't actually form
   an abelian group" (negOut might not be a true inverse; operations
   might not be associative/commutative; hidden state might creep in).
   Same point, correct algebra.

3. **Copilot** (LawRunner.Tests.fs:218 + 248): the comment block claimed
   *"The interesting failure case is L3: a plugin can satisfy L1+L2 while
   smuggling an additive offset (op(a,b) = a*b + c)"* — but over the
   abelian-group output type (`int`), any constant offset breaks L1
   first (constant lands once on LHS, twice on RHS). The framing was
   mathematically incorrect. **Fix**: reword both the bilinearity-
   fixtures intro comment AND the `AffineBilinearLiar` fixture docstring
   to reflect that L1 trips first over abelian groups; L3 is the
   load-bearing law only over non-abelian-group output types. Aligns
   with the math note already in `LawRunner.checkBilinear`'s docstring.

## Tests

18/18 LawRunner + Harness tests pass after fixes. Build clean. No
behavioral changes to test outputs — the AfterStepAsync addition is
load-bearing for strict ops but no existing test exercises a strict
bilinear op (BilinearMultOp / LinearOffsetLiar / AffineBilinearLiar
are all non-strict).

## Review-thread resolution

These 3 threads on #4563 close with this commit. Remaining unresolved
state on #4563 (Stryker.NET F# unsupported error) is workflow-side and
addressed by Otto-CLI's PR #4570 + #4571 retarget work.
AceHack added a commit that referenced this pull request May 21, 2026
…oInputs (#4563)

* feat(core): LawRunner.checkBilinear + PluginHarness.runTwoInputs — PR 3 of 8

Closes the algebra-tag verification gap on `IBilinearOperator`. Until
this PR, `LawRunner` had `checkLinear` and `checkRetractionCompleteness`
but no `checkBilinear` — meaning the 3-term incremental-join rewrite in
`Incremental.IncrementalJoin` trusted the bilinear tag without any
test-time way to verify it.

## New surface

### `PluginHarness.runTwoInputs`

Drives a two-input plugin operator through paired input sequences in
lock-step (`Seq.zip` semantics — shorter sequence wins). Two
`HarnessSourceOp` sources at synthetic ids 0+1, adapter at id 2;
mirrors the `runSingleInput` shape so the per-tick publish-counter
discipline is identical.

### `LawRunner.checkBilinear`

Tests three sub-properties per sample:

  L1 — `op(a₁+a₂, b) ≡ op(a₁, b) + op(a₂, b)`   (left-linearity)
  L2 — `op(a, b₁+b₂) ≡ op(a, b₁) + op(a, b₂)`   (right-linearity)
  L3 — `op(-a, b) ≡ -op(a, b)`                   (sign-distribution)

The check generates four independent traces (A₁, A₂, B₁, B₂), runs the
operator through six combinations (A₁B₁, A₂B₁, ASumB₁, A₁B₂, A₁BSum,
ANegB₁), and checks the three laws per tick. Same `(seed,
sampleIndex)`-reproducibility shape as `checkLinear`; same `Result<unit,
LawViolation>` return type.

## Math note on L3 (corrected from initial sketch)

Over an abelian group with standard addition (the case for `int` and
`ZSet<'K>`), L1 + L2 *imply* L3: setting `a₁ = a, a₂ = -a` in L1 gives
`op(0, b) = op(a, b) + op(-a, b)`, so the classical bilinear condition
`op(0, b) = 0` collapses to L3. In that regime L3 is the cleanup law —
an affine offset like `op(a, b) = a*b + c` breaks L1 first (the
constant lands once on LHS, twice on RHS), so the L3-only failure mode
doesn't exist over `int`.

L3 becomes load-bearing — not redundant — when the user supplies a
non-abelian-group `(addOut, negOut)` pair (e.g. a monoid where
`negOut` isn't truly inverse). Checking all three keeps
`checkBilinear` correct across the full range of `'TOut` algebras a
plugin author might supply, not just `int` / `ZSet<_>`.

## Tests (6 new in LawRunner.Tests.fs, 15/15 total pass)

- `BilinearMultOp` (genuine integer multiplication): passes
- `LinearOffsetLiar` (`op(a,b) = (a+b)*2`): catches L1 violation
- `AffineBilinearLiar` (`op(a,b) = a*b + 7`): catches L1 (the L3
  failure that I'd originally claimed was unique to this fixture
  doesn't exist over `int`; the docstring + test docstring now
  explain the math)
- Reproducibility on same seed
- Bad-args validation (samples and scheduleLength)
- `runTwoInputs` lock-step behavior + truncation to shorter input

15/15 LawRunner tests pass, no regressions in broader plugin tests.
Build clean.

## Independence

PR 3 does NOT depend on PR 1 (#4558) or PR 2 (#4560) — `checkBilinear`
verifies bilinearity by trace-comparison, not by reading `Op.IsBilinear`.
This means PR 3 can ship orthogonally; targets `main` directly.

## Foundation for later PRs

PR 4's `IncrementalAuto` dispatcher will read `Op.IsBilinear` (from PR
1) to pick the three-term-bilinear rewrite. A debug-mode build hook
(planned for PR 4 or later) can run `checkBilinear` against any
operator claiming the tag, catching tag-vs-implementation drift early.

* fix(PR #4563 review threads): AfterStepAsync hook in PluginHarness + docstring math corrections

Addresses 3 reviewer findings on PR #4563:

1. **Codex P1** (PluginHarness.fs): `runTwoInputs` (and `runSingleInput`,
   pre-existing same gap) advanced plugin operators with `StepAsync` but
   never called `AfterStepAsync` — so any plugin implementing
   `IStrictOperator` exercised semantics differing from
   `Circuit.Step/StepAsync`, with strict state never committing between
   ticks. `LawRunner.checkBilinear` on strict ops would silently validate
   against incorrect state. **Fix**: mirror the Circuit's post-step
   hook — call `AfterStepAsync` after each tick's `StepAsync` completes.
   Same fix applied to both `runSingleInput` and `runTwoInputs` (pattern
   parity).

2. **Copilot** (LawRunner.fs:158): docstring described the
   `(addOut, negOut)` pair as a "monoid" with non-inverse `negOut` —
   monoids by definition lack inverses, so the example was malformed.
   **Fix**: rephrase to "caller-supplied pair that doesn't actually form
   an abelian group" (negOut might not be a true inverse; operations
   might not be associative/commutative; hidden state might creep in).
   Same point, correct algebra.

3. **Copilot** (LawRunner.Tests.fs:218 + 248): the comment block claimed
   *"The interesting failure case is L3: a plugin can satisfy L1+L2 while
   smuggling an additive offset (op(a,b) = a*b + c)"* — but over the
   abelian-group output type (`int`), any constant offset breaks L1
   first (constant lands once on LHS, twice on RHS). The framing was
   mathematically incorrect. **Fix**: reword both the bilinearity-
   fixtures intro comment AND the `AffineBilinearLiar` fixture docstring
   to reflect that L1 trips first over abelian groups; L3 is the
   load-bearing law only over non-abelian-group output types. Aligns
   with the math note already in `LawRunner.checkBilinear`'s docstring.

## Tests

18/18 LawRunner + Harness tests pass after fixes. Build clean. No
behavioral changes to test outputs — the AfterStepAsync addition is
load-bearing for strict ops but no existing test exercises a strict
bilinear op (BilinearMultOp / LinearOffsetLiar / AffineBilinearLiar
are all non-strict).

## Review-thread resolution

These 3 threads on #4563 close with this commit. Remaining unresolved
state on #4563 (Stryker.NET F# unsupported error) is workflow-side and
addressed by Otto-CLI's PR #4570 + #4571 retarget work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants