Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
95 changes: 95 additions & 0 deletions .github/workflows/stryker-mutation.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
name: stryker-mutation

# Runs Stryker.NET mutation testing against `src/Core/Core.fsproj`
# using `tests/Tests.FSharp/Tests.FSharp.fsproj` as the kill-rate
# oracle. Per the math-proofs honest assessment
# (docs/research/2026-05-03-math-proofs-honest-assessment.md) the
# Stryker artifact is B3-grade today (config exists, runs locally,
# threshold-break at 50% wired) but was MISSING CI gating + a
# published kill-rate metric. This workflow closes the gap by:
#
# 1. Wiring Stryker into CI on src/Core/** path-filtered triggers
# so kill-rate signal lands on every behaviorally-relevant PR
# 2. Uploading the HTML report + json result as workflow artifacts
# so the kill-rate metric is verifiable from the run page
Comment on lines +13 to +14
# 3. Applying the threshold-break gate (50%) as a workflow failure
# via Stryker's exit-code semantics
#
# Path-filtered + out-of-band from gate.yml because mutation testing
# is slow (15-30 min per run on the current Core.fsproj surface) and
# would block the existing fast PR-loop. Same pattern as
# lean-proof.yml — formal-verification-grade workflows run on their
# own cadence, not in the gate-blocking path.
#
# Composes with:
# - stryker-config.json (the artifact this runs against)
# - tools/setup/manifests/dotnet-tools (dotnet-stryker installed
# globally by tools/setup/install.sh)
# - docs/research/2026-05-03-math-proofs-honest-assessment.md
# (this workflow closes the B3 -> A upgrade with-CI line)
#
# Safe-pattern compliance (per FACTORY-HYGIENE row #43):
# - SHA-pinned actions/checkout + actions/upload-artifact
# - Explicit minimum permissions: contents: read
# - No user-authored context; purely repo-file reads
# - Concurrency group + pinned runs-on
# - Path-filter narrows the trigger surface
#
# Note: this workflow uses only github.run_id (immutable internal
# identifier) and github.ref (validated branch name) in run: /
# concurrency: contexts — no untrusted user-authored inputs (PR
# title, comment body, etc.) are interpolated.

on:
pull_request:
paths:
- "src/Core/**"
- "tests/Tests.FSharp/**"
- "stryker-config.json"
- ".github/workflows/stryker-mutation.yml"
push:
branches: [main]
paths:
- "src/Core/**"
- "tests/Tests.FSharp/**"
- "stryker-config.json"
- ".github/workflows/stryker-mutation.yml"
workflow_dispatch: {}

permissions:
contents: read

env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

concurrency:
group: stryker-mutation-${{ github.ref }}
cancel-in-progress: true
Comment on lines +65 to +67

jobs:
mutation-test:
name: Stryker.NET mutation test (Core.fsproj)
runs-on: ubuntu-24.04
timeout-minutes: 60

steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

- name: Install toolchain via three-way-parity script (GOVERNANCE §24)
run: ./tools/setup/install.sh

- name: Restore + build (Release)
run: dotnet build Zeta.sln -c Release

- name: Run Stryker
run: dotnet stryker
Comment on lines +79 to +86

- name: Upload Stryker reports
if: always()
uses: actions/upload-artifact@9eaf0eba75d52b5e72c7a193fc2887e6caf95df0 # v5.1.0
with:
name: stryker-reports-${{ github.run_id }}
path: StrykerOutput/
if-no-files-found: warn
retention-days: 90
4 changes: 2 additions & 2 deletions docs/research/2026-05-03-math-proofs-honest-assessment.md
Original file line number Diff line number Diff line change
Expand Up @@ -288,7 +288,7 @@ to publishable form:
| Work item | Grade upgrade | Effort | Priority | Status |
|---|---|---|---|---|
| Lean lake-build CI job | A1, A2 → A-with-CI | 1 day | P0 | **Done (PR #1394, 2026-05-03 — `.github/workflows/lean-proof.yml` shipped; runs on `tools/lean4/**` changes; `lake exe cache get` for Mathlib oleans + `lake env lean` type-check)** |
| Stryker CI + kill-rate publish | B3 → A | 1 day | P0 | **Partial (PR #1395 fixed stale `stryker-config.json` paths; CI workflow design + kill-rate publication target deferred to follow-up — substantial-design item)** |
| Stryker CI + kill-rate publish | B3 → A | 1 day | P0 | **Done (PR #1395 fixed `stryker-config.json` paths; this PR adds `.github/workflows/stryker-mutation.yml` with src/Core/** path-filter trigger, threshold-break gate at 50%, and HTML+json reports uploaded as 90-day artifacts — kill-rate metric verifiable from every CI run page)** |
| Semgrep CI | B4 → A | 0.5 day | P1 | **Already done (verify-then-claim correction; see B4 section)** |
| 4 deferred TLA+ specs into CI | B1 → A | 2 days | P1 | **Done (4 of 4): DbspSpec ✓ #1397; CircuitRegistration ✓ #1401 (B-0180 Safety operator); SpineAsyncProtocol ✓ #1411 (B-0179 CHECK_DEADLOCK FALSE for bounded-protocol terminal state); SpineMergeInvariants ✓ this PR (B-0181 Cascade downstream-room precondition + state constraint + WF on Cascade)** |
| Alloy CI hook | B2 → A | 0.5 day | P1 | **Done (PR #1396, 2026-05-03 — `tests/Tests.FSharp/Formal/Alloy.Runner.Tests.fs` spec-path fixed; Spine + InfoTheoreticSharder specs now actually validate, not silent-no-op)** |
Expand All @@ -303,7 +303,7 @@ to publishable form:
| 3 broken-spec follow-ups (B-0179 + B-0180 + B-0181) | spec-fix tracking | varies | P2/P1 | **All 3 closed: B-0180 ✓ #1401 (Safety operator); B-0179 ✓ #1411 (CHECK_DEADLOCK FALSE); B-0181 ✓ this PR (Cascade downstream-room precondition; promoted P2→P1 — same author-time class as B-0184 Spine.als under-specified action)** |
| `.ts/.sh` parity bug in `tools/backlog/generate-index.ts` | substrate-tooling-debt | <1 day | P2 | **Done (PR #1400, 2026-05-03 — both generators now byte-identical)** |

**Net P0 progress:** Lean CI ✓, A4 registry rows ✓, peer-review email ✓ (3 of 3 P0 items closed). Stryker B3 partial (config-fix done; CI wire deferred to follow-up substantial-design). All P0 items have movement.
**Net P0 progress:** Lean CI ✓, A4 registry rows ✓, peer-review email ✓, Stryker B3 ✓ (4 of 4 P0 items closed).

**Net P1 progress:** Alloy B2 ✓, Semgrep B4 ✓ (correction), B1 ✓ (all 4 deferred TLA+ specs in CI). WDC protocol spec + Recursive.fs LFP spec remain open (substantial multi-day each).

Expand Down
2 changes: 1 addition & 1 deletion docs/research/proof-tool-coverage.md
Original file line number Diff line number Diff line change
Expand Up @@ -206,7 +206,7 @@ modes, chain rule pipeline.
| **Eldarica / Spacer (Horn)** | Recursive-program invariant synthesis — could auto-derive loop invariants for `BalancedSpine.compactLevel` |
| **Liquid Haskell / LiquidF#** | ~~Refinement types inline in F# — catches `arr.[i]` out-of-bounds *at compile time* over the whole codebase~~ **Round-35 Hold: tool dormant.** No currently-maintained F#-native refinement checker; F7 (the Microsoft Research ancestor) last shipped 2012. See `docs/research/liquidfsharp-findings.md`. Successor path: F\* extraction to F# (Assess, TECH-RADAR round 35). |
| **Hypothesis-style coverage-guided fuzz** | Deeper counter-example minimisation than FsCheck's generic shrinker; catches concurrency bugs via state-space exploration |
| **Mutation testing (Stryker)** | Already configured via `stryker-config.json`, but **not yet run in CI** and no coverage target published — unknown whether our 471 tests survive a realistic mutant kill rate |
| **Mutation testing (Stryker)** | Configured via `stryker-config.json` and run in CI via `.github/workflows/stryker-mutation.yml` — path-filtered to `src/Core/**` + `tests/Tests.FSharp/**`; threshold-break at 50% gates the workflow; HTML + json reports uploaded as 90-day artifacts on every run. Kill-rate trend observable from the workflow run page. |
| **CodeQL** | Data-flow / taint analysis (untrusted `File.ReadAllBytes` → parser) — config deferred, listed as P0 in `BACKLOG.md` |
| **Semgrep** | Pattern-level anti-patterns; 12 rules shipped but only run externally (`TECH-RADAR.md` says "runs externally"); **not part of CI** |

Expand Down
Loading