Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/BACKLOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,7 @@ are closed (status: closed in frontmatter)._
- [ ] **[B-0179](backlog/P2/B-0179-fix-spineasyncprotocol-tla-counterexample-blocks-b1-ci-registration-2026-05-03.md)** Fix SpineAsyncProtocol.tla counterexample (blocks B1 → A CI registration; depth-9 trace dump on TLC run)
- [ ] **[B-0180](backlog/P2/B-0180-fix-circuitregistration-tla-config-bug-safety-invariant-undefined-2026-05-03.md)** Fix CircuitRegistration.tla config bug — `.cfg` references invariant `Safety` not defined in `.tla` (blocks B1 → A CI registration)
- [ ] **[B-0181](backlog/P2/B-0181-fix-spinemergeinvariants-tla-counterexample-blocks-b1-ci-registration-2026-05-03.md)** Fix SpineMergeInvariants.tla counterexample (blocks B1 → A CI registration; depth-17 trace dump on TLC run)
- [ ] **[B-0182](backlog/P2/B-0182-linux-only-formal-verification-tests-pure-math-no-os-difference-aaron-2026-05-03.md)** Filter formal-verification tests to standard Linux only — TLC + Alloy + Lean are pure-math computation; no OS-specific behavior

## P3 — convenience / deferred

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
---
id: B-0182
priority: P2
status: open
title: Filter formal-verification tests to standard Linux only — TLC + Alloy + Lean are pure-math computation; no OS-specific behavior
tier: ci-architecture
effort: M
ask: Aaron 2026-05-03 — *"we don't have to run formal verifical on all the oses, we can run that just on linux it does not have differents between OS like scirpt and code that touch the enviroment. just the standard linux too don't need it to run on slim"*
created: 2026-05-03
last_updated: 2026-05-03
depends_on: []
composes_with: [docs/research/2026-05-03-math-proofs-honest-assessment.md, B-0017]
tags: [ci, formal-verification, tlc, alloy, lean, cost-optimization, linux-only, tests]
---

# Filter formal-verification tests to standard Linux only

## Problem

Currently the F# test suite runs the same formal-verification tests on:

- **gate.yml**: `macos-26`, `ubuntu-24.04`, `ubuntu-24.04-arm` (every PR)
- **low-memory.yml**: `ubuntu-slim` (every push to main + nightly)

But formal verification (TLC, Alloy, Lean) is pure-math computation:
no OS-specific behavior, no environment touching, no script differences.
Running TLC on macOS produces the same result as Linux. Running Alloy
on ARM produces the same result as x64. The duplication wastes CI
time + cost.

## Aaron 2026-05-03 verbatim

> *"we don't have to run formal verifical on all the oses, we can run
> that just on linux it does not have differents between OS like scirpt
> and code that touch the enviroment. just the standard linux too
> don't need it to run on slim"*

## Scope

Filter to standard `ubuntu-24.04` only. Skip on:

- `macos-26` (gate.yml)
- `ubuntu-24.04-arm` (gate.yml)
- `windows-2025` / `windows-11-arm` (gate.yml post-merge)
- `ubuntu-slim` (low-memory.yml)

Affected test classes:

- `tests/Tests.FSharp/Formal/Tlc.Runner.Tests.fs` (per-spec `[<Fact>]` TLC entries)
- `tests/Tests.FSharp/Formal/Alloy.Runner.Tests.fs` (per-spec `[<Fact>]` Alloy entries)
- (Future) Lean integration tests if any are added at the F# layer

## Implementation options

### Option A — Runtime check in `toolchainReady()`

Cheapest. Add a check to the existing `toolchainReady()` helpers
that gates on TWO axes:

1. **OS axis**: skip on non-Linux (catches `macos-26`,
`windows-2025`, `windows-11-arm`).
2. **Runner-class axis**: skip on the slim runner. Since
`ubuntu-slim` is still Linux, an OS-only check won't catch it.
The discriminator is `GITHUB_WORKFLOW=low-memory` (GHA-set env
var) — survives runner-name churn unlike `RUNNER_NAME`.

Combined logic: run formal-verification tests when
`Linux AND GITHUB_WORKFLOW != "low-memory"`; otherwise
silent-no-op.

Pros: minimal code change; consistent with existing
skip-on-missing-toolchain pattern.
Cons: macOS dev local runs would silent-skip even though TLC
works locally (regression vs current behavior; devs lose local
validation).

Mitigation: gate the skip on `CI=true` env var so dev laptops
still run them; on CI the dual-axis check applies.

### Option B — Custom `[<LinuxOnly>]` xunit attribute

More structured. Custom xunit `Theory`/`Fact` decorator that skips
the test on non-Linux runners.

Pros: explicit intent at each test; easy to grep for skipped tests.
Cons: more code; needs xunit collection-attribute machinery.

### Option C — Separate test project `Tests.FSharp.Formal`

Most invasive. Move TLC + Alloy tests to a dedicated project that
the build matrix only includes on Linux.

Pros: cleanest separation; doesn't run formal-verification build
on macOS/ARM at all (saves more time than just skipping).
Cons: substantial refactor; affects build matrix.

**Recommendation**: Option A with `CI=true` env-gate. Smallest change;
preserves dev-local validation.

## Composes with

- `docs/research/2026-05-03-math-proofs-honest-assessment.md`
(the assessment doc; CI cost is implicit context for "runs in CI"
A-grade gating)
- B-0017 Operational Resonance Dashboard (when CI cost becomes a
visible dashboard metric)

## Effort estimate

M (1-2 days):

- Half day: pick Option, implement
- Half day: verify CI matrix shape unchanged (dotnet test still runs;
just specific tests skip); confirm dev local still validates
- Doc updates: math-proofs assessment B1/A4 sections + low-memory.yml
+ gate.yml comments
Loading