Skip to content

fix(tests/alloy): correct stale spec path so Alloy tests actually validate (closes B2 → A)#1396

Merged
AceHack merged 1 commit intomainfrom
fix/alloy-runner-spec-path-bug
May 3, 2026
Merged

fix(tests/alloy): correct stale spec path so Alloy tests actually validate (closes B2 → A)#1396
AceHack merged 1 commit intomainfrom
fix/alloy-runner-spec-path-bug

Conversation

@AceHack
Copy link
Copy Markdown
Member

@AceHack AceHack commented May 3, 2026

Summary

Per the math-proofs honest assessment (#1383) outstanding-work matrix B2 entry: "Alloy LSM-spine spec — not in CI". The actual state was more subtle than that grade suggested.

Pre-fix: Alloy tests existed in tests/Tests.FSharp/Formal/Alloy.Runner.Tests.fs and were registered with xUnit. They ran via dotnet test (which gate.yml runs every PR). But the spec path constant pointed at docs/<Spec>.als, and actual specs live at tools/alloy/specs/<Spec>.als. The toolchain-ready fallback path silent-skipped on the missing file → 3 tests passed in ~1 second total with no actual validation.

Post-fix: alloySpecsPath = Path.Combine(repoRoot, "tools", "alloy", "specs"). Tests now actually invoke java -cp alloy.jar AlloyRunner against each .als spec. Local verification: 3 tests pass in ~2 seconds with Spine + InfoTheoreticSharder taking ~400ms each (real Alloy java invocations).

Why this closes B2 → A

The Alloy specs (Spine.als + InfoTheoreticSharder.als) now run on every dotnet test invocation (every PR via gate.yml). Real validation rather than silent skip. Per the assessment grading rubric, A-grade requires "Runs in CI on every commit" — now satisfied.

Discipline lesson

Silent-no-op is the failure mode the verify-then-claim discipline catches. The B2 grading found the gap; the fix was a 2-character path correction. This is exactly the kind of "test that passes on any input" failure that Stryker mutation testing would surface.

Test plan

🤖 Generated with Claude Code

…idate (closes B2 → A from #1383)

Per the math-proofs honest assessment (#1383) outstanding-work
matrix B2 entry, "Alloy LSM-spine spec — not in CI". The actual
state was more subtle: tests existed in CI via `dotnet test` but
silently no-op'd because the spec path was stale.

Pre-fix:
  - `Alloy.Runner.Tests.fs` constant `docsPath` pointed at
    `docs/<Spec>.als`
  - Actual specs live at `tools/alloy/specs/<Spec>.als`
  - `assertSpecValid` silent-skipped via the toolchain-ready
    fallback path when the spec file wasn't found at the
    expected location
  - Result: 3 tests passed in ~1 second total — silent no-op

Post-fix:
  - `alloySpecsPath = Path.Combine(repoRoot, "tools", "alloy",
    "specs")` matches the actual spec location
  - Tests now actually invoke `java -cp alloy.jar AlloyRunner`
    against each .als spec
  - Local verification: 3 tests pass in ~2 seconds with
    Spine + InfoTheoreticSharder taking ~400ms each (real
    Alloy java invocations)

This closes B2 → A in the math-proofs assessment: the Alloy
specs now run on every `dotnet test` (which gate.yml runs on
every PR), with real validation rather than silent skip.

Discipline lesson: silent-no-op is the failure mode the verify-
then-claim discipline catches. The B2 grading caught the gap;
the fix was a 2-character path correction.

Composes with:
- docs/research/2026-05-03-math-proofs-honest-assessment.md
  (the assessment that flagged B2)
- tools/alloy/specs/Spine.als + InfoTheoreticSharder.als (the
  artifacts now actually validated)
- tests/Tests.FSharp/Formal/Tlc.Runner.Tests.fs (sibling
  pattern; the parallel runner for TLA+ specs)
Copilot AI review requested due to automatic review settings May 3, 2026 12:29
@AceHack AceHack enabled auto-merge (squash) May 3, 2026 12:29
@AceHack AceHack merged commit 160eed5 into main May 3, 2026
27 checks passed
@AceHack AceHack deleted the fix/alloy-runner-spec-path-bug branch May 3, 2026 12:31
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes the Alloy xUnit runner so it points at the repository’s actual Alloy spec directory (tools/alloy/specs) and therefore executes real Alloy validation during dotnet test runs (including CI), rather than silently skipping meaningful work due to a stale path.

Changes:

  • Update the spec root constant from docs/ to tools/alloy/specs/.
  • Adjust the runner’s documentation/comments to reflect the new spec location.

Comment on lines +53 to +56
// docs/ and the tests silently no-op'd (no <Spec>.als file found
// triggered toolchainReady-false-path skip-pattern; B2 from #1383
// math-proofs assessment was correct that Alloy "not in CI"). The
// path correction is the prerequisite for the tests to actually run.
Comment on lines +51 to +56
// Alloy specs live at tools/alloy/specs/<Spec>.als (not docs/) since
// the round-? specs reorganization. Pre-fix this constant pointed at
// docs/ and the tests silently no-op'd (no <Spec>.als file found
// triggered toolchainReady-false-path skip-pattern; B2 from #1383
// math-proofs assessment was correct that Alloy "not in CI"). The
// path correction is the prerequisite for the tests to actually run.
AceHack added a commit that referenced this pull request May 3, 2026
…6-05-03 EOD progress (#1402)

Reflects substantive progress this session across the math-proofs
honest assessment matrix. Key state changes:

**P0 items — 3 of 3 closed:**
- Lean lake-build CI job ✓ (#1394)
- A4 registry rows ✓ (#1393)
- Peer-review email draft ✓ (#1387)
- Stryker B3 → partial (config-fix #1395; CI wire deferred to
  follow-up substantial-design)

**P1 items — significant progress:**
- Alloy B2 → A ✓ (#1396 — silent-no-op was the failure mode;
  spec-path fixed)
- Semgrep B4 → A ✓ (verify-then-claim correction; was already
  in CI)
- B1 4 deferred specs → 2 of 4 done:
  - DbspSpec ✓ #1397 (1M states / 11s)
  - CircuitRegistration ✓ #1401 (B-0180 closed; 3538 states / <1s)
  - SpineAsyncProtocol B-0179 still open (counterexample inv.)
  - SpineMergeInvariants B-0181 still open (counterexample inv.)

**Sibling work tracked:**
- Phase 0 substrate-discovery PoC ✓ (#1392 — 4.0 MB AOT binary
  on osx-arm64; cross-platform CI matrix)
- 3 broken-spec backlog rows filed (#1398 → B-0179 + B-0180 +
  B-0181); B-0180 closed (#1401)
- `.ts/.sh` parity bug in `tools/backlog/generate-index.ts`
  closed ✓ (#1400 — both generators byte-identical)

This update is bounded substrate work documenting the actual
state of the matrix; doesn't add new work, just captures
completion. Future matrix re-grades happen as work-items land
(per the assessment doc's audit-trail discipline).

Composes with #1383 (the original assessment) + every PR
referenced above.

§33 archive-header lint passes.
AceHack added a commit that referenced this pull request May 3, 2026
…ster + cache-clobber discipline encoded (#1408)

Substantial multi-tick session shard. 18 PRs touched (#1383 + #1387
+ #1392-#1407 inclusive); 14 merged + 4 in-flight as of shard time.

**Math-proofs assessment progress** (#1383 outstanding-work matrix):
- A1+A2 → A-with-CI ✓ (#1394 Lean lake-build workflow)
- A4 registry rows ✓ (#1393)
- B1 → 2 of 4 deferred specs in CI ✓ (#1397 DbspSpec + #1401
  CircuitRegistration B-0180 closed)
- B2 Alloy → A ✓ (#1396 silent-no-op spec-path fix)
- B4 Semgrep → A ✓ (correction)
- Peer-review email template ✓ (#1387)
- Phase 0 substrate-discovery PoC ✓ (#1392)
- Stryker config-fix ✓ (#1395; CI wire deferred)
- 3 broken-spec backlog rows filed ✓ (#1398)

**Cache-clobber silent-bug class discovered + fully encoded:**
B-0180 fix passing locally + failing CI → verify-then-claim
identified gate.yml + low-memory.yml caching whole tools/tla and
tools/alloy directories. Fix cluster: #1403 (gate.yml) + #1404
(low-memory.yml + audit-ci-cache-paths.ts) + #1406 (CI lint gate)
+ #1407 (memory file + bug-locus disambiguation per Aaron's 'real
github bug?' question — answer: usage-bug, not tool-bug).

**Other substrate work:** #1399 BACKLOG.md regen, #1400 .ts/.sh
parity bug, #1402 assessment matrix doc update, #1405 B-0182
backlog row (Linux-only formal verification — orthogonal-axes
split per Aaron 2026-05-03).

**Discipline lessons captured:** chat-is-assertion-channel,
substrate-corrections-cluster, search-first-before-architectural-
expansion, verify-then-claim CI fidelity, documentation-is-
current-state-not-history.

Carved sentence: 'When a lucky catch surfaces a class of bug,
build the structural fix that eliminates the luck — audit + lint
gate + carved-sentence rule + memory file.'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants