Skip to content

ci(tla/tlc): add DbspSpec to TLC test list (closes B1 partial — 1 of 4 deferred specs)#1397

Merged
AceHack merged 2 commits intomainfrom
ci/tla-dbsp-spec-into-ci-b1-partial
May 3, 2026
Merged

ci(tla/tlc): add DbspSpec to TLC test list (closes B1 partial — 1 of 4 deferred specs)#1397
AceHack merged 2 commits intomainfrom
ci/tla-dbsp-spec-into-ci-b1-partial

Conversation

@AceHack
Copy link
Copy Markdown
Member

@AceHack AceHack commented May 3, 2026

Summary

Per the math-proofs honest assessment (#1383) outstanding-work matrix B1 entry. Verify-then-claim sweep found 1 of 4 deferred specs actually passes; the other 3 have real issues blocking CI registration.

Verify-then-claim findings (local TLC runs, 2026-05-03)

Spec Status Detail
DbspSpec ✓ PASSES 1M distinct states explored in ~11s wall; full state-space coverage. This PR registers it as [Fact].
SpineAsyncProtocol ✗ FAILS TLC dumps trace file at depth 9 (counterexample / invariant violation)
CircuitRegistration ✗ FAILS Config bug — .cfg references invariant Safety not defined in .tla
SpineMergeInvariants ✗ FAILS TLC dumps trace file at depth 17 (counterexample)

The 3 failing specs remain intentionally unregistered. They need spec/config fixes in follow-up PRs before CI registration. The test-file comment documents this so future-Otto doesn't blanket-register them and ship 3 immediate red builds.

Why this is partial closure

This closes 1 of 4 entries in B1, partial progress on B1 → A. Full B1 closure requires fixing the 3 broken specs.

Test plan

🤖 Generated with Claude Code

…4 deferred specs)

Per the math-proofs honest assessment (#1383) outstanding-work
matrix B1 entry, "TLA+ specs with .cfg but not in CI (4 specs)":
verify-then-claim sweep on 2026-05-03 found 1 of 4 actually
passes; the other 3 have real issues blocking CI registration.

**Verify-then-claim findings (local TLC runs, 2026-05-03):**

- **DbspSpec**: PASSES — 1M distinct states explored in ~11s
  wall; full state-space coverage. Now registered as
  `[<Fact>] TLC validates DbspSpec`.

- **SpineAsyncProtocol**: FAILS — TLC dumps trace file
  (`SpineAsyncProtocol_TTrace_<ts>.tla`), indicating
  counterexample / invariant violation at depth 9.

- **CircuitRegistration**: FAILS — config bug. `.cfg`
  references invariant `Safety` not defined in `.tla`.
  ("The invariant Safety specified in the configuration file
  is not defined in the specification.")

- **SpineMergeInvariants**: FAILS — TLC dumps trace file
  at depth 17, indicating counterexample.

The 3 failing specs remain intentionally unregistered. They
need spec/config fixes in follow-up PRs before CI registration.
The test-file comment documents this so future-Otto doesn't
blanket-register them and ship 3 immediate red builds.

This closes 1 of 4 entries in B1, partial progress on
B1 → A. Full B1 closure requires fixing the 3 broken specs.

Composes with:
- #1383 math-proofs honest assessment (the matrix entry this
  partially closes)
- #1393 verification-registry rows for the 5 in-CI specs
  (DbspSpec was NOT yet registered there; future tick adds
  the registry row for it)
- proof-tool-coverage.md §2 (the original "deliberately
  skipped pending re-verification" framing — re-verification
  has now happened, with mixed results)

Local verification: dotnet test --filter DisplayName~DbspSpec
passes in 12.8s including TLC run + xunit overhead.
Copilot AI review requested due to automatic review settings May 3, 2026 12:33
@AceHack AceHack enabled auto-merge (squash) May 3, 2026 12:34
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Registers the DbspSpec TLA+ specification in the TLC xUnit test suite so it runs in CI alongside the already-registered specs, with an inline note documenting why the other deferred specs remain unregistered.

Changes:

  • Add a new [<Fact>] test that runs assertSpecValid "DbspSpec".
  • Document (in the test file) that three other deferred specs are intentionally not registered due to known failures/config issues.

Comment thread tests/Tests.FSharp/Formal/Tlc.Runner.Tests.fs Outdated
AceHack added a commit that referenced this pull request May 3, 2026
… verify-then-claim sweep (#1398)

Per the verify-then-claim sweep in #1397, the 4 deferred B1 specs
from #1383 math-proofs honest assessment broke down as:

- DbspSpec: PASSES (1M states / ~11s) → registered as [Fact] in
  #1397
- SpineAsyncProtocol: FAILS (TTrace dump at depth 9) → B-0179
- CircuitRegistration: FAILS (config bug — invariant Safety not
  defined in spec) → B-0180
- SpineMergeInvariants: FAILS (TTrace dump at depth 17) → B-0181

3 backlog rows file the broken specs as tracked work. Each row
includes:

- Symptom (verbatim TLC output for reproducibility)
- Investigation steps (read TTrace, classify failure mode)
- Fix path (spec fix vs invariant weakening vs real-bug
  escalation)
- After-fix steps (re-verify, add [Fact], add registry row)
- Composes-with links to sibling rows + #1383 + the production
  code being modeled

B-0181 (SpineMergeInvariants) calls out the real-bug-found case
— if TLA+ found an actual concurrency bug in BalancedSpine.fs
rather than a spec-modeling error, escalate to BUGS.md P0
immediately. The depth-17 termination makes that more plausible
than the depth-9 SpineAsyncProtocol case.

B-0180 (CircuitRegistration) is the smallest fix — just a
config-syntax error, not a semantic counterexample.

All 3 P2 since they're pre-existing failures, not new
regressions. Closure of B1 → A in the assessment matrix
requires all 3 to land before the [Fact] entries can be added.

Composes with #1397 (DbspSpec → CI; partial B1 closure) +
#1383 (the assessment) + proof-tool-coverage.md §2 (the
"deliberately skipped pending re-verification" framing —
verification has now happened, with results filed as these
3 rows).
…ghts only, retraction NOT exercised

Reviewer #1397 caught: original test comment claimed the DbspSpec
model-check 'proves the operator-algebra invariants hold across the
configured state space', but DbspSpec.cfg sets W = {0..9} (positives
only). Retraction (negative-weight) cases aren't exercised — that
coverage lives elsewhere (FsCheck Z-set property tests + Lean Prop
3.2 proof which is general over abelian-group weights).

Updated comment now:
- Lists the 9 specific invariants verified by name
- Documents the cfg's coverage scope verbatim (K = 2-key,
  W = positives only)
- Names the complementary coverage paths (FsCheck + Lean) for
  retraction cases
- Notes the future-work path (enlarging W vs refining the spec
  model) tied to chain_rule_poly follow-on
- References B-0179 + B-0180 + B-0181 for the 3 broken sibling
  specs (replaces the looser 'follow-up PRs' phrasing)

No behavioral change to the test itself — only the comment is
updated to match what the cfg actually verifies.
@AceHack AceHack merged commit 8841bbb into main May 3, 2026
25 checks passed
@AceHack AceHack deleted the ci/tla-dbsp-spec-into-ci-b1-partial branch May 3, 2026 12:41
AceHack added a commit that referenced this pull request May 3, 2026
…6-05-03 EOD progress (#1402)

Reflects substantive progress this session across the math-proofs
honest assessment matrix. Key state changes:

**P0 items — 3 of 3 closed:**
- Lean lake-build CI job ✓ (#1394)
- A4 registry rows ✓ (#1393)
- Peer-review email draft ✓ (#1387)
- Stryker B3 → partial (config-fix #1395; CI wire deferred to
  follow-up substantial-design)

**P1 items — significant progress:**
- Alloy B2 → A ✓ (#1396 — silent-no-op was the failure mode;
  spec-path fixed)
- Semgrep B4 → A ✓ (verify-then-claim correction; was already
  in CI)
- B1 4 deferred specs → 2 of 4 done:
  - DbspSpec ✓ #1397 (1M states / 11s)
  - CircuitRegistration ✓ #1401 (B-0180 closed; 3538 states / <1s)
  - SpineAsyncProtocol B-0179 still open (counterexample inv.)
  - SpineMergeInvariants B-0181 still open (counterexample inv.)

**Sibling work tracked:**
- Phase 0 substrate-discovery PoC ✓ (#1392 — 4.0 MB AOT binary
  on osx-arm64; cross-platform CI matrix)
- 3 broken-spec backlog rows filed (#1398 → B-0179 + B-0180 +
  B-0181); B-0180 closed (#1401)
- `.ts/.sh` parity bug in `tools/backlog/generate-index.ts`
  closed ✓ (#1400 — both generators byte-identical)

This update is bounded substrate work documenting the actual
state of the matrix; doesn't add new work, just captures
completion. Future matrix re-grades happen as work-items land
(per the assessment doc's audit-trail discipline).

Composes with #1383 (the original assessment) + every PR
referenced above.

§33 archive-header lint passes.
AceHack added a commit that referenced this pull request May 3, 2026
…egister in CI

Per B-0180 (filed in #1398) — `CircuitRegistration.cfg` referenced
`INVARIANT Safety` but the spec didn't define a `Safety` operator,
causing TLC to error out:

  Error: The invariant Safety specified in the configuration file
  is not defined in the specification.

The intended composite invariant is documented at the bottom of the
spec via the THEOREM `Spec => [](TypeOK /\ NoRegisterAfterBuild)`.
Defined `Safety` as exactly that conjunction so the cfg directive
resolves without changing TLC's check semantics.

**Local verification:**

  - `java -cp tools/tla/tla2tools.jar tlc2.TLC -workers 4 -config
     tools/tla/specs/CircuitRegistration.cfg
     tools/tla/specs/CircuitRegistration.tla` →
    "Model checking completed. No error has been found."
    3538 distinct states explored, depth 14, <1s wall.

  - `dotnet test --filter "FullyQualifiedName~CircuitRegistration"`
    passes in 1.7s.

**Composes with:**

- B-0180 (the backlog row this closes; one of 3 broken-spec
  follow-ups from #1397's verify-then-claim sweep)
- #1397 (DbspSpec → CI; the partial-B1 closure)
- #1383 math-proofs honest assessment (the matrix this
  incrementally closes — 2 of 4 deferred B1 specs now in CI;
  remaining: B-0179 SpineAsyncProtocol + B-0181 SpineMergeInvariants
  both need counterexample investigation, not config fixes)

**Discipline note:** the original spec author left "Safety:" as a
prose comment label on `NoRegisterAfterBuild` (line 81) and on
`ConnectAtMostOnce` (line 85), implying both were intended as
"safety properties" but never finished the operator definition.
The THEOREM at line 96 disambiguates: only `TypeOK /\
NoRegisterAfterBuild` belongs in INVARIANT (a state property);
`ConnectAtMostOnce` uses `[]_vars` and is a temporal PROPERTY
(deferred — not in this fix's scope).
AceHack added a commit that referenced this pull request May 3, 2026
…structions §282

Reviewer #1401 caught: my CircuitRegistration test comments + the
Safety operator's spec comment carried historical narrative (B-0180
ticket ID, #1397 PR ref, 2026-05-03 sweep date, 'Closes 1 of 3
broken sibling specs', 'was missing until B-0180 fixed', etc.) on
current-state surfaces. Per .github/copilot-instructions.md §282:

  *Documentation is current state, not history. Historical
  narrative lives in docs/ROUND-HISTORY.md and ADRs under
  docs/DECISIONS/; everywhere else in docs/ edit in place.*

Same rule applies to spec files + test comments — they describe
what the artifact currently does, not how it got there.

Stripped:
- Test comments now describe ONLY the current invariants checked
  + cfg coverage scope + complementary-coverage pointers (no
  ticket IDs, no dates, no PR refs)
- Spec's Safety operator comment now says only what Safety is
  (composite safety invariant matching the THEOREM) without the
  B-0180 / 2026-05-03 / verify-then-claim history
- Compatible-coverage pointers retained (FsCheck Z-set tests +
  Lean proof) since those are current-state composes-with refs

Build still 0/0; functional behavior unchanged.
AceHack added a commit that referenced this pull request May 3, 2026
…ister in CI (#1401)

* fix(B-0180): define `Safety` invariant in CircuitRegistration.tla + register in CI

Per B-0180 (filed in #1398) — `CircuitRegistration.cfg` referenced
`INVARIANT Safety` but the spec didn't define a `Safety` operator,
causing TLC to error out:

  Error: The invariant Safety specified in the configuration file
  is not defined in the specification.

The intended composite invariant is documented at the bottom of the
spec via the THEOREM `Spec => [](TypeOK /\ NoRegisterAfterBuild)`.
Defined `Safety` as exactly that conjunction so the cfg directive
resolves without changing TLC's check semantics.

**Local verification:**

  - `java -cp tools/tla/tla2tools.jar tlc2.TLC -workers 4 -config
     tools/tla/specs/CircuitRegistration.cfg
     tools/tla/specs/CircuitRegistration.tla` →
    "Model checking completed. No error has been found."
    3538 distinct states explored, depth 14, <1s wall.

  - `dotnet test --filter "FullyQualifiedName~CircuitRegistration"`
    passes in 1.7s.

**Composes with:**

- B-0180 (the backlog row this closes; one of 3 broken-spec
  follow-ups from #1397's verify-then-claim sweep)
- #1397 (DbspSpec → CI; the partial-B1 closure)
- #1383 math-proofs honest assessment (the matrix this
  incrementally closes — 2 of 4 deferred B1 specs now in CI;
  remaining: B-0179 SpineAsyncProtocol + B-0181 SpineMergeInvariants
  both need counterexample investigation, not config fixes)

**Discipline note:** the original spec author left "Safety:" as a
prose comment label on `NoRegisterAfterBuild` (line 81) and on
`ConnectAtMostOnce` (line 85), implying both were intended as
"safety properties" but never finished the operator definition.
The THEOREM at line 96 disambiguates: only `TypeOK /\
NoRegisterAfterBuild` belongs in INVARIANT (a state property);
`ConnectAtMostOnce` uses `[]_vars` and is a temporal PROPERTY
(deferred — not in this fix's scope).

* fix(tlc-runner): remove brittle line-number reference in CircuitRegistration test comment

Reviewer #1401 caught: the comment said 'per the spec's THEOREM at
line ~96' but the actual line drifts as the spec evolves (currently
line 106 after my Safety operator addition).

Reworded to reference the THEOREM by content rather than line:
'matching the spec's stated THEOREM Spec => [](TypeOK /NoRegisterAfterBuild)'. Doesn't drift, doesn't require maintenance
when the spec is edited.

* fix(tests/tlc + tla/specs): strip historical narrative per copilot-instructions §282

Reviewer #1401 caught: my CircuitRegistration test comments + the
Safety operator's spec comment carried historical narrative (B-0180
ticket ID, #1397 PR ref, 2026-05-03 sweep date, 'Closes 1 of 3
broken sibling specs', 'was missing until B-0180 fixed', etc.) on
current-state surfaces. Per .github/copilot-instructions.md §282:

  *Documentation is current state, not history. Historical
  narrative lives in docs/ROUND-HISTORY.md and ADRs under
  docs/DECISIONS/; everywhere else in docs/ edit in place.*

Same rule applies to spec files + test comments — they describe
what the artifact currently does, not how it got there.

Stripped:
- Test comments now describe ONLY the current invariants checked
  + cfg coverage scope + complementary-coverage pointers (no
  ticket IDs, no dates, no PR refs)
- Spec's Safety operator comment now says only what Safety is
  (composite safety invariant matching the THEOREM) without the
  B-0180 / 2026-05-03 / verify-then-claim history
- Compatible-coverage pointers retained (FsCheck Z-set tests +
  Lean proof) since those are current-state composes-with refs

Build still 0/0; functional behavior unchanged.
AceHack added a commit that referenced this pull request May 3, 2026
…ster + cache-clobber discipline encoded (#1408)

Substantial multi-tick session shard. 18 PRs touched (#1383 + #1387
+ #1392-#1407 inclusive); 14 merged + 4 in-flight as of shard time.

**Math-proofs assessment progress** (#1383 outstanding-work matrix):
- A1+A2 → A-with-CI ✓ (#1394 Lean lake-build workflow)
- A4 registry rows ✓ (#1393)
- B1 → 2 of 4 deferred specs in CI ✓ (#1397 DbspSpec + #1401
  CircuitRegistration B-0180 closed)
- B2 Alloy → A ✓ (#1396 silent-no-op spec-path fix)
- B4 Semgrep → A ✓ (correction)
- Peer-review email template ✓ (#1387)
- Phase 0 substrate-discovery PoC ✓ (#1392)
- Stryker config-fix ✓ (#1395; CI wire deferred)
- 3 broken-spec backlog rows filed ✓ (#1398)

**Cache-clobber silent-bug class discovered + fully encoded:**
B-0180 fix passing locally + failing CI → verify-then-claim
identified gate.yml + low-memory.yml caching whole tools/tla and
tools/alloy directories. Fix cluster: #1403 (gate.yml) + #1404
(low-memory.yml + audit-ci-cache-paths.ts) + #1406 (CI lint gate)
+ #1407 (memory file + bug-locus disambiguation per Aaron's 'real
github bug?' question — answer: usage-bug, not tool-bug).

**Other substrate work:** #1399 BACKLOG.md regen, #1400 .ts/.sh
parity bug, #1402 assessment matrix doc update, #1405 B-0182
backlog row (Linux-only formal verification — orthogonal-axes
split per Aaron 2026-05-03).

**Discipline lessons captured:** chat-is-assertion-channel,
substrate-corrections-cluster, search-first-before-architectural-
expansion, verify-then-claim CI fidelity, documentation-is-
current-state-not-history.

Carved sentence: 'When a lucky catch surfaces a class of bug,
build the structural fix that eliminates the luck — audit + lint
gate + carved-sentence rule + memory file.'
AceHack added a commit that referenced this pull request May 3, 2026
…l counterexample) (#1411)

* fix(B-0179): SpineAsyncProtocol — add CHECK_DEADLOCK FALSE; not a real counterexample

Per B-0179 verify-then-claim sweep. Re-running TLC on
SpineAsyncProtocol revealed: TLC reports "Deadlock reached" at the
all-done quiescent state (processed=NumBatches, channel empty,
both PCs idle). This is intended termination, not bug-deadlock.

Local TLC run before fix:
  Error: Deadlock reached.
  21 states generated, 15 distinct states found, 0 states left on queue.
  Depth 9.

The 3 invariants (InvMonotonic, InvEventuallyDrains,
InvFlushTerminates) all hold across all 15 explored states. The
ONLY violation is the deadlock-at-terminal warning.

Fix: add CHECK_DEADLOCK FALSE to SpineAsyncProtocol.cfg (same
pattern as CircuitRegistration.cfg + TwoPCSink.cfg + several
others). Tells TLC that reaching quiescence is acceptable for
bounded protocols.

Local verification:
  Model checking completed. No error has been found.
  21 states generated, 15 distinct states found, 0 states left on queue.

Added [Fact] entry registering SpineAsyncProtocol in TLC test
list. Closes B-0179 P2 backlog row.

B1 progress: 3 of 4 deferred specs now in CI (DbspSpec via #1397,
CircuitRegistration via #1401, SpineAsyncProtocol via this PR).
Remaining: B-0181 SpineMergeInvariants (depth-17 trace; substantial
counterexample investigation possibly indicating real bug).

* fix(tlc-runner): correct SpineAsyncProtocol comment — state invariants, not temporal liveness

Reviewer #1411 caught: my comment described the 3 invariants as if
they were temporal-liveness properties (eventually-drains,
flush-terminates), but the actual TLC-checked predicates are
state invariants. The 'eventually' / 'terminates' naming is
historical from the spec author's intent comments — the actual
predicates are conditional state assertions (when channel empty,
processed = sent at that instant).

Updated comment to:
- Mark them explicitly as STATE invariants
- Describe each predicate's actual TLC-checkable form
- Note that real liveness would require LTL properties + WF/SF
  fairness assumptions (not what this spec does)

This is the documentation-is-current-state-not-history rule
applied to comment-vs-implementation alignment: comments must
describe what TLC actually checks, not what the spec author
labeled the property in their intent.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants