feat(B-0170.4): seed eval-set fixture for count-drift regression coverage by AceHack · Pull Request #3611 · Lucent-Financial-Group/Zeta

AceHack · 2026-05-15T22:55:01Z

Summary

Smallest safe slice of B-0170.4 (fixture-tests + eval-set coverage).

- New tools/substrate-claim-checker/fixtures/ directory with one frozen historical drift instance, count-drift-9-vs-15.md, reproducing the count-drift pattern from PR #1259, "review(pr-1257-postmerge): verify-then-claim count drift (9→18+) frontmatter + body + MEMORY.md" (claim "9 drift instances" vs 15-row body table)
- New fixtures.test.ts regression test asserting check-counts.ts still detects the empirical drift the fixture preserves
- fixtures/README.md documents the index + the procedure for adding the next fixture (one sub-class per slice)
- Top-level README points at the new eval-set surface

Empirical axis complement to the synthetic-case unit tests in each check-*.test.ts: fixtures regress against the actual drift patterns that prompted the discipline, not just toy inputs.
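
To make the count-drift pattern concrete, here is a minimal sketch of the kind of check the fixture exercises. It is not the code in check-counts.ts: the `CountFinding` shape, `countTableRows`, and `findCountDrift` are hypothetical names, and the sketch assumes a single markdown table per document and a caller-supplied noun phrase.

```typescript
// Hypothetical finding shape; the field names echo the assertions quoted in
// this PR, but the real type in check-counts.ts is not shown on this page.
interface CountFinding {
  line: number; // 1-based line of the claim
  claim: string; // e.g. "2 drift instances"
  claimedCount: number;
  actualCount: number;
}

// Count data rows, assuming the document holds exactly one markdown table:
// every pipe-prefixed line minus the header row and the |---| separator.
function countTableRows(markdown: string): number {
  const rows = markdown.split("\n").filter((l) => l.trim().startsWith("|"));
  return Math.max(0, rows.length - 2);
}

// Flag every "<number> <noun>" claim whose number differs from the row count.
// Assumes `noun` contains no regex metacharacters.
function findCountDrift(markdown: string, noun: string): CountFinding[] {
  const actualCount = countTableRows(markdown);
  const pattern = new RegExp(`(\\d+) ${noun}`);
  const findings: CountFinding[] = [];
  markdown.split("\n").forEach((text, index) => {
    const match = pattern.exec(text);
    if (match === null) return;
    const claimedCount = Number(match[1]);
    if (claimedCount !== actualCount) {
      findings.push({ line: index + 1, claim: match[0], claimedCount, actualCount });
    }
  });
  return findings;
}

// Toy input: the prose claims 2 instances, the table holds 3 rows.
const sample = [
  "This memo lists 2 drift instances.",
  "",
  "| # | PR |",
  "|---|----|",
  "| 1 | a |",
  "| 2 | b |",
  "| 3 | c |",
].join("\n");

const countFindings = findCountDrift(sample, "drift instances");
console.log(countFindings);
```

The fixture in this PR plays the role of `sample`, except that it freezes a real historical instance (claim of 9, table of 15 rows) rather than a toy input.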

Test plan

- bun test tools/substrate-claim-checker/fixtures.test.ts → 1 pass, 6 expect() calls, exit 0
- bun test tools/substrate-claim-checker/ (full suite) → 113 pass, 0 fail, 250 expect() calls
- bun tools/substrate-claim-checker/check-counts.ts tools/substrate-claim-checker/fixtures/count-drift-9-vs-15.md → 2 count-drift findings, exit 1 (drift surfaces as designed)
- Branch verified before commit, tree size sanity-checked pre + post (52 root entries, no broken-commit canary)
- Bus claim acquired: 72031688-2a2b-466d-a045-a5b76802d6df (otto-cli, B-0170.4)

Peer-work isolation

Avoided collision with in-flight branches:

- otto-b0170-decompose-into-atomic-children-2026-05-15 (otto-desktop, parent B-0170 → B-0538-B-0541 children)
- otto-cli/b0170-1-semantic-equiv-checker-2026-05-15 (B-0170.1)
- otto-cli/b0170-3-self-recursive-checker-2026-05-15 (B-0170.3)

This slice touches only tools/substrate-claim-checker/fixtures* + README.md — purely additive.

🤖 Generated with Claude Code

…rage

Smallest safe slice of B-0170.4 (fixture-tests + eval-set coverage):

- New `tools/substrate-claim-checker/fixtures/` directory with one frozen historical drift instance, `count-drift-9-vs-15.md`, reproducing the count-drift pattern from PR #1259 (claim "9 drift instances" vs 15-row body table)
- New `fixtures.test.ts` regression test asserting `check-counts.ts` still detects the empirical drift the fixture preserves
- `fixtures/README.md` documents the index + the procedure for adding the next fixture (one sub-class per slice)
- Top-level README points at the new eval-set surface

Empirical axis complement to the synthetic-case unit tests in each `check-*.test.ts`: fixtures regress against the actual drift patterns that prompted the discipline, not just toy inputs.

Focused checks:

- `bun test tools/substrate-claim-checker/fixtures.test.ts` → 1 pass, 6 expect() calls, exit 0
- `bun test tools/substrate-claim-checker/` (full suite) → 113 pass, 0 fail, 250 expect() calls (negative-path stderr lines are intentional error-handling cases)
- `bun tools/substrate-claim-checker/check-counts.ts <fixture>` → 2 count-drift findings, exit 1 (drift surfaces as designed)

Claim: 72031688-2a2b-466d-a045-a5b76802d6df (otto-cli, B-0170.4).

Peer work in flight (avoided collision):

- otto-desktop: parent B-0170 decompose branch (B-0538-B-0541 children)
- otto-cli: B-0170.1 (semantic-equivalence checker), B-0170.3 (self-recursive checker)

operative-authorization: aaron 2026-05-14: "- **Devil-pole** (edge-runner drive): keep pushing, discover, go hard, never-be-idle"

Co-Authored-By: Claude <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 212ce34a27

Copilot

Pull request overview

Adds the first on-disk eval-set fixture for tools/substrate-claim-checker, seeding historical count-drift regression coverage for B-0170.4.

Changes:

- Documents the new fixture surface from the checker README.
- Adds a fixture README with indexing and contribution procedure.
- Adds one count-drift markdown fixture plus a Bun regression test.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `tools/substrate-claim-checker/README.md` | Points users to the new fixture directory and fixture procedure. |
| `tools/substrate-claim-checker/fixtures/README.md` | Defines the eval-set fixture purpose, index, and add-fixture process. |
| `tools/substrate-claim-checker/fixtures/count-drift-9-vs-15.md` | Adds a frozen historical count-drift markdown fixture. |
| `tools/substrate-claim-checker/fixtures.test.ts` | Adds a Bun test that runs `check-counts.ts` against the fixture. |

… (PR #3611 thread)

Per chatgpt-codex-connector + copilot-pull-request-reviewer threads on PR #3611: the original HTML provenance comment restated "9 drift instances", producing TWO matching findings (one from the comment, one from the body). The fixtures.test assertion (length >= 1, findings[0]) could be satisfied by the comment alone, masking regressions in body-claim detection.

Reword the comment to describe the scenario abstractly + add a NOTE section explaining why the exact <number> <noun> pair is omitted. Body claim "9 drift instances" + 15-row table preserved unchanged.

Co-Authored-By: Claude <noreply@anthropic.com>

…3611 thread)

Per chatgpt-codex-connector + copilot-pull-request-reviewer threads on PR #3611: replace `>= 1` with exact `=== 1` and pin the finding line to the body claim (line 24 after the rephrased HTML comment). A regression that stops detecting the body claim cannot now be masked by an HTML-comment match: the assertion forces exactly the intended finding.

Composes with the sibling fixture-rephrase commit that removes the spurious comment match in the first place.

Co-Authored-By: Claude <noreply@anthropic.com>
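
The masking problem described in these two commits can be illustrated with a small sketch. The `Finding` shape, `looseCheck`, and `pinnedCheck` are hypothetical stand-ins for the test's assertions, and line 3 for the provenance comment is an invented value; only the body-claim line (24) and the claimed count (9) come from the thread.

```typescript
// Hypothetical, reduced finding shape for illustration only.
interface Finding {
  line: number;
  claimedCount: number;
}

// Loose assertion, as in the original test: any first finding will do.
function looseCheck(findings: Finding[]): boolean {
  return findings.length >= 1 && findings[0]?.claimedCount === 9;
}

// Pinned assertion: exactly one finding, located at the body-claim line.
function pinnedCheck(findings: Finding[], bodyLine: number): boolean {
  return findings.length === 1 && findings[0]?.line === bodyLine;
}

// Simulated regression: body-claim detection is broken, but a provenance
// comment that restates the claim (assumed here to sit on line 3) still matches.
const commentOnly: Finding[] = [{ line: 3, claimedCount: 9 }];
// Healthy case: only the body claim on line 24 is reported.
const bodyOnly: Finding[] = [{ line: 24, claimedCount: 9 }];

console.log(looseCheck(commentOnly)); // true: the regression goes unnoticed
console.log(pinnedCheck(commentOnly, 24)); // false: the regression is caught
console.log(pinnedCheck(bodyOnly, 24)); // true
```

Rewording the comment removes the spurious match; pinning count and line means that even if such a match reappears, the test fails rather than passing for the wrong reason.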

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

+past `fixtures.test.ts` (per PR #3611 review threads from
+chatgpt-codex-connector + copilot-pull-request-reviewer).


+    expect(result.findings.length).toBe(1);
+    const finding = result.findings[0]!;
+    expect(finding.line).toBe(24);
+    expect(finding.claimedCount).toBe(9);
+    expect(finding.actualCount).toBe(15);
+    expect(finding.claim).toContain("drift instances");
+    expect(finding.claimIsMinimum).toBe(false);


Smallest safe slice of B-0170.4 (fixture-tests + eval-set coverage). Extends PR #3611's count-drift seed to the existence-drift sub-class: the second of the 5 shipped check-types now has empirical-axis regression coverage.

- New `tools/substrate-claim-checker/fixtures/existence-drift-missing-doc.md` fixture modeling the verify-then-claim memo's body table instance #8 (PR #1252: future-domain memo referenced a docs/ markdown file that didn't actually exist). Uses a clearly synthetic path so the fixture stays stable across substrate evolution.
- New `fixtures.test.ts` describe block asserting `check-existence.ts` emits exactly one drift finding at line 24 with severity "drift".
- `fixtures/README.md` index gains the new fixture row.

Discipline carried forward from PR #3611 review threads (chatgpt-codex-connector + copilot-pull-request-reviewer): the HTML provenance comment intentionally does NOT backtick-quote the exact fixture path. Restating the claim inside the comment would let regressions in body-claim detection slip past the test via an HTML-comment match. The test asserts exact finding count + pins the body line as the catch.

Focused checks:

- `bun tools/substrate-claim-checker/check-existence.ts <fixture>` → 1 drift finding at line 24, severity "drift", exit 1
- `bun test tools/substrate-claim-checker/fixtures.test.ts` → 2 pass, 12 expect() calls, exit 0
- `bun test tools/substrate-claim-checker/` (full suite) → 114 pass, 0 fail, 256 expect() calls (negative-path stderr lines are intentional error-handling cases per PR #3611 convention)

Composes with:

- B-0170.4 done-criteria ("fixture-tests + eval-set coverage for all shipped + new check-types"): incremental progress, one sub-class per slice per the fixtures/README.md procedure
- B-0170 (parent row, decomposed)
- PR #3611 (count-drift seed; same scaffolding extended here)

Claim: 6c253d24-3ed0-4e89-8f3a-563b13f933cc (otto-cli, B-0170).

operative-authorization: aaron 2026-05-14: "- **Devil-pole** (edge-runner drive): keep pushing, discover, go hard, never-be-idle"

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
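
As a rough illustration of the existence-drift sub-class described above, the sketch below flags backtick-quoted docs/ markdown paths that do not exist. It is an assumption-laden toy, not check-existence.ts: `findMissingPaths`, the `ExistenceFinding` shape, the path regex, and the injected `exists` predicate (used instead of real filesystem access) are all hypothetical.

```typescript
// Hypothetical finding shape; only severity "drift" is taken from the thread.
interface ExistenceFinding {
  line: number; // 1-based line of the reference
  path: string;
  severity: "drift";
}

// Scan each line for backtick-quoted docs/*.md paths and report the ones the
// injected `exists` predicate does not know about.
function findMissingPaths(
  markdown: string,
  exists: (path: string) => boolean,
): ExistenceFinding[] {
  const findings: ExistenceFinding[] = [];
  markdown.split("\n").forEach((text, index) => {
    // A fresh regex per line keeps lastIndex from leaking between lines.
    const pattern = /`(docs\/[^`\s]+\.md)`/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      const path = match[1];
      if (path !== undefined && !exists(path)) {
        findings.push({ line: index + 1, path, severity: "drift" });
      }
    }
  });
  return findings;
}

// Toy "filesystem": one file exists, the other reference is clearly synthetic.
const knownFiles = new Set(["docs/README.md"]);
const memo = [
  "See `docs/README.md` for context.",
  "Details live in `docs/hypothetical-missing-doc.md`.",
].join("\n");

const existenceFindings = findMissingPaths(memo, (p) => knownFiles.has(p));
console.log(existenceFindings);
```

Injecting the predicate mirrors the commit's reasoning for using a synthetic path: the result does not change when real files are added to or removed from the repository.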

…rged PR #3614 (#3628)

* docs(rules): extend ID-allocation discipline with subdecimal-vs-top-level scheme distinction

The ID-allocation-discipline section covered WHEN to check (on-disk + in-flight) but not WHICH scheme to use. Adds a "Subdecimal vs top-level scheme" subsection distinguishing:

- B-NNNN.M (subdecimal) → child / slice of EXISTING parent row
- B-NNNN (new top-level) → new umbrella / standalone row

Empirically grounded by the 2026-05-15 collision: Otto on Desktop decomposed B-0170 into new top-levels B-0538/B-0539/B-0540/B-0541, missing that PR #3611 had already landed B-0170.4 via subdecimal scheme + Otto-CLI's PR #3595 had claimed B-0539 for the Otto-BFT umbrella. Both Ottos converged on the same decomposition; the scheme mismatch (top-level vs subdecimal) was the symptom of not checking existing-parent's siblings first.

The new check command is tight: `find docs/backlog -name "B-NNNN.*.md"` + `gh pr list --state all --search '"B-NNNN."'`. If siblings exist, use next free subdecimal, not a new top-level.

Composes with the existing ID-allocation section + refresh-before-decide invariant + audit-first-then-decide discipline (PR #3583).

Co-Authored-By: Claude <noreply@anthropic.com>

* shard(tick): 2026-05-16T00:08Z - fix-PR #3626 for monad-terminology drift from merged PR #3614

First tick of 2026-05-16 UTC; fresh-session cold-boot from autonomous-loop. Landed: PR #3626 (5 P1 review-thread fixes: monad-associativity terminology + dead xrefs in B-0543/B-0544 research substrate). Operational notes: Lior process active during commit window (lock-cleanup-race precondition); used borrow-on-existing pattern with ls-tree canary on both PRs (this shard + #3626).

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(shard-0008z): markdownlint MD037 - wrap full cron expression in backticks

`<<autonomous-loop>>` followed by `* * * * *` parsed as emphasis markers with spaces (MD037/no-space-in-emphasis at line 72). Wrap the entire cron expression in backticks so the asterisks are inside the code span.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
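
The subdecimal-vs-top-level rule from the docs(rules) commit above can be sketched as a small allocation helper. `nextSubdecimal` is a hypothetical name, and "next free" is interpreted here as highest existing child plus one, which is an assumption; the rule text does not say whether gaps should be reused. The input list stands in for the combined output of the `find` and `gh pr list` checks.

```typescript
// Return the next subdecimal ID under `parent`, given every ID already seen
// on disk or in flight. Interprets "next free" as max existing child + 1
// (an assumption; gaps such as a missing .2 are not reused).
function nextSubdecimal(parent: string, existingIds: string[]): string {
  const prefix = `${parent}.`;
  const used = existingIds
    .filter((id) => id.startsWith(prefix))
    .map((id) => Number(id.slice(prefix.length)))
    // Drop deeper levels such as "B-0170.4.1", which parse as non-integers.
    .filter((n) => Number.isInteger(n) && n > 0);
  const next = used.length === 0 ? 1 : Math.max(...used) + 1;
  return `${parent}.${next}`;
}

// IDs as they might be gathered from docs/backlog plus open and merged PRs.
const existingIds = ["B-0170", "B-0170.1", "B-0170.3", "B-0170.4", "B-0539"];
const allocated = nextSubdecimal("B-0170", existingIds);
console.log(allocated); // "B-0170.5"
```

A parent with no children yet would get `.1`; a new top-level ID is only appropriate when the work is a new umbrella or standalone row rather than a slice of an existing parent.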

* feat(B-0170.4): seed path-form-drift fixture + regression test

Adds the third eval-set fixture for B-0170.4, extending regression coverage from {count, existence} to {count, existence, path-form}. Same proven shape as PR #3611 (count) and PR #3624 (existence): a small on-disk markdown file under fixtures/ plus a pinned-expectation test in fixtures.test.ts.

The fixture references tools/substrate-claim-checker/check-counts.ts as both a bare basename (`check-counts.ts`) and a fully-qualified path. Both resolve to the same absolute file via check-path-forms.ts's 3-root strategy (fileDir / parentDir / repoRoot), so the drift is deterministically detected without depending on synthetic files.

Per PR #3611 review-thread discipline (chatgpt-codex-connector + copilot): pin exact finding count (1) AND exact body-claim line (28) so a regression in body-claim detection cannot be silently masked by an HTML-comment-side match. The provenance comment intentionally avoids restating either path form.

Re-decomposition note: original B-0170 lists B-0170.1-.4 as children. B-0170.1 (semantic-equivalence) has an in-flight branch already; B-0170.2 / .3 introduce brand-new sub-class checkers (bigger slices). Adding one more fixture under B-0170.4 is genuinely the smallest safe slice: it extends the proven pattern, has no merge risk, and closes one more line of the parent row's done-criteria (eval-set coverage).

Focused check: bun test tools/substrate-claim-checker/fixtures.test.ts → 3 pass, 0 fail, 17 expect() calls. Full suite: 115 pass, 0 fail.

operative-authorization: aaron 2026-05-14: "- **Devil-pole** (edge-runner drive): keep pushing, discover, go hard, never-be-idle"

Co-Authored-By: Claude <noreply@anthropic.com>

* shard(tick): 2026-05-16T02:56Z - B-0170.4 path-form fixture slice (PR #3696)

Per-tick shard documenting the path-form-drift fixture slice landed in PR #3696. Captures the re-decomp reasoning (B-0170.1 has in-flight branch; B-0170.2/.3 are bigger slices; B-0170.4 fixture continuation is smallest safe), the subdecimal-vs-top-level scheme discipline observed (per ac9d9a4 rule), the focused-check outcome, and the catch-43 cron sentinel re-arm at session start.

operative-authorization: aaron 2026-05-14: "- **Devil-pole** (edge-runner drive): keep pushing, discover, go hard, never-be-idle"

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(B-0170.4): correct anchor citation to memo instance #15 / PR #1256

Per Copilot review threads on PR #3696: the path-form fixture's anchor was cited as "taxonomy row 4" but path-form is actually instance #15 of the verify-then-claim memo's body table (PR #1256), and sub-class #6 of the 7-class list. Corrects the README index + adds the historical anchor comment in the test. The current fixture remains a synthetic exemplar covering the sub-class; instance #15's literal substance (adjacent ADR citations from PR #1256) is queued as follow-on fixture B-0170.4.1 per the per-thread plan.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
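
The 3-root strategy mentioned in the first commit above (fileDir / parentDir / repoRoot) might look roughly like the following. This is a sketch under assumptions, not check-path-forms.ts: `joinPath`, `resolveReference`, `findPathFormDrift`, the `/repo` layout, and the injected `exists` predicate are all hypothetical, and the root order is assumed to be the order in which the commit lists them.

```typescript
// Join POSIX-style segments and normalize "." and ".." without touching disk.
function joinPath(base: string, relative: string): string {
  const parts: string[] = [];
  for (const segment of `${base}/${relative}`.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return "/" + parts.join("/");
}

// Try the three roots in order; return the first candidate that exists.
function resolveReference(
  ref: string,
  fileDir: string,
  repoRoot: string,
  exists: (p: string) => boolean,
): string | null {
  const parentDir = joinPath(fileDir, "..");
  for (const root of [fileDir, parentDir, repoRoot]) {
    const candidate = joinPath(root, ref);
    if (exists(candidate)) return candidate;
  }
  return null;
}

interface PathFormFinding {
  resolved: string;
  forms: string[];
}

// Report every file that is referenced under more than one textual form.
function findPathFormDrift(
  refs: string[],
  fileDir: string,
  repoRoot: string,
  exists: (p: string) => boolean,
): PathFormFinding[] {
  const formsByTarget = new Map<string, string[]>();
  for (const ref of refs) {
    const resolved = resolveReference(ref, fileDir, repoRoot, exists);
    if (resolved === null) continue;
    const forms = formsByTarget.get(resolved) ?? [];
    if (forms.indexOf(ref) === -1) forms.push(ref);
    formsByTarget.set(resolved, forms);
  }
  const findings: PathFormFinding[] = [];
  formsByTarget.forEach((forms, resolved) => {
    if (forms.length > 1) findings.push({ resolved, forms });
  });
  return findings;
}

// Hypothetical layout: the fixture sits one directory below check-counts.ts.
const repoFiles = new Set(["/repo/tools/substrate-claim-checker/check-counts.ts"]);
const pathFindings = findPathFormDrift(
  ["check-counts.ts", "tools/substrate-claim-checker/check-counts.ts"],
  "/repo/tools/substrate-claim-checker/fixtures",
  "/repo",
  (p) => repoFiles.has(p),
);
console.log(pathFindings);
```

In this layout the bare basename resolves through parentDir and the fully-qualified form through repoRoot, so both land on the same file and one finding is produced, which matches the commit's description of why a real file makes the drift deterministic.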

…3749)

Adds the fourth eval-set fixture for substrate-claim-checker. The fixture reproduces verify-then-claim memo instance #19: YAML frontmatter `description:` (and MEMORY.md index in the historical case) claimed "9 drift instances" while the body table already held 15 rows. check-cross-surface's "any-table" semantics fire when zero body tables match the claim.

Pinned per PR #3611 discipline:

- exact finding count (1)
- field == "description"
- claimedCount == 9, claimIsMinimum == false
- actualCounts == [15]
- HTML comment intentionally avoids restating the `<number> <noun>` pair (mirrors existing fixtures for uniformity, even though the cross-surface checker only scans the frontmatter description)

Focused-check outcome:

- `bun test tools/substrate-claim-checker/fixtures.test.ts` → 4/4 pass
- `bun test tools/substrate-claim-checker/` → 116/116 pass
- CLI: `bun tools/substrate-claim-checker/check-cross-surface.ts <fixture>` exits 1 with "cross-surface count drift — frontmatter.description claims '9 drift instances' (expected == 9); body tables have [15] rows"

operative-authorization: aaron 2026-05-14: "- **Devil-pole** (edge-runner drive): keep pushing, discover, go hard, never-be-idle"

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
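
The "any-table" semantics described above can be sketched as follows: a frontmatter claim is accepted if at least one body table has that many rows, and flagged when none does. This is a hedged illustration rather than check-cross-surface.ts; `splitFrontmatter`, `tableRowCounts`, `checkDescription`, and the claim regex are hypothetical, and a document with no tables at all is (by assumption) also flagged.

```typescript
// Hypothetical finding shape; field names follow the pinned expectations above.
interface CrossSurfaceFinding {
  field: string;
  claimedCount: number;
  actualCounts: number[];
}

// Split a document into YAML frontmatter text and markdown body.
function splitFrontmatter(doc: string): { frontmatter: string; body: string } {
  const match = /^---\n([\s\S]*?)\n---\n([\s\S]*)$/.exec(doc);
  if (match === null) return { frontmatter: "", body: doc };
  return { frontmatter: match[1] ?? "", body: match[2] ?? "" };
}

// One row count per table: a run of consecutive pipe-prefixed lines, minus
// the header row and the |---| separator.
function tableRowCounts(body: string): number[] {
  const counts: number[] = [];
  let run = 0;
  // The trailing empty string flushes a table that ends the document.
  for (const line of [...body.split("\n"), ""]) {
    if (line.trim().startsWith("|")) {
      run += 1;
      continue;
    }
    if (run >= 2) counts.push(run - 2);
    run = 0;
  }
  return counts;
}

// Compare the first "<number> <word>" claim in `description:` with the tables.
function checkDescription(doc: string): CrossSurfaceFinding | null {
  const { frontmatter, body } = splitFrontmatter(doc);
  const claim = /^description:.*?(\d+) [a-z]/m.exec(frontmatter);
  if (claim === null) return null;
  const claimedCount = Number(claim[1]);
  const actualCounts = tableRowCounts(body);
  // "Any-table": the claim holds if at least one table matches it.
  if (actualCounts.some((n) => n === claimedCount)) return null;
  return { field: "description", claimedCount, actualCounts };
}

// Toy document: frontmatter claims 2, the only body table has 3 rows.
const crossDoc = [
  "---",
  "description: memo covering 2 drift instances",
  "---",
  "",
  "| # | PR |",
  "|---|----|",
  "| 1 | a |",
  "| 2 | b |",
  "| 3 | c |",
].join("\n");

const crossFinding = checkDescription(crossDoc);
console.log(crossFinding);
```

Because only the frontmatter is scanned for the claim, a comment in the body restating the number would not produce a second finding here, which is consistent with the commit's note that the comment rewording is kept for uniformity rather than necessity.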

Adds the 5th eval-set fixture for the substrate-claim-checker, covering the convention sub-class of the 7-class verify-then-claim taxonomy. The fixture pair (current ADR + sibling predecessor ADR support file) makes the broken half of the bidirectional ADR supersession convention reproducible without depending on any real ADR pair in the repo.

Anchor: PR #2512 (the PR that shipped check-convention.ts). Synthetic exemplar, same shape as the path-form-drift fixture's synthetic case.

Focused check outcomes:

- bun test tools/substrate-claim-checker/fixtures.test.ts → 5 pass / 0 fail
- bun test tools/substrate-claim-checker/ → 117 pass / 0 fail
- Direct CLI run reports 1 convention-drift finding on line 36 with the expected reciprocal-marker reason string

Composes with B-0170 (parent), B-0170.4 eval-set thread (PRs #3611, #3624, #3696, #3749).

operative-authorization: aaron 2026-05-14: "- **Devil-pole** (edge-runner drive): keep pushing, discover, go hard, never-be-idle"

Co-authored-by: Claude <noreply@anthropic.com>
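
The bidirectional supersession convention can be sketched as a reciprocal-marker check: if the current ADR says it supersedes a predecessor, the predecessor should point back. The marker strings (`Supersedes:` and `Superseded by:`), the `ADR-NNNN` ID format, and the names `checkSupersession` and `ConventionFinding` are assumptions for illustration; the actual markers used by check-convention.ts are not shown on this page.

```typescript
// Hypothetical finding shape for a broken reciprocal marker.
interface ConventionFinding {
  line: number; // 1-based line of the "Supersedes:" marker in the current ADR
  reason: string;
}

// For each "Supersedes: ADR-N" line in the current ADR, require the named
// predecessor to contain "Superseded by: <currentId>".
function checkSupersession(
  currentId: string,
  currentText: string,
  predecessors: Record<string, string>,
): ConventionFinding[] {
  const findings: ConventionFinding[] = [];
  currentText.split("\n").forEach((text, index) => {
    const match = /Supersedes:\s*(ADR-\d+)/.exec(text);
    if (match === null) return;
    const predecessorId = match[1] ?? "";
    // A predecessor that is not provided is treated as empty text, so it is flagged too.
    const predecessorText = predecessors[predecessorId] ?? "";
    if (predecessorText.indexOf(`Superseded by: ${currentId}`) === -1) {
      findings.push({
        line: index + 1,
        reason: `${predecessorId} lacks reciprocal "Superseded by: ${currentId}" marker`,
      });
    }
  });
  return findings;
}

// Fixture pair in miniature: the current ADR plus a predecessor without the
// back-pointer (the "broken half" of the convention).
const currentAdr = ["# ADR-0002", "", "Supersedes: ADR-0001"].join("\n");
const brokenPair = { "ADR-0001": "# ADR-0001\n\nStatus: accepted" };
const healthyPair = { "ADR-0001": "# ADR-0001\n\nSuperseded by: ADR-0002" };

const conventionFindings = checkSupersession("ADR-0002", currentAdr, brokenPair);
console.log(conventionFindings);
console.log(checkSupersession("ADR-0002", currentAdr, healthyPair));
```

Shipping the predecessor as a sibling support file, as the fixture does, keeps both halves of the relationship under the fixture's control instead of relying on a real ADR pair that might later be fixed or renamed.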

Copilot AI review requested due to automatic review settings May 15, 2026 22:55

AceHack enabled auto-merge (squash) May 15, 2026 22:55

Copilot started reviewing on behalf of AceHack May 15, 2026 22:55

chatgpt-codex-connector Bot reviewed May 15, 2026

Comment thread tools/substrate-claim-checker/fixtures.test.ts Outdated

Copilot AI reviewed May 15, 2026

Comment thread tools/substrate-claim-checker/fixtures.test.ts Outdated

AceHack and others added 2 commits May 15, 2026 19:03

Copilot AI review requested due to automatic review settings May 15, 2026 23:03

Copilot started reviewing on behalf of AceHack May 15, 2026 23:04

AceHack merged commit f92bbd2 into main May 15, 2026
27 of 30 checks passed

AceHack deleted the otto-cli/b0170-4-eval-set-fixture-2026-05-15 branch May 15, 2026 23:07

Copilot AI reviewed May 15, 2026

AceHack mentioned this pull request May 16, 2026

feat(B-0170.4): seed existence-drift fixture + regression test #3624

Merged

4 tasks

Copilot AI mentioned this pull request May 16, 2026

shard(tick): 0008Z — fix-PR #3626 for monad-terminology drift from merged PR #3614 #3628

Merged

4 tasks

AceHack mentioned this pull request May 16, 2026

feat(B-0170.4): seed path-form-drift fixture + regression test #3696

Merged

5 tasks

AceHack mentioned this pull request May 16, 2026

feat(B-0170.4): seed cross-surface-drift fixture + regression test #3749

Merged

4 tasks

AceHack mentioned this pull request May 17, 2026

feat(B-0170.4): seed convention-drift fixture + regression test #4085

Merged

4 tasks

AceHack merged 3 commits into main from otto-cli/b0170-4-eval-set-fixture-2026-05-15

2 participants
