feat(B-0170.4): seed eval-set fixture for count-drift regression coverage #3611
Merged
…rage

Smallest safe slice of B-0170.4 (fixture-tests + eval-set coverage):

- New `tools/substrate-claim-checker/fixtures/` directory with one frozen historical drift instance: `count-drift-9-vs-15.md`, reproducing the count-drift pattern from PR #1259 (claim "9 drift instances" vs 15-row body table)
- New `fixtures.test.ts` regression test asserting `check-counts.ts` still detects the empirical drift the fixture preserves
- `fixtures/README.md` documents the index + the procedure for adding the next fixture (one sub-class per slice)
- Top-level README points at the new eval-set surface

Empirical axis complement to the synthetic-case unit tests in each `check-*.test.ts`: fixtures regress against the actual drift patterns that prompted the discipline, not just toy inputs.

Focused checks:

- `bun test tools/substrate-claim-checker/fixtures.test.ts`: 1 pass, 6 expect() calls, exit 0
- `bun test tools/substrate-claim-checker/` (full suite): 113 pass, 0 fail, 250 expect() calls (negative-path stderr lines are intentional error-handling cases)
- `bun tools/substrate-claim-checker/check-counts.ts <fixture>`: 2 count-drift findings, exit 1 (drift surfaces as designed)

Claim: 72031688-2a2b-466d-a045-a5b76802d6df (otto-cli, B-0170.4).

Peer work in flight (avoided collision):

- otto-desktop: parent B-0170 decompose branch (B-0538-B-0541 children)
- otto-cli: B-0170.1 (semantic-equivalence checker), B-0170.3 (self-recursive checker)

operative-authorization: aaron 2026-05-14: "- **Devil-pole** (edge-runner drive): keep pushing, discover, go hard, never-be-idle"

Co-Authored-By: Claude <noreply@anthropic.com>
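To make the count-drift pattern concrete, here is a minimal sketch of the comparison the fixture is meant to exercise: a prose claim of the form `<number> <noun>` checked against the number of data rows in a markdown table. It is not the real `check-counts.ts`; the names `CountDrift`, `countTableRows`, and `detectCountDrift` are hypothetical, and the sketch assumes the document holds exactly one table and that the noun contains no regex metacharacters.

```typescript
// Hypothetical result shape; the field names echo the ones the PR's test asserts on.
interface CountDrift {
  claimedCount: number;
  actualCount: number;
}

// Assumption: the document contains exactly one table, so every pipe-prefixed
// line belongs to it. Data rows = table lines minus header and separator.
function countTableRows(markdown: string): number {
  const tableLines = markdown
    .split("\n")
    .filter((content) => content.trim().startsWith("|"));
  return Math.max(0, tableLines.length - 2);
}

// Returns a drift record when the first "<number> <noun>" claim disagrees
// with the table's row count, and undefined when they agree or no claim exists.
function detectCountDrift(markdown: string, noun: string): CountDrift | undefined {
  const match = new RegExp("(\\d+) " + noun).exec(markdown);
  if (match === null) return undefined;
  const claimedCount = Number(match[1]);
  const actualCount = countTableRows(markdown);
  return claimedCount === actualCount ? undefined : { claimedCount, actualCount };
}

// Synthetic memo: the prose claims 9 instances while the table holds 15 rows.
const memoRows: string[] = [];
for (let i = 1; i <= 15; i += 1) {
  memoRows.push("| " + i + " | example row |");
}
const sampleMemo = [
  "This memo records 9 drift instances.",
  "",
  "| # | Instance |",
  "|---|---|",
]
  .concat(memoRows)
  .join("\n");

console.log(detectCountDrift(sampleMemo, "drift instances"));
```

On the synthetic memo the sketch reports a claimed count of 9 against an actual count of 15, the same mismatch the frozen fixture preserves.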
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 212ce34a27
Pull request overview
Adds the first on-disk eval-set fixture for tools/substrate-claim-checker, seeding historical count-drift regression coverage for B-0170.4.
Changes:
- Documents the new fixture surface from the checker README.
- Adds a fixture README with indexing and contribution procedure.
- Adds one count-drift markdown fixture plus a Bun regression test.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tools/substrate-claim-checker/README.md | Points users to the new fixture directory and fixture procedure. |
| tools/substrate-claim-checker/fixtures/README.md | Defines the eval-set fixture purpose, index, and add-fixture process. |
| tools/substrate-claim-checker/fixtures/count-drift-9-vs-15.md | Adds a frozen historical count-drift markdown fixture. |
| tools/substrate-claim-checker/fixtures.test.ts | Adds a Bun test that runs check-counts.ts against the fixture. |
… (PR #3611 thread)

Per chatgpt-codex-connector + copilot-pull-request-reviewer threads on PR #3611: the original HTML provenance comment restated "9 drift instances", producing TWO matching findings (one from the comment, one from the body). The fixtures.test assertion (length >= 1, findings[0]) could be satisfied by the comment alone, masking regressions in body-claim detection.

Reword the comment to describe the scenario abstractly + add a NOTE section explaining why the exact `<number> <noun>` pair is omitted. Body claim "9 drift instances" + 15-row table preserved unchanged.

Co-Authored-By: Claude <noreply@anthropic.com>

…3611 thread)

Per chatgpt-codex-connector + copilot-pull-request-reviewer threads on PR #3611: replace `>= 1` with exact `=== 1` and pin the finding line to the body claim (line 24 after the rephrased HTML comment). A regression that stops detecting the body claim cannot now be masked by an HTML-comment match; the assertion forces exactly the intended finding.

Composes with the sibling fixture-rephrase commit that removes the spurious comment match in the first place.

Co-Authored-By: Claude <noreply@anthropic.com>
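The masking problem described in these two commits can be reproduced in miniature. The sketch below uses a toy detector (not the real `check-counts.ts`) with a hypothetical `scanBody` switch that simulates a regression in body-claim detection; `Finding`, `findCountClaims`, `looseCheck`, and `strictCheck` are illustrative names, and the line numbers belong to the toy fixtures only.

```typescript
// Hypothetical finding shape, loosely modeled on the fields the PR's test asserts.
interface Finding {
  line: number;
  claimedCount: number;
  claim: string;
}

// Toy detector: reports every "<number> drift instances" phrase. Passing
// scanBody=false simulates a regression where only HTML comments are scanned.
function findCountClaims(text: string, opts: { scanBody: boolean }): Finding[] {
  const findings: Finding[] = [];
  let inComment = false;
  text.split("\n").forEach((content, index) => {
    if (content.includes("<!--")) inComment = true;
    const match = /(\d+) (drift instances)/.exec(content);
    if (match !== null && (inComment || opts.scanBody)) {
      findings.push({
        line: index + 1,
        claimedCount: Number(match[1]),
        claim: match[2] ?? "",
      });
    }
    if (content.includes("-->")) inComment = false;
  });
  return findings;
}

// Original shape: the provenance comment restates the claim (line 2),
// and the body states it again (line 5).
const originalFixture = [
  "<!--",
  "provenance: claim of 9 drift instances",
  "-->",
  "",
  "This memo records 9 drift instances.",
].join("\n");

// Reworded shape: the comment describes the scenario abstractly.
const rewordedFixture = [
  "<!--",
  "provenance: headline count disagrees with the table",
  "-->",
  "",
  "This memo records 9 drift instances.",
].join("\n");

// Loose assertion: at least one finding, and the first one claims 9.
function looseCheck(findings: Finding[]): boolean {
  const first = findings[0];
  return findings.length >= 1 && first !== undefined && first.claimedCount === 9;
}

// Strict assertion: exactly one finding, pinned to the body line.
function strictCheck(findings: Finding[], bodyLine: number): boolean {
  const first = findings[0];
  return findings.length === 1 && first !== undefined && first.line === bodyLine;
}

const broken = { scanBody: false };
console.log(looseCheck(findCountClaims(originalFixture, broken)));
console.log(strictCheck(findCountClaims(rewordedFixture, broken), 5));
```

With the simulated regression, the loose check on the original fixture still passes because the comment match alone satisfies it, while the strict check on the reworded fixture fails as it should. The two fixes compose: rewording removes the spurious match, and the exact count plus pinned line keeps it from silently returning.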
Comment on lines +18 to +19

    past `fixtures.test.ts` (per PR #3611 review threads from
    chatgpt-codex-connector + copilot-pull-request-reviewer).

Comment on lines +29 to +35

    expect(result.findings.length).toBe(1);
    const finding = result.findings[0]!;
    expect(finding.line).toBe(24);
    expect(finding.claimedCount).toBe(9);
    expect(finding.actualCount).toBe(15);
    expect(finding.claim).toContain("drift instances");
    expect(finding.claimIsMinimum).toBe(false);
AceHack added a commit that referenced this pull request May 16, 2026
Smallest safe slice of B-0170.4 (fixture-tests + eval-set coverage). Extends PR #3611's count-drift seed to the existence-drift sub-class: the second of the 5 shipped check-types now has empirical-axis regression coverage.

- New `tools/substrate-claim-checker/fixtures/existence-drift-missing-doc.md` fixture modeling the verify-then-claim memo's body table instance #8 (PR #1252: future-domain memo referenced a docs/ markdown file that didn't actually exist). Uses a clearly synthetic path so the fixture stays stable across substrate evolution.
- New `fixtures.test.ts` describe block asserting `check-existence.ts` emits exactly one drift finding at line 24 with severity "drift".
- `fixtures/README.md` index gains the new fixture row.

Discipline carried forward from PR #3611 review threads (chatgpt-codex-connector + copilot-pull-request-reviewer): the HTML provenance comment intentionally does NOT backtick-quote the exact fixture path. Restating the claim inside the comment would let regressions in body-claim detection slip past the test via an HTML-comment match. The test asserts exact finding count + pins the body line as the catch.

Focused checks:

- `bun tools/substrate-claim-checker/check-existence.ts <fixture>`: 1 drift finding at line 24, severity "drift", exit 1
- `bun test tools/substrate-claim-checker/fixtures.test.ts`: 2 pass, 12 expect() calls, exit 0
- `bun test tools/substrate-claim-checker/` (full suite): 114 pass, 0 fail, 256 expect() calls (negative-path stderr lines are intentional error-handling cases per PR #3611 convention)

Composes with:

- B-0170.4 done-criteria ("fixture-tests + eval-set coverage for all shipped + new check-types"): incremental progress, one sub-class per slice per the fixtures/README.md procedure
- B-0170 (parent row, decomposed)
- PR #3611 (count-drift seed; same scaffolding extended here)

Claim: 6c253d24-3ed0-4e89-8f3a-563b13f933cc (otto-cli, B-0170).

operative-authorization: aaron 2026-05-14: "- **Devil-pole** (edge-runner drive): keep pushing, discover, go hard, never-be-idle"

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
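As a rough illustration of the existence-drift sub-class, the sketch below scans a document for backtick-quoted `.md` paths and reports those that do not exist. It is an assumption-laden toy, not `check-existence.ts`: the existence test is injected as a callback instead of touching the filesystem, and the names `ExistenceFinding` and `findMissingPaths` as well as the sample paths are hypothetical.

```typescript
// Hypothetical finding shape; "drift" mirrors the severity string the commit mentions.
interface ExistenceFinding {
  line: number;
  path: string;
  severity: "drift";
}

// Reports every backtick-quoted *.md path for which exists(path) is false.
// The exists callback stands in for a real filesystem lookup.
function findMissingPaths(
  markdown: string,
  exists: (path: string) => boolean,
): ExistenceFinding[] {
  const findings: ExistenceFinding[] = [];
  markdown.split("\n").forEach((content, index) => {
    const pattern = /`([^`\s]+\.md)`/g;
    let match: RegExpExecArray | null = pattern.exec(content);
    while (match !== null) {
      const path = match[1] ?? "";
      if (!exists(path)) {
        findings.push({ line: index + 1, path, severity: "drift" });
      }
      match = pattern.exec(content);
    }
  });
  return findings;
}

// Synthetic inputs: one path exists, one was never written.
const knownDocs = ["docs/README.md"];
const docExists = (path: string): boolean => knownDocs.indexOf(path) !== -1;

const sampleDoc = [
  "See `docs/README.md` for context.",
  "",
  "The design is recorded in `docs/hypothetical/never-written.md`.",
].join("\n");

console.log(findMissingPaths(sampleDoc, docExists));
```

Note that this toy matches only backtick-quoted paths, which is why an unquoted mention (such as one in a provenance comment) would not produce a second finding here.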
AceHack added a commit that referenced this pull request May 16, 2026
…rged PR #3614 (#3628)

* docs(rules): extend ID-allocation discipline with subdecimal-vs-top-level scheme distinction

The ID-allocation-discipline section covered WHEN to check (on-disk + in-flight) but not WHICH scheme to use. Adds a "Subdecimal vs top-level scheme" subsection distinguishing:

- B-NNNN.M (subdecimal) → child / slice of EXISTING parent row
- B-NNNN (new top-level) → new umbrella / standalone row

Empirically grounded by the 2026-05-15 collision: Otto on Desktop decomposed B-0170 into new top-levels B-0538/B-0539/B-0540/B-0541, missing that PR #3611 had already landed B-0170.4 via subdecimal scheme + Otto-CLI's PR #3595 had claimed B-0539 for the Otto-BFT umbrella. Both Ottos converged on the same decomposition; the scheme mismatch (top-level vs subdecimal) was the symptom of not checking existing-parent's siblings first.

The new check command is tight: `find docs/backlog -name "B-NNNN.*.md"` + `gh pr list --state all --search '"B-NNNN."'`. If siblings exist, use next free subdecimal, not a new top-level.

Composes with the existing ID-allocation section + refresh-before-decide invariant + audit-first-then-decide discipline (PR #3583).

Co-Authored-By: Claude <noreply@anthropic.com>

* shard(tick): 2026-05-16T00:08Z — fix-PR #3626 for monad-terminology drift from merged PR #3614

First tick of 2026-05-16 UTC; fresh-session cold-boot from autonomous-loop. Landed: PR #3626 (5 P1 review-thread fixes: monad-associativity terminology + dead xrefs in B-0543/B-0544 research substrate). Operational notes: Lior process active during commit window (lock-cleanup-race precondition); used borrow-on-existing pattern with ls-tree canary on both PRs (this shard + #3626).

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(shard-0008z): markdownlint MD037 — wrap full cron expression in backticks

`<<autonomous-loop>>` followed by `* * * * *` parsed as emphasis markers with spaces (MD037/no-space-in-emphasis at line 72). Wrap the entire cron expression in backticks so the asterisks are inside the code span.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
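The subdecimal rule in the first commit above can be sketched as a small allocation helper. This is only an illustration under stated assumptions: `nextSubdecimalId` is a hypothetical name, the list of existing IDs is assumed to be the combined result of the on-disk and in-flight checks the commit describes, and "next free" is interpreted here as one past the highest sibling suffix (reusing a lower gap such as `.2` would be an equally plausible reading).

```typescript
// Returns the next subdecimal child ID for a parent row, given every ID
// already seen on disk or in open/merged PRs. Assumption: "next free" means
// highest existing direct-child suffix + 1.
function nextSubdecimalId(parent: string, existingIds: string[]): string {
  const prefix = parent + ".";
  let highest = 0;
  for (const id of existingIds) {
    if (!id.startsWith(prefix)) continue; // unrelated row or the parent itself
    const suffix = Number(id.slice(prefix.length));
    // Nested IDs such as "B-0170.4.1" parse to a non-integer and are skipped,
    // so only direct children influence the result.
    if (Number.isInteger(suffix) && suffix > highest) highest = suffix;
  }
  return prefix + String(highest + 1);
}

// IDs mentioned in this thread, used purely as sample input.
const seenIds = ["B-0170.1", "B-0170.3", "B-0170.4", "B-0170.4.1", "B-0539"];

console.log(nextSubdecimalId("B-0170", seenIds));
console.log(nextSubdecimalId("B-0999", seenIds));
```

For the sample input the helper yields `B-0170.5` for the existing parent and `B-0999.1` for a parent with no children yet; a new top-level ID would only be appropriate for a new umbrella or standalone row.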
AceHack added a commit that referenced this pull request May 16, 2026
* feat(B-0170.4): seed path-form-drift fixture + regression test
Adds the third eval-set fixture for B-0170.4, extending regression
coverage from {count, existence} to {count, existence, path-form}.
Same proven shape as PR #3611 (count) and PR #3624 (existence): a
small on-disk markdown file under fixtures/ plus a pinned-expectation
test in fixtures.test.ts.
The fixture references tools/substrate-claim-checker/check-counts.ts
as both a bare basename (`check-counts.ts`) and a fully-qualified
path. Both resolve to the same absolute file via check-path-forms.ts's
3-root strategy (fileDir / parentDir / repoRoot), so the drift is
deterministically detected without depending on synthetic files.
Per PR #3611 review-thread discipline (chatgpt-codex-connector +
copilot): pin exact finding count (1) AND exact body-claim line (28)
so a regression in body-claim detection cannot be silently masked by
an HTML-comment-side match. The provenance comment intentionally
avoids restating either path form.
Re-decomposition note: original B-0170 lists B-0170.1-.4 as children.
B-0170.1 (semantic-equivalence) has an in-flight branch already;
B-0170.2 / .3 introduce brand-new sub-class checkers (bigger slices).
Adding one more fixture under B-0170.4 is genuinely the smallest safe
slice — it extends the proven pattern, has no merge risk, and closes
one more line of the parent row's done-criteria (eval-set coverage).
Focused check: bun test tools/substrate-claim-checker/fixtures.test.ts
→ 3 pass, 0 fail, 17 expect() calls. Full suite: 115 pass, 0 fail.
operative-authorization: aaron 2026-05-14: "- **Devil-pole**
(edge-runner drive): keep pushing, discover, go hard, never-be-idle"
Co-Authored-By: Claude <noreply@anthropic.com>
* shard(tick): 2026-05-16T02:56Z — B-0170.4 path-form fixture slice (PR #3696)
Per-tick shard documenting the path-form-drift fixture slice landed
in PR #3696. Captures the re-decomp reasoning (B-0170.1 has in-flight
branch; B-0170.2/.3 are bigger slices; B-0170.4 fixture continuation
is smallest safe), the subdecimal-vs-top-level scheme discipline
observed (per ac9d9a4 rule), the focused-check outcome, and the
catch-43 cron sentinel re-arm at session start.
operative-authorization: aaron 2026-05-14: "- **Devil-pole**
(edge-runner drive): keep pushing, discover, go hard, never-be-idle"
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(B-0170.4): correct anchor citation to memo instance #15 / PR #1256
Per Copilot review threads on PR #3696: the path-form fixture's anchor
was cited as "taxonomy row 4" but path-form is actually instance #15
of the verify-then-claim memo's body table (PR #1256), and sub-class
#6 of the 7-class list. Corrects the README index + adds the historical
anchor comment in the test.
The current fixture remains a synthetic exemplar covering the sub-class;
instance #15's literal substance (adjacent ADR citations from PR #1256)
is queued as follow-on fixture B-0170.4.1 per the per-thread plan.
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
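The 3-root resolution described in the path-form commit can be sketched as follows. This is a simplified model, not `check-path-forms.ts`: paths are joined as plain strings, existence is injected as a callback, and the names `resolveReference`, `findPathFormDrift`, and `PathFormDrift` plus the `/repo` layout are hypothetical. The idea shown is that two textual forms resolving to the same absolute file count as one path-form drift.

```typescript
// Tries each root in order and returns the first candidate that exists.
function resolveReference(
  ref: string,
  searchRoots: string[],
  exists: (path: string) => boolean,
): string | undefined {
  for (const root of searchRoots) {
    const candidate = root + "/" + ref;
    if (exists(candidate)) return candidate;
  }
  return undefined;
}

// Hypothetical drift record: one resolved file cited under several forms.
interface PathFormDrift {
  resolved: string;
  forms: string[];
}

// Groups references by the file they resolve to and keeps groups with
// more than one distinct textual form.
function findPathFormDrift(
  refs: string[],
  searchRoots: string[],
  exists: (path: string) => boolean,
): PathFormDrift[] {
  const formsByTarget: Record<string, string[]> = {};
  for (const ref of refs) {
    const resolved = resolveReference(ref, searchRoots, exists);
    if (resolved === undefined) continue; // unresolved refs are out of scope here
    const forms = formsByTarget[resolved] ?? [];
    if (forms.indexOf(ref) === -1) forms.push(ref);
    formsByTarget[resolved] = forms;
  }
  return Object.keys(formsByTarget)
    .map((resolved) => ({ resolved, forms: formsByTarget[resolved] ?? [] }))
    .filter((entry) => entry.forms.length > 1);
}

// Assumed layout: the fixture lives in fixtures/, one level below the checker.
const repoRoot = "/repo";
const parentDir = repoRoot + "/tools/substrate-claim-checker";
const fileDir = parentDir + "/fixtures";
const knownFiles = [parentDir + "/check-counts.ts"];
const fileExists = (path: string): boolean => knownFiles.indexOf(path) !== -1;

const pathFormDrift = findPathFormDrift(
  ["check-counts.ts", "tools/substrate-claim-checker/check-counts.ts"],
  [fileDir, parentDir, repoRoot],
  fileExists,
);
console.log(pathFormDrift);
```

In this layout the bare basename resolves through the parent directory and the fully-qualified form through the repo root, so both land on the same file and a single drift entry with two forms is reported.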
AceHack added a commit that referenced this pull request May 16, 2026
…3749)

Adds the fourth eval-set fixture for substrate-claim-checker. The fixture reproduces verify-then-claim memo instance #19: YAML frontmatter `description:` (and MEMORY.md index in the historical case) claimed "9 drift instances" while the body table already held 15 rows. check-cross-surface's "any-table" semantics fire when zero body tables match the claim.

Pinned per PR #3611 discipline:

- exact finding count (1)
- field == "description"
- claimedCount == 9, claimIsMinimum == false
- actualCounts == [15]
- HTML comment intentionally avoids restating the `<number> <noun>` pair (mirrors existing fixtures for uniformity, even though the cross-surface checker only scans the frontmatter description)

Focused-check outcome:

- `bun test tools/substrate-claim-checker/fixtures.test.ts` → 4/4 pass
- `bun test tools/substrate-claim-checker/` → 116/116 pass
- CLI: `bun tools/substrate-claim-checker/check-cross-surface.ts <fixture>` exits 1 with "cross-surface count drift — frontmatter.description claims '9 drift instances' (expected == 9); body tables have [15] rows"

operative-authorization: aaron 2026-05-14: "- **Devil-pole** (edge-runner drive): keep pushing, discover, go hard, never-be-idle"

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
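The "any-table" semantics can be illustrated with a small sketch: the frontmatter `description:` claim is compared against the row count of every body table, and a finding is raised only when no table matches. This is a toy under assumptions, not `check-cross-surface.ts`: frontmatter is parsed with regular expressions instead of a YAML parser, a document with no tables is treated as "nothing to compare" (the source does not say how that case is handled), and `CrossSurfaceFinding`, `tableRowCounts`, `checkCrossSurface`, and `makeMemo` are hypothetical names.

```typescript
// Hypothetical finding shape; field names echo the pinned expectations above.
interface CrossSurfaceFinding {
  field: "description";
  claimedCount: number;
  actualCounts: number[];
}

// Returns the data-row count of each contiguous run of pipe-prefixed lines.
function tableRowCounts(body: string): number[] {
  const counts: number[] = [];
  let run = 0;
  // The trailing "" sentinel flushes a table that ends at end-of-file.
  for (const content of body.split("\n").concat("")) {
    if (content.trim().startsWith("|")) {
      run += 1;
    } else {
      if (run >= 2) counts.push(run - 2); // drop header and separator
      run = 0;
    }
  }
  return counts;
}

// Fires only when zero body tables have a row count equal to the claim.
function checkCrossSurface(doc: string): CrossSurfaceFinding | undefined {
  const parts = /^---\n([\s\S]*?)\n---\n([\s\S]*)$/.exec(doc);
  if (parts === null) return undefined;
  const description = /^description:\s*(.*)$/m.exec(parts[1] ?? "");
  if (description === null) return undefined;
  const claim = /(\d+) [a-z]+/i.exec(description[1] ?? "");
  if (claim === null) return undefined;
  const claimedCount = Number(claim[1]);
  const actualCounts = tableRowCounts(parts[2] ?? "");
  // Assumption: with no tables at all there is nothing to compare against.
  if (actualCounts.length === 0) return undefined;
  if (actualCounts.indexOf(claimedCount) !== -1) return undefined;
  return { field: "description", claimedCount, actualCounts };
}

// Builds a synthetic memo whose single body table has the given row count.
function makeMemo(rowCount: number): string {
  const header = [
    "---",
    "description: Memo covering 9 drift instances",
    "---",
    "",
    "| # | PR |",
    "|---|---|",
  ];
  const rows: string[] = [];
  for (let i = 1; i <= rowCount; i += 1) rows.push("| " + i + " | example |");
  return header.concat(rows).join("\n");
}

console.log(checkCrossSurface(makeMemo(15)));
console.log(checkCrossSurface(makeMemo(9)));
```

The 15-row memo yields a finding with `claimedCount` 9 and `actualCounts` `[15]`; the 9-row memo yields no finding because one table matches the claim.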
AceHack added a commit that referenced this pull request May 17, 2026
Adds the 5th eval-set fixture for the substrate-claim-checker, covering the convention sub-class of the 7-class verify-then-claim taxonomy. The fixture pair (current ADR + sibling predecessor ADR support file) makes the broken half of the bidirectional ADR supersession convention reproducible without depending on any real ADR pair in the repo.

Anchor: PR #2512 (the PR that shipped check-convention.ts). Synthetic exemplar, same shape as the path-form-drift fixture's synthetic case.

Focused check outcomes:

- bun test tools/substrate-claim-checker/fixtures.test.ts → 5 pass / 0 fail
- bun test tools/substrate-claim-checker/ → 117 pass / 0 fail
- Direct CLI run reports 1 convention-drift finding on line 36 with the expected reciprocal-marker reason string

Composes with B-0170 (parent), B-0170.4 eval-set thread (PRs #3611, #3624, #3696, #3749).

operative-authorization: aaron 2026-05-14: "- **Devil-pole** (edge-runner drive): keep pushing, discover, go hard, never-be-idle"

Co-authored-by: Claude <noreply@anthropic.com>
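A bidirectional supersession check of the kind this fixture pair exercises might look roughly like the sketch below. The marker strings `Supersedes:` and `Superseded by:`, the `ADR-NNNN` ID format, and the names `Adr`, `ConventionFinding`, and `findBrokenSupersession` are all assumptions made for illustration; the thread does not state what `check-convention.ts` actually matches on.

```typescript
// Hypothetical in-memory ADR: an identifier plus its markdown text.
interface Adr {
  id: string;
  text: string;
}

interface ConventionFinding {
  line: number;
  reason: string;
}

// For each "Supersedes: ADR-NNNN" line in the current ADR, verifies that the
// predecessor carries the reciprocal "Superseded by: <current id>" marker.
function findBrokenSupersession(
  current: Adr,
  lookup: (id: string) => Adr | undefined,
): ConventionFinding[] {
  const findings: ConventionFinding[] = [];
  current.text.split("\n").forEach((content, index) => {
    const match = /Supersedes:\s*(ADR-\d+)/.exec(content);
    if (match === null) return;
    const predecessorId = match[1] ?? "";
    const predecessor = lookup(predecessorId);
    const reciprocal = "Superseded by: " + current.id;
    if (predecessor === undefined || predecessor.text.indexOf(reciprocal) === -1) {
      findings.push({
        line: index + 1,
        reason: predecessorId + ' lacks reciprocal marker "' + reciprocal + '"',
      });
    }
  });
  return findings;
}

// Synthetic pair: the current ADR points back, the predecessor never points forward.
const currentAdr: Adr = {
  id: "ADR-0002",
  text: ["# ADR-0002", "", "Supersedes: ADR-0001"].join("\n"),
};
const brokenPredecessor: Adr = { id: "ADR-0001", text: "# ADR-0001\n\nStatus: accepted" };
const fixedPredecessor: Adr = {
  id: "ADR-0001",
  text: "# ADR-0001\n\nSuperseded by: ADR-0002",
};

console.log(findBrokenSupersession(currentAdr, () => brokenPredecessor));
console.log(findBrokenSupersession(currentAdr, () => fixedPredecessor));
```

Passing the predecessor through a lookup callback keeps the sketch independent of any real ADR directory, in the same spirit as the fixture's self-contained support file.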
Summary

Smallest safe slice of B-0170.4 (fixture-tests + eval-set coverage).

- `tools/substrate-claim-checker/fixtures/` directory with one frozen historical drift instance: `count-drift-9-vs-15.md`, reproducing the count-drift pattern from PR #1259, "review(pr-1257-postmerge): verify-then-claim count drift (9→18+) frontmatter + body + MEMORY.md" (claim "9 drift instances" vs 15-row body table)
- `fixtures.test.ts` regression test asserting `check-counts.ts` still detects the empirical drift the fixture preserves
- `fixtures/README.md` documents the index + the procedure for adding the next fixture (one sub-class per slice)

Empirical axis complement to the synthetic-case unit tests in each `check-*.test.ts`: fixtures regress against the actual drift patterns that prompted the discipline, not just toy inputs.

Test plan

- `bun test tools/substrate-claim-checker/fixtures.test.ts`: 1 pass, 6 expect() calls, exit 0
- `bun test tools/substrate-claim-checker/` (full suite): 113 pass, 0 fail, 250 expect() calls
- `bun tools/substrate-claim-checker/check-counts.ts tools/substrate-claim-checker/fixtures/count-drift-9-vs-15.md`: 2 count-drift findings, exit 1 (drift surfaces as designed)
- `72031688-2a2b-466d-a045-a5b76802d6df` (otto-cli, B-0170.4)

Peer-work isolation

Avoided collision with in-flight branches:

- `otto-b0170-decompose-into-atomic-children-2026-05-15` (otto-desktop, parent B-0170 → B-0538-B-0541 children)
- `otto-cli/b0170-1-semantic-equiv-checker-2026-05-15` (B-0170.1)
- `otto-cli/b0170-3-self-recursive-checker-2026-05-15` (B-0170.3)

This slice touches only `tools/substrate-claim-checker/fixtures*` + `README.md`, purely additive.

🤖 Generated with Claude Code