feat(B-0553): audit-backlog-status-drift tool + 16 tests (incl. 2nd FP class)#3758
Conversation
Implements the B-0553 spec for substrate-drift detection. Section- aware parsing: extracts primary-artifact paths from Acceptance, Proposed mechanization, and Scope sections; skips Composes with, Origin, Source, Resolution, Non-goals, and other context sections. Follows the empirical false-positive catalog from B-0553: naive regex over the whole body produces a 4-of-4 false-positive rate; section-aware parsing eliminates that class. Live results: 30+ candidate rows surfaced on the current backlog, including many rows already validated as drift this session (B-0506, B-0530, B-0528, B-0535 — now closed). Remaining candidates need the partial-vs-drift discriminator from .claude/rules/backlog-item- start-gate.md step 0 to filter in-progress rows. Includes 13 unit tests covering: - parseFrontmatter (status field, no-frontmatter, colon-in-value) - extractPrimaryArtifacts (acceptance, composes-with skip, origin/ source/non-goals/resolution skip, backlog cross-ref skip, proposed-mechanization, scope, empirical B-0116 case) - findDriftCandidates (all-exist, empty list, mixed missing/exists) bun test → 13 pass / 0 fail / 24 expect calls / 831ms. Out-of-scope per B-0553 (next slices): --prune-claims and --open- close-pr flags. The tool flags candidates; manual verification of each acceptance bullet shipping is still required before close-row. Co-Authored-By: Claude <noreply@anthropic.com>
…refs Adds INLINE_CROSSREF_PATTERNS regex set + line-level skip logic to handle the false-positive class discovered by manual verification of B-0518 (Sharpening 4): a Composes with X bullet inside an Acceptance sub-section is a cross-reference, not a deliverable. Patterns skipped (case-insensitive): - composes with / composes-with - sister mechanism|rule|tool|module|process - cross-reference / cross reference - see also `X` / see `X` - per `X` / per [X] - references X / cites X Adds 3 regression tests: - INLINE_CROSSREF: composes-with bullet inside Acceptance - INLINE_CROSSREF: sister mechanism reference - INLINE_CROSSREF: see-also/per/references patterns Full reasoning in memory/feedback_audit_backlog_status_drift_second_ false_positive_class_inline_composes_with_otto_cli_2026_05_16.md (pushed to remote tick 12). bun test → 16 pass / 0 fail / 30 expect calls (was 13 pass before). Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds a new Bun/TypeScript hygiene tool to detect backlog “status drift” (rows still marked status: open even though all primary artifacts appear to exist), implementing the B-0553 spec’s section-aware parsing to avoid known false positives, and backs it with a focused unit test suite.
Changes:
- Introduces
tools/hygiene/audit-backlog-status-drift.tsto enumerate open backlog rows, extract primary-artifact paths from specific sections only, and report drift candidates (markdown or--json). - Adds
tools/hygiene/audit-backlog-status-drift.test.tswith 16 tests covering frontmatter parsing, section discrimination, inline cross-ref suppression, and candidate detection logic.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/hygiene/audit-backlog-status-drift.ts | New CLI tool + core parsing/filtering logic for drift detection and reporting. |
| tools/hygiene/audit-backlog-status-drift.test.ts | New Bun tests covering the parser and drift-candidate selection. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6a5166219b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ding on PR #3758) The github-code-quality reviewer flagged SKIP_SECTIONS as unused. Valid finding — the constant existed as documentation but the parser did not consult it. Behaviour was equivalent (anything not matching PRIMARY_SECTIONS ended primary mode), but the policy data was dead code. Fix: tri-state section classification (primary / skip / other) that explicitly tests SKIP_SECTIONS. PRIMARY_SECTIONS gate extraction; SKIP_SECTIONS are recognised cross-reference sections; everything else is other. Behaviour unchanged for all 16 existing tests. bun test → 16 pass / 0 fail (no regression). Co-Authored-By: Claude <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
…3758 Reorders the path-regex extension alternation: tsx before ts, fsi before fs, yaml before yml. Prevents widget.tsx being truncated to widget.ts during extraction (and then existence-checking the wrong file). Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b1a6a2a6e2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Adds H3 section classification: `### Acceptance criteria` as a top-level heading now enters primary mode; `### Sharpening N` nested inside `## Acceptance criteria` inherits parent mode (does not reset to other). Prevents systematic false negatives on rows using H3 top-level structure. Co-Authored-By: Claude <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
…-desync caught + recovered (#3776) Eighteenth tick. Triaged 10+ Copilot/Codex threads from yesterday's post-reset PR sweep. Highlights: - 4 threads resolved + 3 fix commits landed - last_updated frontmatter bug fixed on B-0532 + B-0533 - Path-naming-variance clarification added to B-0532 - HEAD-desync caught (path-variance commit initially landed on peer's PR #3758 branch); recovered cleanly via cherry-pick 5 my-lane PRs merged this tick window. 4 remain with mechanical thread debt deferred to next tick. Co-authored-by: Claude <noreply@anthropic.com>
…r findings as follow-up Captures 4 valid P1 findings from PR #3758 review-cycle 2 as a follow-up slice. Avoids iteration treadmill on the original PR. Findings: mixed-bullet extraction (Codex P1) / cwd-independent path resolution (Copilot P1) / read-failure error handling (Copilot P1) / --check mode for CI exit codes (Copilot P1). All 4 are P3 friction-reducers, additive to the first slice. Co-Authored-By: Claude <noreply@anthropic.com>
* shard(tick): 2026-05-16T05:54Z — brief-ack #3; rate-limit reset in 1 min; PR sweep pinned for next tick Sixteenth tick. Reset 1 min away — deferring 8-PR sweep one more tick to land cleanly inside the reset window. Firing the sweep right before reset would either hit rate-limit or leave the queue in a half-executed state. Brief-ack #3 of new counter cycle (within 1-2 tier). Next tick plan pinned in shard: verify rate, open 8 PRs in substrate-priority order, arm auto-merge each, close PR #3746 as superseded by PR #3757, normal shard. Co-Authored-By: Claude <noreply@anthropic.com> * backlog(B-0557): audit-tool quality improvements — 4 PR #3758 reviewer findings as follow-up Captures 4 valid P1 findings from PR #3758 review-cycle 2 as a follow-up slice. Avoids iteration treadmill on the original PR. Findings: mixed-bullet extraction (Codex P1) / cwd-independent path resolution (Copilot P1) / read-failure error handling (Copilot P1) / --check mode for CI exit codes (Copilot P1). All 4 are P3 friction-reducers, additive to the first slice. Co-Authored-By: Claude <noreply@anthropic.com> * fix(pr-3768): reconcile deferred-PR count + shard list (Copilot + Codex P2) The execution plan said '8 deferred PRs' but the shard list (a-h) totaled 9 items, and the per-tier label said '5 shard PRs' but enumerated 6 (0528z + 0535z + 0540z + 0545z + 0548z + 0554z). Resolution: peer Otto bundled 0545z into PR #3759 during the rate-limit-zero window, so the actual sweep would be 8 PRs (3 chore + 5 shards excluding 0545z). The plan-text now names this explicitly with the 'verify which branches are still pending' caveat. Both Copilot PRRT_kwDOSF9kNM6CiLg- and Copilot PRRT_kwDOSF9kNM6CiL9d threads fixed by this commit. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
…ps) + peer Otto PR #3768 reconcile (#3777) * shard(tick): 2026-05-16T05:54Z — brief-ack #3; rate-limit reset in 1 min; PR sweep pinned for next tick Sixteenth tick. Reset 1 min away — deferring 8-PR sweep one more tick to land cleanly inside the reset window. Firing the sweep right before reset would either hit rate-limit or leave the queue in a half-executed state. Brief-ack #3 of new counter cycle (within 1-2 tier). Next tick plan pinned in shard: verify rate, open 8 PRs in substrate-priority order, arm auto-merge each, close PR #3746 as superseded by PR #3757, normal shard. Co-Authored-By: Claude <noreply@anthropic.com> * backlog(B-0557): audit-tool quality improvements — 4 PR #3758 reviewer findings as follow-up Captures 4 valid P1 findings from PR #3758 review-cycle 2 as a follow-up slice. Avoids iteration treadmill on the original PR. Findings: mixed-bullet extraction (Codex P1) / cwd-independent path resolution (Copilot P1) / read-failure error handling (Copilot P1) / --check mode for CI exit codes (Copilot P1). All 4 are P3 friction-reducers, additive to the first slice. Co-Authored-By: Claude <noreply@anthropic.com> * fix(pr-3768): reconcile deferred-PR count + shard list (Copilot + Codex P2) The execution plan said '8 deferred PRs' but the shard list (a-h) totaled 9 items, and the per-tier label said '5 shard PRs' but enumerated 6 (0528z + 0535z + 0540z + 0545z + 0548z + 0554z). Resolution: peer Otto bundled 0545z into PR #3759 during the rate-limit-zero window, so the actual sweep would be 8 PRs (3 chore + 5 shards excluding 0545z). The plan-text now names this explicitly with the 'verify which branches are still pending' caveat. Both Copilot PRRT_kwDOSF9kNM6CiLg- and Copilot PRRT_kwDOSF9kNM6CiL9d threads fixed by this commit. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
…caught by new audit tool) First real-world use of the new audit-backlog-status-drift.ts tool (peer Otto-Desktop shipped via PR #3758 + PR #3777 quality improvements). Audit flagged B-0494 as a drift candidate; manual per-acceptance-bullet verification confirmed pure-drift: - tools/bus/export-cb-snapshot.ts exists (203 lines) - demo/circuit-breaker-snapshot.json committed - demo/index.html:1836 has snapshot-first fetch with fallback - dotnet build + tsc + panel rendering all implicit (PR #3134 CI green) All 6 acceptance criteria verifiably shipped. Row left open from 2026-05-14 to 2026-05-16 as substrate drift. Closing per the row-close gate step-0 discriminator (PR #3757). Mechanization → audit-tool → manual-verification → close-row workflow is now end-to-end operational. Co-Authored-By: Claude <noreply@anthropic.com>
…ift caught by B-0553 audit tool (#3780) * chore(b-0494): close row — mechanization shipped via PR #3134 (drift caught by new audit tool) First real-world use of the new audit-backlog-status-drift.ts tool (peer Otto-Desktop shipped via PR #3758 + PR #3777 quality improvements). Audit flagged B-0494 as a drift candidate; manual per-acceptance-bullet verification confirmed pure-drift: - tools/bus/export-cb-snapshot.ts exists (203 lines) - demo/circuit-breaker-snapshot.json committed - demo/index.html:1836 has snapshot-first fetch with fallback - dotnet build + tsc + panel rendering all implicit (PR #3134 CI green) All 6 acceptance criteria verifiably shipped. Row left open from 2026-05-14 to 2026-05-16 as substrate drift. Closing per the row-close gate step-0 discriminator (PR #3757). Mechanization → audit-tool → manual-verification → close-row workflow is now end-to-end operational. Co-Authored-By: Claude <noreply@anthropic.com> * chore(B-0045.1/B-0046.1/B-0049.1): close 3 substrate-shelf rows — Stage 1 scaffolds shipped Three sibling substrate-shelf rows surfaced by the audit-backlog-status-drift tool now on main: - B-0045.1: docs/substrate-shelves/biology.md (committed) - B-0046.1: docs/substrate-shelves/economics-history.md (committed; 411 lines) - B-0049.1: docs/substrate-shelves/mystery-schools-eleusinian.md (committed) All three rows explicitly state Stage 1 deliverable is the scaffold doc 'committed in this PR' with status 'open → done on PR merge.' The PRs merged; status stayed open. Drift. Closing. First operational use of the now-on-main audit-backlog-status-drift tool from PR #3758. The tool flagged these correctly; manual verification confirmed full Acceptance shipped (not partial-vs-drift case). Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
…caught by new audit tool) (#3781) First real-world use of the new audit-backlog-status-drift.ts tool (peer Otto-Desktop shipped via PR #3758 + PR #3777 quality improvements). Audit flagged B-0494 as a drift candidate; manual per-acceptance-bullet verification confirmed pure-drift: - tools/bus/export-cb-snapshot.ts exists (203 lines) - demo/circuit-breaker-snapshot.json committed - demo/index.html:1836 has snapshot-first fetch with fallback - dotnet build + tsc + panel rendering all implicit (PR #3134 CI green) All 6 acceptance criteria verifiably shipped. Row left open from 2026-05-14 to 2026-05-16 as substrate drift. Closing per the row-close gate step-0 discriminator (PR #3757). Mechanization → audit-tool → manual-verification → close-row workflow is now end-to-end operational. Co-authored-by: Claude <noreply@anthropic.com>
…3783) * shard(tick): 2026-05-16T06:38Z — end-to-end drift workflow validated; B-0494 closed via new audit tool Twentieth tick. My lane empty; ran peer's newly-shipped audit-backlog-status-drift.ts. Surfaced 38 candidates; verified B-0494 as pure-drift via per-acceptance audit; closed via PR #3781. First real-world use of the new tool. End-to-end mechanization → audit → verification → close pipeline now operational. Peer Otto in parallel drift-catching shelf rows (non-overlapping lane). 2× drift-catch throughput. HEAD-desync recovered via cherry-pick (4th+ instance this session). Co-Authored-By: Claude <noreply@anthropic.com> * feat(B-0557 slice 1): add --check flag for CI integration Adds --check flag per B-0557 finding 4 (Copilot P1 on PR #3758): when --check is set and drift candidates exist, exit with code 65. CI/cron jobs can wire this in to fail the build on detected drift. Default behaviour unchanged (always exit 0 in detect-only mode). Existing --json and --help flags preserved. Help text + KNOWN_FLAGS set updated. Smallest of the 4 B-0557 follow-up slices; the other 3 (mixed-bullet extraction / cwd-independent paths / read-failure error handling) are larger and tracked in B-0557 for separate slices. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
…ort on one bad file Per Copilot P1 on PR #3758: enumerateOpenRows() previously could throw and abort the whole audit on a single unreadable backlog file (permission denied, transient FS error, etc.). Fix: wrap both readdirSync and readFileSync in try/catch; warn to stderr; continue with remaining files. Audit completes; operator sees partial-result warnings naming the failed files. Smoke test: existing live run still produces 33+ candidates (no regression). Existing 16 tests pass unchanged. Co-Authored-By: Claude <noreply@anthropic.com>
… sub-class #4) (#3800) Closing as multi-slice-children-all-closed per the row-close gate triage. Verification at 2026-05-16T07:09Z confirmed all 3 children (B-0262, B-0263, B-0264) are status: closed; the umbrella's acceptance bullets aren't individually satisfied (no poll-pr-gate-batch.ts call, no test file) but the children's combined work IS the umbrella's deliverable. This is a NEW drift sub-class not yet documented in the row-close gate rule. The taxonomy now spans 4 classes: 1. Pure drift (5 examples this session) 2. Partial completion (B-0517 Phase 1, B-0537 Slice A) 3. Multi-slice, some children open (no current example) 4. Multi-slice, ALL children closed (B-0159 — this row) Closing per class 4 rule. Tool was surfaced by peer Otto's audit-backlog-status-drift.ts (PR #3758); per-acceptance check revealed the multi-slice pattern. Co-authored-by: Claude <noreply@anthropic.com>
…ort on one bad file Per Copilot P1 on PR #3758: enumerateOpenRows() previously could throw and abort the whole audit on a single unreadable backlog file (permission denied, transient FS error, etc.). Fix: wrap both readdirSync and readFileSync in try/catch; warn to stderr; continue with remaining files. Audit completes; operator sees partial-result warnings naming the failed files. Smoke test: existing live run still produces 33+ candidates (no regression). Existing 16 tests pass unchanged. Co-Authored-By: Claude <noreply@anthropic.com>
…ool (#3788) * feat(B-0557 slice 2): try/catch readFileSync + readdirSync — don't abort on one bad file Per Copilot P1 on PR #3758: enumerateOpenRows() previously could throw and abort the whole audit on a single unreadable backlog file (permission denied, transient FS error, etc.). Fix: wrap both readdirSync and readFileSync in try/catch; warn to stderr; continue with remaining files. Audit completes; operator sees partial-result warnings naming the failed files. Smoke test: existing live run still produces 33+ candidates (no regression). Existing 16 tests pass unchanged. Co-Authored-By: Claude <noreply@anthropic.com> * fix(PR-3788): Copilot findings — safer err typing + 0644Z.md schema pipe-row Two valid Copilot findings on PR #3788: 1. (err as Error).message is unsafe TS — catch variable is unknown. Replaced with: (err instanceof Error ? err.message : String(err)) at both locations (readdirSync + readFileSync error handlers). 2. docs/hygiene-history/ticks/2026/05/16/0644Z.md missed the documented schema (first non-empty line must be a pipe-row per ticks/README.md). Added the pipe-row prefix; preserved heading + body below. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
…eviewer attribution Two Copilot findings on PR #3790: 1. P1: detectRepoRoot() was internal-only; no test verified the cwd- independent behavior. Fix: export detectRepoRoot + add 2 tests (verifies repo root contains the tool itself + canonical top-level files like CLAUDE.md). 2. P2: Source comment embedded reviewer attribution ("Copilot P1 on PR #3758"). Repo guidance keeps historical attribution in backlog/ PR-history surfaces and asks reusable code comments to describe current invariants. Cleaned the docblock; preserved B-0557 ref. bun test → 18 pass / 0 fail (was 16; +2 for detectRepoRoot). Co-Authored-By: Claude <noreply@anthropic.com>
…endent) (#3790) * feat(B-0557 slice 3): chdir to repo root via git rev-parse — cwd-independent Per Copilot P1 on PR #3758: tool previously assumed cwd = repo root. Running from a subdirectory caused all existsSync(p) checks to fail and produced false negatives. Fix: at main() start, detect repo root via git rev-parse --show-toplevel and chdir. All subsequent relative-path reads + existence-checks now work regardless of invocation cwd. Fallback: if git unavailable or not in a repo, retain cwd. Verified by smoke-testing from /tmp — tool produces the same 33+ candidate output as when run from repo root. 16/16 existing tests pass (no regression). Co-Authored-By: Claude <noreply@anthropic.com> * fix(PR-3790): export detectRepoRoot + add 2 regression tests; strip reviewer attribution Two Copilot findings on PR #3790: 1. P1: detectRepoRoot() was internal-only; no test verified the cwd- independent behavior. Fix: export detectRepoRoot + add 2 tests (verifies repo root contains the tool itself + canonical top-level files like CLAUDE.md). 2. P2: Source comment embedded reviewer attribution ("Copilot P1 on PR #3758"). Repo guidance keeps historical attribution in backlog/ PR-history surfaces and asks reusable code comments to describe current invariants. Cleaned the docblock; preserved B-0557 ref. bun test → 18 pass / 0 fail (was 16; +2 for detectRepoRoot). Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
…f tokens are deliverables (#3809) * feat(B-0557 slice 4): mixed-bullet extraction — paths before cross-ref tokens are deliverables Per Codex P1 on PR #3758: bullets like 'Add `tools/foo.ts` per [B-0123] convention' contain BOTH a deliverable AND a citation. The previous behaviour skipped the WHOLE line because the cross-ref pattern matched, dropping the deliverable. Codex was correct: only the citation portion should be ignored. Implementation: find the first cross-ref-keyword position in the line; extract paths from the segment BEFORE that cutoff. Pure cross-ref bullets ("Composes with X") naturally produce an empty pre-cutoff segment + no extraction (regression-test verified). Mixed bullets extract the deliverable and ignore the citation. Adds 2 regression tests covering: - Mixed bullet with path + 'per [X]' or '(see also)' citation - Pure cross-ref bullets still skip (no regression) bun test → 20 pass / 0 fail (was 18; +2 for mixed-bullet handling). Closes the last of the 4 B-0557 follow-up slices. Co-Authored-By: Claude <noreply@anthropic.com> * shard(tick): 2026-05-16T07:27Z — B-0509 = 2nd FP class (2nd in a row); FP-rate pattern noted Twenty-seventh tick. Cost-aware tier. Audited B-0509: - tool path tools/routines/install.ts exists (for B-0448 slice 1) - B-0509-specific 'cloud-schedule' references absent via grep - Same shape as B-0418 last tick: shared-tool path FP Two consecutive 2nd-FP-class verifications. Hypothesis filed for peer's audit-tool improvement lane: add a 'feature-grep' sub-check scanning artifacts for acceptance-named identifiers. Audit progress: 17/38 triaged. ~21 remaining. Co-Authored-By: Claude <noreply@anthropic.com> * fix(PR-3809): strip reviewer attribution from test comment Per repo guidance: code surfaces use role-style references; keep historical attribution in PR/backlog surfaces. Removed reviewer/ product naming from the mixed-bullet test comment; preserved the B-0557 slice 4 reference + improved the invariant description. Same pattern as PR #3790 cleanup commit. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
…th pure-drift this session) (#3820) 10th pure-drift close of the resume-session sequence. All 3 acceptance items verifiably shipped via PRs #2166 + #2168 + #2172: - Script at tools/roms/canonicalize.ts (8953 bytes) - TOSEC + No-Intro matching via Logiqx XML - Rename + unmatched-hash reporting Surfaced by peer Otto's audit-backlog-status-drift.ts (PR #3758); per-acceptance verification confirmed pure-drift. Co-authored-by: Claude <noreply@anthropic.com>
Summary
Composes with Xbullets INSIDE Acceptance sub-sections are cross-refs, not deliverables. Fixed by addingINLINE_CROSSREF_PATTERNSregex set (commits-with / sister / see-also / per / references / cites). Full reasoning inmemory/feedback_audit_backlog_status_drift_second_false_positive_class_inline_composes_with_otto_cli_2026_05_16.md(separate PR).Test plan
bun test tools/hygiene/audit-backlog-status-drift.test.ts→ 16 pass / 0 fail / 30 expect calls / 301ms--jsonflag emits structured output for downstream tooling--helpdocuments the flagsPartial implementation note
Out-of-scope per B-0553 (next slice):
--prune-claims(release matching bus claim entries) and--open-close-pr(auto-open close-row PRs). Per the partial-vs-drift discriminator from.claude/rules/backlog-item-start-gate.mdstep 0, B-0553 staysstatus: openuntil those slices land.🤖 Generated with Claude Code