feat(B-0533): Slice B.1 — §33 migration dead-xref scanner #3548
Mechanizes the dead-xref class Codex P2 caught on PR #3513 (Riven §33 archive migration). Scans live-nav surfaces for references to docs/research/<basename> where <basename> has been migrated to memory/persona/<persona>/conversations/<basename>.

Scope (Slice B.1):
- Detect-only scanner (exit 0 always; humans triage before fixing)
- Walks .claude/{rules,agents,commands,skills}/, memory/*.md (top-level only, persona/ excluded), docs/backlog/, repo-root *.md
- Skips frozen historical archives (docs/history, docs/hygiene-history, docs/pr-discussions, docs/research itself, memory/persona/**/conversations/)

Empirical baseline (first run): 10 dead xrefs (9 DeepSeek + 1 Riven that PR #3529's manual fix missed at line 135). My earlier rough-scan estimate of 20+ was a false positive; the scanner gives substrate-honest truth.

Follow-up slices (separate PRs):
- Slice B.2: test file (DST-friendly fixtures)
- Slice B.3: wire into .github/workflows/gate.yml as warn-only
- Slice B.4: promote to error after baseline cleanup

Composes with B-0532 (sibling lint pattern), audit-rule-cross-refs.ts (template), B-0533 (parent row).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8233c74448
```ts
const PERSONA_BASE = "memory/persona";
const LIVE_NAV_SURFACES = [".claude/rules", ".claude/agents", ".claude/commands", ".claude/skills", "memory", "docs/backlog"];
const ROOT_MD = ["CLAUDE.md", "AGENTS.md", "README.md", "GOVERNANCE.md"];
```
Scan all repo-root Markdown files
The detector claims to walk repo-root *.md, but the implementation hard-codes only four filenames, so root docs like CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md, and SUPPORT.md are never scanned. This creates a permanent blind spot where stale docs/research/... links in those files will not be reported, undermining the audit’s stated coverage.
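One way to close the blind spot described in this comment is to enumerate the repo root at run time instead of listing filenames by hand. The following is a minimal sketch under stated assumptions, not the scanner's actual code: `selectRootMarkdown` and `discoverRootMarkdown` are hypothetical names, and the filter is split out as a pure function so it can be exercised without touching the filesystem.

```typescript
import { readdirSync } from "node:fs";

// Minimal shape of a directory entry; node:fs Dirent objects satisfy it.
interface EntryLike {
  name: string;
  isFile(): boolean;
}

// Keep only regular files whose name ends in ".md", sorted for stable reports.
export function selectRootMarkdown(entries: EntryLike[]): string[] {
  return entries
    .filter((e) => e.isFile() && e.name.endsWith(".md"))
    .map((e) => e.name)
    .sort();
}

// Hypothetical wrapper: list Markdown files directly under repoRoot.
// readdirSync is non-recursive, so subdirectories are never descended into.
export function discoverRootMarkdown(repoRoot: string): string[] {
  return selectRootMarkdown(readdirSync(repoRoot, { withFileTypes: true }));
}
```

With this shape, files such as CONTRIBUTING.md or SECURITY.md would be picked up automatically once they exist at the root, with no constant to keep in sync.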
```ts
const persona = migratedIndex.get(basename);
if (persona !== undefined) {
  found.push({
```
Confirm old path is absent before flagging dead xref
A reference is marked dead solely because its basename appears in memory/persona/*/conversations, without verifying that docs/research/<basename> is actually gone. This can generate false positives when a file exists in both places (for example, 2026-05-15-lior-shadow-lesson-log-codex-dirty-worktree.md exists in both trees), so any live link to the docs copy would be incorrectly reported as stale.
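The check this comment asks for can be sketched as a three-way classification: a reference is dead only if its basename was migrated and the old docs/research copy is gone. This is an illustrative sketch, assuming the index is a `Map` from basename to persona as in the reviewed code; `classifyXref` and the injectable `exists` parameter are hypothetical and exist here only to make the logic testable.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

type ExistsFn = (path: string) => boolean;
type XrefStatus = "dead" | "live" | "unmigrated";

// Classify a docs/research/<basename> reference.
// - "unmigrated": basename is not in the migrated index, so nothing to report.
// - "live": the file was copied to a persona archive but the old path still exists.
// - "dead": the file was migrated and the old path no longer exists.
export function classifyXref(
  basename: string,
  migratedIndex: Map<string, string>,
  repoRoot: string,
  exists: ExistsFn = existsSync, // injectable for tests (assumption, not scanner API)
): XrefStatus {
  const persona = migratedIndex.get(basename);
  if (persona === undefined) return "unmigrated";
  const oldPath = join(repoRoot, "docs", "research", basename);
  return exists(oldPath) ? "live" : "dead";
}
```

Only the "dead" case would be pushed onto the findings list, so a file that exists in both trees no longer produces a false positive.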
Pull request overview
Adds a Bun/TypeScript hygiene scanner to detect stale docs/research/... cross-references that should now point at migrated §33 conversation archives under memory/persona/<persona>/conversations/..., producing a structured Markdown report for human triage.
Changes:
- Introduces `tools/hygiene/audit-section-33-migration-xrefs.ts` to index migrated conversation-archive files and scan selected “live-nav” Markdown surfaces.
- Emits a summary + by-persona breakdown + per-finding detail, with optional `--report PATH` output.
```ts
for (const f of readdirSync(conversationsDir)) {
  if (!f.endsWith(".md")) continue;
  index.set(f, persona);
```
```ts
// Match docs/research/<basename> where basename ends in .md
const pattern = /docs\/research\/([^\s`)"'<>\[\]]+\.md)/g;
for (let i = 0; i < lines.length; i++) {
  const line = lines[i]!;
  pattern.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(line)) !== null) {
    const basename = m[1]!;
    const persona = migratedIndex.get(basename);
    if (persona !== undefined) {
      found.push({
        fromFile: filePath,
        lineNumber: i + 1,
        basename,
        persona,
        newPath: `memory/persona/${persona}/conversations/${basename}`,
        line: line.trim().slice(0, 200),
      });
```
…st correction (10 dead xrefs, not 20+) (#3550)

- PR #3546 (1820Z) merged
- PR #3548: Slice B.1 scanner (audit-section-33-migration-xrefs.ts, 284 LOC)
- Empirical baseline: 10 dead xrefs (9 DeepSeek + 1 Riven); 1807Z's "20+" was false positive
- Scanner caught B-0159:135 dead xref that PR #3529's manual sweep missed
- 7-tick parallel-substantive pattern continues; mechanization landed

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes B-0533 Slice A baseline cleanup. Following the scanner (PR #3548) empirical baseline of 10 dead xrefs, updates all live-nav references to migrated §33 archive files.

Mapping: docs/research/<basename> → memory/persona/<persona>/conversations/<basename>

Files updated (6 files, 10 line-edits):

Riven (1):
- docs/backlog/P1/B-0159-refresh-github-worldview-cross-cutting-claudeai-2026-05-01.md:135 (PR #3529 fixed line 17; this completes the second reference at line 135)

DeepSeek (9):
- docs/backlog/P1/B-0463-wallet-immune-system-vaccine-spread-poucc-spec.md:95, :97 (hkt-clifford-e8 + immune-system files)
- docs/backlog/P3/B-0202-...md:62, :444 (claudeai-tinygrad-uop file; ×2 occurrences)
- docs/backlog/P3/B-0203-...md:36, :430 (claudeai-tinygrad-uop file; ×2 markdown-link occurrences; relative path also updated to ../../../memory/persona/deepseek/...)
- memory/feedback_carved_sentence_*.md:580, :1225 (deepseek-csap-architecture-review-verbatim file; ×2 occurrences)
- memory/feedback_dbsp_zsets_*.md:55 (claudeai-tinygrad-uop file; 1 occurrence)

Verification: `bun tools/hygiene/audit-section-33-migration-xrefs.ts` returns "Dead xrefs found: 0" after these edits.

Composes with:
- B-0533 (parent row)
- B-0533 Slice A POC (PR #3544, established the pattern)
- B-0533 Slice B.1 (PR #3548, the scanner that surfaced 10/10)
- PR #3529 (narrow Codex P2 fix that missed B-0159:135)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
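The mapping applied in this cleanup can be illustrated with a small rewrite helper. This is a sketch only: the commit does not say whether the edits were made by hand or by script, `rewriteMigratedRefs` is a hypothetical name, and the regex is borrowed from the scanner excerpt shown in the review.

```typescript
// Same pattern the scanner excerpt uses to find docs/research/<basename>.md references.
const RESEARCH_REF = /docs\/research\/([^\s`)"'<>\[\]]+\.md)/g;

// Rewrite every migrated reference on a line to its persona-archive path.
// References whose basename is not in the index are left untouched.
export function rewriteMigratedRefs(
  line: string,
  migratedIndex: Map<string, string>, // basename -> persona
): string {
  return line.replace(RESEARCH_REF, (match, basename: string) => {
    const persona = migratedIndex.get(basename);
    return persona === undefined
      ? match
      : `memory/persona/${persona}/conversations/${basename}`;
  });
}
```

Note that this sketch only swaps the `docs/research/` segment. Markdown links with a leading `../` prefix, like the B-0203 links mentioned in the commit, would still need their relative depth adjusted separately.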
* feat(b-0533): Slice B.3 + B.4 — --enforce flag + gate.yml wiring

Completes B-0533 mechanization. Scanner now supports --enforce flag (exit 1 if dead xrefs found, exit 0 otherwise). New gate.yml job lint-section-33-migration-xrefs runs the scanner in --enforce mode on every PR.

With baseline = 0 (PR #3552 cleanup landed) the new gate fires only when a future migration leaves dead xrefs in live-nav surfaces: the catch-once-then-lint pattern completing for the §33 migration class.

Sibling of lint-archive-header-section33 (B-0036): same shape, different failure-class. Both catch §33-discipline violations at PR time before merge.

Changes:
- tools/hygiene/audit-section-33-migration-xrefs.ts:
  - Add --enforce CLI flag
  - Add exit code 1 when dead xrefs found and --enforce set
  - Update header comment with new exit-code semantics
- .github/workflows/gate.yml:
  - Add lint-section-33-migration-xrefs job after lint-archive-header-section33
  - Same install.sh + bun pattern as sibling job
  - Header comment cites empirical baseline (10) + full lineage

Discipline arc complete:

| Tick | Slice | PR |
|------|-------|----|
| 1749Z | Catch | #3529 |
| 1807Z | Row | #3540 |
| 1820Z | Slice A POC | #3544 |
| 1833Z | Slice B.1 scanner | #3548 |
| 1844Z | Slice A baseline | #3552 |
| 1848Z | Slice B.3 + B.4 (this) | (new) |

Remaining: Slice B.2 (test file with DST fixtures). Optional; scanner logic is simple enough that the end-to-end gate.yml job acts as integration test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(B-0533): dynamically detect root .md files in audit-section-33 scanner

ROOT_MD was hard-coded to 4 files; readdirSync now discovers all repo-root *.md files so CONTRIBUTING.md, SECURITY.md, CODE_OF_CONDUCT.md, SUPPORT.md are protected by the enforced gate.

Resolves Copilot P1 thread on PR #3555.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
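The exit-code semantics described for `--enforce` (exit 1 only when the flag is set and dead xrefs exist, exit 0 otherwise) reduce to a single decision. A minimal sketch, assuming the flag is detected by a plain `argv` lookup; `exitCodeFor` is a hypothetical name and the real scanner's argument parsing may differ.

```typescript
// Decide the process exit code from the finding count and CLI arguments.
// Detect-only mode (no --enforce) always exits 0 so humans can triage first.
export function exitCodeFor(deadXrefCount: number, argv: string[]): number {
  const enforce = argv.includes("--enforce");
  return enforce && deadXrefCount > 0 ? 1 : 0;
}
```

Keeping detect-only as the default means existing local invocations keep their behavior, and only the CI job that passes `--enforce` can fail a PR.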
…ed (PR #3692) (#3693)

Highest-value-per-effort substrate of session: mechanizes the bug class that shipped twice this session (5-`..` paths resolving to docs/ instead of repo root). 255-line audit walks 833 shards, found 17 pre-existing findings as detect-only baseline.

Followup: cleanup PR + enforce gate following same 4-step pattern as §33 migration xrefs (PR #3513 → #3529 → #3548 → #3552 → enforce).

GraphQL still 0/5000 (resets 02:55:28Z); REST sufficient for PR creation. Auto-merge arming on #3690 + #3692 deferred to post-reset tick.

Co-authored-by: Claude <noreply@anthropic.com>
…cleanup pending) (#3692)

* feat(hygiene): tick-shard relative-path audit (detect-only; baseline cleanup pending)

Bug class: tick shards live 5 directories below docs/, so the count-the-.. pattern is error-prone. Empirical evidence this session: PR #3676 + PR #3679 both shipped with 5-`..` paths that resolved to docs/ instead of repo root; Copilot caught both via review threads, but the broken links landed on main briefly (PR #3680 fixed post-merge).

This audit walks docs/hygiene-history/ticks/**/*.md, extracts every relative markdown link target (skipping URLs/anchors/code-blocks/images), resolves from the shard's directory, and reports missing-or-escaping targets.

Empirical baseline (run on origin/main at 2026-05-16T02:48Z):
- 833 tick shards scanned
- 17 broken relative-path links across multiple historical shards
- Real bug classes detected: wrong-depth `..` (B-0442 link in 1436Z), malformed link syntax (`docs/api(v2`), missing-file refs

Detect-only initially. CI enforce wires in after baseline cleanup (same pattern as §33 migration xrefs: PR #3513 → #3529 → #3548 → #3552 → enforce).

`bun --bun tsc --noEmit -p tsconfig.json` exit 0.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(audit): skip placeholder targets (..., parens, identifier-only)

First baseline showed 17 findings; ~7 were false positives where shard prose contained inline `[label](path-shape)` constructs as pattern illustrations:
- `path` / `otto-kenji-...` / `.claude/...` / `docs/...`: placeholder names
- `docs/api(v2`: fragmentary malformed syntax
- `docs/research/...amara-...md`: ellipsis-marked example

Add `isPlaceholderTarget` filter:
- contains `...` → placeholder
- contains `(` or `)` → malformed/fragment
- no `/` AND no `.` → pure identifier (not a path)

Re-run: 17 → 10 findings. The 10 remaining are real broken links (wrong-depth `..` in `1436Z.md`, `0329Z.md`, `0852Z.md`; one borderline `docs/foo.md` example). Worth a separate baseline-cleanup PR.

`bun --bun tsc --noEmit -p tsconfig.json` exit 0.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(audit): 4 Copilot P1/P2 — sonarjs disable, main export, URI scheme, --files validation

PR #3692 review threads:

P1 (lint failure risk):
1. spawnSync("git", ...) at repoRoot() needs the standard repo-convention `// eslint-disable-next-line sonarjs/no-os-command-from-path` comment. Every sibling tool (check-tick-history-shard-schema.ts:23, etc.) uses it.
2. Top-level `process.exit(main(...))` blocks safe module-import for tests or composition. Switch to `export function main` + guarded `if (import.meta.main) { process.exit(main(...)); }` per the sibling audit-section-33-migration-xrefs.ts convention.

P2 (precision / brittleness):
3. isRelativeTarget only exempts http(s) + mailto. Replace with a generic `<scheme>:` regex (`/^[A-Za-z][A-Za-z0-9+.-]*:/`) so ftp:, file:, tel:, data:, etc. are properly classified as absolute.
4. --files inputs aren't validated; readFileSync throws on missing path. Add an explicit existence check at the args boundary; emit `input not found: <path>` and return exit 64.

Local verify:
- Baseline still 10 findings (no regression)
- `--files /tmp/does-not-exist` → exit 64 with structured message
- `bun --bun tsc --noEmit -p tsconfig.json` exit 0

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(audit): 2 Copilot fixups — directory inputs + Windows path separator

PR #3692 second-pass review threads:

P1 (line 244): --files validation only checked existsSync; a directory or unreadable file passed the preflight, then `readFileSync` threw EISDIR/EACCES inside extractLinks, bypassing the structured exit-64 contract. Tighten to also require `statSync(abs).isFile()` and wrap stat in try/catch for permission failures.

Empirical verify:
- --files docs/hygiene-history/ → "input not a regular file" + exit 64
- --files /tmp/does-not-exist → "input not found" + exit 64

P2 (line 210): Repo-boundary check hardcoded "/" in `ROOT + "/"`. On Windows `resolve()` returns paths with `\\` separators, so valid in-repo targets like `C:\\repo\\docs\\...` would fail the `C:\\repo/` prefix test and be flagged as `escapes-repo`, a false positive that would break --enforce mode on Windows CI. Replace with platform-correct `PATH_SEP` imported as `sep as PATH_SEP` from node:path.

Local verify:
- Baseline still 10 findings (no regression)
- `bun --bun tsc --noEmit -p tsconfig.json` exit 0

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
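The three link-target rules described across these commits (placeholder filter, generic URI-scheme test, separator-aware repo-boundary check) can be sketched together. `isPlaceholderTarget`, `isRelativeTarget`, and `PATH_SEP` are names taken from the commit messages, but the bodies below are reconstructions from the prose, not the audit's actual code; `isInsideRepo` and the explicit `#` anchor check are assumptions added for illustration.

```typescript
import { resolve, sep as PATH_SEP } from "node:path";

// Generic "<scheme>:" prefix, as quoted in the commit message.
const URI_SCHEME = /^[A-Za-z][A-Za-z0-9+.-]*:/;

// True for link targets that are illustrations in prose rather than real paths.
export function isPlaceholderTarget(target: string): boolean {
  if (target.includes("...")) return true; // ellipsis-marked example
  if (target.includes("(") || target.includes(")")) return true; // malformed fragment
  return !target.includes("/") && !target.includes("."); // bare identifier, not a path
}

// True when the target should be resolved against the shard's directory.
export function isRelativeTarget(target: string): boolean {
  if (target.startsWith("#")) return false; // in-page anchor (assumed handling)
  return !URI_SCHEME.test(target); // http:, mailto:, ftp:, file:, tel:, data:, ...
}

// Hypothetical boundary check: true when target resolves to root or below it.
// Appending PATH_SEP avoids treating a sibling like "/repo-evil" as inside "/repo",
// and using the platform separator keeps the prefix test correct on Windows.
export function isInsideRepo(root: string, target: string): boolean {
  const absRoot = resolve(root);
  const abs = resolve(absRoot, target);
  return abs === absRoot || abs.startsWith(absRoot + PATH_SEP);
}
```

One caveat of the generic scheme regex is that a Windows drive-letter path such as `C:\repo` also matches it; that is harmless for markdown link targets, which are not expected to be absolute Windows paths, but it is worth knowing if the helper is reused elsewhere.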
…e B shipped, Slice A pending) (#3763)

* chore(b-0533): add Status section confirming partial-completion (Slice B shipped, Slice A pending)

Empirical pure-git audit at 2026-05-16T05:48Z (rate-limit 0/5000) confirms B-0533 is partial completion per row-close gate triage.

Shipped (Slice B): tools/hygiene/audit-section-33-migration-xrefs.ts via PR #3548 + PR #3555; gate.yml lint-section-33-migration-xrefs job wired.

Pending (Slice A): the actual sweep of dead xrefs. Empirical evidence: multiple recent PRs (#3670, #3659, #3643, #3633, #3599) show the lint check FAILING, meaning dead xrefs persist. Row stays status: open until Slice A's persona-batched sweep PRs land.

Co-Authored-By: Claude <noreply@anthropic.com>

* chore(b-0533): bump last_updated to 2026-05-16 per tools/backlog/README.md (review fix)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Summary
B-0533 Slice B.1. Mechanizes the dead-xref class Codex P2 caught on PR #3513 (Riven §33 archive migration).
Empirical baseline (first run on `origin/main`)
```
```
Substrate-honest correction to B-0533: the row's rough estimate of "20+" was a false positive from sloppy grep parsing. Real count is 10 (9 + 1).
Scope (Slice B.1)
Follow-up slices (separate PRs)
Test plan
Composes with
🤖 Generated with Claude Code