feat(audit): add --baseline flag + initial baseline of 10 grandfathered findings #3699
…ed findings
Resolves the baseline-cleanup question deferred since tick 8 by choosing
option D (--baseline grandfather mechanism). Avoids the tick-shard-
immutability tension entirely: don't edit historical shards; track what's
grandfathered so new violations still fail CI.
API:
  --baseline <path>   Load JSON file of known-acceptable findings.
                      Each entry: { file, line, target }.
                      Match key: (file, line, target) triple.
Output:
  - Detect-only: same as before (lists all findings)
  - With baseline: shows "(N grandfathered by baseline, M new)" + only
    lists NEW findings in the human-readable detail
  - --json: adds newFindings + baselineMatched + baselineLoaded fields
Exit codes:
  --enforce + no baseline → exit 1 on any finding (legacy behavior)
  --enforce + baseline    → exit 1 only on NEW findings (gate behavior)
  --baseline <missing>    → exit 64 (structured arg error)
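
The partition and exit-code rules above could be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the names `Finding`, `findingKey`, `partitionFindings`, and `auditExitCode` are hypothetical, and the assumption that detect-only mode (no `--enforce`) exits 0 is inferred from the local-verify notes rather than stated outright.

```typescript
// Hypothetical shape of one finding; the field names follow the
// { file, line, target } entry format described in the commit message.
interface Finding {
  file: string;
  line: number;
  target: string;
}

// Build a lookup key from the (file, line, target) triple.
// Serializing a tuple avoids collisions that a plain delimiter could cause.
function findingKey(f: Finding): string {
  return JSON.stringify([f.file, f.line, f.target]);
}

// Split findings into those present in the baseline and those that are new.
function partitionFindings(
  findings: Finding[],
  baseline: Finding[],
): { baselineMatched: Finding[]; newFindings: Finding[] } {
  const known = new Set(baseline.map(findingKey));
  const baselineMatched: Finding[] = [];
  const newFindings: Finding[] = [];
  for (const f of findings) {
    if (known.has(findingKey(f))) {
      baselineMatched.push(f);
    } else {
      newFindings.push(f);
    }
  }
  return { baselineMatched, newFindings };
}

// Exit-code rule: a null baseline means --baseline was not passed.
// Assumption: detect-only mode (enforce === false) always exits 0.
function auditExitCode(
  findings: Finding[],
  baseline: Finding[] | null,
  enforce: boolean,
): number {
  if (!enforce) return 0;
  if (baseline === null) return findings.length > 0 ? 1 : 0; // legacy behavior
  const { newFindings } = partitionFindings(findings, baseline);
  return newFindings.length > 0 ? 1 : 0; // gate behavior
}
```

The missing-file case (exit 64) is left out here because it belongs to argument handling, before any findings are partitioned.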
Initial baseline ships with the 10 known findings from the empirical run:
- 1 in 0852Z.md
- 5 in 1436Z.md
- 3 in 0329Z.md
- 1 in 2158Z.md
Same shape as Stryker --reset or ESLint suppressions. Unblocks the CI-gate
wire-up (next-tick): the audit can ship --enforce --baseline as a
non-required check without breaking on pre-existing historical residue.
Local verify:
- detect-only: 10 findings (no regression)
- --baseline (valid): 10 grandfathered, 0 new, exit 0
- --enforce --baseline: exit 0 (all grandfathered)
- --enforce (no baseline): exit 1 (legacy behavior preserved)
- --baseline /nonexistent: exit 64 "baseline file not found"
- tsc --noEmit: exit 0
Co-Authored-By: Claude <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8fe1b7101b
Pull request overview
Adds a baseline/grandfathering mechanism to the tick-shard relative-link audit so CI can enforce only new violations while keeping historical residue visible under tick-shard immutability.
Changes:
- Add `--baseline <path>` argument that partitions findings into baseline-matched vs new, and changes `--enforce` to fail only on new findings when a baseline is provided.
- Extend JSON and human-readable output to report baseline/new finding counts and (for humans) print only the new finding details.
- Introduce an initial baseline JSON file containing 10 grandfathered findings.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/hygiene/audit-tick-shard-relative-paths.ts | Adds --baseline support, baseline loading, finding partitioning, updated output, and enforce semantics. |
| tools/hygiene/audit-tick-shard-relative-paths.baseline.json | Adds the initial baseline set of known findings to grandfather. |
…lity question (PR #3699) (#3701)

Tick 11 substantive landing: --baseline flag added to audit (option D from tick 8's deferred decision). Avoids the tick-shard-immutability tension entirely: don't edit historical shards; track grandfathered findings; new violations still fail --enforce. Same shape as Stryker/ESLint suppressions. Initial baseline ships with 10 findings from the empirical 02:48Z run.

PR #3692 (audit script) MERGED 03:08:39Z by auto-merge. It raced my baseline-feature push by ~6s; recovered by cherry-pick onto fresh branch. PR #3699 is the recovered fresh-branch PR. PR #3697 also merged this tick (03:04:32Z).

Audit-script PR lifecycle now at 7 steps (matching §33 audit's 4-step backbone + 2 quality rounds + baseline). CI-gate wire-up is the next-tick candidate, unblocked by this baseline landing.

Co-authored-by: Claude <noreply@anthropic.com>
…array

PR #3699 review threads (2 P1 + 1 P2 from Copilot):

P1/P2 (line 144): loadBaseline blindly cast parsed JSON array to BaselineEntry[]. Malformed entries (null, wrong types, missing fields) either crashed later in isInBaseline (null.file) or silently failed to match grandfathered findings, converting them to NEW under --enforce. Documented behavior says malformed = exit 64.

Fix: add `isBaselineEntry` type guard that validates each element:
- file is string
- line is integer >= 1
- target is string
Bad entries collected with index + reason; emit "baseline entry [N] invalid: ..." and exit 64.

P1 (line 368): JSON output emitted `baselineMatched: <number>` while the docstring described partitioning into baselineMatched vs newFindings as parallel arrays. API mismatch.

Fix: emit `baselineMatched` as the actual array of findings (parallel to `newFindings`); consumers compute the count from `.length`.

Local verify:
- Valid baseline: exit 1 (10 grandfathered + 1 transient new), unchanged
- `[null]` baseline: exit 64 "baseline entry [0] invalid: ..."
- `[{file, line as string}]` baseline: exit 64
- `[{file, line, missing target}]` baseline: exit 64
- --json: baselineMatched is array of 10 (was number); newFindings array len 1
- tsc --noEmit: exit 0

Co-Authored-By: Claude <noreply@anthropic.com>
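
A type guard along the lines described in this commit might look like the sketch below. The name `isBaselineEntry` and the three validation rules come from the commit message; the helper `validateBaseline`, its return shape, and the wording after "invalid:" are assumptions made for illustration. The caller would be expected to print the errors and exit 64.

```typescript
interface BaselineEntry {
  file: string;
  line: number;
  target: string;
}

// Validate one parsed JSON value against the documented rules:
// file is a string, line is an integer >= 1, target is a string.
function isBaselineEntry(value: unknown): value is BaselineEntry {
  if (typeof value !== "object" || value === null) return false;
  const { file, line, target } = value as Record<string, unknown>;
  return (
    typeof file === "string" &&
    typeof line === "number" &&
    Number.isInteger(line) &&
    line >= 1 &&
    typeof target === "string"
  );
}

// Hypothetical helper: collect valid entries and per-index error messages
// instead of casting the parsed array blindly.
function validateBaseline(parsed: unknown): {
  entries: BaselineEntry[];
  errors: string[];
} {
  if (!Array.isArray(parsed)) {
    return { entries: [], errors: ["baseline must be a JSON array"] };
  }
  const entries: BaselineEntry[] = [];
  const errors: string[] = [];
  parsed.forEach((item: unknown, index: number) => {
    if (isBaselineEntry(item)) {
      entries.push(item);
    } else {
      // The reason text here is illustrative, not the script's exact output.
      errors.push(
        `baseline entry [${index}] invalid: expected { file: string, line: integer >= 1, target: string }`,
      );
    }
  });
  return { entries, errors };
}
```

Validating at load time means a malformed entry surfaces as an argument error rather than as a spurious NEW finding under `--enforce`.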
…dings → fixup be3998f) (#3703)

3 new Copilot threads on PR #3699, all real:
- P1/P2 line 144: loadBaseline blindly cast parsed array; malformed entries bypass documented exit-64. Add isBaselineEntry type guard validating each element (file:string, line:integer>=1, target:string).
- P1 line 368: JSON output emitted baselineMatched as number while docstring described partition arrays. Fix: emit as the actual array.

Tick 12 substantive landing. 9 total real Copilot findings on the audit script across 3 review rounds (discovery → scanner → filter → quality-r1 → quality-r2 → quality-r3), all caught pre-merge. CI gate wire-up unblocked, pending #3699 merge.

PR #3698 also merged 03:09:02Z (carry from tick 10).

Co-authored-by: Claude <noreply@anthropic.com>
…seline (#3708)

Adds the final step of the tick-shard-relative-path audit lifecycle: discovery (#3676/#3679) → narrow fix (#3680) → scanner (#3692) → filter + quality × 3 (#3692 fixups) → baseline mechanism (#3699) → THIS JOB.

The job runs `audit-tick-shard-relative-paths.ts --enforce --baseline tools/hygiene/audit-tick-shard-relative-paths.baseline.json`, exiting 1 only on NEW findings (not in baseline). The 10 pre-existing findings recorded in the baseline file stay grandfathered, the same shape as Stryker `--reset` or ESLint suppressions.

This is a NON-required check by default per gate.yml convention (only the checks explicitly listed in branch-protection rules are required). The job will surface as a status check on every PR; specific path-failure detection prevents the wrong-depth-`..` bug class from recurring on new shards.

Local verify on origin/main + new files:
- 842 shards scanned (was 833 in tick 7; +9 from this session's merges)
- 10 grandfathered (matches baseline)
- 0 NEW findings
- exit 0

Composes with: audit-section-33-migration-xrefs.ts (sibling gate, same lifecycle pattern), blocked-green-ci-investigate-threads.md (the rule this catch surface mechanizes for tick-shard navigation specifically).

Co-authored-by: Claude <noreply@anthropic.com>
… gate (PR #3708) (#3709)

* shard(tick): 2026-05-16T03:28Z - audit-script lifecycle CLOSED via CI gate (PR #3708)

3 PRs landed during tick 13 cycle (#3699 baseline mechanism, #3703 0316Z shard, #3690 finally after MD038 fix). The audit-script lifecycle is now structurally complete: discovery → narrow-fix → scanner → quality × 3 → baseline → CI enforce gate. PR #3708 ships the gate. Same §33-audit lifecycle pattern (PR #3513 → #3552 → enforce), compressed across 14 ticks of one session.

Local gate-invocation verify on main + new files: 842 shards, 10 grandfathered, 0 NEW, exit 0. The earlier transient 0249Z.md:4 → 0240Z.md finding self-resolved when PR #3690 merged.

TodoWrite adopted this tick for the 4-step gate-wire (wire → verify → PR → shard). Aligned naturally with per-tick discipline.

Co-Authored-By: Claude <noreply@anthropic.com>

* shard(tick): 0328Z - fix parent-tick link + status-term drift (PR #3709 review)

- Merged origin/main: adds 0322Z.md to tree so parent-tick link resolves at review time (was P0 copilot + P2 codex finding; link target existed on main but not on the PR branch)
- "landed" → "opened (armed for auto-merge)" for #3708, since the lifecycle table marks it as armed not merged (copilot)
- Table-syntax finding (||) is a false positive: table uses single | (line 18: `| ~~#3690~~ ...`)

---------

Co-authored-by: Claude <noreply@anthropic.com>
…stale/FP) (#3715)

PR #3707 + #3708 merged. 6 new Copilot threads investigated:
- PR #3710 (AUDIT-LIFECYCLE.md): 2 real. Name attribution (Codex/Riven → role-refs) + §33 PR-attribution factual error (PR #3552 baseline cleanup + PR #3555 CI enforce, not both #3552). Fixup cd7ba81.
- PR #3709 (0328Z shard): 4 threads. 2 stale (0322Z merged via #3707), 1 minor prose-drift, 1 false-positive (4th time on table-pipes). All resolved no-op.

The Copilot table-pipe || hallucination is now a 4-time pattern (#3685, #3690, #3699-era, #3709); verify-first-resolve-no-op discipline applies.

Co-authored-by: Claude <noreply@anthropic.com>
…ify-before-fix discipline (#3721)

* rule(verify-reviewer-findings): extend blocked-green-ci rule with verify-before-fix discipline

Extends `.claude/rules/blocked-green-ci-investigate-threads.md` with a composes-with section on verifying reviewer findings before applying fixes. Captures empirical evidence from the 2026-05-16 autonomous session:

1. Verification anchors: direct line-level awk inspection; gh api + git log for cross-reference claims; local lint/build re-run.
2. Suspect-by-default Copilot finding classes: table double-pipe (||) hallucination, with 4 confirmed FPs in one session (PR #3685, #3690, #3699-era, #3709), all verified by direct awk as single-| rows.
3. Stale-but-fresh-looking findings: parent-tick links to shard files in sibling PRs (true at filing-time, self-healed by review-time); "X-status vs Y-status inconsistency" prose observations (accurate at write-time but underlying state moved). Resolve no-op.

Threshold for adding a Copilot finding to the suspect-by-default list: two-or-more across distinct PRs.

Markdownlint clean on the rule file. (The new check-shard-before-push.ts helper flagged 3 false-positive MD032s on bullet-continuation lines; filing as next-tick fix for the helper itself.)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(pr-3721): 2 Copilot findings - runnable awk + git log commands

P0 (line 35): awk one-liner used `<N>` as a literal placeholder; if copied verbatim, awk treats `N` as uninitialized (defaults to 0) and prints nothing. Show `-v N=22` (literal value substitution) + explain the gotcha.

P1 (line 38): `git log <PR-cited-PR>` doesn't work: git log expects refs/commits/paths, not PR numbers. Replace with three concrete runnable forms:
- gh api repos/<owner>/<repo>/pulls/<N> → metadata
- gh pr view <N> --json commits,mergeCommit → commits via API
- git log --grep '#<N>' → local-repo merge-commit by PR-number

Both fixes preserve the intent (verification anchors) while making the commands directly runnable.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
What
Adds `--baseline <path>` flag to `audit-tick-shard-relative-paths.ts`. With a baseline file loaded:
- Findings are partitioned into `baselineMatched` and `newFindings`
- `--enforce` mode exits 1 only on NEW findings (not in baseline)
- JSON output adds `newFindings`, `baselineMatched`, `baselineLoaded` fields

Why
Resolves the baseline-cleanup question deferred since tick 8. The 10 pre-existing findings live in merged tick shards under tick-shard-immutability discipline (canonical: `docs/hygiene-history/ticks/README.md`, "Each shard is an immutable per-tick event").

Rather than choose between strict (don't edit; audit detect-only forever) and pragmatic (edit shards in-place), introduce a grandfather mechanism: ship the audit with a baseline of known findings. New violations still fail `--enforce`; historical residue stays visible but doesn't block.

Same shape as Stryker `--reset` or ESLint suppressions.

Initial baseline
`tools/hygiene/audit-tick-shard-relative-paths.baseline.json` ships with the 10 findings from the empirical baseline run on 2026-05-16T02:48Z (origin/main at that time):
- … `docs/foo.md` example)

Local verify
- `--baseline <valid>`
- `--enforce --baseline` (valid)
- `--enforce` (no baseline)
- `--baseline /nonexistent`
- `bun --bun tsc --noEmit`

Transient new finding
This PR's branch shows 11 findings (10 grandfathered + 1 new). The 1 new is `0249Z.md:4 → 0240Z.md`: my 0249Z shard cites 0240Z as parent-tick, but 0240Z.md hasn't merged to main yet (PR #3690 is armed but awaiting CI). The finding will self-resolve once #3690 merges.

For this PR (which only INTRODUCES the mechanism, doesn't wire `--enforce` to CI), the transient finding is fine, since the audit ships detect-only by default. The follow-up CI-gate PR will pick up whatever baseline state main has at that time.

Followup (carried from §33 audit lifecycle pattern)
- `.github/workflows/gate.yml` non-required job: `bun tools/hygiene/audit-tick-shard-relative-paths.ts --enforce --baseline tools/hygiene/audit-tick-shard-relative-paths.baseline.json`

Co-Authored-By: Claude <noreply@anthropic.com>
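
For consumers of the `--json` output, the three added fields could be modeled as in the sketch below. Only the field names `newFindings`, `baselineMatched`, and `baselineLoaded` and the summary wording "(N grandfathered by baseline, M new)" come from this PR; the `AuditReport` type, the boolean type of `baselineLoaded`, and the helper names are assumptions.

```typescript
interface Finding {
  file: string;
  line: number;
  target: string;
}

// Assumed report shape: both partitions are arrays, so a consumer
// derives counts from .length rather than reading a separate number.
interface AuditReport {
  baselineLoaded: boolean; // assumption: a simple flag
  baselineMatched: Finding[];
  newFindings: Finding[];
}

// Hypothetical constructor for the report object.
function buildReport(
  baselineMatched: Finding[],
  newFindings: Finding[],
  baselineLoaded: boolean,
): AuditReport {
  return { baselineLoaded, baselineMatched, newFindings };
}

// Format the human-readable summary from the two partitions.
function summaryLine(report: AuditReport): string {
  const grandfathered = report.baselineMatched.length;
  const fresh = report.newFindings.length;
  return `(${grandfathered} grandfathered by baseline, ${fresh} new)`;
}
```

Keeping both partitions as arrays lets a consumer inspect individual grandfathered findings as well as count them.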