
feat(audit): add --baseline flag + initial baseline of 10 grandfathered findings #3699

Merged: AceHack merged 2 commits into main from feat/audit-baseline-mechanism-otto-cli-2026-05-16 on May 16, 2026

@AceHack AceHack commented May 16, 2026

What

Adds --baseline <path> flag to audit-tick-shard-relative-paths.ts. With a baseline file loaded:

  • Findings get partitioned into baselineMatched and newFindings
  • --enforce mode exits 1 only on NEW findings (not in baseline)
  • JSON output adds newFindings, baselineMatched, baselineLoaded fields
  • Human output shows "(N grandfathered by baseline, M new)" + only details the new ones
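
The partitioning step can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `Finding`, `Partition`, `keyOf`, and `partitionFindings` are hypothetical names, and only the `{ file, line, target }` entry shape and the `baselineMatched` / `newFindings` field names come from this PR.

```typescript
// Hypothetical types: only the { file, line, target } shape is taken from the PR.
interface Finding {
  file: string;
  line: number;
  target: string;
}

interface Partition {
  baselineMatched: Finding[];
  newFindings: Finding[];
}

// Build a lookup key from the (file, line, target) triple.
// Serializing a tuple avoids delimiter collisions inside paths.
function keyOf(f: Finding): string {
  return JSON.stringify([f.file, f.line, f.target]);
}

// Split findings into those already recorded in the baseline and those that are new.
function partitionFindings(findings: Finding[], baseline: Finding[]): Partition {
  const known = new Set(baseline.map(keyOf));
  const result: Partition = { baselineMatched: [], newFindings: [] };
  for (const f of findings) {
    if (known.has(keyOf(f))) {
      result.baselineMatched.push(f);
    } else {
      result.newFindings.push(f);
    }
  }
  return result;
}
```

Keying on the full triple matters when one line carries several findings (as with the two findings on line 6 of 1436Z.md): each target is matched separately, so grandfathering one link on a line does not silently cover a different link added to the same line later.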

Why

Resolves the baseline-cleanup question deferred since tick 8. The 10 pre-existing findings live in merged tick shards under tick-shard-immutability discipline (canonical: docs/hygiene-history/ticks/README.md — "Each shard is an immutable per-tick event").

Rather than choose between strict (don't edit; audit detect-only forever) and pragmatic (edit shards in-place), introduce a grandfather mechanism: ship the audit with a baseline of known findings. New violations still fail --enforce; historical residue stays visible but doesn't block.

Same shape as Stryker --reset or ESLint suppressions.

Initial baseline

tools/hygiene/audit-tick-shard-relative-paths.baseline.json ships with the 10 findings from the empirical baseline run on 2026-05-16T02:48Z (origin/main at that time):

  • 1 in 0852Z.md (line 1)
  • 5 in 1436Z.md (lines 6 × 2, 30, 36 × 2)
  • 3 in 0329Z.md (lines 6, 7, 20)
  • 1 in 2158Z.md (line 29 — borderline docs/foo.md example)

Local verify

| Test | Result |
| --- | --- |
| detect-only | 10 findings (no regression) |
| `--baseline <valid>` | 10 grandfathered, 0 new (plus 1 transient; see below) |
| `--enforce --baseline` (valid) | exit 0 (all grandfathered) |
| `--enforce` (no baseline) | exit 1 (legacy behavior preserved) |
| `--baseline /nonexistent` | exit 64 "baseline file not found" |
| `bun --bun tsc --noEmit` | exit 0 |

Transient new finding

This PR's branch shows 11 findings (10 grandfathered + 1 new). The 1 new is 0249Z.md:4 → 0240Z.md — my 0249Z shard cites 0240Z as parent-tick, but 0240Z.md hasn't merged to main yet (PR #3690 is armed but awaiting CI). The finding will self-resolve once #3690 merges.

For this PR (which only INTRODUCES the mechanism, doesn't wire --enforce to CI), the transient finding is fine — audit ships detect-only by default. The follow-up CI-gate PR will pick up whatever baseline state main has at that time.

Followup (carried from §33 audit lifecycle pattern)

  • Wire .github/workflows/gate.yml non-required job: bun tools/hygiene/audit-tick-shard-relative-paths.ts --enforce --baseline tools/hygiene/audit-tick-shard-relative-paths.baseline.json

Co-Authored-By: Claude <noreply@anthropic.com>

…ed findings

Resolves the baseline-cleanup question deferred since tick 8 by choosing
option D (--baseline grandfather mechanism). Avoids the tick-shard-
immutability tension entirely: don't edit historical shards; track what's
grandfathered so new violations still fail CI.

API:
  --baseline <path>  Load JSON file of known-acceptable findings.
                     Each entry: { file, line, target }.
                     Match key: (file, line, target) triple.

Output:
  - Detect-only: same as before (lists all findings)
  - With baseline: shows "(N grandfathered by baseline, M new)" + only
    lists NEW findings in the human-readable detail
  - --json: adds newFindings + baselineMatched + baselineLoaded fields

Exit codes:
  --enforce + no baseline → exit 1 on any finding (legacy behavior)
  --enforce + baseline    → exit 1 only on NEW findings (gate behavior)
  --baseline <missing>    → exit 64 (structured arg error)
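
A rough sketch of the exit-code rules above, under two assumptions that the commit message does not state outright: detect-only runs (no --enforce) always exit 0, and the exit-64 case is handled earlier, while the baseline file is being loaded. `RunState` and `exitCodeFor` are hypothetical names used only for illustration.

```typescript
// Hypothetical state object; field names are assumed, not taken from the script.
interface RunState {
  enforce: boolean;        // --enforce was passed
  baselineLoaded: boolean; // --baseline <path> was passed and parsed successfully
  totalFindings: number;   // every finding from the scan
  newFindings: number;     // findings not matched by the baseline
}

// Decide the process exit code once scanning and partitioning are done.
function exitCodeFor(state: RunState): number {
  // Assumption: detect-only mode reports findings but never fails.
  if (!state.enforce) return 0;
  // With a baseline only new findings block; without one, any finding blocks.
  const blocking = state.baselineLoaded ? state.newFindings : state.totalFindings;
  return blocking > 0 ? 1 : 0;
}
```

With the numbers from this PR, 10 findings and a baseline matching all 10 would give exit 0 under --enforce, while the same 10 findings without a baseline would give exit 1.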

Initial baseline ships with the 10 known findings from the empirical run:
  - 1 in 0852Z.md
  - 5 in 1436Z.md
  - 3 in 0329Z.md
  - 1 in 2158Z.md

Same shape as Stryker --reset or ESLint suppressions. Unblocks the CI-gate
wire-up (next-tick): the audit can ship --enforce --baseline as a
non-required check without breaking on pre-existing historical residue.

Local verify:
- detect-only: 10 findings (no regression)
- --baseline (valid): 10 grandfathered, 0 new, exit 0
- --enforce --baseline: exit 0 (all grandfathered)
- --enforce (no baseline): exit 1 (legacy behavior preserved)
- --baseline /nonexistent: exit 64 "baseline file not found"
- tsc --noEmit: exit 0

Co-Authored-By: Claude <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8fe1b7101b


Comment thread tools/hygiene/audit-tick-shard-relative-paths.ts Outdated

Copilot AI left a comment


Pull request overview

Adds a baseline/grandfathering mechanism to the tick-shard relative-link audit so CI can enforce only new violations while keeping historical residue visible under tick-shard immutability.

Changes:

  • Add --baseline <path> argument that partitions findings into baseline-matched vs new, and changes --enforce to fail only on new findings when a baseline is provided.
  • Extend JSON and human-readable output to report baseline/new finding counts and (for humans) print only the new finding details.
  • Introduce an initial baseline JSON file containing 10 grandfathered findings.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tools/hygiene/audit-tick-shard-relative-paths.ts | Adds --baseline support, baseline loading, finding partitioning, updated output, and enforce semantics. |
| tools/hygiene/audit-tick-shard-relative-paths.baseline.json | Adds the initial baseline set of known findings to grandfather. |

Comment thread tools/hygiene/audit-tick-shard-relative-paths.ts Outdated
Comment thread tools/hygiene/audit-tick-shard-relative-paths.ts Outdated
AceHack added a commit that referenced this pull request May 16, 2026
…lity question (PR #3699) (#3701)

Tick 11 substantive landing: --baseline flag added to audit (option D from
tick 8's deferred decision). Avoids the tick-shard-immutability tension
entirely — don't edit historical shards; track grandfathered findings;
new violations still fail --enforce. Same shape as Stryker/ESLint
suppressions.

Initial baseline ships with 10 findings from the empirical 02:48Z run.
PR #3692 (audit script) MERGED 03:08:39Z by auto-merge — raced my baseline-
feature push by ~6s; recovered by cherry-pick onto fresh branch. PR #3699
is the recovered fresh-branch PR. PR #3697 also merged this tick (03:04:32Z).

Audit-script PR lifecycle now at 7 steps (matching §33 audit's 4-step
backbone + 2 quality rounds + baseline). CI-gate wire-up is the next-tick
candidate, unblocked by this baseline landing.

Co-authored-by: Claude <noreply@anthropic.com>
…array

PR #3699 review threads (2 P1 + 1 P2 from Copilot):

P1/P2 (line 144): loadBaseline blindly cast parsed JSON array to
BaselineEntry[]. Malformed entries (null, wrong types, missing fields)
either crashed later in isInBaseline (null.file) or silently failed to
match grandfathered findings — converting them to NEW under --enforce.
Documented behavior says malformed = exit 64.

Fix: add `isBaselineEntry` type guard that validates each element:
  - file is string
  - line is integer >= 1
  - target is string

Bad entries collected with index + reason; emit "baseline entry [N]
invalid: ..." and exit 64.
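
One way the per-entry validation described above might look. This is a sketch under stated assumptions rather than the merged code: `isBaselineEntry` and `BaselineEntry` are named in the commit message, but `invalidReason`, `validateBaseline`, the reason strings, and the return shape are hypothetical.

```typescript
interface BaselineEntry {
  file: string;
  line: number;
  target: string;
}

// Returns null for a valid entry, otherwise a short reason (wording is assumed).
function invalidReason(value: unknown): string | null {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return "not an object";
  }
  const { file, line, target } = value as {
    file?: unknown;
    line?: unknown;
    target?: unknown;
  };
  if (typeof file !== "string") return "file must be a string";
  if (typeof line !== "number" || !Number.isInteger(line) || line < 1) {
    return "line must be an integer >= 1";
  }
  if (typeof target !== "string") return "target must be a string";
  return null;
}

// Type guard: narrows unknown JSON to BaselineEntry instead of casting blindly.
function isBaselineEntry(value: unknown): value is BaselineEntry {
  return invalidReason(value) === null;
}

// Collect every bad entry with its index so the caller can report all of them
// and then exit 64 (the exit itself is left to the caller in this sketch).
function validateBaseline(parsed: unknown): { entries: BaselineEntry[]; errors: string[] } {
  if (!Array.isArray(parsed)) {
    return { entries: [], errors: ["baseline must be a JSON array"] };
  }
  const entries: BaselineEntry[] = [];
  const errors: string[] = [];
  parsed.forEach((value: unknown, index: number) => {
    const reason = invalidReason(value);
    if (reason === null) {
      entries.push(value as BaselineEntry);
    } else {
      errors.push(`baseline entry [${index}] invalid: ${reason}`);
    }
  });
  return { entries, errors };
}
```

Validating up front turns both failure modes from the review (a crash on `null.file`, or a malformed entry silently failing to match) into a single explicit error path.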

P1 (line 368): JSON output emitted `baselineMatched: <number>` while the
docstring described partitioning into baselineMatched vs newFindings as
parallel arrays. API mismatch. Fix: emit `baselineMatched` as the actual
array of findings (parallel to `newFindings`); consumers compute the
count from `.length`.
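
For consumers of the --json output, the array form could be used along these lines. The report shape is an assumption: only `newFindings`, `baselineMatched`, and `baselineLoaded` are named in this PR, while the `findings` field, `AuditReport`, and `summarize` are hypothetical.

```typescript
interface Finding {
  file: string;
  line: number;
  target: string;
}

// Assumed report shape; `findings` (the full list) is a hypothetical field.
interface AuditReport {
  findings: Finding[];
  newFindings: Finding[];
  baselineMatched: Finding[]; // an array after this fix, previously a number
  baselineLoaded: boolean;
}

// Derive counts from .length and build the one-line human summary.
function summarize(report: AuditReport): string {
  const total = report.findings.length;
  if (!report.baselineLoaded) return `${total} findings`;
  const grandfathered = report.baselineMatched.length;
  const fresh = report.newFindings.length;
  return `${total} findings (${grandfathered} grandfathered by baseline, ${fresh} new)`;
}
```

Emitting both partitions as arrays keeps the two fields symmetric, so a consumer can list grandfathered entries as easily as new ones without a second scan.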

Local verify:
- Valid baseline: exit 1 (10 grandfathered + 1 transient new) — unchanged
- `[null]` baseline: exit 64 "baseline entry [0] invalid: ..."
- `[{file, line as string}]` baseline: exit 64
- `[{file, line, missing target}]` baseline: exit 64
- --json: baselineMatched is array of 10 (was number); newFindings array len 1
- tsc --noEmit: exit 0

Co-Authored-By: Claude <noreply@anthropic.com>

@AceHack AceHack merged commit 1496f72 into main May 16, 2026
28 checks passed
@AceHack AceHack deleted the feat/audit-baseline-mechanism-otto-cli-2026-05-16 branch May 16, 2026 03:20
AceHack added a commit that referenced this pull request May 16, 2026
…dings → fixup be3998f) (#3703)

3 new Copilot threads on PR #3699 — all real:
- P1/P2 line 144: loadBaseline blindly cast parsed array; malformed entries
  bypass documented exit-64. Add isBaselineEntry type guard validating each
  element (file:string, line:integer>=1, target:string).
- P1 line 368: JSON output emitted baselineMatched as number while docstring
  described partition arrays. Fix: emit as the actual array.

Tick 12 substantive landing. 9 total real Copilot findings on the audit
script across 3 review rounds (discovery → scanner → filter → quality-r1
→ quality-r2 → quality-r3), all caught pre-merge. CI gate wire-up
unblocked, pending #3699 merge.

PR #3698 also merged 03:09:02Z (carry from tick 10).

Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 16, 2026
…seline (#3708)

Adds the final step of the tick-shard-relative-path audit lifecycle:
discovery (#3676/#3679) → narrow fix (#3680) → scanner (#3692) → filter +
quality × 3 (#3692 fixups) → baseline mechanism (#3699) → THIS JOB.

The job runs `audit-tick-shard-relative-paths.ts --enforce --baseline
tools/hygiene/audit-tick-shard-relative-paths.baseline.json`, exiting 1
only on NEW findings (not in baseline). The 10 pre-existing findings
recorded in the baseline file stay grandfathered — same shape as Stryker
`--reset` or ESLint suppressions.

This is a NON-required check by default per gate.yml convention (only the
checks explicitly listed in branch-protection rules are required). The job
will surface as a status check on every PR; specific path-failure
detection prevents the wrong-depth-`..` bug class from recurring on new
shards.

Local verify on origin/main + new files:
- 842 shards scanned (was 833 in tick 7; +9 from this session's merges)
- 10 grandfathered (matches baseline)
- 0 NEW findings
- exit 0

Composes with: audit-section-33-migration-xrefs.ts (sibling gate, same
lifecycle pattern), blocked-green-ci-investigate-threads.md (the rule
this catch surface mechanizes for tick-shard navigation specifically).

Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 16, 2026
… gate (PR #3708) (#3709)

* shard(tick): 2026-05-16T03:28Z — audit-script lifecycle CLOSED via CI gate (PR #3708)

3 PRs landed during tick 13 cycle (#3699 baseline mechanism, #3703 0316Z
shard, #3690 finally after MD038 fix). The audit-script lifecycle is now
structurally complete: discovery → narrow-fix → scanner → quality × 3 →
baseline → CI enforce gate. PR #3708 ships the gate.

Same §33-audit lifecycle pattern (PR #3513#3552 → enforce), compressed
across 14 ticks of one session.

Local gate-invocation verify on main + new files: 842 shards, 10
grandfathered, 0 NEW, exit 0. The earlier transient 0249Z.md:4 → 0240Z.md
finding self-resolved when PR #3690 merged.

TodoWrite adopted this tick for the 4-step gate-wire (wire → verify → PR →
shard). Aligned naturally with per-tick discipline.

Co-Authored-By: Claude <noreply@anthropic.com>

* shard(tick): 0328Z — fix parent-tick link + status-term drift (PR #3709 review)

- Merged origin/main: adds 0322Z.md to tree so parent-tick link
  resolves at review time (was P0 copilot + P2 codex finding;
  link target existed on main but not on the PR branch)
- "landed" → "opened (armed for auto-merge)" for #3708, since
  the lifecycle table marks it as armed not merged (copilot)
- Table-syntax finding (||) is a false positive — table uses
  single | (line 18: `| ~~#3690~~ ...`)

---------

Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 16, 2026
…stale/FP) (#3715)

PR #3707 + #3708 merged. 6 new Copilot threads investigated:
- PR #3710 (AUDIT-LIFECYCLE.md): 2 real — name attribution (Codex/Riven →
  role-refs) + §33 PR-attribution factual error (PR #3552 baseline cleanup
  + PR #3555 CI enforce, not both #3552). Fixup cd7ba81.
- PR #3709 (0328Z shard): 4 threads — 2 stale (0322Z merged via #3707),
  1 minor prose-drift, 1 false-positive (4th time on table-pipes). All
  resolved no-op.

The Copilot table-pipe || hallucination is now a 4-time pattern (#3685,
#3690, #3699-era, #3709) — verify-first-resolve-no-op discipline.

Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 16, 2026
…ify-before-fix discipline (#3721)

* rule(verify-reviewer-findings): extend blocked-green-ci rule with verify-before-fix discipline

Extends `.claude/rules/blocked-green-ci-investigate-threads.md` with a
composes-with section on verifying reviewer findings before applying
fixes. Captures empirical evidence from the 2026-05-16 autonomous
session:

1. Verification anchors: direct line-level awk inspection; gh api +
   git log for cross-reference claims; local lint/build re-run.

2. Suspect-by-default Copilot finding classes: table double-pipe (||)
   hallucination — 4 confirmed FPs in one session (PR #3685, #3690,
   #3699-era, #3709), all verified by direct awk as single-| rows.

3. Stale-but-fresh-looking findings: parent-tick links to shard files
   in sibling PRs (true at filing-time, self-healed by review-time);
   "X-status vs Y-status inconsistency" prose observations (accurate
   at write-time but underlying state moved). Resolve no-op.

Threshold for adding a Copilot finding to the suspect-by-default list:
two-or-more across distinct PRs.

Markdownlint clean on the rule file. (The new check-shard-before-push.ts
helper flagged 3 false-positive MD032s on bullet-continuation lines —
filing as next-tick fix for the helper itself.)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(pr-3721): 2 Copilot findings — runnable awk + git log commands

P0 (line 35): awk one-liner used `<N>` as a literal placeholder; if copied
verbatim, awk treats `N` as uninitialized (defaults to 0) and prints
nothing. Show `-v N=22` (literal value substitution) + explain the gotcha.

P1 (line 38): `git log <PR-cited-PR>` doesn't work — git log expects
refs/commits/paths, not PR numbers. Replace with three concrete runnable
forms:
  - gh api repos/<owner>/<repo>/pulls/<N> → metadata
  - gh pr view <N> --json commits,mergeCommit → commits via API
  - git log --grep '#<N>' → local-repo merge-commit by PR-number

Both fixes preserve the intent (verification anchors) while making the
commands directly runnable.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>