rule(verify-reviewer-findings): extend blocked-green-ci rule with verify-before-fix discipline #3721
Merged
…ify-before-fix discipline

Extends `.claude/rules/blocked-green-ci-investigate-threads.md` with a composes-with section on verifying reviewer findings before applying fixes. Captures empirical evidence from the 2026-05-16 autonomous session:

1. Verification anchors: direct line-level awk inspection; gh api + git log for cross-reference claims; local lint/build re-run.
2. Suspect-by-default Copilot finding classes: table double-pipe (||) hallucination, with 4 confirmed FPs in one session (PR #3685, #3690, #3699-era, #3709), all verified by direct awk as single-| rows.
3. Stale-but-fresh-looking findings: parent-tick links to shard files in sibling PRs (true at filing-time, self-healed by review-time); "X-status vs Y-status inconsistency" prose observations (accurate at write-time but underlying state moved). Resolve no-op.

Threshold for adding a Copilot finding to the suspect-by-default list: two-or-more across distinct PRs.

Markdownlint clean on the rule file. (The new check-shard-before-push.ts helper flagged 3 false-positive MD032s on bullet-continuation lines; filing as next-tick fix for the helper itself.)

Co-Authored-By: Claude <noreply@anthropic.com>
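The line-level anchor for the table double-pipe class can be sketched as a small shell check. This is a minimal sketch, not the rule file's actual wording: `row_has_double_pipe` and the demo table are hypothetical, and it assumes the reviewer's finding cites a file and a line number.

```shell
# Hypothetical helper: succeed (exit 0) only if line N of FILE contains "||".
# N is passed with -v so awk compares NR against a real value.
row_has_double_pipe() {
  awk -v N="$2" 'NR == N && index($0, "||") { found = 1 } END { exit !found }' "$1"
}

# Illustrative table: only line 4 really contains a double pipe.
demo=$(mktemp)
printf '%s\n' '| class | count |' '| --- | --- |' '| table-pipe | 4 |' '| broken || row |' > "$demo"

for n in 3 4; do
  if row_has_double_pipe "$demo" "$n"; then
    echo "line $n: finding confirmed"
  else
    echo "line $n: false positive"
  fi
done
rm -f "$demo"
```

Checking the exact cited line, rather than grepping the whole file, keeps the verification tied to the claim the reviewer actually made.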
Pull request overview
This PR updates the factory rule for diagnosing “BLOCKED with green required CI” to add a verify-before-fix discipline for reviewer-thread findings, including an empirical “suspect-by-default” list for known false-positive classes and guidance for handling stale-but-previously-true findings.
Changes:
- Add a “verify-before-fix” step after surfacing unresolved review threads to reduce wasted fixes/regressions from hallucinated findings.
- Introduce an empirical suspect-by-default list (currently the table-pipe false-positive class) with a threshold for inclusion.
- Document how to resolve stale-but-fresh-looking findings as no-ops when the underlying state self-heals.
P0 (line 35): awk one-liner used `<N>` as a literal placeholder; if copied verbatim, awk treats `N` as uninitialized (defaults to 0) and prints nothing. Show `-v N=22` (literal value substitution) + explain the gotcha.

P1 (line 38): `git log <PR-cited-PR>` doesn't work: git log expects refs/commits/paths, not PR numbers. Replace with three concrete runnable forms:

- `gh api repos/<owner>/<repo>/pulls/<N>` → metadata
- `gh pr view <N> --json commits,mergeCommit` → commits via API
- `git log --grep '#<N>'` → local-repo merge-commit by PR-number

Both fixes preserve the intent (verification anchors) while making the commands directly runnable.

Co-Authored-By: Claude <noreply@anthropic.com>
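A minimal sketch of the P0 fix and the local-repo form of the P1 fix follows. The function names and the demo file are hypothetical, chosen for illustration; only the `-v N=...` pattern and `git log --grep` come from the commit message itself.

```shell
# Hypothetical helper: print line N of FILE.
print_line() {
  awk -v N="$2" 'NR == N' "$1"
}

# Hypothetical helper: list local commits whose message mentions PR number $1.
# --grep is a pattern match on the message, so "#37" would also match "#3721".
commits_for_pr() {
  git log --oneline --grep "#$1"
}

demo=$(mktemp)
printf '%s\n' 'first' 'second' 'third' > "$demo"

print_line "$demo" 2        # prints: second
awk 'NR == N' "$demo"       # the gotcha: N is unset, compares as 0, prints nothing
rm -f "$demo"
```

The `gh api` and `gh pr view` forms need network access and an authenticated `gh`, so they are left out of the sketch.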
AceHack added a commit that referenced this pull request on May 16, 2026
…er-bug (#3724) (#3725)

GraphQL reset at 03:55:31Z. 3 real Copilot findings this tick:

- PR #3720 (0350Z shard): B-0545 stale ref (peer-Otto landed sweep)
- PR #3721 (rule extension): 2 unrunnable command examples (awk N, git log on PR num)
- Helper bullet-continuation MD032 FP (discovered tick 19): isContinuationLine fix in PR #3724

3/3 findings real this tick. Copilot's overall accuracy is high; the table-pipe || class remains the only confirmed 2+-occurrence FP.

Dogfood loop: helper merged tick 17 → used in tick 19 on rule file → caught helper's own bug → fix landed tick 20 (this).

Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on May 16, 2026
…ion (PR #3721) (#3722)

* shard(tick): 2026-05-16T03:54Z - verify-reviewer-findings rule extension (PR #3721)

PR #3716 (helper) + PR #3717 (0344Z) merged. Extended blocked-green-ci-investigate-threads.md with 3 composes-with sections: verification anchors + suspect-by-default Copilot finding list (table-pipe || hallucination, 4-FP entry) + stale-but-fresh-looking finding class.

Self-discovery: running the just-merged check-shard-before-push.ts on the rule file surfaced a helper bug: checkMd032 flags bullet-continuation lines as false-positive paragraphs. Filing as next-tick fix.

Also self-bite caught during shard authoring: 2 MD038 violations in the prose describing the helper bug (literal trailing-space-in-backticks examples). Rewrote the prose without the literal-trigger pattern. The self-check IS doing its job.

GraphQL exhausted mid-tick after PR-3721 REST creation. PR #3721 NOT armed this tick; will arm post-reset at 03:55:31Z next tick.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(pr-3722): Copilot P1 - double-backtick code span for regex with literal backtick

The inline code span containing the structural-marker regex contained a literal backtick inside the character class. Single-backtick delimiters ended early at the inner backtick, leaving the closing delimiter unmatched and tripping MD038 / breaking parse. Switch to a double-backtick code span (``/^[#>*\-|`]/``) which can carry single inner backticks since the closing run is two consecutive backticks.

Verified: bun tools/hygiene/check-shard-before-push.ts ok; markdownlint-cli2 ok (exit 0).

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Extends `.claude/rules/blocked-green-ci-investigate-threads.md` with verify-before-fix discipline + a suspect-by-default Copilot finding list.

Empirical evidence from the 2026-05-16 autonomous session: 4 confirmed false positives on Copilot's table double-pipe (`||`) hallucination class. Threshold for entry: 2+ FPs across distinct PRs.

Also captures the stale-but-fresh-looking finding class (sibling-PR self-healing, write-time-accurate prose); resolve no-op.

Discovered while authoring: my new `check-shard-before-push.ts` helper flags bullet-continuation lines as MD032 false positives. Filing as next-tick fix for the helper itself.

Co-Authored-By: Claude <noreply@anthropic.com>