chore: add 5 FP-prevention rules to /codebase-audit agent prompts#1734
Conversation
Adds Rules R-A through R-E to the agent-prompt template, derived from the 12 FP categories observed in the 2026-05-03 audit run: - R-A: verify proposed fix isn't already in place (catches the ghost-wired settings class where 11/13 of agent 09 findings were consumed by living machinery whose owning service was never started at boot; also catches NotImplementedError in mixins overridden by subclasses, and cleanup hooks already registered in test-setup). - R-B: read existing helper docstrings for design carve-outs (catches agent 138 flagging 5 retry loops that GeneralRetryHandler explicitly carved out as semantic / contention / sync-thread non-goals). - R-C: scope estimate alongside cited cases (catches agent 34 citing 12 plain-Exception classes when the actual project-wide population is 80-120 across 19 modules; user can't decide surgical vs convention rollout without the gap). - R-D: semantic equivalence before flagging duplicates (catches agent 110 over-flagging pairwise-compare clusters with divergent inner predicates, and agent 146 conflating intentional bootstrap multi- surface settings with accidental duplication). - R-E: caller-context distribution for API duplication findings (catches the AuthService mixed sync/async case where 100% of prod callers are async; only test fixtures use sync). These rules are additive to the round-1 fixes from PR #1732 (agent 09 six-pattern consumption recognition, scratch-script ban, Phase 7 cleanup, .gitignore broadening, etc). Each rule is grounded with a concrete 2026-05-03 example so future agents can recognise the pattern they're being told to avoid.
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
🔇 Additional comments (1)
WalkthroughThe codebase-audit skill's agent prompt template was extended with a new "Five FP-prevention rules" section comprising five verification requirements (R-A through R-E). These rules instruct agents to perform specific false-positive avoidance checks prior to emitting findings, including verifying proposed fixes are not already implemented, consulting docstring carve-outs before suggesting centralization, including scope estimates for convention-wide issues, performing semantic equivalence checks to distinguish intentional multi-instance patterns from accidental duplication, and enumerating all callers across src/ and tests/ directories for API-duplication findings to calibrate severity based on production-versus-test usage distribution. 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Review rate limit: 4/5 reviews remaining, refill in 12 minutes. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a new section to the codebase audit skill documentation detailing five mandatory rules for preventing false positives, including verification of existing fixes, reading helper docstrings, providing scope estimates, checking semantic equivalence, and analyzing caller distribution. The review feedback suggests using full workspace-relative paths for service wiring files to ensure agents can locate them reliably and recommends a minor grammatical correction in the rule examples for consistency.
| - For "settings-not-wired" findings: trace the setting through the consuming | ||
| object's STARTUP wiring, not just import-graph references. A setting | ||
| consumed by code that never runs at startup (because the parent service is | ||
| never instantiated in `lifecycle_helpers.py` / `app.py`) is dead-on-arrival |
There was a problem hiding this comment.
For Rule R-A, using full workspace-relative paths for auto_wire.py, lifecycle_helpers.py, and app.py will help the audit agents locate these files more reliably. These are the primary locations where service instantiation and wiring occur.
| never instantiated in `lifecycle_helpers.py` / `app.py`) is dead-on-arrival | |
| never instantiated in src/synthorg/api/auto_wire.py, src/synthorg/api/lifecycle_helpers.py, or src/synthorg/api/app.py) is dead-on-arrival |
| test file and downgrade severity if the missing methods aren't called by | ||
| any test. | ||
|
|
||
| The 2026-05-03 run had agent flag AuthService's mixed sync/async API as |
There was a problem hiding this comment.
The example for Rule R-E is missing an article or an agent ID, which makes it inconsistent with the examples in Rules R-A through R-D. Adding "an agent" or the specific agent ID (likely Agent 149) improves clarity.
| The 2026-05-03 run had agent flag AuthService's mixed sync/async API as | |
| The 2026-05-03 run had an agent flag AuthService's mixed sync/async API as |
<!-- HIGHLIGHTS_START --> ## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - Release notes now include AI-generated Highlights and commit subject/body in dev releases. ### What's new - Trace added to detect ghost-wired settings for better linting feedback. ### Under the hood - Docker publish step now retries transient GHCR errors to improve release stability. - Codebase audit incorporated 21 mechanical fixes from recent reviews to improve quality. - Added false-positive prevention rules to codebase audit prompts to reduce errors. - Enhanced codebase audit skill based on insights from latest audits. - Web component tightened async-leak detection and improved jsdom Storage timer handling. <!-- HIGHLIGHTS_END --> :robot: I have created a release *beep* *boop* --- ## [0.7.9](v0.7.8...v0.7.9) (2026-05-04) ### Features * **ci:** promote AI Highlights to release body, add commit subject/body to dev releases ([#1743](#1743)) ([70bffe9](70bffe9)) * **lint:** bootstrap-wiring trace for ghost-wired settings ([#1742](#1742)) ([6ddbb86](6ddbb86)), closes [#1737](#1737) ### Bug Fixes * **ci:** retry transient GHCR errors in docker publish + retag ([#1741](#1741)) ([18e5349](18e5349)) ### Maintenance * 2026-05-03 Bucket A audit ([#1733](#1733)): 21 mechanical fixes ([#1744](#1744)) ([334262b](334262b)) * add 5 FP-prevention rules to /codebase-audit agent prompts ([#1734](#1734)) ([8f33ad6](8f33ad6)) * improve codebase-audit skill from 2026-05-03 lessons ([#1732](#1732)) ([7281cd7](7281cd7)) * **web:** tighten async-leak ceiling, bypass jsdom Storage timer dispatch ([#1728](#1728)) ([de0a1e1](de0a1e1)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
Adds Rules R-A through R-E to the
/codebase-auditagent-prompt template. Each is grounded in a specific FP category observed in the 2026-05-03 audit run (12 categories tracked during the post-run walk-through). Skill-only change; no source code touched.Why
After PR #1732 (round-1 fixes), four agents still produced FPs at significant rates because their prompts shared the same underlying gaps:
These five rules are mandatory for every audit agent.
The 5 rules
R-A: Verify the proposed fix isn't already in place
Catches:
cancelPendingPersistis intest-setup.tsx:272).lifecycle_helpers.py/app.pystartup wiring.R-B: Read existing helper docstrings for design carve-outs
Catches: agent 138 flagged 5 inline retry loops as needing centralisation, but
GeneralRetryHandler's module docstring explicitly carved out 4 of the 5 (semantic self-correction, contention, sync-thread). Reading the docstring catches design intent.R-C: Scope estimate alongside cited cases
Catches: agent 34 cited 12 plain-Exception classes; actual project-wide population is ~80-120 across 19 modules. User's surgical-vs-convention-rollout decision depends on knowing the gap. New rule: agents emit "12 cited; sample suggests ~80-120 across N modules".
R-D: Semantic equivalence before flagging duplicates
Catches: agent 110 flagged a pairwise-compare cluster as duplicate when one used field-equality and the other used text-similarity (identical iteration, divergent inner predicate). Agent 146 conflated intentional bootstrap multi-surface settings with accidental duplication (4 of 8 flagged settings were intentional).
R-E: Caller-context distribution for API duplication findings
Catches: AuthService mixed sync/async API was high-severity in raw findings; actual distribution was 100% production-async + test-fixture-sync, which made the choice obvious. New rule: agents emit caller distribution before recommending the fix shape.
Test plan
Skill is docs-only. Validated by:
Out of scope
A separate master tracking issue (filed alongside this PR) captures the remaining session learnings: 5 follow-up issues for larger workstreams (config layering, error-handling convention rollout, no-hardcoded-values rule, bootstrap-wiring lint, scoring research), Group 1+2+3 decisions with research-backed rationale, and the meta-process plan (scaffolding templates, pre-PR mini-audit, worktree do-not-introduce, convention-must-ship-its-gate).
Closes
No linked issue (skill self-improvement).