chore: add 5 FP-prevention rules to /codebase-audit agent prompts by Aureliolo · Pull Request #1734 · Aureliolo/synthorg

Aureliolo · 2026-05-03T17:38:29Z

Summary

Adds Rules R-A through R-E to the /codebase-audit agent-prompt template. Each is grounded in a specific FP category observed in the 2026-05-03 audit run (12 categories tracked during the post-run walk-through). Skill-only change; no source code touched.

Why

After PR #1732 (round-1 fixes), four agents still produced FPs at significant rates because their prompts shared the same underlying gaps:

Agents flag patterns without verifying the surrounding state (existing fixes, docstring carve-outs, caller distribution, semantic equivalence).
Convention-scope findings cite N cases without estimating total population, so the user can't choose between surgical fix and convention rollout.
Mock-drift / API-duplication findings calibrate severity without sampling actual usage.

These five rules are mandatory for every audit agent.

The 5 rules

R-A: Verify the proposed fix isn't already in place

Catches:

Memory-leak findings whose cleanup hook IS already registered (agent 97 Site 2 was a FP because cancelPendingPersist is in test-setup.tsx:272).
NotImplementedError-in-mixin findings overridden in concrete subclass (agent 18 had 2/3 sample FP rate).
Ghost-wired settings (most severe class): agent 09 flagged 13 settings dead; 11 of 13 were consumed by living machinery whose owning service was simply never instantiated at boot. Import-graph trace missed this entirely. The new rule requires tracing through lifecycle_helpers.py / app.py startup wiring.

R-B: Read existing helper docstrings for design carve-outs

Catches: agent 138 flagged 5 inline retry loops as needing centralisation, but GeneralRetryHandler's module docstring explicitly carved out 4 of the 5 (semantic self-correction, contention, sync-thread). Reading the docstring catches design intent.

R-C: Scope estimate alongside cited cases

Catches: agent 34 cited 12 plain-Exception classes; actual project-wide population is ~80-120 across 19 modules. User's surgical-vs-convention-rollout decision depends on knowing the gap. New rule: agents emit "12 cited; sample suggests ~80-120 across N modules".

R-D: Semantic equivalence before flagging duplicates

Catches: agent 110 flagged a pairwise-compare cluster as duplicate when one used field-equality and the other used text-similarity (identical iteration, divergent inner predicate). Agent 146 conflated intentional bootstrap multi-surface settings with accidental duplication (4 of 8 flagged settings were intentional).

R-E: Caller-context distribution for API duplication findings

Catches: AuthService mixed sync/async API was high-severity in raw findings; actual distribution was 100% production-async + test-fixture-sync, which made the choice obvious. New rule: agents emit caller distribution before recommending the fix shape.

Test plan

Skill is docs-only. Validated by:

Reading the 5 new rules end-to-end for clarity + grounding.
Each rule cites the concrete 2026-05-03 example so future agents can recognise the pattern.

Out of scope

A separate master tracking issue (filed alongside this PR) captures the remaining session learnings: 5 follow-up issues for larger workstreams (config layering, error-handling convention rollout, no-hardcoded-values rule, bootstrap-wiring lint, scoring research), Group 1+2+3 decisions with research-backed rationale, and the meta-process plan (scaffolding templates, pre-PR mini-audit, worktree do-not-introduce, convention-must-ship-its-gate).

Closes

No linked issue (skill self-improvement).

Adds Rules R-A through R-E to the agent-prompt template, derived from the 12 FP categories observed in the 2026-05-03 audit run: - R-A: verify proposed fix isn't already in place (catches the ghost-wired settings class where 11/13 of agent 09 findings were consumed by living machinery whose owning service was never started at boot; also catches NotImplementedError in mixins overridden by subclasses, and cleanup hooks already registered in test-setup). - R-B: read existing helper docstrings for design carve-outs (catches agent 138 flagging 5 retry loops that GeneralRetryHandler explicitly carved out as semantic / contention / sync-thread non-goals). - R-C: scope estimate alongside cited cases (catches agent 34 citing 12 plain-Exception classes when the actual project-wide population is 80-120 across 19 modules; user can't decide surgical vs convention rollout without the gap). - R-D: semantic equivalence before flagging duplicates (catches agent 110 over-flagging pairwise-compare clusters with divergent inner predicates, and agent 146 conflating intentional bootstrap multi- surface settings with accidental duplication). - R-E: caller-context distribution for API duplication findings (catches the AuthService mixed sync/async case where 100% of prod callers are async; only test fixtures use sync). These rules are additive to the round-1 fixes from PR #1732 (agent 09 six-pattern consumption recognition, scratch-script ban, Phase 7 cleanup, .gitignore broadening, etc). Each rule is grounded with a concrete 2026-05-03 example so future agents can recognise the pattern they're being told to avoid.

github-actions · 2026-05-03T17:38:40Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

coderabbitai · 2026-05-03T17:38:42Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 596475c3-24a0-4da6-b980-e6f32dc471f0

📥 Commits

Reviewing files that changed from the base of the PR and between 7281cd7 and d09e2d8.

📒 Files selected for processing (1)

.claude/skills/codebase-audit/SKILL.md

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Analyze (actions)
GitHub Check: Analyze (python)
GitHub Check: Analyze (go)
GitHub Check: Analyze (javascript-typescript)

🔇 Additional comments (1)

.claude/skills/codebase-audit/SKILL.md (1)

182-273: Excellent FP-prevention framework with strong grounding in real audit data.

The five rules are well-structured and clearly target specific false-positive patterns identified in the 2026-05-03 audit run. Each rule includes:

Concrete examples with agent numbers and finding counts

Clear verification steps

Actionable guidance

The grounding is particularly strong:

R-A: "11/13 settings flagged as dead" with specific pattern explanation

R-B: "agent 138, 5 findings, 4 were FPs" with root cause

R-C: "agent 34, 12 cited but ~80-120 actual" demonstrating scope gap

R-D: agents 110 and 146 with contrasting examples (semantic vs structural similarity)

R-E: AuthService example showing value of caller distribution analysis

Observations:

R-A's startup-wiring trace (lines 197-204) is the most complex verification requirement. The distinction between "dead-on-arrival" (service never instantiated) vs "generic dead" may require significant agent sophistication.

R-D's semantic equivalence check (lines 236-252) requires judgment about "intentional" vs "accidental" patterns, which could be challenging for automated agents to distinguish reliably.

These complexity concerns are offset by the specific examples and clear guidance. The rules should significantly reduce the FP rate observed in prior runs.

Walkthrough

The codebase-audit skill's agent prompt template was extended with a new "Five FP-prevention rules" section comprising five verification requirements (R-A through R-E). These rules instruct agents to perform specific false-positive avoidance checks prior to emitting findings, including verifying proposed fixes are not already implemented, consulting docstring carve-outs before suggesting centralization, including scope estimates for convention-wide issues, performing semantic equivalence checks to distinguish intentional multi-instance patterns from accidental duplication, and enumerating all callers across src/ and tests/ directories for API-duplication findings to calibrate severity based on production-versus-test usage distribution.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly and accurately summarizes the main change: adding five FP-prevention rules to the codebase-audit agent prompts.
Description check	✅ Passed	The description comprehensively explains the purpose, rationale, and details of all five FP-prevention rules with concrete examples from the audit run.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Review rate limit: 4/5 reviews remaining, refill in 12 minutes.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces a new section to the codebase audit skill documentation detailing five mandatory rules for preventing false positives, including verification of existing fixes, reading helper docstrings, providing scope estimates, checking semantic equivalence, and analyzing caller distribution. The review feedback suggests using full workspace-relative paths for service wiring files to ensure agents can locate them reliably and recommends a minor grammatical correction in the rule examples for consistency.

gemini-code-assist · 2026-05-03T17:40:15Z

+- For "settings-not-wired" findings: trace the setting through the consuming
+  object's STARTUP wiring, not just import-graph references. A setting
+  consumed by code that never runs at startup (because the parent service is
+  never instantiated in `lifecycle_helpers.py` / `app.py`) is dead-on-arrival


For Rule R-A, using full workspace-relative paths for auto_wire.py, lifecycle_helpers.py, and app.py will help the audit agents locate these files more reliably. These are the primary locations where service instantiation and wiring occur.

Suggested change

never instantiated in `lifecycle_helpers.py` / `app.py`) is dead-on-arrival

never instantiated in src/synthorg/api/auto_wire.py, src/synthorg/api/lifecycle_helpers.py, or src/synthorg/api/app.py) is dead-on-arrival

gemini-code-assist · 2026-05-03T17:40:15Z

+  test file and downgrade severity if the missing methods aren't called by
+  any test.
+
+The 2026-05-03 run had agent flag AuthService's mixed sync/async API as


The example for Rule R-E is missing an article or an agent ID, which makes it inconsistent with the examples in Rules R-A through R-D. Adding "an agent" or the specific agent ID (likely Agent 149) improves clarity.

Suggested change

The 2026-05-03 run had agent flag AuthService's mixed sync/async API as

The 2026-05-03 run had an agent flag AuthService's mixed sync/async API as

## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - Release notes now include AI-generated Highlights and commit subject/body in dev releases. ### What's new - Trace added to detect ghost-wired settings for better linting feedback. ### Under the hood - Docker publish step now retries transient GHCR errors to improve release stability. - Codebase audit incorporated 21 mechanical fixes from recent reviews to improve quality. - Added false-positive prevention rules to codebase audit prompts to reduce errors. - Enhanced codebase audit skill based on insights from latest audits. - Web component tightened async-leak detection and improved jsdom Storage timer handling.  :robot: I have created a release *beep* *boop* --- ## [0.7.9](v0.7.8...v0.7.9) (2026-05-04) ### Features * **ci:** promote AI Highlights to release body, add commit subject/body to dev releases ([#1743](#1743)) ([70bffe9](70bffe9)) * **lint:** bootstrap-wiring trace for ghost-wired settings ([#1742](#1742)) ([6ddbb86](6ddbb86)), closes [#1737](#1737) ### Bug Fixes * **ci:** retry transient GHCR errors in docker publish + retag ([#1741](#1741)) ([18e5349](18e5349)) ### Maintenance * 2026-05-03 Bucket A audit ([#1733](#1733)): 21 mechanical fixes ([#1744](#1744)) ([334262b](334262b)) * add 5 FP-prevention rules to /codebase-audit agent prompts ([#1734](#1734)) ([8f33ad6](8f33ad6)) * improve codebase-audit skill from 2026-05-03 lessons ([#1732](#1732)) ([7281cd7](7281cd7)) * **web:** tighten async-leak ceiling, bypass jsdom Storage timer dispatch ([#1728](#1728)) ([de0a1e1](de0a1e1)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

gemini-code-assist Bot reviewed May 3, 2026

View reviewed changes

Aureliolo mentioned this pull request May 3, 2026

Audit 2026-05-03 follow-ups: decisions, FP-prevention, meta-process plan (handoff) #1735

Closed

coderabbitai Bot approved these changes May 3, 2026

View reviewed changes

Aureliolo merged commit 8f33ad6 into main May 3, 2026
41 checks passed

Aureliolo deleted the chore/codebase-audit-skill-fp-prevention branch May 3, 2026 17:59

synthorg-repo-bot Bot mentioned this pull request May 3, 2026

chore(main): release 0.7.9 #1731

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: add 5 FP-prevention rules to /codebase-audit agent prompts#1734

chore: add 5 FP-prevention rules to /codebase-audit agent prompts#1734
Aureliolo merged 1 commit into
mainfrom
chore/codebase-audit-skill-fp-prevention

Aureliolo commented May 3, 2026

Uh oh!

github-actions Bot commented May 3, 2026

Uh oh!

coderabbitai Bot commented May 3, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 3, 2026

Uh oh!

gemini-code-assist Bot May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	never instantiated in `lifecycle_helpers.py` / `app.py`) is dead-on-arrival
	never instantiated in src/synthorg/api/auto_wire.py, src/synthorg/api/lifecycle_helpers.py, or src/synthorg/api/app.py) is dead-on-arrival

	The 2026-05-03 run had agent flag AuthService's mixed sync/async API as
	The 2026-05-03 run had an agent flag AuthService's mixed sync/async API as

Conversation

Aureliolo commented May 3, 2026

Summary

Why

The 5 rules

R-A: Verify the proposed fix isn't already in place

R-B: Read existing helper docstrings for design carve-outs

R-C: Scope estimate alongside cited cases

R-D: Semantic equivalence before flagging duplicates

R-E: Caller-context distribution for API duplication findings

Test plan

Out of scope

Closes

Uh oh!

github-actions Bot commented May 3, 2026

Dependency Review

Scanned Files

Uh oh!

coderabbitai Bot commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 3, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 3, 2026 •

edited

Loading