Skip to content

chore: add 5 FP-prevention rules to /codebase-audit agent prompts#1734

Merged
Aureliolo merged 1 commit into
mainfrom
chore/codebase-audit-skill-fp-prevention
May 3, 2026
Merged

chore: add 5 FP-prevention rules to /codebase-audit agent prompts#1734
Aureliolo merged 1 commit into
mainfrom
chore/codebase-audit-skill-fp-prevention

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Adds Rules R-A through R-E to the /codebase-audit agent-prompt template. Each is grounded in a specific FP category observed in the 2026-05-03 audit run (12 categories tracked during the post-run walk-through). Skill-only change; no source code touched.

Why

After PR #1732 (round-1 fixes), four agents still produced FPs at significant rates because their prompts shared the same underlying gaps:

  • Agents flag patterns without verifying the surrounding state (existing fixes, docstring carve-outs, caller distribution, semantic equivalence).
  • Convention-scope findings cite N cases without estimating total population, so the user can't choose between surgical fix and convention rollout.
  • Mock-drift / API-duplication findings calibrate severity without sampling actual usage.

These five rules are mandatory for every audit agent.

The 5 rules

R-A: Verify the proposed fix isn't already in place

Catches:

  • Memory-leak findings whose cleanup hook IS already registered (agent 97 Site 2 was a FP because cancelPendingPersist is in test-setup.tsx:272).
  • NotImplementedError-in-mixin findings overridden in concrete subclass (agent 18 had 2/3 sample FP rate).
  • Ghost-wired settings (most severe class): agent 09 flagged 13 settings dead; 11 of 13 were consumed by living machinery whose owning service was simply never instantiated at boot. Import-graph trace missed this entirely. The new rule requires tracing through lifecycle_helpers.py / app.py startup wiring.

R-B: Read existing helper docstrings for design carve-outs

Catches: agent 138 flagged 5 inline retry loops as needing centralisation, but GeneralRetryHandler's module docstring explicitly carved out 4 of the 5 (semantic self-correction, contention, sync-thread). Reading the docstring catches design intent.

R-C: Scope estimate alongside cited cases

Catches: agent 34 cited 12 plain-Exception classes; actual project-wide population is ~80-120 across 19 modules. User's surgical-vs-convention-rollout decision depends on knowing the gap. New rule: agents emit "12 cited; sample suggests ~80-120 across N modules".

R-D: Semantic equivalence before flagging duplicates

Catches: agent 110 flagged a pairwise-compare cluster as duplicate when one used field-equality and the other used text-similarity (identical iteration, divergent inner predicate). Agent 146 conflated intentional bootstrap multi-surface settings with accidental duplication (4 of 8 flagged settings were intentional).

R-E: Caller-context distribution for API duplication findings

Catches: AuthService mixed sync/async API was high-severity in raw findings; actual distribution was 100% production-async + test-fixture-sync, which made the choice obvious. New rule: agents emit caller distribution before recommending the fix shape.

Test plan

Skill is docs-only. Validated by:

  • Reading the 5 new rules end-to-end for clarity + grounding.
  • Each rule cites the concrete 2026-05-03 example so future agents can recognise the pattern.

Out of scope

A separate master tracking issue (filed alongside this PR) captures the remaining session learnings: 5 follow-up issues for larger workstreams (config layering, error-handling convention rollout, no-hardcoded-values rule, bootstrap-wiring lint, scoring research), Group 1+2+3 decisions with research-backed rationale, and the meta-process plan (scaffolding templates, pre-PR mini-audit, worktree do-not-introduce, convention-must-ship-its-gate).

Closes

No linked issue (skill self-improvement).

Adds Rules R-A through R-E to the agent-prompt template, derived from
the 12 FP categories observed in the 2026-05-03 audit run:

- R-A: verify proposed fix isn't already in place (catches the
  ghost-wired settings class where 11/13 of agent 09 findings were
  consumed by living machinery whose owning service was never started
  at boot; also catches NotImplementedError in mixins overridden by
  subclasses, and cleanup hooks already registered in test-setup).
- R-B: read existing helper docstrings for design carve-outs (catches
  agent 138 flagging 5 retry loops that GeneralRetryHandler explicitly
  carved out as semantic / contention / sync-thread non-goals).
- R-C: scope estimate alongside cited cases (catches agent 34 citing
  12 plain-Exception classes when the actual project-wide population
  is 80-120 across 19 modules; user can't decide surgical vs convention
  rollout without the gap).
- R-D: semantic equivalence before flagging duplicates (catches agent
  110 over-flagging pairwise-compare clusters with divergent inner
  predicates, and agent 146 conflating intentional bootstrap multi-
  surface settings with accidental duplication).
- R-E: caller-context distribution for API duplication findings
  (catches the AuthService mixed sync/async case where 100% of prod
  callers are async; only test fixtures use sync).

These rules are additive to the round-1 fixes from PR #1732 (agent 09
six-pattern consumption recognition, scratch-script ban, Phase 7
cleanup, .gitignore broadening, etc).

Each rule is grounded with a concrete 2026-05-03 example so future
agents can recognise the pattern they're being told to avoid.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 3, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 596475c3-24a0-4da6-b980-e6f32dc471f0

📥 Commits

Reviewing files that changed from the base of the PR and between 7281cd7 and d09e2d8.

📒 Files selected for processing (1)
  • .claude/skills/codebase-audit/SKILL.md
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Analyze (actions)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
.claude/skills/codebase-audit/SKILL.md (1)

182-273: Excellent FP-prevention framework with strong grounding in real audit data.

The five rules are well-structured and clearly target specific false-positive patterns identified in the 2026-05-03 audit run. Each rule includes:

  • Concrete examples with agent numbers and finding counts
  • Clear verification steps
  • Actionable guidance

The grounding is particularly strong:

  • R-A: "11/13 settings flagged as dead" with specific pattern explanation
  • R-B: "agent 138, 5 findings, 4 were FPs" with root cause
  • R-C: "agent 34, 12 cited but ~80-120 actual" demonstrating scope gap
  • R-D: agents 110 and 146 with contrasting examples (semantic vs structural similarity)
  • R-E: AuthService example showing value of caller distribution analysis

Observations:

  1. R-A's startup-wiring trace (lines 197-204) is the most complex verification requirement. The distinction between "dead-on-arrival" (service never instantiated) vs "generic dead" may require significant agent sophistication.
  2. R-D's semantic equivalence check (lines 236-252) requires judgment about "intentional" vs "accidental" patterns, which could be challenging for automated agents to distinguish reliably.

These complexity concerns are offset by the specific examples and clear guidance. The rules should significantly reduce the FP rate observed in prior runs.


Walkthrough

The codebase-audit skill's agent prompt template was extended with a new "Five FP-prevention rules" section comprising five verification requirements (R-A through R-E). These rules instruct agents to perform specific false-positive avoidance checks prior to emitting findings, including verifying proposed fixes are not already implemented, consulting docstring carve-outs before suggesting centralization, including scope estimates for convention-wide issues, performing semantic equivalence checks to distinguish intentional multi-instance patterns from accidental duplication, and enumerating all callers across src/ and tests/ directories for API-duplication findings to calibrate severity based on production-versus-test usage distribution.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and accurately summarizes the main change: adding five FP-prevention rules to the codebase-audit agent prompts.
Description check ✅ Passed The description comprehensively explains the purpose, rationale, and details of all five FP-prevention rules with concrete examples from the audit run.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Review rate limit: 4/5 reviews remaining, refill in 12 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new section to the codebase audit skill documentation detailing five mandatory rules for preventing false positives, including verification of existing fixes, reading helper docstrings, providing scope estimates, checking semantic equivalence, and analyzing caller distribution. The review feedback suggests using full workspace-relative paths for service wiring files to ensure agents can locate them reliably and recommends a minor grammatical correction in the rule examples for consistency.

- For "settings-not-wired" findings: trace the setting through the consuming
object's STARTUP wiring, not just import-graph references. A setting
consumed by code that never runs at startup (because the parent service is
never instantiated in `lifecycle_helpers.py` / `app.py`) is dead-on-arrival
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For Rule R-A, using full workspace-relative paths for auto_wire.py, lifecycle_helpers.py, and app.py will help the audit agents locate these files more reliably. These are the primary locations where service instantiation and wiring occur.

Suggested change
never instantiated in `lifecycle_helpers.py` / `app.py`) is dead-on-arrival
never instantiated in src/synthorg/api/auto_wire.py, src/synthorg/api/lifecycle_helpers.py, or src/synthorg/api/app.py) is dead-on-arrival

test file and downgrade severity if the missing methods aren't called by
any test.

The 2026-05-03 run had agent flag AuthService's mixed sync/async API as
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The example for Rule R-E is missing an article or an agent ID, which makes it inconsistent with the examples in Rules R-A through R-D. Adding "an agent" or the specific agent ID (likely Agent 149) improves clarity.

Suggested change
The 2026-05-03 run had agent flag AuthService's mixed sync/async API as
The 2026-05-03 run had an agent flag AuthService's mixed sync/async API as

@Aureliolo Aureliolo merged commit 8f33ad6 into main May 3, 2026
41 checks passed
@Aureliolo Aureliolo deleted the chore/codebase-audit-skill-fp-prevention branch May 3, 2026 17:59
Aureliolo pushed a commit that referenced this pull request May 4, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Release notes now include AI-generated Highlights and commit
subject/body in dev releases.

### What's new
- Trace added to detect ghost-wired settings for better linting
feedback.

### Under the hood
- Docker publish step now retries transient GHCR errors to improve
release stability.
- Codebase audit incorporated 21 mechanical fixes from recent reviews to
improve quality.
- Added false-positive prevention rules to codebase audit prompts to
reduce errors.
- Enhanced codebase audit skill based on insights from latest audits.
- Web component tightened async-leak detection and improved jsdom
Storage timer handling.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


##
[0.7.9](v0.7.8...v0.7.9)
(2026-05-04)


### Features

* **ci:** promote AI Highlights to release body, add commit subject/body
to dev releases
([#1743](#1743))
([70bffe9](70bffe9))
* **lint:** bootstrap-wiring trace for ghost-wired settings
([#1742](#1742))
([6ddbb86](6ddbb86)),
closes [#1737](#1737)


### Bug Fixes

* **ci:** retry transient GHCR errors in docker publish + retag
([#1741](#1741))
([18e5349](18e5349))


### Maintenance

* 2026-05-03 Bucket A audit
([#1733](#1733)): 21
mechanical fixes
([#1744](#1744))
([334262b](334262b))
* add 5 FP-prevention rules to /codebase-audit agent prompts
([#1734](#1734))
([8f33ad6](8f33ad6))
* improve codebase-audit skill from 2026-05-03 lessons
([#1732](#1732))
([7281cd7](7281cd7))
* **web:** tighten async-leak ceiling, bypass jsdom Storage timer
dispatch ([#1728](#1728))
([de0a1e1](de0a1e1))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant