Skip to content

chore: improve codebase-audit skill from 2026-05-03 lessons#1732

Merged
Aureliolo merged 5 commits into
mainfrom
chore/codebase-audit-skill-improvements
May 3, 2026
Merged

chore: improve codebase-audit skill from 2026-05-03 lessons#1732
Aureliolo merged 5 commits into
mainfrom
chore/codebase-audit-skill-improvements

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Tightens the /codebase-audit skill based on lessons from the 2026-05-03 full run. Skill-only change (.claude/skills/codebase-audit/SKILL.md); no source code touched.

Changes

  1. Agent 09 (unwired-settings) prompt rewrite -- was the dominant FP source (38% rate, 39 of 39 false positives in last run). New prompt enumerates all six legitimate consumption patterns: bridge config methods, composed-config methods (get_coordination_config()), direct ConfigResolver.get_int/float/str/bool calls, Pydantic config-model embedding, subscriber pattern, bootstrap-only / read-only-post-init flags. Adds a verification requirement that forces three independent searches before flagging.
  2. Hard ban on agent scratch scripts -- in agent-prompt template + skill-level Rule Implement agent-to-agent messaging with channels and topics #10. Past run leaked 14+ helper .py files to project root, scripts/, and c:\tmp\ that triggered Pyright diagnostic floods in the main thread.
  3. New Phase 7: Cleanup (mandatory) -- sweeps known leak filenames + greps for any new .py at root. Documents the 14 files seen in 2026-05-03 run as the seed sweep list.
  4. Phase 0 robustness -- handles existing _audit/latest directory (Junction collision on Windows); explicitly bans Bash redirects (project's check_bash_no_write.sh hook blocks >, >>, 2>, &>); explicit no-cd reminder.
  5. Phase 1 Renovate filter -- drops bot-managed Dependency Dashboard from injected open-issues list per feedback_open_issues_exclude_renovate.md memory rule.
  6. Phase 4 mandatory file enumeration -- previous run hallucinated agent filenames (14-web-store-architecture-drift.md, 19-benchmark-regression.md) that never existed. INDEX-builder must enumerate actual files via Glob and grep counts.
  7. Phase 5 DEFAULT: deduped ISSUES.md -- one row per issue class (e.g. all 107 09-unwired-settings.md findings collapse into one "Unwired settings" row). User shouldn't have to wade through 300+ raw findings to make a decision.
  8. Validation batching strategy -- explicit heavy / mid / small file batching with target ~20-30 findings per validator.
  9. Lessons-from-prior-runs section -- dated subsections (starting with 2026-05-03) so each rule's rationale is traceable. Future runs add new dated subsections; older entries stay.
  10. Rules expansion -- Implement agent-to-agent messaging with channels and topics #10 no scratch scripts, Implement agent engine core with ExecutionLoop protocol integration (DESIGN_SPEC §3.1, §6.1, §6.5) #11 no Bash redirects, Implement hierarchical delegation (task flows down, results flow up) #12 surface FP-rate signals end-of-run.

Test plan

Skill is a docs-only file (no source code). Validated by re-reading the skill end-to-end and grep-checking that all four key edits landed (DO NOT write helper, Phase 7: Cleanup, Lessons from prior runs, agent 09 FP guard).

Review coverage

Quick mode: docs-only change auto-skipped agent review. Automated checks N/A (no Python, web, or Go files changed). Pre-commit hooks passed at commit time (em-dash, large-file, secrets, branch protection, etc.).

Closes

No linked issue (skill self-improvement from observed run).

- Fix agent 09 (unwired-settings) FP rate (was 38%, drove 39 of 39 FPs).
  Recognise direct ConfigResolver.get_int/float/str calls, composed-config
  methods (get_coordination_config()), Pydantic config-model embedding,
  subscriber pattern, and bootstrap-only flags as legitimate consumption.
- Hard-prohibit agent scratch scripts: ban writing helper *.py to project
  root, scripts/, c:/tmp/, /tmp/. Past run leaked 14+ scratch files that
  triggered Pyright diagnostic floods.
- Add Phase 7: Cleanup (mandatory) to sweep agent-leaked scripts.
- Phase 0: handle existing _audit/latest dir (Junction collision); ban
  Bash redirects (project hook blocks them); explicit no-cd reminder.
- Phase 1: filter Renovate Dependency Dashboard from injected issue list.
- Phase 4: MANDATORY enumerate actual finding files via Glob; previous
  run hallucinated agent filenames in INDEX.
- Phase 5 DEFAULT: produce deduped ISSUES.md (one row per issue class)
  before triage prompt. User shouldn't have to wade through 300+ raw
  findings to make a decision.
- Validation batching strategy made concrete (heavy/mid/small files).
- Lessons-from-prior-runs section so each rule's rationale is traceable.
- Rules expanded: no scratch scripts (#10), no Bash redirects (#11),
  surface FP-rate signals end-of-run (#12).
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 3, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e4124ea0-d7d0-4780-a3cc-6daf766a7ecb

📥 Commits

Reviewing files that changed from the base of the PR and between 4a1d5e1 and 76d97ac.

📒 Files selected for processing (1)
  • .claude/skills/codebase-audit/SKILL.md
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
🪛 LanguageTool
.claude/skills/codebase-audit/SKILL.md

[style] ~3238-~3238: ‘make a decision’ might be wordy. Consider a shorter alternative.
Context: ...staring at 300+ raw findings and cannot make a decision. ### Phase 5 DEFAULT: Deduped Issue Li...

(EN_WORDINESS_PREMIUM_MAKE_A_DECISION)

🔇 Additional comments (1)
.claude/skills/codebase-audit/SKILL.md (1)

42-50: LGTM! Comprehensive and well-documented improvements.

The skill updates systematically address all issues from the 2026-05-03 audit run:

  • Phase 0 robustness: The setup command now handles Windows junctions, real directories, and symlinks correctly with the conditional removal pattern.
  • Agent 09 FP prevention: Expanding from one consumption pattern to six enumerated patterns with mandatory three-search verification directly targets the 38% false-positive rate observed in the prior run.
  • Batching strategy: The concrete Bash example and split guidance (heavy/mid/small files targeting ~20-30 findings per validator) are clear and mathematically correct.
  • INDEX.md enumeration: The mandatory file-enumeration requirement prevents the hallucinated-filename issue that occurred previously.
  • Phase 7 cleanup: The two-stage sweep (known leaks + time-based find) with robust prev_run logic will catch agent script leakage that contaminated the last run.
  • Lessons section: Documenting the specific 2026-05-03 issues with cross-references to the fixes ensures future maintainers understand the rationale.

All past review comments have been correctly addressed in the code, and the new rules (#10, #11, #12) formalize the constraints that prevent recurring problems.

Also applies to: 53-53, 68-68, 401-442, 3125-3143, 3180-3283, 3501-3530, 3548-3568


Walkthrough

Updated the codebase-audit skill spec across phases: Phase 0 exports RUN_DIR, creates "$RUN_DIR/findings", and relinks _audit/latest with conflict-aware handling and reinforced PreToolUse Bash constraints. Phase 1 now injects GitHub open-issue data (including author) and filters Renovate/bot noise. Agent prompts forbid writing helper/analysis Python scripts outside _audit/latest/findings/. Phase 3 adds grep-driven batching targeting ~20–30 findings per validator. Phase 4 requires INDEX.md from real filenames and grep counts. Phase 5 always emits _audit/latest/ISSUES.md and computes confirmed counts. Phase 7 mandates cleanup of leaked and newer .py scripts. Global rules disallow scratch scripts and Bash redirects and require surfacing high FP rates.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main change: tightening the codebase-audit skill based on lessons from the 2026-05-03 incident.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing all major improvements made to the SKILL.md file.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Review rate limit: 2/5 reviews remaining, refill in 24 minutes and 17 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/codebase-audit/SKILL.md:
- Around line 3503-3504: The cleanup command in the snippet contains the
forbidden stderr redirect symbol "2>/dev/null"; remove that redirect and rely on
the existing "|| true" to swallow errors (or explicitly handle logging to an
allowed destination), so update the rm -f ... command to omit "2>/dev/null"
wherever it appears; specifically change the line containing the rm invocation
that includes "2>/dev/null" so it no longer uses stderr redirection.
- Line 42: The one-liner that creates RUN_DIR then tries to remove and relink
_audit/latest fails when _audit/latest is a directory/junction; update the
cleanup step to robustly remove whatever _audit/latest is before creating the
new symlink by replacing the plain "rm -f _audit/latest" usage with a safe
existence check and force recursive removal (e.g., test for existence of
_audit/latest and run a recursive remove) so the relink step always succeeds;
reference the RUN_DIR assignment and the `_audit/latest` target in the same
one-liner.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 36ad0deb-e5aa-4283-a3dc-32a977efbe74

📥 Commits

Reviewing files that changed from the base of the PR and between de0a1e1 and f6b72b6.

📒 Files selected for processing (1)
  • .claude/skills/codebase-audit/SKILL.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (actions)
🧰 Additional context used
🪛 LanguageTool
.claude/skills/codebase-audit/SKILL.md

[style] ~3231-~3231: ‘make a decision’ might be wordy. Consider a shorter alternative.
Context: ...staring at 300+ raw findings and cannot make a decision. ### Phase 5 DEFAULT: Deduped Issue Li...

(EN_WORDINESS_PREMIUM_MAKE_A_DECISION)


[uncategorized] ~3496-~3496: Do not mix variants of the same word (‘finalize’ and ‘finalise’) within a single text.
Context: ...ns after triage / report-only output is finalized. Sweep agent-leaked scratch files from ...

(EN_WORD_COHERENCY)

Comment thread .claude/skills/codebase-audit/SKILL.md Outdated
Comment thread .claude/skills/codebase-audit/SKILL.md Outdated
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the codebase audit skill by implementing a mandatory cleanup phase, prohibiting scratch scripts and Bash redirects, and refining the 'unwired-settings' agent to minimize false positives. It also introduces a structured batching strategy for validation and a mandatory deduped issue list for triage. Feedback suggests improving shell command robustness by quoting variables, removing a prohibited stderr redirect in the cleanup script, and correcting the search logic to include the scripts directory as intended.

Comment thread .claude/skills/codebase-audit/SKILL.md Outdated
RUN_DIR="_audit/runs/$(date +%Y-%m-%d-%H%M%S)"
mkdir -p "$RUN_DIR/findings"
ln -sfn "runs/$(basename "$RUN_DIR")" _audit/latest
RUN_DIR="_audit/runs/$(date +%Y-%m-%d-%H%M%S)" && mkdir -p "$RUN_DIR/findings" && rm -f _audit/latest && ln -sfn "runs/$(basename $RUN_DIR)" _audit/latest && echo "RUN_DIR=$RUN_DIR"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The setup command has a regression where $RUN_DIR is no longer quoted inside the basename call. While the generated timestamp is unlikely to contain spaces, it is better practice to maintain quoting for robustness and consistency with the previous version of this skill.

Suggested change
RUN_DIR="_audit/runs/$(date +%Y-%m-%d-%H%M%S)" && mkdir -p "$RUN_DIR/findings" && rm -f _audit/latest && ln -sfn "runs/$(basename $RUN_DIR)" _audit/latest && echo "RUN_DIR=$RUN_DIR"
RUN_DIR="_audit/runs/$(date +%Y-%m-%d-%H%M%S)" && mkdir -p "$RUN_DIR/findings" && rm -f _audit/latest && ln -sfn "runs/$(basename "$RUN_DIR")" _audit/latest && echo "RUN_DIR=$RUN_DIR"

Comment thread .claude/skills/codebase-audit/SKILL.md Outdated

### Batching strategy (concrete)

Run `Bash for f in _audit/latest/findings/*.md; do count=$(grep -cE "^### (critical|high|medium|low|info)" "$f"); echo "$count $(basename $f)"; done | sort -rn` to produce a sorted list of (count, filename). Then build batches:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The batching command uses $(basename $f) without quotes. This is unsafe if any filenames in the findings directory contain spaces. Although the agents are instructed to use specific filenames, quoting variables in shell commands is a best practice for robustness.

Comment thread .claude/skills/codebase-audit/SKILL.md Outdated
Run this Bash sweep:

```bash
rm -f find_missing_logging.py find_missing_logging_filtered.py parse_audit.py validate_config_examples.py audit_diff.py audit_parity.py check_docs.py check_rate_limits.py circular_dep_analyzer.py check_protocols.py debug_scanner.py detailed_check.py final_audit.py find_unwired.py test_regex.py validate_configs.py verify_final.py verify_protocols.py 2>/dev/null || true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The cleanup command uses a stderr redirect (2>/dev/null), which violates Rule 11 (line 3536) and the setup notes in Phase 0 (line 47) that explicitly ban all Bash redirects. Per the instructions in Rule 11, you should let stderr surface or rely on rm -f being silent for non-existent files, combined with || true to ensure the command chain continues.

Suggested change
rm -f find_missing_logging.py find_missing_logging_filtered.py parse_audit.py validate_config_examples.py audit_diff.py audit_parity.py check_docs.py check_rate_limits.py circular_dep_analyzer.py check_protocols.py debug_scanner.py detailed_check.py final_audit.py find_unwired.py test_regex.py validate_configs.py verify_final.py verify_protocols.py 2>/dev/null || true
rm -f find_missing_logging.py find_missing_logging_filtered.py parse_audit.py validate_config_examples.py audit_diff.py audit_parity.py check_docs.py check_rate_limits.py circular_dep_analyzer.py check_protocols.py debug_scanner.py detailed_check.py final_audit.py find_unwired.py test_regex.py validate_configs.py verify_final.py verify_protocols.py || true

Comment thread .claude/skills/codebase-audit/SKILL.md Outdated
Then list and remove anything else suspicious:

```bash
find . -maxdepth 2 -name "*.py" -newer _audit/runs/$(ls -1 _audit/runs/ | tail -2 | head -1)/findings -not -path "./src/*" -not -path "./tests/*" -not -path "./web/*" -not -path "./cli/*" -not -path "./scripts/*"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The find command excludes the ./scripts/* path, which contradicts the subsequent notes (line 3512) stating that files inside scripts/ should be listed and the user prompted before removal. Since the description (line 3498) specifically mentions that scripts leaked into scripts/ during previous runs, this exclusion will prevent the cleanup phase from identifying those leaks.

Suggested change
find . -maxdepth 2 -name "*.py" -newer _audit/runs/$(ls -1 _audit/runs/ | tail -2 | head -1)/findings -not -path "./src/*" -not -path "./tests/*" -not -path "./web/*" -not -path "./cli/*" -not -path "./scripts/*"
find . -maxdepth 2 -name "*.py" -newer _audit/runs/$(ls -1 _audit/runs/ | tail -2 | head -1)/findings -not -path "./src/*" -not -path "./tests/*" -not -path "./web/*" -not -path "./cli/*"

- Phase 0 setup: handle Junction/real-directory case for _audit/latest
  via inline conditional (rm -rf if directory but not symlink, else rm -f).
  CodeRabbit flagged plain rm -f would abort the chain on directory targets.
- Phase 0 setup: quote $RUN_DIR in basename for defensive habit. Gemini.
- Phase 7 cleanup: drop the stderr-discard redirect (violates new Rule #11
  banning redirects). rm -f is silent on missing paths, so the redirect was
  redundant. CodeRabbit + Gemini.
- Phase 7 cleanup: drop -not -path './scripts/*' from find sweep so the
  scripts/audit_pydantic_models leakage detected last run actually surfaces.
  Gemini.
- Phase 7 batching strategy: quote $f in basename for filename safety. Gemini.

Pre-reviewed by 2 external agents (CodeRabbit, Gemini), all 4 findings
addressed.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.claude/skills/codebase-audit/SKILL.md (1)

2-2: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep Claude/OpenCode command metadata in sync (152 vs 155 agents).

This file now advertises 155 agents, but .opencode/commands/codebase-audit.md still says 152. That breaks the documented parity contract and can mislead users depending on entrypoint.

Suggested parity update
--- a/.opencode/commands/codebase-audit.md
+++ b/.opencode/commands/codebase-audit.md
-description: "Full codebase audit: launches 152 specialized agents to find issues across Python/React/Go/docs/website, writes findings to _audit/latest/findings/, then triages with user"
+description: "Full codebase audit: launches 155 specialized agents to find issues across Python/React/Go/docs/website, writes findings to _audit/latest/findings/, then triages with user"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/codebase-audit/SKILL.md at line 2, The SKILL description
advertises "Full codebase audit: launches 155 specialized agents..." but the
command metadata still lists 152; update the command metadata value that
currently reads "152" to "155" so the numeric agent count matches the
description string, and then verify both the description string and the command
metadata are identical (155) across the two files.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/codebase-audit/SKILL.md:
- Line 3127: The example command in the SKILL.md snippet incorrectly begins with
the literal token "Bash for f in ..." which is shell-invalid; update the example
so it either starts with a standard shell loop keyword (replace the leading
"Bash " with nothing so it begins "for f in ...") or show it executed via an
explicit shell invocation (e.g., prefix with a proper "bash -lc" invocation) and
ensure the rest of the pipeline (grep -cE, echo, sort -rn) and variable usage
(f, count) remain unchanged; modify the example line containing "Bash for f in
_audit/latest/findings/*.md; do count=$(...)" accordingly.
- Line 68: Update the gh query to include author metadata so Renovate-author
filtering can work: change the command string "gh issue list --state open
--limit 200 --json number,title,labels" to include author (e.g., --json
number,title,labels,author) and then apply the filter against author.login ===
"app/renovate" as described in feedback_open_issues_exclude_renovate.md; locate
usages of the literal "gh issue list --state open --limit 200 --json
number,title,labels" (and any helper that parses its JSON) and ensure they read
author.login before dropping issues authored by "app/renovate".
- Around line 3129-3134: The guidance is inconsistent: update the "Heavy files
(>25 findings)" bullet and its example so the example matches the stated chunk
size (~25) and target (~20-30). Locate the line starting with "Heavy files (>25
findings): dedicate 1 batch per ~25-finding chunk." and replace the example "a
107-finding file splits into batches of 35 + 36 + 36." with a correctly summed
split that uses ~25-finding chunks (for example "a 107-finding file splits into
batches of 25 + 25 + 25 + 32.") so the numbers align with the "~25" chunk
guidance and the overall "~20-30" target.
- Line 3556: The lesson note incorrectly states that `rm -f _audit/latest` fixes
symlink collisions; update the note to explain and show the robust,
platform-aware approach: when relinking `_audit/latest` handle symlinks, files,
and directory junctions separately (e.g., check if `_audit/latest` is a symlink
and unlink it, if it's a directory/junction remove it with rmdir/rm -rf only
after verifying it's not a real audit dir) and mention that `ln -sfn` alone can
fail on Windows junctions—reference the existing `ln -sfn` and `rm -f
_audit/latest` phrasing and the "Phase 0 setup command" so readers replace the
old single `rm -f` advice with the conditional removal pattern appropriate for
their OS.

---

Outside diff comments:
In @.claude/skills/codebase-audit/SKILL.md:
- Line 2: The SKILL description advertises "Full codebase audit: launches 155
specialized agents..." but the command metadata still lists 152; update the
command metadata value that currently reads "152" to "155" so the numeric agent
count matches the description string, and then verify both the description
string and the command metadata are identical (155) across the two files.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 379acbe4-f492-4f2f-a062-509e336287f8

📥 Commits

Reviewing files that changed from the base of the PR and between f6b72b6 and 8585a58.

📒 Files selected for processing (1)
  • .claude/skills/codebase-audit/SKILL.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
🪛 LanguageTool
.claude/skills/codebase-audit/SKILL.md

[style] ~3232-~3232: ‘make a decision’ might be wordy. Consider a shorter alternative.
Context: ...staring at 300+ raw findings and cannot make a decision. ### Phase 5 DEFAULT: Deduped Issue Li...

(EN_WORDINESS_PREMIUM_MAKE_A_DECISION)


[uncategorized] ~3497-~3497: Do not mix variants of the same word (‘finalize’ and ‘finalise’) within a single text.
Context: ...ns after triage / report-only output is finalized. Sweep agent-leaked scratch files from ...

(EN_WORD_COHERENCY)

Comment thread .claude/skills/codebase-audit/SKILL.md Outdated
Comment thread .claude/skills/codebase-audit/SKILL.md Outdated
Comment thread .claude/skills/codebase-audit/SKILL.md Outdated
Comment thread .claude/skills/codebase-audit/SKILL.md Outdated
- Phase 1 issue list: include author in --json fields so the Renovate
  filter (drops author.login == 'app/renovate') has metadata to match on.
  Without --json author the filter is a no-op. CodeRabbit.
- Phase 7 batching strategy: drop the literal leading 'Bash' token from
  the example command (was 'Run `Bash for f in ...`'); shell-invalid as
  written. Reformatted as a proper fenced bash block. CodeRabbit.
- Phase 7 batching strategy: align heavy-files example with stated
  ~25-finding chunk size. Was '107 splits into 35+36+36'; now '107 splits
  into 25+25+25+32' (4 batches). CodeRabbit.
- Lessons section: rewrite the symlink-collision note to describe the
  actual conditional fix landed in round-1 (handles all 3 states: symlink,
  real-dir/Junction, missing) instead of the obsolete 'rm -f alone' claim.
  CodeRabbit.
- .opencode mirror: bump '152' to '155' in description and adapter prose
  to match SKILL.md. CodeRabbit.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/codebase-audit/SKILL.md:
- Line 3503: The sentence "Runs after triage / report-only output is finalized.
Sweep agent-leaked scratch files from the working tree." uses US spelling;
update it to British English by changing "finalized" to "finalised" so it reads
"Runs after triage / report-only output is finalised..." and ensure this
spelling aligns with the project's internationalisation convention (see the
string containing "Runs after triage / report-only output is finalized").
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7953b614-4e50-4b8e-8d98-c4ac9ca353c2

📥 Commits

Reviewing files that changed from the base of the PR and between 8585a58 and dd746be.

📒 Files selected for processing (2)
  • .claude/skills/codebase-audit/SKILL.md
  • .opencode/commands/codebase-audit.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
🪛 LanguageTool
.claude/skills/codebase-audit/SKILL.md

[style] ~3238-~3238: ‘make a decision’ might be wordy. Consider a shorter alternative.
Context: ...staring at 300+ raw findings and cannot make a decision. ### Phase 5 DEFAULT: Deduped Issue Li...

(EN_WORDINESS_PREMIUM_MAKE_A_DECISION)


[uncategorized] ~3503-~3503: Do not mix variants of the same word (‘finalize’ and ‘finalise’) within a single text.
Context: ...ns after triage / report-only output is finalized. Sweep agent-leaked scratch files from ...

(EN_WORD_COHERENCY)

🔇 Additional comments (2)
.claude/skills/codebase-audit/SKILL.md (1)

42-42: Past review feedback genuinely addressed.

Verified that all 6 items flagged in previous reviews are now correctly fixed in the current code:

  • Line 42: Conditional _audit/latest removal handles symlink vs directory/junction
  • Line 68: gh issue list includes --json ...author for Renovate filtering
  • Line 3130: Bash loop starts with for f in (no invalid leading token)
  • Line 3135: Batch example math correct (25+25+25+32 = 107)
  • Line 3510: Cleanup command uses || true, no stderr redirect
  • Line 3562: Lesson note documents the full conditional-removal approach

Also applies to: 68-68, 3130-3130, 3135-3135, 3510-3510, 3562-3562

.opencode/commands/codebase-audit.md (1)

2-2: LGTM: Agent count updated consistently.

The adapter correctly reflects the skill's updated agent count (155) in both the description and the spawning instructions. The change aligns with the main skill file's expansion from 152 to 155 agents.

Also applies to: 11-11

Comment thread .claude/skills/codebase-audit/SKILL.md Outdated
Comment thread .claude/skills/codebase-audit/SKILL.md
- Phase 7: 'finalized' (US spelling) becomes 'finalised' (UK). Project
  ships in British English per docs/design/internationalization.md and
  the internal Agent 37 directive. CodeRabbit + LanguageTool flagged.
- Phase 7: harden the find -newer command for first/second-run cases.
  Was 'tail -2 | head -1' which compares against itself on first run
  (no findings) and is fragile in general. Now 'ls -1t | sed -n 2p'
  picks the second-newest run by mtime, the '[ -n $prev_run ] &&'
  guard short-circuits the first-run case where no previous run
  exists, and the path is now quoted. CodeRabbit.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
.claude/skills/codebase-audit/SKILL.md (1)

3518-3523: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

First-run cleanup probe can still return non-zero and fail the phase.

When there is no previous run, [ -n "$prev_run" ] returns exit code 1; in Bash-tool execution this is treated as failure even though “no previous run” should be a valid no-op path.

Suggested robust command form
-prev_run=$(ls -1t _audit/runs/ | sed -n '2p') && [ -n "$prev_run" ] && find . -maxdepth 2 -name "*.py" -newer "_audit/runs/$prev_run/findings" -not -path "./src/*" -not -path "./tests/*" -not -path "./web/*" -not -path "./cli/*"
+prev_run=$(ls -1t _audit/runs/ | sed -n '2p'); if [ -n "$prev_run" ]; then find . -maxdepth 2 -name "*.py" -newer "_audit/runs/$prev_run/findings" -not -path "./src/*" -not -path "./tests/*" -not -path "./web/*" -not -path "./cli/*"; fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/codebase-audit/SKILL.md around lines 3518 - 3523, The current
one-liner using prev_run=$(...) && [ -n "$prev_run" ] && find ... can return a
non-zero exit when there is no previous run; change the logic to explicitly
handle the "no previous run" case so the step returns success: after computing
prev_run, use an if/then check (if [ -n "$prev_run" ]; then run the find command
referencing "_audit/runs/$prev_run/findings"; fi) or ensure the entire probe
always exits zero by appending a safe fallback (e.g., run the find only when
prev_run is non-empty or wrap the test in a construct that does not propagate a
non-zero exit). Reference the prev_run variable and the find invocation to
locate where to apply this defensive check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/codebase-audit/SKILL.md:
- Line 53: Update the `.gitignore` check to match entries with optional
leading/trailing slashes so entries like `/_audit/` aren't missed: replace the
current grep pattern that only anchors `^_audit` with a regex that allows an
optional leading slash and either a trailing slash or end-of-line (i.e., match
`/_audit/`, `_audit/`, or `_audit`), and update the doc line that runs the grep
accordingly; refer to the `_audit` token and the `.gitignore` check in the
SKILL.md content when making the change.

---

Duplicate comments:
In @.claude/skills/codebase-audit/SKILL.md:
- Around line 3518-3523: The current one-liner using prev_run=$(...) && [ -n
"$prev_run" ] && find ... can return a non-zero exit when there is no previous
run; change the logic to explicitly handle the "no previous run" case so the
step returns success: after computing prev_run, use an if/then check (if [ -n
"$prev_run" ]; then run the find command referencing
"_audit/runs/$prev_run/findings"; fi) or ensure the entire probe always exits
zero by appending a safe fallback (e.g., run the find only when prev_run is
non-empty or wrap the test in a construct that does not propagate a non-zero
exit). Reference the prev_run variable and the find invocation to locate where
to apply this defensive check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 039dfad0-1a14-4acc-bf27-a010aaa8d237

📥 Commits

Reviewing files that changed from the base of the PR and between dd746be and 4a1d5e1.

📒 Files selected for processing (1)
  • .claude/skills/codebase-audit/SKILL.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (python)
🧰 Additional context used
🪛 LanguageTool
.claude/skills/codebase-audit/SKILL.md

[style] ~3238-~3238: ‘make a decision’ might be wordy. Consider a shorter alternative.
Context: ...staring at 300+ raw findings and cannot make a decision. ### Phase 5 DEFAULT: Deduped Issue Li...

(EN_WORDINESS_PREMIUM_MAKE_A_DECISION)

🔇 Additional comments (1)
.claude/skills/codebase-audit/SKILL.md (1)

401-442: Strong FP guardrail rewrite for Agent 09.

The six consumption-pattern checklist plus mandatory multi-search verification is a solid improvement and should materially reduce false positives in unwired-settings.

Comment thread .claude/skills/codebase-audit/SKILL.md Outdated
- Phase 0 .gitignore check: broaden grep pattern from '^_audit'
  (which misses '/_audit/' style entries) to a regex that matches
  all four common forms ('_audit', '_audit/', '/_audit', '/_audit/').
  Prevents false-negative duplicate-add actions. CodeRabbit.
@Aureliolo Aureliolo merged commit 7281cd7 into main May 3, 2026
41 checks passed
@Aureliolo Aureliolo deleted the chore/codebase-audit-skill-improvements branch May 3, 2026 16:50
Aureliolo added a commit that referenced this pull request May 3, 2026
)

## Summary

Adds Rules **R-A through R-E** to the `/codebase-audit` agent-prompt
template. Each is grounded in a specific FP category observed in the
2026-05-03 audit run (12 categories tracked during the post-run
walk-through). Skill-only change; no source code touched.

## Why

After PR #1732 (round-1 fixes), four agents still produced FPs at
significant rates because their prompts shared the same underlying gaps:

- Agents flag patterns without verifying the surrounding state (existing
fixes, docstring carve-outs, caller distribution, semantic equivalence).
- Convention-scope findings cite N cases without estimating total
population, so the user can't choose between surgical fix and convention
rollout.
- Mock-drift / API-duplication findings calibrate severity without
sampling actual usage.

These five rules are mandatory for every audit agent.

## The 5 rules

### R-A: Verify the proposed fix isn't already in place

Catches:
- Memory-leak findings whose cleanup hook IS already registered (agent
97 Site 2 was a FP because `cancelPendingPersist` is in
`test-setup.tsx:272`).
- NotImplementedError-in-mixin findings overridden in concrete subclass
(agent 18 had 2/3 sample FP rate).
- **Ghost-wired settings (most severe class)**: agent 09 flagged 13
settings dead; 11 of 13 were consumed by living machinery whose owning
service was simply never instantiated at boot. Import-graph trace missed
this entirely. The new rule requires tracing through
`lifecycle_helpers.py` / `app.py` startup wiring.

### R-B: Read existing helper docstrings for design carve-outs

Catches: agent 138 flagged 5 inline retry loops as needing
centralisation, but `GeneralRetryHandler`'s module docstring explicitly
carved out 4 of the 5 (semantic self-correction, contention,
sync-thread). Reading the docstring catches design intent.

### R-C: Scope estimate alongside cited cases

Catches: agent 34 cited 12 plain-Exception classes; actual project-wide
population is ~80-120 across 19 modules. User's
surgical-vs-convention-rollout decision depends on knowing the gap. New
rule: agents emit "12 cited; sample suggests ~80-120 across N modules".

### R-D: Semantic equivalence before flagging duplicates

Catches: agent 110 flagged a pairwise-compare cluster as duplicate when
one used field-equality and the other used text-similarity (identical
iteration, divergent inner predicate). Agent 146 conflated intentional
bootstrap multi-surface settings with accidental duplication (4 of 8
flagged settings were intentional).

### R-E: Caller-context distribution for API duplication findings

Catches: AuthService mixed sync/async API was high-severity in raw
findings; actual distribution was 100% production-async +
test-fixture-sync, which made the choice obvious. New rule: agents emit
caller distribution before recommending the fix shape.

## Test plan

Skill is docs-only. Validated by:
- Reading the 5 new rules end-to-end for clarity + grounding.
- Each rule cites the concrete 2026-05-03 example so future agents can
recognise the pattern.

## Out of scope

A separate master tracking issue (filed alongside this PR) captures the
remaining session learnings: 5 follow-up issues for larger workstreams
(config layering, error-handling convention rollout, no-hardcoded-values
rule, bootstrap-wiring lint, scoring research), Group 1+2+3 decisions
with research-backed rationale, and the meta-process plan (scaffolding
templates, pre-PR mini-audit, worktree do-not-introduce,
convention-must-ship-its-gate).

## Closes

No linked issue (skill self-improvement).
Aureliolo pushed a commit that referenced this pull request May 4, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Release notes now include AI-generated Highlights and commit
subject/body in dev releases.

### What's new
- Trace added to detect ghost-wired settings for better linting
feedback.

### Under the hood
- Docker publish step now retries transient GHCR errors to improve
release stability.
- Codebase audit incorporated 21 mechanical fixes from recent reviews to
improve quality.
- Added false-positive prevention rules to codebase audit prompts to
reduce errors.
- Enhanced codebase audit skill based on insights from latest audits.
- Web component tightened async-leak detection and improved jsdom
Storage timer handling.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


##
[0.7.9](v0.7.8...v0.7.9)
(2026-05-04)


### Features

* **ci:** promote AI Highlights to release body, add commit subject/body
to dev releases
([#1743](#1743))
([70bffe9](70bffe9))
* **lint:** bootstrap-wiring trace for ghost-wired settings
([#1742](#1742))
([6ddbb86](6ddbb86)),
closes [#1737](#1737)


### Bug Fixes

* **ci:** retry transient GHCR errors in docker publish + retag
([#1741](#1741))
([18e5349](18e5349))


### Maintenance

* 2026-05-03 Bucket A audit
([#1733](#1733)): 21
mechanical fixes
([#1744](#1744))
([334262b](334262b))
* add 5 FP-prevention rules to /codebase-audit agent prompts
([#1734](#1734))
([8f33ad6](8f33ad6))
* improve codebase-audit skill from 2026-05-03 lessons
([#1732](#1732))
([7281cd7](7281cd7))
* **web:** tighten async-leak ceiling, bypass jsdom Storage timer
dispatch ([#1728](#1728))
([de0a1e1](de0a1e1))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant