-
Notifications
You must be signed in to change notification settings - Fork 1
chore: add /codebase-audit skill for deep parallel codebase auditing #613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
90c61dd
e16565d
33538ec
e80e60d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,395 @@ | ||
| --- | ||
| description: "Deep codebase audit: launches specialized parallel agents to find issues, validates findings, groups into work packages, and creates GitHub issues" | ||
| argument-hint: "<scope: full | src/ | web/ | cli/ | docs/ | .github/> [--report-only] [--skip-issues]" | ||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||
| allowed-tools: ["Agent", "Bash", "Read", "Glob", "Grep", "Edit", "Write", "WebFetch", "WebSearch", "AskUserQuestion", "mcp__github__issue_write", "mcp__github__issue_read", "mcp__github__list_issues", "mcp__github__search_issues"] | ||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||
| --- | ||
|
|
||
| # /codebase-audit -- Deep Codebase Audit | ||
|
|
||
| Launch a swarm of specialized agents to find issues across the entire codebase (or a targeted scope), validate all findings against actual code, group into developer-friendly work packages, and optionally create GitHub issues. | ||
|
|
||
| ## Key Principles (from battle-tested sessions) | ||
|
|
||
| 1. **Never present unvalidated findings** -- validation is mandatory before ANY output to the user | ||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||
| 2. **Research architecture BEFORE auditing** -- agents that don't understand the system produce false positives | ||
| 3. **Skepticism is required** -- "100% clean" results are suspicious and trigger deeper investigation | ||
| 4. **Group by code proximity, NOT severity** -- work packages are what a developer would naturally fix together | ||
| 5. **No meta/tracking issues** -- every finding is a real issue or part of a real work package | ||
| 6. **Existing issue dedup happens TWICE** -- once in agent prompts, once after validation | ||
| 7. **Fix everything valid** -- no deferring, no "out of scope", no "future work" | ||
|
|
||
| --- | ||
|
|
||
| ## Phase 0: Parse Arguments & Determine Scope | ||
|
|
||
| Parse the user's argument to determine audit scope: | ||
|
|
||
| | Argument | Scope | Agent Categories | | ||
| |----------|-------|------------------| | ||
| | `full` (default) | Entire codebase | All categories | | ||
| | `src/` or `src/synthorg/` | Python backend only | Python-focused categories | | ||
| | `web/` | Vue dashboard only | Frontend categories | | ||
| | `cli/` | Go CLI only | Go categories | | ||
| | `docs/` or `site/` | Documentation/site | Docs/content categories | | ||
| | `.github/` or `ci` | CI/CD only | CI/workflow categories | | ||
| | `--report-only` | Any scope | Skip issue creation, report only | | ||
| | `--skip-issues` | Any scope | Same as --report-only | | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
|
||
| If no argument given, default to `full`. | ||
|
|
||
| --- | ||
|
|
||
| ## Phase 1: Gather Context | ||
|
|
||
| **This phase is CRITICAL. Agents without context produce false positives.** | ||
|
|
||
| ### Step 1a: Fetch existing GitHub issues | ||
|
|
||
| ```bash | ||
| gh issue list --repo OWNER/REPO --state open --limit 200 --json number,title,labels | ||
| ``` | ||
|
|
||
| Parse into a compact reference list: `#N: title [labels]`. This list is passed to EVERY audit agent. | ||
|
|
||
| ### Step 1b: Research project architecture | ||
|
|
||
| Read key architectural files to build context that agents need. At minimum: | ||
|
|
||
| 1. **CLAUDE.md** (already in context) -- project conventions, code standards, testing rules | ||
| 2. **Observability stack** -- read `src/synthorg/observability/__init__.py`, `_logger.py`, `sinks.py`, `setup.py`, `correlation.py` to understand logging architecture, sink routing, correlation ID system | ||
| 3. **DI/wiring** -- read `src/synthorg/api/auto_wire.py` and `src/synthorg/api/lifecycle.py` to understand service initialization | ||
| 4. **Testing setup** -- read `conftest.py` files, `pyproject.toml` test config section | ||
| 5. **Design spec pointer** -- read `docs/DESIGN_SPEC.md` to know which spec pages exist | ||
|
|
||
| Produce a **Architecture Brief** (200-400 words) summarizing: | ||
| - Logging: how it works, sink routing rules, correlation IDs | ||
| - DI: how services are wired, lifecycle phases | ||
| - Testing: markers, parallelism, async mode, coverage requirements | ||
| - Key conventions: immutability, error handling, vendor-agnostic naming | ||
|
|
||
| This brief is injected into every agent's prompt. | ||
|
|
||
| ### Step 1c: Identify scope-specific files | ||
|
|
||
| If scope is targeted (not `full`), glob the target directory to understand what's there. | ||
|
|
||
| --- | ||
|
|
||
| ## Phase 2: Select & Launch Audit Agents | ||
|
|
||
| ### Agent Roster | ||
|
|
||
| Select agents based on scope. Each agent searches for ONE type of issue only. | ||
|
|
||
| #### Python Backend Agents (scope includes `src/`) | ||
|
|
||
| | Agent | What It Searches For | | ||
| |-------|---------------------| | ||
| | `missing-logging` | Business logic modules without `get_logger`, error paths that don't log before raising, state transitions without INFO logging, missing DEBUG at decision points | | ||
| | `event-constants` | Log calls using raw strings instead of event constants from `observability/events/` | | ||
| | `silent-errors` | Bare `except:`, `except Exception: pass`, catch blocks that swallow without logging | | ||
| | `test-coverage` | Public modules with no corresponding test file, empty test files | | ||
| | `flaky-tests` | Unmocked time, real asyncio.sleep in tests, timing-dependent assertions, skipped tests | | ||
| | `wiring-lifecycle` | Incorrectly wired services, missing DI, lifecycle gaps, protocol implementations incomplete | | ||
| | `security-gaps` | Hardcoded secrets, missing auth guards, injection vectors, SSRF, XSS | | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The Consider either:
|
||
| | `dead-code` | Unreachable functions, unused imports, orphaned modules | | ||
| | `todo-fixme` | Unresolved TODOs that should be tracked as issues | | ||
| | `spec-drift` | Implementation diverging from design spec behavior | | ||
| | `api-consistency` | REST endpoint issues: wrong status codes, missing validation, inconsistent patterns | | ||
| | `async-patterns` | Bare create_task, missing await, blocking in async, race conditions | | ||
| | `immutability` | Mutable defaults, in-place mutation of frozen models, missing deepcopy | | ||
| | `missing-validation` | System boundary inputs without validation (API params, config loading, external data) | | ||
| | `type-hints` | Missing return types, bare Any, missing NotBlankStr on identifiers | | ||
| | `vendor-names` | Real vendor names used where generic names should be (per CLAUDE.md rules) | | ||
| | `observability-gaps` | Sink routing gaps, correlation ID propagation drops, missing event constant modules | | ||
|
|
||
| #### Frontend Agents (scope includes `web/`) | ||
|
|
||
| | Agent | What It Searches For | | ||
| |-------|---------------------| | ||
| | `vue-dashboard` | Broken API refs, missing error handling, console.log in prod, TypeScript gaps, a11y | | ||
|
|
||
| #### Go CLI Agents (scope includes `cli/`) | ||
|
|
||
| | Agent | What It Searches For | | ||
| |-------|---------------------| | ||
| | `go-cli` | Ignored errors, resource leaks, missing error wrapping, cross-platform issues | | ||
|
|
||
| #### Infrastructure Agents (scope includes `.github/` or `docker/`) | ||
|
|
||
| | Agent | What It Searches For | | ||
| |-------|---------------------| | ||
| | `docker-infra` | Dockerfile issues, compose config, port security, healthchecks | | ||
| | `ci-workflows` | Missing timeouts, script injection, permissions gaps, silent failures | | ||
|
|
||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||
| #### Documentation Agents (scope includes `docs/` or `site/`) | ||
|
|
||
| | Agent | What It Searches For | | ||
| |-------|---------------------| | ||
| | `docs-consistency` | Broken links, outdated info, wrong commands, inconsistent terminology | | ||
| | `landing-site` | SEO gaps, broken links, a11y issues, missing error pages | | ||
|
|
||
| #### Cross-Cutting Agents (always included) | ||
|
|
||
| | Agent | What It Searches For | | ||
| |-------|---------------------| | ||
| | `dependency-issues` | Unused deps, missing deps, version conflicts across all package managers | | ||
| | `docstring-gaps` | Public classes/functions missing Google-style docstrings | | ||
|
|
||
| ### Agent Prompt Template | ||
|
|
||
| Every agent receives this structure: | ||
|
|
||
| ``` | ||
|
coderabbitai[bot] marked this conversation as resolved.
Outdated
|
||
| ## Task | ||
| You are searching the codebase for ONE specific type of issue: {ISSUE_TYPE}. | ||
|
|
||
| ## Architecture Context | ||
| {ARCHITECTURE_BRIEF from Phase 1b} | ||
|
|
||
| ## Existing Open Issues (do NOT report these) | ||
| {ISSUE_LIST from Phase 1a} | ||
|
|
||
| ## Scope | ||
| Search: {SCOPE_DIRECTORIES} | ||
|
|
||
| ## Rules | ||
| 1. For each finding, report: file path, line number, what's wrong, operational impact | ||
| 2. Cross-reference against the existing issues list -- only report what's NOT already tracked | ||
| 3. BE SKEPTICAL of your own findings -- verify each one by reading the actual code | ||
| 4. If you find ZERO issues, state that clearly but also explain what you checked | ||
| 5. Do NOT flag things that are intentional design decisions (check comments, docstrings) | ||
| 6. Rate each finding: CONFIRMED (verified in code) or LIKELY (needs validation) | ||
|
|
||
| ## What to Search For | ||
| {CATEGORY-SPECIFIC INSTRUCTIONS} | ||
| ``` | ||
|
|
||
| ### Launch | ||
|
|
||
| Launch ALL selected agents in parallel using the Agent tool with `run_in_background: true`. Give each a descriptive `name` for tracking. | ||
|
|
||
| Track agent count and report to user: "Launched N audit agents in parallel. Waiting for results..." | ||
|
|
||
|
Comment on lines
+171
to
+174
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Bound parallelism instead of launching all agents at once. Line 172’s “Launch ALL selected agents in parallel” can overload local resources and hit GitHub/API rate limits, causing partial or failed audits. Add a bounded worker pool (for example, max 6-10 concurrent agents) with queueing and retry/backoff. Suggested update-Launch ALL selected agents in parallel using the Agent tool with `run_in_background: true`. Give each a descriptive `name` for tracking.
+Launch selected agents using bounded parallelism (worker pool) with `run_in_background: true`.
+Recommended: max 6-10 concurrent agents, queue the rest, and apply retry/backoff on transient failures.
+Give each agent a descriptive `name` for tracking.🤖 Prompt for AI Agents |
||
| --- | ||
|
|
||
| ## Phase 3: Collect & Deduplicate | ||
|
|
||
| As agents complete, collect their findings. Once ALL agents have reported: | ||
|
|
||
| ### Step 3a: Check for suspicious clean results | ||
|
|
||
| If any agent reported zero findings, flag it: | ||
| - "Agent `{name}` found zero issues. This may be accurate or the agent may have been too shallow." | ||
| - These categories are candidates for Phase 5 (deep dive). | ||
|
|
||
| ### Step 3b: Merge all findings into a single list | ||
|
|
||
| Combine findings from all agents into one flat list with columns: | ||
| - Source agent | ||
| - File path : line number | ||
| - Category | ||
| - Description | ||
| - Agent's self-assessed confidence (CONFIRMED / LIKELY) | ||
|
|
||
| ### Step 3c: Deduplicate | ||
|
|
||
| - Multiple agents may flag the same line/issue (e.g., security + validation both flag missing input checks) | ||
| - Merge duplicates, keep the most detailed description | ||
| - Remove findings that match existing open GitHub issues (by file path + description similarity) | ||
|
|
||
| --- | ||
|
|
||
| ## Phase 4: Validate Findings | ||
|
|
||
| **MANDATORY. Never skip this phase.** | ||
|
|
||
| Launch validation agents in parallel. Each validation agent gets a batch of 8-12 findings and is instructed to: | ||
|
|
||
| 1. Read the actual source file at the reported line number | ||
| 2. Verify the issue exists as described | ||
| 3. Check if the "issue" is actually intentional (read comments, docstrings, related code) | ||
| 4. Check if CI/build/tests handle it in a way the audit agent missed | ||
| 5. Classify each finding: | ||
| - **CONFIRMED** -- verified in code, real issue | ||
| - **LIKELY CONFIRMED** -- code suggests the issue but edge case unclear | ||
| - **LIKELY FALSE** -- probably not a real issue (explain why) | ||
| - **FALSE POSITIVE** -- definitely not an issue (explain why) | ||
| 6. For intentional patterns (e.g., graceful shutdown error swallowing), mark as "CONFIRMED but INTENTIONAL" -- these are excluded from work packages | ||
|
|
||
| ### Validation Agent Prompt Template | ||
|
|
||
| ``` | ||
| Validate these audit findings by reading the ACTUAL SOURCE CODE. | ||
| For each finding, determine: CONFIRMED, LIKELY CONFIRMED, LIKELY FALSE, or FALSE POSITIVE. | ||
|
|
||
| For each: | ||
| 1. Read the file at the reported line number | ||
| 2. Quote the actual code | ||
| 3. Check if it's intentional (read surrounding comments, docstrings) | ||
| 4. Check if CI, tests, or build pipelines handle it | ||
| 5. Give a clear verdict with evidence | ||
|
|
||
| {BATCH OF FINDINGS} | ||
| ``` | ||
|
|
||
| ### After validation | ||
|
|
||
| - Remove all FALSE POSITIVE and LIKELY FALSE findings | ||
| - Keep CONFIRMED and LIKELY CONFIRMED | ||
| - Mark CONFIRMED-but-INTENTIONAL as excluded (note in report but don't create issues) | ||
| - Calculate false positive rate: `removed / total` | ||
| - Report: "Validated N findings. Removed M false positives (X%). N remaining confirmed findings." | ||
|
|
||
| --- | ||
|
|
||
| ## Phase 5: Deep Dive on Suspicious Clean Results | ||
|
|
||
| For each audit category that found ZERO issues in Phase 2: | ||
|
|
||
| 1. **Research the relevant architecture first** -- read the actual implementation files to understand how the system works | ||
| 2. **Craft a targeted, informed prompt** -- include specific architectural details (e.g., "the observability stack uses structlog with 8 sinks routed by logger name prefix via _SINK_ROUTING in sinks.py") | ||
| 3. **Launch a second agent** with the enriched prompt and explicit instructions: "The first audit found nothing. Dig deeper. Check specific functions, look for subtle gaps, verify edge cases." | ||
| 4. **Validate any new findings** (same as Phase 4) | ||
| 5. Add validated findings to the main list | ||
|
|
||
| Skip this phase if the user passed `--quick` or if the zero-finding categories are genuinely well-covered (e.g., dependencies audit finding nothing is believable). | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
|
||
| --- | ||
|
|
||
| ## Phase 6: Present Validated Findings | ||
|
|
||
| Present the validated, deduplicated findings to the user. Format: | ||
|
|
||
| ### Summary Table | ||
|
|
||
| ``` | ||
| | # | Finding | File:Line | Category | Verdict | | ||
| |---|---------|-----------|----------|---------| | ||
| | 1 | Description | path:123 | category | CONFIRMED | | ||
| | ... | ... | ... | ... | ... | | ||
| ``` | ||
|
|
||
| ### Statistics | ||
|
|
||
| - Total findings: N | ||
| - False positives removed: M (X%) | ||
| - Confirmed: N1, Likely confirmed: N2 | ||
| - Intentional (excluded): N3 | ||
| - Categories with zero findings: list | ||
|
|
||
| ### User Gate | ||
|
|
||
| Ask the user: | ||
| 1. **"Proceed to group into work packages and create issues" (Recommended)** | ||
| 2. "Show me the full detail for each finding first" | ||
| 3. "Export as markdown report only (no issues)" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The user gate offers an option to "create issues", but this doesn't account for the user having already passed the |
||
|
|
||
| --- | ||
|
|
||
| ## Phase 7: Group into Work Packages | ||
|
|
||
| **Group by code proximity, NOT by severity.** | ||
|
|
||
| ### Grouping Rules | ||
|
|
||
| 1. **Same directory/module** -- findings touching the same `src/synthorg/<module>/` go together | ||
| 2. **Same file** -- multiple findings in one file always go in the same package | ||
| 3. **Dependency chain** -- if fixing A requires fixing B first, bundle them | ||
| 4. **Same developer context** -- what would a developer naturally fix in one sitting? | ||
| 5. **Target medium scope** -- each package should be a meaningful PR, not too small (1 finding) or too large (15+ findings) | ||
| 6. **Never group by severity** -- a HIGH and LOW in the same file go together; a HIGH and HIGH in different modules do NOT | ||
|
|
||
| ### Common Groupings | ||
|
|
||
| These patterns recur across audits: | ||
| - **API controller sweep** -- validation, response patterns, auth hardening (all in `api/controllers/`) | ||
| - **Observability fixes** -- sink routing, correlation, event constants (all in `observability/`) | ||
| - **Test quality** -- flaky fixes + missing coverage (all in `tests/`) | ||
| - **CI hardening** -- timeouts, permissions, script safety (all in `.github/`) | ||
| - **Documentation** -- content fixes across `docs/` and `site/` | ||
| - **Language-specific** -- Go fixes together, Vue fixes together | ||
|
|
||
| ### Standalone features | ||
|
|
||
| If a finding is a "feature not yet implemented" (spec drift with TODO/stub), it can be its own issue if medium+ scope. Do NOT create meta/tracking issues -- each issue must be implementable on its own. | ||
|
|
||
| ### Present to User | ||
|
|
||
| Show the proposed work packages: | ||
|
|
||
| ``` | ||
| ## Proposed Work Packages (N total) | ||
|
|
||
| ### WP1: Name | ||
| | # | Finding | | ||
| |---|---------| | ||
| | 1 | ... | | ||
| | 2 | ... | | ||
| **Rationale:** Why these go together. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
|
||
| ### WP2: Name | ||
| ... | ||
| ``` | ||
|
|
||
| Ask: "Create issues for all N work packages? Or adjust groupings first?" | ||
|
|
||
| --- | ||
|
|
||
| ## Phase 8: Final Issue Dedup & Creation | ||
|
|
||
| ### Step 8a: Final dedup against existing issues | ||
|
|
||
| Before creating, do one final check: | ||
|
|
||
| ```bash | ||
| gh issue list --repo OWNER/REPO --state open --limit 200 --json number,title,labels | ||
| ``` | ||
|
|
||
| For each work package, search for title/description overlap with existing issues. If a finding is already covered by an existing issue, either: | ||
| - Remove it from the work package | ||
| - Note "extends #NNN" in the new issue body | ||
|
|
||
| ### Step 8b: Create issues | ||
|
|
||
| For each work package, create a GitHub issue with: | ||
|
|
||
| - **Title**: `<type>: <concise description>` (matching commit convention: fix, feat, chore, docs, test) | ||
| - **Body**: | ||
| - `## Summary` -- 1-2 sentences | ||
| - `## Findings` -- table of findings with file:line, description | ||
| - `## Files to Modify` -- list of files that need changes | ||
| - Design spec references if applicable | ||
| - **Labels**: appropriate type/scope/spec labels | ||
|
|
||
| Use the `mcp__github__issue_write` tool or `gh issue create` via Bash. | ||
|
|
||
| **IMPORTANT**: Never use em-dashes or non-ASCII punctuation in issue bodies (project convention). | ||
|
|
||
| ### Step 8c: Report | ||
|
|
||
| Present the complete list of created issues: | ||
|
|
||
| ``` | ||
| | WP | Issue | Title | | ||
| |----|-------|-------| | ||
| | 1 | #NNN | ... | | ||
| | ... | ... | ... | | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Rules | ||
|
|
||
| 1. **NEVER present unvalidated findings to the user.** Validation (Phase 4) is mandatory. | ||
| 2. **ALWAYS research architecture before auditing.** Phase 1b is not optional. | ||
| 3. **Be skeptical of clean results.** Zero findings triggers Phase 5 deep dive. | ||
| 4. **Group by code proximity, NEVER by severity.** What files does a developer touch together? | ||
| 5. **No meta/tracking issues.** Every issue must be directly implementable. | ||
| 6. **Dedup twice.** Once in agent prompts (Phase 2), once before issue creation (Phase 8a). | ||
| 7. **All agents run in parallel.** Never launch agents sequentially when they're independent. | ||
| 8. **Agent prompts include architecture context.** Never launch a "blind" agent. | ||
| 9. **Intentional patterns are not bugs.** Graceful shutdown error swallowing, defensive cleanup, etc. are valid patterns -- exclude from issues but note in report. | ||
| 10. **Respect project conventions.** Read CLAUDE.md, use correct commands (`uv run python -m pytest`, not `uv run pytest`), no vendor names, etc. | ||
| 11. **Default to creating issues.** Unless user passes `--report-only`, the skill creates issues. | ||
| 12. **Never push code.** This skill audits and creates issues -- it does not fix code. | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To improve clarity and avoid redundancy, it's best to have a single flag for a given behavior. Both
--report-onlyand--skip-issues(defined on line 36) have the same effect. I recommend removing--skip-issuesand using--report-onlyas the canonical flag.