diff --git a/.claude/skills/aurelio-review-pr/SKILL.md b/.claude/skills/aurelio-review-pr/SKILL.md index abe7d5e76a..2f9a1d8e8c 100644 --- a/.claude/skills/aurelio-review-pr/SKILL.md +++ b/.claude/skills/aurelio-review-pr/SKILL.md @@ -134,19 +134,42 @@ gh pr diff NUMBER --name-only git diff main --name-only ``` +**Categorize changed files:** + +- `src_py`: `.py` files in `src/` +- `test_py`: `.py` files in `tests/` +- `web_src`: `.vue`, `.ts`, `.css` files in `web/src/` (excluding `web/src/__tests__/`) +- `web_test`: `.ts` files in `web/src/__tests__/` +- `docker`: files in `docker/`, root-level `Dockerfile*`, `compose*.yml`, `compose*.yaml`, `docker-compose*.yml`, `docker-compose*.yaml` +- `ci`: files in `.github/workflows/`, `.github/actions/` +- `infra_config`: `.pre-commit-config.yaml`, `.dockerignore` +- `config`: `.toml`, `.yaml`, `.yml`, `.json`, `.cfg` files (not already categorized above) +- `docs`: `.md` files +- `site`: files in `site/` +- `other`: everything else + Based on changed files, launch applicable review agents **in parallel** using the Task tool. **Do NOT use `run_in_background`** — launch them as regular parallel Task calls so results arrive together and the user sees all agents complete before triage begins. Background agents cause confusing late-arriving `task-notification` messages that make it look like you presented triage before agents finished. | Agent | When to launch | subagent_type | |---|---|---| -| **code-reviewer** | Always | `pr-review-toolkit:code-reviewer` | -| **pr-test-analyzer** | Test files changed | `pr-review-toolkit:pr-test-analyzer` | -| **silent-failure-hunter** | Error handling or try/except changed | `pr-review-toolkit:silent-failure-hunter` | -| **comment-analyzer** | Comments or docstrings changed | `pr-review-toolkit:comment-analyzer` | -| **type-design-analyzer** | Type annotations or classes added/modified | `pr-review-toolkit:type-design-analyzer` | -| **logging-audit** | Any `.py` file in `src/` changed | `pr-review-toolkit:code-reviewer` | -| **resilience-audit** | Provider-layer `.py` files changed (`src/ai_company/providers/`) | `pr-review-toolkit:code-reviewer` | -| **docs-consistency** | **ALWAYS** — runs on every PR regardless of change type | `pr-review-toolkit:code-reviewer` | -| **issue-resolution-verifier** | Issue is linked (pre-existing or auto-linked) | `pr-review-toolkit:code-reviewer` | +| **docs-consistency** | **ALWAYS** — runs on every PR regardless of change type | `pr-review-toolkit:code-reviewer` (custom prompt below) | +| **code-reviewer** | Any `src_py` or `test_py` | `pr-review-toolkit:code-reviewer` | +| **python-reviewer** | Any `src_py` or `test_py` | `everything-claude-code:python-reviewer` | +| **pr-test-analyzer** | `test_py` changed, OR `src_py` changed with no corresponding test changes | `pr-review-toolkit:pr-test-analyzer` | +| **silent-failure-hunter** | Diff contains `try`, `except`, `raise`, error handling patterns | `pr-review-toolkit:silent-failure-hunter` | +| **comment-analyzer** | Diff contains docstring changes (`"""`) or significant comment changes | `pr-review-toolkit:comment-analyzer` | +| **type-design-analyzer** | Diff contains `class ` definitions, `BaseModel`, `TypedDict`, type aliases | `pr-review-toolkit:type-design-analyzer` | +| **logging-audit** | Any `src_py` changed | `pr-review-toolkit:code-reviewer` (custom prompt below) | +| **resilience-audit** | Any `src_py` changed | `pr-review-toolkit:code-reviewer` (custom prompt below) | +| **conventions-enforcer** | Any `src_py` or `test_py` | `pr-review-toolkit:code-reviewer` (custom prompt below) | +| **security-reviewer** | Files in `src/ai_company/api/`, `src/ai_company/security/`, `src/ai_company/tools/`, `src/ai_company/config/`, `src/ai_company/persistence/`, `src/ai_company/engine/` changed, OR any `web_src` changed, OR diff contains `subprocess`, `eval`, `exec`, `pickle`, `yaml.load`, `sql`, auth/credential patterns | `everything-claude-code:security-reviewer` | +| **frontend-reviewer** | Any `web_src` or `web_test` | `pr-review-toolkit:code-reviewer` (custom prompt below) | +| **api-contract-drift** | Any file in `src/ai_company/api/` OR `web/src/api/` OR `src/ai_company/core/enums.py` | `pr-review-toolkit:code-reviewer` (custom prompt below) | +| **infra-reviewer** | Any `docker`, `ci`, or `infra_config` file | `pr-review-toolkit:code-reviewer` (custom prompt below) | +| **persistence-reviewer** | Any file in `src/ai_company/persistence/` | `everything-claude-code:database-reviewer` | +| **test-quality-reviewer** | Any `test_py` or `web_test` | `pr-review-toolkit:pr-test-analyzer` (custom prompt below) | +| **async-concurrency-reviewer** | Diff contains `async def`, `await`, `asyncio`, `TaskGroup`, `create_task`, `aiosqlite` in `src_py` files | `pr-review-toolkit:code-reviewer` (custom prompt below) | +| **issue-resolution-verifier** | Issue is linked (pre-existing or auto-linked in Phase 2) | `pr-review-toolkit:code-reviewer` (custom prompt below) | The **issue-resolution-verifier** agent checks whether the PR fully resolves the linked issue. It only runs when an issue is linked — either from a pre-existing `closes #N` in the PR body, or auto-linked/user-selected during Phase 2's search. @@ -233,16 +256,276 @@ For every function touched by the PR, analyze its logic and suggest missing logg The **resilience-audit** agent prompt must check for these violations (see CLAUDE.md `## Resilience`): -**Hard rules:** +Resilience is a cross-cutting concern — ANY code can introduce resilience issues, not just provider files. Check all changed source files. + +**Hard rules (provider layer):** 1. Driver subclass implements its own retry/backoff logic instead of relying on base class (CRITICAL) 2. Calling code wraps provider calls in manual retry loops (CRITICAL) 3. New `BaseCompletionProvider` subclass doesn't pass `retry_handler`/`rate_limiter` to `super().__init__()` (MAJOR) 4. Retryable error type created without `is_retryable = True` (MAJOR) 5. `asyncio.sleep` used for retry delays outside of `RetryHandler` (MAJOR) +**Hard rules (any code):** +6. Error hierarchy overlap — new exception classes that accidentally inherit from or shadow `ProviderError`, which could cause incorrect error routing (MAJOR) +7. Code that catches broad `Exception` or `BaseException` and silently swallows provider errors that should propagate (MAJOR) +8. Manual retry/backoff patterns (e.g., `for attempt in range(...)`, `while retries > 0`, `time.sleep` in loops) anywhere in the codebase — retries belong in `RetryHandler` only (CRITICAL) + **Soft rules (SUGGESTION):** -6. New provider error type missing `is_retryable` classification (SUGGESTION) -7. Provider call site that catches `ProviderError` but doesn't account for `RetryExhaustedError` (SUGGESTION) +9. New error types missing `is_retryable` classification when they represent I/O or network failures (SUGGESTION) +10. Provider call site that catches `ProviderError` but doesn't account for `RetryExhaustedError` (SUGGESTION) +11. Engine or orchestration code that imports from `providers/` without considering that provider calls may raise `RetryExhaustedError` (SUGGESTION) +12. Non-retryable error types (e.g., deterministic failures like bad templates, invalid config) that should NOT be retryable — verify they don't accidentally inherit retryable classification (SUGGESTION) + +### Conventions-enforcer custom prompt + +The conventions-enforcer agent checks for project-specific code conventions from CLAUDE.md that automated linters cannot catch. + +**Immutability (CRITICAL):** +1. Direct mutation of existing objects instead of creating new ones via `model_copy(update=...)` or `copy.deepcopy()` (CRITICAL) +2. Mutable default arguments (`def foo(items=[])`) (CRITICAL) +3. In-place modification of function arguments (MAJOR) +4. Missing `MappingProxyType` wrapping for read-only registries/collections in non-Pydantic classes (MAJOR) +5. Missing `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation) (MAJOR) + +**Vendor names (CRITICAL):** +6. Real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples (CRITICAL) — allowed only in: Operations design page, `.claude/` files, third-party import paths +7. Test code using real vendor names instead of `test-provider`, `test-small-001`, etc. (CRITICAL) + +**Python 3.14 conventions (MAJOR):** +8. `from __future__ import annotations` — forbidden, Python 3.14 has PEP 649 (CRITICAL) +9. Parenthesized `except (A, B):` instead of PEP 758 `except A, B:` (MAJOR) + +**Code structure (MAJOR):** +10. Functions exceeding 50 lines (MAJOR) +11. Files exceeding 800 lines (MAJOR) +12. Deep nesting > 4 levels (MAJOR) + +**Pydantic conventions (MAJOR):** +13. Storing derived/redundant fields instead of using `@computed_field` (MAJOR) +14. Using raw `str` for identifier/name fields instead of `NotBlankStr` (from `core.types`) (MAJOR) +15. Mixing static config fields with mutable runtime fields in the same model (MAJOR) + +**Async patterns (SUGGESTION):** +16. Bare `asyncio.create_task()` instead of `asyncio.TaskGroup` for fan-out/fan-in operations in new code (SUGGESTION) +17. Unstructured concurrency patterns that could benefit from `TaskGroup` (SUGGESTION) + +### Security-reviewer supplemental prompt + +When `web_src` files are included in the security review scope, add these frontend-specific checks to the security-reviewer agent's prompt alongside its standard backend security analysis: + +**Frontend security (when `web_src` changed):** +1. XSS via `v-html` or unescaped user content rendering (CRITICAL) +2. Sensitive data (tokens, API keys) stored in `localStorage`/`sessionStorage` instead of httpOnly cookies (CRITICAL) +3. Missing CSRF token handling in API requests (MAJOR) +4. Exposing sensitive data in client-side JavaScript bundles (MAJOR) +5. Missing input sanitization on form inputs before sending to API (MAJOR) +6. Insecure WebSocket connections (ws:// instead of wss:// in production config) (MAJOR) +7. Missing Content-Security-Policy headers in the web server config (MEDIUM) +8. CORS misconfiguration allowing wildcard origins (MEDIUM) + +### Frontend-reviewer custom prompt + +The frontend-reviewer agent checks Vue 3 dashboard code quality and patterns. + +**Vue 3 / Composition API (CRITICAL):** +1. Options API usage instead of Composition API with `