Merged
27 changes: 18 additions & 9 deletions .claude/skills/aurelio-review-pr/SKILL.md
@@ -250,37 +250,46 @@ Collect all findings with their severity/confidence scores.

## Phase 4: Fetch external reviewer feedback

Fetch from three GitHub API sources **in parallel** using `gh api`:
**CRITICAL: Fetch ALL reviewers — do NOT filter by known bot names.** The set of external reviewers varies per repo and can include any combination of bots (CodeRabbit, Gemini, Copilot, Greptile, etc.) and human reviewers. Always fetch unfiltered results and categorize by author from the response.

**CRITICAL: Wait for all bots to finish processing.** Before triaging, check if any bot reviewer is still processing (e.g. CodeRabbit's "Currently processing" placeholder, or a review with an empty body). If a bot appears to still be processing:
1. Poll every 30 seconds for up to 3 minutes (6 checks)
2. If still not ready after 3 minutes, proceed without it and mark its coverage as "pending" in the triage table
3. After implementing fixes and pushing, re-check for the bot's feedback in Phase 9
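
The polling rule above can be sketched as a small helper. This is a minimal illustration, not part of the skill file: `wait_for_bot`, `is_ready`, and the injectable `sleep` parameter are hypothetical names, and the sketch assumes the initial "still processing" check has already happened, so each poll waits first and then re-checks.

```python
import time
from typing import Callable


def wait_for_bot(
    is_ready: Callable[[], bool],
    interval_s: float = 30.0,
    max_checks: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the bot's review is ready, or give up after max_checks polls.

    Returns "ready" if a poll succeeds, otherwise "pending" so the caller can
    mark the reviewer as pending in the triage table and re-check later.
    """
    for _ in range(max_checks):
        sleep(interval_s)  # wait one interval before each re-check
        if is_ready():
            return "ready"
    return "pending"
```

With the defaults, the helper waits at most 6 × 30 s = 3 minutes. Passing `sleep` as a parameter keeps the loop testable without real delays.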

Fetch from three GitHub API sources **in parallel** using `gh api` — **always unfiltered** (no `select(.user.login == ...)` filtering):

1. **Review submissions** (top-level review bodies):

```bash
gh api repos/OWNER/REPO/pulls/NUMBER/reviews --paginate
gh api repos/OWNER/REPO/pulls/NUMBER/reviews --paginate --jq '.[] | {author: .user.login, state: .state, body: (.body // "")}'
```

Extract: author, state, body.
Extract: author, state, body. List ALL unique authors to identify every reviewer.

**CRITICAL: Parse review bodies for outside-diff-range comments.** Some reviewers (e.g. CodeRabbit) embed actionable comments inside `<details>` blocks in the review body when the affected lines are outside the PR's diff range. Look for patterns like "Outside diff range comments (N)" and extract each embedded comment's file path, line range, severity, and description. These are just as important as inline comments — do NOT skip them.

2. **Inline review comments** (comments on specific lines):

```bash
gh api repos/OWNER/REPO/pulls/NUMBER/comments --paginate
gh api repos/OWNER/REPO/pulls/NUMBER/comments --paginate --jq '.[] | {author: .user.login, path: .path, line: .line, body: (.body // "")}'
```

Extract: author, file path, line number, body.
Extract: author, file path, line number, body. **Include ALL authors.**

3. **Issue-level comments** (general PR comments, e.g. CodeRabbit walkthrough):

```bash
gh api repos/OWNER/REPO/issues/NUMBER/comments --paginate
gh api repos/OWNER/REPO/issues/NUMBER/comments --paginate --jq '.[] | {author: .user.login, body: (.body // "")}'
```

Extract: author, body (look for actionable items, not just summaries).
Extract: author, body (look for actionable items, not just summaries). **Include ALL authors.**

After fetching, **enumerate all unique external reviewers** found across all three sources and report the list to the user before triaging. This ensures no reviewer is accidentally missed.
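
As an illustration of the enumeration step, the sketch below merges the author fields from the three fetches into one reviewer list. It assumes each source has already been parsed into a list of objects shaped like the `--jq` projections above (each with an `author` key); the function name and the source labels are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def enumerate_reviewers(
    sources: Mapping[str, Iterable[Mapping[str, object]]],
) -> dict[str, list[str]]:
    """Map every author to the sorted list of sources they appeared in."""
    seen: dict[str, set[str]] = defaultdict(set)
    for source_name, items in sources.items():
        for item in items:
            author = item.get("author")
            if author:  # skip entries with a missing or empty author
                seen[str(author)].add(source_name)
    # No author is filtered out: bots and humans are reported alike.
    return {author: sorted(names) for author, names in sorted(seen.items())}
```

Reporting the sources per author also shows at a glance which reviewers left only a top-level review and which left inline comments.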

**Important:** Use `gh api` with `--jq` for filtering. Keep it simple and robust — no complex Python scripts to parse JSON.
**Important:** Use `gh api` with `--jq` for filtering fields only (not filtering authors). Keep it simple and robust — no complex Python scripts to parse JSON.

**Important:** When review bodies are large (e.g. CodeRabbit's review with embedded outside-diff comments), fetch the **full body** without truncation. Use `head -c` with a generous limit (e.g. 15000 chars) rather than `--jq '.body[0:500]'` truncation. Outside-diff comments are typically at the top of the review body.
**Important:** When review bodies are large (e.g. CodeRabbit's review with embedded outside-diff comments), fetch the **full body** without truncation. Outside-diff comments are typically at the top of the review body.
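
The "Outside diff range comments (N)" marker mentioned in this phase can be detected before any deeper parsing. The sketch below only reads the announced count from a full, untruncated review body. It does not attempt to parse the `<details>` blocks, whose exact markup is not shown here. The function name and the regular expression are assumptions for illustration.

```python
import re

# Matches e.g. "Outside diff range comments (3)"; the exact wording is assumed
# from the pattern quoted in the skill text.
OUTSIDE_DIFF_RE = re.compile(r"Outside diff range comments?\s*\((\d+)\)", re.IGNORECASE)


def outside_diff_count(review_body: str) -> int:
    """Return the number of outside-diff comments a review body announces, or 0."""
    match = OUTSIDE_DIFF_RE.search(review_body)
    return int(match.group(1)) if match else 0
```

A non-zero count is a signal that the review body itself carries actionable items and must be read in full.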

## Phase 5: Consolidate and triage

2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -48,7 +48,7 @@ src/ai_company/
communication/ # Inter-agent message bus and channels
config/ # YAML company config loading and validation
core/ # Shared domain models and base classes
engine/ # Agent execution engine and task lifecycle
engine/ # Agent orchestration, execution loops, and task lifecycle
memory/ # Persistent agent memory (memory layer TBD)
observability/ # Structured logging, correlation tracking, log sinks
providers/ # LLM provider abstraction (LiteLLM adapter)
39 changes: 36 additions & 3 deletions DESIGN_SPEC.md
@@ -151,7 +151,7 @@ Every agent has a comprehensive identity. At the design level, agent data splits
- **Config (immutable)**: identity, personality, skills, model preferences, tool permissions, authority. Defined at hire time, changed only by explicit reconfiguration. Represented as frozen Pydantic models.
- **Runtime state (mutable-via-copy)**: current status, active task, conversation history, execution metrics. Evolves during agent operation. Represented as Pydantic models using `model_copy(update=...)` for state transitions — never mutated in place.
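
The "mutable-via-copy" idea can be sketched without Pydantic. The spec uses frozen Pydantic models and `model_copy(update=...)`; the example below substitutes a frozen stdlib dataclass and `dataclasses.replace` as a stand-in so it runs with no third-party dependency. `RuntimeState`, its fields, and `start_task` are hypothetical names, not the project's models.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RuntimeState:
    """Illustrative runtime state; instances cannot be mutated in place."""

    status: str = "idle"
    active_task: Optional[str] = None
    turns: int = 0


def start_task(state: RuntimeState, task_id: str) -> RuntimeState:
    """Return a new state for the transition; the input state is left untouched."""
    return replace(state, status="working", active_task=task_id)
```

Because every transition yields a new object, earlier states remain valid snapshots, and an accidental in-place assignment raises `FrozenInstanceError` instead of silently changing shared state.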

> **Current state (M2):** Only the config layer exists as `AgentIdentity` (frozen Pydantic model in `core/agent.py`). The runtime state layer will be introduced in M3 when the agent execution engine is implemented. All identifier/name fields use `NotBlankStr` (from `core.types`) for automatic whitespace rejection; optional identifier fields use `NotBlankStr | None`; tuple fields use `tuple[NotBlankStr, ...]` for per-element validation.
> **Current state (M3):** Both layers are implemented. Config layer: `AgentIdentity` (frozen, in `core/agent.py`). Runtime state layer: `TaskExecution`, `AgentContext`, `AgentContextSnapshot` (frozen + `model_copy`, in `engine/`). `AgentEngine` orchestrates execution via `run()`. All identifier/name fields use `NotBlankStr` (from `core.types`) for automatic whitespace rejection; optional identifier fields use `NotBlankStr | None`; tuple fields use `tuple[NotBlankStr, ...]` for per-element validation.

```yaml
# --- Current (M2): Config layer — AgentIdentity (frozen) ---
@@ -911,6 +911,38 @@ hybrid:

> **Auto-selection (optional):** When `execution_loop: "auto"`, the framework selects the loop based on `estimated_complexity`: simple → ReAct, medium → Plan-and-Execute, complex/epic → Hybrid. Configurable via `auto_loop_rules` — a mapping of complexity thresholds to loop implementations (e.g., `{simple_max_tokens: 500, medium_max_tokens: 3000}` with corresponding loop assignments).
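
One possible reading of the auto-selection rule is sketched below. The complexity-to-loop mapping follows the text above. Treating `simple_max_tokens` and `medium_max_tokens` as inclusive upper bounds on an estimated token count is an assumption, and the loop identifiers and function names are hypothetical.

```python
from typing import Mapping

# Mapping taken from the spec text: simple -> ReAct, medium -> Plan-and-Execute,
# complex/epic -> Hybrid. The string identifiers are illustrative.
LOOP_BY_COMPLEXITY = {
    "simple": "react",
    "medium": "plan_and_execute",
    "complex": "hybrid",
    "epic": "hybrid",
}

DEFAULT_RULES = {"simple_max_tokens": 500, "medium_max_tokens": 3000}


def classify_complexity(estimated_tokens: int, rules: Mapping[str, int] = DEFAULT_RULES) -> str:
    """Bucket a token estimate; thresholds are treated as inclusive (assumption)."""
    if estimated_tokens <= rules["simple_max_tokens"]:
        return "simple"
    if estimated_tokens <= rules["medium_max_tokens"]:
        return "medium"
    return "complex"


def select_loop(estimated_tokens: int, rules: Mapping[str, int] = DEFAULT_RULES) -> str:
    """Pick the loop identifier for a task when execution_loop is "auto"."""
    return LOOP_BY_COMPLEXITY[classify_complexity(estimated_tokens, rules)]
```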

#### AgentEngine Orchestrator

`AgentEngine` (in `engine/agent_engine.py`) is the top-level entry point for running an agent on a task. It composes the execution loop with prompt construction, context management, tool invocation, and cost tracking into a single `run()` call.

**`async run(identity, task, completion_config?, max_turns?, memory_messages?) -> AgentRunResult`**

Pipeline steps:

1. **Validate inputs** — agent must be `ACTIVE`, task must be `ASSIGNED` or `IN_PROGRESS`. Raises `ExecutionStateError` on violation.
2. **Build system prompt** — calls `build_system_prompt()` with agent identity, task, and available tool definitions.
3. **Create context** — `AgentContext.from_identity()` with the configured `max_turns`.
4. **Seed conversation** — injects system prompt, optional memory messages, and formatted task instruction as initial messages.
5. **Transition task** — `ASSIGNED` → `IN_PROGRESS` (pass-through if already `IN_PROGRESS`).
6. **Prepare tools and budget** — creates `ToolInvoker` from registry and `BudgetChecker` from task budget limit.
7. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config.
8. **Record costs** — records accumulated `TokenUsage` to `CostTracker` (if available). Cost recording failures are logged but do not affect the result.
9. **Return result** — wraps `ExecutionResult` in `AgentRunResult` with engine-level metadata.

Error handling: `MemoryError` and `RecursionError` propagate unconditionally. All other exceptions are caught and wrapped in an `AgentRunResult` with `TerminationReason.ERROR`.
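
The error-handling rule can be illustrated with a small wrapper. This is a sketch of the pattern only, not the engine's code: `run_guarded` and the dictionary result shape are hypothetical stand-ins for `AgentRunResult`. The parenthesized `except` form is used so the example parses on any Python 3 version.

```python
from typing import Any, Callable


def run_guarded(step: Callable[[], Any]) -> dict[str, Any]:
    """Run a step; wrap ordinary failures, but let fatal errors propagate."""
    try:
        return {"termination_reason": "completed", "value": step()}
    except (MemoryError, RecursionError):
        # Resource-exhaustion errors are never swallowed.
        raise
    except Exception as exc:
        # Everything else becomes an error result instead of an exception.
        return {
            "termination_reason": "error",
            "error": f"{type(exc).__name__}: {exc}",
        }
```

Callers therefore always receive a result object for ordinary failures and only need a `try` block for the two fatal cases.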

Constructor accepts: `provider` (required), `execution_loop` (defaults to `ReactLoop`), `tool_registry`, `cost_tracker`. The `run()` method also accepts `memory_messages` — optional working memory to inject between the system prompt and task instruction (memory retrieval is M5; the engine provides the injection hook).

Logs structured events under the `execution.engine.*` namespace (10 constants in `events/execution.py`): creation, start, prompt built, completion, errors, invalid input, task transitions, and cost recording outcomes.

**`AgentRunResult`** — frozen Pydantic model wrapping `ExecutionResult` with engine metadata:

- `execution_result` — outcome from the execution loop
- `system_prompt` — the `SystemPrompt` used for this run
- `duration_seconds` — wall-clock run time
- `agent_id`, `task_id` — identifiers
- Computed fields: `termination_reason`, `total_turns`, `total_cost_usd`, `is_success`
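
The shape of `AgentRunResult` can be sketched as a frozen wrapper whose computed fields delegate to the wrapped result. The spec describes a frozen Pydantic model; this example uses stdlib dataclasses and properties as a stand-in. The `ExecutionResult` fields, the `COMPLETED` enum member, the plain-string `system_prompt`, and the definition of `is_success` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class TerminationReason(Enum):
    COMPLETED = "completed"  # hypothetical member
    ERROR = "error"


@dataclass(frozen=True)
class ExecutionResult:
    """Hypothetical minimal shape of the loop's outcome."""

    termination_reason: TerminationReason
    total_turns: int
    total_cost_usd: float


@dataclass(frozen=True)
class AgentRunResult:
    execution_result: ExecutionResult
    system_prompt: str  # the spec uses a SystemPrompt object; a string stands in here
    duration_seconds: float
    agent_id: str
    task_id: str

    @property
    def termination_reason(self) -> TerminationReason:
        return self.execution_result.termination_reason

    @property
    def total_turns(self) -> int:
        return self.execution_result.total_turns

    @property
    def total_cost_usd(self) -> float:
        return self.execution_result.total_cost_usd

    @property
    def is_success(self) -> bool:
        # Assumption: success means the loop terminated normally.
        return self.termination_reason is TerminationReason.COMPLETED
```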

### 6.6 Agent Crash Recovery

When an agent execution fails unexpectedly (unhandled exception, OOM, process kill), the framework needs a recovery mechanism. Recovery strategies are implemented behind a `RecoveryStrategy` protocol, making the system pluggable — new strategies can be added without modifying existing ones.
@@ -2114,15 +2146,16 @@ ai-company/
│ │ ├── artifact.py # Produced work items
│ │ ├── role.py # Role model
│ │ └── role_catalog.py # Role catalog
│ ├── engine/ # Core engines (M3+)
│ ├── engine/ # Agent orchestration, execution loops, and task lifecycle
│ │ ├── errors.py # Engine error hierarchy
│ │ ├── prompt.py # System prompt builder
│ │ ├── prompt_template.py # System prompt Jinja2 templates
│ │ ├── task_execution.py # TaskExecution + StatusTransition
│ │ ├── context.py # AgentContext + AgentContextSnapshot
│ │ ├── loop_protocol.py # ExecutionLoop protocol + result models
│ │ ├── react_loop.py # ReAct loop implementation
│ │ ├── agent_engine.py # Agent execution engine (M3)
│ │ ├── run_result.py # AgentRunResult outcome model
│ │ ├── agent_engine.py # Agent execution engine
│ │ ├── task_engine.py # Task routing & scheduling (M3-M4)
│ │ ├── workflow_engine.py # Workflow orchestration (M4)
│ │ ├── meeting_engine.py # Meeting coordination (M4)
2 changes: 2 additions & 0 deletions src/ai_company/budget/tracker.py
@@ -301,6 +301,8 @@ def _resolve_department(self, agent_id: str) -> str | None:
return None
try:
return self._department_resolver(agent_id)
except MemoryError, RecursionError:
security-medium medium

The exception handling syntax `except MemoryError, RecursionError:` is invalid in Python 3 and will result in a `SyntaxError`, causing the application to crash and leading to a denial of service. In Python 3, multiple exceptions must be caught using a parenthesized tuple. The mention of `except A, B:` in `CLAUDE.md` and its attribution to PEP 758 is a misunderstanding, as PEP 758 introduces `except*` for ExceptionGroups, not a change to standard `except` syntax.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

Copilot AI Mar 6, 2026

`except MemoryError, RecursionError:` is Python 2 syntax. In Python 3, this is either a `SyntaxError`, or it catches only `MemoryError` and binds it to the name `RecursionError`; it does NOT catch both exception types. The correct syntax is `except (MemoryError, RecursionError):` with parentheses. See the correct pattern in `src/ai_company/tools/invoker.py:168`.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):
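
A quick way to see how a given interpreter treats the two spellings discussed in this thread is to compile both. This is a minimal sketch, not code from the PR, and the constant and function names are illustrative. Note that PEP 758, implemented in Python 3.14, permits the unparenthesized form when no `as` clause is present, while earlier Python 3 releases reject it with a `SyntaxError`, so the outcome depends on the Python version the project targets.

```python
import sys

PARENTHESIZED = (
    "try:\n"
    "    pass\n"
    "except (MemoryError, RecursionError):\n"
    "    raise\n"
)

UNPARENTHESIZED = (
    "try:\n"
    "    pass\n"
    "except MemoryError, RecursionError:\n"
    "    raise\n"
)


def parses(source: str) -> bool:
    """Return True if this interpreter accepts the source as valid syntax."""
    try:
        compile(source, "<except-syntax-check>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print(sys.version_info[:2])
    print("parenthesized:", parses(PARENTHESIZED))
    print("unparenthesized:", parses(UNPARENTHESIZED))
```

The parenthesized form is accepted everywhere, which makes it the portable choice when the minimum supported Python version is below 3.14 or unknown.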

raise
except Exception as exc:
logger.warning(
BUDGET_DEPARTMENT_RESOLVE_FAILED,
9 changes: 7 additions & 2 deletions src/ai_company/engine/__init__.py
@@ -1,9 +1,11 @@
"""Agent execution engine.

Re-exports the public API for system prompt construction,
runtime execution state, execution loops, and engine errors.
Re-exports the public API for the agent orchestrator, run results,
system prompt construction, runtime execution state, execution loops,
and engine errors.
"""

from ai_company.engine.agent_engine import AgentEngine
from ai_company.engine.context import (
DEFAULT_MAX_TURNS,
AgentContext,
@@ -31,6 +33,7 @@
build_system_prompt,
)
from ai_company.engine.react_loop import ReactLoop
from ai_company.engine.run_result import AgentRunResult
from ai_company.engine.task_execution import StatusTransition, TaskExecution
from ai_company.providers.models import ZERO_TOKEN_USAGE, add_token_usage

@@ -39,6 +42,8 @@
"ZERO_TOKEN_USAGE",
"AgentContext",
"AgentContextSnapshot",
"AgentEngine",
"AgentRunResult",
"BudgetChecker",
"BudgetExhaustedError",
"DefaultTokenEstimator",