Aureliolo · Mar 6, 2026
@@ -250,37 +250,46 @@ Collect all findings with their severity/confidence scores.
 
 ## Phase 4: Fetch external reviewer feedback
 
-Fetch from three GitHub API sources **in parallel** using `gh api`:
+**CRITICAL: Fetch ALL reviewers — do NOT filter by known bot names.** The set of external reviewers varies per repo and can include any combination of bots (CodeRabbit, Gemini, Copilot, Greptile, etc.) and human reviewers. Always fetch unfiltered results and categorize by author from the response.
+
+**CRITICAL: Wait for all bots to finish processing.** Before triaging, check if any bot reviewer is still processing (e.g. CodeRabbit's "Currently processing" placeholder, or a review with an empty body). If a bot appears to still be processing:
+1. Poll every 30 seconds for up to 3 minutes (6 checks)
+2. If still not ready after 3 minutes, proceed without it and mark its coverage as "pending" in the triage table
+3. After implementing fixes and pushing, re-check for the bot's feedback in Phase 9
+
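The polling rule above (re-check every 30 seconds, give up after 6 checks) could be sketched as follows. This is only an illustration of the timing logic: `wait_for_bots` and `still_processing` are hypothetical names, and the actual readiness check is assumed to be a `gh api` call supplied by the caller.

```python
import time
from collections.abc import Callable


def wait_for_bots(
    still_processing: Callable[[], bool],
    interval_s: float = 30.0,
    max_checks: int = 6,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once no bot is processing, False if the budget runs out.

    `still_processing` is a hypothetical callback; it is assumed to wrap
    whatever check detects a placeholder or empty review body.
    """
    for _ in range(max_checks):
        sleep(interval_s)  # wait before each re-check
        if not still_processing():
            return True
    # 6 checks x 30 s = 3 minutes elapsed; the caller marks coverage "pending"
    return False
```

A `False` return corresponds to step 2 above: proceed without the bot and record its coverage as "pending". Injecting `sleep` keeps the sketch testable without real waiting.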
+Fetch from three GitHub API sources **in parallel** using `gh api` — **always unfiltered** (no `select(.user.login == ...)` filtering):
 
 1. **Review submissions** (top-level review bodies):
 
    ```bash
-   gh api repos/OWNER/REPO/pulls/NUMBER/reviews --paginate
+   gh api repos/OWNER/REPO/pulls/NUMBER/reviews --paginate --jq '.[] | {author: .user.login, state: .state, body: (.body // "")}'
    ```
 
-   Extract: author, state, body.
+   Extract: author, state, body. List ALL unique authors to identify every reviewer.
 
    **CRITICAL: Parse review bodies for outside-diff-range comments.** Some reviewers (e.g. CodeRabbit) embed actionable comments inside `<details>` blocks in the review body when the affected lines are outside the PR's diff range. Look for patterns like "Outside diff range comments (N)" and extract each embedded comment's file path, line range, severity, and description. These are just as important as inline comments — do NOT skip them.
 
 2. **Inline review comments** (comments on specific lines):
 
    ```bash
-   gh api repos/OWNER/REPO/pulls/NUMBER/comments --paginate
+   gh api repos/OWNER/REPO/pulls/NUMBER/comments --paginate --jq '.[] | {author: .user.login, path: .path, line: .line, body: (.body // "")}'
    ```
 
-   Extract: author, file path, line number, body.
+   Extract: author, file path, line number, body. **Include ALL authors.**
 
 3. **Issue-level comments** (general PR comments, e.g. CodeRabbit walkthrough):
 
    ```bash
-   gh api repos/OWNER/REPO/issues/NUMBER/comments --paginate
+   gh api repos/OWNER/REPO/issues/NUMBER/comments --paginate --jq '.[] | {author: .user.login, body: (.body // "")}'
    ```
 
-   Extract: author, body (look for actionable items, not just summaries).
+   Extract: author, body (look for actionable items, not just summaries). **Include ALL authors.**
+
+After fetching, **enumerate all unique external reviewers** found across all three sources and report the list to the user before triaging. This ensures no reviewer is accidentally missed.
 
-**Important:** Use `gh api` with `--jq` for filtering. Keep it simple and robust — no complex Python scripts to parse JSON.
+**Important:** Use `gh api` with `--jq` to select fields only, never to filter by author. Keep it simple and robust: no complex Python scripts to parse JSON.
 
-**Important:** When review bodies are large (e.g. CodeRabbit's review with embedded outside-diff comments), fetch the **full body** without truncation. Use `head -c` with a generous limit (e.g. 15000 chars) rather than `--jq '.body[0:500]'` truncation. Outside-diff comments are typically at the top of the review body.
+**Important:** When review bodies are large (e.g. CodeRabbit's review with embedded outside-diff comments), fetch the **full body** without truncation. Outside-diff comments are typically at the top of the review body.
 
 ## Phase 5: Consolidate and triage
 

@@ -48,7 +48,7 @@ src/ai_company/
   communication/  # Inter-agent message bus and channels
   config/         # YAML company config loading and validation
   core/           # Shared domain models and base classes
-  engine/         # Agent execution engine and task lifecycle
+  engine/         # Agent orchestration, execution loops, and task lifecycle
   memory/         # Persistent agent memory (memory layer TBD)
   observability/  # Structured logging, correlation tracking, log sinks
   providers/      # LLM provider abstraction (LiteLLM adapter)

@@ -151,7 +151,7 @@ Every agent has a comprehensive identity. At the design level, agent data splits
 - **Config (immutable)**: identity, personality, skills, model preferences, tool permissions, authority. Defined at hire time, changed only by explicit reconfiguration. Represented as frozen Pydantic models.
 - **Runtime state (mutable-via-copy)**: current status, active task, conversation history, execution metrics. Evolves during agent operation. Represented as Pydantic models using `model_copy(update=...)` for state transitions — never mutated in place.
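The "mutable-via-copy" idea can be illustrated with a small sketch. To keep it dependency-free, it uses a frozen stdlib dataclass and `dataclasses.replace` as a stand-in for a frozen Pydantic model and `model_copy(update=...)`; `RuntimeState` and `start_task` are hypothetical names, not classes from this repository.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class RuntimeState:
    """Hypothetical stand-in for a runtime-state model (not the project's class)."""

    status: str = "idle"
    turns: int = 0


def start_task(state: RuntimeState) -> RuntimeState:
    # Build a new instance instead of mutating the old one; this plays the
    # role that model_copy(update=...) plays for Pydantic models.
    return replace(state, status="working", turns=state.turns + 1)


before = RuntimeState()
after = start_task(before)

# The original snapshot is untouched, and in-place mutation is rejected.
try:
    before.status = "working"  # type: ignore[misc]
except FrozenInstanceError:
    pass
```

Because every transition returns a fresh object, earlier states remain valid snapshots that can be logged or compared later.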
 
-> **Current state (M2):** Only the config layer exists as `AgentIdentity` (frozen Pydantic model in `core/agent.py`). The runtime state layer will be introduced in M3 when the agent execution engine is implemented. All identifier/name fields use `NotBlankStr` (from `core.types`) for automatic whitespace rejection; optional identifier fields use `NotBlankStr | None`; tuple fields use `tuple[NotBlankStr, ...]` for per-element validation.
+> **Current state (M3):** Both layers are implemented. Config layer: `AgentIdentity` (frozen, in `core/agent.py`). Runtime state layer: `TaskExecution`, `AgentContext`, `AgentContextSnapshot` (frozen + `model_copy`, in `engine/`). `AgentEngine` orchestrates execution via `run()`. All identifier/name fields use `NotBlankStr` (from `core.types`) for automatic whitespace rejection; optional identifier fields use `NotBlankStr | None`; tuple fields use `tuple[NotBlankStr, ...]` for per-element validation.
 
 ```yaml
 # --- Current (M2): Config layer — AgentIdentity (frozen) ---
@@ -911,6 +911,38 @@ hybrid:
 
 > **Auto-selection (optional):** When `execution_loop: "auto"`, the framework selects the loop based on `estimated_complexity`: simple → ReAct, medium → Plan-and-Execute, complex/epic → Hybrid. Configurable via `auto_loop_rules` — a mapping of complexity thresholds to loop implementations (e.g., `{simple_max_tokens: 500, medium_max_tokens: 3000}` with corresponding loop assignments).
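The default mapping described above (simple to ReAct, medium to Plan-and-Execute, complex/epic to Hybrid) might look like the sketch below. `Complexity`, `select_loop`, and the string labels are assumptions made for illustration; the token-threshold form of `auto_loop_rules` is not modeled here.

```python
from enum import Enum
from typing import Optional


class Complexity(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"
    EPIC = "epic"


# Default assignments quoted from the text; loop names are plain labels here,
# not references to real loop classes.
DEFAULT_LOOP_BY_COMPLEXITY = {
    Complexity.SIMPLE: "react",
    Complexity.MEDIUM: "plan_and_execute",
    Complexity.COMPLEX: "hybrid",
    Complexity.EPIC: "hybrid",
}


def select_loop(
    estimated_complexity: Complexity,
    overrides: Optional[dict] = None,
) -> str:
    """Pick a loop label for `execution_loop: "auto"`; overrides win over defaults."""
    rules = {**DEFAULT_LOOP_BY_COMPLEXITY, **(overrides or {})}
    return rules[estimated_complexity]
```

Keeping the rule table as data rather than branching logic is one way to make the selection configurable, in the spirit of `auto_loop_rules`.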
 
+#### AgentEngine Orchestrator
+
+`AgentEngine` (in `engine/agent_engine.py`) is the top-level entry point for running an agent on a task. It composes the execution loop with prompt construction, context management, tool invocation, and cost tracking into a single `run()` call.
+
+**`async run(identity, task, completion_config?, max_turns?, memory_messages?) -> AgentRunResult`**
+
+Pipeline steps:
+
+1. **Validate inputs** — agent must be `ACTIVE`, task must be `ASSIGNED` or `IN_PROGRESS`. Raises `ExecutionStateError` on violation.
+2. **Build system prompt** — calls `build_system_prompt()` with agent identity, task, and available tool definitions.
+3. **Create context** — `AgentContext.from_identity()` with the configured `max_turns`.
+4. **Seed conversation** — injects system prompt, optional memory messages, and formatted task instruction as initial messages.
+5. **Transition task** — `ASSIGNED` → `IN_PROGRESS` (pass-through if already `IN_PROGRESS`).
+6. **Prepare tools and budget** — creates `ToolInvoker` from registry and `BudgetChecker` from task budget limit.
+7. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config.
+8. **Record costs** — records accumulated `TokenUsage` to `CostTracker` (if available). Cost recording failures are logged but do not affect the result.
+9. **Return result** — wraps `ExecutionResult` in `AgentRunResult` with engine-level metadata.
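Steps 1 and 5 of the pipeline above can be sketched in isolation. The enums and the `ExecutionStateError` class below are local stand-ins written for this example (only the `ACTIVE`, `ASSIGNED`, and `IN_PROGRESS` names and the error name come from the text); `validate_inputs` and `begin_task` are hypothetical helper names.

```python
from enum import Enum


class AgentStatus(Enum):
    ACTIVE = "active"
    TERMINATED = "terminated"  # hypothetical non-active state


class TaskStatus(Enum):
    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"  # hypothetical state outside the allowed set


class ExecutionStateError(Exception):
    """Local stand-in for the engine error named in step 1."""


def validate_inputs(agent: AgentStatus, task: TaskStatus) -> None:
    """Step 1: reject runs whose agent or task is in the wrong state."""
    if agent is not AgentStatus.ACTIVE:
        raise ExecutionStateError(f"agent is {agent.name}, expected ACTIVE")
    if task not in (TaskStatus.ASSIGNED, TaskStatus.IN_PROGRESS):
        raise ExecutionStateError(
            f"task is {task.name}, expected ASSIGNED or IN_PROGRESS"
        )


def begin_task(task: TaskStatus) -> TaskStatus:
    """Step 5: ASSIGNED becomes IN_PROGRESS; IN_PROGRESS passes through."""
    if task is TaskStatus.ASSIGNED:
        return TaskStatus.IN_PROGRESS
    return task
```

Validating before any state change means a rejected run leaves the task status exactly as it was.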
+
+Error handling: `MemoryError` and `RecursionError` propagate unconditionally. All other exceptions are caught and wrapped in an `AgentRunResult` with `TerminationReason.ERROR`.
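The error-handling rule above can be sketched as a small wrapper. `Outcome` and `run_guarded` are hypothetical names, and the `"completed"` and `"error"` labels are plain strings standing in for `TerminationReason` members; this is not the engine's actual code.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical, simplified stand-in for a run result."""

    termination_reason: str
    error_message: str = ""


async def run_guarded(execute: Callable[[], Awaitable[None]]) -> Outcome:
    try:
        await execute()
    except (MemoryError, RecursionError):
        # Resource-exhaustion errors are never swallowed.
        raise
    except Exception as exc:
        # Everything else becomes an error result instead of an exception.
        return Outcome("error", f"{type(exc).__name__}: {exc}")
    return Outcome("completed")


async def failing_step() -> None:
    raise ValueError("provider returned malformed output")


result = asyncio.run(run_guarded(failing_step))
```

Listing the re-raised exceptions first matters: both are subclasses of `Exception`, so without the explicit `raise` clause the broad handler would capture them.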
+
+Constructor accepts: `provider` (required), `execution_loop` (defaults to `ReactLoop`), `tool_registry`, `cost_tracker`. The `run()` method also accepts `memory_messages` — optional working memory to inject between the system prompt and task instruction (memory retrieval is M5; the engine provides the injection hook).
+
+The engine logs structured events under the `execution.engine.*` namespace (10 constants in `events/execution.py`): creation, start, prompt built, completion, errors, invalid input, task transitions, and cost recording outcomes.
+
+**`AgentRunResult`** — frozen Pydantic model wrapping `ExecutionResult` with engine metadata:
+
+- `execution_result` — outcome from the execution loop
+- `system_prompt` — the `SystemPrompt` used for this run
+- `duration_seconds` — wall-clock run time
+- `agent_id`, `task_id` — identifiers
+- Computed fields: `termination_reason`, `total_turns`, `total_cost_usd`, `is_success`
+
 ### 6.6 Agent Crash Recovery
 
 When an agent execution fails unexpectedly (unhandled exception, OOM, process kill), the framework needs a recovery mechanism. Recovery strategies are implemented behind a `RecoveryStrategy` protocol, making the system pluggable — new strategies can be added without modifying existing ones.
@@ -2114,15 +2146,16 @@ ai-company/
 │       │   ├── artifact.py         # Produced work items
 │       │   ├── role.py             # Role model
 │       │   └── role_catalog.py     # Role catalog
-│       ├── engine/                  # Core engines (M3+)
+│       ├── engine/                  # Agent orchestration, execution loops, and task lifecycle
 │       │   ├── errors.py           # Engine error hierarchy
 │       │   ├── prompt.py           # System prompt builder
 │       │   ├── prompt_template.py  # System prompt Jinja2 templates
 │       │   ├── task_execution.py   # TaskExecution + StatusTransition
 │       │   ├── context.py          # AgentContext + AgentContextSnapshot
 │       │   ├── loop_protocol.py    # ExecutionLoop protocol + result models
 │       │   ├── react_loop.py       # ReAct loop implementation
-│       │   ├── agent_engine.py     # Agent execution engine (M3)
+│       │   ├── run_result.py       # AgentRunResult outcome model
+│       │   ├── agent_engine.py     # Agent execution engine
 │       │   ├── task_engine.py      # Task routing & scheduling (M3-M4)
 │       │   ├── workflow_engine.py  # Workflow orchestration (M4)
 │       │   ├── meeting_engine.py   # Meeting coordination (M4)

@@ -301,6 +301,8 @@ def _resolve_department(self, agent_id: str) -> str | None:
             return None
         try:
             return self._department_resolver(agent_id)
+        except (MemoryError, RecursionError):
+            raise
         except Exception as exc:
             logger.warning(
                 BUDGET_DEPARTMENT_RESOLVE_FAILED,

@@ -1,9 +1,11 @@
 """Agent execution engine.
 
-Re-exports the public API for system prompt construction,
-runtime execution state, execution loops, and engine errors.
+Re-exports the public API for the agent orchestrator, run results,
+system prompt construction, runtime execution state, execution loops,
+and engine errors.
 """
 
+from ai_company.engine.agent_engine import AgentEngine
 from ai_company.engine.context import (
     DEFAULT_MAX_TURNS,
     AgentContext,
@@ -31,6 +33,7 @@
     build_system_prompt,
 )
 from ai_company.engine.react_loop import ReactLoop
+from ai_company.engine.run_result import AgentRunResult
 from ai_company.engine.task_execution import StatusTransition, TaskExecution
 from ai_company.providers.models import ZERO_TOKEN_USAGE, add_token_usage
 
@@ -39,6 +42,8 @@
     "ZERO_TOKEN_USAGE",
     "AgentContext",
     "AgentContextSnapshot",
+    "AgentEngine",
+    "AgentRunResult",
     "BudgetChecker",
     "BudgetExhaustedError",
     "DefaultTokenEstimator",