Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 12 additions & 7 deletions DESIGN_SPEC.md
Original file line number Diff line number Diff line change
Expand Up @@ -915,7 +915,7 @@ hybrid:

`AgentEngine` (in `engine/agent_engine.py`) is the top-level entry point for running an agent on a task. It composes the execution loop with prompt construction, context management, tool invocation, and cost tracking into a single `run()` call.

**`async run(identity, task, completion_config?, max_turns?, memory_messages?) -> AgentRunResult`**
**`async run(identity, task, completion_config?, max_turns?, memory_messages?, timeout_seconds?) -> AgentRunResult`**

Pipeline steps:

Expand All @@ -925,23 +925,24 @@ Pipeline steps:
4. **Seed conversation** — injects system prompt, optional memory messages, and formatted task instruction as initial messages.
5. **Transition task** — `ASSIGNED` → `IN_PROGRESS` (pass-through if already `IN_PROGRESS`).
6. **Prepare tools and budget** — creates `ToolInvoker` from registry and `BudgetChecker` from task budget limit.
7. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config.
7. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config. If `timeout_seconds` is set, wraps the call in `asyncio.wait_for`; on expiry the run returns with `TerminationReason.ERROR` but cost recording and post-execution processing still occur.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DESIGN_SPEC references asyncio.wait_for but implementation uses asyncio.wait

The spec description (line 928) states:

If timeout_seconds is set, wraps the call in asyncio.wait_for

However, the actual implementation in _run_loop_with_timeout (lines 278–300) deliberately uses asyncio.wait instead, with clear rationale: this prevents conflating an internal TimeoutError from the loop with the engine's wall-clock deadline (see the method's docstring at lines 280–282).

The implementation choice is correct and well-reasoned; the spec just needs to match:

Suggested change
7. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config. If `timeout_seconds` is set, wraps the call in `asyncio.wait_for`; on expiry the run returns with `TerminationReason.ERROR` but cost recording and post-execution processing still occur.
7. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config. If `timeout_seconds` is set, wraps the call in `asyncio.wait` (not `asyncio.wait_for`, to avoid conflating internal `TimeoutError` with the engine's wall-clock deadline); on expiry the run returns with `TerminationReason.ERROR` but cost recording and post-execution processing still occur.
Prompt To Fix With AI
This is a comment left during a code review.
Path: DESIGN_SPEC.md
Line: 928

Comment:
**DESIGN_SPEC references `asyncio.wait_for` but implementation uses `asyncio.wait`**

The spec description (line 928) states:

> If `timeout_seconds` is set, wraps the call in `asyncio.wait_for`

However, the actual implementation in `_run_loop_with_timeout` (lines 278–300) deliberately uses `asyncio.wait` instead, with clear rationale: this prevents conflating an internal `TimeoutError` from the loop with the engine's wall-clock deadline (see the method's docstring at lines 280–282).

The implementation choice is correct and well-reasoned; the spec just needs to match:

```suggestion
7. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config. If `timeout_seconds` is set, wraps the call in `asyncio.wait` (not `asyncio.wait_for`, to avoid conflating internal `TimeoutError` with the engine's wall-clock deadline); on expiry the run returns with `TerminationReason.ERROR` but cost recording and post-execution processing still occur.
```

How can I resolve this? If you propose a fix, please make it concise.

8. **Record costs** — records accumulated `TokenUsage` to `CostTracker` (if available). Cost recording failures are logged but do not affect the result.
9. **Return result** — wraps `ExecutionResult` in `AgentRunResult` with engine-level metadata.
9. **Apply post-execution transitions** — on `COMPLETED` termination: IN_PROGRESS → IN_REVIEW → COMPLETED (two-hop auto-complete in M3; reviewers deferred to M4+). All other termination reasons leave the task in its current state. Transition failures are logged but do not discard the successful execution result.
10. **Return result** — wraps `ExecutionResult` in `AgentRunResult` with engine-level metadata.

Error handling: `MemoryError` and `RecursionError` propagate unconditionally. All other exceptions are caught and wrapped in an `AgentRunResult` with `TerminationReason.ERROR`.

Constructor accepts: `provider` (required), `execution_loop` (defaults to `ReactLoop`), `tool_registry`, `cost_tracker`. The `run()` method also accepts `memory_messages` — optional working memory to inject between the system prompt and task instruction (memory retrieval is M5; the engine provides the injection hook).

Logs structured events under the `execution.engine.*` namespace (10 constants in `events/execution.py`): creation, start, prompt built, completion, errors, invalid input, task transitions, and cost recording outcomes.
Logs structured events under the `execution.engine.*` namespace (12 constants in `events/execution.py`): creation, start, prompt built, completion, errors, invalid input, task transitions, cost recording outcomes, task metrics, and timeout.

**`AgentRunResult`** — frozen Pydantic model wrapping `ExecutionResult` with engine metadata:

- `execution_result` — outcome from the execution loop
- `system_prompt` — the `SystemPrompt` used for this run
- `duration_seconds` — wall-clock run time
- `agent_id`, `task_id` — identifiers
- Computed fields: `termination_reason`, `total_turns`, `total_cost_usd`, `is_success`
- Computed fields: `termination_reason`, `total_turns`, `total_cost_usd`, `is_success`, `completion_summary`

### 6.6 Agent Crash Recovery

Expand Down Expand Up @@ -1463,12 +1464,15 @@ Every LLM provider call is tracked with comprehensive metadata for financial rep

Every completion call produces a `CompletionResponse` with `TokenUsage` (token counts and cost). The engine layer creates a `CostRecord` (with agent/task context) and records it into `CostTracker` — the provider itself does not have agent/task context. In M3, the engine additionally logs **proxy overhead metrics** at task completion:

- `turns_per_task` — number of LLM turns to complete the task (from `AgentContext.turn_count`)
- `turns_per_task` — number of LLM turns to complete the task (from `AgentRunResult.total_turns`)
- `tokens_per_task` — total tokens consumed (from `AgentContext.accumulated_cost.total_tokens`)
- `cost_per_task` — total USD cost (from `TaskExecution.accumulated_cost.cost_usd`)
- `cost_per_task` — total USD cost (from `AgentContext.accumulated_cost.cost_usd` via `AgentRunResult.total_cost_usd`)
- `duration_seconds` — wall-clock execution time in seconds (from `AgentRunResult.duration_seconds`)

These are natural overhead indicators — a task consuming 15 turns and 50k tokens for a one-line fix signals a problem.

These metrics are captured in `TaskCompletionMetrics` (in `engine/metrics.py`), a frozen Pydantic model with a `from_run_result()` factory method. The engine logs these metrics at task completion via the `EXECUTION_ENGINE_TASK_METRICS` event.

#### M4: Call Categorization + Orchestration Ratio

When multi-agent coordination exists, each `CostRecord` is tagged with a **call category**:
Expand Down Expand Up @@ -2153,6 +2157,7 @@ ai-company/
│ │ ├── task_execution.py # TaskExecution + StatusTransition
│ │ ├── context.py # AgentContext + AgentContextSnapshot
│ │ ├── loop_protocol.py # ExecutionLoop protocol + result models
│ │ ├── metrics.py # TaskCompletionMetrics proxy overhead model
│ │ ├── react_loop.py # ReAct loop implementation
│ │ ├── run_result.py # AgentRunResult outcome model
│ │ ├── agent_engine.py # Agent execution engine
Expand Down
2 changes: 2 additions & 0 deletions src/ai_company/engine/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@
TerminationReason,
TurnRecord,
)
from ai_company.engine.metrics import TaskCompletionMetrics
from ai_company.engine.prompt import (
DefaultTokenEstimator,
PromptTokenEstimator,
Expand Down Expand Up @@ -58,6 +59,7 @@
"ReactLoop",
"StatusTransition",
"SystemPrompt",
"TaskCompletionMetrics",
"TaskExecution",
"TerminationReason",
"TurnRecord",
Expand Down
Loading