3 changes: 3 additions & 0 deletions CLAUDE.md
@@ -22,6 +22,7 @@
- Every implementation plan must be **presented to the user** for accept/deny before coding starts
- At **every phase** of planning and implementation, be critical — actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free)
- Surface improvements as suggestions, not silent changes — user decides
- **Prioritize issues by dependency order**, not priority labels — unblocked dependencies come first

## Quick Commands

@@ -112,6 +113,8 @@ src/ai_company/
- **Enforced by**: commitizen (commit-msg hook)
- **Branches**: `<type>/<slug>` from main
- **Pre-commit hooks**: trailing-whitespace, end-of-file-fixer, check-yaml, check-toml, check-json, check-merge-conflict, check-added-large-files, no-commit-to-branch (main), ruff check+format, gitleaks
- **GitHub issue queries**: use `gh issue list` via Bash (not MCP tools) — MCP `list_issues` returns `null` for milestone data
- **PR issue references**: preserve existing `Closes #NNN` references — never remove unless explicitly asked

## Post-Implementation (MANDATORY)

20 changes: 18 additions & 2 deletions DESIGN_SPEC.md
@@ -799,6 +799,20 @@ The agent execution loop defines how an agent processes a task from start to fin

> **MVP: ReAct only (Loop 1).** Plan-and-Execute and Hybrid are M4+. Auto-selection is M4+.

#### ExecutionLoop Protocol

All loop implementations satisfy the `ExecutionLoop` runtime-checkable protocol (defined in `engine/loop_protocol.py`):

- **`get_loop_type() -> str`** — returns a unique identifier (e.g. `"react"`)
- **`execute(...) -> ExecutionResult`** — runs the loop to completion, accepting `AgentContext`, `CompletionProvider`, optional `ToolInvoker`, optional `BudgetChecker`, and optional `CompletionConfig`

Supporting models:

- **`TerminationReason`** — enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `ERROR`
- **`TurnRecord`** — frozen per-turn stats (tokens, cost, tool calls, finish reason)
- **`ExecutionResult`** — frozen outcome with final context, termination reason, turn records, and optional error message (required when reason is `ERROR`)
- **`BudgetChecker`** — callback type `Callable[[AgentContext], bool]` invoked before each LLM call
Comment on lines +811 to +814
⚠️ Potential issue | 🟠 Major

Keep a first-class blocked termination state.

Issue #124 still calls out blocked termination, and later sections in this spec already distinguish parked/resumable work from outright failure. Folding those cases into ERROR loses whether the loop failed versus needs external input and can be resumed; if BLOCKED is intentionally deferred for M3, call that out explicitly instead of redefining the protocol here.

📝 Suggested spec correction
-- **`TerminationReason`** — enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `ERROR`
+- **`TerminationReason`** — enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `BLOCKED`, `ERROR`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESIGN_SPEC.md` around lines 811 - 814, The spec removes a distinct blocked
termination state and folds it into ERROR which loses important semantics;
update the TerminationReason enum to include BLOCKED as a first-class value
(e.g., add `BLOCKED` alongside `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`,
`ERROR`) and adjust ExecutionResult rules so that an error message is required
only when TerminationReason == `ERROR`, while `BLOCKED` is documented as a
resumable/parked state (not an outright failure) and any references to
termination handling or consumers of ExecutionResult/TurnRecord reflect this
distinction; if BLOCKED is intentionally deferred, add a clear note in the spec
stating it will be introduced in M3 rather than collapsing it into ERROR.


#### Loop 1: ReAct (Default for Simple Tasks)

A single interleaved loop: the agent reasons about the current state, selects an action (tool call or response), observes the result, and repeats until done or `max_turns` is reached.
@@ -814,7 +828,7 @@ A single interleaved loop: the agent reasons about the current state, selects an
│ └─────────────────────────┘ │
│ │
│ Terminate when: task complete, max │
-│ turns, budget exhausted, or blocked
+│ turns, budget exhausted, or error
└──────────────────────────────────────────┘
```

@@ -2106,7 +2120,9 @@ ai-company/
│ │ ├── prompt_template.py # System prompt Jinja2 templates
│ │ ├── task_execution.py # TaskExecution + StatusTransition
│ │ ├── context.py # AgentContext + AgentContextSnapshot
-│ │ ├── agent_engine.py # Agent execution loop (M3)
+│ │ ├── loop_protocol.py # ExecutionLoop protocol + result models
+│ │ ├── react_loop.py # ReAct loop implementation
+│ │ ├── agent_engine.py # Agent execution engine (M3)
│ │ ├── task_engine.py # Task routing & scheduling (M3-M4)
│ │ ├── workflow_engine.py # Workflow orchestration (M4)
│ │ ├── meeting_engine.py # Meeting coordination (M4)
20 changes: 19 additions & 1 deletion src/ai_company/engine/__init__.py
@@ -1,7 +1,7 @@
"""Agent execution engine.

Re-exports the public API for system prompt construction,
-runtime execution state, and engine errors.
+runtime execution state, execution loops, and engine errors.
"""

from ai_company.engine.context import (
@@ -10,17 +10,27 @@
    AgentContextSnapshot,
)
from ai_company.engine.errors import (
    BudgetExhaustedError,
    EngineError,
    ExecutionStateError,
    LoopExecutionError,
    MaxTurnsExceededError,
    PromptBuildError,
)
from ai_company.engine.loop_protocol import (
    BudgetChecker,
    ExecutionLoop,
    ExecutionResult,
    TerminationReason,
    TurnRecord,
)
from ai_company.engine.prompt import (
    DefaultTokenEstimator,
    PromptTokenEstimator,
    SystemPrompt,
    build_system_prompt,
)
from ai_company.engine.react_loop import ReactLoop
from ai_company.engine.task_execution import StatusTransition, TaskExecution
from ai_company.providers.models import ZERO_TOKEN_USAGE, add_token_usage

@@ -29,15 +39,23 @@
    "ZERO_TOKEN_USAGE",
    "AgentContext",
    "AgentContextSnapshot",
    "BudgetChecker",
    "BudgetExhaustedError",
    "DefaultTokenEstimator",
    "EngineError",
    "ExecutionLoop",
    "ExecutionResult",
    "ExecutionStateError",
    "LoopExecutionError",
    "MaxTurnsExceededError",
    "PromptBuildError",
    "PromptTokenEstimator",
    "ReactLoop",
    "StatusTransition",
    "SystemPrompt",
    "TaskExecution",
    "TerminationReason",
    "TurnRecord",
    "add_token_usage",
    "build_system_prompt",
]
18 changes: 18 additions & 0 deletions src/ai_company/engine/errors.py
@@ -19,3 +19,21 @@ class MaxTurnsExceededError(EngineError):
    Enforced by ``AgentContext.with_turn_completed`` when the hard turn
    limit has been reached.
    """


class BudgetExhaustedError(EngineError):
    """Budget exhaustion signal for the engine layer.

    The execution loop returns ``TerminationReason.BUDGET_EXHAUSTED``
    internally. This exception is available for the engine layer above
    the loop to convert that result into a raised error when appropriate.
    """


class LoopExecutionError(EngineError):
    """Non-recoverable execution loop error for the engine layer.

    The execution loop returns ``TerminationReason.ERROR`` internally.
    This exception is available for the engine layer above the loop to
    convert that result into a raised error when appropriate.
    """
161 changes: 161 additions & 0 deletions src/ai_company/engine/loop_protocol.py
@@ -0,0 +1,161 @@
"""Execution loop protocol and supporting models.

Defines the ``ExecutionLoop`` protocol that the agent engine calls to
run a task, along with ``ExecutionResult``, ``TurnRecord``,
``TerminationReason``, and the ``BudgetChecker`` type alias.
"""

from collections.abc import Callable
from enum import StrEnum
from typing import TYPE_CHECKING, Any, Protocol, Self, runtime_checkable

from pydantic import BaseModel, ConfigDict, Field, computed_field, model_validator

from ai_company.core.types import NotBlankStr # noqa: TC001
from ai_company.engine.context import AgentContext
from ai_company.providers.enums import FinishReason # noqa: TC001

if TYPE_CHECKING:
    from ai_company.providers.models import CompletionConfig
    from ai_company.providers.protocol import CompletionProvider
    from ai_company.tools.invoker import ToolInvoker


class TerminationReason(StrEnum):
    """Why the execution loop terminated."""

    COMPLETED = "completed"
    MAX_TURNS = "max_turns"
    BUDGET_EXHAUSTED = "budget_exhausted"
    ERROR = "error"


class TurnRecord(BaseModel):
    """Per-turn metadata recorded during execution.

    Attributes:
        turn_number: 1-indexed turn number.
        input_tokens: Input tokens consumed this turn.
        output_tokens: Output tokens generated this turn.
        total_tokens: Sum of input and output tokens (computed).
        cost_usd: Cost in USD for this turn.
        tool_calls_made: Names of tools invoked this turn.
        finish_reason: LLM finish reason for this turn.
    """

    model_config = ConfigDict(frozen=True)

    turn_number: int = Field(gt=0, description="1-indexed turn number")
    input_tokens: int = Field(ge=0, description="Input tokens this turn")
    output_tokens: int = Field(ge=0, description="Output tokens this turn")
    cost_usd: float = Field(ge=0.0, description="Cost in USD this turn")
    tool_calls_made: tuple[NotBlankStr, ...] = Field(
        default=(),
        description="Tool names invoked this turn",
    )
    finish_reason: FinishReason = Field(
        description="LLM finish reason this turn",
    )

    @computed_field(description="Total token count")  # type: ignore[prop-decorator]
    @property
    def total_tokens(self) -> int:
        """Sum of input and output tokens."""
        return self.input_tokens + self.output_tokens


class ExecutionResult(BaseModel):
    """Result returned by an execution loop.

    Attributes:
        context: Final agent context after execution.
        termination_reason: Why the loop stopped.
        turns: Per-turn metadata records.
        total_tool_calls: Total tool calls across all turns (computed).
        error_message: Error description when termination_reason is ERROR.
        metadata: Forward-compatible dict for future loop types.
            Note: ``frozen=True`` prevents field reassignment but not
            in-place mutation of the dict contents; deep-copy at
            system boundaries per project conventions.
    """

    model_config = ConfigDict(frozen=True)

    context: AgentContext = Field(description="Final agent context")
    termination_reason: TerminationReason = Field(
        description="Why the loop stopped",
    )
    turns: tuple[TurnRecord, ...] = Field(
        default=(),
        description="Per-turn metadata",
    )
    error_message: str | None = Field(
        default=None,
        description="Error description (when reason is ERROR)",
    )
    metadata: dict[str, Any] = Field(
        default_factory=dict,
        description="Forward-compatible metadata for future loop types",
    )

    @computed_field(  # type: ignore[prop-decorator]
        description="Total tool calls across all turns",
    )
    @property
    def total_tool_calls(self) -> int:
        """Sum of tool calls from all turn records."""
        return sum(len(t.tool_calls_made) for t in self.turns)

    @model_validator(mode="after")
    def _validate_error_message(self) -> Self:
        if self.termination_reason == TerminationReason.ERROR:
            if self.error_message is None:
                msg = "error_message is required when termination_reason is ERROR"
                raise ValueError(msg)
        elif self.error_message is not None:
            msg = "error_message must be None when termination_reason is not ERROR"
            raise ValueError(msg)
        return self


BudgetChecker = Callable[[AgentContext], bool]
"""Callback that returns ``True`` when the budget is exhausted."""


@runtime_checkable
class ExecutionLoop(Protocol):
    """Protocol for agent execution loops.

    The agent engine calls ``execute`` to run a task through the loop.
    Implementations decide the control flow (ReAct, Plan-and-Execute, etc.)
    but all return an ``ExecutionResult`` with a ``TerminationReason``.
    """

    async def execute(
        self,
        *,
        context: AgentContext,
        provider: CompletionProvider,
        tool_invoker: ToolInvoker | None = None,
        budget_checker: BudgetChecker | None = None,
        completion_config: CompletionConfig | None = None,
    ) -> ExecutionResult:
        """Run the execution loop.

        Args:
            context: Initial agent context with conversation and identity.
            provider: LLM completion provider.
            tool_invoker: Optional tool invoker for tool execution.
            budget_checker: Optional callback; returns ``True`` when
                budget is exhausted.
            completion_config: Optional per-execution override for
                temperature/max_tokens (defaults to identity's model config).

        Returns:
            Execution result with final context and termination reason.
        """
        ...

    def get_loop_type(self) -> str:
        """Return the loop type identifier (e.g. ``"react"``)."""
        ...
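
Because the protocol is `runtime_checkable`, conformance is structural: a class satisfies it by having the right methods, without inheriting from it. The sketch below illustrates this with a trimmed, hypothetical `ExecutionLoopSketch` (the real `execute` takes more parameters) and a toy budget checker that follows the "returns `True` when exhausted" convention. `EchoLoop`, `NotALoop`, and `make_turn_budget` are invented for the example, and a plain dict stands in for `AgentContext`.

```python
import asyncio
from collections.abc import Callable
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ExecutionLoopSketch(Protocol):
    """Trimmed stand-in for ExecutionLoop with a simplified signature."""

    async def execute(self, *, context: Any) -> Any: ...

    def get_loop_type(self) -> str: ...


class EchoLoop:
    """Toy loop: it does not inherit from the protocol."""

    async def execute(self, *, context: Any) -> Any:
        return context

    def get_loop_type(self) -> str:
        return "echo"


class NotALoop:
    """Missing execute(), so it fails the structural check."""

    def get_loop_type(self) -> str:
        return "broken"


def make_turn_budget(limit: int) -> Callable[[dict[str, int]], bool]:
    """Hypothetical checker factory; the result returns True when exhausted."""

    def checker(context: dict[str, int]) -> bool:
        return context["turns"] >= limit

    return checker


# isinstance() on a runtime_checkable protocol only checks that the
# members exist; it does not validate signatures or return types.
assert isinstance(EchoLoop(), ExecutionLoopSketch)
assert not isinstance(NotALoop(), ExecutionLoopSketch)
assert asyncio.run(EchoLoop().execute(context={"turns": 1})) == {"turns": 1}
```

Since the runtime check ignores signatures, a static type checker is still needed to catch an `execute` with the wrong parameters.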