Merged
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -66,7 +66,7 @@ src/ai_company/
- **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
- **Type hints**: all public functions, mypy strict mode
- **Docstrings**: Google style, required on public classes/functions (enforced by ruff D rules)
- **Immutability**: create new objects, never mutate existing ones. For `dict`/`list` fields in frozen Pydantic models, use `MappingProxyType` wrapping at construction (not `deepcopy` on access). Deep-copy only at system boundaries (e.g. passing data to `tool.execute()`, serializing for persistence).
- **Immutability**: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement. For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` for field reassignment prevention and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
- **Config vs runtime state**: frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves (e.g. agent execution state, task progress). Never mix static config fields with mutable runtime fields in one model.
- **Models**: Pydantic v2 (`BaseModel`, `model_validator`, `ConfigDict`). Planned conventions for new code: use `@computed_field` for derived values instead of storing + validating redundant fields; use `NotBlankStr` (from `core.types`) for non-optional identifier/name fields instead of manual whitespace validators. Existing models are being migrated incrementally.
- **Async concurrency**: prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.
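
The immutability rule for non-Pydantic internal collections can be sketched as follows. This is a minimal illustration, not the project's `ToolRegistry`: the `Registry` class and its `entries` property are hypothetical names, and only `copy.deepcopy()` and `MappingProxyType` come from the convention above.

```python
import copy
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any


class Registry:
    """Hypothetical registry: deep-copy at construction, expose a read-only view."""

    def __init__(self, entries: Mapping[str, dict[str, Any]]) -> None:
        # Deep-copy first so later changes to the caller's objects cannot leak in,
        # then wrap the copy so top-level item assignment is rejected.
        self._entries: Mapping[str, dict[str, Any]] = MappingProxyType(
            copy.deepcopy(dict(entries))
        )

    @property
    def entries(self) -> Mapping[str, dict[str, Any]]:
        return self._entries


source = {"echo": {"timeout": 5}}
registry = Registry(source)
source["echo"]["timeout"] = 99  # the caller mutates its own data afterwards
```

The registry still reports the original timeout after the caller's mutation, and assigning a new key through `registry.entries` raises `TypeError`.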
14 changes: 7 additions & 7 deletions DESIGN_SPEC.md
@@ -915,7 +915,7 @@ budget:

When the LLM requests multiple tool calls in a single turn, `ToolInvoker.invoke_all` currently executes them **sequentially**. Migration to `asyncio.TaskGroup` for parallel structured concurrency is planned (see §15.5). Recoverable errors are captured as `ToolResult(is_error=True)` without aborting remaining invocations; non-recoverable errors (`MemoryError`, `RecursionError`) propagate immediately and abort the sequence.

Tool parameter schemas (`parameters_schema`) are currently exposed via `deepcopy` on each property access (construction also deep-copies). `MappingProxyType` wrapping is used in the `ToolRegistry` for its internal collections. Migrating `BaseTool.parameters_schema` to `MappingProxyType` at construction (removing per-access `deepcopy`) is a planned convention (see §15.5).
`BaseTool.parameters_schema` deep-copies the caller-supplied schema at construction and wraps it in `MappingProxyType` for read-only enforcement; the property returns a deep copy on access to prevent mutation of internal state. `ToolInvoker` deep-copies arguments at the tool execution boundary before passing them to `tool.execute()`. `MappingProxyType` wrapping is also used in `ToolRegistry` for its internal collections.

### 11.2 Tool Access Levels

@@ -1302,11 +1302,11 @@ ai-company/
│ │ ├── role.py # Role model
│ │ └── role_catalog.py # Role catalog
│ ├── engine/ # Core engines (M3+)
│ │ ├── errors.py # Engine error hierarchy (M3)
│ │ ├── prompt.py # System prompt builder (M3)
│ │ ├── prompt_template.py # System prompt Jinja2 templates (M3)
│ │ ├── task_execution.py # TaskExecution + StatusTransition (M3)
│ │ ├── context.py # AgentContext + AgentContextSnapshot (M3)
│ │ ├── errors.py # Engine error hierarchy
│ │ ├── prompt.py # System prompt builder
│ │ ├── prompt_template.py # System prompt Jinja2 templates
│ │ ├── task_execution.py # TaskExecution + StatusTransition
│ │ ├── context.py # AgentContext + AgentContextSnapshot
│ │ ├── agent_engine.py # Agent execution loop (M3)
│ │ ├── task_engine.py # Task routing & scheduling (M3-M4)
│ │ ├── workflow_engine.py # Workflow orchestration (M4)
@@ -1436,7 +1436,7 @@ These conventions were established during the M0–M2 review cycle. **Adopted**

| Convention | Status | Decision | Rationale |
|------------|--------|----------|-----------|
| **Immutability strategy** | Adopted | `MappingProxyType` at construction for dict fields in registries and collections; `frozen=True` on all config/identity models | MappingProxyType is O(1) and prevents accidental mutation. Pydantic `frozen=True` is confirmed shallow (pydantic#7784). |
| **Immutability strategy** | Adopted | `copy.deepcopy()` at construction + `MappingProxyType` wrapping for non-Pydantic internal collections (registries, `BaseTool`). For Pydantic frozen models: `frozen=True` prevents field reassignment; `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization) prevents nested mutation. No MappingProxyType inside Pydantic models (serialization friction). | Deep-copy at construction fully isolates nested structures; `MappingProxyType` enforces read-only access. Boundary-copy for Pydantic models is simple, centralized, and Pydantic-native. A future CPython built-in immutable mapping type (e.g. `frozendict`) would provide zero-friction field-level immutability when available. |
| **Config vs runtime split** | Adopted (M3) | Frozen models for config/identity; `model_copy(update=...)` for runtime state transitions | `TaskExecution` and `AgentContext` (in `engine/`) are frozen Pydantic models that use `model_copy(update=...)` for copy-on-write state transitions without re-running validators (per Pydantic `model_copy` semantics). Config layer (`AgentIdentity`, `Task`) remains unchanged. |
| **Derived fields** | Planned | `@computed_field` instead of stored + validated | Eliminates redundant storage and impossible-to-fail validators (e.g. `total_tokens = input + output`). Currently `total_tokens` uses stored `Field` + `@model_validator`. |
| **String validation** | Planned | `NotBlankStr` type from `core.types` for all identifiers | Eliminates per-model `@model_validator` boilerplate for whitespace checks. `NotBlankStr` is defined but models still use `Field(min_length=1)` + manual validators. |
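
The "Immutability strategy" and "Config vs runtime split" rows above can be illustrated without Pydantic. The sketch substitutes a stdlib frozen dataclass for a frozen Pydantic model and `dataclasses.replace` for `model_copy(update=...)`. That substitution is an assumption made to keep the example dependency-free, and `ExecutionState` is a hypothetical name.

```python
import copy
from dataclasses import FrozenInstanceError, dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class ExecutionState:
    """Hypothetical runtime-state record (stand-in for a frozen Pydantic model)."""

    status: str
    metadata: dict[str, Any] = field(default_factory=dict)


state = ExecutionState(status="pending", metadata={"tags": ["a"]})

# Freezing blocks field reassignment...
try:
    state.status = "running"  # type: ignore[misc]
    reassigned = True
except FrozenInstanceError:
    reassigned = False

# ...but it is shallow: nested containers can still be mutated in place.
state.metadata["tags"].append("b")

# Copy-on-write transition: a new object is created, the old one keeps its status.
running = replace(state, status="running")

# Boundary copy: hand external code a fully independent structure.
outbound = copy.deepcopy(state.metadata)
outbound["tags"].append("external")
```

Note that `running` still shares its `metadata` dict with `state`, which is why the deep copy is taken at the boundary rather than relied on at the transition.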
2 changes: 1 addition & 1 deletion src/ai_company/engine/prompt.py
@@ -77,7 +77,7 @@ class SystemPrompt(BaseModel):
        description="Names of sections included in the prompt",
    )
    metadata: dict[str, str] = Field(
        description="Agent identity metadata (treat as read-only)",
        description="Agent identity metadata (string-only values; shallow-frozen)",
    )


1 change: 1 addition & 0 deletions src/ai_company/observability/events.py
@@ -120,6 +120,7 @@
TOOL_INVOKE_PARAMETER_ERROR: Final[str] = "tool.invoke.parameter_error"
TOOL_INVOKE_SCHEMA_ERROR: Final[str] = "tool.invoke.schema_error"
TOOL_INVOKE_EXECUTION_ERROR: Final[str] = "tool.invoke.execution_error"
TOOL_INVOKE_DEEPCOPY_ERROR: Final[str] = "tool.invoke.deepcopy_error"
TOOL_INVOKE_NON_RECOVERABLE: Final[str] = "tool.invoke.non_recoverable"
TOOL_INVOKE_VALIDATION_UNEXPECTED: Final[str] = "tool.invoke.validation_unexpected"
TOOL_BASE_INVALID_NAME: Final[str] = "tool.base.invalid_name"
21 changes: 13 additions & 8 deletions src/ai_company/providers/models.py
@@ -81,10 +81,14 @@ class ToolDefinition(BaseModel):
    provider (OpenAI, Anthropic, LiteLLM) consumes it natively.

    Note:
        The ``parameters_schema`` dict is shallowly immutable under the
        frozen model — reassignment is prevented but contents can still
        be mutated. Callers should treat it as read-only or copy before
        modifying.
        The ``parameters_schema`` dict is shallowly frozen by Pydantic's
        ``frozen=True`` — field reassignment is prevented but nested
        contents can still be mutated in place. ``BaseTool.to_definition()``
        provides a deep-copied schema, and ``ToolInvoker`` deep-copies
        arguments at the execution boundary, so no additional caller-side
        copying is needed for standard tool/provider workflows. Direct
        consumers outside these paths should deep-copy if they intend to
        modify the schema. See DESIGN_SPEC.md section 15.5.

    Attributes:
        name: Tool name (must be non-blank).
@@ -106,10 +110,11 @@ class ToolCall(BaseModel):
    """A tool invocation requested by the model.

    Note:
        The ``arguments`` dict is shallowly immutable under the frozen
        model — reassignment is prevented but contents can still be
        mutated. Callers should treat it as read-only or copy before
        modifying.
        The ``arguments`` dict is shallowly frozen by Pydantic's
        ``frozen=True`` — field reassignment is prevented but nested
        contents can still be mutated in place. The ``ToolInvoker``
        deep-copies arguments before passing them to tool
        implementations. See DESIGN_SPEC.md section 15.5.

    Attributes:
        id: Provider-assigned tool call identifier.
26 changes: 18 additions & 8 deletions src/ai_company/tools/base.py
@@ -6,6 +6,7 @@

import copy
from abc import ABC, abstractmethod
from types import MappingProxyType
from typing import Any

from pydantic import BaseModel, ConfigDict, Field
@@ -26,9 +27,12 @@ class ToolExecutionResult(BaseModel):
    to the LLM and is available only for programmatic consumers.

    Note:
        The ``metadata`` dict is shallowly immutable under the frozen
        model — reassignment is prevented but contents can still be
        mutated. Callers should treat it as read-only.
        The ``metadata`` dict is shallowly frozen by Pydantic's
        ``frozen=True``. Tool implementations construct and return
        this model, but the invoker converts it into a provider-facing
        ``ToolResult`` — ``metadata`` is not forwarded to LLM providers
        or other external boundaries, so no additional boundary copy
        is needed at this layer.

    Attributes:
        content: Tool output as a string.
@@ -57,7 +61,8 @@ class BaseTool(ABC):
        name: Non-blank tool name.
        description: Human-readable description of the tool.
        parameters_schema: JSON Schema dict describing expected arguments,
            or ``None`` if the tool accepts any arguments.
            or ``None`` if no parameter schema is defined (the invoker
            skips validation).
    """

    def __init__(
@@ -83,8 +88,10 @@ def __init__(
            raise ValueError(msg)
        self._name = name
        self._description = description
        self._parameters_schema: dict[str, Any] | None = (
            copy.deepcopy(parameters_schema) if parameters_schema is not None else None
        self._parameters_schema: MappingProxyType[str, Any] | None = (
            MappingProxyType(copy.deepcopy(parameters_schema))
            if parameters_schema is not None
            else None
        )

    @property
@@ -101,9 +108,12 @@ def description(self) -> str:
    def parameters_schema(self) -> dict[str, Any] | None:
        """JSON Schema for tool parameters, or None if unspecified.

        Returns a deep copy to prevent mutation of the internal schema.
        Returns a deep copy to prevent mutation of internal state.
        """
        return copy.deepcopy(self._parameters_schema)
        if self._parameters_schema is None:
            return None
        # dict() needed: MappingProxyType cannot be deep-copied directly
        return copy.deepcopy(dict(self._parameters_schema))
Comment on lines +113 to +116

medium

This implementation can be simplified. copy.deepcopy() correctly handles both MappingProxyType and None values, so the explicit check for None and the cast to dict() are unnecessary. A single line would be more concise and achieve the same result.

        return copy.deepcopy(self._parameters_schema)

Comment on lines +115 to +116

**Misleading comment — `MappingProxyType` *can* be deep-copied**

The inline comment `# dict() needed: MappingProxyType cannot be deep-copied directly` is inaccurate. In Python 3, `copy.deepcopy(MappingProxyType(…))` is well-defined and returns a new `MappingProxyType` wrapping a deep-copied dict. The real reason `dict()` is needed here is to satisfy the `dict[str, Any]` return-type annotation — a `deepcopy` of a `MappingProxyType` would return another `MappingProxyType`, which mismatches the declared return type.

```suggestion
        # Convert to dict first: deepcopy of MappingProxyType returns a
        # MappingProxyType, but the public API declares dict[str, Any].
        return copy.deepcopy(dict(self._parameters_schema))
```
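
Whether `copy.deepcopy()` accepts a `MappingProxyType` directly is easy to check before applying either suggestion. On the CPython versions I am aware of it raises `TypeError` (mappingproxy objects cannot be pickled), which would make the `dict()` conversion necessary for more than the return type. Treat that as an assumption to verify on the target interpreter. The helper names below are hypothetical.

```python
from __future__ import annotations

import copy
from types import MappingProxyType
from typing import Any


def direct_deepcopy_works(proxy: MappingProxyType[str, Any]) -> bool:
    """Report whether this interpreter can deep-copy a mappingproxy as-is."""
    try:
        copy.deepcopy(proxy)
    except TypeError:
        return False
    return True


def snapshot(schema: MappingProxyType[str, Any] | None) -> dict[str, Any] | None:
    """Return an independent plain-dict copy of a read-only schema view."""
    if schema is None:
        return None
    # Converting to dict first works regardless of the answer above and
    # matches a declared dict[str, Any] return type.
    return copy.deepcopy(dict(schema))


internal = MappingProxyType(
    {"type": "object", "properties": {"x": {"type": "string"}}}
)
print("direct deepcopy supported:", direct_deepcopy_works(internal))
copied = snapshot(internal)
```

If the check prints `False`, collapsing the property to a single `copy.deepcopy(self._parameters_schema)` call would fail at runtime for any tool that defines a schema.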



    def to_definition(self) -> ToolDefinition:
        """Convert this tool to a ``ToolDefinition`` for LLM providers.
48 changes: 39 additions & 9 deletions src/ai_company/tools/invoker.py
@@ -1,15 +1,13 @@
"""Tool invoker — validates and executes tool calls.

Bridges LLM ``ToolCall`` objects with concrete ``BaseTool.execute``
methods. Never propagates exceptions — always returns a ``ToolResult``.

Note:
    ``BaseException`` subclasses (``KeyboardInterrupt``, ``SystemExit``,
    ``asyncio.CancelledError``) are NOT caught and will propagate
    normally. Non-recoverable errors (``MemoryError``,
    ``RecursionError``) are re-raised after logging.
methods. Recoverable errors are returned as ``ToolResult(is_error=True)``;
non-recoverable errors (``MemoryError``, ``RecursionError``) and
``BaseException`` subclasses (``KeyboardInterrupt``, ``SystemExit``,
``asyncio.CancelledError``) propagate after logging.
"""

import copy
from typing import TYPE_CHECKING

import jsonschema
@@ -19,6 +17,7 @@

from ai_company.observability import get_logger
from ai_company.observability.events import (
    TOOL_INVOKE_DEEPCOPY_ERROR,
    TOOL_INVOKE_EXECUTION_ERROR,
    TOOL_INVOKE_NON_RECOVERABLE,
    TOOL_INVOKE_NOT_FOUND,
@@ -234,9 +233,40 @@ async def _execute_tool(
        tool: BaseTool,
        tool_call: ToolCall,
    ) -> ToolExecutionResult | ToolResult:
        """Execute the tool, catching errors as ``ToolResult``."""
        """Deep-copy arguments for isolation, then execute the tool.

        Copy failures and execution errors are caught and returned as
        ``ToolResult(is_error=True)``. Non-recoverable errors
        (``MemoryError``, ``RecursionError``) propagate after logging.
        """
        try:
            safe_args = copy.deepcopy(tool_call.arguments)
        except (MemoryError, RecursionError) as exc:
            logger.exception(
                TOOL_INVOKE_NON_RECOVERABLE,
                tool_call_id=tool_call.id,
                tool_name=tool_call.name,
                error=f"{type(exc).__name__}: {exc}",
            )
            raise
        except Exception as exc:
            error_msg = str(exc) or f"{type(exc).__name__} (no message)"
            logger.exception(
                TOOL_INVOKE_DEEPCOPY_ERROR,
                tool_call_id=tool_call.id,
                tool_name=tool_call.name,
                error=f"Failed to deep-copy arguments: {error_msg}",
            )
Comment thread
greptile-apps[bot] marked this conversation as resolved.
Comment on lines +254 to +259

Copilot AI Mar 6, 2026


TOOL_INVOKE_PARAMETER_ERROR is now used for two distinct failure modes: JSON Schema validation failures (_param_error_result, logged as warning) and argument deepcopy failures here (logged as exception). This will make observability/alerting ambiguous. Consider introducing a dedicated event constant for argument copy failures (or reusing a more appropriate existing event) so dashboards can distinguish validation errors from internal boundary-copy failures.

            return ToolResult(
                tool_call_id=tool_call.id,
                content=(
                    f"Tool {tool_call.name!r} arguments could not be "
                    f"safely copied: {error_msg}"
                ),
                is_error=True,
            )
        try:
            return await tool.execute(arguments=dict(tool_call.arguments))
            return await tool.execute(arguments=safe_args)
        except (MemoryError, RecursionError) as exc:
            logger.exception(
                TOOL_INVOKE_NON_RECOVERABLE,
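
The switch from `dict(tool_call.arguments)` to a deep copy matters because `dict()` copies only the top level. A minimal sketch of the difference, using a hypothetical `mutate` function in place of a tool's `execute` method:

```python
import copy
from typing import Any


def mutate(arguments: dict[str, Any]) -> None:
    """Hypothetical tool body that modifies whatever it receives."""
    arguments["injected"] = True
    arguments["nested"]["mutated"] = True


original: dict[str, Any] = {"nested": {}}

# Shallow copy: the new top-level key stays local, but the nested dict is shared.
mutate(dict(original))
shallow_leak = "mutated" in original["nested"]

# Deep copy: nothing the callee does is visible to the caller.
isolated: dict[str, Any] = {"nested": {}}
mutate(copy.deepcopy(isolated))
```

After both calls, `original["nested"]` carries the callee's `mutated` key while `isolated` is unchanged, which is the leak the boundary copy closes.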
27 changes: 27 additions & 0 deletions tests/unit/tools/conftest.py
@@ -185,6 +185,32 @@ async def execute(
        raise ValueError(msg)


class _MutatingTool(BaseTool):
    """Tool that mutates its arguments to test boundary isolation."""

    def __init__(self) -> None:
        super().__init__(
            name="mutating",
            description="Mutates args",
            parameters_schema={
                "type": "object",
                "properties": {
                    "nested": {"type": "object"},
                },
            },
        )

    async def execute(
        self,
        *,
        arguments: dict[str, Any],
    ) -> ToolExecutionResult:
        arguments["injected"] = True
        if "nested" in arguments:
            arguments["nested"]["mutated"] = True
        return ToolExecutionResult(content="mutated")


class _RemoteRefTool(BaseTool):
    """Tool with a remote ``$ref`` in its schema (for SSRF testing)."""

@@ -278,5 +304,6 @@ def extended_invoker() -> ToolInvoker:
        _InvalidSchemaTool(),
        _EmptyErrorTool(),
        _RemoteRefTool(),
        _MutatingTool(),
    ]
    return ToolInvoker(ToolRegistry(tools))
35 changes: 32 additions & 3 deletions tests/unit/tools/test_base.py
@@ -98,13 +98,29 @@ def test_default_schema_none(self) -> None:
        tool = _ConcreteTool(name="t")
        assert tool.parameters_schema is None

    def test_schema_deep_copied_on_construction(self) -> None:
    def test_schema_isolated_on_construction(self) -> None:
        props: dict[str, Any] = {"x": {"type": "string"}}
        schema: dict[str, Any] = {"type": "object", "properties": props}
        tool = _ConcreteTool(name="t", parameters_schema=schema)
        props["y"] = {"type": "integer"}
        schema["injected"] = True
        assert tool.parameters_schema is not None
        assert "y" not in tool.parameters_schema["properties"]
        assert "injected" not in tool.parameters_schema

    def test_schema_nested_isolated_on_construction(self) -> None:
        schema: dict[str, Any] = {
            "type": "object",
            "properties": {"x": {"type": "string"}},
        }
        tool = _ConcreteTool(name="t", parameters_schema=schema)
        schema["properties"]["x"]["type"] = "integer"
        assert tool.parameters_schema is not None
        assert tool.parameters_schema["properties"]["x"]["type"] == "string"

    def test_schema_internal_is_read_only(self) -> None:
        schema = {"type": "object", "properties": {"x": {"type": "string"}}}
        tool = _ConcreteTool(name="t", parameters_schema=schema)
        with pytest.raises(TypeError):
            tool._parameters_schema["injected"] = True  # type: ignore[index]
Comment on lines +119 to +123

**`test_schema_internal_is_read_only` only covers top-level mutation**

The test correctly asserts that `tool._parameters_schema["injected"] = True` raises `TypeError` (top-level `MappingProxyType` enforcement). However, it does not cover the nested mutability gap: since `_parameters_schema["properties"]` is a plain `dict`, the following silently succeeds and corrupts internal state:

```python
tool._parameters_schema["properties"]["x"]["type"] = "integer"  # No TypeError raised!
```

Given the PR's stated goal of strengthening immutability and the explicit `test_schema_nested_isolated_on_construction` / `test_schema_property_nested_mutation_isolated` additions, adding a clarifying comment would make the documented guarantees clear and prevent future confusion:

```python
def test_schema_internal_is_read_only(self) -> None:
    schema = {"type": "object", "properties": {"x": {"type": "string"}}}
    tool = _ConcreteTool(name="t", parameters_schema=schema)
    with pytest.raises(TypeError):
        tool._parameters_schema["injected"] = True  # type: ignore[index]
    # Note: MappingProxyType provides only top-level protection. Nested dicts
    # in _parameters_schema remain mutable, but direct access is discouraged
    # since it's a private attribute. Public isolation is guaranteed by the
    # parameters_schema property, which returns a deep copy.
```
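
The gap described in this comment can be reproduced in a few lines. The sketch uses a plain `MappingProxyType` rather than `BaseTool`, so the variable names are illustrative only.

```python
import copy
from types import MappingProxyType

internal = MappingProxyType(
    {"type": "object", "properties": {"x": {"type": "string"}}}
)

# Top-level writes through the proxy are rejected.
try:
    internal["injected"] = True  # type: ignore[index]
    top_level_blocked = False
except TypeError:
    top_level_blocked = True

# Nested values are ordinary dicts, so this succeeds silently.
internal["properties"]["x"]["type"] = "integer"

# A deep copy of the underlying data is what isolates public callers.
public_view = copy.deepcopy(dict(internal))
public_view["properties"]["x"]["type"] = "number"
```

The proxy blocks the first write but not the second, and changes to `public_view` never reach `internal`, which matches the guarantee the `parameters_schema` property is meant to provide.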



    def test_schema_property_returns_copy(self) -> None:
        schema: dict[str, Any] = {
@@ -118,6 +134,19 @@ def test_schema_property_returns_copy(self) -> None:
        assert tool.parameters_schema is not None
        assert "injected" not in tool.parameters_schema

    def test_schema_property_nested_mutation_isolated(self) -> None:
        schema: dict[str, Any] = {
            "type": "object",
            "properties": {"x": {"type": "string"}},
        }
        tool = _ConcreteTool(name="t", parameters_schema=schema)
        returned = tool.parameters_schema
        assert returned is not None
        returned["properties"]["x"]["type"] = "integer"
        fresh = tool.parameters_schema
        assert fresh is not None
        assert fresh["properties"]["x"]["type"] == "string"

    def test_to_definition(self) -> None:
        schema = {"type": "object", "properties": {"x": {"type": "string"}}}
        tool = _ConcreteTool(