diff --git a/CLAUDE.md b/CLAUDE.md index bc346a3c8c..77aa783ee5 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -66,7 +66,7 @@ src/ai_company/ - **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14 - **Type hints**: all public functions, mypy strict mode - **Docstrings**: Google style, required on public classes/functions (enforced by ruff D rules) -- **Immutability**: create new objects, never mutate existing ones. For `dict`/`list` fields in frozen Pydantic models, use `MappingProxyType` wrapping at construction (not `deepcopy` on access). Deep-copy only at system boundaries (e.g. passing data to `tool.execute()`, serializing for persistence). +- **Immutability**: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement. For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` for field reassignment prevention and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence). - **Config vs runtime state**: frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves (e.g. agent execution state, task progress). Never mix static config fields with mutable runtime fields in one model. - **Models**: Pydantic v2 (`BaseModel`, `model_validator`, `ConfigDict`). Planned conventions for new code: use `@computed_field` for derived values instead of storing + validating redundant fields; use `NotBlankStr` (from `core.types`) for non-optional identifier/name fields instead of manual whitespace validators. Existing models are being migrated incrementally. - **Async concurrency**: prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally. diff --git a/DESIGN_SPEC.md b/DESIGN_SPEC.md index 7b0dc7f0da..2e050fcf1e 100644 --- a/DESIGN_SPEC.md +++ b/DESIGN_SPEC.md @@ -915,7 +915,7 @@ budget: When the LLM requests multiple tool calls in a single turn, `ToolInvoker.invoke_all` currently executes them **sequentially**. Migration to `asyncio.TaskGroup` for parallel structured concurrency is planned (see §15.5). Recoverable errors are captured as `ToolResult(is_error=True)` without aborting remaining invocations; non-recoverable errors (`MemoryError`, `RecursionError`) propagate immediately and abort the sequence. -Tool parameter schemas (`parameters_schema`) are currently exposed via `deepcopy` on each property access (construction also deep-copies). `MappingProxyType` wrapping is used in the `ToolRegistry` for its internal collections. Migrating `BaseTool.parameters_schema` to `MappingProxyType` at construction (removing per-access `deepcopy`) is a planned convention (see §15.5). +`BaseTool.parameters_schema` deep-copies the caller-supplied schema at construction and wraps it in `MappingProxyType` for read-only enforcement; the property returns a deep copy on access to prevent mutation of internal state. `ToolInvoker` deep-copies arguments at the tool execution boundary before passing them to `tool.execute()`. `MappingProxyType` wrapping is also used in `ToolRegistry` for its internal collections. ### 11.2 Tool Access Levels @@ -1302,11 +1302,11 @@ ai-company/ │ │ ├── role.py # Role model │ │ └── role_catalog.py # Role catalog │ ├── engine/ # Core engines (M3+) -│ │ ├── errors.py # Engine error hierarchy (M3) -│ │ ├── prompt.py # System prompt builder (M3) -│ │ ├── prompt_template.py # System prompt Jinja2 templates (M3) -│ │ ├── task_execution.py # TaskExecution + StatusTransition (M3) -│ │ ├── context.py # AgentContext + AgentContextSnapshot (M3) +│ │ ├── errors.py # Engine error hierarchy +│ │ ├── prompt.py # System prompt builder +│ │ ├── prompt_template.py # System prompt Jinja2 templates +│ │ ├── task_execution.py # TaskExecution + StatusTransition +│ │ ├── context.py # AgentContext + AgentContextSnapshot │ │ ├── agent_engine.py # Agent execution loop (M3) │ │ ├── task_engine.py # Task routing & scheduling (M3-M4) │ │ ├── workflow_engine.py # Workflow orchestration (M4) @@ -1436,7 +1436,7 @@ These conventions were established during the M0–M2 review cycle. **Adopted** | Convention | Status | Decision | Rationale | |------------|--------|----------|-----------| -| **Immutability strategy** | Adopted | `MappingProxyType` at construction for dict fields in registries and collections; `frozen=True` on all config/identity models | MappingProxyType is O(1) and prevents accidental mutation. Pydantic `frozen=True` is confirmed shallow (pydantic#7784). | +| **Immutability strategy** | Adopted | `copy.deepcopy()` at construction + `MappingProxyType` wrapping for non-Pydantic internal collections (registries, `BaseTool`). For Pydantic frozen models: `frozen=True` prevents field reassignment; `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization) prevents nested mutation. No MappingProxyType inside Pydantic models (serialization friction). | Deep-copy at construction fully isolates nested structures; `MappingProxyType` enforces read-only access. Boundary-copy for Pydantic models is simple, centralized, and Pydantic-native. A future CPython built-in immutable mapping type (e.g. `frozendict`) would provide zero-friction field-level immutability when available. | | **Config vs runtime split** | Adopted (M3) | Frozen models for config/identity; `model_copy(update=...)` for runtime state transitions | `TaskExecution` and `AgentContext` (in `engine/`) are frozen Pydantic models that use `model_copy(update=...)` for copy-on-write state transitions without re-running validators (per Pydantic `model_copy` semantics). Config layer (`AgentIdentity`, `Task`) remains unchanged. | | **Derived fields** | Planned | `@computed_field` instead of stored + validated | Eliminates redundant storage and impossible-to-fail validators (e.g. `total_tokens = input + output`). Currently `total_tokens` uses stored `Field` + `@model_validator`. | | **String validation** | Planned | `NotBlankStr` type from `core.types` for all identifiers | Eliminates per-model `@model_validator` boilerplate for whitespace checks. `NotBlankStr` is defined but models still use `Field(min_length=1)` + manual validators. | diff --git a/src/ai_company/engine/prompt.py b/src/ai_company/engine/prompt.py index 664c4c682e..94886615ef 100644 --- a/src/ai_company/engine/prompt.py +++ b/src/ai_company/engine/prompt.py @@ -77,7 +77,7 @@ class SystemPrompt(BaseModel): description="Names of sections included in the prompt", ) metadata: dict[str, str] = Field( - description="Agent identity metadata (treat as read-only)", + description="Agent identity metadata (string-only values; shallow-frozen)", ) diff --git a/src/ai_company/observability/events.py b/src/ai_company/observability/events.py index 6489b6d9ba..b71ed87fba 100644 --- a/src/ai_company/observability/events.py +++ b/src/ai_company/observability/events.py @@ -120,6 +120,7 @@ TOOL_INVOKE_PARAMETER_ERROR: Final[str] = "tool.invoke.parameter_error" TOOL_INVOKE_SCHEMA_ERROR: Final[str] = "tool.invoke.schema_error" TOOL_INVOKE_EXECUTION_ERROR: Final[str] = "tool.invoke.execution_error" +TOOL_INVOKE_DEEPCOPY_ERROR: Final[str] = "tool.invoke.deepcopy_error" TOOL_INVOKE_NON_RECOVERABLE: Final[str] = "tool.invoke.non_recoverable" TOOL_INVOKE_VALIDATION_UNEXPECTED: Final[str] = "tool.invoke.validation_unexpected" TOOL_BASE_INVALID_NAME: Final[str] = "tool.base.invalid_name" diff --git a/src/ai_company/providers/models.py b/src/ai_company/providers/models.py index 1d98c7a616..a267769974 100644 --- a/src/ai_company/providers/models.py +++ b/src/ai_company/providers/models.py @@ -81,10 +81,14 @@ class ToolDefinition(BaseModel): provider (OpenAI, Anthropic, LiteLLM) consumes it natively. Note: - The ``parameters_schema`` dict is shallowly immutable under the - frozen model — reassignment is prevented but contents can still - be mutated. Callers should treat it as read-only or copy before - modifying. + The ``parameters_schema`` dict is shallowly frozen by Pydantic's + ``frozen=True`` — field reassignment is prevented but nested + contents can still be mutated in place. ``BaseTool.to_definition()`` + provides a deep-copied schema, and ``ToolInvoker`` deep-copies + arguments at the execution boundary, so no additional caller-side + copying is needed for standard tool/provider workflows. Direct + consumers outside these paths should deep-copy if they intend to + modify the schema. See DESIGN_SPEC.md section 15.5. Attributes: name: Tool name (must be non-blank). @@ -106,10 +110,11 @@ class ToolCall(BaseModel): """A tool invocation requested by the model. Note: - The ``arguments`` dict is shallowly immutable under the frozen - model — reassignment is prevented but contents can still be - mutated. Callers should treat it as read-only or copy before - modifying. + The ``arguments`` dict is shallowly frozen by Pydantic's + ``frozen=True`` — field reassignment is prevented but nested + contents can still be mutated in place. The ``ToolInvoker`` + deep-copies arguments before passing them to tool + implementations. See DESIGN_SPEC.md section 15.5. Attributes: id: Provider-assigned tool call identifier. diff --git a/src/ai_company/tools/base.py b/src/ai_company/tools/base.py index 79d93ab75b..db8551736c 100644 --- a/src/ai_company/tools/base.py +++ b/src/ai_company/tools/base.py @@ -6,6 +6,7 @@ import copy from abc import ABC, abstractmethod +from types import MappingProxyType from typing import Any from pydantic import BaseModel, ConfigDict, Field @@ -26,9 +27,12 @@ class ToolExecutionResult(BaseModel): to the LLM and is available only for programmatic consumers. Note: - The ``metadata`` dict is shallowly immutable under the frozen - model — reassignment is prevented but contents can still be - mutated. Callers should treat it as read-only. + The ``metadata`` dict is shallowly frozen by Pydantic's + ``frozen=True``. Tool implementations construct and return + this model, but the invoker converts it into a provider-facing + ``ToolResult`` — ``metadata`` is not forwarded to LLM providers + or other external boundaries, so no additional boundary copy + is needed at this layer. Attributes: content: Tool output as a string. @@ -57,7 +61,8 @@ class BaseTool(ABC): name: Non-blank tool name. description: Human-readable description of the tool. parameters_schema: JSON Schema dict describing expected arguments, - or ``None`` if the tool accepts any arguments. + or ``None`` if no parameter schema is defined (the invoker + skips validation). """ def __init__( @@ -83,8 +88,10 @@ def __init__( raise ValueError(msg) self._name = name self._description = description - self._parameters_schema: dict[str, Any] | None = ( - copy.deepcopy(parameters_schema) if parameters_schema is not None else None + self._parameters_schema: MappingProxyType[str, Any] | None = ( + MappingProxyType(copy.deepcopy(parameters_schema)) + if parameters_schema is not None + else None ) @property @@ -101,9 +108,12 @@ def description(self) -> str: def parameters_schema(self) -> dict[str, Any] | None: """JSON Schema for tool parameters, or None if unspecified. - Returns a deep copy to prevent mutation of the internal schema. + Returns a deep copy to prevent mutation of internal state. """ - return copy.deepcopy(self._parameters_schema) + if self._parameters_schema is None: + return None + # dict() needed: MappingProxyType cannot be deep-copied directly + return copy.deepcopy(dict(self._parameters_schema)) def to_definition(self) -> ToolDefinition: """Convert this tool to a ``ToolDefinition`` for LLM providers. diff --git a/src/ai_company/tools/invoker.py b/src/ai_company/tools/invoker.py index cfa1324047..3519a2a6e2 100644 --- a/src/ai_company/tools/invoker.py +++ b/src/ai_company/tools/invoker.py @@ -1,15 +1,13 @@ """Tool invoker — validates and executes tool calls. Bridges LLM ``ToolCall`` objects with concrete ``BaseTool.execute`` -methods. Never propagates exceptions — always returns a ``ToolResult``. - -Note: - ``BaseException`` subclasses (``KeyboardInterrupt``, ``SystemExit``, - ``asyncio.CancelledError``) are NOT caught and will propagate - normally. Non-recoverable errors (``MemoryError``, - ``RecursionError``) are re-raised after logging. +methods. Recoverable errors are returned as ``ToolResult(is_error=True)``; +non-recoverable errors (``MemoryError``, ``RecursionError``) and +``BaseException`` subclasses (``KeyboardInterrupt``, ``SystemExit``, +``asyncio.CancelledError``) propagate after logging. """ +import copy from typing import TYPE_CHECKING import jsonschema @@ -19,6 +17,7 @@ from ai_company.observability import get_logger from ai_company.observability.events import ( + TOOL_INVOKE_DEEPCOPY_ERROR, TOOL_INVOKE_EXECUTION_ERROR, TOOL_INVOKE_NON_RECOVERABLE, TOOL_INVOKE_NOT_FOUND, @@ -234,9 +233,40 @@ async def _execute_tool( tool: BaseTool, tool_call: ToolCall, ) -> ToolExecutionResult | ToolResult: - """Execute the tool, catching errors as ``ToolResult``.""" + """Deep-copy arguments for isolation, then execute the tool. + + Copy failures and execution errors are caught and returned as + ``ToolResult(is_error=True)``. Non-recoverable errors + (``MemoryError``, ``RecursionError``) propagate after logging. + """ + try: + safe_args = copy.deepcopy(tool_call.arguments) + except (MemoryError, RecursionError) as exc: + logger.exception( + TOOL_INVOKE_NON_RECOVERABLE, + tool_call_id=tool_call.id, + tool_name=tool_call.name, + error=f"{type(exc).__name__}: {exc}", + ) + raise + except Exception as exc: + error_msg = str(exc) or f"{type(exc).__name__} (no message)" + logger.exception( + TOOL_INVOKE_DEEPCOPY_ERROR, + tool_call_id=tool_call.id, + tool_name=tool_call.name, + error=f"Failed to deep-copy arguments: {error_msg}", + ) + return ToolResult( + tool_call_id=tool_call.id, + content=( + f"Tool {tool_call.name!r} arguments could not be " + f"safely copied: {error_msg}" + ), + is_error=True, + ) try: - return await tool.execute(arguments=dict(tool_call.arguments)) + return await tool.execute(arguments=safe_args) except (MemoryError, RecursionError) as exc: logger.exception( TOOL_INVOKE_NON_RECOVERABLE, diff --git a/tests/unit/tools/conftest.py b/tests/unit/tools/conftest.py index 33a5bfd448..a8b32014f1 100644 --- a/tests/unit/tools/conftest.py +++ b/tests/unit/tools/conftest.py @@ -185,6 +185,32 @@ async def execute( raise ValueError(msg) +class _MutatingTool(BaseTool): + """Tool that mutates its arguments to test boundary isolation.""" + + def __init__(self) -> None: + super().__init__( + name="mutating", + description="Mutates args", + parameters_schema={ + "type": "object", + "properties": { + "nested": {"type": "object"}, + }, + }, + ) + + async def execute( + self, + *, + arguments: dict[str, Any], + ) -> ToolExecutionResult: + arguments["injected"] = True + if "nested" in arguments: + arguments["nested"]["mutated"] = True + return ToolExecutionResult(content="mutated") + + class _RemoteRefTool(BaseTool): """Tool with a remote ``$ref`` in its schema (for SSRF testing).""" @@ -278,5 +304,6 @@ def extended_invoker() -> ToolInvoker: _InvalidSchemaTool(), _EmptyErrorTool(), _RemoteRefTool(), + _MutatingTool(), ] return ToolInvoker(ToolRegistry(tools)) diff --git a/tests/unit/tools/test_base.py b/tests/unit/tools/test_base.py index 0aec649c3c..97a08b77eb 100644 --- a/tests/unit/tools/test_base.py +++ b/tests/unit/tools/test_base.py @@ -98,13 +98,29 @@ def test_default_schema_none(self) -> None: tool = _ConcreteTool(name="t") assert tool.parameters_schema is None - def test_schema_deep_copied_on_construction(self) -> None: + def test_schema_isolated_on_construction(self) -> None: props: dict[str, Any] = {"x": {"type": "string"}} schema: dict[str, Any] = {"type": "object", "properties": props} tool = _ConcreteTool(name="t", parameters_schema=schema) - props["y"] = {"type": "integer"} + schema["injected"] = True assert tool.parameters_schema is not None - assert "y" not in tool.parameters_schema["properties"] + assert "injected" not in tool.parameters_schema + + def test_schema_nested_isolated_on_construction(self) -> None: + schema: dict[str, Any] = { + "type": "object", + "properties": {"x": {"type": "string"}}, + } + tool = _ConcreteTool(name="t", parameters_schema=schema) + schema["properties"]["x"]["type"] = "integer" + assert tool.parameters_schema is not None + assert tool.parameters_schema["properties"]["x"]["type"] == "string" + + def test_schema_internal_is_read_only(self) -> None: + schema = {"type": "object", "properties": {"x": {"type": "string"}}} + tool = _ConcreteTool(name="t", parameters_schema=schema) + with pytest.raises(TypeError): + tool._parameters_schema["injected"] = True # type: ignore[index] def test_schema_property_returns_copy(self) -> None: schema: dict[str, Any] = { @@ -118,6 +134,19 @@ def test_schema_property_returns_copy(self) -> None: assert tool.parameters_schema is not None assert "injected" not in tool.parameters_schema + def test_schema_property_nested_mutation_isolated(self) -> None: + schema: dict[str, Any] = { + "type": "object", + "properties": {"x": {"type": "string"}}, + } + tool = _ConcreteTool(name="t", parameters_schema=schema) + returned = tool.parameters_schema + assert returned is not None + returned["properties"]["x"]["type"] = "integer" + fresh = tool.parameters_schema + assert fresh is not None + assert fresh["properties"]["x"]["type"] == "string" + def test_to_definition(self) -> None: schema = {"type": "object", "properties": {"x": {"type": "string"}}} tool = _ConcreteTool( diff --git a/tests/unit/tools/test_invoker.py b/tests/unit/tools/test_invoker.py index cc6df2ffd2..b1a0bb1446 100644 --- a/tests/unit/tools/test_invoker.py +++ b/tests/unit/tools/test_invoker.py @@ -318,6 +318,103 @@ async def test_remote_ref_does_not_raise( assert isinstance(result, ToolResult) +@pytest.mark.unit +class TestInvokeBoundaryIsolation: + """Tests that tool execution receives isolated argument copies.""" + + async def test_tool_receives_deep_copy_of_arguments( + self, + extended_invoker: ToolInvoker, + ) -> None: + """Nested argument structures are isolated from the frozen model.""" + call = ToolCall( + id="c1", + name="mutating", + arguments={"nested": {"key": "original"}}, + ) + await extended_invoker.invoke(call) + assert call.arguments["nested"]["key"] == "original" + assert "mutated" not in call.arguments.get("nested", {}) + assert "injected" not in call.arguments + + async def test_nested_mutation_does_not_leak( + self, + extended_invoker: ToolInvoker, + ) -> None: + """Tool mutating nested dicts does not affect the original ToolCall.""" + call = ToolCall( + id="c2", + name="mutating", + arguments={"nested": {"value": 42}}, + ) + await extended_invoker.invoke(call) + assert "mutated" not in call.arguments.get("nested", {}) + + +@pytest.mark.unit +class TestInvokeDeepcopyFailure: + """Tests for argument deep-copy failure handling.""" + + async def test_deepcopy_failure_returns_error_result( + self, + extended_invoker: ToolInvoker, + monkeypatch: pytest.MonkeyPatch, + ) -> None: + """When deepcopy of arguments fails, a ToolResult error is returned.""" + import copy as _copy_mod + + real_deepcopy = _copy_mod.deepcopy + call_count = 0 + + def _fail_on_execute(obj: object, memo: object = None) -> object: + nonlocal call_count + call_count += 1 + # First deepcopy call is in _validate_params via + # parameters_schema; let it pass. Fail on the second + # call in _execute_tool. + if call_count > 1: + msg = "cannot copy" + raise TypeError(msg) + return real_deepcopy(obj, memo) # type: ignore[arg-type] + + call = ToolCall(id="c_dc", name="mutating", arguments={"key": "val"}) + monkeypatch.setattr( + "ai_company.tools.invoker.copy.deepcopy", + _fail_on_execute, + ) + result = await extended_invoker.invoke(call) + assert result.is_error is True + assert "safely copied" in result.content + assert result.tool_call_id == "c_dc" + + async def test_recursion_error_during_deepcopy_propagates( + self, + extended_invoker: ToolInvoker, + monkeypatch: pytest.MonkeyPatch, + ) -> None: + """RecursionError during deepcopy is re-raised, not swallowed.""" + import copy as _copy_mod + + real_deepcopy = _copy_mod.deepcopy + call_count = 0 + + def _fail_on_execute(obj: object, memo: object = None) -> object: + nonlocal call_count + call_count += 1 + if call_count > 1: + msg = "maximum recursion depth exceeded" + raise RecursionError(msg) + return real_deepcopy(obj, memo) # type: ignore[arg-type] + + call = ToolCall(id="c_rec", name="mutating", arguments={"key": "val"}) + monkeypatch.setattr( + "ai_company.tools.invoker.copy.deepcopy", + _fail_on_execute, + ) + with pytest.raises(RecursionError, match="maximum recursion depth"): + await extended_invoker.invoke(call) + + @pytest.mark.unit class TestInvokeEmptyErrorMessage: """Tests for empty exception message fallback."""