Aureliolo · Mar 6, 2026
@@ -1473,7 +1473,7 @@ call_analytics:
 
 ### 11.1.1 Tool Execution Model
 
-When the LLM requests multiple tool calls in a single turn, `ToolInvoker.invoke_all` currently executes them **sequentially**. Migration to `asyncio.TaskGroup` for parallel structured concurrency is planned (see §15.5). Recoverable errors are captured as `ToolResult(is_error=True)` without aborting remaining invocations; non-recoverable errors (`MemoryError`, `RecursionError`) propagate immediately and abort the sequence.
+When the LLM requests multiple tool calls in a single turn, `ToolInvoker.invoke_all` executes them **concurrently** using `asyncio.TaskGroup`. An optional `max_concurrency` parameter (default unbounded) limits parallelism via `asyncio.Semaphore`. Recoverable errors are captured as `ToolResult(is_error=True)` without aborting sibling invocations; non-recoverable errors (`MemoryError`, `RecursionError`) are collected and re-raised after all tasks complete (bare exception for one, `ExceptionGroup` for multiple).
 
 `BaseTool.parameters_schema` deep-copies the caller-supplied schema at construction and wraps it in `MappingProxyType` for read-only enforcement; the property returns a deep copy on access to prevent mutation of internal state. `ToolInvoker` deep-copies arguments at the tool execution boundary before passing them to `tool.execute()`. `MappingProxyType` wrapping is also used in `ToolRegistry` for its internal collections.
 
@@ -2156,7 +2156,7 @@ ai-company/
 │       ├── tools/                   # Tool/capability system
 │       │   ├── base.py             # BaseTool ABC, ToolExecutionResult
 │       │   ├── registry.py         # Immutable tool registry (MappingProxyType)
-│       │   ├── invoker.py          # Tool invocation (sequential execution)
+│       │   ├── invoker.py          # Tool invocation (concurrent via TaskGroup)
 │       │   ├── errors.py           # Tool error hierarchy
 │       │   ├── examples/           # Example tool implementations
 │       │   │   └── echo.py        # Echo tool (for testing)
@@ -2245,7 +2245,7 @@ These conventions were established during the M0–M2+ review cycle. **Adopted**
 | **String validation** | Adopted | `NotBlankStr` type from `core.types` for all identifiers | Eliminates per-model `@model_validator` boilerplate for whitespace checks. All identifier/name fields use `NotBlankStr`; optional identifiers use `NotBlankStr \| None`; tuple fields use `tuple[NotBlankStr, ...]` for per-element validation. |
 | **Shared field groups** | Planned | Extract common field sets into base models (e.g. `_SpendingTotals`) | Prevents field duplication across spending summary models. Not yet implemented — each model independently defines fields. |
 | **Event constants** | Adopted (per-domain) | Per-domain submodules under `events/` package (e.g. `events.provider`, `events.budget`). Import directly: `from ai_company.observability.events.<domain> import CONSTANT` | Split by domain for discoverability, co-location with domain logic, and reduced merge conflicts as constants grow. `__init__.py` serves as package marker with usage documentation; no re-exports. |
-| **Parallel tool execution** | Planned | `asyncio.TaskGroup` in `ToolInvoker.invoke_all` | Structured concurrency with proper cancellation semantics. Currently sequential; migration planned for M3 when the agent engine needs concurrent tool calls. |
+| **Parallel tool execution** | Adopted (M2.5) | `asyncio.TaskGroup` in `ToolInvoker.invoke_all` with optional `max_concurrency` semaphore | Structured concurrency with proper cancellation semantics. Fatal errors collected via guarded wrapper and re-raised after all tasks complete. |
 | **Tool sandboxing** | Planned (M3) | Layered `SandboxBackend` protocol: `SubprocessSandbox` for low-risk tools (file, git), `DockerSandbox` for high-risk tools (code_runner, terminal, web, database). `K8sSandbox` planned for future container deployments. | Risk-proportionate isolation. Docker optional — only needed for code execution and network-sensitive tools. Pluggable protocol enables seamless migration to K8s per-agent pods in Phase 3-4. See §11.1.2. |
 | **Crash recovery** | Planned (M3) | Pluggable `RecoveryStrategy` protocol. M3: `FailAndReassignStrategy` (catch at engine boundary, log snapshot, mark FAILED, reassign). M4/M5: `CheckpointStrategy` (persist `AgentContext` per turn, resume from last checkpoint). | Immutable `model_copy` pattern makes checkpoint serialization trivial to add later. Fail-and-reassign is sufficient for short MVP tasks. See §6.6. |
 | **Agent behavior testing** | Planned (M3) | Scripted `FakeProvider` for unit tests (deterministic turn sequences); behavioral outcome assertions for integration tests (task completed, tools called, cost within budget). | Leverages existing `FakeProvider` and `CompletionResponseFactory` fixtures. Precise engine testing without brittle response-matching at integration level. |

@@ -17,3 +17,5 @@
 TOOL_INVOKE_VALIDATION_UNEXPECTED: Final[str] = "tool.invoke.validation_unexpected"
 TOOL_BASE_INVALID_NAME: Final[str] = "tool.base.invalid_name"
 TOOL_REGISTRY_CONTAINS_TYPE_ERROR: Final[str] = "tool.registry.contains_type_error"
+TOOL_INVOKE_ALL_START: Final[str] = "tool.invoke_all.start"
+TOOL_INVOKE_ALL_COMPLETE: Final[str] = "tool.invoke_all.complete"
@@ -2,21 +2,24 @@
 
 Bridges LLM ``ToolCall`` objects with concrete ``BaseTool.execute``
 methods.  Recoverable errors are returned as ``ToolResult(is_error=True)``;
-non-recoverable errors (``MemoryError``, ``RecursionError``) and
-``BaseException`` subclasses (``KeyboardInterrupt``, ``SystemExit``,
-``asyncio.CancelledError``) propagate after logging.
+non-recoverable errors (``MemoryError``, ``RecursionError``) are logged and
+re-raised.  Direct ``BaseException`` subclasses (``KeyboardInterrupt``,
+``SystemExit``, ``asyncio.CancelledError``) propagate uncaught.
 """
 
+import asyncio
 import copy
-from typing import TYPE_CHECKING
+from contextlib import nullcontext
+from typing import TYPE_CHECKING, Never
 
 import jsonschema
 from referencing import Registry as JsonSchemaRegistry
-from referencing import Resource
 from referencing.exceptions import NoSuchResource
 
 from ai_company.observability import get_logger
 from ai_company.observability.events.tool import (
+    TOOL_INVOKE_ALL_COMPLETE,
+    TOOL_INVOKE_ALL_START,
     TOOL_INVOKE_DEEPCOPY_ERROR,
     TOOL_INVOKE_EXECUTION_ERROR,
     TOOL_INVOKE_NON_RECOVERABLE,
@@ -41,7 +44,7 @@
 logger = get_logger(__name__)
 
 
-def _no_remote_retrieve(uri: str) -> Resource:
+def _no_remote_retrieve(uri: str) -> Never:
     """Block remote ``$ref`` resolution to prevent SSRF."""
     raise NoSuchResource(uri)
 
@@ -64,9 +67,13 @@ class ToolInvoker:
             invoker = ToolInvoker(registry)
             result = await invoker.invoke(tool_call)
 
-        Invoke multiple tool calls sequentially::
+        Invoke multiple tool calls concurrently::
 
             results = await invoker.invoke_all(tool_calls)
+
+        Limit concurrency::
+
+            results = await invoker.invoke_all(tool_calls, max_concurrency=3)
     """
 
     def __init__(self, registry: ToolRegistry) -> None:
@@ -172,7 +179,7 @@ def _schema_error_result(
         error_msg: str,
     ) -> ToolResult:
         """Build an error result for an invalid tool schema."""
-        logger.exception(
+        logger.error(
             TOOL_INVOKE_SCHEMA_ERROR,
             tool_call_id=tool_call.id,
             tool_name=tool_call.name,
@@ -318,19 +325,94 @@ def _build_result(
             is_error=result.is_error,
         )
 
+    async def _run_guarded(
+        self,
+        index: int,
+        tool_call: ToolCall,
+        results: dict[int, ToolResult],
+        fatal_errors: list[Exception],
+        semaphore: asyncio.Semaphore | None,
+    ) -> None:
+        """Execute a single tool call, storing fatal errors instead of raising.
+
+        This wrapper ensures that ``MemoryError`` / ``RecursionError`` do not
+        cancel sibling tasks inside a ``TaskGroup``.  ``BaseException``
+        subclasses (``KeyboardInterrupt``, ``CancelledError``) are not
+        intercepted and will cancel the group normally.
+        """
+        try:
+            ctx = semaphore if semaphore is not None else nullcontext()
+            async with ctx:
+                results[index] = await self.invoke(tool_call)
+        except (MemoryError, RecursionError) as exc:
+            fatal_errors.append(exc)
+
     async def invoke_all(
         self,
         tool_calls: Iterable[ToolCall],
+        *,
+        max_concurrency: int | None = None,
     ) -> tuple[ToolResult, ...]:
-        """Execute multiple tool calls sequentially.
+        """Execute multiple tool calls concurrently.
 
         Calls continue through recoverable failures; non-recoverable
-        errors propagate immediately.
+        errors (``MemoryError``, ``RecursionError``) are collected and
+        re-raised after all tasks complete.
 
         Args:
-            tool_calls: Tool calls to execute in order.
+            tool_calls: Tool calls to execute.
+            max_concurrency: Maximum number of concurrent invocations.
+                ``None`` (default) means unbounded.  Must be ``>= 1``
+                if provided.
 
         Returns:
             Tuple of results in the same order as the input.
+
+        Raises:
+            ValueError: If ``max_concurrency`` is less than 1.
+            MemoryError: Re-raised if it was the sole fatal error.
+            RecursionError: Re-raised if it was the sole fatal error.
+            ExceptionGroup: If multiple fatal errors occurred.
         """
-        return tuple([await self.invoke(call) for call in tool_calls])
+        if max_concurrency is not None and max_concurrency < 1:
+            msg = f"max_concurrency must be >= 1, got {max_concurrency}"
+            raise ValueError(msg)
+
+        calls = list(tool_calls)
+        if not calls:
+            return ()
+
+        logger.info(
+            TOOL_INVOKE_ALL_START,
+            count=len(calls),
+            max_concurrency=max_concurrency,
+        )
+
+        # SAFETY: Both ``results`` and ``fatal_errors`` are mutated by
+        # concurrent tasks.  This is safe because asyncio runs tasks on
+        # a single thread — dict assignment and list.append() never race.
+        results: dict[int, ToolResult] = {}
+        fatal_errors: list[Exception] = []
+        semaphore = (
+            asyncio.Semaphore(max_concurrency) if max_concurrency is not None else None
+        )
+
+        async with asyncio.TaskGroup() as tg:
+            for idx, call in enumerate(calls):
+                tg.create_task(
+                    self._run_guarded(idx, call, results, fatal_errors, semaphore),
+                )
+
+        logger.info(
+            TOOL_INVOKE_ALL_COMPLETE,
+            count=len(calls),
+            fatal_count=len(fatal_errors),
+        )
-        logger.info(
-            TOOL_INVOKE_ALL_COMPLETE,
-            count=len(calls),
-            fatal_count=len(fatal_errors),
-        )
+        if fatal_errors:
+            logger.warning(
+                TOOL_INVOKE_ALL_COMPLETE,
+                count=len(calls),
+                fatal_count=len(fatal_errors),
+            )
+        else:
+            logger.info(
+                TOOL_INVOKE_ALL_COMPLETE,
+                count=len(calls),
+                fatal_count=0,
+            )
+
+        if fatal_errors:
+            if len(fatal_errors) == 1:
+                raise fatal_errors[0]
+            msg = "multiple non-recoverable tool errors"
+            raise ExceptionGroup(msg, fatal_errors)
+
+        return tuple(results[i] for i in range(len(calls)))
@@ -1,5 +1,6 @@
 """Unit test fixtures for the tool system."""
 
+import asyncio
 from typing import Any
 
 import pytest
@@ -297,7 +298,9 @@ def sample_tool_call() -> ToolCall:
 
 @pytest.fixture
 def extended_invoker() -> ToolInvoker:
-    """Invoker with additional edge-case tools for advanced tests."""
+    """Invoker with echo, recursion, invalid-schema, empty-error,
+    remote-ref, and mutating tools for edge-case tests.
+    """
     tools = [
         _EchoTestTool(),
         _RecursionTool(),
@@ -307,3 +310,93 @@ def extended_invoker() -> ToolInvoker:
         _MutatingTool(),
     ]
     return ToolInvoker(ToolRegistry(tools))
+
+
+# ── Concurrency test tools ───────────────────────────────────────
+
+
+class _DelayTool(BaseTool):
+    """Sleeps for ``delay`` seconds, then returns ``value``."""
+
+    def __init__(self) -> None:
+        super().__init__(
+            name="delay",
+            description="Sleeps then returns value",
+            parameters_schema={
+                "type": "object",
+                "properties": {
+                    "delay": {"type": "number"},
+                    "value": {"type": "string"},
+                },
+                "required": ["delay", "value"],
+                "additionalProperties": False,
+            },
+        )
+
+    async def execute(
+        self,
+        *,
+        arguments: dict[str, Any],
+    ) -> ToolExecutionResult:
+        await asyncio.sleep(arguments["delay"])
+        return ToolExecutionResult(content=arguments["value"])
+
+
+class _ConcurrencyTrackingTool(BaseTool):
+    """Tracks peak concurrent executions via a lock-guarded counter."""
+
+    def __init__(self) -> None:
+        super().__init__(
+            name="tracking",
+            description="Tracks concurrency",
+            parameters_schema={
+                "type": "object",
+                "properties": {
+                    "duration": {"type": "number"},
+                },
+                "required": ["duration"],
+                "additionalProperties": False,
+            },
+        )
+        self._lock = asyncio.Lock()
+        self._current = 0
+        self._peak = 0
+
+    @property
+    def peak(self) -> int:
+        """Return the peak concurrent execution count."""
+        return self._peak
+
+    async def execute(
+        self,
+        *,
+        arguments: dict[str, Any],
+    ) -> ToolExecutionResult:
+        async with self._lock:
+            self._current += 1
+            self._peak = max(self._peak, self._current)
+        await asyncio.sleep(arguments["duration"])
+        async with self._lock:
+            self._current -= 1
+        return ToolExecutionResult(content=str(self._peak))
+
+
+@pytest.fixture
+def concurrency_tracking_tool() -> _ConcurrencyTrackingTool:
+    """Standalone tracking tool for direct peak inspection."""
+    return _ConcurrencyTrackingTool()
+
+
+@pytest.fixture
+def concurrency_invoker(
+    concurrency_tracking_tool: _ConcurrencyTrackingTool,
+) -> ToolInvoker:
+    """Invoker with echo, failing, delay, tracking, and recursion tools."""
+    tools: list[BaseTool] = [
+        _EchoTestTool(),
+        _FailingTool(),
+        _DelayTool(),
+        concurrency_tracking_tool,
+        _RecursionTool(),
+    ]
+    return ToolInvoker(ToolRegistry(tools))