Aureliolo · Mar 7, 2026
@@ -193,6 +193,7 @@ agent:
     type: "persistent"           # persistent, project, session, none
     retention_days: null         # null = forever
   tools:
+    access_level: "standard"  # sandboxed | restricted | standard | elevated | custom
     allowed:
       - file_system
       - git
@@ -1641,6 +1642,13 @@ When the LLM requests multiple tool calls in a single turn, `ToolInvoker.invoke_
 
 `BaseTool.parameters_schema` deep-copies the caller-supplied schema at construction and wraps it in `MappingProxyType` for read-only enforcement; the property returns a deep copy on access to prevent mutation of internal state. `ToolInvoker` deep-copies arguments at the tool execution boundary before passing them to `tool.execute()`. `MappingProxyType` wrapping is also used in `ToolRegistry` for its internal collections.
 
+**Permission checking (M3):** Each `BaseTool` carries a `category: ToolCategory` attribute used for access-level gating. `ToolInvoker` accepts an optional `ToolPermissionChecker` which enforces the agent's `ToolPermissions.access_level` (see §11.2). Permission checking occurs after tool lookup but before parameter validation:
+
+1. `get_permitted_definitions()` filters tool definitions sent to the LLM — the agent only sees tools it is permitted to use.
+2. At invocation time, denied tools return `ToolResult(is_error=True)` with a descriptive denial reason (defense-in-depth against the LLM hallucinating calls to tools it was never shown).
+
+The `ToolPermissionChecker` resolves permissions in priority order: denied list (highest) → allowed list → access-level categories → deny (default). `AgentEngine._make_tool_invoker()` creates a permission-aware invoker from the agent's `ToolPermissions` at the start of each `run()` call. Note: M3 implements category-level gating only; the granular sub-constraints described in §11.2 (workspace scope, network mode) are deferred until sandboxing is implemented.
+
 ### 11.1.2 Tool Sandboxing
 
 Tool execution requires safety boundaries proportional to the risk of each tool category. The framework uses a **layered sandboxing strategy** with a pluggable `SandboxBackend` protocol — new backends can be added without modifying existing ones. The default configuration uses lighter isolation for low-risk tools and stronger isolation for high-risk tools.
@@ -1739,6 +1747,8 @@ tool_access:
       description: "Per-agent custom configuration."
 ```
 
+> **M3 implementation note:** The current `ToolPermissionChecker` implements **category-level gating only**: each access level maps to a set of permitted `ToolCategory` values (e.g., `STANDARD` permits `file_system`, `code_execution`, `version_control`, `web`, `terminal`, `analytics`). The granular sub-constraints shown above (workspace scope, network mode, containerization) are deferred until the sandboxing backends (§11.1.2) are implemented.
+
 ### 11.3 Progressive Trust
 
 Agents can earn higher tool access over time through configurable trust strategies. The trust system implements a `TrustStrategy` protocol, making it extensible. Multiple strategies are available, selectable via config.
@@ -2310,7 +2320,7 @@ ai-company/
 │       │   ├── drivers/            # Provider driver implementations
 │       │   │   ├── litellm_driver.py  # LiteLLM adapter
 │       │   │   └── mappers.py     # Request/response mappers
-│       │   ├── routing/            # Model routing (6 strategies)
+│       │   ├── routing/            # Model routing (5 strategies)
 │       │   │   ├── _strategy_helpers.py  # Shared routing helper functions
 │       │   │   ├── errors.py      # Routing errors
 │       │   │   ├── models.py      # Routing models (candidates, results)
@@ -2326,7 +2336,8 @@ ai-company/
 │       │   ├── base.py             # BaseTool ABC, ToolExecutionResult
 │       │   ├── registry.py         # Immutable tool registry (MappingProxyType)
 │       │   ├── invoker.py          # Tool invocation (concurrent via TaskGroup)
-│       │   ├── errors.py           # Tool error hierarchy
+│       │   ├── permissions.py      # ToolPermissionChecker (access-level gating)
+│       │   ├── errors.py           # Tool error hierarchy (incl. ToolPermissionDeniedError)
 │       │   ├── examples/           # Example tool implementations
 │       │   │   └── echo.py        # Echo tool (for testing)
 │       │   ├── sandbox/            # Sandboxing backends (M3)
@@ -2415,6 +2426,7 @@ These conventions were established during the M0–M2+ review cycle. **Adopted**
 | **Shared field groups** | Adopted (M2.5) | Extracted common field sets into base models (e.g. `_SpendingTotals`) | Prevents field duplication across spending summary models. `_SpendingTotals` provides shared aggregation fields; `AgentSpending`, `DepartmentSpending`, `PeriodSpending` extend it. |
 | **Event constants** | Adopted (per-domain) | Per-domain submodules under `events/` package (e.g. `events.provider`, `events.budget`). Import directly: `from ai_company.observability.events.<domain> import CONSTANT` | Split by domain for discoverability, co-location with domain logic, and reduced merge conflicts as constants grow. `__init__.py` serves as package marker with usage documentation; no re-exports. |
 | **Parallel tool execution** | Adopted (M2.5) | `asyncio.TaskGroup` in `ToolInvoker.invoke_all` with optional `max_concurrency` semaphore | Structured concurrency with proper cancellation semantics. Fatal errors collected via guarded wrapper and re-raised after all tasks complete. |
+| **Tool permission checking** | Adopted (M3) | `ToolPermissionChecker` enforces category-level gating based on `ToolAccessLevel` (sandboxed → restricted → standard → elevated, plus custom). Priority-based resolution: denied list → allowed list → level categories → deny. Case-insensitive name matching. `ToolInvoker` filters the definitions sent in the prompt and re-checks each call at invocation time. | Defense-in-depth: agents only see permitted tools in the LLM prompt, and invocations are re-checked at execution time. Explicit allow/deny lists provide per-agent overrides. See §11.1.1. |
 | **Tool sandboxing** | Planned (M3) | Layered `SandboxBackend` protocol: `SubprocessSandbox` for low-risk tools (file, git), `DockerSandbox` for high-risk tools (code_runner, terminal, web, database). `K8sSandbox` planned for future container deployments. | Risk-proportionate isolation. Docker optional — only needed for code execution and network-sensitive tools. Pluggable protocol enables seamless migration to K8s per-agent pods in Phase 3-4. See §11.1.2. |
 | **Crash recovery** | Planned (M3) | Pluggable `RecoveryStrategy` protocol. M3: `FailAndReassignStrategy` (catch at engine boundary, log snapshot, mark FAILED, reassign). M4/M5: `CheckpointStrategy` (persist `AgentContext` per turn, resume from last checkpoint). | Immutable `model_copy` pattern makes checkpoint serialization trivial to add later. Fail-and-reassign is sufficient for short MVP tasks. See §6.6. |
 | **Agent behavior testing** | Planned (M3) | Scripted `FakeProvider` for unit tests (deterministic turn sequences); behavioral outcome assertions for integration tests (task completed, tools called, cost within budget). | Leverages existing `FakeProvider` and `CompletionResponseFactory` fixtures. Precise engine testing without brittle response-matching at integration level. |
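
The resolution order described in the spec changes above (denied list → allowed list → level categories → deny, with case-insensitive name matching) can be sketched roughly as follows. This is a minimal illustration, not the code in `tools/permissions.py`: the function `is_permitted` and its parameters are hypothetical, and the `STANDARD` set is copied from the M3 implementation note rather than from `_LEVEL_CATEGORIES`.

```python
from __future__ import annotations

from collections.abc import Iterable

# Category set for the "standard" level, as listed in the M3 implementation note.
STANDARD = frozenset(
    {"file_system", "code_execution", "version_control", "web", "terminal", "analytics"}
)


def is_permitted(
    tool_name: str,
    category: str,
    *,
    level_categories: frozenset[str],
    allowed: Iterable[str] = (),
    denied: Iterable[str] = (),
) -> bool:
    """Hypothetical helper: resolve one tool against the priority chain."""
    name = tool_name.casefold()  # case-insensitive name matching
    if name in {n.casefold() for n in denied}:
        return False  # 1. the denied list always wins
    if name in {n.casefold() for n in allowed}:
        return True  # 2. an explicit allow overrides the access level
    # 3. fall back to the access level's categories; 4. otherwise deny
    return category in level_categories


print(is_permitted("git", "version_control", level_categories=STANDARD))
print(is_permitted("Git", "version_control", level_categories=STANDARD, denied=("git",)))
print(is_permitted("deploy", "deployment", level_categories=STANDARD, allowed=("deploy",)))
print(is_permitted("deploy", "deployment", level_categories=STANDARD))
```

Checking the denied list first means an explicit deny cannot be undone by an allow entry or by a permissive access level, which matches the "denied list (highest)" wording in §11.1.1.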

@@ -33,6 +33,8 @@
     SkillCategory,
     TaskStatus,
     TaskType,
+    ToolAccessLevel,
+    ToolCategory,
 )
 from ai_company.core.project import Project
 from ai_company.core.role import (
@@ -93,6 +95,8 @@
     "TaskStatus",
     "TaskType",
     "Team",
+    "ToolAccessLevel",
+    "ToolCategory",
     "ToolPermissions",
     "get_builtin_role",
     "get_seniority_info",

@@ -12,6 +12,7 @@
     MemoryType,
     RiskTolerance,
     SeniorityLevel,
+    ToolAccessLevel,
 )
 from ai_company.core.role import Authority
 from ai_company.core.types import NotBlankStr  # noqa: TC001
@@ -137,12 +138,18 @@ class ToolPermissions(BaseModel):
     """Tool access permissions for an agent.
 
     Attributes:
+        access_level: Tool access level controlling which categories
+            are available.
         allowed: Explicitly allowed tool names.
         denied: Explicitly denied tool names.
     """
 
     model_config = ConfigDict(frozen=True)
 
+    access_level: ToolAccessLevel = Field(
+        default=ToolAccessLevel.STANDARD,
+        description="Tool access level",
+    )
     allowed: tuple[NotBlankStr, ...] = Field(
         default=(),
         description="Explicitly allowed tools",

@@ -188,3 +188,40 @@ class ProjectStatus(StrEnum):
     ON_HOLD = "on_hold"
     COMPLETED = "completed"
     CANCELLED = "cancelled"
+
+
+class ToolAccessLevel(StrEnum):
+    """Access level for tool permissions.
+
+    Determines which tool categories an agent can use.
+    Levels ``SANDBOXED`` through ``ELEVATED`` form a hierarchy
+    where each includes all categories from lower levels.
+    ``CUSTOM`` uses only explicit allow/deny lists, ignoring
+    the hierarchy.
+
+    The concrete category sets for each level are defined in
+    ``ToolPermissionChecker._LEVEL_CATEGORIES``.
+    """
+
+    SANDBOXED = "sandboxed"
+    RESTRICTED = "restricted"
+    STANDARD = "standard"
+    ELEVATED = "elevated"
+    CUSTOM = "custom"
+
+
+class ToolCategory(StrEnum):
+    """Category of a tool for access-level gating."""
+
+    FILE_SYSTEM = "file_system"
+    CODE_EXECUTION = "code_execution"
+    VERSION_CONTROL = "version_control"
+    WEB = "web"
+    DATABASE = "database"
+    TERMINAL = "terminal"
+    DESIGN = "design"
+    COMMUNICATION = "communication"
+    ANALYTICS = "analytics"
+    DEPLOYMENT = "deployment"
+    MCP = "mcp"
+    OTHER = "other"
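
The `ToolAccessLevel` docstring says the levels from `SANDBOXED` through `ELEVATED` are cumulative, while `CUSTOM` ignores the hierarchy. One way such a table could be derived is sketched below. The per-level increments in `ADDED` are illustrative assumptions, chosen only so that the `standard` row matches the set listed in the M3 implementation note; the real sets live in `ToolPermissionChecker._LEVEL_CATEGORIES` and may differ. `build_level_categories` is a hypothetical helper.

```python
from __future__ import annotations

# Hierarchy order taken from the ToolAccessLevel docstring.
HIERARCHY = ("sandboxed", "restricted", "standard", "elevated")

# ASSUMPTION: illustrative increments per level, not the project's real mapping.
ADDED: dict[str, frozenset[str]] = {
    "sandboxed": frozenset({"file_system", "code_execution", "version_control"}),
    "restricted": frozenset({"web"}),
    "standard": frozenset({"terminal", "analytics"}),
    "elevated": frozenset(
        {"database", "design", "communication", "deployment", "mcp", "other"}
    ),
}


def build_level_categories(
    order: tuple[str, ...],
    added: dict[str, frozenset[str]],
) -> dict[str, frozenset[str]]:
    """Hypothetical helper: each level inherits every lower level's categories."""
    table: dict[str, frozenset[str]] = {}
    inherited: frozenset[str] = frozenset()
    for level in order:
        inherited = inherited | added[level]
        table[level] = inherited
    # "custom" contributes no categories; only allow/deny lists apply to it.
    table["custom"] = frozenset()
    return table


LEVEL_CATEGORIES = build_level_categories(HIERARCHY, ADDED)
print(sorted(LEVEL_CATEGORIES["standard"]))
```

Deriving the table from increments guarantees the subset relationship between adjacent levels by construction, so a new category only has to be listed once at the lowest level that grants it.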
@@ -20,7 +20,11 @@
     TurnRecord,
 )
 from ai_company.engine.metrics import TaskCompletionMetrics
-from ai_company.engine.prompt import SystemPrompt, build_system_prompt
+from ai_company.engine.prompt import (
+    SystemPrompt,
+    build_system_prompt,
+    format_task_instruction,
+)
 from ai_company.engine.react_loop import ReactLoop
 from ai_company.engine.run_result import AgentRunResult
 from ai_company.observability import get_logger
@@ -41,6 +45,7 @@
 from ai_company.providers.enums import MessageRole
 from ai_company.providers.models import ChatMessage
 from ai_company.tools.invoker import ToolInvoker
+from ai_company.tools.permissions import ToolPermissionChecker
 
 if TYPE_CHECKING:
     from ai_company.budget.tracker import CostTracker
@@ -157,13 +162,15 @@ async def run(  # noqa: PLR0913
         ctx: AgentContext | None = None
         system_prompt: SystemPrompt | None = None
         try:
+            tool_invoker = self._make_tool_invoker(identity)
             ctx, system_prompt = self._prepare_context(
                 identity=identity,
                 task=task,
                 agent_id=agent_id,
                 task_id=task_id,
                 max_turns=max_turns,
                 memory_messages=memory_messages,
+                tool_invoker=tool_invoker,
             )
             return await self._execute(
                 identity=identity,
@@ -175,6 +182,7 @@ async def run(  # noqa: PLR0913
                 system_prompt=system_prompt,
                 start=start,
                 timeout_seconds=timeout_seconds,
+                tool_invoker=tool_invoker,
             )
         except MemoryError, RecursionError:
             logger.error(
@@ -209,6 +217,7 @@ async def _execute(  # noqa: PLR0913
         system_prompt: SystemPrompt,
         start: float,
         timeout_seconds: float | None = None,
+        tool_invoker: ToolInvoker | None = None,
     ) -> AgentRunResult:
         """Run execution loop, record costs, apply transitions, and build result.
 
@@ -217,7 +226,6 @@ async def _execute(  # noqa: PLR0913
         recording, post-execution task transitions, and metrics logging.
         """
         budget_checker = _make_budget_checker(task)
-        tool_invoker = self._make_tool_invoker()
 
         logger.debug(
             EXECUTION_ENGINE_PROMPT_BUILT,
@@ -254,6 +262,8 @@ async def _execute(  # noqa: PLR0913
         )
         try:
             self._log_completion(result, agent_id, task_id, duration)
+        except MemoryError, RecursionError:
+            raise
         except Exception:
             logger.exception(
                 EXECUTION_ENGINE_ERROR,
@@ -330,9 +340,10 @@ def _prepare_context(  # noqa: PLR0913
         task_id: str,
         max_turns: int,
         memory_messages: tuple[ChatMessage, ...],
+        tool_invoker: ToolInvoker | None = None,
     ) -> tuple[AgentContext, SystemPrompt]:
         """Build system prompt and prepare execution context."""
-        tool_defs = self._get_tool_definitions()
+        tool_defs = self._get_tool_definitions(tool_invoker)
         system_prompt = build_system_prompt(
             agent=identity,
             task=task,
@@ -352,7 +363,7 @@ def _prepare_context(  # noqa: PLR0913
         ctx = ctx.with_message(
             ChatMessage(
                 role=MessageRole.USER,
-                content=_format_task_instruction(task),
+                content=format_task_instruction(task),
             ),
         )
 
@@ -437,11 +448,14 @@ def _validate_task(
 
     # ── Helpers ──────────────────────────────────────────────────
 
-    def _get_tool_definitions(self) -> tuple[ToolDefinition, ...]:
-        """Extract tool definitions from the registry for prompt building."""
-        if self._tool_registry is None:
+    def _get_tool_definitions(
+        self,
+        tool_invoker: ToolInvoker | None,
+    ) -> tuple[ToolDefinition, ...]:
+        """Extract permitted tool definitions for prompt building."""
+        if tool_invoker is None:
             return ()
-        return self._tool_registry.to_definitions()
+        return tool_invoker.get_permitted_definitions()
 
     def _transition_task_if_needed(
         self,
@@ -541,11 +555,15 @@ def _apply_post_execution_transitions(
 
         return execution_result.model_copy(update={"context": ctx})
 
-    def _make_tool_invoker(self) -> ToolInvoker | None:
-        """Create a ToolInvoker from the registry, or None."""
+    def _make_tool_invoker(
+        self,
+        identity: AgentIdentity,
+    ) -> ToolInvoker | None:
+        """Create a ToolInvoker with permission checking, or None."""
         if self._tool_registry is None:
             return None
-        return ToolInvoker(self._tool_registry)
+        checker = ToolPermissionChecker.from_permissions(identity.tools)
+        return ToolInvoker(self._tool_registry, permission_checker=checker)
 
     def _log_completion(
         self,
@@ -760,26 +778,6 @@ def _handle_fatal_error(  # noqa: PLR0913
             raise exc from None
 
 
-def _format_task_instruction(task: Task) -> str:
-    """Format a task into a user message for the initial conversation."""
-    parts = [f"# Task: {task.title}", "", task.description]
-
-    if task.acceptance_criteria:
-        parts.append("")
-        parts.append("## Acceptance Criteria")
-        parts.extend(f"- {c.description}" for c in task.acceptance_criteria)
-
-    if task.budget_limit > 0:
-        parts.append("")
-        parts.append(f"**Budget limit:** ${task.budget_limit:.2f} USD")
-
-    if task.deadline:
-        parts.append("")
-        parts.append(f"**Deadline:** {task.deadline}")
-
-    return "\n".join(parts)
-
-
 def _make_budget_checker(task: Task) -> BudgetChecker | None:
     """Create a budget checker if the task has a positive budget limit.
 

@@ -636,3 +636,30 @@ def _render_and_estimate(  # noqa: PLR0913
     )
     content = _render_template(template_str, context)
     return content, estimator.estimate_tokens(content)
+
+
+def format_task_instruction(task: Task) -> str:
+    """Format a task into a user message for the initial conversation.
+
+    Args:
+        task: Task to format.
+
+    Returns:
+        Markdown-formatted task instruction string.
+    """
+    parts = [f"# Task: {task.title}", "", task.description]
+
+    if task.acceptance_criteria:
+        parts.append("")
+        parts.append("## Acceptance Criteria")
+        parts.extend(f"- {c.description}" for c in task.acceptance_criteria)
+
+    if task.budget_limit > 0:
+        parts.append("")
+        parts.append(f"**Budget limit:** ${task.budget_limit:.2f} USD")
+
+    if task.deadline:
+        parts.append("")
+        parts.append(f"**Deadline:** {task.deadline}")
+
+    return "\n".join(parts)
@@ -409,10 +409,10 @@ async def _execute_tool_calls(
 def _get_tool_definitions(
     tool_invoker: ToolInvoker | None,
 ) -> list[ToolDefinition] | None:
-    """Extract tool definitions from the invoker, or return None."""
+    """Extract permitted tool definitions from the invoker, or return None."""
     if tool_invoker is None:
         return None
-    defs = tool_invoker.registry.to_definitions()
+    defs = tool_invoker.get_permitted_definitions()
     return list(defs) if defs else None
 
 

@@ -19,3 +19,6 @@
 TOOL_REGISTRY_CONTAINS_TYPE_ERROR: Final[str] = "tool.registry.contains_type_error"
 TOOL_INVOKE_ALL_START: Final[str] = "tool.invoke_all.start"
 TOOL_INVOKE_ALL_COMPLETE: Final[str] = "tool.invoke_all.complete"
+TOOL_PERMISSION_DENIED: Final[str] = "tool.permission.denied"
+TOOL_PERMISSION_CHECKER_CREATED: Final[str] = "tool.permission.checker_created"
+TOOL_PERMISSION_FILTERED: Final[str] = "tool.permission.filtered"
@@ -1,9 +1,16 @@
-"""Tool system — base abstraction, registry, invoker, and errors."""
+"""Tool system — base abstraction, registry, invoker, permissions, and errors."""
 
 from .base import BaseTool, ToolExecutionResult
-from .errors import ToolError, ToolExecutionError, ToolNotFoundError, ToolParameterError
+from .errors import (
+    ToolError,
+    ToolExecutionError,
+    ToolNotFoundError,
+    ToolParameterError,
+    ToolPermissionDeniedError,
+)
 from .examples.echo import EchoTool
 from .invoker import ToolInvoker
+from .permissions import ToolPermissionChecker
 from .registry import ToolRegistry
 
 __all__ = [
@@ -15,5 +22,7 @@
     "ToolInvoker",
     "ToolNotFoundError",
     "ToolParameterError",
+    "ToolPermissionChecker",
+    "ToolPermissionDeniedError",
     "ToolRegistry",
 ]
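
Taken together, the changes in this diff enforce permissions at two points: definitions are filtered before they reach the prompt, and each call is re-checked at invocation time, after tool lookup. The sketch below illustrates that flow under simplifying assumptions. `SketchInvoker` and `SketchResult` are hypothetical stand-ins rather than the real `ToolInvoker` and `ToolResult`, tools are reduced to a name, a category, and a handler, and the checker is a plain callable.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SketchResult:
    """Hypothetical stand-in for a tool result with an error flag."""

    content: str
    is_error: bool = False


class SketchInvoker:
    """Hypothetical invoker: filters definitions and re-checks on invoke."""

    def __init__(
        self,
        tools: dict[str, tuple[str, Callable[[dict], str]]],
        is_permitted: Callable[[str, str], bool] | None = None,
    ) -> None:
        self._tools = tools  # name -> (category, handler)
        self._is_permitted = is_permitted

    def _allows(self, name: str, category: str) -> bool:
        # Without a checker, every registered tool is permitted.
        return self._is_permitted is None or self._is_permitted(name, category)

    def get_permitted_definitions(self) -> tuple[str, ...]:
        # Enforcement point 1: only permitted tools are offered to the LLM.
        return tuple(
            name
            for name, (category, _) in self._tools.items()
            if self._allows(name, category)
        )

    def invoke(self, name: str, arguments: dict) -> SketchResult:
        entry = self._tools.get(name)
        if entry is None:
            return SketchResult(f"Tool not found: {name}", is_error=True)
        category, handler = entry
        # Enforcement point 2: after lookup, before validation and execution.
        if not self._allows(name, category):
            return SketchResult(f"Permission denied: {name} ({category})", is_error=True)
        return SketchResult(handler(arguments))


tools = {
    "echo": ("other", lambda args: str(args["message"])),
    "shell": ("terminal", lambda args: "ran"),
}
invoker = SketchInvoker(tools, is_permitted=lambda name, category: category != "terminal")
print(invoker.get_permitted_definitions())
print(invoker.invoke("shell", {}).is_error)
print(invoker.invoke("echo", {"message": "hi"}).content)
```

Returning an error result instead of raising lets the denial reason flow back to the model as an ordinary tool response, so a call to a hidden tool does not abort the run.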