Aureliolo · Mar 10, 2026
@@ -24,8 +24,11 @@ jobs:
           fail-on-severity: high
           # LicenseRef-scancode-free-unknown: aiosqlite 0.21.0 — MIT per classifiers, scancode misdetects
           # Python-2.0.1: editorconfig 0.17.1 (via jsbeautifier via litestar[standard])
+          # MIT-0: cffi 2.0.0 — permissive (MIT variant, no attribution required)
+          # LicenseRef-scancode-free-unknown: aiosqlite 0.21.0, aiodocker 0.26.0,
+          #   pycparser 3.0, sse-starlette 3.3.2 — MIT per classifiers, scancode misdetects
           allow-licenses: >-
-            MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause,
+            MIT, MIT-0, Apache-2.0, BSD-2-Clause, BSD-3-Clause,
             ISC, MPL-2.0, PSF-2.0, Unlicense, 0BSD,
             CC0-1.0, Python-2.0, Python-2.0.1,
             LicenseRef-scancode-free-unknown

@@ -56,7 +56,7 @@ src/ai_company/
   providers/      # LLM provider abstraction (LiteLLM adapter)
   security/       # SecOps agent, approval gates, audit
   templates/      # Pre-built company templates, personality presets, and builder
-  tools/          # Tool registry, built-in tools (file_system/, git, sandbox/), MCP integration, role-based access
+  tools/          # Tool registry, built-in tools (file_system/, git, sandbox/, code_runner), MCP bridge (mcp/), role-based access
 ```
 
 ## Shell Usage
@@ -83,7 +83,7 @@ src/ai_company/
 - **Every module** with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
 - **Never** use `import logging` / `logging.getLogger()` / `print()` in application code
 - **Variable name**: always `logger` (not `_logger`, not `log`)
-- **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
+- **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
 - **Structured kwargs**: always `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`
 - **All error paths** must log at WARNING or ERROR with context before raising
 - **All state transitions** must log at INFO

@@ -81,7 +81,7 @@ The MVP validates the core hypothesis: **a single agent can complete a real task
 
 > **Implementation snapshot (2026-03-09):**
 > - **Done:** M0–M6 (tooling, config/core, providers, single-agent engine, multi-agent orchestration, API/CLI surface). Memory layer backend selected ([ADR-001](docs/decisions/ADR-001-memory-layer.md)). Persistence backend (§7.6) completed. Memory retrieval pipeline (#41: ranking, token-budget formatting, context injection) complete. Budget enforcement complete (BudgetEnforcer + configurable cost tiers + quota/subscription tracking). CFO cost optimization complete (CostOptimizer: anomaly detection, efficiency analysis, downgrade recommendations, routing optimization, approval decisions; ReportGenerator: multi-dimensional spending reports). Shared org memory (#125: HybridPromptRetrievalBackend, OrgFactStore, access control, factory) complete. Memory consolidation/archival (#48: ConsolidationService, SimpleConsolidationStrategy, RetentionEnforcer, ArchivalStore protocol) complete.
-> - **Not started (mostly placeholders):** M7 security + approval system.
+> - **In progress:** M7 — Docker sandbox (#50), MCP bridge (#53), code runner implemented. Security + approval system not started.
 
 ### 1.5 Configuration Philosophy
 
@@ -2101,7 +2101,7 @@ Tool execution requires safety boundaries proportional to the risk of each tool
 | Backend | Isolation | Latency | Dependencies | Status |
 |---------|-----------|---------|--------------|--------|
 | `SubprocessSandbox` | Process-level: env filtering (allowlist + denylist), restricted PATH (configurable via `extra_safe_path_prefixes`), workspace-scoped cwd, timeout + process-group kill, library injection var blocking, explicit transport cleanup on Windows | ~ms | None | **Implemented** |
-| `DockerSandbox` | Container-level: ephemeral container, mounted workspace, no network, resource limits (CPU/memory/time) | ~1-2s cold start | Docker | Planned |
+| `DockerSandbox` | Container-level: ephemeral container, mounted workspace, no network, resource limits (CPU/memory/time) | ~1-2s cold start | Docker | **Implemented** |
 | `K8sSandbox` | Pod-level: per-agent containers, namespace isolation, resource quotas, network policies | ~2-5s | Kubernetes | Future |
 
 #### Default Layered Configuration
@@ -2130,7 +2130,7 @@ sandboxing:
     memory_limit: "512m"
     cpu_limit: "1.0"
     timeout_seconds: 120
-    mount_mode: "rw"                   # rw for workspace dir, nothing else mounted
+    mount_mode: "ro"                   # read-only by default; workspace mounted separately
     auto_remove: true                  # ephemeral — container removed after execution
   k8s:                                 # future — per-agent pod isolation
     namespace: "ai-company-agents"
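A minimal sketch of how the Docker limits in this hunk could translate into the `HostConfig` section of a Docker Engine create-container request. `DockerLimits`, `parse_memory`, and `build_host_config` are hypothetical names, not the project's API. The sketch also assumes that `mount_mode` applies to the workspace bind and that the workspace is mounted at `/workspace`; the comment in the hunk leaves both open.

```python
from dataclasses import dataclass

# Binary multipliers for Docker-style size suffixes.
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_memory(limit: str) -> int:
    """Convert a size string such as "512m" into a byte count."""
    text = limit.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


@dataclass(frozen=True)
class DockerLimits:
    """Hypothetical stand-in for the docker section of the sandboxing config."""

    memory_limit: str = "512m"
    cpu_limit: str = "1.0"
    timeout_seconds: int = 120  # enforced by the caller, not by HostConfig
    mount_mode: str = "ro"
    auto_remove: bool = True


def build_host_config(limits: DockerLimits, workspace: str) -> dict:
    """Build a HostConfig dict using Docker Engine API field names."""
    return {
        "Memory": parse_memory(limits.memory_limit),  # bytes
        "NanoCpus": int(float(limits.cpu_limit) * 1_000_000_000),  # 1e-9 CPUs
        "NetworkMode": "none",  # "no network" per the backend table
        "AutoRemove": limits.auto_remove,
        # Assumption: the workspace is the only bind mount.
        "Binds": [f"{workspace}:/workspace:{limits.mount_mode}"],
    }
```

`timeout_seconds` has no `HostConfig` equivalent, so a backend would need to enforce it around the container wait call.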
@@ -2151,7 +2151,7 @@ sandboxing:
 
 > **Decisions ([ADR-002](docs/decisions/ADR-002-design-decisions-batch-1.md) D17, D18):**
 >
-> - **D17 — MCP SDK:** Official `mcp` Python SDK, pinned `>=1.25,<2`. Thin `MCPBridgeTool` adapter layer isolates the rest of the codebase from SDK API changes. Support **stdio** (local/dev) and **Streamable HTTP** (remote/production) transports. Skip deprecated SSE. v2 migration planned — pin range prevents accidental breaking upgrade.
+> - **D17 — MCP SDK:** Official `mcp` Python SDK, pinned `==1.26.0`. Thin `MCPBridgeTool` adapter layer isolates the rest of the codebase from SDK API changes. Support **stdio** (local/dev) and **Streamable HTTP** (remote/production) transports. Skip deprecated SSE. v2 migration planned; the exact pin prevents an accidental breaking upgrade.
 > - **D18 — MCP Result Mapping:** Adapter in `MCPBridgeTool` keeps `ToolResult` as-is. Mapping: text blocks → concatenate to `content: str`; image/audio → `[image: {mimeType}]` placeholder + base64 in `metadata["attachments"]`; `structuredContent` → `metadata["structured_content"]`; `isError` → `is_error` (1:1). Future: extend `ToolResult` with optional `attachments` when multi-modal LLM tool results are needed.
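The D18 mapping can be sketched roughly as follows. This is an illustration under stated assumptions, not the adapter in `MCPBridgeTool`: the `ToolResult` dataclass is a hypothetical stand-in for the real model, the input is the MCP result as a plain dict rather than SDK objects, text blocks are joined with newlines, and audio blocks get an `[audio: ...]` placeholder by analogy with images.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ToolResult:
    """Hypothetical stand-in for the project's ToolResult model."""

    content: str
    is_error: bool = False
    metadata: dict[str, Any] = field(default_factory=dict)


def map_mcp_result(raw: dict[str, Any]) -> ToolResult:
    """Map an MCP tool-call result (wire-format dict) onto ToolResult."""
    parts: list[str] = []
    attachments: list[dict[str, str]] = []
    for block in raw.get("content", []):
        kind = block.get("type")
        if kind == "text":
            parts.append(block["text"])
        elif kind in ("image", "audio"):
            mime = block.get("mimeType", "unknown")
            # Placeholder in the text, base64 payload kept in metadata.
            parts.append(f"[{kind}: {mime}]")
            attachments.append(
                {"type": kind, "mimeType": mime, "data": block["data"]}
            )
    metadata: dict[str, Any] = {}
    if attachments:
        metadata["attachments"] = attachments
    if raw.get("structuredContent") is not None:
        metadata["structured_content"] = raw["structuredContent"]
    return ToolResult(
        content="\n".join(parts),  # assumption: newline as the separator
        is_error=bool(raw.get("isError", False)),  # 1:1 mapping
        metadata=metadata,
    )
```

Keeping the base64 payload out of `content` means the text handed to the LLM stays small, while callers that can handle attachments still find them under `metadata["attachments"]`.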
 
 ### 11.1.4 Action Type System
@@ -2728,7 +2728,8 @@ Circular inheritance is detected via chain tracking and raises `TemplateInherita
 | **Web UI** | Vue 3 + Vite | Modern, fast, good ecosystem. Simpler than React for dashboards |
 | **Real-time** | WebSocket (Litestar channels plugin) | Built-in pub/sub broadcasting, per-channel history, backpressure management. Real-time agent activity, task updates, chat feed |
 | **Containerization** | Docker + Docker Compose | Isolated code execution, reproducible environments |
-| **Tool Integration** | MCP (Model Context Protocol) | Industry standard for LLM-to-tool integration |
+| **Docker API** | aiodocker | Async-native Docker API client for `DockerSandbox` backend |
+| **Tool Integration** | MCP SDK (`mcp`) | Industry standard for LLM-to-tool integration |
 | **Agent Comms** | A2A Protocol compatible | Future-proof inter-agent communication |
 | **Config Format** | YAML + Pydantic validation | Human-readable config with strict validation |
 | **CLI** | TBD (future, if needed) | Thin wrapper around the REST API for terminal use. May not be needed — interactive Scalar docs at `/docs/api` and `curl`/`httpie` may suffice |
@@ -2960,7 +2961,10 @@ ai-company/
 │       │   │   ├── task_routing.py # TASK_ROUTING_* constants
 │       │   │   ├── template.py    # TEMPLATE_* constants
 │       │   │   ├── tool.py        # TOOL_* constants
-│       │   │   └── workspace.py   # WORKSPACE_* constants
+│       │   │   ├── workspace.py   # WORKSPACE_* constants
+│       │   │   ├── code_runner.py # CODE_RUNNER_* constants
+│       │   │   ├── docker.py      # DOCKER_* constants
+│       │   │   └── mcp.py         # MCP_* constants
 │       │   ├── processors.py       # Log processors
 │       │   ├── setup.py            # Logging setup
 │       │   └── sinks.py            # Log output backends
@@ -2996,13 +3000,6 @@ ai-company/
 │       │   ├── examples/           # Example tool implementations
 │       │   │   ├── __init__.py    # Package exports
 │       │   │   └── echo.py        # Echo tool (for testing)
-│       │   ├── sandbox/            # Sandboxing backends
-│       │   │   ├── __init__.py    # Package exports
-│       │   │   ├── config.py      # SubprocessSandboxConfig model
-│       │   │   ├── errors.py      # SandboxError hierarchy
-│       │   │   ├── protocol.py    # SandboxBackend protocol
-│       │   │   ├── result.py      # SandboxResult model
-│       │   │   └── subprocess_sandbox.py  # SubprocessSandbox (default)
 │       │   ├── file_system/        # Built-in file system tools
 │       │   │   ├── __init__.py    # Package exports
 │       │   │   ├── _base_fs_tool.py  # BaseFileSystemTool ABC
@@ -3015,9 +3012,28 @@ ai-company/
 │       │   ├── _git_base.py        # Base class for git tools (workspace, subprocess, sandbox integration)
 │       │   ├── _process_cleanup.py  # Subprocess transport cleanup utility (Windows ResourceWarning prevention)
 │       │   ├── git_tools.py        # Git operations — 6 built-in tools (sandbox-aware)
-│       │   ├── code_runner.py      # Code execution (M7)
+│       │   ├── code_runner.py      # Code execution tool
 │       │   ├── web_tools.py        # HTTP, search (M7)
-│       │   └── mcp_bridge.py       # MCP server integration (M7)
+│       │   ├── sandbox/             # Sandbox backends subpackage
+│       │   │   ├── __init__.py    # Package exports
+│       │   │   ├── config.py      # Subprocess sandbox configuration
+│       │   │   ├── docker_config.py # Docker sandbox configuration
+│       │   │   ├── docker_sandbox.py # DockerSandbox backend (aiodocker)
+│       │   │   ├── errors.py      # Sandbox error hierarchy
+│       │   │   ├── protocol.py    # SandboxBackend protocol
+│       │   │   ├── result.py      # SandboxResult model
+│       │   │   ├── sandboxing_config.py # Top-level sandboxing config
+│       │   │   └── subprocess_sandbox.py # SubprocessSandbox backend
+│       │   └── mcp/                # MCP bridge subpackage
+│       │       ├── __init__.py    # Package exports
+│       │       ├── bridge_tool.py # MCPBridgeTool (BaseTool integration)
+│       │       ├── cache.py       # MCP result cache (TTL + LRU)
+│       │       ├── client.py      # MCP client wrapper
+│       │       ├── config.py      # MCP server/bridge config models
+│       │       ├── errors.py      # MCP error hierarchy
+│       │       ├── factory.py     # MCPToolFactory (parallel connect)
+│       │       ├── models.py      # MCP domain models
+│       │       └── result_mapper.py # MCP result → ToolExecutionResult mapping
 │       ├── security/                # Security & approval (M7, stubs only)
 │       │   ├── approval.py         # Approval workflow gates (M7) — domain model is in core/approval.py
 │       │   ├── secops_agent.py     # Security operations agent (M7)

@@ -15,7 +15,7 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
 - **Company Config + Core Models** - Strong Pydantic validation, immutable config models, runtime state models
 - **Provider Layer** - LiteLLM-based provider abstraction with routing, retry, and rate limiting
 - **Budget Tracking** - Cost records, summaries, and coordination analytics models
-- **Tool System** - File system tools, git tools, sandbox abstraction, permission gating
+- **Tool System** - File system tools, git tools, sandbox abstraction (subprocess + Docker), code runner, MCP bridge, permission gating
 - **Single-Agent Engine (M3)** - ReAct/Plan-Execute loops, fail-and-reassign recovery, graceful shutdown
 - **Multi-Agent Core (M4)** - Message bus, delegation with loop prevention, conflict resolution, meeting protocols
 - **Task Intelligence (M4)** - Task decomposition, routing, assignment strategies, workspace isolation via git worktrees
@@ -38,7 +38,7 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
 
 ## Status
 
-**M7: Security & HR** next (M0–M6 all done). See [DESIGN_SPEC.md](DESIGN_SPEC.md) for the full high-level specification.
+**M7: Security & HR** in progress (M0–M6 all done). See [DESIGN_SPEC.md](DESIGN_SPEC.md) for the full high-level specification.
 
 ## Tech Stack
 
@@ -47,7 +47,7 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
 - **LiteLLM** for multi-provider LLM abstraction
 - **structlog** for structured logging and observability
 - **Mem0** for agent memory (initial backend; custom stack future — see [ADR-001](docs/decisions/ADR-001-memory-layer.md))
-- **MCP** for tool integration (planned)
+- **MCP** for tool integration
 - **Vue 3** for web dashboard (planned)
 - **SQLite** (aiosqlite) → PostgreSQL for operational data persistence
 
@@ -56,6 +56,7 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
 - **Python 3.14+**
 - **uv** — package manager ([install](https://docs.astral.sh/uv/getting-started/installation/))
 - **Git 2.x+** — required at runtime for built-in git tools (subprocess-based, not a Python binding)
+- **Docker** (optional) — required for code execution sandbox and Docker-backed tool isolation. Install [Docker Desktop](https://docs.docker.com/get-docker/) or Docker Engine. File system and git tools work without Docker via subprocess isolation.
 
 ## Getting Started
 

@@ -0,0 +1,19 @@
+FROM node:22-slim AS node-base
+
+FROM python:3.14-slim
+
+COPY --from=node-base /usr/local/bin/node /usr/local/bin/node
+COPY --from=node-base /usr/local/lib/node_modules /usr/local/lib/node_modules
+RUN ln -s /usr/local/lib/node_modules/npm/bin/npm-cli.js /usr/local/bin/npm
+
+RUN apt-get update && apt-get install -y --no-install-recommends git \
+    && apt-get clean && rm -rf /var/lib/apt/lists/*
+
+RUN mkdir -p /workspace \
+    && useradd --uid 10001 --no-create-home --shell /usr/sbin/nologin sandbox
+
+WORKDIR /workspace
+
+USER sandbox
+
+CMD ["bash"]
@@ -13,11 +13,13 @@ classifiers = [
     "Typing :: Typed",
 ]
 dependencies = [
+    "aiodocker==0.26.0",
     "aiosqlite==0.21.0",
     "jinja2==3.1.6",
     "jsonschema==4.26.0",
     "litellm==1.82.0",
     "litestar[standard,structlog,pydantic,brotli,prometheus]==2.21.1",
+    "mcp==1.26.0",
     "pydantic==2.12.5",
     "pyyaml==6.0.3",
     "structlog==25.5.0",
@@ -157,10 +159,18 @@ ignore_missing_imports = true
 module = "jsonschema.*"
 ignore_missing_imports = true
 
+[[tool.mypy.overrides]]
+module = "aiodocker.*"
+ignore_missing_imports = true
+
 [[tool.mypy.overrides]]
 module = "aiosqlite.*"
 ignore_missing_imports = true
 
+[[tool.mypy.overrides]]
+module = "mcp.*"
+ignore_missing_imports = true
+
 [[tool.mypy.overrides]]
 module = "litestar.*"
 ignore_missing_imports = true

@@ -35,4 +35,6 @@ def default_config_dict() -> dict[str, Any]:
         "cost_tiers": {},
         "org_memory": {},
         "api": {},
+        "sandboxing": {},
+        "mcp": {},
     }
@@ -26,6 +26,8 @@
 from ai_company.observability.config import LogConfig  # noqa: TC001
 from ai_company.observability.events.config import CONFIG_VALIDATION_FAILED
 from ai_company.persistence.config import PersistenceConfig
+from ai_company.tools.mcp.config import MCPConfig
+from ai_company.tools.sandbox.sandboxing_config import SandboxingConfig
 
 logger = get_logger(__name__)
 
@@ -487,6 +489,8 @@ class RootConfig(BaseModel):
         cost_tiers: Cost tier definitions.
         org_memory: Organizational memory configuration.
         api: API server configuration.
+        sandboxing: Sandboxing backend configuration.
+        mcp: MCP bridge configuration.
     """
 
     model_config = ConfigDict(frozen=True)
@@ -574,6 +578,14 @@ class RootConfig(BaseModel):
         default_factory=ApiConfig,
         description="API server configuration",
     )
+    sandboxing: SandboxingConfig = Field(
+        default_factory=SandboxingConfig,
+        description="Sandboxing backend configuration",
+    )
+    mcp: MCPConfig = Field(
+        default_factory=MCPConfig,
+        description="MCP bridge configuration",
+    )
 
     @model_validator(mode="after")
     def _validate_unique_agent_names(self) -> Self:

@@ -0,0 +1,8 @@
+"""Code runner tool event constants."""
+
+from typing import Final
+
+CODE_RUNNER_EXECUTE_START: Final[str] = "code_runner.execute.start"
+CODE_RUNNER_EXECUTE_SUCCESS: Final[str] = "code_runner.execute.success"
+CODE_RUNNER_EXECUTE_FAILED: Final[str] = "code_runner.execute.failed"
+CODE_RUNNER_INVALID_LANGUAGE: Final[str] = "code_runner.invalid_language"
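A short sketch of how these constants might be used under the logging rules from the contributor guide (structured kwargs, WARNING before raising on error paths). `RecordingLogger` is a hypothetical stand-in for the logger returned by `get_logger(__name__)`, and `start_execution` and `SUPPORTED_LANGUAGES` are illustrative names, not the code runner's actual API. The two constants are redefined so the block is self-contained.

```python
from typing import Any, Final

# Copied from the module above so the sketch runs on its own.
CODE_RUNNER_EXECUTE_START: Final[str] = "code_runner.execute.start"
CODE_RUNNER_INVALID_LANGUAGE: Final[str] = "code_runner.invalid_language"

# Assumption: the real tool defines its own language set.
SUPPORTED_LANGUAGES: Final[frozenset[str]] = frozenset({"python", "javascript"})


class RecordingLogger:
    """Hypothetical stand-in that records (level, event, kwargs) tuples."""

    def __init__(self) -> None:
        self.records: list[tuple[str, str, dict[str, Any]]] = []

    def info(self, event: str, **kwargs: Any) -> None:
        self.records.append(("info", event, kwargs))

    def warning(self, event: str, **kwargs: Any) -> None:
        self.records.append(("warning", event, kwargs))


logger = RecordingLogger()


def start_execution(language: str, code: str) -> None:
    """Validate the language, logging with event constants and kwargs."""
    if language not in SUPPORTED_LANGUAGES:
        # Error path: log at WARNING with context before raising.
        logger.warning(
            CODE_RUNNER_INVALID_LANGUAGE,
            language=language,
            supported=sorted(SUPPORTED_LANGUAGES),
        )
        raise ValueError(f"unsupported language: {language}")
    logger.info(CODE_RUNNER_EXECUTE_START, language=language, code_length=len(code))
```

In application code the stand-in would be replaced by `from ai_company.observability import get_logger` and `logger = get_logger(__name__)`, with the constants imported from `ai_company.observability.events.code_runner`.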
@@ -0,0 +1,16 @@
+"""Docker sandbox event constants."""
+
+from typing import Final
+
+DOCKER_EXECUTE_START: Final[str] = "docker.execute.start"
+DOCKER_EXECUTE_SUCCESS: Final[str] = "docker.execute.success"
+DOCKER_EXECUTE_FAILED: Final[str] = "docker.execute.failed"
+DOCKER_EXECUTE_TIMEOUT: Final[str] = "docker.execute.timeout"
+DOCKER_CONTAINER_CREATED: Final[str] = "docker.container.created"
+DOCKER_CONTAINER_STOPPED: Final[str] = "docker.container.stopped"
+DOCKER_CONTAINER_REMOVED: Final[str] = "docker.container.removed"
+DOCKER_CONTAINER_STOP_FAILED: Final[str] = "docker.container.stop_failed"
+DOCKER_CONTAINER_REMOVE_FAILED: Final[str] = "docker.container.remove_failed"
+DOCKER_CLEANUP: Final[str] = "docker.cleanup"
+DOCKER_HEALTH_CHECK: Final[str] = "docker.health_check"
+DOCKER_DAEMON_UNAVAILABLE: Final[str] = "docker.daemon.unavailable"