Merged
5 changes: 4 additions & 1 deletion .github/workflows/dependency-review.yml
@@ -24,8 +24,11 @@ jobs:
fail-on-severity: high
# LicenseRef-scancode-free-unknown: aiosqlite 0.21.0 — MIT per classifiers, scancode misdetects
# Python-2.0.1: editorconfig 0.17.1 (via jsbeautifier via litestar[standard])
# MIT-0: cffi 2.0.0 — permissive (MIT variant, no attribution required)
# LicenseRef-scancode-free-unknown: aiosqlite 0.21.0, aiodocker 0.26.0,
# pycparser 3.0, sse-starlette 3.3.2 — MIT per classifiers, scancode misdetects
allow-licenses: >-
MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause,
MIT, MIT-0, Apache-2.0, BSD-2-Clause, BSD-3-Clause,
ISC, MPL-2.0, PSF-2.0, Unlicense, 0BSD,
CC0-1.0, Python-2.0, Python-2.0.1,
LicenseRef-scancode-free-unknown
4 changes: 2 additions & 2 deletions CLAUDE.md
@@ -56,7 +56,7 @@ src/ai_company/
providers/ # LLM provider abstraction (LiteLLM adapter)
security/ # SecOps agent, approval gates, audit
templates/ # Pre-built company templates, personality presets, and builder
tools/ # Tool registry, built-in tools (file_system/, git, sandbox/), MCP integration, role-based access
tools/ # Tool registry, built-in tools (file_system/, git, sandbox/, code_runner), MCP bridge (mcp/), role-based access
```

## Shell Usage
@@ -83,7 +83,7 @@ src/ai_company/
- **Every module** with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
- **Never** use `import logging` / `logging.getLogger()` / `print()` in application code
- **Variable name**: always `logger` (not `_logger`, not `log`)
- **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
- **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
- **Structured kwargs**: always `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`
- **All error paths** must log at WARNING or ERROR with context before raising
- **All state transitions** must log at INFO
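
The logging rules above can be sketched as follows. This is a minimal illustration, not the project's code: `RecordingLogger` and the local `get_logger` are hypothetical stand-ins for `ai_company.observability.get_logger` so that the example runs on its own, `SUPPORTED_LANGUAGES` and `run_code` are invented for the example, and the event constants are copied from `events/code_runner.py` in this PR.

```python
from typing import Any, Final

# Copied from events/code_runner.py in this PR. In application code these
# would be imported from ai_company.observability.events.code_runner.
CODE_RUNNER_EXECUTE_START: Final[str] = "code_runner.execute.start"
CODE_RUNNER_INVALID_LANGUAGE: Final[str] = "code_runner.invalid_language"


class RecordingLogger:
    """Hypothetical stand-in: stores (level, event, fields) tuples."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: list[tuple[str, str, dict[str, Any]]] = []

    def info(self, event: str, **fields: Any) -> None:
        self.records.append(("info", event, fields))

    def warning(self, event: str, **fields: Any) -> None:
        self.records.append(("warning", event, fields))


def get_logger(name: str) -> RecordingLogger:
    """Hypothetical replacement for ai_company.observability.get_logger."""
    return RecordingLogger(name)


# Rule: the module-level variable is always called `logger`.
logger = get_logger(__name__)

SUPPORTED_LANGUAGES = frozenset({"python", "javascript"})  # assumed set


def run_code(language: str, code: str) -> str:
    """Log with an event constant plus structured kwargs, never a format string."""
    logger.info(CODE_RUNNER_EXECUTE_START, language=language, code_length=len(code))
    if language not in SUPPORTED_LANGUAGES:
        # Rule: every error path logs at WARNING or ERROR with context before raising.
        logger.warning(CODE_RUNNER_INVALID_LANGUAGE, language=language)
        raise ValueError(f"unsupported language: {language}")
    return "ok"
```

Calling `run_code("cobol", "x")` in this sketch records a `code_runner.execute.start` entry followed by a `code_runner.invalid_language` warning, then raises `ValueError`.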
46 changes: 31 additions & 15 deletions DESIGN_SPEC.md
@@ -81,7 +81,7 @@ The MVP validates the core hypothesis: **a single agent can complete a real task

> **Implementation snapshot (2026-03-09):**
> - **Done:** M0–M6 (tooling, config/core, providers, single-agent engine, multi-agent orchestration, API/CLI surface). Memory layer backend selected ([ADR-001](docs/decisions/ADR-001-memory-layer.md)). Persistence backend (§7.6) completed. Memory retrieval pipeline (#41: ranking, token-budget formatting, context injection) complete. Budget enforcement complete (BudgetEnforcer + configurable cost tiers + quota/subscription tracking). CFO cost optimization complete (CostOptimizer: anomaly detection, efficiency analysis, downgrade recommendations, routing optimization, approval decisions; ReportGenerator: multi-dimensional spending reports). Shared org memory (#125: HybridPromptRetrievalBackend, OrgFactStore, access control, factory) complete. Memory consolidation/archival (#48: ConsolidationService, SimpleConsolidationStrategy, RetentionEnforcer, ArchivalStore protocol) complete.
> - **Not started (mostly placeholders):** M7 security + approval system.
> - **In progress:** M7 — Docker sandbox (#50), MCP bridge (#53), code runner implemented. Security + approval system not started.

### 1.5 Configuration Philosophy

@@ -2101,7 +2101,7 @@ Tool execution requires safety boundaries proportional to the risk of each tool
| Backend | Isolation | Latency | Dependencies | Status |
|---------|-----------|---------|--------------|--------|
| `SubprocessSandbox` | Process-level: env filtering (allowlist + denylist), restricted PATH (configurable via `extra_safe_path_prefixes`), workspace-scoped cwd, timeout + process-group kill, library injection var blocking, explicit transport cleanup on Windows | ~ms | None | **Implemented** |
| `DockerSandbox` | Container-level: ephemeral container, mounted workspace, no network, resource limits (CPU/memory/time) | ~1-2s cold start | Docker | Planned |
| `DockerSandbox` | Container-level: ephemeral container, mounted workspace, no network, resource limits (CPU/memory/time) | ~1-2s cold start | Docker | **Implemented** |
| `K8sSandbox` | Pod-level: per-agent containers, namespace isolation, resource quotas, network policies | ~2-5s | Kubernetes | Future |
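
The process-level environment filtering listed for `SubprocessSandbox` (allowlist, denylist, and blocking of library injection variables) could look roughly like the sketch below. The variable names, patterns, and the `filter_env` function are assumptions made for illustration; the diff does not show the real lists or the actual implementation.

```python
import fnmatch
from collections.abc import Mapping

# Hypothetical defaults; the real allowlist and denylist live in the
# sandbox configuration and may differ.
ALLOWED_VARS = frozenset({"PATH", "HOME", "LANG", "TMPDIR"})
DENIED_PATTERNS = ("*_TOKEN", "*_SECRET", "*_API_KEY")
# Variables that let a caller inject code into a child process via the loader.
INJECTION_VARS = frozenset({"LD_PRELOAD", "LD_LIBRARY_PATH", "DYLD_INSERT_LIBRARIES"})


def filter_env(
    env: Mapping[str, str],
    extra_allowed: frozenset[str] = frozenset(),
) -> dict[str, str]:
    """Return only the variables that are safe to pass to a sandboxed child.

    A variable must be on the allowlist, and the denylist and injection
    checks always win, even if a caller adds the name via `extra_allowed`.
    """
    allowed = ALLOWED_VARS | extra_allowed
    result: dict[str, str] = {}
    for name, value in env.items():
        if name not in allowed:
            continue
        if name in INJECTION_VARS:
            continue
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in DENIED_PATTERNS):
            continue
        result[name] = value
    return result
```

Giving the denylist precedence over the allowlist is a design choice in this sketch: a misconfigured `extra_allowed` entry cannot re-enable a secret or an injection variable.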

#### Default Layered Configuration
@@ -2130,7 +2130,7 @@ sandboxing:
memory_limit: "512m"
cpu_limit: "1.0"
timeout_seconds: 120
mount_mode: "rw" # rw for workspace dir, nothing else mounted
mount_mode: "ro" # read-only by default; workspace mounted separately
auto_remove: true # ephemeral — container removed after execution
k8s: # future — per-agent pod isolation
namespace: "ai-company-agents"
@@ -2151,7 +2151,7 @@ sandboxing:

> **Decisions ([ADR-002](docs/decisions/ADR-002-design-decisions-batch-1.md) D17, D18):**
>
> - **D17 — MCP SDK:** Official `mcp` Python SDK, pinned `>=1.25,<2`. Thin `MCPBridgeTool` adapter layer isolates the rest of the codebase from SDK API changes. Support **stdio** (local/dev) and **Streamable HTTP** (remote/production) transports. Skip deprecated SSE. v2 migration planned — pin range prevents accidental breaking upgrade.
> - **D17 — MCP SDK:** Official `mcp` Python SDK, pinned `==1.26.0`. Thin `MCPBridgeTool` adapter layer isolates the rest of the codebase from SDK API changes. Support **stdio** (local/dev) and **Streamable HTTP** (remote/production) transports. Skip deprecated SSE. v2 migration planned; the exact pin prevents an accidental breaking upgrade.
> - **D18 — MCP Result Mapping:** Adapter in `MCPBridgeTool` keeps `ToolResult` as-is. Mapping: text blocks → concatenate to `content: str`; image/audio → `[image: {mimeType}]` placeholder + base64 in `metadata["attachments"]`; `structuredContent` → `metadata["structured_content"]`; `isError` → `is_error` (1:1). Future: extend `ToolResult` with optional `attachments` when multi-modal LLM tool results are needed.
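
The D18 mapping can be sketched on plain dictionaries shaped like an MCP tool-call result (`content`, `structuredContent`, `isError`). This is an illustration under assumptions, not the code in `result_mapper.py`: the `ToolResult` dataclass is a hypothetical stand-in for the project's model, `map_mcp_result` is an invented name, joining text blocks with a newline is one possible reading of "concatenate", and the `[audio: ...]` placeholder extends the documented image placeholder by analogy.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ToolResult:
    """Hypothetical stand-in for the project's ToolResult model."""

    content: str
    is_error: bool = False
    metadata: dict[str, Any] = field(default_factory=dict)


def map_mcp_result(raw: dict[str, Any]) -> ToolResult:
    """Map an MCP tool-call result dict onto ToolResult following D18."""
    parts: list[str] = []
    attachments: list[dict[str, str]] = []

    for block in raw.get("content", []):
        kind = block.get("type")
        if kind == "text":
            # Text blocks are concatenated into the string content.
            parts.append(block["text"])
        elif kind in ("image", "audio"):
            # Binary blocks become a placeholder; base64 data goes to metadata.
            mime = block.get("mimeType", "application/octet-stream")
            parts.append(f"[{kind}: {mime}]")
            attachments.append({"type": kind, "mimeType": mime, "data": block["data"]})

    metadata: dict[str, Any] = {}
    if attachments:
        metadata["attachments"] = attachments
    if raw.get("structuredContent") is not None:
        metadata["structured_content"] = raw["structuredContent"]

    return ToolResult(
        content="\n".join(parts),  # newline separator is an assumption
        is_error=bool(raw.get("isError", False)),  # isError maps 1:1
        metadata=metadata,
    )
```

Keeping attachments in `metadata` leaves `ToolResult.content` a plain string, which matches the decision to defer a dedicated `attachments` field until multi-modal tool results are needed.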

### 11.1.4 Action Type System
@@ -2728,7 +2728,8 @@ Circular inheritance is detected via chain tracking and raises `TemplateInherita
| **Web UI** | Vue 3 + Vite | Modern, fast, good ecosystem. Simpler than React for dashboards |
| **Real-time** | WebSocket (Litestar channels plugin) | Built-in pub/sub broadcasting, per-channel history, backpressure management. Real-time agent activity, task updates, chat feed |
| **Containerization** | Docker + Docker Compose | Isolated code execution, reproducible environments |
| **Tool Integration** | MCP (Model Context Protocol) | Industry standard for LLM-to-tool integration |
| **Docker API** | aiodocker | Async-native Docker API client for `DockerSandbox` backend |
| **Tool Integration** | MCP SDK (`mcp`) | Industry standard for LLM-to-tool integration |
| **Agent Comms** | A2A Protocol compatible | Future-proof inter-agent communication |
| **Config Format** | YAML + Pydantic validation | Human-readable config with strict validation |
| **CLI** | TBD (future, if needed) | Thin wrapper around the REST API for terminal use. May not be needed — interactive Scalar docs at `/docs/api` and `curl`/`httpie` may suffice |
@@ -2960,7 +2961,10 @@ ai-company/
│ │ │ ├── task_routing.py # TASK_ROUTING_* constants
│ │ │ ├── template.py # TEMPLATE_* constants
│ │ │ ├── tool.py # TOOL_* constants
│ │ │ └── workspace.py # WORKSPACE_* constants
│ │ │ ├── workspace.py # WORKSPACE_* constants
│ │ │ ├── code_runner.py # CODE_RUNNER_* constants
│ │ │ ├── docker.py # DOCKER_* constants
│ │ │ └── mcp.py # MCP_* constants
│ │ ├── processors.py # Log processors
│ │ ├── setup.py # Logging setup
│ │ └── sinks.py # Log output backends
Expand Down Expand Up @@ -2996,13 +3000,6 @@ ai-company/
│ │ ├── examples/ # Example tool implementations
│ │ │ ├── __init__.py # Package exports
│ │ │ └── echo.py # Echo tool (for testing)
│ │ ├── sandbox/ # Sandboxing backends
│ │ │ ├── __init__.py # Package exports
│ │ │ ├── config.py # SubprocessSandboxConfig model
│ │ │ ├── errors.py # SandboxError hierarchy
│ │ │ ├── protocol.py # SandboxBackend protocol
│ │ │ ├── result.py # SandboxResult model
│ │ │ └── subprocess_sandbox.py # SubprocessSandbox (default)
│ │ ├── file_system/ # Built-in file system tools
│ │ │ ├── __init__.py # Package exports
│ │ │ ├── _base_fs_tool.py # BaseFileSystemTool ABC
@@ -3015,9 +3012,28 @@ ai-company/
│ │ ├── _git_base.py # Base class for git tools (workspace, subprocess, sandbox integration)
│ │ ├── _process_cleanup.py # Subprocess transport cleanup utility (Windows ResourceWarning prevention)
│ │ ├── git_tools.py # Git operations — 6 built-in tools (sandbox-aware)
│ │ ├── code_runner.py # Code execution (M7)
│ │ ├── code_runner.py # Code execution tool
│ │ ├── web_tools.py # HTTP, search (M7)
│ │ └── mcp_bridge.py # MCP server integration (M7)
│ │ ├── sandbox/ # Sandbox backends subpackage
│ │ │ ├── __init__.py # Package exports
│ │ │ ├── config.py # Subprocess sandbox configuration
│ │ │ ├── docker_config.py # Docker sandbox configuration
│ │ │ ├── docker_sandbox.py # DockerSandbox backend (aiodocker)
│ │ │ ├── errors.py # Sandbox error hierarchy
│ │ │ ├── protocol.py # SandboxBackend protocol
│ │ │ ├── result.py # SandboxResult model
│ │ │ ├── sandboxing_config.py # Top-level sandboxing config
│ │ │ └── subprocess_sandbox.py # SubprocessSandbox backend
│ │ └── mcp/ # MCP bridge subpackage
│ │ ├── __init__.py # Package exports
│ │ ├── bridge_tool.py # MCPBridgeTool (BaseTool integration)
│ │ ├── cache.py # MCP result cache (TTL + LRU)
│ │ ├── client.py # MCP client wrapper
│ │ ├── config.py # MCP server/bridge config models
│ │ ├── errors.py # MCP error hierarchy
│ │ ├── factory.py # MCPToolFactory (parallel connect)
│ │ ├── models.py # MCP domain models
│ │ └── result_mapper.py # MCP result → ToolExecutionResult mapping
│ ├── security/ # Security & approval (M7, stubs only)
│ │ ├── approval.py # Approval workflow gates (M7) — domain model is in core/approval.py
│ │ ├── secops_agent.py # Security operations agent (M7)
7 changes: 4 additions & 3 deletions README.md
@@ -15,7 +15,7 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
- **Company Config + Core Models** - Strong Pydantic validation, immutable config models, runtime state models
- **Provider Layer** - LiteLLM-based provider abstraction with routing, retry, and rate limiting
- **Budget Tracking** - Cost records, summaries, and coordination analytics models
- **Tool System** - File system tools, git tools, sandbox abstraction, permission gating
- **Tool System** - File system tools, git tools, sandbox abstraction (subprocess + Docker), code runner, MCP bridge, permission gating
- **Single-Agent Engine (M3)** - ReAct/Plan-Execute loops, fail-and-reassign recovery, graceful shutdown
- **Multi-Agent Core (M4)** - Message bus, delegation with loop prevention, conflict resolution, meeting protocols
- **Task Intelligence (M4)** - Task decomposition, routing, assignment strategies, workspace isolation via git worktrees
@@ -38,7 +38,7 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents

## Status

**M7: Security & HR** next (M0–M6 all done). See [DESIGN_SPEC.md](DESIGN_SPEC.md) for the full high-level specification.
**M7: Security & HR** in progress (M0–M6 all done). See [DESIGN_SPEC.md](DESIGN_SPEC.md) for the full high-level specification.

## Tech Stack

@@ -47,7 +47,7 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
- **LiteLLM** for multi-provider LLM abstraction
- **structlog** for structured logging and observability
- **Mem0** for agent memory (initial backend; custom stack future — see [ADR-001](docs/decisions/ADR-001-memory-layer.md))
- **MCP** for tool integration (planned)
- **MCP** for tool integration
- **Vue 3** for web dashboard (planned)
- **SQLite** (aiosqlite) → PostgreSQL for operational data persistence

@@ -56,6 +56,7 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
- **Python 3.14+**
- **uv** — package manager ([install](https://docs.astral.sh/uv/getting-started/installation/))
- **Git 2.x+** — required at runtime for built-in git tools (subprocess-based, not a Python binding)
- **Docker** (optional) — required for code execution sandbox and Docker-backed tool isolation. Install [Docker Desktop](https://docs.docker.com/get-docker/) or Docker Engine. File system and git tools work without Docker via subprocess isolation.

## Getting Started

19 changes: 19 additions & 0 deletions docker/sandbox/Dockerfile
@@ -0,0 +1,19 @@
FROM node:22-slim AS node-base

FROM python:3.14-slim

COPY --from=node-base /usr/local/bin/node /usr/local/bin/node
COPY --from=node-base /usr/local/lib/node_modules /usr/local/lib/node_modules
RUN ln -s /usr/local/lib/node_modules/npm/bin/npm-cli.js /usr/local/bin/npm

RUN apt-get update && apt-get install -y --no-install-recommends git \
&& apt-get clean && rm -rf /var/lib/apt/lists/*

Comment on lines +9 to +11

Copilot AI, Mar 10, 2026

The Dockerfile does not set read-only mode or drop capabilities at the image layer; those are applied at runtime by `DockerSandbox._build_container_config`. This is the correct design. However, the Dockerfile installs git (line 9) and copies the full Node.js installation, including npm. Because `ReadonlyRootfs: True` is applied at runtime, the sandbox container cannot write to its root filesystem, and git and npm may try to write to /home, /tmp, or other paths that the read-only rootfs blocks. If agents need to run git or npm inside the sandbox, those tools will fail silently or with confusing errors. Consider documenting this constraint, or mounting a writable /tmp via tmpfs in the container config for tools that need temporary write access.

RUN mkdir -p /workspace \
&& useradd --uid 10001 --no-create-home --shell /usr/sbin/nologin sandbox

WORKDIR /workspace

USER sandbox

CMD ["bash"]
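
The review comment above refers to runtime hardening (`ReadonlyRootfs`, dropped capabilities) and suggests a writable tmpfs at `/tmp`. A container-create payload along those lines might look like the sketch below. It is not the project's `DockerSandbox._build_container_config`: the function name, parameters, tmpfs size, and the workspace bind mode are assumptions, while the dictionary keys are Docker Engine API container-create fields and the default limits mirror the `sandboxing.docker` values in DESIGN_SPEC.md.

```python
from typing import Any

_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_memory(limit: str) -> int:
    """Convert a limit such as "512m" into bytes (plain digits mean bytes)."""
    text = limit.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def build_container_config(
    image: str,
    command: list[str],
    workspace: str,
    *,
    memory_limit: str = "512m",
    cpu_limit: str = "1.0",
    workspace_mode: str = "rw",  # assumption: how mount_mode maps is not shown in the diff
) -> dict[str, Any]:
    """Build a hardened container-create payload (illustrative only)."""
    return {
        "Image": image,
        "Cmd": command,
        "WorkingDir": "/workspace",
        "User": "sandbox",  # non-root user created in the Dockerfile
        "NetworkDisabled": True,
        "HostConfig": {
            "Binds": [f"{workspace}:/workspace:{workspace_mode}"],
            "NetworkMode": "none",
            "ReadonlyRootfs": True,
            # Writable scratch space so git and npm can create temp files.
            "Tmpfs": {"/tmp": "rw,noexec,nosuid,size=64m"},
            "CapDrop": ["ALL"],
            "Memory": parse_memory(memory_limit),
            "NanoCpus": int(float(cpu_limit) * 1_000_000_000),
            "AutoRemove": True,
        },
    }
```

With the default limits, `Memory` is 536870912 bytes and `NanoCpus` is 1000000000, which is one full CPU. Tools that write under the home directory would still fail in this sketch, because the Dockerfile creates the `sandbox` user with `--no-create-home`.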
10 changes: 10 additions & 0 deletions pyproject.toml
@@ -13,11 +13,13 @@ classifiers = [
"Typing :: Typed",
]
dependencies = [
"aiodocker==0.26.0",
"aiosqlite==0.21.0",
"jinja2==3.1.6",
"jsonschema==4.26.0",
"litellm==1.82.0",
"litestar[standard,structlog,pydantic,brotli,prometheus]==2.21.1",
"mcp==1.26.0",
"pydantic==2.12.5",
"pyyaml==6.0.3",
"structlog==25.5.0",
@@ -157,10 +159,18 @@ ignore_missing_imports = true
module = "jsonschema.*"
ignore_missing_imports = true

[[tool.mypy.overrides]]
module = "aiodocker.*"
ignore_missing_imports = true

[[tool.mypy.overrides]]
module = "aiosqlite.*"
ignore_missing_imports = true

[[tool.mypy.overrides]]
module = "mcp.*"
ignore_missing_imports = true

[[tool.mypy.overrides]]
module = "litestar.*"
ignore_missing_imports = true
2 changes: 2 additions & 0 deletions src/ai_company/config/defaults.py
@@ -35,4 +35,6 @@ def default_config_dict() -> dict[str, Any]:
"cost_tiers": {},
"org_memory": {},
"api": {},
"sandboxing": {},
"mcp": {},
}
12 changes: 12 additions & 0 deletions src/ai_company/config/schema.py
@@ -26,6 +26,8 @@
from ai_company.observability.config import LogConfig # noqa: TC001
from ai_company.observability.events.config import CONFIG_VALIDATION_FAILED
from ai_company.persistence.config import PersistenceConfig
from ai_company.tools.mcp.config import MCPConfig
from ai_company.tools.sandbox.sandboxing_config import SandboxingConfig

logger = get_logger(__name__)

@@ -487,6 +489,8 @@ class RootConfig(BaseModel):
cost_tiers: Cost tier definitions.
org_memory: Organizational memory configuration.
api: API server configuration.
sandboxing: Sandboxing backend configuration.
mcp: MCP bridge configuration.
"""

model_config = ConfigDict(frozen=True)
@@ -574,6 +578,14 @@ class RootConfig(BaseModel):
default_factory=ApiConfig,
description="API server configuration",
)
sandboxing: SandboxingConfig = Field(
default_factory=SandboxingConfig,
description="Sandboxing backend configuration",
)
mcp: MCPConfig = Field(
default_factory=MCPConfig,
description="MCP bridge configuration",
)

@model_validator(mode="after")
def _validate_unique_agent_names(self) -> Self:
8 changes: 8 additions & 0 deletions src/ai_company/observability/events/code_runner.py
@@ -0,0 +1,8 @@
"""Code runner tool event constants."""

from typing import Final

CODE_RUNNER_EXECUTE_START: Final[str] = "code_runner.execute.start"
CODE_RUNNER_EXECUTE_SUCCESS: Final[str] = "code_runner.execute.success"
CODE_RUNNER_EXECUTE_FAILED: Final[str] = "code_runner.execute.failed"
CODE_RUNNER_INVALID_LANGUAGE: Final[str] = "code_runner.invalid_language"
16 changes: 16 additions & 0 deletions src/ai_company/observability/events/docker.py
@@ -0,0 +1,16 @@
"""Docker sandbox event constants."""

from typing import Final

DOCKER_EXECUTE_START: Final[str] = "docker.execute.start"
DOCKER_EXECUTE_SUCCESS: Final[str] = "docker.execute.success"
DOCKER_EXECUTE_FAILED: Final[str] = "docker.execute.failed"
DOCKER_EXECUTE_TIMEOUT: Final[str] = "docker.execute.timeout"
DOCKER_CONTAINER_CREATED: Final[str] = "docker.container.created"
DOCKER_CONTAINER_STOPPED: Final[str] = "docker.container.stopped"
DOCKER_CONTAINER_REMOVED: Final[str] = "docker.container.removed"
DOCKER_CONTAINER_STOP_FAILED: Final[str] = "docker.container.stop_failed"
DOCKER_CONTAINER_REMOVE_FAILED: Final[str] = "docker.container.remove_failed"
DOCKER_CLEANUP: Final[str] = "docker.cleanup"
DOCKER_HEALTH_CHECK: Final[str] = "docker.health_check"
DOCKER_DAEMON_UNAVAILABLE: Final[str] = "docker.daemon.unavailable"