Merged
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -50,7 +50,7 @@ src/ai_company/
config/ # YAML company config loading and validation
core/ # Shared domain models and base classes
engine/ # Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, task lifecycle, recovery, shutdown, and workspace isolation
-memory/ # Persistent agent memory (memory layer TBD)
+memory/ # Persistent agent memory (Mem0 initial, custom stack future — ADR-001)
observability/ # Structured logging, correlation tracking, log sinks
providers/ # LLM provider abstraction (LiteLLM adapter)
security/ # SecOps agent, approval gates, audit
29 changes: 18 additions & 11 deletions DESIGN_SPEC.md
@@ -1229,7 +1229,8 @@ The auto-selector uses task structure, artifact count, and (when available from
├──────────┴──────────┴───────────┴───────────┤
│ Storage Backend │
│ SQLite / PostgreSQL / File-based │
-│ + Memory Layer (TBD — see §15.2) │
+│ + Mem0 (initial) / Custom Stack (future) │
+│ See ADR-001 │
└─────────────────────────────────────────────┘
```

@@ -1248,7 +1249,11 @@ The auto-selector uses task structure, artifact count, and (when available from
```yaml
 memory:
   level: "full" # none, session, project, full
-  backend: "sqlite" # sqlite, postgresql, file (memory layer library is on top, not a backend itself — see §15.2)
+  backend: "mem0" # mem0, custom, cognee, graphiti (future) — see ADR-001
+  storage:
+    data_dir: "/data/memory" # mounted Docker volume path
+    vector_store: "qdrant" # qdrant (embedded), qdrant-external, etc.
+    history_store: "sqlite" # sqlite, postgresql
   options:
     retention_days: null # null = forever
     max_memories_per_agent: 10000
@@ -1319,7 +1324,7 @@ org_memory:
- Handles policy evolution naturally. Agents understand when and why things changed
- Most complex. Potentially overkill for small companies or local-first use

-> **Extensibility:** All backends implement the `OrgMemoryBackend` protocol (`query(context) → list[OrgFact]`, `write(fact, author)`, `list_policies()`). The MVP ships with Backend 1; Backends 2 and 3 are research directions that may be explored if the default approach proves insufficient. The memory layer candidate (currently evaluating Mem0 and alternatives — see §15.2) may provide graph memory capabilities natively, reducing implementation effort for Backends 2-3.
+> **Extensibility:** All backends implement the `OrgMemoryBackend` protocol (`query(context) → list[OrgFact]`, `write(fact, author)`, `list_policies()`). The MVP ships with Backend 1; Backends 2 and 3 are research directions that may be explored if the default approach proves insufficient. The selected memory layer backend Mem0 (ADR-001) provides optional graph memory via Neo4j/FalkorDB, which could reduce implementation effort for Backends 2-3.
Copilot AI Mar 8, 2026

This sentence suggests Mem0's optional graph support could reduce implementation effort for both OrgMemory backends 2 and 3, but ADR-001 later notes Backend 3 (temporal KG) is not supported by Mem0 beyond basic timestamps. Consider narrowing this to Backend 2 (GraphRAG) and clarifying that Backend 3 still requires a custom temporal model (or Graphiti) to avoid contradicting ADR-001.

Suggested change
-> **Extensibility:** All backends implement the `OrgMemoryBackend` protocol (`query(context) → list[OrgFact]`, `write(fact, author)`, `list_policies()`). The MVP ships with Backend 1; Backends 2 and 3 are research directions that may be explored if the default approach proves insufficient. The selected memory layer backend Mem0 (ADR-001) provides optional graph memory via Neo4j/FalkorDB, which could reduce implementation effort for Backends 2-3.
+> **Extensibility:** All backends implement the `OrgMemoryBackend` protocol (`query(context) → list[OrgFact]`, `write(fact, author)`, `list_policies()`). The MVP ships with Backend 1; Backends 2 and 3 are research directions that may be explored if the default approach proves insufficient. The selected memory layer backend Mem0 (ADR-001) provides optional graph memory via Neo4j/FalkorDB, which could reduce implementation effort for Backend 2 (GraphRAG). Backend 3 (`temporal_kg`) still requires a custom temporal knowledge-graph model or an external system (e.g., Graphiti) layered on top of Mem0’s basic timestamp support.

> **Write access control:** Core policies are human-only. ADRs and procedures can be written by senior+ agents. All writes are versioned and auditable. This prevents agents from corrupting shared organizational knowledge while allowing senior agents to document decisions.

---
@@ -2319,14 +2324,14 @@ Run: ai-company start acme-corp
└──────────────────────────────────────────────────────────────┘
```

-### 15.2 Technology Stack (Candidates - TBD After Research)
+### 15.2 Technology Stack

| Component | Technology | Rationale |
|-----------|-----------|-----------|
| **Language** | Python 3.14+ | Best AI/ML ecosystem, all major frameworks use it, LiteLLM/MCP and memory layer candidates all Python-native. PEP 649 native lazy annotations, PEP 758 except syntax. |
| **API Framework** | FastAPI | Async-native, WebSocket support, auto OpenAPI docs, high performance, type-safe with Pydantic |
| **LLM Abstraction** | LiteLLM | 100+ providers, unified API, built-in cost tracking, retries/fallbacks |
-| **Agent Memory** | TBD (candidates: Mem0, Zep, Letta, Cognee, custom) + SQLite | Memory layer library TBD after evaluation. SQLite for structured data. Upgrade to Postgres later |
+| **Agent Memory** | Mem0 (Qdrant + SQLite) → custom (Neo4j + Qdrant) | Mem0 in-process as initial backend behind pluggable `MemoryBackend` protocol ([ADR-001](docs/decisions/ADR-001-memory-layer.md)). Qdrant embedded + SQLite for persistence. Custom stack (Neo4j + Qdrant external) as future upgrade. Config-driven backend selection |
| **Message Bus** | Internal (async queues) → Redis | Start with Python asyncio queues, upgrade to Redis for multi-process/distributed |
| **Task Queue** | Internal → Celery/Redis | Start simple, scale with Celery when needed |
| **Database** | SQLite → PostgreSQL | Start lightweight, migrate to Postgres for production/multi-user |
@@ -2619,6 +2624,8 @@ ai-company/
│ ├── integration/
│ └── e2e/
├── docs/
+│ ├── decisions/
+│ │ └── ADR-001-memory-layer.md
│ └── getting_started.md
├── DESIGN_SPEC.md # This document
├── README.md
@@ -2633,7 +2640,7 @@ ai-company/
| Language | Python 3.14+ | TypeScript, Go, Rust | AI ecosystem, LiteLLM/MCP and memory layer candidates are Python-native, PEP 649 lazy annotations, PEP 758 except syntax |
| API | FastAPI | Flask, Django, aiohttp | Async native, Pydantic integration, auto docs, WebSocket support |
| LLM Layer | LiteLLM | Direct APIs, OpenRouter only | 100+ providers, cost tracking, fallbacks, load balancing built-in |
-| Memory | TBD + SQLite | Mem0, Zep, Letta, Cognee, ChromaDB, custom | Memory layer library TBD — all candidates under evaluation. Must support episodic, semantic, procedural memory types (§7.1–7.3). Org memory served via `OrgMemoryBackend` protocol (§7.4) |
+| Memory | Mem0 (initial) → custom stack (future) + SQLite | Graphiti, Letta, Cognee, custom | Mem0 in-process as initial backend behind pluggable `MemoryBackend` protocol ([ADR-001](docs/decisions/ADR-001-memory-layer.md)). Custom stack (Neo4j + Qdrant) as future upgrade. Must support episodic, semantic, procedural memory types (§7.1–7.3). Org memory served via `OrgMemoryBackend` protocol (§7.4) |
| Message Bus | asyncio queues → Redis | Kafka, RabbitMQ, NATS | Start simple, Redis well-supported, Kafka overkill for local |
| Config | YAML + Pydantic | JSON, TOML, Python dicts | Human-friendly, strict validation, good IDE support |
| CLI | Typer | Click, argparse, Fire | Built on Click, auto-completion, type hints |
@@ -2692,7 +2699,7 @@ These conventions were established during the M0–M2+ review cycle. **Adopted**
| Full company simulation | Partial | Partial | No | **Yes - complete** |
| HR (hiring/firing) | No | No | No | **Yes** |
| Budget management (CFO) | No | No | No | **Yes** |
-| Persistent agent memory | No | No | Basic | **Yes (memory layer TBD — candidates under evaluation)** |
+| Persistent agent memory | No | No | Basic | **Yes (Mem0 initial, custom stack future — ADR-001)** |
| Agent personalities | Basic | Basic | Basic | **Deep - traits, styles, evolution** |
| Dynamic team scaling | No | No | Manual | **Yes - auto + manual** |
| Multiple company types | No | No | Manual | **Yes - templates + builder** |
@@ -2726,12 +2733,12 @@ Rationale:
- No existing framework covers even 50% of our requirements
- Our core differentiators (HR, budget, security ops, deep personalities, progressive trust) don't exist in any framework
- Forking MetaGPT or CrewAI would mean fighting their architecture while adding our features
-- **LiteLLM**, **FastAPI**, **MCP**, and a memory layer library (TBD) give us battle-tested components for the hard parts
+- **LiteLLM**, **FastAPI**, **MCP**, and **Mem0** (memory layer — ADR-001) give us battle-tested components for the hard parts
- The "company simulation" layer on top is our unique value and must be purpose-built

What we **plan to leverage** (not fork) — subject to evaluation:
- **LiteLLM** (candidate) - Provider abstraction
-- **Memory layer** (candidates: Mem0, Zep, Letta, Cognee, custom) - Agent memory
+- **Mem0** (selected, ADR-001) - Agent memory (initial backend; custom stack future)
- **FastAPI** (candidate) - API layer
- **MCP** - Tool integration standard (strong candidate, emerging industry standard)
- **Pydantic** (candidate) - Config validation and data models
@@ -2759,7 +2766,7 @@ What we **plan to leverage** (not fork) — subject to evaluation:
| 11 | What is the agent execution loop architecture? | High | **Resolved** | Multiple configurable loops — see §6.5 Agent Execution Loop |
| 12 | How should shared organizational memory work? | High | **Resolved** | Modular backends behind protocol — see §7.4 Shared Organizational Memory |
| 13 | What happens when humans don't respond to approvals? | High | **Resolved** | Configurable timeout policies with task suspension — see §12.4 Approval Timeout |
-| 14 | Which memory layer library to use? | Medium | Open | Mem0, Zep, Letta, Cognee, custom — all candidates, TBD after evaluation (see §15.2) |
+| 14 | Which memory layer library to use? | Medium | **Resolved** | Mem0 (initial) → custom stack (future) behind pluggable `MemoryBackend` protocol — see [ADR-001](docs/decisions/ADR-001-memory-layer.md) |
| 15 | How to handle agent crashes mid-task? | High | **Resolved** | Pluggable `RecoveryStrategy` protocol — see §6.6 Agent Crash Recovery |
| 16 | How to test non-deterministic agent behavior? | High | **Resolved** | Scripted providers for unit tests + behavioral assertions for integration — see §15.5 Engineering Conventions |
| 17 | How to detect orchestration overhead? | Medium | **Resolved** | Incremental LLM call analytics with proxy metrics (M3) → full categorization (M4) — see §10.5 |
@@ -2772,7 +2779,7 @@ What we **plan to leverage** (not fork) — subject to evaluation:
| Cost explosion from agent loops | High | Budget hard stops, loop detection, max iterations per task |
| Agent quality degradation with cheap models | Medium | Quality gates, minimum model requirements per task type |
| Third-party library breaking changes | Medium | Pin versions, integration tests, abstraction layers |
-| Memory retrieval quality | Medium | Evaluate candidates (Mem0, custom, etc.) against our use case |
+| Memory retrieval quality | Medium | Mem0 selected as initial backend (ADR-001). Protocol layer enables backend swap if retrieval quality insufficient. Pin version, test 3.14 compat in CI |
| Agent personality inconsistency | Low | Strong system prompts, few-shot examples, personality tests |
| WebSocket scaling | Low | Start local, add Redis pub/sub when needed |
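
The "protocol layer enables backend swap" mitigation in this hunk depends on config-driven backend selection. One way that could look is sketched below, under stated assumptions: the spec names the `MemoryBackend` protocol but not its methods, so `store` and `search` are hypothetical, as are `DictBackend`, `BACKEND_FACTORIES`, and `create_backend`. The stand-in factories do not call the real Mem0 library.

```python
from typing import Callable, Protocol


class MemoryBackend(Protocol):
    """Hypothetical surface; the spec does not list the protocol's methods."""

    def store(self, agent_id: str, text: str) -> None: ...
    def search(self, agent_id: str, query: str) -> list[str]: ...


class DictBackend:
    """Stand-in used for every registry entry here; no real Mem0 calls."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._items: dict[str, list[str]] = {}

    def store(self, agent_id: str, text: str) -> None:
        self._items.setdefault(agent_id, []).append(text)

    def search(self, agent_id: str, query: str) -> list[str]:
        # Naive substring match, scoped to a single agent's memories.
        needle = query.lower()
        return [t for t in self._items.get(agent_id, []) if needle in t.lower()]


# Registry keyed by the `memory.backend` value from the YAML config.
BACKEND_FACTORIES: dict[str, Callable[[], MemoryBackend]] = {
    "mem0": lambda: DictBackend("mem0"),      # placeholder for a Mem0 adapter
    "custom": lambda: DictBackend("custom"),  # placeholder for the future stack
}


def create_backend(config: dict) -> MemoryBackend:
    """Pick a backend from parsed config; reject unknown names early."""
    name = config.get("memory", {}).get("backend", "mem0")
    try:
        return BACKEND_FACTORIES[name]()
    except KeyError:
        known = ", ".join(sorted(BACKEND_FACTORIES))
        raise ValueError(f"unknown memory backend {name!r}; expected one of: {known}")


backend = create_backend({"memory": {"backend": "mem0"}})
backend.store("agent-1", "Chose Qdrant for the vector store")
```

With this shape, swapping backends when retrieval quality falls short is a one-line config change plus a new registry entry, and calling code never imports a concrete backend.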

6 changes: 3 additions & 3 deletions README.md
@@ -20,23 +20,23 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
- **Task Assignment** - Pluggable strategies (manual, role-based, load-balanced, cost-optimized, hierarchical, auction) for matching tasks to capable agents
- **Workspace Isolation** - Git worktree-based concurrent workspace isolation with sequential merge and conflict escalation
- **Configurable Autonomy** - From fully autonomous to human-approves-everything, with a Security Ops agent in between
-- **Persistent Memory** - Agents remember past decisions, code, relationships (memory layer TBD)
+- **Persistent Memory** - Agents remember past decisions, code, relationships (Mem0 initial, custom stack future)
- **HR System** - Hire, fire, promote agents. HR agent analyzes skill gaps and proposes candidates
- **Real Tool Access** - File system, git, code execution, web, databases - role-based and sandboxed
- **API-First** - REST + WebSocket API with local web dashboard
- **Templates + Builder** - Pre-built company templates and interactive builder

## Status

-**M3: Single Agent** and **M4: Multi-Agent** in progress (M0 Tooling, M1 Config & Core, M2 Providers — all done). See [DESIGN_SPEC.md](DESIGN_SPEC.md) for the full high-level specification.
+**M5: Memory & Budget** in progress (M0 Tooling, M1 Config & Core, M2 Providers, M3 Single Agent, M4 Multi-Agent — all done). See [DESIGN_SPEC.md](DESIGN_SPEC.md) for the full high-level specification.

## Tech Stack

- **Python 3.14+** with FastAPI, Pydantic, Typer
- **uv** as package manager, **Hatchling** as build backend
- **LiteLLM** for multi-provider LLM abstraction
- **structlog** for structured logging and observability
-- **Memory layer TBD** (candidates: Mem0, Zep, Letta, Cognee, custom) for agent memory (planned)
+- **Mem0** for agent memory (initial backend; custom stack future — see [ADR-001](docs/decisions/ADR-001-memory-layer.md))
- **MCP** for tool integration (planned)
- **Vue 3** for web dashboard (planned)
- **SQLite** → PostgreSQL for data persistence (planned)