Merged
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -104,7 +104,7 @@ src/ai_company/
- **Async**: `asyncio_mode = "auto"` — no manual `@pytest.mark.asyncio` needed
- **Timeout**: 30 seconds per test
- **Parallelism**: `pytest-xdist` via `-n auto`
- **Vendor-agnostic fixtures**: use fake model IDs/names in tests (e.g. `test-haiku-001`, `test-provider`), never real vendor model IDs — tests must not be coupled to external providers
- **Vendor-agnostic everywhere**: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small` as aliases. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list (listing supported providers), (2) `.claude/` skill/agent files, (3) third-party import paths/module names (e.g. `litellm.types.llms.openai`). Tests must use `test-provider`, `test-small-001`, etc.
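The vendor-agnostic rule lends itself to an automated check. The following is a minimal sketch, not part of this PR: the deny-list is taken from the examples named in the rule, and `find_vendor_names` is a hypothetical helper. It does not implement the three allowed exceptions (the DESIGN_SPEC.md provider list, `.claude/` files, third-party import paths), which a real check would need to skip.

```python
import re

# Assumed deny-list, copied from the examples in the rule above.
VENDOR_NAMES = ("anthropic", "openai", "claude", "gpt")

# Require a non-alphanumeric character (or start of text) before the name and
# a non-letter after it, so that words like "egypt" are not flagged.
_VENDOR_RE = re.compile(
    r"(?<![a-z0-9])(" + "|".join(VENDOR_NAMES) + r")(?![a-z])",
    re.IGNORECASE,
)


def find_vendor_names(text: str) -> list[str]:
    """Return the sorted, lower-cased vendor names found in ``text``."""
    return sorted({m.group(1).lower() for m in _VENDOR_RE.finditer(text)})
```

Run over each project-owned file, a non-empty result would indicate a line that needs a generic name such as `example-medium-001` or `test-provider`.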

## Git

130 changes: 71 additions & 59 deletions DESIGN_SPEC.md
@@ -38,7 +38,7 @@ Build a **configurable AI company framework** where AI agents operate within a v
| Principle | Description |
|-----------|-------------|
| **Configuration over Code** | Company structures, roles, and workflows defined via config, not hardcoded |
| **Provider Agnostic** | Any LLM backend: Claude API, OpenRouter, Ollama, custom endpoints |
| **Provider Agnostic** | Any LLM backend: cloud APIs, OpenRouter, Ollama, custom endpoints |
| **Composable** | Mix and match roles, teams, workflows. Build any type of company |
| **Observable** | Every agent action, communication, and decision is logged and visible |
| **Autonomy Spectrum** | From full human oversight to fully autonomous operation |
@@ -184,11 +184,11 @@ agent:
- redis
- testing
model:
provider: "anthropic" # example provider
model_id: "claude-sonnet-4-6" # example model — actual models TBD per agent/role
provider: "example-provider" # example provider
model_id: "example-medium-001" # example model — actual models TBD per agent/role
temperature: 0.3
max_tokens: 8192
fallback_model: "openrouter/anthropic/claude-haiku" # example fallback
fallback_model: "openrouter/example-medium-001" # example fallback
memory:
type: "persistent" # persistent, project, session, none
retention_days: null # null = forever
@@ -231,14 +231,14 @@ agent:

| Level | Authority | Typical Model | Cost Tier |
|-------|----------|---------------|-----------|
| Intern/Junior | Execute assigned tasks only | Haiku / small local | $ |
| Mid | Execute + suggest improvements | Sonnet / medium local | $$ |
| Senior | Execute + design + review others | Sonnet / Opus | $$$ |
| Lead | All above + approve + delegate | Opus / Sonnet | $$$ |
| Principal/Staff | All above + architectural decisions | Opus | $$$$ |
| Director | Strategic decisions + budget authority | Opus | $$$$ |
| VP | Department-wide authority | Opus | $$$$ |
| C-Suite (CEO/CTO/CFO) | Company-wide authority + final approvals | Opus | $$$$ |
| Intern/Junior | Execute assigned tasks only | small / local | $ |
| Mid | Execute + suggest improvements | medium / local | $$ |
| Senior | Execute + design + review others | medium / large | $$$ |
| Lead | All above + approve + delegate | large / medium | $$$ |
| Principal/Staff | All above + architectural decisions | large | $$$$ |
| Director | Strategic decisions + budget authority | large | $$$$ |
| VP | Department-wide authority | large | $$$$ |
| C-Suite (CEO/CTO/CFO) | Company-wide authority + final approvals | large | $$$$ |

### 3.3 Role Catalog (Extensible)

@@ -305,7 +305,7 @@ custom_roles:
skills: ["solidity", "web3", "smart-contracts"]
system_prompt_template: "blockchain_dev.md"
authority_level: "senior"
suggested_model: "opus"
suggested_model: "large"
```

---
@@ -828,7 +828,7 @@ execution_loop: "react" # react, plan_execute, hybrid, auto

#### Loop 2: Plan-and-Execute

A two-phase approach: the agent first generates a step-by-step plan, then executes each step sequentially. On failure, the agent can replan. Different models can be used for planning vs execution (e.g., Opus for planning, Haiku for execution steps).
A two-phase approach: the agent first generates a step-by-step plan, then executes each step sequentially. On failure, the agent can replan. Different models can be used for planning vs execution (e.g., large for planning, small for execution steps).
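The two-phase loop described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `complete` is a hypothetical callable taking a model alias and a prompt, the plan is assumed to come back as one step per line, and a failed step is assumed to surface as a `RuntimeError`.

```python
from collections.abc import Callable

# Hypothetical signature: (model_alias, prompt) -> completion text.
CompleteFn = Callable[[str, str], str]


def plan_and_execute(
    task: str,
    complete: CompleteFn,
    planner_model: str = "large",
    executor_model: str = "small",
    max_replans: int = 2,
) -> list[str]:
    """Plan with one model, run each step with another, replan on failure."""
    for _attempt in range(max_replans + 1):
        # Phase 1: the planning model produces one step per line.
        plan = complete(planner_model, f"Plan steps for: {task}")
        steps = [line.strip() for line in plan.splitlines() if line.strip()]

        # Phase 2: the execution model runs the steps sequentially.
        results: list[str] = []
        try:
            for step in steps:
                results.append(complete(executor_model, step))
        except RuntimeError:
            continue  # a step failed: discard partial results and replan
        return results
    raise RuntimeError(f"gave up after {max_replans} replans")
```

Passing the aliases as parameters keeps the planner/executor split configurable, in line with the `large` for planning, `small` for execution example.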

```text
┌──────────────────────────────────────────┐
@@ -1218,11 +1218,11 @@ Agents can move between seniority levels based on performance:
│ Unified Model Interface │
│ completion(messages, tools, config) → resp │
├───────────┬───────────┬───────────┬─────────┤
│ Anthropic │OpenRouter │ Ollama │ Custom │
│ Cloud API │OpenRouter │ Ollama │ Custom │
│ Adapter │ Adapter │ Adapter │ Adapter │
├───────────┼───────────┼───────────┼─────────┤
│ Claude API │ 400+ LLMs│ Local LLMs│ Any API │
│ Direct │ via OR │ Self-host │ │
│ Direct │ 400+ LLMs│ Local LLMs│ Any API │
│ API call │ via OR │ Self-host │ │
└───────────┴───────────┴───────────┴─────────┘
```

@@ -1232,35 +1232,38 @@ Agents can move between seniority levels based on performance:

```yaml
providers:
anthropic:
api_key: "${ANTHROPIC_API_KEY}"
example-provider:
api_key: "${PROVIDER_API_KEY}"
models: # example entries — real list loaded from provider
- id: "claude-opus-4-6"
alias: "opus"
- id: "example-large-001"
alias: "large"
cost_per_1k_input: 0.015 # illustrative, verify at implementation time
cost_per_1k_output: 0.075
max_context: 200000
- id: "claude-sonnet-4-6"
alias: "sonnet"
estimated_latency_ms: 1500 # optional, used by fastest strategy
- id: "example-medium-001"
alias: "medium"
cost_per_1k_input: 0.003
cost_per_1k_output: 0.015
max_context: 200000
- id: "claude-haiku-4-5"
alias: "haiku"
estimated_latency_ms: 500
- id: "example-small-001"
alias: "small"
cost_per_1k_input: 0.0008
cost_per_1k_output: 0.004
max_context: 200000
estimated_latency_ms: 200

openrouter:
api_key: "${OPENROUTER_API_KEY}"
base_url: "https://openrouter.ai/api/v1"
models: # example entries
- id: "anthropic/claude-sonnet-4-6"
alias: "or-sonnet"
- id: "google/gemini-2.5-pro"
alias: "or-gemini-pro"
- id: "deepseek/deepseek-r1"
alias: "or-deepseek"
- id: "vendor-a/model-medium"
alias: "or-medium"
- id: "vendor-b/model-pro"
alias: "or-pro"
- id: "vendor-c/model-reasoning"
alias: "or-reasoning"

ollama:
base_url: "http://localhost:11434"
@@ -1288,25 +1291,33 @@ Use **LiteLLM** as the provider abstraction layer:

```yaml
routing:
strategy: "smart" # smart, cheapest, fastest, manual
strategy: "smart" # smart, cheapest, fastest, role_based, cost_aware, manual
# Strategy behaviors:
# manual — resolve an explicit model override; fails if not set
# role_based — match agent seniority level to routing rules, then catalog default
# cost_aware — match task-type rules, then pick cheapest model within budget
# cheapest — alias for cost_aware
# fastest — match task-type rules, then pick fastest model (by estimated_latency_ms)
# within budget; falls back to cheapest when no latency data is available
# smart — priority cascade: override > task-type > role > seniority > cheapest > fallback chain
rules:
- role_level: "C-Suite"
preferred_model: "opus"
fallback: "sonnet"
preferred_model: "large"
fallback: "medium"
- role_level: "Senior"
preferred_model: "sonnet"
fallback: "haiku"
preferred_model: "medium"
fallback: "small"
- role_level: "Junior"
preferred_model: "haiku"
preferred_model: "small"
fallback: "local-small"
- task_type: "code_review"
preferred_model: "sonnet"
preferred_model: "medium"
- task_type: "documentation"
preferred_model: "haiku"
preferred_model: "small"
- task_type: "architecture"
preferred_model: "opus"
preferred_model: "large"
fallback_chain:
- "anthropic"
- "example-provider"
- "openrouter"
- "ollama"
```
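The selection step of the `fastest` strategy, including its fallback to `cheapest`, might look roughly like this. It is a sketch only: the `Model` dataclass, the function names, and the way "within budget" is measured (the sum of the two per-1k prices against a single ceiling) are assumptions for illustration, and task-type rule matching is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    alias: str
    cost_per_1k_input: float
    cost_per_1k_output: float
    estimated_latency_ms: int | None = None  # optional, as in the config

    @property
    def total_cost_per_1k(self) -> float:
        # Assumed budget metric: input price plus output price per 1k tokens.
        return self.cost_per_1k_input + self.cost_per_1k_output


def pick_cheapest(models: list[Model], budget_per_1k: float) -> Model:
    """Cheapest model whose combined per-1k price fits the budget."""
    affordable = [m for m in models if m.total_cost_per_1k <= budget_per_1k]
    if not affordable:
        raise LookupError("no model fits the budget")
    return min(affordable, key=lambda m: m.total_cost_per_1k)


def pick_fastest(models: list[Model], budget_per_1k: float) -> Model:
    """Lowest-latency affordable model; cheapest when no latency data exists."""
    affordable = [m for m in models if m.total_cost_per_1k <= budget_per_1k]
    timed = [m for m in affordable if m.estimated_latency_ms is not None]
    if not timed:
        return pick_cheapest(models, budget_per_1k)
    return min(timed, key=lambda m: m.estimated_latency_ms)


CATALOG = [
    Model("large", 0.015, 0.075, 1500),
    Model("medium", 0.003, 0.015, 500),
    Model("small", 0.0008, 0.004, 200),
]
```

With the illustrative catalog values, `pick_fastest(CATALOG, 1.0)` resolves to the `small` alias. If the latency figures were removed from every entry, the same call would fall through to `pick_cheapest`.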
@@ -1339,8 +1350,8 @@ Every API call is tracked (illustrative schema):
{
"agent_id": "sarah_chen",
"task_id": "task-123",
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"provider": "example-provider",
"model": "example-medium-001",
"input_tokens": 4500,
"output_tokens": 1200,
"cost_usd": 0.0315,
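The `cost_usd` value in this illustrative record appears consistent with the per-1k prices listed for `example-medium-001` in the provider config (0.003 input, 0.015 output). A short sketch of that arithmetic, where `call_cost_usd` is a hypothetical helper name:

```python
def call_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cost_per_1k_input: float,
    cost_per_1k_output: float,
) -> float:
    """Price one call from token counts and per-1k-token rates (USD)."""
    return (
        (input_tokens / 1000) * cost_per_1k_input
        + (output_tokens / 1000) * cost_per_1k_output
    )


# 4.5 * 0.003 + 1.2 * 0.015 = 0.0135 + 0.018 = 0.0315
cost = call_cost_usd(4500, 1200, 0.003, 0.015)
```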
@@ -1388,10 +1399,10 @@ budget:
enabled: true
threshold: 85 # percent of budget used
boundary: "task_assignment" # task_assignment only — NEVER mid-execution
downgrade_map: # example — aliases reference configured models
opus: "sonnet"
sonnet: "haiku"
haiku: "local-small"
downgrade_map: # ordered pairs — aliases reference configured models
- ["large", "medium"]
- ["medium", "small"]
- ["small", "local-small"]
```

> **Auto-downgrade boundary:** Model downgrades apply only at **task assignment time**, never mid-execution. An agent halfway through an architecture review cannot be switched to a cheaper model — the task completes on its assigned model. The next task assignment respects the downgrade threshold. This prevents quality degradation from mid-thought model switches.
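One way to read the boundary rule is that the downgrade lookup runs only inside the function that assigns a new task. The sketch below makes two assumptions that the spec does not state: the threshold comparison is inclusive, and a downgrade moves exactly one step along the ordered pairs. `model_for_new_task` is a hypothetical name.

```python
# Ordered pairs, mirroring the downgrade_map in the config above.
DOWNGRADE_MAP = [
    ("large", "medium"),
    ("medium", "small"),
    ("small", "local-small"),
]


def model_for_new_task(
    requested_alias: str,
    budget_used_percent: float,
    threshold: float = 85,
    enabled: bool = True,
) -> str:
    """Resolve the model alias at task assignment time.

    Tasks that are already running keep their assigned model, because this
    function is only consulted when a new task is handed out.
    """
    if not enabled or budget_used_percent < threshold:
        return requested_alias
    for source, target in DOWNGRADE_MAP:
        if source == requested_alias:
            return target  # assumed: a single downgrade step per assignment
    return requested_alias  # no mapping configured for this alias
```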
@@ -1435,7 +1446,7 @@ call_analytics:
- success # true/false
- retry_count # 0 = first attempt succeeded
- retry_reason # rate_limit, timeout, internal_error
- latency_ms # wall-clock time for the call
- latency_ms # wall-clock time for the call (not estimated_latency_ms from config)
- finish_reason # stop, tool_use, max_tokens, error
- cache_hit # prompt caching hit/miss (provider-dependent)
aggregation:
@@ -1930,24 +1941,24 @@ template:
agents:
- role: "ceo"
name: "{{ ceo_name | auto }}"
model: "opus"
model: "large"
personality_preset: "visionary_leader"

- role: "full_stack_developer"
name: "{{ dev1_name | auto }}"
level: "senior"
model: "sonnet"
model: "medium"
personality_preset: "pragmatic_builder"

- role: "full_stack_developer"
name: "{{ dev2_name | auto }}"
level: "mid"
model: "haiku"
model: "small"
personality_preset: "eager_learner"

- role: "product_manager"
name: "{{ pm_name | auto }}"
model: "sonnet"
model: "medium"
personality_preset: "user_advocate"

workflow: "agile_kanban"
@@ -1974,16 +1985,16 @@ $ ai-company create
[ ] Operations

? Engineering team size: 5
- 1x Lead (Opus)
- 2x Senior Dev (Sonnet)
- 2x Junior Dev (Haiku)
- 1x Lead (large)
- 2x Senior Dev (medium)
- 2x Junior Dev (small)

? Add QA? yes
- 1x QA Lead (Sonnet)
- 1x QA Engineer (Haiku)
- 1x QA Lead (medium)
- 1x QA Engineer (small)

? Model providers:
[x] Anthropic Claude
[x] Cloud API
[x] Local Ollama
[ ] OpenRouter

@@ -2142,7 +2153,8 @@ ai-company/
│ │ ├── drivers/ # Provider driver implementations
│ │ │ ├── litellm_driver.py # LiteLLM adapter
│ │ │ └── mappers.py # Request/response mappers
│ │ ├── routing/ # Model routing (5 strategies)
│ │ ├── routing/ # Model routing (6 strategies)
│ │ │ ├── _strategy_helpers.py # Shared routing helper functions
│ │ │ ├── errors.py # Routing errors
│ │ │ ├── models.py # Routing models (candidates, results)
│ │ │ ├── resolver.py # Model resolver
2 changes: 1 addition & 1 deletion README.md
@@ -12,7 +12,7 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents

- **Any Company Structure** - From a 2-person startup to a 50+ enterprise, defined via config/templates
- **Deep Agent Identity** - Names, personalities, skills, seniority levels, performance tracking
- **Multi-Provider** - Anthropic Claude, OpenRouter (400+ models), local Ollama, and more via LiteLLM
- **Multi-Provider** - Any LLM via LiteLLM — cloud APIs, OpenRouter (400+ models), local Ollama, and more
Copilot AI Mar 6, 2026

This README line introduces vendor/provider names ("OpenRouter", "Ollama"), but CLAUDE.md now states vendor names may only appear in DESIGN_SPEC.md’s provider list. Either update the README to stay vendor-neutral or relax/clarify the CLAUDE.md rule so these references are allowed.

Suggested change
- **Multi-Provider** - Any LLM via LiteLLM — cloud APIs, OpenRouter (400+ models), local Ollama, and more
- **Multi-Provider** - Any LLM via LiteLLM — cloud APIs, local runtimes, and more

- **Smart Cost Management** - Per-agent budget tracking, auto model routing, CFO agent optimization
- **Configurable Autonomy** - From fully autonomous to human-approves-everything, with a Security Ops agent in between
- **Persistent Memory** - Agents remember past decisions, code, relationships (memory layer TBD)
4 changes: 2 additions & 2 deletions src/ai_company/budget/config.py
@@ -99,8 +99,8 @@ class AutoDowngradeConfig(BaseModel):
def _normalize_downgrade_map(cls, data: Any) -> Any:
"""Normalize downgrade_map aliases by stripping leading/trailing whitespace.

Runs before NotBlankStr validation so that ``" gpt-4 "`` becomes
``"gpt-4"`` rather than being kept with surrounding spaces.
Runs before NotBlankStr validation so that ``" large "`` becomes
``"large"`` rather than being kept with surrounding spaces.
Non-string or malformed entries are passed through unchanged so
that Pydantic can surface a proper field-level ``ValidationError``.
"""
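The body of this validator is not shown in the diff. The behaviour its docstring describes could be sketched as below, written as a plain function so it runs without Pydantic. The input shape (a dict holding a `downgrade_map` list of pairs) is an assumption based on the DESIGN_SPEC example, and `normalize_downgrade_map` is a stand-in name.

```python
from typing import Any


def normalize_downgrade_map(data: Any) -> Any:
    """Strip whitespace from string aliases; leave everything else untouched."""
    if not isinstance(data, dict) or "downgrade_map" not in data:
        return data  # nothing to normalize
    raw = data["downgrade_map"]
    if not isinstance(raw, (list, tuple)):
        return data  # malformed: let later validation report it
    cleaned = []
    for entry in raw:
        if isinstance(entry, (list, tuple)):
            stripped = [p.strip() if isinstance(p, str) else p for p in entry]
            # Preserve the container type of each pair.
            entry = tuple(stripped) if isinstance(entry, tuple) else stripped
        cleaned.append(entry)  # non-sequence entries pass through unchanged
    return {**data, "downgrade_map": cleaned}
```

Returning a new dict rather than mutating `data` keeps the caller's input intact, which seems a reasonable default for a pre-validation hook.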
9 changes: 8 additions & 1 deletion src/ai_company/config/schema.py
@@ -102,11 +102,12 @@ class ProviderModelConfig(BaseModel):
"""Configuration for a single LLM model within a provider.

Attributes:
id: Model identifier (e.g. ``"claude-sonnet-4-6"``).
id: Model identifier (e.g. ``"example-medium-001"``).
alias: Short alias for referencing this model in routing rules.
cost_per_1k_input: Cost per 1 000 input tokens in USD.
cost_per_1k_output: Cost per 1 000 output tokens in USD.
max_context: Maximum context window size in tokens.
estimated_latency_ms: Estimated median latency in milliseconds.
"""

model_config = ConfigDict(frozen=True, allow_inf_nan=False)
@@ -131,6 +132,12 @@ class ProviderModelConfig(BaseModel):
gt=0,
description="Maximum context window size in tokens",
)
estimated_latency_ms: int | None = Field(
default=None,
gt=0,
le=300_000,
description="Estimated median latency in milliseconds",
)


class ProviderConfig(BaseModel):
4 changes: 2 additions & 2 deletions src/ai_company/core/agent.py
@@ -76,8 +76,8 @@ class ModelConfig(BaseModel):
"""LLM model configuration for an agent.

Attributes:
provider: LLM provider name (e.g. ``"anthropic"``).
model_id: Model identifier (e.g. ``"claude-sonnet-4-6"``).
provider: LLM provider name (e.g. ``"example-provider"``).
model_id: Model identifier (e.g. ``"example-medium-001"``).
temperature: Sampling temperature (0.0 to 2.0).
max_tokens: Maximum output tokens.
fallback_model: Optional fallback model identifier.
4 changes: 4 additions & 0 deletions src/ai_company/providers/__init__.py
@@ -47,10 +47,12 @@
STRATEGY_MAP,
STRATEGY_NAME_CHEAPEST,
STRATEGY_NAME_COST_AWARE,
STRATEGY_NAME_FASTEST,
STRATEGY_NAME_MANUAL,
STRATEGY_NAME_ROLE_BASED,
STRATEGY_NAME_SMART,
CostAwareStrategy,
FastestStrategy,
ManualStrategy,
ModelResolutionError,
ModelResolver,
@@ -70,6 +72,7 @@
"STRATEGY_MAP",
"STRATEGY_NAME_CHEAPEST",
"STRATEGY_NAME_COST_AWARE",
"STRATEGY_NAME_FASTEST",
"STRATEGY_NAME_MANUAL",
"STRATEGY_NAME_ROLE_BASED",
"STRATEGY_NAME_SMART",
@@ -85,6 +88,7 @@
"DriverAlreadyRegisteredError",
"DriverFactoryNotFoundError",
"DriverNotRegisteredError",
"FastestStrategy",
"FinishReason",
"InvalidRequestError",
"LiteLLMDriver",
4 changes: 2 additions & 2 deletions src/ai_company/providers/capabilities.py
@@ -14,8 +14,8 @@ class ModelCapabilities(BaseModel):
based on required features (tools, vision, streaming) and cost.

Attributes:
model_id: Provider model identifier (e.g. ``"gpt-4o"``).
provider: Provider name (e.g. ``"openai"``).
model_id: Provider model identifier (e.g. ``"example-large-001"``).
provider: Provider name (e.g. ``"example-provider"``).
max_context_tokens: Maximum context window size in tokens.
max_output_tokens: Maximum output tokens per request.
supports_tools: Whether the model supports tool/function calling.
2 changes: 1 addition & 1 deletion src/ai_company/providers/drivers/__init__.py
@@ -1,7 +1,7 @@
"""Driver implementations for LLM provider backends.

Each driver subclasses ``BaseCompletionProvider`` and wraps a specific
backend SDK (e.g. LiteLLM, native Anthropic SDK).
backend SDK (e.g. LiteLLM).
"""

from .litellm_driver import LiteLLMDriver