6 changes: 3 additions & 3 deletions CLAUDE.md
@@ -49,9 +49,9 @@ src/ai_company/
communication/ # Message bus, dispatcher, messenger, channels, delegation, loop prevention, conflict resolution, meeting protocol
config/ # YAML company config loading and validation
core/ # Shared domain models and base classes
-engine/ # Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, task lifecycle, recovery, shutdown, workspace isolation, and coordination error classification
+engine/ # Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, task lifecycle, recovery, shutdown, workspace isolation, coordination error classification, and prompt policy validation
hr/ # HR engine: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, trend detection)
-memory/ # Persistent agent memory (Mem0 initial, custom stack future — ADR-001), retrieval pipeline (ranking, injection, context formatting), shared org memory (org/), consolidation/archival (consolidation/)
+memory/ # Persistent agent memory (Mem0 initial, custom stack future — ADR-001), retrieval pipeline (ranking, injection, context formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (consolidation/)
persistence/ # Operational data persistence — pluggable PersistenceBackend protocol, SQLite initial (§7.6)
observability/ # Structured logging, correlation tracking, log sinks
providers/ # LLM provider abstraction (LiteLLM adapter)
@@ -84,7 +84,7 @@ src/ai_company/
- **Every module** with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
- **Never** use `import logging` / `logging.getLogger()` / `print()` in application code
- **Variable name**: always `logger` (not `_logger`, not `log`)
-- **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`, `SECURITY_EVALUATE_START` from `events.security`, `HR_HIRING_REQUEST_CREATED` from `events.hr`, `PERF_METRIC_RECORDED` from `events.performance`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
+- **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`, `SECURITY_EVALUATE_START` from `events.security`, `HR_HIRING_REQUEST_CREATED` from `events.hr`, `PERF_METRIC_RECORDED` from `events.performance`, `PROMPT_BUILD_START` from `events.prompt`, `MEMORY_RETRIEVAL_START` from `events.memory`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
- **Structured kwargs**: always `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`
- **All error paths** must log at WARNING or ERROR with context before raising
- **All state transitions** must log at INFO
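
The logging rules in this hunk can be sketched as follows. Because `ai_company.observability` is not importable outside the repository, the sketch substitutes a hypothetical in-memory stand-in for `get_logger`. The string values of the event constants, the `PROVIDER_CALL_ERROR` constant, and the `call_provider` function are illustrative assumptions, not the project's actual definitions.

```python
from typing import Any


class _RecordingLogger:
    """Hypothetical stand-in for the project's structured logger."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: list[tuple[str, str, dict[str, Any]]] = []

    def info(self, event: str, **kwargs: Any) -> None:
        self.records.append(("info", event, kwargs))

    def warning(self, event: str, **kwargs: Any) -> None:
        self.records.append(("warning", event, kwargs))


def get_logger(name: str) -> _RecordingLogger:
    # Stand-in for `from ai_company.observability import get_logger`.
    return _RecordingLogger(name)


# In the repository these would be imported from
# ai_company.observability.events.provider; the values here are assumed.
PROVIDER_CALL_START = "provider.call.start"
PROVIDER_CALL_ERROR = "provider.call.error"  # hypothetical constant

logger = get_logger(__name__)  # the variable is always named `logger`


def call_provider(model: str, prompt: str) -> str:
    # Event constant plus structured kwargs, never a format string.
    logger.info(PROVIDER_CALL_START, model=model, prompt_chars=len(prompt))
    if not prompt:
        # Error path: log at WARNING with context before raising.
        logger.warning(PROVIDER_CALL_ERROR, model=model, reason="empty prompt")
        raise ValueError("prompt must not be empty")
    return f"response from {model}"
```

Keeping the event name as a constant and the context as keyword arguments means log consumers can filter on exact event names and read fields without parsing message strings.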
19 changes: 13 additions & 6 deletions DESIGN_SPEC.md
@@ -80,7 +80,7 @@ The MVP validates the core hypothesis: **a single agent can complete a real task
> **How to read this spec:** Sections describe the full vision. Each section with deferred features includes an **MVP** callout box indicating what ships in M3 and what is deferred. The full design is documented upfront to inform architecture decisions — protocol interfaces are designed even for features that won't be built until later milestones.

> **Implementation snapshot (2026-03-10):**
> - **Done:** M0–M6 (tooling, config/core, providers, single-agent engine, multi-agent orchestration, API/CLI surface) + Docker sandbox (#50), MCP bridge (#53), code runner + HR engine (hiring/firing/onboarding/offboarding/registry) + performance tracking (task metrics, quality scoring, collaboration scoring, trend detection, rolling windows). Memory layer backend selected ([ADR-001](docs/decisions/ADR-001-memory-layer.md)). Persistence backend (§7.6) completed. Memory retrieval pipeline (#41: ranking, token-budget formatting, context injection) complete. Budget enforcement complete (BudgetEnforcer + configurable cost tiers + quota/subscription tracking). CFO cost optimization complete (CostOptimizer: anomaly detection, efficiency analysis, downgrade recommendations, routing optimization, approval decisions; ReportGenerator: multi-dimensional spending reports). Shared org memory (#125: HybridPromptRetrievalBackend, OrgFactStore, access control, factory) complete. Memory consolidation/archival (#48: ConsolidationService, SimpleConsolidationStrategy, RetentionEnforcer, ArchivalStore protocol) complete.
> - **Done:** M0–M6 (tooling, config/core, providers, single-agent engine, multi-agent orchestration, API/CLI surface) + Docker sandbox (#50), MCP bridge (#53), code runner + HR engine (hiring/firing/onboarding/offboarding/registry) + performance tracking (task metrics, quality scoring, collaboration scoring, trend detection, rolling windows). Memory layer backend selected ([ADR-001](docs/decisions/ADR-001-memory-layer.md)). Persistence backend (§7.6) completed. Memory retrieval pipeline (#41: ranking, token-budget formatting, context injection, non-inferable filtering) complete. Budget enforcement complete (BudgetEnforcer + configurable cost tiers + quota/subscription tracking). CFO cost optimization complete (CostOptimizer: anomaly detection, efficiency analysis, downgrade recommendations, routing optimization, approval decisions; ReportGenerator: multi-dimensional spending reports). Shared org memory (#125: HybridPromptRetrievalBackend, OrgFactStore, access control, factory) complete. Memory consolidation/archival (#48: ConsolidationService, SimpleConsolidationStrategy, RetentionEnforcer, ArchivalStore protocol) complete.
> - **Remaining:** M7 security + approval system (SecOps agent, progressive trust, JWT/OAuth auth).

### 1.5 Configuration Philosophy
@@ -1605,11 +1605,12 @@ receives memories.

> **Non-inferable filter:** Retrieved memories should be filtered before injection to exclude content the agent can discover by reading the codebase or environment. Only inject memories containing non-inferable information: prior decisions, learned conventions, interpersonal context, historical outcomes. [Research](https://arxiv.org/abs/2602.11988) shows generic context increases cost 20%+ with minimal success improvement; LLM-generated context can actually reduce success rates.
>
> **Decision ([ADR-002](docs/decisions/ADR-002-design-decisions-batch-1.md) D23):** Pluggable `MemoryFilterStrategy` protocol. Initial: tag-based at write time. Define `non-inferable` tag convention enforced at `MemoryBackend.store()` boundary. System prompt instructs agents what qualifies: design rationale, team decisions, "why not X", cross-repo knowledge = non-inferable; code structure, API signatures, file contents = inferable. Uses existing `MemoryMetadata.tags` and `MemoryQuery.tags` — zero new models needed. Future strategies: LLM classification at retrieval, keyword/pattern heuristic.
> **Decision ([ADR-002](docs/decisions/ADR-002-design-decisions-batch-1.md) D23):** Pluggable `MemoryFilterStrategy` protocol. Initial: tag-based at write time. Define `non-inferable` tag convention with advisory validation at `MemoryBackend.store()` boundary (warns on missing tags, never blocks). System prompt instructs agents what qualifies: design rationale, team decisions, "why not X", cross-repo knowledge = non-inferable; code structure, API signatures, file contents = inferable. Uses existing `MemoryMetadata.tags` and `MemoryQuery.tags` — zero new models needed. Future strategies: LLM classification at retrieval, keyword/pattern heuristic.

Pipeline: `MemoryBackend.retrieve()` -> rank by relevance+recency ->
-filter by min_relevance -> greedy token-budget packing -> format as
-ChatMessage (configured role: SYSTEM or USER) with delimiters.
+filter by min_relevance -> apply `MemoryFilterStrategy` (D23, optional) ->
+greedy token-budget packing -> format as ChatMessage (configured role:
+SYSTEM or USER) with delimiters.
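
A minimal sketch of the pipeline stages named above, together with the advisory store-time check from D23. The names `MemoryFilterStrategy` and `TagBasedMemoryFilter` come from this spec, but their interfaces here are assumptions: the `Memory` dataclass, the `apply` method, `warn_if_untagged`, and `select_memories` are hypothetical. Ranking uses relevance only (the spec also blends in recency), tokens are estimated by whitespace splitting, and the packer skips entries that do not fit and keeps going, which is one of several reasonable greedy policies.

```python
import warnings
from dataclasses import dataclass
from typing import Optional, Protocol

NON_INFERABLE_TAG = "non-inferable"  # tag name from the D23 convention


@dataclass(frozen=True)
class Memory:  # hypothetical stand-in for the project's memory entry model
    content: str
    relevance: float
    tags: tuple[str, ...] = ()


def warn_if_untagged(memory: Memory) -> bool:
    """Advisory store-time check: warn on a missing tag, never block."""
    if NON_INFERABLE_TAG in memory.tags:
        return True
    warnings.warn(f"memory stored without {NON_INFERABLE_TAG!r} tag", stacklevel=2)
    return False


class MemoryFilterStrategy(Protocol):
    def apply(self, memories: list[Memory]) -> list[Memory]: ...


class TagBasedMemoryFilter:
    """Keep only memories that carry the required tag."""

    def __init__(self, required_tag: str = NON_INFERABLE_TAG) -> None:
        self.required_tag = required_tag

    def apply(self, memories: list[Memory]) -> list[Memory]:
        return [m for m in memories if self.required_tag in m.tags]


def select_memories(
    memories: list[Memory],
    *,
    min_relevance: float,
    token_budget: int,
    memory_filter: Optional[MemoryFilterStrategy] = None,
) -> list[Memory]:
    # 1. Rank, highest relevance first.
    ranked = sorted(memories, key=lambda m: m.relevance, reverse=True)
    # 2. Drop entries below the relevance floor.
    kept = [m for m in ranked if m.relevance >= min_relevance]
    # 3. Apply the optional filter strategy.
    if memory_filter is not None:
        kept = memory_filter.apply(kept)
    # 4. Greedy token-budget packing in rank order.
    packed: list[Memory] = []
    used = 0
    for m in kept:
        cost = len(m.content.split())  # crude token estimate
        if used + cost <= token_budget:
            packed.append(m)
            used += cost
    return packed
```

Passing `memory_filter=None` reproduces the pre-D23 pipeline, which matches the "optional" wording in the updated description.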

Ranking algorithm:
1. `relevance = entry.relevance_score ?? config.default_relevance`
@@ -1961,6 +1962,8 @@ Every completion call produces a `CompletionResponse` with `TokenUsage` (token c
- `tokens_per_task` — total tokens consumed (from `AgentContext.accumulated_cost.total_tokens`)
- `cost_per_task` — total USD cost (from `AgentContext.accumulated_cost.cost_usd` via `AgentRunResult.total_cost_usd`)
- `duration_seconds` — wall-clock execution time in seconds (from `AgentRunResult.duration_seconds`)
+- `prompt_tokens` — estimated system prompt tokens (from `SystemPrompt.estimated_tokens`)
+- `prompt_token_ratio` — ratio of prompt tokens to total tokens (overhead indicator, `@computed_field`; warns when >0.3)

These are natural overhead indicators — a task consuming 15 turns and 50k tokens for a one-line fix signals a problem.
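
The `prompt_token_ratio` metric can be illustrated with a small sketch. The spec describes it as a `@computed_field`; to stay dependency-free this version uses a plain dataclass property instead. The `TaskMetricsSketch` class, the `prompt_overhead_high` property, and the zero-token guard are assumptions for illustration, not the project's `TaskCompletionMetrics` model.

```python
from dataclasses import dataclass

PROMPT_TOKEN_RATIO_THRESHOLD = 0.3  # warning threshold quoted in the spec


@dataclass(frozen=True)
class TaskMetricsSketch:  # hypothetical stand-in for TaskCompletionMetrics
    tokens_per_task: int
    prompt_tokens: int

    @property
    def prompt_token_ratio(self) -> float:
        # Assumed guard: report 0.0 when no tokens were consumed.
        if self.tokens_per_task <= 0:
            return 0.0
        return self.prompt_tokens / self.tokens_per_task

    @property
    def prompt_overhead_high(self) -> bool:
        # Strictly greater than the threshold, matching "warns when >0.3".
        return self.prompt_token_ratio > PROMPT_TOKEN_RATIO_THRESHOLD
```

For example, a task that used 50,000 tokens with a 20,000-token system prompt has a ratio of 0.4 and would trigger the warning, while 300 prompt tokens out of 1,000 sits exactly at the threshold and would not.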

@@ -2771,14 +2774,15 @@ ai-company/
│ │ ├── role.py # Role model
│ │ ├── role_catalog.py # Role catalog
│ │ └── personality.py # Personality compatibility scoring
-│ ├── engine/ # Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, task lifecycle, recovery, shutdown, workspace isolation, and coordination error classification
+│ ├── engine/ # Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, task lifecycle, recovery, shutdown, workspace isolation, coordination error classification, and prompt policy validation
│ │ ├── errors.py # Engine error hierarchy
│ │ ├── prompt.py # System prompt builder
│ │ ├── prompt_template.py # System prompt Jinja2 templates
│ │ ├── task_execution.py # TaskExecution + StatusTransition
│ │ ├── context.py # AgentContext + AgentContextSnapshot
│ │ ├── loop_protocol.py # ExecutionLoop protocol + result models
│ │ ├── metrics.py # TaskCompletionMetrics proxy overhead model
+│ │ ├── policy_validation.py # Org policy quality heuristics (non-inferable principle)
│ │ ├── react_loop.py # ReAct loop implementation
│ │ ├── plan_models.py # Plan step, plan, and plan-execute config models
│ │ ├── plan_execute_loop.py # Plan-and-Execute loop implementation
@@ -2910,7 +2914,7 @@ ai-company/
│ │ │ └── structured_phases.py # StructuredPhasesProtocol implementation
│ │ ├── messenger.py # AgentMessenger per-agent facade
│ │ └── subscription.py # Subscription + DeliveryEnvelope models
-│ ├── memory/ # Agent memory system — protocols, models, config, factory, retrieval pipeline (M5)
+│ ├── memory/ # Agent memory system — protocols, models, config, factory, retrieval pipeline (ranking, injection, context formatting, non-inferable filtering) (M5)
│ │ ├── __init__.py # Re-exports
│ │ ├── capabilities.py # MemoryCapabilities protocol
│ │ ├── config.py # CompanyMemoryConfig, MemoryStorageConfig, MemoryOptionsConfig
@@ -2922,7 +2926,9 @@ ai-company/
│ │ ├── protocol.py # MemoryBackend protocol
│ │ ├── ranking.py # ScoredMemory model, rank_memories(), scoring functions
│ │ ├── retrieval_config.py # MemoryRetrievalConfig (weights, thresholds, strategy selection)
+│ │ ├── filter.py # MemoryFilterStrategy protocol, TagBasedMemoryFilter, PassthroughMemoryFilter
│ │ ├── retriever.py # ContextInjectionStrategy (full retrieval → rank → format pipeline)
+│ │ ├── store_guard.py # Advisory non-inferable tag enforcement at store boundary
│ │ ├── shared.py # SharedKnowledgeStore protocol
│ │ ├── consolidation/ # Memory consolidation — strategies, retention, archival
│ │ │ ├── __init__.py
@@ -2992,6 +2998,7 @@ ai-company/
│ │ │ ├── role.py # ROLE_* constants
│ │ │ ├── routing.py # ROUTING_* constants
│ │ │ ├── sandbox.py # SANDBOX_* constants
+│ │ │ ├── security.py # SECURITY_* constants
│ │ │ ├── task.py # TASK_* constants
│ │ │ ├── task_assignment.py # TASK_ASSIGNMENT_* constants
│ │ │ ├── task_routing.py # TASK_ROUTING_* constants
2 changes: 1 addition & 1 deletion README.md
@@ -21,7 +21,7 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
- **Task Intelligence (M4)** - Task decomposition, routing, assignment strategies, workspace isolation via git worktrees
- **Templates** - Built-in templates, inheritance/merge, rendering, personality presets
- **Persistence Layer (M5)** - Pluggable `PersistenceBackend` protocol with SQLite backend (aiosqlite), repository protocols, schema migrations
-- **Memory Interface (M5)** - Pluggable `MemoryBackend` protocol with capability discovery, shared knowledge protocol, domain models, config, factory, and context injection retrieval pipeline (ranking, token-budget formatting). Shared organizational memory via `OrgMemoryBackend` protocol with hybrid prompt+retrieval backend. Memory consolidation/archival with pluggable strategies and retention enforcement
+- **Memory Interface (M5)** - Pluggable `MemoryBackend` protocol with capability discovery, shared knowledge protocol, domain models, config, factory, and context injection retrieval pipeline (ranking, token-budget formatting, non-inferable filtering). Shared organizational memory via `OrgMemoryBackend` protocol with hybrid prompt+retrieval backend. Memory consolidation/archival with pluggable strategies and retention enforcement
- **Coordination Error Taxonomy (M5)** - Post-execution classification pipeline detecting logical contradictions, numerical drift, context omissions, and coordination failures
- **Budget Enforcement (M5)** - `BudgetEnforcer` service with pre-flight checks, in-flight budget checking, auto-downgrade, configurable cost tiers, and quota/subscription tracking; `CostOptimizer` CFO service with anomaly detection, efficiency analysis, downgrade recommendations, and approval decisions; `ReportGenerator` for multi-dimensional spending reports
- **Litestar REST API (M6)** - 13 controllers + WebSocket handler covering company, agents, tasks, budget, approvals, analytics, messages, meetings, projects, departments, artifacts, providers, health, and WebSocket real-time feed
21 changes: 19 additions & 2 deletions src/ai_company/engine/agent_engine.py
@@ -50,6 +50,7 @@
EXECUTION_ENGINE_TIMEOUT,
EXECUTION_RECOVERY_FAILED,
)
+from ai_company.observability.events.prompt import PROMPT_TOKEN_RATIO_HIGH
from ai_company.observability.events.security import SECURITY_DISABLED
from ai_company.providers.enums import MessageRole
from ai_company.providers.models import ChatMessage
@@ -91,6 +92,9 @@

logger = get_logger(__name__)

+_PROMPT_TOKEN_RATIO_THRESHOLD: float = 0.3
+"""Prompt-to-total token ratio above which a warning is emitted."""

_DEFAULT_RECOVERY_STRATEGY = FailAndReassignStrategy()
"""Module-level default instance for the recovery strategy."""

@@ -357,11 +361,12 @@ async def _post_execution_pipeline(
         except MemoryError, RecursionError:
             raise
         except Exception:
-            logger.debug(
+            logger.warning(
                 EXECUTION_ENGINE_ERROR,
                 agent_id=agent_id,
                 task_id=task_id,
-                error="classification failed (details logged by pipeline)",
+                error="classification failed",
+                exc_info=True,
             )
         return execution_result

@@ -760,8 +765,20 @@ def _log_completion(
             tokens_per_task=metrics.tokens_per_task,
             cost_per_task=metrics.cost_per_task,
             duration_seconds=metrics.duration_seconds,
+            prompt_tokens=metrics.prompt_tokens,
+            prompt_token_ratio=metrics.prompt_token_ratio,
         )

+        if metrics.prompt_token_ratio > _PROMPT_TOKEN_RATIO_THRESHOLD:
+            logger.warning(
+                PROMPT_TOKEN_RATIO_HIGH,
+                agent_id=agent_id,
+                task_id=task_id,
+                prompt_token_ratio=metrics.prompt_token_ratio,
+                prompt_tokens=metrics.prompt_tokens,
+                total_tokens=metrics.tokens_per_task,
+            )
+
     def _handle_budget_error( # noqa: PLR0913
         self,
         *,