2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -44,7 +44,7 @@ uv run pre-commit run --all-files # all pre-commit hooks
```text
src/ai_company/
api/ # FastAPI REST + WebSocket routes
-budget/ # Per-agent cost tracking and spending controls
+budget/ # Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods
cli/ # Typer CLI commands
communication/ # Message bus, dispatcher, messenger, channels, delegation, loop prevention, conflict resolution, meeting protocol
config/ # YAML company config loading and validation
41 changes: 27 additions & 14 deletions DESIGN_SPEC.md
@@ -971,21 +971,22 @@ hybrid:
Pipeline steps:

1. **Validate inputs** — agent must be `ACTIVE`, task must be `ASSIGNED` or `IN_PROGRESS`. Raises `ExecutionStateError` on violation.
-2. **Build system prompt** — calls `build_system_prompt()` with agent identity, task, and available tool definitions.
-3. **Create context** — `AgentContext.from_identity()` with the configured `max_turns`.
-4. **Seed conversation** — injects system prompt, optional memory messages, and formatted task instruction as initial messages.
-5. **Transition task** — `ASSIGNED` → `IN_PROGRESS` (pass-through if already `IN_PROGRESS`).
-6. **Prepare tools and budget** — creates `ToolInvoker` from registry and `BudgetChecker` from task budget limit.
-7. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config. If `timeout_seconds` is set, wraps the call in `asyncio.wait_for`; on expiry the run returns with `TerminationReason.ERROR` but cost recording and post-execution processing still occur.
-8. **Record costs** — records accumulated `TokenUsage` to `CostTracker` (if available). Cost recording failures are logged but do not affect the result.
-9. **Apply post-execution transitions** — on `COMPLETED` termination: IN_PROGRESS → IN_REVIEW → COMPLETED (two-hop auto-complete in M3; reviewers deferred to M4+). On `SHUTDOWN` termination: current status → INTERRUPTED (see §6.7). On `ERROR` termination: recovery strategy is applied (default `FailAndReassignStrategy` transitions to FAILED; see §6.6). All other termination reasons (`MAX_TURNS`, `BUDGET_EXHAUSTED`) leave the task in its current state. Transition failures are logged but do not discard the successful execution result.
-10. **Return result** — wraps `ExecutionResult` in `AgentRunResult` with engine-level metadata.
+2. **Pre-flight budget enforcement** — if `BudgetEnforcer` is provided, check monthly hard stop and daily limit via `check_can_execute()`, then apply auto-downgrade via `resolve_model()`. Raises `BudgetExhaustedError` or `DailyLimitExceededError` on violation.
+3. **Build system prompt** — calls `build_system_prompt()` with agent identity, task, and available tool definitions.
+4. **Create context** — `AgentContext.from_identity()` with the configured `max_turns`.
+5. **Seed conversation** — injects system prompt, optional memory messages, and formatted task instruction as initial messages.
+6. **Transition task** — `ASSIGNED` → `IN_PROGRESS` (pass-through if already `IN_PROGRESS`).
+7. **Prepare tools and budget** — creates `ToolInvoker` from registry and `BudgetChecker` from `BudgetEnforcer` (task + monthly + daily limits with pre-computed baselines and alert deduplication) or from task budget limit alone when no enforcer is configured.
+8. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config. If `timeout_seconds` is set, wraps the call in `asyncio.wait_for`; on expiry the run returns with `TerminationReason.ERROR` but cost recording and post-execution processing still occur.
+9. **Record costs** — records accumulated `TokenUsage` to `CostTracker` (if available). Cost recording failures are logged but do not affect the result.
+10. **Apply post-execution transitions** — on `COMPLETED` termination: IN_PROGRESS → IN_REVIEW → COMPLETED (two-hop auto-complete in M3; reviewers deferred to M4+). On `SHUTDOWN` termination: current status → INTERRUPTED (see §6.7). On `ERROR` termination: recovery strategy is applied (default `FailAndReassignStrategy` transitions to FAILED; see §6.6). All other termination reasons (`MAX_TURNS`, `BUDGET_EXHAUSTED`) leave the task in its current state. Transition failures are logged but do not discard the successful execution result.
+11. **Return result** — wraps `ExecutionResult` in `AgentRunResult` with engine-level metadata.

-Error handling: `MemoryError` and `RecursionError` propagate unconditionally. All other exceptions are caught and wrapped in an `AgentRunResult` with `TerminationReason.ERROR`.
+Error handling: `MemoryError` and `RecursionError` propagate unconditionally. `BudgetExhaustedError` (including `DailyLimitExceededError`) returns `TerminationReason.BUDGET_EXHAUSTED` without recovery — budget exhaustion is a controlled stop, not a crash. All other exceptions are caught and wrapped in an `AgentRunResult` with `TerminationReason.ERROR`.

-Constructor accepts: `provider` (required), `execution_loop` (defaults to `ReactLoop`), `tool_registry`, `cost_tracker`. The `run()` method also accepts `memory_messages` — optional working memory to inject between the system prompt and task instruction (memory retrieval is M5; the engine provides the injection hook).
+Constructor accepts: `provider` (required), `execution_loop` (defaults to `ReactLoop`), `tool_registry`, `cost_tracker`, `recovery_strategy` (defaults to `FailAndReassignStrategy`), `shutdown_checker`, `budget_enforcer`. The `run()` method also accepts `memory_messages` — optional working memory to inject between the system prompt and task instruction (memory retrieval is M5; the engine provides the injection hook).

-Logs structured events under the `execution.engine.*` namespace (12 constants in `events/execution.py`): creation, start, prompt built, completion, errors, invalid input, task transitions, cost recording outcomes, task metrics, and timeout.
+Logs structured events under the `execution.engine.*` namespace (13 constants in `events/execution.py`): creation, start, prompt built, completion, errors, budget stopped, invalid input, task transitions, cost recording outcomes, task metrics, and timeout.

**`AgentRunResult`** — frozen Pydantic model wrapping `ExecutionResult` with engine metadata:

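The revised error-handling rule above amounts to an ordered `except` chain. The following is a minimal sketch under stated assumptions. `BudgetExhaustedError`, `DailyLimitExceededError` and `TerminationReason` are local stand-ins that only reuse the spec's names. `run_guarded` is a hypothetical helper, not `AgentEngine.run()`.

```python
from collections.abc import Callable
from enum import Enum


class BudgetExhaustedError(Exception):
    """Stand-in for the spec's budget-exhausted error."""


class DailyLimitExceededError(BudgetExhaustedError):
    """Stand-in subclass, so ``except BudgetExhaustedError`` also catches it."""


class TerminationReason(Enum):
    """Stand-in with only the members this sketch needs."""

    COMPLETED = "completed"
    BUDGET_EXHAUSTED = "budget_exhausted"
    ERROR = "error"


def run_guarded(step: Callable[[], None]) -> TerminationReason:
    """Hypothetical helper: map an exception from ``step`` to a termination reason."""
    try:
        step()
    except (MemoryError, RecursionError):
        # Never swallowed: propagate to the caller unconditionally.
        raise
    except BudgetExhaustedError:
        # Controlled stop: no recovery strategy is applied.
        return TerminationReason.BUDGET_EXHAUSTED
    except Exception:
        # Anything else becomes an ERROR result (recovery would run here).
        return TerminationReason.ERROR
    return TerminationReason.COMPLETED


def hit_daily_limit() -> None:
    raise DailyLimitExceededError("per-agent daily limit reached")


reason = run_guarded(hit_daily_limit)  # TerminationReason.BUDGET_EXHAUSTED
```

The order of the clauses carries the logic. `MemoryError` and `RecursionError` are both `Exception` subclasses, so their re-raise clause must come before the broad `except Exception`. The budget clause must also come before it, or budget stops would be reported as `ERROR`.
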
@@ -1778,7 +1779,7 @@ Every API call is tracked (illustrative schema):

### 10.3 CFO Agent Responsibilities

-> **MVP: Not in M3.** Budget tracking and per-task cost recording exist (M2), but the CFO agent is M5+. Cost controls (§10.4) are enforced by the engine, not by an agent.
+> **MVP: Not in M3.** Budget tracking and per-task cost recording exist (M2); cost controls (§10.4) are now enforced by `BudgetEnforcer` (a service the engine composes, not an agent — M5). The CFO agent is M5+.

The CFO agent (when enabled) acts as a cost management system:

@@ -1804,6 +1805,7 @@ The CFO agent (when enabled) acts as a cost management system:
```yaml
budget:
  total_monthly: 100.00
+  reset_day: 1
  alerts:
    warn_at: 75 # percent
    critical_at: 90
@@ -1822,6 +1824,15 @@ budget:

> **Auto-downgrade boundary:** Model downgrades apply only at **task assignment time**, never mid-execution. An agent halfway through an architecture review cannot be switched to a cheaper model — the task completes on its assigned model. The next task assignment respects the downgrade threshold. This prevents quality degradation from mid-thought model switches.

+> **Implementation note (M5):** `BudgetEnforcer` composes `CostTracker` +
+> `BudgetConfig` to provide three enforcement layers: (1) pre-flight checks
+> via `check_can_execute` (monthly hard stop + per-agent daily limit), (2)
+> in-flight budget checking via a sync `BudgetChecker` closure with
+> pre-computed baselines (task + monthly + daily limits, alert deduplication),
+> and (3) task-boundary auto-downgrade via `resolve_model`. Billing periods
+> are scoped by `billing_period_start(reset_day)`. `DailyLimitExceededError`
+> is a subclass of `BudgetExhaustedError` for granular error handling.
+
### 10.5 LLM Call Analytics

> **Current state:** Proxy metrics (M3), call categorization + coordination metric data models (M4 models, brought forward), and error taxonomy classification pipeline (M5) are implemented. Runtime collection pipeline for coordination metrics and full analytics layer are M5+.
@@ -2637,6 +2648,7 @@ ai-company/
│ │ ├── recovery.py # Crash recovery strategies (RecoveryStrategy protocol)
│ │ ├── cost_recording.py # Per-turn cost recording helpers
│ │ ├── run_result.py # AgentRunResult outcome model
│ │ ├── _validation.py # Input validation helpers for AgentEngine
│ │ ├── agent_engine.py # Agent execution engine
│ │ ├── parallel.py # Parallel agent executor (TaskGroup + Semaphore)
│ │ ├── parallel_models.py # AgentAssignment, ParallelExecutionGroup, AgentOutcome, ParallelExecutionResult, ParallelProgress
@@ -2864,7 +2876,8 @@ ai-company/
│ │ ├── spending_summary.py # _SpendingTotals base + spending summary models
│ │ ├── hierarchy.py # BudgetHierarchy, BudgetConfig
│ │ ├── enums.py # Budget-related enums
-│ │ ├── limits.py # Budget enforcement (M5)
+│ │ ├── billing.py # Billing period computation utilities
+│ │ ├── enforcer.py # BudgetEnforcer service (pre-flight, in-flight, auto-downgrade)
│ │ ├── optimizer.py # Cost optimization / CFO logic (M5)
│ │ └── reports.py # Spending reports (M5)
│ ├── api/ # REST + WebSocket API (M6, stubs only)
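
The task-boundary auto-downgrade described in this file can be sketched as a pure function evaluated once, before the task starts. This is an illustration only. `DowngradeSettings` and `pick_model_alias` are hypothetical stand-ins that mirror the `enabled`, `threshold` and `downgrade_map` fields of `AutoDowngradeConfig`. The aliases and the 80% threshold are made up. Treating the map as a single hop is an assumption, not documented `resolve_model` behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DowngradeSettings:
    """Hypothetical stand-in mirroring the AutoDowngradeConfig fields."""

    enabled: bool
    threshold: float  # percent of the monthly budget already spent
    downgrade_map: tuple[tuple[str, str], ...] = ()


def pick_model_alias(
    alias: str,
    *,
    spent: float,
    total_monthly: float,
    settings: DowngradeSettings,
) -> str:
    """Choose the model alias once, when the task is assigned."""
    if not settings.enabled or total_monthly <= 0:
        return alias
    used_pct = 100.0 * spent / total_monthly
    if used_pct < settings.threshold:
        return alias
    # Single hop: the first matching (from_alias, to_alias) pair wins.
    for from_alias, to_alias in settings.downgrade_map:
        if from_alias == alias:
            return to_alias
    return alias  # unmapped aliases are left unchanged


settings = DowngradeSettings(
    enabled=True,
    threshold=80.0,
    downgrade_map=(("large", "medium"), ("medium", "small")),
)
chosen = pick_model_alias("large", spent=85.0, total_monthly=100.0, settings=settings)
```

The alias is chosen before the run and is not re-evaluated inside the loop. A task that starts on `large` therefore finishes on `large` even if spend crosses the threshold mid-run, which matches the auto-downgrade boundary stated above.
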
2 changes: 1 addition & 1 deletion README.md
@@ -23,11 +23,11 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
- **Persistence Layer (M5)** - Pluggable `PersistenceBackend` protocol with SQLite backend (aiosqlite), repository protocols, schema migrations
- **Memory Interface (M5)** - Pluggable `MemoryBackend` protocol with capability discovery, shared knowledge protocol, domain models, config, and factory
- **Coordination Error Taxonomy (M5)** - Post-execution classification pipeline detecting logical contradictions, numerical drift, context omissions, and coordination failures
+- **Budget Enforcement (M5)** - `BudgetEnforcer` service with pre-flight checks, in-flight budget checking, and auto-downgrade; CFO agent and advanced reporting pending

### Not implemented yet (planned milestones)

- **Memory Backends (M5)** - Mem0 adapter ([ADR-001](docs/decisions/ADR-001-memory-layer.md), #41) pending; shared knowledge store backends planned
-- **Budget Controls (M5)** - Per-agent spending limits, budget hierarchy enforcement
- **API Layer (M6)** - `api/` package and route modules are placeholders
- **CLI Surface (M6)** - `cli/` package is placeholder-only
- **Security/Approval System (M7)** - `security/` package is placeholder-only
5 changes: 5 additions & 0 deletions src/ai_company/budget/__init__.py
@@ -5,6 +5,7 @@
DESIGN_SPEC Section 10.
"""

+from ai_company.budget.billing import billing_period_start, daily_period_start
from ai_company.budget.call_category import LLMCallCategory, OrchestrationAlertLevel
from ai_company.budget.category_analytics import CategoryBreakdown, OrchestrationRatio
from ai_company.budget.config import (
@@ -28,6 +29,7 @@
    RedundancyRate,
)
from ai_company.budget.cost_record import CostRecord
+from ai_company.budget.enforcer import BudgetEnforcer
from ai_company.budget.enums import BudgetAlertLevel
from ai_company.budget.hierarchy import (
    BudgetHierarchy,
@@ -48,6 +50,7 @@
    "BudgetAlertConfig",
    "BudgetAlertLevel",
    "BudgetConfig",
+    "BudgetEnforcer",
    "BudgetHierarchy",
    "CategoryBreakdown",
    "CoordinationEfficiency",
@@ -71,4 +74,6 @@
    "RedundancyRate",
    "SpendingSummary",
    "TeamBudget",
+    "billing_period_start",
+    "daily_period_start",
]
59 changes: 59 additions & 0 deletions src/ai_company/budget/billing.py
@@ -0,0 +1,59 @@
"""Billing period computation utilities.

Pure functions for determining billing period boundaries based on a
configurable reset day. Used by :class:`~ai_company.budget.enforcer.BudgetEnforcer`
to scope cost queries to the current billing cycle.
"""

from datetime import UTC, datetime

🛠️ Refactor suggestion | 🟠 Major

Initialize the standard logger for this module.

These helpers are now part of the budget-enforcement business logic path, but the module still doesn't define `logger = get_logger(__name__)`. That also leaves the invalid-input raise paths without the standard observability hook used elsewhere under `src/ai_company/**`.

Suggested fix
 from datetime import UTC, datetime
 
+from ai_company.observability import get_logger
+
+logger = get_logger(__name__)

As per coding guidelines, "Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/budget/billing.py` at line 8, add the module-level logger by
importing get_logger from ai_company.observability and initializing logger =
get_logger(__name__) at the top of the billing module; specifically, add "from
ai_company.observability import get_logger" and then "logger =
get_logger(__name__)" near the existing datetime import so the functions in this
file (e.g., any raise paths for invalid input) can use logger for observability.



def billing_period_start(
    reset_day: int,
    *,
    now: datetime | None = None,
) -> datetime:
    """Compute the UTC-aware start of the current billing period.

    If ``now.day >= reset_day``, returns current month's ``reset_day``
    at 00:00 UTC. Otherwise, returns previous month's ``reset_day``
    at 00:00 UTC.

    Args:
        reset_day: Day of month when the billing period resets (1-28).
        now: Reference timestamp. Defaults to ``datetime.now(UTC)``.

    Returns:
        UTC-aware datetime at midnight on the billing period start day.

    Raises:
        ValueError: If ``reset_day`` is not in ``[1, 28]``.
    """
    if not 1 <= reset_day <= 28:  # noqa: PLR2004
        msg = f"reset_day must be 1-28, got {reset_day}"
        raise ValueError(msg)

    if now is None:
        now = datetime.now(UTC)

    if now.day >= reset_day:
        return datetime(now.year, now.month, reset_day, tzinfo=UTC)

    # Roll back to previous month
    if now.month == 1:
        return datetime(now.year - 1, 12, reset_day, tzinfo=UTC)
    return datetime(now.year, now.month - 1, reset_day, tzinfo=UTC)
Comment on lines +32 to +45

⚠️ Potential issue | 🟠 Major

Harden the public input boundary for `reset_day` and `now`.

`billing_period_start(True)` is currently treated as day 1, `billing_period_start(1.5)` falls through to a `TypeError`, and non-UTC `now` values are copied into a UTC timestamp without conversion. Because these helpers define the monthly/daily windows used by budget enforcement, that can select the wrong billing period around UTC day/month boundaries.

Suggested hardening
+def _normalize_utc_now(*, now: datetime | None) -> datetime:
+    if now is None:
+        return datetime.now(UTC)
+    if now.tzinfo is None:
+        msg = "now must be timezone-aware"
+        raise ValueError(msg)
+    return now.astimezone(UTC)
+
+
 def billing_period_start(
     reset_day: int,
     *,
     now: datetime | None = None,
 ) -> datetime:
@@
-    if not 1 <= reset_day <= 28:  # noqa: PLR2004
+    if (
+        isinstance(reset_day, bool)
+        or not isinstance(reset_day, int)
+        or not 1 <= reset_day <= 28  # noqa: PLR2004
+    ):
         msg = f"reset_day must be 1-28, got {reset_day}"
         raise ValueError(msg)
 
-    if now is None:
-        now = datetime.now(UTC)
+    now = _normalize_utc_now(now=now)
@@
 def daily_period_start(*, now: datetime | None = None) -> datetime:
@@
-    if now is None:
-        now = datetime.now(UTC)
+    now = _normalize_utc_now(now=now)
     return datetime(now.year, now.month, now.day, tzinfo=UTC)

As per coding guidelines, "Validate at system boundaries (user input, external APIs, config files)".

Also applies to: 57-59

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/budget/billing.py` around lines 32 - 45, the function
billing_period_start should hard-validate its inputs: ensure reset_day is an int
and in 1..28 (raise TypeError for non-int, ValueError for out-of-range) and
ensure now is a datetime that is normalized to UTC (if now is tz-aware, call now
= now.astimezone(UTC); if now is naive, explicitly set or document treating it
as UTC by replacing tzinfo=UTC). Update the billing_period_start implementation
(and the analogous helper used at lines 57-59) to perform these
checks/conversions before any date arithmetic so you don't silently copy non-UTC
datetimes into UTC or accept non-integer reset_day values.



def daily_period_start(*, now: datetime | None = None) -> datetime:
    """Compute the UTC-aware start of today (midnight UTC).

    Args:
        now: Reference timestamp. Defaults to ``datetime.now(UTC)``.

    Returns:
        UTC-aware datetime at midnight of the current day.
    """
    if now is None:
        now = datetime.now(UTC)
    return datetime(now.year, now.month, now.day, tzinfo=UTC)
23 changes: 21 additions & 2 deletions src/ai_company/budget/config.py
@@ -5,7 +5,7 @@
"""

from collections import Counter
-from typing import Any, Self
+from typing import Any, Literal, Self

from pydantic import BaseModel, ConfigDict, Field, model_validator

@@ -74,6 +74,8 @@ class AutoDowngradeConfig(BaseModel):
        enabled: Whether auto-downgrade is active.
        threshold: Budget percent that triggers downgrade.
        downgrade_map: Ordered pairs of (from_alias, to_alias).
+        boundary: When to apply downgrade (task_assignment only,
+            never mid-execution per DESIGN_SPEC §10.4).
    """

    model_config = ConfigDict(frozen=True)
@@ -93,6 +95,12 @@ class AutoDowngradeConfig(BaseModel):
        default=(),
        description="Ordered pairs of (from_alias, to_alias)",
    )
+    boundary: Literal["task_assignment"] = Field(
+        default="task_assignment",
+        description=(
+            "When to apply downgrade (task_assignment only, never mid-execution)"
+        ),
+    )

    @model_validator(mode="before")
    @classmethod
@@ -152,9 +160,11 @@ class BudgetConfig(BaseModel):
        per_task_limit: Maximum USD per task.
        per_agent_daily_limit: Maximum USD per agent per day.
        auto_downgrade: Automatic model downgrade configuration.
+        reset_day: Day of month when budget resets (1-28, avoids
+            month-length issues).
    """

-    model_config = ConfigDict(frozen=True)
+    model_config = ConfigDict(frozen=True, allow_inf_nan=False)

    total_monthly: float = Field(
        default=100.0,
@@ -179,6 +189,15 @@ class BudgetConfig(BaseModel):
        default_factory=AutoDowngradeConfig,
        description="Automatic model downgrade configuration",
    )
+    reset_day: int = Field(
+        default=1,
+        ge=1,
+        le=28,
+        strict=True,
+        description=(
+            "Day of month when budget resets (1-28, avoids month-length issues)"
+        ),
+    )

    @model_validator(mode="after")
    def _validate_per_task_limit_within_monthly(self) -> Self: