Aureliolo · Aureliolo · Mar 9, 2026 · Mar 9, 2026 · Mar 9, 2026 · Mar 9, 2026
@@ -44,7 +44,7 @@ uv run pre-commit run --all-files          # all pre-commit hooks
 ```text
 src/ai_company/
   api/            # FastAPI REST + WebSocket routes
-  budget/         # Per-agent cost tracking and spending controls
+  budget/         # Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods
   cli/            # Typer CLI commands
   communication/  # Message bus, dispatcher, messenger, channels, delegation, loop prevention, conflict resolution, meeting protocol
   config/         # YAML company config loading and validation

@@ -971,21 +971,22 @@ hybrid:
 Pipeline steps:
 
 1. **Validate inputs** — agent must be `ACTIVE`, task must be `ASSIGNED` or `IN_PROGRESS`. Raises `ExecutionStateError` on violation.
-2. **Build system prompt** — calls `build_system_prompt()` with agent identity, task, and available tool definitions.
-3. **Create context** — `AgentContext.from_identity()` with the configured `max_turns`.
-4. **Seed conversation** — injects system prompt, optional memory messages, and formatted task instruction as initial messages.
-5. **Transition task** — `ASSIGNED` → `IN_PROGRESS` (pass-through if already `IN_PROGRESS`).
-6. **Prepare tools and budget** — creates `ToolInvoker` from registry and `BudgetChecker` from task budget limit.
-7. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config. If `timeout_seconds` is set, wraps the call in `asyncio.wait_for`; on expiry the run returns with `TerminationReason.ERROR` but cost recording and post-execution processing still occur.
-8. **Record costs** — records accumulated `TokenUsage` to `CostTracker` (if available). Cost recording failures are logged but do not affect the result.
-9. **Apply post-execution transitions** — on `COMPLETED` termination: IN_PROGRESS → IN_REVIEW → COMPLETED (two-hop auto-complete in M3; reviewers deferred to M4+). On `SHUTDOWN` termination: current status → INTERRUPTED (see §6.7). On `ERROR` termination: recovery strategy is applied (default `FailAndReassignStrategy` transitions to FAILED; see §6.6). All other termination reasons (`MAX_TURNS`, `BUDGET_EXHAUSTED`) leave the task in its current state. Transition failures are logged but do not discard the successful execution result.
-10. **Return result** — wraps `ExecutionResult` in `AgentRunResult` with engine-level metadata.
+2. **Pre-flight budget enforcement** — if `BudgetEnforcer` is provided, check monthly hard stop and daily limit via `check_can_execute()`, then apply auto-downgrade via `resolve_model()`. Raises `BudgetExhaustedError` or `DailyLimitExceededError` on violation.
+3. **Build system prompt** — calls `build_system_prompt()` with agent identity, task, and available tool definitions.
+4. **Create context** — `AgentContext.from_identity()` with the configured `max_turns`.
+5. **Seed conversation** — injects system prompt, optional memory messages, and formatted task instruction as initial messages.
+6. **Transition task** — `ASSIGNED` → `IN_PROGRESS` (pass-through if already `IN_PROGRESS`).
+7. **Prepare tools and budget** — creates `ToolInvoker` from registry and `BudgetChecker` from `BudgetEnforcer` (task + monthly + daily limits with pre-computed baselines and alert deduplication) or from task budget limit alone when no enforcer is configured.
+8. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config. If `timeout_seconds` is set, wraps the call in `asyncio.wait_for`; on expiry the run returns with `TerminationReason.ERROR` but cost recording and post-execution processing still occur.
+9. **Record costs** — records accumulated `TokenUsage` to `CostTracker` (if available). Cost recording failures are logged but do not affect the result.
+10. **Apply post-execution transitions** — on `COMPLETED` termination: IN_PROGRESS → IN_REVIEW → COMPLETED (two-hop auto-complete in M3; reviewers deferred to M4+). On `SHUTDOWN` termination: current status → INTERRUPTED (see §6.7). On `ERROR` termination: recovery strategy is applied (default `FailAndReassignStrategy` transitions to FAILED; see §6.6). All other termination reasons (`MAX_TURNS`, `BUDGET_EXHAUSTED`) leave the task in its current state. Transition failures are logged but do not discard the successful execution result.
+11. **Return result** — wraps `ExecutionResult` in `AgentRunResult` with engine-level metadata.
 
-Error handling: `MemoryError` and `RecursionError` propagate unconditionally. All other exceptions are caught and wrapped in an `AgentRunResult` with `TerminationReason.ERROR`.
+Error handling: `MemoryError` and `RecursionError` propagate unconditionally. `BudgetExhaustedError` (including `DailyLimitExceededError`) returns `TerminationReason.BUDGET_EXHAUSTED` without recovery — budget exhaustion is a controlled stop, not a crash. All other exceptions are caught and wrapped in an `AgentRunResult` with `TerminationReason.ERROR`.
 
-Constructor accepts: `provider` (required), `execution_loop` (defaults to `ReactLoop`), `tool_registry`, `cost_tracker`. The `run()` method also accepts `memory_messages` — optional working memory to inject between the system prompt and task instruction (memory retrieval is M5; the engine provides the injection hook).
+Constructor accepts: `provider` (required), `execution_loop` (defaults to `ReactLoop`), `tool_registry`, `cost_tracker`, `recovery_strategy` (defaults to `FailAndReassignStrategy`), `shutdown_checker`, `budget_enforcer`. The `run()` method also accepts `memory_messages` — optional working memory to inject between the system prompt and task instruction (memory retrieval is M5; the engine provides the injection hook).
 
-Logs structured events under the `execution.engine.*` namespace (12 constants in `events/execution.py`): creation, start, prompt built, completion, errors, invalid input, task transitions, cost recording outcomes, task metrics, and timeout.
+Logs structured events under the `execution.engine.*` namespace (13 constants in `events/execution.py`): creation, start, prompt built, completion, errors, budget stopped, invalid input, task transitions, cost recording outcomes, task metrics, and timeout.
 
 **`AgentRunResult`** — frozen Pydantic model wrapping `ExecutionResult` with engine metadata:
 
@@ -1778,7 +1779,7 @@ Every API call is tracked (illustrative schema):
 
 ### 10.3 CFO Agent Responsibilities
 
-> **MVP: Not in M3.** Budget tracking and per-task cost recording exist (M2), but the CFO agent is M5+. Cost controls (§10.4) are enforced by the engine, not by an agent.
+> **MVP: Not in M3.** Budget tracking and per-task cost recording exist (M2); cost controls (§10.4) are now enforced by `BudgetEnforcer` (a service the engine composes, not an agent — M5). The CFO agent is M5+.
 
 The CFO agent (when enabled) acts as a cost management system:
 
@@ -1804,6 +1805,7 @@ The CFO agent (when enabled) acts as a cost management system:
 ```yaml
 budget:
   total_monthly: 100.00
+  reset_day: 1
   alerts:
     warn_at: 75               # percent
     critical_at: 90
@@ -1822,6 +1824,15 @@ budget:
 
 > **Auto-downgrade boundary:** Model downgrades apply only at **task assignment time**, never mid-execution. An agent halfway through an architecture review cannot be switched to a cheaper model — the task completes on its assigned model. The next task assignment respects the downgrade threshold. This prevents quality degradation from mid-thought model switches.
 
+> **Implementation note (M5):** `BudgetEnforcer` composes `CostTracker` +
+> `BudgetConfig` to provide three enforcement layers: (1) pre-flight checks
+> via `check_can_execute` (monthly hard stop + per-agent daily limit), (2)
+> in-flight budget checking via a sync `BudgetChecker` closure with
+> pre-computed baselines (task + monthly + daily limits, alert deduplication),
+> and (3) task-boundary auto-downgrade via `resolve_model`. Billing periods
+> are scoped by `billing_period_start(reset_day)`. `DailyLimitExceededError`
+> is a subclass of `BudgetExhaustedError` for granular error handling.
+
 ### 10.5 LLM Call Analytics
 
 > **Current state:** Proxy metrics (M3), call categorization + coordination metric data models (M4 models, brought forward), and error taxonomy classification pipeline (M5) are implemented. Runtime collection pipeline for coordination metrics and full analytics layer are M5+.
@@ -2637,6 +2648,7 @@ ai-company/
 │       │   ├── recovery.py         # Crash recovery strategies (RecoveryStrategy protocol)
 │       │   ├── cost_recording.py   # Per-turn cost recording helpers
 │       │   ├── run_result.py       # AgentRunResult outcome model
+│       │   ├── _validation.py      # Input validation helpers for AgentEngine
 │       │   ├── agent_engine.py     # Agent execution engine
 │       │   ├── parallel.py         # Parallel agent executor (TaskGroup + Semaphore)
 │       │   ├── parallel_models.py  # AgentAssignment, ParallelExecutionGroup, AgentOutcome, ParallelExecutionResult, ParallelProgress
@@ -2864,7 +2876,8 @@ ai-company/
 │       │   ├── spending_summary.py # _SpendingTotals base + spending summary models
 │       │   ├── hierarchy.py        # BudgetHierarchy, BudgetConfig
 │       │   ├── enums.py            # Budget-related enums
-│       │   ├── limits.py           # Budget enforcement (M5)
+│       │   ├── billing.py          # Billing period computation utilities
+│       │   ├── enforcer.py         # BudgetEnforcer service (pre-flight, in-flight, auto-downgrade)
 │       │   ├── optimizer.py        # Cost optimization / CFO logic (M5)
 │       │   └── reports.py          # Spending reports (M5)
 │       ├── api/                     # REST + WebSocket API (M6, stubs only)

@@ -23,11 +23,11 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
 - **Persistence Layer (M5)** - Pluggable `PersistenceBackend` protocol with SQLite backend (aiosqlite), repository protocols, schema migrations
 - **Memory Interface (M5)** - Pluggable `MemoryBackend` protocol with capability discovery, shared knowledge protocol, domain models, config, and factory
 - **Coordination Error Taxonomy (M5)** - Post-execution classification pipeline detecting logical contradictions, numerical drift, context omissions, and coordination failures
+- **Budget Enforcement (M5)** - `BudgetEnforcer` service with pre-flight checks, in-flight budget checking, and auto-downgrade; CFO agent and advanced reporting pending
 
 ### Not implemented yet (planned milestones)
 
 - **Memory Backends (M5)** - Mem0 adapter ([ADR-001](docs/decisions/ADR-001-memory-layer.md), #41) pending; shared knowledge store backends planned
-- **Budget Controls (M5)** - Per-agent spending limits, budget hierarchy enforcement
 - **API Layer (M6)** - `api/` package and route modules are placeholders
 - **CLI Surface (M6)** - `cli/` package is placeholder-only
 - **Security/Approval System (M7)** - `security/` package is placeholder-only

@@ -5,6 +5,7 @@
 DESIGN_SPEC Section 10.
 """
 
+from ai_company.budget.billing import billing_period_start, daily_period_start
 from ai_company.budget.call_category import LLMCallCategory, OrchestrationAlertLevel
 from ai_company.budget.category_analytics import CategoryBreakdown, OrchestrationRatio
 from ai_company.budget.config import (
@@ -28,6 +29,7 @@
     RedundancyRate,
 )
 from ai_company.budget.cost_record import CostRecord
+from ai_company.budget.enforcer import BudgetEnforcer
 from ai_company.budget.enums import BudgetAlertLevel
 from ai_company.budget.hierarchy import (
     BudgetHierarchy,
@@ -48,6 +50,7 @@
     "BudgetAlertConfig",
     "BudgetAlertLevel",
     "BudgetConfig",
+    "BudgetEnforcer",
     "BudgetHierarchy",
     "CategoryBreakdown",
     "CoordinationEfficiency",
@@ -71,4 +74,6 @@
     "RedundancyRate",
     "SpendingSummary",
     "TeamBudget",
+    "billing_period_start",
+    "daily_period_start",
 ]
@@ -0,0 +1,59 @@
+"""Billing period computation utilities.
+
+Pure functions for determining billing period boundaries based on a
+configurable reset day. Used by :class:`~ai_company.budget.enforcer.BudgetEnforcer`
+to scope cost queries to the current billing cycle.
+"""
+
+from datetime import UTC, datetime
-from datetime import UTC, datetime
+from datetime import UTC, datetime
+
+from ai_company.observability import get_logger
+
+logger = get_logger(__name__)
-from datetime import UTC, datetime
+from datetime import UTC, datetime
+
+from ai_company.observability import get_logger
+
+logger = get_logger(__name__)
+
+
+def billing_period_start(
+    reset_day: int,
+    *,
+    now: datetime | None = None,
+) -> datetime:
+    """Compute the UTC-aware start of the current billing period.
+
+    If ``now.day >= reset_day``, returns current month's ``reset_day``
+    at 00:00 UTC.  Otherwise, returns previous month's ``reset_day``
+    at 00:00 UTC.
+
+    Args:
+        reset_day: Day of month when the billing period resets (1-28).
+        now: Reference timestamp.  Defaults to ``datetime.now(UTC)``.
+
+    Returns:
+        UTC-aware datetime at midnight on the billing period start day.
+
+    Raises:
+        ValueError: If ``reset_day`` is not in ``[1, 28]``.
+    """
+    if not 1 <= reset_day <= 28:  # noqa: PLR2004
+        msg = f"reset_day must be 1-28, got {reset_day}"
+        raise ValueError(msg)
+
+    if now is None:
+        now = datetime.now(UTC)
+
+    if now.day >= reset_day:
+        return datetime(now.year, now.month, reset_day, tzinfo=UTC)
+
+    # Roll back to previous month
+    if now.month == 1:
+        return datetime(now.year - 1, 12, reset_day, tzinfo=UTC)
+    return datetime(now.year, now.month - 1, reset_day, tzinfo=UTC)
+
+
+def daily_period_start(*, now: datetime | None = None) -> datetime:
+    """Compute the UTC-aware start of today (midnight UTC).
+
+    Args:
+        now: Reference timestamp.  Defaults to ``datetime.now(UTC)``.
+
+    Returns:
+        UTC-aware datetime at midnight of the current day.
+    """
+    if now is None:
+        now = datetime.now(UTC)
+    return datetime(now.year, now.month, now.day, tzinfo=UTC)
@@ -5,7 +5,7 @@
 """
 
 from collections import Counter
-from typing import Any, Self
+from typing import Any, Literal, Self
 
 from pydantic import BaseModel, ConfigDict, Field, model_validator
 
@@ -74,6 +74,8 @@ class AutoDowngradeConfig(BaseModel):
         enabled: Whether auto-downgrade is active.
         threshold: Budget percent that triggers downgrade.
         downgrade_map: Ordered pairs of (from_alias, to_alias).
+        boundary: When to apply downgrade (task_assignment only,
+            never mid-execution per DESIGN_SPEC §10.4).
     """
 
     model_config = ConfigDict(frozen=True)
@@ -93,6 +95,12 @@ class AutoDowngradeConfig(BaseModel):
         default=(),
         description="Ordered pairs of (from_alias, to_alias)",
     )
+    boundary: Literal["task_assignment"] = Field(
+        default="task_assignment",
+        description=(
+            "When to apply downgrade (task_assignment only, never mid-execution)"
+        ),
+    )
 
     @model_validator(mode="before")
     @classmethod
@@ -152,9 +160,11 @@ class BudgetConfig(BaseModel):
         per_task_limit: Maximum USD per task.
         per_agent_daily_limit: Maximum USD per agent per day.
         auto_downgrade: Automatic model downgrade configuration.
+        reset_day: Day of month when budget resets (1-28, avoids
+            month-length issues).
     """
 
-    model_config = ConfigDict(frozen=True)
+    model_config = ConfigDict(frozen=True, allow_inf_nan=False)
 
     total_monthly: float = Field(
         default=100.0,
@@ -179,6 +189,15 @@ class BudgetConfig(BaseModel):
         default_factory=AutoDowngradeConfig,
         description="Automatic model downgrade configuration",
     )
+    reset_day: int = Field(
+        default=1,
+        ge=1,
+        le=28,
+        strict=True,
+        description=(
+            "Day of month when budget resets (1-28, avoids month-length issues)"
+        ),
+    )
 
     @model_validator(mode="after")
     def _validate_per_task_limit_within_monthly(self) -> Self: