Merged
30 commits
a6ff65b
test: failing harness for forecast gate + hard ceiling + pareto (#1982)
Aureliolo May 20, 2026
377c044
feat: cost forecast + hard ceiling domain models (#1982)
Aureliolo May 20, 2026
6da4472
feat: cost_forecasts table + repository (#1982)
Aureliolo May 20, 2026
cd1615f
feat: CostForecaster service (#1982)
Aureliolo May 20, 2026
29d3e96
feat: ForecastGate wrapper for work-entry adapters (#1982)
Aureliolo May 20, 2026
6afc8bb
feat: per-turn hard ceiling enforcement in BudgetChecker (#1982)
Aureliolo May 20, 2026
4180eec
feat: Pareto analyzer + stub benchmark provider (#1982)
Aureliolo May 20, 2026
aebd020
fix: pre-push gate findings on cost-dial (#1982)
Aureliolo May 20, 2026
8a8f74d
fix: drop issue back-refs + phase framing from cost-dial docstrings (…
Aureliolo May 20, 2026
1243d15
feat: cost-dial controllers + AppState wiring (#1982)
Aureliolo May 20, 2026
69410f8
feat: cost-dial dashboard ParetoSection + fixture updates (#1982)
Aureliolo May 20, 2026
c546e0b
fix: defensive cost-dial wiring at startup (#1982)
Aureliolo May 20, 2026
c6b9146
feat: wire ForecastGate into entry adapters + PARKED routing (#1982)
Aureliolo May 20, 2026
85a75fc
fix: ruff format pass on ceiling park test (#1982)
Aureliolo May 20, 2026
a756032
fix: drop unused type-ignore (#1982)
Aureliolo May 20, 2026
249750a
feat: wire ParetoSection into BudgetPage
Aureliolo May 20, 2026
ecff842
feat: greenify cost-dial e2e harness
Aureliolo May 20, 2026
b992ba1
feat: ApprovalGate.park_context on hard-ceiling crossings
Aureliolo May 20, 2026
f8db8be
feat: forecast dialog, approval card, ceiling banner + store
Aureliolo May 20, 2026
963a56c
feat: stamp hard-ceiling halt context on forecast for resume banner
Aureliolo May 20, 2026
8924bf7
test: assert ceiling closure propagates task forecast_id
Aureliolo May 20, 2026
86e73b8
fix: await async _handle_budget_error in cost-dial e2e
Aureliolo May 20, 2026
acc7d75
fix: address pre-PR review findings for cost dial
Aureliolo May 20, 2026
87c3e34
fix: address Gemini review on cost-dial (503 consistency, TaskGroup f…
Aureliolo May 20, 2026
3ecf4cd
fix: address CodeRabbit review on cost-dial (write-access guards, mod…
Aureliolo May 20, 2026
1ebff72
fix: use sub-daily-limit cost in hard-ceiling disablement test
Aureliolo May 20, 2026
0178d6f
fix: reuse pending forecast covering the brief instead of minting a d…
Aureliolo May 20, 2026
3e43d3d
fix: reuse existing pending forecast by brief_hash even without a cal…
Aureliolo May 20, 2026
40d76a3
fix: recover forecast_gate save race via re-query and slim run() unde…
Aureliolo May 21, 2026
a0c612a
test: clarify stale-linked-forecast gate test name and assertions
Aureliolo May 21, 2026
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -78,7 +78,7 @@ PYTHONPATH=. uv run zensical build # docs

- Two phases: **construction** (`create_app` body) wires synchronous services; **on_startup** (`_build_lifecycle.on_startup`) wires services that need a connected persistence backend.
- Construction-phase ordering invariants: `agent_registry` must be built BEFORE `auto_wire_meetings`; `tunnel_provider` is wired unconditionally (not gated by `integrations.enabled`).
-- On-startup ordering invariants: `SettingsService` auto-wire must precede `WorkflowExecutionObserver` registration (so it picks up resolver-driven `max_subworkflow_depth` instead of the seed default); `OntologyService` wires after `persistence.connect()` via `_wire_ontology_service`.
+- On-startup ordering invariants: `SettingsService` auto-wire must precede `WorkflowExecutionObserver` registration (so it picks up resolver-driven `max_subworkflow_depth` instead of the seed default); `OntologyService` wires after `persistence.connect()` via `_wire_ontology_service`. Cost-dial services (`BudgetConfig`, `CostForecastRepository`, `CostForecaster`, `StubBenchmarkScoreProvider`, `ParetoAnalyzer`) wire via `_try_wire_cost_dial` AFTER persistence connects; it is best-effort (logs `BUDGET_FORECAST_UNAVAILABLE` and the controllers 503 if it fails or persistence is absent) and idempotent (skips when already wired), so a transient shared-app boot does not poison startup. The approved forecast's `forecast_id` + `ceiling_amount` are stamped onto the `Task` in the work pipeline's intake phase (`WorkPipelineService._link_forecast`) so the in-loop `BudgetChecker` enforces the per-brief ceiling and the engine can stamp halt context for the resume banner.
- Runtime services: `synthorg.workers.runtime_builder.build_runtime_services` selects behind ONE provider-present switch and returns a `RuntimeServices` pair (worker execution service + multi-agent coordinator) built from a SINGLE shared boot `AgentEngine`: `AgentEngineExecutionService` + a `build_coordinator(...)` coordinator with a provider, `NoProviderExecutionService` + `None` coordinator as the empty-company backstop. The `_install_runtime_services` boot hook installs both via the `AppState.worker_execution_service` and `AppState.coordinator` seams; it is appended FIRST after the persistence/SettingsService hooks so the once-only `set_worker_execution_service` / `set_coordinator` cannot lose the race with the worker property's lazy `LifecycleAdvancingExecutionService` default. Empty-company rejects task creation at the controller (`AgentRuntimeNotConfiguredError`, 4014) and `/coordinate` honestly 503s (no coordinator). `swap_worker_execution_service` / `swap_coordinator` / `swap_provider_registry` hold a lock (synchronised against lazy reads).
- Setup completion: `post_setup_reinit()` (provider reload, agent bootstrap, AND runtime-services rebuild + dual hot-swap of the worker execution service and coordinator, defined in `src/synthorg/api/controllers/setup/agent_helpers.py`) propagates failures, and `settings_svc.set("api", "setup_complete", "true")` only runs if reinit returns clean. The whole check/validate/reinit/persist sequence is serialised under `COMPLETE_LOCK` in the same module so two concurrent `/setup/complete` requests cannot race on the flag write. A half-configured runtime presenting itself as "complete" is worse than a clear error the operator can retry after fixing the underlying provider config.

65 changes: 65 additions & 0 deletions docs/design/budget.md
@@ -205,6 +205,71 @@ budget:
total_monthly: 100.00
```

## Cost as a First-Class Dial

Beyond the passive ledger and the soft-warning ladder, cost is a prospective,
operator-facing control with three capabilities.

### Pre-flight forecast gate

`CostForecaster` produces a forecast for a brief before any spend commits: a
mid-point `estimated_cost` plus a `[lower_bound, upper_bound]` uncertainty band.
The estimate is a hybrid of a per-tier static prior and a Bayesian-shrinkage blend
with historical per-role observations, so a cold start collapses to the prior and
a warm history pulls toward the observed mean.
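
The blend described above can be sketched with the standard pseudo-count form of shrinkage. This is a minimal illustration, not the `CostForecaster` implementation: the function name, its signature, and the exact formula are assumptions, and the uncertainty band is left out because the text does not say how it is derived.

```python
from statistics import fmean


def shrinkage_estimate(
    static_prior_per_turn: float,
    observed_per_turn_costs: list[float],
    prior_weight: float = 5.0,
) -> float:
    """Blend a static prior with the observed mean (hypothetical helper).

    ``prior_weight`` plays the role of ``forecast_shrinkage_prior_weight``:
    the prior counts as that many pseudo-observations.
    """
    n = len(observed_per_turn_costs)
    if n == 0:
        # Cold start: no history, so the estimate collapses to the prior.
        return static_prior_per_turn
    observed_mean = fmean(observed_per_turn_costs)
    # Weighted average: more observations pull the result toward the mean.
    return (prior_weight * static_prior_per_turn + n * observed_mean) / (
        prior_weight + n
    )
```

Under these assumptions, five observations at 0.02 against a 0.10 prior with a pseudo-count of 5.0 land halfway between the two, at 0.06; as the history grows the estimate approaches 0.02.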

`ForecastGate` sits at the work-entry seam between the entry adapters and the work
pipeline. When `forecast_required` is set it refuses to dispatch a brief unless a
persisted `Forecast` row with `decision = approved` covers it; a missing or pending
forecast yields a fresh `pending` row and raises `CostForecastApprovalRequiredError`
(HTTP 402) so the operator decides via the dashboard. The decision state machine is
`pending -> approved | rejected | superseded`; `approved` and `rejected` are terminal.
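
A minimal sketch of the decision state machine and the dispatch rule, assuming a plain enum and a lookup table. `Decision`, `transition`, and `may_dispatch` are hypothetical names for illustration; treating `superseded` as having no outgoing transitions is also an assumption, since the text only names `approved` and `rejected` as terminal.

```python
from __future__ import annotations

from enum import Enum


class Decision(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    SUPERSEDED = "superseded"


# Only PENDING has outgoing transitions in this sketch.
_ALLOWED: dict[Decision, frozenset[Decision]] = {
    Decision.PENDING: frozenset(
        {Decision.APPROVED, Decision.REJECTED, Decision.SUPERSEDED}
    ),
}


def transition(current: Decision, target: Decision) -> Decision:
    """Return ``target`` if the move is legal, otherwise raise ValueError."""
    if target not in _ALLOWED.get(current, frozenset()):
        msg = f"illegal forecast transition: {current.value} -> {target.value}"
        raise ValueError(msg)
    return target


def may_dispatch(forecast_required: bool, decision: Decision | None) -> bool:
    """Dispatch when no forecast is required or an approved one covers the brief."""
    return (not forecast_required) or decision is Decision.APPROVED
```

A missing forecast is modelled here as `None`; with `forecast_required` set, both `None` and `Decision.PENDING` block dispatch.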

```yaml
budget:
forecast_required: true
forecast_default_ceiling_multiplier: 1.5 # UI suggests ceiling = upper_bound * this
forecast_shrinkage_prior_weight: 5.0 # Bayesian prior pseudo-count
forecast_static_prior_per_turn_large: 0.10
forecast_static_prior_per_turn_medium: 0.03
forecast_static_prior_per_turn_small: 0.005
forecast_static_prior_per_turn_local_small: 0.0
```

On approval the work-entry intake phase stamps the forecast's `forecast_id` and
the operator-approved `ceiling_amount` onto the `Task` so the in-loop checker and
the engine can act on them.

### Hard real-money ceiling

Independent of the monthly soft-warning ladder, a per-run hard ceiling halts the org
cleanly mid-run. The in-loop `BudgetChecker` raises `RunHardCeilingExceededError` (a
subclass of `BudgetExhaustedError`) the moment accumulated cost meets or exceeds the
task's `hard_ceiling` (falling back to the global `run_hard_ceiling` setting when the
per-task value is unset; `0.0` disables the global fallback). The engine routes the
crossing to `TerminationReason.PARKED` via `ApprovalGate.park_context` so execution
state is preserved, and stamps a `HaltContext` (accumulated cost, ceiling, currency,
timestamp) onto the forecast row. The operator raises the ceiling via
`POST /budget/forecasts/{id}/raise_ceiling` (rejected with `RunHardCeilingTooLowError`
if the new ceiling does not clear the accumulated cost), which clears the halt context
so the run can resume.

```yaml
budget:
run_hard_ceiling: 0.00 # absolute amount in budget.currency; 0 disables the global fallback
```
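
The ceiling resolution and the raise-ceiling guard can be sketched as below. This is an illustration under stated assumptions, not the `BudgetChecker` code: the exception classes are stand-ins for `RunHardCeilingExceededError` and `RunHardCeilingTooLowError`, an unset per-task ceiling is modelled as `None`, and "does not clear" is read as "is not strictly greater than".

```python
from __future__ import annotations


class CeilingExceededSketch(Exception):
    """Stand-in for RunHardCeilingExceededError (hypothetical)."""


class CeilingTooLowSketch(Exception):
    """Stand-in for RunHardCeilingTooLowError (hypothetical)."""


def effective_ceiling(
    task_ceiling: float | None, global_ceiling: float
) -> float | None:
    """Per-task ceiling wins; otherwise the global fallback, where 0.0 disables it."""
    if task_ceiling is not None:
        return task_ceiling
    return global_ceiling if global_ceiling > 0.0 else None


def check_ceiling(
    accumulated: float, task_ceiling: float | None, global_ceiling: float
) -> None:
    """Raise as soon as accumulated cost meets or exceeds the effective ceiling."""
    ceiling = effective_ceiling(task_ceiling, global_ceiling)
    if ceiling is not None and accumulated >= ceiling:
        msg = f"accumulated {accumulated} reached ceiling {ceiling}"
        raise CeilingExceededSketch(msg)


def raise_ceiling(accumulated: float, new_ceiling: float) -> float:
    """Accept a new ceiling only if it is strictly above the accumulated cost."""
    if new_ceiling <= accumulated:
        msg = f"new ceiling {new_ceiling} does not clear {accumulated}"
        raise CeilingTooLowSketch(msg)
    return new_ceiling
```

Rejecting a new ceiling equal to the accumulated cost follows from the "meets or exceeds" check: such a ceiling would halt the run again on the next turn.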

### Cost / quality Pareto view

`ParetoAnalyzer` surfaces trade-offs of the form "90% of the quality at 40% of the cost if
you downgrade these roles". It walks the current per-role model assignments and observed costs, looks up a
downgrade candidate per role, and pairs the `cost_saving_pct` with the `quality_delta_pct`
drawn from a `BenchmarkScoreProvider`. The `StubBenchmarkScoreProvider` supplies
calibrated per-tier constants pending a measured benchmark integration; every
`ParetoPoint` and the frontier carry a `source` field that the dashboard surfaces
verbatim so stub data is never mistaken for measured data. The frontier is advisory:
downgrade callouts link to the agent settings surface rather than mutating models inline.
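
A minimal sketch of how one downgrade point and the frontier could be computed. The field names `cost_saving_pct`, `quality_delta_pct`, and `source` come from the text; the class, the helper functions, the percentage formulas, and the dominance rule are assumptions made for illustration, not the `ParetoAnalyzer` implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParetoPointSketch:
    role: str
    cost_saving_pct: float
    quality_delta_pct: float
    source: str  # e.g. "stub" vs. "measured", surfaced verbatim


def downgrade_point(
    role: str,
    current_cost: float,
    candidate_cost: float,
    current_score: float,
    candidate_score: float,
    source: str = "stub",
) -> ParetoPointSketch:
    """Pair the relative cost saving with the relative quality change."""
    if current_cost <= 0 or current_score <= 0:
        raise ValueError("current cost and score must be positive")
    return ParetoPointSketch(
        role=role,
        cost_saving_pct=100.0 * (current_cost - candidate_cost) / current_cost,
        quality_delta_pct=100.0 * (candidate_score - current_score) / current_score,
        source=source,
    )


def frontier(points: list[ParetoPointSketch]) -> list[ParetoPointSketch]:
    """Keep points that no other point beats on both axes."""

    def dominated(p: ParetoPointSketch) -> bool:
        return any(
            q.cost_saving_pct >= p.cost_saving_pct
            and q.quality_delta_pct >= p.quality_delta_pct
            and (
                q.cost_saving_pct > p.cost_saving_pct
                or q.quality_delta_pct > p.quality_delta_pct
            )
            for q in points
        )

    return [p for p in points if not dominated(p)]
```

The positive-cost guard matters because a role already on a zero-cost local tier has nothing to save, and the percentage would otherwise divide by zero.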

## Quota Degradation

When a provider's quota is exhausted, the framework applies the configured degradation
4 changes: 4 additions & 0 deletions scripts/_ghost_wiring_manifest.txt
@@ -61,3 +61,7 @@ ENFORCED SubmitRedTeamReportTool #1986 -- constructed by security/redteam/builde
ENFORCED InMemoryRedTeamReportRepository #1986 -- constructed by security/redteam/builder.build_red_team_runtime; single-shot per-execution storage for RedTeamReport entries (persistent backends land in a follow-up)
ENFORCED HeuristicGroundingChecker #1986 -- constructed by security/redteam/grounding/factory.build_grounding_checker when RedTeamConfig.grounding_checker_kind=="heuristic"; deterministic regex-based ungrounded-claim flag, swappable for a substrate-backed checker
ENFORCED build_red_team_agent_identity #1986 -- called by security/redteam/builder.build_red_team_runtime; constructs the built-in Red Team AgentIdentity from the BUILTIN_ROLES catalog
ENFORCED CostForecaster #1982 -- constructed in api/app.py::_wire_cost_dial_services and swapped onto AppState; consumed by ForecastBudgetController.create_forecast
ENFORCED ForecastGate #1982 -- constructed in engine/pipeline/entry/boot.py::_forecast_gate_for and passed through build_work_entry_adapter as the work_pipeline seam; consulted on every brief submission before the work pipeline dispatches
ENFORCED ParetoAnalyzer #1982 -- constructed in api/app.py::_wire_cost_dial_services; consumed by ForecastBudgetController.get_pareto
ENFORCED StubBenchmarkScoreProvider #1982 -- constructed in api/app.py::_wire_cost_dial_services; powers ParetoAnalyzer's quality axis; swap to real benchmark provider when #1980 lands
72 changes: 72 additions & 0 deletions src/synthorg/api/app.py
@@ -120,6 +120,9 @@
    build_sqlite_persistence_config,
    normalize_ssl_mode_value,
)
from synthorg.persistence.cost_forecast_protocol import (
    CostForecastRepository,  # noqa: TC001 -- runtime annotation in helper
)
from synthorg.persistence.factory import create_backend
from synthorg.persistence.protocol import PersistenceBackend # noqa: TC001
from synthorg.providers.health import ProviderHealthTracker # noqa: TC001
@@ -222,6 +225,73 @@ def _resolve_budget_int(key: str) -> int:
    return resolve_init_int(SettingNamespace.BUDGET, key)


def _wire_cost_dial_services(app_state: AppState) -> None:
    """Wire the cost-dial services onto AppState behind a persistence guard.

    Builds the BudgetConfig, StubBenchmarkScoreProvider, the per-backend
    CostForecastRepository, the CostForecaster, and the ParetoAnalyzer
    then hot-swaps them onto AppState through the lock-protected
    ``swap_*`` methods so an in-flight controller read cannot race the
    boot wiring.
    """
    from synthorg.budget.benchmark_stub import (  # noqa: PLC0415
        StubBenchmarkScoreProvider,
    )
    from synthorg.budget.config import BudgetConfig  # noqa: PLC0415
    from synthorg.budget.forecaster import CostForecaster  # noqa: PLC0415
    from synthorg.budget.pareto import ParetoAnalyzer  # noqa: PLC0415
    from synthorg.persistence.sqlite.cost_forecast_repo import (  # noqa: PLC0415
        SQLiteCostForecastRepository,
    )

    budget_config = BudgetConfig()
    benchmark_provider = StubBenchmarkScoreProvider()
    backend_name = app_state.persistence.backend_name
    if backend_name == "sqlite":
        forecast_repo: CostForecastRepository = SQLiteCostForecastRepository(
            app_state.persistence.get_db(),
            write_context=app_state.persistence.write_context,
            currency_getter=lambda: budget_config.currency,
        )
    else:
        from synthorg.persistence.postgres.cost_forecast_repo import (  # noqa: PLC0415
            PostgresCostForecastRepository,
        )

        forecast_repo = PostgresCostForecastRepository(
            app_state.persistence.get_db(),
            currency_getter=lambda: budget_config.currency,
        )
    forecaster = CostForecaster(budget_config=budget_config)
    analyzer = ParetoAnalyzer(
        benchmark_provider=benchmark_provider,
        budget_config=budget_config,
    )
    app_state.swap_budget_config(budget_config)
    app_state.swap_benchmark_provider(benchmark_provider)
    app_state.swap_cost_forecast_repo(forecast_repo)
    app_state.swap_cost_forecaster(forecaster)
    app_state.swap_pareto_analyzer(analyzer)


def _try_wire_cost_dial(app_state: AppState) -> None:
    """Wire the cost-dial services best-effort; never poison startup."""
    if not app_state.has_persistence or app_state.cost_forecaster is not None:
        return
    try:
        _wire_cost_dial_services(app_state)
    except MemoryError, RecursionError:
        raise
    except Exception as exc:
        logger.warning(
            API_APP_STARTUP,
            service="cost_dial",
            note="cost-dial wiring failed; controllers will 503",
            error_type=type(exc).__name__,
            error=safe_error_description(exc),
        )


def _build_default_approval_timeout_scheduler(
*,
approval_store: ApprovalStoreProtocol,
@@ -1057,6 +1127,8 @@ async def _install_runtime_services() -> None:
    _try_wire_cost_dial(app_state)

    # ProjectWorkspaceService provisions one persistent git-backed
    # tree per project under the workspace base. Persistence-less
    # boots (test fixtures, dev apps with no DB) skip wiring -- the
    # service is optional and gates on ``has_project_workspace_service``.
    if app_state.has_persistence and app_state.project_workspace_service is None:
        # Guard against partial-startup retry: this hook fires once
3 changes: 3 additions & 0 deletions src/synthorg/api/controllers/__init__.py
@@ -18,6 +18,7 @@
from synthorg.api.controllers.budget_config_versions import (
    BudgetConfigVersionController,
)
from synthorg.api.controllers.budget_forecast import ForecastBudgetController
from synthorg.api.controllers.capabilities import CapabilitiesController
from synthorg.api.controllers.ceremony_policy import (
    CeremonyPolicyController,
@@ -118,6 +119,7 @@
    MeetingController,
    ArtifactController,
    BudgetController,
    ForecastBudgetController,
    AnalyticsController,
    ProviderController,
    ApprovalsController,
@@ -227,6 +229,7 @@
    "EvaluationConfigVersionController",
    "EventStreamController",
    "ExperimentsController",
    "ForecastBudgetController",
    "IntegrationHealthController",
    "InterruptController",
    "LivenessController",