5 changes: 3 additions & 2 deletions CLAUDE.md
@@ -114,7 +114,7 @@ curl http://localhost:3000/api/v1/health # backend (via web proxy)

```text
src/synthorg/
api/ # Litestar REST + WebSocket API (controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, RFC 9457 structured errors (ErrorCategory, ErrorCode, ErrorDetail, ProblemDetail, CATEGORY_TITLES, category_title, category_type_uri, content negotiation))
api/ # Litestar REST + WebSocket API (controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, RFC 9457 structured errors (ErrorCategory, ErrorCode, ErrorDetail, ProblemDetail, CATEGORY_TITLES, category_title, category_type_uri, content negotiation)), AppState hot-reload slots (provider_registry, model_router with swap methods), settings dispatcher lifecycle
auth/ # Authentication subpackage (controller, service, middleware, JWT + API key + WS ticket store, models, config)
budget/ # Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
cli/ # Python CLI module (superseded by top-level cli/ Go binary)
@@ -128,8 +128,9 @@ src/synthorg/
persistence/ # Operational data persistence — pluggable PersistenceBackend protocol, SQLite initial, SettingsRepository (namespaced settings CRUD) (see Memory & Persistence design page)
observability/ # Structured logging, correlation tracking, log sinks
providers/ # LLM provider abstraction (LiteLLM adapter)
settings/ # Runtime-editable settings persistence (DB > env > YAML > code defaults), typed definitions (9 namespaces), Fernet encryption for sensitive values, config bridge, ConfigResolver (typed composed reads for controllers), validation, registry, change notifications via message bus
settings/ # Runtime-editable settings persistence (DB > env > YAML > code defaults), typed definitions (9 namespaces), Fernet encryption for sensitive values, config bridge, ConfigResolver (typed composed reads for controllers), validation, registry, change notifications via message bus, SettingsSubscriber protocol (subscriber.py), SettingsChangeDispatcher (dispatcher.py, polls #settings channel, routes to subscribers, restart_required filtering)
definitions/ # Per-namespace setting definitions (api, company, providers, memory, budget, security, coordination, observability, backup)
subscribers/ # Concrete settings subscribers (ProviderSettingsSubscriber — rebuilds ModelRouter on strategy change, MemorySettingsSubscriber — advisory logging for memory config)
security/ # SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume)
templates/ # Pre-built company templates, personality presets, and builder
tools/ # Tool registry, built-in tools (file_system/, git, sandbox/, code_runner), git clone SSRF prevention (git_url_validator), MCP bridge (mcp/), role-based access, approval tool (request_human_approval), tool factory (build_default_tools, build_default_tools_from_config)
3 changes: 2 additions & 1 deletion docs/design/operations.md
@@ -973,6 +973,7 @@ future CLI tool are thin clients that call the API -- they contain no business l
| `/api/v1/budget` | Spending, limits, projections |
| `/api/v1/approvals` | Pending human approvals queue |
| `/api/v1/analytics` | Performance metrics, dashboards |
| `/api/v1/settings` | Runtime-editable configuration (9 namespaces), schema discovery |
| `/api/v1/providers` | Model provider status, config |
| `/api/v1/ws` | WebSocket for real-time updates (ticket auth via `?ticket=`) |
| `POST /api/v1/auth/ws-ticket` | Exchange JWT for one-time WebSocket connection ticket |
@@ -1041,7 +1042,7 @@ and retry guidance.
- **Budget Panel**: Spending charts, per-agent breakdown (projections/alerts planned)
- **Meeting Logs**: Placeholder — coming soon
- **Artifact Browser**: Placeholder — coming soon
- **Settings**: Runtime-editable configuration via DB-backed settings persistence (9 namespaces: api, company, providers, memory, budget, security, coordination, observability, backup). 4-layer resolution (DB > env > YAML > code defaults), Fernet encryption for sensitive values, REST API (GET/PUT/DELETE + schema endpoints for dynamic UI generation), change notifications via message bus. `ConfigResolver` provides typed composed reads for API controllers (assembles full Pydantic config models from individually resolved settings, using `asyncio.TaskGroup` for parallel resolution)
- **Settings**: Runtime-editable configuration via DB-backed settings persistence (9 namespaces: api, company, providers, memory, budget, security, coordination, observability, backup). 4-layer resolution (DB > env > YAML > code defaults), Fernet encryption for sensitive values, REST API (GET/PUT/DELETE + schema endpoints for dynamic UI generation), change notifications via message bus. `ConfigResolver` provides typed composed reads for API controllers (assembles full Pydantic config models from individually resolved settings, using `asyncio.TaskGroup` for parallel resolution). **Hot-reload**: `SettingsChangeDispatcher` polls the `#settings` bus channel and routes change notifications to registered `SettingsSubscriber` implementations. Settings marked `restart_required=True` are filtered (logged as WARNING, not dispatched). Concrete subscribers: `ProviderSettingsSubscriber` (rebuilds `ModelRouter` on `routing_strategy` change via `AppState.swap_model_router`), `MemorySettingsSubscriber` (advisory logging for non-restart memory settings)
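
The hot-reload routing described in the Settings entry can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `SettingChange`, `RecordingSubscriber`, `watched`, `on_settings_changed` and `dispatch` are hypothetical names, and the real `SettingsSubscriber` protocol and `SettingsChangeDispatcher` may be shaped differently. It only shows the two behaviours named above: changes flagged `restart_required` are logged at WARNING and dropped, and all other changes are routed to the subscribers that watch them.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Protocol

logger = logging.getLogger("settings.sketch")


@dataclass(frozen=True)
class SettingChange:
    """Hypothetical change notification; the field names are assumptions."""

    namespace: str
    key: str
    restart_required: bool = False


class Subscriber(Protocol):
    """Assumed shape of a settings subscriber (not the real protocol)."""

    watched: frozenset[tuple[str, str]]

    async def on_settings_changed(self, namespace: str, key: str) -> None: ...


class RecordingSubscriber:
    """Toy subscriber that records every change it is notified about."""

    def __init__(self, watched: set[tuple[str, str]]) -> None:
        self.watched = frozenset(watched)
        self.seen: list[tuple[str, str]] = []

    async def on_settings_changed(self, namespace: str, key: str) -> None:
        self.seen.append((namespace, key))


async def dispatch(change: SettingChange, subscribers: list[Subscriber]) -> int:
    """Route one change to interested subscribers; return the number notified."""
    if change.restart_required:
        # Filtered: log a warning and do not dispatch.
        logger.warning(
            "restart required for %s/%s; not dispatched",
            change.namespace,
            change.key,
        )
        return 0
    notified = 0
    for sub in subscribers:
        if (change.namespace, change.key) not in sub.watched:
            continue
        try:
            await sub.on_settings_changed(change.namespace, change.key)
            notified += 1
        except Exception:
            # One failing subscriber should not block the others.
            logger.exception("subscriber failed")
    return notified


def demo() -> tuple[int, int, list[tuple[str, str]]]:
    """Dispatch one hot-reloadable change and one restart-required change."""
    provider = RecordingSubscriber({("providers", "routing_strategy")})
    hot = asyncio.run(
        dispatch(SettingChange("providers", "routing_strategy"), [provider])
    )
    cold = asyncio.run(
        dispatch(SettingChange("api", "port", restart_required=True), [provider])
    )
    return hot, cold, provider.seen
```

In this sketch a subscriber exception is logged and swallowed so that the remaining subscribers still run; whether the real dispatcher isolates failures the same way is an assumption.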

### Human Roles

73 changes: 69 additions & 4 deletions src/synthorg/api/app.py
@@ -60,6 +60,11 @@
from synthorg.persistence.config import PersistenceConfig, SQLiteConfig
from synthorg.persistence.factory import create_backend
from synthorg.persistence.protocol import PersistenceBackend # noqa: TC001
from synthorg.settings.dispatcher import SettingsChangeDispatcher
from synthorg.settings.subscribers import (
MemorySettingsSubscriber,
ProviderSettingsSubscriber,
)

if TYPE_CHECKING:
from collections.abc import Awaitable, Callable, Sequence
@@ -169,6 +174,7 @@ def _build_lifecycle( # noqa: PLR0913
persistence: PersistenceBackend | None,
message_bus: MessageBus | None,
bridge: MessageBusBridge | None,
settings_dispatcher: SettingsChangeDispatcher | None,
task_engine: TaskEngine | None,
meeting_scheduler: MeetingScheduler | None,
app_state: AppState,
@@ -202,6 +208,7 @@ async def on_startup() -> None:
persistence,
message_bus,
bridge,
settings_dispatcher,
task_engine,
meeting_scheduler,
app_state,
@@ -223,6 +230,7 @@ async def on_shutdown() -> None:
await _safe_shutdown(
task_engine,
meeting_scheduler,
settings_dispatcher,
bridge,
message_bus,
persistence,
@@ -258,6 +266,8 @@ async def _cleanup_on_failure( # noqa: PLR0913
started_bus: bool,
bridge: MessageBusBridge | None = None,
started_bridge: bool = False,
settings_dispatcher: SettingsChangeDispatcher | None = None,
started_settings_dispatcher: bool = False,
task_engine: TaskEngine | None = None,
started_task_engine: bool = False,
meeting_scheduler: MeetingScheduler | None = None,
@@ -276,6 +286,12 @@ async def _cleanup_on_failure( # noqa: PLR0913
API_APP_STARTUP,
"Cleanup: failed to stop task engine",
)
if started_settings_dispatcher and settings_dispatcher is not None:
await _try_stop(
settings_dispatcher.stop(),
API_APP_STARTUP,
"Cleanup: failed to stop settings dispatcher",
)
if started_bridge and bridge is not None:
await _try_stop(
bridge.stop(),
@@ -338,22 +354,24 @@ async def _init_persistence(
raise


async def _safe_startup( # noqa: PLR0913, C901
async def _safe_startup( # noqa: PLR0913, PLR0912, PLR0915, C901
persistence: PersistenceBackend | None,
message_bus: MessageBus | None,
bridge: MessageBusBridge | None,
settings_dispatcher: SettingsChangeDispatcher | None,
task_engine: TaskEngine | None,
meeting_scheduler: MeetingScheduler | None,
app_state: AppState,
) -> None:
"""Start all services: persistence, bus, bridge, task engine, scheduler.
"""Start all services: persistence, bus, bridge, dispatcher, task engine, scheduler.

Executes in order; on failure, cleans up already-started
components in reverse order before re-raising.
"""
started_bus = False
started_bridge = False
started_persistence = False
started_settings_dispatcher = False
started_task_engine = False
started_meeting_scheduler = False
try:
@@ -391,6 +409,16 @@ async def _safe_startup( # noqa: PLR0913, C901
)
raise
started_bridge = True
if settings_dispatcher is not None:
try:
await settings_dispatcher.start()
except Exception:
logger.exception(
API_APP_STARTUP,
error="Failed to start settings dispatcher",
)
raise
started_settings_dispatcher = True
if task_engine is not None:
try:
task_engine.start()
@@ -419,6 +447,8 @@ async def _safe_startup( # noqa: PLR0913, C901
started_bus=started_bus,
bridge=bridge,
started_bridge=started_bridge,
settings_dispatcher=settings_dispatcher,
started_settings_dispatcher=started_settings_dispatcher,
task_engine=task_engine,
started_task_engine=started_task_engine,
meeting_scheduler=meeting_scheduler,
@@ -427,14 +457,15 @@ async def _safe_startup( # noqa: PLR0913, C901
raise
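
The start-in-order, clean-up-in-reverse behaviour described in the `_safe_startup` docstring can be illustrated with a small sketch. It is not the code in this diff: it tracks started services in a list instead of the per-service `started_*` boolean flags used above, and `Service`, `start_all` and `demo` are hypothetical names.

```python
import asyncio


class Service:
    """Hypothetical service; `fail=True` makes start() raise."""

    def __init__(self, name: str, log: list[str], fail: bool = False) -> None:
        self.name = name
        self.log = log
        self.fail = fail

    async def start(self) -> None:
        if self.fail:
            raise RuntimeError(f"{self.name} failed to start")
        self.log.append(f"start:{self.name}")

    async def stop(self) -> None:
        self.log.append(f"stop:{self.name}")


async def start_all(services: list[Service]) -> None:
    """Start services in order; on failure stop the started ones in reverse."""
    started: list[Service] = []
    try:
        for svc in services:
            await svc.start()
            started.append(svc)
    except Exception:
        for svc in reversed(started):
            try:
                await svc.stop()
            except Exception:
                # Best-effort cleanup: keep stopping the remaining services.
                pass
        raise  # re-raise the original startup error


def demo() -> list[str]:
    """Run a startup where the third service fails and return the event log."""
    log: list[str] = []
    services = [
        Service("persistence", log),
        Service("bus", log),
        Service("dispatcher", log, fail=True),
    ]
    try:
        asyncio.run(start_all(services))
    except RuntimeError:
        log.append("reraised")
    return log
```

Only services that actually started are stopped, and the original exception still reaches the caller after cleanup.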


async def _safe_shutdown(
async def _safe_shutdown( # noqa: PLR0913
task_engine: TaskEngine | None,
meeting_scheduler: MeetingScheduler | None,
settings_dispatcher: SettingsChangeDispatcher | None,
bridge: MessageBusBridge | None,
message_bus: MessageBus | None,
persistence: PersistenceBackend | None,
) -> None:
"""Stop scheduler, task engine, bridge, message bus and disconnect persistence.
"""Stop scheduler, task engine, dispatcher, bridge, bus, persistence.

Mirrors ``_cleanup_on_failure`` reverse order: scheduler first (depends on
orchestrator), then task engine so it can drain queued mutations and
@@ -452,6 +483,12 @@ async def _safe_shutdown(
API_APP_SHUTDOWN,
"Failed to stop task engine",
)
if settings_dispatcher is not None:
await _try_stop(
settings_dispatcher.stop(),
API_APP_SHUTDOWN,
"Failed to stop settings dispatcher",
)
if bridge is not None:
await _try_stop(
bridge.stop(),
@@ -608,6 +645,12 @@ def create_app( # noqa: PLR0913
)

bridge = _build_bridge(message_bus, channels_plugin)
settings_dispatcher = _build_settings_dispatcher(
message_bus,
settings_service,
effective_config,
app_state,
)
plugins: list[ChannelsPlugin] = [channels_plugin]
middleware = _build_middleware(api_config)

@@ -621,6 +664,7 @@ def create_app( # noqa: PLR0913
persistence,
message_bus,
bridge,
settings_dispatcher,
task_engine,
meeting_scheduler,
app_state,
@@ -666,6 +710,27 @@ def _build_bridge(
return MessageBusBridge(message_bus, channels_plugin)


def _build_settings_dispatcher(
message_bus: MessageBus | None,
settings_service: SettingsService | None,
config: RootConfig,
app_state: AppState,
) -> SettingsChangeDispatcher | None:
"""Create settings change dispatcher if bus and settings are available."""
if message_bus is None or settings_service is None:
return None
provider_sub = ProviderSettingsSubscriber(
config=config,
app_state=app_state,
settings_service=settings_service,
)
memory_sub = MemorySettingsSubscriber()
return SettingsChangeDispatcher(
message_bus=message_bus,
subscribers=(provider_sub, memory_sub),
)


def _build_middleware(api_config: ApiConfig) -> list[Middleware]:
"""Build the middleware stack from configuration."""
rl = api_config.rate_limit
94 changes: 87 additions & 7 deletions src/synthorg/api/state.py
@@ -23,7 +23,10 @@
from synthorg.hr.registry import AgentRegistryService # noqa: TC001
from synthorg.observability import get_logger
from synthorg.observability.events.api import API_APP_STARTUP, API_SERVICE_UNAVAILABLE
from synthorg.observability.events.settings import SETTINGS_SERVICE_SWAPPED
from synthorg.persistence.protocol import PersistenceBackend # noqa: TC001
from synthorg.providers.registry import ProviderRegistry # noqa: TC001
from synthorg.providers.routing.router import ModelRouter # noqa: TC001
from synthorg.settings.resolver import ConfigResolver
from synthorg.settings.service import SettingsService # noqa: TC001

@@ -33,13 +36,11 @@
class AppState:
"""Typed application state container.

Service fields (``persistence``, ``message_bus``, ``cost_tracker``,
``auth_service``, ``task_engine``, ``coordinator``,
``agent_registry``) accept ``None`` at construction time for
dev/test mode. Property
accessors raise ``ServiceUnavailableError`` (HTTP 503) when the
service is not configured, producing a clear error instead of an
opaque ``AttributeError``.
All service fields accept ``None`` at construction time for
dev/test mode. Property accessors raise
``ServiceUnavailableError`` (HTTP 503) when the service is not
configured, producing a clear error instead of an opaque
``AttributeError``.

Attributes:
config: Root company configuration.
@@ -59,8 +60,10 @@ class AppState:
"_meeting_orchestrator",
"_meeting_scheduler",
"_message_bus",
"_model_router",
"_performance_tracker",
"_persistence",
"_provider_registry",
"_settings_service",
"_task_engine",
"_ticket_store",
@@ -86,6 +89,8 @@ def __init__( # noqa: PLR0913
meeting_orchestrator: MeetingOrchestrator | None = None,
meeting_scheduler: MeetingScheduler | None = None,
settings_service: SettingsService | None = None,
provider_registry: ProviderRegistry | None = None,
model_router: ModelRouter | None = None,
startup_time: float = 0.0,
) -> None:
self.config = config
@@ -102,6 +107,8 @@ def __init__( # noqa: PLR0913
self._meeting_orchestrator = meeting_orchestrator
self._meeting_scheduler = meeting_scheduler
self._settings_service = settings_service
self._provider_registry = provider_registry
self._model_router = model_router
self._config_resolver: ConfigResolver | None = (
ConfigResolver(settings_service=settings_service, config=config)
if settings_service is not None
@@ -279,3 +286,76 @@ def set_auth_service(self, service: AuthService) -> None:
logger.error(API_APP_STARTUP, error=msg)
raise RuntimeError(msg)
self._auth_service = service

# ── Swappable provider services (hot-reload) ─────────────────

@property
def has_provider_registry(self) -> bool:
"""Check whether the provider registry is configured."""
return self._provider_registry is not None

@property
def provider_registry(self) -> ProviderRegistry:
"""Return provider registry or raise 503."""
return self._require_service(
self._provider_registry,
"provider_registry",
)

def swap_provider_registry(self, registry: ProviderRegistry) -> None:
"""Replace the provider registry (hot-reload).

Unlike ``set_*`` methods, this does not guard against
replacement — it is designed for repeated hot-reload swaps.
Atomic under asyncio's cooperative scheduling — no ``await``
points, so no coroutine can observe a partially-updated state.

.. note::
Not yet wired to a subscriber — provided for the provider
runtime CRUD feature (issue #451).

Args:
registry: New provider registry instance.
"""
old_count = (
len(self._provider_registry) if self._provider_registry is not None else 0
)
self._provider_registry = registry
logger.info(
SETTINGS_SERVICE_SWAPPED,
service="provider_registry",
old_provider_count=old_count,
new_provider_count=len(registry),
)

@property
def has_model_router(self) -> bool:
"""Check whether the model router is configured."""
return self._model_router is not None

@property
def model_router(self) -> ModelRouter:
"""Return model router or raise 503."""
return self._require_service(self._model_router, "model_router")

def swap_model_router(self, router: ModelRouter) -> None:
"""Replace the model router (hot-reload).

Unlike ``set_*`` methods, this does not guard against
replacement — it is designed for repeated hot-reload swaps.
Atomic under asyncio's cooperative scheduling — no ``await``
points, so no coroutine can observe a partially-updated state.

Args:
router: New model router instance.
"""
old_strategy = (
self._model_router.strategy_name if self._model_router is not None else None
)
self._model_router = router
logger.info(
SETTINGS_SERVICE_SWAPPED,
service="model_router",
old_strategy=old_strategy,
new_strategy=router.strategy_name,
)
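
The atomicity claim in the `swap_*` docstrings (no `await` points, so no coroutine can observe a half-updated state) can be demonstrated with a toy example. This is a sketch, not the real `AppState`: `Router`, `State`, `swap_router` and the strategy names `"manual"` and `"cost_aware"` are hypothetical stand-ins.

```python
import asyncio


class Router:
    """Hypothetical stand-in for ModelRouter; carries only a strategy name."""

    def __init__(self, strategy_name: str) -> None:
        self.strategy_name = strategy_name


class State:
    """Toy state holder that mirrors the swap pattern."""

    def __init__(self, router: Router) -> None:
        self._router = router

    @property
    def router(self) -> Router:
        return self._router

    def swap_router(self, router: Router) -> None:
        # Plain attribute assignment with no await: the event loop cannot
        # switch to another coroutine in the middle of this method.
        self._router = router


async def reader(state: State, seen: list[str], rounds: int) -> None:
    """Read the current strategy repeatedly, yielding between reads."""
    for _ in range(rounds):
        seen.append(state.router.strategy_name)
        await asyncio.sleep(0)  # yield control to the event loop


async def writer(state: State) -> None:
    """Swap the router once, after yielding to the reader."""
    await asyncio.sleep(0)
    state.swap_router(Router("cost_aware"))


async def main() -> list[str]:
    state = State(Router("manual"))
    seen: list[str] = []
    await asyncio.gather(reader(state, seen, 5), writer(state))
    return seen


observed = asyncio.run(main())
```

Every value the reader records is either the old or the new strategy name. The guarantee covers coroutines on a single event loop only; code that touches the state from other threads would need its own synchronization.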
1 change: 1 addition & 0 deletions src/synthorg/communication/config.py
@@ -27,6 +27,7 @@
"#design",
"#incidents",
"#code-review",
"#settings",
"#watercooler",
)

15 changes: 15 additions & 0 deletions src/synthorg/observability/events/settings.py
@@ -16,3 +16,18 @@
SETTINGS_NOT_FOUND: Final[str] = "settings.not_found"
SETTINGS_REGISTRY_DUPLICATE: Final[str] = "settings.registry.duplicate"
SETTINGS_CONFIG_PATH_MISS: Final[str] = "settings.config_bridge.path_miss"

# ── Dispatcher & subscriber events ────────────────────────────────

SETTINGS_DISPATCHER_STARTED: Final[str] = "settings.dispatcher.started"
SETTINGS_DISPATCHER_STOPPED: Final[str] = "settings.dispatcher.stopped"
SETTINGS_DISPATCHER_POLL_ERROR: Final[str] = "settings.dispatcher.poll_error"
SETTINGS_DISPATCHER_CHANNEL_DEAD: Final[str] = "settings.dispatcher.channel_dead"
SETTINGS_SUBSCRIBER_NOTIFIED: Final[str] = "settings.subscriber.notified"
SETTINGS_SUBSCRIBER_ERROR: Final[str] = "settings.subscriber.error"
SETTINGS_SUBSCRIBER_RESTART_REQUIRED: Final[str] = (
"settings.subscriber.restart_required"
)
SETTINGS_SERVICE_SWAPPED: Final[str] = "settings.service.swapped"
SETTINGS_SERVICE_SWAP_FAILED: Final[str] = "settings.service.swap_failed"
SETTINGS_CHANNEL_CREATED: Final[str] = "settings.channel.created"