Merged
8 changes: 5 additions & 3 deletions CLAUDE.md
@@ -114,8 +114,10 @@ curl http://localhost:3000/api/v1/health # backend (via web proxy)

```text
src/synthorg/
api/ # Litestar REST + WebSocket API (controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), RFC 9457 structured errors (ErrorCategory, ErrorCode, ErrorDetail, ProblemDetail, CATEGORY_TITLES, category_title, category_type_uri, content negotiation)), AppState hot-reload slots (provider_registry, model_router with swap methods, provider_management), settings dispatcher lifecycle
api/ # Litestar REST + WebSocket API (controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors (ErrorCategory, ErrorCode, ErrorDetail, ProblemDetail, CATEGORY_TITLES, category_title, category_type_uri, content negotiation)), AppState hot-reload slots (provider_registry, model_router with swap methods, provider_management), settings dispatcher lifecycle
auth/ # Authentication subpackage (controller, service, middleware, JWT + API key + WS ticket store, models, config)
backup/ # Backup and restore -- scheduled/manual/lifecycle backups of persistence DB, agent memory, and company config. BackupService orchestrator, BackupScheduler (periodic asyncio task), RetentionManager (count + age pruning), tar.gz compression, SHA-256 checksums, manifest tracking, validated restore with atomic rollback and safety backup
handlers/ # ComponentHandler protocol + concrete handlers: PersistenceComponentHandler (SQLite VACUUM INTO), MemoryComponentHandler (copytree), ConfigComponentHandler (copy2)
budget/ # Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
cli/ # Python CLI module (superseded by top-level cli/ Go binary)
communication/ # Message bus, dispatcher, messenger, channels, delegation, loop prevention, conflict resolution
@@ -130,7 +132,7 @@ src/synthorg/
providers/ # LLM provider abstraction (LiteLLM adapter), auth types (AuthType enum: api_key/oauth/custom_header/none), presets (ProviderPreset, PROVIDER_PRESETS for Ollama/LM Studio/OpenRouter/vLLM), runtime CRUD (management/ -- ProviderManagementService, asyncio.Lock-serialized create/update/delete/test, hot-reload of ProviderRegistry + ModelRouter via AppState swap)
settings/ # Runtime-editable settings persistence (DB > env > YAML > code defaults), typed definitions (9 namespaces, including JSON type for structural data), Fernet encryption for sensitive values, config bridge (JSON serialization for Pydantic models/collections), ConfigResolver (typed scalar + structural data accessors for controllers — get_agents, get_departments, get_provider_configs with validation fallbacks to YAML), validation, registry, change notifications via message bus, SettingsSubscriber protocol (subscriber.py), SettingsChangeDispatcher (dispatcher.py, polls #settings channel, routes to subscribers, restart_required filtering)
definitions/ # Per-namespace setting definitions (api, company, providers, memory, budget, security, coordination, observability, backup)
subscribers/ # Concrete settings subscribers (ProviderSettingsSubscriber rebuilds ModelRouter on strategy change, MemorySettingsSubscriber advisory logging for memory config)
subscribers/ # Concrete settings subscribers (ProviderSettingsSubscriber -- rebuilds ModelRouter on strategy change, MemorySettingsSubscriber -- advisory logging for memory config, BackupSettingsSubscriber -- reconfigures BackupScheduler interval and RetentionManager policies on settings change)
security/ # SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume)
templates/ # Pre-built company templates, personality presets, and builder
tools/ # Tool registry, built-in tools (file_system/, git, sandbox/, code_runner), git clone SSRF prevention (git_url_validator), MCP bridge (mcp/), role-based access, approval tool (request_human_approval), tool factory (build_default_tools, build_default_tools_from_config), sandbox factory (sandbox/factory.py: build_sandbox_backends, resolve_sandbox_for_category, cleanup_sandbox_backends -- per-category backend selection from SandboxingConfig)
@@ -197,7 +199,7 @@ site/ # Astro landing page (synthorg.io)
- **Every module** with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`
- **Never** use `import logging` / `logging.getLogger()` / `print()` in application code
- **Variable name**: always `logger` (not `_logger`, not `log`)
- **Event names**: always use constants from the domain-specific module under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`, `GIT_COMMAND_START` from `events.git`, `CONTEXT_BUDGET_FILL_UPDATED` from `events.context_budget`). Each domain has its own module see `src/synthorg/observability/events/` for the full inventory of constants. Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
- **Event names**: always use constants from the domain-specific module under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`, `GIT_COMMAND_START` from `events.git`, `CONTEXT_BUDGET_FILL_UPDATED` from `events.context_budget`, `BACKUP_STARTED` from `events.backup`). Each domain has its own module -- see `src/synthorg/observability/events/` for the full inventory of constants. Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Contributor

medium

Per the style guide, the event names should use double dashes instead of single dashes. Please update BACKUP_STARTED to BACKUP--STARTED.

Suggested change
- **Event names**: always use constants from the domain-specific module under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`, `GIT_COMMAND_START` from `events.git`, `CONTEXT_BUDGET_FILL_UPDATED` from `events.context_budget`, `BACKUP_STARTED` from `events.backup`). Each domain has its own module -- see `src/synthorg/observability/events/` for the full inventory of constants. Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
+ **Event names**: always use constants from the domain-specific module under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`, `GIT_COMMAND_START` from `events.git`, `CONTEXT_BUDGET_FILL_UPDATED` from `events.context_budget`, `BACKUP--STARTED` from `events.backup`). Each domain has its own module -- see `src/synthorg/observability/events/` for the full inventory of constants. Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

- **Structured kwargs**: always `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`
- **All error paths** must log at WARNING or ERROR with context before raising
- **All state transitions** must log at INFO
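
The logging rules above can be illustrated with a small sketch. It is not the project's code: `RecordingLogger` is a hypothetical stand-in so the example runs without the `synthorg` package (in the repository the logger would come from `get_logger(__name__)`), `BACKUP_FAILED` is an invented constant, and the string values assigned to both constants are assumptions, since the diff only shows the name `BACKUP_STARTED`.

```python
from typing import Any

# In the repository these would be imported, e.g.
#   from synthorg.observability import get_logger
#   from synthorg.observability.events.backup import BACKUP_STARTED
# The values below are assumptions; BACKUP_FAILED is a hypothetical constant.
BACKUP_STARTED = "backup.started"
BACKUP_FAILED = "backup.failed"


class RecordingLogger:
    """Hypothetical stand-in for a structured logger: stores (level, event, kwargs)."""

    def __init__(self) -> None:
        self.records: list[tuple[str, str, dict[str, Any]]] = []

    def info(self, event: str, **kwargs: Any) -> None:
        self.records.append(("info", event, kwargs))

    def warning(self, event: str, **kwargs: Any) -> None:
        self.records.append(("warning", event, kwargs))


logger = RecordingLogger()  # the variable is always named `logger`


def start_backup(backup_id: str, components: list[str]) -> None:
    if not components:
        # Error path: log at WARNING with context before raising.
        logger.warning(BACKUP_FAILED, backup_id=backup_id, reason="no components")
        raise ValueError("at least one component is required")
    # State transition: log at INFO using an event constant and structured kwargs,
    # never a formatted message string.
    logger.info(BACKUP_STARTED, backup_id=backup_id, components=components)
```

Calling `start_backup("0123456789ab", ["config"])` records one INFO event whose context is carried entirely in keyword arguments.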
55 changes: 54 additions & 1 deletion docs/design/operations.md
@@ -1067,7 +1067,7 @@ and retry guidance.
- *Provider management*: Add/edit/delete providers, connection test, preset-based creation -- integrated as a tab alongside company config and user settings.
- *DB-backed persistence*: 9 namespaces (api, company, providers, memory, budget, security, coordination, observability, backup). Setting types: `STRING`, `INTEGER`, `FLOAT`, `BOOLEAN`, `ENUM`, `JSON`. 4-layer resolution: DB > env > YAML > code defaults. Fernet encryption for `sensitive` values. REST API (`GET`/`PUT`/`DELETE` + schema endpoints for dynamic UI generation), change notifications via message bus.
- *`ConfigResolver`*: Typed scalar accessors assemble full Pydantic config models from individually resolved settings (using `asyncio.TaskGroup` for parallel resolution). Structural data accessors (`get_agents`, `get_departments`, `get_provider_configs`) resolve JSON-typed settings with Pydantic schema validation and graceful fallback to `RootConfig` defaults on invalid data.
- *Hot-reload*: `SettingsChangeDispatcher` polls the `#settings` bus channel and routes change notifications to registered `SettingsSubscriber` implementations. Settings marked `restart_required=True` are filtered (logged as WARNING, not dispatched). Concrete subscribers: `ProviderSettingsSubscriber` (rebuilds `ModelRouter` on `routing_strategy` change via `AppState.swap_model_router`), `MemorySettingsSubscriber` (advisory logging for non-restart memory settings).
- *Hot-reload*: `SettingsChangeDispatcher` polls the `#settings` bus channel and routes change notifications to registered `SettingsSubscriber` implementations. Settings marked `restart_required=True` are filtered (logged as WARNING, not dispatched). Concrete subscribers: `ProviderSettingsSubscriber` (rebuilds `ModelRouter` on `routing_strategy` change via `AppState.swap_model_router`), `MemorySettingsSubscriber` (advisory logging for non-restart memory settings), `BackupSettingsSubscriber` (toggles `BackupScheduler` on `enabled` change, reschedules interval on `schedule_hours` change).

### Human Roles

@@ -1078,3 +1078,56 @@ and retry guidance.
| **Manager** | Department-level authority | Manages one team/department directly |
| **Observer** | Read-only | Watch the company operate, no intervention |
| **Pair Programmer** | Direct collaboration with one agent | Work alongside a specific agent in real-time |

## Backup and Restore

The backup system protects persistent data -- persistence DB, agent memory, and company configuration -- through automated and manual backups with configurable retention policies and validated restore.

### Architecture

- **BackupService**: Central orchestrator coordinating component handlers, manifests, compression, and scheduling
- **ComponentHandler protocol**: Pluggable interface for backing up and restoring individual data components
- `PersistenceComponentHandler`: SQLite `VACUUM INTO` for consistent point-in-time copies
- `MemoryComponentHandler`: `shutil.copytree` with `symlinks=True` for agent memory data directory
- `ConfigComponentHandler`: `shutil.copy2` for company YAML configuration
Comment on lines +1091 to +1092
Contributor

🧹 Nitpick | 🔵 Trivial

Rephrase the noun-heavy phrase for readability.

The phrase “agent memory data directory” is dense; a small rewrite improves scanability in docs.

✍️ Suggested wording
-  - `MemoryComponentHandler`: `shutil.copytree` with `symlinks=True` for agent memory data directory
+  - `MemoryComponentHandler`: `shutil.copytree` with `symlinks=True` for the agent-memory directory
🧰 Tools
🪛 LanguageTool

[style] ~1091-~1091: Using four (or more) nouns in a row may decrease readability.
Context: ...util.copytree with symlinks=True for agent memory data directory - ConfigComponentHandler: shutil.c...

(FOUR_NN)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/operations.md` around lines 1091 - 1092, Reword the dense phrase
"agent memory data directory" in the docs entry for MemoryComponentHandler to
make it more readable and scannable; for example, replace it with "the directory
that stores an agent's memory data" (or "agent's memory directory") in the list
item referencing MemoryComponentHandler (alongside the existing
ConfigComponentHandler entry) so the line reads clearly while preserving the
mention of shutil.copytree and symlinks=True.

- **BackupScheduler**: Background asyncio task for periodic backups with interruptible sleep via `asyncio.Event`
- **RetentionManager**: Prunes old backups by count and age; never prunes the most recent backup or `pre_migration`-tagged backups
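
The three handlers listed above could look roughly like the following sketch. Only the copy mechanisms (`VACUUM INTO`, `shutil.copytree` with `symlinks=True`, `shutil.copy2`) come from the text; the protocol's method name `backup`, the constructor arguments, and the destination layout are assumptions made for illustration.

```python
import shutil
import sqlite3
from pathlib import Path
from typing import Protocol


class ComponentHandler(Protocol):
    """Assumed shape of the pluggable interface; the real protocol may differ."""

    def backup(self, target_dir: Path) -> Path: ...


class PersistenceComponentHandler:
    def __init__(self, db_path: Path) -> None:
        self.db_path = db_path

    def backup(self, target_dir: Path) -> Path:
        dest = target_dir / self.db_path.name
        # isolation_level=None keeps the connection in autocommit mode,
        # because VACUUM cannot run inside an open transaction.
        conn = sqlite3.connect(self.db_path, isolation_level=None)
        try:
            # VACUUM INTO writes a consistent copy to a new file
            # (SQLite 3.27+; the destination must not already exist).
            conn.execute("VACUUM INTO ?", (str(dest),))
        finally:
            conn.close()
        return dest


class MemoryComponentHandler:
    def __init__(self, memory_dir: Path) -> None:
        self.memory_dir = memory_dir

    def backup(self, target_dir: Path) -> Path:
        dest = target_dir / self.memory_dir.name
        # symlinks=True copies symbolic links as links instead of following them.
        shutil.copytree(self.memory_dir, dest, symlinks=True)
        return dest


class ConfigComponentHandler:
    def __init__(self, config_path: Path) -> None:
        self.config_path = config_path

    def backup(self, target_dir: Path) -> Path:
        # copy2 preserves file metadata such as modification time where possible.
        return Path(shutil.copy2(self.config_path, target_dir / self.config_path.name))
```

Because each class satisfies the same structural protocol, an orchestrator can iterate over a list of handlers without knowing which component each one covers.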

### Backup Triggers

| Trigger | When | Behavior |
|---------|------|----------|
| Scheduled | Configurable interval (default: 6h) | Background, non-blocking |
| Pre-shutdown | `Company.shutdown()` / SIGTERM | Synchronous, skips compression |
| Post-startup | After config load, before accepting tasks | Snapshot as recovery point |
| Manual | `POST /api/v1/admin/backups` | On-demand, returns manifest |
| Pre-migration | Before restore operations | Safety net, automatic |
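
The trigger recorded with each backup matters for retention: as described under Architecture, `RetentionManager` prunes by count and age but never removes the most recent backup or `pre_migration`-tagged backups. A minimal sketch of such a selection rule follows. The `BackupInfo` record, the function name, the trigger strings other than `pre_migration`, and the choice to count protected backups toward `max_count` are all assumptions, not the project's actual behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class BackupInfo:
    """Hypothetical manifest summary used only for this sketch."""

    backup_id: str
    created_at: datetime
    trigger: str  # e.g. "scheduled", "manual", "pre_migration"


def select_prunable(
    backups: list[BackupInfo],
    *,
    max_count: int,
    max_age: timedelta,
    now: datetime,
) -> list[str]:
    """Return IDs that exceed the count or age limit, newest first."""
    ordered = sorted(backups, key=lambda b: b.created_at, reverse=True)
    prunable: list[str] = []
    for index, backup in enumerate(ordered):
        # Never prune the most recent backup or pre_migration-tagged backups.
        if index == 0 or backup.trigger == "pre_migration":
            continue
        over_count = index >= max_count
        over_age = now - backup.created_at > max_age
        if over_count or over_age:
            prunable.append(backup.backup_id)
    return prunable
```

Returning IDs rather than deleting inside the function keeps the policy separate from filesystem work, which makes the rule easy to test.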

### Restore Flow

1. Validate `backup_id` format (12-char hex)
2. Load and verify manifest (schema version compatibility check)
3. Re-compute and verify SHA-256 checksum against manifest
4. Validate component sources (handler-specific checks)
5. Create safety backup (pre-migration trigger)
6. Atomic restore per component (`.bak` rollback on failure)
7. Return `RestoreResponse` with safety backup ID
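
Steps 1, 3, and 6 of the flow above can be sketched with the standard library. This is an illustration under assumptions, not the project's implementation: the function names are invented, lowercase hex is assumed for `backup_id`, and the rollback is shown for a single file rather than a whole component.

```python
import hashlib
import hmac
import os
import re
import shutil
from pathlib import Path

# Assumption: IDs are lowercase hex; the text only says "12-char hex".
_BACKUP_ID_RE = re.compile(r"[0-9a-f]{12}")


def is_valid_backup_id(backup_id: str) -> bool:
    """Step 1: reject anything that is not exactly 12 hex characters."""
    return _BACKUP_ID_RE.fullmatch(backup_id) is not None


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(archive: Path, expected_hex: str) -> bool:
    """Step 3: re-compute the digest and compare it with the manifest value."""
    return hmac.compare_digest(sha256_of_file(archive), expected_hex.lower())


def restore_file_with_rollback(source: Path, target: Path) -> None:
    """Step 6 for one file: move the current file to .bak, restore, roll back on error."""
    bak = target.with_name(target.name + ".bak")
    had_original = target.exists()
    if had_original:
        os.replace(target, bak)
    try:
        shutil.copy2(source, target)
    except Exception:
        # Remove any partial copy, then put the original back.
        if target.exists():
            target.unlink()
        if had_original:
            os.replace(bak, target)
        raise
    if had_original:
        bak.unlink()
```

Validating the ID before touching the filesystem also guards against path traversal, since a 12-character hex string cannot contain separators.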

### Configuration

Backup settings live in the `backup` namespace and are editable at runtime via `BackupSettingsSubscriber`:

- `enabled`: Toggle scheduler start/stop
- `schedule_hours`: Reschedule interval (1--168 hours)
- `compression`, `on_shutdown`, `on_startup`: Advisory (read at use time)
- `path`: Requires restart (not dispatched)
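
How `enabled` and `schedule_hours` changes might reach a running scheduler can be sketched as below. The class and method names are hypothetical; only the ideas come from the text: a background asyncio task, a sleep that an `asyncio.Event` can interrupt, and an interval limited to 1 to 168 hours.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Optional


class IntervalScheduler:
    """Hypothetical sketch of a periodic task with an interruptible sleep."""

    def __init__(
        self, run_backup: Callable[[], Awaitable[None]], interval_seconds: float
    ) -> None:
        self._run_backup = run_backup
        self._interval = interval_seconds
        self._wake = asyncio.Event()
        self._stopping = False
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        """Corresponds to `enabled` switching to true."""
        if self._task is None or self._task.done():
            self._stopping = False
            self._wake.clear()
            self._task = asyncio.create_task(self._loop())

    async def stop(self) -> None:
        """Corresponds to `enabled` switching to false."""
        self._stopping = True
        self._wake.set()  # interrupt the sleep so the loop exits promptly
        if self._task is not None:
            await self._task
            self._task = None

    def reschedule(self, interval_seconds: float) -> None:
        self._interval = interval_seconds
        self._wake.set()  # restart the wait with the new interval

    def reschedule_hours(self, hours: int) -> None:
        """Corresponds to a `schedule_hours` change (valid range 1 to 168)."""
        if not 1 <= hours <= 168:
            raise ValueError("schedule_hours must be between 1 and 168")
        self.reschedule(hours * 3600)

    async def _loop(self) -> None:
        while not self._stopping:
            try:
                # Sleep for one interval unless the event is set first.
                await asyncio.wait_for(self._wake.wait(), timeout=self._interval)
            except asyncio.TimeoutError:
                await self._run_backup()  # the interval elapsed undisturbed
            else:
                self._wake.clear()  # woken by stop() or reschedule()
```

Waiting on the event with a timeout, instead of calling `asyncio.sleep`, is what lets a settings change take effect immediately rather than after the old interval expires.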

### REST API

| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/api/v1/admin/backups` | Trigger manual backup |
| `GET` | `/api/v1/admin/backups` | List available backups |
| `GET` | `/api/v1/admin/backups/{id}` | Get backup details |
| `DELETE` | `/api/v1/admin/backups/{id}` | Delete a specific backup |
| `POST` | `/api/v1/admin/backups/restore` | Restore from backup (requires `confirm=true`) |