2 changes: 1 addition & 1 deletion README.md
@@ -19,7 +19,7 @@ SynthOrg is a Python framework for building **synthetic organizations**, autonom

Define your company in YAML. Agents collaborate through a message bus, follow workflows (Kanban, Agile sprints, or custom), track costs against budgets, and produce real artifacts. The framework is provider-agnostic (<!--RS:providers_via_litellm-->2700+<!--/RS--> LLMs via [LiteLLM](https://github.com/BerriAI/litellm)), configuration-driven ([Pydantic v2](https://docs.pydantic.dev/) models), and designed for the full autonomy spectrum, from human approval on every action to fully autonomous operation.

> **Early access.** Core subsystems are built and tested (<!--RS:tests-->29,000+<!--/RS--> tests, 80%+ coverage). APIs may change between releases. See the [roadmap](https://synthorg.io/docs/roadmap/) for what's next.
> **Early access.** Core subsystems are built and tested (<!--RS:tests-->30,000+<!--/RS--> tests, 80%+ coverage). APIs may change between releases. See the [roadmap](https://synthorg.io/docs/roadmap/) for what's next.

## Why SynthOrg?

4 changes: 2 additions & 2 deletions data/competitors.yaml
@@ -113,8 +113,8 @@ competitors:
memory: {support: full, note: "Pluggable architecture with 3 backends (Mem0, composite, in-memory); 5 memory types (working, episodic, semantic, procedural, social); hybrid retrieval (dense + BM25 sparse, linear-weighted fusion)"}
tool_use: {support: full, note: "MCP protocol, sandbox isolation, invocation tracking; 14 ToolCategory values (file_system, code_execution, version_control, web, database, terminal, design, communication, analytics, deployment, memory, ontology, mcp, other)"}
human_in_loop: {support: full, note: "Approval gates, review workflows, 4 autonomy tiers (FULL, SEMI, SUPERVISED, LOCKED), two-stage safety classifier, LLM fallback evaluator"}
budget_tracking: {support: partial, note: "Per-token, per-agent, hierarchical cascades, CFO optimization, and automatic model downgrade shipped; risk-unit action budgets still in progress"}
security_model: {support: partial, note: "Rule engine + LLM evaluator, progressive trust, audit trail, output scanning, and hallucination detection shipped; self-healing SSRF still in progress"}
budget_tracking: {support: full, note: "Per-token, per-agent, hierarchical cascades, CFO optimization, automatic model downgrade, and risk-unit action budgets"}
security_model: {support: full, note: "Rule engine + LLM evaluator, progressive trust, audit trail, output scanning, hallucination detection, and self-healing SSRF validation"}
observability: {support: full, note: "Structured logging, correlation tracking, log shipping, redaction, Prometheus metrics, OTLP"}
web_dashboard: {support: full, note: "React 19 dashboard with org chart, tasks, budgets, workflow editor, setup wizard, WebSocket + SSE resilience"}
cli: {support: partial, note: "Go binary for container management and verification; not used for agent orchestration"}
14 changes: 7 additions & 7 deletions data/runtime_stats.yaml
@@ -1,18 +1,18 @@
schema_version: 1
last_generated_utc: '2026-05-13T19:13:08Z'
generator_revision: 3e69976a8
last_generated_utc: '2026-05-15T15:49:25Z'
generator_revision: d29621464
stats:
tests:
raw: 30090
raw: 30536
rounded: 30000
display: 30,000+
mem0_stars:
raw: 55598
raw: 55792
rounded: 55000
display: 55k+
providers_curated:
raw: 19
display: '19'
raw: 20
display: '20'
providers_via_litellm:
raw: 2708
display: 2700+
@@ -25,7 +25,7 @@ stats:
sources:
tests: uv run python -m pytest --collect-only -q
mem0_stars: gh api repos/mem0ai/mem0 --jq .stargazers_count
providers_curated: synthorg.providers.presets.list_presets
providers_curated: synthorg.providers.presets.list_featured_presets
providers_via_litellm: len(litellm.model_cost)
subagents: glob .claude/agents/*.md
convention_gates: glob scripts/check_*.py
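
The two `raw` / `rounded` pairs visible in this file (tests and `mem0_stars`) are consistent with flooring to two significant digits. The generator's actual rule is not shown in this diff, so the sketch below is an assumption inferred from those values, and the function name is hypothetical:

```python
def floor_to_two_significant_digits(raw: int) -> int:
    """Floor a non-negative integer to its two leading digits.

    Assumed rule, inferred from the raw/rounded pairs in runtime_stats.yaml;
    the real generator may round differently.
    """
    if raw < 0:
        raise ValueError("raw must be non-negative")
    digits = len(str(raw))
    if digits <= 2:
        return raw  # nothing to round away, e.g. providers_curated = 20
    magnitude = 10 ** (digits - 2)
    return (raw // magnitude) * magnitude


# (raw, rounded) pairs as listed in data/runtime_stats.yaml after this change.
for raw, rounded in [(30536, 30000), (55792, 55000)]:
    assert floor_to_two_significant_digits(raw) == rounded
```

Under this rule the `tests` display value stays at `30,000+` until the raw count reaches 31,000, which would explain why the README badge changes less often than the raw number.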
2 changes: 1 addition & 1 deletion docs/design/a2a-protocol.md
@@ -103,7 +103,7 @@ Four enforcement strategies are defined behind `QuadraticEnforcementStrategy`:
| `hard_block` | Reject new connections when `max_agent_connections` exceeded |
| `disabled` | No detection or enforcement |

Only `alert_only` ships today; the other three are defined in config but not yet implemented. See [Security -> Quadratic Communication Enforcement](security.md#quadratic-communication-enforcement).
`alert_only` is the shipped enforcement strategy. `soft_throttle`, `hard_block`, and `disabled` are defined in config; the dispatch shape is in place but the per-mode behaviour is not yet wired into `MessageBus.publish`. See [Security -> Quadratic Communication Enforcement](security.md#quadratic-communication-enforcement).

## Configuration Summary

28 changes: 21 additions & 7 deletions docs/design/hr-lifecycle.md
@@ -293,16 +293,25 @@ human decision.

!!! info "Design decisions ([Decision Log](../architecture/decisions.md) D9, D10)"

Each decision below names the protocol that ships today and the
concrete `Initial strategy` that the default factory wires. "Initial
strategy" is the shipped default, not aspirational scaffolding;
operators replace it by registering an alternative strategy on the
relevant factory.

- **D9: Task Reassignment.** Pluggable `TaskReassignmentStrategy` protocol. Initial
strategy: queue-return; tasks return to unassigned queue, existing `TaskRoutingService`
re-routes with priority boost for reassigned tasks. Future strategies:
same-department/lowest-load, manager-decides (LLM), HR agent decides.
strategy: queue-return (concrete: `QueueReturnStrategy` in
`src/synthorg/hr/queue_return_strategy.py`); tasks return to unassigned queue,
existing `TaskRoutingService` re-routes with priority boost for reassigned tasks.
Future strategies on the backlog: same-department / lowest-load, manager-decides
(LLM), HR agent decides.
- **D10: Memory Archival.** Pluggable `MemoryArchivalStrategy` protocol. Initial
strategy: full snapshot, read-only. Pipeline: retrieve all memories, archive to
`ArchivalStore`, selectively promote semantic+procedural memories to
strategy: full snapshot, read-only (concrete: `FullSnapshotStrategy` in
`src/synthorg/hr/full_snapshot_strategy.py`). Pipeline: retrieve all memories,
archive to `ArchivalStore`, selectively promote semantic+procedural memories to
`OrgMemoryBackend` (rule-based), clean hot store, mark agent TERMINATED. Rehiring
restores archived memories into a new `AgentIdentity`. Future strategies: selective
discard, full-accessible.
restores archived memories into a new `AgentIdentity`. Future strategies on the
backlog: selective discard, full-accessible.
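
The pluggable-strategy shape described in D9 can be sketched as follows. Only the names `TaskReassignmentStrategy` and `QueueReturnStrategy` come from the text; the `Task` fields, the `reassign` method, the `priority_boost` parameter, and the `offboard` helper are hypothetical stand-ins, not the project's actual signatures:

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class Task:
    """Hypothetical task shape, for illustration only."""
    task_id: str
    assignee: Optional[str]
    priority: int = 0


class TaskReassignmentStrategy(Protocol):
    """Protocol sketch; the method name is an assumption."""

    def reassign(self, tasks: List[Task]) -> List[Task]: ...


class QueueReturnStrategy:
    """Queue-return: unassign each task and boost its priority for re-routing."""

    def __init__(self, priority_boost: int = 1) -> None:
        self.priority_boost = priority_boost

    def reassign(self, tasks: List[Task]) -> List[Task]:
        return [
            Task(t.task_id, None, t.priority + self.priority_boost)
            for t in tasks
        ]


def offboard(agent_id: str, tasks: List[Task],
             strategy: TaskReassignmentStrategy) -> List[Task]:
    """Hand every task owned by the departing agent to the strategy."""
    owned = [t for t in tasks if t.assignee == agent_id]
    return strategy.reassign(owned)


tasks = [Task("t1", "alice", 2), Task("t2", "bob", 1)]
returned = offboard("alice", tasks, QueueReturnStrategy())
```

Because `offboard` depends only on the protocol, a same-department or manager-decides strategy could be substituted without touching the caller, which is the substitution path the decision log describes.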

## Performance Tracking

@@ -486,6 +495,11 @@ Agents can move between seniority levels based on performance:

!!! info "Design decisions ([Decision Log](../architecture/decisions.md) D13, D14, D15)"

Each decision below names the protocol that ships today and the
concrete `Initial strategy` that the default factory wires. "Initial
strategy" is the shipped default; operators substitute via the
factory.

- **D13: Promotion Criteria.** Pluggable `PromotionCriteriaStrategy` protocol. Initial
strategy: configurable threshold gates. `ThresholdEvaluator` with
`min_criteria_met: int` (N of M) + `required_criteria: list[str]`. Setting `min=total`
10 changes: 5 additions & 5 deletions docs/design/tools.md
@@ -289,11 +289,11 @@ placeholder factories:
through `safe_error_description(exc)` (SEC-1) and `domain_code` falls back
to `exc.domain_code` when present.
- `not_supported(tool_name, reason)`: stable `status="error"` /
`domain_code="not_supported"` envelope for tools whose service facade is
not yet wired. Emits the `MCP_HANDLER_NOT_IMPLEMENTED` WARNING event so
operators can alert on unwired tools. After META-MCP-2 every tool is
wired, so this path only fires for tools registered after PR1 that have
not yet been given a concrete handler.
`domain_code="not_supported"` envelope for tools whose service facade
is not wired. Emits the `MCP_HANDLER_NOT_IMPLEMENTED` WARNING event so
operators can alert on unwired tools. Every tool registered today is
wired; this path fires only for newly registered tools that have not
been given a concrete handler.
- `service_fallback(tool_name, reason)`: helper retained in `common.py`
for future surgical use. Emits `MCP_HANDLER_SERVICE_FALLBACK`;
META-MCP-2 removed every call site and the integration sweep at
31 changes: 31 additions & 0 deletions docs/design/verification-quality.md
@@ -134,6 +134,37 @@ intake strategy contracts.

---

## Order of Operations

The four quality and approval surfaces (verification stage, review
pipeline, mid-execution `AUTH_REQUIRED` park, post-completion
`IN_REVIEW` gate) operate at distinct points in the task lifecycle.

| Phase | Surface | Trigger | Task status during | Exit | Where documented |
|-------|---------|---------|--------------------|------|------------------|
| Mid-execution | `AUTH_REQUIRED` park | Agent calls a tool that requires approval at runtime (e.g. `deploy`, `db:admin`). Driven by `ApprovalGate` middleware. | `AUTH_REQUIRED` | Approved: returns to `ASSIGNED`. Denied / timeout: `CANCELLED`. | [Security: Approval Workflow](security.md#approval-workflow) |
| Agent done | Verification stage | Workflow blueprint has a `VERIFICATION` control-flow node. Runs as a separate evaluator agent with its own context. | `IN_PROGRESS` (engine-internal) | Pass: continue to next node. Fail: regenerate. Refer: hand to human via `VERIFICATION_REFER` edge. | This page, [Workflow Node and Edge Types](#workflow-node-and-edge-types) |
| Agent done | Review pipeline | Task transitions `IN_PROGRESS` to `IN_REVIEW`. Chain of `ReviewStage` instances runs. | `IN_REVIEW` | First-failing stage returns the task to `IN_PROGRESS`; all-pass moves to `COMPLETED`. | This page, [Review Pipeline](#review-pipeline) |

Key invariants:

- `AUTH_REQUIRED` is the mid-execution park reason and uses the
`ApprovalGate` middleware in the agent harness. The review pipeline
is the post-completion quality gate and uses `ReviewGateService`.
The two are independent: a single task can encounter both (e.g.
pause for deploy approval mid-task, then enter `IN_REVIEW` once the
agent finishes).
- The verification stage runs BEFORE the review pipeline when both
are configured for the same workflow. Verification is a workflow
blueprint construct (a node in the graph); the review pipeline
fires on the `IN_PROGRESS` to `IN_REVIEW` transition that happens
after the workflow's last node completes.
- The review pipeline does not mint new `TaskStatus` values; the
task stays at `IN_REVIEW` throughout, with stage progress in
metadata.
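
The exits in the table can be sketched as two small functions. The status names come from the table; the enum values, the `(name, check)` stage shape, and both function names are assumptions made for illustration, not the `ApprovalGate` or `ReviewGateService` API:

```python
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class TaskStatus(Enum):
    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    AUTH_REQUIRED = "auth_required"
    IN_REVIEW = "in_review"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


def resolve_auth_park(approved: bool) -> TaskStatus:
    """Exit an AUTH_REQUIRED park: approval resumes, denial or timeout cancels."""
    return TaskStatus.ASSIGNED if approved else TaskStatus.CANCELLED


# Hypothetical stage shape: (name, check), where check returns True on pass.
Stage = Tuple[str, Callable[[str], bool]]


def run_review_pipeline(
    stages: Sequence[Stage], artifact: str
) -> Tuple[TaskStatus, Optional[str]]:
    """Run stages in order while the task sits at IN_REVIEW.

    The first failing stage sends the task back to IN_PROGRESS;
    if every stage passes, the task moves to COMPLETED.
    """
    for name, check in stages:
        if not check(artifact):
            return TaskStatus.IN_PROGRESS, name
    return TaskStatus.COMPLETED, None


stages = [
    ("non_empty", lambda text: bool(text.strip())),
    ("no_todo", lambda text: "TODO" not in text),
]
```

Note that the pipeline function never returns an intermediate status: stage progress is reported through the second tuple element, mirroring the invariant that the task stays at `IN_REVIEW` until the chain finishes.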

---

## See Also

- [Task & Workflow Engine](engine.md): task dispatch, state coordination
6 changes: 3 additions & 3 deletions docs/guides/agent-management.md
@@ -93,11 +93,11 @@ Current behavior (`delete_agent` in `src/synthorg/api/services/_org_agent_mutati
2. The agent record is removed from the active org configuration.
3. A company snapshot is persisted and an `API_AGENT_DELETED` event is logged and broadcast on the `agents` WebSocket channel.

Planned (not yet implemented): automated task reassignment via `TaskReassignmentStrategy`, memory archival via `MemoryArchivalStrategy`, selective promotion to `OrgMemoryBackend`, and an explicit `TERMINATED` lifecycle state. Until those land, fires are best paired with manual task reassignment before the DELETE call.
Not yet wired into the DELETE flow: automated task reassignment via `TaskReassignmentStrategy` (concrete: `QueueReturnStrategy` in `src/synthorg/hr/queue_return_strategy.py`), memory archival via `MemoryArchivalStrategy` (concrete: `FullSnapshotStrategy` in `src/synthorg/hr/full_snapshot_strategy.py`), selective promotion to `OrgMemoryBackend`, and an explicit `TERMINATED` lifecycle state. The strategies exist as part of the offboarding-service shape but the API DELETE handler does not invoke them; fires are best paired with manual task reassignment before the DELETE call.

## Rehiring from archive (planned)
## Rehiring from archive

A dedicated `POST /api/v1/agents/{agent_name}/rehire` endpoint (which would restore archived memory into a new identity with a fresh hire date and version chain) is **not yet implemented** in the agents controller. Until it ships, rehiring is a manual two-step: list archived agents via the existing listing, then recreate with `POST /api/v1/agents` using a fresh `CreateAgentOrgRequest` payload; memory restoration is performed out-of-band through the Memory Admin API. This planned surface sits alongside the same lifecycle automation called out in [Firing](#firing).
A dedicated `POST /api/v1/agents/{agent_name}/rehire` endpoint (which would restore archived memory into a new identity with a fresh hire date and version chain) is not implemented in the agents controller. Rehiring is a manual two-step today: list archived agents via the existing listing, then recreate with `POST /api/v1/agents` using a fresh `CreateAgentOrgRequest` payload; memory restoration is performed out-of-band through the Memory Admin API. The endpoint sits alongside the same lifecycle automation called out in [Firing](#firing); progress is tracked on the GitHub issue tracker.

## Lifecycle events (WebSocket)

2 changes: 1 addition & 1 deletion docs/guides/budget.md
@@ -275,7 +275,7 @@ curl "http://localhost:3001/api/v1/budget/records?task_id=${TASK_ID}" \

The response includes `data` (paginated records), `daily_summary` (per-day totals aggregated across ALL matching records, not just the page), and `period_summary` (overall totals + computed `avg_cost`).

Supported query parameters: `agent_id`, `task_id`, `offset`, `limit`. Additional slicing (by provider, model, tag, project, date range) is done client-side from the paginated response today; a dedicated report-generation endpoint is planned but not yet implemented.
Supported query parameters: `agent_id`, `task_id`, `offset`, `limit`. Additional slicing (by provider, model, tag, project, date range) is done client-side from the paginated response. A dedicated report-generation endpoint with server-side slicing is tracked on the GitHub issue tracker.
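
Client-side slicing can be as small as a group-and-sum over the records collected from every page. The sketch below assumes each record is a mapping with `provider`, `model`, and `cost` keys; those field names and the `total_cost_by` helper are hypothetical, so check them against the actual response schema before relying on them:

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def total_cost_by(records: Iterable[Mapping[str, Any]], key: str) -> Dict[str, float]:
    """Sum the (assumed) `cost` field of each record, grouped by `key`."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        group = str(record.get(key, "unknown"))
        totals[group] += float(record.get("cost", 0.0))
    return dict(totals)


# Illustrative records, not real API output.
records = [
    {"provider": "primary-cloud", "model": "example-large-001", "cost": 0.25},
    {"provider": "primary-cloud", "model": "example-small-001", "cost": 0.5},
    {"provider": "secondary-cloud", "model": "alt-large-001", "cost": 0.25},
]

by_provider = total_cost_by(records, "provider")
by_model = total_cost_by(records, "model")
```

Since `data` is paginated, a client has to walk `offset` and `limit` through every page before grouping; only `daily_summary` and `period_summary` already cover all matching records.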

### Budget alert webhook integration

70 changes: 54 additions & 16 deletions docs/guides/company-config.md
@@ -188,6 +188,41 @@ Each provider lists its available models under the `models` key:
alias: "medium"
```

=== "Cross-provider Fallback"

Configure a secondary provider that takes over when the primary
rejects, rate-limits, or times out a request. The `degradation`
block on the primary names the fallback provider; agents resolve
the alias on the secondary when degradation fires.

```yaml
providers:
primary-cloud:
auth_type: api_key
api_key: "sk-..."
degradation:
fallback_provider: secondary-cloud
trigger_on:
- rate_limit
- provider_timeout
- provider_connection
models:
- id: "example-large-001"
alias: "large"
- id: "example-small-001"
alias: "small"
secondary-cloud:
auth_type: api_key
api_key: "sk-backup-..."
models:
# Both providers expose the same alias names so the routing
# layer can hand off without reconfiguring agents.
- id: "alt-large-001"
alias: "large"
- id: "alt-small-001"
alias: "small"
```
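
The hand-off described above can be sketched as a lookup over the same structure. The dictionary mirrors the YAML example (with credentials omitted); `resolve_model` and `model_for_alias` are hypothetical helpers rather than SynthOrg's routing API, and the choice to stay on the primary for non-trigger errors is an assumption:

```python
from typing import Any, Dict, Optional, Tuple

PROVIDERS: Dict[str, Dict[str, Any]] = {
    "primary-cloud": {
        "degradation": {
            "fallback_provider": "secondary-cloud",
            "trigger_on": ["rate_limit", "provider_timeout", "provider_connection"],
        },
        "models": [
            {"id": "example-large-001", "alias": "large"},
            {"id": "example-small-001", "alias": "small"},
        ],
    },
    "secondary-cloud": {
        "models": [
            {"id": "alt-large-001", "alias": "large"},
            {"id": "alt-small-001", "alias": "small"},
        ],
    },
}


def model_for_alias(provider_cfg: Dict[str, Any], alias: str) -> str:
    """Return the model id registered under `alias` for one provider."""
    for model in provider_cfg["models"]:
        if model["alias"] == alias:
            return model["id"]
    raise KeyError(f"alias {alias!r} is not defined for this provider")


def resolve_model(
    providers: Dict[str, Dict[str, Any]],
    primary: str,
    alias: str,
    error_kind: Optional[str] = None,
) -> Tuple[str, str]:
    """Pick (provider, model id); switch providers only on a listed trigger."""
    degradation = providers[primary].get("degradation", {})
    if error_kind is not None and error_kind in degradation.get("trigger_on", []):
        fallback = degradation["fallback_provider"]
        return fallback, model_for_alias(providers[fallback], alias)
    # No error, or an error kind that is not a configured trigger.
    return primary, model_for_alias(providers[primary], alias)
```

The lookup only works because both providers define the same aliases; if `secondary-cloud` lacked `large`, `model_for_alias` would raise instead of silently choosing a different model.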

---

## Model Routing
@@ -207,7 +242,7 @@ The `routing` section controls how models are selected for agent tasks.

### Routing Rules

Rules are evaluated in order. Each rule matches by `role_level` and/or `task_type`:
Rules are evaluated in order. Each rule matches by `role_level` and/or `task_type`. **Validation: at least one of `role_level` or `task_type` MUST be set per rule.** A rule with both fields null is rejected at company-load time with a `ConfigValidationError`.

```yaml
routing:
@@ -232,10 +267,6 @@ routing:
| `preferred_model` | string | *(required)* | Preferred model alias or ID |
| `fallback` | string | `null` | Fallback model |
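
The at-least-one rule can be sketched with a standard-library dataclass. The project's real models are Pydantic v2, so this is an illustration of the check rather than the actual implementation; `ConfigValidationError` here is a local stand-in class, and the `RoutingRule` shape is assumed from the fields named in the table:

```python
from dataclasses import dataclass
from typing import Optional


class ConfigValidationError(ValueError):
    """Local stand-in for the error named in the text."""


@dataclass(frozen=True)
class RoutingRule:
    """Hypothetical rule shape built from the documented fields."""
    preferred_model: str
    role_level: Optional[str] = None
    task_type: Optional[str] = None
    fallback: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject a rule that could never be distinguished from "match everything".
        if self.role_level is None and self.task_type is None:
            raise ConfigValidationError(
                "routing rule must set at least one of role_level or task_type"
            )


valid = RoutingRule(preferred_model="large", role_level="senior")

try:
    RoutingRule(preferred_model="small")
    rejected = False
except ConfigValidationError:
    rejected = True
```

Failing at construction time matches the stated behaviour of rejecting the rule at company-load time rather than when the router first evaluates it.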

!!! note

At least one of `role_level` or `task_type` must be set per rule.

---

## Agents
@@ -301,20 +332,27 @@ Operational data persistence (tasks, cost records, messages, workflows, audit en
persistence:
backend: "postgres" # "sqlite" (default) or "postgres"
sqlite:
path: "/data/synthorg.db" # file path; used when backend == "sqlite"
postgres: # used when backend == "postgres"
path: "/data/synthorg.db" # file path; used when backend == "sqlite"
wal_mode: true # enable WAL journal mode (default)
journal_size_limit: 67108864 # WAL journal cap in bytes (default 64 MB)
postgres: # required when backend == "postgres"
host: "db.internal"
port: 5432
port: 5432 # default 5432
database: "synthorg"
username: "synthorg_app"
password: "${POSTGRES_PASSWORD}" # SecretStr -- redacted from logs
ssl_mode: "verify-full" # prefer verify-full in production
pool_min_size: 1
pool_max_size: 10
pool_timeout_seconds: 30.0
application_name: "synthorg"
statement_timeout_ms: 30000
connect_timeout_seconds: 10.0
password: "${POSTGRES_PASSWORD}" # SecretStr; redacted from logs
ssl_mode: "verify-full" # default "require"; prefer "verify-full" in prod
pool_min_size: 1 # default 1
pool_max_size: 10 # default 10; must be >= pool_min_size
pool_timeout_seconds: 30.0 # default 30.0
application_name: "synthorg" # appears in pg_stat_activity
statement_timeout_ms: 30000 # default 30000 (0 disables)
connect_timeout_seconds: 10.0 # default 10.0
# TimescaleDB hypertable support (Apache-2.0 features only).
# Not available on managed Postgres providers (RDS, Cloud SQL, Azure).
enable_timescaledb: false
cost_records_chunk_interval: "1 day"
audit_entries_chunk_interval: "1 day"
```

The Postgres backend requires the optional extra: install with
4 changes: 2 additions & 2 deletions docs/guides/memory.md
@@ -368,9 +368,9 @@ curl -X DELETE http://localhost:3001/api/v1/admin/memory/fine-tune/checkpoints/$
-H "Cookie: ${SESSION}"
```

### Planned admin endpoints
### Admin endpoints on the backlog

Consolidation, reindex, procedural-skill management, and organization-memory promotion are described in the [Memory design page](../design/memory.md#consolidation-and-retention) but are not yet exposed as REST endpoints. Track progress against the Memory roadmap; in the meantime these operations happen on agent-lifecycle boundaries (consolidation cycles, startup reindex, procedural-memory auto-generation).
Consolidation, reindex, procedural-skill management, and organization-memory promotion are described in the [Memory design page](../design/memory.md#consolidation-and-retention) but are not exposed as REST endpoints today. These operations happen on agent-lifecycle boundaries (consolidation cycles, startup reindex, procedural-memory auto-generation); a dedicated admin surface is tracked on the GitHub issue tracker under the memory label.

---
