2 changes: 1 addition & 1 deletion README.md
@@ -19,7 +19,7 @@ SynthOrg is a self-contained, self-hostable platform for **synthetic organisatio

It is provider-agnostic (<!--RS:providers_via_litellm-->2700+<!--/RS--> LLMs via [LiteLLM](https://github.com/BerriAI/litellm)), configuration-driven ([Pydantic v2](https://docs.pydantic.dev/) models), and licensed BUSL-1.1 (converts to Apache 2.0 at the Change Date).

> **Project status (read this).** The framework and infrastructure are built and tested (<!--RS:tests-->32,000+<!--/RS--> tests, 80%+ coverage): API, dashboard, CLI, dual-backend persistence, the provider layer, and every subsystem as importable, unit-tested components. The autonomous agent **runtime** that makes the organisation actually execute work is **in active development** and tracked openly on the [roadmap](https://synthorg.io/docs/roadmap/) and the [issue tracker](https://github.com/Aureliolo/synthorg/issues). Today, starting SynthOrg brings up the platform and dashboard; running a company end to end is the work in flight. We would rather you see exactly what is built versus in progress than discover it later.
> **Project status (read this).** The framework and infrastructure are built and tested (<!--RS:tests-->33,000+<!--/RS--> tests, 80%+ coverage): API, dashboard, CLI, dual-backend persistence, the provider layer, and every subsystem as importable, unit-tested components. The autonomous agent **runtime** that makes the organisation actually execute work is **in active development** and tracked openly on the [roadmap](https://synthorg.io/docs/roadmap/) and the [issue tracker](https://github.com/Aureliolo/synthorg/issues). Today, starting SynthOrg brings up the platform and dashboard; running a company end to end is the work in flight. We would rather you see exactly what is built versus in progress than discover it later.

## What is available now

12 changes: 6 additions & 6 deletions data/runtime_stats.yaml
@@ -1,13 +1,13 @@
 schema_version: 1
-last_generated_utc: '2026-05-20T21:40:55Z'
-generator_revision: 0fafe591c
+last_generated_utc: '2026-05-22T06:03:17Z'
+generator_revision: ac5cdc935
 stats:
   tests:
-    raw: 32241
-    rounded: 32000
-    display: 32,000+
+    raw: 33060
+    rounded: 33000
+    display: 33,000+
   mem0_stars:
-    raw: 56281
+    raw: 56395
     rounded: 56000
     display: 56k+
   providers_curated:
1 change: 1 addition & 0 deletions docs/design/engine.md
@@ -130,6 +130,7 @@ task:
    - "Unit and integration tests with >80% coverage"
    - "API documentation"
  estimated_complexity: "medium" # simple, medium, complex, epic
  stakes: "normal" # low, normal, high, critical (assessed; drives stakes-aware model routing)
  task_structure: "parallel" # sequential, parallel, mixed
  coordination_topology: "auto" # auto, sas, centralized, decentralized, context_dependent
  budget_limit: 2.00 # max spend for this task in base currency (display formatted per budget.currency)
15 changes: 15 additions & 0 deletions docs/design/providers.md
@@ -196,6 +196,21 @@ routing:
- "ollama"
```

### Stakes-aware routing (orthogonal layer)

Model routing above selects *which provider/model* serves a request. **Stakes-aware
routing** is a separate, pluggable layer that re-tiers that selection based on how
consequential the work is. Each task (and subtask) carries a `stakes` level
(`low` / `normal` / `high` / `critical`), assessed by the `StakesAssessor`. The
`StakesRoutingStrategy` then picks the cheapest model tier whose benchmark score
clears the per-stakes quality floor, bumps one tier when coordination metrics are
unhealthy, and marks high/critical work for the red-team gate. High/critical work
is never routed below the agent's configured tier; low/normal work may drop to a
cheaper tier (still clearing the floor) to save cost. It is config-selectable via
`stakes_routing.strategy` (`stakes_aware` default, `flat` to opt out) and applied in
the engine *before* the budget auto-downgrade, so a hard budget ceiling still wins
over a stakes upgrade. See [Pluggable Subsystems](../reference/pluggable-subsystems.md).
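
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `Tier`, `TIERS`, `QUALITY_FLOORS`, and `select_tier` are hypothetical names, the tier names, scores, and floors are made-up values, and falling back to the strongest tier when no tier clears the floor is an assumption. The budget auto-downgrade that runs afterwards is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """Hypothetical model tier with an illustrative benchmark score."""

    name: str
    benchmark_score: float


# Hypothetical tier table, ordered cheapest first.
TIERS = (Tier("small", 0.55), Tier("medium", 0.70), Tier("large", 0.85))

# Hypothetical per-stakes quality floors (non-decreasing with stakes).
QUALITY_FLOORS = {"low": 0.50, "normal": 0.65, "high": 0.70, "critical": 0.85}


def select_tier(
    stakes: str,
    configured_tier: str,
    coordination_healthy: bool = True,
) -> tuple[str, bool]:
    """Return (tier name, needs_red_team) for a stakes level."""
    top = len(TIERS) - 1
    floor = QUALITY_FLOORS[stakes]
    # Cheapest tier whose score clears the floor; strongest tier as an
    # assumed fallback when none does.
    idx = next(
        (i for i, tier in enumerate(TIERS) if tier.benchmark_score >= floor),
        top,
    )
    # Bump one tier when coordination metrics are unhealthy.
    if not coordination_healthy:
        idx = min(idx + 1, top)
    high_stakes = stakes in ("high", "critical")
    # High/critical work is never routed below the agent's configured tier.
    if high_stakes:
        configured_idx = [tier.name for tier in TIERS].index(configured_tier)
        idx = max(idx, configured_idx)
    return TIERS[idx].name, high_stakes
```

With these illustrative values, low-stakes work for an agent configured on `large` drops to `small`, while high-stakes work for the same agent stays on `large` and is flagged for the red-team gate.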

### Multi-Provider Model Resolution

When multiple providers register the same model ID or alias, the `ModelResolver`
14 changes: 14 additions & 0 deletions docs/reference/pluggable-subsystems.md
@@ -176,6 +176,20 @@ Domain errors live at `meta/errors.py::RollbackMutationDeniedError` (409) and `U
- `engine/workspace/git_backend/external_remote.py::ExternalRemoteGitBackend` (GitHub / GitLab / Gitea / Forgejo resolved via the connection catalog; ships protocol + thin clone/push/fetch glue; deep OAuth hardening is a tracked follow-up).
- `engine/workspace/git_backend/factory.py::build_git_backend()`: `StrategyRegistry[GitBackend]` keyed on `GitBackendType`. Missing required deps fail fast at construction with `GitBackendConfigError`. Wired at boot in `api/app.py::_install_runtime_services` under the `has_persistence` gate, alongside `ProjectWorkspaceService`.

### Stakes assessment (model-routing input)

- `engine/stakes/protocol.py`: `StakesAssessor` `@runtime_checkable` Protocol (`assess_task(task)` / `assess_subtask(subtask)` returning `Stakes`).
- `engine/stakes/heuristic.py::DefaultStakesAssessor` (safe default: deterministic, combines complexity base mapping, high/critical keyword signals, and critical-priority elevation; unknown complexity fails safe upward to HIGH).
- `engine/stakes/config.py::StakesAssessmentConfig` (frozen) with `assessor: NotBlankStr` discriminator, the complexity-to-stakes rules, and the keyword sets.
- `engine/stakes/factory.py::build_stakes_assessor()`: `StrategyRegistry[StakesAssessor]` keyed on `assessor` ("heuristic" default). Consumed by `DecompositionService` (per-subtask) and the work pipeline's LEAF path (parent task).
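
A deterministic assessor along the lines listed above might look like the sketch below. It is only an illustration under stated assumptions: the `assess` function, the `COMPLEXITY_BASE` mapping, and both keyword sets are hypothetical, and treating critical-priority elevation as "one level up" is a guess, since the exact rule is not described here. The `Stakes` enum is redefined locally so the example is self-contained.

```python
from enum import Enum


class Stakes(str, Enum):
    """Local stand-in for the project's Stakes enum."""

    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"
    CRITICAL = "critical"


_ORDER = tuple(Stakes)  # LOW < NORMAL < HIGH < CRITICAL

# Hypothetical complexity-to-stakes base mapping.
COMPLEXITY_BASE = {
    "simple": Stakes.LOW,
    "medium": Stakes.NORMAL,
    "complex": Stakes.HIGH,
    "epic": Stakes.HIGH,
}
# Hypothetical keyword signals.
HIGH_KEYWORDS = frozenset({"architecture", "migration"})
CRITICAL_KEYWORDS = frozenset({"irreversible", "production"})


def _higher(a: Stakes, b: Stakes) -> Stakes:
    """Return whichever level ranks higher in _ORDER."""
    return a if _ORDER.index(a) >= _ORDER.index(b) else b


def assess(description: str, complexity: str, priority: str = "medium") -> Stakes:
    """Combine complexity, keyword signals, and priority into a level."""
    # Unknown complexity fails safe upward to HIGH.
    level = COMPLEXITY_BASE.get(complexity, Stakes.HIGH)
    words = set(description.lower().split())
    if words & HIGH_KEYWORDS:
        level = _higher(level, Stakes.HIGH)
    if words & CRITICAL_KEYWORDS:
        level = _higher(level, Stakes.CRITICAL)
    # Assumed elevation rule: critical priority raises the level by one step.
    if priority == "critical":
        level = _ORDER[min(_ORDER.index(level) + 1, len(_ORDER) - 1)]
    return level
```

Because every input maps to a level through fixed tables, the same task always gets the same assessment, which is the property the "deterministic" default relies on.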

### Stakes-aware model routing

- `engine/routing_policy/protocol.py`: `StakesRoutingStrategy` `@runtime_checkable` Protocol (`route(task, identity)` returning a frozen `StakesRoutingDecision`).
- `engine/routing_policy/strategies.py::StakesAwareStrategy` (safe default: picks the cheapest tier whose benchmark score clears the per-stakes `QualityFloors`, bumps one tier when coordination metrics are unhealthy, marks high/critical work for the red-team gate, and never downgrades below the agent's configured tier) and `FlatStrategy` (no-op control / opt-out).
- `engine/routing_policy/config.py::StakesRoutingConfig` (frozen) with `strategy: NotBlankStr` discriminator, `QualityFloors` (validated non-decreasing), `red_team_min_stakes`, and the coordination-nudge thresholds.
- `engine/routing_policy/factory.py::build_stakes_router()`: `StrategyRegistry[StakesRoutingStrategy]` keyed on `strategy` ("stakes_aware" default; "stakes_aware" requires a benchmark provider, "flat" is dependency-free). Wired at boot in `workers/runtime_builder.py::_build_stakes_router_or_none` and injected into `AgentEngine`, which applies routing before the budget auto-downgrade (a hard budget ceiling wins over a stakes upgrade).

## Services are a distinct pattern (not pluggable subsystems)

A **service** wraps one or more repositories to keep controllers thin and centralise audit logging, and MAY orchestrate multiple repositories (e.g. `WorkflowService` spans `workflow_definitions` + `workflow_versions`; `MemoryService` spans fine-tune checkpoints + runs + settings).
2 changes: 1 addition & 1 deletion docs/roadmap/index.md
@@ -3,7 +3,7 @@
## Current status

SynthOrg is in **active development**. The platform, infrastructure, and
-subsystem libraries are built and tested (<!--RS:tests-->32,000+<!--/RS-->
+subsystem libraries are built and tested (<!--RS:tests-->33,000+<!--/RS-->
tests in the latest run, 80%+ coverage) and integrated through a REST +
WebSocket API, a React 19 dashboard, and a Go CLI. The autonomous agent
**runtime** that makes the organisation actually execute work is the focus of
1 change: 1 addition & 0 deletions scripts/_ghost_wiring_manifest.txt
@@ -90,3 +90,4 @@ ENFORCED SandboxBriefRunner #1995 -- constructed by meta/toolsmith/factory.py::b
ENFORCED ToolCreationApplier #1995 -- constructed by meta/toolsmith/factory.py::build_toolsmith; validates then live-registers an approved blueprint, retires on rollback
ENFORCED DynamicToolRegistry #1995 -- constructed by meta/toolsmith/factory.py::build_toolsmith; mutable live authored-tool registry read behind the static surface
ENFORCED install_dynamic_tool_layer #1995 -- called by api/app.py::_wire_toolsmith; layers the dynamic registry into the live MCP invoker so authored tools dispatch
ENFORCED build_stakes_router #1998 -- called by workers/runtime_builder.py::_build_stakes_router_or_none when a benchmark provider is wired; injected into AgentEngine for stakes-aware tier selection before budget downgrade
11 changes: 11 additions & 0 deletions src/synthorg/api/app.py
@@ -1290,6 +1290,17 @@ async def _install_runtime_services() -> None:
            and app_state.review_gate_service is not None
        ):
            app_state.review_gate_service.set_vision_gate(services.vision_gate)
        # Same seam for the adversarial red-team gate: built in the
        # runtime wiring once the boot engine exists, attached here so a
        # review pipeline supplied with red_team_input reaches the live
        # gate. ``None`` when the red-team subsystem is disabled.
        if (
            services.red_team_runtime is not None
            and app_state.review_gate_service is not None
        ):
            app_state.review_gate_service.set_red_team_gate(
                services.red_team_runtime.gate,
            )
        # Bring the real client-request, goal/objective, and
        # task-board work-entry paths online: ensure the configured
        # default projects exist and attach the entry adapters. No-op
1 change: 1 addition & 0 deletions src/synthorg/config/defaults.py
@@ -22,6 +22,7 @@ def default_config_dict() -> dict[str, object]:
        "communication": {},
        "providers": {},
        "routing": {},
        "stakes_routing": {},
        "logging": None,
        "graceful_shutdown": {},
        "workflow_handoffs": [],
7 changes: 7 additions & 0 deletions src/synthorg/config/schema.py
@@ -27,6 +27,7 @@
from synthorg.core.role import CustomRole # noqa: TC001
from synthorg.core.types import NotBlankStr # noqa: TC001
from synthorg.engine.coordination.section_config import CoordinationSectionConfig
from synthorg.engine.routing_policy.config import StakesRoutingConfig
from synthorg.engine.strategy.models import StrategyConfig
from synthorg.engine.task_engine_config import TaskEngineConfig
from synthorg.engine.workflow.config import WorkflowConfig
@@ -386,6 +387,8 @@ class RootConfig(BaseModel):
        communication: Communication configuration.
        providers: LLM provider configurations keyed by provider name.
        routing: Model routing configuration.
        stakes_routing: Stakes-aware model routing configuration (strategy
            discriminator, per-stakes quality floors, coordination nudge).
        logging: Logging configuration (``None`` to use platform defaults).
        graceful_shutdown: Graceful shutdown configuration.
        workflow_handoffs: Cross-department workflow handoffs.
@@ -472,6 +475,10 @@ class RootConfig(BaseModel):
        default_factory=RoutingConfig,
        description="Model routing configuration",
    )
    stakes_routing: StakesRoutingConfig = Field(
        default_factory=StakesRoutingConfig,
        description="Stakes-aware model routing configuration",
    )
    logging: LogConfig | None = Field(
        default=None,
        description="Logging configuration",
55 changes: 55 additions & 0 deletions src/synthorg/core/enums.py
@@ -356,6 +356,61 @@ class Complexity(StrEnum):
    EPIC = "epic"


class Stakes(StrEnum):
    """How consequential a subtask or task is for stakes-aware routing.

    Distinct from :class:`Priority` (urgency/importance) and
    :class:`Complexity` (effort): stakes captures the *cost of being
    wrong*. Low-stakes work tolerates a cheap model; high-stakes work
    (architecture, irreversible decisions) warrants a strong model and
    an adversarial red-team review. The authoritative ordering lives in
    ``_STAKES_ORDER`` below.
    """

    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"
    CRITICAL = "critical"


# Ordering: LOW (least consequential) < NORMAL < HIGH < CRITICAL.
_STAKES_ORDER: tuple[Stakes, ...] = tuple(Stakes)

# Guard against silent breakage if the enum is reordered or extended
# without updating the ordering tuple (mirrors _SENIORITY_ORDER).
_stakes_members = set(Stakes)
_stakes_order_set = set(_STAKES_ORDER)
if _stakes_order_set != _stakes_members:
    _missing_stakes = _stakes_members - _stakes_order_set
    _extra_stakes = _stakes_order_set - _stakes_members
    _stakes_msg = (
        f"_STAKES_ORDER is out of sync with Stakes: "
        f"missing={_missing_stakes}, extra={_extra_stakes}"
    )
    raise RuntimeError(_stakes_msg)
del _stakes_members, _stakes_order_set

_STAKES_RANK: dict[Stakes, int] = {
    level: idx for idx, level in enumerate(_STAKES_ORDER)
}


def compare_stakes(a: Stakes, b: Stakes) -> int:
    """Compare two stakes levels.

    Returns negative if *a* is lower-stakes than *b*, zero if equal,
    positive if *a* is higher-stakes than *b*.

    Args:
        a: First stakes level.
        b: Second stakes level.

    Returns:
        Integer indicating relative stakes.
    """
    return _STAKES_RANK[a] - _STAKES_RANK[b]


class WorkflowType(StrEnum):
    """Workflow type for organizing task execution.

9 changes: 9 additions & 0 deletions src/synthorg/core/task.py
@@ -13,6 +13,7 @@
    Complexity,
    CoordinationTopology,
    Priority,
    Stakes,
    TaskSource,
    TaskStatus,
    TaskStructure,
@@ -128,6 +129,14 @@ class Task(BaseModel):
        default=Complexity.MEDIUM,
        description="Task complexity estimate",
    )
    stakes: Stakes = Field(
        default=Stakes.NORMAL,
        description=(
            "How consequential this task is, driving stakes-aware model"
            " routing (cheap model for low stakes, strong model plus"
            " red-team for high/critical stakes)"
        ),
    )
    budget_limit: float = Field(
        default=0.0,
        ge=0.0,
30 changes: 30 additions & 0 deletions src/synthorg/engine/agent_engine.py
@@ -83,6 +83,7 @@
from synthorg.engine.plan_models import PlanExecuteConfig
from synthorg.engine.prompt import SystemPrompt
from synthorg.engine.recovery import RecoveryStrategy
from synthorg.engine.routing_policy.router import StakesRouter
from synthorg.engine.session import EventReader
from synthorg.engine.stagnation.protocol import StagnationDetector
from synthorg.engine.task_engine import TaskEngine
@@ -200,6 +201,7 @@ def __init__( # noqa: PLR0913, PLR0915
        interrupt_store: InterruptStore | None = None,
        approval_interrupt_timeout_seconds: float | None = None,
        external_api_runtime: ExternalApiRuntime | None = None,
        stakes_router: StakesRouter | None = None,
        clock: Clock | None = None,
    ) -> None:
        self._agent_middleware_chain = agent_middleware_chain
@@ -238,6 +240,7 @@ def __init__( # noqa: PLR0913, PLR0915
        # the agent's registry. ``None`` (mode DISABLED) is a no-op.
        self._mcp_self_consumer = mcp_self_consumer
        self._approval_interrupt_timeout_seconds = approval_interrupt_timeout_seconds
        self._stakes_router = stakes_router
        self._stagnation_detector = stagnation_detector
        self._auto_loop_config = auto_loop_config
        self._hybrid_loop_config = hybrid_loop_config
@@ -352,6 +355,25 @@ async def coordinate(
            raise ExecutionStateError(msg)
        return await self._coordinator.coordinate(context)

    async def _route_stakes(
        self,
        identity: AgentIdentity,
        task: Task,
    ) -> AgentIdentity:
        """Apply stakes-aware routing, returning the adjusted identity.

        Delegates to the injected :class:`StakesRouter` to pick a model
        tier matched to ``task.stakes``. The red-team requirement carried
        on the decision is consumed downstream by the review pipeline,
        which derives it from the persisted ``task.stakes``; this method
        only adjusts the model the subtask runs with.
        """
        assert self._stakes_router is not None  # noqa: S101 # caller checks
        decision = await self._stakes_router.route(task=task, identity=identity)
        if decision.selected_model == identity.model:
            return identity
        return identity.model_copy(update={"model": decision.selected_model})

    async def run(  # noqa: PLR0913, C901
        self,
        *,
@@ -402,6 +424,14 @@ async def run( # noqa: PLR0913, C901
                max_turns=max_turns,
            )

            # Stakes-aware routing runs BEFORE the budget block: it
            # sets the target tier from the task's stakes, then the
            # budget auto-downgrade below may lower it further when
            # budget is tight (a hard ceiling must win over a stakes
            # upgrade).
            if self._stakes_router is not None:
                identity = await self._route_stakes(identity, task)

            if self._budget_enforcer:
                preflight = await self._budget_enforcer.check_can_execute(
                    agent_id,
6 changes: 6 additions & 0 deletions src/synthorg/engine/decomposition/models.py
@@ -12,6 +12,7 @@
from synthorg.core.enums import (
    Complexity,
    CoordinationTopology,
    Stakes,
    TaskStatus,
    TaskStructure,
)
@@ -28,6 +29,7 @@ class SubtaskDefinition(BaseModel):
        description: Detailed subtask description.
        dependencies: IDs of other subtasks this one depends on.
        estimated_complexity: Complexity estimate for routing.
        stakes: Stakes level for stakes-aware model routing.
        required_skills: Skill IDs needed for routing.
        required_tags: Tags needed for multi-faceted routing match. When
            set, the routing scorer awards a small bonus to agents whose
@@ -49,6 +51,10 @@ class SubtaskDefinition(BaseModel):
        default=Complexity.MEDIUM,
        description="Complexity estimate for routing",
    )
    stakes: Stakes = Field(
        default=Stakes.NORMAL,
        description="Stakes level for stakes-aware model routing",
    )
    required_skills: tuple[NotBlankStr, ...] = Field(
        default=(),
        description="Skill IDs needed for routing",