Aureliolo · May 22, 2026 · May 21, 2026
@@ -19,7 +19,7 @@ SynthOrg is a self-contained, self-hostable platform for **synthetic organisatio
 
 It is provider-agnostic (<!--RS:providers_via_litellm-->2700+<!--/RS--> LLMs via [LiteLLM](https://github.com/BerriAI/litellm)), configuration-driven ([Pydantic v2](https://docs.pydantic.dev/) models), and licensed BUSL-1.1 (converts to Apache 2.0 at the Change Date).
 
-> **Project status (read this).** The framework and infrastructure are built and tested (<!--RS:tests-->32,000+<!--/RS--> tests, 80%+ coverage): API, dashboard, CLI, dual-backend persistence, the provider layer, and every subsystem as importable, unit-tested components. The autonomous agent **runtime** that makes the organisation actually execute work is **in active development** and tracked openly on the [roadmap](https://synthorg.io/docs/roadmap/) and the [issue tracker](https://github.com/Aureliolo/synthorg/issues). Today, starting SynthOrg brings up the platform and dashboard; running a company end to end is the work in flight. We would rather you see exactly what is built versus in progress than discover it later.
+> **Project status (read this).** The framework and infrastructure are built and tested (<!--RS:tests-->33,000+<!--/RS--> tests, 80%+ coverage): API, dashboard, CLI, dual-backend persistence, the provider layer, and every subsystem as importable, unit-tested components. The autonomous agent **runtime** that makes the organisation actually execute work is **in active development** and tracked openly on the [roadmap](https://synthorg.io/docs/roadmap/) and the [issue tracker](https://github.com/Aureliolo/synthorg/issues). Today, starting SynthOrg brings up the platform and dashboard; running a company end to end is the work in flight. We would rather you see exactly what is built versus in progress than discover it later.
 
 ## What is available now
 

@@ -1,13 +1,13 @@
 schema_version: 1
-last_generated_utc: '2026-05-20T21:40:55Z'
-generator_revision: 0fafe591c
+last_generated_utc: '2026-05-22T06:03:17Z'
+generator_revision: ac5cdc935
 stats:
   tests:
-    raw: 32241
-    rounded: 32000
-    display: 32,000+
+    raw: 33060
+    rounded: 33000
+    display: 33,000+
   mem0_stars:
-    raw: 56281
+    raw: 56395
     rounded: 56000
     display: 56k+
   providers_curated:

@@ -130,6 +130,7 @@ task:
     - "Unit and integration tests with >80% coverage"
     - "API documentation"
   estimated_complexity: "medium"  # simple, medium, complex, epic
+  stakes: "normal"               # low, normal, high, critical (assessed; drives stakes-aware model routing)
   task_structure: "parallel"      # sequential, parallel, mixed
   coordination_topology: "auto"  # auto, sas, centralized, decentralized, context_dependent
   budget_limit: 2.00             # max spend for this task in base currency (display formatted per budget.currency)

@@ -196,6 +196,21 @@ routing:
     - "ollama"
 ```
 
+### Stakes-aware routing (orthogonal layer)
+
+Model routing above selects *which provider/model* serves a request. **Stakes-aware
+routing** is a separate, pluggable layer that re-tiers that selection based on how
+consequential the work is. Each task (and subtask) carries a `stakes` level
+(`low` / `normal` / `high` / `critical`), assessed by the `StakesAssessor`. The
+`StakesRoutingStrategy` then picks the cheapest model tier whose benchmark score
+clears the per-stakes quality floor, bumps one tier when coordination metrics are
+unhealthy, and marks high/critical work for the red-team gate. High/critical work
+is never routed below the agent's configured tier; low/normal work may drop to a
+cheaper tier (still clearing the floor) to save cost. It is config-selectable via
+`stakes_routing.strategy` (`stakes_aware` default, `flat` to opt out) and applied in
+the engine *before* the budget auto-downgrade, so a hard budget ceiling still wins
+over a stakes upgrade. See [Pluggable Subsystems](../reference/pluggable-subsystems.md).
+
 ### Multi-Provider Model Resolution
 
 When multiple providers register the same model ID or alias, the `ModelResolver`

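Review note: the tier-selection rules in the stakes-aware routing paragraph above can be sketched as follows. This is a minimal illustration, not the `StakesAwareStrategy` implementation. The tier names, benchmark scores, floor values, the `route` helper, and the fallback to the strongest tier when nothing clears the floor are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical tiers, ordered cheapest to most expensive.
TIERS = ("small", "medium", "large")


@dataclass(frozen=True)
class Decision:
    tier: str
    red_team: bool


def route(stakes, configured_tier, scores, floors, coordination_healthy=True):
    """Pick a tier for a task following the documented rules (sketch only)."""
    # 1. Cheapest tier whose benchmark score clears the per-stakes floor.
    clearing = [i for i, t in enumerate(TIERS) if scores[t] >= floors[stakes]]
    # Assumption: fall back to the strongest tier if none clears the floor.
    idx = clearing[0] if clearing else len(TIERS) - 1
    # 2. Bump one tier when coordination metrics are unhealthy.
    if not coordination_healthy:
        idx = min(idx + 1, len(TIERS) - 1)
    # 3. High/critical work never drops below the configured tier
    #    and is marked for the red-team gate.
    is_high = stakes in ("high", "critical")
    if is_high:
        idx = max(idx, TIERS.index(configured_tier))
    return Decision(tier=TIERS[idx], red_team=is_high)


# Illustrative numbers only.
scores = {"small": 0.60, "medium": 0.75, "large": 0.90}
floors = {"low": 0.50, "normal": 0.70, "high": 0.85, "critical": 0.90}
```

With these numbers, low-stakes work on a `medium` agent drops to `small`, while high-stakes work lands on `large` and is flagged for red-team review.
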
@@ -176,6 +176,20 @@ Domain errors live at `meta/errors.py::RollbackMutationDeniedError` (409) and `U
 - `engine/workspace/git_backend/external_remote.py::ExternalRemoteGitBackend` (GitHub / GitLab / Gitea / Forgejo resolved via the connection catalog; ships protocol + thin clone/push/fetch glue; deep OAuth hardening is a tracked follow-up).
 - `engine/workspace/git_backend/factory.py::build_git_backend()`: `StrategyRegistry[GitBackend]` keyed on `GitBackendType`. Missing required deps fail fast at construction with `GitBackendConfigError`. Wired at boot in `api/app.py::_install_runtime_services` under the `has_persistence` gate, alongside `ProjectWorkspaceService`.
 
+### Stakes assessment (model-routing input)
+
+- `engine/stakes/protocol.py`: `StakesAssessor` `@runtime_checkable` Protocol (`assess_task(task)` / `assess_subtask(subtask)` returning `Stakes`).
+- `engine/stakes/heuristic.py::DefaultStakesAssessor` (safe default: deterministic, combines complexity base mapping, high/critical keyword signals, and critical-priority elevation; unknown complexity fails safe upward to HIGH).
+- `engine/stakes/config.py::StakesAssessmentConfig` (frozen) with `assessor: NotBlankStr` discriminator, the complexity-to-stakes rules, and the keyword sets.
+- `engine/stakes/factory.py::build_stakes_assessor()`: `StrategyRegistry[StakesAssessor]` keyed on `assessor` ("heuristic" default). Consumed by `DecompositionService` (per-subtask) and the work pipeline's LEAF path (parent task).
+
+### Stakes-aware model routing
+
+- `engine/routing_policy/protocol.py`: `StakesRoutingStrategy` `@runtime_checkable` Protocol (`route(task, identity)` returning a frozen `StakesRoutingDecision`).
+- `engine/routing_policy/strategies.py::StakesAwareStrategy` (safe default: picks the cheapest tier whose benchmark score clears the per-stakes `QualityFloors`, bumps one tier when coordination metrics are unhealthy, marks high/critical work for the red-team gate, and never downgrades below the agent's configured tier) and `FlatStrategy` (no-op control / opt-out).
+- `engine/routing_policy/config.py::StakesRoutingConfig` (frozen) with `strategy: NotBlankStr` discriminator, `QualityFloors` (validated non-decreasing), `red_team_min_stakes`, and the coordination-nudge thresholds.
+- `engine/routing_policy/factory.py::build_stakes_router()`: `StrategyRegistry[StakesRoutingStrategy]` keyed on `strategy` ("stakes_aware" default; "stakes_aware" requires a benchmark provider, "flat" is dependency-free). Wired at boot in `workers/runtime_builder.py::_build_stakes_router_or_none` and injected into `AgentEngine`, which applies routing before the budget auto-downgrade (a hard budget ceiling wins over a stakes upgrade).
+
 ## Services are a distinct pattern (not pluggable subsystems)
 
 A **service** wraps one or more repositories to keep controllers thin and centralise audit logging, and MAY orchestrate multiple repositories (e.g. `WorkflowService` spans `workflow_definitions` + `workflow_versions`; `MemoryService` spans fine-tune checkpoints + runs + settings).

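Review note: the `DefaultStakesAssessor` bullet above describes three signals (complexity base mapping, keyword signals, critical-priority elevation) with unknown complexity failing safe upward to HIGH. A rough sketch of that combination is below. The keyword sets, the complexity mapping, the `assess` function, and the one-level bump for critical priority are assumptions, and an `IntEnum` stands in for the real `Stakes` enum so levels compare directly.

```python
from enum import IntEnum


class Stakes(IntEnum):
    """Stand-in for the real string-valued enum, ordered by consequence."""

    LOW = 0
    NORMAL = 1
    HIGH = 2
    CRITICAL = 3


# Hypothetical rules; the real ones live in StakesAssessmentConfig.
COMPLEXITY_BASE = {
    "simple": Stakes.LOW,
    "medium": Stakes.NORMAL,
    "complex": Stakes.HIGH,
    "epic": Stakes.HIGH,
}
HIGH_KEYWORDS = frozenset({"architecture", "migration"})
CRITICAL_KEYWORDS = frozenset({"irreversible", "production"})


def assess(description, complexity, priority="medium"):
    """Deterministically combine the three signals into one stakes level."""
    # Unknown complexity fails safe upward to HIGH.
    level = COMPLEXITY_BASE.get(complexity, Stakes.HIGH)
    words = set(description.lower().split())
    if words & HIGH_KEYWORDS:
        level = max(level, Stakes.HIGH)
    if words & CRITICAL_KEYWORDS:
        level = max(level, Stakes.CRITICAL)
    # Assumption: critical priority raises the result by one level.
    if priority == "critical":
        level = Stakes(min(level + 1, Stakes.CRITICAL))
    return level
```

Because every step only ever raises the level, the signals can be evaluated in any order without changing the result, which keeps the assessor deterministic.
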
@@ -3,7 +3,7 @@
 ## Current status
 
 SynthOrg is in **active development**. The platform, infrastructure, and
-subsystem libraries are built and tested (<!--RS:tests-->32,000+<!--/RS-->
+subsystem libraries are built and tested (<!--RS:tests-->33,000+<!--/RS-->
 tests in the latest run, 80%+ coverage) and integrated through a REST +
 WebSocket API, a React 19 dashboard, and a Go CLI. The autonomous agent
 **runtime** that makes the organisation actually execute work is the focus of

@@ -90,3 +90,4 @@ ENFORCED SandboxBriefRunner #1995 -- constructed by meta/toolsmith/factory.py::b
 ENFORCED ToolCreationApplier #1995 -- constructed by meta/toolsmith/factory.py::build_toolsmith; validates then live-registers an approved blueprint, retires on rollback
 ENFORCED DynamicToolRegistry #1995 -- constructed by meta/toolsmith/factory.py::build_toolsmith; mutable live authored-tool registry read behind the static surface
 ENFORCED install_dynamic_tool_layer #1995 -- called by api/app.py::_wire_toolsmith; layers the dynamic registry into the live MCP invoker so authored tools dispatch
+ENFORCED build_stakes_router #1998 -- called by workers/runtime_builder.py::_build_stakes_router_or_none when a benchmark provider is wired; injected into AgentEngine for stakes-aware tier selection before budget downgrade

@@ -1290,6 +1290,17 @@ async def _install_runtime_services() -> None:
             and app_state.review_gate_service is not None
         ):
             app_state.review_gate_service.set_vision_gate(services.vision_gate)
+        # Same seam for the adversarial red-team gate: built in the
+        # runtime wiring once the boot engine exists, attached here so a
+        # review pipeline supplied with red_team_input reaches the live
+        # gate. ``None`` when the red-team subsystem is disabled.
+        if (
+            services.red_team_runtime is not None
+            and app_state.review_gate_service is not None
+        ):
+            app_state.review_gate_service.set_red_team_gate(
+                services.red_team_runtime.gate,
+            )
         # Bring the real client-request, goal/objective, and
         # task-board work-entry paths online: ensure the configured
         # default projects exist and attach the entry adapters. No-op

@@ -22,6 +22,7 @@ def default_config_dict() -> dict[str, object]:
         "communication": {},
         "providers": {},
         "routing": {},
+        "stakes_routing": {},
         "logging": None,
         "graceful_shutdown": {},
         "workflow_handoffs": [],

@@ -27,6 +27,7 @@
 from synthorg.core.role import CustomRole  # noqa: TC001
 from synthorg.core.types import NotBlankStr  # noqa: TC001
 from synthorg.engine.coordination.section_config import CoordinationSectionConfig
+from synthorg.engine.routing_policy.config import StakesRoutingConfig
 from synthorg.engine.strategy.models import StrategyConfig
 from synthorg.engine.task_engine_config import TaskEngineConfig
 from synthorg.engine.workflow.config import WorkflowConfig
@@ -386,6 +387,8 @@ class RootConfig(BaseModel):
         communication: Communication configuration.
         providers: LLM provider configurations keyed by provider name.
         routing: Model routing configuration.
+        stakes_routing: Stakes-aware model routing configuration (strategy
+            discriminator, per-stakes quality floors, coordination nudge).
         logging: Logging configuration (``None`` to use platform defaults).
         graceful_shutdown: Graceful shutdown configuration.
         workflow_handoffs: Cross-department workflow handoffs.
@@ -472,6 +475,10 @@ class RootConfig(BaseModel):
         default_factory=RoutingConfig,
         description="Model routing configuration",
     )
+    stakes_routing: StakesRoutingConfig = Field(
+        default_factory=StakesRoutingConfig,
+        description="Stakes-aware model routing configuration",
+    )
     logging: LogConfig | None = Field(
         default=None,
         description="Logging configuration",

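Review note: the docs describe `QualityFloors` as "validated non-decreasing". The check that implies could look like the sketch below. A frozen dataclass stands in for the frozen Pydantic model, and the field names and default values are assumptions, not the values shipped in `StakesRoutingConfig`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityFloors:
    """Minimum benchmark score per stakes level (hypothetical defaults)."""

    low: float = 0.50
    normal: float = 0.65
    high: float = 0.80
    critical: float = 0.90

    def __post_init__(self):
        floors = (self.low, self.normal, self.high, self.critical)
        # Each floor must be at least as strict as the one before it.
        if any(later < earlier for earlier, later in zip(floors, floors[1:])):
            msg = f"quality floors must be non-decreasing, got {floors}"
            raise ValueError(msg)
```

Rejecting a decreasing sequence at construction time means a misconfigured file fails at load rather than silently routing critical work to a weaker model than normal work.
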
@@ -356,6 +356,61 @@ class Complexity(StrEnum):
     EPIC = "epic"
 
 
+class Stakes(StrEnum):
+    """How consequential a subtask or task is for stakes-aware routing.
+
+    Distinct from :class:`Priority` (urgency/importance) and
+    :class:`Complexity` (effort): stakes captures the *cost of being
+    wrong*. Low-stakes work tolerates a cheap model; high-stakes work
+    (architecture, irreversible decisions) warrants a strong model and
+    an adversarial red-team review. The authoritative ordering lives in
+    ``_STAKES_ORDER`` below.
+    """
+
+    LOW = "low"
+    NORMAL = "normal"
+    HIGH = "high"
+    CRITICAL = "critical"
+
+
+# Ordering: LOW (least consequential) < NORMAL < HIGH < CRITICAL.
+_STAKES_ORDER = (Stakes.LOW, Stakes.NORMAL, Stakes.HIGH, Stakes.CRITICAL)
+
+# Guard against silent breakage if the enum is reordered or extended
+# without updating the ordering tuple (mirrors _SENIORITY_ORDER).
+_stakes_members = set(Stakes)
+_stakes_order_set = set(_STAKES_ORDER)
+if _stakes_order_set != _stakes_members:
+    _missing_stakes = _stakes_members - _stakes_order_set
+    _extra_stakes = _stakes_order_set - _stakes_members
+    _stakes_msg = (
+        f"_STAKES_ORDER is out of sync with Stakes: "
+        f"missing={_missing_stakes}, extra={_extra_stakes}"
+    )
+    raise RuntimeError(_stakes_msg)
+del _stakes_members, _stakes_order_set
+
+_STAKES_RANK: dict[Stakes, int] = {
+    level: idx for idx, level in enumerate(_STAKES_ORDER)
+}
+
+
+def compare_stakes(a: Stakes, b: Stakes) -> int:
+    """Compare two stakes levels.
+
+    Returns negative if *a* is lower-stakes than *b*, zero if equal,
+    positive if *a* is higher-stakes than *b*.
+
+    Args:
+        a: First stakes level.
+        b: Second stakes level.
+
+    Returns:
+        Integer indicating relative stakes.
+    """
+    return _STAKES_RANK[a] - _STAKES_RANK[b]
+
+
 class WorkflowType(StrEnum):
     """Workflow type for organizing task execution.
 

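Review note: a short usage sketch for `compare_stakes` from the hunk above. The enum and helper are re-declared so the snippet runs on its own (a `str`/`Enum` mixin stands in for `StrEnum`), and `needs_red_team` is a hypothetical helper showing how a threshold such as `red_team_min_stakes` might be checked.

```python
from enum import Enum
from functools import cmp_to_key


class Stakes(str, Enum):
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"
    CRITICAL = "critical"


# Rank follows declaration order: LOW < NORMAL < HIGH < CRITICAL.
_STAKES_RANK = {level: idx for idx, level in enumerate(Stakes)}


def compare_stakes(a, b):
    """Negative if a is lower-stakes than b, zero if equal, positive if higher."""
    return _STAKES_RANK[a] - _STAKES_RANK[b]


def needs_red_team(stakes, minimum=Stakes.HIGH):
    """Hypothetical helper: True when stakes is at or above the threshold."""
    return compare_stakes(stakes, minimum) >= 0


levels = [Stakes.CRITICAL, Stakes.LOW, Stakes.HIGH, Stakes.NORMAL]
by_stakes = sorted(levels, key=cmp_to_key(compare_stakes))
by_string = sorted(levels)  # plain sort compares the string values instead
```

The explicit rank matters because the members are strings: a plain `sorted()` orders them alphabetically (`critical`, `high`, `low`, `normal`), which is not the stakes order.
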
@@ -13,6 +13,7 @@
     Complexity,
     CoordinationTopology,
     Priority,
+    Stakes,
     TaskSource,
     TaskStatus,
     TaskStructure,
@@ -128,6 +129,14 @@ class Task(BaseModel):
         default=Complexity.MEDIUM,
         description="Task complexity estimate",
     )
+    stakes: Stakes = Field(
+        default=Stakes.NORMAL,
+        description=(
+            "How consequential this task is, driving stakes-aware model"
+            " routing (cheap model for low stakes, strong model plus"
+            " red-team for high/critical stakes)"
+        ),
+    )
     budget_limit: float = Field(
         default=0.0,
         ge=0.0,

@@ -83,6 +83,7 @@
     from synthorg.engine.plan_models import PlanExecuteConfig
     from synthorg.engine.prompt import SystemPrompt
     from synthorg.engine.recovery import RecoveryStrategy
+    from synthorg.engine.routing_policy.router import StakesRouter
     from synthorg.engine.session import EventReader
     from synthorg.engine.stagnation.protocol import StagnationDetector
     from synthorg.engine.task_engine import TaskEngine
@@ -200,6 +201,7 @@ def __init__(  # noqa: PLR0913, PLR0915
         interrupt_store: InterruptStore | None = None,
         approval_interrupt_timeout_seconds: float | None = None,
         external_api_runtime: ExternalApiRuntime | None = None,
+        stakes_router: StakesRouter | None = None,
         clock: Clock | None = None,
     ) -> None:
         self._agent_middleware_chain = agent_middleware_chain
@@ -238,6 +240,7 @@ def __init__(  # noqa: PLR0913, PLR0915
         # the agent's registry. ``None`` (mode DISABLED) is a no-op.
         self._mcp_self_consumer = mcp_self_consumer
         self._approval_interrupt_timeout_seconds = approval_interrupt_timeout_seconds
+        self._stakes_router = stakes_router
         self._stagnation_detector = stagnation_detector
         self._auto_loop_config = auto_loop_config
         self._hybrid_loop_config = hybrid_loop_config
@@ -352,6 +355,25 @@ async def coordinate(
             raise ExecutionStateError(msg)
         return await self._coordinator.coordinate(context)
 
+    async def _route_stakes(
+        self,
+        identity: AgentIdentity,
+        task: Task,
+    ) -> AgentIdentity:
+        """Apply stakes-aware routing, returning the adjusted identity.
+
+        Delegates to the injected :class:`StakesRouter` to pick a model
+        tier matched to ``task.stakes``. The red-team requirement carried
+        on the decision is consumed downstream by the review pipeline,
+        which derives it from the persisted ``task.stakes``; this method
+        only adjusts the model the subtask runs with.
+        """
+        assert self._stakes_router is not None  # noqa: S101  # caller checks
+        decision = await self._stakes_router.route(task=task, identity=identity)
+        if decision.selected_model == identity.model:
+            return identity
+        return identity.model_copy(update={"model": decision.selected_model})
+
     async def run(  # noqa: PLR0913, C901
         self,
         *,
@@ -402,6 +424,14 @@ async def run(  # noqa: PLR0913, C901
                     max_turns=max_turns,
                 )
 
+                # Stakes-aware routing runs BEFORE the budget block: it
+                # sets the target tier from the task's stakes, then the
+                # budget auto-downgrade below may lower it further when
+                # budget is tight (a hard ceiling must win over a stakes
+                # upgrade).
+                if self._stakes_router is not None:
+                    identity = await self._route_stakes(identity, task)
+
                 if self._budget_enforcer:
                     preflight = await self._budget_enforcer.check_can_execute(
                         agent_id,

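Review note: the ordering described in the `run()` comment above (stakes routing first, budget auto-downgrade second) can be illustrated with two plain functions. The tier names, costs, and both helpers are hypothetical. The real budget check goes through `check_can_execute`, whose downgrade logic is not shown in this diff.

```python
# Hypothetical tiers, ordered cheapest to most expensive.
TIERS = ("small", "medium", "large")
COST = {"small": 0.10, "medium": 0.50, "large": 2.00}


def apply_stakes(configured_tier, stakes_target):
    """Stakes routing: replace the configured tier when a target is given."""
    return configured_tier if stakes_target is None else stakes_target


def apply_budget(tier, remaining_budget):
    """Budget downgrade: step down until the tier fits the remaining budget."""
    idx = TIERS.index(tier)
    while idx > 0 and COST[TIERS[idx]] > remaining_budget:
        idx -= 1
    return TIERS[idx]


def select_tier(configured_tier, stakes_target, remaining_budget):
    # Stakes first, budget second, so a hard ceiling wins over an upgrade.
    return apply_budget(apply_stakes(configured_tier, stakes_target), remaining_budget)
```

With a generous budget the stakes upgrade survives; with a tight one the budget step pulls the tier back down, which is the behaviour the comment asks for.
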
@@ -12,6 +12,7 @@
 from synthorg.core.enums import (
     Complexity,
     CoordinationTopology,
+    Stakes,
     TaskStatus,
     TaskStructure,
 )
@@ -28,6 +29,7 @@ class SubtaskDefinition(BaseModel):
         description: Detailed subtask description.
         dependencies: IDs of other subtasks this one depends on.
         estimated_complexity: Complexity estimate for routing.
+        stakes: Stakes level for stakes-aware model routing.
         required_skills: Skill IDs needed for routing.
         required_tags: Tags needed for multi-faceted routing match.  When
             set, the routing scorer awards a small bonus to agents whose
@@ -49,6 +51,10 @@ class SubtaskDefinition(BaseModel):
         default=Complexity.MEDIUM,
         description="Complexity estimate for routing",
     )
+    stakes: Stakes = Field(
+        default=Stakes.NORMAL,
+        description="Stakes level for stakes-aware model routing",
+    )
     required_skills: tuple[NotBlankStr, ...] = Field(
         default=(),
         description="Skill IDs needed for routing",