feat: stakes-aware model routing (#1998) #2038

Merged
Aureliolo merged 8 commits into main from feat/1998-stakes-aware-model-routing on May 22, 2026

@Aureliolo
Owner

Summary

Adds stakes-aware model routing: a pluggable layer that re-tiers each task's model selection by how consequential the work is, so cheap models handle low-stakes subtasks and strong models (plus red-team review) handle high- and critical-stakes ones. Total cost drops versus flat downgrade-at-boundary routing, with no quality-floor regression.

  • Stakes assessment (engine/stakes/): StakesAssessor protocol + DefaultStakesAssessor (complexity base mapping, high/critical keyword signals, critical-priority elevation, fail-safe upward on unknown complexity) + config discriminator + factory. New Stakes enum (low/normal/high/critical) with a compare_stakes total order.
  • Routing policy (engine/routing_policy/): StakesRoutingStrategy protocol + StakesAwareStrategy (cheapest tier clearing the per-stakes QualityFloors, coordination-metrics nudge, red-team marking, never below the agent's configured tier) and FlatStrategy control arm + config discriminator + factory. Safe default is stakes_aware.
  • Wiring: assessed stakes stamped onto subtasks (decomposition) and LEAF tasks (pipeline); router built at boot in workers/runtime_builder and injected into AgentEngine, applied before the budget auto-downgrade (a hard budget ceiling still wins over a stakes upgrade). Red-team review task pinned to CRITICAL so its reviewer is never downgraded.
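As a rough illustration of the stakes assessment described above, the sketch below shows a four-level enum with a total order and a heuristic assessor. It is not the PR's code: the keyword lists, the complexity labels, the `assess` function, and the choice to lift critical-priority tasks to at least HIGH are all assumptions made for the example. Only the `Stakes` levels, the `compare_stakes` name, and the fail-safe-upward behaviour come from the summary.

```python
from enum import Enum


class Stakes(str, Enum):
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"
    CRITICAL = "critical"


# Rank table backing the total order.
_RANK = {Stakes.LOW: 0, Stakes.NORMAL: 1, Stakes.HIGH: 2, Stakes.CRITICAL: 3}


def compare_stakes(a: Stakes, b: Stakes) -> int:
    """Return -1, 0, or 1 when a ranks below, equal to, or above b."""
    return (_RANK[a] > _RANK[b]) - (_RANK[a] < _RANK[b])


def _at_least(current: Stakes, minimum: Stakes) -> Stakes:
    """Raise current to minimum if it ranks lower; never lower it."""
    return current if compare_stakes(current, minimum) >= 0 else minimum


# Hypothetical mappings and keyword signals, for illustration only.
_COMPLEXITY_BASE = {
    "simple": Stakes.LOW,
    "medium": Stakes.NORMAL,
    "complex": Stakes.HIGH,
}
_HIGH_KEYWORDS = ("migration", "payment")
_CRITICAL_KEYWORDS = ("production", "security")


def assess(description: str, complexity: str, priority: str) -> Stakes:
    # Fail-safe upward: an unknown complexity label maps to HIGH, not LOW.
    stakes = _COMPLEXITY_BASE.get(complexity, Stakes.HIGH)
    text = description.lower()
    if any(word in text for word in _HIGH_KEYWORDS):
        stakes = _at_least(stakes, Stakes.HIGH)
    if any(word in text for word in _CRITICAL_KEYWORDS):
        stakes = _at_least(stakes, Stakes.CRITICAL)
    # Assumed elevation rule: critical priority lifts the result to HIGH or above.
    if priority == "critical":
        stakes = _at_least(stakes, Stakes.HIGH)
    return stakes
```

Every signal in this sketch can only raise the assessed level, so a missing or unrecognised input never makes a task look safer than it is.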

Test plan

  • Unit: stakes assessment (complexity/keyword/priority, fail-safe), routing floor selection, red-team threshold, coordination nudge (incl. exact-threshold boundary), tier-ladder helpers, score==floor boundary, benchmark-unavailable fallback, factory dispatch, cost-monotonicity property test, acceptance comparison (cost drops at equal-or-better quality).
  • E2E: tests/e2e/test_stakes_routing_e2e.py drives the full work pipeline through the deterministic scripted simulation harness on a mixed-stakes brief and asserts the stakes-aware run accrues strictly less cost than the flat control arm (verified passing locally).
  • Full unit suite green (30610 passed); ruff, mypy strict, and all convention gates pass via pre-push.
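The floor-selection and cost-monotonicity checks listed in the test plan can be pictured with a small stand-alone sketch. The tier ladder, costs, quality scores, and floor values below are invented for the example, and falling back to the strongest tier when nothing clears the floor is an assumption rather than the PR's documented behaviour.

```python
from dataclasses import dataclass

STAKES_ORDER = ("low", "normal", "high", "critical")


@dataclass(frozen=True)
class Tier:
    name: str
    cost: float  # relative cost per task, made-up units
    quality: float  # benchmark score in [0, 1], made-up values


# Ladder ordered from cheapest to most expensive.
LADDER = (
    Tier("small", 1.0, 0.60),
    Tier("medium", 4.0, 0.80),
    Tier("large", 15.0, 0.95),
)

# Hypothetical per-stakes quality floors (non-decreasing by construction).
FLOORS = {"low": 0.50, "normal": 0.60, "high": 0.80, "critical": 0.90}


def pick_tier(stakes: str, configured_index: int = 0) -> Tier:
    """Cheapest tier at or above the configured one whose score clears the floor."""
    floor = FLOORS[stakes]
    for tier in LADDER[configured_index:]:
        if tier.quality >= floor:  # a score exactly equal to the floor clears it
            return tier
    # Assumed fallback: nothing clears the floor, so use the strongest tier.
    return LADDER[-1]


def is_cost_monotone() -> bool:
    """Higher stakes must never select a cheaper tier than lower stakes."""
    costs = [pick_tier(stakes).cost for stakes in STAKES_ORDER]
    return all(lower <= higher for lower, higher in zip(costs, costs[1:]))
```

With these sample numbers, `normal` and `high` both sit exactly on a tier's score, which exercises the score==floor boundary the unit tests mention.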

Review coverage

Pre-reviewed by 19 agents (code/python/security/conventions/logging/resilience/async/type-design/test-quality/docs/comment-rot/api-drift/issue-resolution + 5 audit mini-passes). 10 valid findings fixed; 8 false positives documented (notably the PEP 758 except A, B: "syntax error" disproved by passing ruff/mypy/tests, and api-contract-drift premised on stakes being on the wire when it is an internal engine concept).

closes #1998

Aureliolo added 6 commits May 22, 2026 07:16
NotBlankStr strategy discriminator, QualityFloors ordering validator, under-floor and benchmark-failure guards in the router, redaction-safe decomposition logging, stakes-assessed state log, ghost-wiring manifest entry, tier/boundary/fallback tests, e2e cost-drop simulation, and docs.
The new QualityFloors non-decreasing validator rejects polyfactory's independent random floor draws, mirroring the existing IntegrationsConfig pin in the same test.
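A non-decreasing validator of the kind these commits describe might look roughly like the following. The project uses Pydantic v2 models; this sketch substitutes a frozen standard-library dataclass so it runs without dependencies, and the field names and default values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityFloors:
    # Hypothetical field names and defaults, one floor per stakes level.
    low: float = 0.5
    normal: float = 0.6
    high: float = 0.8
    critical: float = 0.9

    def __post_init__(self) -> None:
        ordered = (self.low, self.normal, self.high, self.critical)
        # Reject any configuration where a higher stakes level has a lower floor.
        if any(a > b for a, b in zip(ordered, ordered[1:])):
            raise ValueError(
                f"quality floors must be non-decreasing, got {ordered}"
            )
```

Four independently drawn random floats are rarely in ascending order, which is consistent with the commit note that randomly generated floors are rejected and the test has to pin them.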
@github-actions
Contributor

github-actions Bot commented May 22, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 77f41268-3739-45f5-b3c0-e544a0471943

📥 Commits

Reviewing files that changed from the base of the PR and between da17866 and f7a101e.

📒 Files selected for processing (1)
  • web/src/stores/tasks.ts
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Deploy Preview
  • GitHub Check: Build Backend
  • GitHub Check: Lighthouse Site
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Dashboard Test
  • GitHub Check: Test E2E
  • GitHub Check: Test Unit
  • GitHub Check: Test Conformance (SQLite)
  • GitHub Check: Test Integration
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

  • web/src/stores/tasks.ts
web/src/stores/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/stores/**/*.ts: List reads (fetch*) must set error: string | null on the store instead of toasting
Test teardown (MANDATORY): any new store that schedules timers or attaches event listeners must expose an equivalent cleanup hook and register it in the global afterEach. The global afterEach in web/src/test-setup.tsx already calls useToastStore.getState().dismissAll(), cancelPendingPersist(), and useThemeStore.getState().teardown().

Files:

  • web/src/stores/tasks.ts
web/src/{api/endpoints,stores}/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Cursor pagination (MANDATORY): list endpoints must use opaque cursor-based paging via PaginationMeta. Stores must keep nextCursor + hasMore in state (not offset arithmetic) and early-return when !hasMore || !nextCursor. Display counts must come from data.length.

Files:

  • web/src/stores/tasks.ts
web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

  • web/src/stores/tasks.ts
web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse web/src/components/ui/ components and design tokens only per web Dashboard Design System in web/CLAUDE.md

Files:

  • web/src/stores/tasks.ts
🔇 Additional comments (1)
web/src/stores/tasks.ts (1)

10-19: LGTM!

Also applies to: 31-49


Walkthrough

This PR implements stakes-aware model routing and per-task/subtask stakes assessment. It adds a Stakes enum and Task.stakes, a stakes-assessor subsystem (config, protocol, heuristic, factory), and a pluggable routing-policy subsystem (config, tiers, strategies, router, factory). The engine is wired to apply stakes routing before budget auto-downgrade, review gates support red-team marking, observability events were added, comprehensive unit and e2e tests validate cost/quality properties, and frontend/web code was updated to carry the new stakes enum in payloads and fixtures.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a stakes-aware model routing system designed to optimize cost and quality by matching model tiers to the criticality of individual tasks. By assessing task stakes through complexity, keyword signals, and priority, the system ensures that low-stakes tasks utilize cheaper models while high-stakes tasks are handled by stronger models and flagged for adversarial red-team review. This approach provides a more nuanced routing mechanism that operates orthogonally to existing budget constraints.

Highlights

  • Stakes Assessment: Added a new StakesAssessor protocol and DefaultStakesAssessor implementation to classify task importance based on complexity, keyword signals, and priority.
  • Model Routing: Introduced StakesRoutingStrategy to dynamically select model tiers based on task stakes, ensuring low-stakes tasks use cheaper models while high-stakes tasks receive stronger models.
  • Red-Team Integration: Integrated high/critical stakes marking with the red-team review gate to ensure sensitive tasks receive appropriate adversarial review.
  • Configuration & Wiring: Added pluggable configuration for routing strategies and wired the new router into the AgentEngine execution pipeline, ensuring it runs before budget auto-downgrade.

@codspeed-hq

codspeed-hq Bot commented May 22, 2026

Merging this PR will not alter performance

✅ 54 untouched benchmarks


Comparing feat/1998-stakes-aware-model-routing (f7a101e) with main (5ffc545)

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/workers/runtime_builder.py`:
- Around line 347-372: The resolver is built from all providers which allows
routing to models owned by inactive providers; in _build_stakes_router_or_none
replace ModelResolver.from_config(app_state.config.providers) with a resolver
constructed only from the active provider configuration (use the active provider
name from app_state.config.names[0] or equivalent and pass only that provider's
entry), so ModelResolver only knows about the runtime provider, and then pass
that resolver into build_stakes_router (ensuring coordination_store and
benchmark_provider usage stays the same).

In `@tests/unit/engine/routing_policy/test_acceptance_comparison.py`:
- Around line 42-53: Annotate the module-level test constants as immutable by
importing Final from typing and declaring types with Final, e.g. change
_PROVIDER, _TIER_MODEL_IDS and _TIER_TOTAL_COST to _PROVIDER: Final[str],
_TIER_MODEL_IDS: Final[dict[ModelTier, str]] and _TIER_TOTAL_COST:
Final[dict[ModelTier, float]] so the intent of immutability is explicit; keep
the existing values and types (use the same ModelTier alias) and add the Final
import at the top of the test module.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: a50db7b3-5e24-4e8a-89d2-0d0476c28c75

📥 Commits

Reviewing files that changed from the base of the PR and between 5ffc545 and ac5cdc9.

📒 Files selected for processing (42)
  • docs/design/engine.md
  • docs/design/providers.md
  • docs/reference/pluggable-subsystems.md
  • scripts/_ghost_wiring_manifest.txt
  • src/synthorg/api/app.py
  • src/synthorg/config/defaults.py
  • src/synthorg/config/schema.py
  • src/synthorg/core/enums.py
  • src/synthorg/core/task.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/decomposition/models.py
  • src/synthorg/engine/decomposition/service.py
  • src/synthorg/engine/pipeline/service.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/engine/routing_policy/__init__.py
  • src/synthorg/engine/routing_policy/config.py
  • src/synthorg/engine/routing_policy/factory.py
  • src/synthorg/engine/routing_policy/models.py
  • src/synthorg/engine/routing_policy/protocol.py
  • src/synthorg/engine/routing_policy/router.py
  • src/synthorg/engine/routing_policy/strategies.py
  • src/synthorg/engine/routing_policy/tiers.py
  • src/synthorg/engine/stakes/__init__.py
  • src/synthorg/engine/stakes/config.py
  • src/synthorg/engine/stakes/factory.py
  • src/synthorg/engine/stakes/heuristic.py
  • src/synthorg/engine/stakes/protocol.py
  • src/synthorg/observability/events/stakes_routing.py
  • src/synthorg/security/redteam/runner.py
  • src/synthorg/workers/runtime_builder.py
  • tests/e2e/test_stakes_routing_e2e.py
  • tests/unit/config/test_schema.py
  • tests/unit/engine/routing_policy/test_acceptance_comparison.py
  • tests/unit/engine/routing_policy/test_cost_properties.py
  • tests/unit/engine/routing_policy/test_engine_integration.py
  • tests/unit/engine/routing_policy/test_strategies.py
  • tests/unit/engine/routing_policy/test_tiers.py
  • tests/unit/engine/stakes/test_assessor.py
  • tests/unit/engine/stakes/test_propagation.py
  • tests/unit/observability/test_events.py
  • web/src/api/types/enum-values.gen.ts
  • web/src/api/types/openapi.gen.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Build Backend
  • GitHub Check: Lighthouse Site
  • GitHub Check: Test Integration
  • GitHub Check: Dashboard Test
  • GitHub Check: Test Conformance (SQLite)
  • GitHub Check: Test E2E
  • GitHub Check: Test Unit
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (12)
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site

Files:

  • src/synthorg/engine/decomposition/models.py
  • src/synthorg/observability/events/stakes_routing.py
  • src/synthorg/engine/routing_policy/router.py
  • src/synthorg/engine/routing_policy/__init__.py
  • src/synthorg/api/app.py
  • src/synthorg/config/defaults.py
  • src/synthorg/engine/stakes/protocol.py
  • src/synthorg/engine/routing_policy/models.py
  • src/synthorg/core/enums.py
  • src/synthorg/engine/stakes/__init__.py
  • src/synthorg/engine/routing_policy/protocol.py
  • src/synthorg/core/task.py
  • src/synthorg/engine/routing_policy/config.py
  • src/synthorg/engine/stakes/factory.py
  • src/synthorg/engine/routing_policy/factory.py
  • src/synthorg/engine/routing_policy/tiers.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/config/schema.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/routing_policy/strategies.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/engine/decomposition/service.py
  • src/synthorg/engine/stakes/config.py
  • src/synthorg/engine/pipeline/service.py
  • src/synthorg/security/redteam/runner.py
  • src/synthorg/engine/stakes/heuristic.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: No hardcoded numerics; numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
No from __future__ import annotations (3.14 has PEP 649); PEP 758 except: except A, B: no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors: Error from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries)
Use @computed_field for derived fields; use NotBlankStr for identifiers in Pydantic models
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Use from synthorg.observability import get_logger; variable always logger; never import logging or print() in app code
Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSI...

Files:

  • src/synthorg/engine/decomposition/models.py
  • src/synthorg/observability/events/stakes_routing.py
  • src/synthorg/engine/routing_policy/router.py
  • src/synthorg/engine/routing_policy/__init__.py
  • src/synthorg/api/app.py
  • src/synthorg/config/defaults.py
  • src/synthorg/engine/stakes/protocol.py
  • src/synthorg/engine/routing_policy/models.py
  • src/synthorg/core/enums.py
  • src/synthorg/engine/stakes/__init__.py
  • src/synthorg/engine/routing_policy/protocol.py
  • src/synthorg/core/task.py
  • src/synthorg/engine/routing_policy/config.py
  • src/synthorg/engine/stakes/factory.py
  • src/synthorg/engine/routing_policy/factory.py
  • src/synthorg/engine/routing_policy/tiers.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/config/schema.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/routing_policy/strategies.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/engine/decomposition/service.py
  • src/synthorg/engine/stakes/config.py
  • src/synthorg/engine/pipeline/service.py
  • src/synthorg/security/redteam/runner.py
  • src/synthorg/engine/stakes/heuristic.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/engine/decomposition/models.py
  • src/synthorg/observability/events/stakes_routing.py
  • src/synthorg/engine/routing_policy/router.py
  • src/synthorg/engine/routing_policy/__init__.py
  • src/synthorg/api/app.py
  • src/synthorg/config/defaults.py
  • src/synthorg/engine/stakes/protocol.py
  • src/synthorg/engine/routing_policy/models.py
  • src/synthorg/core/enums.py
  • src/synthorg/engine/stakes/__init__.py
  • src/synthorg/engine/routing_policy/protocol.py
  • src/synthorg/core/task.py
  • src/synthorg/engine/routing_policy/config.py
  • src/synthorg/engine/stakes/factory.py
  • src/synthorg/engine/routing_policy/factory.py
  • src/synthorg/engine/routing_policy/tiers.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/config/schema.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/routing_policy/strategies.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/engine/decomposition/service.py
  • src/synthorg/engine/stakes/config.py
  • src/synthorg/engine/pipeline/service.py
  • src/synthorg/security/redteam/runner.py
  • src/synthorg/engine/stakes/heuristic.py
{src/**/*.py,tests/**/*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/

Files:

  • src/synthorg/engine/decomposition/models.py
  • src/synthorg/observability/events/stakes_routing.py
  • src/synthorg/engine/routing_policy/router.py
  • src/synthorg/engine/routing_policy/__init__.py
  • src/synthorg/api/app.py
  • tests/unit/engine/routing_policy/test_acceptance_comparison.py
  • tests/unit/observability/test_events.py
  • src/synthorg/config/defaults.py
  • src/synthorg/engine/stakes/protocol.py
  • tests/unit/engine/routing_policy/test_engine_integration.py
  • src/synthorg/engine/routing_policy/models.py
  • src/synthorg/core/enums.py
  • src/synthorg/engine/stakes/__init__.py
  • src/synthorg/engine/routing_policy/protocol.py
  • src/synthorg/core/task.py
  • tests/unit/engine/routing_policy/test_tiers.py
  • src/synthorg/engine/routing_policy/config.py
  • tests/unit/engine/stakes/test_assessor.py
  • src/synthorg/engine/stakes/factory.py
  • tests/unit/config/test_schema.py
  • src/synthorg/engine/routing_policy/factory.py
  • src/synthorg/engine/routing_policy/tiers.py
  • tests/unit/engine/stakes/test_propagation.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/config/schema.py
  • src/synthorg/engine/agent_engine.py
  • tests/unit/engine/routing_policy/test_strategies.py
  • src/synthorg/engine/routing_policy/strategies.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/engine/decomposition/service.py
  • tests/unit/engine/routing_policy/test_cost_properties.py
  • src/synthorg/engine/stakes/config.py
  • src/synthorg/engine/pipeline/service.py
  • src/synthorg/security/redteam/runner.py
  • tests/e2e/test_stakes_routing_e2e.py
  • src/synthorg/engine/stakes/heuristic.py
web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

  • web/src/api/types/enum-values.gen.ts
  • web/src/api/types/openapi.gen.ts
web/src/api/types/**/*.gen.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Generated DTO types (MANDATORY): NEVER hand-edit web/src/api/types/*.gen.ts. Regenerate with uv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').

Files:

  • web/src/api/types/enum-values.gen.ts
  • web/src/api/types/openapi.gen.ts
web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

  • web/src/api/types/enum-values.gen.ts
  • web/src/api/types/openapi.gen.ts
web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse web/src/components/ui/ components and design tokens only per web Dashboard Design System in web/CLAUDE.md

Files:

  • web/src/api/types/enum-values.gen.ts
  • web/src/api/types/openapi.gen.ts
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/api/**/*.py: Two-phase API startup: construction (create_app body) wires synchronous services; on_startup (_build_lifecycle.on_startup) wires services needing connected persistence backend
Construction-phase ordering: agent_registry BEFORE auto_wire_meetings; tunnel_provider unconditionally
On-startup ordering: SettingsService auto-wire before WorkflowExecutionObserver registration; OntologyService after persistence.connect(); cost-dial services via _try_wire_cost_dial AFTER persistence; knowledge substrate via _wire_knowledge_engine AFTER persistence, gated on has_persistence AND has_memory_backend
Pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value

Files:

  • src/synthorg/api/app.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back
Test doubles: FakeClock for Clock seam, mock_of[T] for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary forbidden (zero-tolerance, no baseline) per check_mock_spec.py
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large)

Files:

  • tests/unit/engine/routing_policy/test_acceptance_comparison.py
  • tests/unit/observability/test_events.py
  • tests/unit/engine/routing_policy/test_engine_integration.py
  • tests/unit/engine/routing_policy/test_tiers.py
  • tests/unit/engine/stakes/test_assessor.py
  • tests/unit/config/test_schema.py
  • tests/unit/engine/stakes/test_propagation.py
  • tests/unit/engine/routing_policy/test_strategies.py
  • tests/unit/engine/routing_policy/test_cost_properties.py
  • tests/e2e/test_stakes_routing_e2e.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/engine/routing_policy/test_acceptance_comparison.py
  • tests/unit/observability/test_events.py
  • tests/unit/engine/routing_policy/test_engine_integration.py
  • tests/unit/engine/routing_policy/test_tiers.py
  • tests/unit/engine/stakes/test_assessor.py
  • tests/unit/config/test_schema.py
  • tests/unit/engine/stakes/test_propagation.py
  • tests/unit/engine/routing_policy/test_strategies.py
  • tests/unit/engine/routing_policy/test_cost_properties.py
  • tests/e2e/test_stakes_routing_e2e.py
{README.md,docs/**/*.md,web/**/*.md}

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics in README and public docs sourced from data/runtime_stats.yaml via markers per data/README.md

Files:

  • docs/design/providers.md
  • docs/design/engine.md
  • docs/reference/pluggable-subsystems.md
docs/**/*.{md,d2,mmd}

📄 CodeRabbit inference engine (CLAUDE.md)

Use d2 for architecture / nested containers; mermaid for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200, D2 CLI pinned to v0.7.1 in CI

Files:

  • docs/design/providers.md
  • docs/design/engine.md
  • docs/reference/pluggable-subsystems.md
src/synthorg/workers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Runtime services: AgentEngine builds ONE provider-present switch returning RuntimeServices (AgentEngineExecutionService + coordinator OR NoProviderExecutionService + None); install_runtime_services appends FIRST; swap* hold locks

Files:

  • src/synthorg/workers/runtime_builder.py
🧠 Learnings (7)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/engine/decomposition/models.py
  • src/synthorg/observability/events/stakes_routing.py
  • src/synthorg/engine/routing_policy/router.py
  • src/synthorg/engine/routing_policy/__init__.py
  • src/synthorg/api/app.py
  • tests/unit/engine/routing_policy/test_acceptance_comparison.py
  • tests/unit/observability/test_events.py
  • src/synthorg/config/defaults.py
  • src/synthorg/engine/stakes/protocol.py
  • tests/unit/engine/routing_policy/test_engine_integration.py
  • src/synthorg/engine/routing_policy/models.py
  • src/synthorg/core/enums.py
  • src/synthorg/engine/stakes/__init__.py
  • src/synthorg/engine/routing_policy/protocol.py
  • src/synthorg/core/task.py
  • tests/unit/engine/routing_policy/test_tiers.py
  • src/synthorg/engine/routing_policy/config.py
  • tests/unit/engine/stakes/test_assessor.py
  • src/synthorg/engine/stakes/factory.py
  • tests/unit/config/test_schema.py
  • src/synthorg/engine/routing_policy/factory.py
  • src/synthorg/engine/routing_policy/tiers.py
  • tests/unit/engine/stakes/test_propagation.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/config/schema.py
  • src/synthorg/engine/agent_engine.py
  • tests/unit/engine/routing_policy/test_strategies.py
  • src/synthorg/engine/routing_policy/strategies.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/engine/decomposition/service.py
  • tests/unit/engine/routing_policy/test_cost_properties.py
  • src/synthorg/engine/stakes/config.py
  • src/synthorg/engine/pipeline/service.py
  • src/synthorg/security/redteam/runner.py
  • tests/e2e/test_stakes_routing_e2e.py
  • src/synthorg/engine/stakes/heuristic.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

  • src/synthorg/engine/decomposition/models.py
  • src/synthorg/observability/events/stakes_routing.py
  • src/synthorg/engine/routing_policy/router.py
  • src/synthorg/engine/routing_policy/__init__.py
  • src/synthorg/api/app.py
  • tests/unit/engine/routing_policy/test_acceptance_comparison.py
  • tests/unit/observability/test_events.py
  • src/synthorg/config/defaults.py
  • src/synthorg/engine/stakes/protocol.py
  • tests/unit/engine/routing_policy/test_engine_integration.py
  • src/synthorg/engine/routing_policy/models.py
  • src/synthorg/core/enums.py
  • src/synthorg/engine/stakes/__init__.py
  • src/synthorg/engine/routing_policy/protocol.py
  • src/synthorg/core/task.py
  • tests/unit/engine/routing_policy/test_tiers.py
  • src/synthorg/engine/routing_policy/config.py
  • tests/unit/engine/stakes/test_assessor.py
  • src/synthorg/engine/stakes/factory.py
  • tests/unit/config/test_schema.py
  • src/synthorg/engine/routing_policy/factory.py
  • src/synthorg/engine/routing_policy/tiers.py
  • tests/unit/engine/stakes/test_propagation.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/config/schema.py
  • src/synthorg/engine/agent_engine.py
  • tests/unit/engine/routing_policy/test_strategies.py
  • src/synthorg/engine/routing_policy/strategies.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/engine/decomposition/service.py
  • tests/unit/engine/routing_policy/test_cost_properties.py
  • src/synthorg/engine/stakes/config.py
  • src/synthorg/engine/pipeline/service.py
  • src/synthorg/security/redteam/runner.py
  • tests/e2e/test_stakes_routing_e2e.py
  • src/synthorg/engine/stakes/heuristic.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

  • src/synthorg/engine/decomposition/models.py
  • src/synthorg/observability/events/stakes_routing.py
  • src/synthorg/engine/routing_policy/router.py
  • src/synthorg/engine/routing_policy/__init__.py
  • src/synthorg/api/app.py
  • src/synthorg/config/defaults.py
  • src/synthorg/engine/stakes/protocol.py
  • src/synthorg/engine/routing_policy/models.py
  • src/synthorg/core/enums.py
  • src/synthorg/engine/stakes/__init__.py
  • src/synthorg/engine/routing_policy/protocol.py
  • src/synthorg/core/task.py
  • src/synthorg/engine/routing_policy/config.py
  • src/synthorg/engine/stakes/factory.py
  • src/synthorg/engine/routing_policy/factory.py
  • src/synthorg/engine/routing_policy/tiers.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/config/schema.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/routing_policy/strategies.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/engine/decomposition/service.py
  • src/synthorg/engine/stakes/config.py
  • src/synthorg/engine/pipeline/service.py
  • src/synthorg/security/redteam/runner.py
  • src/synthorg/engine/stakes/heuristic.py
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

  • docs/design/providers.md
  • docs/design/engine.md
  • docs/reference/pluggable-subsystems.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).
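The two behaviours described above (skip fenced blocks, flag only digits next to the stat nouns) can be sketched as follows. This is an illustrative approximation, not the code of `check_doc_numeric_macros.py`: the regular expression, the function name, and the reading of "adjacent" as "number immediately followed by the noun" are assumptions.

```python
import re

STAT_NOUNS = ("tests", "providers", "agents", "stars", "releases")
FENCE = "`" * 3  # fence marker built at runtime to keep this sketch fence-safe

# Assumed meaning of "adjacent": a number (with optional separators and a
# trailing "+") directly followed by whitespace and one of the stat nouns.
_CLAIM = re.compile(r"\b\d[\d,.]*\+?\s+(?:" + "|".join(STAT_NOUNS) + r")\b")


def find_numeric_claims(markdown: str) -> list[str]:
    """Return stat-noun numeric claims found outside fenced code blocks."""
    hits: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on every opening/closing fence
            continue
        if in_fence:
            continue  # fenced content is skipped entirely
        hits.extend(m.group(0) for m in _CLAIM.finditer(line))
    return hits


doc = "\n".join(
    [
        "We run 33,000+ tests on Python 3.14.",
        FENCE + "bash",
        "run --num-workers=4 agents",
        FENCE,
        "Supports 12 providers.",
    ]
)
print(find_numeric_claims(doc))  # ['33,000+ tests', '12 providers']
```

The version number `3.14` and the fenced `--num-workers=4 agents` line produce no hits, which is why neither should be raised in review.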

Applied to files:

  • docs/design/providers.md
  • docs/design/engine.md
  • docs/reference/pluggable-subsystems.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

  • docs/design/providers.md
  • docs/design/engine.md
  • docs/reference/pluggable-subsystems.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

  • docs/design/providers.md
  • docs/design/engine.md
  • docs/reference/pluggable-subsystems.md
🔇 Additional comments (40)
src/synthorg/engine/decomposition/models.py (1)

15-15: LGTM!

Also applies to: 32-32, 54-57

src/synthorg/observability/events/stakes_routing.py (1)

1-11: LGTM!

src/synthorg/engine/routing_policy/router.py (1)

20-60: LGTM!

src/synthorg/engine/routing_policy/__init__.py (1)

1-41: LGTM!

src/synthorg/api/app.py (1)

1293-1303: LGTM!

tests/unit/engine/routing_policy/test_acceptance_comparison.py (1)

97-131: ⚡ Quick win

Test determinism relies on implicit stakes assessment heuristics.

The test docstring claims "deterministic simulation" (line 3), but _mixed_plan() does not explicitly set stakes= on subtasks. Instead, the test relies on DecompositionService assessing stakes based on estimated_complexity and description keywords ("architecture", "production", "irreversible").

While this exercises the integrated assessment logic (which appears intentional for an acceptance test), it creates fragility: if the assessment keywords or complexity-to-stakes mapping changes, assertions at lines 224, 229, and 234 will fail.

Consider whether explicit stakes assignment would improve test maintainability:

SubtaskDefinition(
    id="st-arch",
    title="Design the sharding architecture",
    description="Make the core architecture decision for sharding",
    estimated_complexity=Complexity.COMPLEX,
    stakes=Stakes.HIGH,  # Explicit for determinism
),

Alternatively, if testing the integrated assessment is the intent, add a comment documenting the expected assessment behavior to make the dependency explicit.

Also applies to: 224-224, 229-229, 234-234

tests/unit/observability/test_events.py (1)

274-274: LGTM!

docs/design/providers.md (1)

199-212: LGTM!

docs/design/engine.md (1)

133-133: LGTM!

src/synthorg/config/defaults.py (1)

25-25: LGTM!

src/synthorg/engine/stakes/protocol.py (1)

1-27: LGTM!

tests/unit/engine/routing_policy/test_engine_integration.py (1)

1-115: LGTM!

src/synthorg/engine/routing_policy/models.py (1)

10-35: LGTM!

src/synthorg/core/enums.py (1)

359-412: LGTM!

src/synthorg/engine/stakes/__init__.py (1)

1-26: LGTM!

src/synthorg/engine/routing_policy/protocol.py (1)

1-26: LGTM!

src/synthorg/core/task.py (1)

16-17: LGTM!

Also applies to: 132-139

docs/reference/pluggable-subsystems.md (1)

179-192: LGTM!

tests/unit/engine/routing_policy/test_tiers.py (1)

1-68: LGTM!

src/synthorg/engine/routing_policy/config.py (1)

1-132: LGTM!

tests/unit/engine/stakes/test_assessor.py (1)

1-206: LGTM!

src/synthorg/engine/stakes/factory.py (1)

1-43: LGTM!

web/src/api/types/openapi.gen.ts (2)

12010-12023: LGTM!


12175-12175: LGTM!

tests/unit/config/test_schema.py (1)

599-608: LGTM!

src/synthorg/engine/routing_policy/factory.py (1)

1-96: LGTM!

src/synthorg/engine/routing_policy/tiers.py (1)

1-37: LGTM!

tests/unit/engine/stakes/test_propagation.py (1)

1-108: LGTM!

src/synthorg/config/schema.py (1)

30-30: LGTM!

Also applies to: 390-391, 478-481

src/synthorg/engine/agent_engine.py (1)

86-86: LGTM!

Also applies to: 204-204, 243-243, 358-376, 427-434

tests/unit/engine/routing_policy/test_strategies.py (1)

1-402: LGTM!

src/synthorg/engine/routing_policy/strategies.py (1)

1-277: LGTM!

src/synthorg/engine/review_gate.py (1)

110-121: LGTM!

src/synthorg/engine/decomposition/service.py (1)

17-18: LGTM!

Also applies to: 31-31, 43-43, 49-49, 53-53, 86-95, 126-136, 153-153

tests/unit/engine/routing_policy/test_cost_properties.py (1)

1-93: LGTM!

src/synthorg/engine/stakes/config.py (1)

1-137: LGTM!

src/synthorg/engine/pipeline/service.py (1)

34-34: LGTM!

Also applies to: 44-44, 54-54, 97-97, 114-114, 125-125, 251-278

src/synthorg/security/redteam/runner.py (1)

16-16: LGTM!

Also applies to: 129-132

tests/e2e/test_stakes_routing_e2e.py (1)

1-400: LGTM!

src/synthorg/engine/stakes/heuristic.py (1)

1-92: LGTM!

Comment thread src/synthorg/workers/runtime_builder.py Outdated
Comment thread tests/unit/engine/routing_policy/test_acceptance_comparison.py Outdated
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a stakes-aware model routing system that classifies tasks into stakes levels (LOW to CRITICAL) to optimize model selection and cost. It includes a heuristic assessor that evaluates tasks based on complexity and keywords, a routing strategy that utilizes benchmark quality floors and coordination metrics, and integration into the core agent engine. Feedback from the review focused on clarifying documentation regarding tier downgrade logic, adopting more robust Pydantic patterns like model_copy, and adding defensive checks for coordination metrics to prevent potential runtime errors.

Comment thread docs/design/providers.md Outdated
Comment on lines +207 to +208
unhealthy, marks high/critical work for the red-team gate, and never downgrades
below the agent's configured tier. It is config-selectable via

medium

The documentation states that the strategy "never downgrades below the agent's configured tier," but the implementation in StakesAwareStrategy only enforces this rule when red_team_required is true (i.e., for high or critical stakes). For low or normal stakes, the strategy intentionally allows downgrading to cheaper models to achieve the cost-saving goals described in the PR summary. Please clarify this in the documentation to avoid confusion.
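The distinction the comment draws can be sketched as below. This is a simplified model of the behaviour described in the comment, not the code of `StakesAwareStrategy`: the `Tier` ladder, the `select_tier` function, and its parameters are hypothetical, and only the `Stakes` levels come from the PR summary.

```python
from enum import IntEnum


class Stakes(IntEnum):
    LOW = 0
    NORMAL = 1
    HIGH = 2
    CRITICAL = 3


class Tier(IntEnum):
    """Hypothetical tier ladder, cheapest first."""

    SMALL = 0
    MEDIUM = 1
    LARGE = 2


def select_tier(stakes: Stakes, cheapest_clearing_floor: Tier, configured: Tier) -> Tier:
    """Pick a tier given the cheapest tier that clears the quality floor."""
    red_team_required = stakes >= Stakes.HIGH
    if red_team_required:
        # High/critical work is floored at the agent's configured tier.
        return max(cheapest_clearing_floor, configured)
    # Low/normal work may drop below the configured tier to save cost.
    return cheapest_clearing_floor


print(select_tier(Stakes.LOW, Tier.SMALL, Tier.LARGE).name)       # SMALL
print(select_tier(Stakes.CRITICAL, Tier.SMALL, Tier.LARGE).name)  # LARGE
```

Under this reading, "never downgrades below the agent's configured tier" holds only for the second call, which is the nuance the documentation should state.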

Comment on lines +164 to +171
selected_model = ModelConfig(
    provider=resolved.provider_name,
    model_id=resolved.model_id,
    temperature=current.temperature,
    max_tokens=current.max_tokens,
    fallback_model=current.fallback_model,
    model_tier=target_tier,
)

medium

Instead of manually constructing a new ModelConfig instance, consider using model_copy(update=...). This is more robust against future changes to the ModelConfig schema, ensuring that any additional fields (e.g., stop sequences or other provider-specific settings) are preserved from the original configuration.

            selected_model = current.model_copy(
                update={
                    "provider": resolved.provider_name,
                    "model_id": resolved.model_id,
                    "model_tier": target_tier,
                }
            )
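Why copy-with-update is more robust than manual reconstruction can be shown with a standard-library analogue. The sketch below uses `dataclasses.replace` rather than Pydantic's `model_copy(update=...)`, so it is an analogy and not the project's code; `ModelConfigSketch` and its `stop_sequences` field are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelConfigSketch:
    """Hypothetical stand-in for ModelConfig, for illustration only."""

    provider: str
    model_id: str
    model_tier: str
    temperature: float = 0.7
    stop_sequences: tuple[str, ...] = ()  # imagine this field was added later


def rebuild_by_hand(cfg: ModelConfigSketch, model_id: str, tier: str) -> ModelConfigSketch:
    # Written before stop_sequences existed, so it silently drops that field.
    return ModelConfigSketch(
        provider=cfg.provider,
        model_id=model_id,
        model_tier=tier,
        temperature=cfg.temperature,
    )


def copy_with_update(cfg: ModelConfigSketch, model_id: str, tier: str) -> ModelConfigSketch:
    # Only the named fields change; everything else is carried over.
    return replace(cfg, model_id=model_id, model_tier=tier)


current = ModelConfigSketch(
    "example-provider", "example-large-001", "large", stop_sequences=("END",)
)
print(rebuild_by_hand(current, "example-small-001", "small").stop_sequences)   # ()
print(copy_with_update(current, "example-small-001", "small").stop_sequences)  # ('END',)
```

The hand-written rebuild loses the newer field without raising any error, which is the failure mode the suggestion guards against.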

)
for rec in records:
    amp = rec.metrics.error_amplification
    if (

medium

Add a defensive check for rec.metrics before accessing its attributes. While the metrics are expected to be present in a valid record, a null check prevents a potential AttributeError if a record is malformed or incomplete.

            if rec.metrics is None:
                continue
            amp = rec.metrics.error_amplification

@codecov

codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 91.42012% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.97%. Comparing base (5ffc545) to head (f7a101e).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/synthorg/engine/routing_policy/strategies.py | 85.86% | 8 Missing and 5 partials ⚠️ |
| src/synthorg/core/enums.py | 68.75% | 4 Missing and 1 partial ⚠️ |
| src/synthorg/engine/stakes/config.py | 85.18% | 3 Missing and 1 partial ⚠️ |
| src/synthorg/engine/routing_policy/config.py | 91.42% | 2 Missing and 1 partial ⚠️ |
| src/synthorg/api/app.py | 0.00% | 1 Missing and 1 partial ⚠️ |
| src/synthorg/engine/decomposition/service.py | 90.00% | 1 Missing ⚠️ |
| src/synthorg/engine/review_gate.py | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2038      +/-   ##
==========================================
+ Coverage   84.95%   84.97%   +0.01%     
==========================================
  Files        2125     2139      +14     
  Lines      124801   125133     +332     
  Branches    10433    10465      +32     
==========================================
+ Hits       106030   106332     +302     
- Misses      16148    16170      +22     
- Partials     2623     2631       +8     

… feedback

CI failures:
- Add required 'stakes' field to all web Task fixtures/stores/stories/mocks
  (dashboard type-check, build, melange, and lighthouse failed because stakes
  was generated as required on the Task DTO but the hand-written TS fixtures
  were never updated). Re-export STAKES_VALUES/Stakes from the enums barrel
  and validate stakes in the WS task-frame guard like other behavioural enums.
- Regenerate data/runtime_stats.yaml (tests bucket 32,000+ to 33,000+) and
  re-inject the RS markers in README.md and docs/roadmap/index.md.

Reviewer feedback:
- runtime_builder: scope the stakes-router ModelResolver to the single active
  provider (CodeRabbit) so a tier can never resolve to an inactive provider
  model and execute with the wrong client.
- strategies: use model_copy(update=) instead of reconstructing ModelConfig
  (Gemini).
- providers.md: clarify only high/critical work is floored at the configured
  tier; low/normal may downgrade to save cost (Gemini).
- test_acceptance_comparison: Final annotations on module constants plus a note
  documenting the integrated stakes-assessment dependency (CodeRabbit).

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@web/src/stores/tasks.ts`:
- Around line 65-70: STAKES_SET is hard-coded with literal stake strings and
must be derived from the canonical enum/tuple to avoid drift; replace the
literal array in the STAKES_SET initializer with a runtime derivation from the
generated canonical tuple/enum (e.g., map or Object.values of the generated
STAKES/Stakes tuple/enum) so the set is built from the single source-of-truth
and keep the ReadonlySet<string> typing and the satisfies readonly Stakes[]
check in place.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: d290f104-7dfe-44ad-b470-ff18549aac74

📥 Commits

Reviewing files that changed from the base of the PR and between ac5cdc9 and da17866.

📒 Files selected for processing (25)
  • README.md
  • data/runtime_stats.yaml
  • docs/design/providers.md
  • docs/roadmap/index.md
  • src/synthorg/engine/routing_policy/strategies.py
  • src/synthorg/workers/runtime_builder.py
  • tests/unit/engine/routing_policy/test_acceptance_comparison.py
  • web/src/__tests__/helpers/factories.ts
  • web/src/__tests__/pages/TaskDetailPage.test.tsx
  • web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
  • web/src/__tests__/stores/agents.test.ts
  • web/src/__tests__/stores/tasks.test.ts
  • web/src/__tests__/utils/tasks.property.test.ts
  • web/src/api/types/enums.ts
  • web/src/mocks/handlers/tasks.ts
  • web/src/pages/agents/TaskHistory.stories.tsx
  • web/src/pages/tasks/TaskCard.stories.tsx
  • web/src/pages/tasks/TaskColumn.stories.tsx
  • web/src/pages/tasks/TaskDetailActions.stories.tsx
  • web/src/pages/tasks/TaskDetailHeader.stories.tsx
  • web/src/pages/tasks/TaskDetailMetadata.stories.tsx
  • web/src/pages/tasks/TaskDetailPanel.stories.tsx
  • web/src/pages/tasks/TaskDetailTimeline.stories.tsx
  • web/src/pages/tasks/TaskListView.stories.tsx
  • web/src/stores/tasks.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Deploy Preview
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Lighthouse Site
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Test E2E
  • GitHub Check: Dashboard Test
  • GitHub Check: Test Conformance (SQLite)
  • GitHub Check: Test Integration
  • GitHub Check: Test Unit
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (16)
{README.md,docs/**/*.md,web/**/*.md}

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics in README and public docs sourced from data/runtime_stats.yaml via markers per data/README.md

Files:

  • docs/roadmap/index.md
  • README.md
  • docs/design/providers.md
docs/**/*.{md,d2,mmd}

📄 CodeRabbit inference engine (CLAUDE.md)

Use d2 for architecture / nested containers; mermaid for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200, D2 CLI pinned to v0.7.1 in CI

Files:

  • docs/roadmap/index.md
  • docs/design/providers.md
web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

  • web/src/__tests__/stores/agents.test.ts
  • web/src/__tests__/stores/tasks.test.ts
  • web/src/__tests__/pages/TaskDetailPage.test.tsx
  • web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
  • web/src/pages/tasks/TaskDetailActions.stories.tsx
  • web/src/api/types/enums.ts
  • web/src/__tests__/helpers/factories.ts
  • web/src/pages/tasks/TaskColumn.stories.tsx
  • web/src/pages/tasks/TaskDetailHeader.stories.tsx
  • web/src/__tests__/utils/tasks.property.test.ts
  • web/src/pages/tasks/TaskDetailMetadata.stories.tsx
  • web/src/pages/tasks/TaskDetailPanel.stories.tsx
  • web/src/pages/tasks/TaskListView.stories.tsx
  • web/src/pages/tasks/TaskCard.stories.tsx
  • web/src/pages/tasks/TaskDetailTimeline.stories.tsx
  • web/src/pages/agents/TaskHistory.stories.tsx
  • web/src/mocks/handlers/tasks.ts
  • web/src/stores/tasks.ts
web/src/{stores,**/*.test.{ts,tsx}}

📄 CodeRabbit inference engine (web/CLAUDE.md)

Active-handle gate (MANDATORY): every unit test runs under web/test-infra/active-handle-tracker.ts, which fails any test that leaks an event-loop-holding resource. A new store that schedules timers / attaches listeners MUST expose a teardown hook and register it in the global afterEach; otherwise the gate fails the first test that triggers the schedule.

Files:

  • web/src/__tests__/stores/agents.test.ts
  • web/src/__tests__/stores/tasks.test.ts
  • web/src/__tests__/pages/TaskDetailPage.test.tsx
  • web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
  • web/src/__tests__/utils/tasks.property.test.ts
web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

  • web/src/__tests__/stores/agents.test.ts
  • web/src/__tests__/stores/tasks.test.ts
  • web/src/__tests__/pages/TaskDetailPage.test.tsx
  • web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
  • web/src/pages/tasks/TaskDetailActions.stories.tsx
  • web/src/api/types/enums.ts
  • web/src/__tests__/helpers/factories.ts
  • web/src/pages/tasks/TaskColumn.stories.tsx
  • web/src/pages/tasks/TaskDetailHeader.stories.tsx
  • web/src/__tests__/utils/tasks.property.test.ts
  • web/src/pages/tasks/TaskDetailMetadata.stories.tsx
  • web/src/pages/tasks/TaskDetailPanel.stories.tsx
  • web/src/pages/tasks/TaskListView.stories.tsx
  • web/src/pages/tasks/TaskCard.stories.tsx
  • web/src/pages/tasks/TaskDetailTimeline.stories.tsx
  • web/src/pages/agents/TaskHistory.stories.tsx
  • web/src/mocks/handlers/tasks.ts
  • web/src/stores/tasks.ts
web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse web/src/components/ui/ components and design tokens only per web Dashboard Design System in web/CLAUDE.md

Files:

  • web/src/__tests__/stores/agents.test.ts
  • web/src/__tests__/stores/tasks.test.ts
  • web/src/__tests__/pages/TaskDetailPage.test.tsx
  • web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
  • web/src/pages/tasks/TaskDetailActions.stories.tsx
  • web/src/api/types/enums.ts
  • web/src/__tests__/helpers/factories.ts
  • web/src/pages/tasks/TaskColumn.stories.tsx
  • web/src/pages/tasks/TaskDetailHeader.stories.tsx
  • web/src/__tests__/utils/tasks.property.test.ts
  • web/src/pages/tasks/TaskDetailMetadata.stories.tsx
  • web/src/pages/tasks/TaskDetailPanel.stories.tsx
  • web/src/pages/tasks/TaskListView.stories.tsx
  • web/src/pages/tasks/TaskCard.stories.tsx
  • web/src/pages/tasks/TaskDetailTimeline.stories.tsx
  • web/src/pages/agents/TaskHistory.stories.tsx
  • web/src/mocks/handlers/tasks.ts
  • web/src/stores/tasks.ts
web/src/**/*.{jsx,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{jsx,tsx}: Use @eslint-react/no-leaked-conditional-rendering to catch the {count && <Foo />} bug where 0 renders verbatim. For ReactNode | undefined props use {value != null && value !== false && <jsx>}; for compound truthiness use Boolean(...).
Use @eslint-react/globals to restrict window / document / localStorage / etc. inside render. Hoist offenders into a useCallback event handler, a useEffect, or a useSyncExternalStore-backed hook.

Files:

  • web/src/__tests__/pages/TaskDetailPage.test.tsx
  • web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
  • web/src/pages/tasks/TaskDetailActions.stories.tsx
  • web/src/pages/tasks/TaskColumn.stories.tsx
  • web/src/pages/tasks/TaskDetailHeader.stories.tsx
  • web/src/pages/tasks/TaskDetailMetadata.stories.tsx
  • web/src/pages/tasks/TaskDetailPanel.stories.tsx
  • web/src/pages/tasks/TaskListView.stories.tsx
  • web/src/pages/tasks/TaskCard.stories.tsx
  • web/src/pages/tasks/TaskDetailTimeline.stories.tsx
  • web/src/pages/agents/TaskHistory.stories.tsx
web/src/**/*.stories.{ts,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

Storybook 10 is ESM-only; essentials are built into core, but @storybook/addon-docs is now separate; imports moved to storybook/test and storybook/actions

Files:

  • web/src/pages/tasks/TaskDetailActions.stories.tsx
  • web/src/pages/tasks/TaskColumn.stories.tsx
  • web/src/pages/tasks/TaskDetailHeader.stories.tsx
  • web/src/pages/tasks/TaskDetailMetadata.stories.tsx
  • web/src/pages/tasks/TaskDetailPanel.stories.tsx
  • web/src/pages/tasks/TaskListView.stories.tsx
  • web/src/pages/tasks/TaskCard.stories.tsx
  • web/src/pages/tasks/TaskDetailTimeline.stories.tsx
  • web/src/pages/agents/TaskHistory.stories.tsx
web/src/mocks/handlers/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/mocks/handlers/**/*.ts: MSW handlers (MANDATORY): web/src/mocks/handlers/ must mirror web/src/api/endpoints/*.ts 1:1 with a default happy-path handler for every exported endpoint. Use onUnhandledRequest: 'error' in test setup; tests override per-case via server.use(...), never vi.mock('@/api/endpoints/*').
Use typed envelope helpers (successFor, paginatedFor, voidSuccess) to keep MSW handlers in lockstep with endpoint return types

Files:

  • web/src/mocks/handlers/tasks.ts
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site

Files:

  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/engine/routing_policy/strategies.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: No hardcoded numerics; numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
No from __future__ import annotations (3.14 has PEP 649); PEP 758 except: except A, B: no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors: Error from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries)
Use @computed_field for derived fields; use NotBlankStr for identifiers in Pydantic models
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Use from synthorg.observability import get_logger; variable always logger; never import logging or print() in app code
Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSI...

Files:

  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/engine/routing_policy/strategies.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/engine/routing_policy/strategies.py
src/synthorg/workers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Runtime services: AgentEngine builds ONE provider-present switch returning RuntimeServices (AgentEngineExecutionService + coordinator OR NoProviderExecutionService + None); install_runtime_services appends FIRST; swap* hold locks

Files:

  • src/synthorg/workers/runtime_builder.py
{src/**/*.py,tests/**/*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/

Files:

  • src/synthorg/workers/runtime_builder.py
  • tests/unit/engine/routing_policy/test_acceptance_comparison.py
  • src/synthorg/engine/routing_policy/strategies.py
web/src/stores/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/stores/**/*.ts: List reads (fetch*) must set error: string | null on the store instead of toasting
Test teardown (MANDATORY): any new store that schedules timers or attaches event listeners must expose an equivalent cleanup hook and register it in the global afterEach. The global afterEach in web/src/test-setup.tsx already calls useToastStore.getState().dismissAll(), cancelPendingPersist(), and useThemeStore.getState().teardown().

Files:

  • web/src/stores/tasks.ts
web/src/{api/endpoints,stores}/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Cursor pagination (MANDATORY): list endpoints must use opaque cursor-based paging via PaginationMeta. Stores must keep nextCursor + hasMore in state (not offset arithmetic) and early-return when !hasMore || !nextCursor. Display counts must come from data.length.

Files:

  • web/src/stores/tasks.ts
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back
Test doubles: FakeClock for Clock seam, mock_of[T] for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary forbidden (zero-tolerance, no baseline) per check_mock_spec.py
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large)

Files:

  • tests/unit/engine/routing_policy/test_acceptance_comparison.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/engine/routing_policy/test_acceptance_comparison.py
🧠 Learnings (10)
📚 Learning: 2026-05-16T18:36:19.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/guides/contributing.md:95-95
Timestamp: 2026-05-16T18:36:19.195Z
Learning: In the SynthOrg repo, the “Doc Numeric Claims (MANDATORY)” RS-marker rule should be applied only to these docs: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. This rule is enforced by scripts/check_doc_numeric_macros.py (with runtime substitution by scripts/inject_runtime_stats.py), so reviewers should not flag similar numeric-claim issues in other paths (e.g., anything under docs/guides/). When checking those scoped files, the rule skips fenced code blocks and only flags digits that are adjacent to stat nouns (tests/providers/agents/stars/releases). Numeric CLI flags like “--num-workers=4” inside fenced bash code blocks are not subject to this rule.

Applied to files:

  • docs/roadmap/index.md
  • README.md
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, follow the `Doc Numeric Claims (MANDATORY)` rule enforced by `scripts/check_doc_numeric_macros.py` only for these markdown files: `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`. The gate flags digits that appear adjacent to the stat nouns `tests`, `providers`, `agents`, `stars`, and `releases`—those numeric claims must use the required `<!--RS:...-->` macro format. Do not apply this rule to prose that mentions Python version numbers (e.g., “Python 3.14” / “Python 3.15”); those should not be flagged as requiring `<!--RS:...-->`.

Applied to files:

  • docs/roadmap/index.md
  • README.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: In the synthorg repo, the “Doc Numeric Claims (MANDATORY)” RS-marker rule is enforced only for this exact set of Markdown files: README.md, docs/index.md, docs/roadmap/index.md, docs/architecture/decisions.md, and docs/reference/convention-gates.md. During code reviews, do not raise RS-marker/numeric-claims findings for numeric values in any other files (e.g., docs/getting_started.md, docs/guides/*, docs/reference/conventions.md), since they are not checked or injected by scripts/check_doc_numeric_macros.py or scripts/inject_runtime_stats.py.

Applied to files:

  • docs/roadmap/index.md
  • README.md
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

  • docs/roadmap/index.md
  • README.md
  • docs/design/providers.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

  • docs/roadmap/index.md
  • README.md
  • docs/design/providers.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

  • docs/roadmap/index.md
  • README.md
  • docs/design/providers.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

  • docs/roadmap/index.md
  • docs/design/providers.md
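
The gate behavior these learnings describe (skip fenced code blocks, flag only digits next to the listed stat nouns) can be sketched as follows. This is not the code of scripts/check_doc_numeric_macros.py: `find_numeric_claims`, the regex, and the reading of "adjacent" as "number followed by whitespace and the noun" are assumptions, and the `<!--RS:...-->` macro handling is left out because its exact syntax is not shown here.

```python
import re

# Stat nouns named in the learnings above.
STAT_NOUNS = ("tests", "providers", "agents", "stars", "releases")

# Assumed meaning of "adjacent": a number, whitespace, then a stat noun.
_CLAIM = re.compile(
    r"\b\d[\d,.]*\+?\s+(?:" + "|".join(STAT_NOUNS) + r")\b"
)


def find_numeric_claims(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, matched text) for unfenced numeric claims."""
    hits: list[tuple[int, str]] = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            # Fence delimiters toggle the skip state and are never scanned.
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        for match in _CLAIM.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

With this reading, `--num-workers=4` inside a fenced bash block and a prose mention of Python 3.14 both pass, while an unfenced "5 tests" is reported.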
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/workers/runtime_builder.py
  • tests/unit/engine/routing_policy/test_acceptance_comparison.py
  • src/synthorg/engine/routing_policy/strategies.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

  • src/synthorg/workers/runtime_builder.py
  • tests/unit/engine/routing_policy/test_acceptance_comparison.py
  • src/synthorg/engine/routing_policy/strategies.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/engine/routing_policy/strategies.py
🔇 Additional comments (29)
docs/roadmap/index.md (1)

6-7: LGTM!

data/runtime_stats.yaml (1)

2-3: LGTM!

Also applies to: 6-8, 10-10

web/src/__tests__/stores/agents.test.ts (1)

91-91: LGTM!

web/src/__tests__/stores/tasks.test.ts (1)

29-29: LGTM!

web/src/__tests__/pages/TaskDetailPage.test.tsx (1)

24-24: LGTM!

web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx (1)

24-24: LGTM!

web/src/pages/tasks/TaskDetailActions.stories.tsx (1)

21-21: LGTM!

web/src/api/types/enums.ts (1)

40-40: LGTM!

Also applies to: 69-69

web/src/__tests__/helpers/factories.ts (1)

36-36: LGTM!

web/src/pages/tasks/TaskColumn.stories.tsx (1)

23-23: LGTM!

README.md (1)

22-22: LGTM!

web/src/pages/tasks/TaskDetailHeader.stories.tsx (1)

20-20: LGTM!

web/src/__tests__/utils/tasks.property.test.ts (1)

42-42: LGTM!

web/src/pages/tasks/TaskDetailMetadata.stories.tsx (1)

20-20: LGTM!

web/src/pages/tasks/TaskDetailPanel.stories.tsx (1)

24-24: LGTM!

web/src/pages/tasks/TaskListView.stories.tsx (1)

21-21: LGTM!

web/src/pages/tasks/TaskCard.stories.tsx (1)

21-21: LGTM!

web/src/pages/tasks/TaskDetailTimeline.stories.tsx (1)

20-20: LGTM!

web/src/pages/agents/TaskHistory.stories.tsx (1)

21-21: LGTM!

web/src/mocks/handlers/tasks.ts (1)

30-30: LGTM!

src/synthorg/workers/runtime_builder.py (1)

347-351: LGTM!

Also applies to: 357-361, 368-371, 392-394, 409-411, 700-700

web/src/stores/tasks.ts (1)

18-18: LGTM!

Also applies to: 245-245, 452-467

docs/design/providers.md (1)

207-209: LGTM!

tests/unit/engine/routing_policy/test_acceptance_comparison.py (1)

18-19: LGTM!

Also applies to: 44-46, 51-52, 101-106

src/synthorg/engine/routing_policy/strategies.py (5)

1-29: LGTM!


32-53: LGTM!


56-132: LGTM!


163-169: LGTM!


190-274: LGTM!

Comment thread web/src/stores/tasks.ts Outdated
web/src/stores/tasks.ts: build the runtime-check enum sets from the
generated *_VALUES tuples (COMPLEXITY/TASK_STRUCTURE/COORDINATION_TOPOLOGY/
STAKES) instead of re-declared literal lists, so a value added to an enum
cannot drift out of sync with its frame-guard validator within a build
(CodeRabbit flagged STAKES_SET; applied to all four for consistency and to
match the file's own header comment + the DEPARTMENT_NAME_SET precedent in
enums.ts). Behaviour is unchanged: the generated tuple is still
build-time-frozen, so an unknown behavioural enum value is still dropped
rather than mis-routed.
@Aureliolo Aureliolo merged commit 9b98312 into main May 22, 2026
82 checks passed
@Aureliolo Aureliolo deleted the feat/1998-stakes-aware-model-routing branch May 22, 2026 06:41
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 22, 2026 06:41 — with GitHub Actions Inactive
Aureliolo pushed a commit that referenced this pull request May 22, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Introduced conversational interface for direct clarify and propose
interactions.
- Cost management now includes forecast gates, hard ceilings, and Pareto
considerations.
- Added living documentation engine combining wiki and
retrieval-augmented generation features.
- Real intake engine is now operational for live data processing.
- Virtual desktop tool with vision verification gate available for
enhanced workspace control.

### What's new
- Per-project reproducible environments for consistent setups.
- Headless browser testing tool integrated for automated UI validation.
- Governed external API and data access tool introduced.
- Hardened external-remote git backend with sandbox mounts and
push-queue dispatching.
- Adversarial red-team gate subsystem for enhanced security testing.
- Self-extending toolkit to dynamically expand capabilities.
- Stakes-aware model routing selects each task's model tier by how consequential the work is.
- Task-board entry adapter connects live runtime with project
management.
- Persistent project workspace with pluggable git backend and
per-project push queues implemented.
- Knowledge and provenance substrate added to track data lineage.
- Scoring and data contract framework for golden-company benchmark
evaluations.

### Under the hood
- Desktop Dockerfile is now pinned by digest, and the publish gap is
documented.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


## [0.8.7](v0.8.6...v0.8.7) (2026-05-22)


### Features

* conversational interface v1 - 1:1 clarify + propose
([#2019](#2019))
([216ef94](216ef94)),
closes [#1968](#1968)
* cost as a first-class dial (forecast gate, hard ceiling, Pareto)
([#2029](#2029))
([700a59e](700a59e)),
closes [#1982](#1982)
* **env:** reproducible per-project environments
([#2039](#2039))
([d2c0ef9](d2c0ef9)),
closes [#1994](#1994)
* **evals:** [#1980](#1980)
spine -- scoring + data contract for golden-company benchmark
([#2025](#2025))
([53108e8](53108e8))
* goal/objective entry adapter
([#1964](#1964))
([#2022](#2022))
([cb15c3c](cb15c3c))
* governed external API/data access tool
([#1991](#1991))
([#2032](#2032))
([e08b451](e08b451))
* harden external-remote git backend + per-project sandbox mount +
push-queue dispatch
([#2020](#2020))
([#2030](#2030))
([2fa2e1e](2fa2e1e))
* headless browser testing tool
([#1992](#1992))
([#2024](#2024))
([277b52a](277b52a))
* knowledge + provenance substrate
([#2036](#2036))
([48c897b](48c897b))
* living documentation engine (dual-purpose wiki + RAG namespace)
([#2028](#2028))
([3d10da9](3d10da9)),
closes [#1976](#1976)
* real intake engine online
([#2017](#2017))
([9d8eb34](9d8eb34))
* **redteam:** adversarial red-team gate subsystem
([#1986](#1986))
([#2026](#2026))
([d2207e9](d2207e9))
* self-extending toolkit
([#1995](#1995))
([#2035](#2035))
([5ffc545](5ffc545))
* stakes-aware model routing
([#1998](#1998))
([#2038](#2038))
([9b98312](9b98312))
* task-board entry adapter to live runtime
([#1963](#1963))
([#2023](#2023))
([a8f1eea](a8f1eea))
* virtual desktop tool and vision verifier gate
([#2031](#2031))
([dfe8b42](dfe8b42)),
closes [#1993](#1993)
* **workspace:** persistent project workspace + pluggable git backend +
per-project push queue
([#2021](#2021))
([ee58ee7](ee58ee7))


### Bug Fixes

* pin desktop Dockerfile by digest (Scorecard [#309](#309)) + document publish gap ([#2034](#2034)) ([8fda188](8fda188))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Stakes-aware model routing