feat: stakes-aware model routing (#1998) #2038
NotBlankStr strategy discriminator, QualityFloors ordering validator, under-floor and benchmark-failure guards in the router, redaction-safe decomposition logging, stakes-assessed state log, ghost-wiring manifest entry, tier/boundary/fallback tests, e2e cost-drop simulation, and docs.
The new QualityFloors non-decreasing validator rejects polyfactory's independent random floor draws, mirroring the existing IntegrationsConfig pin in the same test.
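A minimal sketch of what a non-decreasing ordering check can look like, assuming three floor fields named `low`, `medium`, and `high`. The class name `QualityFloorsSketch` and its fields are hypothetical stand-ins, and a plain dataclass is used here instead of the project's Pydantic model. It also shows why independently drawn random floors tend to be rejected: three independent draws are rarely already in order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityFloorsSketch:
    """Hypothetical stand-in for QualityFloors; field names are assumed."""

    low: float
    medium: float
    high: float

    def __post_init__(self) -> None:
        floors = (self.low, self.medium, self.high)
        # Non-decreasing: every floor must be >= the floor of the tier below.
        if any(later < earlier for earlier, later in zip(floors, floors[1:])):
            msg = f"quality floors must be non-decreasing, got {floors}"
            raise ValueError(msg)


def floors_are_valid(low: float, medium: float, high: float) -> bool:
    """Return True when the three floors pass the ordering check."""
    try:
        QualityFloorsSketch(low, medium, high)
    except ValueError:
        return False
    return True
```

A test factory that draws each floor independently would need the field pinned to an ordered value, which matches the pin described above.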
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
No actionable comments were generated in the recent review. 🎉
Walkthrough
This PR implements stakes-aware model routing and per-task/subtask stakes assessment. It adds a Stakes enum and Task.stakes, a stakes-assessor subsystem (config, protocol, heuristic, factory), and a pluggable routing-policy subsystem (config, tiers, strategies, router, factory). The engine is wired to apply stakes routing before budget auto-downgrade, review gates support red-team marking, observability events were added, comprehensive unit and e2e tests validate cost/quality properties, and frontend/web code was updated to carry the new stakes enum in payloads and fixtures.
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a stakes-aware model routing system designed to optimize cost and quality by matching model tiers to the criticality of individual tasks. By assessing task stakes through complexity, keyword signals, and priority, the system ensures that low-stakes tasks use cheaper models while high-stakes tasks are handled by stronger models and flagged for adversarial red-team review. This routing operates orthogonally to existing budget constraints.
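The flow summarized above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the function names, the string-valued `complexity` and `priority` parameters, the scoring rules, and the model ids are all hypothetical. The keyword list reuses the description keywords quoted in the review comments further down.

```python
from enum import Enum


class Stakes(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Assumed keyword signals; the real heuristic's list may differ.
HIGH_STAKES_KEYWORDS = ("architecture", "production", "irreversible")

# Assumed stakes-to-model mapping using vendor-neutral placeholder ids.
MODEL_FOR_STAKES = {
    Stakes.LOW: "example-small-001",
    Stakes.MEDIUM: "example-medium-001",
    Stakes.HIGH: "example-large-001",
}


def assess_stakes(description: str, complexity: str, priority: str) -> Stakes:
    """Combine keyword, priority, and complexity signals into one level."""
    text = description.lower()
    if priority == "critical" or any(word in text for word in HIGH_STAKES_KEYWORDS):
        return Stakes.HIGH
    if complexity == "complex" or priority == "high":
        return Stakes.MEDIUM
    return Stakes.LOW


def route(stakes: Stakes) -> tuple[str, bool]:
    """Return (model id, needs red-team review) for a stakes level."""
    return MODEL_FOR_STAKES[stakes], stakes is Stakes.HIGH
```

In this sketch a typo fix would land on the small model with no extra review, while a task whose description mentions an architecture decision would land on the large model and be flagged for red-team review.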
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/synthorg/workers/runtime_builder.py`:
- Around line 347-372: The resolver is built from all providers which allows
routing to models owned by inactive providers; in _build_stakes_router_or_none
replace ModelResolver.from_config(app_state.config.providers) with a resolver
constructed only from the active provider configuration (use the active provider
name from app_state.config.names[0] or equivalent and pass only that provider's
entry), so ModelResolver only knows about the runtime provider, and then pass
that resolver into build_stakes_router (ensuring coordination_store and
benchmark_provider usage stays the same).
In `@tests/unit/engine/routing_policy/test_acceptance_comparison.py`:
- Around line 42-53: Annotate the module-level test constants as immutable by
importing Final from typing and declaring types with Final, e.g. change
_PROVIDER, _TIER_MODEL_IDS and _TIER_TOTAL_COST to _PROVIDER: Final[str],
_TIER_MODEL_IDS: Final[dict[ModelTier, str]] and _TIER_TOTAL_COST:
Final[dict[ModelTier, float]] so the intent of immutability is explicit; keep
the existing values and types (use the same ModelTier alias) and add the Final
import at the top of the test module.
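The first finding above (in `runtime_builder.py`) comes down to narrowing the provider mapping before a resolver is built from it. A minimal sketch of that idea, assuming provider config is a name-to-config mapping where each config lists its model ids. The helper names and the dict shape are hypothetical; the real `ModelResolver.from_config` input type is not shown in this review.

```python
from collections.abc import Mapping


def restrict_to_active_provider(
    providers: Mapping[str, dict],
    active_name: str,
) -> dict[str, dict]:
    """Return a one-entry mapping holding only the active provider's config."""
    if active_name not in providers:
        raise KeyError(f"active provider {active_name!r} is not configured")
    return {active_name: providers[active_name]}


def resolvable_models(providers: Mapping[str, dict]) -> set[str]:
    """Collect every model id a resolver built from ``providers`` could return."""
    return {model for cfg in providers.values() for model in cfg["models"]}


# Hypothetical config with one active and one inactive provider.
all_providers = {
    "example-provider": {"models": ["example-small-001", "example-large-001"]},
    "test-provider": {"models": ["example-medium-001"]},
}
active_only = restrict_to_active_provider(all_providers, "example-provider")
```

A resolver built from `all_providers` could hand back `example-medium-001`, which belongs to the inactive provider. One built from `active_only` cannot.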
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: a50db7b3-5e24-4e8a-89d2-0d0476c28c75
📒 Files selected for processing (42)
- docs/design/engine.md
- docs/design/providers.md
- docs/reference/pluggable-subsystems.md
- scripts/_ghost_wiring_manifest.txt
- src/synthorg/api/app.py
- src/synthorg/config/defaults.py
- src/synthorg/config/schema.py
- src/synthorg/core/enums.py
- src/synthorg/core/task.py
- src/synthorg/engine/agent_engine.py
- src/synthorg/engine/decomposition/models.py
- src/synthorg/engine/decomposition/service.py
- src/synthorg/engine/pipeline/service.py
- src/synthorg/engine/review_gate.py
- src/synthorg/engine/routing_policy/__init__.py
- src/synthorg/engine/routing_policy/config.py
- src/synthorg/engine/routing_policy/factory.py
- src/synthorg/engine/routing_policy/models.py
- src/synthorg/engine/routing_policy/protocol.py
- src/synthorg/engine/routing_policy/router.py
- src/synthorg/engine/routing_policy/strategies.py
- src/synthorg/engine/routing_policy/tiers.py
- src/synthorg/engine/stakes/__init__.py
- src/synthorg/engine/stakes/config.py
- src/synthorg/engine/stakes/factory.py
- src/synthorg/engine/stakes/heuristic.py
- src/synthorg/engine/stakes/protocol.py
- src/synthorg/observability/events/stakes_routing.py
- src/synthorg/security/redteam/runner.py
- src/synthorg/workers/runtime_builder.py
- tests/e2e/test_stakes_routing_e2e.py
- tests/unit/config/test_schema.py
- tests/unit/engine/routing_policy/test_acceptance_comparison.py
- tests/unit/engine/routing_policy/test_cost_properties.py
- tests/unit/engine/routing_policy/test_engine_integration.py
- tests/unit/engine/routing_policy/test_strategies.py
- tests/unit/engine/routing_policy/test_tiers.py
- tests/unit/engine/stakes/test_assessor.py
- tests/unit/engine/stakes/test_propagation.py
- tests/unit/observability/test_events.py
- web/src/api/types/enum-values.gen.ts
- web/src/api/types/openapi.gen.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Build Backend
- GitHub Check: Lighthouse Site
- GitHub Check: Test Integration
- GitHub Check: Dashboard Test
- GitHub Check: Test Conformance (SQLite)
- GitHub Check: Test E2E
- GitHub Check: Test Unit
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: Build Preview
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (12)
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site
Files:
- src/synthorg/engine/decomposition/models.py
- src/synthorg/observability/events/stakes_routing.py
- src/synthorg/engine/routing_policy/router.py
- src/synthorg/engine/routing_policy/__init__.py
- src/synthorg/api/app.py
- src/synthorg/config/defaults.py
- src/synthorg/engine/stakes/protocol.py
- src/synthorg/engine/routing_policy/models.py
- src/synthorg/core/enums.py
- src/synthorg/engine/stakes/__init__.py
- src/synthorg/engine/routing_policy/protocol.py
- src/synthorg/core/task.py
- src/synthorg/engine/routing_policy/config.py
- src/synthorg/engine/stakes/factory.py
- src/synthorg/engine/routing_policy/factory.py
- src/synthorg/engine/routing_policy/tiers.py
- src/synthorg/workers/runtime_builder.py
- src/synthorg/config/schema.py
- src/synthorg/engine/agent_engine.py
- src/synthorg/engine/routing_policy/strategies.py
- src/synthorg/engine/review_gate.py
- src/synthorg/engine/decomposition/service.py
- src/synthorg/engine/stakes/config.py
- src/synthorg/engine/pipeline/service.py
- src/synthorg/security/redteam/runner.py
- src/synthorg/engine/stakes/heuristic.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: No hardcoded numerics; numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
No `from __future__ import annotations` (3.14 has PEP 649); PEP 758 except: `except A, B:` no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors: Error from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; `@computed_field` auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries)
Use `@computed_field` for derived fields; use NotBlankStr for identifiers in Pydantic models
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Use `from synthorg.observability import get_logger`; variable always `logger`; never import logging or print() in app code
Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSI...
Files:
- src/synthorg/engine/decomposition/models.py
- src/synthorg/observability/events/stakes_routing.py
- src/synthorg/engine/routing_policy/router.py
- src/synthorg/engine/routing_policy/__init__.py
- src/synthorg/api/app.py
- src/synthorg/config/defaults.py
- src/synthorg/engine/stakes/protocol.py
- src/synthorg/engine/routing_policy/models.py
- src/synthorg/core/enums.py
- src/synthorg/engine/stakes/__init__.py
- src/synthorg/engine/routing_policy/protocol.py
- src/synthorg/core/task.py
- src/synthorg/engine/routing_policy/config.py
- src/synthorg/engine/stakes/factory.py
- src/synthorg/engine/routing_policy/factory.py
- src/synthorg/engine/routing_policy/tiers.py
- src/synthorg/workers/runtime_builder.py
- src/synthorg/config/schema.py
- src/synthorg/engine/agent_engine.py
- src/synthorg/engine/routing_policy/strategies.py
- src/synthorg/engine/review_gate.py
- src/synthorg/engine/decomposition/service.py
- src/synthorg/engine/stakes/config.py
- src/synthorg/engine/pipeline/service.py
- src/synthorg/security/redteam/runner.py
- src/synthorg/engine/stakes/heuristic.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
- src/synthorg/engine/decomposition/models.py
- src/synthorg/observability/events/stakes_routing.py
- src/synthorg/engine/routing_policy/router.py
- src/synthorg/engine/routing_policy/__init__.py
- src/synthorg/api/app.py
- src/synthorg/config/defaults.py
- src/synthorg/engine/stakes/protocol.py
- src/synthorg/engine/routing_policy/models.py
- src/synthorg/core/enums.py
- src/synthorg/engine/stakes/__init__.py
- src/synthorg/engine/routing_policy/protocol.py
- src/synthorg/core/task.py
- src/synthorg/engine/routing_policy/config.py
- src/synthorg/engine/stakes/factory.py
- src/synthorg/engine/routing_policy/factory.py
- src/synthorg/engine/routing_policy/tiers.py
- src/synthorg/workers/runtime_builder.py
- src/synthorg/config/schema.py
- src/synthorg/engine/agent_engine.py
- src/synthorg/engine/routing_policy/strategies.py
- src/synthorg/engine/review_gate.py
- src/synthorg/engine/decomposition/service.py
- src/synthorg/engine/stakes/config.py
- src/synthorg/engine/pipeline/service.py
- src/synthorg/security/redteam/runner.py
- src/synthorg/engine/stakes/heuristic.py
{src/**/*.py,tests/**/*.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/
Files:
- src/synthorg/engine/decomposition/models.py
- src/synthorg/observability/events/stakes_routing.py
- src/synthorg/engine/routing_policy/router.py
- src/synthorg/engine/routing_policy/__init__.py
- src/synthorg/api/app.py
- tests/unit/engine/routing_policy/test_acceptance_comparison.py
- tests/unit/observability/test_events.py
- src/synthorg/config/defaults.py
- src/synthorg/engine/stakes/protocol.py
- tests/unit/engine/routing_policy/test_engine_integration.py
- src/synthorg/engine/routing_policy/models.py
- src/synthorg/core/enums.py
- src/synthorg/engine/stakes/__init__.py
- src/synthorg/engine/routing_policy/protocol.py
- src/synthorg/core/task.py
- tests/unit/engine/routing_policy/test_tiers.py
- src/synthorg/engine/routing_policy/config.py
- tests/unit/engine/stakes/test_assessor.py
- src/synthorg/engine/stakes/factory.py
- tests/unit/config/test_schema.py
- src/synthorg/engine/routing_policy/factory.py
- src/synthorg/engine/routing_policy/tiers.py
- tests/unit/engine/stakes/test_propagation.py
- src/synthorg/workers/runtime_builder.py
- src/synthorg/config/schema.py
- src/synthorg/engine/agent_engine.py
- tests/unit/engine/routing_policy/test_strategies.py
- src/synthorg/engine/routing_policy/strategies.py
- src/synthorg/engine/review_gate.py
- src/synthorg/engine/decomposition/service.py
- tests/unit/engine/routing_policy/test_cost_properties.py
- src/synthorg/engine/stakes/config.py
- src/synthorg/engine/pipeline/service.py
- src/synthorg/security/redteam/runner.py
- tests/e2e/test_stakes_routing_e2e.py
- src/synthorg/engine/stakes/heuristic.py
web/src/**/*.{js,jsx,ts,tsx,mts}
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.{js,jsx,ts,tsx,mts}: Always use `createLogger` from `@/lib/logger`; never bare `console.warn`/`console.error`/`console.debug` in application code. Variable name must always be `log`. Only `logger.ts` itself may use bare console methods. Use `log.debug()` (DEV-only, stripped in production), `log.warn()`, `log.error()`.
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through `sanitizeArg`
Attacker-controlled fields inside structured objects must be wrapped in `sanitizeForLog()` before embedding in log calls
Error-code constants (MANDATORY): import `ErrorCode` and `ErrorCategory` from `@/api/types/errors` (re-exported from the generated `web/src/api/types/error-codes.gen.ts`). Discriminate on `ErrorCode.<NAME>`, never on raw integer literals.
Use `@eslint-react/web-api-no-leaked-fetch` to detect `fetch()` in effects without `AbortController` cleanup
Files:
- web/src/api/types/enum-values.gen.ts
- web/src/api/types/openapi.gen.ts
web/src/api/types/**/*.gen.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
Generated DTO types (MANDATORY): NEVER hand-edit `web/src/api/types/*.gen.ts`. Regenerate with `uv run python scripts/generate_dto_types_ts.py`. Import DTOs via the barrel (`import type { AgentConfig } from '@/api/types'`).
Files:
- web/src/api/types/enum-values.gen.ts
- web/src/api/types/openapi.gen.ts
web/src/**/*.{ts,tsx,mts}
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.{ts,tsx,mts}: Use `@typescript-eslint/no-floating-promises` to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use `@typescript-eslint/no-misused-promises` (with `checksVoidReturn: { attributes: false }`) to forbid passing async functions where the callsite ignores the returned promise. React 19 `async` event handlers stay allowed via the `attributes: false` exemption.
Files:
- web/src/api/types/enum-values.gen.ts
- web/src/api/types/openapi.gen.ts
web/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Reuse `web/src/components/ui/` components and design tokens only per web Dashboard Design System in web/CLAUDE.md
Files:
- web/src/api/types/enum-values.gen.ts
- web/src/api/types/openapi.gen.ts
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/api/**/*.py: Two-phase API startup: construction (create_app body) wires synchronous services; on_startup (_build_lifecycle.on_startup) wires services needing connected persistence backend
Construction-phase ordering: agent_registry BEFORE auto_wire_meetings; tunnel_provider unconditionally
On-startup ordering: SettingsService auto-wire before WorkflowExecutionObserver registration; OntologyService after persistence.connect(); cost-dial services via _try_wire_cost_dial AFTER persistence; knowledge substrate via _wire_knowledge_engine AFTER persistence, gated on has_persistence AND has_memory_backend
Pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
Files:
src/synthorg/api/app.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers: `@pytest.mark.{unit,integration,e2e,slow}`; async auto; timeout 30s global; coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back
Test doubles: FakeClock for Clock seam, `mock_of[T]` for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary forbidden (zero-tolerance, no baseline) per check_mock_spec.py
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add `@example(...)`)
Flaky tests: NEVER skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large)
Files:
- tests/unit/engine/routing_policy/test_acceptance_comparison.py
- tests/unit/observability/test_events.py
- tests/unit/engine/routing_policy/test_engine_integration.py
- tests/unit/engine/routing_policy/test_tiers.py
- tests/unit/engine/stakes/test_assessor.py
- tests/unit/config/test_schema.py
- tests/unit/engine/stakes/test_propagation.py
- tests/unit/engine/routing_policy/test_strategies.py
- tests/unit/engine/routing_policy/test_cost_properties.py
- tests/e2e/test_stakes_routing_e2e.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare `@settings()` decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which `@given()` honors automatically.
Files:
- tests/unit/engine/routing_policy/test_acceptance_comparison.py
- tests/unit/observability/test_events.py
- tests/unit/engine/routing_policy/test_engine_integration.py
- tests/unit/engine/routing_policy/test_tiers.py
- tests/unit/engine/stakes/test_assessor.py
- tests/unit/config/test_schema.py
- tests/unit/engine/stakes/test_propagation.py
- tests/unit/engine/routing_policy/test_strategies.py
- tests/unit/engine/routing_policy/test_cost_properties.py
- tests/e2e/test_stakes_routing_e2e.py
{README.md,docs/**/*.md,web/**/*.md}
📄 CodeRabbit inference engine (CLAUDE.md)
Numerics in README and public docs sourced from data/runtime_stats.yaml via markers per data/README.md
Files:
- docs/design/providers.md
- docs/design/engine.md
- docs/reference/pluggable-subsystems.md
docs/**/*.{md,d2,mmd}
📄 CodeRabbit inference engine (CLAUDE.md)
Use d2 for architecture / nested containers; mermaid for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200, D2 CLI pinned to v0.7.1 in CI
Files:
- docs/design/providers.md
- docs/design/engine.md
- docs/reference/pluggable-subsystems.md
src/synthorg/workers/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Runtime services: AgentEngine builds ONE provider-present switch returning RuntimeServices (AgentEngineExecutionService + coordinator OR NoProviderExecutionService + None); install_runtime_services appends FIRST; swap* hold locks
Files:
src/synthorg/workers/runtime_builder.py
🧠 Learnings (7)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
- src/synthorg/engine/decomposition/models.py
- src/synthorg/observability/events/stakes_routing.py
- src/synthorg/engine/routing_policy/router.py
- src/synthorg/engine/routing_policy/__init__.py
- src/synthorg/api/app.py
- tests/unit/engine/routing_policy/test_acceptance_comparison.py
- tests/unit/observability/test_events.py
- src/synthorg/config/defaults.py
- src/synthorg/engine/stakes/protocol.py
- tests/unit/engine/routing_policy/test_engine_integration.py
- src/synthorg/engine/routing_policy/models.py
- src/synthorg/core/enums.py
- src/synthorg/engine/stakes/__init__.py
- src/synthorg/engine/routing_policy/protocol.py
- src/synthorg/core/task.py
- tests/unit/engine/routing_policy/test_tiers.py
- src/synthorg/engine/routing_policy/config.py
- tests/unit/engine/stakes/test_assessor.py
- src/synthorg/engine/stakes/factory.py
- tests/unit/config/test_schema.py
- src/synthorg/engine/routing_policy/factory.py
- src/synthorg/engine/routing_policy/tiers.py
- tests/unit/engine/stakes/test_propagation.py
- src/synthorg/workers/runtime_builder.py
- src/synthorg/config/schema.py
- src/synthorg/engine/agent_engine.py
- tests/unit/engine/routing_policy/test_strategies.py
- src/synthorg/engine/routing_policy/strategies.py
- src/synthorg/engine/review_gate.py
- src/synthorg/engine/decomposition/service.py
- tests/unit/engine/routing_policy/test_cost_properties.py
- src/synthorg/engine/stakes/config.py
- src/synthorg/engine/pipeline/service.py
- src/synthorg/security/redteam/runner.py
- tests/e2e/test_stakes_routing_e2e.py
- src/synthorg/engine/stakes/heuristic.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).
Applied to files:
- src/synthorg/engine/decomposition/models.py
- src/synthorg/observability/events/stakes_routing.py
- src/synthorg/engine/routing_policy/router.py
- src/synthorg/engine/routing_policy/__init__.py
- src/synthorg/api/app.py
- tests/unit/engine/routing_policy/test_acceptance_comparison.py
- tests/unit/observability/test_events.py
- src/synthorg/config/defaults.py
- src/synthorg/engine/stakes/protocol.py
- tests/unit/engine/routing_policy/test_engine_integration.py
- src/synthorg/engine/routing_policy/models.py
- src/synthorg/core/enums.py
- src/synthorg/engine/stakes/__init__.py
- src/synthorg/engine/routing_policy/protocol.py
- src/synthorg/core/task.py
- tests/unit/engine/routing_policy/test_tiers.py
- src/synthorg/engine/routing_policy/config.py
- tests/unit/engine/stakes/test_assessor.py
- src/synthorg/engine/stakes/factory.py
- tests/unit/config/test_schema.py
- src/synthorg/engine/routing_policy/factory.py
- src/synthorg/engine/routing_policy/tiers.py
- tests/unit/engine/stakes/test_propagation.py
- src/synthorg/workers/runtime_builder.py
- src/synthorg/config/schema.py
- src/synthorg/engine/agent_engine.py
- tests/unit/engine/routing_policy/test_strategies.py
- src/synthorg/engine/routing_policy/strategies.py
- src/synthorg/engine/review_gate.py
- src/synthorg/engine/decomposition/service.py
- tests/unit/engine/routing_policy/test_cost_properties.py
- src/synthorg/engine/stakes/config.py
- src/synthorg/engine/pipeline/service.py
- src/synthorg/security/redteam/runner.py
- tests/e2e/test_stakes_routing_e2e.py
- src/synthorg/engine/stakes/heuristic.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.
Applied to files:
- src/synthorg/engine/decomposition/models.py
- src/synthorg/observability/events/stakes_routing.py
- src/synthorg/engine/routing_policy/router.py
- src/synthorg/engine/routing_policy/__init__.py
- src/synthorg/api/app.py
- src/synthorg/config/defaults.py
- src/synthorg/engine/stakes/protocol.py
- src/synthorg/engine/routing_policy/models.py
- src/synthorg/core/enums.py
- src/synthorg/engine/stakes/__init__.py
- src/synthorg/engine/routing_policy/protocol.py
- src/synthorg/core/task.py
- src/synthorg/engine/routing_policy/config.py
- src/synthorg/engine/stakes/factory.py
- src/synthorg/engine/routing_policy/factory.py
- src/synthorg/engine/routing_policy/tiers.py
- src/synthorg/workers/runtime_builder.py
- src/synthorg/config/schema.py
- src/synthorg/engine/agent_engine.py
- src/synthorg/engine/routing_policy/strategies.py
- src/synthorg/engine/review_gate.py
- src/synthorg/engine/decomposition/service.py
- src/synthorg/engine/stakes/config.py
- src/synthorg/engine/pipeline/service.py
- src/synthorg/security/redteam/runner.py
- src/synthorg/engine/stakes/heuristic.py
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.
Applied to files:
- docs/design/providers.md
- docs/design/engine.md
- docs/reference/pluggable-subsystems.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).
Applied to files:
- docs/design/providers.md
- docs/design/engine.md
- docs/reference/pluggable-subsystems.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.
Applied to files:
- docs/design/providers.md
- docs/design/engine.md
- docs/reference/pluggable-subsystems.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.
Applied to files:
- docs/design/providers.md
- docs/design/engine.md
- docs/reference/pluggable-subsystems.md
🔇 Additional comments (40)
src/synthorg/engine/decomposition/models.py (1)
15-15: LGTM!
Also applies to: 32-32, 54-57
src/synthorg/observability/events/stakes_routing.py (1)
1-11: LGTM!
src/synthorg/engine/routing_policy/router.py (1)
20-60: LGTM!
src/synthorg/engine/routing_policy/__init__.py (1)
1-41: LGTM!
src/synthorg/api/app.py (1)
1293-1303: LGTM!
tests/unit/engine/routing_policy/test_acceptance_comparison.py (1)
97-131: ⚡ Quick win
Test determinism relies on implicit stakes assessment heuristics.
The test docstring claims "deterministic simulation" (line 3), but `_mixed_plan()` does not explicitly set `stakes=` on subtasks. Instead, the test relies on `DecompositionService` assessing stakes based on `estimated_complexity` and description keywords ("architecture", "production", "irreversible").
While this exercises the integrated assessment logic (which appears intentional for an acceptance test), it creates fragility: if the assessment keywords or complexity-to-stakes mapping changes, assertions at lines 224, 229, and 234 will fail.
Consider whether explicit stakes assignment would improve test maintainability:
    SubtaskDefinition(
        id="st-arch",
        title="Design the sharding architecture",
        description="Make the core architecture decision for sharding",
        estimated_complexity=Complexity.COMPLEX,
        stakes=Stakes.HIGH,  # Explicit for determinism
    ),
Alternatively, if testing the integrated assessment is the intent, add a comment documenting the expected assessment behavior to make the dependency explicit.
Also applies to: 224-224, 229-229, 234-234
tests/unit/observability/test_events.py (1)
274-274: LGTM!
docs/design/providers.md (1)
199-212: LGTM!
docs/design/engine.md (1)
133-133: LGTM!
src/synthorg/config/defaults.py (1)
25-25: LGTM!
src/synthorg/engine/stakes/protocol.py (1)
1-27: LGTM!
tests/unit/engine/routing_policy/test_engine_integration.py (1)
1-115: LGTM!
src/synthorg/engine/routing_policy/models.py (1)
10-35: LGTM!
src/synthorg/core/enums.py (1)
359-412: LGTM!
src/synthorg/engine/stakes/__init__.py (1)
1-26: LGTM!
src/synthorg/engine/routing_policy/protocol.py (1)
1-26: LGTM!
src/synthorg/core/task.py (1)
16-17: LGTM! Also applies to: 132-139
docs/reference/pluggable-subsystems.md (1)
179-192: LGTM!
tests/unit/engine/routing_policy/test_tiers.py (1)
1-68: LGTM!
src/synthorg/engine/routing_policy/config.py (1)
1-132: LGTM!
tests/unit/engine/stakes/test_assessor.py (1)
1-206: LGTM!
src/synthorg/engine/stakes/factory.py (1)
1-43: LGTM!
web/src/api/types/openapi.gen.ts (2)
12010-12023: LGTM!
12175-12175: LGTM!
tests/unit/config/test_schema.py (1)
599-608: LGTM!
src/synthorg/engine/routing_policy/factory.py (1)
1-96: LGTM!
src/synthorg/engine/routing_policy/tiers.py (1)
1-37: LGTM!
tests/unit/engine/stakes/test_propagation.py (1)
1-108: LGTM!
src/synthorg/config/schema.py (1)
30-30: LGTM! Also applies to: 390-391, 478-481
src/synthorg/engine/agent_engine.py (1)
86-86: LGTM! Also applies to: 204-204, 243-243, 358-376, 427-434
tests/unit/engine/routing_policy/test_strategies.py (1)
1-402: LGTM!
src/synthorg/engine/routing_policy/strategies.py (1)
1-277: LGTM!
src/synthorg/engine/review_gate.py (1)
110-121: LGTM!
src/synthorg/engine/decomposition/service.py (1)
17-18: LGTM! Also applies to: 31-31, 43-43, 49-49, 53-53, 86-95, 126-136, 153-153
tests/unit/engine/routing_policy/test_cost_properties.py (1)
1-93: LGTM!
src/synthorg/engine/stakes/config.py (1)
1-137: LGTM!
src/synthorg/engine/pipeline/service.py (1)
34-34: LGTM! Also applies to: 44-44, 54-54, 97-97, 114-114, 125-125, 251-278
src/synthorg/security/redteam/runner.py (1)
16-16: LGTM! Also applies to: 129-132
tests/e2e/test_stakes_routing_e2e.py (1)
1-400: LGTM!
src/synthorg/engine/stakes/heuristic.py (1)
1-92: LGTM!
Code Review
This pull request introduces a stakes-aware model routing system that classifies tasks into stakes levels (LOW to CRITICAL) to optimize model selection and cost. It includes a heuristic assessor that evaluates tasks based on complexity and keywords, a routing strategy that utilizes benchmark quality floors and coordination metrics, and integration into the core agent engine. Feedback from the review focused on clarifying documentation regarding tier downgrade logic, adopting more robust Pydantic patterns like model_copy, and adding defensive checks for coordination metrics to prevent potential runtime errors.
unhealthy, marks high/critical work for the red-team gate, and never downgrades
below the agent's configured tier. It is config-selectable via
The documentation states that the strategy "never downgrades below the agent's configured tier," but the implementation in StakesAwareStrategy only enforces this rule when red_team_required is true (i.e., for high or critical stakes). For low or normal stakes, the strategy intentionally allows downgrading to cheaper models to achieve the cost-saving goals described in the PR summary. Please clarify this in the documentation to avoid confusion.
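The distinction raised in this comment can be sketched as follows. This is a minimal illustration, not the `StakesAwareStrategy` code: the `Tier` enum, the `select_tier` helper, and the numeric ordering of both enums are assumptions made for the example. Only the stakes level names and the "floor applies only when the red-team gate is required" rule come from the review text.

```python
from enum import IntEnum


class Stakes(IntEnum):
    """Stakes levels named in the PR (LOW to CRITICAL); ordering is assumed."""

    LOW = 0
    NORMAL = 1
    HIGH = 2
    CRITICAL = 3


class Tier(IntEnum):
    """Hypothetical model tiers, cheapest first."""

    SMALL = 0
    MEDIUM = 1
    LARGE = 2


def select_tier(stakes: Stakes, configured: Tier, proposed: Tier) -> Tier:
    """Apply the configured-tier floor only to high/critical work."""
    red_team_required = stakes >= Stakes.HIGH
    if red_team_required and proposed < configured:
        # High/critical work never runs below the agent's configured tier.
        return configured
    # Low/normal work may be downgraded to a cheaper tier to save cost.
    return proposed
```

With these assumptions, a LOW task on an agent configured for `Tier.LARGE` can still be routed to `Tier.SMALL`, while a HIGH task with the same proposal stays at `Tier.LARGE`.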
selected_model = ModelConfig(
    provider=resolved.provider_name,
    model_id=resolved.model_id,
    temperature=current.temperature,
    max_tokens=current.max_tokens,
    fallback_model=current.fallback_model,
    model_tier=target_tier,
)
Instead of manually constructing a new ModelConfig instance, consider using model_copy(update=...). This is more robust against future changes to the ModelConfig schema, ensuring that any additional fields (e.g., stop sequences or other provider-specific settings) are preserved from the original configuration.
selected_model = current.model_copy(
    update={
        "provider": resolved.provider_name,
        "model_id": resolved.model_id,
        "model_tier": target_tier,
    }
)

)
for rec in records:
    amp = rec.metrics.error_amplification
    if (
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2038      +/-   ##
==========================================
+ Coverage   84.95%   84.97%   +0.01%
==========================================
  Files        2125     2139      +14
  Lines      124801   125133     +332
  Branches    10433    10465      +32
==========================================
+ Hits       106030   106332     +302
- Misses      16148    16170      +22
- Partials     2623     2631       +8
☔ View full report in Codecov by Sentry.
… feedback

CI failures:
- Add required 'stakes' field to all web Task fixtures/stores/stories/mocks (dashboard type-check, build, melange, and lighthouse failed because stakes was generated as required on the Task DTO but the hand-written TS fixtures were never updated). Re-export STAKES_VALUES/Stakes from the enums barrel and validate stakes in the WS task-frame guard like other behavioural enums.
- Regenerate data/runtime_stats.yaml (tests bucket 32,000+ to 33,000+) and re-inject the RS markers in README.md and docs/roadmap/index.md.

Reviewer feedback:
- runtime_builder: scope the stakes-router ModelResolver to the single active provider (CodeRabbit) so a tier can never resolve to an inactive provider model and execute with the wrong client.
- strategies: use model_copy(update=) instead of reconstructing ModelConfig (Gemini).
- providers.md: clarify only high/critical work is floored at the configured tier; low/normal may downgrade to save cost (Gemini).
- test_acceptance_comparison: Final annotations on module constants plus a note documenting the integrated stakes-assessment dependency (CodeRabbit).
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@web/src/stores/tasks.ts`:
- Around line 65-70: STAKES_SET is hard-coded with literal stake strings and
must be derived from the canonical enum/tuple to avoid drift; replace the
literal array in the STAKES_SET initializer with a runtime derivation from the
generated canonical tuple/enum (e.g., map or Object.values of the generated
STAKES/Stakes tuple/enum) so the set is built from the single source-of-truth
and keep the ReadonlySet<string> typing and the satisfies readonly Stakes[]
check in place.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: d290f104-7dfe-44ad-b470-ff18549aac74
📒 Files selected for processing (25)
README.md
data/runtime_stats.yaml
docs/design/providers.md
docs/roadmap/index.md
src/synthorg/engine/routing_policy/strategies.py
src/synthorg/workers/runtime_builder.py
tests/unit/engine/routing_policy/test_acceptance_comparison.py
web/src/__tests__/helpers/factories.ts
web/src/__tests__/pages/TaskDetailPage.test.tsx
web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
web/src/__tests__/stores/agents.test.ts
web/src/__tests__/stores/tasks.test.ts
web/src/__tests__/utils/tasks.property.test.ts
web/src/api/types/enums.ts
web/src/mocks/handlers/tasks.ts
web/src/pages/agents/TaskHistory.stories.tsx
web/src/pages/tasks/TaskCard.stories.tsx
web/src/pages/tasks/TaskColumn.stories.tsx
web/src/pages/tasks/TaskDetailActions.stories.tsx
web/src/pages/tasks/TaskDetailHeader.stories.tsx
web/src/pages/tasks/TaskDetailMetadata.stories.tsx
web/src/pages/tasks/TaskDetailPanel.stories.tsx
web/src/pages/tasks/TaskDetailTimeline.stories.tsx
web/src/pages/tasks/TaskListView.stories.tsx
web/src/stores/tasks.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Deploy Preview
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Lighthouse Site
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Test E2E
- GitHub Check: Dashboard Test
- GitHub Check: Test Conformance (SQLite)
- GitHub Check: Test Integration
- GitHub Check: Test Unit
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (16)
{README.md,docs/**/*.md,web/**/*.md}
📄 CodeRabbit inference engine (CLAUDE.md)
Numerics in README and public docs sourced from data/runtime_stats.yaml via markers per data/README.md
Files:
docs/roadmap/index.md
README.md
docs/design/providers.md
docs/**/*.{md,d2,mmd}
📄 CodeRabbit inference engine (CLAUDE.md)
Use d2 for architecture / nested containers; mermaid for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200, D2 CLI pinned to v0.7.1 in CI
Files:
docs/roadmap/index.md
docs/design/providers.md
web/src/**/*.{js,jsx,ts,tsx,mts}
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.{js,jsx,ts,tsx,mts}: Always use `createLogger` from `@/lib/logger`; never bare `console.warn`/`console.error`/`console.debug` in application code. Variable name must always be `log`. Only `logger.ts` itself may use bare console methods. Use `log.debug()` (DEV-only, stripped in production), `log.warn()`, `log.error()`.
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through `sanitizeArg`
Attacker-controlled fields inside structured objects must be wrapped in `sanitizeForLog()` before embedding in log calls
Error-code constants (MANDATORY): import `ErrorCode` and `ErrorCategory` from `@/api/types/errors` (re-exported from the generated `web/src/api/types/error-codes.gen.ts`). Discriminate on `ErrorCode.<NAME>`, never on raw integer literals.
Use `@eslint-react/web-api-no-leaked-fetch` to detect `fetch()` in effects without `AbortController` cleanup
Files:
web/src/__tests__/stores/agents.test.ts
web/src/__tests__/stores/tasks.test.ts
web/src/__tests__/pages/TaskDetailPage.test.tsx
web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
web/src/pages/tasks/TaskDetailActions.stories.tsx
web/src/api/types/enums.ts
web/src/__tests__/helpers/factories.ts
web/src/pages/tasks/TaskColumn.stories.tsx
web/src/pages/tasks/TaskDetailHeader.stories.tsx
web/src/__tests__/utils/tasks.property.test.ts
web/src/pages/tasks/TaskDetailMetadata.stories.tsx
web/src/pages/tasks/TaskDetailPanel.stories.tsx
web/src/pages/tasks/TaskListView.stories.tsx
web/src/pages/tasks/TaskCard.stories.tsx
web/src/pages/tasks/TaskDetailTimeline.stories.tsx
web/src/pages/agents/TaskHistory.stories.tsx
web/src/mocks/handlers/tasks.ts
web/src/stores/tasks.ts
web/src/{stores,**/*.test.{ts,tsx}}
📄 CodeRabbit inference engine (web/CLAUDE.md)
Active-handle gate (MANDATORY): every unit test runs under `web/test-infra/active-handle-tracker.ts`, which fails any test that leaks an event-loop-holding resource. A new store that schedules timers / attaches listeners MUST expose a teardown hook and register it in the global `afterEach`; otherwise the gate fails the first test that triggers the schedule.
Files:
web/src/__tests__/stores/agents.test.ts
web/src/__tests__/stores/tasks.test.ts
web/src/__tests__/pages/TaskDetailPage.test.tsx
web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
web/src/__tests__/utils/tasks.property.test.ts
web/src/**/*.{ts,tsx,mts}
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.{ts,tsx,mts}: Use `@typescript-eslint/no-floating-promises` to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use `@typescript-eslint/no-misused-promises` (with `checksVoidReturn: { attributes: false }`) to forbid passing async functions where the callsite ignores the returned promise. React 19 `async` event handlers stay allowed via the `attributes: false` exemption.
Files:
web/src/__tests__/stores/agents.test.ts
web/src/__tests__/stores/tasks.test.ts
web/src/__tests__/pages/TaskDetailPage.test.tsx
web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
web/src/pages/tasks/TaskDetailActions.stories.tsx
web/src/api/types/enums.ts
web/src/__tests__/helpers/factories.ts
web/src/pages/tasks/TaskColumn.stories.tsx
web/src/pages/tasks/TaskDetailHeader.stories.tsx
web/src/__tests__/utils/tasks.property.test.ts
web/src/pages/tasks/TaskDetailMetadata.stories.tsx
web/src/pages/tasks/TaskDetailPanel.stories.tsx
web/src/pages/tasks/TaskListView.stories.tsx
web/src/pages/tasks/TaskCard.stories.tsx
web/src/pages/tasks/TaskDetailTimeline.stories.tsx
web/src/pages/agents/TaskHistory.stories.tsx
web/src/mocks/handlers/tasks.ts
web/src/stores/tasks.ts
web/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Reuse `web/src/components/ui/` components and design tokens only per web Dashboard Design System in web/CLAUDE.md
Files:
web/src/__tests__/stores/agents.test.ts
web/src/__tests__/stores/tasks.test.ts
web/src/__tests__/pages/TaskDetailPage.test.tsx
web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
web/src/pages/tasks/TaskDetailActions.stories.tsx
web/src/api/types/enums.ts
web/src/__tests__/helpers/factories.ts
web/src/pages/tasks/TaskColumn.stories.tsx
web/src/pages/tasks/TaskDetailHeader.stories.tsx
web/src/__tests__/utils/tasks.property.test.ts
web/src/pages/tasks/TaskDetailMetadata.stories.tsx
web/src/pages/tasks/TaskDetailPanel.stories.tsx
web/src/pages/tasks/TaskListView.stories.tsx
web/src/pages/tasks/TaskCard.stories.tsx
web/src/pages/tasks/TaskDetailTimeline.stories.tsx
web/src/pages/agents/TaskHistory.stories.tsx
web/src/mocks/handlers/tasks.ts
web/src/stores/tasks.ts
web/src/**/*.{jsx,tsx}
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.{jsx,tsx}: Use `@eslint-react/no-leaked-conditional-rendering` to catch the `{count && <Foo />}` bug where `0` renders verbatim. For `ReactNode | undefined` props use `{value != null && value !== false && <jsx>}`; for compound truthiness use `Boolean(...)`.
Use `@eslint-react/globals` to restrict `window` / `document` / `localStorage` / etc. inside render. Hoist offenders into a `useCallback` event handler, a `useEffect`, or a `useSyncExternalStore`-backed hook.
Files:
web/src/__tests__/pages/TaskDetailPage.test.tsx
web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
web/src/pages/tasks/TaskDetailActions.stories.tsx
web/src/pages/tasks/TaskColumn.stories.tsx
web/src/pages/tasks/TaskDetailHeader.stories.tsx
web/src/pages/tasks/TaskDetailMetadata.stories.tsx
web/src/pages/tasks/TaskDetailPanel.stories.tsx
web/src/pages/tasks/TaskListView.stories.tsx
web/src/pages/tasks/TaskCard.stories.tsx
web/src/pages/tasks/TaskDetailTimeline.stories.tsx
web/src/pages/agents/TaskHistory.stories.tsx
web/src/**/*.stories.{ts,tsx}
📄 CodeRabbit inference engine (web/CLAUDE.md)
Storybook 10 is ESM-only; essentials are built into core, but `@storybook/addon-docs` is now separate; imports moved to `storybook/test` and `storybook/actions`
Files:
web/src/pages/tasks/TaskDetailActions.stories.tsx
web/src/pages/tasks/TaskColumn.stories.tsx
web/src/pages/tasks/TaskDetailHeader.stories.tsx
web/src/pages/tasks/TaskDetailMetadata.stories.tsx
web/src/pages/tasks/TaskDetailPanel.stories.tsx
web/src/pages/tasks/TaskListView.stories.tsx
web/src/pages/tasks/TaskCard.stories.tsx
web/src/pages/tasks/TaskDetailTimeline.stories.tsx
web/src/pages/agents/TaskHistory.stories.tsx
web/src/mocks/handlers/**/*.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/mocks/handlers/**/*.ts: MSW handlers (MANDATORY): `web/src/mocks/handlers/` must mirror `web/src/api/endpoints/*.ts` 1:1 with a default happy-path handler for every exported endpoint. Use `onUnhandledRequest: 'error'` in test setup; tests override per-case via `server.use(...)`, never `vi.mock('@/api/endpoints/*')`.
Use typed envelope helpers (`successFor`, `paginatedFor`, `voidSuccess`) to keep MSW handlers in lockstep with endpoint return types
Files:
web/src/mocks/handlers/tasks.ts
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site
Files:
src/synthorg/workers/runtime_builder.py
src/synthorg/engine/routing_policy/strategies.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: No hardcoded numerics; numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
No `from __future__ import annotations` (3.14 has PEP 649); PEP 758 except: `except A, B:` no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors: Error from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; `@computed_field` auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries)
Use `@computed_field` for derived fields; use NotBlankStr for identifiers in Pydantic models
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Use `from synthorg.observability import get_logger`; variable always `logger`; never import logging or print() in app code
Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSI...
Files:
src/synthorg/workers/runtime_builder.py
src/synthorg/engine/routing_policy/strategies.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/workers/runtime_builder.py
src/synthorg/engine/routing_policy/strategies.py
src/synthorg/workers/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Runtime services: AgentEngine builds ONE provider-present switch returning RuntimeServices (AgentEngineExecutionService + coordinator OR NoProviderExecutionService + None); install_runtime_services appends FIRST; swap* hold locks
Files:
src/synthorg/workers/runtime_builder.py
{src/**/*.py,tests/**/*.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/
Files:
src/synthorg/workers/runtime_builder.py
tests/unit/engine/routing_policy/test_acceptance_comparison.py
src/synthorg/engine/routing_policy/strategies.py
web/src/stores/**/*.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/stores/**/*.ts: List reads (`fetch*`) must set `error: string | null` on the store instead of toasting
Test teardown (MANDATORY): any new store that schedules timers or attaches event listeners must expose an equivalent cleanup hook and register it in the global `afterEach`. The global `afterEach` in `web/src/test-setup.tsx` already calls `useToastStore.getState().dismissAll()`, `cancelPendingPersist()`, and `useThemeStore.getState().teardown()`.
Files:
web/src/stores/tasks.ts
web/src/{api/endpoints,stores}/**/*.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
Cursor pagination (MANDATORY): list endpoints must use opaque cursor-based paging via `PaginationMeta`. Stores must keep `nextCursor` + `hasMore` in state (not offset arithmetic) and early-return when `!hasMore || !nextCursor`. Display counts must come from `data.length`.
Files:
web/src/stores/tasks.ts
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers: `@pytest.mark.{unit,integration,e2e,slow}`; async auto; timeout 30s global; coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back
Test doubles: FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary forbidden (zero-tolerance, no baseline) per check_mock_spec.py
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add `@example(...)`)
Flaky tests: NEVER skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large)
Files:
tests/unit/engine/routing_policy/test_acceptance_comparison.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare `@settings()` decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which `@given()` honors automatically.
Files:
tests/unit/engine/routing_policy/test_acceptance_comparison.py
🧠 Learnings (10)
📚 Learning: 2026-05-16T18:36:19.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/guides/contributing.md:95-95
Timestamp: 2026-05-16T18:36:19.195Z
Learning: In the SynthOrg repo, the “Doc Numeric Claims (MANDATORY)” RS-marker rule should be applied only to these docs: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. This rule is enforced by scripts/check_doc_numeric_macros.py (with runtime substitution by scripts/inject_runtime_stats.py), so reviewers should not flag similar numeric-claim issues in other paths (e.g., anything under docs/guides/). When checking those scoped files, the rule skips fenced code blocks and only flags digits that are adjacent to stat nouns (tests/providers/agents/stars/releases). Numeric CLI flags like “--num-workers=4” inside fenced bash code blocks are not subject to this rule.
Applied to files:
docs/roadmap/index.md
README.md
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, follow the `Doc Numeric Claims (MANDATORY)` rule enforced by `scripts/check_doc_numeric_macros.py` only for these markdown files: `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`. The gate flags digits that appear adjacent to the stat nouns `tests`, `providers`, `agents`, `stars`, and `releases`—those numeric claims must use the required `<!--RS:...-->` macro format. Do not apply this rule to prose that mentions Python version numbers (e.g., “Python 3.14” / “Python 3.15”); those should not be flagged as requiring `<!--RS:...-->`.
Applied to files:
docs/roadmap/index.md
README.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: In the synthorg repo, the “Doc Numeric Claims (MANDATORY)” RS-marker rule is enforced only for this exact set of Markdown files: README.md, docs/index.md, docs/roadmap/index.md, docs/architecture/decisions.md, and docs/reference/convention-gates.md. During code reviews, do not raise RS-marker/numeric-claims findings for numeric values in any other files (e.g., docs/getting_started.md, docs/guides/*, docs/reference/conventions.md), since they are not checked or injected by scripts/check_doc_numeric_macros.py or scripts/inject_runtime_stats.py.
Applied to files:
docs/roadmap/index.md
README.md
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.
Applied to files:
docs/roadmap/index.md
README.md
docs/design/providers.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).
Applied to files:
docs/roadmap/index.md
README.md
docs/design/providers.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.
Applied to files:
docs/roadmap/index.md
README.md
docs/design/providers.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.
Applied to files:
docs/roadmap/index.md
docs/design/providers.md
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
src/synthorg/workers/runtime_builder.py
tests/unit/engine/routing_policy/test_acceptance_comparison.py
src/synthorg/engine/routing_policy/strategies.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).
Applied to files:
src/synthorg/workers/runtime_builder.py
tests/unit/engine/routing_policy/test_acceptance_comparison.py
src/synthorg/engine/routing_policy/strategies.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.
Applied to files:
src/synthorg/workers/runtime_builder.py
src/synthorg/engine/routing_policy/strategies.py
🔇 Additional comments (29)
docs/roadmap/index.md (1)
6-7: LGTM!
data/runtime_stats.yaml (1)
2-3: LGTM! Also applies to: 6-8, 10-10
web/src/__tests__/stores/agents.test.ts (1)
91-91: LGTM!
web/src/__tests__/stores/tasks.test.ts (1)
29-29: LGTM!
web/src/__tests__/pages/TaskDetailPage.test.tsx (1)
24-24: LGTM!
web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx (1)
24-24: LGTM!
web/src/pages/tasks/TaskDetailActions.stories.tsx (1)
21-21: LGTM!
web/src/api/types/enums.ts (1)
40-40: LGTM! Also applies to: 69-69
web/src/__tests__/helpers/factories.ts (1)
36-36: LGTM!
web/src/pages/tasks/TaskColumn.stories.tsx (1)
23-23: LGTM!
README.md (1)
22-22: LGTM!
web/src/pages/tasks/TaskDetailHeader.stories.tsx (1)
20-20: LGTM!
web/src/__tests__/utils/tasks.property.test.ts (1)
42-42: LGTM!
web/src/pages/tasks/TaskDetailMetadata.stories.tsx (1)
20-20: LGTM!
web/src/pages/tasks/TaskDetailPanel.stories.tsx (1)
24-24: LGTM!
web/src/pages/tasks/TaskListView.stories.tsx (1)
21-21: LGTM!
web/src/pages/tasks/TaskCard.stories.tsx (1)
21-21: LGTM!
web/src/pages/tasks/TaskDetailTimeline.stories.tsx (1)
20-20: LGTM!
web/src/pages/agents/TaskHistory.stories.tsx (1)
21-21: LGTM!
web/src/mocks/handlers/tasks.ts (1)
30-30: LGTM!
src/synthorg/workers/runtime_builder.py (1)
347-351: LGTM! Also applies to: 357-361, 368-371, 392-394, 409-411, 700-700
web/src/stores/tasks.ts (1)
18-18: LGTM! Also applies to: 245-245, 452-467
docs/design/providers.md (1)
207-209: LGTM!
tests/unit/engine/routing_policy/test_acceptance_comparison.py (1)
18-19: LGTM! Also applies to: 44-46, 51-52, 101-106
src/synthorg/engine/routing_policy/strategies.py (5)
1-29: LGTM!
32-53: LGTM!
56-132: LGTM!
163-169: LGTM!
190-274: LGTM!
web/src/stores/tasks.ts: build the runtime-check enum sets from the generated *_VALUES tuples (COMPLEXITY/TASK_STRUCTURE/COORDINATION_TOPOLOGY/STAKES) instead of re-declared literal lists, so a value added to an enum cannot drift out of sync with its frame-guard validator within a build (CodeRabbit flagged STAKES_SET; applied to all four for consistency and to match the file's own header comment + the DEPARTMENT_NAME_SET precedent in enums.ts). Behaviour is unchanged: the generated tuple is still build-time-frozen, so an unknown behavioural enum value is still dropped rather than mis-routed.
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._

### What you'll notice

- Introduced conversational interface for direct clarify and propose interactions.
- Cost management now includes forecast gates, hard ceilings, and Pareto considerations.
- Added living documentation engine combining wiki and retrieval-augmented generation features.
- Real intake engine is now operational for live data processing.
- Virtual desktop tool with vision verification gate available for enhanced workspace control.

### What's new

- Per-project reproducible environments for consistent setups.
- Headless browser testing tool integrated for automated UI validation.
- Governed external API and data access tool introduced.
- Hardened external-remote git backend with sandbox mounts and push-queue dispatching.
- Adversarial red-team gate subsystem for enhanced security testing.
- Self-extending toolkit to dynamically expand capabilities.
- Stakes-aware model routing enables prioritized processing.
- Task-board entry adapter connects live runtime with project management.
- Persistent project workspace with pluggable git backend and per-project push queues implemented.
- Knowledge and provenance substrate added to track data lineage.
- Scoring and data contract framework for golden-company benchmark evaluations.

### Under the hood

- Desktop Dockerfile pinned by digest to improve build stability and documented publishing gap fixed.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*

---

## [0.8.7](v0.8.6...v0.8.7) (2026-05-22)

### Features

* conversational interface v1 - 1:1 clarify + propose ([#2019](#2019)) ([216ef94](216ef94)), closes [#1968](#1968)
* cost as a first-class dial (forecast gate, hard ceiling, Pareto) ([#2029](#2029)) ([700a59e](700a59e)), closes [#1982](#1982)
* **env:** reproducible per-project environments ([#2039](#2039)) ([d2c0ef9](d2c0ef9)), closes [#1994](#1994)
* **evals:** [#1980](#1980) spine -- scoring + data contract for golden-company benchmark ([#2025](#2025)) ([53108e8](53108e8))
* goal/objective entry adapter ([#1964](#1964)) ([#2022](#2022)) ([cb15c3c](cb15c3c))
* governed external API/data access tool ([#1991](#1991)) ([#2032](#2032)) ([e08b451](e08b451))
* harden external-remote git backend + per-project sandbox mount + push-queue dispatch ([#2020](#2020)) ([#2030](#2030)) ([2fa2e1e](2fa2e1e))
* headless browser testing tool ([#1992](#1992)) ([#2024](#2024)) ([277b52a](277b52a))
* knowledge + provenance substrate ([#2036](#2036)) ([48c897b](48c897b))
* living documentation engine (dual-purpose wiki + RAG namespace) ([#2028](#2028)) ([3d10da9](3d10da9)), closes [#1976](#1976)
* real intake engine online ([#2017](#2017)) ([9d8eb34](9d8eb34))
* **redteam:** adversarial red-team gate subsystem ([#1986](#1986)) ([#2026](#2026)) ([d2207e9](d2207e9))
* self-extending toolkit ([#1995](#1995)) ([#2035](#2035)) ([5ffc545](5ffc545))
* stakes-aware model routing ([#1998](#1998)) ([#2038](#2038)) ([9b98312](9b98312))
* task-board entry adapter to live runtime ([#1963](#1963)) ([#2023](#2023)) ([a8f1eea](a8f1eea))
* virtual desktop tool and vision verifier gate ([#2031](#2031)) ([dfe8b42](dfe8b42)), closes [#1993](#1993)
* **workspace:** persistent project workspace + pluggable git backend + per-project push queue ([#2021](#2021)) ([ee58ee7](ee58ee7))

### Bug Fixes

* pin desktop Dockerfile by digest (Scorecard [#309](#309)) + document publish gap ([#2034](#2034)) ([8fda188](8fda188))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
Adds stakes-aware model routing: a pluggable layer that re-tiers each task's model selection by how consequential the work is, so cheap models handle low-stakes subtasks and strong models (plus red-team) handle high and critical ones. Total cost drops versus flat downgrade-at-boundary routing, with no quality-floor regression.
- `engine/stakes/`: `StakesAssessor` protocol + `DefaultStakesAssessor` (complexity base mapping, high/critical keyword signals, critical-priority elevation, fail-safe upward on unknown complexity) + config discriminator + factory. New `Stakes` enum (low/normal/high/critical) with a `compare_stakes` total order.
- `engine/routing_policy/`: `StakesRoutingStrategy` protocol + `StakesAwareStrategy` (cheapest tier clearing the per-stakes `QualityFloors`, coordination-metrics nudge, red-team marking, never below the agent's configured tier) and `FlatStrategy` control arm + config discriminator + factory. Safe default is `stakes_aware`.
- Wired in `workers/runtime_builder` and injected into `AgentEngine`, applied before the budget auto-downgrade (a hard budget ceiling still wins over a stakes upgrade). Red-team review task pinned to `CRITICAL` so its reviewer is never downgraded.

Test plan
- `tests/e2e/test_stakes_routing_e2e.py` drives the full work pipeline through the deterministic scripted simulation harness on a mixed-stakes brief and asserts that the stakes-aware run accrues strictly less cost than the flat control arm (verified passing locally).

Review coverage
Pre-reviewed by 19 agents (code/python/security/conventions/logging/resilience/async/type-design/test-quality/docs/comment-rot/api-drift/issue-resolution + 5 audit mini-passes). 10 valid findings fixed; 8 false positives documented (notably the PEP 758 `except A, B:` "syntax error", disproved by passing ruff/mypy/tests, and an api-contract-drift finding premised on `stakes` being on the wire when it is an internal engine concept).

closes #1998
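
For readers who want a concrete picture of the assess-then-route flow summarised above, here is a minimal sketch. It is not the code in `engine/stakes/` or `engine/routing_policy/`: the `Tier` dataclass, the keyword lists, the complexity labels, the floor values, and the signatures of `compare_stakes`, `assess`, and `route` are all assumptions made for illustration. The coordination-metrics nudge and the budget auto-downgrade are left out, and "never below the agent's configured tier" is interpreted here as a comparison on tier quality.

```python
from dataclasses import dataclass
from enum import Enum


class Stakes(str, Enum):
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"
    CRITICAL = "critical"


_ORDER = (Stakes.LOW, Stakes.NORMAL, Stakes.HIGH, Stakes.CRITICAL)


def compare_stakes(a: Stakes, b: Stakes) -> int:
    """Negative if a < b, zero if equal, positive if a > b (assumed signature)."""
    return _ORDER.index(a) - _ORDER.index(b)


@dataclass(frozen=True)
class Tier:
    name: str
    cost: float     # relative cost per call (illustrative)
    quality: float  # benchmark score in [0, 1] (illustrative)


# Hypothetical tiers and per-stakes floors; floors are non-decreasing in stakes.
TIERS = (Tier("small", 1.0, 0.55), Tier("medium", 4.0, 0.75), Tier("large", 15.0, 0.92))
FLOORS = {Stakes.LOW: 0.5, Stakes.NORMAL: 0.7, Stakes.HIGH: 0.85, Stakes.CRITICAL: 0.9}

# Hypothetical heuristic inputs.
_BASE = {"simple": Stakes.LOW, "medium": Stakes.NORMAL, "complex": Stakes.HIGH}
_HIGH_KEYWORDS = ("migration", "security")
_CRITICAL_KEYWORDS = ("production", "payment")


def assess(complexity: str, text: str, priority: str = "normal") -> Stakes:
    # Unknown complexity fails safe upward rather than defaulting to LOW.
    stakes = _BASE.get(complexity, Stakes.HIGH)
    lowered = text.lower()
    if any(word in lowered for word in _HIGH_KEYWORDS):
        if compare_stakes(stakes, Stakes.HIGH) < 0:
            stakes = Stakes.HIGH
    # Critical keywords or a critical priority elevate to the top level.
    if priority == "critical" or any(word in lowered for word in _CRITICAL_KEYWORDS):
        stakes = Stakes.CRITICAL
    return stakes


def route(stakes: Stakes, agent_tier: Tier) -> tuple[Tier, bool]:
    """Pick the cheapest tier clearing the floor; return (tier, red_team)."""
    floor = FLOORS[stakes]
    eligible = [tier for tier in TIERS if tier.quality >= floor]
    if eligible:
        chosen = min(eligible, key=lambda tier: tier.cost)
    else:
        # Nothing clears the floor: fall back to the strongest tier available.
        chosen = max(TIERS, key=lambda tier: tier.quality)
    # Never route below the agent's configured tier.
    if chosen.quality < agent_tier.quality:
        chosen = agent_tier
    red_team = compare_stakes(stakes, Stakes.HIGH) >= 0
    return chosen, red_team


low_tier, low_red_team = route(assess("simple", "rename a helper"), TIERS[0])
crit_tier, crit_red_team = route(assess("simple", "rotate production credentials"), TIERS[0])
```

With these made-up numbers, the low-stakes task stays on the cheapest tier without red-team marking, while the task mentioning production is elevated to critical and routed to the strongest tier with red-team marking. In the PR, the budget auto-downgrade runs after this step, so a hard budget ceiling can still override an upgrade chosen here.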