feat: stakes-aware model routing (#1998) by Aureliolo · Pull Request #2038 · Aureliolo/synthorg

Aureliolo · 2026-05-22T05:31:47Z

Summary

Adds stakes-aware model routing: a pluggable layer that re-tiers each task's model selection by how consequential the work is, so cheap models handle low-stakes subtasks and strong models (plus red-team review) handle high/critical ones. Total cost drops versus flat downgrade-at-boundary routing, with no quality-floor regression.

Stakes assessment (engine/stakes/): StakesAssessor protocol + DefaultStakesAssessor (complexity base mapping, high/critical keyword signals, critical-priority elevation, fail-safe upward on unknown complexity) + config discriminator + factory. New Stakes enum (low/normal/high/critical) with a compare_stakes total order.
Routing policy (engine/routing_policy/): StakesRoutingStrategy protocol + StakesAwareStrategy (cheapest tier clearing the per-stakes QualityFloors, coordination-metrics nudge, red-team marking, never below the agent's configured tier) and FlatStrategy control arm + config discriminator + factory. Safe default is stakes_aware.
Wiring: assessed stakes stamped onto subtasks (decomposition) and LEAF tasks (pipeline); router built at boot in workers/runtime_builder and injected into AgentEngine, applied before the budget auto-downgrade (a hard budget ceiling still wins over a stakes upgrade). Red-team review task pinned to CRITICAL so its reviewer is never downgraded.
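
A minimal sketch of how the assessment described above could look. The `Stakes` levels and `compare_stakes` are named in this PR; everything else here (the `assess_stakes` function, the complexity labels, the keyword lists, and the choice to raise critical-priority tasks to at least HIGH) is a hypothetical stand-in, not the actual `DefaultStakesAssessor`.

```python
from enum import Enum


class Stakes(str, Enum):
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"
    CRITICAL = "critical"


_ORDER = (Stakes.LOW, Stakes.NORMAL, Stakes.HIGH, Stakes.CRITICAL)


def compare_stakes(a: Stakes, b: Stakes) -> int:
    """Total order over stakes: -1 if a < b, 0 if equal, 1 if a > b."""
    ia, ib = _ORDER.index(a), _ORDER.index(b)
    return (ia > ib) - (ia < ib)


def max_stakes(a: Stakes, b: Stakes) -> Stakes:
    return a if compare_stakes(a, b) >= 0 else b


# Hypothetical mappings and keyword signals, for illustration only.
_COMPLEXITY_BASE = {
    "simple": Stakes.LOW,
    "medium": Stakes.NORMAL,
    "complex": Stakes.HIGH,
}
_HIGH_KEYWORDS = ("migration", "billing")
_CRITICAL_KEYWORDS = ("production", "credentials")


def assess_stakes(
    description: str, complexity: str, priority: str = "normal"
) -> Stakes:
    # Unknown complexity fails safe upward rather than defaulting to LOW.
    stakes = _COMPLEXITY_BASE.get(complexity, Stakes.HIGH)
    text = description.lower()
    if any(word in text for word in _CRITICAL_KEYWORDS):
        stakes = max_stakes(stakes, Stakes.CRITICAL)
    elif any(word in text for word in _HIGH_KEYWORDS):
        stakes = max_stakes(stakes, Stakes.HIGH)
    # Assumption: critical priority raises the result to at least HIGH.
    if priority == "critical":
        stakes = max_stakes(stakes, Stakes.HIGH)
    return stakes
```

Each signal can only raise the result, so an unrecognised input never lowers the assessed stakes.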

Test plan

Unit: stakes assessment (complexity/keyword/priority, fail-safe), routing floor selection, red-team threshold, coordination nudge (incl. exact-threshold boundary), tier-ladder helpers, score==floor boundary, benchmark-unavailable fallback, factory dispatch, cost-monotonicity property test, acceptance comparison (cost drops at equal-or-better quality).
E2E: tests/e2e/test_stakes_routing_e2e.py drives the full work pipeline through the deterministic scripted simulation harness on a mixed-stakes brief and asserts the stakes-aware run accrues strictly less cost than the flat control arm (verified passing locally).
Full unit suite green (30610 passed); ruff, mypy strict, and all convention gates pass via pre-push.

Review coverage

Pre-reviewed by 19 agents (code/python/security/conventions/logging/resilience/async/type-design/test-quality/docs/comment-rot/api-drift/issue-resolution + 5 audit mini-passes). 10 valid findings fixed; 8 false positives documented. Notable false positives: the PEP 758 `except A, B:` "syntax error", disproved by passing ruff/mypy/tests, and an api-contract-drift finding premised on stakes being on the wire when it is an internal engine concept.

closes #1998

…downgraded

NotBlankStr strategy discriminator, QualityFloors ordering validator, under-floor and benchmark-failure guards in the router, redaction-safe decomposition logging, stakes-assessed state log, ghost-wiring manifest entry, tier/boundary/fallback tests, e2e cost-drop simulation, and docs.

The new QualityFloors non-decreasing validator rejects polyfactory's independent random floor draws, mirroring the existing IntegrationsConfig pin in the same test.
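
A rough sketch of what a non-decreasing floor check could look like, and why independently drawn random floors fail it. A stdlib dataclass stands in for the project's Pydantic model here, and the field names (`low`, `normal`, `high`, `critical`) are assumptions based on the `Stakes` levels, not the real `QualityFloors` schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityFloors:
    """Hypothetical per-stakes quality floors (stand-in for the real model)."""

    low: float
    normal: float
    high: float
    critical: float

    def __post_init__(self) -> None:
        ordered = (self.low, self.normal, self.high, self.critical)
        # Each floor must be at least as strict as the one below it.
        if any(a > b for a, b in zip(ordered, ordered[1:])):
            msg = f"quality floors must be non-decreasing, got {ordered}"
            raise ValueError(msg)
```

Four floors drawn independently at random are rarely in non-decreasing order, so a factory that generates each field on its own will usually trip a check like this; pinning the floors in the test avoids that.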

github-actions · 2026-05-22T05:31:58Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

coderabbitai · 2026-05-22T05:32:01Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 77f41268-3739-45f5-b3c0-e544a0471943

📥 Commits

Reviewing files that changed from the base of the PR and between da17866 and f7a101e.

📒 Files selected for processing (1)

web/src/stores/tasks.ts

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: Deploy Preview
GitHub Check: Build Backend
GitHub Check: Lighthouse Site
GitHub Check: Lighthouse Dashboard
GitHub Check: Build Web Assets (melange)
GitHub Check: CodSpeed Web benchmarks
GitHub Check: CodSpeed Python benchmarks
GitHub Check: Dashboard Test
GitHub Check: Test E2E
GitHub Check: Test Unit
GitHub Check: Test Conformance (SQLite)
GitHub Check: Test Integration
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (5)

web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

web/src/stores/tasks.ts

web/src/stores/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/stores/**/*.ts: List reads (fetch*) must set error: string | null on the store instead of toasting
Test teardown (MANDATORY): any new store that schedules timers or attaches event listeners must expose an equivalent cleanup hook and register it in the global afterEach. The global afterEach in web/src/test-setup.tsx already calls useToastStore.getState().dismissAll(), cancelPendingPersist(), and useThemeStore.getState().teardown().

Files:

web/src/stores/tasks.ts

web/src/{api/endpoints,stores}/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Cursor pagination (MANDATORY): list endpoints must use opaque cursor-based paging via PaginationMeta. Stores must keep nextCursor + hasMore in state (not offset arithmetic) and early-return when !hasMore || !nextCursor. Display counts must come from data.length.

Files:

web/src/stores/tasks.ts

web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

web/src/stores/tasks.ts

web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse web/src/components/ui/ components and design tokens only per web Dashboard Design System in web/CLAUDE.md

Files:

web/src/stores/tasks.ts

🔇 Additional comments (1)

web/src/stores/tasks.ts (1)

10-19: LGTM!

Also applies to: 31-49

Walkthrough

This PR implements stakes-aware model routing and per-task/subtask stakes assessment. It adds a Stakes enum and Task.stakes, a stakes-assessor subsystem (config, protocol, heuristic, factory), and a pluggable routing-policy subsystem (config, tiers, strategies, router, factory). The engine is wired to apply stakes routing before budget auto-downgrade, review gates support red-team marking, observability events were added, comprehensive unit and e2e tests validate cost/quality properties, and frontend/web code was updated to carry the new stakes enum in payloads and fixtures.

gemini-code-assist · 2026-05-22T05:33:51Z

Summary of Changes

This pull request introduces a stakes-aware model routing system designed to optimize cost and quality by matching model tiers to the criticality of individual tasks. By assessing task stakes through complexity, keyword signals, and priority, the system ensures that low-stakes tasks utilize cheaper models while high-stakes tasks are handled by stronger models and flagged for adversarial red-team review. This approach provides a more nuanced routing mechanism that operates orthogonally to existing budget constraints.

Highlights

Stakes Assessment: Added a new StakesAssessor protocol and DefaultStakesAssessor implementation to classify task importance based on complexity, keyword signals, and priority.
Model Routing: Introduced StakesRoutingStrategy to dynamically select model tiers based on task stakes, ensuring low-stakes tasks use cheaper models while high-stakes tasks receive stronger models.
Red-Team Integration: Integrated high/critical stakes marking with the red-team review gate to ensure sensitive tasks receive appropriate adversarial review.
Configuration & Wiring: Added pluggable configuration for routing strategies and wired the new router into the AgentEngine execution pipeline, ensuring it runs before budget auto-downgrade.

codspeed-hq · 2026-05-22T05:34:46Z

Merging this PR will not alter performance

✅ 54 untouched benchmarks

Comparing feat/1998-stakes-aware-model-routing (f7a101e) with main (5ffc545)

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/workers/runtime_builder.py`:
- Around line 347-372: The resolver is built from all providers which allows
routing to models owned by inactive providers; in _build_stakes_router_or_none
replace ModelResolver.from_config(app_state.config.providers) with a resolver
constructed only from the active provider configuration (use the active provider
name from app_state.config.names[0] or equivalent and pass only that provider's
entry), so ModelResolver only knows about the runtime provider, and then pass
that resolver into build_stakes_router (ensuring coordination_store and
benchmark_provider usage stays the same).

In `@tests/unit/engine/routing_policy/test_acceptance_comparison.py`:
- Around line 42-53: Annotate the module-level test constants as immutable by
importing Final from typing and declaring types with Final, e.g. change
_PROVIDER, _TIER_MODEL_IDS and _TIER_TOTAL_COST to _PROVIDER: Final[str],
_TIER_MODEL_IDS: Final[dict[ModelTier, str]] and _TIER_TOTAL_COST:
Final[dict[ModelTier, float]] so the intent of immutability is explicit; keep
the existing values and types (use the same ModelTier alias) and add the Final
import at the top of the test module.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: a50db7b3-5e24-4e8a-89d2-0d0476c28c75

📥 Commits

Reviewing files that changed from the base of the PR and between 5ffc545 and ac5cdc9.

📒 Files selected for processing (42)

docs/design/engine.md
docs/design/providers.md
docs/reference/pluggable-subsystems.md
scripts/_ghost_wiring_manifest.txt
src/synthorg/api/app.py
src/synthorg/config/defaults.py
src/synthorg/config/schema.py
src/synthorg/core/enums.py
src/synthorg/core/task.py
src/synthorg/engine/agent_engine.py
src/synthorg/engine/decomposition/models.py
src/synthorg/engine/decomposition/service.py
src/synthorg/engine/pipeline/service.py
src/synthorg/engine/review_gate.py
src/synthorg/engine/routing_policy/__init__.py
src/synthorg/engine/routing_policy/config.py
src/synthorg/engine/routing_policy/factory.py
src/synthorg/engine/routing_policy/models.py
src/synthorg/engine/routing_policy/protocol.py
src/synthorg/engine/routing_policy/router.py
src/synthorg/engine/routing_policy/strategies.py
src/synthorg/engine/routing_policy/tiers.py
src/synthorg/engine/stakes/__init__.py
src/synthorg/engine/stakes/config.py
src/synthorg/engine/stakes/factory.py
src/synthorg/engine/stakes/heuristic.py
src/synthorg/engine/stakes/protocol.py
src/synthorg/observability/events/stakes_routing.py
src/synthorg/security/redteam/runner.py
src/synthorg/workers/runtime_builder.py
tests/e2e/test_stakes_routing_e2e.py
tests/unit/config/test_schema.py
tests/unit/engine/routing_policy/test_acceptance_comparison.py
tests/unit/engine/routing_policy/test_cost_properties.py
tests/unit/engine/routing_policy/test_engine_integration.py
tests/unit/engine/routing_policy/test_strategies.py
tests/unit/engine/routing_policy/test_tiers.py
tests/unit/engine/stakes/test_assessor.py
tests/unit/engine/stakes/test_propagation.py
tests/unit/observability/test_events.py
web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: Build Backend
GitHub Check: Lighthouse Site
GitHub Check: Test Integration
GitHub Check: Dashboard Test
GitHub Check: Test Conformance (SQLite)
GitHub Check: Test E2E
GitHub Check: Test Unit
GitHub Check: CodSpeed Python benchmarks
GitHub Check: CodSpeed Web benchmarks
GitHub Check: Build Preview
GitHub Check: Analyze (python)
GitHub Check: Analyze (javascript-typescript)

🧰 Additional context used

📓 Path-based instructions (12)

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site

Files:

src/synthorg/engine/decomposition/models.py
src/synthorg/observability/events/stakes_routing.py
src/synthorg/engine/routing_policy/router.py
src/synthorg/engine/routing_policy/__init__.py
src/synthorg/api/app.py
src/synthorg/config/defaults.py
src/synthorg/engine/stakes/protocol.py
src/synthorg/engine/routing_policy/models.py
src/synthorg/core/enums.py
src/synthorg/engine/stakes/__init__.py
src/synthorg/engine/routing_policy/protocol.py
src/synthorg/core/task.py
src/synthorg/engine/routing_policy/config.py
src/synthorg/engine/stakes/factory.py
src/synthorg/engine/routing_policy/factory.py
src/synthorg/engine/routing_policy/tiers.py
src/synthorg/workers/runtime_builder.py
src/synthorg/config/schema.py
src/synthorg/engine/agent_engine.py
src/synthorg/engine/routing_policy/strategies.py
src/synthorg/engine/review_gate.py
src/synthorg/engine/decomposition/service.py
src/synthorg/engine/stakes/config.py
src/synthorg/engine/pipeline/service.py
src/synthorg/security/redteam/runner.py
src/synthorg/engine/stakes/heuristic.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: No hardcoded numerics; numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
No from __future__ import annotations (3.14 has PEP 649); PEP 758 except: except A, B: no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors: Error from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries)
Use @computed_field for derived fields; use NotBlankStr for identifiers in Pydantic models
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Use from synthorg.observability import get_logger; variable always logger; never import logging or print() in app code
Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSI...

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

{src/**/*.py,tests/**/*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/

Files:

src/synthorg/engine/decomposition/models.py
src/synthorg/observability/events/stakes_routing.py
src/synthorg/engine/routing_policy/router.py
src/synthorg/engine/routing_policy/__init__.py
src/synthorg/api/app.py
tests/unit/engine/routing_policy/test_acceptance_comparison.py
tests/unit/observability/test_events.py
src/synthorg/config/defaults.py
src/synthorg/engine/stakes/protocol.py
tests/unit/engine/routing_policy/test_engine_integration.py
src/synthorg/engine/routing_policy/models.py
src/synthorg/core/enums.py
src/synthorg/engine/stakes/__init__.py
src/synthorg/engine/routing_policy/protocol.py
src/synthorg/core/task.py
tests/unit/engine/routing_policy/test_tiers.py
src/synthorg/engine/routing_policy/config.py
tests/unit/engine/stakes/test_assessor.py
src/synthorg/engine/stakes/factory.py
tests/unit/config/test_schema.py
src/synthorg/engine/routing_policy/factory.py
src/synthorg/engine/routing_policy/tiers.py
tests/unit/engine/stakes/test_propagation.py
src/synthorg/workers/runtime_builder.py
src/synthorg/config/schema.py
src/synthorg/engine/agent_engine.py
tests/unit/engine/routing_policy/test_strategies.py
src/synthorg/engine/routing_policy/strategies.py
src/synthorg/engine/review_gate.py
src/synthorg/engine/decomposition/service.py
tests/unit/engine/routing_policy/test_cost_properties.py
src/synthorg/engine/stakes/config.py
src/synthorg/engine/pipeline/service.py
src/synthorg/security/redteam/runner.py
tests/e2e/test_stakes_routing_e2e.py
src/synthorg/engine/stakes/heuristic.py

web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts

web/src/api/types/**/*.gen.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Generated DTO types (MANDATORY): NEVER hand-edit web/src/api/types/*.gen.ts. Regenerate with uv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').

Files:

web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts

web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts

web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse web/src/components/ui/ components and design tokens only per web Dashboard Design System in web/CLAUDE.md

Files:

web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts

src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/api/**/*.py: Two-phase API startup: construction (create_app body) wires synchronous services; on_startup (_build_lifecycle.on_startup) wires services needing connected persistence backend
Construction-phase ordering: agent_registry BEFORE auto_wire_meetings; tunnel_provider unconditionally
On-startup ordering: SettingsService auto-wire before WorkflowExecutionObserver registration; OntologyService after persistence.connect(); cost-dial services via _try_wire_cost_dial AFTER persistence; knowledge substrate via _wire_knowledge_engine AFTER persistence, gated on has_persistence AND has_memory_backend
Pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value

Files:

src/synthorg/api/app.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back
Test doubles: FakeClock for Clock seam, mock_of[T] for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary forbidden (zero-tolerance, no baseline) per check_mock_spec.py
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large)

Files:

tests/unit/engine/routing_policy/test_acceptance_comparison.py
tests/unit/observability/test_events.py
tests/unit/engine/routing_policy/test_engine_integration.py
tests/unit/engine/routing_policy/test_tiers.py
tests/unit/engine/stakes/test_assessor.py
tests/unit/config/test_schema.py
tests/unit/engine/stakes/test_propagation.py
tests/unit/engine/routing_policy/test_strategies.py
tests/unit/engine/routing_policy/test_cost_properties.py
tests/e2e/test_stakes_routing_e2e.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/unit/engine/routing_policy/test_acceptance_comparison.py
tests/unit/observability/test_events.py
tests/unit/engine/routing_policy/test_engine_integration.py
tests/unit/engine/routing_policy/test_tiers.py
tests/unit/engine/stakes/test_assessor.py
tests/unit/config/test_schema.py
tests/unit/engine/stakes/test_propagation.py
tests/unit/engine/routing_policy/test_strategies.py
tests/unit/engine/routing_policy/test_cost_properties.py
tests/e2e/test_stakes_routing_e2e.py

{README.md,docs/**/*.md,web/**/*.md}

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics in README and public docs sourced from data/runtime_stats.yaml via markers per data/README.md

Files:

docs/design/providers.md
docs/design/engine.md
docs/reference/pluggable-subsystems.md

docs/**/*.{md,d2,mmd}

📄 CodeRabbit inference engine (CLAUDE.md)

Use d2 for architecture / nested containers; mermaid for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200, D2 CLI pinned to v0.7.1 in CI

Files:

docs/design/providers.md
docs/design/engine.md
docs/reference/pluggable-subsystems.md

src/synthorg/workers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Runtime services: AgentEngine builds ONE provider-present switch returning RuntimeServices (AgentEngineExecutionService + coordinator OR NoProviderExecutionService + None); install_runtime_services appends FIRST; swap* hold locks

Files:

src/synthorg/workers/runtime_builder.py

🧠 Learnings (7)

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

📚 Learning: 2026-05-21T22:55:20.496Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

src/synthorg/engine/decomposition/models.py
src/synthorg/observability/events/stakes_routing.py
src/synthorg/engine/routing_policy/router.py
src/synthorg/engine/routing_policy/__init__.py
src/synthorg/api/app.py
tests/unit/engine/routing_policy/test_acceptance_comparison.py
tests/unit/observability/test_events.py
src/synthorg/config/defaults.py
src/synthorg/engine/stakes/protocol.py
tests/unit/engine/routing_policy/test_engine_integration.py
src/synthorg/engine/routing_policy/models.py
src/synthorg/core/enums.py
src/synthorg/engine/stakes/__init__.py
src/synthorg/engine/routing_policy/protocol.py
src/synthorg/core/task.py
tests/unit/engine/routing_policy/test_tiers.py
src/synthorg/engine/routing_policy/config.py
tests/unit/engine/stakes/test_assessor.py
src/synthorg/engine/stakes/factory.py
tests/unit/config/test_schema.py
src/synthorg/engine/routing_policy/factory.py
src/synthorg/engine/routing_policy/tiers.py
tests/unit/engine/stakes/test_propagation.py
src/synthorg/workers/runtime_builder.py
src/synthorg/config/schema.py
src/synthorg/engine/agent_engine.py
tests/unit/engine/routing_policy/test_strategies.py
src/synthorg/engine/routing_policy/strategies.py
src/synthorg/engine/review_gate.py
src/synthorg/engine/decomposition/service.py
tests/unit/engine/routing_policy/test_cost_properties.py
src/synthorg/engine/stakes/config.py
src/synthorg/engine/pipeline/service.py
src/synthorg/security/redteam/runner.py
tests/e2e/test_stakes_routing_e2e.py
src/synthorg/engine/stakes/heuristic.py

📚 Learning: 2026-05-21T22:55:09.289Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

src/synthorg/engine/decomposition/models.py
src/synthorg/observability/events/stakes_routing.py
src/synthorg/engine/routing_policy/router.py
src/synthorg/engine/routing_policy/__init__.py
src/synthorg/api/app.py
src/synthorg/config/defaults.py
src/synthorg/engine/stakes/protocol.py
src/synthorg/engine/routing_policy/models.py
src/synthorg/core/enums.py
src/synthorg/engine/stakes/__init__.py
src/synthorg/engine/routing_policy/protocol.py
src/synthorg/core/task.py
src/synthorg/engine/routing_policy/config.py
src/synthorg/engine/stakes/factory.py
src/synthorg/engine/routing_policy/factory.py
src/synthorg/engine/routing_policy/tiers.py
src/synthorg/workers/runtime_builder.py
src/synthorg/config/schema.py
src/synthorg/engine/agent_engine.py
src/synthorg/engine/routing_policy/strategies.py
src/synthorg/engine/review_gate.py
src/synthorg/engine/decomposition/service.py
src/synthorg/engine/stakes/config.py
src/synthorg/engine/pipeline/service.py
src/synthorg/security/redteam/runner.py
src/synthorg/engine/stakes/heuristic.py

📚 Learning: 2026-05-16T18:36:31.446Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

docs/design/providers.md
docs/design/engine.md
docs/reference/pluggable-subsystems.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

docs/design/providers.md
docs/design/engine.md
docs/reference/pluggable-subsystems.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

docs/design/providers.md
docs/design/engine.md
docs/reference/pluggable-subsystems.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

docs/design/providers.md
docs/design/engine.md
docs/reference/pluggable-subsystems.md
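
The gate behaviour these learnings describe (skip fenced code blocks, flag only digits adjacent to the five stat nouns) can be sketched roughly as follows. This is an approximation for illustration, not the contents of `scripts/check_doc_numeric_macros.py`: the regular expression and the `find_stat_claims` name are assumptions, and the sketch does not model the RS markers or the scoped-file allowlist.

```python
import re

FENCE = "`" * 3  # Markdown fence delimiter
STAT_NOUNS = ("tests", "providers", "agents", "stars", "releases")
# Assumed pattern: a number, optional "+", whitespace, then a stat noun.
CLAIM_RE = re.compile(r"\b\d[\d,]*\+?\s+(?:" + "|".join(STAT_NOUNS) + r")\b")


def find_stat_claims(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, text) for digits adjacent to a stat noun."""
    hits: list[tuple[int, str]] = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on each fence delimiter
            continue
        if in_fence:
            continue  # fenced code blocks are skipped entirely
        for match in CLAIM_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Under this pattern, "Python 3.14" in prose and `--num-workers=4` inside a fenced block produce no hits, while "5 tests" or "10 stars" in prose do.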

🔇 Additional comments (40)

src/synthorg/engine/decomposition/models.py (1)

15-15: LGTM!

Also applies to: 32-32, 54-57

src/synthorg/observability/events/stakes_routing.py (1)

1-11: LGTM!

src/synthorg/engine/routing_policy/router.py (1)

20-60: LGTM!

src/synthorg/engine/routing_policy/__init__.py (1)

1-41: LGTM!

src/synthorg/api/app.py (1)

1293-1303: LGTM!

tests/unit/engine/routing_policy/test_acceptance_comparison.py (1)

97-131: ⚡ Quick win

Test determinism relies on implicit stakes assessment heuristics.

The test docstring claims "deterministic simulation" (line 3), but _mixed_plan() does not explicitly set stakes= on subtasks. Instead, the test relies on DecompositionService assessing stakes based on estimated_complexity and description keywords ("architecture", "production", "irreversible").

While this exercises the integrated assessment logic (which appears intentional for an acceptance test), it creates fragility: if the assessment keywords or complexity-to-stakes mapping changes, assertions at lines 224, 229, and 234 will fail.

Consider whether explicit stakes assignment would improve test maintainability:
SubtaskDefinition(
    id="st-arch",
    title="Design the sharding architecture",
    description="Make the core architecture decision for sharding",
    estimated_complexity=Complexity.COMPLEX,
    stakes=Stakes.HIGH,  # Explicit for determinism
),

Alternatively, if testing the integrated assessment is the intent, add a comment documenting the expected assessment behavior to make the dependency explicit.

Also applies to: 224-224, 229-229, 234-234
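
To make the dependency concrete: if the assessor combines a complexity base mapping with keyword signals, as the PR summary describes, a test without explicit stakes passes only while those tables stay unchanged. The following is a toy sketch. The enum values, the mapping, and the keyword sets are hypothetical stand-ins, not the actual tables in `DefaultStakesAssessor`.

```python
from enum import IntEnum


class Stakes(IntEnum):
    LOW = 0
    NORMAL = 1
    HIGH = 2
    CRITICAL = 3


# Hypothetical tables; the real assessor's mapping and keywords may differ.
BASE_BY_COMPLEXITY = {
    "simple": Stakes.LOW,
    "medium": Stakes.NORMAL,
    "complex": Stakes.HIGH,
}
HIGH_KEYWORDS = frozenset({"architecture", "production"})
CRITICAL_KEYWORDS = frozenset({"irreversible"})


def assess(description: str, complexity: str) -> Stakes:
    """Toy heuristic: base level from complexity, raised by keyword signals."""
    # Unknown complexity fails safe upward rather than defaulting to LOW.
    level = BASE_BY_COMPLEXITY.get(complexity, Stakes.HIGH)
    words = set(description.lower().split())
    if words & CRITICAL_KEYWORDS:
        level = max(level, Stakes.CRITICAL)
    elif words & HIGH_KEYWORDS:
        level = max(level, Stakes.HIGH)
    return level
```

Under these assumptions, `assess("Make the core architecture decision for sharding", "medium")` yields `HIGH` only because "architecture" is in the keyword set. Removing that keyword silently changes the result to `NORMAL`, which is the fragility the comment points at.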

tests/unit/observability/test_events.py (1)

274-274: LGTM!

docs/design/providers.md (1)

199-212: LGTM!

docs/design/engine.md (1)

133-133: LGTM!

src/synthorg/config/defaults.py (1)

25-25: LGTM!

src/synthorg/engine/stakes/protocol.py (1)

1-27: LGTM!

tests/unit/engine/routing_policy/test_engine_integration.py (1)

1-115: LGTM!

src/synthorg/engine/routing_policy/models.py (1)

10-35: LGTM!

src/synthorg/core/enums.py (1)

359-412: LGTM!

src/synthorg/engine/stakes/__init__.py (1)

1-26: LGTM!

src/synthorg/engine/routing_policy/protocol.py (1)

1-26: LGTM!

src/synthorg/core/task.py (1)

16-17: LGTM!

Also applies to: 132-139

docs/reference/pluggable-subsystems.md (1)

179-192: LGTM!

tests/unit/engine/routing_policy/test_tiers.py (1)

1-68: LGTM!

src/synthorg/engine/routing_policy/config.py (1)

1-132: LGTM!

tests/unit/engine/stakes/test_assessor.py (1)

1-206: LGTM!

src/synthorg/engine/stakes/factory.py (1)

1-43: LGTM!

web/src/api/types/openapi.gen.ts (2)

12010-12023: LGTM!

12175-12175: LGTM!

tests/unit/config/test_schema.py (1)

599-608: LGTM!

src/synthorg/engine/routing_policy/factory.py (1)

1-96: LGTM!

src/synthorg/engine/routing_policy/tiers.py (1)

1-37: LGTM!

tests/unit/engine/stakes/test_propagation.py (1)

1-108: LGTM!

src/synthorg/config/schema.py (1)

30-30: LGTM!

Also applies to: 390-391, 478-481

src/synthorg/engine/agent_engine.py (1)

86-86: LGTM!

Also applies to: 204-204, 243-243, 358-376, 427-434

tests/unit/engine/routing_policy/test_strategies.py (1)

1-402: LGTM!

src/synthorg/engine/routing_policy/strategies.py (1)

1-277: LGTM!

src/synthorg/engine/review_gate.py (1)

110-121: LGTM!

src/synthorg/engine/decomposition/service.py (1)

17-18: LGTM!

Also applies to: 31-31, 43-43, 49-49, 53-53, 86-95, 126-136, 153-153

tests/unit/engine/routing_policy/test_cost_properties.py (1)

1-93: LGTM!

src/synthorg/engine/stakes/config.py (1)

1-137: LGTM!

src/synthorg/engine/pipeline/service.py (1)

34-34: LGTM!

Also applies to: 44-44, 54-54, 97-97, 114-114, 125-125, 251-278

src/synthorg/security/redteam/runner.py (1)

16-16: LGTM!

Also applies to: 129-132

tests/e2e/test_stakes_routing_e2e.py (1)

1-400: LGTM!

src/synthorg/engine/stakes/heuristic.py (1)

1-92: LGTM!

gemini-code-assist

Code Review

This pull request introduces a stakes-aware model routing system that classifies tasks into stakes levels (LOW to CRITICAL) to optimize model selection and cost. It includes a heuristic assessor that evaluates tasks based on complexity and keywords, a routing strategy that utilizes benchmark quality floors and coordination metrics, and integration into the core agent engine. Feedback from the review focused on clarifying documentation regarding tier downgrade logic, adopting more robust Pydantic patterns like model_copy, and adding defensive checks for coordination metrics to prevent potential runtime errors.

gemini-code-assist · 2026-05-22T05:40:25Z

+unhealthy, marks high/critical work for the red-team gate, and never downgrades
+below the agent's configured tier. It is config-selectable via


The documentation states that the strategy "never downgrades below the agent's configured tier," but the implementation in StakesAwareStrategy only enforces this rule when red_team_required is true (i.e., for high or critical stakes). For low or normal stakes, the strategy intentionally allows downgrading to cheaper models to achieve the cost-saving goals described in the PR summary. Please clarify this in the documentation to avoid confusion.
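
The distinction drawn here can be sketched as follows. This is a simplified illustration, not the code in `StakesAwareStrategy`: the tier names, the ladder ordering, and the `pick_tier` helper are assumptions. The only behaviour taken from the comment is that the configured-tier floor applies when `red_team_required` is true.

```python
# Hypothetical tier ladder, cheapest first.
TIER_LADDER = ("small", "medium", "large")


def pick_tier(
    cheapest_clearing_floor: str,
    configured: str,
    red_team_required: bool,
) -> str:
    """Pick a tier, given the cheapest one that clears the quality floor."""
    candidate = TIER_LADDER.index(cheapest_clearing_floor)
    if red_team_required:
        # High/critical work: never go below the agent's configured tier.
        candidate = max(candidate, TIER_LADDER.index(configured))
    # Low/normal work keeps the cheaper candidate; that is the cost saving.
    return TIER_LADDER[candidate]
```

With an agent configured at `"large"`, low-stakes work whose floor is cleared by `"small"` runs on `"small"`, while the same inputs with `red_team_required=True` stay on `"large"`.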

gemini-code-assist · 2026-05-22T05:40:25Z

+            selected_model = ModelConfig(
+                provider=resolved.provider_name,
+                model_id=resolved.model_id,
+                temperature=current.temperature,
+                max_tokens=current.max_tokens,
+                fallback_model=current.fallback_model,
+                model_tier=target_tier,
+            )


Instead of manually constructing a new ModelConfig instance, consider using model_copy(update=...). This is more robust against future changes to the ModelConfig schema, ensuring that any additional fields (e.g., stop sequences or other provider-specific settings) are preserved from the original configuration.

selected_model = current.model_copy(
    update={
        "provider": resolved.provider_name,
        "model_id": resolved.model_id,
        "model_tier": target_tier,
    }
)
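
The robustness argument can be illustrated without Pydantic by using the standard library's `dataclasses.replace`, which plays the same copy-with-overrides role that `model_copy(update=...)` plays for Pydantic models. The `ModelConfig` below is a hypothetical stand-in with an invented `stop_sequences` field, not the project's class.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelConfig:  # hypothetical stand-in, not the project's model
    provider: str
    model_id: str
    temperature: float = 0.7
    stop_sequences: tuple[str, ...] = ()  # imagine this field was added later


current = ModelConfig("example-provider", "example-large-001", 0.2, ("END",))

# Manual reconstruction: fields not listed silently fall back to defaults.
rebuilt = ModelConfig(
    provider=current.provider,
    model_id="example-small-001",
    temperature=current.temperature,
)

# Copy-with-update: everything not overridden is carried over.
copied = replace(current, model_id="example-small-001")
```

Here `rebuilt.stop_sequences` is empty while `copied.stop_sequences` still holds `("END",)`. One caveat: Pydantic v2's `model_copy(update=...)` does not validate the updated values, so the overrides need to be correct by construction.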

gemini-code-assist · 2026-05-22T05:40:25Z

+        )
+        for rec in records:
+            amp = rec.metrics.error_amplification
+            if (


Add a defensive check for rec.metrics before accessing its attributes. While the metrics are expected to be present in a valid record, a null check prevents a potential AttributeError if a record is malformed or incomplete.

if rec.metrics is None:
    continue
amp = rec.metrics.error_amplification

codecov · 2026-05-22T05:45:06Z

Codecov Report

❌ Patch coverage is 91.42012% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.97%. Comparing base (5ffc545) to head (f7a101e).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/synthorg/engine/routing_policy/strategies.py | 85.86% | 8 Missing and 5 partials ⚠️ |
| src/synthorg/core/enums.py | 68.75% | 4 Missing and 1 partial ⚠️ |
| src/synthorg/engine/stakes/config.py | 85.18% | 3 Missing and 1 partial ⚠️ |
| src/synthorg/engine/routing_policy/config.py | 91.42% | 2 Missing and 1 partial ⚠️ |
| src/synthorg/api/app.py | 0.00% | 1 Missing and 1 partial ⚠️ |
| src/synthorg/engine/decomposition/service.py | 90.00% | 1 Missing ⚠️ |
| src/synthorg/engine/review_gate.py | 50.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2038      +/-   ##
==========================================
+ Coverage   84.95%   84.97%   +0.01%     
==========================================
  Files        2125     2139      +14     
  Lines      124801   125133     +332     
  Branches    10433    10465      +32     
==========================================
+ Hits       106030   106332     +302     
- Misses      16148    16170      +22     
- Partials     2623     2631       +8
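
The report's figures can be cross-checked with a few lines of arithmetic. The numbers below are copied from the per-file table and the coverage diff. The one assumption is that project coverage is computed as hits divided by lines.

```python
# (missing, partial) per file, copied from the report's per-file table.
UNCOVERED = {
    "routing_policy/strategies.py": (8, 5),
    "core/enums.py": (4, 1),
    "stakes/config.py": (3, 1),
    "routing_policy/config.py": (2, 1),
    "api/app.py": (1, 1),
    "decomposition/service.py": (1, 0),
    "review_gate.py": (1, 0),
}

missing = sum(m for m, _ in UNCOVERED.values())
partial = sum(p for _, p in UNCOVERED.values())
uncovered_total = missing + partial  # compare with the "29 lines" headline

# Head totals from the coverage diff.
head_hits, head_lines = 106332, 125133
head_ratio = head_hits / head_lines
```

The per-file counts sum to 20 missing and 9 partial lines, matching the 29 in the headline. The head ratio comes out just under 0.8498, consistent with the reported 84.97% if the displayed figure is truncated rather than rounded.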


… feedback

CI failures:
- Add required 'stakes' field to all web Task fixtures/stores/stories/mocks (dashboard type-check, build, melange, and lighthouse failed because stakes was generated as required on the Task DTO but the hand-written TS fixtures were never updated). Re-export STAKES_VALUES/Stakes from the enums barrel and validate stakes in the WS task-frame guard like other behavioural enums.
- Regenerate data/runtime_stats.yaml (tests bucket 32,000+ to 33,000+) and re-inject the RS markers in README.md and docs/roadmap/index.md.

Reviewer feedback:
- runtime_builder: scope the stakes-router ModelResolver to the single active provider (CodeRabbit) so a tier can never resolve to an inactive provider model and execute with the wrong client.
- strategies: use model_copy(update=) instead of reconstructing ModelConfig (Gemini).
- providers.md: clarify only high/critical work is floored at the configured tier; low/normal may downgrade to save cost (Gemini).
- test_acceptance_comparison: Final annotations on module constants plus a note documenting the integrated stakes-assessment dependency (CodeRabbit).

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@web/src/stores/tasks.ts`:
- Around line 65-70: STAKES_SET is hard-coded with literal stake strings and
must be derived from the canonical enum/tuple to avoid drift; replace the
literal array in the STAKES_SET initializer with a runtime derivation from the
generated canonical tuple/enum (e.g., map or Object.values of the generated
STAKES/Stakes tuple/enum) so the set is built from the single source-of-truth
and keep the ReadonlySet<string> typing and the satisfies readonly Stakes[]
check in place.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: d290f104-7dfe-44ad-b470-ff18549aac74

📥 Commits

Reviewing files that changed from the base of the PR and between ac5cdc9 and da17866.

📒 Files selected for processing (25)

README.md
data/runtime_stats.yaml
docs/design/providers.md
docs/roadmap/index.md
src/synthorg/engine/routing_policy/strategies.py
src/synthorg/workers/runtime_builder.py
tests/unit/engine/routing_policy/test_acceptance_comparison.py
web/src/__tests__/helpers/factories.ts
web/src/__tests__/pages/TaskDetailPage.test.tsx
web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
web/src/__tests__/stores/agents.test.ts
web/src/__tests__/stores/tasks.test.ts
web/src/__tests__/utils/tasks.property.test.ts
web/src/api/types/enums.ts
web/src/mocks/handlers/tasks.ts
web/src/pages/agents/TaskHistory.stories.tsx
web/src/pages/tasks/TaskCard.stories.tsx
web/src/pages/tasks/TaskColumn.stories.tsx
web/src/pages/tasks/TaskDetailActions.stories.tsx
web/src/pages/tasks/TaskDetailHeader.stories.tsx
web/src/pages/tasks/TaskDetailMetadata.stories.tsx
web/src/pages/tasks/TaskDetailPanel.stories.tsx
web/src/pages/tasks/TaskDetailTimeline.stories.tsx
web/src/pages/tasks/TaskListView.stories.tsx
web/src/stores/tasks.ts

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: Deploy Preview
GitHub Check: Build Backend
GitHub Check: Build Web Assets (melange)
GitHub Check: CodSpeed Web benchmarks
GitHub Check: CodSpeed Python benchmarks
GitHub Check: Lighthouse Site
GitHub Check: Lighthouse Dashboard
GitHub Check: Test E2E
GitHub Check: Dashboard Test
GitHub Check: Test Conformance (SQLite)
GitHub Check: Test Integration
GitHub Check: Test Unit
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (16)

{README.md,docs/**/*.md,web/**/*.md}

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics in README and public docs sourced from data/runtime_stats.yaml via markers per data/README.md

Files:

docs/roadmap/index.md
README.md
docs/design/providers.md

docs/**/*.{md,d2,mmd}

📄 CodeRabbit inference engine (CLAUDE.md)

Use d2 for architecture / nested containers; mermaid for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200, D2 CLI pinned to v0.7.1 in CI

Files:

docs/roadmap/index.md
docs/design/providers.md

web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

web/src/__tests__/stores/agents.test.ts
web/src/__tests__/stores/tasks.test.ts
web/src/__tests__/pages/TaskDetailPage.test.tsx
web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
web/src/pages/tasks/TaskDetailActions.stories.tsx
web/src/api/types/enums.ts
web/src/__tests__/helpers/factories.ts
web/src/pages/tasks/TaskColumn.stories.tsx
web/src/pages/tasks/TaskDetailHeader.stories.tsx
web/src/__tests__/utils/tasks.property.test.ts
web/src/pages/tasks/TaskDetailMetadata.stories.tsx
web/src/pages/tasks/TaskDetailPanel.stories.tsx
web/src/pages/tasks/TaskListView.stories.tsx
web/src/pages/tasks/TaskCard.stories.tsx
web/src/pages/tasks/TaskDetailTimeline.stories.tsx
web/src/pages/agents/TaskHistory.stories.tsx
web/src/mocks/handlers/tasks.ts
web/src/stores/tasks.ts

web/src/{stores,**/*.test.{ts,tsx}}

📄 CodeRabbit inference engine (web/CLAUDE.md)

Active-handle gate (MANDATORY): every unit test runs under web/test-infra/active-handle-tracker.ts, which fails any test that leaks an event-loop-holding resource. A new store that schedules timers / attaches listeners MUST expose a teardown hook and register it in the global afterEach; otherwise the gate fails the first test that triggers the schedule.

Files:

web/src/__tests__/stores/agents.test.ts
web/src/__tests__/stores/tasks.test.ts
web/src/__tests__/pages/TaskDetailPage.test.tsx
web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
web/src/__tests__/utils/tasks.property.test.ts

web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

web/src/__tests__/stores/agents.test.ts
web/src/__tests__/stores/tasks.test.ts
web/src/__tests__/pages/TaskDetailPage.test.tsx
web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
web/src/pages/tasks/TaskDetailActions.stories.tsx
web/src/api/types/enums.ts
web/src/__tests__/helpers/factories.ts
web/src/pages/tasks/TaskColumn.stories.tsx
web/src/pages/tasks/TaskDetailHeader.stories.tsx
web/src/__tests__/utils/tasks.property.test.ts
web/src/pages/tasks/TaskDetailMetadata.stories.tsx
web/src/pages/tasks/TaskDetailPanel.stories.tsx
web/src/pages/tasks/TaskListView.stories.tsx
web/src/pages/tasks/TaskCard.stories.tsx
web/src/pages/tasks/TaskDetailTimeline.stories.tsx
web/src/pages/agents/TaskHistory.stories.tsx
web/src/mocks/handlers/tasks.ts
web/src/stores/tasks.ts

web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse web/src/components/ui/ components and design tokens only per web Dashboard Design System in web/CLAUDE.md

Files:

web/src/__tests__/stores/agents.test.ts
web/src/__tests__/stores/tasks.test.ts
web/src/__tests__/pages/TaskDetailPage.test.tsx
web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
web/src/pages/tasks/TaskDetailActions.stories.tsx
web/src/api/types/enums.ts
web/src/__tests__/helpers/factories.ts
web/src/pages/tasks/TaskColumn.stories.tsx
web/src/pages/tasks/TaskDetailHeader.stories.tsx
web/src/__tests__/utils/tasks.property.test.ts
web/src/pages/tasks/TaskDetailMetadata.stories.tsx
web/src/pages/tasks/TaskDetailPanel.stories.tsx
web/src/pages/tasks/TaskListView.stories.tsx
web/src/pages/tasks/TaskCard.stories.tsx
web/src/pages/tasks/TaskDetailTimeline.stories.tsx
web/src/pages/agents/TaskHistory.stories.tsx
web/src/mocks/handlers/tasks.ts
web/src/stores/tasks.ts

web/src/**/*.{jsx,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{jsx,tsx}: Use @eslint-react/no-leaked-conditional-rendering to catch the {count && <Foo />} bug where 0 renders verbatim. For ReactNode | undefined props use {value != null && value !== false && <jsx>}; for compound truthiness use Boolean(...).
Use @eslint-react/globals to restrict window / document / localStorage / etc. inside render. Hoist offenders into a useCallback event handler, a useEffect, or a useSyncExternalStore-backed hook.

Files:

web/src/__tests__/pages/TaskDetailPage.test.tsx
web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx
web/src/pages/tasks/TaskDetailActions.stories.tsx
web/src/pages/tasks/TaskColumn.stories.tsx
web/src/pages/tasks/TaskDetailHeader.stories.tsx
web/src/pages/tasks/TaskDetailMetadata.stories.tsx
web/src/pages/tasks/TaskDetailPanel.stories.tsx
web/src/pages/tasks/TaskListView.stories.tsx
web/src/pages/tasks/TaskCard.stories.tsx
web/src/pages/tasks/TaskDetailTimeline.stories.tsx
web/src/pages/agents/TaskHistory.stories.tsx

web/src/**/*.stories.{ts,tsx}

📄 CodeRabbit inference engine (web/CLAUDE.md)

Storybook 10 is ESM-only; essentials are built into core, but @storybook/addon-docs is now separate; imports moved to storybook/test and storybook/actions

Files:

web/src/pages/tasks/TaskDetailActions.stories.tsx
web/src/pages/tasks/TaskColumn.stories.tsx
web/src/pages/tasks/TaskDetailHeader.stories.tsx
web/src/pages/tasks/TaskDetailMetadata.stories.tsx
web/src/pages/tasks/TaskDetailPanel.stories.tsx
web/src/pages/tasks/TaskListView.stories.tsx
web/src/pages/tasks/TaskCard.stories.tsx
web/src/pages/tasks/TaskDetailTimeline.stories.tsx
web/src/pages/agents/TaskHistory.stories.tsx

web/src/mocks/handlers/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/mocks/handlers/**/*.ts: MSW handlers (MANDATORY): web/src/mocks/handlers/ must mirror web/src/api/endpoints/*.ts 1:1 with a default happy-path handler for every exported endpoint. Use onUnhandledRequest: 'error' in test setup; tests override per-case via server.use(...), never vi.mock('@/api/endpoints/*').
Use typed envelope helpers (successFor, paginatedFor, voidSuccess) to keep MSW handlers in lockstep with endpoint return types

Files:

web/src/mocks/handlers/tasks.ts

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site

Files:

src/synthorg/workers/runtime_builder.py
src/synthorg/engine/routing_policy/strategies.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: No hardcoded numerics; numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
No from __future__ import annotations (3.14 has PEP 649); PEP 758 except: except A, B: no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors: Error from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries)
Use @computed_field for derived fields; use NotBlankStr for identifiers in Pydantic models
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Use from synthorg.observability import get_logger; variable always logger; never import logging or print() in app code
Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSI...

Files:

src/synthorg/workers/runtime_builder.py
src/synthorg/engine/routing_policy/strategies.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/workers/runtime_builder.py
src/synthorg/engine/routing_policy/strategies.py

src/synthorg/workers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Runtime services: AgentEngine builds ONE provider-present switch returning RuntimeServices (AgentEngineExecutionService + coordinator OR NoProviderExecutionService + None); install_runtime_services appends FIRST; swap* hold locks

Files:

src/synthorg/workers/runtime_builder.py

{src/**/*.py,tests/**/*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/

Files:

src/synthorg/workers/runtime_builder.py
tests/unit/engine/routing_policy/test_acceptance_comparison.py
src/synthorg/engine/routing_policy/strategies.py

web/src/stores/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/stores/**/*.ts: List reads (fetch*) must set error: string | null on the store instead of toasting
Test teardown (MANDATORY): any new store that schedules timers or attaches event listeners must expose an equivalent cleanup hook and register it in the global afterEach. The global afterEach in web/src/test-setup.tsx already calls useToastStore.getState().dismissAll(), cancelPendingPersist(), and useThemeStore.getState().teardown().

Files:

web/src/stores/tasks.ts

web/src/{api/endpoints,stores}/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Cursor pagination (MANDATORY): list endpoints must use opaque cursor-based paging via PaginationMeta. Stores must keep nextCursor + hasMore in state (not offset arithmetic) and early-return when !hasMore || !nextCursor. Display counts must come from data.length.

Files:

web/src/stores/tasks.ts

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back
Test doubles: FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary forbidden (zero-tolerance, no baseline) per check_mock_spec.py
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large)
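
The flaky-test rule above (wait on an `asyncio.Event` instead of a large `sleep`) can be illustrated with a minimal sketch. The helper names `block_until_cancelled` and `run_demo` are hypothetical and do not come from the repository; the point is only that a never-set event parks a coroutine with no timer, so the test ends it by cancellation instead of racing a sleep duration.

```python
import asyncio


async def block_until_cancelled() -> None:
    """Hypothetical stand-in for a long-running dependency in a test."""
    # The event is never set, so this waits with no timer involved.
    # Only cancellation ends it; no sleep duration can be too short or too long.
    await asyncio.Event().wait()


async def run_demo() -> str:
    task = asyncio.create_task(block_until_cancelled())
    await asyncio.sleep(0)  # yield once so the task starts and parks on the event
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"
    return "finished"


result = asyncio.run(run_demo())
```

Because the wait has no deadline, the test's runtime is governed by the explicit `cancel()` call and not by wall-clock timing.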

Files:

tests/unit/engine/routing_policy/test_acceptance_comparison.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/unit/engine/routing_policy/test_acceptance_comparison.py

🧠 Learnings (10)

📚 Learning: 2026-05-16T18:36:19.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/guides/contributing.md:95-95
Timestamp: 2026-05-16T18:36:19.195Z
Learning: In the SynthOrg repo, the “Doc Numeric Claims (MANDATORY)” RS-marker rule should be applied only to these docs: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. This rule is enforced by scripts/check_doc_numeric_macros.py (with runtime substitution by scripts/inject_runtime_stats.py), so reviewers should not flag similar numeric-claim issues in other paths (e.g., anything under docs/guides/). When checking those scoped files, the rule skips fenced code blocks and only flags digits that are adjacent to stat nouns (tests/providers/agents/stars/releases). Numeric CLI flags like “--num-workers=4” inside fenced bash code blocks are not subject to this rule.

Applied to files:

docs/roadmap/index.md
README.md

📚 Learning: 2026-05-16T18:36:31.446Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, follow the `Doc Numeric Claims (MANDATORY)` rule enforced by `scripts/check_doc_numeric_macros.py` only for these markdown files: `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`. The gate flags digits that appear adjacent to the stat nouns `tests`, `providers`, `agents`, `stars`, and `releases`—those numeric claims must use the required `<!--RS:...-->` macro format. Do not apply this rule to prose that mentions Python version numbers (e.g., “Python 3.14” / “Python 3.15”); those should not be flagged as requiring `<!--RS:...-->`.

Applied to files:

docs/roadmap/index.md
README.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: In the synthorg repo, the “Doc Numeric Claims (MANDATORY)” RS-marker rule is enforced only for this exact set of Markdown files: README.md, docs/index.md, docs/roadmap/index.md, docs/architecture/decisions.md, and docs/reference/convention-gates.md. During code reviews, do not raise RS-marker/numeric-claims findings for numeric values in any other files (e.g., docs/getting_started.md, docs/guides/*, docs/reference/conventions.md), since they are not checked or injected by scripts/check_doc_numeric_macros.py or scripts/inject_runtime_stats.py.

Applied to files:

docs/roadmap/index.md
README.md

📚 Learning: 2026-05-16T18:36:31.446Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

docs/roadmap/index.md
README.md
docs/design/providers.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

docs/roadmap/index.md
README.md
docs/design/providers.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

docs/roadmap/index.md
README.md
docs/design/providers.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

docs/roadmap/index.md
docs/design/providers.md
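
The doc-numeric learnings above describe two behaviours of the gate: fenced code blocks are skipped entirely, and only digits adjacent to the stat nouns are flagged. The following is a rough sketch of that behaviour, not the code in `scripts/check_doc_numeric_macros.py`. The regular expression, the function name `find_numeric_claims`, and the reading of "adjacent" as "number followed by a noun" are all assumptions, and handling of the `<!--RS:...-->` macro is left out.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable
STAT_NOUNS = ("tests", "providers", "agents", "stars", "releases")

# Assumed pattern: a number, whitespace, then one of the stat nouns.
_CLAIM = re.compile(rf"\b\d[\d,]*\+?\s+(?:{'|'.join(STAT_NOUNS)})\b")


def find_numeric_claims(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, matched text) for numeric claims outside fenced blocks."""
    hits: list[tuple[int, str]] = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # opening or closing fence line
            continue
        if in_fence:
            continue  # fenced code is skipped entirely
        hits.extend((lineno, m.group(0)) for m in _CLAIM.finditer(line))
    return hits


sample = "\n".join(
    [
        "Runs on Python 3.14 with 5 tests.",
        FENCE + "bash",
        "pytest --num-workers=4 tests",
        FENCE,
        "Over 10 stars.",
    ]
)
hits = find_numeric_claims(sample)
```

In this sample the version number `3.14` and the fenced `--num-workers=4` flag are ignored, while `5 tests` and `10 stars` are reported, which matches the cases the learnings call out.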

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

src/synthorg/workers/runtime_builder.py
tests/unit/engine/routing_policy/test_acceptance_comparison.py
src/synthorg/engine/routing_policy/strategies.py

📚 Learning: 2026-05-21T22:55:20.496Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

src/synthorg/workers/runtime_builder.py
tests/unit/engine/routing_policy/test_acceptance_comparison.py
src/synthorg/engine/routing_policy/strategies.py

📚 Learning: 2026-05-21T22:55:09.289Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

src/synthorg/workers/runtime_builder.py
src/synthorg/engine/routing_policy/strategies.py

🔇 Additional comments (29)

docs/roadmap/index.md (1)

6-7: LGTM!

data/runtime_stats.yaml (1)

2-3: LGTM!

Also applies to: 6-8, 10-10

web/src/__tests__/stores/agents.test.ts (1)

91-91: LGTM!

web/src/__tests__/stores/tasks.test.ts (1)

29-29: LGTM!

web/src/__tests__/pages/TaskDetailPage.test.tsx (1)

24-24: LGTM!

web/src/__tests__/pages/tasks/TaskDetailPanel.test.tsx (1)

24-24: LGTM!

web/src/pages/tasks/TaskDetailActions.stories.tsx (1)

21-21: LGTM!

web/src/api/types/enums.ts (1)

40-40: LGTM!

Also applies to: 69-69

web/src/__tests__/helpers/factories.ts (1)

36-36: LGTM!

web/src/pages/tasks/TaskColumn.stories.tsx (1)

23-23: LGTM!

README.md (1)

22-22: LGTM!

web/src/pages/tasks/TaskDetailHeader.stories.tsx (1)

20-20: LGTM!

web/src/__tests__/utils/tasks.property.test.ts (1)

42-42: LGTM!

web/src/pages/tasks/TaskDetailMetadata.stories.tsx (1)

20-20: LGTM!

web/src/pages/tasks/TaskDetailPanel.stories.tsx (1)

24-24: LGTM!

web/src/pages/tasks/TaskListView.stories.tsx (1)

21-21: LGTM!

web/src/pages/tasks/TaskCard.stories.tsx (1)

21-21: LGTM!

web/src/pages/tasks/TaskDetailTimeline.stories.tsx (1)

20-20: LGTM!

web/src/pages/agents/TaskHistory.stories.tsx (1)

21-21: LGTM!

web/src/mocks/handlers/tasks.ts (1)

30-30: LGTM!

src/synthorg/workers/runtime_builder.py (1)

347-351: LGTM!

Also applies to: 357-361, 368-371, 392-394, 409-411, 700-700

web/src/stores/tasks.ts (1)

18-18: LGTM!

Also applies to: 245-245, 452-467

docs/design/providers.md (1)

207-209: LGTM!

tests/unit/engine/routing_policy/test_acceptance_comparison.py (1)

18-19: LGTM!

Also applies to: 44-46, 51-52, 101-106

src/synthorg/engine/routing_policy/strategies.py (5)

1-29: LGTM!

32-53: LGTM!

56-132: LGTM!

163-169: LGTM!

190-274: LGTM!

web/src/stores/tasks.ts: build the runtime-check enum sets from the generated *_VALUES tuples (COMPLEXITY/TASK_STRUCTURE/COORDINATION_TOPOLOGY/STAKES) instead of re-declared literal lists, so a value added to an enum cannot drift out of sync with its frame-guard validator within a build (CodeRabbit flagged STAKES_SET; applied to all four for consistency and to match the file's own header comment + the DEPARTMENT_NAME_SET precedent in enums.ts). Behaviour is unchanged: the generated tuple is still build-time-frozen, so an unknown behavioural enum value is still dropped rather than mis-routed.

## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._

### What you'll notice

- Introduced conversational interface for direct clarify and propose interactions.
- Cost management now includes forecast gates, hard ceilings, and Pareto considerations.
- Added living documentation engine combining wiki and retrieval-augmented generation features.
- Real intake engine is now operational for live data processing.
- Virtual desktop tool with vision verification gate available for enhanced workspace control.

### What's new

- Per-project reproducible environments for consistent setups.
- Headless browser testing tool integrated for automated UI validation.
- Governed external API and data access tool introduced.
- Hardened external-remote git backend with sandbox mounts and push-queue dispatching.
- Adversarial red-team gate subsystem for enhanced security testing.
- Self-extending toolkit to dynamically expand capabilities.
- Stakes-aware model routing enables prioritized processing.
- Task-board entry adapter connects live runtime with project management.
- Persistent project workspace with pluggable git backend and per-project push queues implemented.
- Knowledge and provenance substrate added to track data lineage.
- Scoring and data contract framework for golden-company benchmark evaluations.

### Under the hood

- Desktop Dockerfile pinned by digest to improve build stability and documented publishing gap fixed.

:robot: I have created a release *beep* *boop*

---

## [0.8.7](v0.8.6...v0.8.7) (2026-05-22)

### Features

* conversational interface v1 - 1:1 clarify + propose ([#2019](#2019)) ([216ef94](216ef94)), closes [#1968](#1968)
* cost as a first-class dial (forecast gate, hard ceiling, Pareto) ([#2029](#2029)) ([700a59e](700a59e)), closes [#1982](#1982)
* **env:** reproducible per-project environments ([#2039](#2039)) ([d2c0ef9](d2c0ef9)), closes [#1994](#1994)
* **evals:** [#1980](#1980) spine -- scoring + data contract for golden-company benchmark ([#2025](#2025)) ([53108e8](53108e8))
* goal/objective entry adapter ([#1964](#1964)) ([#2022](#2022)) ([cb15c3c](cb15c3c))
* governed external API/data access tool ([#1991](#1991)) ([#2032](#2032)) ([e08b451](e08b451))
* harden external-remote git backend + per-project sandbox mount + push-queue dispatch ([#2020](#2020)) ([#2030](#2030)) ([2fa2e1e](2fa2e1e))
* headless browser testing tool ([#1992](#1992)) ([#2024](#2024)) ([277b52a](277b52a))
* knowledge + provenance substrate ([#2036](#2036)) ([48c897b](48c897b))
* living documentation engine (dual-purpose wiki + RAG namespace) ([#2028](#2028)) ([3d10da9](3d10da9)), closes [#1976](#1976)
* real intake engine online ([#2017](#2017)) ([9d8eb34](9d8eb34))
* **redteam:** adversarial red-team gate subsystem ([#1986](#1986)) ([#2026](#2026)) ([d2207e9](d2207e9))
* self-extending toolkit ([#1995](#1995)) ([#2035](#2035)) ([5ffc545](5ffc545))
* stakes-aware model routing ([#1998](#1998)) ([#2038](#2038)) ([9b98312](9b98312))
* task-board entry adapter to live runtime ([#1963](#1963)) ([#2023](#2023)) ([a8f1eea](a8f1eea))
* virtual desktop tool and vision verifier gate ([#2031](#2031)) ([dfe8b42](dfe8b42)), closes [#1993](#1993)
* **workspace:** persistent project workspace + pluggable git backend + per-project push queue ([#2021](#2021)) ([ee58ee7](ee58ee7))

### Bug Fixes

* pin desktop Dockerfile by digest (Scorecard [#309](#309)) + document publish gap ([#2034](#2034)) ([8fda188](8fda188))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Aureliolo added 6 commits May 22, 2026 07:16

feat: stakes-aware model routing (#1998)

c27db07

fix: pin red-team review task to critical stakes so its agent is not downgraded

094951b

test: register stakes_routing event module in discovery test

274b870

fix: satisfy persistence-boundary and review-origin gates

99a953d

test: pin stakes_routing default in RootConfig polyfactory build

ac5cdc9

The new QualityFloors non-decreasing validator rejects polyfactory's independent random floor draws, mirroring the existing IntegrationsConfig pin in the same test.
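
To show why independent random draws trip a non-decreasing validator, here is a minimal sketch. It uses a plain dataclass as a stand-in and is not the project's actual `QualityFloors` model: the class name `QualityFloorsSketch`, the field names (taken from the low/normal/high/critical stakes levels), and the error message are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityFloorsSketch:
    """Hypothetical stand-in: one quality floor per stakes level."""

    low: float
    normal: float
    high: float
    critical: float

    def __post_init__(self) -> None:
        ordered = (self.low, self.normal, self.high, self.critical)
        # Each floor must be >= the one before it; equal neighbours are allowed.
        if any(a > b for a, b in zip(ordered, ordered[1:])):
            raise ValueError(f"quality floors must be non-decreasing, got {ordered}")


# A hand-pinned, ordered configuration passes validation.
pinned = QualityFloorsSketch(low=0.2, normal=0.5, high=0.5, critical=0.9)

# Four independently drawn values are usually not ordered, so a random
# factory build would typically be rejected like this one.
try:
    QualityFloorsSketch(low=0.7, normal=0.3, high=0.8, critical=0.9)
    rejected = False
except ValueError:
    rejected = True
```

Pinning a known-good value in the factory build, as the commit does, sidesteps the problem without weakening the validator.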

Aureliolo had a problem deploying to lighthouse May 22, 2026 05:32 — with GitHub Actions Failure

Aureliolo temporarily deployed to lighthouse May 22, 2026 05:32 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview May 22, 2026 05:33 — with GitHub Actions Inactive

coderabbitai Bot requested changes May 22, 2026

Comment thread src/synthorg/workers/runtime_builder.py Outdated

Comment thread tests/unit/engine/routing_policy/test_acceptance_comparison.py Outdated

gemini-code-assist Bot reviewed May 22, 2026

Aureliolo temporarily deployed to lighthouse May 22, 2026 06:11 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview May 22, 2026 06:12 — with GitHub Actions Inactive

coderabbitai Bot requested changes May 22, 2026

Comment thread web/src/stores/tasks.ts Outdated

Aureliolo temporarily deployed to lighthouse May 22, 2026 06:26 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview May 22, 2026 06:27 — with GitHub Actions Inactive

coderabbitai Bot approved these changes May 22, 2026

Aureliolo merged commit 9b98312 into main May 22, 2026
82 checks passed

Aureliolo deleted the feat/1998-stakes-aware-model-routing branch May 22, 2026 06:41

Aureliolo temporarily deployed to cloudflare-preview May 22, 2026 06:41 — with GitHub Actions Inactive

synthorg-repo-bot Bot mentioned this pull request May 22, 2026

chore(main): release 0.8.7 #2018

Merged

		unhealthy, marks high/critical work for the red-team gate, and never downgrades
		below the agent's configured tier. It is config-selectable via


codspeed-hq Bot commented May 22, 2026

Merging this PR will not alter performance
