Skip to content

feat: agent runtime online + minimal safety spine (runtime root)#2003

Merged
Aureliolo merged 10 commits into
mainfrom
fix/agent-runtime-online
May 18, 2026
Merged

feat: agent runtime online + minimal safety spine (runtime root)#2003
Aureliolo merged 10 commits into
mainfrom
fix/agent-runtime-online

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Wires the agent runtime online behind the provider-present switch (issue #1956, EPIC #1955 critical-path root).

  • synthorg.workers.runtime_builder.build_worker_execution_service constructs a fully-wired AgentEngine (LLM + sandboxed tools + per-call workspace + memory, governed by the SecOps safety spine) when a provider is configured, installed via the existing AppState.worker_execution_service seam; NoProviderExecutionService is the empty-company backstop.
  • Empty-company mode rejects task creation at the controller (AgentRuntimeNotConfiguredError, code 4014); a provider added later wakes the runtime live via the existing setup-reinit path (no restart).
  • Deterministic ScriptedDriver test/simulation provider + consolidated shared scripted test double.
  • Ghost-wiring manifest: AgentEngine flipped PENDING -> ENFORCED; scripts/check_no_ghost_wiring.py passes (Convention Rollout: lock ships with the fix).

Pre-PR review

Reviewed by 20 review agents + 4 audit mini-pass agents before first push. 22 valid findings triaged and all implemented (_audit/pre-pr-review/triage.md):

  • Regression fix: the new empty-company guard broke 11 pre-existing tests; fixed at the shared conftest seam (scripted provider in _shared_app, coordination-wiring, and the coordination-controller fixture) rather than per-file.
  • Concurrency: swap_worker_execution_service now holds _lazy_service_lock; swap_provider_registry serialised under a dedicated lock; SequencedResponseStrategy._index guarded.
  • Workspace: agent sandbox root now env-pinned to the runtime data dir in deployments (resolve_agent_workspace_root_env); dead/lying docstring corrected.
  • Observability: distinct WORKERS_EXECUTION_SERVICE_TASK_NOT_FOUND event; ERROR log before boot-install failure propagates; INFO logs on lazy-service construction + bridge-config flip.
  • Hygiene: migration-framing comments rewritten to describe current behaviour; dead _next_status branch + unused constant removed; module/CLAUDE.md docs updated.
  • Two reviewer suggestions documented as not-implemented with primary-evidence rationale (PEP 758 false positive; ScriptedProvider() guard-double contract; MSW convention).

Test plan

  • uv run ruff check src/ tests/ + ruff format + uv run mypy src/ tests/: clean (3896 files).
  • New unit tests for identity name-fallback, autonomy_resolver=None, post-run-missing, multi-provider warning, deep-workspace creation; e2e seam asserts the agent executed >=1 LLM turn.
  • Affected modules + the 11 previously-failing tests: 100+ pass locally.
  • Full suite regated by pre-push hooks (affected mypy + pytest + convention gates incl. no-ghost-wiring).

Closes #1956

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 18, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request completes the critical-path work for the agent runtime root by introducing a conditional initialization mechanism. It ensures that the agent engine is only constructed when a valid provider is present, providing a robust 'empty-company' mode that prevents task submission failures and improves observability. The changes include a new deterministic test driver for simulation, enhanced concurrency controls for service hot-swapping, and improved workspace persistence configuration.

Highlights

  • Agent Runtime Initialization: Implemented a provider-present switch that wires the AgentEngine only when a provider is configured, using NoProviderExecutionService as an empty-company backstop.
  • Task Submission Guard: Added a 4014 error (AgentRuntimeNotConfiguredError) to reject task creation when no LLM provider is configured, preventing tasks from entering a stalled state.
  • Deterministic Testing: Introduced a ScriptedDriver with pluggable strategies (Deterministic, Sequenced, Single) to enable reproducible end-to-end simulation and testing without real LLM calls.
  • Workspace Management: Added environment-driven workspace root resolution to ensure agent sandboxes persist on mounted data volumes in production deployments.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0f26e35e-739f-4df0-b4f4-45175f3d812d

📥 Commits

Reviewing files that changed from the base of the PR and between f745bd4 and 45cbbf3.

📒 Files selected for processing (4)
  • src/synthorg/workers/runtime_builder.py
  • tests/_shared/scripted_provider.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/workers/test_runtime_builder.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test Unit
  • GitHub Check: Test E2E
  • GitHub Check: Dashboard Test
  • GitHub Check: Test Integration
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Timeout/slow test failures are source-code regression; never edit tests/baselines/unit_timing.json or baseline scripts; use ALLOW_BASELINE_GROWTH=1 for explicit approval

Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; global timeout 30s; coverage 80% min

xdist: use -n 8 --dist=loadfile (auto-applied via pyproject addopts); prevents Windows 3.14+ ProactorEventLoop leak

Windows tests: unit tests use WindowsSelectorEventLoopPolicy; subprocess tests override back (3.14 IOCP teardown race)

Test doubles: use FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundaries is blocked by scripts/check_mock_spec.py

Import FakeClock and mock_of from tests._shared; inject via clock= and spec subscript

Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))

Flaky tests: NEVER skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large)

Files:

  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/workers/test_runtime_builder.py
  • tests/_shared/scripted_provider.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/workers/test_runtime_builder.py
  • tests/_shared/scripted_provider.py
!(providers/presets.py|web/public/provider-logos/**|.claude/**)

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic naming: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001

Files:

  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/workers/test_runtime_builder.py
  • src/synthorg/workers/runtime_builder.py
  • tests/_shared/scripted_provider.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

No hardcoded numerics; use settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, module-level annotated named constants; enforced by scripts/check_no_magic_numbers.py

Comments explain WHY only; no reviewer citations/issue back-refs/migration framing; enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py

No from __future__ import annotations (Python 3.14 has PEP 649); use PEP 758 except for except A, B: (no parens unless binding)

Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines

Errors: <Domain><Condition>Error from DomainError; never inherit Exception/RuntimeError directly; enforced by check_domain_error_hierarchy.py

Pydantic v2 frozen + extra="forbid" on every frozen model; use @computed_field for derived; use NotBlankStr for identifiers; per-line # lint-allow: frozen-extra-forbid for boundaries

Args models at every system boundary; use parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py

Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries

Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)

Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable

Untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML

Use from synthorg.observability import get_logger; variable always logger; never import logging / print() in app code

Event names from observability.events.<domain> constants; use structured kwargs (logger.info(EVENT, key=value))

Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED AFTER persistence write

Secret-log redaction (SEC-1): never use error=str(exc) o...

Files:

  • src/synthorg/workers/runtime_builder.py
src/synthorg/{workers,api}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Worker execution service: use build_worker_execution_service with provider-present switch; install via AppState.worker_execution_service seam; use swap_worker_execution_service / swap_provider_registry with locking

Files:

  • src/synthorg/workers/runtime_builder.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/workers/runtime_builder.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: Read `docs/design/` page before implementing; deviations need approval (see DESIGN_SPEC.md)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: Present every plan for accept/deny before coding
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: No region/currency/locale privileged; use metric units; use British English
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: Every convention PR ships its enforcement gate
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: After issue: branch + commit + push (no auto-PR); use `/pre-pr-review`; use `/aurelio-review-pr` after PR; fix EVERYTHING valid
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: Git commits: `<type>: <description>` (feat/fix/refactor/docs/test/chore/perf/ci); commitizen-enforced
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: Signed commits required on protected refs (GPG/SSH or GitHub App via `synthorg-repo-bot`)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: Branches: `<type>/<slug>` from main
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: Squash merge; PR body becomes squash commit; trailers (`Release-As`, `Closes `#N``) must be in PR body
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: GitHub queries: use `gh issue list` via Bash, NOT MCP `list_issues`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: After every squash merge → `/post-merge-cleanup`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: CLI is Docker-only (init/start/stop/status); features go in dashboard + REST API
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: Use `uv sync`, `ruff`, `mypy`, `pytest` with specified commands; install git hooks via `bash scripts/install_git_hooks.sh`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:34:01.473Z
Learning: Web: see `web/CLAUDE.md`; CLI: see `cli/CLAUDE.md` (use `go -C cli`, never `cd cli`); Shell: see `~/.claude/rules/common/bash.md`
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/workers/test_runtime_builder.py
  • src/synthorg/workers/runtime_builder.py
  • tests/_shared/scripted_provider.py
🔇 Additional comments (4)
tests/integration/api/test_task_create_empty_company.py (1)

1-114: LGTM!

tests/unit/workers/test_runtime_builder.py (1)

1-150: LGTM!

src/synthorg/workers/runtime_builder.py (1)

1-186: LGTM!

tests/_shared/scripted_provider.py (1)

1-277: LGTM!


Walkthrough

This PR selects and installs a worker execution service at startup (provider-present vs empty-company), adds AppState workspace-root pinning and a hot-swap seam, introduces a deterministic scripted CompletionProvider, implements AgentEngineExecutionService and NoProviderExecutionService, blocks task creation when no provider exists (409 / AGENT_RUNTIME_NOT_CONFIGURED), and adds runtime builder wiring plus E2E/integration/unit tests validating install ordering, empty-company rejection, and service swap behavior.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 18, 2026 06:06 — with GitHub Actions Inactive
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 18, 2026

Merging this PR will not alter performance

✅ 54 untouched benchmarks


Comparing fix/agent-runtime-online (45cbbf3) with main (a7253b4)1

Open in CodSpeed

Footnotes

  1. No successful run was found on main (f485bea) during the generation of this report, so a7253b4 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/api/app_helpers.py`:
- Around line 144-147: The code in app_helpers.py reads SYNTHORG_DB_PATH into
db_path and returns Path(db_path).parent / "agent-workspaces" without checking
if db_path is absolute; update the logic in the function that contains db_path
so you validate that db_path is an absolute path (use os.path.isabs or
Path(db_path).is_absolute()) before deriving the workspace root, and only return
Path(db_path).parent / "agent-workspaces" when it is absolute; if it is missing
or not absolute, fall back to the existing SYNTHORG_DATABASE_URL branch or
another safe absolute default.

In `@tests/e2e/test_runtime_online_seam.py`:
- Around line 94-95: Update the test function signatures to use a concrete Path
type for the tmp_path parameter instead of object (e.g., change "tmp_path:
object" to "tmp_path: Path") and remove any "# type: ignore[arg-type]" comments;
ensure you import Path from pathlib at the top of the file if not already
present and apply the same change for the second occurrence around the other
test (the tmp_path parameter at lines ~140-141) so type checking is preserved
for the tmp_path seam.

In `@tests/integration/api/test_task_create_empty_company.py`:
- Around line 27-29: The header-name check in the loop over
resp.headers.multi_items() is case-sensitive and can miss Set-Cookie variants;
update the condition that currently tests k != "set-cookie" to perform a
case-insensitive comparison (e.g., compare k.lower() or k.casefold() to
"set-cookie") so that any capitalization of the Set-Cookie header is accepted
while still continuing for non-cookie headers; keep the loop and variables k and
v unchanged except for the updated comparison.

In `@tests/unit/api/test_state_swap_worker_execution_service.py`:
- Around line 56-57: The test is currently catching any Exception for the access
to state.worker_execution_service; change the assertion to expect the specific
domain/service exception that the worker execution retrieval raises (replace
Exception with the actual exception class thrown, e.g., TaskEngineError or
ServiceUnavailableError) so the test only passes for the intended failure path;
update the pytest.raises(...) wrapper around state.worker_execution_service to
reference that specific exception type (locate the exception class in the task
engine / service module and import it into the test).

In `@tests/unit/providers/drivers/test_scripted.py`:
- Around line 128-131: The test currently calls structlog.reset_defaults()
directly, which mutates global state and can leak into other tests; change the
test to restore the previous structlog configuration after the assertion (or
call reset inside a finally block) so global structlog settings are not left
modified—wrap the reset/use of
ScriptedDriver(strategy=DeterministicResponseStrategy()) and capture_logs() in a
try/finally that restores the original structlog configuration (or move reset
into a teardown fixture) to ensure no process-level state leakage across tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: b2dc969c-1524-48b2-92ad-88e50f3dedce

📥 Commits

Reviewing files that changed from the base of the PR and between 89b57ce and 353bd65.

📒 Files selected for processing (33)
  • CLAUDE.md
  • scripts/_ghost_wiring_manifest.txt
  • src/synthorg/api/app.py
  • src/synthorg/api/app_helpers.py
  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/api/controllers/tasks.py
  • src/synthorg/api/state.py
  • src/synthorg/api/state_services.py
  • src/synthorg/core/domain_errors.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/workers.py
  • src/synthorg/providers/drivers/scripted.py
  • src/synthorg/providers/registry.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/workers/runtime_builder.py
  • tests/_shared/scripted_provider.py
  • tests/e2e/conftest.py
  • tests/e2e/test_runtime_online_seam.py
  • tests/integration/api/conftest.py
  • tests/integration/api/test_runtime_install_ordering.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/integration/engine/test_coordination_wiring.py
  • tests/unit/api/conftest.py
  • tests/unit/api/controllers/test_coordination.py
  • tests/unit/api/test_state_swap_worker_execution_service.py
  • tests/unit/engine/quality/scripted_provider.py
  • tests/unit/providers/drivers/test_scripted.py
  • tests/unit/workers/test_execution_service.py
  • tests/unit/workers/test_runtime_builder.py
  • web/src/api/types/error-codes.gen.ts
  • web/src/api/types/openapi.gen.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Dashboard Test
  • GitHub Check: Test Unit
  • GitHub Check: Test Integration
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (13)
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL

Files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/providers/registry.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/api/app_helpers.py
  • src/synthorg/observability/events/workers.py
  • src/synthorg/core/domain_errors.py
  • src/synthorg/api/app.py
  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/state.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/api/controllers/tasks.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/providers/drivers/scripted.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

No hardcoded numeric values; numerics live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants

Comments must explain WHY only; no reviewer citations / issue back-refs / migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py

No from __future__ import annotations (Python 3.14 has PEP 649)

Type hints required on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines

Errors must use <Domain><Condition>Error pattern inheriting from DomainError, never from Exception/RuntimeError directly; enforced by check_domain_error_hierarchy.py

Pydantic v2 models must be frozen with extra='forbid' on every frozen model project-wide; @computed_field is auto-exempt; use per-line # lint-allow: frozen-extra-forbid -- <reason> for boundaries; enforced by check_frozen_model_extra_forbid.py

Use @computed_field for derived fields in Pydantic models

Use NotBlankStr for identifiers in Pydantic models

Require args models at every system boundary; use parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py

For immutability, use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries

Use asyncio.TaskGroup for fan-out/fan-in; async helpers must catch Exception and re-raise MemoryError/RecursionError

Use Clock seam pattern: clock: Clock | None = None; tests inject FakeClock

Lifecycle services must own _lifecycle_lock; timed-out stops mark unrestartable

Untrusted content (SEC-1) must use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML

Import logger via from synthorg.observability import get_logger; variable must always be logger; never use import logging or print() in app code

Event names must come from observability.events.<domain> constants; use structured kwargs (`logger.info...

Files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/providers/registry.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/api/app_helpers.py
  • src/synthorg/observability/events/workers.py
  • src/synthorg/core/domain_errors.py
  • src/synthorg/api/app.py
  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/state.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/api/controllers/tasks.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/providers/drivers/scripted.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/providers/registry.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/api/app_helpers.py
  • src/synthorg/observability/events/workers.py
  • src/synthorg/core/domain_errors.py
  • src/synthorg/api/app.py
  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/state.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/api/controllers/tasks.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/providers/drivers/scripted.py
src/synthorg/providers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Provider calls must go through BaseCompletionProvider (retry + rate limit); never implement retry in driver subclasses; retryable: RateLimitError, Provider{Timeout,Connection,Internal}Error

Files:

  • src/synthorg/providers/registry.py
  • src/synthorg/providers/drivers/scripted.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min

Windows: unit tests must use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back

Test doubles ladder: FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary blocked by scripts/check_mock_spec.py

Import FakeClock and mock_of from tests._shared; inject via clock= and helper's spec subscript

Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))

Never skip/xfail flaky tests; fix fundamentally; use asyncio.Event().wait() instead of sleep(large)

Files:

  • tests/unit/workers/test_execution_service.py
  • tests/integration/engine/test_coordination_wiring.py
  • tests/unit/api/controllers/test_coordination.py
  • tests/integration/api/test_runtime_install_ordering.py
  • tests/unit/providers/drivers/test_scripted.py
  • tests/unit/engine/quality/scripted_provider.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/api/conftest.py
  • tests/unit/api/test_state_swap_worker_execution_service.py
  • tests/e2e/test_runtime_online_seam.py
  • tests/unit/workers/test_runtime_builder.py
  • tests/_shared/scripted_provider.py
  • tests/integration/api/conftest.py
  • tests/e2e/conftest.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/workers/test_execution_service.py
  • tests/integration/engine/test_coordination_wiring.py
  • tests/unit/api/controllers/test_coordination.py
  • tests/integration/api/test_runtime_install_ordering.py
  • tests/unit/providers/drivers/test_scripted.py
  • tests/unit/engine/quality/scripted_provider.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/api/conftest.py
  • tests/unit/api/test_state_swap_worker_execution_service.py
  • tests/e2e/test_runtime_online_seam.py
  • tests/unit/workers/test_runtime_builder.py
  • tests/_shared/scripted_provider.py
  • tests/integration/api/conftest.py
  • tests/e2e/conftest.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

agent_registry must be built BEFORE auto_wire_meetings during API construction phase

tunnel_provider is wired unconditionally during construction phase (not gated by integrations.enabled)

On-startup: SettingsService auto-wire must precede WorkflowExecutionObserver registration; OntologyService wires after persistence.connect() via _wire_ontology_service

Files:

  • src/synthorg/api/app_helpers.py
  • src/synthorg/api/app.py
  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/state.py
  • src/synthorg/api/controllers/tasks.py
**/*.{d2,md}

📄 CodeRabbit inference engine (CLAUDE.md)

Use D2 for architecture/nested container diagrams with theme 200 (Dark Mauve), pinned to v0.7.1 in CI; use Mermaid for flowcharts/sequence/pipelines; use Markdown tables for tabular data

Files:

  • CLAUDE.md
web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

  • web/src/api/types/error-codes.gen.ts
  • web/src/api/types/openapi.gen.ts
web/src/api/types/**/*.gen.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Generated DTO types (MANDATORY): NEVER hand-edit web/src/api/types/*.gen.ts. Regenerate with uv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').

Files:

  • web/src/api/types/error-codes.gen.ts
  • web/src/api/types/openapi.gen.ts
web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

  • web/src/api/types/error-codes.gen.ts
  • web/src/api/types/openapi.gen.ts
web/src/**/*.{tsx,ts}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse web/src/components/ui/ design tokens only for Web Dashboard Design System; see web/CLAUDE.md for details

Files:

  • web/src/api/types/error-codes.gen.ts
  • web/src/api/types/openapi.gen.ts
src/synthorg/api/controllers/setup/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Setup completion: post_setup_reinit() must serialize provider reload, agent bootstrap, worker-execution-service rebuild under COMPLETE_LOCK to prevent race conditions on setup_complete flag

Files:

  • src/synthorg/api/controllers/setup/agent_helpers.py
src/synthorg/workers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Worker execution service: use synthorg.workers.runtime_builder.build_worker_execution_service behind provider-present switch; install via AppState.worker_execution_service seam; boot install hook appended FIRST after persistence/SettingsService hooks

Files:

  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/workers/execution_service.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:06:25.508Z
Learning: Read `docs/design/` page before implementing; deviations need approval per DESIGN_SPEC.md
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:06:25.508Z
Learning: Present every plan for accept/deny before coding
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:06:25.508Z
Learning: No region/locale should be privileged; use metric units; use British English for all user-facing text
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:06:25.508Z
Learning: Every convention PR must ship its enforcement gate
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:06:25.508Z
Learning: After issue completion: branch + commit + push (no auto-PR); use `/pre-pr-review` before creating PR; after PR created use `/aurelio-review-pr` for external feedback
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:06:25.508Z
Learning: Commits must follow `<type>: <description>` format (feat/fix/refactor/docs/test/chore/perf/ci); commitizen-enforced
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:06:25.508Z
Learning: Signed commits required on protected refs (GPG/SSH or GitHub App via `synthorg-repo-bot`)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:06:25.508Z
Learning: Branches must follow `<type>/<slug>` format from main
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:06:25.508Z
Learning: Squash merge; PR body becomes squash commit; trailers (`Release-As`, `Closes `#N``) must be in PR body
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:06:25.508Z
Learning: After every squash merge, run `/post-merge-cleanup`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:06:25.508Z
Learning: CLI is Docker-only (init/start/stop/status); features go in dashboard + REST API
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/providers/registry.py
  • src/synthorg/observability/events/api.py
  • tests/unit/workers/test_execution_service.py
  • src/synthorg/api/app_helpers.py
  • tests/integration/engine/test_coordination_wiring.py
  • tests/unit/api/controllers/test_coordination.py
  • src/synthorg/observability/events/workers.py
  • src/synthorg/core/domain_errors.py
  • tests/integration/api/test_runtime_install_ordering.py
  • tests/unit/providers/drivers/test_scripted.py
  • src/synthorg/api/app.py
  • tests/unit/engine/quality/scripted_provider.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/api/conftest.py
  • tests/unit/api/test_state_swap_worker_execution_service.py
  • tests/e2e/test_runtime_online_seam.py
  • src/synthorg/api/controllers/setup/agent_helpers.py
  • tests/unit/workers/test_runtime_builder.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/state.py
  • src/synthorg/workers/runtime_builder.py
  • tests/_shared/scripted_provider.py
  • src/synthorg/api/controllers/tasks.py
  • tests/integration/api/conftest.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/providers/drivers/scripted.py
  • tests/e2e/conftest.py
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

  • CLAUDE.md
🔇 Additional comments (36)
CLAUDE.md (1)

82-83: LGTM!

scripts/_ghost_wiring_manifest.txt (1)

26-26: LGTM!

src/synthorg/core/domain_errors.py (1)

248-264: LGTM!

src/synthorg/core/error_taxonomy.py (1)

109-109: LGTM!

src/synthorg/observability/events/api.py (1)

64-64: LGTM!

src/synthorg/observability/events/provider.py (1)

9-9: LGTM!

src/synthorg/observability/events/workers.py (1)

72-81: LGTM!

web/src/api/types/error-codes.gen.ts (1)

57-57: LGTM!

web/src/api/types/openapi.gen.ts (1)

7295-7295: LGTM!

src/synthorg/workers/execution_service.py (5)

1-63: LGTM!


66-93: LGTM!


96-192: LGTM!


195-348: LGTM!


351-386: LGTM!

tests/unit/workers/test_execution_service.py (4)

1-31: LGTM!


34-44: LGTM!


47-130: LGTM!


132-237: LGTM!

src/synthorg/providers/drivers/scripted.py (1)

1-293: LGTM!

src/synthorg/providers/registry.py (1)

141-146: LGTM!

tests/_shared/scripted_provider.py (1)

1-269: LGTM!

tests/unit/engine/quality/scripted_provider.py (1)

1-19: LGTM!

tests/e2e/conftest.py (1)

1-38: LGTM!

tests/integration/api/conftest.py (1)

12-94: LGTM!

tests/integration/api/test_runtime_install_ordering.py (1)

1-57: LGTM!

tests/integration/api/test_task_create_empty_company.py (1)

37-112: LGTM!

tests/integration/engine/test_coordination_wiring.py (1)

285-302: LGTM!

tests/unit/api/conftest.py (1)

30-30: LGTM!

Also applies to: 46-47, 356-370, 426-427, 459-460

tests/unit/api/controllers/test_coordination.py (1)

127-144: LGTM!

src/synthorg/workers/runtime_builder.py (1)

43-144: LGTM!

tests/unit/workers/test_runtime_builder.py (1)

26-150: LGTM!

src/synthorg/api/state.py (1)

176-177: LGTM!

Also applies to: 428-441, 858-935, 953-958

src/synthorg/api/state_services.py (1)

137-137: LGTM!

Also applies to: 451-490, 527-531

src/synthorg/api/app.py (1)

33-34: LGTM!

Also applies to: 980-1029

src/synthorg/api/controllers/tasks.py (1)

25-25: LGTM!

Also applies to: 37-37, 182-188

src/synthorg/api/controllers/setup/agent_helpers.py (1)

142-166: LGTM!

Comment thread src/synthorg/api/app_helpers.py
Comment thread tests/e2e/test_runtime_online_seam.py Outdated
Comment thread tests/integration/api/test_task_create_empty_company.py
Comment thread tests/unit/api/test_state_swap_worker_execution_service.py Outdated
Comment thread tests/unit/providers/drivers/test_scripted.py Outdated
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the "provider-present switch" for the agent runtime, enabling the system to select between a live AgentEngineExecutionService and a NoProviderExecutionService backstop based on LLM provider availability. It introduces the AgentRuntimeNotConfiguredError (4014) to reject task creation in unconfigured environments and adds a deterministic ScriptedDriver for testing and simulation. The changes also support hot-swapping execution services during setup and environment-driven resolution of agent workspace roots. Review feedback focuses on adhering to the "no-hardcoded-values" style rule by defining constants for workspace paths, improving error specificity for unassigned tasks, and ensuring the AutonomyResolver correctly utilizes the application's security configuration.

Comment on lines +126 to +149
def resolve_agent_workspace_root_env() -> Path | None:
"""Resolve the agent sandbox workspace root from the environment.

Returns ``<runtime data dir>/agent-workspaces`` when an env-driven
deployment is in effect, so the agent's file-system / sandbox tools
write onto the mounted data volume rather than a process temp dir.
Returns ``None`` for injected / dev apps (no deployment env vars),
where :attr:`AppState.agent_workspace_root` keeps its documented
process-stable temp fallback.

Precedence mirrors the persistence env resolution:
``SYNTHORG_ARTIFACT_DIR`` (explicit), then ``SYNTHORG_DB_PATH``
parent (sqlite volume), then ``/data`` when only
``SYNTHORG_DATABASE_URL`` is set (postgres compose volume).
"""
artifact_dir = os.environ.get("SYNTHORG_ARTIFACT_DIR", "").strip()
if artifact_dir:
return Path(_resolve_artifact_dir_env()) / "agent-workspaces"
db_path = os.environ.get("SYNTHORG_DB_PATH", "").strip()
if db_path:
return Path(db_path).parent / "agent-workspaces"
if os.environ.get("SYNTHORG_DATABASE_URL", "").strip():
return Path("/data") / "agent-workspaces"
return None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The string "agent-workspaces" is hardcoded and repeated three times in this function. Additionally, "/data" is a hardcoded path. Per the "no-hardcoded-values" rule in CLAUDE.md, these should be defined as constants to improve maintainability and adhere to project standards.

_AGENT_WORKSPACE_DIR: Final[str] = "agent-workspaces"
_DEFAULT_DATA_DIR: Final[str] = "/data"


def resolve_agent_workspace_root_env() -> Path | None:
    """Resolve the agent sandbox workspace root from the environment.

    Returns <runtime data dir>/agent-workspaces when an env-driven
    deployment is in effect, so the agent's file-system / sandbox tools
    write onto the mounted data volume rather than a process temp dir.
    Returns None for injected / dev apps (no deployment env vars),
    where AppState.agent_workspace_root keeps its documented
    process-stable temp fallback.

    Precedence mirrors the persistence env resolution:
    SYNTHORG_ARTIFACT_DIR (explicit), then SYNTHORG_DB_PATH
    parent (sqlite volume), then /data when only
    SYNTHORG_DATABASE_URL is set (postgres compose volume).
    """
    artifact_dir = os.environ.get("SYNTHORG_ARTIFACT_DIR", "").strip()
    if artifact_dir:
        return Path(_resolve_artifact_dir_env()) / _AGENT_WORKSPACE_DIR
    db_path = os.environ.get("SYNTHORG_DB_PATH", "").strip()
    if db_path:
        return Path(db_path).parent / _AGENT_WORKSPACE_DIR
    if os.environ.get("SYNTHORG_DATABASE_URL", "").strip():
        return Path(_DEFAULT_DATA_DIR) / _AGENT_WORKSPACE_DIR
    return None
References
  1. Avoid hardcoded values; define them as constants. (link)

Comment thread src/synthorg/api/state.py Outdated
"""
if self._agent_workspace_root is not None:
return self._agent_workspace_root
return Path(tempfile.gettempdir()) / "synthorg-agent-workspaces"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The string "synthorg-agent-workspaces" is hardcoded here. It should be defined as a constant (e.g., _DEFAULT_WORKSPACE_DIR) to comply with the project's "no-hardcoded-values" policy.

References
  1. Avoid hardcoded values; define them as constants. (link)

Comment on lines +292 to +316
async def _resolve_identity(
self,
assigned_to: str | None,
*,
task_id: str,
) -> AgentIdentity:
"""Resolve the task's assigned agent identity, or raise."""
if assigned_to:
identity = await self._agent_registry.get(assigned_to)
if identity is None:
identity = await self._agent_registry.get_by_name(assigned_to)
if identity is not None:
return identity
logger.warning(
WORKERS_EXECUTION_SERVICE_NO_OP,
task_id=task_id,
reason="agent_not_registered",
assigned_to=assigned_to,
)
msg = (
f"Task {task_id!r} is assigned to an agent that is not "
f"registered in the runtime. Complete setup so agents are "
f"bootstrapped before submitting work."
)
raise AgentRuntimeNotConfiguredError(msg)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When assigned_to is None, the error message and exception type are misleading. AgentRuntimeNotConfiguredError is intended for missing LLM providers, but an unassigned task is a state issue for that specific task. The current message also incorrectly states the task is assigned to an agent that is not registered when it is actually not assigned at all. Handling the None case separately with a more accurate error message and type (e.g., NotFoundError) improves clarity.

    async def _resolve_identity(
        self,
        assigned_to: str | None,
        *,
        task_id: str,
    ) -> AgentIdentity:
        """Resolve the task's assigned agent identity, or raise."""
        if not assigned_to:
            logger.warning(
                WORKERS_EXECUTION_SERVICE_NO_OP,
                task_id=task_id,
                reason="task_unassigned",
            )
            msg = f"Task {task_id!r} is not assigned to any agent."
            raise NotFoundError(msg)

        identity = await self._agent_registry.get(assigned_to)
        if identity is None:
            identity = await self._agent_registry.get_by_name(assigned_to)

        if identity is not None:
            return identity

        logger.warning(
            WORKERS_EXECUTION_SERVICE_NO_OP,
            task_id=task_id,
            reason="agent_not_registered",
            assigned_to=assigned_to,
        )
        msg = (
            f"Task {task_id!r} is assigned to agent {assigned_to!r} which is not "
            f"registered in the runtime. Complete setup so agents are "
            f"bootstrapped before submitting work."
        )
        raise AgentRuntimeNotConfiguredError(msg)

Comment on lines +128 to +131
autonomy_resolver = AutonomyResolver(
registry=ActionTypeRegistry(),
config=AutonomyConfig(),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The AutonomyResolver is being initialized with a default AutonomyConfig(). This likely ignores any autonomy-related settings defined in the application configuration (app_state.config.security). The resolver should be populated from the application's configuration to ensure user settings are respected.

Suggested change
autonomy_resolver = AutonomyResolver(
registry=ActionTypeRegistry(),
config=AutonomyConfig(),
)
autonomy_resolver = AutonomyResolver(
registry=ActionTypeRegistry(),
config=app_state.config.security.autonomy,
)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/api/state.py`:
- Around line 930-936: set_agent_workspace_root currently accepts and persists
relative Paths which can cause sandbox writes relative to the current working
directory; convert and persist an absolute Path instead: inside
set_agent_workspace_root, if the provided path is not absolute call
path.resolve(strict=False) (or Path.cwd()/path if you prefer) to produce an
absolute Path, then pass that absolute Path into the existing _set_once call so
_agent_workspace_root is always stored as an absolute path.

In `@src/synthorg/workers/execution_service.py`:
- Around line 299-306: The code wrongly raises NotFoundError when a task exists
but is unassigned; replace the NotFoundError in the assigned_to check with a
domain/state error (e.g., raise InvalidTaskStateError or TaskStateError) so
callers treat this as a configuration/scheduling problem, not a 404; keep the
existing logger.warning call (WORKERS_EXECUTION_SERVICE_NO_OP, task_id=task_id,
reason="task_unassigned") and raise the new exception with the same descriptive
message (f"Task {task_id!r} is not assigned to any agent.") and ensure you
import or define the chosen exception type in the module's exceptions import
list.
- Around line 260-264: Wrap the call to self._engine.run(...) in a try/except
that catches Exception, logs an ERROR (or WARNING as appropriate) including
task_id and agent_id (pull them from the local task/identity objects or their
attributes), the effective_autonomy and any short context, then re-raise the
exception; update the block around self._engine.run (where run_result is
assigned) to ensure failures are logged with task_id/agent_id before
propagating.

In `@tests/unit/providers/drivers/test_scripted.py`:
- Around line 126-129: Replace the loose assertion in test_warns_on_construction
so it looks for the specific safety-warning emitted by ScriptedDriver instead of
any warning: call ScriptedDriver(strategy=DeterministicResponseStrategy())
inside capture_logs() and then assert that there exists a log entry where
entry.get("log_level") == "warning" and the entry message (e.g.,
entry.get("message","")) contains the expected safety text or stable identifier
(for example the substring "ScriptedDriver" or the exact safety-warning string
your code emits); this pins the test to the specific safety-log contract
produced by ScriptedDriver rather than any unrelated warning.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: b377d445-2abd-4e9e-a737-3a47d48451ae

📥 Commits

Reviewing files that changed from the base of the PR and between 353bd65 and 6829f3a.

📒 Files selected for processing (10)
  • src/synthorg/api/app_helpers.py
  • src/synthorg/api/state.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/workers/runtime_builder.py
  • tests/e2e/test_runtime_online_seam.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/api/controllers/test_setup_status.py
  • tests/unit/api/test_state_swap_worker_execution_service.py
  • tests/unit/providers/drivers/test_scripted.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Build Backend
  • GitHub Check: Dashboard Test
  • GitHub Check: Test Integration
  • GitHub Check: Test E2E
  • GitHub Check: Test Unit
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{py,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

No region/currency/locale privileged; use metric units; use British English per docs/reference/regional-defaults.md

Files:

  • tests/unit/api/controllers/test_setup_status.py
  • tests/unit/api/test_state_swap_worker_execution_service.py
  • src/synthorg/api/app_helpers.py
  • tests/e2e/test_runtime_online_seam.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/api/controllers/test_setup.py
  • src/synthorg/api/state.py
  • src/synthorg/workers/runtime_builder.py
  • tests/unit/providers/drivers/test_scripted.py
  • src/synthorg/workers/execution_service.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min

Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back

Test doubles: FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags. Bare MagicMock at typed boundaries is blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline)

FakeClock and mock_of import from tests._shared; inject via clock= and the helper's spec subscript

Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))

Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large)

Files:

  • tests/unit/api/controllers/test_setup_status.py
  • tests/unit/api/test_state_swap_worker_execution_service.py
  • tests/e2e/test_runtime_online_seam.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/providers/drivers/test_scripted.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/api/controllers/test_setup_status.py
  • tests/unit/api/test_state_swap_worker_execution_service.py
  • tests/e2e/test_runtime_online_seam.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/providers/drivers/test_scripted.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env; no os.environ.get outside startup per docs/reference/configuration-precedence.md

Numerics must live in settings/definitions/; allow 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py

Files:

  • src/synthorg/api/app_helpers.py
  • src/synthorg/api/state.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/workers/execution_service.py
src/**/*.{py,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Comments must document WHY only; no reviewer citations / issue back-refs / migration framing. Enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py

Vendor-agnostic names in project code/tests: use example-provider, test-provider, example-{large,medium,small}-001 instead of real vendor names. Allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/

Files:

  • src/synthorg/api/app_helpers.py
  • src/synthorg/api/state.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/workers/execution_service.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

No from __future__ import annotations in Python 3.14+; use PEP 758 except except A, B: (parens required for binding)

Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines

Errors must use <Domain><Condition>Error from DomainError; never inherit Exception/RuntimeError directly. Enforced by check_domain_error_hierarchy.py

Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore" boundaries)

Use @computed_field for derived model fields; NotBlankStr for identifiers

Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py

Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries

Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)

Clock seam: inject clock: Clock | None = None parameter; tests inject FakeClock; lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable

Untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML processing

Logging: use from synthorg.observability import get_logger; variable name always logger; never use import logging or print() in app code

Event names from observability.events.<domain> constants; use structured kwargs (logger.info(EVENT, key=value))

Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED AFTER persistence write

Secret-log redaction (SEC-1): never use error=str(exc) or interpolate {exc}; use error_type=type(exc).__name__ + error=safe_error_description(exc); never use exc_info=True; OTel: use `span.set_attribute("exception.message",...

Files:

  • src/synthorg/api/app_helpers.py
  • src/synthorg/api/state.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/workers/execution_service.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/api/app_helpers.py
  • src/synthorg/api/state.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/workers/execution_service.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

API startup lifecycle: two phases - construction (synchronous services in create_app body) and on_startup (services needing persistence). Ordering invariants: agent_registry before auto_wire_meetings; tunnel_provider unconditional; SettingsService before WorkflowExecutionObserver; OntologyService after persistence.connect()

Files:

  • src/synthorg/api/app_helpers.py
  • src/synthorg/api/state.py
src/synthorg/workers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Worker execution service: synthorg.workers.runtime_builder.build_worker_execution_service selects provider-backed or empty-company backstop; install via AppState.worker_execution_service seam; empty-company rejects with AgentRuntimeNotConfiguredError (4014); use swap_worker_execution_service / swap_provider_registry with lock synchronisation

Files:

  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/workers/execution_service.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: Read `docs/design/` page before implementing; deviations require approval per DESIGN_SPEC.md
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: Present every plan for accept/deny before coding (Planning gate)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: Every convention PR must ship its enforcement gate per docs/reference/convention-gates.md
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: After issue: branch + commit + push (no auto-PR); use `/pre-pr-review`; no `/aurelio-review-pr` for external feedback until after PR creation. Fix EVERYTHING valid; no deferring.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: xdist `-n 8 --dist=loadfile` auto-applied via pyproject `addopts` (`loadfile` prevents 3.14+Windows ProactorEventLoop leak)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: Commits: `<type>: <description>` (feat/fix/refactor/docs/test/chore/perf/ci); commitizen-enforced
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: Signed commits required on protected refs (GPG/SSH or GitHub App via `synthorg-repo-bot`)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: Branches: `<type>/<slug>` from main
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: Pre-commit/pre-push hooks: `.pre-commit-config.yaml`; tool-call gates: `.claude/settings.json` PreToolUse (`scripts/check_*.sh`/`.py`)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: Squash merge; PR body becomes squash commit; trailers (`Release-As`, `Closes `#N``) must be in PR body
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: GitHub queries: use `gh issue list` via Bash, NOT MCP `list_issues`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: After every squash merge → `/post-merge-cleanup`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: CLI is Docker-only (init/start/stop/status); features go in dashboard + REST API
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: See `web/CLAUDE.md` for Web framework conventions (React 19); see `cli/CLAUDE.md` for CLI (Go binary) conventions
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T06:43:06.946Z
Learning: Python 3.14+ only; BUSL-1.1 → Apache 2.0 after Change Date; layout: `src/synthorg/` (src layout), `tests/`, `web/` (React 19), `cli/` (Go binary)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/unit/api/controllers/test_setup_status.py
  • tests/unit/api/test_state_swap_worker_execution_service.py
  • src/synthorg/api/app_helpers.py
  • tests/e2e/test_runtime_online_seam.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/api/controllers/test_setup.py
  • src/synthorg/api/state.py
  • src/synthorg/workers/runtime_builder.py
  • tests/unit/providers/drivers/test_scripted.py
  • src/synthorg/workers/execution_service.py

Comment thread src/synthorg/api/state.py
Comment thread src/synthorg/workers/execution_service.py Outdated
Comment thread src/synthorg/workers/execution_service.py Outdated
Comment thread tests/unit/providers/drivers/test_scripted.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 84.83755% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.97%. Comparing base (89b57ce) to head (45cbbf3).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/providers/drivers/scripted.py 81.92% 11 Missing and 4 partials ⚠️
src/synthorg/api/app_helpers.py 42.10% 8 Missing and 3 partials ⚠️
src/synthorg/api/app.py 61.11% 6 Missing and 1 partial ⚠️
...rc/synthorg/api/controllers/setup/agent_helpers.py 44.44% 5 Missing ⚠️
src/synthorg/api/state.py 93.33% 2 Missing ⚠️
src/synthorg/workers/execution_service.py 96.82% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2003      +/-   ##
==========================================
- Coverage   84.97%   84.97%   -0.01%     
==========================================
  Files        1880     1882       +2     
  Lines      110872   111136     +264     
  Branches     9457     9481      +24     
==========================================
+ Hits        94218    94440     +222     
- Misses      14343    14372      +29     
- Partials     2311     2324      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Aureliolo added 9 commits May 18, 2026 09:09
CI: the new shared session-scoped provider_registry fixture poisoned the shared app, breaking two pre-existing no-provider assertions. Both tests now swap an empty ProviderRegistry and restore in finally (mirrors test_complete_succeeds_with_all_prerequisites): test_complete_rejects_without_providers and test_returns_status_with_seeded_users.

CodeRabbit: app_helpers validates SYNTHORG_DB_PATH is absolute before deriving the workspace root (raises, consistent with the artifact-dir guard); test_runtime_online_seam tmp_path typed as Path with the type-ignore dropped; test_task_create_empty_company matches Set-Cookie case-insensitively; test_state_swap asserts ServiceUnavailableError instead of bare Exception; removed redundant in-test structlog.reset_defaults() in two tests since the autouse _reset_structlog_state fixture already isolates.

Gemini: app_helpers and state hoist the agent-workspaces and data-dir literals to module Final constants; execution_service raises NotFoundError with an accurate message for an unassigned task instead of AgentRuntimeNotConfiguredError; runtime_builder feeds AutonomyResolver the configured app_state.config.config.autonomy instead of a default AutonomyConfig().
All four are CodeRabbit follow-ups on the round-1 fixes, reviewed on the new head.

set_agent_workspace_root rejects non-absolute paths (ValueError), matching the resolve_agent_workspace_root_env guard so sandbox writes cannot land cwd-relative. engine.run() is now wrapped so a provider/tooling failure logs an ERROR with task_id and agent_id before propagating (WORKERS_EXECUTION_SERVICE_FAILED). The existing-but-unassigned task now raises ConflictError (409) instead of NotFoundError: this reconciles the round-1 Gemini ask (do not use the misleading AgentRuntimeNotConfiguredError) with CodeRabbit round 2 (a loaded task is not a 404) -- a 409 state conflict is accurate for both. test_warns_on_construction now pins the specific PROVIDER_SCRIPTED_DRIVER_INSTANTIATED event and strategy field instead of any warning.

Added coverage for every new path: test_set_rejects_relative_path, test_unassigned_task_raises_conflict, test_engine_run_failure_logs_and_reraises.
@Aureliolo Aureliolo force-pushed the fix/agent-runtime-online branch from 6829f3a to f745bd4 Compare May 18, 2026 07:14
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 18, 2026 07:16 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/workers/runtime_builder.py`:
- Around line 42-143: The build_worker_execution_service function is too large;
split its responsibilities into small helpers: extract provider resolution into
a helper like resolve_provider(registry) that checks
app_state.has_active_provider, calls registry.list_providers(), logs warnings
and returns the selected provider or raises/returns None; extract workspace
creation and web timeout/config read into
prepare_workspace_and_timeout(workspace_root, config_resolver) which ensures
workspace_root.mkdir(...) and returns web_request_timeout; extract tool
construction into build_tool_registry(workspace_root, app_state.config,
web_request_timeout) which calls build_default_tools_from_config and returns a
ToolRegistry; extract engine construction into construct_agent_engine(provider,
registry, tool_registry, app_state) that builds and returns the AgentEngine
instance; extract autonomy resolver creation into
create_autonomy_resolver(app_state.config); then have
build_worker_execution_service call these helpers in sequence, log the final
info, and return AgentEngineExecutionService(agent_engine, task_engine,
agent_registry, autonomy_resolver).

In `@tests/_shared/scripted_provider.py`:
- Around line 75-92: The constructor __init__ currently accepts conflicting
inputs (responses, response, error) and silently prioritizes responses; update
__init__ to validate that at most one of responses, response, or error is
provided and raise a ValueError if multiple are supplied, then keep the existing
logic that sets self._strategy to SequencedResponseStrategy or
SingleResponseStrategy (or None) and preserve deepcopy of capabilities
(TEST_CAPABILITIES) and assignment to self._capabilities and self._strategy.
- Around line 152-155: get_model_capabilities currently returns the internal
mutable _capabilities object by reference which allows tests to mutate shared
state; change get_model_capabilities to return a defensive copy instead (e.g.,
use copy.deepcopy(self._capabilities) or construct a new ModelCapabilities from
the fields of self._capabilities) so callers get an independent object; add the
needed import (copy) if using deepcopy and ensure the async method returns the
copied instance rather than self._capabilities.

In `@tests/integration/api/test_task_create_empty_company.py`:
- Around line 44-46: After calling _extract_auth_cookies(...) store and validate
the returned session_token and csrf_token immediately: ensure neither is
empty/None and raise/assert with a clear message so the test fails fast on
bootstrap regressions; update the test in test_task_create_empty_company.py
where session_token and csrf_token are set (and before setting client.headers
and X-CSRF-Token) to perform these checks.

In `@tests/unit/workers/test_runtime_builder.py`:
- Line 11: Replace the SystemClock import with the shared FakeClock and mock_of
from tests._shared, then instantiate/use FakeClock in the tests instead of
SystemClock and inject it via clock= parameters where relevant; specifically
change the top-level import "from synthorg.core.clock import SystemClock" to
"from tests._shared import FakeClock, mock_of", replace any SystemClock() usages
with FakeClock(), and where helpers use mocked types switch to the helper's spec
subscript form using mock_of[YourType] (apply same changes for the other
occurrences mentioned around lines 69-70, 102-103, 134-135).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: dbba6097-1b89-46e9-90d3-c245a31b0b1d

📥 Commits

Reviewing files that changed from the base of the PR and between 6829f3a and f745bd4.

📒 Files selected for processing (35)
  • CLAUDE.md
  • scripts/_ghost_wiring_manifest.txt
  • src/synthorg/api/app.py
  • src/synthorg/api/app_helpers.py
  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/api/controllers/tasks.py
  • src/synthorg/api/state.py
  • src/synthorg/api/state_services.py
  • src/synthorg/core/domain_errors.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/workers.py
  • src/synthorg/providers/drivers/scripted.py
  • src/synthorg/providers/registry.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/workers/runtime_builder.py
  • tests/_shared/scripted_provider.py
  • tests/e2e/conftest.py
  • tests/e2e/test_runtime_online_seam.py
  • tests/integration/api/conftest.py
  • tests/integration/api/test_runtime_install_ordering.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/integration/engine/test_coordination_wiring.py
  • tests/unit/api/conftest.py
  • tests/unit/api/controllers/test_coordination.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/api/controllers/test_setup_status.py
  • tests/unit/api/test_state_swap_worker_execution_service.py
  • tests/unit/engine/quality/scripted_provider.py
  • tests/unit/providers/drivers/test_scripted.py
  • tests/unit/workers/test_execution_service.py
  • tests/unit/workers/test_runtime_builder.py
  • web/src/api/types/error-codes.gen.ts
  • web/src/api/types/openapi.gen.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Dashboard Test
  • GitHub Check: Test Unit
  • GitHub Check: Test E2E
  • GitHub Check: Test Integration
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (12)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use SettingsService/ConfigResolver for DB > env > code default (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env at the boot site; no os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value per docs/reference/configuration-precedence.md

No from __future__ import annotations (3.14 has PEP 649); use PEP 758 except: except A, B: no parens unless binding

Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines

Errors must follow <Domain><Condition>Error naming from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py

Pydantic v2 frozen + extra="forbid" on every frozen model project-wide; gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore" boundaries; use @computed_field for derived fields

Use NotBlankStr for identifiers in Pydantic models

Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py

Use immutability patterns: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries

Use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)

Clock seam: clock: Clock | None = None; tests inject FakeClock; lifecycle services own _lifecycle_lock; timed-out stops mark unrestartable

Untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML

Use from synthorg.observability import get_logger; variable always named logger; never import logging / print() in app code

Event names from observability.events.<domain> constants; use structured kwargs (`logger.info(EVENT, key=valu...

Files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/observability/events/workers.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/api/controllers/tasks.py
  • src/synthorg/core/domain_errors.py
  • src/synthorg/providers/registry.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/app.py
  • src/synthorg/api/app_helpers.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/api/state.py
  • src/synthorg/providers/drivers/scripted.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/observability/events/workers.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/api/controllers/tasks.py
  • src/synthorg/core/domain_errors.py
  • src/synthorg/providers/registry.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/app.py
  • src/synthorg/api/app_helpers.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/api/state.py
  • src/synthorg/providers/drivers/scripted.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants of the form NAME: int|float|Final|Final[int]|Final[float] = literal; enforced by scripts/check_no_magic_numbers.py

Files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/observability/events/workers.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/api/controllers/tasks.py
  • src/synthorg/core/domain_errors.py
  • src/synthorg/providers/registry.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/app.py
  • src/synthorg/api/app_helpers.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/api/state.py
  • src/synthorg/providers/drivers/scripted.py
{src,web}/**/*.{py,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Comments must explain WHY only; no reviewer citations / issue back-refs / migration framing; enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py

Files:

  • src/synthorg/observability/events/provider.py
  • web/src/api/types/error-codes.gen.ts
  • src/synthorg/observability/events/api.py
  • src/synthorg/observability/events/workers.py
  • src/synthorg/core/error_taxonomy.py
  • web/src/api/types/openapi.gen.ts
  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/api/controllers/tasks.py
  • src/synthorg/core/domain_errors.py
  • src/synthorg/providers/registry.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/app.py
  • src/synthorg/api/app_helpers.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/api/state.py
  • src/synthorg/providers/drivers/scripted.py
web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

  • web/src/api/types/error-codes.gen.ts
  • web/src/api/types/openapi.gen.ts
web/src/api/types/**/*.gen.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Generated DTO types (MANDATORY): NEVER hand-edit web/src/api/types/*.gen.ts. Regenerate with uv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').

Files:

  • web/src/api/types/error-codes.gen.ts
  • web/src/api/types/openapi.gen.ts
web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

  • web/src/api/types/error-codes.gen.ts
  • web/src/api/types/openapi.gen.ts
web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse web/src/components/ui/ design tokens only in Web Dashboard Design System per web/CLAUDE.md

Files:

  • web/src/api/types/error-codes.gen.ts
  • web/src/api/types/openapi.gen.ts
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min

Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back

Test doubles: use ladder in conventions.md section 12.1; FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary is blocked by scripts/check_mock_spec.py

FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript

Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))

Flaky tests: NEVER skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large)

Files:

  • tests/unit/workers/test_runtime_builder.py
  • tests/unit/api/controllers/test_setup_status.py
  • tests/unit/api/test_state_swap_worker_execution_service.py
  • tests/unit/api/controllers/test_coordination.py
  • tests/integration/engine/test_coordination_wiring.py
  • tests/integration/api/test_runtime_install_ordering.py
  • tests/e2e/test_runtime_online_seam.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/engine/quality/scripted_provider.py
  • tests/unit/providers/drivers/test_scripted.py
  • tests/unit/api/conftest.py
  • tests/e2e/conftest.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/workers/test_execution_service.py
  • tests/integration/api/conftest.py
  • tests/_shared/scripted_provider.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/workers/test_runtime_builder.py
  • tests/unit/api/controllers/test_setup_status.py
  • tests/unit/api/test_state_swap_worker_execution_service.py
  • tests/unit/api/controllers/test_coordination.py
  • tests/integration/engine/test_coordination_wiring.py
  • tests/integration/api/test_runtime_install_ordering.py
  • tests/e2e/test_runtime_online_seam.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/engine/quality/scripted_provider.py
  • tests/unit/providers/drivers/test_scripted.py
  • tests/unit/api/conftest.py
  • tests/e2e/conftest.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/workers/test_execution_service.py
  • tests/integration/api/conftest.py
  • tests/_shared/scripted_provider.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

API startup lifecycle: construction phase (create_app body) wires synchronous services; on_startup phase (_build_lifecycle.on_startup) wires services needing persistence; ordering invariants: agent_registry before auto_wire_meetings; tunnel_provider wired unconditionally; SettingsService before WorkflowExecutionObserver; OntologyService after persistence.connect()

Files:

  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/api/controllers/tasks.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/app.py
  • src/synthorg/api/app_helpers.py
  • src/synthorg/api/state.py
src/synthorg/api/controllers/setup/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Setup completion: post_setup_reinit() propagates failures; settings_svc.set("api", "setup_complete", "true") only after reinit succeeds; serialize check/validate/reinit/persist under COMPLETE_LOCK to prevent race conditions

Files:

  • src/synthorg/api/controllers/setup/agent_helpers.py
src/synthorg/providers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Resilience: provider calls through BaseCompletionProvider (retry + rate limit); never implement retry in driver subclasses; retryable: RateLimitError, Provider{Timeout,Connection,Internal}Error

Files:

  • src/synthorg/providers/registry.py
  • src/synthorg/providers/drivers/scripted.py
src/synthorg/workers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Worker execution service: select via synthorg.workers.runtime_builder.build_worker_execution_service behind provider-present switch; install via AppState.worker_execution_service seam; append boot install hook FIRST after persistence/SettingsService; empty-company rejects task creation with AgentRuntimeNotConfiguredError (4014); use swap_worker_execution_service / swap_provider_registry with lock synchronisation

Files:

  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/workers/execution_service.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: Read `docs/design/` page before implementing; deviations need approval per [DESIGN_SPEC.md](docs/DESIGN_SPEC.md)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: Present every plan for accept/deny before coding
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: No region/currency/locale privileged; use metric units; use British English per [docs/reference/regional-defaults.md](docs/reference/regional-defaults.md)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: Every convention PR ships its enforcement gate per [docs/reference/convention-gates.md](docs/reference/convention-gates.md)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: Never edit `tests/baselines/unit_timing.json` or any `scripts/*_baseline.{txt,json}` / `scripts/_*_baseline.py`; timeout/slow failures = source-code regression; both families are PreToolUse-blocked; per-invocation bypass requires explicit user approval via `ALLOW_BASELINE_GROWTH=1 git commit ...`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: After issue: branch + commit + push (no auto-PR); use `/pre-pr-review`; after PR use `/aurelio-review-pr` for external feedback; fix EVERYTHING valid; no deferring
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: Commits: `<type>: <description>` (feat/fix/refactor/docs/test/chore/perf/ci); commitizen-enforced
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: Signed commits required on protected refs (GPG/SSH or GitHub App via `synthorg-repo-bot`)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: Branches: `<type>/<slug>` from main
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: Use `.pre-commit-config.yaml` for pre-commit/pre-push hooks; tool-call gates: `.claude/settings.json` PreToolUse (`scripts/check_*.sh`/`.py`)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: Squash merge; PR body becomes squash commit; trailers (`Release-As`, `Closes `#N``) must be in PR body
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: GitHub queries: use `gh issue list` via Bash, NOT MCP `list_issues`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: After every squash merge → `/post-merge-cleanup`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-18T07:15:31.194Z
Learning: CLI is Docker-only (init/start/stop/status); features go in dashboard + REST API
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/observability/events/provider.py
  • tests/unit/workers/test_runtime_builder.py
  • tests/unit/api/controllers/test_setup_status.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/observability/events/workers.py
  • src/synthorg/core/error_taxonomy.py
  • tests/unit/api/test_state_swap_worker_execution_service.py
  • tests/unit/api/controllers/test_coordination.py
  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/api/controllers/tasks.py
  • tests/integration/engine/test_coordination_wiring.py
  • src/synthorg/core/domain_errors.py
  • src/synthorg/providers/registry.py
  • src/synthorg/api/state_services.py
  • tests/integration/api/test_runtime_install_ordering.py
  • src/synthorg/api/app.py
  • src/synthorg/api/app_helpers.py
  • src/synthorg/workers/runtime_builder.py
  • tests/e2e/test_runtime_online_seam.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/engine/quality/scripted_provider.py
  • tests/unit/providers/drivers/test_scripted.py
  • src/synthorg/workers/execution_service.py
  • tests/unit/api/conftest.py
  • tests/e2e/conftest.py
  • tests/integration/api/test_task_create_empty_company.py
  • tests/unit/workers/test_execution_service.py
  • tests/integration/api/conftest.py
  • src/synthorg/api/state.py
  • src/synthorg/providers/drivers/scripted.py
  • tests/_shared/scripted_provider.py
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

  • CLAUDE.md
🔇 Additional comments (33)
src/synthorg/observability/events/provider.py (1)

9-9: LGTM!

web/src/api/types/error-codes.gen.ts (1)

57-57: LGTM!

tests/unit/api/controllers/test_setup_status.py (1)

12-12: LGTM!

Also applies to: 26-39

scripts/_ghost_wiring_manifest.txt (1)

26-26: LGTM!

src/synthorg/observability/events/api.py (1)

64-64: LGTM!

src/synthorg/observability/events/workers.py (1)

72-82: LGTM!

src/synthorg/core/error_taxonomy.py (1)

109-109: LGTM!

CLAUDE.md (1)

82-83: LGTM!

web/src/api/types/openapi.gen.ts (1)

7295-7295: LGTM!

tests/unit/api/test_state_swap_worker_execution_service.py (1)

1-81: LGTM!

tests/unit/api/controllers/test_coordination.py (1)

127-130: LGTM!

Also applies to: 142-144

src/synthorg/api/controllers/setup/agent_helpers.py (1)

142-166: LGTM!

src/synthorg/api/controllers/tasks.py (1)

25-25: LGTM!

Also applies to: 37-37, 182-189

tests/integration/engine/test_coordination_wiring.py (1)

285-289: LGTM!

Also applies to: 300-302

src/synthorg/core/domain_errors.py (1)

248-264: LGTM!

src/synthorg/providers/registry.py (1)

141-146: LGTM!

src/synthorg/api/state_services.py (1)

137-137: LGTM!

Also applies to: 451-460, 471-490, 527-531

tests/integration/api/test_runtime_install_ordering.py (1)

1-57: LGTM!

src/synthorg/api/app.py (1)

33-34: LGTM!

Also applies to: 980-1029

src/synthorg/api/app_helpers.py (1)

48-50: LGTM!

Also applies to: 110-110, 129-161

tests/e2e/test_runtime_online_seam.py (1)

1-197: LGTM!

tests/unit/api/controllers/test_setup.py (1)

372-380: LGTM!

Also applies to: 385-385

tests/unit/engine/quality/scripted_provider.py (1)

1-18: LGTM!

tests/unit/providers/drivers/test_scripted.py (1)

1-138: LGTM!

src/synthorg/workers/execution_service.py (1)

5-31: LGTM!

Also applies to: 36-55, 71-77, 99-113, 139-147, 197-410

tests/unit/api/conftest.py (1)

30-31: LGTM!

Also applies to: 46-47, 356-370, 426-427, 459-460

tests/e2e/conftest.py (1)

1-6: LGTM!

Also applies to: 12-18, 23-29

tests/integration/api/test_task_create_empty_company.py (1)

1-34: LGTM!

Also applies to: 50-112

tests/unit/workers/test_execution_service.py (1)

1-296: LGTM!

tests/integration/api/conftest.py (1)

12-31: LGTM!

Also applies to: 34-93

src/synthorg/api/state.py (1)

9-14: LGTM!

Also applies to: 84-88, 160-161, 178-178, 430-443, 860-865, 883-950, 968-973

src/synthorg/providers/drivers/scripted.py (1)

1-293: LGTM!

tests/_shared/scripted_provider.py (1)

1-74: LGTM!

Also applies to: 93-151, 157-269

Comment thread src/synthorg/workers/runtime_builder.py Outdated
Comment thread tests/_shared/scripted_provider.py
Comment thread tests/_shared/scripted_provider.py Outdated
Comment thread tests/integration/api/test_task_create_empty_company.py
Comment thread tests/unit/workers/test_runtime_builder.py Outdated
CodeRabbit follow-ups on the round-2 head.

build_worker_execution_service split into _select_active_provider, _build_tool_registry, and _construct_agent_engine helpers so each unit is under the 50-line rule. ScriptedProvider test double now raises ValueError when more than one of responses/response/error is supplied, and get_model_capabilities returns a deepcopy so callers cannot mutate shared capabilities state. The empty-company integration test asserts the auth bootstrap session and csrf cookies are non-empty before reuse. test_runtime_builder uses FakeClock from tests._shared instead of SystemClock per the clock-seam convention.

All gates green: mypy-affected, pytest-affected, ghost-wiring, function-size.
@Aureliolo Aureliolo merged commit e5eef1a into main May 18, 2026
82 checks passed
@Aureliolo Aureliolo deleted the fix/agent-runtime-online branch May 18, 2026 07:48
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 18, 2026 07:48 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request May 18, 2026
…harness) (#2006)

## Summary

Wires the GHOST `IntakeEngine` into the shipped boot path so the
client-simulation runtime comes online and becomes the deterministic e2e
acceptance substrate (independent child of EPIC #1955; runtime root
#2003 already merged).

- New `src/synthorg/client/runtime_builder.py`
(`build_client_simulation_runtime`): constructs the real `IntakeEngine`
+ a single-stage `ReviewPipeline` (`InternalReviewStage`) and returns a
populated `ClientSimulationState`. Called from `create_app`'s
construction phase when a `TaskEngine` is present, so
`has_simulation_runtime` is true and the `/simulations` + `/requests`
controllers register.
- Pluggable intake: `IntakeConfig` discriminator +
`build_intake_strategy()` factory (`direct` -> `DirectIntake`, no LLM;
`agent` -> `AgentIntake`, LLM triage). New `simulations.intake_strategy`
/ `simulations.intake_model` settings, resolved at the boot site via the
bootstrap resolver (env > default; `read_only_post_init`, matching the
existing `simulations.task_timeout_seconds` pattern). A selected `agent`
strategy that cannot be satisfied degrades to `direct` with a WARNING; a
`direct` failure propagates.
- Default `direct` strategy makes zero LLM calls, so the runtime comes
online for an empty company.
- Docs synced: `docs/design/client-simulation.md` (removed the "not
wired" warning, added the Boot wiring section + factory-table row);
`README.md` (intake engine is now exercised end to end).

## CORE acceptance

`scripts/_ghost_wiring_manifest.txt` `IntakeEngine` line flipped
`PENDING` -> `ENFORCED` in this PR; `scripts/check_no_ghost_wiring.py`
passes (verified construction site under `src/synthorg/client/`,
reachable from `app.py`).

## Bundled fix: pre-existing latent circular import

Wiring `runtime_builder` surfaced a pre-existing cycle:
`synthorg.client.*` was unimportable as a *first* import in a fresh
interpreter (`budget.cost_record` -> `ontology` -> `persistence` ->
`security` -> `providers` -> back into budget mid-init). Reproduced on
unmodified `synthorg.client.runner`, so it is not introduced here. Root
fix: PEP 562 lazy export of `OntologyService` /
`OntologyEntityRepository` from `synthorg.ontology.__init__` (locked
double-checked cache, mirroring the established `synthorg.tools.mcp`
pattern). Public API unchanged: `from synthorg.ontology import
OntologyService` still works, resolved on first access. Authorized by
the maintainer as in-scope.

## Tests (four layers)

- Seam (e2e): builder drives a real `ClientRequest` SUBMITTED ->
TASK_CREATED against a real `TaskEngine`.
- Boot-wired HTTP surface: no kwarg; `/capabilities` reports
simulations/requests on, routes register.
- Direct + scripted-agent harness: `SimulationRunner.run` driven
directly against the boot-wired runtime, deterministic asserted metrics,
zero LLM spend.
- Factory/builder/settings/lazy-export unit tests; obsolete capability
tests updated to the new reality.

The deterministic harness assertion drives `SimulationRunner.run`
directly rather than the HTTP poll-to-completion path: litestar
`TestClient` does not progress fire-and-forget `asyncio.create_task`
background tasks between sync requests, so a poll-based e2e would be
flaky. The HTTP surface (routes register, 201 on start) is still covered
separately.

## Review coverage

Pre-PR review ran 23 agents (17 specialist + 6 audit mini-pass). All
valid findings (V1-V5) and the user-approved pre-existing/outside-diff
items (P1: `web/src/api/endpoints/clients.ts` migrated to the generated
`@/api/types` barrel, fixing the `SimulationStatusResponse` /
`acceptance_criteria` drift; P2: pre-raise logging on intake
`ValueError` paths; P3: named tolerance constants, function-length
refactor) implemented. Seven findings assessed invalid (agent misreads
of PEP 758 / bootstrap-resolver semantics, gate-sanctioned
`@computed_field` exemption, established-pattern conflicts) with
rationale in the triage record.

Gates green: ruff, mypy (3902 files), `check_no_ghost_wiring.py`,
affected pytest, full `pre-commit run --all-files`; web type-check,
lint, 3182 tests.

Closes #1961
Aureliolo added a commit that referenced this pull request May 18, 2026
## Summary

Wires the four GHOST governance subsystems from EPIC #1955 so they
actually run on a live agent behind the provider-present switch (the
runtime root #1956/#2003 only wired the minimal safety spine).

- **Single boot `ApprovalGate` + real park/resume.** One gate
constructed at boot in `lifecycle_builder._wire_approval_gate` (backed
by the dual-backend `ParkedContextRepository`), attached to `AppState`
and injected into `AgentEngine` so the engine parks and `/approvals`
resumes on the *same* gate. New `AgentEngine.resume_parked_run` restores
the parked `AgentContext`, injects the approval decision (D21;
SYSTEM-message follow-up since the loop appends the escalated tool
result before the park check), and continues `_execute`.
`AgentEngineExecutionService.dispatch_resume` runs the resume as a
tracked `BackgroundTaskRegistry` task off the approve/reject request
path, drained on shutdown.
- **Progressive trust enforced.** `TrustService` threaded into the
tool-invoker seam; effective tool permissions are narrowed to the more
restrictive of identity level and earned trust level (auto-initialised
on first sight). `DISABLED` strategy is a no-op; switching strategy
changes enforcement.
- **Agent → SynthOrg-MCP self-consumer.** New
`engine/mcp_self_consumer.py` (pluggable: StrEnum mode + frozen config +
factory + safe `DISABLED` default) exposes SynthOrg's own MCP tools to a
running agent, trust-scoped via `MCPToolScoper`, with `actor=identity`
threaded so `require_admin_guardrails` fails closed on admin tools.
- **AutonomyChangeStrategy wired.** Built at boot from
`config.autonomy.change_strategy` (default `HUMAN_ONLY`), attached to
`AppState`; the autonomy controller now enforces the D6 seniority rule,
consults the strategy, and enqueues a real approval (replacing the
always-deny stub).
- **Anti-ghost lock:** `scripts/_ghost_wiring_manifest.txt` flips
`ApprovalGate`, `TrustService`, `build_mcp_self_consumer`,
`build_autonomy_change_strategy` to `ENFORCED` in this PR.
- `docs/design/security.md` updated to current runtime-enforcement
state.

## Test plan

- New unit suites: `test_agent_engine_resume.py`,
`test_agent_engine_trust.py`, `test_mcp_self_consumer.py`,
`test_enforcement.py`; extended approval-gate / runtime-builder /
execution-service / approvals-helper / autonomy / startup-wiring /
guards tests; conversation-shape lock for the resume-injection
invariant.
- `uv run ruff check src/ tests/` + `ruff format` + `uv run mypy src/
tests/`: clean (3903 files).
- Affected-tests gate: 29154 passed, 18 skipped. Full pre-commit +
pre-push (ghost-wiring, dto-types-in-sync, ESLint, all convention gates)
green.
- Acceptance verified end-to-end at every seam: park → `/approvals` →
resume on approve/reject; trust-strategy switch changes the permitted
tool set; agent invokes a SynthOrg MCP read tool; admin tool fails
closed without confirm/reason/actor; autonomy request routes through the
strategy.

## Review coverage

Pre-PR reviewed by 22 review + audit-mini-pass agents
(issue-resolution-verifier: all 5 acceptance criteria RESOLVED;
ghost-wiring: PASS). 11 valid findings triaged and **all implemented**
(`_audit/pre-pr-review/triage.md`): fail-safe `_cleanup_parked` (no
silent duplicate resume), louder no-parked-context path,
`wrap_untrusted` for the injected decision reason, self-validating
`AutonomyLevelRequest.reason` + regenerated TS types, MSW `reason`
validation, `NamedTuple` trust result, logging coverage, function-length
extraction, larger resume drain budget, tighter guard test, added tests.
The ~20 "`except A, B:` syntax error" flags were dismissed as a
confirmed PEP 758 false positive (CLAUDE.md-mandated; all gates green).

Closes #1957
Aureliolo pushed a commit that referenced this pull request May 19, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Multi-agent coordination is now active immediately on startup for
smoother operation.
- Governance rules are fully enforced during use, ensuring compliance at
all times.
- Coordination metrics update live, giving real-time insights into
system activity.
- Review agents are now reliably processed, preventing silent drops in
tasks.
- Sandbox containers can be reused for agents and tasks, speeding up
execution and reducing overhead.

### What's new
- Agents support online runtime with a minimal safety framework to
improve stability.
- Recorded LLM interactions can be deterministically replayed at the
provider interface.
- Distributed path validation has been enhanced for more robust data
routing.
- A client-simulation runtime was added for end-to-end testing of the
IntakeEngine.
- A new work pipeline spine architecture has been introduced to
streamline task processing.

### Under the hood
- Infrastructure, Python, and web dependencies have all been updated to
latest versions.
- Updated apko lockfiles in the CI/CD pipeline improve build
consistency.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


##
[0.8.6](v0.8.5...v0.8.6)
(2026-05-19)


### Features

* agent runtime online + minimal safety spine (runtime root)
([#2003](#2003))
([e5eef1a](e5eef1a)),
closes [#1956](#1956)
* deterministic recorded-LLM cassette replay at the provider chokepoint
([#2010](#2010))
([cabf55d](cabf55d))
* distributed path validation + hardening
([#2011](#2011))
([a382e4a](a382e4a)),
closes [#1966](#1966)
* wire IntakeEngine via boot client-simulation runtime (e2e test
harness) ([#2006](#2006))
([6a9c0aa](6a9c0aa)),
closes [#1961](#1961)
* work pipeline spine
([#1960](#1960))
([#2013](#2013))
([29b64e3](29b64e3))


### Bug Fixes

* bring the multi-agent coordinator online at boot
([#2007](#2007))
([180b38a](180b38a)),
closes [#1958](#1958)
* full governance enforcement online
([#1957](#1957))
([#2005](#2005))
([4140fc5](4140fc5))
* harden anti-ghost-wiring gate and fix silently-dropped review agents
([#2000](#2000))
([89b57ce](89b57ce))
* make coordination metrics live
([#1959](#1959))
([#2012](#2012))
([c4775e2](c4775e2))
* sandbox lifecycle dispatch (per-agent / per-task container reuse)
([#2008](#2008))
([03d2587](03d2587)),
closes [#1965](#1965)


### Documentation

* add GitButler concept-only concurrency research
([#1978](#1978))
([#2009](#2009))
([9e4f5c1](9e4f5c1))
* honest-hybrid refresh of README, site, and design specs
([#2001](#2001))
([f485bea](f485bea))


### CI/CD

* update apko lockfiles
([#2004](#2004))
([e2b9eee](e2b9eee))


### Maintenance

* Update Infrastructure dependencies
([#2014](#2014))
([0b16bdf](0b16bdf))
* Update Python dependencies
([#2015](#2015))
([a7224bb](a7224bb))
* Update Web dependencies
([#2016](#2016))
([7a7fe76](7a7fe76))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Aureliolo added a commit that referenced this pull request May 24, 2026
…2096) (#2097)

## Summary

Audit follow-up to merged #1956 (PR #2003 "Agent runtime online +
minimal safety spine"). The audit found 13 of 14 acceptance / scope
items PASS; the only WEAK item (composed end-to-end test) was already
covered by `tests/e2e/test_runtime_online_seam.py`. The single genuine
gap was operator observability of the SecOps spine state at boot. This
PR closes that gap.

Closes #2096.

## What changed

**Runtime-services boot log carries the safety-spine state** so
operators reading `synthorg.log` can see, at a glance, whether the
SecOps interceptor will be `active` / `shadow` / `disabled` once a
provider runs. Two structured kwargs (`security_enabled`,
`security_enforcement_mode`) added to all three
`logger.info(API_APP_STARTUP, service="runtime_services", ...)` call
sites in `src/synthorg/workers/runtime_builder.py` (the two no-provider
branches in `_select_active_provider` and the provider-present branch in
`build_runtime_services`).

**Three pre-PR-review findings folded in by user request:**

- `# module-kind: orchestrator` header added to
`src/synthorg/workers/runtime_builder.py` (was defaulting to `code` tier
with cap 500 but is an orchestrator at 840 gate-LOC; cap is now
correctly `orchestrator` 600 with the baseline carrying the actual
size).
- `# module-kind: service` header added to
`src/synthorg/workers/execution_service.py` (sibling classification gap;
same shape).
- Try / except around the `_build_runtime_coordinator` TaskGroup so a
failure in decomposition / routing-scorer / workspace config resolve
logs operator context (via `log_exception_redacted`) before propagating,
instead of escaping silently.
- New summary `logger.info(API_APP_STARTUP, mode="agent_engine_built",
...)` at the end of `build_runtime_services` reporting which optional
subsystems (coordinator, work pipeline, red team, vision gate) wired, so
operators can see the final boot shape in one line.

## Test plan

- `tests/unit/workers/test_runtime_builder.py` adds a
`TestBootLogSafetySpineState` class with three tests asserting the new
`security_enabled` + `security_enforcement_mode` kwargs appear on each
of the three boot branches (no-provider, empty-registry,
provider-present), using `structlog.testing.capture_logs`.
- Pre-push gates ran clean locally: ruff, ruff-format, mypy (affected),
pytest-unit (affected), all convention gates including `module-size
budget` and `no-ghost-wiring`.
- E2E acceptance is already proven by
`tests/e2e/test_runtime_online_seam.py::test_runtime_executes_task_through_seam_with_safety_spine`
(unchanged).

## Review coverage

Pre-reviewed by 4 agents per "reduced agent count" instruction:
`code-reviewer` (APPROVED), `python-reviewer` (APPROVED),
`logging-audit` (COMPLIANT), `conventions-enforcer` (1 MAJOR + 1
MEDIUM). Mandatory `docs-consistency` / `comment-quality-rot` /
mini-pass agents skipped per explicit user instruction (no docs, no
comment narrative, no ghost wiring introduced).

All four valid findings (MAJOR + MEDIUM + 2 SUGGESTION) applied in this
PR per user direction.

## Files

- `src/synthorg/workers/runtime_builder.py` (+23 gate-LOC: 3 log sites +
2 local vars + try / except wrapper + summary log + module-kind header)
- `src/synthorg/workers/execution_service.py` (module-kind header only;
0 gate-LOC)
- `tests/unit/workers/test_runtime_builder.py` (new
`TestBootLogSafetySpineState` class; +73 lines)
- `scripts/_module_size_baseline.json` (mechanical baseline bump 809 to
840 for `runtime_builder.py`; user-approved)
- `.codespellrc` (added
`.test_durations.unit,.test_durations.integration` to skip list; was
blocking pre-commit on a deliberate `parse_bool` test parametrize ID)

## Manifest

No new `_ghost_wiring_manifest.txt` entries. `AgentEngine` is already
`ENFORCED` from #1956.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Agent runtime online + minimal safety spine (runtime root)

1 participant