Skip to content

feat(env): reproducible per-project environments#2039

Merged
Aureliolo merged 15 commits into
mainfrom
feat/1994-reproducible-project-environments
May 22, 2026
Merged

feat(env): reproducible per-project environments#2039
Aureliolo merged 15 commits into
mainfrom
feat/1994-reproducible-project-environments

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Implements the pluggable per-project reproducible-environment subsystem so "worked in the agent sandbox" equals "works on a fresh clone".

  • Declaration formats (pluggable via EnvironmentType discriminator + factory): manifest (default, backend-agnostic synthorg.env.yaml + generated bootstrap.sh), devcontainer (sealed Docker image build), nix (hermetic flake). Safe default ships; auto-seeds + commits a default declaration into a fresh workspace.
  • Provisioning is once-per-(project_id, declaration_hash) (persisted project_environments row is the durable cache; a declaration edit re-provisions). Fail-loud: a broken environment never presents itself as ready.
  • Sandbox integration: the provisioned result threads to the agent's per-tool-call sandbox via the ambient ActiveSandboxEnvironment contextvar (image override + env additions), bound for one agent run. Override images run under the existing hardened host config.
  • Persistence: ProjectEnvironmentRepository (SQLite + Postgres, dual-backend conformance-tested), 1:1 with Project, FK cascade.
  • Wiring: EnvironmentService constructed in _install_runtime_services behind has_persistence and threaded into AgentEngineExecutionService.

Pre-PR review hardening (this branch)

Reviewed locally by 23 agents (18 specialised + 5 audit mini-pass); all valid findings addressed:

  • Model invariants enforced (image_ref required only for DEVCONTAINER; SHA-256 hash format; provisioned_at <= updated_at; BuildOutcome timeout marker).
  • Manifest lockfile paths guarded against workspace escape before hashing.
  • Transient docker image builds retried via GeneralRetryHandler (timeout / registry / network markers); deterministic build failures are not retried.
  • Declaration-sourced env additions screened through the subprocess denylist so a declared secret / exec-hijack var (LD_PRELOAD, PYTHONPATH, ...) cannot bypass the filter.
  • Build subprocess reaped on cancellation; bounded command output in setup-failure logs; warning when env provisioning is skipped for a missing workspace; row-persisted state-transition log.
  • Docs: error-code reference (4017/4018, 8016-8025), CLAUDE.md wiring invariant, persistence protocol inventory, DESIGN_SPEC + coordination page.

Note on retry scope: the approved resilience suggestion (R1/R2) is implemented via a precise transient-only retry predicate around the docker build (R2). The blanket is_retryable=True on EnvironmentProvisionError/EnvironmentDockerBuildError (R1) was deliberately not set, because it would mislabel deterministic provision failures (bad manifest/Dockerfile) as retryable to API clients. Say the word if you want the API-client retry-after signal on just EnvironmentDockerBuildError.

Test plan

  • uv run ruff check src/ tests/ clean; uv run mypy --num-workers=4 src/ tests/ clean (4354 files).
  • Targeted unit suites green (env subsystem, sandbox screening, model validators, persistence protocol, workers).
  • Pre-push gate green: mypy-affected, pytest-unit-affected, ghost-wiring, domain-error-hierarchy, dual-backend parity, schema-drift, dead-API, frozen-model, and ~40 convention gates.
  • Acceptance: tests/integration/engine/workspace/test_environment_acceptance.py provisions, commits, fresh-clones, runs bootstrap.sh with stock tools (no SynthOrg present), and runs the deliverable's own tests (marked slow).

Closes #1994

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a new reproducible-environment subsystem designed to eliminate 'works on my machine' issues. By allowing projects to commit environment declarations (manifests, devcontainers, or nix flakes), the system ensures that agent execution environments are consistently provisioned. The change includes a new service layer, persistence models, and integration with the existing sandbox infrastructure to thread environment configurations into agent runs.

Highlights

  • Reproducible Environment Subsystem: Introduced a pluggable per-project environment subsystem that ensures development environments are consistent across agent sandboxes, CI, and fresh clones.
  • Environment Strategies: Implemented three strategies: 'manifest' (default, bootstrap-based), 'devcontainer' (sealed Docker images), and 'nix' (hermetic flakes).
  • Sandbox Integration: Integrated the environment subsystem into the agent sandbox via an ambient contextvar, allowing per-task image overrides and environment variable additions.
  • Persistence and Resilience: Added a durable cache via 'ProjectEnvironmentRepository' (SQLite/Postgres) and implemented robust build retries for transient failures.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 22, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: a3ac355e-29ec-4319-893c-bf42e5eb2a72

📥 Commits

Reviewing files that changed from the base of the PR and between 593e271 and 469997f.

📒 Files selected for processing (8)
  • src/synthorg/engine/workspace/environment/devcontainer.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/workers/runtime_builder.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • tests/unit/tools/sandbox/test_protocol.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test Integration
  • GitHub Check: Test E2E
  • GitHub Check: Test Unit
  • GitHub Check: Dashboard Test
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Lighthouse Site
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Read design spec from docs/design/ page before implementing; deviations need approval per DESIGN_SPEC.md

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL; new repository protocols must inherit from generic categories in persistence/_generics.py (SingletonRepository, IdKeyedRepository, FilteredQueryRepository, AppendOnlyRepository, StatefulRepository, MVCCRepository)

No from __future__ import annotations in Python files; use PEP 758 syntax except except A, B: requires parens when binding

Type hints required on public functions; enforce mypy strict mode; use Google-style docstrings; limit line length to 88; functions <50 lines; files <800 lines

Error classes must follow <Domain><Condition>Error naming pattern and inherit from DomainError, never directly from Exception, RuntimeError, or similar

Pydantic v2: all frozen models must have extra="forbid" project-wide; use @computed_field for derived values; use NotBlankStr for identifiers

Use args models at every system boundary; call parse_typed() for every external dict ingestion, enforced by check_boundary_typed.py

For immutability: use model_copy(update=...) or copy.deepcopy(); apply deepcopy at system boundaries

Use asyncio.TaskGroup for async fan-out/fan-in; helper functions catch Exception but re-raise MemoryError and RecursionError

Inject Clock seam via clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark services unrestartable

Wrap untrusted content via wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML input (SEC-1)

Repository CRUD: implement methods save(entity), get(id), delete(id) -> bool, list_items(...), query(...) returning tuples

In persistence: use parse_iso_utc / format_iso_utc from persistence._shared (reject naive datetimes); call normalize_utc for already-typed values

Import logger via `from synthorg.observability i...

Files:

  • tests/unit/tools/sandbox/test_protocol.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/engine/workspace/environment/nix.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • src/synthorg/workers/runtime_builder.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Mark tests with @pytest.mark.{unit,integration,e2e,slow}; async tests use auto mode; enforce 30s global timeout; maintain minimum 80% code coverage

xdist applied automatically via pyproject addopts as -n 8 --dist=loadfile (loadfile prevents Python 3.14+ Windows ProactorEventLoop leak)

Windows test execution: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back to ProactorEventLoop

Use test doubles ladder: FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; never use bare MagicMock at typed boundaries (blocked by check_mock_spec.py)

Import FakeClock and mock_of from tests._shared; inject via clock= parameter and helper's spec subscript

Use vendor-agnostic provider names in project code/tests: example-provider, test-provider, example-{large,medium,small}-001; never use real vendor names except in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/

Hypothesis testing: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)); never use @pytest.mark.skip or @pytest.mark.xfail for flaky tests, fix fundamentally

Files:

  • tests/unit/tools/sandbox/test_protocol.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/tools/sandbox/test_protocol.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
**/*.{py,ts,tsx,js,jsx,go,sh,bash}

📄 CodeRabbit inference engine (CLAUDE.md)

Diagrams: use D2 (v0.7.1 pinned in CI) for architecture/nested containers with theme 200 (Dark Mauve); use Mermaid for flowcharts/sequence/pipelines; use Markdown tables for tabular data

Files:

  • tests/unit/tools/sandbox/test_protocol.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/engine/workspace/environment/nix.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • src/synthorg/workers/runtime_builder.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/workspace.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: Present every implementation plan for accept/deny before coding (MANDATORY planning phase)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: No region/country/currency/locale should be privileged; use metric units; use British English for all user-facing text per `docs/reference/regional-defaults.md`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: Every convention PR must ship its enforcement gate; detail in `docs/reference/convention-gates.md`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: Configuration precedence: DB > env > code default via `SettingsService`/`ConfigResolver` (Cat-1) or env > code default (Cat-2, `read_only_post_init`); Cat-3 bootstrap secrets are pure env; YAML is company-template format, not a precedence tier; no `os.environ.get` outside startup; pre-init Cat-2 reads use `settings.bootstrap_resolver.resolve_init_value`; per `docs/reference/configuration-precedence.md`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: Test regression prevention: timeout/slow test failures indicate source-code regression; never edit `tests/baselines/unit_timing.json` or any `scripts/*_baseline.{txt,json}` / `scripts/_*_baseline.py`; PreToolUse-blocked; per-invocation bypass: `ALLOW_BASELINE_GROWTH=1 git commit ...` requires explicit user approval
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: After issue completion: branch + commit + push (no auto-PR); use `/pre-pr-review` instead; `gh pr create` is blocked by `scripts/check_no_pr_create.sh`; after PR: use `/aurelio-review-pr` for external feedback; fix all valid findings; no deferring
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: MCP: 200+ tools across 15 domain modules under `meta/mcp/domains/`; define `ToolHandler` + `args_model`; call `require_admin_guardrails()` on admin tools; route through service layers; see `mcp-handler-contract.md`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: Telemetry: opt-in and off by default; every event property must be in `_ALLOWED_PROPERTIES`; per `telemetry.md`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: Resilience: provider calls go through `BaseCompletionProvider` (retry + rate limit); never implement retry in driver subclasses; retryable errors: `RateLimitError`, `Provider{Timeout,Connection,Internal}Error`; non-provider transient I/O uses `GeneralRetryHandler` with `retryable` predicate
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: WebSocket frame-level timeout closes silent peers (1008); revalidation saturation closes (4011)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: Conversational propose: `POST /meta/chat/propose` is opt-in (`meta.chief_of_staff.propose_enabled`); `ChiefOfStaffProposer` is built by `build_chief_of_staff_proposer` (ENFORCED manifest entry); 503s when ANY of provider, connected persistence, or work pipeline is missing; human content wrapped via `wrap_untrusted(TAG_TASK_DATA, ...)`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: API startup lifecycle: construction phase (`create_app` body) wires synchronous services; on_startup phase (`_build_lifecycle.on_startup`) wires services needing persistence backend; construction-phase ordering: `agent_registry` before `auto_wire_meetings`; `tunnel_provider` unconditional; on-startup ordering: `SettingsService` before `WorkflowExecutionObserver`; `OntologyService` after `persistence.connect()`; cost-dial services via `_try_wire_cost_dial` after persistence, best-effort idempotent; knowledge substrate via `_wire_knowledge_engine` best-effort gated on `has_persistence` AND `has_memory_backend`; `EnvironmentService` wired via `_install_runtime_services`; `setup_complete` serialized under `COMPLETE_LOCK`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: Runtime services: `build_runtime_services` selects via ONE provider-present switch; returns `RuntimeServices` pair (worker execution service + coordinator) from SINGLE shared `AgentEngine`; `_install_runtime_services` boots both via `AppState.worker_execution_service` and `AppState.coordinator` seams, appended FIRST; `swap_worker_execution_service` / `swap_coordinator` / `swap_provider_registry` hold lock; empty-company rejects at controller (`AgentRuntimeNotConfiguredError`, 4014) and `/coordinate` 503s
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: Git: commits use `<type>: <description>` format (feat/fix/refactor/docs/test/chore/perf/ci), enforced by commitizen; signed commits required on protected refs (GPG/SSH or GitHub App `synthorg-repo-bot`); branches named `<type>/<slug>` from main; pre-commit/pre-push hooks via `.pre-commit-config.yaml`; tool-call gates in `.claude/settings.json` PreToolUse
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: Squash merge: PR body becomes squash commit message; trailers (`Release-As`, `Closes `#N``) must be in PR body; after every squash merge run `/post-merge-cleanup`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T09:42:30.751Z
Learning: CLI is Docker-only for init/start/stop/status operations; new features go in dashboard and REST API, not CLI
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/unit/tools/sandbox/test_protocol.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/engine/workspace/environment/nix.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • src/synthorg/workers/runtime_builder.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

  • tests/unit/tools/sandbox/test_protocol.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/engine/workspace/environment/nix.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • src/synthorg/workers/runtime_builder.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

  • src/synthorg/observability/events/workspace.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
🔇 Additional comments (8)
tests/unit/tools/sandbox/test_protocol.py (1)

40-47: LGTM!

src/synthorg/observability/events/workspace.py (1)

80-80: LGTM!

src/synthorg/engine/workspace/environment/nix.py (1)

33-33: LGTM!

Also applies to: 92-109, 134-139

tests/unit/engine/workspace/environment/test_devcontainer_strategy.py (1)

168-170: LGTM!

tests/integration/engine/workspace/test_environment_acceptance.py (1)

232-234: LGTM!

Also applies to: 246-248

src/synthorg/workers/runtime_builder.py (1)

736-740: LGTM!

tests/unit/engine/workspace/environment/test_manifest_strategy.py (1)

148-164: LGTM!

src/synthorg/engine/workspace/environment/devcontainer.py (1)

203-219: LGTM!


Walkthrough

Implements a pluggable per-project environment system (manifest/devcontainer/Nix) with provisioning, durable caching, and strict reuse semantics. Adds EnvironmentService, EnvironmentStrategy protocols, concrete strategies (manifest, devcontainer, nix), image-builder and committer helpers, contextvar-based ActiveSandboxEnvironment, and integration with Docker/Subprocess sandboxes and AgentEngineExecutionService. Persists provisioned rows via ProjectEnvironmentRepository with SQLite/Postgres implementations and migrations. Wires service at app startup, adds a SandboxEnvironmentRunner adapter, updates error codes and observability events, expands tests (unit, integration, conformance), and updates design/reference docs.

Suggested labels

autorelease: tagged

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a pluggable per-project reproducible environment subsystem supporting manifest, devcontainer, and Nix strategies. It includes the necessary domain models, persistence repositories, and worker integration to provision environments before agent execution. The review identifies a caching bug in the devcontainer strategy regarding default Dockerfiles, a potential IndexError in command list processing, and a memory leak risk in the service's lock management. Furthermore, a security inconsistency was noted where Docker sandboxes lack the environment variable screening applied to subprocess sandboxes.

Comment on lines +208 to +210
dockerfile = self._resolve_within(
workspace_path, path.parent, build.get("dockerfile")
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The declaration_hash method fails to include the default Dockerfile in the hash when the dockerfile key is omitted from the build configuration. This results in a caching bug where changes to the default Dockerfile do not trigger a re-provision. It should use the same default value ("Dockerfile") as the _build_image and managed_paths methods.

            dockerfile_path = build.get("dockerfile", "Dockerfile")
            dockerfile = self._resolve_within(
                workspace_path, path.parent, dockerfile_path
            )

Comment on lines +348 to +352
elif isinstance(post_create, list) and all(
isinstance(part, str) for part in post_create
):
parts: list[str] = [str(p) for p in post_create]
command, args = parts[0], tuple(parts[1:])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If postCreateCommand is an empty list in devcontainer.json, this block will be entered (as all([]) is true), but parts[0] will raise an IndexError. A check for a non-empty list should be added to handle this case gracefully.

Suggested change
elif isinstance(post_create, list) and all(
isinstance(part, str) for part in post_create
):
parts: list[str] = [str(p) for p in post_create]
command, args = parts[0], tuple(parts[1:])
elif isinstance(post_create, list) and post_create and all(
isinstance(part, str) for part in post_create
):
parts: list[str] = [str(p) for p in post_create]
command, args = parts[0], tuple(parts[1:])

Comment on lines +89 to +95
self._locks: dict[str, asyncio.Lock] = {}
self._locks_guard = asyncio.Lock()

async def _lock_for(self, project_id: str) -> asyncio.Lock:
"""Return the per-project provisioning lock (created once)."""
async with self._locks_guard:
return self._locks.setdefault(project_id, asyncio.Lock())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The _locks dictionary grows indefinitely as new projects are provisioned, which can lead to a memory leak in long-lived deployments. Since locks are only needed during the active provisioning of a project, using a weakref.WeakValueDictionary would allow them to be garbage collected once no longer in use. Additionally, the _locks_guard is redundant for dictionary operations in a single-threaded asyncio event loop.

Suggested change
self._locks: dict[str, asyncio.Lock] = {}
self._locks_guard = asyncio.Lock()
async def _lock_for(self, project_id: str) -> asyncio.Lock:
"""Return the per-project provisioning lock (created once)."""
async with self._locks_guard:
return self._locks.setdefault(project_id, asyncio.Lock())
from weakref import WeakValueDictionary
self._locks: WeakValueDictionary[str, asyncio.Lock] = WeakValueDictionary()
async def _lock_for(self, project_id: str) -> asyncio.Lock:
"""Return the per-project provisioning lock (created once)."""
# WeakValueDictionary.setdefault is atomic in this context.
return self._locks.setdefault(project_id, asyncio.Lock())

Comment on lines +719 to +720
if active_env is not None and active_env.env_additions:
effective_overrides = {**active_env.env_additions, **(env_overrides or {})}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

Unlike SubprocessSandbox, DockerSandbox does not screen active_env.env_additions through the secret/dangerous-variable denylist. While Docker provides container isolation, environment variables like LD_PRELOAD or PYTHONPATH can still be used to hijack tool execution integrity within the sandbox. For consistency and security, these additions should be screened before being merged into the execution environment, similar to the implementation in SubprocessSandbox.execute.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/tools/sandbox/docker_sandbox.py (1)

716-745: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Include the active environment in the reuse identity.

Line 724 still resolves reusable containers by owner/project only, so _acquire_owner_handle() can hand back a keep-alive container that was created under a different active_env. In that case the new image_override at Lines 717 and 744 is ignored, and any env baked into the old container also stays in play, so this run can diverge from the newly provisioned declaration. Please key reuse on the environment identity as well (for example declaration hash / image ref) or force recreation when the active environment changes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/tools/sandbox/docker_sandbox.py` around lines 716 - 745, The
reuse key for keep-alive containers currently only considers owner/project, so
_acquire_owner_handle() can return a container created under a different
active_env; update the reuse identity to include the active environment (e.g.
active_env id/hash or image_override) or force recreation when active_env
differs. Concretely, compute a combined reuse key that incorporates
get_active_sandbox_environment() (or active_env.image_override / a declaration
hash) alongside the existing owner_key/project id and pass that to
_acquire_owner_handle() / _create_keepalive_handle() (or detect mismatch and
recreate) so containers are only reused when the environment identity matches
the current active_env.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/core/enums.py`:
- Around line 486-502: The enums.py file is too large; extract domain-focused
enums into smaller modules and keep a stable re-export boundary: move the
EnvironmentType class into a new module (e.g., environment enums module) and any
related enum groups into their own modules, then import and re-export them from
the original enums module (using explicit from .environment import
EnvironmentType and include it in __all__) so existing imports keep working;
update any internal imports to point to the new modules if they reference the
moved symbols directly and run tests to ensure no import breakage.

In `@src/synthorg/engine/workspace/environment/devcontainer.py`:
- Around line 348-353: The code assumes non-empty postCreateCommand lists and
will IndexError when parts[0] is accessed; inside the branch that handles list
post_create (the variables parts, command, args), add a guard that rejects empty
lists (e.g., if not parts: raise a config/ValueError with a clear message that
postCreateCommand must not be empty) so callers get a config error instead of an
IndexError; then continue to set command = parts[0] and args = tuple(parts[1:]).
- Around line 206-213: The digest computation currently only includes
build.get("dockerfile") when present, missing the implicit default "Dockerfile";
update the logic in the method that computes the declaration hash (the block
using variables build, dockerfile, digest and calling self._resolve_within) so
that when build is a dict and build.get("dockerfile") is unset or None you
resolve and include path.parent / "Dockerfile" (or otherwise treat the default
"Dockerfile") by checking if that default file exists and, if so, updating
digest with its bytes before returning NotBlankStr(digest.hexdigest()).

In `@src/synthorg/engine/workspace/environment/image_builder.py`:
- Around line 87-89: Wrap the proc.kill() call in a ProcessLookupError
suppressor so a race where the process exits just before kill() doesn’t raise
and break the timeout/cancel recovery; specifically modify the block that
currently does "proc.kill(); with contextlib.suppress(ProcessLookupError): await
asyncio.shield(proc.wait())" to suppress ProcessLookupError around proc.kill()
(or use a try/except around proc.kill()) and keep the existing
contextlib.suppress around awaiting proc.wait() to preserve cancellation
handling for the proc variable.

In `@src/synthorg/observability/events/persistence.py`:
- Around line 328-349: The new project-environment event constants
(PERSISTENCE_PROJECT_ENVIRONMENT_SAVE_FAILED,
PERSISTENCE_PROJECT_ENVIRONMENT_FETCHED,
PERSISTENCE_PROJECT_ENVIRONMENT_FETCH_FAILED,
PERSISTENCE_PROJECT_ENVIRONMENT_LISTED,
PERSISTENCE_PROJECT_ENVIRONMENT_LIST_FAILED,
PERSISTENCE_PROJECT_ENVIRONMENT_DELETE_FAILED,
PERSISTENCE_PROJECT_ENVIRONMENT_DESERIALIZE_FAILED) should be moved out of the
large persistence.py into a domain-specific observability module (e.g., create
src/synthorg/observability/events/persistence_project.py or
src/synthorg/observability/events/project.py), export them from that new module,
and update any imports that referenced them to import from the new module;
ensure the original persistence.py stays under the 800-line cap and add an
__init__.py export if needed to preserve existing import paths.

In `@src/synthorg/persistence/postgres/project_environment_repo.py`:
- Line 104: Replace the "SELECT *" usage with an explicit column list matching
the ProjectEnvironment model fields when hydrating Pydantic models: update the
query string "SELECT * FROM project_environments WHERE project_id = %s" and the
other query at lines 159-160 to select only the mapped columns (e.g., id,
project_id, name, config, created_at, updated_at — use the actual field names
from the ProjectEnvironment model). Ensure both queries return the same ordered
column names expected by the code that constructs ProjectEnvironment so
deserialization does not break with extra="forbid".
- Around line 35-41: _row_to_environment currently builds a dict from the DB row
and calls ProjectEnvironment.model_validate directly, bypassing the project's
boundary-typing rule; replace that final validation with the boundary parser by
calling parse_typed(ProjectEnvironment, data) (after you still coerce
environment_type with EnvironmentType(...) and timestamps via
coerce_row_timestamp for "provisioned_at" and "updated_at") so the external DB
dict is validated through parse_typed rather than model_validate.

In
`@src/synthorg/persistence/postgres/revisions/20260521000002_project_environments.sql`:
- Around line 11-20: Add DB-level CHECKs on the project_environments table to
forbid empty strings for identifier/hash columns and to ensure timestamps are
ordered: update the CREATE TABLE for project_environments to add CHECKs such as
requiring trim(project_id) <> '' and trim(declaration_hash) <> '' (and any other
identifier columns you want nonblank) and a CHECK (updated_at >= provisioned_at)
to prevent updated_at preceding provisioned_at; mirror the same constraints in
the shared schema definition (src/synthorg/persistence/postgres/schema.sql) so
migrations and drift checks stay consistent.

In `@src/synthorg/workers/execution_service.py`:
- Around line 482-487: The resume path is missing the sandbox context: wrap the
call to resume_parked_run inside the active_sandbox_environment context manager
so resumed executions get the same provisioned image/env overrides as
execute_once; specifically, in the _resume_parked method move or add a with
active_sandbox_environment(active_env): block around the call to
self._engine.resume_parked_run(...) (or resume_parked_run if delegated), passing
the same active_env used by execute_once to ensure resumed runs use the sandbox
environment.

In `@tests/conformance/persistence/test_project_environment_repository.py`:
- Around line 28-29: The default image_ref generation uses a hardcoded "proj-1"
instead of the helper's project_id; update the branch that sets image_ref when
image_ref is None and EnvironmentType.DEVCONTAINER to incorporate the provided
project_id (e.g., build the string using project_id rather than "proj-1") so the
default becomes deterministic for different project ids (use the existing
image_ref variable, EnvironmentType.DEVCONTAINER branch, and the helper's
project_id parameter to construct the default).

In `@tests/integration/engine/workspace/test_environment_acceptance.py`:
- Around line 183-185: In _plant_deliverable, ensure git commands don't silently
fail by asserting their success: when calling _run("git", "add", "-A",
cwd=workspace) and _run("git", "commit", "-m", "Deliverable project",
cwd=workspace) check the returned result (or enable check=True) and raise/assert
on non-zero exit, including the captured stdout/stderr in the error message so
failures in _run are surfaced immediately; update the calls in
_plant_deliverable to validate return codes and include command output for
debugging.

In `@tests/unit/engine/workspace/environment/test_manifest_strategy.py`:
- Around line 144-157: The test_traversal_lockfile_not_read currently always
mutates outside_secret.txt which only validates the "../outside_secret.txt"
input; for the "sub/../../escape" case you must mutate the actual resolved
escaped target instead so the assertion verifies the strategy.declaration_hash
ignores the resolved path bytes. Update the test_traversal_lockfile_not_read
(using _strategy, _write_manifest, and declaration_hash) to compute the resolved
path for the given escape string (resolve tmp_path / escape) and write the
changed content to that resolved file for the second parameter instead of always
writing to outside_secret.txt.

---

Outside diff comments:
In `@src/synthorg/tools/sandbox/docker_sandbox.py`:
- Around line 716-745: The reuse key for keep-alive containers currently only
considers owner/project, so _acquire_owner_handle() can return a container
created under a different active_env; update the reuse identity to include the
active environment (e.g. active_env id/hash or image_override) or force
recreation when active_env differs. Concretely, compute a combined reuse key
that incorporates get_active_sandbox_environment() (or active_env.image_override
/ a declaration hash) alongside the existing owner_key/project id and pass that
to _acquire_owner_handle() / _create_keepalive_handle() (or detect mismatch and
recreate) so containers are only reused when the environment identity matches
the current active_env.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: f65effcd-7768-49ff-b716-88487d26b13e

📥 Commits

Reviewing files that changed from the base of the PR and between 9b98312 and 39af44d.

📒 Files selected for processing (64)
  • CLAUDE.md
  • docs/DESIGN_SPEC.md
  • docs/design/coordination.md
  • docs/design/persistence.md
  • docs/reference/errors.md
  • docs/reference/pluggable-subsystems.md
  • src/synthorg/api/app.py
  • src/synthorg/api/state.py
  • src/synthorg/core/enums.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/core/project_environment.py
  • src/synthorg/engine/errors.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/engine/workspace/environment/config.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/observability/events/persistence.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/persistence/postgres/backend.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • src/synthorg/persistence/postgres/revisions/20260521000002_project_environments.sql
  • src/synthorg/persistence/postgres/schema.sql
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • src/synthorg/persistence/sqlite/backend.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • src/synthorg/persistence/sqlite/revisions/20260521000002_project_environments.sql
  • src/synthorg/persistence/sqlite/schema.sql
  • src/synthorg/tools/sandbox/active_environment.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • src/synthorg/workers/environment_runner.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/workers/runtime_builder.py
  • tests/conformance/persistence/test_project_environment_repository.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • tests/unit/api/fakes.py
  • tests/unit/api/fakes_backend.py
  • tests/unit/core/test_project_environment_model.py
  • tests/unit/engine/test_environment_errors.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/unit/engine/workspace/environment/test_env_factory.py
  • tests/unit/engine/workspace/environment/test_env_service.py
  • tests/unit/engine/workspace/environment/test_hash_cache.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • tests/unit/engine/workspace/environment/test_nix_strategy.py
  • tests/unit/persistence/sqlite/test_project_environment_repo.py
  • tests/unit/persistence/test_protocol.py
  • tests/unit/tools/sandbox/test_active_environment.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/unit/workers/test_environment_runner.py
  • tests/unit/workers/test_execution_service_environment.py
  • web/src/api/types/error-codes.gen.ts
  • web/src/api/types/openapi.gen.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (13)
web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

  • web/src/api/types/openapi.gen.ts
  • web/src/api/types/error-codes.gen.ts
web/src/api/types/**/*.gen.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Generated DTO types (MANDATORY): NEVER hand-edit web/src/api/types/*.gen.ts. Regenerate with uv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').

Files:

  • web/src/api/types/openapi.gen.ts
  • web/src/api/types/error-codes.gen.ts
web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

  • web/src/api/types/openapi.gen.ts
  • web/src/api/types/error-codes.gen.ts
web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse components from web/src/components/ui/ and use design tokens only in web dashboard; see web/CLAUDE.md

Files:

  • web/src/api/types/openapi.gen.ts
  • web/src/api/types/error-codes.gen.ts
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Provide type hints on public functions; enforce with mypy strict; use Google-style docstrings; limit line length to 88 characters; keep functions <50 lines; keep files <800 lines

Files:

  • tests/unit/tools/sandbox/test_active_environment.py
  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/workers/environment_runner.py
  • tests/unit/workers/test_execution_service_environment.py
  • src/synthorg/observability/events/persistence.py
  • tests/unit/workers/test_environment_runner.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/core/enums.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/engine/workspace/environment/config.py
  • tests/unit/api/fakes_backend.py
  • tests/unit/engine/workspace/environment/test_env_factory.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • tests/unit/api/fakes.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • src/synthorg/persistence/postgres/backend.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/api/app.py
  • src/synthorg/tools/sandbox/active_environment.py
  • tests/unit/engine/test_environment_errors.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/engine/errors.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/workers/runtime_builder.py
  • tests/unit/persistence/sqlite/test_project_environment_repo.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • tests/unit/persistence/test_protocol.py
  • tests/unit/engine/workspace/environment/test_hash_cache.py
  • tests/unit/engine/workspace/environment/test_nix_strategy.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/api/state.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/core/project_environment.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/conformance/persistence/test_project_environment_repository.py
  • tests/unit/core/test_project_environment_model.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • tests/unit/engine/workspace/environment/test_env_service.py
  • src/synthorg/engine/workspace/environment/nix.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • src/synthorg/persistence/sqlite/backend.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; 30s global timeout; 80% min coverage; xdist -n 8 --dist=loadfile auto-applied via pyproject; Windows unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); override in subprocess tests

Use FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; never use bare MagicMock at typed boundaries (blocked by check_mock_spec.py); import from tests._shared

Hypothesis: use 10 deterministic CI examples; failures are real bugs (fix + add @example(...) assertion); never skip/xfail flaky tests; fix fundamentally; use asyncio.Event().wait() instead of sleep(large)

Files:

  • tests/unit/tools/sandbox/test_active_environment.py
  • tests/unit/workers/test_execution_service_environment.py
  • tests/unit/workers/test_environment_runner.py
  • tests/unit/api/fakes_backend.py
  • tests/unit/engine/workspace/environment/test_env_factory.py
  • tests/unit/api/fakes.py
  • tests/unit/engine/test_environment_errors.py
  • tests/unit/persistence/sqlite/test_project_environment_repo.py
  • tests/unit/persistence/test_protocol.py
  • tests/unit/engine/workspace/environment/test_hash_cache.py
  • tests/unit/engine/workspace/environment/test_nix_strategy.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/conformance/persistence/test_project_environment_repository.py
  • tests/unit/core/test_project_environment_model.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • tests/unit/engine/workspace/environment/test_env_service.py
  • tests/integration/engine/workspace/test_environment_acceptance.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/tools/sandbox/test_active_environment.py
  • tests/unit/workers/test_execution_service_environment.py
  • tests/unit/workers/test_environment_runner.py
  • tests/unit/api/fakes_backend.py
  • tests/unit/engine/workspace/environment/test_env_factory.py
  • tests/unit/api/fakes.py
  • tests/unit/engine/test_environment_errors.py
  • tests/unit/persistence/sqlite/test_project_environment_repo.py
  • tests/unit/persistence/test_protocol.py
  • tests/unit/engine/workspace/environment/test_hash_cache.py
  • tests/unit/engine/workspace/environment/test_nix_strategy.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/conformance/persistence/test_project_environment_repository.py
  • tests/unit/core/test_project_environment_model.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • tests/unit/engine/workspace/environment/test_env_service.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

No hardcoded numeric values; place numerics in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal); enforced by check_no_magic_numbers.py

Comments must explain WHY only; no reviewer citations, issue back-references, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py

Do not use from __future__ import annotations (Python 3.14 has PEP 649); use PEP 758 exception syntax without parens unless binding (except A, B: no parens)

Define errors with <Domain><Condition>Error pattern inheriting from DomainError, never directly from Exception/RuntimeError; enforced by check_domain_error_hierarchy.py

Use Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate: check_frozen_model_extra_forbid.py); @computed_field is auto-exempt; use # lint-allow: frozen-extra-forbid -- <reason> for exceptions; use @computed_field for derived fields; use NotBlankStr for identifiers

Place args models at every system boundary; use parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py

Use immutability: model_copy(update=...) for Pydantic models or copy.deepcopy() for other types; apply deepcopy at system boundaries

Use asyncio.TaskGroup for fan-out/fan-in async patterns; ensure helpers catch Exception and re-raise MemoryError/RecursionError

Implement Clock seam: clock: Clock | None = None parameter; inject FakeClock in tests; services own _lifecycle_lock; mark timed-out stops unrestartable

Wrap untrusted content using wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML content (SEC-1)

Use from synthorg.observability import get_logger with variable named logger; never import logging or use print() in app code

Use event names from observability.events.<domain> constants; log v...

Files:

  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/workers/environment_runner.py
  • src/synthorg/observability/events/persistence.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/core/enums.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/engine/workspace/environment/config.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • src/synthorg/persistence/postgres/backend.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/api/app.py
  • src/synthorg/tools/sandbox/active_environment.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/engine/errors.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/api/state.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/core/project_environment.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/persistence/sqlite/backend.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/workers/environment_runner.py
  • src/synthorg/observability/events/persistence.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/core/enums.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/engine/workspace/environment/config.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • src/synthorg/persistence/postgres/backend.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/api/app.py
  • src/synthorg/tools/sandbox/active_environment.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/engine/errors.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/api/state.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/core/project_environment.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/persistence/sqlite/backend.py
src/synthorg/workers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Runtime services: synthorg.workers.runtime_builder.build_runtime_services selects behind ONE provider-present switch, returns RuntimeServices pair (execution service + coordinator); AgentEngineExecutionService + coordinator vs. NoProviderExecutionService + None as empty-company backstop; install via AppState seams; empty-company rejects task creation (4014) and /coordinate 503s

Files:

  • src/synthorg/workers/environment_runner.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/workers/execution_service.py
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Use d2 for architecture/nested container diagrams; use mermaid for flowcharts/sequence/pipeline diagrams; use Markdown tables for tabular data; pin D2 CLI to v0.7.1 in CI

Files:

  • docs/reference/pluggable-subsystems.md
  • docs/design/persistence.md
  • docs/design/coordination.md
  • docs/reference/errors.md
  • docs/DESIGN_SPEC.md
src/synthorg/observability/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Telemetry is opt-in and off by default; every event property must be in _ALLOWED_PROPERTIES

Files:

  • src/synthorg/observability/events/persistence.py
  • src/synthorg/observability/events/workspace.py
src/synthorg/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement repository CRUD using: save(entity), get(id), delete(id) -> bool, list_items(...), query(...) returning tuples

Use parse_iso_utc / format_iso_utc from persistence._shared for datetime persistence (reject naive); use normalize_utc for already-typed values

Files:

  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • src/synthorg/persistence/postgres/backend.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • src/synthorg/persistence/sqlite/backend.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Two-phase API startup lifecycle: construction (create_app body) wires synchronous services; on_startup (_build_lifecycle.on_startup) wires services needing connected persistence backend; construction-phase ordering: agent_registry BEFORE auto_wire_meetings; tunnel_provider wired unconditionally

On-startup ordering invariants: SettingsService auto-wire precedes WorkflowExecutionObserver registration; OntologyService wires after persistence.connect(); cost-dial services wire via _try_wire_cost_dial AFTER persistence (best-effort, idempotent, logs BUDGET_FORECAST_UNAVAILABLE); knowledge substrate wires via _wire_knowledge_engine (best-effort, gated on has_persistence AND has_memory_backend); EnvironmentService wires in _install_runtime_services behind has_persistence

Files:

  • src/synthorg/api/app.py
  • src/synthorg/api/state.py
tests/conformance/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Dual-backend conformance: tests/conformance/persistence/ consumes backend fixture (SQLite + Postgres); enforced by check_dual_backend_test_parity.py

Files:

  • tests/conformance/persistence/test_project_environment_repository.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T06:44:44.817Z
Learning: Read design specification in `docs/design/` before implementing; deviations require approval
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T06:44:44.817Z
Learning: Present every implementation plan for accept/deny before coding
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T06:44:44.817Z
Learning: No region/currency/locale should be privileged; use metric units and British English
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T06:44:44.817Z
Learning: Only `src/synthorg/persistence/` may import sqlite/psycopg or emit raw SQL; new repositories inherit generic categories from `persistence/_generics.py` and use `runtime_checkable` protocols
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T06:44:44.817Z
Learning: Every convention PR must ship its enforcement gate via `check_*` script and PreToolUse hook registration
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T06:44:44.817Z
Learning: Configuration must follow precedence: DB > env > code default via `SettingsService`/`ConfigResolver` (Cat-1), or env > code default (Cat-2, `read_only_post_init`); Cat-3 bootstrap secrets are pure env; use `settings.bootstrap_resolver.resolve_init_value` for pre-init Cat-2 reads
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T06:44:44.817Z
Learning: After issue: branch + commit + push (no auto-PR); use `/pre-pr-review`; do not use `gh pr create` (blocked by `check_no_pr_create.sh`); after PR: use `/aurelio-review-pr` for external feedback
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T06:44:44.817Z
Learning: Git commits: `<type>: <description>` (feat/fix/refactor/docs/test/chore/perf/ci); commitizen-enforced; signed commits required on protected refs (GPG/SSH or GitHub App via `synthorg-repo-bot`); branches: `<type>/<slug>` from main
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T06:44:44.817Z
Learning: Pre-commit/pre-push hooks defined in `.pre-commit-config.yaml`; tool-call gates: `.claude/settings.json` PreToolUse (`scripts/check_*.sh`/`.py`)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T06:44:44.817Z
Learning: Use squash merge; PR body becomes squash commit; trailers (`Release-As`, `Closes `#N``) must be in PR body
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T06:44:44.817Z
Learning: Use `gh issue list` via Bash for GitHub queries, NOT MCP `list_issues`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T06:44:44.817Z
Learning: After every squash merge, run `/post-merge-cleanup`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T06:44:44.817Z
Learning: CLI is Docker-only (init/start/stop/status); new features go in dashboard + REST API, not CLI
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/unit/tools/sandbox/test_active_environment.py
  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/workers/environment_runner.py
  • tests/unit/workers/test_execution_service_environment.py
  • src/synthorg/observability/events/persistence.py
  • tests/unit/workers/test_environment_runner.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/core/enums.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/engine/workspace/environment/config.py
  • tests/unit/api/fakes_backend.py
  • tests/unit/engine/workspace/environment/test_env_factory.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • tests/unit/api/fakes.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • src/synthorg/persistence/postgres/backend.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/api/app.py
  • src/synthorg/tools/sandbox/active_environment.py
  • tests/unit/engine/test_environment_errors.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/engine/errors.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/workers/runtime_builder.py
  • tests/unit/persistence/sqlite/test_project_environment_repo.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • tests/unit/persistence/test_protocol.py
  • tests/unit/engine/workspace/environment/test_hash_cache.py
  • tests/unit/engine/workspace/environment/test_nix_strategy.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/api/state.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/core/project_environment.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/conformance/persistence/test_project_environment_repository.py
  • tests/unit/core/test_project_environment_model.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • tests/unit/engine/workspace/environment/test_env_service.py
  • src/synthorg/engine/workspace/environment/nix.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • src/synthorg/persistence/sqlite/backend.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

  • tests/unit/tools/sandbox/test_active_environment.py
  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/workers/environment_runner.py
  • tests/unit/workers/test_execution_service_environment.py
  • src/synthorg/observability/events/persistence.py
  • tests/unit/workers/test_environment_runner.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/core/enums.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/engine/workspace/environment/config.py
  • tests/unit/api/fakes_backend.py
  • tests/unit/engine/workspace/environment/test_env_factory.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • tests/unit/api/fakes.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • src/synthorg/persistence/postgres/backend.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/api/app.py
  • src/synthorg/tools/sandbox/active_environment.py
  • tests/unit/engine/test_environment_errors.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/engine/errors.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/workers/runtime_builder.py
  • tests/unit/persistence/sqlite/test_project_environment_repo.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • tests/unit/persistence/test_protocol.py
  • tests/unit/engine/workspace/environment/test_hash_cache.py
  • tests/unit/engine/workspace/environment/test_nix_strategy.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/api/state.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/core/project_environment.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/conformance/persistence/test_project_environment_repository.py
  • tests/unit/core/test_project_environment_model.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • tests/unit/engine/workspace/environment/test_env_service.py
  • src/synthorg/engine/workspace/environment/nix.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • src/synthorg/persistence/sqlite/backend.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/workers/environment_runner.py
  • src/synthorg/observability/events/persistence.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/core/enums.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/engine/workspace/environment/config.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • src/synthorg/persistence/postgres/backend.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/api/app.py
  • src/synthorg/tools/sandbox/active_environment.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/engine/errors.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/api/state.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/core/project_environment.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/persistence/sqlite/backend.py
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

  • docs/reference/pluggable-subsystems.md
  • CLAUDE.md
  • docs/design/persistence.md
  • docs/design/coordination.md
  • docs/reference/errors.md
  • docs/DESIGN_SPEC.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

  • docs/reference/pluggable-subsystems.md
  • CLAUDE.md
  • docs/design/persistence.md
  • docs/design/coordination.md
  • docs/reference/errors.md
  • docs/DESIGN_SPEC.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

  • docs/reference/pluggable-subsystems.md
  • docs/design/persistence.md
  • docs/design/coordination.md
  • docs/reference/errors.md
  • docs/DESIGN_SPEC.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

  • docs/reference/pluggable-subsystems.md
  • docs/design/persistence.md
  • docs/design/coordination.md
  • docs/reference/errors.md
  • docs/DESIGN_SPEC.md
🔇 Additional comments (53)
src/synthorg/core/error_taxonomy.py (1)

119-119: LGTM!

Also applies to: 173-175

src/synthorg/core/project_environment.py (1)

1-78: LGTM!

src/synthorg/engine/errors.py (1)

301-353: LGTM!

src/synthorg/observability/events/workspace.py (1)

68-82: LGTM!

src/synthorg/engine/workspace/environment/__init__.py (1)

1-67: LGTM!

src/synthorg/engine/workspace/environment/config.py (1)

1-92: LGTM!

src/synthorg/engine/workspace/environment/templates.py (1)

1-57: LGTM!

src/synthorg/engine/workspace/environment/protocol.py (1)

28-201: LGTM!

src/synthorg/engine/workspace/environment/manifest.py (1)

48-259: LGTM!

src/synthorg/engine/workspace/environment/nix.py (1)

47-157: LGTM!

src/synthorg/engine/workspace/environment/hash_cache.py (1)

22-43: LGTM!

src/synthorg/engine/workspace/environment/committer.py (1)

24-92: LGTM!

src/synthorg/engine/workspace/environment/factory.py (4)

1-28: LGTM!


30-68: LGTM!


70-77: LGTM!


80-99: LGTM!

CLAUDE.md (1)

81-81: LGTM!

docs/DESIGN_SPEC.md (1)

24-24: LGTM!

docs/design/coordination.md (1)

431-478: LGTM!

docs/design/persistence.md (1)

85-85: LGTM!

docs/reference/errors.md (1)

105-106: LGTM!

Also applies to: 162-171

docs/reference/pluggable-subsystems.md (1)

179-188: LGTM!

web/src/api/types/error-codes.gen.ts (1)

67-67: LGTM!

Also applies to: 113-115

web/src/api/types/openapi.gen.ts (1)

8048-8048: LGTM!

tests/unit/api/fakes.py (1)

633-662: LGTM!

tests/unit/api/fakes_backend.py (1)

42-42: LGTM!

Also applies to: 605-605, 727-730

tests/unit/core/test_project_environment_model.py (1)

15-99: LGTM!

tests/unit/engine/test_environment_errors.py (1)

19-62: LGTM!

tests/unit/engine/workspace/environment/test_devcontainer_strategy.py (1)

32-301: LGTM!

tests/unit/engine/workspace/environment/test_env_factory.py (1)

22-46: LGTM!

tests/unit/engine/workspace/environment/test_env_service.py (1)

1-285: LGTM!

tests/unit/engine/workspace/environment/test_hash_cache.py (1)

1-65: LGTM!

tests/unit/engine/workspace/environment/test_nix_strategy.py (1)

1-104: LGTM!

tests/unit/persistence/sqlite/test_project_environment_repo.py (1)

1-115: LGTM!

tests/unit/persistence/test_protocol.py (1)

44-46: LGTM!

Also applies to: 82-83, 617-637, 1139-1145, 1459-1470

tests/unit/tools/sandbox/test_active_environment.py (1)

1-100: LGTM!

tests/unit/tools/sandbox/test_subprocess_sandbox.py (1)

100-115: LGTM!

tests/unit/workers/test_environment_runner.py (1)

1-62: LGTM!

tests/unit/workers/test_execution_service_environment.py (1)

1-124: LGTM!

src/synthorg/persistence/project_environment_protocol.py (1)

13-88: LGTM!

src/synthorg/persistence/protocol.py (1)

111-113: LGTM!

Also applies to: 428-431

src/synthorg/persistence/sqlite/_backend_accessors.py (1)

89-91: LGTM!

Also applies to: 155-155, 324-329

src/synthorg/persistence/sqlite/backend.py (1)

124-126: LGTM!

Also applies to: 219-219, 292-292, 455-458

src/synthorg/persistence/sqlite/project_environment_repo.py (1)

35-225: LGTM!

src/synthorg/persistence/sqlite/revisions/20260521000002_project_environments.sql (1)

11-23: LGTM!

src/synthorg/persistence/sqlite/schema.sql (1)

455-468: LGTM!

src/synthorg/persistence/postgres/backend.py (1)

127-129: LGTM!

Also applies to: 251-253, 310-310, 379-379, 445-445, 709-714

src/synthorg/engine/workspace/environment/service.py (1)

1-236: LGTM!

src/synthorg/workers/environment_runner.py (1)

1-65: LGTM!

src/synthorg/workers/runtime_builder.py (1)

31-32: LGTM!

Also applies to: 55-58, 677-695

src/synthorg/api/app.py (1)

1250-1278: LGTM!

src/synthorg/api/state.py (1)

165-166: LGTM!

Also applies to: 253-254, 504-505, 1060-1076

src/synthorg/tools/sandbox/active_environment.py (1)

1-66: LGTM!

Comment on lines +486 to +502
class EnvironmentType(StrEnum):
"""Discriminator selecting how a project declares its dev environment.

``MANIFEST`` is the safe default: a backend-agnostic bootstrap manifest
(committed lockfiles plus ordered setup commands) that provisions into
the mounted workspace and runs in both the subprocess and Docker
sandboxes, and emits a stock ``bootstrap.sh`` so a fresh clone is
reproducible without the product present. ``DEVCONTAINER`` builds a
sealed Docker image from ``.devcontainer/devcontainer.json`` (Docker
backend only). ``NIX`` provisions a hermetic environment from a
``flake.nix`` via ``nix develop``.
"""

MANIFEST = "manifest"
DEVCONTAINER = "devcontainer"
NIX = "nix"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Split this module to comply with the Python file-size limit.

This change adds to a module that is already over the limit (src/synthorg/core/enums.py is 896 lines). Please split enums into domain-focused modules and re-export from a stable boundary.

As per coding guidelines: **/*.py: ... keep files <800 lines.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/core/enums.py` around lines 486 - 502, The enums.py file is too
large; extract domain-focused enums into smaller modules and keep a stable
re-export boundary: move the EnvironmentType class into a new module (e.g.,
environment enums module) and any related enum groups into their own modules,
then import and re-export them from the original enums module (using explicit
from .environment import EnvironmentType and include it in __all__) so existing
imports keep working; update any internal imports to point to the new modules if
they reference the moved symbols directly and run tests to ensure no import
breakage.

Comment thread src/synthorg/engine/workspace/environment/devcontainer.py
Comment thread src/synthorg/engine/workspace/environment/devcontainer.py
Comment thread src/synthorg/engine/workspace/environment/image_builder.py Outdated
Comment on lines +328 to +349
# Project environment events
PERSISTENCE_PROJECT_ENVIRONMENT_SAVE_FAILED: Final[str] = (
"persistence.project_environment.save_failed"
)
PERSISTENCE_PROJECT_ENVIRONMENT_FETCHED: Final[str] = (
"persistence.project_environment.fetched"
)
PERSISTENCE_PROJECT_ENVIRONMENT_FETCH_FAILED: Final[str] = (
"persistence.project_environment.fetch_failed"
)
PERSISTENCE_PROJECT_ENVIRONMENT_LISTED: Final[str] = (
"persistence.project_environment.listed"
)
PERSISTENCE_PROJECT_ENVIRONMENT_LIST_FAILED: Final[str] = (
"persistence.project_environment.list_failed"
)
PERSISTENCE_PROJECT_ENVIRONMENT_DELETE_FAILED: Final[str] = (
"persistence.project_environment.delete_failed"
)
PERSISTENCE_PROJECT_ENVIRONMENT_DESERIALIZE_FAILED: Final[str] = (
"persistence.project_environment.deserialize_failed"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Move these new event constants into split observability modules.

At Line 328+, this addition increases a file already above the repository limit (src/synthorg/observability/events/persistence.py is 818 lines). Please split persistence events by domain (for example: project, workflow, security, etc.) and keep this file under the cap.

As per coding guidelines: **/*.py: ... keep files <800 lines.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/observability/events/persistence.py` around lines 328 - 349, The
new project-environment event constants
(PERSISTENCE_PROJECT_ENVIRONMENT_SAVE_FAILED,
PERSISTENCE_PROJECT_ENVIRONMENT_FETCHED,
PERSISTENCE_PROJECT_ENVIRONMENT_FETCH_FAILED,
PERSISTENCE_PROJECT_ENVIRONMENT_LISTED,
PERSISTENCE_PROJECT_ENVIRONMENT_LIST_FAILED,
PERSISTENCE_PROJECT_ENVIRONMENT_DELETE_FAILED,
PERSISTENCE_PROJECT_ENVIRONMENT_DESERIALIZE_FAILED) should be moved out of the
large persistence.py into a domain-specific observability module (e.g., create
src/synthorg/observability/events/persistence_project.py or
src/synthorg/observability/events/project.py), export them from that new module,
and update any imports that referenced them to import from the new module;
ensure the original persistence.py stays under the 800-line cap and add an
__init__.py export if needed to preserve existing import paths.

Comment thread src/synthorg/workers/execution_service.py
Comment thread tests/conformance/persistence/test_project_environment_repository.py Outdated
Comment thread tests/integration/engine/workspace/test_environment_acceptance.py Outdated
Comment thread tests/unit/engine/workspace/environment/test_manifest_strategy.py Outdated
Aureliolo added 12 commits May 22, 2026 09:46
Address pre-PR review findings on the per-project environment subsystem:

- Enforce model invariants: image_ref required only for DEVCONTAINER,
  SHA-256 declaration_hash format, provisioned_at not after updated_at,
  BuildOutcome timed_out paired with exit_code -1 (ProjectEnvironment and
  ProvisionedEnvironment).
- Guard manifest lockfile paths against workspace escape before hashing.
- Retry transient docker image builds via GeneralRetryHandler (timeout,
  registry, network markers); deterministic build failures are not retried.
- Screen declaration-sourced env additions through the subprocess denylist
  so a declared secret or exec-hijack var cannot bypass the filter.
- Reap the build subprocess on cancellation; include bounded command
  output in setup-failure logs; warn when env provisioning is skipped for
  a missing workspace; log the row-persisted state transition.
- Tests: concurrent provision, cancellation, build retry and exhaustion,
  lockfile traversal, env screening, model validators; slow marker on the
  acceptance test; concrete fake repo plus clear() for backend parity.
- Docs: error-code reference (4017, 4018, 8016-8025), CLAUDE.md wiring
  invariant, persistence protocol inventory, DESIGN_SPEC and coordination.

Pre-reviewed by 23 agents, all valid findings addressed.
Mirror the unit helper so the dual-backend conformance _environment()
auto-sets an image_ref for DEVCONTAINER, keeping it robust against the
new ProjectEnvironment model invariant if a future DEVCONTAINER-default
caller is added.
The internal retry sentinel must not inherit plain Exception (domain-error
hierarchy gate). Subclass EnvironmentDockerBuildError so it stays a typed
domain error while still serving as the retry-handler control signal.
Devcontainer: include implicit default Dockerfile in declaration hash; reject empty postCreateCommand lists instead of IndexError.

EnvironmentService: WeakValueDictionary for per-project locks (bounded memory); drop redundant guard.

image_builder: suppress ProcessLookupError around proc.kill() kill-race.

Persistence: explicit column lists instead of SELECT * (both backends); DB-level CHECK constraints (nonblank project_id/declaration_hash, updated_at >= provisioned_at) in revisions + schema.sql for both backends.

execution_service: scope resumed parked runs under the provisioned active_sandbox_environment so pause/resume stays reproducible.

Sandbox: screen declaration-sourced env additions through the secret/loader-injection denylist in DockerSandbox (parity with SubprocessSandbox); include the active environment image identity in the container reuse key so a warm container is never reused under a different image.

Tests: deterministic image_ref from project_id; assert git add/commit success; mutate the resolved escaped path in the traversal test.

Deferred to #2040 (not gate-enforced, scope-unrelated): split core/enums.py and observability/events/persistence.py into packages.
@Aureliolo Aureliolo force-pushed the feat/1994-reproducible-project-environments branch from 39af44d to 593e271 Compare May 22, 2026 08:38
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 22, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 22, 2026

Merging this PR will not alter performance

✅ 54 untouched benchmarks


Comparing feat/1994-reproducible-project-environments (469997f) with main (9b98312)

Open in CodSpeed

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/workers/runtime_builder.py (1)

731-740: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the resolved code-execution backend for owner release.

Line 736 still hardwires the docker backend, while Lines 726-730 resolve the effective backend for ToolCategory.CODE_EXECUTION. If code execution is mapped to subprocess, release can hit the wrong backend and skip lifecycle owner cleanup.

Suggested fix
     worker_execution_service = AgentEngineExecutionService(
         engine=engine,
         task_engine=app_state.task_engine,
         agent_registry=app_state.agent_registry,
         autonomy_resolver=autonomy_resolver,
-        sandbox_backend=sandbox_backends.get("docker"),
+        sandbox_backend=environment_runner_backend,
         lifecycle_strategy_kind=(app_state.config.sandboxing.docker.lifecycle.strategy),
         project_workspace_service=app_state.project_workspace_service,
         environment_service=app_state.environment_service,
         environment_runner_backend=environment_runner_backend,
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/workers/runtime_builder.py` around lines 731 - 740, The
AgentEngineExecutionService is being passed a hardcoded docker sandbox backend
(sandbox_backends.get("docker")) causing owner-release logic to use the wrong
backend; update the constructor call in runtime_builder.py to pass the resolved
code-execution backend variable (the value computed when resolving
ToolCategory.CODE_EXECUTION between lines 726-730) into the sandbox_backend
parameter and also use that resolved backend's lifecycle strategy instead of
app_state.config.sandboxing.docker.lifecycle.strategy so lifecycle owner cleanup
runs against the actual backend implementation (reference
AgentEngineExecutionService, sandbox_backend, lifecycle_strategy_kind, and the
ToolCategory.CODE_EXECUTION resolution logic).
♻️ Duplicate comments (2)
src/synthorg/persistence/postgres/project_environment_repo.py (1)

35-41: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use parse_typed() at the database dict-ingestion boundary.

Line [41] validates external dict_row input via ProjectEnvironment.model_validate(data). Use boundary parsing to match the repository rule and gate expectations.

Proposed fix
 from synthorg.core.project_environment import ProjectEnvironment
 from synthorg.core.types import NotBlankStr  # noqa: TC001
+from synthorg.core.validation import parse_typed
@@
-    return ProjectEnvironment.model_validate(data)
+    return parse_typed(ProjectEnvironment, data)

As per coding guidelines: "Args models required at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/persistence/postgres/project_environment_repo.py` around lines
35 - 41, The _row_to_environment function is accepting external db dicts and
currently calls ProjectEnvironment.model_validate(data); replace that boundary
validation with the project's typed parser by calling parse_typed for the
ProjectEnvironment at the dict-ingestion boundary, i.e. after converting row to
dict and coercing environment_type/provisioned_at/updated_at (keep
EnvironmentType(...) and coerce_row_timestamp calls) invoke
parse_typed(ProjectEnvironment, data) instead of
ProjectEnvironment.model_validate to enforce the repository boundary rule.
src/synthorg/engine/workspace/environment/image_builder.py (1)

89-91: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep proc.wait() outside the kill() suppress block.

Line [89]-Line [91] suppress ProcessLookupError for both operations, so when kill() races and raises, wait() is skipped. Keep suppression on kill() only, then always await reaping.

Proposed fix
-        with contextlib.suppress(ProcessLookupError):
-            proc.kill()
-            await asyncio.shield(proc.wait())
+        with contextlib.suppress(ProcessLookupError):
+            proc.kill()
+        with contextlib.suppress(ProcessLookupError):
+            await asyncio.shield(proc.wait())
#!/bin/bash
# Verify the cleanup block structure in image_builder.py
rg -n -A6 -B4 '_kill_and_reap|proc.kill\(|asyncio\.shield\(proc\.wait\(\)\)' src/synthorg/engine/workspace/environment/image_builder.py
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/engine/workspace/environment/image_builder.py` around lines 89 -
91, The current cleanup block wraps both proc.kill() and await
asyncio.shield(proc.wait()) inside contextlib.suppress(ProcessLookupError), so
if proc.kill() races and raises the exception the awaited reap is skipped;
change the code to suppress ProcessLookupError only around proc.kill() (e.g.,
use contextlib.suppress(ProcessLookupError): proc.kill()), then unconditionally
await asyncio.shield(proc.wait()) afterwards to ensure the process is always
reaped; update the block where proc.kill(), asyncio.shield(proc.wait()), and
proc are referenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/engine/workspace/environment/nix.py`:
- Around line 89-92: Add a logging call that records the configuration error
before raising EnvironmentConfigError: ensure the module has a logger (e.g.
logger = logging.getLogger(__name__)) and, in the flake existence check where
msg = f"nix flake {_FLAKE_FILENAME!r} not found" and in the other failure path
around lines 119-122, call logger.error(msg) (or logger.warning(msg) if
preferred) immediately before raising EnvironmentConfigError(msg); keep the same
msg variable so the log and raised error stay consistent and include any
available context/exception info when present.
- Around line 93-97: The declaration hash currently concatenates raw bytes from
flake (variable flake) and the lock file (lock / _FLAKE_LOCK_FILENAME) into
digest in sequence, which can collide across different file-boundary splits;
modify the hashing in the function that returns NotBlankStr(digest.hexdigest())
so you prefix or frame each input before updating the digest (for example
include a distinct separator token and/or the filename and/or a fixed-size
length prefix for flake.read_bytes() and lock.read_bytes()) ensuring you call
digest.update with framed input for flake and again for lock only if
lock.is_file(), so the two inputs are unambiguously delimited in the hash.

In `@tests/integration/engine/workspace/test_environment_acceptance.py`:
- Around line 232-234: The test currently assigns before = _run("git",
"rev-list", "--count", "HEAD", cwd=workspace).stdout.strip() and then
int(before) without verifying the _run result; change this to capture the result
(e.g., result = _run(...)) and assert result.returncode == 0 (or check
result.success) before converting stdout to int, failing with a clear message
that includes result.stderr; update the variable usage so
int(result.stdout.strip()) is only called after the assertion to avoid confusing
ValueError exceptions.

In `@tests/unit/engine/workspace/environment/test_devcontainer_strategy.py`:
- Line 168: The test writes a Dockerfile without an explicit encoding; update
the (tmp_path / ".devcontainer" / "Dockerfile").write_text(...) call in
test_devcontainer_strategy.py to include encoding="utf-8" to match the file's
established pattern (other write_text calls at Lines ~105/149/224/228/241); keep
the same content string ("FROM x\n") but add the encoding parameter.

---

Outside diff comments:
In `@src/synthorg/workers/runtime_builder.py`:
- Around line 731-740: The AgentEngineExecutionService is being passed a
hardcoded docker sandbox backend (sandbox_backends.get("docker")) causing
owner-release logic to use the wrong backend; update the constructor call in
runtime_builder.py to pass the resolved code-execution backend variable (the
value computed when resolving ToolCategory.CODE_EXECUTION between lines 726-730)
into the sandbox_backend parameter and also use that resolved backend's
lifecycle strategy instead of
app_state.config.sandboxing.docker.lifecycle.strategy so lifecycle owner cleanup
runs against the actual backend implementation (reference
AgentEngineExecutionService, sandbox_backend, lifecycle_strategy_kind, and the
ToolCategory.CODE_EXECUTION resolution logic).

---

Duplicate comments:
In `@src/synthorg/engine/workspace/environment/image_builder.py`:
- Around line 89-91: The current cleanup block wraps both proc.kill() and await
asyncio.shield(proc.wait()) inside contextlib.suppress(ProcessLookupError), so
if proc.kill() races and raises the exception the awaited reap is skipped;
change the code to suppress ProcessLookupError only around proc.kill() (e.g.,
use contextlib.suppress(ProcessLookupError): proc.kill()), then unconditionally
await asyncio.shield(proc.wait()) afterwards to ensure the process is always
reaped; update the block where proc.kill(), asyncio.shield(proc.wait()), and
proc are referenced.

In `@src/synthorg/persistence/postgres/project_environment_repo.py`:
- Around line 35-41: The _row_to_environment function is accepting external db
dicts and currently calls ProjectEnvironment.model_validate(data); replace that
boundary validation with the project's typed parser by calling parse_typed for
the ProjectEnvironment at the dict-ingestion boundary, i.e. after converting row
to dict and coercing environment_type/provisioned_at/updated_at (keep
EnvironmentType(...) and coerce_row_timestamp calls) invoke
parse_typed(ProjectEnvironment, data) instead of
ProjectEnvironment.model_validate to enforce the repository boundary rule.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: d7a9d554-10c2-4955-b843-cb12d881f70c

📥 Commits

Reviewing files that changed from the base of the PR and between 39af44d and 593e271.

📒 Files selected for processing (68)
  • CLAUDE.md
  • docs/DESIGN_SPEC.md
  • docs/design/coordination.md
  • docs/design/persistence.md
  • docs/reference/errors.md
  • docs/reference/pluggable-subsystems.md
  • src/synthorg/api/app.py
  • src/synthorg/api/state.py
  • src/synthorg/core/enums.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/core/project_environment.py
  • src/synthorg/engine/errors.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/engine/workspace/environment/config.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/observability/events/persistence.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/persistence/postgres/backend.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • src/synthorg/persistence/postgres/revisions/20260521000002_project_environments.sql
  • src/synthorg/persistence/postgres/schema.sql
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • src/synthorg/persistence/sqlite/backend.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • src/synthorg/persistence/sqlite/revisions/20260521000002_project_environments.sql
  • src/synthorg/persistence/sqlite/schema.sql
  • src/synthorg/tools/sandbox/active_environment.py
  • src/synthorg/tools/sandbox/config.py
  • src/synthorg/tools/sandbox/docker_config.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/tools/sandbox/protocol.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • src/synthorg/workers/environment_runner.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/workers/runtime_builder.py
  • tests/conformance/persistence/test_project_environment_repository.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • tests/unit/api/fakes.py
  • tests/unit/api/fakes_backend.py
  • tests/unit/core/test_project_environment_model.py
  • tests/unit/engine/test_environment_errors.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/unit/engine/workspace/environment/test_env_factory.py
  • tests/unit/engine/workspace/environment/test_env_service.py
  • tests/unit/engine/workspace/environment/test_hash_cache.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • tests/unit/engine/workspace/environment/test_nix_strategy.py
  • tests/unit/persistence/sqlite/test_project_environment_repo.py
  • tests/unit/persistence/test_protocol.py
  • tests/unit/tools/sandbox/test_active_environment.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/unit/workers/test_environment_runner.py
  • tests/unit/workers/test_execution_service.py
  • tests/unit/workers/test_execution_service_environment.py
  • web/src/api/types/error-codes.gen.ts
  • web/src/api/types/openapi.gen.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Backend
  • GitHub Check: Test Unit
  • GitHub Check: Test E2E
  • GitHub Check: Test Integration
  • GitHub Check: Dashboard Test
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Lighthouse Site
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (13)
**/*.{md,d2,mermaid}

📄 CodeRabbit inference engine (CLAUDE.md)

Diagrams: use d2 for architecture / nested containers, mermaid for flowcharts / sequence / pipelines; markdown tables for tabular data; D2 theme 200 (Dark Mauve), D2 CLI pinned to v0.7.1 in CI

Files:

  • docs/DESIGN_SPEC.md
  • docs/reference/pluggable-subsystems.md
  • docs/design/persistence.md
  • docs/reference/errors.md
  • docs/design/coordination.md
  • CLAUDE.md
web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

  • web/src/api/types/openapi.gen.ts
  • web/src/api/types/error-codes.gen.ts
web/src/api/types/**/*.gen.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Generated DTO types (MANDATORY): NEVER hand-edit web/src/api/types/*.gen.ts. Regenerate with uv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').

Files:

  • web/src/api/types/openapi.gen.ts
  • web/src/api/types/error-codes.gen.ts
web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

  • web/src/api/types/openapi.gen.ts
  • web/src/api/types/error-codes.gen.ts
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Always read docs/design/ page before implementing; deviations require approval per DESIGN_SPEC.md

Present every plan for accept/deny before coding

No region/currency/locale privilege; use metric units; British English per docs/reference/regional-defaults.md

Every convention PR must ship its enforcement gate per docs/reference/convention-gates.md

No hardcoded numeric values; numerics live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal); enforced by scripts/check_no_magic_numbers.py

Post-implementation + pre-PR: branch + commit + push (no auto-PR); use /pre-pr-review (gh pr create blocked by scripts/check_no_pr_create.sh); after PR use /aurelio-review-pr for external feedback; fix EVERYTHING valid, no deferring

Comments explain WHY only; no reviewer citations / issue back-refs / migration framing; enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py

Files:

  • tests/unit/engine/workspace/environment/test_env_factory.py
  • src/synthorg/engine/workspace/environment/config.py
  • src/synthorg/core/enums.py
  • src/synthorg/observability/events/persistence.py
  • src/synthorg/workers/environment_runner.py
  • src/synthorg/tools/sandbox/active_environment.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/tools/sandbox/config.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/tools/sandbox/protocol.py
  • tests/unit/api/fakes.py
  • src/synthorg/engine/errors.py
  • tests/unit/workers/test_execution_service.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/tools/sandbox/docker_config.py
  • tests/unit/tools/sandbox/test_active_environment.py
  • src/synthorg/api/state.py
  • tests/unit/engine/workspace/environment/test_hash_cache.py
  • src/synthorg/core/project_environment.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • tests/unit/workers/test_environment_runner.py
  • src/synthorg/persistence/postgres/backend.py
  • tests/unit/engine/test_environment_errors.py
  • src/synthorg/persistence/sqlite/backend.py
  • tests/unit/api/fakes_backend.py
  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/api/app.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/unit/workers/test_execution_service_environment.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • tests/unit/persistence/test_protocol.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • tests/unit/engine/workspace/environment/test_nix_strategy.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • tests/conformance/persistence/test_project_environment_repository.py
  • tests/unit/engine/workspace/environment/test_env_service.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • tests/unit/core/test_project_environment_model.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/workers/execution_service.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
  • tests/unit/persistence/sqlite/test_project_environment_repo.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Test regression: timeout/slow failures indicate source-code regression; never edit tests/baselines/unit_timing.json or any scripts/*_baseline.{txt,json} / scripts/_*_baseline.py; both families PreToolUse-blocked; per-invocation bypass for gate baselines: ALLOW_BASELINE_GROWTH=1 git commit ... (requires explicit user approval)

Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min; xdist -n 8 --dist=loadfile auto-applied via pyproject addopts (loadfile prevents 3.14+Windows ProactorEventLoop leak)

Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back

Test doubles ladder in conventions.md section 12.1: FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary (constructor/fn arg/annotated local/typed fixture return) blocked by scripts/check_mock_spec.py

FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript

Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001 (allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/)

Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)); never skip/xfail for flaky tests; fix fundamentally; use asyncio.Event().wait() not sleep(large)

Files:

  • tests/unit/engine/workspace/environment/test_env_factory.py
  • tests/unit/api/fakes.py
  • tests/unit/workers/test_execution_service.py
  • tests/unit/tools/sandbox/test_active_environment.py
  • tests/unit/engine/workspace/environment/test_hash_cache.py
  • tests/unit/workers/test_environment_runner.py
  • tests/unit/engine/test_environment_errors.py
  • tests/unit/api/fakes_backend.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/unit/workers/test_execution_service_environment.py
  • tests/unit/persistence/test_protocol.py
  • tests/unit/engine/workspace/environment/test_nix_strategy.py
  • tests/conformance/persistence/test_project_environment_repository.py
  • tests/unit/engine/workspace/environment/test_env_service.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • tests/unit/core/test_project_environment_model.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • tests/unit/persistence/sqlite/test_project_environment_repo.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/engine/workspace/environment/test_env_factory.py
  • tests/unit/api/fakes.py
  • tests/unit/workers/test_execution_service.py
  • tests/unit/tools/sandbox/test_active_environment.py
  • tests/unit/engine/workspace/environment/test_hash_cache.py
  • tests/unit/workers/test_environment_runner.py
  • tests/unit/engine/test_environment_errors.py
  • tests/unit/api/fakes_backend.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/unit/workers/test_execution_service_environment.py
  • tests/unit/persistence/test_protocol.py
  • tests/unit/engine/workspace/environment/test_nix_strategy.py
  • tests/conformance/persistence/test_project_environment_repository.py
  • tests/unit/engine/workspace/environment/test_env_service.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • tests/unit/core/test_project_environment_model.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • tests/unit/persistence/sqlite/test_project_environment_repo.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

No from __future__ import annotations (Python 3.14 has PEP 649); PEP 758 except: except A, B: requires parens unless binding

Type hints required on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines

Errors: use <Domain><Condition>Error from DomainError, never inherit directly from Exception/RuntimeError/etc; enforced by check_domain_error_hierarchy.py

Pydantic v2: frozen + extra="forbid" on every frozen model (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore"); @computed_field for derived; NotBlankStr for identifiers

Args models required at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py

Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries

Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception and re-raise MemoryError/RecursionError

Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable; lifecycle defined in docs/reference/lifecycle-sync.md

Untrusted content (SEC-1): wrap via wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML per docs/reference/sec-prompt-safety.md

Logging: import from synthorg.observability as logger variable; never import logging / print() in app code; event names from observability.events.<domain> constants; structured kwargs; error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED AFTER persistence write per docs/reference/sec-prompt-safety.md

Secret-log redaction (SEC-1): never use error=str(exc) or interpolate {exc}; use error_type=type(exc).__name__ + error=safe_error_description(exc); never exc_info=True; OTel: forbid `span.rec...

Files:

  • src/synthorg/engine/workspace/environment/config.py
  • src/synthorg/core/enums.py
  • src/synthorg/observability/events/persistence.py
  • src/synthorg/workers/environment_runner.py
  • src/synthorg/tools/sandbox/active_environment.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/tools/sandbox/config.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/tools/sandbox/protocol.py
  • src/synthorg/engine/errors.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/tools/sandbox/docker_config.py
  • src/synthorg/api/state.py
  • src/synthorg/core/project_environment.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • src/synthorg/persistence/postgres/backend.py
  • src/synthorg/persistence/sqlite/backend.py
  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/api/app.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/engine/workspace/environment/config.py
  • src/synthorg/core/enums.py
  • src/synthorg/observability/events/persistence.py
  • src/synthorg/workers/environment_runner.py
  • src/synthorg/tools/sandbox/active_environment.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/tools/sandbox/config.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/tools/sandbox/protocol.py
  • src/synthorg/engine/errors.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/tools/sandbox/docker_config.py
  • src/synthorg/api/state.py
  • src/synthorg/core/project_environment.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • src/synthorg/persistence/postgres/backend.py
  • src/synthorg/persistence/sqlite/backend.py
  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/api/app.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
docs/design/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Design specification docs in docs/design/ must be read before implementation; reference DESIGN_SPEC.md for approval process

Files:

  • docs/design/persistence.md
  • docs/design/coordination.md
src/synthorg/workers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Runtime services: synthorg.workers.runtime_builder.build_runtime_services selects ONE provider-present switch, returns RuntimeServices pair built from SINGLE shared boot AgentEngine: AgentEngineExecutionService + coordinator OR NoProviderExecutionService + None backstop; wired via AppState.worker_execution_service / AppState.coordinator; empty-company rejects task creation (4014 AgentRuntimeNotConfiguredError) and /coordinate 503s; swap_worker_execution_service / swap_coordinator / swap_provider_registry hold lock

Files:

  • src/synthorg/workers/environment_runner.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/workers/execution_service.py
src/synthorg/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL; new repository protocols must inherit from generic categories in persistence/_generics.py (SingletonRepository, IdKeyedRepository, FilteredQueryRepository, AppendOnlyRepository, StatefulRepository, MVCCRepository); bespoke methods permitted only under ADR-0001 D7 for real performance optimization or domain invariant enforcement; protocols stay @runtime_checkable

Repository CRUD method names: save(entity), get(id), delete(id) -> bool, list_items(...), query(...) returning tuples

Datetime in persistence: use parse_iso_utc / format_iso_utc from persistence._shared (reject naive); normalize_utc for already-typed

Files:

  • src/synthorg/persistence/protocol.py
  • src/synthorg/persistence/postgres/backend.py
  • src/synthorg/persistence/sqlite/backend.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

API startup lifecycle: construction phase (create_app body) wires synchronous services; on_startup (_build_lifecycle.on_startup) wires services needing connected persistence backend; agent_registry must build BEFORE auto_wire_meetings; tunnel_provider wired unconditionally; on-startup ordering: SettingsService precedes WorkflowExecutionObserver; OntologyService after persistence.connect(); cost-dial services via _try_wire_cost_dial (best-effort, logs BUDGET_FORECAST_UNAVAILABLE); knowledge substrate via _wire_knowledge_engine (gated on has_persistence AND has_memory_backend, logs KNOWLEDGE_SUBSTRATE_UNAVAILABLE); EnvironmentService behind has_persistence threaded into AgentEngineExecutionService via build_runtime_services

Files:

  • src/synthorg/api/state.py
  • src/synthorg/api/app.py
tests/conformance/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Dual-backend conformance: tests consume backend fixture (SQLite + Postgres); enforced by check_dual_backend_test_parity.py

Files:

  • tests/conformance/persistence/test_project_environment_repository.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T08:39:57.125Z
Learning: Pre-commit/pre-push hooks: `.pre-commit-config.yaml`; tool-call gates: `.claude/settings.json` PreToolUse (`scripts/check_*.sh`/`.py`)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T08:39:57.125Z
Learning: After every squash merge → `/post-merge-cleanup`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-22T08:39:57.125Z
Learning: CLI is Docker-only (init/start/stop/status); features go in dashboard + REST API
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

  • docs/DESIGN_SPEC.md
  • docs/reference/pluggable-subsystems.md
  • docs/design/persistence.md
  • docs/reference/errors.md
  • docs/design/coordination.md
  • CLAUDE.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

  • docs/DESIGN_SPEC.md
  • docs/reference/pluggable-subsystems.md
  • docs/design/persistence.md
  • docs/reference/errors.md
  • docs/design/coordination.md
  • CLAUDE.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

  • docs/DESIGN_SPEC.md
  • docs/reference/pluggable-subsystems.md
  • docs/design/persistence.md
  • docs/reference/errors.md
  • docs/design/coordination.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

  • docs/DESIGN_SPEC.md
  • docs/reference/pluggable-subsystems.md
  • docs/design/persistence.md
  • docs/reference/errors.md
  • docs/design/coordination.md
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/unit/engine/workspace/environment/test_env_factory.py
  • src/synthorg/engine/workspace/environment/config.py
  • src/synthorg/core/enums.py
  • src/synthorg/observability/events/persistence.py
  • src/synthorg/workers/environment_runner.py
  • src/synthorg/tools/sandbox/active_environment.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/tools/sandbox/config.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/tools/sandbox/protocol.py
  • tests/unit/api/fakes.py
  • src/synthorg/engine/errors.py
  • tests/unit/workers/test_execution_service.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/tools/sandbox/docker_config.py
  • tests/unit/tools/sandbox/test_active_environment.py
  • src/synthorg/api/state.py
  • tests/unit/engine/workspace/environment/test_hash_cache.py
  • src/synthorg/core/project_environment.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • tests/unit/workers/test_environment_runner.py
  • src/synthorg/persistence/postgres/backend.py
  • tests/unit/engine/test_environment_errors.py
  • src/synthorg/persistence/sqlite/backend.py
  • tests/unit/api/fakes_backend.py
  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/api/app.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/unit/workers/test_execution_service_environment.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • tests/unit/persistence/test_protocol.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • tests/unit/engine/workspace/environment/test_nix_strategy.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • tests/conformance/persistence/test_project_environment_repository.py
  • tests/unit/engine/workspace/environment/test_env_service.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • tests/unit/core/test_project_environment_model.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/workers/execution_service.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
  • tests/unit/persistence/sqlite/test_project_environment_repo.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

  • tests/unit/engine/workspace/environment/test_env_factory.py
  • src/synthorg/engine/workspace/environment/config.py
  • src/synthorg/core/enums.py
  • src/synthorg/observability/events/persistence.py
  • src/synthorg/workers/environment_runner.py
  • src/synthorg/tools/sandbox/active_environment.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/tools/sandbox/config.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/tools/sandbox/protocol.py
  • tests/unit/api/fakes.py
  • src/synthorg/engine/errors.py
  • tests/unit/workers/test_execution_service.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/tools/sandbox/docker_config.py
  • tests/unit/tools/sandbox/test_active_environment.py
  • src/synthorg/api/state.py
  • tests/unit/engine/workspace/environment/test_hash_cache.py
  • src/synthorg/core/project_environment.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • tests/unit/workers/test_environment_runner.py
  • src/synthorg/persistence/postgres/backend.py
  • tests/unit/engine/test_environment_errors.py
  • src/synthorg/persistence/sqlite/backend.py
  • tests/unit/api/fakes_backend.py
  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/api/app.py
  • tests/unit/tools/sandbox/test_subprocess_sandbox.py
  • tests/unit/workers/test_execution_service_environment.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • tests/unit/persistence/test_protocol.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • tests/unit/engine/workspace/environment/test_nix_strategy.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • tests/conformance/persistence/test_project_environment_repository.py
  • tests/unit/engine/workspace/environment/test_env_service.py
  • tests/unit/engine/workspace/environment/test_manifest_strategy.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • tests/unit/core/test_project_environment_model.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/workers/execution_service.py
  • tests/unit/engine/workspace/environment/test_devcontainer_strategy.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • tests/integration/engine/workspace/test_environment_acceptance.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
  • tests/unit/persistence/sqlite/test_project_environment_repo.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

  • src/synthorg/engine/workspace/environment/config.py
  • src/synthorg/core/enums.py
  • src/synthorg/observability/events/persistence.py
  • src/synthorg/workers/environment_runner.py
  • src/synthorg/tools/sandbox/active_environment.py
  • src/synthorg/observability/events/workspace.py
  • src/synthorg/tools/sandbox/config.py
  • src/synthorg/engine/workspace/environment/templates.py
  • src/synthorg/tools/sandbox/protocol.py
  • src/synthorg/engine/errors.py
  • src/synthorg/engine/workspace/environment/__init__.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/persistence/protocol.py
  • src/synthorg/tools/sandbox/docker_config.py
  • src/synthorg/api/state.py
  • src/synthorg/core/project_environment.py
  • src/synthorg/engine/workspace/environment/hash_cache.py
  • src/synthorg/tools/sandbox/subprocess_sandbox.py
  • src/synthorg/persistence/postgres/backend.py
  • src/synthorg/persistence/sqlite/backend.py
  • src/synthorg/engine/workspace/environment/factory.py
  • src/synthorg/api/app.py
  • src/synthorg/engine/workspace/environment/nix.py
  • src/synthorg/persistence/sqlite/_backend_accessors.py
  • src/synthorg/core/error_taxonomy.py
  • src/synthorg/persistence/project_environment_protocol.py
  • src/synthorg/engine/workspace/environment/protocol.py
  • src/synthorg/tools/sandbox/docker_sandbox.py
  • src/synthorg/engine/workspace/environment/image_builder.py
  • src/synthorg/engine/workspace/environment/manifest.py
  • src/synthorg/persistence/postgres/project_environment_repo.py
  • src/synthorg/engine/workspace/environment/committer.py
  • src/synthorg/tools/sandbox/docker_sandbox_exec.py
  • src/synthorg/workers/execution_service.py
  • src/synthorg/engine/workspace/environment/service.py
  • src/synthorg/persistence/sqlite/project_environment_repo.py
  • src/synthorg/engine/workspace/environment/devcontainer.py
🔇 Additional comments (84)
docs/DESIGN_SPEC.md (1)

24-24: LGTM!

web/src/api/types/openapi.gen.ts (1)

8048-8048: LGTM!

Also applies to: 12010-12023, 12175-12175

src/synthorg/core/enums.py (1)

359-412: File-size cap is still exceeded in this module.

This change adds more code to a module that remains above the 800-line repository limit.
As per coding guidelines: src/synthorg/**/*.py: "files <800 lines".

Also applies to: 541-557

src/synthorg/observability/events/persistence.py (1)

328-349: File-size cap is still exceeded in this module.

This addition further grows a file that remains above the 800-line cap.
As per coding guidelines: src/synthorg/**/*.py: "files <800 lines".

web/src/api/types/error-codes.gen.ts (1)

67-67: LGTM!

Also applies to: 113-115

tests/unit/engine/workspace/environment/test_env_factory.py (1)

1-46: LGTM!

docs/reference/pluggable-subsystems.md (1)

179-202: LGTM!

src/synthorg/engine/workspace/environment/config.py (1)

1-92: LGTM!

docs/design/persistence.md (1)

85-85: LGTM!

src/synthorg/workers/environment_runner.py (1)

1-65: LGTM!

docs/reference/errors.md (1)

105-106: LGTM!

Also applies to: 162-171

docs/design/coordination.md (1)

431-478: LGTM!

src/synthorg/tools/sandbox/active_environment.py (1)

1-65: LGTM!

src/synthorg/observability/events/workspace.py (1)

68-81: LGTM!

src/synthorg/tools/sandbox/config.py (1)

5-5: LGTM!

Also applies to: 9-29, 73-73

src/synthorg/engine/workspace/environment/templates.py (1)

1-57: LGTM!

src/synthorg/tools/sandbox/protocol.py (1)

79-79: LGTM!

Also applies to: 96-100

tests/unit/api/fakes.py (1)

633-662: LGTM!

CLAUDE.md (1)

81-81: LGTM!

src/synthorg/engine/errors.py (1)

301-353: LGTM!

tests/unit/workers/test_execution_service.py (1)

492-492: LGTM!

Also applies to: 506-506, 566-568

src/synthorg/persistence/postgres/schema.sql (1)

469-485: LGTM!

src/synthorg/persistence/sqlite/schema.sql (1)

455-471: LGTM!

src/synthorg/engine/workspace/environment/__init__.py (1)

1-67: LGTM!

src/synthorg/workers/runtime_builder.py (1)

31-59: LGTM!

Also applies to: 74-75, 351-387, 389-415, 704-704, 723-730

src/synthorg/persistence/postgres/revisions/20260521000002_project_environments.sql (1)

1-26: LGTM!

src/synthorg/persistence/protocol.py (1)

111-113: LGTM!

Also applies to: 428-431

src/synthorg/tools/sandbox/docker_config.py (1)

18-18: LGTM!

Also applies to: 167-175

tests/unit/tools/sandbox/test_active_environment.py (1)

1-100: LGTM!

src/synthorg/api/state.py (1)

165-165: LGTM!

Also applies to: 253-253, 504-504, 1060-1076

tests/unit/engine/workspace/environment/test_hash_cache.py (1)

1-65: LGTM!

src/synthorg/core/project_environment.py (1)

1-78: LGTM!

src/synthorg/engine/workspace/environment/hash_cache.py (1)

1-44: LGTM!

src/synthorg/tools/sandbox/subprocess_sandbox.py (1)

34-34: LGTM!

Also applies to: 273-301, 640-654, 721-721

tests/unit/workers/test_environment_runner.py (1)

1-62: LGTM!

src/synthorg/persistence/postgres/backend.py (1)

127-129: LGTM!

Also applies to: 251-253, 310-310, 379-379, 445-445, 709-714

tests/unit/engine/test_environment_errors.py (1)

1-62: LGTM!

src/synthorg/persistence/sqlite/backend.py (1)

124-126: LGTM!

Also applies to: 219-219, 292-292, 455-458

tests/unit/api/fakes_backend.py (1)

42-42: LGTM!

Also applies to: 605-605, 727-729

src/synthorg/engine/workspace/environment/factory.py (1)

1-100: LGTM!

src/synthorg/api/app.py (1)

300-334: LGTM!

Also applies to: 1286-1289, 1333-1343

tests/unit/tools/sandbox/test_subprocess_sandbox.py (1)

100-115: LGTM!

tests/unit/workers/test_execution_service_environment.py (1)

1-124: LGTM!

src/synthorg/persistence/sqlite/_backend_accessors.py (1)

89-91: LGTM!

Also applies to: 155-155, 324-329

src/synthorg/persistence/sqlite/revisions/20260521000002_project_environments.sql (1)

1-27: LGTM!

tests/unit/persistence/test_protocol.py (1)

44-46: LGTM!

Also applies to: 82-82, 617-637, 1139-1145, 1459-1470

src/synthorg/core/error_taxonomy.py (1)

119-119: LGTM!

Also applies to: 173-175

src/synthorg/persistence/project_environment_protocol.py (1)

1-89: LGTM!

src/synthorg/engine/workspace/environment/protocol.py (2)

1-105: LGTM!

Also applies to: 114-202


106-113: ⚡ Quick win

Harden ProvisionedEnvironment.env_vars against in-place mutation.

ProvisionedEnvironment is configured as frozen=True, but in Pydantic v2 this only prevents attribute reassignment—mutable contents of fields like env_vars: dict[str, str] can still be modified in place without triggering the model’s frozen checks (so immutability is leaky at this boundary). Make env_vars truly immutable (e.g., convert to an immutable mapping/tuple at construction) or ensure defensive copying at every system boundary (model_copy(update=...) / copy.deepcopy() per immutability guidance).

tests/unit/engine/workspace/environment/test_nix_strategy.py (1)

1-104: LGTM!

src/synthorg/tools/sandbox/docker_sandbox.py (1)

417-457: LGTM!

Also applies to: 480-541, 753-797

src/synthorg/engine/workspace/environment/manifest.py (1)

48-67: LGTM!

Also applies to: 141-159, 195-252

tests/conformance/persistence/test_project_environment_repository.py (1)

16-130: LGTM!

tests/unit/engine/workspace/environment/test_env_service.py (1)

123-285: LGTM!

tests/unit/engine/workspace/environment/test_manifest_strategy.py (1)

66-259: LGTM!

tests/unit/core/test_project_environment_model.py (1)

1-99: LGTM!

src/synthorg/engine/workspace/environment/committer.py (1)

1-93: LGTM!

src/synthorg/tools/sandbox/docker_sandbox_exec.py (1)

1-941: LGTM!

src/synthorg/workers/execution_service.py (2)

1-99: LGTM!

Also applies to: 282-345, 407-537, 573-636, 670-825, 827-893


793-797: 💤 Low value

Type mismatch concern not applicable: task.project is already NotBlankStr

Task.project is declared as NotBlankStr (non-optional) in src/synthorg/core/task.py, and TaskExecution.task is that same Task, so task.project / ctx.task_execution.task.project are compatible with _provision_environment(project_id: NotBlankStr | None) (the ctx.task_execution else None branch correctly yields NotBlankStr | None).

tests/unit/engine/workspace/environment/test_devcontainer_strategy.py (1)

1-167: LGTM!

Also applies to: 169-301

src/synthorg/engine/workspace/environment/service.py (1)

1-238: LGTM!

src/synthorg/persistence/sqlite/project_environment_repo.py (2)

1-31: LGTM!

Also applies to: 33-229


32-32: ⚡ Quick win

Don’t change _MAX_LIST_ROWS to Final[int]
scripts/check_no_magic_numbers.py allowlists module-level annotated numeric constants typed as NAME: int|float|Final|Final[int]|Final[float] = literal, and other SQLite repos use _MAX_LIST_ROWS: int = ..., so the current _MAX_LIST_ROWS: int = 10_000 already matches the enforced rule.

			> Likely an incorrect or invalid review comment.
tests/integration/engine/workspace/test_environment_acceptance.py (1)

1-231: LGTM!

Also applies to: 235-251

src/synthorg/engine/workspace/environment/devcontainer.py (16)

197-213: The previous review concern about including the implicit default Dockerfile in the declaration hash has been addressed. Line 209 now correctly uses build.get("dockerfile", "Dockerfile").


333-374: The previous review concern about guarding against empty postCreateCommand lists has been addressed. Lines 351-353 now properly check if not post_create and raise EnvironmentConfigError.


1-62: LGTM!


64-87: LGTM!


90-116: LGTM!


122-142: LGTM!


144-159: LGTM!


161-180: LGTM!


182-195: LGTM!


215-238: LGTM!


240-251: LGTM!


253-256: LGTM!


258-320: LGTM!


322-331: LGTM!


376-431: LGTM!


433-434: LGTM!

tests/unit/persistence/sqlite/test_project_environment_repo.py (2)

1-46: LGTM!


49-114: LGTM!

Comment thread src/synthorg/engine/workspace/environment/nix.py
Comment thread src/synthorg/engine/workspace/environment/nix.py Outdated
Comment thread tests/integration/engine/workspace/test_environment_acceptance.py Outdated
Comment thread tests/unit/engine/workspace/environment/test_devcontainer_strategy.py Outdated
Aureliolo added 2 commits May 22, 2026 11:00
nix: log ENVIRONMENT_DECLARATION_INVALID before raising on missing flake; length-frame flake.nix/flake.lock hash inputs against collision.

devcontainer: length-frame declaration + Dockerfile hash inputs (same hardening as nix).

runtime_builder: release sandbox owner on the resolved code-execution backend, not a hardwired docker backend.

tests: assert git rev-list success before parsing count; explicit encoding on a devcontainer Dockerfile write.
test_manifest_strategy traversal test resolved the escaping lockfile to tmp_path.parent (the shared per-worker temp dir), so parallel tests collided on <worker>/escape -> IsADirectoryError, the sole Test Unit failure on CI (the 13-22min stalls were the separate environmental xdist node-down flake). Nest the workspace in a subdir so the escape resolves inside this test's unique tmp_path while still escaping the workspace.

test_protocol.py: FakeSandboxBackend stub gains image_override kwarg to mirror the SandboxBackend protocol.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 86.62420% with 126 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.98%. Comparing base (9b98312) to head (469997f).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...thorg/engine/workspace/environment/devcontainer.py 81.76% 26 Missing and 7 partials ⚠️
...org/persistence/sqlite/project_environment_repo.py 68.57% 22 Missing ⚠️
...g/persistence/postgres/project_environment_repo.py 75.00% 16 Missing ⚠️
...horg/engine/workspace/environment/image_builder.py 69.56% 14 Missing ⚠️
src/synthorg/tools/sandbox/docker_sandbox.py 26.66% 11 Missing ⚠️
src/synthorg/workers/execution_service.py 76.19% 9 Missing and 1 partial ⚠️
src/synthorg/engine/workspace/environment/nix.py 91.04% 5 Missing and 1 partial ⚠️
.../synthorg/engine/workspace/environment/protocol.py 85.71% 4 Missing and 2 partials ⚠️
.../synthorg/engine/workspace/environment/manifest.py 95.23% 4 Missing and 1 partial ⚠️
src/synthorg/tools/sandbox/docker_sandbox_exec.py 72.72% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2039      +/-   ##
==========================================
- Coverage   84.99%   84.98%   -0.02%     
==========================================
  Files        2139     2157      +18     
  Lines      125133   126065     +932     
  Branches    10465    10530      +65     
==========================================
+ Hits       106359   107135     +776     
- Misses      16147    16283     +136     
- Partials     2627     2647      +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Aureliolo Aureliolo merged commit d2c0ef9 into main May 22, 2026
82 checks passed
@Aureliolo Aureliolo deleted the feat/1994-reproducible-project-environments branch May 22, 2026 09:59
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 22, 2026 09:59 — with GitHub Actions Inactive
Aureliolo pushed a commit that referenced this pull request May 22, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Introduced conversational interface for direct clarify and propose
interactions.
- Cost management now includes forecast gates, hard ceilings, and Pareto
considerations.
- Added living documentation engine combining wiki and
retrieval-augmented generation features.
- Real intake engine is now operational for live data processing.
- Virtual desktop tool with vision verification gate available for
enhanced workspace control.

### What's new
- Per-project reproducible environments for consistent setups.
- Headless browser testing tool integrated for automated UI validation.
- Governed external API and data access tool introduced.
- Hardened external-remote git backend with sandbox mounts and
push-queue dispatching.
- Adversarial red-team gate subsystem for enhanced security testing.
- Self-extending toolkit to dynamically expand capabilities.
- Stakes-aware model routing enables prioritized processing.
- Task-board entry adapter connects live runtime with project
management.
- Persistent project workspace with pluggable git backend and
per-project push queues implemented.
- Knowledge and provenance substrate added to track data lineage.
- Scoring and data contract framework for golden-company benchmark
evaluations.

### Under the hood
- Desktop Dockerfile pinned by digest to improve build stability and
documented publishing gap fixed.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


##
[0.8.7](v0.8.6...v0.8.7)
(2026-05-22)


### Features

* conversational interface v1 - 1:1 clarify + propose
([#2019](#2019))
([216ef94](216ef94)),
closes [#1968](#1968)
* cost as a first-class dial (forecast gate, hard ceiling, Pareto)
([#2029](#2029))
([700a59e](700a59e)),
closes [#1982](#1982)
* **env:** reproducible per-project environments
([#2039](#2039))
([d2c0ef9](d2c0ef9)),
closes [#1994](#1994)
* **evals:** [#1980](#1980)
spine -- scoring + data contract for golden-company benchmark
([#2025](#2025))
([53108e8](53108e8))
* goal/objective entry adapter
([#1964](#1964))
([#2022](#2022))
([cb15c3c](cb15c3c))
* governed external API/data access tool
([#1991](#1991))
([#2032](#2032))
([e08b451](e08b451))
* harden external-remote git backend + per-project sandbox mount +
push-queue dispatch
([#2020](#2020))
([#2030](#2030))
([2fa2e1e](2fa2e1e))
* headless browser testing tool
([#1992](#1992))
([#2024](#2024))
([277b52a](277b52a))
* knowledge + provenance substrate
([#2036](#2036))
([48c897b](48c897b))
* living documentation engine (dual-purpose wiki + RAG namespace)
([#2028](#2028))
([3d10da9](3d10da9)),
closes [#1976](#1976)
* real intake engine online
([#2017](#2017))
([9d8eb34](9d8eb34))
* **redteam:** adversarial red-team gate subsystem
([#1986](#1986))
([#2026](#2026))
([d2207e9](d2207e9))
* self-extending toolkit
([#1995](#1995))
([#2035](#2035))
([5ffc545](5ffc545))
* stakes-aware model routing
([#1998](#1998))
([#2038](#2038))
([9b98312](9b98312))
* task-board entry adapter to live runtime
([#1963](#1963))
([#2023](#2023))
([a8f1eea](a8f1eea))
* virtual desktop tool and vision verifier gate
([#2031](#2031))
([dfe8b42](dfe8b42)),
closes [#1993](#1993)
* **workspace:** persistent project workspace + pluggable git backend +
per-project push queue
([#2021](#2021))
([ee58ee7](ee58ee7))


### Bug Fixes

* pin desktop Dockerfile by digest (Scorecard
[#309](#309)) + document
publish gap ([#2034](#2034))
([8fda188](8fda188))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reproducible per-project environments

1 participant