Skip to content

feat: virtual desktop tool and vision verifier gate#2031

Merged
Aureliolo merged 9 commits into
mainfrom
feat/1993-vision-verifier
May 21, 2026
Merged

feat: virtual desktop tool and vision verifier gate#2031
Aureliolo merged 9 commits into
mainfrom
feat/1993-vision-verifier

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Adds the computer-use tier from EPIC #1987: a virtualised desktop an agent drives, and a vision-model verifier that judges whether the running GUI matches the brief (the UI cousin of the red-team gate).

  • Virtual desktop tool (src/synthorg/tools/desktop/): launch a windowed GUI app, click/type/press-keys/scroll, capture screenshots. Runs an in-container executor (Xvfb + xdotool + scrot) inside the existing DockerSandbox; the session persists across calls via the per-agent warm container. Pluggable DesktopDriver (protocol + factory + config discriminator): xvfb (deterministic default) and vnc (live observation), Linux-renderable GUI targets, with a desktop-capable docker/desktop/Dockerfile. New ToolCategory.DESKTOP + desktop:* action types, permissions (STANDARD/ELEVATED), settings, observability events.
  • Vision verifier (src/synthorg/security/visionverify/): pluggable VisionVerifier (noop default-off / heuristic deterministic colour-rule / llm_vision multimodal) + VisionVerifierGate, mirroring the red-team gate's protocol + routing (severity x autonomy matrix), self-evaluation rejection, and fail-OPEN policy. Chained after _apply_red_team_gate in ReviewGateService.run_pipeline; a BLOCK verdict routes the task back to IN_PROGRESS; absent screenshots SKIP (vision is conditional on a GUI deliverable). Boot-wired in runtime_builder and injected into the live review gate via set_vision_gate.
  • Provider multimodal extension: additive ImagePart on ChatMessage (content stays str | None), emitted as the litellm multimodal content list, gated on ModelCapabilities.supports_vision; cassette record/replay keying covers image_parts and the redactor elides image bytes from the human-readable copy.
  • SEC-1: the untrusted brief/criteria are wrap_untrusted-fenced before reaching the vision model; screenshots travel as structured image_parts, never as prompt text.
  • Docs: docs/design/tools.md (desktop category + action taxonomy + Virtual Desktop section), docs/design/verification-quality.md (vision gate in Order of Operations + Vision Verifier Gate section).

Deviation from the original plan

The plan called for a yoyo revision pair (vision_verification_runs / vision_verification_findings). On implementation, the red-team gate (the mirrored cousin) was found to use in-memory report storage and ship no DB revision; screenshots are durable on disk (path + sha256 in ScreenshotResult) and the verdict lands in the existing decision_records audit via the review pipeline. Adding a parallel audit table would diverge from the cousin, so the persistence subsystem was dropped (no repository, no revision, no conformance test). The plan file records this rationale.

Test plan

  • uv run python -m pytest tests/ -m unit — green (30099 passed).
  • New unit tests: tests/unit/providers/test_multimodal.py, tests/unit/security/visionverify/*, tests/unit/tools/desktop/*, tests/unit/engine/test_review_gate_vision.py.
  • Acceptance (tests/integration/security/visionverify/test_review_gate_vision.py, run with ALLOW_E2E_TESTS=1): a deliberate red-window mismatch drives the heuristic verifier to BLOCK and routes the task IN_REVIEW -> IN_PROGRESS before completion; a matching blue-window control completes. Mirrors the red-team planted-defect acceptance.
  • ruff, mypy, and all pre-push convention gates pass (domain-error, frozen-extra-forbid, boundary-typed, dto-types-ts-in-sync, clock-seam, no-magic-numbers, etc.).

Note: the first push hit the pre-push isolation gate's --count 2 re-run crashing at the native xdist worker level at full-suite scale (node down: Not properly terminated), triggered by the pyproject.toml per-file-ignore forcing a full-suite double-run. Confirmed environmental, not a logic regression: the primary -m unit run is green (30099 passed) and tests/unit/api/test_state.py passed 190/190 under --count 2. Bypassed once with the gate's documented SYNTHORG_SKIP_ISOLATION_GATE=1 (user-authorized); the subsequent review-fix push passed the gate normally.

Review coverage

Pre-reviewed by 14 agents (security, code, python, conventions, async-concurrency, type-design, silent-failure, logging, resilience, docs-consistency, comment-quality, api-contract-drift, test-quality, pr-test-analyzer, issue-resolution). 8 valid findings addressed; the except A, B: "syntax error" flags were rejected as false positives (PEP 758 is valid in Python 3.14 and is the project standard). issue-resolution-verifier: all acceptance criteria RESOLVED.

Closes #1993

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 21, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: b722f3fb-0aa3-4083-a817-aaf9cfa3a2f1

📥 Commits

Reviewing files that changed from the base of the PR and between fb609e9 and b2e558a.

📒 Files selected for processing (5)
  • docker/desktop/Dockerfile
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/tools/desktop/_executor.py
  • tests/unit/tools/desktop/test_desktop_driver.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Deploy Preview
  • GitHub Check: Build Fine-Tune (gpu, fine-tune-gpu)
  • GitHub Check: Build Fine-Tune (cpu, fine-tune-cpu)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Lighthouse Site
  • GitHub Check: Test Conformance (SQLite)
  • GitHub Check: Test Integration
  • GitHub Check: Dashboard Test
  • GitHub Check: Test E2E
  • GitHub Check: Test Unit
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (3)
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL.

Files:

  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/tools/desktop/_executor.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env at the boot site. YAML is a company-template ingestion format, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value.
No hardcoded numeric values; numerics live in settings/definitions/. Allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants of the form NAME: int|float|Final|Final[int]|Final[float] = literal. Enforced by scripts/check_no_magic_numbers.py.
Comments WHY only; no reviewer citations / issue back-refs / migration framing. Enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py.
No from __future__ import annotations (3.14 has PEP 649). PEP 758 except: except A, B: no parens unless binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Errors: <Domain><Condition>Error from DomainError; never inherit Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py.
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore" boundaries); @computed_field for derived; NotBlankStr for identifiers.
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py.
Immutability: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries.
Async: asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError).
Clock seam: clock: Clock | None = None; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out st...

Files:

  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/tools/desktop/_executor.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/tools/desktop/_executor.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min.
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Test doubles: ladder in conventions.md section 12.1. FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags. Bare MagicMock at typed boundary (constructor / fn arg / annotated local / typed fixture return) blocked by scripts/check_mock_spec.py (zero-tolerance).
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Flaky: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large).

Files:

  • tests/unit/tools/desktop/test_desktop_driver.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/tools/desktop/test_desktop_driver.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/tools/desktop/_args.py
  • tests/unit/tools/desktop/test_desktop_driver.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/tools/desktop/_executor.py
🔇 Additional comments (5)
docker/desktop/Dockerfile (1)

27-29: LGTM!

src/synthorg/tools/desktop/_args.py (1)

19-19: LGTM!

Also applies to: 74-74

tests/unit/tools/desktop/test_desktop_driver.py (1)

4-4: LGTM!

Also applies to: 57-57

src/synthorg/tools/desktop/_constants.py (1)

13-13: LGTM!

src/synthorg/tools/desktop/_executor.py (1)

289-295: LGTM!

Also applies to: 444-445


Walkthrough

This PR adds a headless virtual desktop tool and an opt-in vision verifier gate. The desktop tool runs a sandboxed X11 session (Xvfb/VNC) to launch apps, inject input (launch/click/type/key/scroll), and capture screenshots via an in-sandbox executor; it includes drivers, settings, models, Docker image, error types, and factory wiring. The vision verifier subsystem provides contracts, heuristic and LLM verifiers (fail-open), verdict routing, gate service integration, multimodal provider support, cassette redaction for images, security config/mappings, runtime wiring, observability events, web type updates, and extensive unit/integration tests.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive virtual desktop and vision verification subsystem. It allows agents to drive GUI applications in a sandboxed headless environment and automatically verify the visual output against acceptance criteria. This capability is gated by a new security pipeline that prevents self-evaluation and ensures untrusted input is properly fenced before being processed by vision models.

Highlights

  • Virtual Desktop Tool: Introduced a new desktop tool category that enables agents to interact with GUI applications within a headless Xvfb environment, supporting actions like clicking, typing, and capturing screenshots.
  • Vision Verifier Gate: Added a VisionVerifierGate that acts as a UI-focused quality gate, validating that GUI deliverables match their brief using either heuristic color-based checks or multimodal LLM analysis.
  • Multimodal Support: Extended the provider interface to support ImagePart attachments in chat messages, allowing screenshots to be passed to vision-capable models while ensuring sensitive image data is redacted from logs.
  • Security & Safety: Implemented strict wrap_untrusted fencing for briefs and criteria, and ensured self-evaluation is rejected by requiring distinct generator and evaluator agent IDs.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 21, 2026

Merging this PR will not alter performance

✅ 54 untouched benchmarks


Comparing feat/1993-vision-verifier (b2e558a) with main (e08b451)

Open in CodSpeed

Comment thread src/synthorg/tools/desktop/_executor.py Dismissed
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a virtual desktop tool and a vision verifier gate subsystem, enabling agents to drive GUI applications in a headless Xvfb sandbox and validate visual deliverables. The changes include multimodal support for image handling in the provider layer and a pluggable verification architecture. Feedback identifies a violation of the mandatory clock seam rule in the DesktopTool implementation, suggesting the injection of a Clock instance to manage timestamps for improved testability.

Comment on lines +105 to +113
def __init__(
self,
*,
sandbox: SandboxBackend,
workspace: Path,
driver: DesktopDriver | None = None,
owner_id: str | None = None,
settings: DesktopSettings | None = None,
) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The DesktopTool should use the project's Clock seam for generating timestamps (e.g., in ScreenshotResult) to ensure testability and consistency with the project's business logic rules. Please inject a Clock instance in the constructor.

    def __init__(
        self,
        *,
        sandbox: SandboxBackend,
        workspace: Path,
        driver: DesktopDriver | None = None,
        owner_id: str | None = None,
        settings: DesktopSettings | None = None,
        clock: Clock | None = None,
    ) -> None:
        """Wire the tool to a sandbox backend and the project workspace.

        Args:
            sandbox: Pluggable backend that runs the in-container
                executor (typically a DockerSandbox with a desktop image).
            workspace: Persistent project workspace. Used to stage the
                executor and to write screenshots.
            driver: Optional :class:DesktopDriver override. When
                omitted it is built from settings.driver (default:
                the deterministic xvfb driver).
            owner_id: Sandbox lifecycle owner id. When omitted, each
                tool instance gets a uuid4-derived id.
            settings: Operator-resolved settings. When omitted the model
                defaults (mirroring the module constants) are used.
            clock: Clock seam. Production passes :class:SystemClock; tests
                pass :class:FakeClock. Defaults to :class:SystemClock.
        """
        super().__init__(
            name="desktop",
            description=(
                "Virtual desktop automation. Launch a GUI app on a "
                "headless X session, then click / type / press keys / "
                "scroll and capture screenshots. Modes: launch, click, "
                "type, key, screenshot, scroll. SECURITY: the launch "
                "app_command runs via bash -c inside the sandbox; trust "
                "the sandbox boundary, never pass untrusted strings."
            ),
            category=ToolCategory.DESKTOP,
            parameters_schema=DesktopToolArgs.model_json_schema(),
        )
        if not workspace.is_absolute():
            msg = f"workspace must be absolute, got {workspace!r}"
            raise DesktopDomainError(msg)
        self._sandbox = sandbox
        self._workspace = workspace.resolve()
        self._settings = settings or DesktopSettings()
        self._driver: DesktopDriver = driver or build_desktop_driver(
            self._settings.driver,
        )
        self._screenshots = WorkspaceScreenshotStore(workspace=self._workspace)
        self._owner_id = owner_id or f"desktop-tool-{uuid4()}"
        from synthorg.core.clock import SystemClock
        self._clock = clock or SystemClock()
References
  1. Mandatory rules: Clock seam. (link)

width=int(result.get("width") or 1),
height=int(result.get("height") or 1),
file_size_bytes=int(result.get("file_size_bytes") or 0),
captured_at_iso=datetime.now(UTC).isoformat(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Avoid using datetime.now(UTC) directly. Use the injected self._clock.now() to maintain the clock seam convention.

Suggested change
captured_at_iso=datetime.now(UTC).isoformat(),
captured_at_iso=self._clock.now().isoformat(),
References
  1. Mandatory rules: Clock seam. (link)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/providers/cassette/redaction.py`:
- Around line 83-90: The redactor currently only matches keys named base64_data
or data_uri (_IMAGE_DATA_FIELD_NAME) so image bytes stored under nested keys
like image_url.url (as produced by src/synthorg/providers/drivers/mappers.py)
are not elided; update the redaction logic to also detect and elide image data
in nested image URL fields by expanding the pattern (or add a new regex) to
match keys like image_url, .*\.url, and url-containing data URIs (apply the
change in the same redaction module where _IMAGE_DATA_FIELD_NAME is defined and
the subsequent redaction checks around lines 108-110), ensuring any value under
an image_url.url key is replaced/omitted while keeping the original hashing
behavior intact.

In `@src/synthorg/security/visionverify/verifiers/llm_vision.py`:
- Around line 225-277: The code currently reads untrusted tool `arguments`
directly in _parse_report and _parse_finding; change it to validate/convert
using the project's parse_typed() on a typed args model before mapping to
VisionVerificationReport/VisionFinding. Create or reuse a typed input model
(e.g., VisionVerifierArgs with fields findings: list[VisionFindingArgs],
confidence: float, summary: str, etc.), call parse_typed(arguments,
VisionVerifierArgs) inside _parse_report, iterate the validated findings (or
call parse_typed for each finding in _parse_finding) and then construct
VisionVerificationReport and VisionFinding from those typed objects; keep the
existing try/except and degraded path but catch parse_typed errors instead of
raw KeyError/TypeError/ValueError.

In `@src/synthorg/tools/desktop/_args.py`:
- Around line 61-69: The current validation treats whitespace-only app_command
as valid because it uses not self.app_command; update the validation for
app_command (the Field and the check referencing self.app_command) to reject
strings that are empty after stripping whitespace: add a Pydantic validator for
"app_command" (or update the existing check) that if app_command is not None and
app_command.strip() == "" raises a ValueError explaining the field cannot be
blank/whitespace-only so launch mode fails early.

In `@src/synthorg/tools/desktop/_executor.py`:
- Around line 250-257: The loop can exit by timeout and still proceed to write
session state; change the post-loop logic to detect a timeout (i.e., after the
while finishes, verify that a window was actually found via _has_window(env) and
that proc.poll() is None/that the process is still running) and, if no window
appeared before deadline, raise a RuntimeError (or appropriate exception) and do
not call _write_session_state; only call _write_session_state(display=display,
pid=proc.pid) when a window was observed. Use the existing symbols deadline,
_has_window, proc.poll(), and _write_session_state to implement this check.
- Around line 71-89: The current _validated_sandbox_path uses only lexical
checks (PurePosixPath) so a path under _SANDBOX_ROOT can still escape via
symlinked ancestors; change the validation to resolve the user-supplied path
against the filesystem and verify containment against the resolved sandbox root:
convert the candidate to a real Path, call resolve (use strict=False to avoid
failing for non-existent targets), resolve Path(_SANDBOX_ROOT) as well, then
assert the resolved user path is absolute and is_relative_to the resolved
sandbox root (and keep the existing empty/absolute checks or replace the '..'
lexical check with this resolved containment check) so symlink escapes are
blocked before returning from _validated_sandbox_path.

In `@src/synthorg/tools/desktop/desktop_tool.py`:
- Around line 210-217: The code is silently accepting malformed/missing executor
fields by using fallback literals (e.g., pid=1) instead of validating input;
replace the ad-hoc extraction of result in payload and the LaunchResult
construction with a typed parse at the boundary: define a small ResultModel
(fields: display: str | None, pid: int, screen_width: int, screen_height: int)
and call parse_typed(payload.get("result"), ResultModel) (or equivalent
parse_typed usage) to validate/convert the external dict, then build
LaunchResult from the parsed model and session defaults only when fields are
explicitly present; ensure parse_typed errors are surfaced/handled (raise or
return an error) rather than silently defaulting — apply the same change for the
other occurrences referenced around the LaunchResult construction and related
blocks.
- Around line 284-291: _executor_timeout_seconds currently uses static
LAUNCH_TIMEOUT_SECONDS/SCREENSHOT_TIMEOUT_SECONDS and ignores per-call overrides
like args.launch_timeout_seconds; update the function signature to accept the
per-call timeout (e.g., add an optional parameter like launch_timeout_seconds or
an args object) and when operation == "launch" add that per-call value (falling
back to LAUNCH_TIMEOUT_SECONDS if None), and similarly for "screenshot" use a
per-call screenshot timeout if provided; apply the same change to the other
related helper (the duplicate at the other location) so outer timeout honors
per-call timeouts.

In `@tests/unit/security/test_action_types.py`:
- Line 31: The test test_builtin_category_expansion_counts was updated to
include ActionTypeCategory.DESKTOP but the expected expansion list omitted
"desktop", so add "desktop" to the expected entries used in
test_builtin_category_expansion_counts (or the expected counts map) so
expand_category("desktop") is covered; locate the test function
test_builtin_category_expansion_counts and the data structure asserting
expansions and include the "desktop" entry corresponding to
ActionTypeCategory.DESKTOP.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: cac2be0c-fda8-40c0-9a71-c4aa981c61f4

📥 Commits

Reviewing files that changed from the base of the PR and between 3d10da9 and e083a74.

📒 Files selected for processing (73)
  • docker/desktop/Dockerfile
  • docs/design/tools.md
  • docs/design/verification-quality.md
  • pyproject.toml
  • scripts/check_no_bare_time_in_business_logic.py
  • src/synthorg/api/app.py
  • src/synthorg/core/enums.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/observability/events/desktop.py
  • src/synthorg/observability/events/vision_verify.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/drivers/mappers.py
  • src/synthorg/providers/enums.py
  • src/synthorg/providers/models.py
  • src/synthorg/security/action_type_mapping.py
  • src/synthorg/security/action_types.py
  • src/synthorg/security/config.py
  • src/synthorg/security/risk_scorer.py
  • src/synthorg/security/rules/risk_classifier.py
  • src/synthorg/security/timeout/risk_tier_classifier.py
  • src/synthorg/security/visionverify/__init__.py
  • src/synthorg/security/visionverify/builder.py
  • src/synthorg/security/visionverify/config.py
  • src/synthorg/security/visionverify/errors.py
  • src/synthorg/security/visionverify/factory.py
  • src/synthorg/security/visionverify/gate.py
  • src/synthorg/security/visionverify/models.py
  • src/synthorg/security/visionverify/prompt.py
  • src/synthorg/security/visionverify/protocol.py
  • src/synthorg/security/visionverify/routing.py
  • src/synthorg/security/visionverify/verifiers/__init__.py
  • src/synthorg/security/visionverify/verifiers/_image.py
  • src/synthorg/security/visionverify/verifiers/heuristic.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • src/synthorg/security/visionverify/verifiers/noop.py
  • src/synthorg/settings/definitions/tools.py
  • src/synthorg/tools/desktop/__init__.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/tools/desktop/_screenshot_store.py
  • src/synthorg/tools/desktop/_settings.py
  • src/synthorg/tools/desktop/desktop_tool.py
  • src/synthorg/tools/desktop/driver/__init__.py
  • src/synthorg/tools/desktop/driver/config.py
  • src/synthorg/tools/desktop/driver/factory.py
  • src/synthorg/tools/desktop/driver/protocol.py
  • src/synthorg/tools/desktop/driver/vnc.py
  • src/synthorg/tools/desktop/driver/xvfb.py
  • src/synthorg/tools/desktop/errors.py
  • src/synthorg/tools/factory.py
  • src/synthorg/tools/permissions.py
  • src/synthorg/workers/runtime_builder.py
  • tests/integration/security/visionverify/__init__.py
  • tests/integration/security/visionverify/test_review_gate_vision.py
  • tests/unit/core/test_enums.py
  • tests/unit/engine/test_review_gate_vision.py
  • tests/unit/observability/test_events.py
  • tests/unit/providers/test_multimodal.py
  • tests/unit/security/test_action_types.py
  • tests/unit/security/visionverify/__init__.py
  • tests/unit/security/visionverify/test_gate_llm.py
  • tests/unit/security/visionverify/test_models_routing.py
  • tests/unit/security/visionverify/test_verifiers.py
  • tests/unit/tools/desktop/__init__.py
  • tests/unit/tools/desktop/test_desktop_args.py
  • tests/unit/tools/desktop/test_desktop_driver.py
  • tests/unit/tools/desktop/test_desktop_executor.py
  • tests/unit/tools/desktop/test_desktop_models_errors.py
  • tests/unit/tools/desktop/test_desktop_tool_dispatch.py
  • web/src/api/types/enum-values.gen.ts
  • web/src/api/types/openapi.gen.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (13)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min.
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Test doubles: ladder in conventions.md section 12.1. FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags. Bare MagicMock at typed boundary (constructor / fn arg / annotated local / typed fixture return) blocked by scripts/check_mock_spec.py (zero-tolerance).
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Flaky: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large).

Files:

  • tests/unit/security/visionverify/__init__.py
  • tests/integration/security/visionverify/__init__.py
  • tests/unit/tools/desktop/__init__.py
  • tests/unit/tools/desktop/test_desktop_models_errors.py
  • tests/unit/core/test_enums.py
  • tests/unit/observability/test_events.py
  • tests/unit/tools/desktop/test_desktop_args.py
  • tests/unit/tools/desktop/test_desktop_executor.py
  • tests/unit/security/test_action_types.py
  • tests/unit/security/visionverify/test_models_routing.py
  • tests/unit/security/visionverify/test_verifiers.py
  • tests/integration/security/visionverify/test_review_gate_vision.py
  • tests/unit/providers/test_multimodal.py
  • tests/unit/tools/desktop/test_desktop_driver.py
  • tests/unit/security/visionverify/test_gate_llm.py
  • tests/unit/engine/test_review_gate_vision.py
  • tests/unit/tools/desktop/test_desktop_tool_dispatch.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/security/visionverify/__init__.py
  • tests/integration/security/visionverify/__init__.py
  • tests/unit/tools/desktop/__init__.py
  • tests/unit/tools/desktop/test_desktop_models_errors.py
  • tests/unit/core/test_enums.py
  • tests/unit/observability/test_events.py
  • tests/unit/tools/desktop/test_desktop_args.py
  • tests/unit/tools/desktop/test_desktop_executor.py
  • tests/unit/security/test_action_types.py
  • tests/unit/security/visionverify/test_models_routing.py
  • tests/unit/security/visionverify/test_verifiers.py
  • tests/integration/security/visionverify/test_review_gate_vision.py
  • tests/unit/providers/test_multimodal.py
  • tests/unit/tools/desktop/test_desktop_driver.py
  • tests/unit/security/visionverify/test_gate_llm.py
  • tests/unit/engine/test_review_gate_vision.py
  • tests/unit/tools/desktop/test_desktop_tool_dispatch.py
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL.

Files:

  • src/synthorg/security/visionverify/verifiers/__init__.py
  • src/synthorg/security/visionverify/config.py
  • src/synthorg/security/visionverify/verifiers/noop.py
  • src/synthorg/tools/desktop/driver/__init__.py
  • src/synthorg/observability/events/desktop.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • src/synthorg/security/rules/risk_classifier.py
  • src/synthorg/security/risk_scorer.py
  • src/synthorg/security/action_type_mapping.py
  • src/synthorg/tools/desktop/_screenshot_store.py
  • src/synthorg/tools/desktop/driver/xvfb.py
  • src/synthorg/security/timeout/risk_tier_classifier.py
  • src/synthorg/security/visionverify/verifiers/heuristic.py
  • src/synthorg/security/action_types.py
  • src/synthorg/security/visionverify/routing.py
  • src/synthorg/tools/desktop/driver/protocol.py
  • src/synthorg/observability/events/vision_verify.py
  • src/synthorg/security/visionverify/verifiers/_image.py
  • src/synthorg/providers/models.py
  • src/synthorg/providers/enums.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/tools/desktop/errors.py
  • src/synthorg/tools/permissions.py
  • src/synthorg/tools/desktop/driver/config.py
  • src/synthorg/providers/drivers/mappers.py
  • src/synthorg/settings/definitions/tools.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/security/visionverify/builder.py
  • src/synthorg/security/visionverify/errors.py
  • src/synthorg/tools/desktop/__init__.py
  • src/synthorg/security/config.py
  • src/synthorg/api/app.py
  • src/synthorg/core/enums.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/factory.py
  • src/synthorg/tools/desktop/driver/factory.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/security/visionverify/protocol.py
  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/tools/desktop/_settings.py
  • src/synthorg/security/visionverify/__init__.py
  • src/synthorg/tools/desktop/driver/vnc.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/security/visionverify/prompt.py
  • src/synthorg/security/visionverify/models.py
  • src/synthorg/tools/desktop/desktop_tool.py
  • src/synthorg/security/visionverify/gate.py
  • src/synthorg/security/visionverify/factory.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env at the boot site. YAML is a company-template ingestion format, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value.
No hardcoded numeric values; numerics live in settings/definitions/. Allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants of the form NAME: int|float|Final|Final[int]|Final[float] = literal. Enforced by scripts/check_no_magic_numbers.py.
Comments WHY only; no reviewer citations / issue back-refs / migration framing. Enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py.
No from __future__ import annotations (3.14 has PEP 649). PEP 758 except: except A, B: no parens unless binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Errors: <Domain><Condition>Error from DomainError; never inherit Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py.
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore" boundaries); @computed_field for derived; NotBlankStr for identifiers.
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py.
Immutability: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries.
Async: asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError).
Clock seam: clock: Clock | None = None; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out st...

Files:

  • src/synthorg/security/visionverify/verifiers/__init__.py
  • src/synthorg/security/visionverify/config.py
  • src/synthorg/security/visionverify/verifiers/noop.py
  • src/synthorg/tools/desktop/driver/__init__.py
  • src/synthorg/observability/events/desktop.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • src/synthorg/security/rules/risk_classifier.py
  • src/synthorg/security/risk_scorer.py
  • src/synthorg/security/action_type_mapping.py
  • src/synthorg/tools/desktop/_screenshot_store.py
  • src/synthorg/tools/desktop/driver/xvfb.py
  • src/synthorg/security/timeout/risk_tier_classifier.py
  • src/synthorg/security/visionverify/verifiers/heuristic.py
  • src/synthorg/security/action_types.py
  • src/synthorg/security/visionverify/routing.py
  • src/synthorg/tools/desktop/driver/protocol.py
  • src/synthorg/observability/events/vision_verify.py
  • src/synthorg/security/visionverify/verifiers/_image.py
  • src/synthorg/providers/models.py
  • src/synthorg/providers/enums.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/tools/desktop/errors.py
  • src/synthorg/tools/permissions.py
  • src/synthorg/tools/desktop/driver/config.py
  • src/synthorg/providers/drivers/mappers.py
  • src/synthorg/settings/definitions/tools.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/security/visionverify/builder.py
  • src/synthorg/security/visionverify/errors.py
  • src/synthorg/tools/desktop/__init__.py
  • src/synthorg/security/config.py
  • src/synthorg/api/app.py
  • src/synthorg/core/enums.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/factory.py
  • src/synthorg/tools/desktop/driver/factory.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/security/visionverify/protocol.py
  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/tools/desktop/_settings.py
  • src/synthorg/security/visionverify/__init__.py
  • src/synthorg/tools/desktop/driver/vnc.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/security/visionverify/prompt.py
  • src/synthorg/security/visionverify/models.py
  • src/synthorg/tools/desktop/desktop_tool.py
  • src/synthorg/security/visionverify/gate.py
  • src/synthorg/security/visionverify/factory.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/security/visionverify/verifiers/__init__.py
  • src/synthorg/security/visionverify/config.py
  • src/synthorg/security/visionverify/verifiers/noop.py
  • src/synthorg/tools/desktop/driver/__init__.py
  • src/synthorg/observability/events/desktop.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • src/synthorg/security/rules/risk_classifier.py
  • src/synthorg/security/risk_scorer.py
  • src/synthorg/security/action_type_mapping.py
  • src/synthorg/tools/desktop/_screenshot_store.py
  • src/synthorg/tools/desktop/driver/xvfb.py
  • src/synthorg/security/timeout/risk_tier_classifier.py
  • src/synthorg/security/visionverify/verifiers/heuristic.py
  • src/synthorg/security/action_types.py
  • src/synthorg/security/visionverify/routing.py
  • src/synthorg/tools/desktop/driver/protocol.py
  • src/synthorg/observability/events/vision_verify.py
  • src/synthorg/security/visionverify/verifiers/_image.py
  • src/synthorg/providers/models.py
  • src/synthorg/providers/enums.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/tools/desktop/errors.py
  • src/synthorg/tools/permissions.py
  • src/synthorg/tools/desktop/driver/config.py
  • src/synthorg/providers/drivers/mappers.py
  • src/synthorg/settings/definitions/tools.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/security/visionverify/builder.py
  • src/synthorg/security/visionverify/errors.py
  • src/synthorg/tools/desktop/__init__.py
  • src/synthorg/security/config.py
  • src/synthorg/api/app.py
  • src/synthorg/core/enums.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/factory.py
  • src/synthorg/tools/desktop/driver/factory.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/security/visionverify/protocol.py
  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/tools/desktop/_settings.py
  • src/synthorg/security/visionverify/__init__.py
  • src/synthorg/tools/desktop/driver/vnc.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/security/visionverify/prompt.py
  • src/synthorg/security/visionverify/models.py
  • src/synthorg/tools/desktop/desktop_tool.py
  • src/synthorg/security/visionverify/gate.py
  • src/synthorg/security/visionverify/factory.py
web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

  • web/src/api/types/openapi.gen.ts
  • web/src/api/types/enum-values.gen.ts
web/src/api/types/**/*.gen.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Generated DTO types (MANDATORY): NEVER hand-edit web/src/api/types/*.gen.ts. Regenerate with uv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').

Files:

  • web/src/api/types/openapi.gen.ts
  • web/src/api/types/enum-values.gen.ts
web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

  • web/src/api/types/openapi.gen.ts
  • web/src/api/types/enum-values.gen.ts
web/src/**/*.{tsx,ts}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse web/src/components/ui/ design tokens only in web components; detail in web/CLAUDE.md.

Files:

  • web/src/api/types/openapi.gen.ts
  • web/src/api/types/enum-values.gen.ts
src/**/observability/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Telemetry: opt-in, off by default. Every event property must be in _ALLOWED_PROPERTIES. See telemetry.md.

Files:

  • src/synthorg/observability/events/desktop.py
  • src/synthorg/observability/events/vision_verify.py
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics in README + public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers. See data/README.md.

Files:

  • docs/design/verification-quality.md
  • docs/design/tools.md
**/*.{md,d2}

📄 CodeRabbit inference engine (CLAUDE.md)

D2 for architecture / nested containers, mermaid for flowcharts / sequence / pipelines. Markdown tables for tabular data. D2 theme 200 (Dark Mauve), D2 CLI pinned to v0.7.1 in CI.

Files:

  • docs/design/verification-quality.md
  • docs/design/tools.md
src/synthorg/providers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Provider calls go through BaseCompletionProvider (retry + rate limit); never implement retry in driver subclasses. Retryable: RateLimitError, Provider{Timeout,Connection,Internal}Error.

Files:

  • src/synthorg/providers/models.py
  • src/synthorg/providers/enums.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/drivers/mappers.py
src/synthorg/api/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Construction phase wires synchronous services; on_startup wires services needing connected persistence. Agent registry built BEFORE auto_wire_meetings; tunnel_provider wired unconditionally.

Files:

  • src/synthorg/api/app.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Conversational propose: POST /meta/chat/propose is opt-in (meta.chief_of_staff.propose_enabled); ChiefOfStaffProposer built by build_chief_of_staff_proposer (ENFORCED manifest entry) and 503s when ANY of provider, persistence, or work pipeline missing. Human content wrapped via wrap_untrusted(TAG_TASK_DATA, ...) (SEC-1).

Files:

  • src/synthorg/api/app.py
🧠 Learnings (7)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/unit/security/visionverify/__init__.py
  • src/synthorg/security/visionverify/verifiers/__init__.py
  • src/synthorg/security/visionverify/config.py
  • tests/integration/security/visionverify/__init__.py
  • src/synthorg/security/visionverify/verifiers/noop.py
  • src/synthorg/tools/desktop/driver/__init__.py
  • src/synthorg/observability/events/desktop.py
  • tests/unit/tools/desktop/__init__.py
  • tests/unit/tools/desktop/test_desktop_models_errors.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • src/synthorg/security/rules/risk_classifier.py
  • src/synthorg/security/risk_scorer.py
  • src/synthorg/security/action_type_mapping.py
  • src/synthorg/tools/desktop/_screenshot_store.py
  • tests/unit/core/test_enums.py
  • src/synthorg/tools/desktop/driver/xvfb.py
  • src/synthorg/security/timeout/risk_tier_classifier.py
  • src/synthorg/security/visionverify/verifiers/heuristic.py
  • src/synthorg/security/action_types.py
  • src/synthorg/security/visionverify/routing.py
  • src/synthorg/tools/desktop/driver/protocol.py
  • src/synthorg/observability/events/vision_verify.py
  • tests/unit/observability/test_events.py
  • src/synthorg/security/visionverify/verifiers/_image.py
  • src/synthorg/providers/models.py
  • src/synthorg/providers/enums.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/tools/desktop/errors.py
  • src/synthorg/tools/permissions.py
  • src/synthorg/tools/desktop/driver/config.py
  • src/synthorg/providers/drivers/mappers.py
  • src/synthorg/settings/definitions/tools.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/security/visionverify/builder.py
  • tests/unit/tools/desktop/test_desktop_args.py
  • src/synthorg/security/visionverify/errors.py
  • tests/unit/tools/desktop/test_desktop_executor.py
  • src/synthorg/tools/desktop/__init__.py
  • tests/unit/security/test_action_types.py
  • src/synthorg/security/config.py
  • src/synthorg/api/app.py
  • tests/unit/security/visionverify/test_models_routing.py
  • src/synthorg/core/enums.py
  • src/synthorg/tools/desktop/_args.py
  • tests/unit/security/visionverify/test_verifiers.py
  • src/synthorg/tools/factory.py
  • src/synthorg/tools/desktop/driver/factory.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/security/visionverify/protocol.py
  • tests/integration/security/visionverify/test_review_gate_vision.py
  • tests/unit/providers/test_multimodal.py
  • src/synthorg/tools/desktop/_models.py
  • scripts/check_no_bare_time_in_business_logic.py
  • src/synthorg/tools/desktop/_settings.py
  • src/synthorg/security/visionverify/__init__.py
  • tests/unit/tools/desktop/test_desktop_driver.py
  • src/synthorg/tools/desktop/driver/vnc.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/security/visionverify/prompt.py
  • tests/unit/security/visionverify/test_gate_llm.py
  • src/synthorg/security/visionverify/models.py
  • src/synthorg/tools/desktop/desktop_tool.py
  • src/synthorg/security/visionverify/gate.py
  • tests/unit/engine/test_review_gate_vision.py
  • tests/unit/tools/desktop/test_desktop_tool_dispatch.py
  • src/synthorg/security/visionverify/factory.py
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

  • docs/design/verification-quality.md
  • docs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

  • docs/design/verification-quality.md
  • docs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

  • docs/design/verification-quality.md
  • docs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

  • docs/design/verification-quality.md
  • docs/design/tools.md
📚 Learning: 2026-05-17T11:45:11.839Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In SynthOrg (Aureliolo/synthorg) pre-alpha, apply the strict no-backward-compat policy: any setting-key rename must be fully completed in the same change/PR with all repo callers updated, and you should not keep legacy aliases or compatibility fallbacks. When reviewing, do not flag a setting-key rename as a breaking upgrade hazard if the rename is repo-wide and fully implemented within the same PR.

Applied to files:

  • src/synthorg/settings/definitions/tools.py
📚 Learning: 2026-05-17T11:45:11.839Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In this repository, SynthOrg is pre-alpha and uses a strict no-backward-compat policy for setting-key renames. When reviewing code under src/synthorg/settings, do NOT flag a setting-key rename as an “upgrade-safety” issue if the rename is complete/atomic in the same PR: all callers/usages of the old key are updated simultaneously, and the PR does not keep any legacy aliases, compatibility fallbacks, or migration/rollback paths for the old key.

Applied to files:

  • src/synthorg/settings/definitions/tools.py
🪛 Checkov (3.2.529)
docker/desktop/Dockerfile

[low] 1-49: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)

🔇 Additional comments (79)
docs/design/tools.md (6)

12-12: LGTM!


30-30: LGTM!


78-78: LGTM!


203-224: LGTM!


486-486: LGTM!


501-501: LGTM!

docs/design/verification-quality.md (2)

144-171: LGTM!


185-185: LGTM!

src/synthorg/security/visionverify/__init__.py (1)

1-60: LGTM!

src/synthorg/security/visionverify/config.py (1)

1-25: LGTM!

src/synthorg/security/visionverify/errors.py (1)

1-85: LGTM!

src/synthorg/security/visionverify/models.py (1)

1-263: LGTM!

src/synthorg/security/visionverify/protocol.py (1)

1-66: LGTM!

src/synthorg/security/visionverify/routing.py (1)

1-69: LGTM!

src/synthorg/security/visionverify/prompt.py (1)

1-56: LGTM!

src/synthorg/security/visionverify/verifiers/__init__.py (1)

1-11: LGTM!

src/synthorg/security/visionverify/verifiers/_image.py (1)

1-79: LGTM!

src/synthorg/security/visionverify/verifiers/heuristic.py (1)

1-139: LGTM!

src/synthorg/security/visionverify/verifiers/noop.py (1)

1-45: LGTM!

src/synthorg/security/visionverify/verifiers/llm_vision.py (1)

1-224: LGTM!

Also applies to: 278-301

src/synthorg/security/visionverify/factory.py (1)

1-103: LGTM!

src/synthorg/security/visionverify/builder.py (1)

1-52: LGTM!

src/synthorg/security/visionverify/gate.py (1)

1-167: LGTM!

src/synthorg/observability/events/vision_verify.py (1)

1-23: LGTM!

src/synthorg/engine/review_gate.py (5)

48-51: LGTM!

Also applies to: 60-61


93-93: LGTM!

Also applies to: 98-108


219-219: LGTM!

Also applies to: 244-249


281-294: LGTM!


375-431: LGTM!

src/synthorg/api/app.py (1)

1202-1210: LGTM!

src/synthorg/tools/desktop/__init__.py (1)

1-57: LGTM!

src/synthorg/tools/desktop/_models.py (1)

1-61: LGTM!

src/synthorg/tools/desktop/driver/__init__.py (1)

1-33: LGTM!

src/synthorg/tools/desktop/driver/config.py (1)

1-106: LGTM!

src/synthorg/tools/desktop/driver/factory.py (1)

1-64: LGTM!

src/synthorg/tools/desktop/driver/protocol.py (1)

1-34: LGTM!

src/synthorg/tools/desktop/_constants.py (1)

44-44: ⚡ Quick win

tools.desktop_image_pin default uses mutable :latest (not digest-pinned)

  • src/synthorg/tools/desktop/_constants.py sets DESKTOP_IMAGE_PIN_DEFAULT to ghcr.io/aureliolo/synthorg-desktop:latest, and src/synthorg/settings/definitions/tools.py defaults tools.desktop_image_pin to the same value.
  • Repo search shows desktop_image_pin is only used to resolve DesktopSettings.image_pin and is not referenced by the desktop/sandbox Docker image selection logic (which instead uses tools.sandbox.docker.image), so the runtime security/reproducibility impact depends on wiring that isn’t present in this code.
Suggested fix
-DESKTOP_IMAGE_PIN_DEFAULT: Final[str] = "ghcr.io/aureliolo/synthorg-desktop:latest"
+DESKTOP_IMAGE_PIN_DEFAULT: Final[str] = (
+    "ghcr.io/aureliolo/synthorg-desktop@sha256:<immutable-digest>"
+)
src/synthorg/tools/desktop/driver/vnc.py (1)

1-52: LGTM!

src/synthorg/tools/desktop/driver/xvfb.py (1)

1-48: LGTM!

src/synthorg/tools/desktop/errors.py (1)

1-79: LGTM!

src/synthorg/tools/desktop/_screenshot_store.py (1)

1-63: LGTM!

src/synthorg/tools/desktop/_settings.py (1)

1-76: LGTM!

src/synthorg/tools/factory.py (1)

97-117: LGTM!

Also applies to: 458-459, 516-521, 635-641, 688-688, 742-744, 831-850, 883-884

src/synthorg/tools/permissions.py (1)

97-97: LGTM!

src/synthorg/settings/definitions/tools.py (1)

351-415: LGTM!

docker/desktop/Dockerfile (1)

1-49: LGTM!

src/synthorg/providers/enums.py (1)

25-45: LGTM!

src/synthorg/providers/models.py (1)

14-20: LGTM!

Also applies to: 200-229, 239-240, 255-258, 301-307, 312-324

src/synthorg/providers/drivers/mappers.py (1)

63-67: LGTM!

Also applies to: 71-92

src/synthorg/providers/cassette/redaction.py (1)

20-20: LGTM!

Also applies to: 148-148

src/synthorg/core/enums.py (1)

477-477: LGTM!

Also applies to: 585-590

src/synthorg/security/action_type_mapping.py (1)

31-31: LGTM!

src/synthorg/security/action_types.py (1)

37-37: LGTM!

src/synthorg/security/config.py (1)

9-9: LGTM!

Also applies to: 14-14, 350-398, 535-538

src/synthorg/security/risk_scorer.py (1)

230-239: LGTM!

src/synthorg/security/rules/risk_classifier.py (1)

49-54: LGTM!

src/synthorg/security/timeout/risk_tier_classifier.py (1)

70-75: LGTM!

src/synthorg/workers/runtime_builder.py (1)

70-70: LGTM!

Also applies to: 153-153, 230-236, 251-251, 538-542, 608-612, 649-695

src/synthorg/observability/events/desktop.py (1)

1-29: LGTM!

web/src/api/types/enum-values.gen.ts (1)

769-769: LGTM!

web/src/api/types/openapi.gen.ts (1)

12225-12225: LGTM!

pyproject.toml (1)

299-311: LGTM!

scripts/check_no_bare_time_in_business_logic.py (1)

75-78: LGTM!

tests/integration/security/visionverify/__init__.py (1)

1-1: LGTM!

tests/integration/security/visionverify/test_review_gate_vision.py (1)

1-166: LGTM!

tests/unit/engine/test_review_gate_vision.py (1)

1-191: LGTM!

tests/unit/security/visionverify/test_gate_llm.py (1)

1-202: LGTM!

tests/unit/security/visionverify/test_models_routing.py (1)

1-167: LGTM!

tests/unit/security/visionverify/test_verifiers.py (1)

1-209: LGTM!

tests/unit/tools/desktop/__init__.py (1)

1-1: LGTM!

tests/unit/tools/desktop/test_desktop_args.py (1)

1-70: LGTM!

tests/unit/tools/desktop/test_desktop_driver.py (1)

1-58: LGTM!

tests/unit/tools/desktop/test_desktop_executor.py (1)

1-77: LGTM!

tests/unit/tools/desktop/test_desktop_models_errors.py (1)

1-74: LGTM!

tests/unit/tools/desktop/test_desktop_tool_dispatch.py (1)

1-144: LGTM!

tests/unit/providers/test_multimodal.py (1)

1-231: LGTM!

tests/unit/security/visionverify/__init__.py (1)

1-1: LGTM!

tests/unit/observability/test_events.py (1)

343-345: LGTM!

tests/unit/core/test_enums.py (1)

120-121: LGTM!

Also applies to: 124-124

Comment thread src/synthorg/providers/cassette/redaction.py
Comment thread src/synthorg/security/visionverify/verifiers/llm_vision.py
Comment thread src/synthorg/tools/desktop/_args.py Outdated
Comment thread src/synthorg/tools/desktop/_executor.py Outdated
Comment thread src/synthorg/tools/desktop/_executor.py
Comment thread src/synthorg/tools/desktop/desktop_tool.py Outdated
Comment thread src/synthorg/tools/desktop/desktop_tool.py Outdated
Comment thread tests/unit/security/test_action_types.py
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 82.90025% with 204 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.98%. Comparing base (3d10da9) to head (b2e558a).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/tools/desktop/_executor.py 33.33% 137 Missing and 1 partial ⚠️
src/synthorg/tools/desktop/desktop_tool.py 79.66% 31 Missing and 5 partials ⚠️
src/synthorg/tools/desktop/_screenshot_store.py 64.51% 7 Missing and 4 partials ⚠️
...horg/security/visionverify/verifiers/llm_vision.py 92.68% 3 Missing and 3 partials ⚠️
src/synthorg/security/visionverify/models.py 96.47% 2 Missing and 1 partial ⚠️
src/synthorg/workers/runtime_builder.py 70.00% 3 Missing ⚠️
src/synthorg/api/app.py 0.00% 1 Missing and 1 partial ⚠️
src/synthorg/security/visionverify/builder.py 75.00% 1 Missing and 1 partial ⚠️
src/synthorg/security/visionverify/gate.py 94.87% 2 Missing ⚠️
...c/synthorg/security/visionverify/verifiers/noop.py 90.90% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2031      +/-   ##
==========================================
+ Coverage   84.95%   84.98%   +0.02%     
==========================================
  Files        2016     2062      +46     
  Lines      119582   121324    +1742     
  Branches    10084    10221     +137     
==========================================
+ Hits       101587   103103    +1516     
- Misses      15488    15679     +191     
- Partials     2507     2542      +35     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/tools/desktop/desktop_tool.py (1)

377-387: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

System errors should propagate through the sandbox boundary.

The except Exception clause at line 377 catches MemoryError and RecursionError, wrapping them into a DesktopSessionError. Per project convention, these system-level errors should be re-raised immediately so the runtime can handle resource exhaustion appropriately.

🛡️ Proposed fix to re-raise system errors
         try:
             result = await self._sandbox.execute(
                 command="python3",
                 args=(executor_container,),
                 env_overrides=env,
                 timeout=timeout,
                 owner_id=self._owner_id,
             )
+        except builtins.MemoryError, RecursionError:
+            raise
         except Exception as exc:
             logger.warning(
                 DESKTOP_EXECUTOR_FAILED,

Note: You'll need to add import builtins at the top of the file if not already present, or use the pattern without builtins. prefix if the exceptions are already in scope.

As per coding guidelines: "Async: asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/tools/desktop/desktop_tool.py` around lines 377 - 387, The
except Exception block that logs DESKTOP_EXECUTOR_FAILED and raises
DesktopSessionError is currently catching system-level errors (e.g.,
MemoryError, RecursionError); update that handler to immediately re-raise those
system errors before logging/wrapping: detect if exc is an instance of
(MemoryError, RecursionError) (either via
builtins.MemoryError/builtins.RecursionError or the direct names if already in
scope) and if so re-raise it, otherwise perform the existing logger.warning(...)
and raise DesktopSessionError(...) from exc; keep the existing symbols
DESKTOP_EXECUTOR_FAILED, logger, safe_error_description, and DesktopSessionError
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/synthorg/tools/desktop/desktop_tool.py`:
- Around line 377-387: The except Exception block that logs
DESKTOP_EXECUTOR_FAILED and raises DesktopSessionError is currently catching
system-level errors (e.g., MemoryError, RecursionError); update that handler to
immediately re-raise those system errors before logging/wrapping: detect if exc
is an instance of (MemoryError, RecursionError) (either via
builtins.MemoryError/builtins.RecursionError or the direct names if already in
scope) and if so re-raise it, otherwise perform the existing logger.warning(...)
and raise DesktopSessionError(...) from exc; keep the existing symbols
DESKTOP_EXECUTOR_FAILED, logger, safe_error_description, and DesktopSessionError
unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 786f4cb8-a860-4926-a2e5-830db0f9a72d

📥 Commits

Reviewing files that changed from the base of the PR and between e083a74 and a2882d5.

📒 Files selected for processing (12)
  • docker/desktop/Dockerfile
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/tools/desktop/desktop_tool.py
  • tests/unit/providers/cassette/test_redaction.py
  • tests/unit/security/test_action_types.py
  • tests/unit/tools/desktop/test_desktop_args.py
  • tests/unit/tools/desktop/test_desktop_executor.py
  • tests/unit/tools/desktop/test_desktop_tool_dispatch.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Deploy Preview
  • GitHub Check: Build Backend
  • GitHub Check: Build Fine-Tune (cpu, fine-tune-cpu)
  • GitHub Check: Build Fine-Tune (gpu, fine-tune-gpu)
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Lighthouse Site
  • GitHub Check: Test Conformance (SQLite)
  • GitHub Check: Dashboard Test
  • GitHub Check: Test Unit
  • GitHub Check: Test Integration
  • GitHub Check: Test E2E
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min.
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Test doubles: ladder in conventions.md section 12.1. FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags. Bare MagicMock at typed boundary (constructor / fn arg / annotated local / typed fixture return) blocked by scripts/check_mock_spec.py (zero-tolerance).
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Flaky: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large).

Files:

  • tests/unit/providers/cassette/test_redaction.py
  • tests/unit/security/test_action_types.py
  • tests/unit/tools/desktop/test_desktop_args.py
  • tests/unit/tools/desktop/test_desktop_executor.py
  • tests/unit/tools/desktop/test_desktop_tool_dispatch.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/providers/cassette/test_redaction.py
  • tests/unit/security/test_action_types.py
  • tests/unit/tools/desktop/test_desktop_args.py
  • tests/unit/tools/desktop/test_desktop_executor.py
  • tests/unit/tools/desktop/test_desktop_tool_dispatch.py
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL.

Files:

  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • src/synthorg/tools/desktop/desktop_tool.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env at the boot site. YAML is a company-template ingestion format, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value.
No hardcoded numeric values; numerics live in settings/definitions/. Allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants of the form NAME: int|float|Final|Final[int]|Final[float] = literal. Enforced by scripts/check_no_magic_numbers.py.
Comments WHY only; no reviewer citations / issue back-refs / migration framing. Enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py.
No from __future__ import annotations (3.14 has PEP 649). PEP 758 except: except A, B: no parens unless binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Errors: <Domain><Condition>Error from DomainError; never inherit Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py.
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore" boundaries); @computed_field for derived; NotBlankStr for identifiers.
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py.
Immutability: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries.
Async: asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError).
Clock seam: clock: Clock | None = None; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out st...

Files:

  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • src/synthorg/tools/desktop/desktop_tool.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • src/synthorg/tools/desktop/desktop_tool.py
src/synthorg/providers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Provider calls go through BaseCompletionProvider (retry + rate limit); never implement retry in driver subclasses. Retryable: RateLimitError, Provider{Timeout,Connection,Internal}Error.

Files:

  • src/synthorg/providers/cassette/redaction.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/unit/providers/cassette/test_redaction.py
  • tests/unit/security/test_action_types.py
  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/providers/cassette/redaction.py
  • tests/unit/tools/desktop/test_desktop_args.py
  • tests/unit/tools/desktop/test_desktop_executor.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • tests/unit/tools/desktop/test_desktop_tool_dispatch.py
  • src/synthorg/tools/desktop/desktop_tool.py
🔇 Additional comments (21)
docker/desktop/Dockerfile (1)

47-52: LGTM!

tests/unit/providers/cassette/test_redaction.py (1)

12-12: LGTM!

Also applies to: 59-79

tests/unit/security/test_action_types.py (1)

127-127: LGTM!

src/synthorg/tools/desktop/_models.py (1)

63-83: LGTM!

src/synthorg/providers/cassette/redaction.py (1)

91-97: LGTM!

Also applies to: 102-103

tests/unit/tools/desktop/test_desktop_args.py (1)

20-23: LGTM!

tests/unit/tools/desktop/test_desktop_executor.py (1)

67-82: LGTM!

src/synthorg/tools/desktop/_args.py (1)

61-69: LGTM!

Also applies to: 141-143

src/synthorg/tools/desktop/_executor.py (3)

1-2: LGTM!

Also applies to: 72-99, 102-117, 279-289


435-452: LGTM!


120-188: LGTM!

Also applies to: 190-231, 250-342, 345-406

src/synthorg/security/visionverify/verifiers/llm_vision.py (3)

100-122: LGTM!


125-191: LGTM!

Also applies to: 221-296


298-319: LGTM!

tests/unit/tools/desktop/test_desktop_tool_dispatch.py (2)

1-51: LGTM!

Also applies to: 53-67, 69-134


136-180: LGTM!

src/synthorg/tools/desktop/desktop_tool.py (5)

1-30: LGTM!

Also applies to: 33-67, 104-167


169-210: LGTM!


215-291: LGTM!


297-350: LGTM!


351-375: LGTM!

Also applies to: 389-419, 421-511

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 21, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 21, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docker/desktop/Dockerfile`:
- Line 27: Add a one-line rationale next to the existing hadolint suppression "#
hadolint ignore=DL3008" explaining why DL3008 (unpinned apt versions) is
acceptable here (e.g., tracking Debian stable base image and relying on upstream
security updates), so future maintainers understand the intentional decision;
update the comment near the same line that contains "# hadolint ignore=DL3008".

In `@src/synthorg/tools/desktop/_args.py`:
- Around line 70-74: The Field definition for launch_timeout_seconds uses a
hardcoded multiplier `20`; introduce a module-level annotated named constant
(e.g., LAUNCH_TIMEOUT_MULTIPLIER: int = 20 or MAX_LAUNCH_TIMEOUT_MULTIPLIER: int
= 20) near the top of src/synthorg/tools/desktop/_args.py and replace the
literal `20` in the ge/le expression with that constant (use
LAUNCH_TIMEOUT_SECONDS * LAUNCH_TIMEOUT_MULTIPLIER); ensure the constant is
annotated and named per repo conventions so the numeric value is not hardcoded
in the Field() call.

In `@src/synthorg/tools/desktop/_executor.py`:
- Around line 435-450: The top-level handler around the _dispatch(payload) call
should not swallow system-fatal errors like MemoryError and RecursionError;
change the exception handling so those two exceptions are re-raised and only
other Exception subclasses are converted to the generic JSON error envelope.
Locate the try/except that calls _dispatch in this module and implement the
conventional pattern (re-raise MemoryError and RecursionError, then handle other
Exception instances by writing the generic {"status":"error", "error_type":...,
"message":"Executor failed"} to stdout and returning 1).
- Around line 287-290: On timeout when window_seen is false the code only calls
proc.terminate() and raises, leaving the child unreaped; modify the timeout
branch to fully reap the child: after proc.terminate() call use
proc.wait(timeout=some_short_seconds) and if wait raises TimeoutExpired call
proc.kill() then proc.wait() to ensure the process is reaped before raising the
RuntimeError; update the block that currently references window_seen,
proc.terminate(), and _write_session_state(display=display, pid=proc.pid) so
that reaping happens before raising and before any subsequent
_write_session_state is reached.

In `@tests/unit/tools/desktop/test_desktop_driver.py`:
- Around line 54-57: Replace the generic exception assertion in
test_session_config_is_frozen with the concrete Pydantic frozen-model error:
update the pytest.raises(Exception) to
pytest.raises(pydantic_core.ValidationError) when mutating the session returned
by build_desktop_driver(DesktopDriverConfig()).session_config(), and add an
import for pydantic_core at the top of the test file so the specific exception
type is available.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: a57436eb-d707-43ad-9eac-ae197f051182

📥 Commits

Reviewing files that changed from the base of the PR and between dcc5879 and fb609e9.

📒 Files selected for processing (75)
  • docker/desktop/Dockerfile
  • docs/design/tools.md
  • docs/design/verification-quality.md
  • pyproject.toml
  • scripts/check_no_bare_time_in_business_logic.py
  • src/synthorg/api/app.py
  • src/synthorg/core/enums.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/observability/events/desktop.py
  • src/synthorg/observability/events/vision_verify.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/drivers/mappers.py
  • src/synthorg/providers/enums.py
  • src/synthorg/providers/models.py
  • src/synthorg/security/action_type_mapping.py
  • src/synthorg/security/action_types.py
  • src/synthorg/security/config.py
  • src/synthorg/security/risk_scorer.py
  • src/synthorg/security/rules/risk_classifier.py
  • src/synthorg/security/timeout/risk_tier_classifier.py
  • src/synthorg/security/visionverify/__init__.py
  • src/synthorg/security/visionverify/builder.py
  • src/synthorg/security/visionverify/config.py
  • src/synthorg/security/visionverify/errors.py
  • src/synthorg/security/visionverify/factory.py
  • src/synthorg/security/visionverify/gate.py
  • src/synthorg/security/visionverify/models.py
  • src/synthorg/security/visionverify/prompt.py
  • src/synthorg/security/visionverify/protocol.py
  • src/synthorg/security/visionverify/routing.py
  • src/synthorg/security/visionverify/verifiers/__init__.py
  • src/synthorg/security/visionverify/verifiers/_image.py
  • src/synthorg/security/visionverify/verifiers/heuristic.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • src/synthorg/security/visionverify/verifiers/noop.py
  • src/synthorg/settings/definitions/tools.py
  • src/synthorg/tools/desktop/__init__.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/tools/desktop/_screenshot_store.py
  • src/synthorg/tools/desktop/_settings.py
  • src/synthorg/tools/desktop/desktop_tool.py
  • src/synthorg/tools/desktop/driver/__init__.py
  • src/synthorg/tools/desktop/driver/config.py
  • src/synthorg/tools/desktop/driver/factory.py
  • src/synthorg/tools/desktop/driver/protocol.py
  • src/synthorg/tools/desktop/driver/vnc.py
  • src/synthorg/tools/desktop/driver/xvfb.py
  • src/synthorg/tools/desktop/errors.py
  • src/synthorg/tools/factory.py
  • src/synthorg/tools/permissions.py
  • src/synthorg/workers/runtime_builder.py
  • tests/conftest.py
  • tests/integration/security/visionverify/__init__.py
  • tests/integration/security/visionverify/test_review_gate_vision.py
  • tests/unit/core/test_enums.py
  • tests/unit/engine/test_review_gate_vision.py
  • tests/unit/observability/test_events.py
  • tests/unit/providers/cassette/test_redaction.py
  • tests/unit/providers/test_multimodal.py
  • tests/unit/security/test_action_types.py
  • tests/unit/security/visionverify/__init__.py
  • tests/unit/security/visionverify/test_gate_llm.py
  • tests/unit/security/visionverify/test_models_routing.py
  • tests/unit/security/visionverify/test_verifiers.py
  • tests/unit/tools/desktop/__init__.py
  • tests/unit/tools/desktop/test_desktop_args.py
  • tests/unit/tools/desktop/test_desktop_driver.py
  • tests/unit/tools/desktop/test_desktop_executor.py
  • tests/unit/tools/desktop/test_desktop_models_errors.py
  • tests/unit/tools/desktop/test_desktop_tool_dispatch.py
  • web/src/api/types/enum-values.gen.ts
  • web/src/api/types/openapi.gen.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Deploy Preview
  • GitHub Check: Build Backend
  • GitHub Check: Build Fine-Tune (cpu, fine-tune-cpu)
  • GitHub Check: Build Fine-Tune (gpu, fine-tune-gpu)
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Test E2E
  • GitHub Check: Dashboard Test
  • GitHub Check: Test Integration
  • GitHub Check: Test Unit
  • GitHub Check: Test Conformance (SQLite)
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Lighthouse Site
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (13)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min.
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Test doubles: ladder in conventions.md section 12.1. FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags. Bare MagicMock at typed boundary (constructor / fn arg / annotated local / typed fixture return) blocked by scripts/check_mock_spec.py (zero-tolerance).
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Flaky: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large).

Files:

  • tests/integration/security/visionverify/__init__.py
  • tests/unit/security/visionverify/__init__.py
  • tests/unit/tools/desktop/__init__.py
  • tests/conftest.py
  • tests/unit/tools/desktop/test_desktop_driver.py
  • tests/unit/tools/desktop/test_desktop_executor.py
  • tests/unit/providers/cassette/test_redaction.py
  • tests/unit/security/visionverify/test_gate_llm.py
  • tests/unit/security/test_action_types.py
  • tests/unit/core/test_enums.py
  • tests/unit/observability/test_events.py
  • tests/unit/tools/desktop/test_desktop_args.py
  • tests/unit/security/visionverify/test_verifiers.py
  • tests/unit/tools/desktop/test_desktop_tool_dispatch.py
  • tests/unit/tools/desktop/test_desktop_models_errors.py
  • tests/integration/security/visionverify/test_review_gate_vision.py
  • tests/unit/security/visionverify/test_models_routing.py
  • tests/unit/providers/test_multimodal.py
  • tests/unit/engine/test_review_gate_vision.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/integration/security/visionverify/__init__.py
  • tests/unit/security/visionverify/__init__.py
  • tests/unit/tools/desktop/__init__.py
  • tests/conftest.py
  • tests/unit/tools/desktop/test_desktop_driver.py
  • tests/unit/tools/desktop/test_desktop_executor.py
  • tests/unit/providers/cassette/test_redaction.py
  • tests/unit/security/visionverify/test_gate_llm.py
  • tests/unit/security/test_action_types.py
  • tests/unit/core/test_enums.py
  • tests/unit/observability/test_events.py
  • tests/unit/tools/desktop/test_desktop_args.py
  • tests/unit/security/visionverify/test_verifiers.py
  • tests/unit/tools/desktop/test_desktop_tool_dispatch.py
  • tests/unit/tools/desktop/test_desktop_models_errors.py
  • tests/integration/security/visionverify/test_review_gate_vision.py
  • tests/unit/security/visionverify/test_models_routing.py
  • tests/unit/providers/test_multimodal.py
  • tests/unit/engine/test_review_gate_vision.py
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL.

Files:

  • src/synthorg/tools/desktop/driver/__init__.py
  • src/synthorg/security/visionverify/verifiers/__init__.py
  • src/synthorg/security/action_types.py
  • src/synthorg/security/visionverify/builder.py
  • src/synthorg/api/app.py
  • src/synthorg/security/visionverify/verifiers/noop.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/observability/events/vision_verify.py
  • src/synthorg/tools/desktop/driver/protocol.py
  • src/synthorg/tools/desktop/driver/vnc.py
  • src/synthorg/tools/desktop/__init__.py
  • src/synthorg/core/enums.py
  • src/synthorg/security/action_type_mapping.py
  • src/synthorg/security/visionverify/config.py
  • src/synthorg/security/rules/risk_classifier.py
  • src/synthorg/security/timeout/risk_tier_classifier.py
  • src/synthorg/providers/enums.py
  • src/synthorg/tools/permissions.py
  • src/synthorg/tools/desktop/errors.py
  • src/synthorg/observability/events/desktop.py
  • src/synthorg/security/visionverify/__init__.py
  • src/synthorg/security/risk_scorer.py
  • src/synthorg/providers/drivers/mappers.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/driver/xvfb.py
  • src/synthorg/tools/desktop/_screenshot_store.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/models.py
  • src/synthorg/security/visionverify/protocol.py
  • src/synthorg/tools/desktop/driver/config.py
  • src/synthorg/settings/definitions/tools.py
  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/security/config.py
  • src/synthorg/tools/factory.py
  • src/synthorg/tools/desktop/_settings.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • src/synthorg/tools/desktop/driver/factory.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/security/visionverify/errors.py
  • src/synthorg/security/visionverify/models.py
  • src/synthorg/security/visionverify/prompt.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/security/visionverify/verifiers/heuristic.py
  • src/synthorg/security/visionverify/verifiers/_image.py
  • src/synthorg/security/visionverify/gate.py
  • src/synthorg/security/visionverify/factory.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/security/visionverify/routing.py
  • src/synthorg/tools/desktop/desktop_tool.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env at the boot site. YAML is a company-template ingestion format, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value.
No hardcoded numeric values; numerics live in settings/definitions/. Allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants of the form NAME: int|float|Final|Final[int]|Final[float] = literal. Enforced by scripts/check_no_magic_numbers.py.
Comments WHY only; no reviewer citations / issue back-refs / migration framing. Enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py.
No from __future__ import annotations (3.14 has PEP 649). PEP 758 except: except A, B: no parens unless binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Errors: <Domain><Condition>Error from DomainError; never inherit Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py.
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore" boundaries); @computed_field for derived; NotBlankStr for identifiers.
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py.
Immutability: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries.
Async: asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError).
Clock seam: clock: Clock | None = None; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out st...

Files:

  • src/synthorg/tools/desktop/driver/__init__.py
  • src/synthorg/security/visionverify/verifiers/__init__.py
  • src/synthorg/security/action_types.py
  • src/synthorg/security/visionverify/builder.py
  • src/synthorg/api/app.py
  • src/synthorg/security/visionverify/verifiers/noop.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/observability/events/vision_verify.py
  • src/synthorg/tools/desktop/driver/protocol.py
  • src/synthorg/tools/desktop/driver/vnc.py
  • src/synthorg/tools/desktop/__init__.py
  • src/synthorg/core/enums.py
  • src/synthorg/security/action_type_mapping.py
  • src/synthorg/security/visionverify/config.py
  • src/synthorg/security/rules/risk_classifier.py
  • src/synthorg/security/timeout/risk_tier_classifier.py
  • src/synthorg/providers/enums.py
  • src/synthorg/tools/permissions.py
  • src/synthorg/tools/desktop/errors.py
  • src/synthorg/observability/events/desktop.py
  • src/synthorg/security/visionverify/__init__.py
  • src/synthorg/security/risk_scorer.py
  • src/synthorg/providers/drivers/mappers.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/driver/xvfb.py
  • src/synthorg/tools/desktop/_screenshot_store.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/models.py
  • src/synthorg/security/visionverify/protocol.py
  • src/synthorg/tools/desktop/driver/config.py
  • src/synthorg/settings/definitions/tools.py
  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/security/config.py
  • src/synthorg/tools/factory.py
  • src/synthorg/tools/desktop/_settings.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • src/synthorg/tools/desktop/driver/factory.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/security/visionverify/errors.py
  • src/synthorg/security/visionverify/models.py
  • src/synthorg/security/visionverify/prompt.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/security/visionverify/verifiers/heuristic.py
  • src/synthorg/security/visionverify/verifiers/_image.py
  • src/synthorg/security/visionverify/gate.py
  • src/synthorg/security/visionverify/factory.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/security/visionverify/routing.py
  • src/synthorg/tools/desktop/desktop_tool.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/tools/desktop/driver/__init__.py
  • src/synthorg/security/visionverify/verifiers/__init__.py
  • src/synthorg/security/action_types.py
  • src/synthorg/security/visionverify/builder.py
  • src/synthorg/api/app.py
  • src/synthorg/security/visionverify/verifiers/noop.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/observability/events/vision_verify.py
  • src/synthorg/tools/desktop/driver/protocol.py
  • src/synthorg/tools/desktop/driver/vnc.py
  • src/synthorg/tools/desktop/__init__.py
  • src/synthorg/core/enums.py
  • src/synthorg/security/action_type_mapping.py
  • src/synthorg/security/visionverify/config.py
  • src/synthorg/security/rules/risk_classifier.py
  • src/synthorg/security/timeout/risk_tier_classifier.py
  • src/synthorg/providers/enums.py
  • src/synthorg/tools/permissions.py
  • src/synthorg/tools/desktop/errors.py
  • src/synthorg/observability/events/desktop.py
  • src/synthorg/security/visionverify/__init__.py
  • src/synthorg/security/risk_scorer.py
  • src/synthorg/providers/drivers/mappers.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/driver/xvfb.py
  • src/synthorg/tools/desktop/_screenshot_store.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/models.py
  • src/synthorg/security/visionverify/protocol.py
  • src/synthorg/tools/desktop/driver/config.py
  • src/synthorg/settings/definitions/tools.py
  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/security/config.py
  • src/synthorg/tools/factory.py
  • src/synthorg/tools/desktop/_settings.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • src/synthorg/tools/desktop/driver/factory.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/security/visionverify/errors.py
  • src/synthorg/security/visionverify/models.py
  • src/synthorg/security/visionverify/prompt.py
  • src/synthorg/workers/runtime_builder.py
  • src/synthorg/security/visionverify/verifiers/heuristic.py
  • src/synthorg/security/visionverify/verifiers/_image.py
  • src/synthorg/security/visionverify/gate.py
  • src/synthorg/security/visionverify/factory.py
  • src/synthorg/tools/desktop/_executor.py
  • src/synthorg/security/visionverify/routing.py
  • src/synthorg/tools/desktop/desktop_tool.py
web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

  • web/src/api/types/enum-values.gen.ts
  • web/src/api/types/openapi.gen.ts
web/src/api/types/**/*.gen.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Generated DTO types (MANDATORY): NEVER hand-edit web/src/api/types/*.gen.ts. Regenerate with uv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').

Files:

  • web/src/api/types/enum-values.gen.ts
  • web/src/api/types/openapi.gen.ts
web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

  • web/src/api/types/enum-values.gen.ts
  • web/src/api/types/openapi.gen.ts
web/src/**/*.{tsx,ts}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse web/src/components/ui/ design tokens only in web components; detail in web/CLAUDE.md.

Files:

  • web/src/api/types/enum-values.gen.ts
  • web/src/api/types/openapi.gen.ts
src/synthorg/api/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Construction phase wires synchronous services; on_startup wires services needing connected persistence. Agent registry built BEFORE auto_wire_meetings; tunnel_provider wired unconditionally.

Files:

  • src/synthorg/api/app.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Conversational propose: POST /meta/chat/propose is opt-in (meta.chief_of_staff.propose_enabled); ChiefOfStaffProposer built by build_chief_of_staff_proposer (ENFORCED manifest entry) and 503s when ANY of provider, persistence, or work pipeline missing. Human content wrapped via wrap_untrusted(TAG_TASK_DATA, ...) (SEC-1).

Files:

  • src/synthorg/api/app.py
src/**/observability/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Telemetry: opt-in, off by default. Every event property must be in _ALLOWED_PROPERTIES. See telemetry.md.

Files:

  • src/synthorg/observability/events/vision_verify.py
  • src/synthorg/observability/events/desktop.py
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics in README + public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers. See data/README.md.

Files:

  • docs/design/verification-quality.md
  • docs/design/tools.md
**/*.{md,d2}

📄 CodeRabbit inference engine (CLAUDE.md)

D2 for architecture / nested containers, mermaid for flowcharts / sequence / pipelines. Markdown tables for tabular data. D2 theme 200 (Dark Mauve), D2 CLI pinned to v0.7.1 in CI.

Files:

  • docs/design/verification-quality.md
  • docs/design/tools.md
src/synthorg/providers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Provider calls go through BaseCompletionProvider (retry + rate limit); never implement retry in driver subclasses. Retryable: RateLimitError, Provider{Timeout,Connection,Internal}Error.

Files:

  • src/synthorg/providers/enums.py
  • src/synthorg/providers/drivers/mappers.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/models.py
🧠 Learnings (7)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/integration/security/visionverify/__init__.py
  • tests/unit/security/visionverify/__init__.py
  • tests/unit/tools/desktop/__init__.py
  • src/synthorg/tools/desktop/driver/__init__.py
  • src/synthorg/security/visionverify/verifiers/__init__.py
  • tests/conftest.py
  • src/synthorg/security/action_types.py
  • src/synthorg/security/visionverify/builder.py
  • src/synthorg/api/app.py
  • src/synthorg/security/visionverify/verifiers/noop.py
  • src/synthorg/tools/desktop/_constants.py
  • src/synthorg/observability/events/vision_verify.py
  • src/synthorg/tools/desktop/driver/protocol.py
  • tests/unit/tools/desktop/test_desktop_driver.py
  • src/synthorg/tools/desktop/driver/vnc.py
  • src/synthorg/tools/desktop/__init__.py
  • src/synthorg/core/enums.py
  • src/synthorg/security/action_type_mapping.py
  • src/synthorg/security/visionverify/config.py
  • tests/unit/tools/desktop/test_desktop_executor.py
  • src/synthorg/security/rules/risk_classifier.py
  • src/synthorg/security/timeout/risk_tier_classifier.py
  • src/synthorg/providers/enums.py
  • tests/unit/providers/cassette/test_redaction.py
  • tests/unit/security/visionverify/test_gate_llm.py
  • tests/unit/security/test_action_types.py
  • src/synthorg/tools/permissions.py
  • src/synthorg/tools/desktop/errors.py
  • tests/unit/core/test_enums.py
  • src/synthorg/observability/events/desktop.py
  • src/synthorg/security/visionverify/__init__.py
  • src/synthorg/security/risk_scorer.py
  • src/synthorg/providers/drivers/mappers.py
  • tests/unit/observability/test_events.py
  • src/synthorg/tools/desktop/_args.py
  • src/synthorg/tools/desktop/driver/xvfb.py
  • src/synthorg/tools/desktop/_screenshot_store.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/models.py
  • src/synthorg/security/visionverify/protocol.py
  • scripts/check_no_bare_time_in_business_logic.py
  • src/synthorg/tools/desktop/driver/config.py
  • src/synthorg/settings/definitions/tools.py
  • src/synthorg/tools/desktop/_models.py
  • src/synthorg/security/config.py
  • src/synthorg/tools/factory.py
  • tests/unit/tools/desktop/test_desktop_args.py
  • src/synthorg/tools/desktop/_settings.py
  • src/synthorg/security/visionverify/verifiers/llm_vision.py
  • tests/unit/security/visionverify/test_verifiers.py
  • src/synthorg/tools/desktop/driver/factory.py
  • src/synthorg/engine/review_gate.py
  • src/synthorg/security/visionverify/errors.py
  • tests/unit/tools/desktop/test_desktop_tool_dispatch.py
  • src/synthorg/security/visionverify/models.py
  • src/synthorg/security/visionverify/prompt.py
  • src/synthorg/workers/runtime_builder.py
  • tests/unit/tools/desktop/test_desktop_models_errors.py
  • src/synthorg/security/visionverify/verifiers/heuristic.py
  • src/synthorg/security/visionverify/verifiers/_image.py
  • tests/integration/security/visionverify/test_review_gate_vision.py
  • src/synthorg/security/visionverify/gate.py
  • tests/unit/security/visionverify/test_models_routing.py
  • src/synthorg/security/visionverify/factory.py
  • src/synthorg/tools/desktop/_executor.py
  • tests/unit/providers/test_multimodal.py
  • tests/unit/engine/test_review_gate_vision.py
  • src/synthorg/security/visionverify/routing.py
  • src/synthorg/tools/desktop/desktop_tool.py
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

  • docs/design/verification-quality.md
  • docs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

  • docs/design/verification-quality.md
  • docs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

  • docs/design/verification-quality.md
  • docs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

  • docs/design/verification-quality.md
  • docs/design/tools.md
📚 Learning: 2026-05-17T11:45:11.839Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In SynthOrg (Aureliolo/synthorg) pre-alpha, apply the strict no-backward-compat policy: any setting-key rename must be fully completed in the same change/PR with all repo callers updated, and you should not keep legacy aliases or compatibility fallbacks. When reviewing, do not flag a setting-key rename as a breaking upgrade hazard if the rename is repo-wide and fully implemented within the same PR.

Applied to files:

  • src/synthorg/settings/definitions/tools.py
📚 Learning: 2026-05-17T11:45:11.839Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In this repository, SynthOrg is pre-alpha and uses a strict no-backward-compat policy for setting-key renames. When reviewing code under src/synthorg/settings, do NOT flag a setting-key rename as an “upgrade-safety” issue if the rename is complete/atomic in the same PR: all callers/usages of the old key are updated simultaneously, and the PR does not keep any legacy aliases, compatibility fallbacks, or migration/rollback paths for the old key.

Applied to files:

  • src/synthorg/settings/definitions/tools.py
🔇 Additional comments (78)
tests/integration/security/visionverify/__init__.py (1)

1-1: LGTM!

tests/unit/security/visionverify/__init__.py (1)

1-1: LGTM!

tests/unit/tools/desktop/__init__.py (1)

1-1: LGTM!

docker/desktop/Dockerfile (1)

1-56: LGTM!

src/synthorg/tools/desktop/driver/__init__.py (1)

1-32: LGTM!

src/synthorg/security/visionverify/verifiers/__init__.py (1)

1-11: LGTM!

tests/conftest.py (1)

242-251: LGTM!

Also applies to: 263-263, 270-280, 283-283, 287-287, 620-620, 623-632

src/synthorg/security/action_types.py (1)

37-38: LGTM!

src/synthorg/security/visionverify/builder.py (1)

1-51: LGTM!

pyproject.toml (1)

299-311: LGTM!

src/synthorg/api/app.py (1)

1202-1210: LGTM!

src/synthorg/security/visionverify/verifiers/noop.py (1)

1-44: LGTM!

src/synthorg/tools/desktop/_constants.py (1)

1-48: LGTM!

src/synthorg/observability/events/vision_verify.py (1)

1-22: LGTM!

docs/design/verification-quality.md (1)

144-185: LGTM!

src/synthorg/tools/desktop/driver/protocol.py (1)

1-33: LGTM!

web/src/api/types/openapi.gen.ts (1)

5963-5967: LGTM!

Also applies to: 6941-6942, 7249-7250, 11470-11470, 12234-12234, 12487-12487

src/synthorg/tools/desktop/driver/vnc.py (1)

1-52: LGTM!

src/synthorg/tools/desktop/__init__.py (1)

1-56: LGTM!

src/synthorg/core/enums.py (1)

477-478: LGTM!

Also applies to: 586-592

src/synthorg/security/action_type_mapping.py (1)

31-32: LGTM!

src/synthorg/security/visionverify/config.py (1)

1-25: LGTM!

tests/unit/tools/desktop/test_desktop_executor.py (1)

1-93: LGTM!

src/synthorg/security/rules/risk_classifier.py (1)

24-24: LGTM!

Also applies to: 50-55

src/synthorg/security/timeout/risk_tier_classifier.py (1)

45-45: LGTM!

Also applies to: 71-76

src/synthorg/providers/enums.py (1)

25-45: LGTM!

tests/unit/providers/cassette/test_redaction.py (1)

12-12: LGTM!

Also applies to: 59-80

tests/unit/security/visionverify/test_gate_llm.py (1)

1-202: LGTM!

tests/unit/security/test_action_types.py (1)

14-14: LGTM!

Also applies to: 31-32, 128-128

src/synthorg/tools/permissions.py (1)

97-98: LGTM!

src/synthorg/tools/desktop/errors.py (1)

1-79: LGTM!

tests/unit/core/test_enums.py (1)

120-125: LGTM!

src/synthorg/observability/events/desktop.py (1)

1-29: LGTM!

src/synthorg/security/visionverify/__init__.py (1)

1-61: LGTM!

src/synthorg/security/risk_scorer.py (1)

202-204: LGTM!

Also applies to: 233-242

src/synthorg/providers/drivers/mappers.py (1)

63-67: LGTM!

Also applies to: 71-92

tests/unit/observability/test_events.py (1)

343-347: LGTM!

src/synthorg/tools/desktop/driver/xvfb.py (1)

1-49: LGTM!

src/synthorg/tools/desktop/_screenshot_store.py (1)

1-64: LGTM!

docs/design/tools.md (1)

12-12: LGTM!

Also applies to: 30-31, 79-79, 204-224, 478-487, 502-503

src/synthorg/providers/cassette/redaction.py (1)

20-20: LGTM!

Also applies to: 83-97, 102-103, 118-119, 158-158

src/synthorg/providers/models.py (1)

14-20: LGTM!

Also applies to: 200-229, 255-258, 305-305, 312-324

src/synthorg/security/visionverify/protocol.py (1)

1-65: LGTM!

scripts/check_no_bare_time_in_business_logic.py (1)

75-78: LGTM!

src/synthorg/tools/desktop/driver/config.py (1)

1-106: LGTM!

src/synthorg/settings/definitions/tools.py (1)

352-415: LGTM!

src/synthorg/tools/desktop/_models.py (1)

1-84: LGTM!

src/synthorg/security/config.py (1)

350-360: LGTM!

Also applies to: 363-397, 535-538

src/synthorg/tools/factory.py (1)

64-65: LGTM!

Also applies to: 97-117, 458-460, 516-521, 635-641, 688-688, 742-744, 831-850, 883-884

tests/unit/tools/desktop/test_desktop_args.py (1)

1-74: LGTM!

src/synthorg/tools/desktop/_settings.py (1)

34-46: LGTM!

Also applies to: 48-76

src/synthorg/security/visionverify/verifiers/llm_vision.py (1)

103-123: LGTM!

Also applies to: 125-191, 193-220, 221-286, 288-319

tests/unit/security/visionverify/test_verifiers.py (1)

32-42: LGTM!

Also applies to: 45-60, 63-107, 109-182, 184-208

src/synthorg/tools/desktop/driver/factory.py (1)

23-25: LGTM!

Also applies to: 27-39, 42-48, 51-63

src/synthorg/engine/review_gate.py (1)

48-51: LGTM!

Also applies to: 60-62, 93-109, 219-219, 244-249, 281-294, 375-431

src/synthorg/security/visionverify/errors.py (1)

1-85: LGTM!

tests/unit/tools/desktop/test_desktop_tool_dispatch.py (1)

1-181: LGTM!

src/synthorg/security/visionverify/models.py (1)

1-263: LGTM!

src/synthorg/security/visionverify/prompt.py (1)

1-56: LGTM!

src/synthorg/workers/runtime_builder.py (1)

52-788: LGTM!

tests/unit/tools/desktop/test_desktop_models_errors.py (1)

1-74: LGTM!

src/synthorg/security/visionverify/verifiers/heuristic.py (1)

1-139: LGTM!

src/synthorg/security/visionverify/verifiers/_image.py (1)

1-79: LGTM!

tests/integration/security/visionverify/test_review_gate_vision.py (1)

1-166: LGTM!

src/synthorg/security/visionverify/gate.py (1)

50-167: LGTM!

tests/unit/security/visionverify/test_models_routing.py (1)

1-167: LGTM!

src/synthorg/security/visionverify/factory.py (1)

35-103: LGTM!

tests/unit/providers/test_multimodal.py (1)

1-231: LGTM!

tests/unit/engine/test_review_gate_vision.py (1)

1-191: LGTM!

src/synthorg/security/visionverify/routing.py (1)

1-69: LGTM!

src/synthorg/tools/desktop/desktop_tool.py (8)

1-52: LGTM!


53-102: LGTM!


104-167: LGTM!


169-211: LGTM!


217-293: LGTM!


295-351: LGTM!


353-459: LGTM!


462-515: LGTM!

Comment thread docker/desktop/Dockerfile
Comment thread src/synthorg/tools/desktop/_args.py
Comment thread src/synthorg/tools/desktop/_executor.py
Comment thread src/synthorg/tools/desktop/_executor.py
Comment thread tests/unit/tools/desktop/test_desktop_driver.py
@Aureliolo Aureliolo merged commit dfe8b42 into main May 21, 2026
83 checks passed
@Aureliolo Aureliolo deleted the feat/1993-vision-verifier branch May 21, 2026 09:33
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 21, 2026 09:33 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request May 21, 2026
…lish gap (#2034)

## Summary

- Pin `docker/desktop/Dockerfile` base image (`debian:trixie-slim`) and
the `# syntax=` directive by SHA-256 digest, clearing OSSF Scorecard
code-scanning alert #309 (Pinned-Dependencies). The debian digest was
verified against the live registry; the syntax digest matches the one
already used by every other Dockerfile in the repo.
- Document the desktop image in `docs/design/deployment.md` as a
**not-yet-published** runtime image (it is referenced by the desktop
tool's `desktop_image_pin` default but never built/signed by CI), and
generalise the `docs/security.md` Renovate-scope sentence (the
`dockerfile` manager + `docker:pinDigests` already covers every
Dockerfile, so the old "backend and sandbox" enumeration was
incomplete).
- File #2033 to track the genuine gap surfaced during review: the
desktop image (`ghcr.io/aureliolo/synthorg-desktop`) is referenced by
code shipped in #2031 but is never built, published, or cosign-signed by
CI (GHCR returns 404). The deployment doc points at #2033.

## Test plan

- Docs + Dockerfile-only change; no runtime code touched.
- Pre-commit (ruff/ruff-format, em-dash gate, hadolint) and pre-push
(mypy-affected, pytest-unit-affected, all convention gates, hadolint)
passed locally. hadolint validates the pinned Dockerfile.
- Scorecard alert #309 should clear once this lands on main and the next
code-scanning run completes.

## Review coverage

Pre-reviewed by 3 agents (infra-reviewer, docs-consistency,
comment-quality-rot). infra-reviewer and comment-quality-rot: 0
findings. docs-consistency surfaced the desktop-image documentation gap,
addressed here (and the deeper publish gap tracked in #2033).

Resolves Scorecard alert #309. Does **not** close #2033 (followup
tracking the missing build/publish wiring).
Aureliolo pushed a commit that referenced this pull request May 22, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Introduced conversational interface for direct clarify and propose
interactions.
- Cost management now includes forecast gates, hard ceilings, and Pareto
considerations.
- Added living documentation engine combining wiki and
retrieval-augmented generation features.
- Real intake engine is now operational for live data processing.
- Virtual desktop tool with vision verification gate available for
enhanced workspace control.

### What's new
- Per-project reproducible environments for consistent setups.
- Headless browser testing tool integrated for automated UI validation.
- Governed external API and data access tool introduced.
- Hardened external-remote git backend with sandbox mounts and
push-queue dispatching.
- Adversarial red-team gate subsystem for enhanced security testing.
- Self-extending toolkit to dynamically expand capabilities.
- Stakes-aware model routing enables prioritized processing.
- Task-board entry adapter connects live runtime with project
management.
- Persistent project workspace with pluggable git backend and
per-project push queues implemented.
- Knowledge and provenance substrate added to track data lineage.
- Scoring and data contract framework for golden-company benchmark
evaluations.

### Under the hood
- Desktop Dockerfile pinned by digest to improve build stability and
documented publishing gap fixed.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


##
[0.8.7](v0.8.6...v0.8.7)
(2026-05-22)


### Features

* conversational interface v1 - 1:1 clarify + propose
([#2019](#2019))
([216ef94](216ef94)),
closes [#1968](#1968)
* cost as a first-class dial (forecast gate, hard ceiling, Pareto)
([#2029](#2029))
([700a59e](700a59e)),
closes [#1982](#1982)
* **env:** reproducible per-project environments
([#2039](#2039))
([d2c0ef9](d2c0ef9)),
closes [#1994](#1994)
* **evals:** [#1980](#1980)
spine -- scoring + data contract for golden-company benchmark
([#2025](#2025))
([53108e8](53108e8))
* goal/objective entry adapter
([#1964](#1964))
([#2022](#2022))
([cb15c3c](cb15c3c))
* governed external API/data access tool
([#1991](#1991))
([#2032](#2032))
([e08b451](e08b451))
* harden external-remote git backend + per-project sandbox mount +
push-queue dispatch
([#2020](#2020))
([#2030](#2030))
([2fa2e1e](2fa2e1e))
* headless browser testing tool
([#1992](#1992))
([#2024](#2024))
([277b52a](277b52a))
* knowledge + provenance substrate
([#2036](#2036))
([48c897b](48c897b))
* living documentation engine (dual-purpose wiki + RAG namespace)
([#2028](#2028))
([3d10da9](3d10da9)),
closes [#1976](#1976)
* real intake engine online
([#2017](#2017))
([9d8eb34](9d8eb34))
* **redteam:** adversarial red-team gate subsystem
([#1986](#1986))
([#2026](#2026))
([d2207e9](d2207e9))
* self-extending toolkit
([#1995](#1995))
([#2035](#2035))
([5ffc545](5ffc545))
* stakes-aware model routing
([#1998](#1998))
([#2038](#2038))
([9b98312](9b98312))
* task-board entry adapter to live runtime
([#1963](#1963))
([#2023](#2023))
([a8f1eea](a8f1eea))
* virtual desktop tool and vision verifier gate
([#2031](#2031))
([dfe8b42](dfe8b42)),
closes [#1993](#1993)
* **workspace:** persistent project workspace + pluggable git backend +
per-project push queue
([#2021](#2021))
([ee58ee7](ee58ee7))


### Bug Fixes

* pin desktop Dockerfile by digest (Scorecard
[#309](#309)) + document
publish gap ([#2034](#2034))
([8fda188](8fda188))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Virtual desktop + vision verifier

2 participants