feat: self-extending toolkit (#1995) by Aureliolo · Pull Request #2035 · Aureliolo/synthorg

Aureliolo · 2026-05-21T18:19:54Z

Closes #1995.

Summary

The org now extends its own toolkit at runtime: it detects a recurring capability gap, an LLM authors a new sandboxed MCP tool, the candidate is benchmark-validated, and on pass plus human approval it is live-registered so a later task in the same run uses it. Tool creation is governed via a new TOOL_CREATION proposal altitude (HIGH risk, mandatory approval) and a new tool:create action type.

New package src/synthorg/meta/toolsmith/: models, config, errors, gap_store, dynamic_registry, script_handler, strategy (LLM blueprint authoring), validation_gate (per-tool acceptance brief + golden-scorecard delta), applier, service, factory, CodeModificationOverflowHandler for service-access gaps.
Frozen registries stay frozen: a LayeredToolRegistry plus LayeredHandlerMap read by MCPToolInvoker overlay a snapshot-swapping DynamicToolRegistry on top of the static DomainToolRegistry. build_args_model materialises a real frozen Pydantic args model from the blueprint's JSON Schema so the strict MCPToolDef alignment validator keeps holding.
Persistence: new DynamicToolRepository protocol composing StatefulRepository + FilteredQueryRepository (no bespoke methods per ADR-0001), SQLite + Postgres repos, yoyo revision 20260522000001_dynamic_tools, schema.sql additions on both backends.
Governance: ProposalAltitude.TOOL_CREATION plus ImprovementProposal.tool_changes, guard-chain wiring (scope, approval), ActionType.TOOL_CREATE with HIGH risk across the risk maps, autonomy presets gating tool:create behind human approval in supervised/semi.
Boot: gated _wire_toolsmith hook (tool_creation_enabled + provider + persistence) installs the dynamic tool layer over the layered surface; falls back cleanly when any dependency is absent. The production golden-scorecard provider is fail-closed until Golden-company benchmark (eval spine) #1980 exposes a runnable score-with-candidate API.

Test plan

110+ new unit tests under tests/unit/meta/toolsmith/ covering models, config, gap store, strategy, validation gate, applier (incl. repo-save-failure, retire-non-active, validation-persistence), dynamic registry, script handler, overflow handler, layered invoker.
14 conformance tests at tests/conformance/persistence/test_dynamic_tool_repository.py (SQLite + Postgres parity for the new repo, transition_if semantics, query/filter, unknown-key rejection).
Acceptance e2e at tests/integration/toolsmith/test_self_extension_e2e.py: drives the full gap to author to guard to approve to validate to register to invoke loop through the real MCPToolInvoker, with a regression-case showing a regressing tool is rejected.
Full pre-push gate clean: ruff, mypy strict, full unit suite, ghost-wiring manifest (9 ENFORCED symbols reachable), schema-drift (sqlite revisions + sqlite-vs-postgres parity), magic-numbers, domain-error-hierarchy, review-origin, persistence-boundary, frozen-extra-forbid, boundary-typed, dual-backend conformance parity, dependency-inversion.

Review coverage

Reviewed by 17 agents (code-reviewer, python-reviewer, conventions-enforcer, security-reviewer, persistence-reviewer, async-concurrency-reviewer, silent-failure-hunter, type-design-analyzer, docs-consistency, comment-quality-rot, logging-audit, resilience-audit, test-quality-reviewer, pr-test-analyzer, api-contract-drift, issue-resolution-verifier, ghost-wiring mini-pass).

20 distinct findings, 14 valid + 6 invalid. Triage in _audit/pre-pr-review/triage.md.
The "Python 2 except syntax error" flagged by 5 agents is a false positive: PEP 758 makes except A, B: valid in 3.14, CLAUDE.md mandates the bare form, python-reviewer + resilience-audit explicitly confirmed it, and the pre-push (ruff + full mypy + 30k unit tests importing these modules) passed.
Implemented in this branch: missing event constants, ProviderError propagation in strategy.author() / applier.apply() / code_modification._generate_and_parse (closes a real resilience hole flagged in the toolsmith delegation path), unknown-JSON-type rejection in build_args_model, silent brief-parse log, error= on the invoke-failed log, retire() race fix (unregister gated on transition success), bounded script_body, config validator (enabled requires non-empty allowlist), ACTIVE-blueprint requires validation record, expanded applier coverage tests, docs drift (action-type count, altitudes table, package structure, YAML defaults), and #1995 stripped from the migration SQL headers.
One outstanding triage item (V9): the schema-drift baseline reasons still embed (#1995) because the gate-suppression PreToolUse hook blocked the direct edit and the --update-baseline script preserves existing reasons. Cosmetic only (the review-origin gate passed and the baseline is data, not code); can be cleaned up in a follow-up by removing the 6 lines first, then re-running --update-baseline with a clean justification under fresh ALLOW_BASELINE_GROWTH=1 approval.
Documented dependency: production GoldenScorecardProvider (runnable golden-benchmark-with-candidate API) lands when Golden-company benchmark (eval spine) #1980 exposes it; until then require_golden_delta is fail-closed at apply (a missing provider rejects), not a stub.

github-actions · 2026-05-21T18:20:08Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

coderabbitai · 2026-05-21T18:20:10Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1244ae98-52e0-4dd5-a2d0-34f163cca8ae

📥 Commits

Reviewing files that changed from the base of the PR and between 38197ba and 7eaa0ed.

📒 Files selected for processing (1)

src/synthorg/meta/toolsmith/validation_gate.py

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: Build Backend
GitHub Check: Lighthouse Dashboard
GitHub Check: Lighthouse Site
GitHub Check: Dashboard Test
GitHub Check: Test Conformance (SQLite)
GitHub Check: Test Unit
GitHub Check: Test Integration
GitHub Check: Test E2E
GitHub Check: Build Web Assets (melange)
GitHub Check: CodSpeed Python benchmarks
GitHub Check: CodSpeed Web benchmarks
GitHub Check: Build Preview
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (3)

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site

Files:

src/synthorg/meta/toolsmith/validation_gate.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: No hardcoded numerics; numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
No from __future__ import annotations (3.14 has PEP 649); PEP 758 except: except A, B: no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors: Error from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries)
Use @computed_field for derived fields; use NotBlankStr for identifiers in Pydantic models
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Use from synthorg.observability import get_logger; variable always logger; never import logging or print() in app code
Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSI...

Files:

src/synthorg/meta/toolsmith/validation_gate.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/meta/toolsmith/validation_gate.py

{src/**/*.py,tests/**/*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/

Files:

src/synthorg/meta/toolsmith/validation_gate.py

🧠 Learnings (3)

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

src/synthorg/meta/toolsmith/validation_gate.py

📚 Learning: 2026-05-21T22:55:20.496Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

src/synthorg/meta/toolsmith/validation_gate.py

📚 Learning: 2026-05-21T22:55:09.289Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

src/synthorg/meta/toolsmith/validation_gate.py

🔇 Additional comments (1)

src/synthorg/meta/toolsmith/validation_gate.py (1)

138-152: LGTM!

Walkthrough

Adds a self-extending “toolsmith” subsystem: models, config, protocols, errors; gap store; LLM blueprint authoring; validation gate; dynamic registry and layered MCP dispatch; sandbox script handler; orchestration service, applier, overflow routing, and factory; Postgres/SQLite persistence with migrations and repositories; API startup wiring and dynamic tool layer installation; security enums/guards/risk/timeouts; observability events; docs updates; and comprehensive unit/integration/conformance tests.

gemini-code-assist · 2026-05-21T18:20:19Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a self-extending toolkit ('toolsmith') that allows the system to autonomously address recurring capability gaps by authoring, validating, and registering new MCP tools at runtime. This subsystem enhances the organization's capabilities while maintaining strict security and quality controls through a mandatory approval gate, benchmark-based validation, and a layered registry architecture.

Highlights

Self-Extending Toolkit: Introduced the 'toolsmith' subsystem, enabling the organization to extend its MCP tool surface at runtime via LLM-authored, sandboxed tools.
Governance & Validation: Implemented a new 'TOOL_CREATION' proposal altitude requiring mandatory human approval and a benchmark-based validation gate to ensure tool quality and prevent regressions.
Persistence & Registry: Added a 'DynamicToolRepository' for persistent blueprint storage and a 'LayeredToolRegistry' to allow dynamic tool dispatching without unfreezing the static domain registry.
Security & Autonomy: Integrated the 'tool:create' action type into existing autonomy presets and risk maps, ensuring high-risk tool creation is strictly governed.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

codspeed-hq · 2026-05-21T18:22:31Z

Merging this PR will not alter performance

✅ 54 untouched benchmarks

_{Comparing feat/1995-self-extending-toolkit (7eaa0ed) with main (48c897b)}

gemini-code-assist

Code Review

This pull request introduces the 'toolsmith' subsystem, a self-extending toolkit that enables the organization to dynamically author and register new MCP tools at runtime based on recurring capability gaps. The implementation includes a ring-buffered gap store, LLM-backed authoring in isolated sandboxes, and a two-stage validation gate integrated into the self-improvement framework at a new TOOL_CREATION altitude. Feedback focuses on optimizing performance by parallelizing tool application and gap processing via asyncio.TaskGroup, correcting a docstring type mismatch in the orchestration service, and refining authoring logic to prevent duplicate tools by including currently registered dynamic tools in the capability context.

gemini-code-assist · 2026-05-21T18:26:05Z

@@ -0,0 +1,177 @@
+"""Applier that turns an approved tool-creation proposal into a live tool.


Add import asyncio to support parallel processing of tool changes via asyncio.TaskGroup.

"""Applier that turns an approved tool-creation proposal into a live tool.\n\nimport asyncio\n"

gemini-code-assist · 2026-05-21T18:26:05Z

+        applied = 0
+        failures: list[str] = []
+        for blueprint in proposal.tool_changes:
+            try:
+                ok = await self._apply_one(blueprint)
+            except ProviderError:
+                raise
+            except MemoryError, RecursionError:
+                raise
+            except Exception as exc:
+                logger.warning(
+                    TOOLSMITH_APPLY_FAILED,
+                    tool_name=blueprint.name,
+                    error_type=type(exc).__name__,
+                    error=safe_error_description(exc),
+                )
+                failures.append(f"{blueprint.name}: {type(exc).__name__}")
+                continue
+            if ok:
+                applied += 1
+            else:
+                failures.append(f"{blueprint.name}: rejected by benchmark gate")
+        if failures:
+            return ApplyResult(
+                success=False,
+                error_message=NotBlankStr("; ".join(failures)),
+                changes_applied=applied,
+            )
+        return ApplyResult(success=True, changes_applied=applied)


Tool changes are currently processed sequentially. Since _apply_one involves benchmark validation (which can be slow due to scorecard runs), consider using asyncio.TaskGroup to process them in parallel, adhering to the project's fan-out convention.

applied = 0\n failures: list[str] = []\n\n async with asyncio.TaskGroup() as tg:\n tasks = [tg.create_task(self._apply_one(blueprint)) for blueprint in proposal.tool_changes]\n\n for i, task in enumerate(tasks):\n blueprint = proposal.tool_changes[i]\n try:\n ok = task.result()\n if ok:\n applied += 1\n else:\n failures.append(f"{blueprint.name}: rejected by benchmark gate")\n except ProviderError:\n raise\n except MemoryError, RecursionError:\n raise\n except Exception as exc:\n logger.warning(\n TOOLSMITH_APPLY_FAILED,\n tool_name=blueprint.name,\n error_type=type(exc).__name__,\n error=safe_error_description(exc),\n )\n failures.append(f"{blueprint.name}: {type(exc).__name__}")\n\n if failures:\n return ApplyResult(\n success=False,\n error_message=NotBlankStr("; ".join(failures)),\n changes_applied=applied,\n )\n return ApplyResult(success=True, changes_applied=applied)

gemini-code-assist · 2026-05-21T18:26:06Z

@@ -0,0 +1,221 @@
+"""Toolsmith orchestration service.


Add import asyncio to support parallel processing of capability gaps via asyncio.TaskGroup.

Suggested change

"""Toolsmith orchestration service.

"""Toolsmith orchestration service.\n\nimport asyncio\n"

gemini-code-assist · 2026-05-21T18:26:06Z

+        existing_capabilities: Callable returning the current capability
+            surface, so the generator avoids duplicates.


The docstring incorrectly identifies existing_capabilities as a Callable. The implementation and factory wiring use a tuple[NotBlankStr, ...]. Please update the docstring to match the implementation.

Suggested change

existing_capabilities: Callable returning the current capability

surface, so the generator avoids duplicates.

existing_capabilities: Static capability surface (dedup hint),\n so the generator avoids duplicates.

gemini-code-assist · 2026-05-21T18:26:06Z

+        proposals: list[ImprovementProposal] = []
+        for gap in gaps:
+            proposals.extend(await self._handle_gap(gap))


Recurring capability gaps are processed sequentially. Since _handle_gap involves an LLM call via the generator, this is a performance bottleneck. Use asyncio.TaskGroup to process gaps in parallel per project conventions.

proposals: list[ImprovementProposal] = []\n async with asyncio.TaskGroup() as tg:\n tasks = [tg.create_task(self._handle_gap(gap)) for gap in gaps]\n\n for task in tasks:\n proposals.extend(task.result())

gemini-code-assist · 2026-05-21T18:26:06Z

+            blueprint = await self._generator.author(
+                gap, existing_capabilities=self._existing_capabilities
+            )


The existing_capabilities passed to the generator only includes static capabilities known at boot time. This can lead to the LLM authoring tools that duplicate dynamic tools already registered in the same run. Consider combining self._existing_capabilities with the names currently in the dynamic registry.

try:\n # Combine static capabilities with currently registered dynamic ones\n current_caps = list(self._existing_capabilities)\n current_caps.extend(self._applier._registry.names())\n\n blueprint = await self._generator.author(\n gap, existing_capabilities=tuple(current_caps)\n )

coderabbitai

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/unit/meta/toolsmith/test_dynamic_registry.py (1)

170-227: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add explicit collision-precedence tests for layered lookup.

Line 171 and Line 209 validate non-overlapping keys only. Given the dynamic layer is intended to overlay the static layer, add direct collision tests so a regression in precedence is caught.

✅ Suggested test additions

 class TestLayeredToolRegistry:
+    async def test_dynamic_overrides_static_on_name_collision(self) -> None:
+        static_def = blueprint_to_mcp_def(
+            _blueprint(name="synthorg_textkit_slugify", capability="static:tool")
+        )
+        static = _FakeStatic({static_def.name: static_def})
+        dynamic = DynamicToolRegistry(handler_factory=_handler_factory)
+        await dynamic.register(
+            _blueprint(name="synthorg_textkit_slugify", capability="dynamic:tool")
+        )
+        layered = LayeredToolRegistry(static, dynamic)
+        assert layered.get("synthorg_textkit_slugify").capability == "dynamic:tool"
+
 class TestLayeredHandlerMap:
+    async def test_dynamic_handler_overrides_static_on_key_collision(self) -> None:
+        async def static_handler(
+            *, app_state: Any, arguments: dict[str, Any], actor: Any = None
+        ) -> str:
+            del app_state, arguments, actor
+            return "static"
+
+        dynamic = DynamicToolRegistry(handler_factory=_handler_factory)
+        await dynamic.register(_blueprint(name="synthorg_textkit_slugify"))
+        mapping = LayeredHandlerMap(
+            {"synthorg_textkit_slugify": static_handler},
+            dynamic,
+        )
+        assert mapping["synthorg_textkit_slugify"] is _fake_handler

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/meta/toolsmith/test_dynamic_registry.py` around lines 170 - 227,
Add tests that assert collision precedence when a key exists in both layers: for
LayeredToolRegistry create a static ToolDef (e.g., via
blueprint_to_mcp_def/_FakeStatic) with the same name that DynamicToolRegistry
registers and assert layered.get("shared_name") returns the dynamic definition
(not the static one); for LayeredHandlerMap register a static handler under
"shared_key" and also ensure DynamicToolRegistry provides a handler for
"shared_key", then assert mapping["shared_key"] is the dynamic handler,
"shared_key" in mapping, and that other existing behaviors (KeyError for
missing, membership, len, iteration) still hold. Ensure tests reference
LayeredToolRegistry.get and LayeredHandlerMap (mapping["..."]) so collisions are
explicitly checked.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/api/app.py`:
- Around line 1532-1538: Move the call to
install_dynamic_tool_layer(runtime.dynamic_registry) so it runs before
app_state.set_toolsmith_service(runtime.service); specifically, invoke
install_dynamic_tool_layer and let it complete successfully (propagate or handle
exceptions) before calling app_state.set_toolsmith_service to avoid persisting a
partial wiring on failure. Ensure install_dynamic_tool_layer errors prevent
calling app_state.set_toolsmith_service and that the existing
logger.info(API_APP_STARTUP, service="toolsmith", note="wired") only runs after
both installation and set_toolsmith_service succeed.
- Around line 355-361: Replace the use of Path.cwd() when building sandbox
backends so toolsmith sandboxes are not bound to the process CWD; call
build_sandbox_backends with the app's resolved workspace root (the workspace
seam/root that the app exposes) instead of Path.cwd(), keeping
SandboxingConfig() and the _resolve_sandbox function unchanged except for that
argument; ensure the passed value is the same workspace root used elsewhere in
the app (e.g., the resolved workspace variable or property the app already
computes) so build_sandbox_backends(config=sandboxing,
workspace=the_app_workspace_root).

In `@src/synthorg/meta/strategies/code_modification.py`:
- Around line 179-180: The ProviderError catch currently re-raises without
logging; before re-raising in the except ProviderError block inside
src/synthorg/meta/strategies/code_modification.py, emit a warning/error log
(including the exception details) and/or raise an appropriate observability
event (e.g., call the module's logger.warn/error or the service's event emitter)
with contextual info about the provider and operation, then re-raise the
original ProviderError to preserve stack trace.

In `@src/synthorg/meta/toolsmith/applier.py`:
- Around line 149-151: The current flow saves the blueprint state via
self._repo.save(active) before doing runtime registration with
self._registry.register(active), risking durable ACTIVE state if registration
fails; change the order to call self._registry.register(active) first, then call
self._repo.save(active), and if the save raises an exception roll back the
runtime registration (e.g., call a deregister/unregister method on
self._registry or otherwise undo registration for the same active) to avoid
split-brain; update the success logging (logger.info(TOOLSMITH_APPLY_COMPLETED,
tool_name=blueprint.name)) to run only after both register and save complete
successfully.

In `@src/synthorg/meta/toolsmith/dynamic_registry.py`:
- Around line 119-123: build_args_model currently returns create_model(...)
directly which can raise raw exceptions; catch exceptions from create_model
(using the existing create_model call with _model_class_name(blueprint.name) and
ConfigDict(...)) and re-raise them as ToolRegistrationError while preserving the
original exception message/trace (e.g., include the caught exception as context
or message). Wrap only the create_model invocation in try/except, reference
create_model, _model_class_name, blueprint.name and ConfigDict in the error
message, and raise ToolRegistrationError(...) so callers receive the documented
domain-specific error.

In `@src/synthorg/meta/toolsmith/models.py`:
- Around line 227-247: Update _validate_state_timestamps to require a non-None
validation record when the blueprint reaches VALIDATED (in addition to ACTIVE):
inside the method (ToolBlueprint._validate_state_timestamps) add a check for
self.state in {ToolBlueprintState.VALIDATED, ToolBlueprintState.ACTIVE} and
raise ValueError("validation result required in state 'validated'") (or a
combined message) if self.validation is None; keep existing
validated_at/activated_at/retired_at checks and _assert_monotonic() intact.
- Around line 193-205: The validator _validate_name_capability_action currently
checks formats but not that synthorg_{domain}_{action} corresponds to capability
"{domain}:{action}"; update it to parse domain and action from self.name (using
the capturing groups from _TOOL_NAME_RE) and from self.capability (using
_CAPABILITY_RE or splitting on ':'), then compare the domain and action parts
and raise a ValueError with a clear message if they differ; keep the existing
format checks for _TOOL_NAME_RE, _CAPABILITY_RE and _ACTION_TYPE_RE and return
self when all checks pass.

In `@src/synthorg/meta/toolsmith/script_handler.py`:
- Around line 72-84: The except Exception block in the tool invocation handler
currently swallows system-critical exceptions; modify the handler around the
except Exception as exc (and related variables like logger.warning, wrapped,
DynamicToolScriptError, ToolsmithError, err) to re-raise system exceptions
instead of wrapping them: immediately check if exc is an instance of MemoryError
or RecursionError (and also propagate KeyboardInterrupt/SystemExit) and if so
re-raise it, otherwise continue with the existing logging/wrapping/return
behavior that creates DynamicToolScriptError or preserves ToolsmithError and
returns err(..., domain_code="dynamic_tool_failed").

In `@src/synthorg/meta/toolsmith/service.py`:
- Around line 216-217: Replace the hardcoded 0.5 literal used for the proposal
confidence with a module-level annotated constant: add a top-of-file constant
like PROPOSAL_CONFIDENCE: float = 0.5 and then change the call site that
currently sets confidence=0.5 (adjacent to
source_rule=NotBlankStr("capability_gap")) to use
confidence=PROPOSAL_CONFIDENCE; keep the constant name exported/visible in the
module so linters/no-magic-number guards accept it.

In `@src/synthorg/meta/toolsmith/validation_gate.py`:
- Around line 49-56: _PROBE_VALUES currently contains shared mutable singletons
for "array" and "object" which can be mutated across uses; change the mapping so
it doesn't store [] or {} directly—either store factory callables (e.g.,
"array": list, "object": dict) or immutable placeholders and ensure any code
that constructs probes (the code that reads _PROBE_VALUES to produce probe
instances) calls the factory or creates a fresh copy (e.g., list() / dict() or
copy.deepcopy) each time; update both occurrences of _PROBE_VALUES and any code
paths that consume it to instantiate new containers rather than reusing the same
list/dict.
- Around line 109-120: After parsing JSON in the try block, ensure the parsed
envelope is a mapping before calling envelope.get: check isinstance(envelope,
dict) (or collections.abc.Mapping) and if it's not, log a brief-parse failure
including the actual type (similar to the existing logger.warning usage with
tool_name and error_type) and return False, _BRIEF_FAIL_SCORE; otherwise
continue to compute passed = envelope.get("status") == "ok" and return the
pass/fail score. This change touches the envelope variable and the
envelope.get(...) usage in the validation logic.

In
`@src/synthorg/persistence/postgres/revisions/20260522000001_dynamic_tools.sql`:
- Line 19: The JSONB columns parameters_schema and validation must be
constrained to object payloads; update the migration
(20260522000001_dynamic_tools.sql) to add CHECK constraints on both columns
(e.g., ensure jsonb_typeof(parameters_schema) = 'object' and
jsonb_typeof(validation) = 'object') so scalar/array values are rejected, and
apply the same CHECK constraints in the canonical schema file
(src/synthorg/persistence/postgres/schema.sql) to keep drift checks passing.
- Around line 25-30: Add a table-level CHECK constraint on the dynamic_tools
table (name it e.g., dynamic_tools_lifecycle_check) that enforces lifecycle
coherence between state and timestamps: for 'pending' none of
validated_at/activated_at/retired_at exist, for 'validated' validated_at IS NOT
NULL and activated_at/retired_at IS NULL, for 'active' validated_at AND
activated_at are NOT NULL and retired_at IS NULL, and for 'retired'
validated_at, activated_at, and retired_at are all NOT NULL; add this CHECK to
the CREATE TABLE statement in the revision SQL
(20260522000001_dynamic_tools.sql) and mirror the same constraint in the
canonical schema file (schema.sql).

In `@src/synthorg/persistence/sqlite/revisions/20260522000001_dynamic_tools.sql`:
- Line 20: Add write-time JSON validation for the TEXT columns by adding JSON
constraints to the CREATE/ALTER statements: ensure the parameters_schema and
validation columns include CHECK(json_valid(parameters_schema)) and
CHECK(json_valid(validation)) (and optionally CHECK(json_type(parameters_schema)
= 'object') / json_type(validation) = 'object' if you expect JSON objects) so
invalid or non-JSON text cannot persist; apply the same constraint edits in the
migration 20260522000001_dynamic_tools.sql and mirror them in schema.sql for the
same table/columns (parameters_schema, validation).
- Around line 27-39: Add a CHECK constraint to the table that enforces valid
state/timestamp combinations: for column state and timestamp columns
validated_at, activated_at, retired_at ensure pending => all three are NULL,
validated => validated_at IS NOT NULL and activated_at and retired_at are NULL,
active => validated_at IS NOT NULL and activated_at IS NOT NULL and retired_at
IS NULL, retired => retired_at IS NOT NULL; add this CHECK alongside the
existing state and timestamp checks in the migration (the same constraint should
be added to src/synthorg/persistence/sqlite/schema.sql as well), referencing the
exact column names state, validated_at, activated_at, retired_at so the DB
prevents contradictory lifecycle rows.

---

Outside diff comments:
In `@tests/unit/meta/toolsmith/test_dynamic_registry.py`:
- Around line 170-227: Add tests that assert collision precedence when a key
exists in both layers: for LayeredToolRegistry create a static ToolDef (e.g.,
via blueprint_to_mcp_def/_FakeStatic) with the same name that
DynamicToolRegistry registers and assert layered.get("shared_name") returns the
dynamic definition (not the static one); for LayeredHandlerMap register a static
handler under "shared_key" and also ensure DynamicToolRegistry provides a
handler for "shared_key", then assert mapping["shared_key"] is the dynamic
handler, "shared_key" in mapping, and that other existing behaviors (KeyError
for missing, membership, len, iteration) still hold. Ensure tests reference
LayeredToolRegistry.get and LayeredHandlerMap (mapping["..."]) so collisions are
explicitly checked.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: e25f9d4b-c78a-4aa0-950b-3edd7c9ce454

📥 Commits

Reviewing files that changed from the base of the PR and between 8fda188 and 996fd6e.

📒 Files selected for processing (64)

docs/design/self-improvement.md
docs/design/tools.md
scripts/_ghost_wiring_manifest.txt
scripts/schema_drift_baseline.txt
src/synthorg/api/app.py
src/synthorg/api/state.py
src/synthorg/core/enums.py
src/synthorg/meta/config.py
src/synthorg/meta/guards/approval_gate.py
src/synthorg/meta/guards/scope_check.py
src/synthorg/meta/mcp/invoker.py
src/synthorg/meta/mcp/registry.py
src/synthorg/meta/mcp/server.py
src/synthorg/meta/models.py
src/synthorg/meta/signals/service.py
src/synthorg/meta/strategies/code_modification.py
src/synthorg/meta/toolsmith/__init__.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/config.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/meta/toolsmith/errors.py
src/synthorg/meta/toolsmith/factory.py
src/synthorg/meta/toolsmith/gap_store.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/toolsmith/overflow.py
src/synthorg/meta/toolsmith/protocol.py
src/synthorg/meta/toolsmith/script_handler.py
src/synthorg/meta/toolsmith/service.py
src/synthorg/meta/toolsmith/strategy.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/observability/events/persistence.py
src/synthorg/observability/events/toolsmith.py
src/synthorg/persistence/postgres/revisions/20260522000001_dynamic_tools.sql
src/synthorg/persistence/postgres/schema.sql
src/synthorg/persistence/postgres/tool_blueprint_repo.py
src/synthorg/persistence/sqlite/revisions/20260522000001_dynamic_tools.sql
src/synthorg/persistence/sqlite/schema.sql
src/synthorg/persistence/sqlite/tool_blueprint_repo.py
src/synthorg/persistence/tool_blueprint_protocol.py
src/synthorg/security/action_types.py
src/synthorg/security/autonomy/models.py
src/synthorg/security/risk_scorer.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/security/timeout/risk_tier_classifier.py
tests/conformance/persistence/test_dynamic_tool_repository.py
tests/integration/toolsmith/__init__.py
tests/integration/toolsmith/test_self_extension_e2e.py
tests/unit/core/test_enums.py
tests/unit/meta/toolsmith/__init__.py
tests/unit/meta/toolsmith/test_applier.py
tests/unit/meta/toolsmith/test_config.py
tests/unit/meta/toolsmith/test_dynamic_registry.py
tests/unit/meta/toolsmith/test_gap_store.py
tests/unit/meta/toolsmith/test_invoker_layered.py
tests/unit/meta/toolsmith/test_models.py
tests/unit/meta/toolsmith/test_overflow.py
tests/unit/meta/toolsmith/test_script_handler.py
tests/unit/meta/toolsmith/test_strategy.py
tests/unit/meta/toolsmith/test_validation_gate.py
tests/unit/observability/test_events.py
tests/unit/persistence/test_factory.py
tests/unit/security/test_action_types.py
web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: Build Backend
GitHub Check: Build Web Assets (melange)
GitHub Check: Test E2E
GitHub Check: Dashboard Test
GitHub Check: Test Conformance (SQLite)
GitHub Check: Test Integration
GitHub Check: Test Unit
GitHub Check: Lighthouse Site
GitHub Check: Lighthouse Dashboard
GitHub Check: CodSpeed Web benchmarks
GitHub Check: CodSpeed Python benchmarks
GitHub Check: Build Preview
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (12)

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min.
Windows tests: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Test doubles ladder: FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags. Bare MagicMock at typed boundary (constructor / fn arg / annotated local / typed fixture return) is blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline).
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Dual-backend conformance: tests under tests/conformance/persistence/ consume backend fixture (SQLite + Postgres). Enforced by check_dual_backend_test_parity.py.
Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large).

Files:

tests/integration/toolsmith/__init__.py
tests/unit/security/test_action_types.py
tests/unit/meta/toolsmith/__init__.py
tests/unit/observability/test_events.py
tests/unit/meta/toolsmith/test_script_handler.py
tests/unit/meta/toolsmith/test_applier.py
tests/unit/meta/toolsmith/test_invoker_layered.py
tests/unit/persistence/test_factory.py
tests/unit/meta/toolsmith/test_models.py
tests/unit/core/test_enums.py
tests/unit/meta/toolsmith/test_gap_store.py
tests/unit/meta/toolsmith/test_config.py
tests/unit/meta/toolsmith/test_overflow.py
tests/unit/meta/toolsmith/test_dynamic_registry.py
tests/unit/meta/toolsmith/test_validation_gate.py
tests/integration/toolsmith/test_self_extension_e2e.py
tests/conformance/persistence/test_dynamic_tool_repository.py
tests/unit/meta/toolsmith/test_strategy.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/integration/toolsmith/__init__.py
tests/unit/security/test_action_types.py
tests/unit/meta/toolsmith/__init__.py
tests/unit/observability/test_events.py
tests/unit/meta/toolsmith/test_script_handler.py
tests/unit/meta/toolsmith/test_applier.py
tests/unit/meta/toolsmith/test_invoker_layered.py
tests/unit/persistence/test_factory.py
tests/unit/meta/toolsmith/test_models.py
tests/unit/core/test_enums.py
tests/unit/meta/toolsmith/test_gap_store.py
tests/unit/meta/toolsmith/test_config.py
tests/unit/meta/toolsmith/test_overflow.py
tests/unit/meta/toolsmith/test_dynamic_registry.py
tests/unit/meta/toolsmith/test_validation_gate.py
tests/integration/toolsmith/test_self_extension_e2e.py
tests/conformance/persistence/test_dynamic_tool_repository.py
tests/unit/meta/toolsmith/test_strategy.py

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL. New repository protocols inherit from generic categories in persistence/_generics.py; bespoke methods permitted only under ADR-0001 D7. Protocols stay @runtime_checkable.
Configuration Precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env. YAML is company-template ingestion format only. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value.
No hardcoded numeric values. Allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations / issue back-refs / migration framing. Enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py.
Do not use from __future__ import annotations (Python 3.14 has PEP 649). Use PEP 758 except: except A, B: requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Error classes: <Domain><Condition>Error pattern inheriting from DomainError; never inherit Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py.
Pydantic v2: frozen + extra="forbid" on every frozen model project-wide. Gate: check_frozen_model_extra_forbid.py. @computed_field auto-exempt; per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore" boundaries. Use @computed_field for derived data.
Use NotBlankStr for identifiers in Pydantic models.
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py.
Immutability: use model_copy(update=...) or `c...

Files:

src/synthorg/security/rules/risk_classifier.py
src/synthorg/meta/signals/service.py
src/synthorg/meta/toolsmith/config.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/observability/events/toolsmith.py
src/synthorg/core/enums.py
src/synthorg/security/autonomy/models.py
src/synthorg/meta/guards/approval_gate.py
src/synthorg/security/risk_scorer.py
src/synthorg/meta/mcp/invoker.py
src/synthorg/security/action_types.py
src/synthorg/meta/models.py
src/synthorg/persistence/tool_blueprint_protocol.py
src/synthorg/meta/toolsmith/factory.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/observability/events/persistence.py
src/synthorg/meta/guards/scope_check.py
src/synthorg/meta/mcp/registry.py
src/synthorg/meta/toolsmith/gap_store.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/__init__.py
src/synthorg/meta/toolsmith/protocol.py
src/synthorg/meta/toolsmith/errors.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/overflow.py
src/synthorg/meta/toolsmith/service.py
src/synthorg/api/state.py
src/synthorg/meta/strategies/code_modification.py
src/synthorg/meta/mcp/server.py
src/synthorg/api/app.py
src/synthorg/meta/toolsmith/script_handler.py
src/synthorg/persistence/sqlite/tool_blueprint_repo.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/config.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py
src/synthorg/meta/toolsmith/strategy.py

src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/security/rules/risk_classifier.py
src/synthorg/meta/signals/service.py
src/synthorg/meta/toolsmith/config.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/observability/events/toolsmith.py
src/synthorg/core/enums.py
src/synthorg/security/autonomy/models.py
src/synthorg/meta/guards/approval_gate.py
src/synthorg/security/risk_scorer.py
src/synthorg/meta/mcp/invoker.py
src/synthorg/security/action_types.py
src/synthorg/meta/models.py
src/synthorg/persistence/tool_blueprint_protocol.py
src/synthorg/meta/toolsmith/factory.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/observability/events/persistence.py
src/synthorg/meta/guards/scope_check.py
src/synthorg/meta/mcp/registry.py
src/synthorg/meta/toolsmith/gap_store.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/__init__.py
src/synthorg/meta/toolsmith/protocol.py
src/synthorg/meta/toolsmith/errors.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/overflow.py
src/synthorg/meta/toolsmith/service.py
src/synthorg/api/state.py
src/synthorg/meta/strategies/code_modification.py
src/synthorg/meta/mcp/server.py
src/synthorg/api/app.py
src/synthorg/meta/toolsmith/script_handler.py
src/synthorg/persistence/sqlite/tool_blueprint_repo.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/config.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py
src/synthorg/meta/toolsmith/strategy.py

{README.md,docs/**/*.md}

📄 CodeRabbit inference engine (CLAUDE.md)

Numeric claims in README and public docs sourced from data/runtime_stats.yaml via  markers. See data/README.md.

Files:

docs/design/self-improvement.md
docs/design/tools.md

docs/**/*.{md,d2,mmd}

📄 CodeRabbit inference engine (CLAUDE.md)

Use d2 for architecture / nested containers; use mermaid for flowcharts / sequence / pipelines. Use Markdown tables for tabular data. D2 theme 200 (Dark Mauve), D2 CLI pinned to v0.7.1 in CI.

Files:

docs/design/self-improvement.md
docs/design/tools.md

web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

web/src/api/types/openapi.gen.ts
web/src/api/types/enum-values.gen.ts

web/src/api/types/**/*.gen.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Generated DTO types (MANDATORY): NEVER hand-edit web/src/api/types/*.gen.ts. Regenerate with uv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').

Files:

web/src/api/types/openapi.gen.ts
web/src/api/types/enum-values.gen.ts

web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

web/src/api/types/openapi.gen.ts
web/src/api/types/enum-values.gen.ts

web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

web/src/**/*.{ts,tsx}: Reuse web/src/components/ui/ and design tokens only in Web Dashboard. See web/CLAUDE.md for detail.
Web: see web/CLAUDE.md for framework/React 19 conventions.

Files:

web/src/api/types/openapi.gen.ts
web/src/api/types/enum-values.gen.ts

src/synthorg/meta/mcp/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/meta/mcp/**/*.py: MCP: 200+ tools across 15 domain modules under meta/mcp/domains/. Define ToolHandler + args_model; call require_admin_guardrails() on admin tools; route through service layers. See mcp-handler-contract.md.
Conversational propose: POST /meta/chat/propose is opt-in (meta.chief_of_staff.propose_enabled); ChiefOfStaffProposer built by build_chief_of_staff_proposer (ENFORCED manifest entry, EPIC #1955) and 503s when ANY of provider, persistence, or work pipeline missing. Approval routing: ApprovalSource.CONVERSATIONAL_INTAKE through Flow 0 of signal_resume_intent before parked-context / review-gate flows. Wrap human content via wrap_untrusted(TAG_TASK_DATA, ...) (SEC-1).

Files:

src/synthorg/meta/mcp/invoker.py
src/synthorg/meta/mcp/registry.py
src/synthorg/meta/mcp/server.py

src/synthorg/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/persistence/**/*.py: Repository CRUD methods: save(entity), get(id), delete(id) -> bool, list_items(...), query(...) returning tuples.
Datetime in persistence: use parse_iso_utc / format_iso_utc from persistence._shared (reject naive); use normalize_utc for already-typed.

Files:

src/synthorg/persistence/tool_blueprint_protocol.py
src/synthorg/persistence/sqlite/tool_blueprint_repo.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py

src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

API startup lifecycle has two phases: construction (synchronous services in create_app body) and on_startup (services needing persistence in _build_lifecycle.on_startup). Construction ordering: agent_registry before auto_wire_meetings; tunnel_provider always wired. On-startup ordering: SettingsService auto-wire before WorkflowExecutionObserver; OntologyService after persistence.connect(); cost-dial services via _try_wire_cost_dial after persistence via best-effort idempotent pattern.

Files:

src/synthorg/api/state.py
src/synthorg/api/app.py

🧠 Learnings (5)

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

tests/integration/toolsmith/__init__.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/meta/signals/service.py
src/synthorg/meta/toolsmith/config.py
src/synthorg/security/timeout/risk_tier_classifier.py
tests/unit/security/test_action_types.py
src/synthorg/observability/events/toolsmith.py
src/synthorg/core/enums.py
src/synthorg/security/autonomy/models.py
tests/unit/meta/toolsmith/__init__.py
src/synthorg/meta/guards/approval_gate.py
src/synthorg/security/risk_scorer.py
tests/unit/observability/test_events.py
src/synthorg/meta/mcp/invoker.py
src/synthorg/security/action_types.py
src/synthorg/meta/models.py
tests/unit/meta/toolsmith/test_script_handler.py
src/synthorg/persistence/tool_blueprint_protocol.py
src/synthorg/meta/toolsmith/factory.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/observability/events/persistence.py
tests/unit/meta/toolsmith/test_applier.py
src/synthorg/meta/guards/scope_check.py
src/synthorg/meta/mcp/registry.py
tests/unit/meta/toolsmith/test_invoker_layered.py
tests/unit/persistence/test_factory.py
tests/unit/meta/toolsmith/test_models.py
tests/unit/core/test_enums.py
tests/unit/meta/toolsmith/test_gap_store.py
src/synthorg/meta/toolsmith/gap_store.py
tests/unit/meta/toolsmith/test_config.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/__init__.py
tests/unit/meta/toolsmith/test_overflow.py
tests/unit/meta/toolsmith/test_dynamic_registry.py
src/synthorg/meta/toolsmith/protocol.py
src/synthorg/meta/toolsmith/errors.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/overflow.py
src/synthorg/meta/toolsmith/service.py
src/synthorg/api/state.py
tests/unit/meta/toolsmith/test_validation_gate.py
src/synthorg/meta/strategies/code_modification.py
src/synthorg/meta/mcp/server.py
src/synthorg/api/app.py
src/synthorg/meta/toolsmith/script_handler.py
tests/integration/toolsmith/test_self_extension_e2e.py
src/synthorg/persistence/sqlite/tool_blueprint_repo.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/config.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py
src/synthorg/meta/toolsmith/strategy.py
tests/conformance/persistence/test_dynamic_tool_repository.py
tests/unit/meta/toolsmith/test_strategy.py

📚 Learning: 2026-05-16T18:36:31.446Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

docs/design/self-improvement.md
docs/design/tools.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

docs/design/self-improvement.md
docs/design/tools.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

docs/design/self-improvement.md
docs/design/tools.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

docs/design/self-improvement.md
docs/design/tools.md

🪛 OpenGrep (1.21.0)

src/synthorg/persistence/postgres/tool_blueprint_repo.py

[ERROR] 332-335: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.

(coderabbit.sql-injection.python-fstring-execute)

🔇 Additional comments (43)

tests/conformance/persistence/test_dynamic_tool_repository.py (1)

1-318: LGTM!

tests/integration/toolsmith/__init__.py (1)

1-1: LGTM!

tests/integration/toolsmith/test_self_extension_e2e.py (1)

1-283: LGTM!

tests/unit/core/test_enums.py (1)

120-127: LGTM!

tests/unit/meta/toolsmith/__init__.py (1)

1-1: LGTM!

tests/unit/meta/toolsmith/test_applier.py (1)

1-269: LGTM!

tests/unit/meta/toolsmith/test_config.py (1)

1-57: LGTM!

tests/unit/meta/toolsmith/test_gap_store.py (1)

1-102: LGTM!

tests/unit/meta/toolsmith/test_invoker_layered.py (1)

1-104: LGTM!

tests/unit/meta/toolsmith/test_models.py (1)

1-208: LGTM!

tests/unit/meta/toolsmith/test_overflow.py (1)

1-132: LGTM!

tests/unit/meta/toolsmith/test_script_handler.py (1)

1-116: LGTM!

tests/unit/meta/toolsmith/test_strategy.py (1)

1-148: LGTM!

tests/unit/meta/toolsmith/test_validation_gate.py (1)

1-173: LGTM!

tests/unit/observability/test_events.py (1)

348-350: LGTM!

tests/unit/persistence/test_factory.py (1)

117-117: LGTM!

Also applies to: 121-127, 131-131, 136-138

tests/unit/security/test_action_types.py (1)

14-14: LGTM!

Also applies to: 29-29

src/synthorg/persistence/postgres/tool_blueprint_repo.py (2)

327-346: Static analysis false positive - parameterized query is safe.

The OpenGrep warning about SQL injection on lines 332-335 is a false positive. The where clause is constructed from validated ToolBlueprintFilterSpec fields (enum values and NotBlankStr), and all dynamic values are passed through the params list using %s placeholders.

1-422: LGTM!

src/synthorg/persistence/tool_blueprint_protocol.py (1)

1-153: LGTM!

src/synthorg/persistence/sqlite/tool_blueprint_repo.py (1)

1-433: LGTM!

src/synthorg/meta/models.py (1)

24-24: LGTM!

Also applies to: 36-36, 338-338, 376-422, 433-433

src/synthorg/meta/signals/service.py (1)

313-314: LGTM!

src/synthorg/meta/guards/approval_gate.py (1)

41-41: LGTM!

src/synthorg/core/enums.py (1)

580-580: LGTM!

src/synthorg/meta/guards/scope_check.py (1)

94-95: Provide the original <review_comment> text to rewrite. No review comment (or diff snippet) was included in this thread, so I can’t rewrite or verify it—paste the full <review_comment> content you want updated.

src/synthorg/security/action_types.py (1)

35-35: LGTM!

src/synthorg/security/autonomy/models.py (1)

91-92: LGTM!

Also applies to: 118-119

src/synthorg/security/risk_scorer.py (1)

206-208: LGTM!

src/synthorg/security/rules/risk_classifier.py (1)

26-26: LGTM!

src/synthorg/security/timeout/risk_tier_classifier.py (1)

47-47: LGTM!

src/synthorg/meta/strategies/code_modification.py (1)

43-43: LGTM!

src/synthorg/observability/events/persistence.py (1)

697-724: LGTM!

src/synthorg/observability/events/toolsmith.py (1)

1-52: LGTM!

docs/design/self-improvement.md (1)

60-73: LGTM!

Also applies to: 165-165, 249-249

docs/design/tools.md (1)

433-478: LGTM!

Also applies to: 524-527, 533-533, 546-546

scripts/_ghost_wiring_manifest.txt (1)

77-85: LGTM!

scripts/schema_drift_baseline.txt (1)

20-20: LGTM!

Also applies to: 67-72

web/src/api/types/enum-values.gen.ts (1)

523-530: LGTM!

web/src/api/types/openapi.gen.ts (1)

10539-10539: LGTM!

src/synthorg/meta/toolsmith/overflow.py (1)

45-72: LGTM!

Also applies to: 75-117

src/synthorg/meta/toolsmith/factory.py (1)

48-59: LGTM!

Also applies to: 61-147, 149-186

src/synthorg/api/state.py (1)

177-177: LGTM!

Also applies to: 335-335, 627-627, 1265-1273

codecov · 2026-05-21T18:36:05Z

Codecov Report

❌ Patch coverage is 79.67549% with 238 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.97%. Comparing base (48c897b) to head (7eaa0ed).
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
...synthorg/persistence/sqlite/tool_blueprint_repo.py	69.73%	42 Missing and 4 partials ⚠️
src/synthorg/api/app.py	18.36%	38 Missing and 2 partials ⚠️
...nthorg/persistence/postgres/tool_blueprint_repo.py	71.63%	36 Missing and 4 partials ⚠️
src/synthorg/meta/toolsmith/validation_gate.py	76.28%	16 Missing and 7 partials ⚠️
src/synthorg/meta/toolsmith/dynamic_registry.py	82.67%	16 Missing and 6 partials ⚠️
src/synthorg/meta/toolsmith/applier.py	84.78%	8 Missing and 6 partials ⚠️
src/synthorg/meta/toolsmith/strategy.py	85.54%	10 Missing and 2 partials ⚠️
src/synthorg/meta/toolsmith/models.py	91.34%	6 Missing and 3 partials ⚠️
src/synthorg/meta/toolsmith/factory.py	69.56%	6 Missing and 1 partial ⚠️
src/synthorg/meta/mcp/server.py	44.44%	4 Missing and 1 partial ⚠️
... and 9 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2035      +/-   ##
==========================================
- Coverage   84.99%   84.97%   -0.03%     
==========================================
  Files        2107     2125      +18     
  Lines      123632   124801    +1169     
  Branches    10335    10433      +98     
==========================================
+ Hits       105084   106052     +968     
- Misses      15960    16128     +168     
- Partials     2588     2621      +33

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/synthorg/meta/toolsmith/applier.py (1)

161-167: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize lifecycle fields before persisting the candidate blueprint.

At Line 161, only state is overwritten. At Line 166 and Lines 175-182, records are rebuilt from the original inbound blueprint, so caller-supplied lifecycle fields (validated_at/activated_at/retired_at/validation) can leak into durable state. This weakens the trust boundary and can persist misleading lifecycle metadata even on gate rejection.

Proposed fix

-        pending = blueprint.model_copy(update={"state": ToolBlueprintState.PENDING})
-        await self._repo.save(pending)
-
-        result = await self._gate.validate(blueprint)
+        pending = blueprint.model_copy(
+            update={
+                "state": ToolBlueprintState.PENDING,
+                "validated_at": None,
+                "activated_at": None,
+                "retired_at": None,
+                "validation": None,
+            }
+        )
+        await self._repo.save(pending)
+
+        result = await self._gate.validate(pending)
         if not result.passed:
-            await self._repo.save(blueprint.model_copy(update={"validation": result}))
+            await self._repo.save(pending.model_copy(update={"validation": result}))
             logger.info(
                 TOOLSMITH_APPLY_REJECTED,
                 tool_name=blueprint.name,
                 detail=result.detail,
             )
             return False
@@
-        active = blueprint.model_copy(
+        active = pending.model_copy(
             update={
                 "state": ToolBlueprintState.ACTIVE,
                 "validated_at": now,
                 "activated_at": now,
                 "validation": result,
             }
         )

Also applies to: 175-182

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/meta/toolsmith/applier.py` around lines 161 - 167, The persisted
candidate blueprints are rebuilt from the inbound blueprint without normalizing
lifecycle fields, allowing caller-supplied
validated_at/activated_at/retired_at/validation to be stored; before calling
self._repo.save (e.g., where pending = blueprint.model_copy(update={"state":
ToolBlueprintState.PENDING}) and later saves after self._gate.validate in the
same function), explicitly clear or set lifecycle fields (validated_at,
activated_at, retired_at, validation) to safe defaults (e.g., None or an empty
validation object) on any model_copy used for persistence so that gate rejection
and subsequent save paths cannot persist caller-provided lifecycle metadata;
apply the same normalization to the other rebuild/save path that reconstructs
the record after validation (the save calls following self._gate.validate and
the later save block).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@src/synthorg/persistence/postgres/revisions/20260522000001_dynamic_tools.sql`:
- Around line 35-54: The CHECK constraint for the table permits non-pending
states with NULL validation; update the constraint so that for each non-pending
branch (state = 'validated', 'active', 'retired') you also require validation IS
NOT NULL in addition to the existing timestamp requirements (validated_at,
activated_at, retired_at), and apply the same change to the corresponding CHECK
in src/synthorg/persistence/postgres/schema.sql so the invariant (validation
required for validated and later states) is enforced in both the migration and
schema definition; locate the constraint referencing columns validation, state,
validated_at, activated_at, retired_at and add validation IS NOT NULL to the
validated/active/retired OR-clauses.

In `@src/synthorg/persistence/sqlite/revisions/20260522000001_dynamic_tools.sql`:
- Around line 47-69: The CHECK constraint currently allows
state='validated'|'active'|'retired' with validation IS NULL; update the
constraint so that each non-pending branch requires validation IS NOT NULL (in
addition to the existing timestamp requirements) — i.e., for the 'validated',
'active', and 'retired' clauses add "AND validation IS NOT NULL". Apply the same
change to the equivalent schema/check in the other SQLite schema file so both
migrations/schemas enforce persisted validation evidence.

---

Outside diff comments:
In `@src/synthorg/meta/toolsmith/applier.py`:
- Around line 161-167: The persisted candidate blueprints are rebuilt from the
inbound blueprint without normalizing lifecycle fields, allowing caller-supplied
validated_at/activated_at/retired_at/validation to be stored; before calling
self._repo.save (e.g., where pending = blueprint.model_copy(update={"state":
ToolBlueprintState.PENDING}) and later saves after self._gate.validate in the
same function), explicitly clear or set lifecycle fields (validated_at,
activated_at, retired_at, validation) to safe defaults (e.g., None or an empty
validation object) on any model_copy used for persistence so that gate rejection
and subsequent save paths cannot persist caller-provided lifecycle metadata;
apply the same normalization to the other rebuild/save path that reconstructs
the record after validation (the save calls following self._gate.validate and
the later save block).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: bf294c51-2204-432c-9b6c-dfd493267352

📥 Commits

Reviewing files that changed from the base of the PR and between 996fd6e and e073a38.

📒 Files selected for processing (17)

src/synthorg/api/app.py
src/synthorg/meta/strategies/code_modification.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/meta/toolsmith/factory.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/toolsmith/script_handler.py
src/synthorg/meta/toolsmith/service.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/persistence/postgres/revisions/20260522000001_dynamic_tools.sql
src/synthorg/persistence/postgres/schema.sql
src/synthorg/persistence/sqlite/revisions/20260522000001_dynamic_tools.sql
src/synthorg/persistence/sqlite/schema.sql
tests/conformance/persistence/test_dynamic_tool_repository.py
tests/unit/meta/toolsmith/test_applier.py
tests/unit/meta/toolsmith/test_dynamic_registry.py
tests/unit/meta/toolsmith/test_models.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: Build Backend
GitHub Check: Build Web Assets (melange)
GitHub Check: Dashboard Test
GitHub Check: Test E2E
GitHub Check: Test Conformance (SQLite)
GitHub Check: Test Unit
GitHub Check: Test Integration
GitHub Check: Lighthouse Site
GitHub Check: Lighthouse Dashboard
GitHub Check: CodSpeed Web benchmarks
GitHub Check: CodSpeed Python benchmarks
GitHub Check: Build Preview
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (4)

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL. New repository protocols inherit from generic categories in persistence/_generics.py; bespoke methods permitted only under ADR-0001 D7. Protocols stay @runtime_checkable.
Configuration Precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env. YAML is company-template ingestion format only. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value.
No hardcoded numeric values. Allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations / issue back-refs / migration framing. Enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py.
Do not use from __future__ import annotations (Python 3.14 has PEP 649). Use PEP 758 except: except A, B: requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Error classes: <Domain><Condition>Error pattern inheriting from DomainError; never inherit Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py.
Pydantic v2: frozen + extra="forbid" on every frozen model project-wide. Gate: check_frozen_model_extra_forbid.py. @computed_field auto-exempt; per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore" boundaries. Use @computed_field for derived data.
Use NotBlankStr for identifiers in Pydantic models.
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py.
Immutability: use model_copy(update=...) or `c...

Files:

src/synthorg/meta/strategies/code_modification.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/api/app.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/factory.py
src/synthorg/meta/toolsmith/service.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/toolsmith/script_handler.py

src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/meta/strategies/code_modification.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/api/app.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/factory.py
src/synthorg/meta/toolsmith/service.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/toolsmith/script_handler.py

src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

API startup lifecycle has two phases: construction (synchronous services in create_app body) and on_startup (services needing persistence in _build_lifecycle.on_startup). Construction ordering: agent_registry before auto_wire_meetings; tunnel_provider always wired. On-startup ordering: SettingsService auto-wire before WorkflowExecutionObserver; OntologyService after persistence.connect(); cost-dial services via _try_wire_cost_dial after persistence via best-effort idempotent pattern.

Files:

src/synthorg/api/app.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min.
Windows tests: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Test doubles ladder: FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags. Bare MagicMock at typed boundary (constructor / fn arg / annotated local / typed fixture return) is blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline).
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Dual-backend conformance: tests under tests/conformance/persistence/ consume backend fixture (SQLite + Postgres). Enforced by check_dual_backend_test_parity.py.
Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large).

Files:

tests/unit/meta/toolsmith/test_dynamic_registry.py
tests/unit/meta/toolsmith/test_models.py
tests/conformance/persistence/test_dynamic_tool_repository.py
tests/unit/meta/toolsmith/test_applier.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/unit/meta/toolsmith/test_dynamic_registry.py
tests/unit/meta/toolsmith/test_models.py
tests/conformance/persistence/test_dynamic_tool_repository.py
tests/unit/meta/toolsmith/test_applier.py

🧠 Learnings (1)

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

src/synthorg/meta/strategies/code_modification.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/api/app.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/factory.py
tests/unit/meta/toolsmith/test_dynamic_registry.py
src/synthorg/meta/toolsmith/service.py
src/synthorg/meta/toolsmith/models.py
tests/unit/meta/toolsmith/test_models.py
tests/conformance/persistence/test_dynamic_tool_repository.py
src/synthorg/meta/toolsmith/script_handler.py
tests/unit/meta/toolsmith/test_applier.py

🔇 Additional comments (12)

src/synthorg/meta/strategies/code_modification.py (1)

179-189: LGTM!

src/synthorg/meta/toolsmith/validation_gate.py (1)

17-17: LGTM!

Also applies to: 73-75, 122-131

src/synthorg/meta/toolsmith/dynamic_registry.py (1)

119-136: LGTM!

Also applies to: 233-243

src/synthorg/api/app.py (1)

320-327: LGTM!

Also applies to: 335-360, 1521-1559

src/synthorg/meta/toolsmith/factory.py (1)

144-145: LGTM!

tests/unit/meta/toolsmith/test_dynamic_registry.py (1)

229-280: LGTM!

src/synthorg/meta/toolsmith/service.py (1)

63-68: LGTM!

Also applies to: 100-101, 110-111, 127-134, 143-147, 163-170, 248-249

src/synthorg/meta/toolsmith/models.py (1)

195-202: LGTM!

Also applies to: 212-219, 243-255, 261-266

tests/unit/meta/toolsmith/test_models.py (1)

209-244: LGTM!

tests/conformance/persistence/test_dynamic_tool_repository.py (1)

77-84: LGTM!

src/synthorg/meta/toolsmith/script_handler.py (1)

72-73: LGTM!

tests/unit/meta/toolsmith/test_applier.py (1)

251-254: LGTM!

Also applies to: 260-290

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/conformance/persistence/test_dynamic_tool_repository.py (1)
253-285: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add the failing transition parity test too.

This covers the happy path, but it never asserts that an existing PENDING -> VALIDATED transition without validation fails on both backends. That is the new lifecycle invariant in this PR, so the conformance suite should lock it down explicitly.
Suggested test
+    async def test_transition_if_requires_validation_for_validated_state(
+        self, backend: PersistenceBackend
+    ) -> None:
+        from synthorg.core.persistence_errors import QueryError
+
+        repo = _repo(backend)
+        bp = _blueprint(blueprint_id="bp-no-val", name="synthorg_textkit_no_val")
+        await repo.save(bp)
+
+        with pytest.raises(QueryError):
+            await repo.transition_if(
+                bp.id,
+                from_state=ToolBlueprintState.PENDING,
+                to_state=ToolBlueprintState.VALIDATED,
+                validated_at=_NOW,
+            )
+```
</details>



As per coding guidelines, "Dual-backend conformance: tests under `tests/conformance/persistence/` consume `backend` fixture (SQLite + Postgres)."


Also applies to: 305-315

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @tests/conformance/persistence/test_dynamic_tool_repository.py around lines
253 - 285, Add a parity test to assert the new lifecycle invariant that a
PENDING->VALIDATED transition without a ToolValidationResult must fail: after
saving bp via _repo and _blueprint and performing the happy-path transition
(using repo.transition_if with validated_at and validation), call
repo.transition_if again on the same bp.id attempting to move from
ToolBlueprintState.PENDING to ToolBlueprintState.VALIDATED but omit the
validation (validation=None or not provided) and assert it returns False, that
fetched = await repo.get(bp.id) remains non-null, state is unchanged, and
validated_at/validation were not modified; apply the same assertion pattern for
the related test around the other block mentioned (lines 305-315) so both
backends enforce this invariant.
</details>

</blockquote></details>

</blockquote></details>
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/persistence/tool_blueprint_protocol.py`:
- Around line 35-40: The docstring for DynamicToolRepository.transition_if is
incorrect about validation=None; it currently says None leaves the column
untouched but implementations treat None as setting the validation column to
NULL. Update the protocol text to state that the ``validation`` kwarg accepts a
ToolValidationResult or None and that passing None will explicitly SET
validation = NULL (clearing existing evidence), and document the required
alternative (e.g., an explicit sentinel or omission) callers must use to leave
the column unchanged; reference DynamicToolRepository.transition_if and
ToolValidationResult so readers can locate the behavior.

---

Outside diff comments:
In `@tests/conformance/persistence/test_dynamic_tool_repository.py`:
- Around line 253-285: Add a parity test to assert the new lifecycle invariant
that a PENDING->VALIDATED transition without a ToolValidationResult must fail:
after saving bp via _repo and _blueprint and performing the happy-path
transition (using repo.transition_if with validated_at and validation), call
repo.transition_if again on the same bp.id attempting to move from
ToolBlueprintState.PENDING to ToolBlueprintState.VALIDATED but omit the
validation (validation=None or not provided) and assert it returns False, that
fetched = await repo.get(bp.id) remains non-null, state is unchanged, and
validated_at/validation were not modified; apply the same assertion pattern for
the related test around the other block mentioned (lines 305-315) so both
backends enforce this invariant.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)

Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 44d5c745-efd9-4a3c-9d9f-4d0ff664ab3f

📥 Commits

Reviewing files that changed from the base of the PR and between e72587b and ce548c7.

📒 Files selected for processing (4)

src/synthorg/persistence/postgres/tool_blueprint_repo.py

src/synthorg/persistence/sqlite/tool_blueprint_repo.py

src/synthorg/persistence/tool_blueprint_protocol.py

tests/conformance/persistence/test_dynamic_tool_repository.py
📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)

GitHub Check: Build Backend

GitHub Check: Lighthouse Site

GitHub Check: Lighthouse Dashboard

GitHub Check: Build Web Assets (melange)

GitHub Check: CodSpeed Web benchmarks

GitHub Check: CodSpeed Python benchmarks

GitHub Check: Test Conformance (SQLite)

GitHub Check: Test E2E

GitHub Check: Test Unit

GitHub Check: Dashboard Test

GitHub Check: Test Integration

GitHub Check: Analyze (python)

GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used

📓 Path-based instructions (4)

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL. New repository protocols inherit from generic categories in persistence/_generics.py; bespoke methods permitted only under ADR-0001 D7. Protocols stay @runtime_checkable.
Configuration Precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env. YAML is company-template ingestion format only. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value.
No hardcoded numeric values. Allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations / issue back-refs / migration framing. Enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py.
Do not use from __future__ import annotations (Python 3.14 has PEP 649). Use PEP 758 except: except A, B: requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Error classes: <Domain><Condition>Error pattern inheriting from DomainError; never inherit Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py.
Pydantic v2: frozen + extra="forbid" on every frozen model project-wide. Gate: check_frozen_model_extra_forbid.py. @computed_field auto-exempt; per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore" boundaries. Use @computed_field for derived data.
Use NotBlankStr for identifiers in Pydantic models.
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py.
Immutability: use model_copy(update=...) or `c...

Files:

src/synthorg/persistence/tool_blueprint_protocol.py

src/synthorg/persistence/postgres/tool_blueprint_repo.py

src/synthorg/persistence/sqlite/tool_blueprint_repo.py

src/synthorg/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/persistence/**/*.py: Repository CRUD methods: save(entity), get(id), delete(id) -> bool, list_items(...), query(...) returning tuples.
Datetime in persistence: use parse_iso_utc / format_iso_utc from persistence._shared (reject naive); use normalize_utc for already-typed.

Files:

src/synthorg/persistence/tool_blueprint_protocol.py

src/synthorg/persistence/postgres/tool_blueprint_repo.py

src/synthorg/persistence/sqlite/tool_blueprint_repo.py

src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/persistence/tool_blueprint_protocol.py

src/synthorg/persistence/postgres/tool_blueprint_repo.py

src/synthorg/persistence/sqlite/tool_blueprint_repo.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min.
Windows tests: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Test doubles ladder: FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags. Bare MagicMock at typed boundary (constructor / fn arg / annotated local / typed fixture return) is blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline).
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Dual-backend conformance: tests under tests/conformance/persistence/ consume backend fixture (SQLite + Postgres). Enforced by check_dual_backend_test_parity.py.
Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large).

Files:

tests/conformance/persistence/test_dynamic_tool_repository.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/conformance/persistence/test_dynamic_tool_repository.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:

src/synthorg/persistence/tool_blueprint_protocol.py

src/synthorg/persistence/postgres/tool_blueprint_repo.py

src/synthorg/persistence/sqlite/tool_blueprint_repo.py

tests/conformance/persistence/test_dynamic_tool_repository.py
🔇 Additional comments (2)

src/synthorg/persistence/postgres/tool_blueprint_repo.py (1)

55-62: LGTM!

Also applies to: 363-385, 435-450

src/synthorg/persistence/sqlite/tool_blueprint_repo.py (1)

49-56: LGTM!

Also applies to: 359-380, 444-459

…mic registry, authoring, validation, governance

… layered MCP dispatch, sim e2e

coderabbitai

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/synthorg/meta/config.py (1)

243-277: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the docstring to include the new fields.

The SelfImprovementConfig docstring lists all configuration attributes, but the newly added tool_creation_enabled (line 288) and toolsmith (line 310) fields are missing from the Attributes section.

📝 Proposed fix

     Attributes:
         enabled: Master switch for the self-improvement system.
         chief_of_staff_enabled: Whether to enable the Chief of Staff
             agent persona.
         config_tuning_enabled: Enable config tuning proposals.
         architecture_proposals_enabled: Enable architecture proposals.
         prompt_tuning_enabled: Enable prompt tuning proposals.
         code_modification_enabled: Enable code modification proposals.
+        tool_creation_enabled: Enable self-extending toolkit proposals.
         schedule: Cycle scheduling configuration.
         rollout: Rollout behavior configuration.
         regression: Regression detection thresholds.
         guards: Guard chain configuration.
         rules: Rule engine configuration.
         prompt_tuning: Prompt tuning strategy configuration.
         code_modification: Code modification strategy configuration.
         chief_of_staff: Chief of Staff advanced capabilities
             (learning, alerts, chat).
         cross_deployment_analytics: Cross-deployment analytics
             telemetry (opt-in, disabled by default).
+        toolsmith: Self-extending toolkit configuration
+            (gap thresholds, sandbox policy, validation).
         analysis_model: LLM model identifier for proposal analysis.
         analysis_temperature: Sampling temperature for analysis.
         analysis_max_tokens: Token budget for analysis responses.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/meta/config.py` around lines 243 - 277, The class
SelfImprovementConfig docstring is missing the new attributes; update the
Attributes section of the SelfImprovementConfig docstring to add entries for
tool_creation_enabled (bool master switch to enable proposal-driven tool
creation) and toolsmith (configuration object for toolsmith behavior and
limits), making sure their brief descriptions follow the same style as the other
attribute lines and appear alongside the other config fields like chief_of_staff
and cross_deployment_analytics.

docs/design/tools.md (1)

524-551: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Taxonomy count and list are inconsistent with the ActionType enum.

The documentation states "~39 built-in action types" (line 524) and "Taxonomy (39 leaf types)" (line 534), but the ActionType enum in src/synthorg/core/enums.py now includes 41 members:

The 39 types listed here
KNOWLEDGE_INGEST = "knowledge:ingest" (line 628)
KNOWLEDGE_REINDEX = "knowledge:reindex" (line 629)

The taxonomy list is missing the knowledge action types.

📝 Proposed fix

-**Taxonomy (39 leaf types):**
+**Taxonomy (41 leaf types):**

 ```text
 code:read, code:write, code:create, code:delete, code:refactor
 test:write, test:run
 docs:write
 vcs:read, vcs:commit, vcs:push, vcs:branch
 deploy:staging, deploy:production
 comms:internal, comms:external
 budget:spend, budget:exceed
 org:hire, org:fire, org:promote
 db:query, db:mutate, db:admin
 arch:decide
 tool:create
 memory:read
+knowledge:ingest, knowledge:reindex
 browser:navigate, browser:screenshot, browser:diff, browser:accessibility_scan, browser:spec
 external_data:request
 desktop:launch, desktop:click, desktop:type, desktop:key, desktop:screenshot, desktop:scroll


Also update line 524:

```diff
-**Registry:** `StrEnum` for ~39 built-in action types (type safety, autocomplete, typos caught
+**Registry:** `StrEnum` for ~41 built-in action types (type safety, autocomplete, typos caught

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/design/tools.md` around lines 524 - 551, Update the taxonomy list in the
docs to match the ActionType enum by adding the two missing knowledge action
types: KNOWLEDGE_INGEST ("knowledge:ingest") and KNOWLEDGE_REINDEX
("knowledge:reindex"); edit the taxonomy block where the 39 leaf types are
listed and add "knowledge:ingest, knowledge:reindex" into the comma-separated
list and update the header/count mention (e.g., "~39" / "39 leaf types") to
reflect the new total so the documentation matches the ActionType enum symbols.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/meta/toolsmith/applier.py`:
- Around line 197-208: The rollback currently calls
self._registry.unregister(...) without handling failures, risking divergence
between runtime and durable state; update both rollback paths (the one after
await self._repo.save(active) and the analogous RETIRE path around lines
handling unregister) to catch exceptions from _registry.unregister and
compensate by restoring the persisted state before re-raising: if unregister
raises after a successful repo.save(active), attempt to re-save the ACTIVE row
(await self._repo.save(active)) or, if that fails, log and raise the original
exception; likewise, in the RETIRE flow where you unregister then change DB
state, if unregister fails after the DB change, attempt to re-register the
handler (await self._registry.register(<handler or blueprint>)) or revert the DB
change (e.g., await self._repo.save(previous_state)) and only then re-raise;
wrap calls to _registry.unregister and _repo.save in try/except to perform these
compensating actions and ensure the durable and runtime states remain consistent
when exceptions occur.

In `@src/synthorg/meta/toolsmith/models.py`:
- Line 114: Replace the hardcoded upper bound `100` with a typed module-level
constant and use that constant in the Field declaration: add an annotated
constant (e.g. BRIEF_SCORE_MAX: int = 100) at the top of the module and change
the brief_score Field call to use le=BRIEF_SCORE_MAX; update any imports or type
hints if needed so the constant is recognized by linters and the numeric-literal
policy.
- Around line 256-269: The RETIRED state validation currently allows missing
activation/validation timestamps; update the ToolBlueprint validation logic in
models.py so ToolBlueprintState.RETIRED requires both activated_at and
validated_at to be non-None (and still require validation result), e.g. add
explicit checks for self.state is ToolBlueprintState.RETIRED that raise
ValueError if self.activated_at or self.validated_at is None (or alternatively
include RETIRED in the set used to require validated_at earlier), referencing
the self.state, ToolBlueprintState.RETIRED, activated_at, and validated_at
symbols to locate and modify the validation block.

In `@src/synthorg/meta/toolsmith/strategy.py`:
- Around line 201-254: Replace the manual JSON/untyped checks in _parse (and
remove/stop using _parse_json_object, _require_str, _require_schema) with a
typed boundary: define a small frozen args model (e.g., AuthoredBlueprintArgs)
that declares description, script_body, action_type, parameters_schema, etc.,
then call parse_typed(response, AuthoredBlueprintArgs) inside _parse to get a
validated instance and build the ToolBlueprint from that instance's fields
before returning; ensure you still wrap parsing errors in ToolAuthoringError and
keep the existing fields set (id, name, sandbox_backend, requires_network,
state, created_at) when constructing ToolBlueprint.
- Around line 123-137: The except-block currently re-raises ProviderError which
lets provider failures escape ToolsmithService._handle_gap; change the handling
so ProviderError is treated as a per-gap authoring failure (like
ToolAuthoringError/ToolCapabilityNotAllowedError) by catching ProviderError and
raising a ToolAuthoringError (or wrapping it) instead, while preserving logging
via TOOLSMITH_AUTHOR_FAILED and safe_error_description for capability and error
details so _handle_gap will swallow/skip this gap rather than cancelling the
TaskGroup.

In `@src/synthorg/meta/toolsmith/validation_gate.py`:
- Around line 165-181: The golden-stage reporting currently computes and prints
baseline/candidate/margin (baseline, candidate, margin) even when brief_passed
is false and score() was skipped, making skipped runs indistinguishable from
real 0-0 results; update the logic in validation_gate.py around
validation_cfg.require_golden_delta and brief_passed so that when
require_golden_delta is true but brief_passed is false you do NOT call
self._scorecard_provider.score(blueprint) and instead set passed = False and set
detail to explicitly indicate the golden stage was skipped (e.g., "brief=fail;
golden skipped") so baseline/candidate/margin are not reported; keep the
existing ToolValidationConfigError branch (when self._scorecard_provider is
None) unchanged.

In `@src/synthorg/persistence/postgres/tool_blueprint_repo.py`:
- Around line 147-167: The upsert path in _upsert_params is persisting
ToolBlueprint datetime fields (created_at, validated_at, activated_at,
retired_at) as-is which can lead to inconsistent round-trips across backends;
update _upsert_params to normalize these typed datetimes using the
persistence._shared helpers (use normalize_utc for already-typed datetimes, and
parse_iso_utc/format_iso_utc where appropriate) so that created_at,
validated_at, activated_at, and retired_at are converted to normalized UTC ISO
values before being placed into the returned tuple; reference the ToolBlueprint
fields and the helper functions parse_iso_utc, format_iso_utc, and normalize_utc
to implement the normalization consistently with transition_if() and the SQLite
path.

In `@tests/integration/toolsmith/test_self_extension_e2e.py`:
- Around line 25-30: The test spawns a sandbox subprocess and currently passes a
minimal env dict which replaces the entire process environment; change the spawn
call in tests/integration/toolsmith/test_self_extension_e2e.py (the subprocess
invocation around the sandbox start, where an env parameter is passed) to
preserve the base environment by starting from os.environ.copy() and then
updating it with only the required key/value pairs instead of replacing it
wholesale (e.g., use base_env = os.environ.copy(); base_env.update({...}) and
pass base_env as env). This keeps platform-specific vars (Windows PATH, etc.)
intact while still injecting the needed test variables.

---

Outside diff comments:
In `@docs/design/tools.md`:
- Around line 524-551: Update the taxonomy list in the docs to match the
ActionType enum by adding the two missing knowledge action types:
KNOWLEDGE_INGEST ("knowledge:ingest") and KNOWLEDGE_REINDEX
("knowledge:reindex"); edit the taxonomy block where the 39 leaf types are
listed and add "knowledge:ingest, knowledge:reindex" into the comma-separated
list and update the header/count mention (e.g., "~39" / "39 leaf types") to
reflect the new total so the documentation matches the ActionType enum symbols.

In `@src/synthorg/meta/config.py`:
- Around line 243-277: The class SelfImprovementConfig docstring is missing the
new attributes; update the Attributes section of the SelfImprovementConfig
docstring to add entries for tool_creation_enabled (bool master switch to enable
proposal-driven tool creation) and toolsmith (configuration object for toolsmith
behavior and limits), making sure their brief descriptions follow the same style
as the other attribute lines and appear alongside the other config fields like
chief_of_staff and cross_deployment_analytics.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 68f3c01e-0409-4047-a559-7d0d7cccc9a8

📥 Commits

Reviewing files that changed from the base of the PR and between 6728ed8 and 379c60e.

📒 Files selected for processing (63)

docs/design/self-improvement.md
docs/design/tools.md
scripts/_ghost_wiring_manifest.txt
src/synthorg/api/app.py
src/synthorg/api/state.py
src/synthorg/core/enums.py
src/synthorg/meta/config.py
src/synthorg/meta/guards/approval_gate.py
src/synthorg/meta/guards/scope_check.py
src/synthorg/meta/mcp/invoker.py
src/synthorg/meta/mcp/registry.py
src/synthorg/meta/mcp/server.py
src/synthorg/meta/models.py
src/synthorg/meta/signals/service.py
src/synthorg/meta/strategies/code_modification.py
src/synthorg/meta/toolsmith/__init__.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/config.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/meta/toolsmith/errors.py
src/synthorg/meta/toolsmith/factory.py
src/synthorg/meta/toolsmith/gap_store.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/toolsmith/overflow.py
src/synthorg/meta/toolsmith/protocol.py
src/synthorg/meta/toolsmith/script_handler.py
src/synthorg/meta/toolsmith/service.py
src/synthorg/meta/toolsmith/strategy.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/observability/events/persistence.py
src/synthorg/observability/events/toolsmith.py
src/synthorg/persistence/postgres/revisions/20260522000001_dynamic_tools.sql
src/synthorg/persistence/postgres/schema.sql
src/synthorg/persistence/postgres/tool_blueprint_repo.py
src/synthorg/persistence/sqlite/revisions/20260522000001_dynamic_tools.sql
src/synthorg/persistence/sqlite/schema.sql
src/synthorg/persistence/sqlite/tool_blueprint_repo.py
src/synthorg/persistence/tool_blueprint_protocol.py
src/synthorg/security/action_types.py
src/synthorg/security/autonomy/models.py
src/synthorg/security/risk_scorer.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/security/timeout/risk_tier_classifier.py
tests/conformance/persistence/test_dynamic_tool_repository.py
tests/integration/toolsmith/__init__.py
tests/integration/toolsmith/test_self_extension_e2e.py
tests/unit/core/test_enums.py
tests/unit/meta/toolsmith/__init__.py
tests/unit/meta/toolsmith/test_applier.py
tests/unit/meta/toolsmith/test_config.py
tests/unit/meta/toolsmith/test_dynamic_registry.py
tests/unit/meta/toolsmith/test_gap_store.py
tests/unit/meta/toolsmith/test_invoker_layered.py
tests/unit/meta/toolsmith/test_models.py
tests/unit/meta/toolsmith/test_overflow.py
tests/unit/meta/toolsmith/test_script_handler.py
tests/unit/meta/toolsmith/test_strategy.py
tests/unit/meta/toolsmith/test_validation_gate.py
tests/unit/observability/test_events.py
tests/unit/persistence/test_factory.py
tests/unit/security/test_action_types.py
web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: Build Backend
GitHub Check: Build Web Assets (melange)
GitHub Check: Lighthouse Dashboard
GitHub Check: Lighthouse Site
GitHub Check: Test Unit
GitHub Check: Test Conformance (SQLite)
GitHub Check: Test Integration
GitHub Check: Dashboard Test
GitHub Check: Test E2E
GitHub Check: CodSpeed Web benchmarks
GitHub Check: CodSpeed Python benchmarks
GitHub Check: Build Preview
GitHub Check: Analyze (python)
GitHub Check: Analyze (javascript-typescript)

🧰 Additional context used

📓 Path-based instructions (15)

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back
Test doubles: FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary forbidden (zero-tolerance, no baseline) per check_mock_spec.py
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large)

Files:

tests/unit/meta/toolsmith/__init__.py
tests/unit/core/test_enums.py
tests/unit/meta/toolsmith/test_script_handler.py
tests/unit/observability/test_events.py
tests/unit/security/test_action_types.py
tests/integration/toolsmith/__init__.py
tests/unit/meta/toolsmith/test_gap_store.py
tests/unit/meta/toolsmith/test_config.py
tests/unit/persistence/test_factory.py
tests/unit/meta/toolsmith/test_overflow.py
tests/unit/meta/toolsmith/test_strategy.py
tests/conformance/persistence/test_dynamic_tool_repository.py
tests/unit/meta/toolsmith/test_validation_gate.py
tests/unit/meta/toolsmith/test_invoker_layered.py
tests/unit/meta/toolsmith/test_applier.py
tests/unit/meta/toolsmith/test_models.py
tests/unit/meta/toolsmith/test_dynamic_registry.py
tests/integration/toolsmith/test_self_extension_e2e.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/unit/meta/toolsmith/__init__.py
tests/unit/core/test_enums.py
tests/unit/meta/toolsmith/test_script_handler.py
tests/unit/observability/test_events.py
tests/unit/security/test_action_types.py
tests/integration/toolsmith/__init__.py
tests/unit/meta/toolsmith/test_gap_store.py
tests/unit/meta/toolsmith/test_config.py
tests/unit/persistence/test_factory.py
tests/unit/meta/toolsmith/test_overflow.py
tests/unit/meta/toolsmith/test_strategy.py
tests/conformance/persistence/test_dynamic_tool_repository.py
tests/unit/meta/toolsmith/test_validation_gate.py
tests/unit/meta/toolsmith/test_invoker_layered.py
tests/unit/meta/toolsmith/test_applier.py
tests/unit/meta/toolsmith/test_models.py
tests/unit/meta/toolsmith/test_dynamic_registry.py
tests/integration/toolsmith/test_self_extension_e2e.py

{src/**/*.py,tests/**/*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/

Files:

tests/unit/meta/toolsmith/__init__.py
src/synthorg/security/risk_scorer.py
src/synthorg/meta/signals/service.py
src/synthorg/meta/toolsmith/__init__.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/security/action_types.py
tests/unit/core/test_enums.py
src/synthorg/observability/events/toolsmith.py
tests/unit/meta/toolsmith/test_script_handler.py
tests/unit/observability/test_events.py
src/synthorg/security/autonomy/models.py
src/synthorg/core/enums.py
src/synthorg/meta/guards/scope_check.py
src/synthorg/meta/toolsmith/config.py
src/synthorg/observability/events/persistence.py
tests/unit/security/test_action_types.py
src/synthorg/api/state.py
tests/integration/toolsmith/__init__.py
src/synthorg/meta/mcp/invoker.py
src/synthorg/meta/mcp/registry.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/meta/guards/approval_gate.py
src/synthorg/meta/models.py
tests/unit/meta/toolsmith/test_gap_store.py
src/synthorg/meta/toolsmith/overflow.py
src/synthorg/meta/strategies/code_modification.py
tests/unit/meta/toolsmith/test_config.py
tests/unit/persistence/test_factory.py
src/synthorg/meta/config.py
tests/unit/meta/toolsmith/test_overflow.py
src/synthorg/api/app.py
src/synthorg/persistence/tool_blueprint_protocol.py
tests/unit/meta/toolsmith/test_strategy.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/toolsmith/factory.py
src/synthorg/meta/toolsmith/strategy.py
tests/conformance/persistence/test_dynamic_tool_repository.py
tests/unit/meta/toolsmith/test_validation_gate.py
src/synthorg/meta/toolsmith/gap_store.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/protocol.py
src/synthorg/meta/mcp/server.py
tests/unit/meta/toolsmith/test_invoker_layered.py
src/synthorg/meta/toolsmith/errors.py
tests/unit/meta/toolsmith/test_applier.py
src/synthorg/meta/toolsmith/script_handler.py
tests/unit/meta/toolsmith/test_models.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py
tests/unit/meta/toolsmith/test_dynamic_registry.py
src/synthorg/persistence/sqlite/tool_blueprint_repo.py
src/synthorg/meta/toolsmith/service.py
tests/integration/toolsmith/test_self_extension_e2e.py

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site

Files:

src/synthorg/security/risk_scorer.py
src/synthorg/meta/signals/service.py
src/synthorg/meta/toolsmith/__init__.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/security/action_types.py
src/synthorg/observability/events/toolsmith.py
src/synthorg/security/autonomy/models.py
src/synthorg/core/enums.py
src/synthorg/meta/guards/scope_check.py
src/synthorg/meta/toolsmith/config.py
src/synthorg/observability/events/persistence.py
src/synthorg/api/state.py
src/synthorg/meta/mcp/invoker.py
src/synthorg/meta/mcp/registry.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/meta/guards/approval_gate.py
src/synthorg/meta/models.py
src/synthorg/meta/toolsmith/overflow.py
src/synthorg/meta/strategies/code_modification.py
src/synthorg/meta/config.py
src/synthorg/api/app.py
src/synthorg/persistence/tool_blueprint_protocol.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/toolsmith/factory.py
src/synthorg/meta/toolsmith/strategy.py
src/synthorg/meta/toolsmith/gap_store.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/protocol.py
src/synthorg/meta/mcp/server.py
src/synthorg/meta/toolsmith/errors.py
src/synthorg/meta/toolsmith/script_handler.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py
src/synthorg/persistence/sqlite/tool_blueprint_repo.py
src/synthorg/meta/toolsmith/service.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: No hardcoded numerics; numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
No from __future__ import annotations (3.14 has PEP 649); PEP 758 except: except A, B: no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors: Error from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries)
Use @computed_field for derived fields; use NotBlankStr for identifiers in Pydantic models
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Use from synthorg.observability import get_logger; variable always logger; never import logging or print() in app code
Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSI...

Files:

src/synthorg/security/risk_scorer.py
src/synthorg/meta/signals/service.py
src/synthorg/meta/toolsmith/__init__.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/security/action_types.py
src/synthorg/observability/events/toolsmith.py
src/synthorg/security/autonomy/models.py
src/synthorg/core/enums.py
src/synthorg/meta/guards/scope_check.py
src/synthorg/meta/toolsmith/config.py
src/synthorg/observability/events/persistence.py
src/synthorg/api/state.py
src/synthorg/meta/mcp/invoker.py
src/synthorg/meta/mcp/registry.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/meta/guards/approval_gate.py
src/synthorg/meta/models.py
src/synthorg/meta/toolsmith/overflow.py
src/synthorg/meta/strategies/code_modification.py
src/synthorg/meta/config.py
src/synthorg/api/app.py
src/synthorg/persistence/tool_blueprint_protocol.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/toolsmith/factory.py
src/synthorg/meta/toolsmith/strategy.py
src/synthorg/meta/toolsmith/gap_store.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/protocol.py
src/synthorg/meta/mcp/server.py
src/synthorg/meta/toolsmith/errors.py
src/synthorg/meta/toolsmith/script_handler.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py
src/synthorg/persistence/sqlite/tool_blueprint_repo.py
src/synthorg/meta/toolsmith/service.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/security/risk_scorer.py
src/synthorg/meta/signals/service.py
src/synthorg/meta/toolsmith/__init__.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/security/action_types.py
src/synthorg/observability/events/toolsmith.py
src/synthorg/security/autonomy/models.py
src/synthorg/core/enums.py
src/synthorg/meta/guards/scope_check.py
src/synthorg/meta/toolsmith/config.py
src/synthorg/observability/events/persistence.py
src/synthorg/api/state.py
src/synthorg/meta/mcp/invoker.py
src/synthorg/meta/mcp/registry.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/meta/guards/approval_gate.py
src/synthorg/meta/models.py
src/synthorg/meta/toolsmith/overflow.py
src/synthorg/meta/strategies/code_modification.py
src/synthorg/meta/config.py
src/synthorg/api/app.py
src/synthorg/persistence/tool_blueprint_protocol.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/toolsmith/factory.py
src/synthorg/meta/toolsmith/strategy.py
src/synthorg/meta/toolsmith/gap_store.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/protocol.py
src/synthorg/meta/mcp/server.py
src/synthorg/meta/toolsmith/errors.py
src/synthorg/meta/toolsmith/script_handler.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py
src/synthorg/persistence/sqlite/tool_blueprint_repo.py
src/synthorg/meta/toolsmith/service.py

web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts

web/src/api/types/**/*.gen.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Generated DTO types (MANDATORY): NEVER hand-edit web/src/api/types/*.gen.ts. Regenerate with uv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').

Files:

web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts

web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts

web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse web/src/components/ui/ components and design tokens only per web Dashboard Design System in web/CLAUDE.md

Files:

web/src/api/types/enum-values.gen.ts
web/src/api/types/openapi.gen.ts

{README.md,docs/**/*.md,web/**/*.md}

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics in README and public docs sourced from data/runtime_stats.yaml via markers per data/README.md

Files:

docs/design/self-improvement.md
docs/design/tools.md

docs/**/*.{md,d2,mmd}

📄 CodeRabbit inference engine (CLAUDE.md)

Use d2 for architecture / nested containers; mermaid for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200, D2 CLI pinned to v0.7.1 in CI

Files:

docs/design/self-improvement.md
docs/design/tools.md

src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/api/**/*.py: Two-phase API startup: construction (create_app body) wires synchronous services; on_startup (_build_lifecycle.on_startup) wires services needing connected persistence backend
Construction-phase ordering: agent_registry BEFORE auto_wire_meetings; tunnel_provider unconditionally
On-startup ordering: SettingsService auto-wire before WorkflowExecutionObserver registration; OntologyService after persistence.connect(); cost-dial services via _try_wire_cost_dial AFTER persistence; knowledge substrate via _wire_knowledge_engine AFTER persistence, gated on has_persistence AND has_memory_backend
Pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value

Files:

src/synthorg/api/state.py
src/synthorg/api/app.py

src/synthorg/**/*approval*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Approval decisions for ApprovalSource.CONVERSATIONAL_INTAKE route through Flow 0 of signal_resume_intent before parked-context / review-gate flows

Files:

src/synthorg/meta/guards/approval_gate.py

{src/synthorg/persistence/**/*.py,src/synthorg/**/*repository*.py,src/synthorg/**/*_repo*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL; new repository protocols must inherit from generic categories in persistence/_generics.py

Files:

src/synthorg/persistence/tool_blueprint_protocol.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py
src/synthorg/persistence/sqlite/tool_blueprint_repo.py

src/synthorg/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Datetime in persistence: parse_iso_utc / format_iso_utc from persistence._shared (reject naive); normalize_utc for already-typed

Files:

src/synthorg/persistence/tool_blueprint_protocol.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py
src/synthorg/persistence/sqlite/tool_blueprint_repo.py

tests/conformance/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Dual-backend conformance: tests/conformance/persistence/ consumes backend fixture (SQLite + Postgres); enforced by check_dual_backend_test_parity.py

Files:

tests/conformance/persistence/test_dynamic_tool_repository.py

🧠 Learnings (5)

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

tests/unit/meta/toolsmith/__init__.py
src/synthorg/security/risk_scorer.py
src/synthorg/meta/signals/service.py
src/synthorg/meta/toolsmith/__init__.py
src/synthorg/security/rules/risk_classifier.py
src/synthorg/security/action_types.py
tests/unit/core/test_enums.py
src/synthorg/observability/events/toolsmith.py
tests/unit/meta/toolsmith/test_script_handler.py
tests/unit/observability/test_events.py
src/synthorg/security/autonomy/models.py
src/synthorg/core/enums.py
src/synthorg/meta/guards/scope_check.py
src/synthorg/meta/toolsmith/config.py
src/synthorg/observability/events/persistence.py
tests/unit/security/test_action_types.py
src/synthorg/api/state.py
tests/integration/toolsmith/__init__.py
src/synthorg/meta/mcp/invoker.py
src/synthorg/meta/mcp/registry.py
src/synthorg/security/timeout/risk_tier_classifier.py
src/synthorg/meta/guards/approval_gate.py
src/synthorg/meta/models.py
tests/unit/meta/toolsmith/test_gap_store.py
src/synthorg/meta/toolsmith/overflow.py
src/synthorg/meta/strategies/code_modification.py
tests/unit/meta/toolsmith/test_config.py
tests/unit/persistence/test_factory.py
src/synthorg/meta/config.py
tests/unit/meta/toolsmith/test_overflow.py
src/synthorg/api/app.py
src/synthorg/persistence/tool_blueprint_protocol.py
tests/unit/meta/toolsmith/test_strategy.py
src/synthorg/meta/toolsmith/applier.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/toolsmith/factory.py
src/synthorg/meta/toolsmith/strategy.py
tests/conformance/persistence/test_dynamic_tool_repository.py
tests/unit/meta/toolsmith/test_validation_gate.py
src/synthorg/meta/toolsmith/gap_store.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/protocol.py
src/synthorg/meta/mcp/server.py
tests/unit/meta/toolsmith/test_invoker_layered.py
src/synthorg/meta/toolsmith/errors.py
tests/unit/meta/toolsmith/test_applier.py
src/synthorg/meta/toolsmith/script_handler.py
tests/unit/meta/toolsmith/test_models.py
src/synthorg/meta/toolsmith/dynamic_registry.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py
tests/unit/meta/toolsmith/test_dynamic_registry.py
src/synthorg/persistence/sqlite/tool_blueprint_repo.py
src/synthorg/meta/toolsmith/service.py
tests/integration/toolsmith/test_self_extension_e2e.py

📚 Learning: 2026-05-16T18:36:31.446Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

docs/design/self-improvement.md
docs/design/tools.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

docs/design/self-improvement.md
docs/design/tools.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

docs/design/self-improvement.md
docs/design/tools.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

docs/design/self-improvement.md
docs/design/tools.md

🪛 OpenGrep (1.21.0)

src/synthorg/persistence/postgres/tool_blueprint_repo.py

[ERROR] 337-340: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.

(coderabbit.sql-injection.python-fstring-execute)

🔇 Additional comments (57)

tests/conformance/persistence/test_dynamic_tool_repository.py (1)

1-381: LGTM!

tests/integration/toolsmith/__init__.py (1)

1-1: LGTM!

tests/integration/toolsmith/test_self_extension_e2e.py (1)

1-24: LGTM!

Also applies to: 31-90, 100-283

tests/unit/core/test_enums.py (1)

92-97: LGTM!

Also applies to: 121-129

tests/unit/meta/toolsmith/__init__.py (1)

1-1: LGTM!

tests/unit/meta/toolsmith/test_applier.py (1)

1-346: LGTM!

tests/unit/meta/toolsmith/test_config.py (1)

1-57: LGTM!

tests/unit/meta/toolsmith/test_dynamic_registry.py (1)

1-281: LGTM!

tests/unit/meta/toolsmith/test_gap_store.py (1)

1-102: LGTM!

tests/unit/meta/toolsmith/test_invoker_layered.py (1)

1-104: LGTM!

tests/unit/meta/toolsmith/test_models.py (1)

1-245: LGTM!

tests/unit/meta/toolsmith/test_overflow.py (1)

1-132: LGTM!

tests/unit/meta/toolsmith/test_script_handler.py (1)

1-116: LGTM!

tests/unit/meta/toolsmith/test_strategy.py (1)

1-148: LGTM!

tests/unit/meta/toolsmith/test_validation_gate.py (1)

1-173: LGTM!

tests/unit/observability/test_events.py (1)

348-352: LGTM!

tests/unit/persistence/test_factory.py (1)

117-117: LGTM!

Also applies to: 121-127, 131-131, 136-138

tests/unit/security/test_action_types.py (1)

14-14: LGTM!

Also applies to: 29-29, 31-31

docs/design/self-improvement.md (1)

60-73: LGTM!

Also applies to: 165-165, 249-249

docs/design/tools.md (1)

433-478: LGTM!

scripts/_ghost_wiring_manifest.txt (1)

77-93: LGTM!

src/synthorg/core/enums.py (1)

128-143: LGTM!

Also applies to: 161-198, 626-629

src/synthorg/meta/guards/approval_gate.py (1)

36-42: LGTM!

src/synthorg/meta/guards/scope_check.py (1)

94-95: LGTM!

src/synthorg/security/action_types.py (1)

35-37: LGTM!

src/synthorg/security/autonomy/models.py (1)

91-92: LGTM!

Also applies to: 118-119

src/synthorg/security/risk_scorer.py (1)

206-208: LGTM!

Also applies to: 215-217, 231-233

src/synthorg/security/rules/risk_classifier.py (1)

26-27: LGTM!

Also applies to: 33-34, 47-48

src/synthorg/security/timeout/risk_tier_classifier.py (1)

47-48: LGTM!

Also applies to: 54-55, 68-69

src/synthorg/observability/events/persistence.py (1)

351-419: LGTM!

Also applies to: 767-794

src/synthorg/observability/events/toolsmith.py (1)

1-52: LGTM!

src/synthorg/meta/toolsmith/overflow.py (1)

45-73: LGTM!

Also applies to: 85-115, 117-117

src/synthorg/meta/toolsmith/protocol.py (1)

1-187: LGTM!

src/synthorg/meta/toolsmith/errors.py (1)

1-55: LGTM!

src/synthorg/meta/toolsmith/gap_store.py (1)

1-152: LGTM!

src/synthorg/meta/toolsmith/script_handler.py (1)

1-133: LGTM!

src/synthorg/meta/toolsmith/dynamic_registry.py (1)

1-330: LGTM!

src/synthorg/meta/toolsmith/service.py (1)

1-254: LGTM!

src/synthorg/meta/toolsmith/factory.py (1)

48-147: LGTM!

Also applies to: 150-187

src/synthorg/meta/models.py (1)

24-25: LGTM!

Also applies to: 36-36, 338-338, 373-434

src/synthorg/meta/toolsmith/__init__.py (1)

1-26: LGTM!

src/synthorg/persistence/tool_blueprint_protocol.py (1)

1-170: LGTM!

src/synthorg/persistence/sqlite/revisions/20260522000001_dynamic_tools.sql (1)

1-76: LGTM!

src/synthorg/persistence/sqlite/schema.sql (1)

485-543: LGTM!

Also applies to: 1680-1743

src/synthorg/persistence/sqlite/tool_blueprint_repo.py (1)

1-463: LGTM!

src/synthorg/persistence/postgres/revisions/20260522000001_dynamic_tools.sql (1)

1-61: LGTM!

src/synthorg/persistence/postgres/schema.sql (1)

499-556: LGTM!

Also applies to: 1637-1688

src/synthorg/persistence/postgres/tool_blueprint_repo.py (1)

191-191: ⚡ Quick win

Provide the missing input needed to rewrite the comment: The original <review_comment> text (and any verification outputs to base the rewrite on) isn’t included, so I can’t produce the rewritten review comment. Paste the <review_comment>...</review_comment> contents and the results you want me to use.

src/synthorg/api/app.py (1)

153-159: LGTM!

Also applies to: 300-374, 1358-1407, 1546-1612

src/synthorg/api/state.py (1)

177-180: LGTM!

Also applies to: 267-269, 339-340, 519-520, 633-633, 1105-1133, 1301-1308

src/synthorg/meta/mcp/registry.py (1)

11-12: LGTM!

Also applies to: 26-42

src/synthorg/meta/mcp/invoker.py (1)

30-31: LGTM!

Also applies to: 57-68

src/synthorg/meta/mcp/server.py (1)

18-19: LGTM!

Also applies to: 40-45, 79-91, 94-105, 132-136

src/synthorg/meta/signals/service.py (1)

313-314: LGTM!

src/synthorg/meta/strategies/code_modification.py (1)

43-44: LGTM!

Also applies to: 179-190

web/src/api/types/enum-values.gen.ts (1)

443-444: LGTM!

Also applies to: 529-530, 679-695

web/src/api/types/openapi.gen.ts (1)

2017-2033: LGTM!

Also applies to: 2862-2912, 5371-5378, 5811-5818, 6718-6736, 6852-6869, 8048-8048, 8818-8864, 9214-9223, 9884-9898, 10398-10418, 10749-10749, 11980-11996, 12398-12413, 13072-13087, 18144-18175, 19967-20069

coderabbitai · 2026-05-21T22:37:21Z

+        await self._registry.register(active)
+        try:
+            await self._repo.save(active)
+        except MemoryError, RecursionError:
+            raise
+        except Exception:
+            # Persisting an ACTIVE row failed: unregister the live handler
+            # so the durable state ("not in DB") matches the runtime state
+            # ("not registered"). Without rollback the layered tool surface
+            # would expose a tool with no audit trail.
+            await self._registry.unregister(active.name)
+            raise


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Compensate unregister() failures in both rollback paths.

If self._registry.unregister(...) raises after the repo state has already changed, the runtime and durable state diverge again: Lines 197-208 can leave a live handler with no durable ACTIVE row, and Lines 222-230 can leave a RETIRED blueprint still callable. Add compensation that restores the persisted state (or swap the order and compensate the inverse) before re-raising.

Also applies to: 222-230

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/meta/toolsmith/applier.py` around lines 197 - 208, The rollback currently calls self._registry.unregister(...) without handling failures, risking divergence between runtime and durable state; update both rollback paths (the one after await self._repo.save(active) and the analogous RETIRE path around lines handling unregister) to catch exceptions from _registry.unregister and compensate by restoring the persisted state before re-raising: if unregister raises after a successful repo.save(active), attempt to re-save the ACTIVE row (await self._repo.save(active)) or, if that fails, log and raise the original exception; likewise, in the RETIRE flow where you unregister then change DB state, if unregister fails after the DB change, attempt to re-register the handler (await self._registry.register(<handler or blueprint>)) or revert the DB change (e.g., await self._repo.save(previous_state)) and only then re-raise; wrap calls to _registry.unregister and _repo.save in try/except to perform these compensating actions and ensure the durable and runtime states remain consistent when exceptions occur.

Aureliolo · 2026-05-21T22:54:33Z

+
+    passed: bool
+    brief_passed: bool
+    brief_score: int = Field(ge=0, le=100)


@coderabbitai Same as the config.py thread: brief_score: int = Field(ge=0, le=100) puts 100 as a call-site argument to Field(), which scripts/check_no_magic_numbers.py does not scan (lines 131-132). The gate passes on this file. This matches the established Pydantic-validator idiom across the repo. Skipping as a verified false-positive; please confirm and resolve.

Aureliolo · 2026-05-21T22:54:33Z

+
+    passed: bool
+    brief_passed: bool
+    brief_score: int = Field(ge=0, le=100)


@coderabbitai Same as the config.py thread: brief_score: int = Field(ge=0, le=100) puts 100 as a call-site argument to Field(), which scripts/check_no_magic_numbers.py does not scan (lines 131-132). The gate passes on this file. This matches the established Pydantic-validator idiom across the repo. Skipping as a verified false-positive; please confirm and resolve.

Post-rebase CodeRabbit review of the toolsmith on the rebased head. - meta/toolsmith/models.py: RETIRED now requires validated_at AND activated_at (a retired tool was rolled back after activation), matching the DB-layer lifecycle CHECK; the Python validator was looser. - meta/toolsmith/strategy.py: ProviderError in the authoring path now wraps to ToolAuthoringError (falls through the generic handler) so a transient completion failure skips just that gap instead of cancelling the sibling TaskGroup authoring tasks (regression from the round-2 parallelisation). - meta/toolsmith/strategy.py: route the model-authored JSON through a frozen _AuthoredBlueprintArgs boundary via parse_typed, replacing the ad-hoc _parse_json_object/_require_str/_require_schema helpers. - meta/toolsmith/validation_gate.py: report the golden stage as skipped when the brief fails, so a never-run scorecard is distinguishable from a real 0-0 result. - persistence/postgres/tool_blueprint_repo.py: normalize_utc the lifecycle datetimes in the upsert so naive/non-UTC values round-trip the same as the SQLite path and transition_if. - tests/integration/toolsmith/test_self_extension_e2e.py: preserve the base process environment when spawning the sandbox subprocess. - meta/config.py: document tool_creation_enabled + toolsmith attributes. - docs/design/tools.md: action-type taxonomy 39 to 41 (knowledge:ingest, knowledge:reindex from the rebased-in #2036 substrate). Skipped (defended on-thread, verified false-positive): config.py + models.py:114 Field() numerics are call-site args that check_no_magic_numbers.py does not scan (lines 131-132); gate passes.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/synthorg/meta/toolsmith/validation_gate.py (1)

60-76: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Generate schema-aware probes.

This helper only looks at "type", so authored schemas with default, const, enum, bounds, or non-empty container constraints can get an invalid probe and fail the brief before the golden stage ever runs. Build probes from the actual JSON Schema constraints instead of the type alone.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/meta/toolsmith/validation_gate.py` around lines 60 - 76, The
_synthesize_probe helper currently picks probe values only by prop["type"] (and
_PROBE_VALUES), causing invalid probes for schemas with defaults, consts, enums,
bounds or non-empty container constraints; update
_synthesize_probe(parameters_schema) to inspect each property dict (prop) for
schema keywords in priority order: const, default, enum (use first allowed
value), then type-specific constraints (for string: minLength, pattern/format
fallback; for number/integer: minimum/maximum; for array: minItems and
recursively synthesize an item probe from items schema to meet minItems; for
object: minProperties and recursively build required fields using nested
properties/required); fall back to _PROBE_VALUES only when no other constraints
exist; preserve the deepcopy behavior when assigning probe[name] and handle
missing/invalid prop types defensively. Ensure you reference and reuse the
existing functions/values (_synthesize_probe, _PROBE_VALUES, deepcopy) and keep
the same return shape.

src/synthorg/meta/config.py (1)

291-313: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Collapse the tool-creation enable flags.

tool_creation_enabled and toolsmith.enabled now both gate the same subsystem, and the new integration test has to set both. That creates a silent misconfiguration path where operators enable tool creation at the documented top level but the runtime still stays off because the nested flag remains false. Pick one source of truth, or reject mismatched values during config load.

🔧 Minimal guardrail

+    `@model_validator`(mode="after")
+    def _validate_tool_creation_flags(self) -> Self:
+        """Keep the tool-creation switches coherent."""
+        if self.tool_creation_enabled != self.toolsmith.enabled:
+            msg = "tool_creation_enabled and toolsmith.enabled must match"
+            raise ValueError(msg)
+        return self
+
     `@model_validator`(mode="after")
     def _validate_code_modification_requirements(self) -> Self:

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/meta/config.py` around lines 291 - 313, The top-level flag
tool_creation_enabled and the nested ToolsmithConfig.enabled (accessible via the
toolsmith field) currently conflict; add a pydantic root_validator on the same
config model to enforce a single source of truth by either rejecting mismatched
settings or synchronizing them—for example, if tool_creation_enabled !=
toolsmith.enabled, raise a clear ValueError (or else set toolsmith.enabled =
tool_creation_enabled) so runtime behavior is deterministic; implement this
check in a root_validator that references tool_creation_enabled and toolsmith to
locate the fields to validate/fix.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/synthorg/meta/config.py`:
- Around line 291-313: The top-level flag tool_creation_enabled and the nested
ToolsmithConfig.enabled (accessible via the toolsmith field) currently conflict;
add a pydantic root_validator on the same config model to enforce a single
source of truth by either rejecting mismatched settings or synchronizing
them—for example, if tool_creation_enabled != toolsmith.enabled, raise a clear
ValueError (or else set toolsmith.enabled = tool_creation_enabled) so runtime
behavior is deterministic; implement this check in a root_validator that
references tool_creation_enabled and toolsmith to locate the fields to
validate/fix.

In `@src/synthorg/meta/toolsmith/validation_gate.py`:
- Around line 60-76: The _synthesize_probe helper currently picks probe values
only by prop["type"] (and _PROBE_VALUES), causing invalid probes for schemas
with defaults, consts, enums, bounds or non-empty container constraints; update
_synthesize_probe(parameters_schema) to inspect each property dict (prop) for
schema keywords in priority order: const, default, enum (use first allowed
value), then type-specific constraints (for string: minLength, pattern/format
fallback; for number/integer: minimum/maximum; for array: minItems and
recursively synthesize an item probe from items schema to meet minItems; for
object: minProperties and recursively build required fields using nested
properties/required); fall back to _PROBE_VALUES only when no other constraints
exist; preserve the deepcopy behavior when assigning probe[name] and handle
missing/invalid prop types defensively. Ensure you reference and reuse the
existing functions/values (_synthesize_probe, _PROBE_VALUES, deepcopy) and keep
the same return shape.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: c946e5a2-9744-4336-be3e-19ee5510b96f

📥 Commits

Reviewing files that changed from the base of the PR and between 379c60e and bddfc2e.

📒 Files selected for processing (7)

docs/design/tools.md
src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/meta/toolsmith/strategy.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py
tests/integration/toolsmith/test_self_extension_e2e.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: Deploy Preview
GitHub Check: Build Backend
GitHub Check: CodSpeed Python benchmarks
GitHub Check: CodSpeed Web benchmarks
GitHub Check: Build Web Assets (melange)
GitHub Check: Lighthouse Site
GitHub Check: Lighthouse Dashboard
GitHub Check: Test E2E
GitHub Check: Test Integration
GitHub Check: Dashboard Test
GitHub Check: Test Unit
GitHub Check: Test Conformance (SQLite)
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (8)

{README.md,docs/**/*.md,web/**/*.md}

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics in README and public docs sourced from data/runtime_stats.yaml via markers per data/README.md

Files:

docs/design/tools.md

docs/**/*.{md,d2,mmd}

📄 CodeRabbit inference engine (CLAUDE.md)

Use d2 for architecture / nested containers; mermaid for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200, D2 CLI pinned to v0.7.1 in CI

Files:

docs/design/tools.md

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site

Files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/strategy.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: No hardcoded numerics; numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
No from __future__ import annotations (3.14 has PEP 649); PEP 758 except: except A, B: no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors: Error from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries)
Use @computed_field for derived fields; use NotBlankStr for identifiers in Pydantic models
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Use from synthorg.observability import get_logger; variable always logger; never import logging or print() in app code
Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSI...

Files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/strategy.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/strategy.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py

{src/**/*.py,tests/**/*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/

Files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/strategy.py
tests/integration/toolsmith/test_self_extension_e2e.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back
Test doubles: FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary forbidden (zero-tolerance, no baseline) per check_mock_spec.py
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large)

Files:

tests/integration/toolsmith/test_self_extension_e2e.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/integration/toolsmith/test_self_extension_e2e.py

{src/synthorg/persistence/**/*.py,src/synthorg/**/*repository*.py,src/synthorg/**/*_repo*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL; new repository protocols must inherit from generic categories in persistence/_generics.py

Files:

src/synthorg/persistence/postgres/tool_blueprint_repo.py

src/synthorg/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Datetime in persistence: parse_iso_utc / format_iso_utc from persistence._shared (reject naive); normalize_utc for already-typed

Files:

src/synthorg/persistence/postgres/tool_blueprint_repo.py

🧠 Learnings (7)

📚 Learning: 2026-05-16T18:36:31.446Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

docs/design/tools.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

docs/design/tools.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

docs/design/tools.md

📚 Learning: 2026-05-16T18:36:35.250Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

docs/design/tools.md

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/strategy.py
tests/integration/toolsmith/test_self_extension_e2e.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py

📚 Learning: 2026-05-21T22:55:20.496Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/strategy.py
tests/integration/toolsmith/test_self_extension_e2e.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py

📚 Learning: 2026-05-21T22:55:09.289Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/strategy.py
src/synthorg/meta/toolsmith/validation_gate.py
src/synthorg/meta/toolsmith/models.py
src/synthorg/persistence/postgres/tool_blueprint_repo.py

🔇 Additional comments (6)

docs/design/tools.md (1)

433-477: LGTM!

Also applies to: 524-548

src/synthorg/meta/toolsmith/strategy.py (1)

16-18: LGTM!

Also applies to: 130-142, 206-263

tests/integration/toolsmith/test_self_extension_e2e.py (1)

26-27: LGTM!

Also applies to: 93-97, 167-287

src/synthorg/meta/toolsmith/validation_gate.py (1)

175-187: LGTM!

src/synthorg/meta/toolsmith/models.py (1)

251-270: LGTM!

src/synthorg/persistence/postgres/tool_blueprint_repo.py (1)

160-163: LGTM!

Post-round-7 CodeRabbit review (all 6 prior fixes confirmed LGTM); 2 new outside-diff findings: - meta/config.py: add a cross-flag validator so tool_creation_enabled (the runtime gate for boot wiring + scope guard) and toolsmith.enabled (the toolsmith's own allowlist-consistency switch) must agree. Without it an operator could enable one and silently leave the subsystem off, or configure an allowlist that never runs. - meta/toolsmith/validation_gate.py: make _synthesize_probe honour the schema keywords that fully determine a valid value (const, default, enum-first) before the type-based placeholder, so a constrained required field no longer gets a probe that spuriously fails the brief. Deeper constraint satisfaction (numeric bounds, minLength, recursive containers) is intentionally out of scope: the probe is a best-effort smoke test and a missed bound fails the brief gracefully. CodSpeed flagged a resolveLocale web-benchmark regression; that file is not touched by this PR (only the generated .gen.ts DTO types changed), so it is noise or main-baseline drift, observed not chased.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/synthorg/meta/toolsmith/validation_gate.py (1)

128-160: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handler exceptions propagate instead of failing the brief gracefully.

Line 138 calls await handler(...) without exception handling. If the sandbox handler raises (timeout, runtime error, permission issue), this propagates up rather than returning (False, _BRIEF_FAIL_SCORE). Per coding guidelines, async helpers should catch Exception (re-raising MemoryError/RecursionError). The docstring states "passes iff the tool exits cleanly"—a crash should fail the brief, not raise.

Proposed fix

     async def run(self, blueprint: ToolBlueprint) -> tuple[bool, int]:
         """Execute the acceptance probe; return ``(passed, score)``."""
         import json as _json  # noqa: PLC0415

         handler = make_dynamic_tool_handler(
             blueprint,
             self._sandbox_resolver(blueprint),
             timeout_seconds=self._timeout_seconds,
         )
         probe = _synthesize_probe(blueprint.parameters_schema)
-        raw = await handler(app_state=None, arguments=probe)
+        try:
+            raw = await handler(app_state=None, arguments=probe)
+        except MemoryError, RecursionError:
+            raise
+        except Exception as exc:
+            logger.warning(
+                TOOLSMITH_BRIEF_PARSE_FAILED,
+                tool_name=blueprint.name,
+                error_type=type(exc).__name__,
+                error=safe_error_description(exc),
+            )
+            return False, _BRIEF_FAIL_SCORE
         try:
             envelope = _json.loads(raw)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/meta/toolsmith/validation_gate.py` around lines 128 - 160, The
async run method calls await handler(...) without catching exceptions so handler
failures propagate; wrap the handler invocation in a try/except Exception block
(while re-raising MemoryError and RecursionError) to ensure any handler error
causes the brief to fail gracefully: on exception log a warning using the
existing logger/TOOLSMITH_BRIEF_PARSE_FAILED (include blueprint.name,
type(exc).__name__, and safe_error_description(exc)) and return False,
_BRIEF_FAIL_SCORE; keep the rest of the JSON parsing flow unchanged and still
parse raw only if the handler call succeeds.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/synthorg/meta/toolsmith/validation_gate.py`:
- Around line 128-160: The async run method calls await handler(...) without
catching exceptions so handler failures propagate; wrap the handler invocation
in a try/except Exception block (while re-raising MemoryError and
RecursionError) to ensure any handler error causes the brief to fail gracefully:
on exception log a warning using the existing
logger/TOOLSMITH_BRIEF_PARSE_FAILED (include blueprint.name, type(exc).__name__,
and safe_error_description(exc)) and return False, _BRIEF_FAIL_SCORE; keep the
rest of the JSON parsing flow unchanged and still parse raw only if the handler
call succeeds.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 28248649-d258-41d1-80b9-6871b165b262

📥 Commits

Reviewing files that changed from the base of the PR and between bddfc2e and 38197ba.

📒 Files selected for processing (2)

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/validation_gate.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)

GitHub Check: Deploy Preview
GitHub Check: Build Backend
GitHub Check: Build Web Assets (melange)
GitHub Check: CodSpeed Python benchmarks
GitHub Check: CodSpeed Web benchmarks
GitHub Check: Lighthouse Dashboard
GitHub Check: Lighthouse Site
GitHub Check: Test Integration
GitHub Check: Test E2E
GitHub Check: Test Conformance (SQLite)
GitHub Check: Dashboard Test
GitHub Check: Test Unit
GitHub Check: Analyze (python)
GitHub Check: Analyze (javascript-typescript)

🧰 Additional context used

📓 Path-based instructions (3)

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site

Files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/validation_gate.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: No hardcoded numerics; numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
No from __future__ import annotations (3.14 has PEP 649); PEP 758 except: except A, B: no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors: Error from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries)
Use @computed_field for derived fields; use NotBlankStr for identifiers in Pydantic models
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Use from synthorg.observability import get_logger; variable always logger; never import logging or print() in app code
Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSI...

Files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/validation_gate.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/validation_gate.py

{src/**/*.py,tests/**/*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/

Files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/validation_gate.py

🧠 Learnings (3)

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/validation_gate.py

📚 Learning: 2026-05-21T22:55:20.496Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).

Applied to files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/validation_gate.py

📚 Learning: 2026-05-21T22:55:09.289Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.

Applied to files:

src/synthorg/meta/config.py
src/synthorg/meta/toolsmith/validation_gate.py

🔇 Additional comments (5)

src/synthorg/meta/config.py (1)

322-335: LGTM!

src/synthorg/meta/toolsmith/validation_gate.py (4)

50-57: LGTM!

Also applies to: 86-104

60-83: LGTM!

163-228: LGTM!

231-241: LGTM!

CodeRabbit confirmed both round-8 fixes LGTM; 1 new outside-diff finding. - meta/toolsmith/validation_gate.py: guard the handler invocation in SandboxBriefRunner.run. A sandbox handler crash (timeout, runtime error) now fails the brief gracefully (False, _BRIEF_FAIL_SCORE) with a logged warning instead of propagating out of the gate, matching the async-helper convention (re-raise MemoryError/RecursionError) and the docstring's 'passes iff the tool exits cleanly' contract. CI fully green; the round-8 CodSpeed resolveLocale regression cleared on re-run (confirmed noise, web locale code untouched by this PR).

Superseded: this CHANGES_REQUESTED review targeted an earlier head. All findings were addressed in later commits; CodeRabbit's review of the current head (7eaa0ed) reports no actionable comments, and CI is green.

## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - Introduced conversational interface for direct clarify and propose interactions. - Cost management now includes forecast gates, hard ceilings, and Pareto considerations. - Added living documentation engine combining wiki and retrieval-augmented generation features. - Real intake engine is now operational for live data processing. - Virtual desktop tool with vision verification gate available for enhanced workspace control. ### What's new - Per-project reproducible environments for consistent setups. - Headless browser testing tool integrated for automated UI validation. - Governed external API and data access tool introduced. - Hardened external-remote git backend with sandbox mounts and push-queue dispatching. - Adversarial red-team gate subsystem for enhanced security testing. - Self-extending toolkit to dynamically expand capabilities. - Stakes-aware model routing enables prioritized processing. - Task-board entry adapter connects live runtime with project management. - Persistent project workspace with pluggable git backend and per-project push queues implemented. - Knowledge and provenance substrate added to track data lineage. - Scoring and data contract framework for golden-company benchmark evaluations. ### Under the hood - Desktop Dockerfile pinned by digest to improve build stability and documented publishing gap fixed.  :robot: I have created a release *beep* *boop* --- ## [0.8.7](v0.8.6...v0.8.7) (2026-05-22) ### Features * conversational interface v1 - 1:1 clarify + propose ([#2019](#2019)) ([216ef94](216ef94)), closes [#1968](#1968) * cost as a first-class dial (forecast gate, hard ceiling, Pareto) ([#2029](#2029)) ([700a59e](700a59e)), closes [#1982](#1982) * **env:** reproducible per-project environments ([#2039](#2039)) ([d2c0ef9](d2c0ef9)), closes [#1994](#1994) * **evals:** [#1980](#1980) spine -- scoring + data contract for golden-company benchmark ([#2025](#2025)) ([53108e8](53108e8)) * goal/objective entry adapter ([#1964](#1964)) ([#2022](#2022)) ([cb15c3c](cb15c3c)) * governed external API/data access tool ([#1991](#1991)) ([#2032](#2032)) ([e08b451](e08b451)) * harden external-remote git backend + per-project sandbox mount + push-queue dispatch ([#2020](#2020)) ([#2030](#2030)) ([2fa2e1e](2fa2e1e)) * headless browser testing tool ([#1992](#1992)) ([#2024](#2024)) ([277b52a](277b52a)) * knowledge + provenance substrate ([#2036](#2036)) ([48c897b](48c897b)) * living documentation engine (dual-purpose wiki + RAG namespace) ([#2028](#2028)) ([3d10da9](3d10da9)), closes [#1976](#1976) * real intake engine online ([#2017](#2017)) ([9d8eb34](9d8eb34)) * **redteam:** adversarial red-team gate subsystem ([#1986](#1986)) ([#2026](#2026)) ([d2207e9](d2207e9)) * self-extending toolkit ([#1995](#1995)) ([#2035](#2035)) ([5ffc545](5ffc545)) * stakes-aware model routing ([#1998](#1998)) ([#2038](#2038)) ([9b98312](9b98312)) * task-board entry adapter to live runtime ([#1963](#1963)) ([#2023](#2023)) ([a8f1eea](a8f1eea)) * virtual desktop tool and vision verifier gate ([#2031](#2031)) ([dfe8b42](dfe8b42)), closes [#1993](#1993) * **workspace:** persistent project workspace + pluggable git backend + per-project push queue ([#2021](#2021)) ([ee58ee7](ee58ee7)) ### Bug Fixes * pin desktop Dockerfile by digest (Scorecard [#309](#309)) + document publish gap ([#2034](#2034)) ([8fda188](8fda188)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Aureliolo temporarily deployed to lighthouse May 21, 2026 18:20 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview May 21, 2026 18:21 — with GitHub Actions Inactive

gemini-code-assist Bot reviewed May 21, 2026

View reviewed changes

coderabbitai Bot previously requested changes May 21, 2026

View reviewed changes

Aureliolo temporarily deployed to lighthouse May 21, 2026 19:10 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview May 21, 2026 19:12 — with GitHub Actions Inactive

coderabbitai Bot previously requested changes May 21, 2026

View reviewed changes

Comment thread src/synthorg/persistence/postgres/revisions/20260522000001_dynamic_tools.sql

Comment thread src/synthorg/persistence/sqlite/revisions/20260522000001_dynamic_tools.sql

Aureliolo temporarily deployed to lighthouse May 21, 2026 19:37 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview May 21, 2026 19:38 — with GitHub Actions Inactive

coderabbitai Bot previously approved these changes May 21, 2026

View reviewed changes

Aureliolo dismissed coderabbitai[bot]’s stale review via ce548c7 May 21, 2026 20:03

Aureliolo temporarily deployed to lighthouse May 21, 2026 20:03 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview May 21, 2026 20:04 — with GitHub Actions Inactive

coderabbitai Bot previously requested changes May 21, 2026

View reviewed changes

Comment thread src/synthorg/persistence/tool_blueprint_protocol.py Outdated

Aureliolo temporarily deployed to lighthouse May 21, 2026 20:32 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview May 21, 2026 20:33 — with GitHub Actions Inactive

coderabbitai Bot previously approved these changes May 21, 2026

View reviewed changes

Aureliolo added 3 commits May 21, 2026 22:48

feat: self-extending toolkit core (#1995) - models, persistence, dyna…

b1ae515

…mic registry, authoring, validation, governance

feat: wire self-extending toolkit at boot (#1995) - service, factory,…

7067786

… layered MCP dispatch, sim e2e

docs: document self-extending toolkit + tool:create action type (#1995)

3152378

Aureliolo dismissed coderabbitai[bot]’s stale review via 379c60e May 21, 2026 22:23

Aureliolo force-pushed the feat/1995-self-extending-toolkit branch from 6728ed8 to 379c60e Compare May 21, 2026 22:23

Aureliolo temporarily deployed to lighthouse May 21, 2026 22:23 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview May 21, 2026 22:24 — with GitHub Actions Inactive

coderabbitai Bot previously requested changes May 21, 2026

View reviewed changes

Aureliolo commented May 21, 2026

View reviewed changes

Aureliolo temporarily deployed to lighthouse May 21, 2026 22:59 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview May 21, 2026 23:00 — with GitHub Actions Inactive

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Aureliolo temporarily deployed to lighthouse May 21, 2026 23:27 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview May 21, 2026 23:28 — with GitHub Actions Inactive

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

Aureliolo temporarily deployed to lighthouse May 21, 2026 23:48 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview May 21, 2026 23:50 — with GitHub Actions Inactive

Aureliolo merged commit 5ffc545 into main May 22, 2026
102 of 104 checks passed

Aureliolo temporarily deployed to cloudflare-preview May 22, 2026 05:02 — with GitHub Actions Inactive

Aureliolo deleted the feat/1995-self-extending-toolkit branch May 22, 2026 05:02

synthorg-repo-bot Bot mentioned this pull request May 21, 2026

chore(main): release 0.8.7 #2018

Merged

		@@ -0,0 +1,177 @@
		"""Applier that turns an approved tool-creation proposal into a live tool.

	"""Toolsmith orchestration service.
	"""Toolsmith orchestration service.\n\nimport asyncio\n"

		existing_capabilities: Callable returning the current capability
		surface, so the generator avoids duplicates.

	existing_capabilities: Callable returning the current capability
	surface, so the generator avoids duplicates.
	existing_capabilities: Static capability surface (dedup hint),\n so the generator avoids duplicates.

Conversation

Aureliolo commented May 21, 2026

Summary

Test plan

Review coverage

Uh oh!

github-actions Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependency Review

Scanned Files

Uh oh!

coderabbitai Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

gemini-code-assist Bot commented May 21, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

codspeed-hq Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Aureliolo May 21, 2026

github-actions Bot commented May 21, 2026 •

edited

Loading

coderabbitai Bot commented May 21, 2026 •

edited

Loading

codspeed-hq Bot commented May 21, 2026 •

edited

Loading

codecov Bot commented May 21, 2026 •

edited

Loading