feat: self-extending toolkit (#1995)#2035
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI (base), Organization UI (inherited) Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
🧰 Additional context used📓 Path-based instructions (3)src/synthorg/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
⚙️ CodeRabbit configuration file
Files:
{src/**/*.py,tests/**/*.py}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (3)📚 Learning: 2026-05-05T09:04:46.195ZApplied to files:
📚 Learning: 2026-05-21T22:55:20.496ZApplied to files:
📚 Learning: 2026-05-21T22:55:09.289ZApplied to files:
🔇 Additional comments (1)
WalkthroughAdds a self-extending “toolsmith” subsystem: models, config, protocols, errors; gap store; LLM blueprint authoring; validation gate; dynamic registry and layered MCP dispatch; sandbox script handler; orchestration service, applier, overflow routing, and factory; Postgres/SQLite persistence with migrations and repositories; API startup wiring and dynamic tool layer installation; security enums/guards/risk/timeouts; observability events; docs updates; and comprehensive unit/integration/conformance tests. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a self-extending toolkit ('toolsmith') that allows the system to autonomously address recurring capability gaps by authoring, validating, and registering new MCP tools at runtime. This subsystem enhances the organization's capabilities while maintaining strict security and quality controls through a mandatory approval gate, benchmark-based validation, and a layered registry architecture. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces the 'toolsmith' subsystem, a self-extending toolkit that enables the organization to dynamically author and register new MCP tools at runtime based on recurring capability gaps. The implementation includes a ring-buffered gap store, LLM-backed authoring in isolated sandboxes, and a two-stage validation gate integrated into the self-improvement framework at a new TOOL_CREATION altitude. Feedback focuses on optimizing performance by parallelizing tool application and gap processing via asyncio.TaskGroup, correcting a docstring type mismatch in the orchestration service, and refining authoring logic to prevent duplicate tools by including currently registered dynamic tools in the capability context.
| @@ -0,0 +1,177 @@ | |||
| """Applier that turns an approved tool-creation proposal into a live tool. | |||
| applied = 0 | ||
| failures: list[str] = [] | ||
| for blueprint in proposal.tool_changes: | ||
| try: | ||
| ok = await self._apply_one(blueprint) | ||
| except ProviderError: | ||
| raise | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception as exc: | ||
| logger.warning( | ||
| TOOLSMITH_APPLY_FAILED, | ||
| tool_name=blueprint.name, | ||
| error_type=type(exc).__name__, | ||
| error=safe_error_description(exc), | ||
| ) | ||
| failures.append(f"{blueprint.name}: {type(exc).__name__}") | ||
| continue | ||
| if ok: | ||
| applied += 1 | ||
| else: | ||
| failures.append(f"{blueprint.name}: rejected by benchmark gate") | ||
| if failures: | ||
| return ApplyResult( | ||
| success=False, | ||
| error_message=NotBlankStr("; ".join(failures)), | ||
| changes_applied=applied, | ||
| ) | ||
| return ApplyResult(success=True, changes_applied=applied) |
There was a problem hiding this comment.
Tool changes are currently processed sequentially. Since _apply_one involves benchmark validation (which can be slow due to scorecard runs), consider using asyncio.TaskGroup to process them in parallel, adhering to the project's fan-out convention.
applied = 0\n failures: list[str] = []\n\n async with asyncio.TaskGroup() as tg:\n tasks = [tg.create_task(self._apply_one(blueprint)) for blueprint in proposal.tool_changes]\n\n for i, task in enumerate(tasks):\n blueprint = proposal.tool_changes[i]\n try:\n ok = task.result()\n if ok:\n applied += 1\n else:\n failures.append(f"{blueprint.name}: rejected by benchmark gate")\n except ProviderError:\n raise\n except MemoryError, RecursionError:\n raise\n except Exception as exc:\n logger.warning(\n TOOLSMITH_APPLY_FAILED,\n tool_name=blueprint.name,\n error_type=type(exc).__name__,\n error=safe_error_description(exc),\n )\n failures.append(f"{blueprint.name}: {type(exc).__name__}")\n\n if failures:\n return ApplyResult(\n success=False,\n error_message=NotBlankStr("; ".join(failures)),\n changes_applied=applied,\n )\n return ApplyResult(success=True, changes_applied=applied)| @@ -0,0 +1,221 @@ | |||
| """Toolsmith orchestration service. | |||
| existing_capabilities: Callable returning the current capability | ||
| surface, so the generator avoids duplicates. |
There was a problem hiding this comment.
The docstring incorrectly identifies existing_capabilities as a Callable. The implementation and factory wiring use a tuple[NotBlankStr, ...]. Please update the docstring to match the implementation.
| existing_capabilities: Callable returning the current capability | |
| surface, so the generator avoids duplicates. | |
| existing_capabilities: Static capability surface (dedup hint),\n so the generator avoids duplicates. |
| proposals: list[ImprovementProposal] = [] | ||
| for gap in gaps: | ||
| proposals.extend(await self._handle_gap(gap)) |
There was a problem hiding this comment.
Recurring capability gaps are processed sequentially. Since _handle_gap involves an LLM call via the generator, this is a performance bottleneck. Use asyncio.TaskGroup to process gaps in parallel per project conventions.
proposals: list[ImprovementProposal] = []\n async with asyncio.TaskGroup() as tg:\n tasks = [tg.create_task(self._handle_gap(gap)) for gap in gaps]\n\n for task in tasks:\n proposals.extend(task.result())| blueprint = await self._generator.author( | ||
| gap, existing_capabilities=self._existing_capabilities | ||
| ) |
There was a problem hiding this comment.
The existing_capabilities passed to the generator only includes static capabilities known at boot time. This can lead to the LLM authoring tools that duplicate dynamic tools already registered in the same run. Consider combining self._existing_capabilities with the names currently in the dynamic registry.
try:\n # Combine static capabilities with currently registered dynamic ones\n current_caps = list(self._existing_capabilities)\n current_caps.extend(self._applier._registry.names())\n\n blueprint = await self._generator.author(\n gap, existing_capabilities=tuple(current_caps)\n )There was a problem hiding this comment.
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/meta/toolsmith/test_dynamic_registry.py (1)
170-227: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd explicit collision-precedence tests for layered lookup.
Line 171 and Line 209 validate non-overlapping keys only. Given the dynamic layer is intended to overlay the static layer, add direct collision tests so a regression in precedence is caught.
✅ Suggested test additions
class TestLayeredToolRegistry: + async def test_dynamic_overrides_static_on_name_collision(self) -> None: + static_def = blueprint_to_mcp_def( + _blueprint(name="synthorg_textkit_slugify", capability="static:tool") + ) + static = _FakeStatic({static_def.name: static_def}) + dynamic = DynamicToolRegistry(handler_factory=_handler_factory) + await dynamic.register( + _blueprint(name="synthorg_textkit_slugify", capability="dynamic:tool") + ) + layered = LayeredToolRegistry(static, dynamic) + assert layered.get("synthorg_textkit_slugify").capability == "dynamic:tool" + class TestLayeredHandlerMap: + async def test_dynamic_handler_overrides_static_on_key_collision(self) -> None: + async def static_handler( + *, app_state: Any, arguments: dict[str, Any], actor: Any = None + ) -> str: + del app_state, arguments, actor + return "static" + + dynamic = DynamicToolRegistry(handler_factory=_handler_factory) + await dynamic.register(_blueprint(name="synthorg_textkit_slugify")) + mapping = LayeredHandlerMap( + {"synthorg_textkit_slugify": static_handler}, + dynamic, + ) + assert mapping["synthorg_textkit_slugify"] is _fake_handler🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/meta/toolsmith/test_dynamic_registry.py` around lines 170 - 227, Add tests that assert collision precedence when a key exists in both layers: for LayeredToolRegistry create a static ToolDef (e.g., via blueprint_to_mcp_def/_FakeStatic) with the same name that DynamicToolRegistry registers and assert layered.get("shared_name") returns the dynamic definition (not the static one); for LayeredHandlerMap register a static handler under "shared_key" and also ensure DynamicToolRegistry provides a handler for "shared_key", then assert mapping["shared_key"] is the dynamic handler, "shared_key" in mapping, and that other existing behaviors (KeyError for missing, membership, len, iteration) still hold. Ensure tests reference LayeredToolRegistry.get and LayeredHandlerMap (mapping["..."]) so collisions are explicitly checked.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/synthorg/api/app.py`:
- Around line 1532-1538: Move the call to
install_dynamic_tool_layer(runtime.dynamic_registry) so it runs before
app_state.set_toolsmith_service(runtime.service); specifically, invoke
install_dynamic_tool_layer and let it complete successfully (propagate or handle
exceptions) before calling app_state.set_toolsmith_service to avoid persisting a
partial wiring on failure. Ensure install_dynamic_tool_layer errors prevent
calling app_state.set_toolsmith_service and that the existing
logger.info(API_APP_STARTUP, service="toolsmith", note="wired") only runs after
both installation and set_toolsmith_service succeed.
- Around line 355-361: Replace the use of Path.cwd() when building sandbox
backends so toolsmith sandboxes are not bound to the process CWD; call
build_sandbox_backends with the app's resolved workspace root (the workspace
seam/root that the app exposes) instead of Path.cwd(), keeping
SandboxingConfig() and the _resolve_sandbox function unchanged except for that
argument; ensure the passed value is the same workspace root used elsewhere in
the app (e.g., the resolved workspace variable or property the app already
computes) so build_sandbox_backends(config=sandboxing,
workspace=the_app_workspace_root).
In `@src/synthorg/meta/strategies/code_modification.py`:
- Around line 179-180: The ProviderError catch currently re-raises without
logging; before re-raising in the except ProviderError block inside
src/synthorg/meta/strategies/code_modification.py, emit a warning/error log
(including the exception details) and/or raise an appropriate observability
event (e.g., call the module's logger.warn/error or the service's event emitter)
with contextual info about the provider and operation, then re-raise the
original ProviderError to preserve stack trace.
In `@src/synthorg/meta/toolsmith/applier.py`:
- Around line 149-151: The current flow saves the blueprint state via
self._repo.save(active) before doing runtime registration with
self._registry.register(active), risking durable ACTIVE state if registration
fails; change the order to call self._registry.register(active) first, then call
self._repo.save(active), and if the save raises an exception roll back the
runtime registration (e.g., call a deregister/unregister method on
self._registry or otherwise undo registration for the same active) to avoid
split-brain; update the success logging (logger.info(TOOLSMITH_APPLY_COMPLETED,
tool_name=blueprint.name)) to run only after both register and save complete
successfully.
In `@src/synthorg/meta/toolsmith/dynamic_registry.py`:
- Around line 119-123: build_args_model currently returns create_model(...)
directly which can raise raw exceptions; catch exceptions from create_model
(using the existing create_model call with _model_class_name(blueprint.name) and
ConfigDict(...)) and re-raise them as ToolRegistrationError while preserving the
original exception message/trace (e.g., include the caught exception as context
or message). Wrap only the create_model invocation in try/except, reference
create_model, _model_class_name, blueprint.name and ConfigDict in the error
message, and raise ToolRegistrationError(...) so callers receive the documented
domain-specific error.
In `@src/synthorg/meta/toolsmith/models.py`:
- Around line 227-247: Update _validate_state_timestamps to require a non-None
validation record when the blueprint reaches VALIDATED (in addition to ACTIVE):
inside the method (ToolBlueprint._validate_state_timestamps) add a check for
self.state in {ToolBlueprintState.VALIDATED, ToolBlueprintState.ACTIVE} and
raise ValueError("validation result required in state 'validated'") (or a
combined message) if self.validation is None; keep existing
validated_at/activated_at/retired_at checks and _assert_monotonic() intact.
- Around line 193-205: The validator _validate_name_capability_action currently
checks formats but not that synthorg_{domain}_{action} corresponds to capability
"{domain}:{action}"; update it to parse domain and action from self.name (using
the capturing groups from _TOOL_NAME_RE) and from self.capability (using
_CAPABILITY_RE or splitting on ':'), then compare the domain and action parts
and raise a ValueError with a clear message if they differ; keep the existing
format checks for _TOOL_NAME_RE, _CAPABILITY_RE and _ACTION_TYPE_RE and return
self when all checks pass.
In `@src/synthorg/meta/toolsmith/script_handler.py`:
- Around line 72-84: The except Exception block in the tool invocation handler
currently swallows system-critical exceptions; modify the handler around the
except Exception as exc (and related variables like logger.warning, wrapped,
DynamicToolScriptError, ToolsmithError, err) to re-raise system exceptions
instead of wrapping them: immediately check if exc is an instance of MemoryError
or RecursionError (and also propagate KeyboardInterrupt/SystemExit) and if so
re-raise it, otherwise continue with the existing logging/wrapping/return
behavior that creates DynamicToolScriptError or preserves ToolsmithError and
returns err(..., domain_code="dynamic_tool_failed").
In `@src/synthorg/meta/toolsmith/service.py`:
- Around line 216-217: Replace the hardcoded 0.5 literal used for the proposal
confidence with a module-level annotated constant: add a top-of-file constant
like PROPOSAL_CONFIDENCE: float = 0.5 and then change the call site that
currently sets confidence=0.5 (adjacent to
source_rule=NotBlankStr("capability_gap")) to use
confidence=PROPOSAL_CONFIDENCE; keep the constant name exported/visible in the
module so linters/no-magic-number guards accept it.
In `@src/synthorg/meta/toolsmith/validation_gate.py`:
- Around line 49-56: _PROBE_VALUES currently contains shared mutable singletons
for "array" and "object" which can be mutated across uses; change the mapping so
it doesn't store [] or {} directly—either store factory callables (e.g.,
"array": list, "object": dict) or immutable placeholders and ensure any code
that constructs probes (the code that reads _PROBE_VALUES to produce probe
instances) calls the factory or creates a fresh copy (e.g., list() / dict() or
copy.deepcopy) each time; update both occurrences of _PROBE_VALUES and any code
paths that consume it to instantiate new containers rather than reusing the same
list/dict.
- Around line 109-120: After parsing JSON in the try block, ensure the parsed
envelope is a mapping before calling envelope.get: check isinstance(envelope,
dict) (or collections.abc.Mapping) and if it's not, log a brief-parse failure
including the actual type (similar to the existing logger.warning usage with
tool_name and error_type) and return False, _BRIEF_FAIL_SCORE; otherwise
continue to compute passed = envelope.get("status") == "ok" and return the
pass/fail score. This change touches the envelope variable and the
envelope.get(...) usage in the validation logic.
In
`@src/synthorg/persistence/postgres/revisions/20260522000001_dynamic_tools.sql`:
- Line 19: The JSONB columns parameters_schema and validation must be
constrained to object payloads; update the migration
(20260522000001_dynamic_tools.sql) to add CHECK constraints on both columns
(e.g., ensure jsonb_typeof(parameters_schema) = 'object' and
jsonb_typeof(validation) = 'object') so scalar/array values are rejected, and
apply the same CHECK constraints in the canonical schema file
(src/synthorg/persistence/postgres/schema.sql) to keep drift checks passing.
- Around line 25-30: Add a table-level CHECK constraint on the dynamic_tools
table (name it e.g., dynamic_tools_lifecycle_check) that enforces lifecycle
coherence between state and timestamps: for 'pending' none of
validated_at/activated_at/retired_at exist, for 'validated' validated_at IS NOT
NULL and activated_at/retired_at IS NULL, for 'active' validated_at AND
activated_at are NOT NULL and retired_at IS NULL, and for 'retired'
validated_at, activated_at, and retired_at are all NOT NULL; add this CHECK to
the CREATE TABLE statement in the revision SQL
(20260522000001_dynamic_tools.sql) and mirror the same constraint in the
canonical schema file (schema.sql).
In `@src/synthorg/persistence/sqlite/revisions/20260522000001_dynamic_tools.sql`:
- Line 20: Add write-time JSON validation for the TEXT columns by adding JSON
constraints to the CREATE/ALTER statements: ensure the parameters_schema and
validation columns include CHECK(json_valid(parameters_schema)) and
CHECK(json_valid(validation)) (and optionally CHECK(json_type(parameters_schema)
= 'object') / json_type(validation) = 'object' if you expect JSON objects) so
invalid or non-JSON text cannot persist; apply the same constraint edits in the
migration 20260522000001_dynamic_tools.sql and mirror them in schema.sql for the
same table/columns (parameters_schema, validation).
- Around line 27-39: Add a CHECK constraint to the table that enforces valid
state/timestamp combinations: for column state and timestamp columns
validated_at, activated_at, retired_at ensure pending => all three are NULL,
validated => validated_at IS NOT NULL and activated_at and retired_at are NULL,
active => validated_at IS NOT NULL and activated_at IS NOT NULL and retired_at
IS NULL, retired => retired_at IS NOT NULL; add this CHECK alongside the
existing state and timestamp checks in the migration (the same constraint should
be added to src/synthorg/persistence/sqlite/schema.sql as well), referencing the
exact column names state, validated_at, activated_at, retired_at so the DB
prevents contradictory lifecycle rows.
---
Outside diff comments:
In `@tests/unit/meta/toolsmith/test_dynamic_registry.py`:
- Around line 170-227: Add tests that assert collision precedence when a key
exists in both layers: for LayeredToolRegistry create a static ToolDef (e.g.,
via blueprint_to_mcp_def/_FakeStatic) with the same name that
DynamicToolRegistry registers and assert layered.get("shared_name") returns the
dynamic definition (not the static one); for LayeredHandlerMap register a static
handler under "shared_key" and also ensure DynamicToolRegistry provides a
handler for "shared_key", then assert mapping["shared_key"] is the dynamic
handler, "shared_key" in mapping, and that other existing behaviors (KeyError
for missing, membership, len, iteration) still hold. Ensure tests reference
LayeredToolRegistry.get and LayeredHandlerMap (mapping["..."]) so collisions are
explicitly checked.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: e25f9d4b-c78a-4aa0-950b-3edd7c9ce454
📒 Files selected for processing (64)
docs/design/self-improvement.mddocs/design/tools.mdscripts/_ghost_wiring_manifest.txtscripts/schema_drift_baseline.txtsrc/synthorg/api/app.pysrc/synthorg/api/state.pysrc/synthorg/core/enums.pysrc/synthorg/meta/config.pysrc/synthorg/meta/guards/approval_gate.pysrc/synthorg/meta/guards/scope_check.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/meta/mcp/registry.pysrc/synthorg/meta/mcp/server.pysrc/synthorg/meta/models.pysrc/synthorg/meta/signals/service.pysrc/synthorg/meta/strategies/code_modification.pysrc/synthorg/meta/toolsmith/__init__.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/config.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/meta/toolsmith/errors.pysrc/synthorg/meta/toolsmith/factory.pysrc/synthorg/meta/toolsmith/gap_store.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/toolsmith/overflow.pysrc/synthorg/meta/toolsmith/protocol.pysrc/synthorg/meta/toolsmith/script_handler.pysrc/synthorg/meta/toolsmith/service.pysrc/synthorg/meta/toolsmith/strategy.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/observability/events/persistence.pysrc/synthorg/observability/events/toolsmith.pysrc/synthorg/persistence/postgres/revisions/20260522000001_dynamic_tools.sqlsrc/synthorg/persistence/postgres/schema.sqlsrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/persistence/sqlite/revisions/20260522000001_dynamic_tools.sqlsrc/synthorg/persistence/sqlite/schema.sqlsrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/security/action_types.pysrc/synthorg/security/autonomy/models.pysrc/synthorg/security/risk_scorer.pysrc/synthorg/security/rules/risk_classifier.pysrc/synthorg/security/timeout/risk_tier_classifier.pytests/conformance/persistence/test_dynamic_tool_repository.pytests/integration/toolsmith/__init__.pytests/integration/toolsmith/test_self_extension_e2e.pytests/unit/core/test_enums.pytests/unit/meta/toolsmith/__init__.pytests/unit/meta/toolsmith/test_applier.pytests/unit/meta/toolsmith/test_config.pytests/unit/meta/toolsmith/test_dynamic_registry.pytests/unit/meta/toolsmith/test_gap_store.pytests/unit/meta/toolsmith/test_invoker_layered.pytests/unit/meta/toolsmith/test_models.pytests/unit/meta/toolsmith/test_overflow.pytests/unit/meta/toolsmith/test_script_handler.pytests/unit/meta/toolsmith/test_strategy.pytests/unit/meta/toolsmith/test_validation_gate.pytests/unit/observability/test_events.pytests/unit/persistence/test_factory.pytests/unit/security/test_action_types.pyweb/src/api/types/enum-values.gen.tsweb/src/api/types/openapi.gen.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Test E2E
- GitHub Check: Dashboard Test
- GitHub Check: Test Conformance (SQLite)
- GitHub Check: Test Integration
- GitHub Check: Test Unit
- GitHub Check: Lighthouse Site
- GitHub Check: Lighthouse Dashboard
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Build Preview
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (12)
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers:@pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min.
Windows tests: unit tests useWindowsSelectorEventLoopPolicy(3.14 IOCP teardown race). Subprocess tests override back.
Test doubles ladder: FakeClock for Clock seam,mock_of[T](**overrides)for typed-boundary substitutions, SimpleNamespace for attribute-bags. BareMagicMockat typed boundary (constructor / fn arg / annotated local / typed fixture return) is blocked byscripts/check_mock_spec.py(zero-tolerance, no baseline).
FakeClock andmock_ofimport fromtests._shared; inject viaclock=and helper's spec subscript.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...)). Dual-backend conformance: tests undertests/conformance/persistence/consumebackendfixture (SQLite + Postgres). Enforced bycheck_dual_backend_test_parity.py.
Flaky tests: NEVER skip/xfail; fix fundamentally. Useasyncio.Event().wait()notsleep(large).
Files:
tests/integration/toolsmith/__init__.pytests/unit/security/test_action_types.pytests/unit/meta/toolsmith/__init__.pytests/unit/observability/test_events.pytests/unit/meta/toolsmith/test_script_handler.pytests/unit/meta/toolsmith/test_applier.pytests/unit/meta/toolsmith/test_invoker_layered.pytests/unit/persistence/test_factory.pytests/unit/meta/toolsmith/test_models.pytests/unit/core/test_enums.pytests/unit/meta/toolsmith/test_gap_store.pytests/unit/meta/toolsmith/test_config.pytests/unit/meta/toolsmith/test_overflow.pytests/unit/meta/toolsmith/test_dynamic_registry.pytests/unit/meta/toolsmith/test_validation_gate.pytests/integration/toolsmith/test_self_extension_e2e.pytests/conformance/persistence/test_dynamic_tool_repository.pytests/unit/meta/toolsmith/test_strategy.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/integration/toolsmith/__init__.pytests/unit/security/test_action_types.pytests/unit/meta/toolsmith/__init__.pytests/unit/observability/test_events.pytests/unit/meta/toolsmith/test_script_handler.pytests/unit/meta/toolsmith/test_applier.pytests/unit/meta/toolsmith/test_invoker_layered.pytests/unit/persistence/test_factory.pytests/unit/meta/toolsmith/test_models.pytests/unit/core/test_enums.pytests/unit/meta/toolsmith/test_gap_store.pytests/unit/meta/toolsmith/test_config.pytests/unit/meta/toolsmith/test_overflow.pytests/unit/meta/toolsmith/test_dynamic_registry.pytests/unit/meta/toolsmith/test_validation_gate.pytests/integration/toolsmith/test_self_extension_e2e.pytests/conformance/persistence/test_dynamic_tool_repository.pytests/unit/meta/toolsmith/test_strategy.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Onlysrc/synthorg/persistence/may import sqlite/psycopg or emit raw SQL. New repository protocols inherit from generic categories inpersistence/_generics.py; bespoke methods permitted only under ADR-0001 D7. Protocols stay@runtime_checkable.
Configuration Precedence: DB > env > code default viaSettingsService/ConfigResolver(Cat-1) or env > code default (Cat-2,read_only_post_init); Cat-3 bootstrap secrets are pure env. YAML is company-template ingestion format only. Noos.environ.getoutside startup; pre-init Cat-2 reads usesettings.bootstrap_resolver.resolve_init_value.
No hardcoded numeric values. Allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced byscripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations / issue back-refs / migration framing. Enforced bycheck_no_review_origin_in_code.py+check_no_migration_framing.py.
Do not usefrom __future__ import annotations(Python 3.14 has PEP 649). Use PEP 758 except:except A, B:requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Error classes:<Domain><Condition>Errorpattern inheriting fromDomainError; never inheritException/RuntimeError/etc directly. Enforced bycheck_domain_error_hierarchy.py.
Pydantic v2: frozen +extra="forbid"on every frozen model project-wide. Gate:check_frozen_model_extra_forbid.py.@computed_fieldauto-exempt; per-line# lint-allow: frozen-extra-forbid -- <reason>forextra="allow"/"ignore"boundaries. Use@computed_fieldfor derived data.
UseNotBlankStrfor identifiers in Pydantic models.
Args models at every system boundary;parse_typed()for every external dict ingestion. Enforced bycheck_boundary_typed.py.
Immutability: usemodel_copy(update=...)or `c...
Files:
src/synthorg/security/rules/risk_classifier.pysrc/synthorg/meta/signals/service.pysrc/synthorg/meta/toolsmith/config.pysrc/synthorg/security/timeout/risk_tier_classifier.pysrc/synthorg/observability/events/toolsmith.pysrc/synthorg/core/enums.pysrc/synthorg/security/autonomy/models.pysrc/synthorg/meta/guards/approval_gate.pysrc/synthorg/security/risk_scorer.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/security/action_types.pysrc/synthorg/meta/models.pysrc/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/meta/toolsmith/factory.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/observability/events/persistence.pysrc/synthorg/meta/guards/scope_check.pysrc/synthorg/meta/mcp/registry.pysrc/synthorg/meta/toolsmith/gap_store.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/__init__.pysrc/synthorg/meta/toolsmith/protocol.pysrc/synthorg/meta/toolsmith/errors.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/overflow.pysrc/synthorg/meta/toolsmith/service.pysrc/synthorg/api/state.pysrc/synthorg/meta/strategies/code_modification.pysrc/synthorg/meta/mcp/server.pysrc/synthorg/api/app.pysrc/synthorg/meta/toolsmith/script_handler.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/config.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/meta/toolsmith/strategy.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/security/rules/risk_classifier.pysrc/synthorg/meta/signals/service.pysrc/synthorg/meta/toolsmith/config.pysrc/synthorg/security/timeout/risk_tier_classifier.pysrc/synthorg/observability/events/toolsmith.pysrc/synthorg/core/enums.pysrc/synthorg/security/autonomy/models.pysrc/synthorg/meta/guards/approval_gate.pysrc/synthorg/security/risk_scorer.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/security/action_types.pysrc/synthorg/meta/models.pysrc/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/meta/toolsmith/factory.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/observability/events/persistence.pysrc/synthorg/meta/guards/scope_check.pysrc/synthorg/meta/mcp/registry.pysrc/synthorg/meta/toolsmith/gap_store.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/__init__.pysrc/synthorg/meta/toolsmith/protocol.pysrc/synthorg/meta/toolsmith/errors.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/overflow.pysrc/synthorg/meta/toolsmith/service.pysrc/synthorg/api/state.pysrc/synthorg/meta/strategies/code_modification.pysrc/synthorg/meta/mcp/server.pysrc/synthorg/api/app.pysrc/synthorg/meta/toolsmith/script_handler.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/config.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/meta/toolsmith/strategy.py
{README.md,docs/**/*.md}
📄 CodeRabbit inference engine (CLAUDE.md)
Numeric claims in README and public docs sourced from
data/runtime_stats.yamlvia<!--RS:NAME-->markers. Seedata/README.md.
Files:
docs/design/self-improvement.mddocs/design/tools.md
docs/**/*.{md,d2,mmd}
📄 CodeRabbit inference engine (CLAUDE.md)
Use
d2for architecture / nested containers; usemermaidfor flowcharts / sequence / pipelines. Use Markdown tables for tabular data. D2 theme 200 (Dark Mauve), D2 CLI pinned to v0.7.1 in CI.
Files:
docs/design/self-improvement.mddocs/design/tools.md
web/src/**/*.{js,jsx,ts,tsx,mts}
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.{js,jsx,ts,tsx,mts}: Always usecreateLoggerfrom@/lib/logger; never bareconsole.warn/console.error/console.debugin application code. Variable name must always belog. Onlylogger.tsitself may use bare console methods. Uselog.debug()(DEV-only, stripped in production),log.warn(),log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go throughsanitizeArg
Attacker-controlled fields inside structured objects must be wrapped insanitizeForLog()before embedding in log calls
Error-code constants (MANDATORY): importErrorCodeandErrorCategoryfrom@/api/types/errors(re-exported from the generatedweb/src/api/types/error-codes.gen.ts). Discriminate onErrorCode.<NAME>, never on raw integer literals.
Use@eslint-react/web-api-no-leaked-fetchto detectfetch()in effects withoutAbortControllercleanup
Files:
web/src/api/types/openapi.gen.tsweb/src/api/types/enum-values.gen.ts
web/src/api/types/**/*.gen.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
Generated DTO types (MANDATORY): NEVER hand-edit
web/src/api/types/*.gen.ts. Regenerate withuv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').
Files:
web/src/api/types/openapi.gen.tsweb/src/api/types/enum-values.gen.ts
web/src/**/*.{ts,tsx,mts}
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.{ts,tsx,mts}: Use@typescript-eslint/no-floating-promisesto forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use@typescript-eslint/no-misused-promises(withchecksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19asyncevent handlers stay allowed via theattributes: falseexemption.
Files:
web/src/api/types/openapi.gen.tsweb/src/api/types/enum-values.gen.ts
web/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
web/src/**/*.{ts,tsx}: Reuseweb/src/components/ui/and design tokens only in Web Dashboard. Seeweb/CLAUDE.mdfor detail.
Web: seeweb/CLAUDE.mdfor framework/React 19 conventions.
Files:
web/src/api/types/openapi.gen.tsweb/src/api/types/enum-values.gen.ts
src/synthorg/meta/mcp/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/meta/mcp/**/*.py: MCP: 200+ tools across 15 domain modules undermeta/mcp/domains/. DefineToolHandler+args_model; callrequire_admin_guardrails()on admin tools; route through service layers. See mcp-handler-contract.md.
Conversational propose:POST /meta/chat/proposeis opt-in (meta.chief_of_staff.propose_enabled);ChiefOfStaffProposerbuilt bybuild_chief_of_staff_proposer(ENFORCED manifest entry, EPIC#1955) and 503s when ANY of provider, persistence, or work pipeline missing. Approval routing:ApprovalSource.CONVERSATIONAL_INTAKEthrough Flow 0 ofsignal_resume_intentbefore parked-context / review-gate flows. Wrap human content viawrap_untrusted(TAG_TASK_DATA, ...)(SEC-1).
Files:
src/synthorg/meta/mcp/invoker.pysrc/synthorg/meta/mcp/registry.pysrc/synthorg/meta/mcp/server.py
src/synthorg/persistence/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/persistence/**/*.py: Repository CRUD methods:save(entity),get(id),delete(id) -> bool,list_items(...),query(...)returning tuples.
Datetime in persistence: useparse_iso_utc/format_iso_utcfrompersistence._shared(reject naive); usenormalize_utcfor already-typed.
Files:
src/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.py
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
API startup lifecycle has two phases: construction (synchronous services in
create_appbody) and on_startup (services needing persistence in_build_lifecycle.on_startup). Construction ordering:agent_registrybeforeauto_wire_meetings;tunnel_provideralways wired. On-startup ordering:SettingsServiceauto-wire beforeWorkflowExecutionObserver;OntologyServiceafterpersistence.connect(); cost-dial services via_try_wire_cost_dialafter persistence via best-effort idempotent pattern.
Files:
src/synthorg/api/state.pysrc/synthorg/api/app.py
🧠 Learnings (5)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
tests/integration/toolsmith/__init__.pysrc/synthorg/security/rules/risk_classifier.pysrc/synthorg/meta/signals/service.pysrc/synthorg/meta/toolsmith/config.pysrc/synthorg/security/timeout/risk_tier_classifier.pytests/unit/security/test_action_types.pysrc/synthorg/observability/events/toolsmith.pysrc/synthorg/core/enums.pysrc/synthorg/security/autonomy/models.pytests/unit/meta/toolsmith/__init__.pysrc/synthorg/meta/guards/approval_gate.pysrc/synthorg/security/risk_scorer.pytests/unit/observability/test_events.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/security/action_types.pysrc/synthorg/meta/models.pytests/unit/meta/toolsmith/test_script_handler.pysrc/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/meta/toolsmith/factory.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/observability/events/persistence.pytests/unit/meta/toolsmith/test_applier.pysrc/synthorg/meta/guards/scope_check.pysrc/synthorg/meta/mcp/registry.pytests/unit/meta/toolsmith/test_invoker_layered.pytests/unit/persistence/test_factory.pytests/unit/meta/toolsmith/test_models.pytests/unit/core/test_enums.pytests/unit/meta/toolsmith/test_gap_store.pysrc/synthorg/meta/toolsmith/gap_store.pytests/unit/meta/toolsmith/test_config.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/__init__.pytests/unit/meta/toolsmith/test_overflow.pytests/unit/meta/toolsmith/test_dynamic_registry.pysrc/synthorg/meta/toolsmith/protocol.pysrc/synthorg/meta/toolsmith/errors.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/overflow.pysrc/synthorg/meta/toolsmith/service.pysrc/synthorg/api/state.pytests/unit/meta/toolsmith/test_validation_gate.pysrc/synthorg/meta/strategies/code_modification.pysrc/synthorg/meta/mcp/server.pysrc/synthorg/api/app.pysrc/synthorg/meta/toolsmith/script_handler.pytests/integration/toolsmith/test_self_extension_e2e.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/config.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/meta/toolsmith/strategy.pytests/conformance/persistence/test_dynamic_tool_repository.pytests/unit/meta/toolsmith/test_strategy.py
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.
Applied to files:
docs/design/self-improvement.mddocs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).
Applied to files:
docs/design/self-improvement.mddocs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.
Applied to files:
docs/design/self-improvement.mddocs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.
Applied to files:
docs/design/self-improvement.mddocs/design/tools.md
🪛 OpenGrep (1.21.0)
src/synthorg/persistence/postgres/tool_blueprint_repo.py
[ERROR] 332-335: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.
(coderabbit.sql-injection.python-fstring-execute)
🔇 Additional comments (43)
tests/conformance/persistence/test_dynamic_tool_repository.py (1)
1-318: LGTM!tests/integration/toolsmith/__init__.py (1)
1-1: LGTM!tests/integration/toolsmith/test_self_extension_e2e.py (1)
1-283: LGTM!tests/unit/core/test_enums.py (1)
120-127: LGTM!tests/unit/meta/toolsmith/__init__.py (1)
1-1: LGTM!tests/unit/meta/toolsmith/test_applier.py (1)
1-269: LGTM!tests/unit/meta/toolsmith/test_config.py (1)
1-57: LGTM!tests/unit/meta/toolsmith/test_gap_store.py (1)
1-102: LGTM!tests/unit/meta/toolsmith/test_invoker_layered.py (1)
1-104: LGTM!tests/unit/meta/toolsmith/test_models.py (1)
1-208: LGTM!tests/unit/meta/toolsmith/test_overflow.py (1)
1-132: LGTM!tests/unit/meta/toolsmith/test_script_handler.py (1)
1-116: LGTM!tests/unit/meta/toolsmith/test_strategy.py (1)
1-148: LGTM!tests/unit/meta/toolsmith/test_validation_gate.py (1)
1-173: LGTM!tests/unit/observability/test_events.py (1)
348-350: LGTM!tests/unit/persistence/test_factory.py (1)
117-117: LGTM!Also applies to: 121-127, 131-131, 136-138
tests/unit/security/test_action_types.py (1)
14-14: LGTM!Also applies to: 29-29
src/synthorg/persistence/postgres/tool_blueprint_repo.py (2)
327-346: Static analysis false positive - parameterized query is safe.The OpenGrep warning about SQL injection on lines 332-335 is a false positive. The
whereclause is constructed from validatedToolBlueprintFilterSpecfields (enum values andNotBlankStr), and all dynamic values are passed through theparamslist using%splaceholders.
1-422: LGTM!src/synthorg/persistence/tool_blueprint_protocol.py (1)
1-153: LGTM!src/synthorg/persistence/sqlite/tool_blueprint_repo.py (1)
1-433: LGTM!src/synthorg/meta/models.py (1)
24-24: LGTM!Also applies to: 36-36, 338-338, 376-422, 433-433
src/synthorg/meta/signals/service.py (1)
313-314: LGTM!src/synthorg/meta/guards/approval_gate.py (1)
41-41: LGTM!src/synthorg/core/enums.py (1)
580-580: LGTM!src/synthorg/meta/guards/scope_check.py (1)
94-95: Provide the original <review_comment> text to rewrite. No review comment (or diff snippet) was included in this thread, so I can’t rewrite or verify it—paste the full <review_comment> content you want updated.src/synthorg/security/action_types.py (1)
35-35: LGTM!src/synthorg/security/autonomy/models.py (1)
91-92: LGTM!Also applies to: 118-119
src/synthorg/security/risk_scorer.py (1)
206-208: LGTM!src/synthorg/security/rules/risk_classifier.py (1)
26-26: LGTM!src/synthorg/security/timeout/risk_tier_classifier.py (1)
47-47: LGTM!src/synthorg/meta/strategies/code_modification.py (1)
43-43: LGTM!src/synthorg/observability/events/persistence.py (1)
697-724: LGTM!src/synthorg/observability/events/toolsmith.py (1)
1-52: LGTM!docs/design/self-improvement.md (1)
60-73: LGTM!Also applies to: 165-165, 249-249
docs/design/tools.md (1)
433-478: LGTM!Also applies to: 524-527, 533-533, 546-546
scripts/_ghost_wiring_manifest.txt (1)
77-85: LGTM!scripts/schema_drift_baseline.txt (1)
20-20: LGTM!Also applies to: 67-72
web/src/api/types/enum-values.gen.ts (1)
523-530: LGTM!web/src/api/types/openapi.gen.ts (1)
10539-10539: LGTM!src/synthorg/meta/toolsmith/overflow.py (1)
45-72: LGTM!Also applies to: 75-117
src/synthorg/meta/toolsmith/factory.py (1)
48-59: LGTM!Also applies to: 61-147, 149-186
src/synthorg/api/state.py (1)
177-177: LGTM!Also applies to: 335-335, 627-627, 1265-1273
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2035 +/- ##
==========================================
- Coverage 84.99% 84.97% -0.03%
==========================================
Files 2107 2125 +18
Lines 123632 124801 +1169
Branches 10335 10433 +98
==========================================
+ Hits 105084 106052 +968
- Misses 15960 16128 +168
- Partials 2588 2621 +33 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/meta/toolsmith/applier.py (1)
161-167:⚠️ Potential issue | 🟠 Major | ⚡ Quick winNormalize lifecycle fields before persisting the candidate blueprint.
At Line 161, only
stateis overwritten. At Line 166 and Lines 175-182, records are rebuilt from the original inboundblueprint, so caller-supplied lifecycle fields (validated_at/activated_at/retired_at/validation) can leak into durable state. This weakens the trust boundary and can persist misleading lifecycle metadata even on gate rejection.Proposed fix
- pending = blueprint.model_copy(update={"state": ToolBlueprintState.PENDING}) - await self._repo.save(pending) - - result = await self._gate.validate(blueprint) + pending = blueprint.model_copy( + update={ + "state": ToolBlueprintState.PENDING, + "validated_at": None, + "activated_at": None, + "retired_at": None, + "validation": None, + } + ) + await self._repo.save(pending) + + result = await self._gate.validate(pending) if not result.passed: - await self._repo.save(blueprint.model_copy(update={"validation": result})) + await self._repo.save(pending.model_copy(update={"validation": result})) logger.info( TOOLSMITH_APPLY_REJECTED, tool_name=blueprint.name, detail=result.detail, ) return False @@ - active = blueprint.model_copy( + active = pending.model_copy( update={ "state": ToolBlueprintState.ACTIVE, "validated_at": now, "activated_at": now, "validation": result, } )Also applies to: 175-182
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/meta/toolsmith/applier.py` around lines 161 - 167, The persisted candidate blueprints are rebuilt from the inbound blueprint without normalizing lifecycle fields, allowing caller-supplied validated_at/activated_at/retired_at/validation to be stored; before calling self._repo.save (e.g., where pending = blueprint.model_copy(update={"state": ToolBlueprintState.PENDING}) and later saves after self._gate.validate in the same function), explicitly clear or set lifecycle fields (validated_at, activated_at, retired_at, validation) to safe defaults (e.g., None or an empty validation object) on any model_copy used for persistence so that gate rejection and subsequent save paths cannot persist caller-provided lifecycle metadata; apply the same normalization to the other rebuild/save path that reconstructs the record after validation (the save calls following self._gate.validate and the later save block).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@src/synthorg/persistence/postgres/revisions/20260522000001_dynamic_tools.sql`:
- Around line 35-54: The CHECK constraint for the table permits non-pending
states with NULL validation; update the constraint so that for each non-pending
branch (state = 'validated', 'active', 'retired') you also require validation IS
NOT NULL in addition to the existing timestamp requirements (validated_at,
activated_at, retired_at), and apply the same change to the corresponding CHECK
in src/synthorg/persistence/postgres/schema.sql so the invariant (validation
required for validated and later states) is enforced in both the migration and
schema definition; locate the constraint referencing columns validation, state,
validated_at, activated_at, retired_at and add validation IS NOT NULL to the
validated/active/retired OR-clauses.
In `@src/synthorg/persistence/sqlite/revisions/20260522000001_dynamic_tools.sql`:
- Around line 47-69: The CHECK constraint currently allows
state='validated'|'active'|'retired' with validation IS NULL; update the
constraint so that each non-pending branch requires validation IS NOT NULL (in
addition to the existing timestamp requirements) — i.e., for the 'validated',
'active', and 'retired' clauses add "AND validation IS NOT NULL". Apply the same
change to the equivalent schema/check in the other SQLite schema file so both
migrations/schemas enforce persisted validation evidence.
---
Outside diff comments:
In `@src/synthorg/meta/toolsmith/applier.py`:
- Around line 161-167: The persisted candidate blueprints are rebuilt from the
inbound blueprint without normalizing lifecycle fields, allowing caller-supplied
validated_at/activated_at/retired_at/validation to be stored; before calling
self._repo.save (e.g., where pending = blueprint.model_copy(update={"state":
ToolBlueprintState.PENDING}) and later saves after self._gate.validate in the
same function), explicitly clear or set lifecycle fields (validated_at,
activated_at, retired_at, validation) to safe defaults (e.g., None or an empty
validation object) on any model_copy used for persistence so that gate rejection
and subsequent save paths cannot persist caller-provided lifecycle metadata;
apply the same normalization to the other rebuild/save path that reconstructs
the record after validation (the save calls following self._gate.validate and
the later save block).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: bf294c51-2204-432c-9b6c-dfd493267352
📒 Files selected for processing (17)
src/synthorg/api/app.pysrc/synthorg/meta/strategies/code_modification.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/meta/toolsmith/factory.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/toolsmith/script_handler.pysrc/synthorg/meta/toolsmith/service.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/persistence/postgres/revisions/20260522000001_dynamic_tools.sqlsrc/synthorg/persistence/postgres/schema.sqlsrc/synthorg/persistence/sqlite/revisions/20260522000001_dynamic_tools.sqlsrc/synthorg/persistence/sqlite/schema.sqltests/conformance/persistence/test_dynamic_tool_repository.pytests/unit/meta/toolsmith/test_applier.pytests/unit/meta/toolsmith/test_dynamic_registry.pytests/unit/meta/toolsmith/test_models.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Dashboard Test
- GitHub Check: Test E2E
- GitHub Check: Test Conformance (SQLite)
- GitHub Check: Test Unit
- GitHub Check: Test Integration
- GitHub Check: Lighthouse Site
- GitHub Check: Lighthouse Dashboard
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Build Preview
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Onlysrc/synthorg/persistence/may import sqlite/psycopg or emit raw SQL. New repository protocols inherit from generic categories inpersistence/_generics.py; bespoke methods permitted only under ADR-0001 D7. Protocols stay@runtime_checkable.
Configuration Precedence: DB > env > code default viaSettingsService/ConfigResolver(Cat-1) or env > code default (Cat-2,read_only_post_init); Cat-3 bootstrap secrets are pure env. YAML is company-template ingestion format only. Noos.environ.getoutside startup; pre-init Cat-2 reads usesettings.bootstrap_resolver.resolve_init_value.
No hardcoded numeric values. Allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced byscripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations / issue back-refs / migration framing. Enforced bycheck_no_review_origin_in_code.py+check_no_migration_framing.py.
Do not usefrom __future__ import annotations(Python 3.14 has PEP 649). Use PEP 758 except:except A, B:requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Error classes:<Domain><Condition>Errorpattern inheriting fromDomainError; never inheritException/RuntimeError/etc directly. Enforced bycheck_domain_error_hierarchy.py.
Pydantic v2: frozen +extra="forbid"on every frozen model project-wide. Gate:check_frozen_model_extra_forbid.py.@computed_fieldauto-exempt; per-line# lint-allow: frozen-extra-forbid -- <reason>forextra="allow"/"ignore"boundaries. Use@computed_fieldfor derived data.
UseNotBlankStrfor identifiers in Pydantic models.
Args models at every system boundary;parse_typed()for every external dict ingestion. Enforced bycheck_boundary_typed.py.
Immutability: usemodel_copy(update=...)or `c...
Files:
src/synthorg/meta/strategies/code_modification.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/api/app.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/factory.pysrc/synthorg/meta/toolsmith/service.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/toolsmith/script_handler.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/meta/strategies/code_modification.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/api/app.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/factory.pysrc/synthorg/meta/toolsmith/service.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/toolsmith/script_handler.py
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
API startup lifecycle has two phases: construction (synchronous services in
create_appbody) and on_startup (services needing persistence in_build_lifecycle.on_startup). Construction ordering:agent_registrybeforeauto_wire_meetings;tunnel_provideralways wired. On-startup ordering:SettingsServiceauto-wire beforeWorkflowExecutionObserver;OntologyServiceafterpersistence.connect(); cost-dial services via_try_wire_cost_dialafter persistence via best-effort idempotent pattern.
Files:
src/synthorg/api/app.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers:@pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min.
Windows tests: unit tests useWindowsSelectorEventLoopPolicy(3.14 IOCP teardown race). Subprocess tests override back.
Test doubles ladder: FakeClock for Clock seam,mock_of[T](**overrides)for typed-boundary substitutions, SimpleNamespace for attribute-bags. BareMagicMockat typed boundary (constructor / fn arg / annotated local / typed fixture return) is blocked byscripts/check_mock_spec.py(zero-tolerance, no baseline).
FakeClock andmock_ofimport fromtests._shared; inject viaclock=and helper's spec subscript.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...)). Dual-backend conformance: tests undertests/conformance/persistence/consumebackendfixture (SQLite + Postgres). Enforced bycheck_dual_backend_test_parity.py.
Flaky tests: NEVER skip/xfail; fix fundamentally. Useasyncio.Event().wait()notsleep(large).
Files:
tests/unit/meta/toolsmith/test_dynamic_registry.pytests/unit/meta/toolsmith/test_models.pytests/conformance/persistence/test_dynamic_tool_repository.pytests/unit/meta/toolsmith/test_applier.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/meta/toolsmith/test_dynamic_registry.pytests/unit/meta/toolsmith/test_models.pytests/conformance/persistence/test_dynamic_tool_repository.pytests/unit/meta/toolsmith/test_applier.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
src/synthorg/meta/strategies/code_modification.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/api/app.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/factory.pytests/unit/meta/toolsmith/test_dynamic_registry.pysrc/synthorg/meta/toolsmith/service.pysrc/synthorg/meta/toolsmith/models.pytests/unit/meta/toolsmith/test_models.pytests/conformance/persistence/test_dynamic_tool_repository.pysrc/synthorg/meta/toolsmith/script_handler.pytests/unit/meta/toolsmith/test_applier.py
🔇 Additional comments (12)
src/synthorg/meta/strategies/code_modification.py (1)
179-189: LGTM!src/synthorg/meta/toolsmith/validation_gate.py (1)
17-17: LGTM!Also applies to: 73-75, 122-131
src/synthorg/meta/toolsmith/dynamic_registry.py (1)
119-136: LGTM!Also applies to: 233-243
src/synthorg/api/app.py (1)
320-327: LGTM!Also applies to: 335-360, 1521-1559
src/synthorg/meta/toolsmith/factory.py (1)
144-145: LGTM!tests/unit/meta/toolsmith/test_dynamic_registry.py (1)
229-280: LGTM!src/synthorg/meta/toolsmith/service.py (1)
63-68: LGTM!Also applies to: 100-101, 110-111, 127-134, 143-147, 163-170, 248-249
src/synthorg/meta/toolsmith/models.py (1)
195-202: LGTM!Also applies to: 212-219, 243-255, 261-266
tests/unit/meta/toolsmith/test_models.py (1)
209-244: LGTM!tests/conformance/persistence/test_dynamic_tool_repository.py (1)
77-84: LGTM!src/synthorg/meta/toolsmith/script_handler.py (1)
72-73: LGTM!tests/unit/meta/toolsmith/test_applier.py (1)
251-254: LGTM!Also applies to: 260-290
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/conformance/persistence/test_dynamic_tool_repository.py (1)
253-285: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winAdd the failing transition parity test too.
This covers the happy path, but it never asserts that an existing
PENDING -> VALIDATEDtransition withoutvalidationfails on both backends. That is the new lifecycle invariant in this PR, so the conformance suite should lock it down explicitly.Suggested test
+ async def test_transition_if_requires_validation_for_validated_state( + self, backend: PersistenceBackend + ) -> None: + from synthorg.core.persistence_errors import QueryError + + repo = _repo(backend) + bp = _blueprint(blueprint_id="bp-no-val", name="synthorg_textkit_no_val") + await repo.save(bp) + + with pytest.raises(QueryError): + await repo.transition_if( + bp.id, + from_state=ToolBlueprintState.PENDING, + to_state=ToolBlueprintState.VALIDATED, + validated_at=_NOW, + ) +``` </details> As per coding guidelines, "Dual-backend conformance: tests under `tests/conformance/persistence/` consume `backend` fixture (SQLite + Postgres)." Also applies to: 305-315 <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In
@tests/conformance/persistence/test_dynamic_tool_repository.pyaround lines
253 - 285, Add a parity test to assert the new lifecycle invariant that a
PENDING->VALIDATED transition without a ToolValidationResult must fail: after
saving bp via _repo and _blueprint and performing the happy-path transition
(using repo.transition_if with validated_at and validation), call
repo.transition_if again on the same bp.id attempting to move from
ToolBlueprintState.PENDING to ToolBlueprintState.VALIDATED but omit the
validation (validation=None or not provided) and assert it returns False, that
fetched = await repo.get(bp.id) remains non-null, state is unchanged, and
validated_at/validation were not modified; apply the same assertion pattern for
the related test around the other block mentioned (lines 305-315) so both
backends enforce this invariant.</details> </blockquote></details> </blockquote></details>🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. Inline comments: In `@src/synthorg/persistence/tool_blueprint_protocol.py`: - Around line 35-40: The docstring for DynamicToolRepository.transition_if is incorrect about validation=None; it currently says None leaves the column untouched but implementations treat None as setting the validation column to NULL. Update the protocol text to state that the ``validation`` kwarg accepts a ToolValidationResult or None and that passing None will explicitly SET validation = NULL (clearing existing evidence), and document the required alternative (e.g., an explicit sentinel or omission) callers must use to leave the column unchanged; reference DynamicToolRepository.transition_if and ToolValidationResult so readers can locate the behavior. --- Outside diff comments: In `@tests/conformance/persistence/test_dynamic_tool_repository.py`: - Around line 253-285: Add a parity test to assert the new lifecycle invariant that a PENDING->VALIDATED transition without a ToolValidationResult must fail: after saving bp via _repo and _blueprint and performing the happy-path transition (using repo.transition_if with validated_at and validation), call repo.transition_if again on the same bp.id attempting to move from ToolBlueprintState.PENDING to ToolBlueprintState.VALIDATED but omit the validation (validation=None or not provided) and assert it returns False, that fetched = await repo.get(bp.id) remains non-null, state is unchanged, and validated_at/validation were not modified; apply the same assertion pattern for the related test around the other block mentioned (lines 305-315) so both backends enforce this invariant.🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID:
44d5c745-efd9-4a3c-9d9f-4d0ff664ab3f📒 Files selected for processing (4)
src/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/synthorg/persistence/tool_blueprint_protocol.pytests/conformance/persistence/test_dynamic_tool_repository.py📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Build Backend
- GitHub Check: Lighthouse Site
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Build Web Assets (melange)
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Test Conformance (SQLite)
- GitHub Check: Test E2E
- GitHub Check: Test Unit
- GitHub Check: Dashboard Test
- GitHub Check: Test Integration
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (4)
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Onlysrc/synthorg/persistence/may import sqlite/psycopg or emit raw SQL. New repository protocols inherit from generic categories inpersistence/_generics.py; bespoke methods permitted only under ADR-0001 D7. Protocols stay@runtime_checkable.
Configuration Precedence: DB > env > code default viaSettingsService/ConfigResolver(Cat-1) or env > code default (Cat-2,read_only_post_init); Cat-3 bootstrap secrets are pure env. YAML is company-template ingestion format only. Noos.environ.getoutside startup; pre-init Cat-2 reads usesettings.bootstrap_resolver.resolve_init_value.
No hardcoded numeric values. Allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced byscripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations / issue back-refs / migration framing. Enforced bycheck_no_review_origin_in_code.py+check_no_migration_framing.py.
Do not usefrom __future__ import annotations(Python 3.14 has PEP 649). Use PEP 758 except:except A, B:requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Error classes:<Domain><Condition>Errorpattern inheriting fromDomainError; never inheritException/RuntimeError/etc directly. Enforced bycheck_domain_error_hierarchy.py.
Pydantic v2: frozen +extra="forbid"on every frozen model project-wide. Gate:check_frozen_model_extra_forbid.py.@computed_fieldauto-exempt; per-line# lint-allow: frozen-extra-forbid -- <reason>forextra="allow"/"ignore"boundaries. Use@computed_fieldfor derived data.
UseNotBlankStrfor identifiers in Pydantic models.
Args models at every system boundary;parse_typed()for every external dict ingestion. Enforced bycheck_boundary_typed.py.
Immutability: usemodel_copy(update=...)or `c...Files:
src/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/synthorg/persistence/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/persistence/**/*.py: Repository CRUD methods:save(entity),get(id),delete(id) -> bool,list_items(...),query(...)returning tuples.
Datetime in persistence: useparse_iso_utc/format_iso_utcfrompersistence._shared(reject naive); usenormalize_utcfor already-typed.Files:
src/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pytests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers:@pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min.
Windows tests: unit tests useWindowsSelectorEventLoopPolicy(3.14 IOCP teardown race). Subprocess tests override back.
Test doubles ladder: FakeClock for Clock seam,mock_of[T](**overrides)for typed-boundary substitutions, SimpleNamespace for attribute-bags. BareMagicMockat typed boundary (constructor / fn arg / annotated local / typed fixture return) is blocked byscripts/check_mock_spec.py(zero-tolerance, no baseline).
FakeClock andmock_ofimport fromtests._shared; inject viaclock=and helper's spec subscript.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...)). Dual-backend conformance: tests undertests/conformance/persistence/consumebackendfixture (SQLite + Postgres). Enforced bycheck_dual_backend_test_parity.py.
Flaky tests: NEVER skip/xfail; fix fundamentally. Useasyncio.Event().wait()notsleep(large).Files:
tests/conformance/persistence/test_dynamic_tool_repository.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.Files:
tests/conformance/persistence/test_dynamic_tool_repository.py🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo Repo: Aureliolo/synthorg PR: 1760 File: scripts/_dual_backend_parity_lib.py:215-216 Timestamp: 2026-05-05T09:04:46.195Z Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.Applied to files:
src/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pytests/conformance/persistence/test_dynamic_tool_repository.py🔇 Additional comments (2)
src/synthorg/persistence/postgres/tool_blueprint_repo.py (1)
55-62: LGTM!Also applies to: 363-385, 435-450
src/synthorg/persistence/sqlite/tool_blueprint_repo.py (1)
49-56: LGTM!Also applies to: 359-380, 444-459
…mic registry, authoring, validation, governance
… layered MCP dispatch, sim e2e
6728ed8 to
379c60e
Compare
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/synthorg/meta/config.py (1)
243-277:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate the docstring to include the new fields.
The
SelfImprovementConfigdocstring lists all configuration attributes, but the newly addedtool_creation_enabled(line 288) andtoolsmith(line 310) fields are missing from the Attributes section.📝 Proposed fix
Attributes: enabled: Master switch for the self-improvement system. chief_of_staff_enabled: Whether to enable the Chief of Staff agent persona. config_tuning_enabled: Enable config tuning proposals. architecture_proposals_enabled: Enable architecture proposals. prompt_tuning_enabled: Enable prompt tuning proposals. code_modification_enabled: Enable code modification proposals. + tool_creation_enabled: Enable self-extending toolkit proposals. schedule: Cycle scheduling configuration. rollout: Rollout behavior configuration. regression: Regression detection thresholds. guards: Guard chain configuration. rules: Rule engine configuration. prompt_tuning: Prompt tuning strategy configuration. code_modification: Code modification strategy configuration. chief_of_staff: Chief of Staff advanced capabilities (learning, alerts, chat). cross_deployment_analytics: Cross-deployment analytics telemetry (opt-in, disabled by default). + toolsmith: Self-extending toolkit configuration + (gap thresholds, sandbox policy, validation). analysis_model: LLM model identifier for proposal analysis. analysis_temperature: Sampling temperature for analysis. analysis_max_tokens: Token budget for analysis responses.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/meta/config.py` around lines 243 - 277, The class SelfImprovementConfig docstring is missing the new attributes; update the Attributes section of the SelfImprovementConfig docstring to add entries for tool_creation_enabled (bool master switch to enable proposal-driven tool creation) and toolsmith (configuration object for toolsmith behavior and limits), making sure their brief descriptions follow the same style as the other attribute lines and appear alongside the other config fields like chief_of_staff and cross_deployment_analytics.docs/design/tools.md (1)
524-551:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winTaxonomy count and list are inconsistent with the ActionType enum.
The documentation states "~39 built-in action types" (line 524) and "Taxonomy (39 leaf types)" (line 534), but the
ActionTypeenum insrc/synthorg/core/enums.pynow includes 41 members:
- The 39 types listed here
KNOWLEDGE_INGEST = "knowledge:ingest"(line 628)KNOWLEDGE_REINDEX = "knowledge:reindex"(line 629)The taxonomy list is missing the knowledge action types.
📝 Proposed fix
-**Taxonomy (39 leaf types):** +**Taxonomy (41 leaf types):** ```text code:read, code:write, code:create, code:delete, code:refactor test:write, test:run docs:write vcs:read, vcs:commit, vcs:push, vcs:branch deploy:staging, deploy:production comms:internal, comms:external budget:spend, budget:exceed org:hire, org:fire, org:promote db:query, db:mutate, db:admin arch:decide tool:create memory:read +knowledge:ingest, knowledge:reindex browser:navigate, browser:screenshot, browser:diff, browser:accessibility_scan, browser:spec external_data:request desktop:launch, desktop:click, desktop:type, desktop:key, desktop:screenshot, desktop:scrollAlso update line 524: ```diff -**Registry:** `StrEnum` for ~39 built-in action types (type safety, autocomplete, typos caught +**Registry:** `StrEnum` for ~41 built-in action types (type safety, autocomplete, typos caught🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/design/tools.md` around lines 524 - 551, Update the taxonomy list in the docs to match the ActionType enum by adding the two missing knowledge action types: KNOWLEDGE_INGEST ("knowledge:ingest") and KNOWLEDGE_REINDEX ("knowledge:reindex"); edit the taxonomy block where the 39 leaf types are listed and add "knowledge:ingest, knowledge:reindex" into the comma-separated list and update the header/count mention (e.g., "~39" / "39 leaf types") to reflect the new total so the documentation matches the ActionType enum symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/synthorg/meta/toolsmith/applier.py`:
- Around line 197-208: The rollback currently calls
self._registry.unregister(...) without handling failures, risking divergence
between runtime and durable state; update both rollback paths (the one after
await self._repo.save(active) and the analogous RETIRE path around lines
handling unregister) to catch exceptions from _registry.unregister and
compensate by restoring the persisted state before re-raising: if unregister
raises after a successful repo.save(active), attempt to re-save the ACTIVE row
(await self._repo.save(active)) or, if that fails, log and raise the original
exception; likewise, in the RETIRE flow where you unregister then change DB
state, if unregister fails after the DB change, attempt to re-register the
handler (await self._registry.register(<handler or blueprint>)) or revert the DB
change (e.g., await self._repo.save(previous_state)) and only then re-raise;
wrap calls to _registry.unregister and _repo.save in try/except to perform these
compensating actions and ensure the durable and runtime states remain consistent
when exceptions occur.
In `@src/synthorg/meta/toolsmith/models.py`:
- Line 114: Replace the hardcoded upper bound `100` with a typed module-level
constant and use that constant in the Field declaration: add an annotated
constant (e.g. BRIEF_SCORE_MAX: int = 100) at the top of the module and change
the brief_score Field call to use le=BRIEF_SCORE_MAX; update any imports or type
hints if needed so the constant is recognized by linters and the numeric-literal
policy.
- Around line 256-269: The RETIRED state validation currently allows missing
activation/validation timestamps; update the ToolBlueprint validation logic in
models.py so ToolBlueprintState.RETIRED requires both activated_at and
validated_at to be non-None (and still require validation result), e.g. add
explicit checks for self.state is ToolBlueprintState.RETIRED that raise
ValueError if self.activated_at or self.validated_at is None (or alternatively
include RETIRED in the set used to require validated_at earlier), referencing
the self.state, ToolBlueprintState.RETIRED, activated_at, and validated_at
symbols to locate and modify the validation block.
In `@src/synthorg/meta/toolsmith/strategy.py`:
- Around line 201-254: Replace the manual JSON/untyped checks in _parse (and
remove/stop using _parse_json_object, _require_str, _require_schema) with a
typed boundary: define a small frozen args model (e.g., AuthoredBlueprintArgs)
that declares description, script_body, action_type, parameters_schema, etc.,
then call parse_typed(response, AuthoredBlueprintArgs) inside _parse to get a
validated instance and build the ToolBlueprint from that instance's fields
before returning; ensure you still wrap parsing errors in ToolAuthoringError and
keep the existing fields set (id, name, sandbox_backend, requires_network,
state, created_at) when constructing ToolBlueprint.
- Around line 123-137: The except-block currently re-raises ProviderError which
lets provider failures escape ToolsmithService._handle_gap; change the handling
so ProviderError is treated as a per-gap authoring failure (like
ToolAuthoringError/ToolCapabilityNotAllowedError) by catching ProviderError and
raising a ToolAuthoringError (or wrapping it) instead, while preserving logging
via TOOLSMITH_AUTHOR_FAILED and safe_error_description for capability and error
details so _handle_gap will swallow/skip this gap rather than cancelling the
TaskGroup.
In `@src/synthorg/meta/toolsmith/validation_gate.py`:
- Around line 165-181: The golden-stage reporting currently computes and prints
baseline/candidate/margin (baseline, candidate, margin) even when brief_passed
is false and score() was skipped, making skipped runs indistinguishable from
real 0-0 results; update the logic in validation_gate.py around
validation_cfg.require_golden_delta and brief_passed so that when
require_golden_delta is true but brief_passed is false you do NOT call
self._scorecard_provider.score(blueprint) and instead set passed = False and set
detail to explicitly indicate the golden stage was skipped (e.g., "brief=fail;
golden skipped") so baseline/candidate/margin are not reported; keep the
existing ToolValidationConfigError branch (when self._scorecard_provider is
None) unchanged.
In `@src/synthorg/persistence/postgres/tool_blueprint_repo.py`:
- Around line 147-167: The upsert path in _upsert_params is persisting
ToolBlueprint datetime fields (created_at, validated_at, activated_at,
retired_at) as-is which can lead to inconsistent round-trips across backends;
update _upsert_params to normalize these typed datetimes using the
persistence._shared helpers (use normalize_utc for already-typed datetimes, and
parse_iso_utc/format_iso_utc where appropriate) so that created_at,
validated_at, activated_at, and retired_at are converted to normalized UTC ISO
values before being placed into the returned tuple; reference the ToolBlueprint
fields and the helper functions parse_iso_utc, format_iso_utc, and normalize_utc
to implement the normalization consistently with transition_if() and the SQLite
path.
In `@tests/integration/toolsmith/test_self_extension_e2e.py`:
- Around line 25-30: The test spawns a sandbox subprocess and currently passes a
minimal env dict which replaces the entire process environment; change the spawn
call in tests/integration/toolsmith/test_self_extension_e2e.py (the subprocess
invocation around the sandbox start, where an env parameter is passed) to
preserve the base environment by starting from os.environ.copy() and then
updating it with only the required key/value pairs instead of replacing it
wholesale (e.g., use base_env = os.environ.copy(); base_env.update({...}) and
pass base_env as env). This keeps platform-specific vars (Windows PATH, etc.)
intact while still injecting the needed test variables.
---
Outside diff comments:
In `@docs/design/tools.md`:
- Around line 524-551: Update the taxonomy list in the docs to match the
ActionType enum by adding the two missing knowledge action types:
KNOWLEDGE_INGEST ("knowledge:ingest") and KNOWLEDGE_REINDEX
("knowledge:reindex"); edit the taxonomy block where the 39 leaf types are
listed and add "knowledge:ingest, knowledge:reindex" into the comma-separated
list and update the header/count mention (e.g., "~39" / "39 leaf types") to
reflect the new total so the documentation matches the ActionType enum symbols.
In `@src/synthorg/meta/config.py`:
- Around line 243-277: The class SelfImprovementConfig docstring is missing the
new attributes; update the Attributes section of the SelfImprovementConfig
docstring to add entries for tool_creation_enabled (bool master switch to enable
proposal-driven tool creation) and toolsmith (configuration object for toolsmith
behavior and limits), making sure their brief descriptions follow the same style
as the other attribute lines and appear alongside the other config fields like
chief_of_staff and cross_deployment_analytics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: 68f3c01e-0409-4047-a559-7d0d7cccc9a8
📒 Files selected for processing (63)
docs/design/self-improvement.mddocs/design/tools.mdscripts/_ghost_wiring_manifest.txtsrc/synthorg/api/app.pysrc/synthorg/api/state.pysrc/synthorg/core/enums.pysrc/synthorg/meta/config.pysrc/synthorg/meta/guards/approval_gate.pysrc/synthorg/meta/guards/scope_check.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/meta/mcp/registry.pysrc/synthorg/meta/mcp/server.pysrc/synthorg/meta/models.pysrc/synthorg/meta/signals/service.pysrc/synthorg/meta/strategies/code_modification.pysrc/synthorg/meta/toolsmith/__init__.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/config.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/meta/toolsmith/errors.pysrc/synthorg/meta/toolsmith/factory.pysrc/synthorg/meta/toolsmith/gap_store.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/toolsmith/overflow.pysrc/synthorg/meta/toolsmith/protocol.pysrc/synthorg/meta/toolsmith/script_handler.pysrc/synthorg/meta/toolsmith/service.pysrc/synthorg/meta/toolsmith/strategy.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/observability/events/persistence.pysrc/synthorg/observability/events/toolsmith.pysrc/synthorg/persistence/postgres/revisions/20260522000001_dynamic_tools.sqlsrc/synthorg/persistence/postgres/schema.sqlsrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/persistence/sqlite/revisions/20260522000001_dynamic_tools.sqlsrc/synthorg/persistence/sqlite/schema.sqlsrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/security/action_types.pysrc/synthorg/security/autonomy/models.pysrc/synthorg/security/risk_scorer.pysrc/synthorg/security/rules/risk_classifier.pysrc/synthorg/security/timeout/risk_tier_classifier.pytests/conformance/persistence/test_dynamic_tool_repository.pytests/integration/toolsmith/__init__.pytests/integration/toolsmith/test_self_extension_e2e.pytests/unit/core/test_enums.pytests/unit/meta/toolsmith/__init__.pytests/unit/meta/toolsmith/test_applier.pytests/unit/meta/toolsmith/test_config.pytests/unit/meta/toolsmith/test_dynamic_registry.pytests/unit/meta/toolsmith/test_gap_store.pytests/unit/meta/toolsmith/test_invoker_layered.pytests/unit/meta/toolsmith/test_models.pytests/unit/meta/toolsmith/test_overflow.pytests/unit/meta/toolsmith/test_script_handler.pytests/unit/meta/toolsmith/test_strategy.pytests/unit/meta/toolsmith/test_validation_gate.pytests/unit/observability/test_events.pytests/unit/persistence/test_factory.pytests/unit/security/test_action_types.pyweb/src/api/types/enum-values.gen.tsweb/src/api/types/openapi.gen.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Lighthouse Site
- GitHub Check: Test Unit
- GitHub Check: Test Conformance (SQLite)
- GitHub Check: Test Integration
- GitHub Check: Dashboard Test
- GitHub Check: Test E2E
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Build Preview
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (15)
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers:@pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back
Test doubles: FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary forbidden (zero-tolerance, no baseline) per check_mock_spec.py
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large)
Files:
tests/unit/meta/toolsmith/__init__.pytests/unit/core/test_enums.pytests/unit/meta/toolsmith/test_script_handler.pytests/unit/observability/test_events.pytests/unit/security/test_action_types.pytests/integration/toolsmith/__init__.pytests/unit/meta/toolsmith/test_gap_store.pytests/unit/meta/toolsmith/test_config.pytests/unit/persistence/test_factory.pytests/unit/meta/toolsmith/test_overflow.pytests/unit/meta/toolsmith/test_strategy.pytests/conformance/persistence/test_dynamic_tool_repository.pytests/unit/meta/toolsmith/test_validation_gate.pytests/unit/meta/toolsmith/test_invoker_layered.pytests/unit/meta/toolsmith/test_applier.pytests/unit/meta/toolsmith/test_models.pytests/unit/meta/toolsmith/test_dynamic_registry.pytests/integration/toolsmith/test_self_extension_e2e.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/meta/toolsmith/__init__.pytests/unit/core/test_enums.pytests/unit/meta/toolsmith/test_script_handler.pytests/unit/observability/test_events.pytests/unit/security/test_action_types.pytests/integration/toolsmith/__init__.pytests/unit/meta/toolsmith/test_gap_store.pytests/unit/meta/toolsmith/test_config.pytests/unit/persistence/test_factory.pytests/unit/meta/toolsmith/test_overflow.pytests/unit/meta/toolsmith/test_strategy.pytests/conformance/persistence/test_dynamic_tool_repository.pytests/unit/meta/toolsmith/test_validation_gate.pytests/unit/meta/toolsmith/test_invoker_layered.pytests/unit/meta/toolsmith/test_applier.pytests/unit/meta/toolsmith/test_models.pytests/unit/meta/toolsmith/test_dynamic_registry.pytests/integration/toolsmith/test_self_extension_e2e.py
{src/**/*.py,tests/**/*.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/
Files:
tests/unit/meta/toolsmith/__init__.pysrc/synthorg/security/risk_scorer.pysrc/synthorg/meta/signals/service.pysrc/synthorg/meta/toolsmith/__init__.pysrc/synthorg/security/rules/risk_classifier.pysrc/synthorg/security/action_types.pytests/unit/core/test_enums.pysrc/synthorg/observability/events/toolsmith.pytests/unit/meta/toolsmith/test_script_handler.pytests/unit/observability/test_events.pysrc/synthorg/security/autonomy/models.pysrc/synthorg/core/enums.pysrc/synthorg/meta/guards/scope_check.pysrc/synthorg/meta/toolsmith/config.pysrc/synthorg/observability/events/persistence.pytests/unit/security/test_action_types.pysrc/synthorg/api/state.pytests/integration/toolsmith/__init__.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/meta/mcp/registry.pysrc/synthorg/security/timeout/risk_tier_classifier.pysrc/synthorg/meta/guards/approval_gate.pysrc/synthorg/meta/models.pytests/unit/meta/toolsmith/test_gap_store.pysrc/synthorg/meta/toolsmith/overflow.pysrc/synthorg/meta/strategies/code_modification.pytests/unit/meta/toolsmith/test_config.pytests/unit/persistence/test_factory.pysrc/synthorg/meta/config.pytests/unit/meta/toolsmith/test_overflow.pysrc/synthorg/api/app.pysrc/synthorg/persistence/tool_blueprint_protocol.pytests/unit/meta/toolsmith/test_strategy.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/toolsmith/factory.pysrc/synthorg/meta/toolsmith/strategy.pytests/conformance/persistence/test_dynamic_tool_repository.pytests/unit/meta/toolsmith/test_validation_gate.pysrc/synthorg/meta/toolsmith/gap_store.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/protocol.pysrc/synthorg/meta/mcp/server.pytests/unit/meta/toolsmith/test_invoker_layered.pysrc/synthorg/meta/toolsmith/errors.pytests/unit/meta/toolsmith/test_applier.pysrc/synthorg/meta/toolsmith/script_handler.pytests/unit/meta/toolsmith/test_models.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pytests/unit/meta/toolsmith/test_dynamic_registry.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/synthorg/meta/toolsmith/service.pytests/integration/toolsmith/test_self_extension_e2e.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site
Files:
src/synthorg/security/risk_scorer.pysrc/synthorg/meta/signals/service.pysrc/synthorg/meta/toolsmith/__init__.pysrc/synthorg/security/rules/risk_classifier.pysrc/synthorg/security/action_types.pysrc/synthorg/observability/events/toolsmith.pysrc/synthorg/security/autonomy/models.pysrc/synthorg/core/enums.pysrc/synthorg/meta/guards/scope_check.pysrc/synthorg/meta/toolsmith/config.pysrc/synthorg/observability/events/persistence.pysrc/synthorg/api/state.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/meta/mcp/registry.pysrc/synthorg/security/timeout/risk_tier_classifier.pysrc/synthorg/meta/guards/approval_gate.pysrc/synthorg/meta/models.pysrc/synthorg/meta/toolsmith/overflow.pysrc/synthorg/meta/strategies/code_modification.pysrc/synthorg/meta/config.pysrc/synthorg/api/app.pysrc/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/toolsmith/factory.pysrc/synthorg/meta/toolsmith/strategy.pysrc/synthorg/meta/toolsmith/gap_store.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/protocol.pysrc/synthorg/meta/mcp/server.pysrc/synthorg/meta/toolsmith/errors.pysrc/synthorg/meta/toolsmith/script_handler.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/synthorg/meta/toolsmith/service.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: No hardcoded numerics; numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Nofrom __future__ import annotations(3.14 has PEP 649); PEP 758 except:except A, B:no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors: Error from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py;@computed_fieldauto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries)
Use@computed_fieldfor derived fields; use NotBlankStr for identifiers in Pydantic models
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Usefrom synthorg.observability import get_logger; variable alwayslogger; never import logging or print() in app code
Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSI...
Files:
src/synthorg/security/risk_scorer.pysrc/synthorg/meta/signals/service.pysrc/synthorg/meta/toolsmith/__init__.pysrc/synthorg/security/rules/risk_classifier.pysrc/synthorg/security/action_types.pysrc/synthorg/observability/events/toolsmith.pysrc/synthorg/security/autonomy/models.pysrc/synthorg/core/enums.pysrc/synthorg/meta/guards/scope_check.pysrc/synthorg/meta/toolsmith/config.pysrc/synthorg/observability/events/persistence.pysrc/synthorg/api/state.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/meta/mcp/registry.pysrc/synthorg/security/timeout/risk_tier_classifier.pysrc/synthorg/meta/guards/approval_gate.pysrc/synthorg/meta/models.pysrc/synthorg/meta/toolsmith/overflow.pysrc/synthorg/meta/strategies/code_modification.pysrc/synthorg/meta/config.pysrc/synthorg/api/app.pysrc/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/toolsmith/factory.pysrc/synthorg/meta/toolsmith/strategy.pysrc/synthorg/meta/toolsmith/gap_store.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/protocol.pysrc/synthorg/meta/mcp/server.pysrc/synthorg/meta/toolsmith/errors.pysrc/synthorg/meta/toolsmith/script_handler.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/synthorg/meta/toolsmith/service.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/security/risk_scorer.pysrc/synthorg/meta/signals/service.pysrc/synthorg/meta/toolsmith/__init__.pysrc/synthorg/security/rules/risk_classifier.pysrc/synthorg/security/action_types.pysrc/synthorg/observability/events/toolsmith.pysrc/synthorg/security/autonomy/models.pysrc/synthorg/core/enums.pysrc/synthorg/meta/guards/scope_check.pysrc/synthorg/meta/toolsmith/config.pysrc/synthorg/observability/events/persistence.pysrc/synthorg/api/state.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/meta/mcp/registry.pysrc/synthorg/security/timeout/risk_tier_classifier.pysrc/synthorg/meta/guards/approval_gate.pysrc/synthorg/meta/models.pysrc/synthorg/meta/toolsmith/overflow.pysrc/synthorg/meta/strategies/code_modification.pysrc/synthorg/meta/config.pysrc/synthorg/api/app.pysrc/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/toolsmith/factory.pysrc/synthorg/meta/toolsmith/strategy.pysrc/synthorg/meta/toolsmith/gap_store.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/protocol.pysrc/synthorg/meta/mcp/server.pysrc/synthorg/meta/toolsmith/errors.pysrc/synthorg/meta/toolsmith/script_handler.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/synthorg/meta/toolsmith/service.py
web/src/**/*.{js,jsx,ts,tsx,mts}
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.{js,jsx,ts,tsx,mts}: Always usecreateLoggerfrom@/lib/logger; never bareconsole.warn/console.error/console.debugin application code. Variable name must always belog. Onlylogger.tsitself may use bare console methods. Uselog.debug()(DEV-only, stripped in production),log.warn(),log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go throughsanitizeArg
Attacker-controlled fields inside structured objects must be wrapped insanitizeForLog()before embedding in log calls
Error-code constants (MANDATORY): importErrorCodeandErrorCategoryfrom@/api/types/errors(re-exported from the generatedweb/src/api/types/error-codes.gen.ts). Discriminate onErrorCode.<NAME>, never on raw integer literals.
Use@eslint-react/web-api-no-leaked-fetchto detectfetch()in effects withoutAbortControllercleanup
Files:
web/src/api/types/enum-values.gen.tsweb/src/api/types/openapi.gen.ts
web/src/api/types/**/*.gen.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
Generated DTO types (MANDATORY): NEVER hand-edit
web/src/api/types/*.gen.ts. Regenerate withuv run python scripts/generate_dto_types_ts.py. Import DTOs via the barrel (import type { AgentConfig } from '@/api/types').
Files:
web/src/api/types/enum-values.gen.tsweb/src/api/types/openapi.gen.ts
web/src/**/*.{ts,tsx,mts}
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.{ts,tsx,mts}: Use@typescript-eslint/no-floating-promisesto forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use@typescript-eslint/no-misused-promises(withchecksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19asyncevent handlers stay allowed via theattributes: falseexemption.
Files:
web/src/api/types/enum-values.gen.tsweb/src/api/types/openapi.gen.ts
web/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Reuse
web/src/components/ui/components and design tokens only per web Dashboard Design System in web/CLAUDE.md
Files:
web/src/api/types/enum-values.gen.tsweb/src/api/types/openapi.gen.ts
{README.md,docs/**/*.md,web/**/*.md}
📄 CodeRabbit inference engine (CLAUDE.md)
Numerics in README and public docs sourced from data/runtime_stats.yaml via markers per data/README.md
Files:
docs/design/self-improvement.mddocs/design/tools.md
docs/**/*.{md,d2,mmd}
📄 CodeRabbit inference engine (CLAUDE.md)
Use d2 for architecture / nested containers; mermaid for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200, D2 CLI pinned to v0.7.1 in CI
Files:
docs/design/self-improvement.mddocs/design/tools.md
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/api/**/*.py: Two-phase API startup: construction (create_app body) wires synchronous services; on_startup (_build_lifecycle.on_startup) wires services needing connected persistence backend
Construction-phase ordering: agent_registry BEFORE auto_wire_meetings; tunnel_provider unconditionally
On-startup ordering: SettingsService auto-wire before WorkflowExecutionObserver registration; OntologyService after persistence.connect(); cost-dial services via _try_wire_cost_dial AFTER persistence; knowledge substrate via _wire_knowledge_engine AFTER persistence, gated on has_persistence AND has_memory_backend
Pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
Files:
src/synthorg/api/state.pysrc/synthorg/api/app.py
src/synthorg/**/*approval*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Approval decisions for ApprovalSource.CONVERSATIONAL_INTAKE route through Flow 0 of signal_resume_intent before parked-context / review-gate flows
Files:
src/synthorg/meta/guards/approval_gate.py
{src/synthorg/persistence/**/*.py,src/synthorg/**/*repository*.py,src/synthorg/**/*_repo*.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Only
src/synthorg/persistence/may import sqlite/psycopg or emit raw SQL; new repository protocols must inherit from generic categories in persistence/_generics.py
Files:
src/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.py
src/synthorg/persistence/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Datetime in persistence: parse_iso_utc / format_iso_utc from persistence._shared (reject naive); normalize_utc for already-typed
Files:
src/synthorg/persistence/tool_blueprint_protocol.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.py
tests/conformance/persistence/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Dual-backend conformance: tests/conformance/persistence/ consumes backend fixture (SQLite + Postgres); enforced by check_dual_backend_test_parity.py
Files:
tests/conformance/persistence/test_dynamic_tool_repository.py
🧠 Learnings (5)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
tests/unit/meta/toolsmith/__init__.pysrc/synthorg/security/risk_scorer.pysrc/synthorg/meta/signals/service.pysrc/synthorg/meta/toolsmith/__init__.pysrc/synthorg/security/rules/risk_classifier.pysrc/synthorg/security/action_types.pytests/unit/core/test_enums.pysrc/synthorg/observability/events/toolsmith.pytests/unit/meta/toolsmith/test_script_handler.pytests/unit/observability/test_events.pysrc/synthorg/security/autonomy/models.pysrc/synthorg/core/enums.pysrc/synthorg/meta/guards/scope_check.pysrc/synthorg/meta/toolsmith/config.pysrc/synthorg/observability/events/persistence.pytests/unit/security/test_action_types.pysrc/synthorg/api/state.pytests/integration/toolsmith/__init__.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/meta/mcp/registry.pysrc/synthorg/security/timeout/risk_tier_classifier.pysrc/synthorg/meta/guards/approval_gate.pysrc/synthorg/meta/models.pytests/unit/meta/toolsmith/test_gap_store.pysrc/synthorg/meta/toolsmith/overflow.pysrc/synthorg/meta/strategies/code_modification.pytests/unit/meta/toolsmith/test_config.pytests/unit/persistence/test_factory.pysrc/synthorg/meta/config.pytests/unit/meta/toolsmith/test_overflow.pysrc/synthorg/api/app.pysrc/synthorg/persistence/tool_blueprint_protocol.pytests/unit/meta/toolsmith/test_strategy.pysrc/synthorg/meta/toolsmith/applier.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/toolsmith/factory.pysrc/synthorg/meta/toolsmith/strategy.pytests/conformance/persistence/test_dynamic_tool_repository.pytests/unit/meta/toolsmith/test_validation_gate.pysrc/synthorg/meta/toolsmith/gap_store.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/protocol.pysrc/synthorg/meta/mcp/server.pytests/unit/meta/toolsmith/test_invoker_layered.pysrc/synthorg/meta/toolsmith/errors.pytests/unit/meta/toolsmith/test_applier.pysrc/synthorg/meta/toolsmith/script_handler.pytests/unit/meta/toolsmith/test_models.pysrc/synthorg/meta/toolsmith/dynamic_registry.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pytests/unit/meta/toolsmith/test_dynamic_registry.pysrc/synthorg/persistence/sqlite/tool_blueprint_repo.pysrc/synthorg/meta/toolsmith/service.pytests/integration/toolsmith/test_self_extension_e2e.py
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.
Applied to files:
docs/design/self-improvement.mddocs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).
Applied to files:
docs/design/self-improvement.mddocs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.
Applied to files:
docs/design/self-improvement.mddocs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.
Applied to files:
docs/design/self-improvement.mddocs/design/tools.md
🪛 OpenGrep (1.21.0)
src/synthorg/persistence/postgres/tool_blueprint_repo.py
[ERROR] 337-340: SQL query built via f-string passed to execute()/executemany(). Use parameterized queries with placeholders instead.
(coderabbit.sql-injection.python-fstring-execute)
🔇 Additional comments (57)
tests/conformance/persistence/test_dynamic_tool_repository.py (1)
1-381: LGTM!tests/integration/toolsmith/__init__.py (1)
1-1: LGTM!tests/integration/toolsmith/test_self_extension_e2e.py (1)
1-24: LGTM!Also applies to: 31-90, 100-283
tests/unit/core/test_enums.py (1)
92-97: LGTM!Also applies to: 121-129
tests/unit/meta/toolsmith/__init__.py (1)
1-1: LGTM!tests/unit/meta/toolsmith/test_applier.py (1)
1-346: LGTM!tests/unit/meta/toolsmith/test_config.py (1)
1-57: LGTM!tests/unit/meta/toolsmith/test_dynamic_registry.py (1)
1-281: LGTM!tests/unit/meta/toolsmith/test_gap_store.py (1)
1-102: LGTM!tests/unit/meta/toolsmith/test_invoker_layered.py (1)
1-104: LGTM!tests/unit/meta/toolsmith/test_models.py (1)
1-245: LGTM!tests/unit/meta/toolsmith/test_overflow.py (1)
1-132: LGTM!tests/unit/meta/toolsmith/test_script_handler.py (1)
1-116: LGTM!tests/unit/meta/toolsmith/test_strategy.py (1)
1-148: LGTM!tests/unit/meta/toolsmith/test_validation_gate.py (1)
1-173: LGTM!tests/unit/observability/test_events.py (1)
348-352: LGTM!tests/unit/persistence/test_factory.py (1)
117-117: LGTM!Also applies to: 121-127, 131-131, 136-138
tests/unit/security/test_action_types.py (1)
14-14: LGTM!Also applies to: 29-29, 31-31
docs/design/self-improvement.md (1)
60-73: LGTM!Also applies to: 165-165, 249-249
docs/design/tools.md (1)
433-478: LGTM!scripts/_ghost_wiring_manifest.txt (1)
77-93: LGTM!src/synthorg/core/enums.py (1)
128-143: LGTM!Also applies to: 161-198, 626-629
src/synthorg/meta/guards/approval_gate.py (1)
36-42: LGTM!src/synthorg/meta/guards/scope_check.py (1)
94-95: LGTM!src/synthorg/security/action_types.py (1)
35-37: LGTM!src/synthorg/security/autonomy/models.py (1)
91-92: LGTM!Also applies to: 118-119
src/synthorg/security/risk_scorer.py (1)
206-208: LGTM!Also applies to: 215-217, 231-233
src/synthorg/security/rules/risk_classifier.py (1)
26-27: LGTM!Also applies to: 33-34, 47-48
src/synthorg/security/timeout/risk_tier_classifier.py (1)
47-48: LGTM!Also applies to: 54-55, 68-69
src/synthorg/observability/events/persistence.py (1)
351-419: LGTM!Also applies to: 767-794
src/synthorg/observability/events/toolsmith.py (1)
1-52: LGTM!src/synthorg/meta/toolsmith/overflow.py (1)
45-73: LGTM!Also applies to: 85-115, 117-117
src/synthorg/meta/toolsmith/protocol.py (1)
1-187: LGTM!src/synthorg/meta/toolsmith/errors.py (1)
1-55: LGTM!src/synthorg/meta/toolsmith/gap_store.py (1)
1-152: LGTM!src/synthorg/meta/toolsmith/script_handler.py (1)
1-133: LGTM!src/synthorg/meta/toolsmith/dynamic_registry.py (1)
1-330: LGTM!src/synthorg/meta/toolsmith/service.py (1)
1-254: LGTM!src/synthorg/meta/toolsmith/factory.py (1)
48-147: LGTM!Also applies to: 150-187
src/synthorg/meta/models.py (1)
24-25: LGTM!Also applies to: 36-36, 338-338, 373-434
src/synthorg/meta/toolsmith/__init__.py (1)
1-26: LGTM!src/synthorg/persistence/tool_blueprint_protocol.py (1)
1-170: LGTM!src/synthorg/persistence/sqlite/revisions/20260522000001_dynamic_tools.sql (1)
1-76: LGTM!src/synthorg/persistence/sqlite/schema.sql (1)
485-543: LGTM!Also applies to: 1680-1743
src/synthorg/persistence/sqlite/tool_blueprint_repo.py (1)
1-463: LGTM!src/synthorg/persistence/postgres/revisions/20260522000001_dynamic_tools.sql (1)
1-61: LGTM!src/synthorg/persistence/postgres/schema.sql (1)
499-556: LGTM!Also applies to: 1637-1688
src/synthorg/persistence/postgres/tool_blueprint_repo.py (1)
191-191: ⚡ Quick winProvide the missing input needed to rewrite the comment: The original
<review_comment>text (and any verification outputs to base the rewrite on) isn’t included, so I can’t produce the rewritten review comment. Paste the<review_comment>...</review_comment>contents and the results you want me to use.src/synthorg/api/app.py (1)
153-159: LGTM!Also applies to: 300-374, 1358-1407, 1546-1612
src/synthorg/api/state.py (1)
177-180: LGTM!Also applies to: 267-269, 339-340, 519-520, 633-633, 1105-1133, 1301-1308
src/synthorg/meta/mcp/registry.py (1)
11-12: LGTM!Also applies to: 26-42
src/synthorg/meta/mcp/invoker.py (1)
30-31: LGTM!Also applies to: 57-68
src/synthorg/meta/mcp/server.py (1)
18-19: LGTM!Also applies to: 40-45, 79-91, 94-105, 132-136
src/synthorg/meta/signals/service.py (1)
313-314: LGTM!src/synthorg/meta/strategies/code_modification.py (1)
43-44: LGTM!Also applies to: 179-190
web/src/api/types/enum-values.gen.ts (1)
443-444: LGTM!Also applies to: 529-530, 679-695
web/src/api/types/openapi.gen.ts (1)
2017-2033: LGTM!Also applies to: 2862-2912, 5371-5378, 5811-5818, 6718-6736, 6852-6869, 8048-8048, 8818-8864, 9214-9223, 9884-9898, 10398-10418, 10749-10749, 11980-11996, 12398-12413, 13072-13087, 18144-18175, 19967-20069
| await self._registry.register(active) | ||
| try: | ||
| await self._repo.save(active) | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception: | ||
| # Persisting an ACTIVE row failed: unregister the live handler | ||
| # so the durable state ("not in DB") matches the runtime state | ||
| # ("not registered"). Without rollback the layered tool surface | ||
| # would expose a tool with no audit trail. | ||
| await self._registry.unregister(active.name) | ||
| raise |
There was a problem hiding this comment.
Compensate unregister() failures in both rollback paths.
If self._registry.unregister(...) raises after the repo state has already changed, the runtime and durable state diverge again: Lines 197-208 can leave a live handler with no durable ACTIVE row, and Lines 222-230 can leave a RETIRED blueprint still callable. Add compensation that restores the persisted state (or swap the order and compensate the inverse) before re-raising.
Also applies to: 222-230
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/synthorg/meta/toolsmith/applier.py` around lines 197 - 208, The rollback
currently calls self._registry.unregister(...) without handling failures,
risking divergence between runtime and durable state; update both rollback paths
(the one after await self._repo.save(active) and the analogous RETIRE path
around lines handling unregister) to catch exceptions from _registry.unregister
and compensate by restoring the persisted state before re-raising: if unregister
raises after a successful repo.save(active), attempt to re-save the ACTIVE row
(await self._repo.save(active)) or, if that fails, log and raise the original
exception; likewise, in the RETIRE flow where you unregister then change DB
state, if unregister fails after the DB change, attempt to re-register the
handler (await self._registry.register(<handler or blueprint>)) or revert the DB
change (e.g., await self._repo.save(previous_state)) and only then re-raise;
wrap calls to _registry.unregister and _repo.save in try/except to perform these
compensating actions and ensure the durable and runtime states remain consistent
when exceptions occur.
|
|
||
| passed: bool | ||
| brief_passed: bool | ||
| brief_score: int = Field(ge=0, le=100) |
There was a problem hiding this comment.
@coderabbitai Same as the config.py thread: brief_score: int = Field(ge=0, le=100) puts 100 as a call-site argument to Field(), which scripts/check_no_magic_numbers.py does not scan (lines 131-132). The gate passes on this file. This matches the established Pydantic-validator idiom across the repo. Skipping as a verified false-positive; please confirm and resolve.
|
|
||
| passed: bool | ||
| brief_passed: bool | ||
| brief_score: int = Field(ge=0, le=100) |
There was a problem hiding this comment.
@coderabbitai Same as the config.py thread: brief_score: int = Field(ge=0, le=100) puts 100 as a call-site argument to Field(), which scripts/check_no_magic_numbers.py does not scan (lines 131-132). The gate passes on this file. This matches the established Pydantic-validator idiom across the repo. Skipping as a verified false-positive; please confirm and resolve.
Post-rebase CodeRabbit review of the toolsmith on the rebased head. - meta/toolsmith/models.py: RETIRED now requires validated_at AND activated_at (a retired tool was rolled back after activation), matching the DB-layer lifecycle CHECK; the Python validator was looser. - meta/toolsmith/strategy.py: ProviderError in the authoring path now wraps to ToolAuthoringError (falls through the generic handler) so a transient completion failure skips just that gap instead of cancelling the sibling TaskGroup authoring tasks (regression from the round-2 parallelisation). - meta/toolsmith/strategy.py: route the model-authored JSON through a frozen _AuthoredBlueprintArgs boundary via parse_typed, replacing the ad-hoc _parse_json_object/_require_str/_require_schema helpers. - meta/toolsmith/validation_gate.py: report the golden stage as skipped when the brief fails, so a never-run scorecard is distinguishable from a real 0-0 result. - persistence/postgres/tool_blueprint_repo.py: normalize_utc the lifecycle datetimes in the upsert so naive/non-UTC values round-trip the same as the SQLite path and transition_if. - tests/integration/toolsmith/test_self_extension_e2e.py: preserve the base process environment when spawning the sandbox subprocess. - meta/config.py: document tool_creation_enabled + toolsmith attributes. - docs/design/tools.md: action-type taxonomy 39 to 41 (knowledge:ingest, knowledge:reindex from the rebased-in #2036 substrate). Skipped (defended on-thread, verified false-positive): config.py + models.py:114 Field() numerics are call-site args that check_no_magic_numbers.py does not scan (lines 131-132); gate passes.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/synthorg/meta/toolsmith/validation_gate.py (1)
60-76:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftGenerate schema-aware probes.
This helper only looks at
"type", so authored schemas withdefault,const,enum, bounds, or non-empty container constraints can get an invalid probe and fail the brief before the golden stage ever runs. Build probes from the actual JSON Schema constraints instead of the type alone.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/meta/toolsmith/validation_gate.py` around lines 60 - 76, The _synthesize_probe helper currently picks probe values only by prop["type"] (and _PROBE_VALUES), causing invalid probes for schemas with defaults, consts, enums, bounds or non-empty container constraints; update _synthesize_probe(parameters_schema) to inspect each property dict (prop) for schema keywords in priority order: const, default, enum (use first allowed value), then type-specific constraints (for string: minLength, pattern/format fallback; for number/integer: minimum/maximum; for array: minItems and recursively synthesize an item probe from items schema to meet minItems; for object: minProperties and recursively build required fields using nested properties/required); fall back to _PROBE_VALUES only when no other constraints exist; preserve the deepcopy behavior when assigning probe[name] and handle missing/invalid prop types defensively. Ensure you reference and reuse the existing functions/values (_synthesize_probe, _PROBE_VALUES, deepcopy) and keep the same return shape.src/synthorg/meta/config.py (1)
291-313:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCollapse the tool-creation enable flags.
tool_creation_enabledandtoolsmith.enablednow both gate the same subsystem, and the new integration test has to set both. That creates a silent misconfiguration path where operators enable tool creation at the documented top level but the runtime still stays off because the nested flag remains false. Pick one source of truth, or reject mismatched values during config load.🔧 Minimal guardrail
+ `@model_validator`(mode="after") + def _validate_tool_creation_flags(self) -> Self: + """Keep the tool-creation switches coherent.""" + if self.tool_creation_enabled != self.toolsmith.enabled: + msg = "tool_creation_enabled and toolsmith.enabled must match" + raise ValueError(msg) + return self + `@model_validator`(mode="after") def _validate_code_modification_requirements(self) -> Self:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/meta/config.py` around lines 291 - 313, The top-level flag tool_creation_enabled and the nested ToolsmithConfig.enabled (accessible via the toolsmith field) currently conflict; add a pydantic root_validator on the same config model to enforce a single source of truth by either rejecting mismatched settings or synchronizing them—for example, if tool_creation_enabled != toolsmith.enabled, raise a clear ValueError (or else set toolsmith.enabled = tool_creation_enabled) so runtime behavior is deterministic; implement this check in a root_validator that references tool_creation_enabled and toolsmith to locate the fields to validate/fix.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/synthorg/meta/config.py`:
- Around line 291-313: The top-level flag tool_creation_enabled and the nested
ToolsmithConfig.enabled (accessible via the toolsmith field) currently conflict;
add a pydantic root_validator on the same config model to enforce a single
source of truth by either rejecting mismatched settings or synchronizing
them—for example, if tool_creation_enabled != toolsmith.enabled, raise a clear
ValueError (or else set toolsmith.enabled = tool_creation_enabled) so runtime
behavior is deterministic; implement this check in a root_validator that
references tool_creation_enabled and toolsmith to locate the fields to
validate/fix.
In `@src/synthorg/meta/toolsmith/validation_gate.py`:
- Around line 60-76: The _synthesize_probe helper currently picks probe values
only by prop["type"] (and _PROBE_VALUES), causing invalid probes for schemas
with defaults, consts, enums, bounds or non-empty container constraints; update
_synthesize_probe(parameters_schema) to inspect each property dict (prop) for
schema keywords in priority order: const, default, enum (use first allowed
value), then type-specific constraints (for string: minLength, pattern/format
fallback; for number/integer: minimum/maximum; for array: minItems and
recursively synthesize an item probe from items schema to meet minItems; for
object: minProperties and recursively build required fields using nested
properties/required); fall back to _PROBE_VALUES only when no other constraints
exist; preserve the deepcopy behavior when assigning probe[name] and handle
missing/invalid prop types defensively. Ensure you reference and reuse the
existing functions/values (_synthesize_probe, _PROBE_VALUES, deepcopy) and keep
the same return shape.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: c946e5a2-9744-4336-be3e-19ee5510b96f
📒 Files selected for processing (7)
docs/design/tools.mdsrc/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/meta/toolsmith/strategy.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.pytests/integration/toolsmith/test_self_extension_e2e.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Deploy Preview
- GitHub Check: Build Backend
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Lighthouse Site
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Test E2E
- GitHub Check: Test Integration
- GitHub Check: Dashboard Test
- GitHub Check: Test Unit
- GitHub Check: Test Conformance (SQLite)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (8)
{README.md,docs/**/*.md,web/**/*.md}
📄 CodeRabbit inference engine (CLAUDE.md)
Numerics in README and public docs sourced from data/runtime_stats.yaml via markers per data/README.md
Files:
docs/design/tools.md
docs/**/*.{md,d2,mmd}
📄 CodeRabbit inference engine (CLAUDE.md)
Use d2 for architecture / nested containers; mermaid for flowcharts / sequence / pipelines; Markdown tables for tabular data; D2 theme 200, D2 CLI pinned to v0.7.1 in CI
Files:
docs/design/tools.md
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site
Files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/strategy.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: No hardcoded numerics; numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Nofrom __future__ import annotations(3.14 has PEP 649); PEP 758 except:except A, B:no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors: Error from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py;@computed_fieldauto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries)
Use@computed_fieldfor derived fields; use NotBlankStr for identifiers in Pydantic models
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Usefrom synthorg.observability import get_logger; variable alwayslogger; never import logging or print() in app code
Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSI...
Files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/strategy.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/strategy.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.py
{src/**/*.py,tests/**/*.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/
Files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/strategy.pytests/integration/toolsmith/test_self_extension_e2e.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers:@pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back
Test doubles: FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary forbidden (zero-tolerance, no baseline) per check_mock_spec.py
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large)
Files:
tests/integration/toolsmith/test_self_extension_e2e.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/integration/toolsmith/test_self_extension_e2e.py
{src/synthorg/persistence/**/*.py,src/synthorg/**/*repository*.py,src/synthorg/**/*_repo*.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Only
src/synthorg/persistence/may import sqlite/psycopg or emit raw SQL; new repository protocols must inherit from generic categories in persistence/_generics.py
Files:
src/synthorg/persistence/postgres/tool_blueprint_repo.py
src/synthorg/persistence/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Datetime in persistence: parse_iso_utc / format_iso_utc from persistence._shared (reject naive); normalize_utc for already-typed
Files:
src/synthorg/persistence/postgres/tool_blueprint_repo.py
🧠 Learnings (7)
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.
Applied to files:
docs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).
Applied to files:
docs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.
Applied to files:
docs/design/tools.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.
Applied to files:
docs/design/tools.md
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/strategy.pytests/integration/toolsmith/test_self_extension_e2e.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).
Applied to files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/strategy.pytests/integration/toolsmith/test_self_extension_e2e.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.
Applied to files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/strategy.pysrc/synthorg/meta/toolsmith/validation_gate.pysrc/synthorg/meta/toolsmith/models.pysrc/synthorg/persistence/postgres/tool_blueprint_repo.py
🔇 Additional comments (6)
docs/design/tools.md (1)
433-477: LGTM!Also applies to: 524-548
src/synthorg/meta/toolsmith/strategy.py (1)
16-18: LGTM!Also applies to: 130-142, 206-263
tests/integration/toolsmith/test_self_extension_e2e.py (1)
26-27: LGTM!Also applies to: 93-97, 167-287
src/synthorg/meta/toolsmith/validation_gate.py (1)
175-187: LGTM!src/synthorg/meta/toolsmith/models.py (1)
251-270: LGTM!src/synthorg/persistence/postgres/tool_blueprint_repo.py (1)
160-163: LGTM!
Post-round-7 CodeRabbit review (all 6 prior fixes confirmed LGTM); 2 new outside-diff findings: - meta/config.py: add a cross-flag validator so tool_creation_enabled (the runtime gate for boot wiring + scope guard) and toolsmith.enabled (the toolsmith's own allowlist-consistency switch) must agree. Without it an operator could enable one and silently leave the subsystem off, or configure an allowlist that never runs. - meta/toolsmith/validation_gate.py: make _synthesize_probe honour the schema keywords that fully determine a valid value (const, default, enum-first) before the type-based placeholder, so a constrained required field no longer gets a probe that spuriously fails the brief. Deeper constraint satisfaction (numeric bounds, minLength, recursive containers) is intentionally out of scope: the probe is a best-effort smoke test and a missed bound fails the brief gracefully. CodSpeed flagged a resolveLocale web-benchmark regression; that file is not touched by this PR (only the generated .gen.ts DTO types changed), so it is noise or main-baseline drift, observed not chased.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/meta/toolsmith/validation_gate.py (1)
128-160:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHandler exceptions propagate instead of failing the brief gracefully.
Line 138 calls
await handler(...)without exception handling. If the sandbox handler raises (timeout, runtime error, permission issue), this propagates up rather than returning(False, _BRIEF_FAIL_SCORE). Per coding guidelines, async helpers should catchException(re-raisingMemoryError/RecursionError). The docstring states "passes iff the tool exits cleanly"—a crash should fail the brief, not raise.Proposed fix
async def run(self, blueprint: ToolBlueprint) -> tuple[bool, int]: """Execute the acceptance probe; return ``(passed, score)``.""" import json as _json # noqa: PLC0415 handler = make_dynamic_tool_handler( blueprint, self._sandbox_resolver(blueprint), timeout_seconds=self._timeout_seconds, ) probe = _synthesize_probe(blueprint.parameters_schema) - raw = await handler(app_state=None, arguments=probe) + try: + raw = await handler(app_state=None, arguments=probe) + except MemoryError, RecursionError: + raise + except Exception as exc: + logger.warning( + TOOLSMITH_BRIEF_PARSE_FAILED, + tool_name=blueprint.name, + error_type=type(exc).__name__, + error=safe_error_description(exc), + ) + return False, _BRIEF_FAIL_SCORE try: envelope = _json.loads(raw)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/meta/toolsmith/validation_gate.py` around lines 128 - 160, The async run method calls await handler(...) without catching exceptions so handler failures propagate; wrap the handler invocation in a try/except Exception block (while re-raising MemoryError and RecursionError) to ensure any handler error causes the brief to fail gracefully: on exception log a warning using the existing logger/TOOLSMITH_BRIEF_PARSE_FAILED (include blueprint.name, type(exc).__name__, and safe_error_description(exc)) and return False, _BRIEF_FAIL_SCORE; keep the rest of the JSON parsing flow unchanged and still parse raw only if the handler call succeeds.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/synthorg/meta/toolsmith/validation_gate.py`:
- Around line 128-160: The async run method calls await handler(...) without
catching exceptions so handler failures propagate; wrap the handler invocation
in a try/except Exception block (while re-raising MemoryError and
RecursionError) to ensure any handler error causes the brief to fail gracefully:
on exception log a warning using the existing
logger/TOOLSMITH_BRIEF_PARSE_FAILED (include blueprint.name, type(exc).__name__,
and safe_error_description(exc)) and return False, _BRIEF_FAIL_SCORE; keep the
rest of the JSON parsing flow unchanged and still parse raw only if the handler
call succeeds.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: 28248649-d258-41d1-80b9-6871b165b262
📒 Files selected for processing (2)
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/validation_gate.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Deploy Preview
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Lighthouse Site
- GitHub Check: Test Integration
- GitHub Check: Test E2E
- GitHub Check: Test Conformance (SQLite)
- GitHub Check: Dashboard Test
- GitHub Check: Test Unit
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (3)
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site
Files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/validation_gate.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: No hardcoded numerics; numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing; enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Nofrom __future__ import annotations(3.14 has PEP 649); PEP 758 except:except A, B:no parens unless binding
Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Errors: Error from DomainError; never inherit Exception/RuntimeError/etc directly; enforced by check_domain_error_hierarchy.py
Pydantic v2 frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py;@computed_fieldauto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries)
Use@computed_fieldfor derived fields; use NotBlankStr for identifiers in Pydantic models
Args models at every system boundary; parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Usefrom synthorg.observability import get_logger; variable alwayslogger; never import logging or print() in app code
Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSI...
Files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/validation_gate.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/validation_gate.py
{src/**/*.py,tests/**/*.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/
Files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/validation_gate.py
🧠 Learnings (3)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/validation_gate.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).
Applied to files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/validation_gate.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.
Applied to files:
src/synthorg/meta/config.pysrc/synthorg/meta/toolsmith/validation_gate.py
🔇 Additional comments (5)
src/synthorg/meta/config.py (1)
322-335: LGTM!src/synthorg/meta/toolsmith/validation_gate.py (4)
50-57: LGTM!Also applies to: 86-104
60-83: LGTM!
163-228: LGTM!
231-241: LGTM!
CodeRabbit confirmed both round-8 fixes LGTM; 1 new outside-diff finding. - meta/toolsmith/validation_gate.py: guard the handler invocation in SandboxBriefRunner.run. A sandbox handler crash (timeout, runtime error) now fails the brief gracefully (False, _BRIEF_FAIL_SCORE) with a logged warning instead of propagating out of the gate, matching the async-helper convention (re-raise MemoryError/RecursionError) and the docstring's 'passes iff the tool exits cleanly' contract. CI fully green; the round-8 CodSpeed resolveLocale regression cleared on re-run (confirmed noise, web locale code untouched by this PR).
Superseded: this CHANGES_REQUESTED review targeted an earlier head. All findings were addressed in later commits; CodeRabbit's review of the current head (7eaa0ed) reports no actionable comments, and CI is green.
<!-- HIGHLIGHTS_START --> ## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - Introduced conversational interface for direct clarify and propose interactions. - Cost management now includes forecast gates, hard ceilings, and Pareto considerations. - Added living documentation engine combining wiki and retrieval-augmented generation features. - Real intake engine is now operational for live data processing. - Virtual desktop tool with vision verification gate available for enhanced workspace control. ### What's new - Per-project reproducible environments for consistent setups. - Headless browser testing tool integrated for automated UI validation. - Governed external API and data access tool introduced. - Hardened external-remote git backend with sandbox mounts and push-queue dispatching. - Adversarial red-team gate subsystem for enhanced security testing. - Self-extending toolkit to dynamically expand capabilities. - Stakes-aware model routing enables prioritized processing. - Task-board entry adapter connects live runtime with project management. - Persistent project workspace with pluggable git backend and per-project push queues implemented. - Knowledge and provenance substrate added to track data lineage. - Scoring and data contract framework for golden-company benchmark evaluations. ### Under the hood - Desktop Dockerfile pinned by digest to improve build stability and documented publishing gap fixed. <!-- HIGHLIGHTS_END --> :robot: I have created a release *beep* *boop* --- ## [0.8.7](v0.8.6...v0.8.7) (2026-05-22) ### Features * conversational interface v1 - 1:1 clarify + propose ([#2019](#2019)) ([216ef94](216ef94)), closes [#1968](#1968) * cost as a first-class dial (forecast gate, hard ceiling, Pareto) ([#2029](#2029)) ([700a59e](700a59e)), closes [#1982](#1982) * **env:** reproducible per-project environments ([#2039](#2039)) ([d2c0ef9](d2c0ef9)), closes [#1994](#1994) * **evals:** [#1980](#1980) spine -- scoring + data contract for golden-company benchmark ([#2025](#2025)) ([53108e8](53108e8)) * goal/objective entry adapter ([#1964](#1964)) ([#2022](#2022)) ([cb15c3c](cb15c3c)) * governed external API/data access tool ([#1991](#1991)) ([#2032](#2032)) ([e08b451](e08b451)) * harden external-remote git backend + per-project sandbox mount + push-queue dispatch ([#2020](#2020)) ([#2030](#2030)) ([2fa2e1e](2fa2e1e)) * headless browser testing tool ([#1992](#1992)) ([#2024](#2024)) ([277b52a](277b52a)) * knowledge + provenance substrate ([#2036](#2036)) ([48c897b](48c897b)) * living documentation engine (dual-purpose wiki + RAG namespace) ([#2028](#2028)) ([3d10da9](3d10da9)), closes [#1976](#1976) * real intake engine online ([#2017](#2017)) ([9d8eb34](9d8eb34)) * **redteam:** adversarial red-team gate subsystem ([#1986](#1986)) ([#2026](#2026)) ([d2207e9](d2207e9)) * self-extending toolkit ([#1995](#1995)) ([#2035](#2035)) ([5ffc545](5ffc545)) * stakes-aware model routing ([#1998](#1998)) ([#2038](#2038)) ([9b98312](9b98312)) * task-board entry adapter to live runtime ([#1963](#1963)) ([#2023](#2023)) ([a8f1eea](a8f1eea)) * virtual desktop tool and vision verifier gate ([#2031](#2031)) ([dfe8b42](dfe8b42)), closes [#1993](#1993) * **workspace:** persistent project workspace + pluggable git backend + per-project push queue ([#2021](#2021)) ([ee58ee7](ee58ee7)) ### Bug Fixes * pin desktop Dockerfile by digest (Scorecard [#309](#309)) + document publish gap ([#2034](#2034)) ([8fda188](8fda188)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Closes #1995.
Summary
The org now extends its own toolkit at runtime: it detects a recurring capability gap, an LLM authors a new sandboxed MCP tool, the candidate is benchmark-validated, and on pass plus human approval it is live-registered so a later task in the same run uses it. Tool creation is governed via a new
TOOL_CREATIONproposal altitude (HIGH risk, mandatory approval) and a newtool:createaction type.src/synthorg/meta/toolsmith/: models, config, errors, gap_store, dynamic_registry, script_handler, strategy (LLM blueprint authoring), validation_gate (per-tool acceptance brief + golden-scorecard delta), applier, service, factory,CodeModificationOverflowHandlerfor service-access gaps.LayeredToolRegistryplusLayeredHandlerMapread byMCPToolInvokeroverlay a snapshot-swappingDynamicToolRegistryon top of the staticDomainToolRegistry.build_args_modelmaterialises a real frozen Pydantic args model from the blueprint's JSON Schema so the strictMCPToolDefalignment validator keeps holding.DynamicToolRepositoryprotocol composingStatefulRepository+FilteredQueryRepository(no bespoke methods per ADR-0001), SQLite + Postgres repos, yoyo revision20260522000001_dynamic_tools, schema.sql additions on both backends.ProposalAltitude.TOOL_CREATIONplusImprovementProposal.tool_changes, guard-chain wiring (scope, approval),ActionType.TOOL_CREATEwith HIGH risk across the risk maps, autonomy presets gatingtool:createbehind human approval in supervised/semi._wire_toolsmithhook (tool_creation_enabled + provider + persistence) installs the dynamic tool layer over the layered surface; falls back cleanly when any dependency is absent. The production golden-scorecard provider is fail-closed until Golden-company benchmark (eval spine) #1980 exposes a runnable score-with-candidate API.Test plan
tests/unit/meta/toolsmith/covering models, config, gap store, strategy, validation gate, applier (incl. repo-save-failure, retire-non-active, validation-persistence), dynamic registry, script handler, overflow handler, layered invoker.tests/conformance/persistence/test_dynamic_tool_repository.py(SQLite + Postgres parity for the new repo, transition_if semantics, query/filter, unknown-key rejection).tests/integration/toolsmith/test_self_extension_e2e.py: drives the full gap to author to guard to approve to validate to register to invoke loop through the realMCPToolInvoker, with a regression-case showing a regressing tool is rejected.Review coverage
Reviewed by 17 agents (code-reviewer, python-reviewer, conventions-enforcer, security-reviewer, persistence-reviewer, async-concurrency-reviewer, silent-failure-hunter, type-design-analyzer, docs-consistency, comment-quality-rot, logging-audit, resilience-audit, test-quality-reviewer, pr-test-analyzer, api-contract-drift, issue-resolution-verifier, ghost-wiring mini-pass).
_audit/pre-pr-review/triage.md.except A, B:valid in 3.14, CLAUDE.md mandates the bare form, python-reviewer + resilience-audit explicitly confirmed it, and the pre-push (ruff + full mypy + 30k unit tests importing these modules) passed.ProviderErrorpropagation instrategy.author()/applier.apply()/code_modification._generate_and_parse(closes a real resilience hole flagged in the toolsmith delegation path), unknown-JSON-type rejection inbuild_args_model, silent brief-parse log,error=on the invoke-failed log,retire()race fix (unregister gated on transition success), boundedscript_body, config validator (enabledrequires non-empty allowlist), ACTIVE-blueprint requires validation record, expanded applier coverage tests, docs drift (action-type count, altitudes table, package structure, YAML defaults), and#1995stripped from the migration SQL headers.(#1995)because the gate-suppression PreToolUse hook blocked the direct edit and the--update-baselinescript preserves existing reasons. Cosmetic only (the review-origin gate passed and the baseline is data, not code); can be cleaned up in a follow-up by removing the 6 lines first, then re-running--update-baselinewith a clean justification under freshALLOW_BASELINE_GROWTH=1approval.GoldenScorecardProvider(runnable golden-benchmark-with-candidate API) lands when Golden-company benchmark (eval spine) #1980 exposes it; until thenrequire_golden_deltais fail-closed at apply (a missing provider rejects), not a stub.