feat: surface safety-spine state in runtime-services boot log (closes #2096)#2097
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an observability gap identified during a security audit, ensuring that the safety-spine state is clearly visible in the runtime boot logs. It also enhances system reliability by improving error reporting during the runtime coordinator initialization and provides a consolidated summary of the active subsystems upon successful startup. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI (base), Organization UI (inherited) Review profile: ASSERTIVE Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
🧰 Additional context used📓 Path-based instructions (7)src/synthorg/!(persistence)/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/synthorg/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
⚙️ CodeRabbit configuration file
Files:
src/**/*.{py,ts,tsx}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/synthorg/{api/lifecycle_builder,workers/runtime_builder}.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/synthorg/workers/runtime_builder.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
tests/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
⚙️ CodeRabbit configuration file
Files:
🧠 Learnings (4)📚 Learning: 2026-05-05T09:04:46.195ZApplied to files:
📚 Learning: 2026-05-21T22:55:20.496ZApplied to files:
📚 Learning: 2026-05-21T22:55:09.289ZApplied to files:
📚 Learning: 2026-05-23T12:24:00.128ZApplied to files:
🔇 Additional comments (4)
WalkthroughThis PR enriches the runtime-builder boot logs with security configuration state to address operator visibility into SecOps enforcement modes. The three 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Merging this PR will not alter performance
Comparing Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request enhances the observability and robustness of the agent runtime startup process. Key changes include adding security configuration details (enabled status and enforcement mode) to startup logs and introducing structured logging to track the wiring of runtime components like the coordinator and work pipeline. Error handling in the runtime coordinator builder was improved using Python 3.14's parenthesisless multi-exception syntax and redacted exception logging. New unit tests verify that security state is correctly captured in boot logs across different provider scenarios. I have no feedback to provide.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/synthorg/workers/runtime_builder.py`:
- Around line 826-834: The startup log for the "agent_engine_built" event
(logger.info call with API_APP_STARTUP and mode="agent_engine_built") is missing
the safety-spine fields; update that logger.info invocation to include the same
security fields used by other runtime_services startup events (e.g.,
security_enabled and security_enforcement_mode) so the event schema remains
consistent with other API_APP_STARTUP messages.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: e92dee5e-fc9b-4676-81f1-4edba762eaeb
📒 Files selected for processing (5)
.codespellrcscripts/_module_size_baseline.jsonsrc/synthorg/workers/execution_service.pysrc/synthorg/workers/runtime_builder.pytests/unit/workers/test_runtime_builder.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
- GitHub Check: Build Backend
- GitHub Check: Test Unit (shard 4)
- GitHub Check: Test Integration (shard 3)
- GitHub Check: Test Unit (shard 2)
- GitHub Check: Test Integration (shard 2)
- GitHub Check: Test Integration (shard 4)
- GitHub Check: Test Unit (shard 1)
- GitHub Check: Test Unit (shard 3)
- GitHub Check: Test Integration (shard 1)
- GitHub Check: Runtime Stats Freshness Gate
- GitHub Check: Test Conformance (SQLite)
- GitHub Check: Test E2E
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: pyright (advisory)
- GitHub Check: Build Preview
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (7)
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers:@pytest.mark.{unit,integration,e2e,slow}; asyncauto; timeout 30s global; coverage 80% min
Windows: unit tests useWindowsSelectorEventLoopPolicy; subprocess tests override back
Test doubles: useFakeClockfor Clock seam,mock_of[T](**overrides)for typed-boundary substitutions,SimpleNamespacefor attribute-bags; bareMagicMockat typed boundaries is blocked
FakeClock andmock_ofimport fromtests._shared; inject viaclock=and helper's spec subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...))
Never skip/xfail flaky tests; fix fundamentally; useasyncio.Event().wait()notsleep(large)
Files:
tests/unit/workers/test_runtime_builder.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/workers/test_runtime_builder.py
src/synthorg/!(persistence)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only
src/synthorg/persistence/may import sqlite/psycopg or emit raw SQL
Files:
src/synthorg/workers/execution_service.pysrc/synthorg/workers/runtime_builder.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Configuration precedence: DB > env > code default viaSettingsService/ConfigResolver(Cat-1) or env > code default (Cat-2,read_only_post_init); Cat-3 bootstrap secrets are pure env; noos.environ.getoutside startup
Non-provider transient I/O (e.g. git push/fetch) usescore.resilience.GeneralRetryHandlerwith aretryablepredicate, never hand-rolled loop
Conversational human content wrapped viawrap_untrusted(TAG_TASK_DATA, ...)(SEC-1)
Files:
src/synthorg/workers/execution_service.pysrc/synthorg/workers/runtime_builder.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Numerics must live insettings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants of the formNAME: int|float|Final|Final[int]|Final[float] = literal
Module-size budget:controller400 LOC,service/orchestrator600 LOC,repository500 LOC,adapter/integration700 LOC,feature100 LOC,code500 LOC (default),tests800 LOC,declarativeexempt,generatedglob-exempt per# module-kind:header
Nofrom __future__ import annotationsin Python 3.14+ (has PEP 649); use PEP 758 except:except A, B:no parens unless binding
Type hints required on public functions; mypy strict type checking; Google-style docstrings; line length 88; functions <50 lines
Error classes must follow<Domain><Condition>Errornaming and inherit fromDomainError, never fromException/RuntimeErrordirectly
Pydantic v2: every frozen model must haveextra="forbid";@computed_fieldauto-exempt; use# lint-allow: frozen-extra-forbid -- <reason>forextra="allow"/"ignore"boundaries
Use@computed_fieldfor derived fields in Pydantic models
UseNotBlankStrfor identifier fields in Pydantic models
Args models must be used at every system boundary; useparse_typed()for every external dict ingestion
Enforce immutability: usemodel_copy(update=...)orcopy.deepcopy(); deepcopy at system boundaries
Useasyncio.TaskGroupfor fan-out/fan-in async patterns; helpers must catchExceptionand re-raiseMemoryError/RecursionError
Clock seam:clock: Clock | None = Noneparameter in services; tests injectFakeClock
Services own_lifecycle_lock; timed-out stops mark services unrestartable
Untrusted content: usewrap_untrusted()fromengine.prompt_safety; useHTMLParseGuardfor HTML
Usefrom synthorg.observability import get_logger; variable must be namedlogger; neverimport loggingorprint()in app code
Event names must come fromobservability.events.<domain>...
Files:
src/synthorg/workers/execution_service.pysrc/synthorg/workers/runtime_builder.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/workers/execution_service.pysrc/synthorg/workers/runtime_builder.py
src/**/*.{py,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Comments must explain WHY only; no reviewer citations / issue back-refs / migration framing
Files:
src/synthorg/workers/execution_service.pysrc/synthorg/workers/runtime_builder.py
src/synthorg/{api/lifecycle_builder,workers/runtime_builder}.py
📄 CodeRabbit inference engine (CLAUDE.md)
EnvironmentService wires in
_install_runtime_servicesbehindhas_persistence; provisions ambiently viaActiveSandboxEnvironmentcontextvar before engine run
Files:
src/synthorg/workers/runtime_builder.py
src/synthorg/workers/runtime_builder.py
📄 CodeRabbit inference engine (CLAUDE.md)
Runtime services:
synthorg.workers.runtime_builder.build_runtime_servicesreturns aRuntimeServicespair selected by ONE provider-present switch; useAgentEngineExecutionService+ coordinator orNoProviderExecutionService+Noneas backstop
Files:
src/synthorg/workers/runtime_builder.py
🧠 Learnings (4)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
tests/unit/workers/test_runtime_builder.pysrc/synthorg/workers/execution_service.pysrc/synthorg/workers/runtime_builder.py
📚 Learning: 2026-05-21T22:55:20.496Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/models.py:114-114
Timestamp: 2026-05-21T22:55:20.496Z
Learning: In this repo’s “magic number” review standard, the existing gate in `scripts/check_no_magic_numbers.py` intentionally does NOT flag numeric literals used as raw call-site arguments. So, do not flag numeric literals passed as keyword arguments to Pydantic `Field()` (e.g., `Field(ge=0, le=100)` / `Field(ge=1, le=50)`)—this is an established idiom. Only treat numeric literals as “magic numbers” when they occur in the locations the gate checks (module-level assignments and function/method parameter defaults).
Applied to files:
tests/unit/workers/test_runtime_builder.pysrc/synthorg/workers/execution_service.pysrc/synthorg/workers/runtime_builder.py
📚 Learning: 2026-05-23T12:24:00.128Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2080
File: tests/_shared/test_postgres_proxy.py:19-48
Timestamp: 2026-05-23T12:24:00.128Z
Learning: When creating test doubles for Python typing.Protocols in tests, prefer a hand-written Protocol fake (a concrete class that explicitly implements the Protocol) over `mock_of[T]` if the Protocol only defines annotation-only attributes (e.g., `username: str`, `password: str`, `dbname: str`) with no class-level values/assignments. This is because `mock_of[T]` relies on `create_autospec(..., spec_set=True)`, which enumerates members via `dir(spec)`; annotation-only attributes are not included, so `mock_of`’s kwarg-based attribute setting can raise `AttributeError: attribute not present on spec type`. In that annotation-only case, don’t recommend `mock_of[T]`—use an explicit fake class instead.
Applied to files:
tests/unit/workers/test_runtime_builder.py
📚 Learning: 2026-05-21T22:55:09.289Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 2035
File: src/synthorg/meta/toolsmith/config.py:29-30
Timestamp: 2026-05-21T22:55:09.289Z
Learning: For this repo’s Pydantic configuration idiom, do not treat numeric literals passed directly as arguments to `pydantic.Field(...)` as “magic numbers” during review. This includes call-site usages like `Field(default=0.2, ge=0.0, le=1.0)` (e.g., in config models such as `ToolAuthoringConfig`, `ToolValidationConfig`, `ToolsmithConfig`). Do not request extracting those `Field(...)` numeric arguments into named constants, since the repo’s `scripts/check_no_magic_numbers.py` intentionally excludes call-site `Field(...)` numerics and relies on `Field(...)` as the canonical way to express these constraints/defaults.
Applied to files:
src/synthorg/workers/execution_service.pysrc/synthorg/workers/runtime_builder.py
🔇 Additional comments (5)
src/synthorg/workers/execution_service.py (1)
1-1: LGTM!src/synthorg/workers/runtime_builder.py (1)
1-1: LGTM!Also applies to: 185-207, 576-597, 774-783, 821-825, 840-840
tests/unit/workers/test_runtime_builder.py (1)
3-5: LGTM!Also applies to: 9-10, 24-24, 289-355
.codespellrc (1)
7-7: LGTM!scripts/_module_size_baseline.json (1)
159-159: LGTM!
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2097 +/- ##
=======================================
Coverage 87.12% 87.13%
=======================================
Files 2251 2251
Lines 130302 130311 +9
=======================================
+ Hits 113531 113541 +10
+ Misses 16756 16755 -1
Partials 15 15 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…r coordinator resolve and subsystem-built summary log
- Add security_enabled / security_enforcement_mode to the agent_engine_built startup log so the post-construction runtime_services event matches the schema of the agent_engine decision event right above it (CodeRabbit inline, runtime_builder.py:826-834). - Cover the _build_runtime_coordinator try/except resolve-failure path so codecov/patch is back at 100% on the touched lines (new TestRuntimeCoordinatorResolveFailure + extended TestBootLogSafetySpineState assertion for the agent_engine_built event).
6dab029 to
aa7037c
Compare
<!-- HIGHLIGHTS_START --> ## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - New brownfield codebase intake mode supports merger and acquisition scenarios. - Added deep CEO interview feature to improve project charter creation. - Introduced mission control and flight recorder operator cockpit for better operational oversight. - Research mode added for enhanced exploratory work. - Runtime services now log safety-spine state at boot for clearer diagnostics. ### What's new - Research mode feature enables deeper data exploration. - CEO interview integration helps shape project charters. - Mission control and flight recorder cockpit introduced for operational tracking. ### Under the hood - Improved codebase modularity with module-size gates and lint tightening. - Added __init__.py to 21 test directories for better test discovery. - Promoted six transitive dependencies to direct dependencies for clarity. - Split codespell ignore list into vocabulary and source renames. - Decomposed oversized web utilities, hooks, and libraries for maintainability. - Enhanced CI with Lychee link checker integration and retry logic for cosign signing. - Sharded unit and integration tests and added Postgres service container in CI. - Updated infrastructure and web dependencies; maintained lock files. <!-- HIGHLIGHTS_END --> :robot: I have created a release *beep* *boop* --- ## [0.8.8](v0.8.7...v0.8.8) (2026-05-24) ### Features * brownfield codebase intake (merger/acquisition entry mode) ([#2042](#2042)) ([e287621](e287621)), closes [#1975](#1975) * deep CEO interview to project charter ([#2045](#2045)) ([904f2fb](904f2fb)) * mission control + flight recorder operator cockpit ([#2044](#2044)) ([1c2660b](1c2660b)) * research mode ([#2041](#2041)) ([f81a5ac](f81a5ac)), closes [#1989](#1989) * surface safety-spine state in runtime-services boot log (closes [#2096](#2096)) ([#2097](#2097)) ([f187b31](f187b31)) ### Refactoring * add __init__.py to 21 leaf test directories (INP001) ([#2081](#2081)) ([2592118](2592118)), closes [#2064](#2064) * codebase modularity (1/4) - module-size gates + lint tightening + tools ([#2078](#2078)) ([556fbd9](556fbd9)), closes [#2047](#2047) [#2040](#2040) * promote 6 transitive deps to direct deps ([#2083](#2083)) ([adedc6a](adedc6a)) * split codespell ignore-words-list into vocab + source renames ([#2085](#2085)) ([917d98a](917d98a)), closes [#2074](#2074) * **web:** PR A foundation, decompose oversized utils/hooks/lib ([#2092](#2092)) ([#2098](#2098)) ([aedbba5](aedbba5)) ### CI/CD * exclude slsa.dev from lychee (transient timeout on canonical badge) ([#2090](#2090)) ([346c51d](346c51d)) * fix paths-filter shallow-clone race and scorecard allowlist ([#2089](#2089)) ([7cd7ce8](7cd7ce8)) * refresh .test_durations.{unit,integration} ([#2087](#2087)) ([ddf2d86](ddf2d86)) * retry cosign sign on transient GHCR/Rekor failures ([#2100](#2100)) ([da9422a](da9422a)) * shard test-unit + test-integration, sysmon coverage, Postgres service container ([#2080](#2080)) ([0768787](0768787)) * wire Lychee link-checker (workflow + installer + pre-push hook) ([#2084](#2084)) ([1c0694a](1c0694a)) ### Maintenance * Lock file maintenance ([#2086](#2086)) ([a78810a](a78810a)) * Update Infrastructure dependencies ([#2055](#2055)) ([041ad8b](041ad8b)) * Update Web dependencies ([#2054](#2054)) ([4d57b9a](4d57b9a)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
Audit follow-up to merged #1956 (PR #2003 "Agent runtime online + minimal safety spine"). The audit found 13 of 14 acceptance / scope items PASS; the only WEAK item (composed end-to-end test) was already covered by
tests/e2e/test_runtime_online_seam.py. The single genuine gap was operator observability of the SecOps spine state at boot. This PR closes that gap.Closes #2096.
What changed
Runtime-services boot log carries the safety-spine state so operators reading
synthorg.logcan see, at a glance, whether the SecOps interceptor will beactive/shadow/disabledonce a provider runs. Two structured kwargs (security_enabled,security_enforcement_mode) added to all threelogger.info(API_APP_STARTUP, service="runtime_services", ...)call sites insrc/synthorg/workers/runtime_builder.py(the two no-provider branches in_select_active_providerand the provider-present branch inbuild_runtime_services).Three pre-PR-review findings folded in by user request:
# module-kind: orchestratorheader added tosrc/synthorg/workers/runtime_builder.py(was defaulting tocodetier with cap 500 but is an orchestrator at 840 gate-LOC; cap is now correctlyorchestrator600 with the baseline carrying the actual size).# module-kind: serviceheader added tosrc/synthorg/workers/execution_service.py(sibling classification gap; same shape)._build_runtime_coordinatorTaskGroup so a failure in decomposition / routing-scorer / workspace config resolve logs operator context (vialog_exception_redacted) before propagating, instead of escaping silently.logger.info(API_APP_STARTUP, mode="agent_engine_built", ...)at the end ofbuild_runtime_servicesreporting which optional subsystems (coordinator, work pipeline, red team, vision gate) wired, so operators can see the final boot shape in one line.Test plan
tests/unit/workers/test_runtime_builder.pyadds aTestBootLogSafetySpineStateclass with three tests asserting the newsecurity_enabled+security_enforcement_modekwargs appear on each of the three boot branches (no-provider, empty-registry, provider-present), usingstructlog.testing.capture_logs.module-size budgetandno-ghost-wiring.tests/e2e/test_runtime_online_seam.py::test_runtime_executes_task_through_seam_with_safety_spine(unchanged).Review coverage
Pre-reviewed by 4 agents per "reduced agent count" instruction:
code-reviewer(APPROVED),python-reviewer(APPROVED),logging-audit(COMPLIANT),conventions-enforcer(1 MAJOR + 1 MEDIUM). Mandatorydocs-consistency/comment-quality-rot/ mini-pass agents skipped per explicit user instruction (no docs, no comment narrative, no ghost wiring introduced).All four valid findings (MAJOR + MEDIUM + 2 SUGGESTION) applied in this PR per user direction.
Files
src/synthorg/workers/runtime_builder.py(+23 gate-LOC: 3 log sites + 2 local vars + try / except wrapper + summary log + module-kind header)src/synthorg/workers/execution_service.py(module-kind header only; 0 gate-LOC)tests/unit/workers/test_runtime_builder.py(newTestBootLogSafetySpineStateclass; +73 lines)scripts/_module_size_baseline.json(mechanical baseline bump 809 to 840 forruntime_builder.py; user-approved).codespellrc(added.test_durations.unit,.test_durations.integrationto skip list; was blocking pre-commit on a deliberateparse_booltest parametrize ID)Manifest
No new
_ghost_wiring_manifest.txtentries.AgentEngineis alreadyENFORCEDfrom #1956.