Skip to content

chore: close remaining 5 sub-tasks of #1781 (b/e/f/i/k)#1852

Merged
Aureliolo merged 15 commits into
mainfrom
chore/issue-1781-cleanup-finish
May 9, 2026
Merged

chore: close remaining 5 sub-tasks of #1781 (b/e/f/i/k)#1852
Aureliolo merged 15 commits into
mainfrom
chore/issue-1781-cleanup-finish

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Closes #1781.

Closes the final five sub-tasks of the cleanup umbrella; PR #1838 already
landed a/c/d/g/h/j/l/m. After this merges, #1781 has no remaining work.

Sub-tasks

-b — Prompt eval coverage + supervisor router temperature pin

  • Pin _ROUTING_COMPLETION_CONFIG = CompletionConfig(temperature=0.0) at
    module level on the hierarchical retrieval supervisor and pass it on
    both complete() calls (route and evaluate_for_retry). Without
    the pin, routing was non-deterministic across CI shards.
  • Four new prompt eval suites under tests/evals/prompt/ covering the
    surfaces the audit flagged:
    • test_supervisor_router_prompt.py (temperature, fingerprints, call sites)
    • test_health_judge_behaviour.py (8 labelled inputs graded at 100% accuracy)
    • test_code_modification_prompt.py (config-driven temperature, fingerprint)
    • test_procedural_memory_prompt.py (config-driven temperature, fingerprint, call site)

-e — Py3.14 CI flake investigation

The audit flagged "Python 3.14 test flakes from 2026-05-05 runs". The
single failed Test (Python 3.14) job that day was a deterministic
symlink-escape edge case in a newly-added gate test, not a flake.
Documented in docs/reference/py314-flake-investigation-2026-05.md
along with already-merged related mitigations (#1713, #1755).

-f — Pydantic Any -> JsonValue / concrete unions (10 sites)

  • ModelMappingConfig.seniority_model_map -> dict[str, str] and the
    Mapping-input branch becomes dead code (Pydantic v2 already coerces
    any Mapping-like input).
  • CoordinationMiddlewareContext.{decomposition,routing,dispatch}_result
    / status_rollup -> the concrete DecompositionResult /
    RoutingResult / DispatchResult / SubtaskStatusRollup types.
    Surfaced a real production bug: ProgressLedgerMiddleware was
    reading getattr(rollup, "completed_count", 0) which always
    returned 0 on real rollups; switched to rollup.completed.
  • Meta payload fields (RollbackOperation.previous_value,
    ConfigChange.{old,new}_value, ArchitectureChange.payload,
    RuleMatch.signal_context) -> JsonValue / dict[str, JsonValue].
    Propagated the narrowing through the meta strategies
    (config_tuning / architecture / prompt_tuning) so their
    internal _propose_* methods accept dict[str, JsonValue].

-i — SQL parameterize approval-status enums

Replace inline f-string interpolation of ApprovalStatus.EXPIRED.value
and ApprovalStatus.PENDING.value in expire_if_pending UPDATE
statements with bound parameters; SQLite (?) and Postgres (%s)
backends both updated. The remaining id-list interpolation is a
fixed ?,?,? count derived from len(ids), retained under noqa: S608.

-k — Wire placeholder stubs

meta.chat -> ChiefOfStaffChat: replace the silent placeholder
response with real DI: new _chief_of_staff_chat slot on AppState,
new build_chief_of_staff_chat() factory in app_builders.py that
picks an LLM provider from the registry when
meta.chief_of_staff.chat_enabled is True, async startup task that
constructs the backend, and a controller path that calls
ChiefOfStaffChat.ask(query, snapshot) using the live
SignalsService snapshot. When the backend or signals service is
unavailable, the endpoint surfaces an explicit 503
ServiceUnavailableError instead of the silent fake answer with
confidence=0.0 it returned previously.

MemoryContextProvider Phase 2: refactor StrategicContextProvider.provide
to be async across the protocol and all three implementations
(Config / Memory / Composite). MemoryContextProvider now accepts an
optional MemoryBackend and queries it at provide-time for
SEMANTIC entries on the synthetic system:strategy agent tagged
strategic-context, parsing the entry content as a JSON object of
overridable fields and layering the overrides on top of the fallback
context. Missing backend, retrieval errors, malformed JSON, and
non-string overrides all degrade gracefully to the fallback.
build_context() takes an optional memory_backend kwarg.

Pre-PR review

Four agents (python-reviewer, security-reviewer, persistence-reviewer,
issue-resolution-verifier) ran in parallel; all five sub-tasks
verified RESOLVED. One valid SEC-1 finding addressed in this branch:
chief_of_staff/chat.py was using logger.exception(...) (which
implicitly sets exc_info=True) on the LLM error path, which can
leak provider credentials carried in exception strings; replaced
with logger.error(..., error_type=..., error=safe_error_description(exc)).
Same commit also tightened the meta.chat controller's
has_* / *_service accessor pairs to read into local variables once.

Test plan

  • uv run ruff check src/ tests/ — passes
  • uv run mypy src/ tests/ — passes
  • uv run python -m pytest tests/unit/ — passes
  • uv run python -m pytest tests/conformance/persistence/ -k "approval and expire" — 18 passed
  • uv run python -m pytest tests/evals/prompt/ — all new + existing eval suites pass
  • uv run python -m pytest tests/unit/api/controllers/test_meta_chat.py — 3 new tests pass
  • uv run python -m pytest tests/unit/engine/strategy/test_context.py — 18 tests (3 prior + 15 new for Phase 2 wiring) pass

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 9, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d7133284-92cd-4d2a-8d7e-804f0148b18c

📥 Commits

Reviewing files that changed from the base of the PR and between bb449a4 and 2710034.

📒 Files selected for processing (34)
  • docs/reference/py314-flake-investigation-2026-05.md
  • scripts/mock_spec_baseline.txt
  • src/synthorg/api/app.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/state.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/engine/strategy/context.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/meta/models.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/observability/events/meta.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/engine/strategy/test_context.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/unit/meta/test_strategies.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Deploy Preview
  • GitHub Check: Build Backend
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Lighthouse Site
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: use @pytest.mark.{unit,integration,e2e,slow}. Async tests use auto. Global timeout 30s. Coverage 80% minimum.
xdist: use -n 8 --dist=loadfile (auto-applied via pyproject addopts; loadfile prevents 3.14+ Windows ProactorEventLoop leak).
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Mock-spec: every Mock declares spec=ConcreteClass; baseline at scripts/mock_spec_baseline.txt.
FakeClock: import from tests._shared.fake_clock; inject via clock= parameter.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Never skip/xfail for flaky tests; fix fundamentally.
Use asyncio.Event().wait() instead of sleep(large) to avoid flaky tests.

Files:

  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_strategies.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • tests/unit/engine/strategy/test_context.py
  • tests/evals/prompt/test_supervisor_router_prompt.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_strategies.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • tests/unit/engine/strategy/test_context.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic naming: NEVER use real vendor names in project code/tests. Use example-provider, test-provider, example-{large,medium,small}-001. Allowed only in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/.

Files:

  • tests/unit/engine/middleware/test_coordination_protocol.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/api/state.py
  • tests/unit/meta/test_code_modification_strategy.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/observability/events/meta.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_strategies.py
  • src/synthorg/api/app.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • src/synthorg/meta/models.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/app_builders.py
  • tests/unit/api/controllers/test_meta_chat.py
  • src/synthorg/hr/promotion/config.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • src/synthorg/meta/strategies/config_tuning.py
  • tests/unit/engine/strategy/test_context.py
  • src/synthorg/engine/strategy/context.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL.

Files:

  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/api/state.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/observability/events/meta.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/models.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/engine/strategy/context.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Configuration precedence: DB > env > YAML > code default via SettingsService/ConfigResolver; no os.environ.get outside startup.
No hardcoded numeric values except allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2. Numerics live in settings/definitions/. Enforced by scripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py.
Do not use from __future__ import annotations (Python 3.14+ has PEP 649). Use PEP 758 except: except A, B: requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Define errors as <Domain><Condition>Error inheriting from DomainError; never inherit directly from Exception, RuntimeError, etc. Enforced by check_domain_error_hierarchy.py.
Pydantic v2: frozen + extra="forbid" on API DTOs (Request/Response/Snapshot/Result/Envelope/Status/Info/Summary suffixes); use @computed_field for derived; use NotBlankStr for identifiers.
Use args models at every system boundary; call parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py.
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries.
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError).
Clock seam: inject clock: Clock | None = None parameter; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable.
Untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML.
Import logger as from synthorg.observability import get_logger; variable always named logger. Never import logging or print() in app code.
Event names from observability.events.<domain> constants; use stru...

Files:

  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/api/state.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/observability/events/meta.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/api/app.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/meta/models.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/engine/strategy/context.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/api/state.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/observability/events/meta.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/api/app.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/meta/models.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/engine/strategy/context.py
src/synthorg/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/persistence/**/*.py: Repository CRUD: implement save(entity), get(id), delete(id) -> bool, list_items(...), query(...) returning tuples.
Datetime in persistence: use parse_iso_utc / format_iso_utc from persistence._shared (reject naive); use normalize_utc for already-typed.

Files:

  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/persistence/postgres/approval_repo.py
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

docs/**/*.md: Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers.
Web dashboard uses D2 for architecture/nested containers, mermaid for flowcharts/sequence/pipelines. Use Markdown tables for tabular data. D2 theme 200 (Dark Mauve), D2 CLI v0.7.1 pinned in CI.

Files:

  • docs/reference/py314-flake-investigation-2026-05.md
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/unit/engine/middleware/test_coordination_protocol.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/api/state.py
  • tests/unit/meta/test_code_modification_strategy.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/observability/events/meta.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_strategies.py
  • src/synthorg/api/app.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • src/synthorg/meta/models.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/app_builders.py
  • tests/unit/api/controllers/test_meta_chat.py
  • src/synthorg/hr/promotion/config.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • src/synthorg/meta/strategies/config_tuning.py
  • tests/unit/engine/strategy/test_context.py
  • src/synthorg/engine/strategy/context.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
🪛 LanguageTool
docs/reference/py314-flake-investigation-2026-05.md

[grammar] ~33-~33: Ensure spelling is correct
Context: ... deterministic environment quirk, not a flake. Subsequent runs of the same test on id...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (53)
docs/reference/py314-flake-investigation-2026-05.md (1)

1-67: Looks good — clear investigation record and actionable follow-up steps.

The write-up is specific (run IDs, failing test, root cause, and repro plan) and appropriately justifies why no additional mitigation is queued.

tests/unit/meta/test_appliers.py (1)

816-822: Good update: valid JSON-path coverage is now explicit and realistic.

This keeps the test aligned with the actual Pydantic boundary behavior and exercises the reachable happy path.

Also applies to: 830-830

src/synthorg/persistence/sqlite/approval_repo.py (2)

327-338: Good hardening in expire_if_pending SQL construction.

Binding EXPIRED, PENDING, and ids via params while interpolating only fixed placeholder markers is the right pattern and preserves atomic RETURNING id semantics.


453-453: limit default is documented consistently with local lint policy.

src/synthorg/persistence/postgres/approval_repo.py (2)

305-317: Nice parity update: status enums are now fully parameterized in Postgres CAS update.

This removes inline enum interpolation and keeps the compare-and-set update path safe and deterministic.


395-395: Default page-size lint annotation matches the repository convention.

tests/evals/prompt/test_health_judge_behaviour.py (4)

25-37: Helper signal factory is clean and deterministic.

This keeps test inputs concise and repeatable across scenarios.


39-120: Labelled example matrix is well-structured and regression-focused.

Coverage spans key causes plus no-ticket cases, including threshold-edge behavior.


122-141: Grader-based contract assertion is solid.

Requiring 100% labelled accuracy is appropriate for heuristic regression protection.


143-171: Severity boundary tests are precise and valuable.

The HIGH vs CRITICAL split at threshold boundaries is clearly validated.

src/synthorg/hr/promotion/config.py (1)

81-103: Type narrowing + immutable wrapping are implemented cleanly.

This change tightens the field contract and preserves read-only behavior after validation without introducing side effects.

src/synthorg/engine/middleware/coordination_protocol.py (2)

15-30: Concrete pipeline result imports look correct.

Good move to import explicit DispatchResult/DecompositionResult/SubtaskStatusRollup/RoutingResult types instead of relying on Any.


66-81: Context field narrowing is a solid type-safety improvement.

Switching these fields to concrete *Result | None strengthens validation and reduces downstream ambiguity.

tests/unit/engine/middleware/test_coordination_protocol.py (1)

301-301: Type-ignore adjustment is appropriate for frozen-field assignment test.

This keeps static checks aligned while preserving the runtime ValidationError assertion.

src/synthorg/engine/middleware/coordination_constraints.py (1)

147-152: Using rollup.completed as the canonical source is the right fix.

This is cleaner and safer than attribute-probing fallback logic for completed_count.

tests/unit/engine/middleware/test_coordination_constraints.py (2)

80-119: Typed test helpers/context wiring are well aligned with the production model changes.

Using concrete DecompositionResult and SubtaskStatusRollup here improves contract fidelity.


144-145: Updated call sites correctly exercise typed rollup/decomposition paths.

Nice coverage alignment with the CoordinationMiddlewareContext type narrowing.

Also applies to: 158-159, 182-183, 192-193, 243-244

scripts/mock_spec_baseline.txt (1)

2243-2249: Baseline update looks correct.

These entries are consistent with a line/column refresh of existing allowlisted call sites.

src/synthorg/meta/strategies/architecture.py (1)

10-11: Type narrowing is clean and consistent.

The updated context annotations align with the strategy contract without altering behavior.

Also applies to: 100-100, 146-146

tests/unit/meta/rollout/test_inverse_dispatch.py (1)

4-4: Good test helper typing update.

This keeps the helper contract aligned with the updated rollback model field type.

Also applies to: 60-60

tests/unit/meta/test_strategies.py (1)

4-4: Looks good.

The rule-context typing updates are consistent across helper and fixture usage.

Also applies to: 66-66, 156-156

tests/unit/meta/test_code_modification_strategy.py (1)

7-7: Nice consistency improvement in test typing.

The _rule helper now matches the updated context shape expectations.

Also applies to: 67-67

src/synthorg/meta/strategies/config_tuning.py (1)

13-14: Consistent type-tightening across all config proposal helpers.

This is a clean, uniform update with no logic drift.

Also applies to: 118-118, 167-167, 212-212, 262-262, 308-308, 352-352

src/synthorg/meta/strategies/prompt_tuning.py (1)

11-12: Good localized typing update.

The context annotation tightening is clear and consistent here.

Also applies to: 152-152

src/synthorg/meta/models.py (1)

10-10: Model typing changes are coherent and well-applied.

The JSON-compatible field type tightening is consistent across related models.

Also applies to: 18-18, 124-124, 160-161, 180-180, 439-439

tests/evals/prompt/test_code_modification_prompt.py (1)

60-77: Avoid first-call indexing for CompletionConfig assertions.

Line 60 (config_calls[0]) is still order-sensitive and can fail/false-pass after unrelated refactors. This concern was already raised earlier and the same fix pattern applies here.

Suggested minimal refactor
-        call = config_calls[0]
-        assert any(
-            kw.arg == "temperature"
-            and ast.unparse(kw.value) == "self._code_config.temperature"
-            for kw in call.keywords
-        ), (
-            "CompletionConfig.temperature must come from "
-            "self._code_config.temperature so the value stays runtime-"
-            "tunable rather than hardcoded"
-        )
-        assert any(
-            kw.arg == "max_tokens"
-            and ast.unparse(kw.value) == "self._code_config.max_tokens"
-            for kw in call.keywords
-        ), (
-            "CompletionConfig.max_tokens must come from "
-            "self._code_config.max_tokens so token budgets stay tunable"
-        )
+        assert any(
+            any(
+                kw.arg == "temperature"
+                and ast.unparse(kw.value) == "self._code_config.temperature"
+                for kw in call.keywords
+            )
+            and any(
+                kw.arg == "max_tokens"
+                and ast.unparse(kw.value) == "self._code_config.max_tokens"
+                for kw in call.keywords
+            )
+            for call in config_calls
+        ), (
+            "CompletionConfig must bind both temperature and max_tokens "
+            "from self._code_config"
+        )
src/synthorg/memory/retrieval/hierarchical/supervisor.py (1)

83-86: Deterministic routing config is correctly pinned and reused.

The module-level config and both complete(...) call sites consistently use config=_ROUTING_COMPLETION_CONFIG, which cleanly enforces deterministic routing/retry behavior.

Also applies to: 232-236, 301-305

tests/evals/prompt/test_supervisor_router_prompt.py (1)

66-102: AST-based call-site contract check is solid.

Great choice to assert real complete(..., config=...) bindings via AST instead of source-string matching; this materially reduces false positives.

tests/evals/prompt/test_procedural_memory_prompt.py (1)

70-79: The procedural prompt contract checks are now robust against AST-order drift.

The order-independent CompletionConfig check and provider-scoped complete(...) matching make this suite much less brittle to harmless refactors.

Also applies to: 131-156

src/synthorg/meta/chief_of_staff/chat.py (1)

252-258: LGTM!

The exception handling correctly follows SEC-1 guidelines: it logs structured error details using error_type=type(exc).__name__ and error=safe_error_description(exc) without exposing raw exception strings or using exc_info=True, then re-raises the exception.

src/synthorg/observability/events/meta.py (1)

173-183: LGTM!

The new META_CHAT_DEPENDENCY_UNAVAILABLE event constant follows the established naming convention and the comment accurately describes its purpose for warning-level diagnostics when the /meta/chat endpoint cannot be served due to missing dependencies.

src/synthorg/api/app_builders.py (1)

132-176: LGTM!

The build_chief_of_staff_chat factory function is well-structured: it correctly handles the disabled case, logs a warning when chat is enabled but no providers are available, and cleanly returns None for the 503 fallback path. The in-function import of ChiefOfStaffChat avoids the circular import described in the comment block.

src/synthorg/api/state_services_facades.py (1)

176-176: LGTM!

The new _chief_of_staff_chat slot is initialized to None following the established pattern for all other facade service slots in this method.

src/synthorg/api/state.py (1)

173-173: LGTM!

The _chief_of_staff_chat slot is correctly added to AppState.__slots__ in alphabetical order, matching the initialization in state_services_facades.py and the accessors in state_services_facades_mcp3.py.

src/synthorg/api/app.py (1)

964-992: LGTM!

The _wire_chief_of_staff_chat startup hook is correctly implemented:

  • Idempotent via the has_chief_of_staff_chat check at line 972
  • Gracefully handles missing provider_registry
  • Only attaches the backend when build_chief_of_staff_chat returns a non-None value
src/synthorg/api/controllers/meta.py (2)

306-379: LGTM!

The /meta/chat endpoint implementation is solid:

  • Correctly sets status_code=200 for this query-only POST endpoint
  • Properly logs META_CHAT_DEPENDENCY_UNAVAILABLE with context before raising ServiceUnavailableError for both missing dependencies
  • Constructs ChatQuery with the optional proposal_id and alert_id from the request
  • Returns the structured response from the chat backend

30-37: LGTM!

The ChatRequest model correctly follows Pydantic DTO conventions with frozen=True and extra="forbid", uses NotBlankStr for the question field with a reasonable max length, and adds optional UUID fields for contextual queries.

src/synthorg/api/state_services_facades_mcp3.py (1)

203-225: LGTM!

The ChiefOfStaffChat facade accessors follow the established pattern exactly: has_chief_of_staff_chat checks for presence, the property accessor uses _require_service for 503 behavior, and the setter uses _attach_service which emits the standard audit event.

tests/unit/api/controllers/test_meta_chat.py (3)

28-49: Strong deterministic snapshot fixture for controller tests.

The _empty_snapshot() helper is well-typed and keeps test setup explicit and repeatable.


77-128: Great wiring verification for ChiefOfStaffChat.ask().

This test now checks both payload shaping and call contract (ChatQuery fields + exact snapshot object), which meaningfully protects the controller wiring path.


56-75: Good 503 guard coverage for missing dependencies.

Both unavailable-service branches are asserted with explicit status and response shape, matching the intended fail-closed behavior.

Also applies to: 130-153

src/synthorg/observability/events/strategy.py (1)

19-19: LGTM!

The new event constant follows the established naming convention and is properly typed.

src/synthorg/engine/strategy/context.py (6)

8-34: LGTM!

Imports are properly organized and constants use NotBlankStr for identifiers as required.


36-57: LGTM!

The args model correctly uses frozen=True, strict=True, and extra="ignore" for forward-compatible boundary parsing of external memory content. The docstring clearly explains the validation contract.


59-97: LGTM!

Protocol and provider correctly define async provide methods with proper structured logging.


126-196: LGTM!

The provide method correctly handles all fallback scenarios with proper PEP 758 exception syntax, uses safe_error_description for safe logging, validates external data via parse_typed, and applies overrides immutably via model_copy.


199-237: LGTM!

CompositeContextProvider properly chains providers with async support, re-raises system errors (MemoryError/RecursionError), and raises RuntimeError with proper chaining when all providers fail.


240-287: LGTM!

The async build_context factory correctly selects providers based on ContextSource and the scaffolding comment clearly explains why CompositeContextProvider wraps a single provider today.

tests/unit/engine/strategy/test_context.py (5)

27-36: LGTM!

The _entry() helper cleanly constructs MemoryEntry fixtures with proper types, improving test readability.


39-65: LGTM!

Config provider tests correctly verify async behavior and field mapping.


68-223: LGTM!

Comprehensive coverage of MemoryContextProvider fallback paths and override application logic. Mocks properly use spec=MemoryBackend per guidelines.


226-279: LGTM!

Composite provider tests cover success, validation, fallback, and failure scenarios with clear inline helper classes.


282-338: LGTM!

build_context factory tests comprehensively cover all ContextSource branches with both degraded (no-backend) and happy (with-backend) paths.


Walkthrough

This PR tightens JSON typing across Pydantic models and strategy contexts, converts strategic-context providers to async with a memory-backed provider, and wires a Chief‑of‑Staff chat backend into app startup, AppState facades, and the /meta/chat endpoint (with dependency guards and observability). It pins supervisor routing CompletionConfig to temperature=0.0, adjusts ProgressLedger rollup semantics, parameterizes approval SQL, and adds multiple prompt-eval and unit tests plus a Python-3.14 CI investigation doc and mock-spec baseline updates.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 9, 2026

Merging this PR will not alter performance

✅ 33 untouched benchmarks
⏩ 21 skipped benchmarks1


Comparing chore/issue-1781-cleanup-finish (2710034) with main (e96ae20)

Open in CodSpeed

Footnotes

  1. 21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/hr/promotion/config.py (1)

80-102: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Align seniority_model_map's declared type with its immutable runtime value.

The field is declared as mutable dict[str, str] but line 101 stores a MappingProxyType. This creates a misleading API: code that attempts mutations will type-check but fail at runtime. Change the annotation to Mapping[str, str] to reflect the immutable contract enforced by the validator.

Proposed fix
+from collections.abc import Mapping
 from types import MappingProxyType
 from typing import Self
@@
-    seniority_model_map: dict[str, str] = Field(
+    seniority_model_map: Mapping[str, str] = Field(
         default_factory=dict,
         description="Explicit seniority level to model ID overrides "
         "(wrapped as MappingProxyType after validation)",
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/hr/promotion/config.py` around lines 80 - 102, The field
annotation for seniority_model_map is misleading: change its declared type from
dict[str, str] to Mapping[str, str] to reflect that _validate_model_map_keys
wraps the value in MappingProxyType (making it immutable at runtime); update the
Field declaration's type only (seniority_model_map) so static type-checkers
match the runtime behavior enforced by MappingProxyType in the
_validate_model_map_keys method and ensure imports include typing.Mapping if not
already present.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/api/app.py`:
- Around line 957-980: The startup hook _wire_chief_of_staff_chat must avoid
calling the one-shot setter set_chief_of_staff_chat if the AppState already has
a chief-of-staff chat set; modify _wire_chief_of_staff_chat to check the
AppState (e.g., an existing flag or accessor like
app_state.has_chief_of_staff_chat or similar) before calling
app_state.set_chief_of_staff_chat, and only call set_chief_of_staff_chat when
none is set (or alternatively catch and ignore the specific duplicate-set error
from set_chief_of_staff_chat), using build_chief_of_staff_chat and
provider_registry as before.

In `@src/synthorg/engine/strategy/context.py`:
- Around line 246-254: The COMPOSITE branch currently creates a
CompositeContextProvider wrapping a single MemoryContextProvider which already
uses ConfigContextProvider as its fallback, so the composite wrapper is
redundant; either remove the CompositeContextProvider and construct
MemoryContextProvider(config_provider, memory_backend) to match the MEMORY
branch, or if you intend to keep scaffolding for multiple providers add a short
comment above ContextSource.COMPOSITE explaining that CompositeContextProvider
is intentionally used to enable future multi-provider composition (mentioning
CompositeContextProvider, MemoryContextProvider, ConfigContextProvider,
config_provider, memory_backend in the comment).

---

Outside diff comments:
In `@src/synthorg/hr/promotion/config.py`:
- Around line 80-102: The field annotation for seniority_model_map is
misleading: change its declared type from dict[str, str] to Mapping[str, str] to
reflect that _validate_model_map_keys wraps the value in MappingProxyType
(making it immutable at runtime); update the Field declaration's type only
(seniority_model_map) so static type-checkers match the runtime behavior
enforced by MappingProxyType in the _validate_model_map_keys method and ensure
imports include typing.Mapping if not already present.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5e3a5ee0-7592-468c-827c-df8e8a9b66de

📥 Commits

Reviewing files that changed from the base of the PR and between 0b8b5bb and 14394c7.

📒 Files selected for processing (33)
  • docs/reference/py314-flake-investigation-2026-05.md
  • scripts/mock_spec_baseline.txt
  • src/synthorg/api/app.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/state.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/engine/strategy/context.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/meta/models.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/engine/strategy/test_context.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/unit/meta/test_strategies.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Type Check
  • GitHub Check: Lighthouse Site
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (7)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: use @pytest.mark.{unit,integration,e2e,slow}. Async tests use auto. Global timeout 30s. Coverage 80% minimum.
xdist: use -n 8 --dist=loadfile (auto-applied via pyproject addopts; loadfile prevents 3.14+ Windows ProactorEventLoop leak).
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Mock-spec: every Mock declares spec=ConcreteClass; baseline at scripts/mock_spec_baseline.txt.
FakeClock: import from tests._shared.fake_clock; inject via clock= parameter.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Never skip/xfail for flaky tests; fix fundamentally.
Use asyncio.Event().wait() instead of sleep(large) to avoid flaky tests.

Files:

  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/meta/test_strategies.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • tests/unit/engine/strategy/test_context.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • tests/evals/prompt/test_code_modification_prompt.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/meta/test_strategies.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • tests/unit/engine/strategy/test_context.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • tests/evals/prompt/test_code_modification_prompt.py
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic naming: NEVER use real vendor names in project code/tests. Use example-provider, test-provider, example-{large,medium,small}-001. Allowed only in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/.

Files:

  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • tests/unit/meta/test_strategies.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/meta/strategies/architecture.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/meta/test_appliers.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • src/synthorg/api/state.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • tests/unit/engine/strategy/test_context.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/meta/models.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/api/app_builders.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/engine/strategy/context.py
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL.

Files:

  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/api/state.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/meta/models.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/engine/strategy/context.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Configuration precedence: DB > env > YAML > code default via SettingsService/ConfigResolver; no os.environ.get outside startup.
No hardcoded numeric values except allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2. Numerics live in settings/definitions/. Enforced by scripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py.
Do not use from __future__ import annotations (Python 3.14+ has PEP 649). Use PEP 758 except: except A, B: requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Define errors as <Domain><Condition>Error inheriting from DomainError; never inherit directly from Exception, RuntimeError, etc. Enforced by check_domain_error_hierarchy.py.
Pydantic v2: frozen + extra="forbid" on API DTOs (Request/Response/Snapshot/Result/Envelope/Status/Info/Summary suffixes); use @computed_field for derived; use NotBlankStr for identifiers.
Use args models at every system boundary; call parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py.
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries.
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError).
Clock seam: inject clock: Clock | None = None parameter; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable.
Untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML.
Import logger as from synthorg.observability import get_logger; variable always named logger. Never import logging or print() in app code.
Event names from observability.events.<domain> constants; use stru...

Files:

  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/api/state.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/meta/models.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/engine/strategy/context.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/api/state.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/meta/models.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/engine/strategy/context.py
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

docs/**/*.md: Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers.
Web dashboard uses D2 for architecture/nested containers, mermaid for flowcharts/sequence/pipelines. Use Markdown tables for tabular data. D2 theme 200 (Dark Mauve), D2 CLI v0.7.1 pinned in CI.

Files:

  • docs/reference/py314-flake-investigation-2026-05.md
src/synthorg/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/persistence/**/*.py: Repository CRUD: implement save(entity), get(id), delete(id) -> bool, list_items(...), query(...) returning tuples.
Datetime in persistence: use parse_iso_utc / format_iso_utc from persistence._shared (reject naive); use normalize_utc for already-typed.

Files:

  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/persistence/sqlite/approval_repo.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • tests/unit/meta/test_strategies.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/meta/strategies/architecture.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/meta/test_appliers.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • src/synthorg/api/state.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • tests/unit/engine/strategy/test_context.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/meta/models.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/api/app_builders.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/engine/strategy/context.py
🪛 LanguageTool
docs/reference/py314-flake-investigation-2026-05.md

[grammar] ~33-~33: Ensure spelling is correct
Context: ... deterministic environment quirk, not a flake. Subsequent runs of the same test on id...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (48)
tests/unit/meta/test_appliers.py (1)

823-823: Valid test input type now matches the intended payload shape.

Switching tool_access to a list in the happy-path test is the right fix and keeps this case consistent with the validator contract.

tests/evals/prompt/test_health_judge_behaviour.py (4)

39-41: Good test classification and scope.

Class-level @pytest.mark.unit is correctly applied for this deterministic behavioural suite.


43-120: Strong labelled example coverage for regression safety.

The cases pin core branches and boundaries (stagnation/error paths, threshold edges, streak break, and double-threshold behavior) cleanly.


122-142: Strict 100% grading gate is the right contract here.

Using a hard accuracy requirement is a solid guardrail for heuristic regression detection.


143-171: Severity threshold behavior is explicitly locked down.

Separate tests for HIGH at threshold and CRITICAL at 2x threshold make escalation semantics unambiguous.

src/synthorg/engine/middleware/coordination_constraints.py (1)

147-151: Progress derivation now correctly uses the canonical rollup field.

Nice fix: reading from rollup.completed and persisting into ProgressLedger.completed_count resolves the prior field mismatch cleanly.

src/synthorg/engine/middleware/coordination_protocol.py (2)

15-17: Concrete type imports are a solid boundary-tightening step.

These imports support explicit result typing and remove ambiguity in middleware context state.

Also applies to: 22-25, 30-30


66-81: Context field narrowing is a strong improvement for middleware safety.

Switching to DecompositionResult | None, RoutingResult | None, DispatchResult | None, and SubtaskStatusRollup | None makes the lifecycle states explicit and type-checkable.

tests/unit/engine/middleware/test_coordination_protocol.py (1)

301-301: Type-ignore annotation is now more precise.

Good refinement to keep the ignore scoped to the concrete assignment-check category.

tests/unit/engine/middleware/test_coordination_constraints.py (4)

11-11: Import updates correctly align tests with typed middleware contracts.

Using concrete decomposition/rollup models here is the right move for parity with production annotations.

Also applies to: 16-19, 37-37


80-102: Typed rollup/decomposition helpers improve test realism and stability.

Constructing real DecompositionResult and SubtaskStatusRollup objects makes these tests much more representative than placeholder values.


106-107: _mw_context signature narrowing is consistent with the protocol changes.

This keeps test fixtures aligned with the stricter CoordinationMiddlewareContext field types.


144-145: Updated callsites correctly exercise the new typed path end-to-end.

These replacements ensure assertions validate actual model behavior, not permissive mock/string inputs.

Also applies to: 158-159, 182-183, 192-193, 243-244

docs/reference/py314-flake-investigation-2026-05.md (1)

1-66: LGTM! Well-structured investigation report.

The document clearly distinguishes the deterministic symlink-resolution edge case from a genuine flake, provides actionable future guidance, and properly uses Markdown tables for the run data. The technical explanation is thorough and the rationale for not pursuing additional isolation work is sound.

tests/unit/meta/test_strategies.py (1)

4-4: JsonValue typing alignment in test helpers looks correct.

The updated helper/context annotations are consistent with JSON-safe rule payloads and keep the test surface strict without changing behavior.

Also applies to: 63-67, 156-157

src/synthorg/meta/strategies/prompt_tuning.py (1)

11-12: Prompt-tuning strategy typing update is clean and consistent.

JsonValue usage for ctx matches the strategy’s JSON context handling and keeps behavior unchanged.

Also applies to: 150-153

src/synthorg/meta/strategies/architecture.py (1)

10-11: Architecture strategy type narrowing is well-applied.

Both proposal builders now use JSON-compatible context typing consistently.

Also applies to: 98-101, 144-147

src/synthorg/meta/strategies/config_tuning.py (1)

13-14: Config-tuning helper signatures are consistently tightened.

ctx now uses dict[str, JsonValue] uniformly across all proposal builders, which keeps the strategy surface coherent.

Also applies to: 116-119, 165-168, 210-213, 260-263, 306-309, 350-353

scripts/mock_spec_baseline.txt (1)

2243-2249: Mock-spec baseline coordinate refresh looks correct.

The updated call-site coordinates for the shifted test file are consistent with baseline maintenance.

src/synthorg/memory/retrieval/hierarchical/supervisor.py (1)

39-40: Deterministic completion config is correctly centralized and applied.

Pinning temperature=0.0 once and passing it to both route and retry complete() calls cleanly removes provider-default variance.

Also applies to: 83-87, 232-236, 301-305

tests/evals/prompt/test_supervisor_router_prompt.py (1)

29-83: Prompt-contract coverage is strong and targeted.

The suite correctly pins temperature, fingerprints both prompts, and asserts pinned config usage at both production call sites.

src/synthorg/meta/models.py (1)

18-19: Model type tightening to JsonValue is consistent and beneficial.

These field updates improve DTO strictness for JSON-like payloads while preserving the existing model structure.

Also applies to: 124-125, 160-161, 180-181, 439-440

tests/unit/meta/rollout/test_inverse_dispatch.py (1)

4-4: Type narrowing to JsonValue is a good alignment with rollback model contracts.

This update improves test-helper type fidelity without changing behavior.

Also applies to: 56-61

tests/unit/meta/test_code_modification_strategy.py (1)

7-7: ctx typed as dict[str, JsonValue] is a solid contract-tightening change.

Looks correct and consistent with JSON-typed model usage in tests.

Also applies to: 64-68

tests/evals/prompt/test_code_modification_prompt.py (1)

27-81: Strong prompt-contract coverage for code-modification strategy.

Fingerprint pinning + anti-injection directive + config-driven CompletionConfig assertions are all valuable and well-scoped.

tests/evals/prompt/test_procedural_memory_prompt.py (1)

30-99: Procedural-memory prompt contract tests look robust and purposeful.

These checks provide good guardrails for prompt drift and completion-config regressions.

src/synthorg/observability/events/strategy.py (1)

19-19: LGTM!

The new event constant follows the established naming convention and is logically grouped with other strategy.context.* events.

src/synthorg/engine/strategy/context.py (4)

26-36: LGTM!

Module constants are well-documented with docstrings explaining their purpose. Using NotBlankStr for identifiers follows project conventions.


39-76: LGTM!

The async protocol and ConfigContextProvider implementation are clean. Structured logging with the STRATEGY_CONTEXT_BUILT event follows observability guidelines.


106-174: LGTM!

The MemoryContextProvider.provide implementation correctly:

  • Re-raises MemoryError/RecursionError per project convention
  • Uses safe_error_description(exc) for SEC-1 compliant logging
  • Validates override values as non-blank strings before applying
  • Uses model_copy(update=...) for immutability

The PLR0911 suppression is appropriate given the multiple fallback paths required by the graceful degradation design.


177-215: LGTM!

The CompositeContextProvider correctly chains providers with proper error handling. The RuntimeError on line 215 is appropriate for the "should never happen" fallback scenario documented in the comment.

tests/unit/engine/strategy/test_context.py (5)

27-36: LGTM!

The _entry helper cleanly constructs test fixtures with appropriate type annotations and realistic structure matching the production MemoryEntry model.


39-65: LGTM!

Tests correctly verify async ConfigContextProvider behavior with both default and custom configurations.


68-197: LGTM!

Comprehensive test coverage for MemoryContextProvider including all fallback paths (no backend, empty results, exceptions, invalid JSON, non-object JSON) and override validation (full, partial, blank/non-string filtering). Proper use of AsyncMock(spec=MemoryBackend) follows mock-spec guidelines.


200-253: LGTM!

Tests verify composite provider behavior: successful resolution, constructor validation, fallback on failure, and proper RuntimeError propagation when all providers fail.


256-294: LGTM!

Tests verify build_context factory behavior for all ContextSource variants, including degradation when no memory backend is provided and override application when one is injected.

src/synthorg/meta/chief_of_staff/chat.py (1)

252-257: SEC-1-safe error logging update looks good.

Using error_type plus safe_error_description(exc) here is the correct structured pattern.

src/synthorg/persistence/sqlite/approval_repo.py (1)

327-338: expire_if_pending parameterization is correctly applied.

Binding both status enums through params removes string interpolation risk while preserving the existing CAS behavior.

src/synthorg/persistence/postgres/approval_repo.py (1)

305-317: Postgres expire_if_pending status binding is implemented correctly.

Good parity with the SQLite backend and improved query safety/readability.

src/synthorg/api/state.py (1)

173-173: Slot update is correct.

Adding _chief_of_staff_chat to __slots__ is the right change for the new AppState field.

src/synthorg/api/state_services_facades.py (1)

176-176: Facade slot initialization is consistent.

Initializing _chief_of_staff_chat in _init_facade_service_slots() keeps AppState startup invariants intact.

src/synthorg/api/app_builders.py (1)

121-165: build_chief_of_staff_chat() is a solid boundary builder.

The opt-in gating and provider-availability checks are clear and align with the intended 503 fallback behavior upstream.

src/synthorg/api/controllers/meta.py (1)

312-358: Meta chat endpoint wiring is correctly implemented.

Good shift from placeholder output to real backend invocation with explicit 503 behavior when required services are unavailable.

src/synthorg/api/state_services_facades_mcp3.py (3)

4-12: Docstring update is clear and aligned with the new /meta/chat wiring.

Good addition of chief_of_staff_chat and endpoint context at module level.


33-73: Typed AppState slot wiring looks correct.

Import + slot declaration for ChiefOfStaffChat cleanly extends the existing facade typing pattern.


198-219: Chief-of-Staff facade triple follows the established one-shot pattern.

has_*, getter via _require_service, and set_* via _attach_service are consistent and safe.

tests/unit/api/controllers/test_meta_chat.py (2)

28-50: Snapshot helper is well-structured for deterministic endpoint tests.

The fully-typed OrgSignalSnapshot fixture keeps test setup explicit and stable.


52-136: Scenario coverage is solid for the new chat wiring contract.

The suite cleanly covers both unavailable paths and the successful wired path, with proper state rollback.

Comment thread src/synthorg/api/app.py
Comment thread src/synthorg/engine/strategy/context.py
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the Chief of Staff chat backend, enabling LLM-powered explanations for signals and proposals via a new API endpoint. It also enhances the strategic context system to support dynamic overrides from the memory backend and improves type safety across meta-models by replacing generic types with JsonValue. Critical feedback was provided regarding syntax errors in exception handling within src/synthorg/engine/strategy/context.py, where multiple exceptions were incorrectly separated by commas instead of being parenthesized as a tuple. Additionally, the PR includes security improvements by transitioning database queries to use parameterized inputs and adds comprehensive evaluation tests for prompt determinism.

limit=1,
),
)
except builtins.MemoryError, RecursionError:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The except statement uses a comma to separate multiple exceptions, which is invalid syntax in Python 3 and will result in a SyntaxError. To catch multiple exceptions, they must be provided as a parenthesized tuple. Additionally, for consistency with the use of builtins.MemoryError, RecursionError should also be prefixed with builtins..

        except (builtins.MemoryError, builtins.RecursionError):

return provider.provide(config=config)
except MemoryError, RecursionError:
return await provider.provide(config=config)
except builtins.MemoryError, RecursionError:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The except statement uses a comma to separate multiple exceptions, which is invalid syntax in Python 3 and will result in a SyntaxError. It should use a parenthesized tuple instead. For consistency, both exceptions should use the builtins. prefix.

            except (builtins.MemoryError, builtins.RecursionError):

@codecov
Copy link
Copy Markdown

codecov Bot commented May 9, 2026

Codecov Report

❌ Patch coverage is 87.96992% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.85%. Comparing base (e96ae20) to head (2710034).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/api/app_builders.py 27.27% 7 Missing and 1 partial ⚠️
src/synthorg/api/app.py 66.66% 2 Missing and 2 partials ⚠️
src/synthorg/engine/strategy/context.py 94.54% 2 Missing and 1 partial ⚠️
src/synthorg/api/state_services_facades_mcp3.py 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1852      +/-   ##
==========================================
- Coverage   84.86%   84.85%   -0.02%     
==========================================
  Files        1801     1801              
  Lines      104892   104993     +101     
  Branches     9198     9206       +8     
==========================================
+ Hits        89012    89087      +75     
- Misses      13649    13667      +18     
- Partials     2231     2239       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Aureliolo Aureliolo force-pushed the chore/issue-1781-cleanup-finish branch from 14394c7 to b805d5b Compare May 9, 2026 21:49
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 9, 2026 21:51 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/unit/meta/test_appliers.py (1)

815-828: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add test case for tuple acceptance in tool_access validation.

The validator at line 515 of architecture_applier.py still accepts both list | tuple for tool_access. However, the valid-path test only covers lists (line 823). Add a test case verifying that tuples also pass validation, either by extending the existing test_dry_run_tool_access_valid_passes or creating a parametrized variant that tests both list and tuple inputs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/meta/test_appliers.py` around lines 815 - 828, Extend the existing
test for tool_access validation to cover tuple inputs as well: update or
parametrize test_dry_run_tool_access_valid_passes to call
ArchitectureApplier().dry_run with a proposal whose payload.tool_access is a
tuple (e.g., ("git","shell")) in addition to the existing list case so both list
and tuple inputs are asserted to pass; reference the test name
test_dry_run_tool_access_valid_passes, the ArchitectureApplier class, the
dry_run method, and the tool_access payload key when making the change.
src/synthorg/api/app_builders.py (1)

21-29: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move type annotations for this public helper out of TYPE_CHECKING to make them runtime-resolvable.

build_chief_of_staff_chat() is a public function whose signature references ChiefOfStaffConfig, ProviderRegistry, CostTracker, and ChiefOfStaffChat—all imported only under TYPE_CHECKING. Under Python 3.14 + PEP 649, typing.get_type_hints() evaluates annotation names against module globals at runtime. These names won't exist in the namespace when introspectors (doc generators, type checkers, Litestar's plugin loader) call get_type_hints(), causing NameError.

The codebase has an established pattern for this: see src/synthorg/api/app_helpers.py:15-23, which explicitly imports ChannelsPlugin at runtime (with noqa: TC002) because it appears in public function signatures.

Move these four imports out of TYPE_CHECKING.

Also applies to: 121-126

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/api/app_builders.py` around lines 21 - 29, The four types used
in the public function signatures (ChiefOfStaffConfig, ProviderRegistry,
CostTracker, ChiefOfStaffChat) are only imported under TYPE_CHECKING which makes
them unavailable at runtime for typing.get_type_hints; move their imports out of
the TYPE_CHECKING block so they are real module-level names (follow the existing
pattern used for ChannelsPlugin imports) and ensure both occurrences referenced
in this file (the earlier import block and the later block around the other
occurrences) are updated so build_chief_of_staff_chat and any other public
functions can be introspected without NameError.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/api/controllers/meta.py`:
- Around line 329-345: The 503 branches skip the standard logging; before
raising ServiceUnavailableError for missing chief_of_staff_chat or
signals_service, call the existing AppState accessors or _require_service() (or
equivalent methods) to emit the standard WARNING/ERROR logs, or explicitly log a
warning with context using the same logger/format as other service checks
(reference app_state, chief_of_staff_chat, signals_service, and
AppState._require_service); then raise the ServiceUnavailableError. Ensure
messages include which service is missing and relevant context (e.g.,
meta.chief_of_staff.chat_enabled) to match project logging conventions.

In `@src/synthorg/engine/strategy/context.py`:
- Around line 140-166: The code parses JSON from MemoryBackend into a raw dict
and hand-validates fields instead of using the centralized args model; update
MemoryContextProvider's parsing to run the decoded payload through parse_typed()
with a small overrides args model (e.g., MemoryOverridesArgs) to
validate/normalize fields, then derive overrides from the parsed model instead
of ad hoc checks over payload and _OVERRIDABLE_FIELDS; keep the same fallback
flow by calling self._fallback.provide(config=config) when parse_typed raises or
the model indicates invalid/missing data, and use the model's typed attributes
when constructing the overrides passed to provide().
- Around line 108-110: The code emits STRATEGY_CONTEXT_BUILT before calling the
fallback, which double-counts builds; change the logic in the branch where
self._memory_backend is None (and where self._fallback.provide(config=...) is
called) so that you either (a) await self._fallback.provide first and then call
logger.debug(STRATEGY_CONTEXT_BUILT, source="memory_no_backend") after the
fallback returns, or (b) replace the emitted event with
STRATEGY_CONTEXT_MEMORY_QUERIED (or another non-build outcome) before awaiting
the fallback; update references around self._memory_backend,
self._fallback.provide, STRATEGY_CONTEXT_BUILT and
STRATEGY_CONTEXT_MEMORY_QUERIED accordingly.

In `@tests/evals/prompt/test_code_modification_prompt.py`:
- Around line 42-51: Replace fragile substring assertions with an AST-based
check: parse the string returned by inspect.getsource(CodeModificationStrategy)
(the source variable) with ast.parse, find the ClassDef for
CodeModificationStrategy, locate the Call node where CompletionConfig is
constructed, and assert that there are keyword arguments named "temperature" and
"max_tokens" whose values are ast.Attribute nodes referencing
self._code_config.temperature and self._code_config.max_tokens (i.e.,
ast.Attribute(value=ast.Attribute(value=Name(id='self'), attr='_code_config'),
attr='temperature'/'max_tokens')). Use this structural check instead of text
contains to ensure the constructor binds to self._code_config attributes.

In `@tests/evals/prompt/test_supervisor_router_prompt.py`:
- Around line 76-83: Replace the brittle source-string count with an AST-based
check: parse the string from inspect.getsource(supervisor) using ast.parse, walk
Call nodes and count calls whose function name/attribute is "complete" (cover
both attribute and direct name) and which have a keyword named "config" whose
value is an ast.Name with id "_ROUTING_COMPLETION_CONFIG"; then assert that this
AST-derived count is >= 2 to ensure both _route_via_llm and _evaluate_via_llm
call complete(config=_ROUTING_COMPLETION_CONFIG).

In `@tests/unit/engine/strategy/test_context.py`:
- Around line 288-294: Test currently only covers the degraded/no-backend path
for ContextSource.COMPOSITE; add a unit that constructs a StrategyConfig where
context.source == ContextSource.COMPOSITE and context.memory_backend (or the
appropriate StrategicContextConfig field) is set to a real/injected backend,
then call build_context(config) and assert the returned context reflects the
backend-backed path (e.g., maturity_stage or other backend-driven fields).
Locate the test function test_composite_source and create a variant (or extend
it) that injects a concrete memory backend instance into the
StrategicContextConfig, awaits build_context, and asserts the expected values to
prevent regressions in the COMPOSITE wrapper logic.

---

Outside diff comments:
In `@src/synthorg/api/app_builders.py`:
- Around line 21-29: The four types used in the public function signatures
(ChiefOfStaffConfig, ProviderRegistry, CostTracker, ChiefOfStaffChat) are only
imported under TYPE_CHECKING which makes them unavailable at runtime for
typing.get_type_hints; move their imports out of the TYPE_CHECKING block so they
are real module-level names (follow the existing pattern used for ChannelsPlugin
imports) and ensure both occurrences referenced in this file (the earlier import
block and the later block around the other occurrences) are updated so
build_chief_of_staff_chat and any other public functions can be introspected
without NameError.

In `@tests/unit/meta/test_appliers.py`:
- Around line 815-828: Extend the existing test for tool_access validation to
cover tuple inputs as well: update or parametrize
test_dry_run_tool_access_valid_passes to call ArchitectureApplier().dry_run with
a proposal whose payload.tool_access is a tuple (e.g., ("git","shell")) in
addition to the existing list case so both list and tuple inputs are asserted to
pass; reference the test name test_dry_run_tool_access_valid_passes, the
ArchitectureApplier class, the dry_run method, and the tool_access payload key
when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3953c9b5-65aa-43f6-bc46-b869e7cd2ef3

📥 Commits

Reviewing files that changed from the base of the PR and between 14394c7 and b805d5b.

📒 Files selected for processing (33)
  • docs/reference/py314-flake-investigation-2026-05.md
  • scripts/mock_spec_baseline.txt
  • src/synthorg/api/app.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/state.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/engine/strategy/context.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/meta/models.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/engine/strategy/test_context.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/unit/meta/test_strategies.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Deploy Preview
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Lighthouse Site
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL.

Files:

  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/api/state.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/meta/models.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/engine/strategy/context.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Configuration precedence: DB > env > YAML > code default via SettingsService/ConfigResolver; no os.environ.get outside startup.
No hardcoded numeric values except allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2. Numerics live in settings/definitions/. Enforced by scripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py.
Do not use from __future__ import annotations (Python 3.14+ has PEP 649). Use PEP 758 except: except A, B: requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Define errors as <Domain><Condition>Error inheriting from DomainError; never inherit directly from Exception, RuntimeError, etc. Enforced by check_domain_error_hierarchy.py.
Pydantic v2: frozen + extra="forbid" on API DTOs (Request/Response/Snapshot/Result/Envelope/Status/Info/Summary suffixes); use @computed_field for derived; use NotBlankStr for identifiers.
Use args models at every system boundary; call parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py.
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries.
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError).
Clock seam: inject clock: Clock | None = None parameter; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable.
Untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML.
Import logger as from synthorg.observability import get_logger; variable always named logger. Never import logging or print() in app code.
Event names from observability.events.<domain> constants; use stru...

Files:

  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/api/state.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/meta/models.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/engine/strategy/context.py
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic naming: NEVER use real vendor names in project code/tests. Use example-provider, test-provider, example-{large,medium,small}-001. Allowed only in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/.

Files:

  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_strategies.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • src/synthorg/api/app_builders.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • src/synthorg/api/state.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/meta/models.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/meta/strategies/config_tuning.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/engine/strategy/context.py
  • tests/unit/engine/strategy/test_context.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/api/state.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/meta/models.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/engine/strategy/context.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: use @pytest.mark.{unit,integration,e2e,slow}. Async tests use auto. Global timeout 30s. Coverage 80% minimum.
xdist: use -n 8 --dist=loadfile (auto-applied via pyproject addopts; loadfile prevents 3.14+ Windows ProactorEventLoop leak).
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Mock-spec: every Mock declares spec=ConcreteClass; baseline at scripts/mock_spec_baseline.txt.
FakeClock: import from tests._shared.fake_clock; inject via clock= parameter.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Never skip/xfail for flaky tests; fix fundamentally.
Use asyncio.Event().wait() instead of sleep(large) to avoid flaky tests.

Files:

  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_strategies.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • tests/unit/engine/strategy/test_context.py
  • tests/unit/engine/middleware/test_coordination_constraints.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_strategies.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • tests/unit/engine/strategy/test_context.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

docs/**/*.md: Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers.
Web dashboard uses D2 for architecture/nested containers, mermaid for flowcharts/sequence/pipelines. Use Markdown tables for tabular data. D2 theme 200 (Dark Mauve), D2 CLI v0.7.1 pinned in CI.

Files:

  • docs/reference/py314-flake-investigation-2026-05.md
src/synthorg/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/persistence/**/*.py: Repository CRUD: implement save(entity), get(id), delete(id) -> bool, list_items(...), query(...) returning tuples.
Datetime in persistence: use parse_iso_utc / format_iso_utc from persistence._shared (reject naive); use normalize_utc for already-typed.

Files:

  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/persistence/postgres/approval_repo.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_strategies.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • src/synthorg/api/app_builders.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • src/synthorg/api/state.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/meta/models.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/meta/strategies/config_tuning.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/engine/strategy/context.py
  • tests/unit/engine/strategy/test_context.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
🪛 LanguageTool
docs/reference/py314-flake-investigation-2026-05.md

[grammar] ~33-~33: Ensure spelling is correct
Context: ... deterministic environment quirk, not a flake. Subsequent runs of the same test on id...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (23)
src/synthorg/persistence/sqlite/approval_repo.py (2)

327-338: Status parameterization in expire_if_pending looks correct.

This keeps the compare-and-set query atomic while removing enum-value interpolation from SQL text.


453-453: Default pagination line change is safe.

The inline lint annotation does not alter repository behavior.

src/synthorg/persistence/postgres/approval_repo.py (2)

305-317: Good update: status values are now fully bound in the UPDATE.

This aligns the Postgres path with secure parameterized SQL handling while preserving CAS semantics.


395-395: list_items default-limit annotation is non-functional and safe.

docs/reference/py314-flake-investigation-2026-05.md (1)

1-66: LGTM! Thorough investigation documentation.

This document provides clear investigation methodology, detailed root cause analysis, and actionable future steps. The Markdown table is properly formatted per coding guidelines, and the content effectively communicates why no further isolation work is warranted.

Note: The LanguageTool hint flagging line 33 is a false positive—the spelling and grammar are correct.

src/synthorg/engine/middleware/coordination_protocol.py (2)

15-30: Good type import alignment for coordination pipeline results.

The imported result types are clear and match the narrowed context contract.


66-81: Type narrowing on CoordinationMiddlewareContext looks solid.

Switching these fields to concrete result types improves validation and downstream safety.

tests/unit/engine/middleware/test_coordination_protocol.py (1)

301-301: Targeted type-ignore update is appropriate.

Using a specific assignment ignore here is cleaner and better scoped.

src/synthorg/engine/middleware/coordination_constraints.py (1)

147-151: Canonical rollup.completed usage is the right fix.

The new snapshot source is explicit and avoids dynamic attribute ambiguity.

tests/unit/engine/middleware/test_coordination_constraints.py (2)

11-19: Typed helper refactor is well done.

These helper and context signature changes make the tests much closer to real model contracts.

Also applies to: 37-37, 80-119


144-144: Call-site updates correctly follow the new typed fixtures.

The revised inputs are consistent with the narrowed middleware context types.

Also applies to: 158-158, 182-182, 192-192, 243-243

src/synthorg/meta/strategies/architecture.py (1)

10-11: JsonValue narrowing is consistent and safe.

This typing refinement is clean and aligns with JSON-shaped signal context usage in these proposal builders.

Also applies to: 100-101, 146-147

src/synthorg/meta/strategies/prompt_tuning.py (1)

11-12: Type tightening for ctx looks correct.

The JsonValue annotation matches how rule context is consumed here and keeps behavior unchanged.

Also applies to: 152-153

src/synthorg/meta/strategies/config_tuning.py (1)

13-14: Consistent JSON context typing across proposal helpers.

Good sweep: all _propose_* context parameters now use the same dict[str, JsonValue] contract.

Also applies to: 118-119, 167-168, 212-213, 262-263, 308-309, 352-353

src/synthorg/memory/retrieval/hierarchical/supervisor.py (1)

39-40: Deterministic routing/retry wiring is implemented correctly.

Using a shared pinned CompletionConfig(temperature=0.0) and passing it at both completion call sites satisfies the non-drift determinism goal.

Also applies to: 83-87, 235-236, 304-305

src/synthorg/hr/promotion/config.py (1)

8-11: Model-map typing and immutability handling look solid.

Switching to Mapping[str, str] and preserving read-only semantics via MappingProxyType(copy.deepcopy(...)) is a good tightening.

Also applies to: 81-85, 96-103

tests/evals/prompt/test_health_judge_behaviour.py (1)

43-172: Excellent deterministic behavioral coverage for HealthJudge.emit_ticket.

The labelled examples plus explicit severity-threshold tests provide strong regression protection for heuristic logic.

src/synthorg/meta/models.py (1)

10-18: JsonValue migration is correctly applied.

These model field changes tighten JSON typing without introducing incompatibilities in the shown validators or model contracts.

Also applies to: 124-124, 160-161, 180-180, 439-439

tests/unit/meta/test_strategies.py (1)

4-4: Test typing alignment looks good.

The JsonValue annotations in helper/context fixtures are consistent with the production model narrowing and keep tests type-accurate.

Also applies to: 66-67, 156-163

tests/unit/meta/test_code_modification_strategy.py (1)

7-7: Helper context typing update is correct.

JsonValue usage in _rule keeps test inputs consistent with the stricter JSON-typed signal context contract.

Also applies to: 67-68

scripts/mock_spec_baseline.txt (1)

2243-2249: Mock baseline coordinate updates are consistent.

The adjusted entries maintain expected format/order for the frozen bare-mock allowlist.

tests/unit/meta/rollout/test_inverse_dispatch.py (1)

4-4: Rollback helper typing is correctly tightened.

_op(previous_value: JsonValue) now aligns with the production rollback model and keeps test fixtures schema-accurate.

Also applies to: 60-61

tests/evals/prompt/test_procedural_memory_prompt.py (1)

1-100: New prompt-contract eval suite looks solid.

The tests provide clear guardrails for prompt drift and completion-config wiring on the procedural memory proposer path.

Comment thread src/synthorg/api/controllers/meta.py
Comment thread src/synthorg/engine/strategy/context.py
Comment thread src/synthorg/engine/strategy/context.py Outdated
Comment thread tests/evals/prompt/test_code_modification_prompt.py
Comment thread tests/evals/prompt/test_supervisor_router_prompt.py
Comment thread tests/unit/engine/strategy/test_context.py
@Aureliolo Aureliolo force-pushed the chore/issue-1781-cleanup-finish branch from b805d5b to 50d1260 Compare May 9, 2026 22:23
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/synthorg/api/state_services_facades_mcp3.py (1)

42-42: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale service count in class docstring.

Line 42 says this mixin covers five services, but it now exposes six (including chief_of_staff_chat).

Suggested patch
-    """Facade accessors for the five META-MCP-3 services."""
+    """Facade accessors for the six META-MCP-3 services."""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/api/state_services_facades_mcp3.py` at line 42, Update the stale
docstring that currently says "Facade accessors for the five META-MCP-3
services" to reflect that the mixin now exposes six services (including
chief_of_staff_chat); edit the docstring in
src/synthorg/api/state_services_facades_mcp3.py (the class/mixin docstring) to
mention six services and optionally list or acknowledge the newly exposed
chief_of_staff_chat to keep the comment accurate.
src/synthorg/api/controllers/meta.py (1)

306-374: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Set status_code=200 for this query-only POST endpoint.

Litestar 2.21.1 defaults POST handlers to 201 Created when status_code is omitted. The /meta/chat endpoint queries the ChiefOfStaffChat backend and returns computed results—it does not create a server resource. POST operations without side effects should return 200 OK instead.

Proposed fix
     `@post`(
         "/chat",
+        status_code=200,
         guards=[
             require_org_mutation(),
             per_op_rate_limit_from_policy("meta.chat", key="user"),
         ],
     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/api/controllers/meta.py` around lines 306 - 374, The POST
handler method chat should explicitly set status_code=200 on its `@post` decorator
because it is a query-only endpoint (no resource creation) and Litestar defaults
POST to 201; update the `@post`(...) call that decorates async def chat(...) to
include status_code=200 so the endpoint returns 200 OK for successful queries to
ChiefOfStaffChat.
♻️ Duplicate comments (2)
tests/evals/prompt/test_procedural_memory_prompt.py (1)

44-52: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Harden source-contract checks by using AST instead of raw substring matching.

These two assertions are format-sensitive and can pass/fail for textual reasons rather than semantic wiring. Switching to AST keeps the contract robust across harmless refactors.

Proposed refactor
+import ast
 import inspect
@@
         from synthorg.memory.procedural.proposer import ProceduralMemoryProposer
 
         source = inspect.getsource(ProceduralMemoryProposer)
-        assert "temperature=config.temperature" in source, (
+        tree = ast.parse(source)
+        config_calls = [
+            node
+            for node in ast.walk(tree)
+            if isinstance(node, ast.Call)
+            and isinstance(node.func, ast.Name)
+            and node.func.id == "CompletionConfig"
+        ]
+        assert config_calls, "expected at least one CompletionConfig(...) call"
+        call = config_calls[0]
+        assert any(
+            kw.arg == "temperature"
+            and ast.unparse(kw.value) == "config.temperature"
+            for kw in call.keywords
+        ), (
             "ProceduralMemoryProposer must build CompletionConfig from "
             "config.temperature so the value stays runtime-tunable"
         )
-        assert "max_tokens=config.max_tokens" in source, (
+        assert any(
+            kw.arg == "max_tokens"
+            and ast.unparse(kw.value) == "config.max_tokens"
+            for kw in call.keywords
+        ), (
             "ProceduralMemoryProposer must read max_tokens from "
             "ProceduralMemoryConfig so token budgets stay tunable"
         )
@@
         from synthorg.memory.procedural.proposer import ProceduralMemoryProposer
 
         source = inspect.getsource(ProceduralMemoryProposer.propose)
-        assert "config=self._completion_config" in source, (
+        tree = ast.parse(source)
+        complete_calls = [
+            node
+            for node in ast.walk(tree)
+            if isinstance(node, ast.Call)
+            and isinstance(node.func, ast.Attribute)
+            and node.func.attr == "complete"
+        ]
+        assert any(
+            any(
+                kw.arg == "config"
+                and ast.unparse(kw.value) == "self._completion_config"
+                for kw in call.keywords
+            )
+            for call in complete_calls
+        ), (
             "ProceduralMemoryProposer.propose must pass "
             "config=self._completion_config to provider.complete; a "
             "fresh inline CompletionConfig() construction would defeat "
             "the runtime-tunable contract"
         )

Also applies to: 93-99

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/evals/prompt/test_procedural_memory_prompt.py` around lines 44 - 52,
Replace fragile substring assertions in the test with AST-based checks: parse
the source of ProceduralMemoryProposer (inspect.getsource) into an AST and
verify that the code constructs a CompletionConfig (or calls a
function/constructor named CompletionConfig) with keyword arguments that bind to
config.temperature and config.max_tokens respectively; update the checks around
the other similar assertions (the second group referencing the same class) to
use the same AST pattern-matching so the test verifies semantic wiring rather
than exact formatting.
src/synthorg/engine/strategy/context.py (1)

157-170: 🛠️ Refactor suggestion | 🟠 Major

The typed args model is good, but this boundary still skips parse_typed().

json.loads() is still ingesting backend data and handing it directly to model_validate(), so this path bypasses the repository’s standard boundary parser. Please run decoded through parse_typed() with _StrategicContextOverridesArgs and keep the same fallback behavior around validation failure.

As per coding guidelines, "Use args models at every system boundary; call parse_typed() for every external dict ingestion."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/engine/strategy/context.py` around lines 157 - 170, The code
currently calls _StrategicContextOverridesArgs.model_validate(decoded) directly
after json.loads, bypassing the repository's boundary parser; replace that
direct validation with a call to parse_typed(_StrategicContextOverridesArgs,
decoded) and use its result for args, preserving the current except/ fallback
behavior: if parse_typed raises/returns a validation error, log
STRATEGY_CONTEXT_PROVIDER_FAILED with the same provider_name/stage/error info
and return await self._fallback.provide(config=config); ensure references to
_StrategicContextOverridesArgs, parse_typed(), STRATEGY_CONTEXT_PROVIDER_FAILED
and self._fallback.provide(...) are used so the external dict always passes
through parse_typed before being accepted.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/observability/events/meta.py`:
- Around line 176-182: The comment explaining why the chat endpoint bypasses
AppState._require_service is inaccurate: update the comment text so it states
that the central accessor raises ServiceUnavailableError (503) rather than a
generic 500; reference AppState._require_service and ServiceUnavailableError in
the updated comment near the POST /meta/chat explanation so future readers
understand the endpoint must surface 503 (service unavailable) instead of
relying on the central accessor.

In `@tests/unit/api/controllers/test_meta_chat.py`:
- Around line 77-110: In the test_returns_chat_payload_when_wired test, add
assertions to verify the controller actually forwards the ChatQuery and snapshot
to the backend: assert chat_mock.ask was awaited once with a ChatQuery instance
containing the expected question and any forwarded fields like
proposal_id/alert_id, and assert the snapshot passed equals the value returned
by signals_mock.get_org_snapshot (use the _empty_snapshot() value). Reference
the test function name, chat_mock.ask, ChatQuery, and
signals_mock.get_org_snapshot when implementing these assertions to ensure the
wiring is covered.

---

Outside diff comments:
In `@src/synthorg/api/controllers/meta.py`:
- Around line 306-374: The POST handler method chat should explicitly set
status_code=200 on its `@post` decorator because it is a query-only endpoint (no
resource creation) and Litestar defaults POST to 201; update the `@post`(...) call
that decorates async def chat(...) to include status_code=200 so the endpoint
returns 200 OK for successful queries to ChiefOfStaffChat.

In `@src/synthorg/api/state_services_facades_mcp3.py`:
- Line 42: Update the stale docstring that currently says "Facade accessors for
the five META-MCP-3 services" to reflect that the mixin now exposes six services
(including chief_of_staff_chat); edit the docstring in
src/synthorg/api/state_services_facades_mcp3.py (the class/mixin docstring) to
mention six services and optionally list or acknowledge the newly exposed
chief_of_staff_chat to keep the comment accurate.

---

Duplicate comments:
In `@src/synthorg/engine/strategy/context.py`:
- Around line 157-170: The code currently calls
_StrategicContextOverridesArgs.model_validate(decoded) directly after
json.loads, bypassing the repository's boundary parser; replace that direct
validation with a call to parse_typed(_StrategicContextOverridesArgs, decoded)
and use its result for args, preserving the current except/ fallback behavior:
if parse_typed raises/returns a validation error, log
STRATEGY_CONTEXT_PROVIDER_FAILED with the same provider_name/stage/error info
and return await self._fallback.provide(config=config); ensure references to
_StrategicContextOverridesArgs, parse_typed(), STRATEGY_CONTEXT_PROVIDER_FAILED
and self._fallback.provide(...) are used so the external dict always passes
through parse_typed before being accepted.

In `@tests/evals/prompt/test_procedural_memory_prompt.py`:
- Around line 44-52: Replace fragile substring assertions in the test with
AST-based checks: parse the source of ProceduralMemoryProposer
(inspect.getsource) into an AST and verify that the code constructs a
CompletionConfig (or calls a function/constructor named CompletionConfig) with
keyword arguments that bind to config.temperature and config.max_tokens
respectively; update the checks around the other similar assertions (the second
group referencing the same class) to use the same AST pattern-matching so the
test verifies semantic wiring rather than exact formatting.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0e0d9f28-e3d1-429a-8ebf-59f77009f665

📥 Commits

Reviewing files that changed from the base of the PR and between b805d5b and 50d1260.

📒 Files selected for processing (34)
  • docs/reference/py314-flake-investigation-2026-05.md
  • scripts/mock_spec_baseline.txt
  • src/synthorg/api/app.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/state.py
  • src/synthorg/api/state_services_facades.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/engine/strategy/context.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/meta/models.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/observability/events/meta.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/engine/strategy/test_context.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/meta/test_appliers.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/unit/meta/test_strategies.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Backend
  • GitHub Check: Lighthouse Site
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL.

Files:

  • src/synthorg/api/state_services_facades.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/observability/events/meta.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/api/state.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/meta/models.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/engine/strategy/context.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Configuration precedence: DB > env > YAML > code default via SettingsService/ConfigResolver; no os.environ.get outside startup.
No hardcoded numeric values except allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2. Numerics live in settings/definitions/. Enforced by scripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py.
Do not use from __future__ import annotations (Python 3.14+ has PEP 649). Use PEP 758 except: except A, B: requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Define errors as <Domain><Condition>Error inheriting from DomainError; never inherit directly from Exception, RuntimeError, etc. Enforced by check_domain_error_hierarchy.py.
Pydantic v2: frozen + extra="forbid" on API DTOs (Request/Response/Snapshot/Result/Envelope/Status/Info/Summary suffixes); use @computed_field for derived; use NotBlankStr for identifiers.
Use args models at every system boundary; call parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py.
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries.
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError).
Clock seam: inject clock: Clock | None = None parameter; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable.
Untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML.
Import logger as from synthorg.observability import get_logger; variable always named logger. Never import logging or print() in app code.
Event names from observability.events.<domain> constants; use stru...

Files:

  • src/synthorg/api/state_services_facades.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/observability/events/meta.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/api/state.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/meta/models.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/engine/strategy/context.py
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic naming: NEVER use real vendor names in project code/tests. Use example-provider, test-provider, example-{large,medium,small}-001. Allowed only in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/.

Files:

  • src/synthorg/api/state_services_facades.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • tests/unit/meta/test_strategies.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • tests/unit/api/controllers/test_meta_chat.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • tests/unit/meta/test_code_modification_strategy.py
  • src/synthorg/observability/events/meta.py
  • tests/unit/meta/test_appliers.py
  • src/synthorg/meta/strategies/architecture.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • src/synthorg/api/state.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/hr/promotion/config.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/meta/models.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/app.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/api/controllers/meta.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • tests/unit/engine/strategy/test_context.py
  • src/synthorg/engine/strategy/context.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/api/state_services_facades.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • src/synthorg/observability/events/meta.py
  • src/synthorg/meta/strategies/architecture.py
  • src/synthorg/api/state.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/hr/promotion/config.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/meta/models.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/app.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • src/synthorg/engine/strategy/context.py
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

docs/**/*.md: Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers.
Web dashboard uses D2 for architecture/nested containers, mermaid for flowcharts/sequence/pipelines. Use Markdown tables for tabular data. D2 theme 200 (Dark Mauve), D2 CLI v0.7.1 pinned in CI.

Files:

  • docs/reference/py314-flake-investigation-2026-05.md
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: use @pytest.mark.{unit,integration,e2e,slow}. Async tests use auto. Global timeout 30s. Coverage 80% minimum.
xdist: use -n 8 --dist=loadfile (auto-applied via pyproject addopts; loadfile prevents 3.14+ Windows ProactorEventLoop leak).
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Mock-spec: every Mock declares spec=ConcreteClass; baseline at scripts/mock_spec_baseline.txt.
FakeClock: import from tests._shared.fake_clock; inject via clock= parameter.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Never skip/xfail for flaky tests; fix fundamentally.
Use asyncio.Event().wait() instead of sleep(large) to avoid flaky tests.

Files:

  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/meta/test_strategies.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/unit/meta/test_appliers.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/engine/strategy/test_context.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/engine/middleware/test_coordination_protocol.py
  • tests/unit/meta/test_strategies.py
  • tests/unit/api/controllers/test_meta_chat.py
  • tests/unit/meta/test_code_modification_strategy.py
  • tests/unit/meta/test_appliers.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • tests/unit/engine/strategy/test_context.py
src/synthorg/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/persistence/**/*.py: Repository CRUD: implement save(entity), get(id), delete(id) -> bool, list_items(...), query(...) returning tuples.
Datetime in persistence: use parse_iso_utc / format_iso_utc from persistence._shared (reject naive); use normalize_utc for already-typed.

Files:

  • src/synthorg/persistence/postgres/approval_repo.py
  • src/synthorg/persistence/sqlite/approval_repo.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/api/state_services_facades.py
  • src/synthorg/memory/retrieval/hierarchical/supervisor.py
  • src/synthorg/observability/events/strategy.py
  • tests/unit/engine/middleware/test_coordination_protocol.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • tests/unit/meta/test_strategies.py
  • src/synthorg/persistence/postgres/approval_repo.py
  • tests/unit/api/controllers/test_meta_chat.py
  • src/synthorg/engine/middleware/coordination_protocol.py
  • tests/unit/meta/test_code_modification_strategy.py
  • src/synthorg/observability/events/meta.py
  • tests/unit/meta/test_appliers.py
  • src/synthorg/meta/strategies/architecture.py
  • tests/evals/prompt/test_supervisor_router_prompt.py
  • src/synthorg/api/state.py
  • src/synthorg/meta/strategies/config_tuning.py
  • src/synthorg/persistence/sqlite/approval_repo.py
  • src/synthorg/hr/promotion/config.py
  • tests/evals/prompt/test_code_modification_prompt.py
  • src/synthorg/engine/middleware/coordination_constraints.py
  • src/synthorg/meta/models.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/evals/prompt/test_health_judge_behaviour.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/app.py
  • tests/unit/engine/middleware/test_coordination_constraints.py
  • src/synthorg/meta/strategies/prompt_tuning.py
  • src/synthorg/api/controllers/meta.py
  • tests/unit/meta/rollout/test_inverse_dispatch.py
  • src/synthorg/meta/chief_of_staff/chat.py
  • tests/unit/engine/strategy/test_context.py
  • src/synthorg/engine/strategy/context.py
🪛 LanguageTool
docs/reference/py314-flake-investigation-2026-05.md

[grammar] ~33-~33: Ensure spelling is correct
Context: ... deterministic environment quirk, not a flake. Subsequent runs of the same test on id...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (26)
tests/unit/meta/test_appliers.py (1)

816-822: Good correction to keep this test on the real JSON-validated path.

Switching tool_access to a list and documenting why tuple input is unreachable at runtime makes this test more accurate and less misleading.

Also applies to: 830-830

src/synthorg/hr/promotion/config.py (2)

81-85: Type narrowing for seniority_model_map looks good.

Mapping[str, str] at Line 81 cleanly removes loose typing and makes the config contract explicit.


96-103: Immutability enforcement is correctly retained after validation.

The Line 99-Line 103 deepcopy + MappingProxyType wrapping keeps the frozen config practically read-only for downstream consumers.

src/synthorg/persistence/sqlite/approval_repo.py (2)

327-338: Status parameterization in expire_if_pending looks correct.

Binding both transition statuses and ids through params preserves atomic update behavior while avoiding enum interpolation in SQL.


453-453: Intentional default page size is clearly documented.

The explicit lint-allow plus existing limit validation/clamp keeps pagination behavior controlled.

src/synthorg/persistence/postgres/approval_repo.py (2)

305-317: Postgres CAS update is safely parameterized.

Using bound params for both status values and the ANY(%s) id list is a solid hardening change without altering semantics.


395-395: Pagination default change is controlled and readable.

The default is explicit, and runtime bounds enforcement in the method continues to prevent oversized queries.

src/synthorg/engine/middleware/coordination_protocol.py (1)

15-30: Strong type narrowing on coordination pipeline context.

This update cleanly replaces broad result placeholders with concrete result models and keeps the context contract aligned with downstream middleware expectations.

Also applies to: 66-81

tests/unit/engine/middleware/test_coordination_protocol.py (1)

301-301: Type-ignore refinement is precise and appropriate for this negative test.

The annotation now matches the stricter assignment typing intent without changing runtime validation coverage.

src/synthorg/engine/middleware/coordination_constraints.py (1)

147-151: Canonical rollup progress source is correctly applied.

Using SubtaskStatusRollup.completed directly is clearer and keeps progress snapshots consistent round-to-round.

tests/unit/engine/middleware/test_coordination_constraints.py (1)

80-119: Typed test fixtures/context updates are well aligned with the production model narrowing.

The helper and callsite updates improve realism of unit inputs and keep tests synchronized with the stricter middleware context contracts.

Also applies to: 144-158, 182-243

scripts/mock_spec_baseline.txt (1)

2243-2249: Baseline refresh looks consistent.

The updated test_code_modification_strategy coordinates are aligned with the intended mock-spec baseline maintenance flow.

src/synthorg/memory/retrieval/hierarchical/supervisor.py (1)

39-40: Deterministic routing/retry config is wired correctly.

Centralizing CompletionConfig(temperature=0.0) and passing it to both LLM call sites cleanly enforces reproducible supervisor decisions.

Also applies to: 83-87, 107-109, 232-236, 301-305

tests/evals/prompt/test_supervisor_router_prompt.py (1)

30-31: Strong prompt-contract coverage.

This suite pins determinism and prompt drift effectively, and the AST-based check for config=_ROUTING_COMPLETION_CONFIG is a solid guard against brittle source-text matching.

Also applies to: 34-42, 44-64, 66-102

tests/evals/prompt/test_health_judge_behaviour.py (1)

39-41: Behavioral eval coverage is clear and deterministic.

The labelled matrix plus explicit HIGH/CRITICAL threshold assertions gives good regression protection for HealthJudge.emit_ticket.

Also applies to: 43-142, 143-172

src/synthorg/meta/models.py (1)

18-19: JsonValue tightening is applied consistently.

These field-type updates correctly narrow payload/context contracts to JSON-compatible values without changing behavior semantics.

Also applies to: 124-124, 160-161, 180-180, 439-439

src/synthorg/meta/strategies/prompt_tuning.py (1)

11-12: Context typing alignment looks good.

Using dict[str, JsonValue] here is consistent with the tightened meta model context contract.

Also applies to: 152-152

src/synthorg/meta/strategies/config_tuning.py (1)

13-14: JsonValue context migration is coherent across helpers.

The updated annotations consistently reflect JSON-shaped signal-context inputs throughout this strategy.

Also applies to: 118-118, 167-167, 212-212, 262-262, 308-308, 352-352

src/synthorg/meta/strategies/architecture.py (1)

10-11: Architecture strategy typing update is consistent.

dict[str, JsonValue] on context inputs aligns cleanly with the upstream rule-context contract.

Also applies to: 100-100, 146-146

tests/unit/meta/test_strategies.py (1)

4-4: JsonValue typing updates are clean and consistent.

These changes tighten test helper typing without altering behavior, and they align with the JSON-shaped model contracts.

Also applies to: 66-67, 156-157

tests/unit/meta/rollout/test_inverse_dispatch.py (1)

4-4: Rollback helper typing change looks good.

previous_value: JsonValue is a good fit for the rollback payload semantics and keeps the tests aligned with model typing.

Also applies to: 60-61

tests/unit/meta/test_code_modification_strategy.py (1)

7-7: Type narrowing in _rule is appropriate.

The JsonValue shift is consistent with the rest of the meta JSON typing cleanup.

Also applies to: 67-68

tests/evals/prompt/test_code_modification_prompt.py (1)

32-77: Strong prompt-contract coverage here.

The combination of AST-based config binding checks plus pinned prompt fingerprint and anti-injection assertions is solid.

Also applies to: 79-107

tests/unit/engine/strategy/test_context.py (1)

322-338: Good end-to-end pin for the COMPOSITE path.

This closes the degraded-path-only gap and should catch wrapper regressions that the no-backend case would miss.

src/synthorg/meta/chief_of_staff/chat.py (1)

252-257: Good secure-error logging update.

Structured error_type + safe_error_description(exc) here is the right pattern before re-raising.

src/synthorg/api/app.py (1)

964-991: Startup wiring is correctly idempotent now.

The has_chief_of_staff_chat guard prevents duplicate one-shot attachment on lifespan re-entry.

Comment thread src/synthorg/observability/events/meta.py Outdated
Comment thread tests/unit/api/controllers/test_meta_chat.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/evals/prompt/test_procedural_memory_prompt.py`:
- Around line 125-144: The current AST check collects any *.complete(...) call;
narrow it to only provider.complete calls used in
ProceduralMemoryProposer.propose by restricting complete_calls to nodes where
node.func.attr == "complete" AND node.func.value represents the provider object
(e.g., node.func.value is an ast.Name with id "provider" or an ast.Attribute
whose attr == "provider"); then keep the existing keyword check to assert that
one of those provider.complete calls includes config with ast.unparse(...) ==
"self._completion_config".
- Around line 63-77: The test currently assumes the first CompletionConfig call
is the one built from ProceduralMemoryConfig by using call = config_calls[0],
which is brittle; change the assertion to search across all entries in
config_calls and assert that there exists a call whose keywords include both
temperature == "config.temperature" and max_tokens == "config.max_tokens".
Concretely, replace the single-call checks that reference call.keywords with a
single any(...) over config_calls that inspects each call.keywords (using kw.arg
and ast.unparse(kw.value)) to ensure both required keyword conditions are true
for the same call (so ProceduralMemoryProposer builds CompletionConfig from
config.temperature and config.max_tokens).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ea8af612-7044-48e6-8b79-a7f8f59abd82

📥 Commits

Reviewing files that changed from the base of the PR and between 50d1260 and bb449a4.

📒 Files selected for processing (7)
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/controllers/meta.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/strategy/context.py
  • src/synthorg/observability/events/meta.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • tests/unit/api/controllers/test_meta_chat.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build Backend
  • GitHub Check: Lighthouse Site
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL.

Files:

  • src/synthorg/observability/events/meta.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/strategy/context.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/controllers/meta.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Configuration precedence: DB > env > YAML > code default via SettingsService/ConfigResolver; no os.environ.get outside startup.
No hardcoded numeric values except allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2. Numerics live in settings/definitions/. Enforced by scripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py.
Do not use from __future__ import annotations (Python 3.14+ has PEP 649). Use PEP 758 except: except A, B: requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Define errors as <Domain><Condition>Error inheriting from DomainError; never inherit directly from Exception, RuntimeError, etc. Enforced by check_domain_error_hierarchy.py.
Pydantic v2: frozen + extra="forbid" on API DTOs (Request/Response/Snapshot/Result/Envelope/Status/Info/Summary suffixes); use @computed_field for derived; use NotBlankStr for identifiers.
Use args models at every system boundary; call parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py.
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries.
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError).
Clock seam: inject clock: Clock | None = None parameter; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable.
Untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML.
Import logger as from synthorg.observability import get_logger; variable always named logger. Never import logging or print() in app code.
Event names from observability.events.<domain> constants; use stru...

Files:

  • src/synthorg/observability/events/meta.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/strategy/context.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/controllers/meta.py
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic naming: NEVER use real vendor names in project code/tests. Use example-provider, test-provider, example-{large,medium,small}-001. Allowed only in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/.

Files:

  • src/synthorg/observability/events/meta.py
  • tests/unit/api/controllers/test_meta_chat.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • src/synthorg/engine/strategy/context.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/controllers/meta.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/meta.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • src/synthorg/engine/strategy/context.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/controllers/meta.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: use @pytest.mark.{unit,integration,e2e,slow}. Async tests use auto. Global timeout 30s. Coverage 80% minimum.
xdist: use -n 8 --dist=loadfile (auto-applied via pyproject addopts; loadfile prevents 3.14+ Windows ProactorEventLoop leak).
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Mock-spec: every Mock declares spec=ConcreteClass; baseline at scripts/mock_spec_baseline.txt.
FakeClock: import from tests._shared.fake_clock; inject via clock= parameter.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Never skip/xfail for flaky tests; fix fundamentally.
Use asyncio.Event().wait() instead of sleep(large) to avoid flaky tests.

Files:

  • tests/unit/api/controllers/test_meta_chat.py
  • tests/evals/prompt/test_procedural_memory_prompt.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/api/controllers/test_meta_chat.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/observability/events/meta.py
  • tests/unit/api/controllers/test_meta_chat.py
  • src/synthorg/api/state_services_facades_mcp3.py
  • tests/evals/prompt/test_procedural_memory_prompt.py
  • src/synthorg/engine/strategy/context.py
  • src/synthorg/api/app_builders.py
  • src/synthorg/api/controllers/meta.py
🔇 Additional comments (15)
src/synthorg/observability/events/meta.py (1)

173-183: LGTM!

The new META_CHAT_DEPENDENCY_UNAVAILABLE event constant follows the established meta.<subject>.<action> naming convention and the comment correctly explains why the endpoint bypasses AppState._require_service — to emit dependency-specific context before ServiceUnavailableError surfaces 503.

src/synthorg/api/state_services_facades_mcp3.py (1)

203-224: LGTM!

The ChiefOfStaffChat facade accessor trio follows the established pattern of the other META-MCP-3 services: has_* checks the slot, the property calls _require_service, and set_* calls _attach_service with audit logging.

src/synthorg/api/app_builders.py (1)

132-176: LGTM!

The build_chief_of_staff_chat factory follows the same provider-selection convention as _resolve_llm_judge_strategy: logs a warning and returns None when no providers are available, otherwise picks the first registered provider. The docstring correctly documents the 503 behavior when None is returned.

src/synthorg/api/controllers/meta.py (1)

306-379: LGTM!

The chat endpoint implementation correctly:

  • Sets status_code=200 with an explanatory comment (query-only, no resource created).
  • Logs META_CHAT_DEPENDENCY_UNAVAILABLE with dependency-specific context before raising ServiceUnavailableError.
  • Builds ChatQuery with the optional proposal_id/alert_id and passes the snapshot to the backend.
tests/unit/api/controllers/test_meta_chat.py (3)

77-128: LGTM!

The test properly asserts the ask() call contract as suggested in the past review: it verifies get_org_snapshot was awaited, chat_mock.ask was awaited with the correct ChatQuery (including proposal_id/alert_id) and the expected snapshot instance. The spec=ChiefOfStaffChat and spec=SignalsService on the mocks comply with the mock-spec baseline requirement.


56-75: LGTM!

The 503 test correctly exercises the missing-chat-backend guard and uses try/finally to restore state.


130-153: LGTM!

The test correctly validates that a wired chat backend still returns 503 when SignalsService is unavailable, covering the second dependency guard in the controller.

tests/evals/prompt/test_procedural_memory_prompt.py (2)

32-34: Unit marker usage is correct.

@pytest.mark.unit is correctly applied for this test class.


79-107: Prompt drift and safety-contract guards look solid.

The fingerprint pin plus anti-injection directive check gives strong regression coverage without coupling to full prompt bytes.

src/synthorg/engine/strategy/context.py (6)

8-27: LGTM!

Import organization and logger setup follow project conventions. The explicit builtins import supports the project's PEP 758 except builtins.MemoryError, RecursionError: pattern for system-error propagation.


36-57: LGTM!

The boundary args model is well-designed:

  • frozen=True ensures immutability
  • strict=True prevents type coercion (non-strings fail validation)
  • extra="ignore" is correctly documented for forward-compatibility with future enrichment fields
  • NotBlankStr | None typing enforces non-blank strings while allowing explicit nulls

The detailed docstring clearly explains the validation contract and fallback behavior.


59-97: LGTM!

The StrategicContextProvider protocol correctly defines the async contract. ConfigContextProvider is a clean, minimal implementation with proper structured logging of the build event.


126-196: LGTM!

The implementation properly addresses all failure modes with structured fallback:

  • Memory backend absence → immediate fallback
  • Retrieval errors → logged warning with safe_error_description, then fallback
  • Empty results → debug log, then fallback
  • JSON parse errors → warning log, then fallback
  • Validation errors via parse_typed() → warning log, then fallback

The except builtins.MemoryError, RecursionError: raise pattern follows project convention for system-error propagation. Immutable update via model_copy(update=overrides) is correct.


199-237: LGTM!

Provider chaining logic is robust:

  • Non-empty providers validation in constructor
  • Ordered iteration with first-success return
  • System errors (MemoryError, RecursionError) re-raised immediately
  • Provider failures logged with index and name for debugging
  • Final RuntimeError with cause chain for the "should not happen" edge case

240-287: LGTM!

The async factory correctly wires provider selection with proper memory_backend propagation. The scaffolding comment at lines 269-275 clarifies the COMPOSITE branch design intent — wrapping a single MemoryContextProvider today enables one-line provider additions in the future without control-flow changes.

Comment thread tests/evals/prompt/test_procedural_memory_prompt.py Outdated
Comment thread tests/evals/prompt/test_procedural_memory_prompt.py Outdated
Aureliolo added 12 commits May 10, 2026 01:12
…ding

Inline f-string interpolation of ApprovalStatus.EXPIRED.value /
ApprovalStatus.PENDING.value in expire_if_pending UPDATE statements is
replaced with bound parameters in both SQLite (?) and Postgres (%s)
backends. The id-list interpolation remains under noqa: S608 because it
is a fixed sequence of placeholders, not user data.
Tightens 10 Any-typed Pydantic fields per audit:

- ModelMappingConfig.seniority_model_map: dict[str, str]. The Mapping
  input branch in the validator becomes dead code; Pydantic v2 already
  coerces any Mapping-like input to dict during validation.

- CoordinationMiddlewareContext.{decomposition_result, routing_result,
  dispatch_result, status_rollup}: their concrete types
  (DecompositionResult, RoutingResult, DispatchResult,
  SubtaskStatusRollup). ProgressLedgerMiddleware now reads the
  canonical SubtaskStatusRollup.completed field directly instead of
  getattr-with-default-zero, which always returned zero on real
  rollups.

- Meta domain payload fields (RollbackOperation.previous_value,
  ConfigChange.{old,new}_value, ArchitectureChange.payload,
  RuleMatch.signal_context): JsonValue / dict[str, JsonValue].
The hierarchical retrieval router and retry-evaluator at
src/synthorg/memory/retrieval/hierarchical/supervisor.py made LLM calls
without a CompletionConfig, so sampling temperature fell back to
provider defaults (typically non-zero). Routing decisions therefore
varied across runs for the same query.

Pin a module-level _ROUTING_COMPLETION_CONFIG with temperature=0.0 and
pass it on both calls so worker selection and retry evaluation are
deterministic. Matches the existing canonical pattern at
synthorg/memory/retrieval/reranking/llm_reranker.py.
The audit flagged 'Python 3.14 test flakes from 2026-05-05 runs' as a
sub-task of #1781. Investigating the actual failed CI runs that day
shows the only failure on the Test (Python 3.14) job was a
deterministic environment edge case in the newly-added
test_gate_path_symlink_escape_treated_as_missing, not a flake or a
3.14-specific issue. Subsequent runs pass.

Document the runs inspected, root cause, already-merged isolation
mitigations (#1713, #1755), and the conditions under which a future
investigation should pick this up again.
Replace the silent placeholder response with real DI: add a
chief_of_staff_chat slot on AppState, attach it at startup via a new
build_chief_of_staff_chat helper that picks an LLM provider from the
provider registry when meta.chief_of_staff.chat_enabled is True, and
update the controller to call ChiefOfStaffChat.ask using the current
SignalsService snapshot.

When the chat backend or SignalsService is unavailable, the endpoint
now surfaces an explicit 503 ServiceUnavailableError instead of the
silent fake answer with confidence=0.0 it returned previously.
Make StrategicContextProvider.provide async across the protocol and the
three implementations (Config / Memory / Composite). MemoryContextProvider
now accepts an optional MemoryBackend and queries it at provide-time for
SEMANTIC entries on the synthetic system:strategy agent tagged
strategic-context, parsing the entry content as a JSON object of
overridable fields (maturity_stage / industry / competitive_position).
Layered overrides are applied on top of the fallback context. Missing
backend, retrieval errors, malformed JSON, and non-string overrides all
degrade gracefully to the fallback.

build_context() takes an optional memory_backend kwarg and is now async.
The function had no production callers; only tests use it, and they have
been updated.

Add STRATEGY_CONTEXT_MEMORY_QUERIED observability event.
Reuse the existing tests/evals/prompt/_harness.py primitives to add
contract, fingerprint, and call-site coverage for surfaces the audit
flagged as unevalued:

- supervisor_router_prompt: temperature=0.0 pinned config check, pinned
  fingerprints for the routing and retry system prompts, and a call-
  site assertion that both provider.complete() invocations pass the
  pinned _ROUTING_COMPLETION_CONFIG.

- health_judge_behaviour: heuristic HealthJudge has no LLM prompt, so
  the eval is a labelled set of (termination_reason, has_recovery,
  signals) inputs grading the resulting EscalationCause at 100%
  accuracy, plus severity tier checks at threshold and 2x threshold.

- code_modification_prompt: temperature stays config-driven, the
  _SYSTEM_PROMPT fingerprint is pinned, and the prompt carries the
  untrusted-content directive.

- procedural_memory_prompt: same shape as code_modification, plus a
  call-site check that propose() passes the pre-built
  self._completion_config rather than a fresh inline construction.

LLM grader coverage already lives in
tests/evals/prompt/test_rubric_grader_prompt.py.
Followup to the JsonValue tightening: update the meta proposal-builder
strategy methods (config_tuning / architecture / prompt_tuning) to take
ctx as dict[str, JsonValue] instead of dict[str, object] so the value
flowing in from RuleMatch.signal_context type-checks. Update affected
test helpers to match. Regenerate scripts/mock_spec_baseline.txt to
account for the line-number shifts introduced by the new
'from pydantic import JsonValue' import in two test files (and to drop
stale entries for tests that now use spec= on their MagicMocks).
The pre-push magic-numbers gate flags pre-existing safe defaults in any
file touched by this PR. Annotate the three call sites
(SupervisorRouter.max_workers_per_query, SupervisorRouter.max_retry_count,
PostgresApprovalRepository.list_items.limit, SqliteApprovalRepository.
list_items.limit) with lint-allow comments rather than migrate them to
settings/definitions/, which is out of scope for this cleanup PR.
…ootstrap interval

Same pre-push gate as the previous commit, this round flagging
pre-existing defaults in api/app.py:_DEFAULT_TIMEOUT_CHECK_INTERVAL_SECONDS
and the four list_* paginated controllers in api/controllers/meta.py
(default page size 50). Annotate with lint-allow rather than migrate
to settings/definitions/, which is out of scope for this cleanup PR.
…ff chat

Two pre-PR review fixes:

- chief_of_staff/chat.py:253: replace logger.exception(COS_CHAT_FAILED)
  with logger.error(..., error_type=type(exc).__name__,
  error=safe_error_description(exc)) so LiteLLM provider exception
  strings (which can carry api_key / auth_token data) are no longer
  written verbatim into the structured log via exc_info=True.

- api/controllers/meta.py:329-345: read app_state.chief_of_staff_chat
  and app_state.signals_service into local variables once before the
  None check so the typed accessors aren't called twice across the
  ServiceUnavailableError gate.
- app.py: guard _wire_chief_of_staff_chat with has_chief_of_staff_chat for lifespan-restart idempotence
- engine/strategy/context.py: comment ContextSource.COMPOSITE scaffolding intent
- hr/promotion/config.py: annotate seniority_model_map as Mapping[str, str] to match MappingProxyType runtime wrap

Skipped 2 gemini-code-assist findings (context.py:121, 201) -- claimed comma-separated except is a SyntaxError, but project targets Python 3.14 with PEP 758 unparenthesized multi-exception clauses (no 'as' binding). Per project CLAUDE.md and CodeRabbit's repo config: 'except A, B:' is correct and mandatory.
Aureliolo added 3 commits May 10, 2026 01:12
- meta.py: log META_CHAT_DEPENDENCY_UNAVAILABLE warning before raising 503 from /meta/chat (chat backend and signals service guards). - engine/strategy/context.py: drop redundant STRATEGY_CONTEXT_BUILT log on no-backend branch (fallback emits its own); validate memory payload via _StrategicContextOverridesArgs Pydantic model with strict and extra=ignore (typed boundary, kept in-module to avoid an api into engine layer crossing). - observability/events/meta.py: add META_CHAT_DEPENDENCY_UNAVAILABLE event constant. - app_builders.py: hoist CostTracker, ChiefOfStaffChat, ChiefOfStaffConfig, and ProviderRegistry imports out of TYPE_CHECKING for PEP 649 runtime resolution (matches app_helpers.py precedent). - test_code_modification_prompt.py: AST match for CompletionConfig kwargs instead of substring contains. - test_supervisor_router_prompt.py: AST walk for complete kwarg call sites instead of source.count. - test_context.py: rename to test_falls_back_when_any_field_is_blank_or_non_string for whole-payload-rejection contract; add test_ignores_unknown_payload_fields and test_composite_source_with_backend_applies_overrides. Skipped 1 coderabbit finding (test_appliers.py:815-828, add tuple coverage for tool_access) as factually wrong: ArchitectureChange.payload is JSON-typed and the upstream Pydantic validator rejects tuples with 'input was not a valid JSON value' before reaching the list-or-tuple branch in _validate_tool_access. Tuple inputs are unreachable in practice; the defensive isinstance check is dead code.
- app_builders.py: revert TYPE_CHECKING hoist from round 2 (the hoist created a circular import 'cannot import name CostRecord from partially initialized module synthorg.budget.cost_record' that broke OpenAPI export, Build Web Assets, Build Backend, Lighthouse, Build Preview, and the CI Pass umbrella check); document why the imports stay lazy. - observability/events/meta.py: correct META_CHAT_DEPENDENCY_UNAVAILABLE rationale (AppState._require_service raises ServiceUnavailableError 503, not generic 500). - api/controllers/meta.py: set status_code=200 on POST /chat (query-only endpoint, no resource creation). - api/state_services_facades_mcp3.py: correct mixin docstring from five to six services (chief_of_staff_chat). - engine/strategy/context.py: route memory payload validation through synthorg.api.boundary.parse_typed under the memory.strategic_context boundary label, so failures emit API_BOUNDARY_VALIDATION_FAILED alongside the provider's own STRATEGY_CONTEXT_PROVIDER_FAILED log. - tests/unit/api/controllers/test_meta_chat.py: assert chat_mock.ask was awaited with a ChatQuery carrying question and proposal_id and alert_id, and that the snapshot is the one signals_mock returned; update status_code expectation to 200. - tests/evals/prompt/test_procedural_memory_prompt.py: AST hardening for both CompletionConfig kwargs and the propose call site (matches the pattern used in test_code_modification_prompt.py and test_supervisor_router_prompt.py); textwrap.dedent the method source before ast.parse.
test_procedural_memory_prompt.py AST hardening: search all CompletionConfig calls (not just config_calls[0]) for one carrying both temperature=config.temperature and max_tokens=config.max_tokens; restrict provider.complete matching to nodes whose receiver is provider/self.provider/self._provider so an unrelated .complete() call can't satisfy the assertion.
@Aureliolo Aureliolo force-pushed the chore/issue-1781-cleanup-finish branch from bb449a4 to 2710034 Compare May 9, 2026 23:15
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 9, 2026 23:17 — with GitHub Actions Inactive
@Aureliolo Aureliolo merged commit 59cf0b2 into main May 9, 2026
77 checks passed
@Aureliolo Aureliolo deleted the chore/issue-1781-cleanup-finish branch May 9, 2026 23:30
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 9, 2026 23:30 — with GitHub Actions Inactive
Aureliolo pushed a commit that referenced this pull request May 10, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Improved error logging and Prometheus instrumentation provide better
system monitoring.
- Eliminated race conditions in CI tagging for more reliable development
releases.
- Fixed critical configuration access and kill-switch bugs to enhance
system stability.
- Enhanced client experience with retry-after headers and better
websocket reconnect behavior.

### What's new
- Introduced composite indexes and cursor pagination for faster data
queries.
- Added server-sent events rate limiting and Ollama input sanitization
for improved security.

### Under the hood
- Centralized workflow error mappings to standardize error handling.
- Refactored API lifecycle fallback to use a configuration snapshot for
consistency.
- Tightened startup settings baseline and reduced controller error
baseline to zero.
- Replaced flaky contributor-assistant GitHub action with a custom
stable step.
- Consolidated Renovate dependency groups to avoid update conflicts.
- Upgraded in-toto-golang dependency to fix security vulnerabilities and
dropped unnecessary CVE waivers.
- Extensive lock file maintenance and multiple infrastructure and Python
dependency updates.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


##
[0.8.2](v0.8.1...v0.8.2)
(2026-05-10)


### Features

* close audit gaps in error logging and Prometheus instrumentation
([#1821](#1821))
([ef00fdc](ef00fdc))


### Bug Fixes

* **ci:** eliminate dev-release tag-vs-downstream race + CI hygiene
audit ([#1827](#1827))
([b7b9a59](b7b9a59))
* **config:** close 6 settings reachability + kill-switch gaps
([#1798](#1798))
([410cb3b](410cb3b))
* correctness / safety fixes from 2026-05-05 audit (Wave 28)
([#1823](#1823))
([d01e624](d01e624))


### Performance

* composite indexes + cursor pagination + clock seam + SSE rate-limit +
Ollama sanitization + retry-after web client + WS reconnect jitter
([#1822](#1822))
([d1faf86](d1faf86))


### Refactoring

* **api:** move activities lifecycle-cap fallback to ApiBridgeConfig
snapshot ([#1840](#1840))
([7a56e9c](7a56e9c))
* centralise workflow error mapping and shared error codes
([#1778](#1778) sub-tasks A
+ E) ([#1843](#1843))
([11132cd](11132cd))
* drive controller-error baseline to zero
([#1778](#1778) sub-task A
tail) ([#1846](#1846))
([e96ae20](e96ae20))
* slim CLAUDE.md, port pr-review-toolkit agents, sync .opencode parity
([#1833](#1833))
([e6372b8](e6372b8))
* tighten settings → startup-trace baseline (8 → 0)
([#1847](#1847))
([3376ee2](3376ee2))


### Documentation

* fix CLAUDE.md inaccuracies and drop drift-prone counts
([#1844](#1844))
([371925f](371925f))


### Tests

* replace test placeholders with real subsystem wiring
([#1845](#1845))
([ddbb666](ddbb666))


### CI/CD

* **cla:** replace flaky contributor-assistant action with custom
read-path step
([#1819](#1819))
([11aeafe](11aeafe))
* tidy dev-release notes + stagger renovate lockfile day
([#1824](#1824))
([ec746a9](ec746a9))


### Maintenance

* cleanup roundup, sub-tasks a/c/d/g/h/j/l/m of
[#1781](#1781)
([#1838](#1838))
([099b871](099b871))
* close remaining 5 sub-tasks of
[#1781](#1781) (b/e/f/i/k)
([#1852](#1852))
([59cf0b2](59cf0b2))
* collapse Renovate dep groups into Python / Web / Infrastructure to
remove cross-PR overlap
([#1813](#1813))
([4cbd857](4cbd857))
* **deps,security:** bump in-toto-golang v0.11.0 + drop two patched CVE
waivers ([#1851](#1851))
([0b8b5bb](0b8b5bb))
* disable Renovate vulnerabilityAlerts so security flows into normal
updates ([#1834](#1834))
([6b7d15f](6b7d15f))
* Lock file maintenance
([#1820](#1820))
([ccbad73](ccbad73))
* Lock file maintenance
([#1842](#1842))
([13b68a5](13b68a5))
* Lock file maintenance
([#1853](#1853))
([db6650b](db6650b))
* Update dhi.io/nats:2.14-debian13 Docker digest to eb768bf
([#1841](#1841))
([37f84fc](37f84fc))
* Update Infrastructure dependencies
([#1815](#1815))
([75b12fe](75b12fe))
* Update Infrastructure dependencies
([#1831](#1831))
([3f3c50b](3f3c50b))
* Update Python dependencies
([#1817](#1817))
([e11332f](e11332f))
* Update Python dependencies
([#1832](#1832))
([4515c8e](4515c8e))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CLEANUP] Style, tests, types, hardcoding, CLI, deps roundup

1 participant