chore: close remaining 5 sub-tasks of #1781 (b/e/f/i/k)#1852
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
Merging this PR will not alter performance
Comparing Footnotes
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/hr/promotion/config.py (1)
80-102:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAlign
seniority_model_map's declared type with its immutable runtime value.The field is declared as mutable
dict[str, str]but line 101 stores aMappingProxyType. This creates a misleading API: code that attempts mutations will type-check but fail at runtime. Change the annotation toMapping[str, str]to reflect the immutable contract enforced by the validator.Proposed fix
+from collections.abc import Mapping from types import MappingProxyType from typing import Self @@ - seniority_model_map: dict[str, str] = Field( + seniority_model_map: Mapping[str, str] = Field( default_factory=dict, description="Explicit seniority level to model ID overrides " "(wrapped as MappingProxyType after validation)", )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/hr/promotion/config.py` around lines 80 - 102, The field annotation for seniority_model_map is misleading: change its declared type from dict[str, str] to Mapping[str, str] to reflect that _validate_model_map_keys wraps the value in MappingProxyType (making it immutable at runtime); update the Field declaration's type only (seniority_model_map) so static type-checkers match the runtime behavior enforced by MappingProxyType in the _validate_model_map_keys method and ensure imports include typing.Mapping if not already present.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/synthorg/api/app.py`:
- Around line 957-980: The startup hook _wire_chief_of_staff_chat must avoid
calling the one-shot setter set_chief_of_staff_chat if the AppState already has
a chief-of-staff chat set; modify _wire_chief_of_staff_chat to check the
AppState (e.g., an existing flag or accessor like
app_state.has_chief_of_staff_chat or similar) before calling
app_state.set_chief_of_staff_chat, and only call set_chief_of_staff_chat when
none is set (or alternatively catch and ignore the specific duplicate-set error
from set_chief_of_staff_chat), using build_chief_of_staff_chat and
provider_registry as before.
In `@src/synthorg/engine/strategy/context.py`:
- Around line 246-254: The COMPOSITE branch currently creates a
CompositeContextProvider wrapping a single MemoryContextProvider which already
uses ConfigContextProvider as its fallback, so the composite wrapper is
redundant; either remove the CompositeContextProvider and construct
MemoryContextProvider(config_provider, memory_backend) to match the MEMORY
branch, or if you intend to keep scaffolding for multiple providers add a short
comment above ContextSource.COMPOSITE explaining that CompositeContextProvider
is intentionally used to enable future multi-provider composition (mentioning
CompositeContextProvider, MemoryContextProvider, ConfigContextProvider,
config_provider, memory_backend in the comment).
---
Outside diff comments:
In `@src/synthorg/hr/promotion/config.py`:
- Around line 80-102: The field annotation for seniority_model_map is
misleading: change its declared type from dict[str, str] to Mapping[str, str] to
reflect that _validate_model_map_keys wraps the value in MappingProxyType
(making it immutable at runtime); update the Field declaration's type only
(seniority_model_map) so static type-checkers match the runtime behavior
enforced by MappingProxyType in the _validate_model_map_keys method and ensure
imports include typing.Mapping if not already present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5e3a5ee0-7592-468c-827c-df8e8a9b66de
📒 Files selected for processing (33)
docs/reference/py314-flake-investigation-2026-05.mdscripts/mock_spec_baseline.txtsrc/synthorg/api/app.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/api/state.pysrc/synthorg/api/state_services_facades.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/engine/strategy/context.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/meta/chief_of_staff/chat.pysrc/synthorg/meta/models.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/persistence/postgres/approval_repo.pysrc/synthorg/persistence/sqlite/approval_repo.pytests/evals/prompt/test_code_modification_prompt.pytests/evals/prompt/test_health_judge_behaviour.pytests/evals/prompt/test_procedural_memory_prompt.pytests/evals/prompt/test_supervisor_router_prompt.pytests/unit/api/controllers/test_meta_chat.pytests/unit/engine/middleware/test_coordination_constraints.pytests/unit/engine/middleware/test_coordination_protocol.pytests/unit/engine/strategy/test_context.pytests/unit/meta/rollout/test_inverse_dispatch.pytests/unit/meta/test_appliers.pytests/unit/meta/test_code_modification_strategy.pytests/unit/meta/test_strategies.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Type Check
- GitHub Check: Lighthouse Site
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Build Preview
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (7)
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers: use@pytest.mark.{unit,integration,e2e,slow}. Async tests useauto. Global timeout 30s. Coverage 80% minimum.
xdist: use-n 8 --dist=loadfile(auto-applied via pyprojectaddopts;loadfileprevents 3.14+ Windows ProactorEventLoop leak).
Windows: unit tests useWindowsSelectorEventLoopPolicy(3.14 IOCP teardown race). Subprocess tests override back.
Mock-spec: every Mock declaresspec=ConcreteClass; baseline atscripts/mock_spec_baseline.txt.
FakeClock: import fromtests._shared.fake_clock; inject viaclock=parameter.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...)). Never skip/xfail for flaky tests; fix fundamentally.
Useasyncio.Event().wait()instead ofsleep(large)to avoid flaky tests.
Files:
tests/unit/meta/rollout/test_inverse_dispatch.pytests/unit/meta/test_strategies.pytests/unit/engine/middleware/test_coordination_protocol.pytests/unit/meta/test_appliers.pytests/unit/meta/test_code_modification_strategy.pytests/unit/api/controllers/test_meta_chat.pytests/evals/prompt/test_procedural_memory_prompt.pytests/evals/prompt/test_health_judge_behaviour.pytests/evals/prompt/test_supervisor_router_prompt.pytests/unit/engine/strategy/test_context.pytests/unit/engine/middleware/test_coordination_constraints.pytests/evals/prompt/test_code_modification_prompt.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/meta/rollout/test_inverse_dispatch.pytests/unit/meta/test_strategies.pytests/unit/engine/middleware/test_coordination_protocol.pytests/unit/meta/test_appliers.pytests/unit/meta/test_code_modification_strategy.pytests/unit/api/controllers/test_meta_chat.pytests/evals/prompt/test_procedural_memory_prompt.pytests/evals/prompt/test_health_judge_behaviour.pytests/evals/prompt/test_supervisor_router_prompt.pytests/unit/engine/strategy/test_context.pytests/unit/engine/middleware/test_coordination_constraints.pytests/evals/prompt/test_code_modification_prompt.py
{src,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Vendor-agnostic naming: NEVER use real vendor names in project code/tests. Use
example-provider,test-provider,example-{large,medium,small}-001. Allowed only in.claude/, third-party imports,providers/presets.py,web/public/provider-logos/.
Files:
tests/unit/meta/rollout/test_inverse_dispatch.pysrc/synthorg/meta/chief_of_staff/chat.pytests/unit/meta/test_strategies.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/meta/strategies/architecture.pytests/unit/engine/middleware/test_coordination_protocol.pytests/unit/meta/test_appliers.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/meta/strategies/prompt_tuning.pytests/unit/meta/test_code_modification_strategy.pytests/unit/api/controllers/test_meta_chat.pytests/evals/prompt/test_procedural_memory_prompt.pysrc/synthorg/api/state.pysrc/synthorg/api/app.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/persistence/postgres/approval_repo.pytests/evals/prompt/test_health_judge_behaviour.pytests/evals/prompt/test_supervisor_router_prompt.pytests/unit/engine/strategy/test_context.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/meta/models.pysrc/synthorg/persistence/sqlite/approval_repo.pysrc/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/api/app_builders.pytests/unit/engine/middleware/test_coordination_constraints.pytests/evals/prompt/test_code_modification_prompt.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/api/state_services_facades.pysrc/synthorg/engine/strategy/context.py
src/synthorg/!(persistence)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only
src/synthorg/persistence/may import sqlite/psycopg or emit raw SQL.
Files:
src/synthorg/meta/chief_of_staff/chat.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/api/state.pysrc/synthorg/api/app.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/meta/models.pysrc/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/api/state_services_facades.pysrc/synthorg/engine/strategy/context.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Configuration precedence: DB > env > YAML > code default viaSettingsService/ConfigResolver; noos.environ.getoutside startup.
No hardcoded numeric values except allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2. Numerics live insettings/definitions/. Enforced byscripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced bycheck_no_review_origin_in_code.pyandcheck_no_migration_framing.py.
Do not usefrom __future__ import annotations(Python 3.14+ has PEP 649). Use PEP 758 except:except A, B:requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Define errors as<Domain><Condition>Errorinheriting fromDomainError; never inherit directly fromException,RuntimeError, etc. Enforced bycheck_domain_error_hierarchy.py.
Pydantic v2: frozen +extra="forbid"on API DTOs (Request/Response/Snapshot/Result/Envelope/Status/Info/Summary suffixes); use@computed_fieldfor derived; useNotBlankStrfor identifiers.
Use args models at every system boundary; callparse_typed()for every external dict ingestion. Enforced bycheck_boundary_typed.py.
Immutability: usemodel_copy(update=...)orcopy.deepcopy(); deepcopy at system boundaries.
Async: useasyncio.TaskGroupfor fan-out/fan-in; helpers catchException(re-raiseMemoryError/RecursionError).
Clock seam: injectclock: Clock | None = Noneparameter; tests injectFakeClock. Lifecycle: services own_lifecycle_lock; timed-out stops mark unrestartable.
Untrusted content (SEC-1): usewrap_untrusted()fromengine.prompt_safety; useHTMLParseGuardfor HTML.
Import logger asfrom synthorg.observability import get_logger; variable always namedlogger. Neverimport loggingorprint()in app code.
Event names fromobservability.events.<domain>constants; use stru...
Files:
src/synthorg/meta/chief_of_staff/chat.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/api/state.pysrc/synthorg/api/app.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/persistence/postgres/approval_repo.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/meta/models.pysrc/synthorg/persistence/sqlite/approval_repo.pysrc/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/api/state_services_facades.pysrc/synthorg/engine/strategy/context.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/meta/chief_of_staff/chat.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/api/state.pysrc/synthorg/api/app.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/persistence/postgres/approval_repo.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/meta/models.pysrc/synthorg/persistence/sqlite/approval_repo.pysrc/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/api/state_services_facades.pysrc/synthorg/engine/strategy/context.py
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
docs/**/*.md: Numerics in README and public docs sourced fromdata/runtime_stats.yamlvia<!--RS:NAME-->markers.
Web dashboard uses D2 for architecture/nested containers, mermaid for flowcharts/sequence/pipelines. Use Markdown tables for tabular data. D2 theme 200 (Dark Mauve), D2 CLI v0.7.1 pinned in CI.
Files:
docs/reference/py314-flake-investigation-2026-05.md
src/synthorg/persistence/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/persistence/**/*.py: Repository CRUD: implementsave(entity),get(id),delete(id) -> bool,list_items(...),query(...)returning tuples.
Datetime in persistence: useparse_iso_utc/format_iso_utcfrompersistence._shared(reject naive); usenormalize_utcfor already-typed.
Files:
src/synthorg/persistence/postgres/approval_repo.pysrc/synthorg/persistence/sqlite/approval_repo.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
tests/unit/meta/rollout/test_inverse_dispatch.pysrc/synthorg/meta/chief_of_staff/chat.pytests/unit/meta/test_strategies.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/meta/strategies/architecture.pytests/unit/engine/middleware/test_coordination_protocol.pytests/unit/meta/test_appliers.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/meta/strategies/prompt_tuning.pytests/unit/meta/test_code_modification_strategy.pytests/unit/api/controllers/test_meta_chat.pytests/evals/prompt/test_procedural_memory_prompt.pysrc/synthorg/api/state.pysrc/synthorg/api/app.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/persistence/postgres/approval_repo.pytests/evals/prompt/test_health_judge_behaviour.pytests/evals/prompt/test_supervisor_router_prompt.pytests/unit/engine/strategy/test_context.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/meta/models.pysrc/synthorg/persistence/sqlite/approval_repo.pysrc/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/api/app_builders.pytests/unit/engine/middleware/test_coordination_constraints.pytests/evals/prompt/test_code_modification_prompt.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/api/state_services_facades.pysrc/synthorg/engine/strategy/context.py
🪛 LanguageTool
docs/reference/py314-flake-investigation-2026-05.md
[grammar] ~33-~33: Ensure spelling is correct
Context: ... deterministic environment quirk, not a flake. Subsequent runs of the same test on id...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (48)
tests/unit/meta/test_appliers.py (1)
823-823: Valid test input type now matches the intended payload shape.Switching
tool_accessto a list in the happy-path test is the right fix and keeps this case consistent with the validator contract.tests/evals/prompt/test_health_judge_behaviour.py (4)
39-41: Good test classification and scope.Class-level
@pytest.mark.unitis correctly applied for this deterministic behavioural suite.
43-120: Strong labelled example coverage for regression safety.The cases pin core branches and boundaries (stagnation/error paths, threshold edges, streak break, and double-threshold behavior) cleanly.
122-142: Strict 100% grading gate is the right contract here.Using a hard accuracy requirement is a solid guardrail for heuristic regression detection.
143-171: Severity threshold behavior is explicitly locked down.Separate tests for HIGH at threshold and CRITICAL at 2x threshold make escalation semantics unambiguous.
src/synthorg/engine/middleware/coordination_constraints.py (1)
147-151: Progress derivation now correctly uses the canonical rollup field.Nice fix: reading from
rollup.completedand persisting intoProgressLedger.completed_countresolves the prior field mismatch cleanly.src/synthorg/engine/middleware/coordination_protocol.py (2)
15-17: Concrete type imports are a solid boundary-tightening step.These imports support explicit result typing and remove ambiguity in middleware context state.
Also applies to: 22-25, 30-30
66-81: Context field narrowing is a strong improvement for middleware safety.Switching to
DecompositionResult | None,RoutingResult | None,DispatchResult | None, andSubtaskStatusRollup | Nonemakes the lifecycle states explicit and type-checkable.tests/unit/engine/middleware/test_coordination_protocol.py (1)
301-301: Type-ignore annotation is now more precise.Good refinement to keep the ignore scoped to the concrete assignment-check category.
tests/unit/engine/middleware/test_coordination_constraints.py (4)
11-11: Import updates correctly align tests with typed middleware contracts.Using concrete decomposition/rollup models here is the right move for parity with production annotations.
Also applies to: 16-19, 37-37
80-102: Typed rollup/decomposition helpers improve test realism and stability.Constructing real
DecompositionResultandSubtaskStatusRollupobjects makes these tests much more representative than placeholder values.
106-107:_mw_contextsignature narrowing is consistent with the protocol changes.This keeps test fixtures aligned with the stricter
CoordinationMiddlewareContextfield types.
144-145: Updated callsites correctly exercise the new typed path end-to-end.These replacements ensure assertions validate actual model behavior, not permissive mock/string inputs.
Also applies to: 158-159, 182-183, 192-193, 243-244
docs/reference/py314-flake-investigation-2026-05.md (1)
1-66: LGTM! Well-structured investigation report.The document clearly distinguishes the deterministic symlink-resolution edge case from a genuine flake, provides actionable future guidance, and properly uses Markdown tables for the run data. The technical explanation is thorough and the rationale for not pursuing additional isolation work is sound.
tests/unit/meta/test_strategies.py (1)
4-4: JsonValue typing alignment in test helpers looks correct.The updated helper/context annotations are consistent with JSON-safe rule payloads and keep the test surface strict without changing behavior.
Also applies to: 63-67, 156-157
src/synthorg/meta/strategies/prompt_tuning.py (1)
11-12: Prompt-tuning strategy typing update is clean and consistent.
JsonValueusage forctxmatches the strategy’s JSON context handling and keeps behavior unchanged.Also applies to: 150-153
src/synthorg/meta/strategies/architecture.py (1)
10-11: Architecture strategy type narrowing is well-applied.Both proposal builders now use JSON-compatible context typing consistently.
Also applies to: 98-101, 144-147
src/synthorg/meta/strategies/config_tuning.py (1)
13-14: Config-tuning helper signatures are consistently tightened.
ctxnow usesdict[str, JsonValue]uniformly across all proposal builders, which keeps the strategy surface coherent.Also applies to: 116-119, 165-168, 210-213, 260-263, 306-309, 350-353
scripts/mock_spec_baseline.txt (1)
2243-2249: Mock-spec baseline coordinate refresh looks correct.The updated call-site coordinates for the shifted test file are consistent with baseline maintenance.
src/synthorg/memory/retrieval/hierarchical/supervisor.py (1)
39-40: Deterministic completion config is correctly centralized and applied.Pinning
temperature=0.0once and passing it to both route and retrycomplete()calls cleanly removes provider-default variance.Also applies to: 83-87, 232-236, 301-305
tests/evals/prompt/test_supervisor_router_prompt.py (1)
29-83: Prompt-contract coverage is strong and targeted.The suite correctly pins temperature, fingerprints both prompts, and asserts pinned config usage at both production call sites.
src/synthorg/meta/models.py (1)
18-19: Model type tightening toJsonValueis consistent and beneficial.These field updates improve DTO strictness for JSON-like payloads while preserving the existing model structure.
Also applies to: 124-125, 160-161, 180-181, 439-440
tests/unit/meta/rollout/test_inverse_dispatch.py (1)
4-4: Type narrowing toJsonValueis a good alignment with rollback model contracts.This update improves test-helper type fidelity without changing behavior.
Also applies to: 56-61
tests/unit/meta/test_code_modification_strategy.py (1)
7-7:ctxtyped asdict[str, JsonValue]is a solid contract-tightening change.Looks correct and consistent with JSON-typed model usage in tests.
Also applies to: 64-68
tests/evals/prompt/test_code_modification_prompt.py (1)
27-81: Strong prompt-contract coverage for code-modification strategy.Fingerprint pinning + anti-injection directive + config-driven CompletionConfig assertions are all valuable and well-scoped.
tests/evals/prompt/test_procedural_memory_prompt.py (1)
30-99: Procedural-memory prompt contract tests look robust and purposeful.These checks provide good guardrails for prompt drift and completion-config regressions.
src/synthorg/observability/events/strategy.py (1)
19-19: LGTM!The new event constant follows the established naming convention and is logically grouped with other
strategy.context.*events.src/synthorg/engine/strategy/context.py (4)
26-36: LGTM!Module constants are well-documented with docstrings explaining their purpose. Using
NotBlankStrfor identifiers follows project conventions.
39-76: LGTM!The async protocol and
ConfigContextProviderimplementation are clean. Structured logging with theSTRATEGY_CONTEXT_BUILTevent follows observability guidelines.
106-174: LGTM!The
MemoryContextProvider.provideimplementation correctly:
- Re-raises
MemoryError/RecursionErrorper project convention- Uses
safe_error_description(exc)for SEC-1 compliant logging- Validates override values as non-blank strings before applying
- Uses
model_copy(update=...)for immutabilityThe
PLR0911suppression is appropriate given the multiple fallback paths required by the graceful degradation design.
177-215: LGTM!The
CompositeContextProvidercorrectly chains providers with proper error handling. TheRuntimeErroron line 215 is appropriate for the "should never happen" fallback scenario documented in the comment.tests/unit/engine/strategy/test_context.py (5)
27-36: LGTM!The
_entryhelper cleanly constructs test fixtures with appropriate type annotations and realistic structure matching the productionMemoryEntrymodel.
39-65: LGTM!Tests correctly verify async
ConfigContextProviderbehavior with both default and custom configurations.
68-197: LGTM!Comprehensive test coverage for
MemoryContextProviderincluding all fallback paths (no backend, empty results, exceptions, invalid JSON, non-object JSON) and override validation (full, partial, blank/non-string filtering). Proper use ofAsyncMock(spec=MemoryBackend)follows mock-spec guidelines.
200-253: LGTM!Tests verify composite provider behavior: successful resolution, constructor validation, fallback on failure, and proper
RuntimeErrorpropagation when all providers fail.
256-294: LGTM!Tests verify
build_contextfactory behavior for allContextSourcevariants, including degradation when no memory backend is provided and override application when one is injected.src/synthorg/meta/chief_of_staff/chat.py (1)
252-257: SEC-1-safe error logging update looks good.Using
error_typeplussafe_error_description(exc)here is the correct structured pattern.src/synthorg/persistence/sqlite/approval_repo.py (1)
327-338:expire_if_pendingparameterization is correctly applied.Binding both status enums through
paramsremoves string interpolation risk while preserving the existing CAS behavior.src/synthorg/persistence/postgres/approval_repo.py (1)
305-317: Postgresexpire_if_pendingstatus binding is implemented correctly.Good parity with the SQLite backend and improved query safety/readability.
src/synthorg/api/state.py (1)
173-173: Slot update is correct.Adding
_chief_of_staff_chatto__slots__is the right change for the new AppState field.src/synthorg/api/state_services_facades.py (1)
176-176: Facade slot initialization is consistent.Initializing
_chief_of_staff_chatin_init_facade_service_slots()keeps AppState startup invariants intact.src/synthorg/api/app_builders.py (1)
121-165:build_chief_of_staff_chat()is a solid boundary builder.The opt-in gating and provider-availability checks are clear and align with the intended 503 fallback behavior upstream.
src/synthorg/api/controllers/meta.py (1)
312-358: Meta chat endpoint wiring is correctly implemented.Good shift from placeholder output to real backend invocation with explicit 503 behavior when required services are unavailable.
src/synthorg/api/state_services_facades_mcp3.py (3)
4-12: Docstring update is clear and aligned with the new/meta/chatwiring.Good addition of
chief_of_staff_chatand endpoint context at module level.
33-73: Typed AppState slot wiring looks correct.Import + slot declaration for
ChiefOfStaffChatcleanly extends the existing facade typing pattern.
198-219: Chief-of-Staff facade triple follows the established one-shot pattern.
has_*, getter via_require_service, andset_*via_attach_serviceare consistent and safe.tests/unit/api/controllers/test_meta_chat.py (2)
28-50: Snapshot helper is well-structured for deterministic endpoint tests.The fully-typed
OrgSignalSnapshotfixture keeps test setup explicit and stable.
52-136: Scenario coverage is solid for the new chat wiring contract.The suite cleanly covers both unavailable paths and the successful wired path, with proper state rollback.
There was a problem hiding this comment.
Code Review
This pull request implements the Chief of Staff chat backend, enabling LLM-powered explanations for signals and proposals via a new API endpoint. It also enhances the strategic context system to support dynamic overrides from the memory backend and improves type safety across meta-models by replacing generic types with JsonValue. Critical feedback was provided regarding syntax errors in exception handling within src/synthorg/engine/strategy/context.py, where multiple exceptions were incorrectly separated by commas instead of being parenthesized as a tuple. Additionally, the PR includes security improvements by transitioning database queries to use parameterized inputs and adds comprehensive evaluation tests for prompt determinism.
| limit=1, | ||
| ), | ||
| ) | ||
| except builtins.MemoryError, RecursionError: |
There was a problem hiding this comment.
The except statement uses a comma to separate multiple exceptions, which is invalid syntax in Python 3 and will result in a SyntaxError. To catch multiple exceptions, they must be provided as a parenthesized tuple. Additionally, for consistency with the use of builtins.MemoryError, RecursionError should also be prefixed with builtins..
except (builtins.MemoryError, builtins.RecursionError):| return provider.provide(config=config) | ||
| except MemoryError, RecursionError: | ||
| return await provider.provide(config=config) | ||
| except builtins.MemoryError, RecursionError: |
There was a problem hiding this comment.
The except statement uses a comma to separate multiple exceptions, which is invalid syntax in Python 3 and will result in a SyntaxError. It should use a parenthesized tuple instead. For consistency, both exceptions should use the builtins. prefix.
except (builtins.MemoryError, builtins.RecursionError):
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1852 +/- ##
==========================================
- Coverage 84.86% 84.85% -0.02%
==========================================
Files 1801 1801
Lines 104892 104993 +101
Branches 9198 9206 +8
==========================================
+ Hits 89012 89087 +75
- Misses 13649 13667 +18
- Partials 2231 2239 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
14394c7 to
b805d5b
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/unit/meta/test_appliers.py (1)
815-828:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd test case for tuple acceptance in tool_access validation.
The validator at line 515 of
architecture_applier.pystill accepts bothlist | tuplefortool_access. However, the valid-path test only covers lists (line 823). Add a test case verifying that tuples also pass validation, either by extending the existingtest_dry_run_tool_access_valid_passesor creating a parametrized variant that tests both list and tuple inputs.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/meta/test_appliers.py` around lines 815 - 828, Extend the existing test for tool_access validation to cover tuple inputs as well: update or parametrize test_dry_run_tool_access_valid_passes to call ArchitectureApplier().dry_run with a proposal whose payload.tool_access is a tuple (e.g., ("git","shell")) in addition to the existing list case so both list and tuple inputs are asserted to pass; reference the test name test_dry_run_tool_access_valid_passes, the ArchitectureApplier class, the dry_run method, and the tool_access payload key when making the change.src/synthorg/api/app_builders.py (1)
21-29:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMove type annotations for this public helper out of
TYPE_CHECKINGto make them runtime-resolvable.
build_chief_of_staff_chat()is a public function whose signature referencesChiefOfStaffConfig,ProviderRegistry,CostTracker, andChiefOfStaffChat—all imported only underTYPE_CHECKING. Under Python 3.14 + PEP 649,typing.get_type_hints()evaluates annotation names against module globals at runtime. These names won't exist in the namespace when introspectors (doc generators, type checkers, Litestar's plugin loader) callget_type_hints(), causingNameError.The codebase has an established pattern for this: see
src/synthorg/api/app_helpers.py:15-23, which explicitly importsChannelsPluginat runtime (withnoqa: TC002) because it appears in public function signatures.Move these four imports out of
TYPE_CHECKING.Also applies to: 121-126
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/api/app_builders.py` around lines 21 - 29, The four types used in the public function signatures (ChiefOfStaffConfig, ProviderRegistry, CostTracker, ChiefOfStaffChat) are only imported under TYPE_CHECKING which makes them unavailable at runtime for typing.get_type_hints; move their imports out of the TYPE_CHECKING block so they are real module-level names (follow the existing pattern used for ChannelsPlugin imports) and ensure both occurrences referenced in this file (the earlier import block and the later block around the other occurrences) are updated so build_chief_of_staff_chat and any other public functions can be introspected without NameError.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/synthorg/api/controllers/meta.py`:
- Around line 329-345: The 503 branches skip the standard logging; before
raising ServiceUnavailableError for missing chief_of_staff_chat or
signals_service, call the existing AppState accessors or _require_service() (or
equivalent methods) to emit the standard WARNING/ERROR logs, or explicitly log a
warning with context using the same logger/format as other service checks
(reference app_state, chief_of_staff_chat, signals_service, and
AppState._require_service); then raise the ServiceUnavailableError. Ensure
messages include which service is missing and relevant context (e.g.,
meta.chief_of_staff.chat_enabled) to match project logging conventions.
In `@src/synthorg/engine/strategy/context.py`:
- Around line 140-166: The code parses JSON from MemoryBackend into a raw dict
and hand-validates fields instead of using the centralized args model; update
MemoryContextProvider's parsing to run the decoded payload through parse_typed()
with a small overrides args model (e.g., MemoryOverridesArgs) to
validate/normalize fields, then derive overrides from the parsed model instead
of ad hoc checks over payload and _OVERRIDABLE_FIELDS; keep the same fallback
flow by calling self._fallback.provide(config=config) when parse_typed raises or
the model indicates invalid/missing data, and use the model's typed attributes
when constructing the overrides passed to provide().
- Around line 108-110: The code emits STRATEGY_CONTEXT_BUILT before calling the
fallback, which double-counts builds; change the logic in the branch where
self._memory_backend is None (and where self._fallback.provide(config=...) is
called) so that you either (a) await self._fallback.provide first and then call
logger.debug(STRATEGY_CONTEXT_BUILT, source="memory_no_backend") after the
fallback returns, or (b) replace the emitted event with
STRATEGY_CONTEXT_MEMORY_QUERIED (or another non-build outcome) before awaiting
the fallback; update references around self._memory_backend,
self._fallback.provide, STRATEGY_CONTEXT_BUILT and
STRATEGY_CONTEXT_MEMORY_QUERIED accordingly.
In `@tests/evals/prompt/test_code_modification_prompt.py`:
- Around line 42-51: Replace fragile substring assertions with an AST-based
check: parse the string returned by inspect.getsource(CodeModificationStrategy)
(the source variable) with ast.parse, find the ClassDef for
CodeModificationStrategy, locate the Call node where CompletionConfig is
constructed, and assert that there are keyword arguments named "temperature" and
"max_tokens" whose values are ast.Attribute nodes referencing
self._code_config.temperature and self._code_config.max_tokens (i.e.,
ast.Attribute(value=ast.Attribute(value=Name(id='self'), attr='_code_config'),
attr='temperature'/'max_tokens')). Use this structural check instead of text
contains to ensure the constructor binds to self._code_config attributes.
In `@tests/evals/prompt/test_supervisor_router_prompt.py`:
- Around line 76-83: Replace the brittle source-string count with an AST-based
check: parse the string from inspect.getsource(supervisor) using ast.parse, walk
Call nodes and count calls whose function name/attribute is "complete" (cover
both attribute and direct name) and which have a keyword named "config" whose
value is an ast.Name with id "_ROUTING_COMPLETION_CONFIG"; then assert that this
AST-derived count is >= 2 to ensure both _route_via_llm and _evaluate_via_llm
call complete(config=_ROUTING_COMPLETION_CONFIG).
In `@tests/unit/engine/strategy/test_context.py`:
- Around line 288-294: Test currently only covers the degraded/no-backend path
for ContextSource.COMPOSITE; add a unit that constructs a StrategyConfig where
context.source == ContextSource.COMPOSITE and context.memory_backend (or the
appropriate StrategicContextConfig field) is set to a real/injected backend,
then call build_context(config) and assert the returned context reflects the
backend-backed path (e.g., maturity_stage or other backend-driven fields).
Locate the test function test_composite_source and create a variant (or extend
it) that injects a concrete memory backend instance into the
StrategicContextConfig, awaits build_context, and asserts the expected values to
prevent regressions in the COMPOSITE wrapper logic.
---
Outside diff comments:
In `@src/synthorg/api/app_builders.py`:
- Around line 21-29: The four types used in the public function signatures
(ChiefOfStaffConfig, ProviderRegistry, CostTracker, ChiefOfStaffChat) are only
imported under TYPE_CHECKING which makes them unavailable at runtime for
typing.get_type_hints; move their imports out of the TYPE_CHECKING block so they
are real module-level names (follow the existing pattern used for ChannelsPlugin
imports) and ensure both occurrences referenced in this file (the earlier import
block and the later block around the other occurrences) are updated so
build_chief_of_staff_chat and any other public functions can be introspected
without NameError.
In `@tests/unit/meta/test_appliers.py`:
- Around line 815-828: Extend the existing test for tool_access validation to
cover tuple inputs as well: update or parametrize
test_dry_run_tool_access_valid_passes to call ArchitectureApplier().dry_run with
a proposal whose payload.tool_access is a tuple (e.g., ("git","shell")) in
addition to the existing list case so both list and tuple inputs are asserted to
pass; reference the test name test_dry_run_tool_access_valid_passes, the
ArchitectureApplier class, the dry_run method, and the tool_access payload key
when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3953c9b5-65aa-43f6-bc46-b869e7cd2ef3
📒 Files selected for processing (33)
docs/reference/py314-flake-investigation-2026-05.mdscripts/mock_spec_baseline.txtsrc/synthorg/api/app.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/api/state.pysrc/synthorg/api/state_services_facades.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/engine/strategy/context.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/meta/chief_of_staff/chat.pysrc/synthorg/meta/models.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/persistence/postgres/approval_repo.pysrc/synthorg/persistence/sqlite/approval_repo.pytests/evals/prompt/test_code_modification_prompt.pytests/evals/prompt/test_health_judge_behaviour.pytests/evals/prompt/test_procedural_memory_prompt.pytests/evals/prompt/test_supervisor_router_prompt.pytests/unit/api/controllers/test_meta_chat.pytests/unit/engine/middleware/test_coordination_constraints.pytests/unit/engine/middleware/test_coordination_protocol.pytests/unit/engine/strategy/test_context.pytests/unit/meta/rollout/test_inverse_dispatch.pytests/unit/meta/test_appliers.pytests/unit/meta/test_code_modification_strategy.pytests/unit/meta/test_strategies.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Deploy Preview
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Lighthouse Site
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
src/synthorg/!(persistence)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only
src/synthorg/persistence/may import sqlite/psycopg or emit raw SQL.
Files:
src/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/api/state_services_facades.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/api/app_builders.pysrc/synthorg/meta/chief_of_staff/chat.pysrc/synthorg/api/state.pysrc/synthorg/api/app.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/meta/models.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/engine/strategy/context.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Configuration precedence: DB > env > YAML > code default viaSettingsService/ConfigResolver; noos.environ.getoutside startup.
No hardcoded numeric values except allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2. Numerics live insettings/definitions/. Enforced byscripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced bycheck_no_review_origin_in_code.pyandcheck_no_migration_framing.py.
Do not usefrom __future__ import annotations(Python 3.14+ has PEP 649). Use PEP 758 except:except A, B:requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Define errors as<Domain><Condition>Errorinheriting fromDomainError; never inherit directly fromException,RuntimeError, etc. Enforced bycheck_domain_error_hierarchy.py.
Pydantic v2: frozen +extra="forbid"on API DTOs (Request/Response/Snapshot/Result/Envelope/Status/Info/Summary suffixes); use@computed_fieldfor derived; useNotBlankStrfor identifiers.
Use args models at every system boundary; callparse_typed()for every external dict ingestion. Enforced bycheck_boundary_typed.py.
Immutability: usemodel_copy(update=...)orcopy.deepcopy(); deepcopy at system boundaries.
Async: useasyncio.TaskGroupfor fan-out/fan-in; helpers catchException(re-raiseMemoryError/RecursionError).
Clock seam: injectclock: Clock | None = Noneparameter; tests injectFakeClock. Lifecycle: services own_lifecycle_lock; timed-out stops mark unrestartable.
Untrusted content (SEC-1): usewrap_untrusted()fromengine.prompt_safety; useHTMLParseGuardfor HTML.
Import logger asfrom synthorg.observability import get_logger; variable always namedlogger. Neverimport loggingorprint()in app code.
Event names fromobservability.events.<domain>constants; use stru...
Files:
src/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/api/state_services_facades.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/api/app_builders.pysrc/synthorg/meta/chief_of_staff/chat.pysrc/synthorg/api/state.pysrc/synthorg/api/app.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/persistence/sqlite/approval_repo.pysrc/synthorg/meta/models.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/persistence/postgres/approval_repo.pysrc/synthorg/engine/strategy/context.py
{src,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Vendor-agnostic naming: NEVER use real vendor names in project code/tests. Use
example-provider,test-provider,example-{large,medium,small}-001. Allowed only in.claude/, third-party imports,providers/presets.py,web/public/provider-logos/.
Files:
src/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/engine/middleware/coordination_constraints.pytests/unit/meta/rollout/test_inverse_dispatch.pysrc/synthorg/api/state_services_facades.pysrc/synthorg/engine/middleware/coordination_protocol.pytests/unit/engine/middleware/test_coordination_protocol.pytests/unit/meta/test_appliers.pytests/unit/meta/test_strategies.pytests/unit/meta/test_code_modification_strategy.pytests/evals/prompt/test_code_modification_prompt.pysrc/synthorg/api/app_builders.pytests/evals/prompt/test_health_judge_behaviour.pysrc/synthorg/meta/chief_of_staff/chat.pytests/evals/prompt/test_procedural_memory_prompt.pysrc/synthorg/api/state.pysrc/synthorg/api/app.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/persistence/sqlite/approval_repo.pysrc/synthorg/meta/models.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/meta/strategies/config_tuning.pytests/unit/api/controllers/test_meta_chat.pytests/evals/prompt/test_supervisor_router_prompt.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/persistence/postgres/approval_repo.pysrc/synthorg/engine/strategy/context.pytests/unit/engine/strategy/test_context.pytests/unit/engine/middleware/test_coordination_constraints.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/api/state_services_facades.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/api/app_builders.pysrc/synthorg/meta/chief_of_staff/chat.pysrc/synthorg/api/state.pysrc/synthorg/api/app.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/persistence/sqlite/approval_repo.pysrc/synthorg/meta/models.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/persistence/postgres/approval_repo.pysrc/synthorg/engine/strategy/context.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers: use@pytest.mark.{unit,integration,e2e,slow}. Async tests useauto. Global timeout 30s. Coverage 80% minimum.
xdist: use-n 8 --dist=loadfile(auto-applied via pyprojectaddopts;loadfileprevents 3.14+ Windows ProactorEventLoop leak).
Windows: unit tests useWindowsSelectorEventLoopPolicy(3.14 IOCP teardown race). Subprocess tests override back.
Mock-spec: every Mock declaresspec=ConcreteClass; baseline atscripts/mock_spec_baseline.txt.
FakeClock: import fromtests._shared.fake_clock; inject viaclock=parameter.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...)). Never skip/xfail for flaky tests; fix fundamentally.
Useasyncio.Event().wait()instead ofsleep(large)to avoid flaky tests.
Files:
tests/unit/meta/rollout/test_inverse_dispatch.pytests/unit/engine/middleware/test_coordination_protocol.pytests/unit/meta/test_appliers.pytests/unit/meta/test_strategies.pytests/unit/meta/test_code_modification_strategy.pytests/evals/prompt/test_code_modification_prompt.pytests/evals/prompt/test_health_judge_behaviour.pytests/evals/prompt/test_procedural_memory_prompt.pytests/unit/api/controllers/test_meta_chat.pytests/evals/prompt/test_supervisor_router_prompt.pytests/unit/engine/strategy/test_context.pytests/unit/engine/middleware/test_coordination_constraints.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/meta/rollout/test_inverse_dispatch.pytests/unit/engine/middleware/test_coordination_protocol.pytests/unit/meta/test_appliers.pytests/unit/meta/test_strategies.pytests/unit/meta/test_code_modification_strategy.pytests/evals/prompt/test_code_modification_prompt.pytests/evals/prompt/test_health_judge_behaviour.pytests/evals/prompt/test_procedural_memory_prompt.pytests/unit/api/controllers/test_meta_chat.pytests/evals/prompt/test_supervisor_router_prompt.pytests/unit/engine/strategy/test_context.pytests/unit/engine/middleware/test_coordination_constraints.py
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
docs/**/*.md: Numerics in README and public docs sourced fromdata/runtime_stats.yamlvia<!--RS:NAME-->markers.
Web dashboard uses D2 for architecture/nested containers, mermaid for flowcharts/sequence/pipelines. Use Markdown tables for tabular data. D2 theme 200 (Dark Mauve), D2 CLI v0.7.1 pinned in CI.
Files:
docs/reference/py314-flake-investigation-2026-05.md
src/synthorg/persistence/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/persistence/**/*.py: Repository CRUD: implementsave(entity),get(id),delete(id) -> bool,list_items(...),query(...)returning tuples.
Datetime in persistence: useparse_iso_utc/format_iso_utcfrompersistence._shared(reject naive); usenormalize_utcfor already-typed.
Files:
src/synthorg/persistence/sqlite/approval_repo.pysrc/synthorg/persistence/postgres/approval_repo.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
src/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/engine/middleware/coordination_constraints.pytests/unit/meta/rollout/test_inverse_dispatch.pysrc/synthorg/api/state_services_facades.pysrc/synthorg/engine/middleware/coordination_protocol.pytests/unit/engine/middleware/test_coordination_protocol.pytests/unit/meta/test_appliers.pytests/unit/meta/test_strategies.pytests/unit/meta/test_code_modification_strategy.pytests/evals/prompt/test_code_modification_prompt.pysrc/synthorg/api/app_builders.pytests/evals/prompt/test_health_judge_behaviour.pysrc/synthorg/meta/chief_of_staff/chat.pytests/evals/prompt/test_procedural_memory_prompt.pysrc/synthorg/api/state.pysrc/synthorg/api/app.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/persistence/sqlite/approval_repo.pysrc/synthorg/meta/models.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/meta/strategies/config_tuning.pytests/unit/api/controllers/test_meta_chat.pytests/evals/prompt/test_supervisor_router_prompt.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/persistence/postgres/approval_repo.pysrc/synthorg/engine/strategy/context.pytests/unit/engine/strategy/test_context.pytests/unit/engine/middleware/test_coordination_constraints.py
🪛 LanguageTool
docs/reference/py314-flake-investigation-2026-05.md
[grammar] ~33-~33: Ensure spelling is correct
Context: ... deterministic environment quirk, not a flake. Subsequent runs of the same test on id...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (23)
src/synthorg/persistence/sqlite/approval_repo.py (2)
327-338: Status parameterization inexpire_if_pendinglooks correct.This keeps the compare-and-set query atomic while removing enum-value interpolation from SQL text.
453-453: Default pagination line change is safe.The inline lint annotation does not alter repository behavior.
src/synthorg/persistence/postgres/approval_repo.py (2)
305-317: Good update: status values are now fully bound in the UPDATE.This aligns the Postgres path with secure parameterized SQL handling while preserving CAS semantics.
395-395:list_itemsdefault-limit annotation is non-functional and safe.docs/reference/py314-flake-investigation-2026-05.md (1)
1-66: LGTM! Thorough investigation documentation.This document provides clear investigation methodology, detailed root cause analysis, and actionable future steps. The Markdown table is properly formatted per coding guidelines, and the content effectively communicates why no further isolation work is warranted.
Note: The LanguageTool hint flagging line 33 is a false positive—the spelling and grammar are correct.
src/synthorg/engine/middleware/coordination_protocol.py (2)
15-30: Good type import alignment for coordination pipeline results.The imported result types are clear and match the narrowed context contract.
66-81: Type narrowing onCoordinationMiddlewareContextlooks solid.Switching these fields to concrete result types improves validation and downstream safety.
tests/unit/engine/middleware/test_coordination_protocol.py (1)
301-301: Targeted type-ignore update is appropriate.Using a specific
assignmentignore here is cleaner and better scoped.src/synthorg/engine/middleware/coordination_constraints.py (1)
147-151: Canonicalrollup.completedusage is the right fix.The new snapshot source is explicit and avoids dynamic attribute ambiguity.
tests/unit/engine/middleware/test_coordination_constraints.py (2)
11-19: Typed helper refactor is well done.These helper and context signature changes make the tests much closer to real model contracts.
Also applies to: 37-37, 80-119
144-144: Call-site updates correctly follow the new typed fixtures.The revised inputs are consistent with the narrowed middleware context types.
Also applies to: 158-158, 182-182, 192-192, 243-243
src/synthorg/meta/strategies/architecture.py (1)
10-11: JsonValue narrowing is consistent and safe.This typing refinement is clean and aligns with JSON-shaped signal context usage in these proposal builders.
Also applies to: 100-101, 146-147
src/synthorg/meta/strategies/prompt_tuning.py (1)
11-12: Type tightening forctxlooks correct.The
JsonValueannotation matches how rule context is consumed here and keeps behavior unchanged.Also applies to: 152-153
src/synthorg/meta/strategies/config_tuning.py (1)
13-14: Consistent JSON context typing across proposal helpers.Good sweep: all
_propose_*context parameters now use the samedict[str, JsonValue]contract.Also applies to: 118-119, 167-168, 212-213, 262-263, 308-309, 352-353
src/synthorg/memory/retrieval/hierarchical/supervisor.py (1)
39-40: Deterministic routing/retry wiring is implemented correctly.Using a shared pinned
CompletionConfig(temperature=0.0)and passing it at both completion call sites satisfies the non-drift determinism goal.Also applies to: 83-87, 235-236, 304-305
src/synthorg/hr/promotion/config.py (1)
8-11: Model-map typing and immutability handling look solid.Switching to
Mapping[str, str]and preserving read-only semantics viaMappingProxyType(copy.deepcopy(...))is a good tightening.Also applies to: 81-85, 96-103
tests/evals/prompt/test_health_judge_behaviour.py (1)
43-172: Excellent deterministic behavioral coverage forHealthJudge.emit_ticket.The labelled examples plus explicit severity-threshold tests provide strong regression protection for heuristic logic.
src/synthorg/meta/models.py (1)
10-18: JsonValue migration is correctly applied.These model field changes tighten JSON typing without introducing incompatibilities in the shown validators or model contracts.
Also applies to: 124-124, 160-161, 180-180, 439-439
tests/unit/meta/test_strategies.py (1)
4-4: Test typing alignment looks good.The
JsonValueannotations in helper/context fixtures are consistent with the production model narrowing and keep tests type-accurate.Also applies to: 66-67, 156-163
tests/unit/meta/test_code_modification_strategy.py (1)
7-7: Helper context typing update is correct.
JsonValueusage in_rulekeeps test inputs consistent with the stricter JSON-typed signal context contract.Also applies to: 67-68
scripts/mock_spec_baseline.txt (1)
2243-2249: Mock baseline coordinate updates are consistent.The adjusted entries maintain expected format/order for the frozen bare-mock allowlist.
tests/unit/meta/rollout/test_inverse_dispatch.py (1)
4-4: Rollback helper typing is correctly tightened.
_op(previous_value: JsonValue)now aligns with the production rollback model and keeps test fixtures schema-accurate.Also applies to: 60-61
tests/evals/prompt/test_procedural_memory_prompt.py (1)
1-100: New prompt-contract eval suite looks solid.The tests provide clear guardrails for prompt drift and completion-config wiring on the procedural memory proposer path.
b805d5b to
50d1260
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/synthorg/api/state_services_facades_mcp3.py (1)
42-42:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate stale service count in class docstring.
Line 42 says this mixin covers five services, but it now exposes six (including
chief_of_staff_chat).Suggested patch
- """Facade accessors for the five META-MCP-3 services.""" + """Facade accessors for the six META-MCP-3 services."""🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/api/state_services_facades_mcp3.py` at line 42, Update the stale docstring that currently says "Facade accessors for the five META-MCP-3 services" to reflect that the mixin now exposes six services (including chief_of_staff_chat); edit the docstring in src/synthorg/api/state_services_facades_mcp3.py (the class/mixin docstring) to mention six services and optionally list or acknowledge the newly exposed chief_of_staff_chat to keep the comment accurate.src/synthorg/api/controllers/meta.py (1)
306-374: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winSet
status_code=200for this query-only POST endpoint.Litestar 2.21.1 defaults POST handlers to
201 Createdwhenstatus_codeis omitted. The/meta/chatendpoint queries the ChiefOfStaffChat backend and returns computed results—it does not create a server resource. POST operations without side effects should return200 OKinstead.Proposed fix
`@post`( "/chat", + status_code=200, guards=[ require_org_mutation(), per_op_rate_limit_from_policy("meta.chat", key="user"), ], )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/api/controllers/meta.py` around lines 306 - 374, The POST handler method chat should explicitly set status_code=200 on its `@post` decorator because it is a query-only endpoint (no resource creation) and Litestar defaults POST to 201; update the `@post`(...) call that decorates async def chat(...) to include status_code=200 so the endpoint returns 200 OK for successful queries to ChiefOfStaffChat.
♻️ Duplicate comments (2)
tests/evals/prompt/test_procedural_memory_prompt.py (1)
44-52: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winHarden source-contract checks by using AST instead of raw substring matching.
These two assertions are format-sensitive and can pass/fail for textual reasons rather than semantic wiring. Switching to AST keeps the contract robust across harmless refactors.
Proposed refactor
+import ast import inspect @@ from synthorg.memory.procedural.proposer import ProceduralMemoryProposer source = inspect.getsource(ProceduralMemoryProposer) - assert "temperature=config.temperature" in source, ( + tree = ast.parse(source) + config_calls = [ + node + for node in ast.walk(tree) + if isinstance(node, ast.Call) + and isinstance(node.func, ast.Name) + and node.func.id == "CompletionConfig" + ] + assert config_calls, "expected at least one CompletionConfig(...) call" + call = config_calls[0] + assert any( + kw.arg == "temperature" + and ast.unparse(kw.value) == "config.temperature" + for kw in call.keywords + ), ( "ProceduralMemoryProposer must build CompletionConfig from " "config.temperature so the value stays runtime-tunable" ) - assert "max_tokens=config.max_tokens" in source, ( + assert any( + kw.arg == "max_tokens" + and ast.unparse(kw.value) == "config.max_tokens" + for kw in call.keywords + ), ( "ProceduralMemoryProposer must read max_tokens from " "ProceduralMemoryConfig so token budgets stay tunable" ) @@ from synthorg.memory.procedural.proposer import ProceduralMemoryProposer source = inspect.getsource(ProceduralMemoryProposer.propose) - assert "config=self._completion_config" in source, ( + tree = ast.parse(source) + complete_calls = [ + node + for node in ast.walk(tree) + if isinstance(node, ast.Call) + and isinstance(node.func, ast.Attribute) + and node.func.attr == "complete" + ] + assert any( + any( + kw.arg == "config" + and ast.unparse(kw.value) == "self._completion_config" + for kw in call.keywords + ) + for call in complete_calls + ), ( "ProceduralMemoryProposer.propose must pass " "config=self._completion_config to provider.complete; a " "fresh inline CompletionConfig() construction would defeat " "the runtime-tunable contract" )Also applies to: 93-99
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/evals/prompt/test_procedural_memory_prompt.py` around lines 44 - 52, Replace fragile substring assertions in the test with AST-based checks: parse the source of ProceduralMemoryProposer (inspect.getsource) into an AST and verify that the code constructs a CompletionConfig (or calls a function/constructor named CompletionConfig) with keyword arguments that bind to config.temperature and config.max_tokens respectively; update the checks around the other similar assertions (the second group referencing the same class) to use the same AST pattern-matching so the test verifies semantic wiring rather than exact formatting.src/synthorg/engine/strategy/context.py (1)
157-170: 🛠️ Refactor suggestion | 🟠 MajorThe typed args model is good, but this boundary still skips
parse_typed().
json.loads()is still ingesting backend data and handing it directly tomodel_validate(), so this path bypasses the repository’s standard boundary parser. Please rundecodedthroughparse_typed()with_StrategicContextOverridesArgsand keep the same fallback behavior around validation failure.As per coding guidelines, "Use args models at every system boundary; call
parse_typed()for every external dict ingestion."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/engine/strategy/context.py` around lines 157 - 170, The code currently calls _StrategicContextOverridesArgs.model_validate(decoded) directly after json.loads, bypassing the repository's boundary parser; replace that direct validation with a call to parse_typed(_StrategicContextOverridesArgs, decoded) and use its result for args, preserving the current except/ fallback behavior: if parse_typed raises/returns a validation error, log STRATEGY_CONTEXT_PROVIDER_FAILED with the same provider_name/stage/error info and return await self._fallback.provide(config=config); ensure references to _StrategicContextOverridesArgs, parse_typed(), STRATEGY_CONTEXT_PROVIDER_FAILED and self._fallback.provide(...) are used so the external dict always passes through parse_typed before being accepted.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/synthorg/observability/events/meta.py`:
- Around line 176-182: The comment explaining why the chat endpoint bypasses
AppState._require_service is inaccurate: update the comment text so it states
that the central accessor raises ServiceUnavailableError (503) rather than a
generic 500; reference AppState._require_service and ServiceUnavailableError in
the updated comment near the POST /meta/chat explanation so future readers
understand the endpoint must surface 503 (service unavailable) instead of
relying on the central accessor.
In `@tests/unit/api/controllers/test_meta_chat.py`:
- Around line 77-110: In the test_returns_chat_payload_when_wired test, add
assertions to verify the controller actually forwards the ChatQuery and snapshot
to the backend: assert chat_mock.ask was awaited once with a ChatQuery instance
containing the expected question and any forwarded fields like
proposal_id/alert_id, and assert the snapshot passed equals the value returned
by signals_mock.get_org_snapshot (use the _empty_snapshot() value). Reference
the test function name, chat_mock.ask, ChatQuery, and
signals_mock.get_org_snapshot when implementing these assertions to ensure the
wiring is covered.
---
Outside diff comments:
In `@src/synthorg/api/controllers/meta.py`:
- Around line 306-374: The POST handler method chat should explicitly set
status_code=200 on its `@post` decorator because it is a query-only endpoint (no
resource creation) and Litestar defaults POST to 201; update the `@post`(...) call
that decorates async def chat(...) to include status_code=200 so the endpoint
returns 200 OK for successful queries to ChiefOfStaffChat.
In `@src/synthorg/api/state_services_facades_mcp3.py`:
- Line 42: Update the stale docstring that currently says "Facade accessors for
the five META-MCP-3 services" to reflect that the mixin now exposes six services
(including chief_of_staff_chat); edit the docstring in
src/synthorg/api/state_services_facades_mcp3.py (the class/mixin docstring) to
mention six services and optionally list or acknowledge the newly exposed
chief_of_staff_chat to keep the comment accurate.
---
Duplicate comments:
In `@src/synthorg/engine/strategy/context.py`:
- Around line 157-170: The code currently calls
_StrategicContextOverridesArgs.model_validate(decoded) directly after
json.loads, bypassing the repository's boundary parser; replace that direct
validation with a call to parse_typed(_StrategicContextOverridesArgs, decoded)
and use its result for args, preserving the current except/ fallback behavior:
if parse_typed raises/returns a validation error, log
STRATEGY_CONTEXT_PROVIDER_FAILED with the same provider_name/stage/error info
and return await self._fallback.provide(config=config); ensure references to
_StrategicContextOverridesArgs, parse_typed(), STRATEGY_CONTEXT_PROVIDER_FAILED
and self._fallback.provide(...) are used so the external dict always passes
through parse_typed before being accepted.
In `@tests/evals/prompt/test_procedural_memory_prompt.py`:
- Around line 44-52: Replace fragile substring assertions in the test with
AST-based checks: parse the source of ProceduralMemoryProposer
(inspect.getsource) into an AST and verify that the code constructs a
CompletionConfig (or calls a function/constructor named CompletionConfig) with
keyword arguments that bind to config.temperature and config.max_tokens
respectively; update the checks around the other similar assertions (the second
group referencing the same class) to use the same AST pattern-matching so the
test verifies semantic wiring rather than exact formatting.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0e0d9f28-e3d1-429a-8ebf-59f77009f665
📒 Files selected for processing (34)
docs/reference/py314-flake-investigation-2026-05.mdscripts/mock_spec_baseline.txtsrc/synthorg/api/app.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/api/state.pysrc/synthorg/api/state_services_facades.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/engine/strategy/context.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/meta/chief_of_staff/chat.pysrc/synthorg/meta/models.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/observability/events/meta.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/persistence/postgres/approval_repo.pysrc/synthorg/persistence/sqlite/approval_repo.pytests/evals/prompt/test_code_modification_prompt.pytests/evals/prompt/test_health_judge_behaviour.pytests/evals/prompt/test_procedural_memory_prompt.pytests/evals/prompt/test_supervisor_router_prompt.pytests/unit/api/controllers/test_meta_chat.pytests/unit/engine/middleware/test_coordination_constraints.pytests/unit/engine/middleware/test_coordination_protocol.pytests/unit/engine/strategy/test_context.pytests/unit/meta/rollout/test_inverse_dispatch.pytests/unit/meta/test_appliers.pytests/unit/meta/test_code_modification_strategy.pytests/unit/meta/test_strategies.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Backend
- GitHub Check: Lighthouse Site
- GitHub Check: Test (Python 3.14)
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
src/synthorg/!(persistence)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only
src/synthorg/persistence/may import sqlite/psycopg or emit raw SQL.
Files:
src/synthorg/api/state_services_facades.pysrc/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/observability/events/meta.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/api/state.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/meta/models.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/app.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/meta/chief_of_staff/chat.pysrc/synthorg/engine/strategy/context.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Configuration precedence: DB > env > YAML > code default viaSettingsService/ConfigResolver; noos.environ.getoutside startup.
No hardcoded numeric values except allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2. Numerics live insettings/definitions/. Enforced byscripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced bycheck_no_review_origin_in_code.pyandcheck_no_migration_framing.py.
Do not usefrom __future__ import annotations(Python 3.14+ has PEP 649). Use PEP 758 except:except A, B:requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Define errors as<Domain><Condition>Errorinheriting fromDomainError; never inherit directly fromException,RuntimeError, etc. Enforced bycheck_domain_error_hierarchy.py.
Pydantic v2: frozen +extra="forbid"on API DTOs (Request/Response/Snapshot/Result/Envelope/Status/Info/Summary suffixes); use@computed_fieldfor derived; useNotBlankStrfor identifiers.
Use args models at every system boundary; callparse_typed()for every external dict ingestion. Enforced bycheck_boundary_typed.py.
Immutability: usemodel_copy(update=...)orcopy.deepcopy(); deepcopy at system boundaries.
Async: useasyncio.TaskGroupfor fan-out/fan-in; helpers catchException(re-raiseMemoryError/RecursionError).
Clock seam: injectclock: Clock | None = Noneparameter; tests injectFakeClock. Lifecycle: services own_lifecycle_lock; timed-out stops mark unrestartable.
Untrusted content (SEC-1): usewrap_untrusted()fromengine.prompt_safety; useHTMLParseGuardfor HTML.
Import logger asfrom synthorg.observability import get_logger; variable always namedlogger. Neverimport loggingorprint()in app code.
Event names fromobservability.events.<domain>constants; use stru...
Files:
src/synthorg/api/state_services_facades.pysrc/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/persistence/postgres/approval_repo.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/observability/events/meta.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/api/state.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/persistence/sqlite/approval_repo.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/meta/models.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/app.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/meta/chief_of_staff/chat.pysrc/synthorg/engine/strategy/context.py
{src,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Vendor-agnostic naming: NEVER use real vendor names in project code/tests. Use
example-provider,test-provider,example-{large,medium,small}-001. Allowed only in.claude/, third-party imports,providers/presets.py,web/public/provider-logos/.
Files:
src/synthorg/api/state_services_facades.pysrc/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pytests/unit/engine/middleware/test_coordination_protocol.pysrc/synthorg/api/state_services_facades_mcp3.pytests/unit/meta/test_strategies.pysrc/synthorg/persistence/postgres/approval_repo.pytests/unit/api/controllers/test_meta_chat.pysrc/synthorg/engine/middleware/coordination_protocol.pytests/unit/meta/test_code_modification_strategy.pysrc/synthorg/observability/events/meta.pytests/unit/meta/test_appliers.pysrc/synthorg/meta/strategies/architecture.pytests/evals/prompt/test_supervisor_router_prompt.pysrc/synthorg/api/state.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/persistence/sqlite/approval_repo.pysrc/synthorg/hr/promotion/config.pytests/evals/prompt/test_code_modification_prompt.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/meta/models.pytests/evals/prompt/test_procedural_memory_prompt.pytests/evals/prompt/test_health_judge_behaviour.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/app.pytests/unit/engine/middleware/test_coordination_constraints.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/api/controllers/meta.pytests/unit/meta/rollout/test_inverse_dispatch.pysrc/synthorg/meta/chief_of_staff/chat.pytests/unit/engine/strategy/test_context.pysrc/synthorg/engine/strategy/context.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/api/state_services_facades.pysrc/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/persistence/postgres/approval_repo.pysrc/synthorg/engine/middleware/coordination_protocol.pysrc/synthorg/observability/events/meta.pysrc/synthorg/meta/strategies/architecture.pysrc/synthorg/api/state.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/persistence/sqlite/approval_repo.pysrc/synthorg/hr/promotion/config.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/meta/models.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/app.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/meta/chief_of_staff/chat.pysrc/synthorg/engine/strategy/context.py
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
docs/**/*.md: Numerics in README and public docs sourced fromdata/runtime_stats.yamlvia<!--RS:NAME-->markers.
Web dashboard uses D2 for architecture/nested containers, mermaid for flowcharts/sequence/pipelines. Use Markdown tables for tabular data. D2 theme 200 (Dark Mauve), D2 CLI v0.7.1 pinned in CI.
Files:
docs/reference/py314-flake-investigation-2026-05.md
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers: use@pytest.mark.{unit,integration,e2e,slow}. Async tests useauto. Global timeout 30s. Coverage 80% minimum.
xdist: use-n 8 --dist=loadfile(auto-applied via pyprojectaddopts;loadfileprevents 3.14+ Windows ProactorEventLoop leak).
Windows: unit tests useWindowsSelectorEventLoopPolicy(3.14 IOCP teardown race). Subprocess tests override back.
Mock-spec: every Mock declaresspec=ConcreteClass; baseline atscripts/mock_spec_baseline.txt.
FakeClock: import fromtests._shared.fake_clock; inject viaclock=parameter.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...)). Never skip/xfail for flaky tests; fix fundamentally.
Useasyncio.Event().wait()instead ofsleep(large)to avoid flaky tests.
Files:
tests/unit/engine/middleware/test_coordination_protocol.pytests/unit/meta/test_strategies.pytests/unit/api/controllers/test_meta_chat.pytests/unit/meta/test_code_modification_strategy.pytests/unit/meta/test_appliers.pytests/evals/prompt/test_supervisor_router_prompt.pytests/evals/prompt/test_code_modification_prompt.pytests/evals/prompt/test_procedural_memory_prompt.pytests/evals/prompt/test_health_judge_behaviour.pytests/unit/engine/middleware/test_coordination_constraints.pytests/unit/meta/rollout/test_inverse_dispatch.pytests/unit/engine/strategy/test_context.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/engine/middleware/test_coordination_protocol.pytests/unit/meta/test_strategies.pytests/unit/api/controllers/test_meta_chat.pytests/unit/meta/test_code_modification_strategy.pytests/unit/meta/test_appliers.pytests/evals/prompt/test_supervisor_router_prompt.pytests/evals/prompt/test_code_modification_prompt.pytests/evals/prompt/test_procedural_memory_prompt.pytests/evals/prompt/test_health_judge_behaviour.pytests/unit/engine/middleware/test_coordination_constraints.pytests/unit/meta/rollout/test_inverse_dispatch.pytests/unit/engine/strategy/test_context.py
src/synthorg/persistence/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/persistence/**/*.py: Repository CRUD: implementsave(entity),get(id),delete(id) -> bool,list_items(...),query(...)returning tuples.
Datetime in persistence: useparse_iso_utc/format_iso_utcfrompersistence._shared(reject naive); usenormalize_utcfor already-typed.
Files:
src/synthorg/persistence/postgres/approval_repo.pysrc/synthorg/persistence/sqlite/approval_repo.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
src/synthorg/api/state_services_facades.pysrc/synthorg/memory/retrieval/hierarchical/supervisor.pysrc/synthorg/observability/events/strategy.pytests/unit/engine/middleware/test_coordination_protocol.pysrc/synthorg/api/state_services_facades_mcp3.pytests/unit/meta/test_strategies.pysrc/synthorg/persistence/postgres/approval_repo.pytests/unit/api/controllers/test_meta_chat.pysrc/synthorg/engine/middleware/coordination_protocol.pytests/unit/meta/test_code_modification_strategy.pysrc/synthorg/observability/events/meta.pytests/unit/meta/test_appliers.pysrc/synthorg/meta/strategies/architecture.pytests/evals/prompt/test_supervisor_router_prompt.pysrc/synthorg/api/state.pysrc/synthorg/meta/strategies/config_tuning.pysrc/synthorg/persistence/sqlite/approval_repo.pysrc/synthorg/hr/promotion/config.pytests/evals/prompt/test_code_modification_prompt.pysrc/synthorg/engine/middleware/coordination_constraints.pysrc/synthorg/meta/models.pytests/evals/prompt/test_procedural_memory_prompt.pytests/evals/prompt/test_health_judge_behaviour.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/app.pytests/unit/engine/middleware/test_coordination_constraints.pysrc/synthorg/meta/strategies/prompt_tuning.pysrc/synthorg/api/controllers/meta.pytests/unit/meta/rollout/test_inverse_dispatch.pysrc/synthorg/meta/chief_of_staff/chat.pytests/unit/engine/strategy/test_context.pysrc/synthorg/engine/strategy/context.py
🪛 LanguageTool
docs/reference/py314-flake-investigation-2026-05.md
[grammar] ~33-~33: Ensure spelling is correct
Context: ... deterministic environment quirk, not a flake. Subsequent runs of the same test on id...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (26)
tests/unit/meta/test_appliers.py (1)
816-822: Good correction to keep this test on the real JSON-validated path.Switching
tool_accessto a list and documenting why tuple input is unreachable at runtime makes this test more accurate and less misleading.Also applies to: 830-830
src/synthorg/hr/promotion/config.py (2)
81-85: Type narrowing forseniority_model_maplooks good.
Mapping[str, str]at Line 81 cleanly removes loose typing and makes the config contract explicit.
96-103: Immutability enforcement is correctly retained after validation.The Line 99-Line 103
deepcopy+MappingProxyTypewrapping keeps the frozen config practically read-only for downstream consumers.src/synthorg/persistence/sqlite/approval_repo.py (2)
327-338: Status parameterization inexpire_if_pendinglooks correct.Binding both transition statuses and ids through
paramspreserves atomic update behavior while avoiding enum interpolation in SQL.
453-453: Intentional default page size is clearly documented.The explicit
lint-allowplus existing limit validation/clamp keeps pagination behavior controlled.src/synthorg/persistence/postgres/approval_repo.py (2)
305-317: Postgres CAS update is safely parameterized.Using bound params for both status values and the
ANY(%s)id list is a solid hardening change without altering semantics.
395-395: Pagination default change is controlled and readable.The default is explicit, and runtime bounds enforcement in the method continues to prevent oversized queries.
src/synthorg/engine/middleware/coordination_protocol.py (1)
15-30: Strong type narrowing on coordination pipeline context.This update cleanly replaces broad result placeholders with concrete result models and keeps the context contract aligned with downstream middleware expectations.
Also applies to: 66-81
tests/unit/engine/middleware/test_coordination_protocol.py (1)
301-301: Type-ignore refinement is precise and appropriate for this negative test.The annotation now matches the stricter assignment typing intent without changing runtime validation coverage.
src/synthorg/engine/middleware/coordination_constraints.py (1)
147-151: Canonical rollup progress source is correctly applied.Using
SubtaskStatusRollup.completeddirectly is clearer and keeps progress snapshots consistent round-to-round.tests/unit/engine/middleware/test_coordination_constraints.py (1)
80-119: Typed test fixtures/context updates are well aligned with the production model narrowing.The helper and callsite updates improve realism of unit inputs and keep tests synchronized with the stricter middleware context contracts.
Also applies to: 144-158, 182-243
scripts/mock_spec_baseline.txt (1)
2243-2249: Baseline refresh looks consistent.The updated
test_code_modification_strategycoordinates are aligned with the intended mock-spec baseline maintenance flow.src/synthorg/memory/retrieval/hierarchical/supervisor.py (1)
39-40: Deterministic routing/retry config is wired correctly.Centralizing
CompletionConfig(temperature=0.0)and passing it to both LLM call sites cleanly enforces reproducible supervisor decisions.Also applies to: 83-87, 107-109, 232-236, 301-305
tests/evals/prompt/test_supervisor_router_prompt.py (1)
30-31: Strong prompt-contract coverage.This suite pins determinism and prompt drift effectively, and the AST-based check for
config=_ROUTING_COMPLETION_CONFIGis a solid guard against brittle source-text matching.Also applies to: 34-42, 44-64, 66-102
tests/evals/prompt/test_health_judge_behaviour.py (1)
39-41: Behavioral eval coverage is clear and deterministic.The labelled matrix plus explicit HIGH/CRITICAL threshold assertions gives good regression protection for
HealthJudge.emit_ticket.Also applies to: 43-142, 143-172
src/synthorg/meta/models.py (1)
18-19:JsonValuetightening is applied consistently.These field-type updates correctly narrow payload/context contracts to JSON-compatible values without changing behavior semantics.
Also applies to: 124-124, 160-161, 180-180, 439-439
src/synthorg/meta/strategies/prompt_tuning.py (1)
11-12: Context typing alignment looks good.Using
dict[str, JsonValue]here is consistent with the tightened meta model context contract.Also applies to: 152-152
src/synthorg/meta/strategies/config_tuning.py (1)
13-14:JsonValuecontext migration is coherent across helpers.The updated annotations consistently reflect JSON-shaped signal-context inputs throughout this strategy.
Also applies to: 118-118, 167-167, 212-212, 262-262, 308-308, 352-352
src/synthorg/meta/strategies/architecture.py (1)
10-11: Architecture strategy typing update is consistent.
dict[str, JsonValue]on context inputs aligns cleanly with the upstream rule-context contract.Also applies to: 100-100, 146-146
tests/unit/meta/test_strategies.py (1)
4-4:JsonValuetyping updates are clean and consistent.These changes tighten test helper typing without altering behavior, and they align with the JSON-shaped model contracts.
Also applies to: 66-67, 156-157
tests/unit/meta/rollout/test_inverse_dispatch.py (1)
4-4: Rollback helper typing change looks good.
previous_value: JsonValueis a good fit for the rollback payload semantics and keeps the tests aligned with model typing.Also applies to: 60-61
tests/unit/meta/test_code_modification_strategy.py (1)
7-7: Type narrowing in_ruleis appropriate.The
JsonValueshift is consistent with the rest of the meta JSON typing cleanup.Also applies to: 67-68
tests/evals/prompt/test_code_modification_prompt.py (1)
32-77: Strong prompt-contract coverage here.The combination of AST-based config binding checks plus pinned prompt fingerprint and anti-injection assertions is solid.
Also applies to: 79-107
tests/unit/engine/strategy/test_context.py (1)
322-338: Good end-to-end pin for the COMPOSITE path.This closes the degraded-path-only gap and should catch wrapper regressions that the no-backend case would miss.
src/synthorg/meta/chief_of_staff/chat.py (1)
252-257: Good secure-error logging update.Structured
error_type+safe_error_description(exc)here is the right pattern before re-raising.src/synthorg/api/app.py (1)
964-991: Startup wiring is correctly idempotent now.The
has_chief_of_staff_chatguard prevents duplicate one-shot attachment on lifespan re-entry.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/evals/prompt/test_procedural_memory_prompt.py`:
- Around line 125-144: The current AST check collects any *.complete(...) call;
narrow it to only provider.complete calls used in
ProceduralMemoryProposer.propose by restricting complete_calls to nodes where
node.func.attr == "complete" AND node.func.value represents the provider object
(e.g., node.func.value is an ast.Name with id "provider" or an ast.Attribute
whose attr == "provider"); then keep the existing keyword check to assert that
one of those provider.complete calls includes config with ast.unparse(...) ==
"self._completion_config".
- Around line 63-77: The test currently assumes the first CompletionConfig call
is the one built from ProceduralMemoryConfig by using call = config_calls[0],
which is brittle; change the assertion to search across all entries in
config_calls and assert that there exists a call whose keywords include both
temperature == "config.temperature" and max_tokens == "config.max_tokens".
Concretely, replace the single-call checks that reference call.keywords with a
single any(...) over config_calls that inspects each call.keywords (using kw.arg
and ast.unparse(kw.value)) to ensure both required keyword conditions are true
for the same call (so ProceduralMemoryProposer builds CompletionConfig from
config.temperature and config.max_tokens).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ea8af612-7044-48e6-8b79-a7f8f59abd82
📒 Files selected for processing (7)
src/synthorg/api/app_builders.pysrc/synthorg/api/controllers/meta.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/engine/strategy/context.pysrc/synthorg/observability/events/meta.pytests/evals/prompt/test_procedural_memory_prompt.pytests/unit/api/controllers/test_meta_chat.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Build Backend
- GitHub Check: Lighthouse Site
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Web Assets (melange)
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
src/synthorg/!(persistence)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only
src/synthorg/persistence/may import sqlite/psycopg or emit raw SQL.
Files:
src/synthorg/observability/events/meta.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/engine/strategy/context.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/controllers/meta.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Configuration precedence: DB > env > YAML > code default viaSettingsService/ConfigResolver; noos.environ.getoutside startup.
No hardcoded numeric values except allowlist: 0/1/-1, HTTP codes, hex masks, powers-of-2. Numerics live insettings/definitions/. Enforced byscripts/check_no_magic_numbers.py.
Comments explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced bycheck_no_review_origin_in_code.pyandcheck_no_migration_framing.py.
Do not usefrom __future__ import annotations(Python 3.14+ has PEP 649). Use PEP 758 except:except A, B:requires parens when binding.
Type hints on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines.
Define errors as<Domain><Condition>Errorinheriting fromDomainError; never inherit directly fromException,RuntimeError, etc. Enforced bycheck_domain_error_hierarchy.py.
Pydantic v2: frozen +extra="forbid"on API DTOs (Request/Response/Snapshot/Result/Envelope/Status/Info/Summary suffixes); use@computed_fieldfor derived; useNotBlankStrfor identifiers.
Use args models at every system boundary; callparse_typed()for every external dict ingestion. Enforced bycheck_boundary_typed.py.
Immutability: usemodel_copy(update=...)orcopy.deepcopy(); deepcopy at system boundaries.
Async: useasyncio.TaskGroupfor fan-out/fan-in; helpers catchException(re-raiseMemoryError/RecursionError).
Clock seam: injectclock: Clock | None = Noneparameter; tests injectFakeClock. Lifecycle: services own_lifecycle_lock; timed-out stops mark unrestartable.
Untrusted content (SEC-1): usewrap_untrusted()fromengine.prompt_safety; useHTMLParseGuardfor HTML.
Import logger asfrom synthorg.observability import get_logger; variable always namedlogger. Neverimport loggingorprint()in app code.
Event names fromobservability.events.<domain>constants; use stru...
Files:
src/synthorg/observability/events/meta.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/engine/strategy/context.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/controllers/meta.py
{src,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Vendor-agnostic naming: NEVER use real vendor names in project code/tests. Use
example-provider,test-provider,example-{large,medium,small}-001. Allowed only in.claude/, third-party imports,providers/presets.py,web/public/provider-logos/.
Files:
src/synthorg/observability/events/meta.pytests/unit/api/controllers/test_meta_chat.pysrc/synthorg/api/state_services_facades_mcp3.pytests/evals/prompt/test_procedural_memory_prompt.pysrc/synthorg/engine/strategy/context.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/controllers/meta.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/events/meta.pysrc/synthorg/api/state_services_facades_mcp3.pysrc/synthorg/engine/strategy/context.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/controllers/meta.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers: use@pytest.mark.{unit,integration,e2e,slow}. Async tests useauto. Global timeout 30s. Coverage 80% minimum.
xdist: use-n 8 --dist=loadfile(auto-applied via pyprojectaddopts;loadfileprevents 3.14+ Windows ProactorEventLoop leak).
Windows: unit tests useWindowsSelectorEventLoopPolicy(3.14 IOCP teardown race). Subprocess tests override back.
Mock-spec: every Mock declaresspec=ConcreteClass; baseline atscripts/mock_spec_baseline.txt.
FakeClock: import fromtests._shared.fake_clock; inject viaclock=parameter.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...)). Never skip/xfail for flaky tests; fix fundamentally.
Useasyncio.Event().wait()instead ofsleep(large)to avoid flaky tests.
Files:
tests/unit/api/controllers/test_meta_chat.pytests/evals/prompt/test_procedural_memory_prompt.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/api/controllers/test_meta_chat.pytests/evals/prompt/test_procedural_memory_prompt.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
src/synthorg/observability/events/meta.pytests/unit/api/controllers/test_meta_chat.pysrc/synthorg/api/state_services_facades_mcp3.pytests/evals/prompt/test_procedural_memory_prompt.pysrc/synthorg/engine/strategy/context.pysrc/synthorg/api/app_builders.pysrc/synthorg/api/controllers/meta.py
🔇 Additional comments (15)
src/synthorg/observability/events/meta.py (1)
173-183: LGTM!The new
META_CHAT_DEPENDENCY_UNAVAILABLEevent constant follows the establishedmeta.<subject>.<action>naming convention and the comment correctly explains why the endpoint bypassesAppState._require_service— to emit dependency-specific context beforeServiceUnavailableErrorsurfaces 503.src/synthorg/api/state_services_facades_mcp3.py (1)
203-224: LGTM!The
ChiefOfStaffChatfacade accessor trio follows the established pattern of the other META-MCP-3 services:has_*checks the slot, the property calls_require_service, andset_*calls_attach_servicewith audit logging.src/synthorg/api/app_builders.py (1)
132-176: LGTM!The
build_chief_of_staff_chatfactory follows the same provider-selection convention as_resolve_llm_judge_strategy: logs a warning and returnsNonewhen no providers are available, otherwise picks the first registered provider. The docstring correctly documents the 503 behavior whenNoneis returned.src/synthorg/api/controllers/meta.py (1)
306-379: LGTM!The chat endpoint implementation correctly:
- Sets
status_code=200with an explanatory comment (query-only, no resource created).- Logs
META_CHAT_DEPENDENCY_UNAVAILABLEwith dependency-specific context before raisingServiceUnavailableError.- Builds
ChatQuerywith the optionalproposal_id/alert_idand passes the snapshot to the backend.tests/unit/api/controllers/test_meta_chat.py (3)
77-128: LGTM!The test properly asserts the
ask()call contract as suggested in the past review: it verifiesget_org_snapshotwas awaited,chat_mock.askwas awaited with the correctChatQuery(includingproposal_id/alert_id) and the expected snapshot instance. Thespec=ChiefOfStaffChatandspec=SignalsServiceon the mocks comply with the mock-spec baseline requirement.
56-75: LGTM!The 503 test correctly exercises the missing-chat-backend guard and uses try/finally to restore state.
130-153: LGTM!The test correctly validates that a wired chat backend still returns 503 when
SignalsServiceis unavailable, covering the second dependency guard in the controller.tests/evals/prompt/test_procedural_memory_prompt.py (2)
32-34: Unit marker usage is correct.
@pytest.mark.unitis correctly applied for this test class.
79-107: Prompt drift and safety-contract guards look solid.The fingerprint pin plus anti-injection directive check gives strong regression coverage without coupling to full prompt bytes.
src/synthorg/engine/strategy/context.py (6)
8-27: LGTM!Import organization and logger setup follow project conventions. The explicit
builtinsimport supports the project's PEP 758except builtins.MemoryError, RecursionError:pattern for system-error propagation.
36-57: LGTM!The boundary args model is well-designed:
frozen=Trueensures immutabilitystrict=Trueprevents type coercion (non-strings fail validation)extra="ignore"is correctly documented for forward-compatibility with future enrichment fieldsNotBlankStr | Nonetyping enforces non-blank strings while allowing explicit nullsThe detailed docstring clearly explains the validation contract and fallback behavior.
59-97: LGTM!The
StrategicContextProviderprotocol correctly defines the async contract.ConfigContextProvideris a clean, minimal implementation with proper structured logging of the build event.
126-196: LGTM!The implementation properly addresses all failure modes with structured fallback:
- Memory backend absence → immediate fallback
- Retrieval errors → logged warning with
safe_error_description, then fallback- Empty results → debug log, then fallback
- JSON parse errors → warning log, then fallback
- Validation errors via
parse_typed()→ warning log, then fallbackThe
except builtins.MemoryError, RecursionError: raisepattern follows project convention for system-error propagation. Immutable update viamodel_copy(update=overrides)is correct.
199-237: LGTM!Provider chaining logic is robust:
- Non-empty providers validation in constructor
- Ordered iteration with first-success return
- System errors (
MemoryError,RecursionError) re-raised immediately- Provider failures logged with index and name for debugging
- Final
RuntimeErrorwith cause chain for the "should not happen" edge case
240-287: LGTM!The async factory correctly wires provider selection with proper
memory_backendpropagation. The scaffolding comment at lines 269-275 clarifies theCOMPOSITEbranch design intent — wrapping a singleMemoryContextProvidertoday enables one-line provider additions in the future without control-flow changes.
…ding Inline f-string interpolation of ApprovalStatus.EXPIRED.value / ApprovalStatus.PENDING.value in expire_if_pending UPDATE statements is replaced with bound parameters in both SQLite (?) and Postgres (%s) backends. The id-list interpolation remains under noqa: S608 because it is a fixed sequence of placeholders, not user data.
Tightens 10 Any-typed Pydantic fields per audit:
- ModelMappingConfig.seniority_model_map: dict[str, str]. The Mapping
input branch in the validator becomes dead code; Pydantic v2 already
coerces any Mapping-like input to dict during validation.
- CoordinationMiddlewareContext.{decomposition_result, routing_result,
dispatch_result, status_rollup}: their concrete types
(DecompositionResult, RoutingResult, DispatchResult,
SubtaskStatusRollup). ProgressLedgerMiddleware now reads the
canonical SubtaskStatusRollup.completed field directly instead of
getattr-with-default-zero, which always returned zero on real
rollups.
- Meta domain payload fields (RollbackOperation.previous_value,
ConfigChange.{old,new}_value, ArchitectureChange.payload,
RuleMatch.signal_context): JsonValue / dict[str, JsonValue].
The hierarchical retrieval router and retry-evaluator at src/synthorg/memory/retrieval/hierarchical/supervisor.py made LLM calls without a CompletionConfig, so sampling temperature fell back to provider defaults (typically non-zero). Routing decisions therefore varied across runs for the same query. Pin a module-level _ROUTING_COMPLETION_CONFIG with temperature=0.0 and pass it on both calls so worker selection and retry evaluation are deterministic. Matches the existing canonical pattern at synthorg/memory/retrieval/reranking/llm_reranker.py.
The audit flagged 'Python 3.14 test flakes from 2026-05-05 runs' as a sub-task of #1781. Investigating the actual failed CI runs that day shows the only failure on the Test (Python 3.14) job was a deterministic environment edge case in the newly-added test_gate_path_symlink_escape_treated_as_missing, not a flake or a 3.14-specific issue. Subsequent runs pass. Document the runs inspected, root cause, already-merged isolation mitigations (#1713, #1755), and the conditions under which a future investigation should pick this up again.
Replace the silent placeholder response with real DI: add a chief_of_staff_chat slot on AppState, attach it at startup via a new build_chief_of_staff_chat helper that picks an LLM provider from the provider registry when meta.chief_of_staff.chat_enabled is True, and update the controller to call ChiefOfStaffChat.ask using the current SignalsService snapshot. When the chat backend or SignalsService is unavailable, the endpoint now surfaces an explicit 503 ServiceUnavailableError instead of the silent fake answer with confidence=0.0 it returned previously.
Make StrategicContextProvider.provide async across the protocol and the three implementations (Config / Memory / Composite). MemoryContextProvider now accepts an optional MemoryBackend and queries it at provide-time for SEMANTIC entries on the synthetic system:strategy agent tagged strategic-context, parsing the entry content as a JSON object of overridable fields (maturity_stage / industry / competitive_position). Layered overrides are applied on top of the fallback context. Missing backend, retrieval errors, malformed JSON, and non-string overrides all degrade gracefully to the fallback. build_context() takes an optional memory_backend kwarg and is now async. The function had no production callers; only tests use it, and they have been updated. Add STRATEGY_CONTEXT_MEMORY_QUERIED observability event.
Reuse the existing tests/evals/prompt/_harness.py primitives to add contract, fingerprint, and call-site coverage for surfaces the audit flagged as unevalued: - supervisor_router_prompt: temperature=0.0 pinned config check, pinned fingerprints for the routing and retry system prompts, and a call- site assertion that both provider.complete() invocations pass the pinned _ROUTING_COMPLETION_CONFIG. - health_judge_behaviour: heuristic HealthJudge has no LLM prompt, so the eval is a labelled set of (termination_reason, has_recovery, signals) inputs grading the resulting EscalationCause at 100% accuracy, plus severity tier checks at threshold and 2x threshold. - code_modification_prompt: temperature stays config-driven, the _SYSTEM_PROMPT fingerprint is pinned, and the prompt carries the untrusted-content directive. - procedural_memory_prompt: same shape as code_modification, plus a call-site check that propose() passes the pre-built self._completion_config rather than a fresh inline construction. LLM grader coverage already lives in tests/evals/prompt/test_rubric_grader_prompt.py.
Followup to the JsonValue tightening: update the meta proposal-builder strategy methods (config_tuning / architecture / prompt_tuning) to take ctx as dict[str, JsonValue] instead of dict[str, object] so the value flowing in from RuleMatch.signal_context type-checks. Update affected test helpers to match. Regenerate scripts/mock_spec_baseline.txt to account for the line-number shifts introduced by the new 'from pydantic import JsonValue' import in two test files (and to drop stale entries for tests that now use spec= on their MagicMocks).
The pre-push magic-numbers gate flags pre-existing safe defaults in any file touched by this PR. Annotate the three call sites (SupervisorRouter.max_workers_per_query, SupervisorRouter.max_retry_count, PostgresApprovalRepository.list_items.limit, SqliteApprovalRepository. list_items.limit) with lint-allow comments rather than migrate them to settings/definitions/, which is out of scope for this cleanup PR.
…ootstrap interval Same pre-push gate as the previous commit, this round flagging pre-existing defaults in api/app.py:_DEFAULT_TIMEOUT_CHECK_INTERVAL_SECONDS and the four list_* paginated controllers in api/controllers/meta.py (default page size 50). Annotate with lint-allow rather than migrate to settings/definitions/, which is out of scope for this cleanup PR.
…ff chat Two pre-PR review fixes: - chief_of_staff/chat.py:253: replace logger.exception(COS_CHAT_FAILED) with logger.error(..., error_type=type(exc).__name__, error=safe_error_description(exc)) so LiteLLM provider exception strings (which can carry api_key / auth_token data) are no longer written verbatim into the structured log via exc_info=True. - api/controllers/meta.py:329-345: read app_state.chief_of_staff_chat and app_state.signals_service into local variables once before the None check so the typed accessors aren't called twice across the ServiceUnavailableError gate.
- app.py: guard _wire_chief_of_staff_chat with has_chief_of_staff_chat for lifespan-restart idempotence - engine/strategy/context.py: comment ContextSource.COMPOSITE scaffolding intent - hr/promotion/config.py: annotate seniority_model_map as Mapping[str, str] to match MappingProxyType runtime wrap Skipped 2 gemini-code-assist findings (context.py:121, 201) -- claimed comma-separated except is a SyntaxError, but project targets Python 3.14 with PEP 758 unparenthesized multi-exception clauses (no 'as' binding). Per project CLAUDE.md and CodeRabbit's repo config: 'except A, B:' is correct and mandatory.
- meta.py: log META_CHAT_DEPENDENCY_UNAVAILABLE warning before raising 503 from /meta/chat (chat backend and signals service guards). - engine/strategy/context.py: drop redundant STRATEGY_CONTEXT_BUILT log on no-backend branch (fallback emits its own); validate memory payload via _StrategicContextOverridesArgs Pydantic model with strict and extra=ignore (typed boundary, kept in-module to avoid an api into engine layer crossing). - observability/events/meta.py: add META_CHAT_DEPENDENCY_UNAVAILABLE event constant. - app_builders.py: hoist CostTracker, ChiefOfStaffChat, ChiefOfStaffConfig, and ProviderRegistry imports out of TYPE_CHECKING for PEP 649 runtime resolution (matches app_helpers.py precedent). - test_code_modification_prompt.py: AST match for CompletionConfig kwargs instead of substring contains. - test_supervisor_router_prompt.py: AST walk for complete kwarg call sites instead of source.count. - test_context.py: rename to test_falls_back_when_any_field_is_blank_or_non_string for whole-payload-rejection contract; add test_ignores_unknown_payload_fields and test_composite_source_with_backend_applies_overrides. Skipped 1 coderabbit finding (test_appliers.py:815-828, add tuple coverage for tool_access) as factually wrong: ArchitectureChange.payload is JSON-typed and the upstream Pydantic validator rejects tuples with 'input was not a valid JSON value' before reaching the list-or-tuple branch in _validate_tool_access. Tuple inputs are unreachable in practice; the defensive isinstance check is dead code.
- app_builders.py: revert TYPE_CHECKING hoist from round 2 (the hoist created a circular import 'cannot import name CostRecord from partially initialized module synthorg.budget.cost_record' that broke OpenAPI export, Build Web Assets, Build Backend, Lighthouse, Build Preview, and the CI Pass umbrella check); document why the imports stay lazy. - observability/events/meta.py: correct META_CHAT_DEPENDENCY_UNAVAILABLE rationale (AppState._require_service raises ServiceUnavailableError 503, not generic 500). - api/controllers/meta.py: set status_code=200 on POST /chat (query-only endpoint, no resource creation). - api/state_services_facades_mcp3.py: correct mixin docstring from five to six services (chief_of_staff_chat). - engine/strategy/context.py: route memory payload validation through synthorg.api.boundary.parse_typed under the memory.strategic_context boundary label, so failures emit API_BOUNDARY_VALIDATION_FAILED alongside the provider's own STRATEGY_CONTEXT_PROVIDER_FAILED log. - tests/unit/api/controllers/test_meta_chat.py: assert chat_mock.ask was awaited with a ChatQuery carrying question and proposal_id and alert_id, and that the snapshot is the one signals_mock returned; update status_code expectation to 200. - tests/evals/prompt/test_procedural_memory_prompt.py: AST hardening for both CompletionConfig kwargs and the propose call site (matches the pattern used in test_code_modification_prompt.py and test_supervisor_router_prompt.py); textwrap.dedent the method source before ast.parse.
test_procedural_memory_prompt.py AST hardening: search all CompletionConfig calls (not just config_calls[0]) for one carrying both temperature=config.temperature and max_tokens=config.max_tokens; restrict provider.complete matching to nodes whose receiver is provider/self.provider/self._provider so an unrelated .complete() call can't satisfy the assertion.
bb449a4 to
2710034
Compare
<!-- HIGHLIGHTS_START --> ## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - Improved error logging and Prometheus instrumentation provide better system monitoring. - Eliminated race conditions in CI tagging for more reliable development releases. - Fixed critical configuration access and kill-switch bugs to enhance system stability. - Enhanced client experience with retry-after headers and better websocket reconnect behavior. ### What's new - Introduced composite indexes and cursor pagination for faster data queries. - Added server-sent events rate limiting and Ollama input sanitization for improved security. ### Under the hood - Centralized workflow error mappings to standardize error handling. - Refactored API lifecycle fallback to use a configuration snapshot for consistency. - Tightened startup settings baseline and reduced controller error baseline to zero. - Replaced flaky contributor-assistant GitHub action with a custom stable step. - Consolidated Renovate dependency groups to avoid update conflicts. - Upgraded in-toto-golang dependency to fix security vulnerabilities and dropped unnecessary CVE waivers. - Extensive lock file maintenance and multiple infrastructure and Python dependency updates. <!-- HIGHLIGHTS_END --> :robot: I have created a release *beep* *boop* --- ## [0.8.2](v0.8.1...v0.8.2) (2026-05-10) ### Features * close audit gaps in error logging and Prometheus instrumentation ([#1821](#1821)) ([ef00fdc](ef00fdc)) ### Bug Fixes * **ci:** eliminate dev-release tag-vs-downstream race + CI hygiene audit ([#1827](#1827)) ([b7b9a59](b7b9a59)) * **config:** close 6 settings reachability + kill-switch gaps ([#1798](#1798)) ([410cb3b](410cb3b)) * correctness / safety fixes from 2026-05-05 audit (Wave 28) ([#1823](#1823)) ([d01e624](d01e624)) ### Performance * composite indexes + cursor pagination + clock seam + SSE rate-limit + Ollama sanitization + retry-after web client + WS reconnect jitter ([#1822](#1822)) ([d1faf86](d1faf86)) ### Refactoring * **api:** move activities lifecycle-cap fallback to ApiBridgeConfig snapshot ([#1840](#1840)) ([7a56e9c](7a56e9c)) * centralise workflow error mapping and shared error codes ([#1778](#1778) sub-tasks A + E) ([#1843](#1843)) ([11132cd](11132cd)) * drive controller-error baseline to zero ([#1778](#1778) sub-task A tail) ([#1846](#1846)) ([e96ae20](e96ae20)) * slim CLAUDE.md, port pr-review-toolkit agents, sync .opencode parity ([#1833](#1833)) ([e6372b8](e6372b8)) * tighten settings → startup-trace baseline (8 → 0) ([#1847](#1847)) ([3376ee2](3376ee2)) ### Documentation * fix CLAUDE.md inaccuracies and drop drift-prone counts ([#1844](#1844)) ([371925f](371925f)) ### Tests * replace test placeholders with real subsystem wiring ([#1845](#1845)) ([ddbb666](ddbb666)) ### CI/CD * **cla:** replace flaky contributor-assistant action with custom read-path step ([#1819](#1819)) ([11aeafe](11aeafe)) * tidy dev-release notes + stagger renovate lockfile day ([#1824](#1824)) ([ec746a9](ec746a9)) ### Maintenance * cleanup roundup, sub-tasks a/c/d/g/h/j/l/m of [#1781](#1781) ([#1838](#1838)) ([099b871](099b871)) * close remaining 5 sub-tasks of [#1781](#1781) (b/e/f/i/k) ([#1852](#1852)) ([59cf0b2](59cf0b2)) * collapse Renovate dep groups into Python / Web / Infrastructure to remove cross-PR overlap ([#1813](#1813)) ([4cbd857](4cbd857)) * **deps,security:** bump in-toto-golang v0.11.0 + drop two patched CVE waivers ([#1851](#1851)) ([0b8b5bb](0b8b5bb)) * disable Renovate vulnerabilityAlerts so security flows into normal updates ([#1834](#1834)) ([6b7d15f](6b7d15f)) * Lock file maintenance ([#1820](#1820)) ([ccbad73](ccbad73)) * Lock file maintenance ([#1842](#1842)) ([13b68a5](13b68a5)) * Lock file maintenance ([#1853](#1853)) ([db6650b](db6650b)) * Update dhi.io/nats:2.14-debian13 Docker digest to eb768bf ([#1841](#1841)) ([37f84fc](37f84fc)) * Update Infrastructure dependencies ([#1815](#1815)) ([75b12fe](75b12fe)) * Update Infrastructure dependencies ([#1831](#1831)) ([3f3c50b](3f3c50b)) * Update Python dependencies ([#1817](#1817)) ([e11332f](e11332f)) * Update Python dependencies ([#1832](#1832)) ([4515c8e](4515c8e)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Closes #1781.
Closes the final five sub-tasks of the cleanup umbrella; PR #1838 already
landed a/c/d/g/h/j/l/m. After this merges, #1781 has no remaining work.
Sub-tasks
-b — Prompt eval coverage + supervisor router temperature pin
_ROUTING_COMPLETION_CONFIG = CompletionConfig(temperature=0.0)atmodule level on the hierarchical retrieval supervisor and pass it on
both
complete()calls (routeandevaluate_for_retry). Withoutthe pin, routing was non-deterministic across CI shards.
tests/evals/prompt/covering thesurfaces the audit flagged:
test_supervisor_router_prompt.py(temperature, fingerprints, call sites)test_health_judge_behaviour.py(8 labelled inputs graded at 100% accuracy)test_code_modification_prompt.py(config-driven temperature, fingerprint)test_procedural_memory_prompt.py(config-driven temperature, fingerprint, call site)-e — Py3.14 CI flake investigation
The audit flagged "Python 3.14 test flakes from 2026-05-05 runs". The
single failed Test (Python 3.14) job that day was a deterministic
symlink-escape edge case in a newly-added gate test, not a flake.
Documented in
docs/reference/py314-flake-investigation-2026-05.mdalong with already-merged related mitigations (#1713, #1755).
-f — Pydantic
Any->JsonValue/ concrete unions (10 sites)ModelMappingConfig.seniority_model_map->dict[str, str]and theMapping-input branch becomes dead code (Pydantic v2 already coerces
any Mapping-like input).
CoordinationMiddlewareContext.{decomposition,routing,dispatch}_result/
status_rollup-> the concreteDecompositionResult/RoutingResult/DispatchResult/SubtaskStatusRolluptypes.Surfaced a real production bug:
ProgressLedgerMiddlewarewasreading
getattr(rollup, "completed_count", 0)which alwaysreturned 0 on real rollups; switched to
rollup.completed.RollbackOperation.previous_value,ConfigChange.{old,new}_value,ArchitectureChange.payload,RuleMatch.signal_context) ->JsonValue/dict[str, JsonValue].Propagated the narrowing through the meta strategies
(
config_tuning/architecture/prompt_tuning) so theirinternal
_propose_*methods acceptdict[str, JsonValue].-i — SQL parameterize approval-status enums
Replace inline f-string interpolation of
ApprovalStatus.EXPIRED.valueand
ApprovalStatus.PENDING.valueinexpire_if_pendingUPDATEstatements with bound parameters; SQLite (
?) and Postgres (%s)backends both updated. The remaining id-list interpolation is a
fixed
?,?,?count derived fromlen(ids), retained under noqa: S608.-k — Wire placeholder stubs
meta.chat -> ChiefOfStaffChat: replace the silent placeholder
response with real DI: new
_chief_of_staff_chatslot on AppState,new
build_chief_of_staff_chat()factory inapp_builders.pythatpicks an LLM provider from the registry when
meta.chief_of_staff.chat_enabledis True, async startup task thatconstructs the backend, and a controller path that calls
ChiefOfStaffChat.ask(query, snapshot)using the liveSignalsServicesnapshot. When the backend or signals service isunavailable, the endpoint surfaces an explicit 503
ServiceUnavailableError instead of the silent fake answer with
confidence=0.0 it returned previously.
MemoryContextProvider Phase 2: refactor
StrategicContextProvider.provideto be
asyncacross the protocol and all three implementations(Config / Memory / Composite). MemoryContextProvider now accepts an
optional
MemoryBackendand queries it at provide-time forSEMANTICentries on the syntheticsystem:strategyagent taggedstrategic-context, parsing the entry content as a JSON object ofoverridable fields and layering the overrides on top of the fallback
context. Missing backend, retrieval errors, malformed JSON, and
non-string overrides all degrade gracefully to the fallback.
build_context()takes an optionalmemory_backendkwarg.Pre-PR review
Four agents (python-reviewer, security-reviewer, persistence-reviewer,
issue-resolution-verifier) ran in parallel; all five sub-tasks
verified RESOLVED. One valid SEC-1 finding addressed in this branch:
chief_of_staff/chat.pywas usinglogger.exception(...)(whichimplicitly sets
exc_info=True) on the LLM error path, which canleak provider credentials carried in exception strings; replaced
with
logger.error(..., error_type=..., error=safe_error_description(exc)).Same commit also tightened the
meta.chatcontroller'shas_*/*_serviceaccessor pairs to read into local variables once.Test plan
uv run ruff check src/ tests/— passesuv run mypy src/ tests/— passesuv run python -m pytest tests/unit/— passesuv run python -m pytest tests/conformance/persistence/ -k "approval and expire"— 18 passeduv run python -m pytest tests/evals/prompt/— all new + existing eval suites passuv run python -m pytest tests/unit/api/controllers/test_meta_chat.py— 3 new tests passuv run python -m pytest tests/unit/engine/strategy/test_context.py— 18 tests (3 prior + 15 new for Phase 2 wiring) pass