feat: deterministic recorded-LLM cassette replay at the provider chokepoint#2010
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI (base), Organization UI (inherited) Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
🧰 Additional context used📓 Path-based instructions (3)src/synthorg/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
⚙️ CodeRabbit configuration file
Files:
src/synthorg/providers/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (1)📚 Learning: 2026-05-05T09:04:46.195ZApplied to files:
🔇 Additional comments (7)
WalkthroughThis PR implements a provider-layer cassette record/replay system: adds CassetteMode/CassetteConfig and settings, deterministic request keying (SHA‑256), cassette on-disk schema and CassetteSession with per-task FIFO lanes and atomic writes, pluggable redaction, cassette-specific errors and error reconstruction, CassetteCompletionProvider (complete/stream/capabilities) and ProviderRegistry wiring (shared session, skip inner drivers in REPLAY), observability events, a pre-tool-use pytest gating script, unit tests, and e2e acceptance tests validating byte-identical replay with zero live calls. 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Merging this PR will not alter performance
Comparing Footnotes
|
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
tests/e2e/test_cassette_replay_e2e.py (1)
232-250:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStrengthen replay-stability assertions to prevent false positives.
test_second_replay_is_stablecurrently passes if both replays fail in the same way (for example, bothcompletion_summaryvalues areNone). Add per-run success/termination/artifact checks before comparing summaries.Suggested patch
summaries: list[str | None] = [] for i in range(2): ws = tmp_path / f"replay{i}" ws.mkdir() result = await _run_task( @@ provider_name=_PROVIDER, ), ws, ) + assert result.is_success is True + assert result.termination_reason == TerminationReason.COMPLETED + assert (ws / "output.txt").read_text(encoding="utf-8") == "Hello cassette" + assert result.completion_summary is not None summaries.append(result.completion_summary) assert summaries[0] == summaries[1]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/test_cassette_replay_e2e.py` around lines 232 - 250, In test_second_replay_is_stable, don't just compare completion_summary values because identical failures can mask instability; for each run (the result returned from _run_task using CassetteCompletionProvider/CassetteSession) assert the run produced a real successful completion (e.g. check result.success is True or result.termination_reason is None) and/or that required artifacts exist (e.g. result.artifacts or result.completion is not None) before appending/comparing completion_summary; only then assert summaries[0] == summaries[1] to ensure stability rather than equal failures.src/synthorg/providers/registry.py (2)
265-281:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRe-raise system exceptions and use structured error logging instead of
logger.exception().The
_build_driverhelper currently wraps all exceptions, includingMemoryErrorandRecursionError, which should propagate unmodified per project convention. Additionally,logger.exception()implicitly addsexc_info=True, violating the sanitized logging pattern.Re-raise system exceptions before the general handler, then use
logger.error()with structured attributes:Proposed fix
try: driver = factory(name, config) # type: ignore[operator] + except MemoryError, RecursionError: + raise except Exception as exc: msg = f"Failed to instantiate driver {driver_type!r} for provider {name!r}" - logger.exception( + logger.error( PROVIDER_DRIVER_FACTORY_MISSING, provider=name, driver=driver_type, + error_type=type(exc).__name__, + error=safe_error_description(exc), ) raise DriverFactoryNotFoundError( msg, context={ "provider": name,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/providers/registry.py` around lines 265 - 281, In _build_driver, do not catch and wrap system exceptions: detect and re-raise BaseException-derived critical errors (e.g., MemoryError, RecursionError, KeyboardInterrupt, SystemExit) before the general except block so they propagate unchanged; in the remaining exception handler, replace logger.exception(...) with logger.error(...) and pass the structured attributes (provider=name, driver=driver_type) and the safe_error_description(exc) in the error context/message, then raise DriverFactoryNotFoundError with the same context payload (provider, driver, detail) using the caught exception as the cause; refer to the variables and symbols factory, name, config, driver_type, safe_error_description, and DriverFactoryNotFoundError to locate and update the code.
166-183:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMove concrete driver imports into the non-cassette path to avoid optional SDK failures in replay mode.
In replay mode (
cassette.is_active = True),_build_cassette_registryreceivesinner=Noneand never invokes the driver factory. However,LiteLLMDriverandScriptedDriverare imported unconditionally at lines 166–169, before the cassette check at line 177. Sincelitellm_driver.pyimportslitellm(an optional dependency), a replay run without the litellm SDK installed will fail on import even though the replay path doesn't need any driver factory. This breaks the stated pure-replay contract: "In replay mode the inner driver is not built at all: no factory is called."Defer the imports and defaults dict to the non-cassette path (after the cassette early return).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/providers/registry.py` around lines 166 - 183, The unconditional imports of LiteLLMDriver and ScriptedDriver (and construction of the defaults dict) happen before the cassette early-return, causing import-time failures in replay mode; move those imports and the defaults: dict creation so they occur after the if cassette is not None and cassette.is_active return (i.e., only when not taking the _build_cassette_registry path). Concretely, relocate the from .drivers.litellm_driver import LiteLLMDriver and from .drivers.scripted import ScriptedDriver lines and the defaults assignment so they are executed after the cassette check (so _build_cassette_registry is returned without importing drivers) while leaving factory_overrides and cassette.is_active logic, and references to defaults, unchanged for the non-cassette path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/check_no_unapproved_e2e_tests.sh`:
- Around line 87-89: The ALLOW_E2E_TESTS approval is currently checked against
the entire flattened command ($FLAT) which can allow a token from a different
segment to unblock pytest; update the logic so that when you detect the pytest
invocation segment (the code that finds the pytest command/segment), you instead
search that specific segment string for the token ALLOW_E2E_TESTS=1 (e.g. split
FLAT into segments using the same delimiter logic used to find pytest, capture
the matched segment variable, then run grep -qE
'(^|[[:space:]])ALLOW_E2E_TESTS=1([[:space:]]|$)' against that segment rather
than $FLAT) so approval is bound to the exact pytest segment.
In `@src/synthorg/providers/cassette/provider.py`:
- Around line 217-223: The current stream() wrapper buffers the entire live
stream before yielding; change it to forward chunks as they arrive while
simultaneously recording them: in stream() iterate the inner async generator,
for each yielded chunk immediately yield it to the caller and append a
serialized chunk entry to the cassette recording list, and on normal completion
append a terminal "end" marker. Also catch any ProviderError (or other
exceptions) raised during iteration, append a cassette "error" entry containing
the error type/message/traceback so replays can re-raise after emitting prior
chunks, then re-raise the error; update the replay path to recognize the "error"
cassette entry and raise the recorded error after streaming prior chunks. Apply
these changes to the streaming wrapper(s) named stream and the recording logic
referenced in the same file (also the block around lines 255-288) so recording
is incremental and terminal errors are faithfully represented.
In `@src/synthorg/providers/cassette/store.py`:
- Around line 3-7: Edit the module docstring that begins "A cassette is a single
canonical JSON document..." and remove the issue back-reference "`#1984`" while
preserving the explanatory rationale about filesystem-only design; ensure the
sentence is rewired to read smoothly without the issue tag (keep wording about
being filesystem-only and why a DB table would be undesirable) and do not add
any reviewer or issue citations.
- Around line 70-80: Add the original ProviderError.context into the cassette
schema and plumbing so replayed exceptions retain their payload: extend the
CassetteRecordedError model to include a context: dict[str, Any] =
Field(default_factory=dict, description="Recorded ProviderError.context")
(keeping frozen/extra rules), update CassetteOutcome.from_error(...) to capture
and store error.context when constructing the recorded error, modify the
recording sites in the cassette provider (the functions that create
CassetteOutcome instances) to pass through the provider error context, and
update provider_error_for(...) to rehydrate the reconstructed ProviderError with
the recorded context so exceptions raised on replay include the original
context.
---
Outside diff comments:
In `@src/synthorg/providers/registry.py`:
- Around line 265-281: In _build_driver, do not catch and wrap system
exceptions: detect and re-raise BaseException-derived critical errors (e.g.,
MemoryError, RecursionError, KeyboardInterrupt, SystemExit) before the general
except block so they propagate unchanged; in the remaining exception handler,
replace logger.exception(...) with logger.error(...) and pass the structured
attributes (provider=name, driver=driver_type) and the
safe_error_description(exc) in the error context/message, then raise
DriverFactoryNotFoundError with the same context payload (provider, driver,
detail) using the caught exception as the cause; refer to the variables and
symbols factory, name, config, driver_type, safe_error_description, and
DriverFactoryNotFoundError to locate and update the code.
- Around line 166-183: The unconditional imports of LiteLLMDriver and
ScriptedDriver (and construction of the defaults dict) happen before the
cassette early-return, causing import-time failures in replay mode; move those
imports and the defaults: dict creation so they occur after the if cassette is
not None and cassette.is_active return (i.e., only when not taking the
_build_cassette_registry path). Concretely, relocate the from
.drivers.litellm_driver import LiteLLMDriver and from .drivers.scripted import
ScriptedDriver lines and the defaults assignment so they are executed after the
cassette check (so _build_cassette_registry is returned without importing
drivers) while leaving factory_overrides and cassette.is_active logic, and
references to defaults, unchanged for the non-cassette path.
In `@tests/e2e/test_cassette_replay_e2e.py`:
- Around line 232-250: In test_second_replay_is_stable, don't just compare
completion_summary values because identical failures can mask instability; for
each run (the result returned from _run_task using
CassetteCompletionProvider/CassetteSession) assert the run produced a real
successful completion (e.g. check result.success is True or
result.termination_reason is None) and/or that required artifacts exist (e.g.
result.artifacts or result.completion is not None) before appending/comparing
completion_summary; only then assert summaries[0] == summaries[1] to ensure
stability rather than equal failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0df0b595-e47f-428c-befb-1058138b26c9
📒 Files selected for processing (23)
.claude/settings.json.claude/skills/pre-pr-review/SKILL.mddocs/design/providers.mdscripts/check_no_unapproved_e2e_tests.shsrc/synthorg/api/auto_wire.pysrc/synthorg/observability/events/provider.pysrc/synthorg/providers/__init__.pysrc/synthorg/providers/cassette/__init__.pysrc/synthorg/providers/cassette/errors.pysrc/synthorg/providers/cassette/keying.pysrc/synthorg/providers/cassette/mode.pysrc/synthorg/providers/cassette/provider.pysrc/synthorg/providers/cassette/redaction.pysrc/synthorg/providers/cassette/store.pysrc/synthorg/providers/registry.pysrc/synthorg/settings/definitions/providers.pytests/e2e/test_cassette_replay_e2e.pytests/unit/providers/cassette/__init__.pytests/unit/providers/cassette/test_keying.pytests/unit/providers/cassette/test_provider.pytests/unit/providers/cassette/test_redaction.pytests/unit/providers/cassette/test_registry_wiring.pytests/unit/providers/cassette/test_store.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Test E2E
- GitHub Check: Test Unit
- GitHub Check: Test Conformance (SQLite)
- GitHub Check: Dashboard Test
- GitHub Check: Test Integration
- GitHub Check: Lighthouse Site
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (9)
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Onlysrc/synthorg/persistence/may import sqlite/psycopg or emit raw SQL
Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default with read_only_post_init (Cat-2); Cat-3 bootstrap secrets are pure env at boot site. YAML is company-template ingestion format only, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
Use from synthorg.observability import get_logger; variable always named logger. Never import logging or use print() in app code. Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED after persistence write
Never log error=str(exc) or interpolate {exc}; use error_type=type(exc).name + error=safe_error_description(exc). Never use exc_info=True. OTel: forbidden span.record_exception(exc); use span.set_attribute("exception.message", safe_error_description(exc)) + record_exception=False, set_status_on_exception=False. Enforced by check_logger_exception_str_exc.py
Files:
src/synthorg/observability/events/provider.pysrc/synthorg/providers/cassette/mode.pysrc/synthorg/settings/definitions/providers.pysrc/synthorg/providers/__init__.pysrc/synthorg/providers/cassette/errors.pysrc/synthorg/api/auto_wire.pysrc/synthorg/providers/registry.pysrc/synthorg/providers/cassette/redaction.pysrc/synthorg/providers/cassette/__init__.pysrc/synthorg/providers/cassette/keying.pysrc/synthorg/providers/cassette/provider.pysrc/synthorg/providers/cassette/store.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Numerics live insettings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py
Comments should explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Do not usefrom __future__ import annotations(Python 3.14 has PEP 649). Use PEP 758 except:except A, B:requires parens when binding
Type hints required on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines
Define errors as<Domain><Condition>Errorinheriting fromDomainError; never inherit from Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py
Pydantic v2: use frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py;@computed_fieldauto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries). Use@computed_fieldfor derived fields. Use NotBlankStr for identifiers
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py
Use immutability patterns: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Use asyncio.TaskGroup for fan-out/fan-in async patterns; helpers must catch Exception (re-raise MemoryError/RecursionError)
Clock seam: includeclock: Clock | None = Noneparameter; tests inject FakeClock. Lifecycle: services own_lifecycle_lock; timed-out stops mark unrestartable
For untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML
Never use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001. Allowed in .claude/, third-party imports, providers/presets.py, we...
Files:
src/synthorg/observability/events/provider.pysrc/synthorg/providers/cassette/mode.pysrc/synthorg/settings/definitions/providers.pysrc/synthorg/providers/__init__.pysrc/synthorg/providers/cassette/errors.pysrc/synthorg/api/auto_wire.pysrc/synthorg/providers/registry.pysrc/synthorg/providers/cassette/redaction.pysrc/synthorg/providers/cassette/__init__.pysrc/synthorg/providers/cassette/keying.pysrc/synthorg/providers/cassette/provider.pysrc/synthorg/providers/cassette/store.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/events/provider.pysrc/synthorg/providers/cassette/mode.pysrc/synthorg/settings/definitions/providers.pysrc/synthorg/providers/__init__.pysrc/synthorg/providers/cassette/errors.pysrc/synthorg/api/auto_wire.pysrc/synthorg/providers/registry.pysrc/synthorg/providers/cassette/redaction.pysrc/synthorg/providers/cassette/__init__.pysrc/synthorg/providers/cassette/keying.pysrc/synthorg/providers/cassette/provider.pysrc/synthorg/providers/cassette/store.py
src/synthorg/observability/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Telemetry: opt-in, off by default. Every event property must be in _ALLOWED_PROPERTIES. See telemetry.md
Files:
src/synthorg/observability/events/provider.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers:@pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back
Test doubles: use FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags. Never use bare MagicMock at typed boundary (constructor/fn arg/annotated local/typed fixture return); blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline). Import FakeClock and mock_of from tests._shared
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large)
Files:
tests/unit/providers/cassette/__init__.pytests/e2e/test_cassette_replay_e2e.pytests/unit/providers/cassette/test_keying.pytests/unit/providers/cassette/test_registry_wiring.pytests/unit/providers/cassette/test_redaction.pytests/unit/providers/cassette/test_store.pytests/unit/providers/cassette/test_provider.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/providers/cassette/__init__.pytests/e2e/test_cassette_replay_e2e.pytests/unit/providers/cassette/test_keying.pytests/unit/providers/cassette/test_registry_wiring.pytests/unit/providers/cassette/test_redaction.pytests/unit/providers/cassette/test_store.pytests/unit/providers/cassette/test_provider.py
src/synthorg/providers/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Provider calls go through BaseCompletionProvider (retry + rate limit); never implement retry in driver subclasses. Retryable: RateLimitError, Provider{Timeout,Connection,Internal}Error. WebSocket: per-frame timeout closes silent peers (1008); revalidation saturation closes (4011)
Files:
src/synthorg/providers/cassette/mode.pysrc/synthorg/providers/__init__.pysrc/synthorg/providers/cassette/errors.pysrc/synthorg/providers/registry.pysrc/synthorg/providers/cassette/redaction.pysrc/synthorg/providers/cassette/__init__.pysrc/synthorg/providers/cassette/keying.pysrc/synthorg/providers/cassette/provider.pysrc/synthorg/providers/cassette/store.py
{README.md,docs/**/*.md,data/**/*.md}
📄 CodeRabbit inference engine (CLAUDE.md)
Numerics in README and public docs must be sourced from data/runtime_stats.yaml via markers
Files:
docs/design/providers.md
docs/**/*.{md,d2}
📄 CodeRabbit inference engine (CLAUDE.md)
Use d2 for architecture/nested containers, mermaid for flowcharts/sequence/pipelines. Markdown tables for tabular data. D2 theme 200 (Dark Mauve), D2 CLI pinned to v0.7.1 in CI
Files:
docs/design/providers.md
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
API startup lifecycle: construction phase wires synchronous services; on_startup wires services needing connected persistence. Construction-phase invariants: agent_registry before auto_wire_meetings; tunnel_provider wired unconditionally. On-startup invariants: SettingsService auto-wire before WorkflowExecutionObserver registration; OntologyService wires after persistence.connect() via _wire_ontology_service
Files:
src/synthorg/api/auto_wire.py
src/synthorg/{workers,api}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Runtime services: synthorg.workers.runtime_builder.build_runtime_services selects behind ONE provider-present switch, returning RuntimeServices (AgentEngineExecutionService + coordinator or NoProviderExecutionService + None). _install_runtime_services appends FIRST after persistence/SettingsService hooks; swap_worker_execution_service / swap_coordinator / swap_provider_registry hold a lock. Empty-company rejects task creation (AgentRuntimeNotConfiguredError 4014) and /coordinate 503s
Files:
src/synthorg/api/auto_wire.py
🧠 Learnings (7)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
src/synthorg/observability/events/provider.pytests/unit/providers/cassette/__init__.pysrc/synthorg/providers/cassette/mode.pysrc/synthorg/settings/definitions/providers.pysrc/synthorg/providers/__init__.pysrc/synthorg/providers/cassette/errors.pysrc/synthorg/api/auto_wire.pysrc/synthorg/providers/registry.pysrc/synthorg/providers/cassette/redaction.pytests/e2e/test_cassette_replay_e2e.pysrc/synthorg/providers/cassette/__init__.pytests/unit/providers/cassette/test_keying.pysrc/synthorg/providers/cassette/keying.pytests/unit/providers/cassette/test_registry_wiring.pytests/unit/providers/cassette/test_redaction.pytests/unit/providers/cassette/test_store.pysrc/synthorg/providers/cassette/provider.pytests/unit/providers/cassette/test_provider.pysrc/synthorg/providers/cassette/store.py
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.
Applied to files:
docs/design/providers.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).
Applied to files:
docs/design/providers.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.
Applied to files:
docs/design/providers.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.
Applied to files:
docs/design/providers.md
📚 Learning: 2026-05-17T11:45:11.839Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In SynthOrg (Aureliolo/synthorg) pre-alpha, apply the strict no-backward-compat policy: any setting-key rename must be fully completed in the same change/PR with all repo callers updated, and you should not keep legacy aliases or compatibility fallbacks. When reviewing, do not flag a setting-key rename as a breaking upgrade hazard if the rename is repo-wide and fully implemented within the same PR.
Applied to files:
src/synthorg/settings/definitions/providers.py
📚 Learning: 2026-05-17T11:45:11.839Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In this repository, SynthOrg is pre-alpha and uses a strict no-backward-compat policy for setting-key renames. When reviewing code under src/synthorg/settings, do NOT flag a setting-key rename as an “upgrade-safety” issue if the rename is complete/atomic in the same PR: all callers/usages of the old key are updated simultaneously, and the PR does not keep any legacy aliases, compatibility fallbacks, or migration/rollback paths for the old key.
Applied to files:
src/synthorg/settings/definitions/providers.py
🪛 LanguageTool
docs/design/providers.md
[grammar] ~107-~107: Ensure spelling is correct
Context: ...This is stable across record and replay iff the first-call order of distinct tasks ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 OpenGrep (1.21.0)
tests/unit/providers/cassette/test_redaction.py
[WARNING] 37-37: Hardcoded AWS access key detected. Use environment variables or a secrets manager instead.
(coderabbit.secrets.aws-access-key)
[WARNING] 38-38: Hardcoded AWS access key detected. Use environment variables or a secrets manager instead.
(coderabbit.secrets.aws-access-key)
🔇 Additional comments (11)
tests/unit/providers/cassette/__init__.py (1)
1-1: LGTM!tests/unit/providers/cassette/test_keying.py (1)
1-296: LGTM!tests/unit/providers/cassette/test_provider.py (1)
1-355: LGTM!tests/unit/providers/cassette/test_redaction.py (1)
1-105: LGTM!tests/unit/providers/cassette/test_registry_wiring.py (1)
1-162: LGTM!tests/unit/providers/cassette/test_store.py (1)
1-345: LGTM!.claude/settings.json (1)
76-80: LGTM!docs/design/providers.md (1)
101-112: LGTM!src/synthorg/providers/__init__.py (1)
9-22: LGTM!Also applies to: 90-90, 101-109, 132-133
src/synthorg/providers/cassette/__init__.py (1)
1-60: LGTM!src/synthorg/api/auto_wire.py (1)
56-56: LGTM!Also applies to: 202-241
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request implements a deterministic record/replay system for LLM provider interactions, enabling agent runs to be re-executed byte-identically without incurring real LLM costs. The solution establishes a provider-layer seam that manages request keying, FIFO lane management for concurrent tasks, and atomic persistence. Additionally, it hardens the test infrastructure by introducing a new pre-tool-use hook to enforce sanctioned test execution patterns. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2010 +/- ##
==========================================
+ Coverage 85.03% 85.05% +0.02%
==========================================
Files 1888 1895 +7
Lines 111930 112390 +460
Branches 9549 9591 +42
==========================================
+ Hits 95181 95598 +417
- Misses 14421 14453 +32
- Partials 2328 2339 +11 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Code Review
This pull request implements a deterministic record/replay 'cassette' seam for LLM providers, facilitating byte-identical re-execution without real LLM calls. The implementation covers provider wrapping, SHA-256 request keying, atomic storage with FIFO lanes, and secret redaction, alongside a new pre-PR hook to restrict unapproved e2e test execution. Feedback focuses on enhancing the redaction regex to support secrets containing spaces, optimizing sequence number lookups in the session store to improve performance, and offloading document serialization to a worker thread to maintain event loop responsiveness.
| r"(?i)\b(api[_-]?key|secret|password|token|authorization)\b" | ||
| r"(['\"]?\s*[:=]\s*['\"]?)" | ||
| r"((?:\\.|[^'\"\s,}\\])+)", | ||
| ), |
There was a problem hiding this comment.
The labelled secret redaction regex stops at the first whitespace character, which causes incomplete redaction for values containing spaces (e.g., in JSON payloads like "password": "my secret"). Removing whitespace from the exclusion list in the value group will improve redaction coverage while still stopping at common separators like commas, braces, or quotes.
r"((?:\\.|[^'",}\\])+)"References
- Security: secret-log redaction is a priority focus area. (link)
- Ensure that URLs containing potential credentials are redacted in all output channels, including tool execution results and log messages, to prevent sensitive data leakage.
| seq = sum( | ||
| 1 | ||
| for i in self._recorded | ||
| if i.request_hash == request_hash and i.lane == lane | ||
| ) |
There was a problem hiding this comment.
| lock = asyncio.Lock() | ||
| self._persist_lock = lock | ||
| async with lock: | ||
| payload = self._serialise() |
There was a problem hiding this comment.
Serializing the entire cassette document on the event loop can block the loop for significant periods as the number of interactions grows. Consider offloading the _serialise call (which involves Pydantic model_dump and json.dumps) to a worker thread by snapshotting the interactions list first to ensure thread-safety.
References
- Real concurrency hazards: blocking calls inside async should be avoided. (link)
- scripts/check_no_unapproved_e2e_tests.sh: bind ALLOW_E2E_TESTS=1 to the detected pytest segment (was checked against whole flattened command) - cassette/provider.py: stream() forwards chunks incrementally while recording; terminal ProviderError recorded with prior chunks and re-raised after replay re-emits them - cassette/redaction.py: redact secret-named dict keys before recursing; widen labelled-secret value group so multi-word secrets are fully scrubbed - cassette/store.py + errors.py + provider.py: record (redacted) ProviderError.context and rehydrate it on replay - cassette/store.py: O(1) per-(hash,lane) seq counter; offload serialise+write to worker thread; drop issue back-ref from module docstring - providers/registry.py: re-raise MemoryError/RecursionError; logger.error with structured error_type/error; defer driver-SDK imports so pure replay needs no optional litellm - tests/e2e/test_cassette_replay_e2e.py: assert per-run success/termination/artifact before comparing replay summaries; drop issue back-refs
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/synthorg/providers/cassette/store.py`:
- Around line 291-349: flush() currently bypasses the serialized write path and
can race with record_interaction(); change flush to use the same async locking
and offloaded snapshot write as record_interaction: make flush an async function
(or provide an async variant) that ensures _persist_lock exists (create
asyncio.Lock() if None), acquires it (async with self._persist_lock), takes
snapshot = tuple(self._recorded) and then awaits
asyncio.to_thread(self._persist_snapshot, snapshot) instead of calling
_persist()/_atomic_write() directly so all writes are serialized and run off the
event loop.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: c08c76e1-126f-4f0a-aafb-7f08e4d28394
📒 Files selected for processing (7)
scripts/check_no_unapproved_e2e_tests.shsrc/synthorg/providers/cassette/errors.pysrc/synthorg/providers/cassette/provider.pysrc/synthorg/providers/cassette/redaction.pysrc/synthorg/providers/cassette/store.pysrc/synthorg/providers/registry.pytests/e2e/test_cassette_replay_e2e.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Deploy Preview
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Test Conformance (SQLite)
- GitHub Check: Dashboard Test
- GitHub Check: Test Integration
- GitHub Check: Test E2E
- GitHub Check: Test Unit
- GitHub Check: Lighthouse Site
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Onlysrc/synthorg/persistence/may import sqlite/psycopg or emit raw SQL
Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default with read_only_post_init (Cat-2); Cat-3 bootstrap secrets are pure env at boot site. YAML is company-template ingestion format only, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
Use from synthorg.observability import get_logger; variable always named logger. Never import logging or use print() in app code. Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED after persistence write
Never log error=str(exc) or interpolate {exc}; use error_type=type(exc).name + error=safe_error_description(exc). Never use exc_info=True. OTel: forbidden span.record_exception(exc); use span.set_attribute("exception.message", safe_error_description(exc)) + record_exception=False, set_status_on_exception=False. Enforced by check_logger_exception_str_exc.py
Files:
src/synthorg/providers/cassette/errors.pysrc/synthorg/providers/registry.pysrc/synthorg/providers/cassette/provider.pysrc/synthorg/providers/cassette/redaction.pysrc/synthorg/providers/cassette/store.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Numerics live insettings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py
Comments should explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Do not usefrom __future__ import annotations(Python 3.14 has PEP 649). Use PEP 758 except:except A, B:requires parens when binding
Type hints required on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines
Define errors as<Domain><Condition>Errorinheriting fromDomainError; never inherit from Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py
Pydantic v2: use frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py;@computed_fieldauto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries). Use@computed_fieldfor derived fields. Use NotBlankStr for identifiers
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py
Use immutability patterns: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Use asyncio.TaskGroup for fan-out/fan-in async patterns; helpers must catch Exception (re-raise MemoryError/RecursionError)
Clock seam: includeclock: Clock | None = Noneparameter; tests inject FakeClock. Lifecycle: services own_lifecycle_lock; timed-out stops mark unrestartable
For untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML
Never use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001. Allowed in .claude/, third-party imports, providers/presets.py, we...
Files:
src/synthorg/providers/cassette/errors.pysrc/synthorg/providers/registry.pysrc/synthorg/providers/cassette/provider.pysrc/synthorg/providers/cassette/redaction.pysrc/synthorg/providers/cassette/store.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/providers/cassette/errors.pysrc/synthorg/providers/registry.pysrc/synthorg/providers/cassette/provider.pysrc/synthorg/providers/cassette/redaction.pysrc/synthorg/providers/cassette/store.py
src/synthorg/providers/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Provider calls go through BaseCompletionProvider (retry + rate limit); never implement retry in driver subclasses. Retryable: RateLimitError, Provider{Timeout,Connection,Internal}Error. WebSocket: per-frame timeout closes silent peers (1008); revalidation saturation closes (4011)
Files:
src/synthorg/providers/cassette/errors.pysrc/synthorg/providers/registry.pysrc/synthorg/providers/cassette/provider.pysrc/synthorg/providers/cassette/redaction.pysrc/synthorg/providers/cassette/store.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers:@pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back
Test doubles: use FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags. Never use bare MagicMock at typed boundary (constructor/fn arg/annotated local/typed fixture return); blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline). Import FakeClock and mock_of from tests._shared
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large)
Files:
tests/e2e/test_cassette_replay_e2e.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/e2e/test_cassette_replay_e2e.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
src/synthorg/providers/cassette/errors.pysrc/synthorg/providers/registry.pysrc/synthorg/providers/cassette/provider.pysrc/synthorg/providers/cassette/redaction.pysrc/synthorg/providers/cassette/store.pytests/e2e/test_cassette_replay_e2e.py
🔇 Additional comments (1)
src/synthorg/providers/cassette/provider.py (1)
424-435: ⚡ Quick winNo refactor needed; sequential execution is required by cassette design.
The override cannot safely use
asyncio.TaskGroupbecause:
Error propagation:
CassetteReplayMissErrorandCassetteReplayExhaustedErrorareExceptionsubclasses that would be caught by the base class'sexcept Exceptionhandler in its_one()helper and converted toNone, defeating the override's explicit purpose to propagate them loudly.Replay lane semantics: The cassette session assigns lanes per asyncio task. Each task in a
TaskGroupwould receive a new lane, changing the FIFO replay order from what was recorded.The sequential dict comprehension is the correct implementation for this provider.
- cassette/store.py: flush() is now async and routes through the same _persist_lock + offloaded snapshot write as record_interaction, so an end-of-run flush racing an in-flight record cannot hit os.replace concurrently (Windows WinError 5) and never blocks the event loop on serialise+I/O; removed the now-dead sync _persist() helper - updated all flush() call sites (cassette unit + e2e tests) to await; kept TestAtomicFlush sync (ruff ASYNC240) by driving record+flush via asyncio.run
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/synthorg/providers/cassette/store.py`:
- Around line 406-409: The current logic lets a pluggable _redactor.redact
replace the whole ProviderError.context mapping with a scalar, producing
{"value": ...} and breaking callers that expect the original keys; change the
code so we preserve the original mapping shape: if
_redactor.redact(dict(error.context)) returns a dict use it as-is, otherwise
perform per-entry redaction by iterating over dict(error.context).items() and
applying _redactor.redact to each value to build ctx with the same keys as
ProviderError.context (referencing _redactor.redact, ProviderError.context,
redacted, and ctx).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2452013d-9b9f-4f4e-a510-ace4a99374c5
📒 Files selected for processing (5)
src/synthorg/providers/cassette/store.pytests/e2e/test_cassette_replay_e2e.pytests/unit/providers/cassette/test_provider.pytests/unit/providers/cassette/test_registry_wiring.pytests/unit/providers/cassette/test_store.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Deploy Preview
- GitHub Check: Build Backend
- GitHub Check: Lighthouse Site
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Dashboard Test
- GitHub Check: Test E2E
- GitHub Check: Test Unit
- GitHub Check: Test Integration
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (4)
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers:@pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back
Test doubles: use FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags. Never use bare MagicMock at typed boundary (constructor/fn arg/annotated local/typed fixture return); blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline). Import FakeClock and mock_of from tests._shared
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large)
Files:
tests/unit/providers/cassette/test_registry_wiring.pytests/e2e/test_cassette_replay_e2e.pytests/unit/providers/cassette/test_provider.pytests/unit/providers/cassette/test_store.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/providers/cassette/test_registry_wiring.pytests/e2e/test_cassette_replay_e2e.pytests/unit/providers/cassette/test_provider.pytests/unit/providers/cassette/test_store.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Onlysrc/synthorg/persistence/may import sqlite/psycopg or emit raw SQL
Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default with read_only_post_init (Cat-2); Cat-3 bootstrap secrets are pure env at boot site. YAML is company-template ingestion format only, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
Use from synthorg.observability import get_logger; variable always named logger. Never import logging or use print() in app code. Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED after persistence write
Never log error=str(exc) or interpolate {exc}; use error_type=type(exc).name + error=safe_error_description(exc). Never use exc_info=True. OTel: forbidden span.record_exception(exc); use span.set_attribute("exception.message", safe_error_description(exc)) + record_exception=False, set_status_on_exception=False. Enforced by check_logger_exception_str_exc.py
Files:
src/synthorg/providers/cassette/store.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Numerics live insettings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py
Comments should explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Do not usefrom __future__ import annotations(Python 3.14 has PEP 649). Use PEP 758 except:except A, B:requires parens when binding
Type hints required on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines
Define errors as<Domain><Condition>Errorinheriting fromDomainError; never inherit from Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py
Pydantic v2: use frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py;@computed_fieldauto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries). Use@computed_fieldfor derived fields. Use NotBlankStr for identifiers
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py
Use immutability patterns: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Use asyncio.TaskGroup for fan-out/fan-in async patterns; helpers must catch Exception (re-raise MemoryError/RecursionError)
Clock seam: includeclock: Clock | None = Noneparameter; tests inject FakeClock. Lifecycle: services own_lifecycle_lock; timed-out stops mark unrestartable
For untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML
Never use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001. Allowed in .claude/, third-party imports, providers/presets.py, we...
Files:
src/synthorg/providers/cassette/store.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/providers/cassette/store.py
src/synthorg/providers/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Provider calls go through BaseCompletionProvider (retry + rate limit); never implement retry in driver subclasses. Retryable: RateLimitError, Provider{Timeout,Connection,Internal}Error. WebSocket: per-frame timeout closes silent peers (1008); revalidation saturation closes (4011)
Files:
src/synthorg/providers/cassette/store.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
tests/unit/providers/cassette/test_registry_wiring.pytests/e2e/test_cassette_replay_e2e.pytests/unit/providers/cassette/test_provider.pytests/unit/providers/cassette/test_store.pysrc/synthorg/providers/cassette/store.py
- cassette/store.py: _redact_outcome_error preserves ProviderError.context key shape: if a pluggable redactor collapses the whole-mapping redact to a scalar, fall back to per-entry redaction keyed by the original context keys instead of the opaque {"value": ...} wrapper, so replayed exceptions stay faithful for callers that branch on exc.context[...]
<!-- HIGHLIGHTS_START --> ## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - Multi-agent coordination is now active immediately on startup for smoother operation. - Governance rules are fully enforced during use, ensuring compliance at all times. - Coordination metrics update live, giving real-time insights into system activity. - Review agents are now reliably processed, preventing silent drops in tasks. - Sandbox containers can be reused for agents and tasks, speeding up execution and reducing overhead. ### What's new - Agents support online runtime with a minimal safety framework to improve stability. - Recorded LLM interactions can be deterministically replayed at the provider interface. - Distributed path validation has been enhanced for more robust data routing. - A client-simulation runtime was added for end-to-end testing of the IntakeEngine. - A new work pipeline spine architecture has been introduced to streamline task processing. ### Under the hood - Infrastructure, Python, and web dependencies have all been updated to latest versions. - Updated apko lockfiles in the CI/CD pipeline improve build consistency. <!-- HIGHLIGHTS_END --> :robot: I have created a release *beep* *boop* --- ## [0.8.6](v0.8.5...v0.8.6) (2026-05-19) ### Features * agent runtime online + minimal safety spine (runtime root) ([#2003](#2003)) ([e5eef1a](e5eef1a)), closes [#1956](#1956) * deterministic recorded-LLM cassette replay at the provider chokepoint ([#2010](#2010)) ([cabf55d](cabf55d)) * distributed path validation + hardening ([#2011](#2011)) ([a382e4a](a382e4a)), closes [#1966](#1966) * wire IntakeEngine via boot client-simulation runtime (e2e test harness) ([#2006](#2006)) ([6a9c0aa](6a9c0aa)), closes [#1961](#1961) * work pipeline spine ([#1960](#1960)) ([#2013](#2013)) ([29b64e3](29b64e3)) ### Bug Fixes * bring the multi-agent coordinator online at boot ([#2007](#2007)) ([180b38a](180b38a)), closes [#1958](#1958) * full governance enforcement online ([#1957](#1957)) ([#2005](#2005)) ([4140fc5](4140fc5)) * harden anti-ghost-wiring gate and fix silently-dropped review agents ([#2000](#2000)) ([89b57ce](89b57ce)) * make coordination metrics live ([#1959](#1959)) ([#2012](#2012)) ([c4775e2](c4775e2)) * sandbox lifecycle dispatch (per-agent / per-task container reuse) ([#2008](#2008)) ([03d2587](03d2587)), closes [#1965](#1965) ### Documentation * add GitButler concept-only concurrency research ([#1978](#1978)) ([#2009](#2009)) ([9e4f5c1](9e4f5c1)) * honest-hybrid refresh of README, site, and design specs ([#2001](#2001)) ([f485bea](f485bea)) ### CI/CD * update apko lockfiles ([#2004](#2004)) ([e2b9eee](e2b9eee)) ### Maintenance * Update Infrastructure dependencies ([#2014](#2014)) ([0b16bdf](0b16bdf)) * Update Python dependencies ([#2015](#2015)) ([a7224bb](a7224bb)) * Update Web dependencies ([#2016](#2016)) ([7a7fe76](7a7fe76)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Closes #1984
Summary
Deterministic record/replay of provider responses at the
BaseCompletionProviderchokepoint, so a company run can be re-executed byte-identically with zero real LLM calls.Cassette feature (
src/synthorg/providers/cassette/)CassetteCompletionProviderwraps an inner driver, overriding the publiccomplete/stream/get_model_capabilities/batch_get_capabilities(not the_do_*hooks, so recordedprovider_metadatais preserved verbatim). Replay never constructs a real driver.ProviderRegistry.from_config(..., cassette=...)is the single decoration chokepoint; one sharedCassetteSession.offis a structural no-op.compute_content_hash; per-task FIFO lanes for concurrent-fanout determinism; loudCassetteReplayMiss/Exhausted/Formaterrors (never silent fallthrough).cassette_format_versiongate.request_repris redacted (pluggableCassetteRedactor, defaultPatternRedactor); the outcome is stored verbatim by design (documented trade-off).providers.cassette_mode/cassette_path), wired inapi/auto_wire.py.Benchmark wiring is scoped to child #1980 per
docs/design/providers.md; this PR delivers the seam, validated under the live engine harness.Pre-PR review fixes (reduced 5-agent roster; polish pass skipped per instruction)
Triage record:
_audit/pre-pr-review/triage.md. 7 valid findings addressed:take()miss/exhaustion now logPROVIDER_CASSETTE_MISS/_EXHAUSTED(WARNING) before raising (previously-unused event constant now used).registry.pyfactory-error context:str(exc)→safe_error_description(exc)(SEC-1; pre-existing, in a touched file)._load()format errors now logPROVIDER_CASSETTE_FORMAT_ERRORbefore raising.redaction.pydocstring.asyncio.to_thread, serialised under a lazily-created lock (caught + fixed a Windowsos.replacerace the original sync code avoided).Unrelated infra on this branch (not cassette)
scripts/check_no_unapproved_e2e_tests.sh— new PreToolUse(Bash) hook blocking unapproved e2e/integration/whole-tree pytest runs; allows-m unit,tests/unitpaths, single node ids, benchmarks, or an explicitALLOW_E2E_TESTS=1prefix. Wired in.claude/settings.json..claude/skills/pre-pr-review/SKILL.mdPhase 2 corrected frompytest tests/ -n 8(whole tree) to-m unit.Test plan
Review coverage
Reduced 5-agent roster: python-reviewer, code-reviewer, security-reviewer, async-concurrency-reviewer, issue-resolution-verifier. 13 findings triaged (6 fixed + 1 minor fixed, 2 surfaced/by-design, 4 classified invalid with rationale). Code-simplifier polish pass skipped per user instruction.