Skip to content

feat: deterministic recorded-LLM cassette replay at the provider chokepoint#2010

Merged
Aureliolo merged 6 commits into
mainfrom
test/llm-cassette-replay
May 19, 2026
Merged

feat: deterministic recorded-LLM cassette replay at the provider chokepoint#2010
Aureliolo merged 6 commits into
mainfrom
test/llm-cassette-replay

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Closes #1984

Summary

Deterministic record/replay of provider responses at the BaseCompletionProvider chokepoint, so a company run can be re-executed byte-identically with zero real LLM calls.

Cassette feature (src/synthorg/providers/cassette/)

  • CassetteCompletionProvider wraps an inner driver, overriding the public complete/stream/get_model_capabilities/batch_get_capabilities (not the _do_* hooks, so recorded provider_metadata is preserved verbatim). Replay never constructs a real driver.
  • ProviderRegistry.from_config(..., cassette=...) is the single decoration chokepoint; one shared CassetteSession. off is a structural no-op.
  • SHA-256 request keying via compute_content_hash; per-task FIFO lanes for concurrent-fanout determinism; loud CassetteReplayMiss/Exhausted/Format errors (never silent fallthrough).
  • Atomic single-JSON-document persistence; cassette_format_version gate.
  • SEC-1 redaction boundary: replay key hashed on the raw request; only the human-readable request_repr is redacted (pluggable CassetteRedactor, default PatternRedactor); the outcome is stored verbatim by design (documented trade-off).
  • Cat-2 bootstrap-resolved config (providers.cassette_mode / cassette_path), wired in api/auto_wire.py.

Benchmark wiring is scoped to child #1980 per docs/design/providers.md; this PR delivers the seam, validated under the live engine harness.

Pre-PR review fixes (reduced 5-agent roster; polish pass skipped per instruction)

Triage record: _audit/pre-pr-review/triage.md. 7 valid findings addressed:

  1. take() miss/exhaustion now log PROVIDER_CASSETTE_MISS/_EXHAUSTED (WARNING) before raising (previously-unused event constant now used).
  2. registry.py factory-error context: str(exc)safe_error_description(exc) (SEC-1; pre-existing, in a touched file).
  3. _load() format errors now log PROVIDER_CASSETTE_FORMAT_ERROR before raising.
  4. Removed real vendor name from redaction.py docstring.
  5. PEM redaction regex made non-backtracking (ReDoS hardening).
  6. Per-interaction persist offloaded off the event loop via asyncio.to_thread, serialised under a lazily-created lock (caught + fixed a Windows os.replace race the original sync code avoided).
  7. Labelled-secret regex now handles escaped quotes.

Unrelated infra on this branch (not cassette)

  • scripts/check_no_unapproved_e2e_tests.sh — new PreToolUse(Bash) hook blocking unapproved e2e/integration/whole-tree pytest runs; allows -m unit, tests/unit paths, single node ids, benchmarks, or an explicit ALLOW_E2E_TESTS=1 prefix. Wired in .claude/settings.json.
  • .claude/skills/pre-pr-review/SKILL.md Phase 2 corrected from pytest tests/ -n 8 (whole tree) to -m unit.

Test plan

  • 54 cassette unit tests + 2 e2e byte-identical-replay tests green (incl. concurrent-fanout determinism).
  • ruff + mypy clean; all pre-commit and pre-push convention gates pass.
  • New hook unit-verified across 23 command cases (git-commit/echo/grep false-positives confirmed not flagged).

Review coverage

Reduced 5-agent roster: python-reviewer, code-reviewer, security-reviewer, async-concurrency-reviewer, issue-resolution-verifier. 13 findings triaged (6 fixed + 1 minor fixed, 2 surfaced/by-design, 4 classified invalid with rationale). Code-simplifier polish pass skipped per user instruction.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 19, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9ab3584e-fb25-4e0f-b65c-f0767a151047

📥 Commits

Reviewing files that changed from the base of the PR and between 0c72c7e and a80ce97.

📒 Files selected for processing (1)
  • src/synthorg/providers/cassette/store.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Build Backend
  • GitHub Check: Test E2E
  • GitHub Check: Dashboard Test
  • GitHub Check: Test Integration
  • GitHub Check: Test Unit
  • GitHub Check: Test Conformance (SQLite)
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Lighthouse Site
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL
Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default with read_only_post_init (Cat-2); Cat-3 bootstrap secrets are pure env at boot site. YAML is company-template ingestion format only, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
Use from synthorg.observability import get_logger; variable always named logger. Never import logging or use print() in app code. Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED after persistence write
Never log error=str(exc) or interpolate {exc}; use error_type=type(exc).name + error=safe_error_description(exc). Never use exc_info=True. OTel: forbidden span.record_exception(exc); use span.set_attribute("exception.message", safe_error_description(exc)) + record_exception=False, set_status_on_exception=False. Enforced by check_logger_exception_str_exc.py

Files:

  • src/synthorg/providers/cassette/store.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py
Comments should explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Do not use from __future__ import annotations (Python 3.14 has PEP 649). Use PEP 758 except: except A, B: requires parens when binding
Type hints required on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines
Define errors as <Domain><Condition>Error inheriting from DomainError; never inherit from Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py
Pydantic v2: use frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries). Use @computed_field for derived fields. Use NotBlankStr for identifiers
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py
Use immutability patterns: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Use asyncio.TaskGroup for fan-out/fan-in async patterns; helpers must catch Exception (re-raise MemoryError/RecursionError)
Clock seam: include clock: Clock | None = None parameter; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable
For untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML
Never use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001. Allowed in .claude/, third-party imports, providers/presets.py, we...

Files:

  • src/synthorg/providers/cassette/store.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/providers/cassette/store.py
src/synthorg/providers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Provider calls go through BaseCompletionProvider (retry + rate limit); never implement retry in driver subclasses. Retryable: RateLimitError, Provider{Timeout,Connection,Internal}Error. WebSocket: per-frame timeout closes silent peers (1008); revalidation saturation closes (4011)

Files:

  • src/synthorg/providers/cassette/store.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/providers/cassette/store.py
🔇 Additional comments (7)
src/synthorg/providers/cassette/store.py (7)

1-56: LGTM!


58-84: LGTM!


87-188: LGTM!


191-213: LGTM!


216-289: LGTM!


291-458: LGTM!


460-589: LGTM!


Walkthrough

This PR implements a provider-layer cassette record/replay system: adds CassetteMode/CassetteConfig and settings, deterministic request keying (SHA‑256), cassette on-disk schema and CassetteSession with per-task FIFO lanes and atomic writes, pluggable redaction, cassette-specific errors and error reconstruction, CassetteCompletionProvider (complete/stream/capabilities) and ProviderRegistry wiring (shared session, skip inner drivers in REPLAY), observability events, a pre-tool-use pytest gating script, unit tests, and e2e acceptance tests validating byte-identical replay with zero live calls.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main feature: deterministic recorded-LLM cassette replay at the provider chokepoint. It accurately reflects the primary objective of the changeset.
Description check ✅ Passed The description is directly related to the changeset, explaining the cassette feature implementation, architecture, and related fixes. It provides meaningful context about the changes.
Linked Issues check ✅ Passed The PR addresses all key objectives from issue #1984: records provider responses into a cassette keyed by request [#1984], provides replay mode for byte-identical deterministic re-execution [#1984], integrates at the BaseCompletionProvider chokepoint [#1984], implements SHA-256 keying and per-task FIFO lanes for concurrent determinism [#1984], and validates via tests showing byte-identical replay with zero real LLM calls [#1984].
Out of Scope Changes check ✅ Passed The PR introduces the cassette feature (in-scope per #1984) and includes infrastructure changes: a bash hook script for e2e test gating and corrections to pre-PR-review instructions. The hook script is pre-tool-use infra (gating unapproved e2e tests); the SKILL.md correction aligns testing to cassette requirements. These are supporting infra, not out-of-scope.
Docstring Coverage ✅ Passed Docstring coverage is 43.70% which is sufficient. The required threshold is 40.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 19, 2026

Merging this PR will not alter performance

✅ 33 untouched benchmarks
⏩ 21 skipped benchmarks1


Comparing test/llm-cassette-replay (a80ce97) with main (03d2587)

Open in CodSpeed

Footnotes

  1. 21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
tests/e2e/test_cassette_replay_e2e.py (1)

232-250: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Strengthen replay-stability assertions to prevent false positives.

test_second_replay_is_stable currently passes if both replays fail in the same way (for example, both completion_summary values are None). Add per-run success/termination/artifact checks before comparing summaries.

Suggested patch
         summaries: list[str | None] = []
         for i in range(2):
             ws = tmp_path / f"replay{i}"
             ws.mkdir()
             result = await _run_task(
@@
                     provider_name=_PROVIDER,
                 ),
                 ws,
             )
+            assert result.is_success is True
+            assert result.termination_reason == TerminationReason.COMPLETED
+            assert (ws / "output.txt").read_text(encoding="utf-8") == "Hello cassette"
+            assert result.completion_summary is not None
             summaries.append(result.completion_summary)
         assert summaries[0] == summaries[1]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/test_cassette_replay_e2e.py` around lines 232 - 250, In
test_second_replay_is_stable, don't just compare completion_summary values
because identical failures can mask instability; for each run (the result
returned from _run_task using CassetteCompletionProvider/CassetteSession) assert
the run produced a real successful completion (e.g. check result.success is True
or result.termination_reason is None) and/or that required artifacts exist (e.g.
result.artifacts or result.completion is not None) before appending/comparing
completion_summary; only then assert summaries[0] == summaries[1] to ensure
stability rather than equal failures.
src/synthorg/providers/registry.py (2)

265-281: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Re-raise system exceptions and use structured error logging instead of logger.exception().

The _build_driver helper currently wraps all exceptions, including MemoryError and RecursionError, which should propagate unmodified per project convention. Additionally, logger.exception() implicitly adds exc_info=True, violating the sanitized logging pattern.

Re-raise system exceptions before the general handler, then use logger.error() with structured attributes:

Proposed fix
     try:
         driver = factory(name, config)  # type: ignore[operator]
+    except MemoryError, RecursionError:
+        raise
     except Exception as exc:
         msg = f"Failed to instantiate driver {driver_type!r} for provider {name!r}"
-        logger.exception(
+        logger.error(
             PROVIDER_DRIVER_FACTORY_MISSING,
             provider=name,
             driver=driver_type,
+            error_type=type(exc).__name__,
+            error=safe_error_description(exc),
         )
         raise DriverFactoryNotFoundError(
             msg,
             context={
                 "provider": name,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/providers/registry.py` around lines 265 - 281, In _build_driver,
do not catch and wrap system exceptions: detect and re-raise
BaseException-derived critical errors (e.g., MemoryError, RecursionError,
KeyboardInterrupt, SystemExit) before the general except block so they propagate
unchanged; in the remaining exception handler, replace logger.exception(...)
with logger.error(...) and pass the structured attributes (provider=name,
driver=driver_type) and the safe_error_description(exc) in the error
context/message, then raise DriverFactoryNotFoundError with the same context
payload (provider, driver, detail) using the caught exception as the cause;
refer to the variables and symbols factory, name, config, driver_type,
safe_error_description, and DriverFactoryNotFoundError to locate and update the
code.

166-183: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move concrete driver imports into the non-cassette path to avoid optional SDK failures in replay mode.

In replay mode (cassette.is_active = True), _build_cassette_registry receives inner=None and never invokes the driver factory. However, LiteLLMDriver and ScriptedDriver are imported unconditionally at lines 166–169, before the cassette check at line 177. Since litellm_driver.py imports litellm (an optional dependency), a replay run without the litellm SDK installed will fail on import even though the replay path doesn't need any driver factory. This breaks the stated pure-replay contract: "In replay mode the inner driver is not built at all: no factory is called."

Defer the imports and defaults dict to the non-cassette path (after the cassette early return).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/providers/registry.py` around lines 166 - 183, The unconditional
imports of LiteLLMDriver and ScriptedDriver (and construction of the defaults
dict) happen before the cassette early-return, causing import-time failures in
replay mode; move those imports and the defaults: dict creation so they occur
after the if cassette is not None and cassette.is_active return (i.e., only when
not taking the _build_cassette_registry path). Concretely, relocate the from
.drivers.litellm_driver import LiteLLMDriver and from .drivers.scripted import
ScriptedDriver lines and the defaults assignment so they are executed after the
cassette check (so _build_cassette_registry is returned without importing
drivers) while leaving factory_overrides and cassette.is_active logic, and
references to defaults, unchanged for the non-cassette path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/check_no_unapproved_e2e_tests.sh`:
- Around line 87-89: The ALLOW_E2E_TESTS approval is currently checked against
the entire flattened command ($FLAT) which can allow a token from a different
segment to unblock pytest; update the logic so that when you detect the pytest
invocation segment (the code that finds the pytest command/segment), you instead
search that specific segment string for the token ALLOW_E2E_TESTS=1 (e.g. split
FLAT into segments using the same delimiter logic used to find pytest, capture
the matched segment variable, then run grep -qE
'(^|[[:space:]])ALLOW_E2E_TESTS=1([[:space:]]|$)' against that segment rather
than $FLAT) so approval is bound to the exact pytest segment.

In `@src/synthorg/providers/cassette/provider.py`:
- Around line 217-223: The current stream() wrapper buffers the entire live
stream before yielding; change it to forward chunks as they arrive while
simultaneously recording them: in stream() iterate the inner async generator,
for each yielded chunk immediately yield it to the caller and append a
serialized chunk entry to the cassette recording list, and on normal completion
append a terminal "end" marker. Also catch any ProviderError (or other
exceptions) raised during iteration, append a cassette "error" entry containing
the error type/message/traceback so replays can re-raise after emitting prior
chunks, then re-raise the error; update the replay path to recognize the "error"
cassette entry and raise the recorded error after streaming prior chunks. Apply
these changes to the streaming wrapper(s) named stream and the recording logic
referenced in the same file (also the block around lines 255-288) so recording
is incremental and terminal errors are faithfully represented.

In `@src/synthorg/providers/cassette/store.py`:
- Around line 3-7: Edit the module docstring that begins "A cassette is a single
canonical JSON document..." and remove the issue back-reference "`#1984`" while
preserving the explanatory rationale about filesystem-only design; ensure the
sentence is rewired to read smoothly without the issue tag (keep wording about
being filesystem-only and why a DB table would be undesirable) and do not add
any reviewer or issue citations.
- Around line 70-80: Add the original ProviderError.context into the cassette
schema and plumbing so replayed exceptions retain their payload: extend the
CassetteRecordedError model to include a context: dict[str, Any] =
Field(default_factory=dict, description="Recorded ProviderError.context")
(keeping frozen/extra rules), update CassetteOutcome.from_error(...) to capture
and store error.context when constructing the recorded error, modify the
recording sites in the cassette provider (the functions that create
CassetteOutcome instances) to pass through the provider error context, and
update provider_error_for(...) to rehydrate the reconstructed ProviderError with
the recorded context so exceptions raised on replay include the original
context.

---

Outside diff comments:
In `@src/synthorg/providers/registry.py`:
- Around line 265-281: In _build_driver, do not catch and wrap system
exceptions: detect and re-raise BaseException-derived critical errors (e.g.,
MemoryError, RecursionError, KeyboardInterrupt, SystemExit) before the general
except block so they propagate unchanged; in the remaining exception handler,
replace logger.exception(...) with logger.error(...) and pass the structured
attributes (provider=name, driver=driver_type) and the
safe_error_description(exc) in the error context/message, then raise
DriverFactoryNotFoundError with the same context payload (provider, driver,
detail) using the caught exception as the cause; refer to the variables and
symbols factory, name, config, driver_type, safe_error_description, and
DriverFactoryNotFoundError to locate and update the code.
- Around line 166-183: The unconditional imports of LiteLLMDriver and
ScriptedDriver (and construction of the defaults dict) happen before the
cassette early-return, causing import-time failures in replay mode; move those
imports and the defaults: dict creation so they occur after the if cassette is
not None and cassette.is_active return (i.e., only when not taking the
_build_cassette_registry path). Concretely, relocate the from
.drivers.litellm_driver import LiteLLMDriver and from .drivers.scripted import
ScriptedDriver lines and the defaults assignment so they are executed after the
cassette check (so _build_cassette_registry is returned without importing
drivers) while leaving factory_overrides and cassette.is_active logic, and
references to defaults, unchanged for the non-cassette path.

In `@tests/e2e/test_cassette_replay_e2e.py`:
- Around line 232-250: In test_second_replay_is_stable, don't just compare
completion_summary values because identical failures can mask instability; for
each run (the result returned from _run_task using
CassetteCompletionProvider/CassetteSession) assert the run produced a real
successful completion (e.g. check result.success is True or
result.termination_reason is None) and/or that required artifacts exist (e.g.
result.artifacts or result.completion is not None) before appending/comparing
completion_summary; only then assert summaries[0] == summaries[1] to ensure
stability rather than equal failures.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0df0b595-e47f-428c-befb-1058138b26c9

📥 Commits

Reviewing files that changed from the base of the PR and between 03d2587 and 10df914.

📒 Files selected for processing (23)
  • .claude/settings.json
  • .claude/skills/pre-pr-review/SKILL.md
  • docs/design/providers.md
  • scripts/check_no_unapproved_e2e_tests.sh
  • src/synthorg/api/auto_wire.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/providers/__init__.py
  • src/synthorg/providers/cassette/__init__.py
  • src/synthorg/providers/cassette/errors.py
  • src/synthorg/providers/cassette/keying.py
  • src/synthorg/providers/cassette/mode.py
  • src/synthorg/providers/cassette/provider.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/cassette/store.py
  • src/synthorg/providers/registry.py
  • src/synthorg/settings/definitions/providers.py
  • tests/e2e/test_cassette_replay_e2e.py
  • tests/unit/providers/cassette/__init__.py
  • tests/unit/providers/cassette/test_keying.py
  • tests/unit/providers/cassette/test_provider.py
  • tests/unit/providers/cassette/test_redaction.py
  • tests/unit/providers/cassette/test_registry_wiring.py
  • tests/unit/providers/cassette/test_store.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test E2E
  • GitHub Check: Test Unit
  • GitHub Check: Test Conformance (SQLite)
  • GitHub Check: Dashboard Test
  • GitHub Check: Test Integration
  • GitHub Check: Lighthouse Site
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (9)
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL
Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default with read_only_post_init (Cat-2); Cat-3 bootstrap secrets are pure env at boot site. YAML is company-template ingestion format only, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
Use from synthorg.observability import get_logger; variable always named logger. Never import logging or use print() in app code. Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED after persistence write
Never log error=str(exc) or interpolate {exc}; use error_type=type(exc).name + error=safe_error_description(exc). Never use exc_info=True. OTel: forbidden span.record_exception(exc); use span.set_attribute("exception.message", safe_error_description(exc)) + record_exception=False, set_status_on_exception=False. Enforced by check_logger_exception_str_exc.py

Files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/providers/cassette/mode.py
  • src/synthorg/settings/definitions/providers.py
  • src/synthorg/providers/__init__.py
  • src/synthorg/providers/cassette/errors.py
  • src/synthorg/api/auto_wire.py
  • src/synthorg/providers/registry.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/cassette/__init__.py
  • src/synthorg/providers/cassette/keying.py
  • src/synthorg/providers/cassette/provider.py
  • src/synthorg/providers/cassette/store.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py
Comments should explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Do not use from __future__ import annotations (Python 3.14 has PEP 649). Use PEP 758 except: except A, B: requires parens when binding
Type hints required on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines
Define errors as <Domain><Condition>Error inheriting from DomainError; never inherit from Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py
Pydantic v2: use frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries). Use @computed_field for derived fields. Use NotBlankStr for identifiers
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py
Use immutability patterns: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Use asyncio.TaskGroup for fan-out/fan-in async patterns; helpers must catch Exception (re-raise MemoryError/RecursionError)
Clock seam: include clock: Clock | None = None parameter; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable
For untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML
Never use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001. Allowed in .claude/, third-party imports, providers/presets.py, we...

Files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/providers/cassette/mode.py
  • src/synthorg/settings/definitions/providers.py
  • src/synthorg/providers/__init__.py
  • src/synthorg/providers/cassette/errors.py
  • src/synthorg/api/auto_wire.py
  • src/synthorg/providers/registry.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/cassette/__init__.py
  • src/synthorg/providers/cassette/keying.py
  • src/synthorg/providers/cassette/provider.py
  • src/synthorg/providers/cassette/store.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/providers/cassette/mode.py
  • src/synthorg/settings/definitions/providers.py
  • src/synthorg/providers/__init__.py
  • src/synthorg/providers/cassette/errors.py
  • src/synthorg/api/auto_wire.py
  • src/synthorg/providers/registry.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/cassette/__init__.py
  • src/synthorg/providers/cassette/keying.py
  • src/synthorg/providers/cassette/provider.py
  • src/synthorg/providers/cassette/store.py
src/synthorg/observability/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Telemetry: opt-in, off by default. Every event property must be in _ALLOWED_PROPERTIES. See telemetry.md

Files:

  • src/synthorg/observability/events/provider.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back
Test doubles: use FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags. Never use bare MagicMock at typed boundary (constructor/fn arg/annotated local/typed fixture return); blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline). Import FakeClock and mock_of from tests._shared
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large)

Files:

  • tests/unit/providers/cassette/__init__.py
  • tests/e2e/test_cassette_replay_e2e.py
  • tests/unit/providers/cassette/test_keying.py
  • tests/unit/providers/cassette/test_registry_wiring.py
  • tests/unit/providers/cassette/test_redaction.py
  • tests/unit/providers/cassette/test_store.py
  • tests/unit/providers/cassette/test_provider.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/providers/cassette/__init__.py
  • tests/e2e/test_cassette_replay_e2e.py
  • tests/unit/providers/cassette/test_keying.py
  • tests/unit/providers/cassette/test_registry_wiring.py
  • tests/unit/providers/cassette/test_redaction.py
  • tests/unit/providers/cassette/test_store.py
  • tests/unit/providers/cassette/test_provider.py
src/synthorg/providers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Provider calls go through BaseCompletionProvider (retry + rate limit); never implement retry in driver subclasses. Retryable: RateLimitError, Provider{Timeout,Connection,Internal}Error. WebSocket: per-frame timeout closes silent peers (1008); revalidation saturation closes (4011)

Files:

  • src/synthorg/providers/cassette/mode.py
  • src/synthorg/providers/__init__.py
  • src/synthorg/providers/cassette/errors.py
  • src/synthorg/providers/registry.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/cassette/__init__.py
  • src/synthorg/providers/cassette/keying.py
  • src/synthorg/providers/cassette/provider.py
  • src/synthorg/providers/cassette/store.py
{README.md,docs/**/*.md,data/**/*.md}

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics in README and public docs must be sourced from data/runtime_stats.yaml via markers

Files:

  • docs/design/providers.md
docs/**/*.{md,d2}

📄 CodeRabbit inference engine (CLAUDE.md)

Use d2 for architecture/nested containers, mermaid for flowcharts/sequence/pipelines. Markdown tables for tabular data. D2 theme 200 (Dark Mauve), D2 CLI pinned to v0.7.1 in CI

Files:

  • docs/design/providers.md
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

API startup lifecycle: construction phase wires synchronous services; on_startup wires services needing connected persistence. Construction-phase invariants: agent_registry before auto_wire_meetings; tunnel_provider wired unconditionally. On-startup invariants: SettingsService auto-wire before WorkflowExecutionObserver registration; OntologyService wires after persistence.connect() via _wire_ontology_service

Files:

  • src/synthorg/api/auto_wire.py
src/synthorg/{workers,api}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Runtime services: synthorg.workers.runtime_builder.build_runtime_services selects behind ONE provider-present switch, returning RuntimeServices (AgentEngineExecutionService + coordinator or NoProviderExecutionService + None). _install_runtime_services appends FIRST after persistence/SettingsService hooks; swap_worker_execution_service / swap_coordinator / swap_provider_registry hold a lock. Empty-company rejects task creation (AgentRuntimeNotConfiguredError 4014) and /coordinate 503s

Files:

  • src/synthorg/api/auto_wire.py
🧠 Learnings (7)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/observability/events/provider.py
  • tests/unit/providers/cassette/__init__.py
  • src/synthorg/providers/cassette/mode.py
  • src/synthorg/settings/definitions/providers.py
  • src/synthorg/providers/__init__.py
  • src/synthorg/providers/cassette/errors.py
  • src/synthorg/api/auto_wire.py
  • src/synthorg/providers/registry.py
  • src/synthorg/providers/cassette/redaction.py
  • tests/e2e/test_cassette_replay_e2e.py
  • src/synthorg/providers/cassette/__init__.py
  • tests/unit/providers/cassette/test_keying.py
  • src/synthorg/providers/cassette/keying.py
  • tests/unit/providers/cassette/test_registry_wiring.py
  • tests/unit/providers/cassette/test_redaction.py
  • tests/unit/providers/cassette/test_store.py
  • src/synthorg/providers/cassette/provider.py
  • tests/unit/providers/cassette/test_provider.py
  • src/synthorg/providers/cassette/store.py
📚 Learning: 2026-05-16T18:36:31.446Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/reference/conventions.md:787-789
Timestamp: 2026-05-16T18:36:31.446Z
Learning: In Aureliolo/synthorg, do not require adding `<!--RS:...-->` “Doc Numeric Claims (MANDATORY)” numeric macros for Python version numbers mentioned in documentation prose (e.g., “Python 3.14”, “Python 3.15”). The `scripts/check_doc_numeric_macros.py` gate only applies to `README.md`, `docs/index.md`, `docs/roadmap/index.md`, `docs/architecture/decisions.md`, and `docs/reference/convention-gates.md`, and it only flags digits adjacent to specific stat nouns (tests/providers/agents/stars/releases), not language version mentions like “Python 3.14”.

Applied to files:

  • docs/design/providers.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo, account for the CI gate `check_doc_numeric_macros.py`: it skips fenced code blocks entirely, and it only flags digits that are adjacent to these stat nouns: `tests`, `providers`, `agents`, `stars`, `releases`. Therefore, numeric examples such as CLI flag values (e.g., `--num-workers=4` in fenced bash blocks) and prose version numbers (e.g., `3.14`/`3.15`) are not expected to trigger this check; prioritize changes only when digits appear next to one of the listed nouns (e.g., “5 tests”, “10 stars”, etc.).

Applied to files:

  • docs/design/providers.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing markdown files for the "Doc Numeric Claims (MANDATORY)" RS-marker rule, only require/flag missing RS markers in the files that are actually in-scope for the rule. The scope is enforced via an identical _SCOPED_FILES allowlist in scripts/check_doc_numeric_macros.py and scripts/inject_runtime_stats.py, and currently includes: README.md; docs/index.md; docs/roadmap/index.md; docs/architecture/decisions.md; docs/reference/convention-gates.md. For any other markdown files (e.g., docs/getting_started.md, docs/guides/*), missing RS markers for numeric claims are no-ops and should NOT be flagged.

Applied to files:

  • docs/design/providers.md
📚 Learning: 2026-05-16T18:36:35.250Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1944
File: docs/getting_started.md:109-109
Timestamp: 2026-05-16T18:36:35.250Z
Learning: When reviewing Markdown in the synthorg repo against the `check_doc_numeric_macros.py` gate, account for its documented behavior: it skips fenced code blocks entirely, and it only flags digits that are adjacent to specific stat nouns (`tests`, `providers`, `agents`, `stars`, `releases`). As a result, CLI-style numbers (e.g., `--num-workers=4`) inside fenced bash code blocks should never be treated as violations of this gate; only non-fenced text needs checking, and only around those specific nouns.

Applied to files:

  • docs/design/providers.md
📚 Learning: 2026-05-17T11:45:11.839Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In SynthOrg (Aureliolo/synthorg) pre-alpha, apply the strict no-backward-compat policy: any setting-key rename must be fully completed in the same change/PR with all repo callers updated, and you should not keep legacy aliases or compatibility fallbacks. When reviewing, do not flag a setting-key rename as a breaking upgrade hazard if the rename is repo-wide and fully implemented within the same PR.

Applied to files:

  • src/synthorg/settings/definitions/providers.py
📚 Learning: 2026-05-17T11:45:11.839Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In this repository, SynthOrg is pre-alpha and uses a strict no-backward-compat policy for setting-key renames. When reviewing code under src/synthorg/settings, do NOT flag a setting-key rename as an “upgrade-safety” issue if the rename is complete/atomic in the same PR: all callers/usages of the old key are updated simultaneously, and the PR does not keep any legacy aliases, compatibility fallbacks, or migration/rollback paths for the old key.

Applied to files:

  • src/synthorg/settings/definitions/providers.py
🪛 LanguageTool
docs/design/providers.md

[grammar] ~107-~107: Ensure spelling is correct
Context: ...This is stable across record and replay iff the first-call order of distinct tasks ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 OpenGrep (1.21.0)
tests/unit/providers/cassette/test_redaction.py

[WARNING] 37-37: Hardcoded AWS access key detected. Use environment variables or a secrets manager instead.

(coderabbit.secrets.aws-access-key)


[WARNING] 38-38: Hardcoded AWS access key detected. Use environment variables or a secrets manager instead.

(coderabbit.secrets.aws-access-key)

🔇 Additional comments (11)
tests/unit/providers/cassette/__init__.py (1)

1-1: LGTM!

tests/unit/providers/cassette/test_keying.py (1)

1-296: LGTM!

tests/unit/providers/cassette/test_provider.py (1)

1-355: LGTM!

tests/unit/providers/cassette/test_redaction.py (1)

1-105: LGTM!

tests/unit/providers/cassette/test_registry_wiring.py (1)

1-162: LGTM!

tests/unit/providers/cassette/test_store.py (1)

1-345: LGTM!

.claude/settings.json (1)

76-80: LGTM!

docs/design/providers.md (1)

101-112: LGTM!

src/synthorg/providers/__init__.py (1)

9-22: LGTM!

Also applies to: 90-90, 101-109, 132-133

src/synthorg/providers/cassette/__init__.py (1)

1-60: LGTM!

src/synthorg/api/auto_wire.py (1)

56-56: LGTM!

Also applies to: 202-241

Comment thread scripts/check_no_unapproved_e2e_tests.sh Outdated
Comment thread src/synthorg/providers/cassette/provider.py Outdated
Comment thread src/synthorg/providers/cassette/redaction.py
Comment thread src/synthorg/providers/cassette/store.py
Comment thread src/synthorg/providers/cassette/store.py
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a deterministic record/replay system for LLM provider interactions, enabling agent runs to be re-executed byte-identically without incurring real LLM costs. The solution establishes a provider-layer seam that manages request keying, FIFO lane management for concurrent tasks, and atomic persistence. Additionally, it hardens the test infrastructure by introducing a new pre-tool-use hook to enforce sanctioned test execution patterns.

Highlights

  • Deterministic Record/Replay: Introduced the CassetteCompletionProvider to enable byte-identical replay of LLM provider interactions, ensuring zero-cost re-execution of agent runs.
  • Infrastructure Safety: Added a new pre-tool-use hook to prevent unapproved e2e and integration test runs, protecting the test environment from slow or unauthorized executions.
  • Configuration & Wiring: Integrated cassette mode and path settings into the provider registry boot process, allowing for flexible record/replay configuration.
  • Security & Redaction: Implemented a pluggable redaction system to scrub sensitive data from human-readable request representations while maintaining replay integrity.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 88.24786% with 55 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.05%. Comparing base (03d2587) to head (a80ce97).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/providers/cassette/provider.py 76.29% 25 Missing and 7 partials ⚠️
src/synthorg/providers/cassette/store.py 89.18% 16 Missing and 4 partials ⚠️
src/synthorg/providers/cassette/redaction.py 93.54% 1 Missing and 1 partial ⚠️
src/synthorg/providers/registry.py 96.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2010      +/-   ##
==========================================
+ Coverage   85.03%   85.05%   +0.02%     
==========================================
  Files        1888     1895       +7     
  Lines      111930   112390     +460     
  Branches     9549     9591      +42     
==========================================
+ Hits        95181    95598     +417     
- Misses      14421    14453      +32     
- Partials     2328     2339      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a deterministic record/replay 'cassette' seam for LLM providers, facilitating byte-identical re-execution without real LLM calls. The implementation covers provider wrapping, SHA-256 request keying, atomic storage with FIFO lanes, and secret redaction, alongside a new pre-PR hook to restrict unapproved e2e test execution. Feedback focuses on enhancing the redaction regex to support secrets containing spaces, optimizing sequence number lookups in the session store to improve performance, and offloading document serialization to a worker thread to maintain event loop responsiveness.

r"(?i)\b(api[_-]?key|secret|password|token|authorization)\b"
r"(['\"]?\s*[:=]\s*['\"]?)"
r"((?:\\.|[^'\"\s,}\\])+)",
),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The labelled secret redaction regex stops at the first whitespace character, which causes incomplete redaction for values containing spaces (e.g., in JSON payloads like "password": "my secret"). Removing whitespace from the exclusion list in the value group will improve redaction coverage while still stopping at common separators like commas, braces, or quotes.

r"((?:\\.|[^'",}\\])+)"
References
  1. Security: secret-log redaction is a priority focus area. (link)
  2. Ensure that URLs containing potential credentials are redacted in all output channels, including tool execution results and log messages, to prevent sensitive data leakage.

Comment on lines +260 to +264
seq = sum(
1
for i in self._recorded
if i.request_hash == request_hash and i.lane == lane
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The sequence number calculation for identical requests is $O(N)$ per interaction, leading to $O(N^2)$ overall complexity for a run with $N$ interactions. This can be optimized to $O(1)$ by maintaining a counter dictionary for (request_hash, lane) pairs within the session.

lock = asyncio.Lock()
self._persist_lock = lock
async with lock:
payload = self._serialise()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Serializing the entire cassette document on the event loop can block the loop for significant periods as the number of interactions grows. Consider offloading the _serialise call (which involves Pydantic model_dump and json.dumps) to a worker thread by snapshotting the interactions list first to ensure thread-safety.

References
  1. Real concurrency hazards: blocking calls inside async should be avoided. (link)

- scripts/check_no_unapproved_e2e_tests.sh: bind ALLOW_E2E_TESTS=1 to the detected pytest segment (was checked against whole flattened command)
- cassette/provider.py: stream() forwards chunks incrementally while recording; terminal ProviderError recorded with prior chunks and re-raised after replay re-emits them
- cassette/redaction.py: redact secret-named dict keys before recursing; widen labelled-secret value group so multi-word secrets are fully scrubbed
- cassette/store.py + errors.py + provider.py: record (redacted) ProviderError.context and rehydrate it on replay
- cassette/store.py: O(1) per-(hash,lane) seq counter; offload serialise+write to worker thread; drop issue back-ref from module docstring
- providers/registry.py: re-raise MemoryError/RecursionError; logger.error with structured error_type/error; defer driver-SDK imports so pure replay needs no optional litellm
- tests/e2e/test_cassette_replay_e2e.py: assert per-run success/termination/artifact before comparing replay summaries; drop issue back-refs
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/providers/cassette/store.py`:
- Around line 291-349: flush() currently bypasses the serialized write path and
can race with record_interaction(); change flush to use the same async locking
and offloaded snapshot write as record_interaction: make flush an async function
(or provide an async variant) that ensures _persist_lock exists (create
asyncio.Lock() if None), acquires it (async with self._persist_lock), takes
snapshot = tuple(self._recorded) and then awaits
asyncio.to_thread(self._persist_snapshot, snapshot) instead of calling
_persist()/_atomic_write() directly so all writes are serialized and run off the
event loop.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: c08c76e1-126f-4f0a-aafb-7f08e4d28394

📥 Commits

Reviewing files that changed from the base of the PR and between 10df914 and 3eb9ea9.

📒 Files selected for processing (7)
  • scripts/check_no_unapproved_e2e_tests.sh
  • src/synthorg/providers/cassette/errors.py
  • src/synthorg/providers/cassette/provider.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/cassette/store.py
  • src/synthorg/providers/registry.py
  • tests/e2e/test_cassette_replay_e2e.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Deploy Preview
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test Conformance (SQLite)
  • GitHub Check: Dashboard Test
  • GitHub Check: Test Integration
  • GitHub Check: Test E2E
  • GitHub Check: Test Unit
  • GitHub Check: Lighthouse Site
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL
Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default with read_only_post_init (Cat-2); Cat-3 bootstrap secrets are pure env at boot site. YAML is company-template ingestion format only, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
Use from synthorg.observability import get_logger; variable always named logger. Never import logging or use print() in app code. Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED after persistence write
Never log error=str(exc) or interpolate {exc}; use error_type=type(exc).name + error=safe_error_description(exc). Never use exc_info=True. OTel: forbidden span.record_exception(exc); use span.set_attribute("exception.message", safe_error_description(exc)) + record_exception=False, set_status_on_exception=False. Enforced by check_logger_exception_str_exc.py

Files:

  • src/synthorg/providers/cassette/errors.py
  • src/synthorg/providers/registry.py
  • src/synthorg/providers/cassette/provider.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/cassette/store.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py
Comments should explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Do not use from __future__ import annotations (Python 3.14 has PEP 649). Use PEP 758 except: except A, B: requires parens when binding
Type hints required on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines
Define errors as <Domain><Condition>Error inheriting from DomainError; never inherit from Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py
Pydantic v2: use frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries). Use @computed_field for derived fields. Use NotBlankStr for identifiers
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py
Use immutability patterns: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Use asyncio.TaskGroup for fan-out/fan-in async patterns; helpers must catch Exception (re-raise MemoryError/RecursionError)
Clock seam: include clock: Clock | None = None parameter; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable
For untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML
Never use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001. Allowed in .claude/, third-party imports, providers/presets.py, we...

Files:

  • src/synthorg/providers/cassette/errors.py
  • src/synthorg/providers/registry.py
  • src/synthorg/providers/cassette/provider.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/cassette/store.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/providers/cassette/errors.py
  • src/synthorg/providers/registry.py
  • src/synthorg/providers/cassette/provider.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/cassette/store.py
src/synthorg/providers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Provider calls go through BaseCompletionProvider (retry + rate limit); never implement retry in driver subclasses. Retryable: RateLimitError, Provider{Timeout,Connection,Internal}Error. WebSocket: per-frame timeout closes silent peers (1008); revalidation saturation closes (4011)

Files:

  • src/synthorg/providers/cassette/errors.py
  • src/synthorg/providers/registry.py
  • src/synthorg/providers/cassette/provider.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/cassette/store.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back
Test doubles: use FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags. Never use bare MagicMock at typed boundary (constructor/fn arg/annotated local/typed fixture return); blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline). Import FakeClock and mock_of from tests._shared
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large)

Files:

  • tests/e2e/test_cassette_replay_e2e.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/e2e/test_cassette_replay_e2e.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/providers/cassette/errors.py
  • src/synthorg/providers/registry.py
  • src/synthorg/providers/cassette/provider.py
  • src/synthorg/providers/cassette/redaction.py
  • src/synthorg/providers/cassette/store.py
  • tests/e2e/test_cassette_replay_e2e.py
🔇 Additional comments (1)
src/synthorg/providers/cassette/provider.py (1)

424-435: ⚡ Quick win

No refactor needed; sequential execution is required by cassette design.

The override cannot safely use asyncio.TaskGroup because:

  1. Error propagation: CassetteReplayMissError and CassetteReplayExhaustedError are Exception subclasses that would be caught by the base class's except Exception handler in its _one() helper and converted to None, defeating the override's explicit purpose to propagate them loudly.

  2. Replay lane semantics: The cassette session assigns lanes per asyncio task. Each task in a TaskGroup would receive a new lane, changing the FIFO replay order from what was recorded.

The sequential dict comprehension is the correct implementation for this provider.

Comment thread src/synthorg/providers/cassette/store.py
- cassette/store.py: flush() is now async and routes through the same _persist_lock + offloaded snapshot write as record_interaction, so an end-of-run flush racing an in-flight record cannot hit os.replace concurrently (Windows WinError 5) and never blocks the event loop on serialise+I/O; removed the now-dead sync _persist() helper
- updated all flush() call sites (cassette unit + e2e tests) to await; kept TestAtomicFlush sync (ruff ASYNC240) by driving record+flush via asyncio.run
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/providers/cassette/store.py`:
- Around line 406-409: The current logic lets a pluggable _redactor.redact
replace the whole ProviderError.context mapping with a scalar, producing
{"value": ...} and breaking callers that expect the original keys; change the
code so we preserve the original mapping shape: if
_redactor.redact(dict(error.context)) returns a dict use it as-is, otherwise
perform per-entry redaction by iterating over dict(error.context).items() and
applying _redactor.redact to each value to build ctx with the same keys as
ProviderError.context (referencing _redactor.redact, ProviderError.context,
redacted, and ctx).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2452013d-9b9f-4f4e-a510-ace4a99374c5

📥 Commits

Reviewing files that changed from the base of the PR and between 3eb9ea9 and 0c72c7e.

📒 Files selected for processing (5)
  • src/synthorg/providers/cassette/store.py
  • tests/e2e/test_cassette_replay_e2e.py
  • tests/unit/providers/cassette/test_provider.py
  • tests/unit/providers/cassette/test_registry_wiring.py
  • tests/unit/providers/cassette/test_store.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Deploy Preview
  • GitHub Check: Build Backend
  • GitHub Check: Lighthouse Site
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Dashboard Test
  • GitHub Check: Test E2E
  • GitHub Check: Test Unit
  • GitHub Check: Test Integration
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (4)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back
Test doubles: use FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags. Never use bare MagicMock at typed boundary (constructor/fn arg/annotated local/typed fixture return); blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline). Import FakeClock and mock_of from tests._shared
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large)

Files:

  • tests/unit/providers/cassette/test_registry_wiring.py
  • tests/e2e/test_cassette_replay_e2e.py
  • tests/unit/providers/cassette/test_provider.py
  • tests/unit/providers/cassette/test_store.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/providers/cassette/test_registry_wiring.py
  • tests/e2e/test_cassette_replay_e2e.py
  • tests/unit/providers/cassette/test_provider.py
  • tests/unit/providers/cassette/test_store.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL
Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default with read_only_post_init (Cat-2); Cat-3 bootstrap secrets are pure env at boot site. YAML is company-template ingestion format only, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
Use from synthorg.observability import get_logger; variable always named logger. Never import logging or use print() in app code. Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED after persistence write
Never log error=str(exc) or interpolate {exc}; use error_type=type(exc).name + error=safe_error_description(exc). Never use exc_info=True. OTel: forbidden span.record_exception(exc); use span.set_attribute("exception.message", safe_error_description(exc)) + record_exception=False, set_status_on_exception=False. Enforced by check_logger_exception_str_exc.py

Files:

  • src/synthorg/providers/cassette/store.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py
Comments should explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Do not use from __future__ import annotations (Python 3.14 has PEP 649). Use PEP 758 except: except A, B: requires parens when binding
Type hints required on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines
Define errors as <Domain><Condition>Error inheriting from DomainError; never inherit from Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py
Pydantic v2: use frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries). Use @computed_field for derived fields. Use NotBlankStr for identifiers
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py
Use immutability patterns: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Use asyncio.TaskGroup for fan-out/fan-in async patterns; helpers must catch Exception (re-raise MemoryError/RecursionError)
Clock seam: include clock: Clock | None = None parameter; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable
For untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML
Never use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001. Allowed in .claude/, third-party imports, providers/presets.py, we...

Files:

  • src/synthorg/providers/cassette/store.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/providers/cassette/store.py
src/synthorg/providers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Provider calls go through BaseCompletionProvider (retry + rate limit); never implement retry in driver subclasses. Retryable: RateLimitError, Provider{Timeout,Connection,Internal}Error. WebSocket: per-frame timeout closes silent peers (1008); revalidation saturation closes (4011)

Files:

  • src/synthorg/providers/cassette/store.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/unit/providers/cassette/test_registry_wiring.py
  • tests/e2e/test_cassette_replay_e2e.py
  • tests/unit/providers/cassette/test_provider.py
  • tests/unit/providers/cassette/test_store.py
  • src/synthorg/providers/cassette/store.py

Comment thread src/synthorg/providers/cassette/store.py Outdated
- cassette/store.py: _redact_outcome_error preserves ProviderError.context key shape: if a pluggable redactor collapses the whole-mapping redact to a scalar, fall back to per-entry redaction keyed by the original context keys instead of the opaque {"value": ...} wrapper, so replayed exceptions stay faithful for callers that branch on exc.context[...]
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 19, 2026 07:16 — with GitHub Actions Inactive
@Aureliolo Aureliolo merged commit cabf55d into main May 19, 2026
86 of 89 checks passed
@Aureliolo Aureliolo deleted the test/llm-cassette-replay branch May 19, 2026 07:32
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 19, 2026 07:32 — with GitHub Actions Inactive
Aureliolo pushed a commit that referenced this pull request May 19, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Multi-agent coordination is now active immediately on startup for
smoother operation.
- Governance rules are fully enforced during use, ensuring compliance at
all times.
- Coordination metrics update live, giving real-time insights into
system activity.
- Review agents are now reliably processed, preventing silent drops in
tasks.
- Sandbox containers can be reused for agents and tasks, speeding up
execution and reducing overhead.

### What's new
- Agents support online runtime with a minimal safety framework to
improve stability.
- Recorded LLM interactions can be deterministically replayed at the
provider interface.
- Distributed path validation has been enhanced for more robust data
routing.
- A client-simulation runtime was added for end-to-end testing of the
IntakeEngine.
- A new work pipeline spine architecture has been introduced to
streamline task processing.

### Under the hood
- Infrastructure, Python, and web dependencies have all been updated to
latest versions.
- Updated apko lockfiles in the CI/CD pipeline improve build
consistency.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


##
[0.8.6](v0.8.5...v0.8.6)
(2026-05-19)


### Features

* agent runtime online + minimal safety spine (runtime root)
([#2003](#2003))
([e5eef1a](e5eef1a)),
closes [#1956](#1956)
* deterministic recorded-LLM cassette replay at the provider chokepoint
([#2010](#2010))
([cabf55d](cabf55d))
* distributed path validation + hardening
([#2011](#2011))
([a382e4a](a382e4a)),
closes [#1966](#1966)
* wire IntakeEngine via boot client-simulation runtime (e2e test
harness) ([#2006](#2006))
([6a9c0aa](6a9c0aa)),
closes [#1961](#1961)
* work pipeline spine
([#1960](#1960))
([#2013](#2013))
([29b64e3](29b64e3))


### Bug Fixes

* bring the multi-agent coordinator online at boot
([#2007](#2007))
([180b38a](180b38a)),
closes [#1958](#1958)
* full governance enforcement online
([#1957](#1957))
([#2005](#2005))
([4140fc5](4140fc5))
* harden anti-ghost-wiring gate and fix silently-dropped review agents
([#2000](#2000))
([89b57ce](89b57ce))
* make coordination metrics live
([#1959](#1959))
([#2012](#2012))
([c4775e2](c4775e2))
* sandbox lifecycle dispatch (per-agent / per-task container reuse)
([#2008](#2008))
([03d2587](03d2587)),
closes [#1965](#1965)


### Documentation

* add GitButler concept-only concurrency research
([#1978](#1978))
([#2009](#2009))
([9e4f5c1](9e4f5c1))
* honest-hybrid refresh of README, site, and design specs
([#2001](#2001))
([f485bea](f485bea))


### CI/CD

* update apko lockfiles
([#2004](#2004))
([e2b9eee](e2b9eee))


### Maintenance

* Update Infrastructure dependencies
([#2014](#2014))
([0b16bdf](0b16bdf))
* Update Python dependencies
([#2015](#2015))
([a7224bb](a7224bb))
* Update Web dependencies
([#2016](#2016))
([7a7fe76](7a7fe76))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deterministic recorded-LLM cassette replay

1 participant