Skip to content

fix: audit cleanup C - persistence, concurrency & data integrity (#1708)#1717

Merged
Aureliolo merged 13 commits into
mainfrom
fix/audit-cleanup-c-persistence
May 2, 2026
Merged

fix: audit cleanup C - persistence, concurrency & data integrity (#1708)#1717
Aureliolo merged 13 commits into
mainfrom
fix/audit-cleanup-c-persistence

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Closes #1708.

Implements the persistence / concurrency / data-integrity cleanup bundle from the 2026-05-01 codebase audit (_audit/runs/2026-05-01-225703/).

Summary

Lifecycle (canonical pattern per docs/reference/lifecycle-sync.md)

  • providers/health_prober.py, meta/chief_of_staff/monitor.py: add _lifecycle_lock + _stop_failed + drain timeout. Cooperative loop honours _stop_event.
  • backup/scheduler.py, hr/pruning/service.py: convert sync start() → async, add canonical pattern. Callers updated to await.
  • integrations/tunnel/ngrok_adapter.py: add _lifecycle_lock for uniformity (no spawned task).
  • client/continuous.py: rename _lock_lifecycle_lock. Documents the in-place runner variant where the lock guards the _running flag transition only (not the loop body).

Race conditions (threading.Lock for sync methods callable from threadpool)

  • api/auth/ticket_store.py: lock count-and-insert in create(), plus pop-and-validate and bulk eviction.
  • tools/mcp/cache.py: lock get / put / invalidate.
  • integrations/webhooks/replay_protection.py: lock nonce dedup in check().

Currency aggregation invariant

  • budget/trends.py: _assert_single_currency in bucket_cost_records and project_daily_spend (raises MixedCurrencyAggregationError on mixed input).

Idempotency

  • api/controllers/simulations.py + client/store.py: new atomic SimulationStore.register_if_absent replaces the racy get-then-save; concurrent requests with the same simulation_id get HTTP 409 Conflict instead of both spawning runners.
  • communication/event_stream/stream.py: per-session sliding dedup window keyed by event.id (60s TTL, bounded per session). Cleaned up on last unsubscribe.
  • api/controllers/backup.py: Idempotency-Key header is now mandatory (HTTP 400 on missing, max_length=255).

Escalation factory

  • communication/conflict_resolution/escalation/factory.py: refactor if/elif dispatch into registry maps (MappingProxyType) for queue store and decision processor — shape mirrors PersistenceBackendRegistry.

Frontend

  • web/src/api/endpoints/backup.ts: createBackup() sends a generated Idempotency-Key header (or accepts a caller-supplied one).
  • web/src/api/endpoints/clients.ts: startSimulation() falls back to getSimulation() on HTTP 409 so retries observe the in-flight runner.

Documentation

  • New ADR: docs/research/lgpl-postgres-driver-decision.md (accept-with-ADR for LGPL psycopg + psycopg-pool in the optional postgres extra; resolves audit findings Implement Budget and Cost domain models #61 and docs: add crash recovery, sandboxing, analytics, and testing decisions #127 false-positives in passing).
  • docs/licensing.md: third-party LGPL note for the postgres extra.
  • docs/reference/lifecycle-sync.md: appended the 6 newly-compliant services + in-place runner variant.
  • docs/design/observability.md: enumerated EVENT_STREAM_HUB_PUBLISH_DEDUPED.
  • docs/design/backup.md, docs/design/client-simulation.md: documented the new mandatory-key and 409 contracts.
  • docs/openapi/openapi.json: regenerated.

Test plan

  • 26127 unit tests pass on Python 3.14.4 with uv run python -m pytest tests/ -m unit -n 8.
  • New tests added: 17 new test files covering lifecycle (concurrent start, restart, drain-timeout → unrestartable), thread safety (ThreadPoolExecutor, exact-count assertions), currency rejection, idempotency dedup, registry dispatch, dedup-window cleanup. Plus 1 new integration test (test_start_simulation_duplicate_id_returns_409).
  • Web dashboard: npm --prefix web run lint + type-check + test (2966 tests) all clean.
  • Mypy strict, ruff, all guard scripts (check_persistence_boundary.py, check_backend_regional_defaults.py, check_forbidden_literals.py) pass.
  • atlas migrate validate --env sqlite passes; schema parity confirmed (the audit-flagged Postgres-only migration was already absorbed into the SQLite squashed baseline — documented in the ADR).

Pre-PR review

Reviewed by 14 of 16 spawned agents (/pre-pr-review). Triage at _audit/pre-pr-review/triage.md. CRITICAL TOCTOU finding from security-reviewer (atomic check-and-register on simulations) addressed in the follow-up commit. PEP 758 false positives from 4 agents ignored: except A, B: (no parens, no binding) is the correct Python 3.14 syntax mandated by CLAUDE.md.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 2, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 515495ba-113d-46cd-8d16-dd3cecd352de

📥 Commits

Reviewing files that changed from the base of the PR and between d003360 and f2145f6.

📒 Files selected for processing (1)
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Build Backend
  • GitHub Check: Dashboard Test
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Type Check
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Lighthouse Site
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{py,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend: DEFAULT_CURRENCY from synthorg.budget.currency or the runtime budget.currency setting. Frontend: DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency.
Field naming: no _usd suffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare .toLocaleString() / .toLocaleDateString() / .toLocaleTimeString(). Use helpers in @/utils/format which read getLocale() from @/utils/locale. The backend has no operator-tunable locale setting; backend Intl formatting uses the system locale plus the browser timezone. The company.name_locales list controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render via Intl without passing timeZone (browser tz wins).
Date / number format: always via Intl; no hand-rolled templates.

Files:

  • src/synthorg/integrations/tunnel/ngrok_adapter.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Monetary models: every cost-bearing Pydantic model carries currency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raises MixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/ is the only place that may import aiosqlite, sqlite3, psycopg, or psycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced by scripts/check_persistence_boundary.py (pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g. ArtifactService, WorkflowService, MemoryService, CustomRulesService, UserService, ProjectService, SsrfViolationService, SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced by scripts/check_persistence_boundary.py).
Per-line opt-out: # lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved through SettingsService / ConfigResolver. First-cold-read emits one INFO settings.value.resolved carrying source + yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered with read_only_post_init=True for /settings discoverability; SettingsService.set() raises SettingReadOnlyError).
Direct os.environ.get(...) reads in application code outside startup are forbidden. New settings register in src/synthorg/settings/definitions/<namespace>.py and are consumed via ConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...

Files:

  • src/synthorg/integrations/tunnel/ngrok_adapter.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/integrations/tunnel/ngrok_adapter.py
src/synthorg/!(api)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Domain error class naming: error classes in domain modules use <Domain><Condition>Error and inherit from DomainError (or a domain-scoped intermediate that itself inherits DomainError). Bare Exception / RuntimeError at domain boundaries is forbidden; domain errors flow through EXCEPTION_HANDLERS for centralised RFC 9457 routing.

Files:

  • src/synthorg/integrations/tunnel/ngrok_adapter.py
src/synthorg/!(observability)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/!(observability)/**/*.py: Every business-logic module has from synthorg.observability import get_logger then logger = get_logger(__name__). Variable name is always logger. Carve-outs (e.g. meta/mcp/handlers/common_logging.py keying at a fixed string) are documented in the module docstring.
Never use import logging / logging.getLogger() / print() in application code (exception: observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py for handler-construction / bootstrap code).

Files:

  • src/synthorg/integrations/tunnel/ngrok_adapter.py
🔇 Additional comments (1)
src/synthorg/integrations/tunnel/ngrok_adapter.py (1)

136-141: The guideline "timed-out stops mark the service unrestartable" does not apply to NgrokAdapter. The official lifecycle-sync documentation explicitly excludes it: "NgrokAdapter (integrations/tunnel/ngrok_adapter.py): lifecycle lock only; no spawned background task, so the drain timeout / unrestartable flag do not apply."

The adapter forwards blocking ngrok calls to a worker thread via asyncio.to_thread() but does not spawn background tasks. The lifecycle lock serializes start() / stop() and is sufficient to prevent races on the single-tunnel invariant, which is the intended design per the code's documented rationale at lines 69–76.

			> Likely an incorrect or invalid review comment.

Walkthrough

This PR applies concurrency, lifecycle, and idempotency fixes across the codebase and updates documentation, tests, and frontend callers. Backup endpoints now require an Idempotency-Key and always use the idempotency service; simulation starts atomically claim records via SimulationStore.register_if_absent and return HTTP 409 on duplicates; EventStreamHub adds per-session event-id deduplication. Multiple services (schedulers, probers, pruners, monitor, ngrok adapter) use lifecycle locks, async start/stop, and unrestartable-on-timeout semantics. Thread-safety fixes were added for websocket tickets, webhook replay protection, and the MCP result cache. New events, errors, tests, and client/frontend idempotency handling were added.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 2, 2026 11:14 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements comprehensive concurrency safety, idempotency, and lifecycle management across the system. Key changes include the adoption of a canonical lifecycle pattern for background services (scheduler, prober, monitor, pruning), thread-safety enhancements for in-memory stores (ticket store, replay protection, MCP cache), and the introduction of mandatory idempotency keys for manual backups. Additionally, it adds per-session event deduplication in the event stream hub, currency validation in budget aggregation, and refactors escalation factories to use a registry-based dispatch. Documentation has been updated with a new ADR regarding LGPL Postgres drivers and design notes on idempotency. I have no feedback to provide as there are no review comments.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 2, 2026

Merging this PR will not alter performance

✅ 54 untouched benchmarks


Comparing fix/audit-cleanup-c-persistence (f2145f6) with main (3a3591a)

Open in CodSpeed

@codecov
Copy link
Copy Markdown

codecov Bot commented May 2, 2026

Codecov Report

❌ Patch coverage is 79.83193% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.65%. Comparing base (015d352) to head (f2145f6).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/api/controllers/simulations.py 41.93% 18 Missing ⚠️
src/synthorg/hr/pruning/service.py 72.34% 13 Missing ⚠️
src/synthorg/integrations/tunnel/ngrok_adapter.py 65.71% 12 Missing ⚠️
src/synthorg/meta/chief_of_staff/monitor.py 77.77% 10 Missing ⚠️
src/synthorg/communication/event_stream/stream.py 80.85% 6 Missing and 3 partials ⚠️
src/synthorg/api/controllers/backup.py 33.33% 7 Missing and 1 partial ⚠️
src/synthorg/backup/scheduler.py 86.44% 6 Missing and 2 partials ⚠️
src/synthorg/client/store.py 46.66% 8 Missing ⚠️
...synthorg/settings/subscribers/backup_subscriber.py 37.50% 5 Missing ⚠️
src/synthorg/providers/health_prober.py 90.69% 4 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1717      +/-   ##
==========================================
+ Coverage   84.62%   84.65%   +0.02%     
==========================================
  Files        1782     1782              
  Lines      101896   102180     +284     
  Branches     8968     8992      +24     
==========================================
+ Hits        86227    86497     +270     
- Misses      13476    13491      +15     
+ Partials     2193     2192       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/synthorg/integrations/webhooks/replay_protection.py (1)

131-220: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Split check() and reuse check_freshness() to avoid logic drift.

Line 131 keeps check() well over the 50-line limit and duplicates freshness checks already implemented in check_freshness(). Extracting nonce handling into a helper and delegating freshness to check_freshness() will keep behavior aligned and reduce maintenance risk.

♻️ Proposed refactor direction
 def check(
     self,
     *,
     nonce: str | None,
     timestamp: float | None,
 ) -> bool:
@@
-    now = self._clock.now().timestamp()
-
     if nonce is None and timestamp is None:
         logger.warning(
             WEBHOOK_REPLAY_DETECTED,
             reason="no freshness signal (nonce and timestamp both missing)",
         )
         return False
-
-    if timestamp is not None and not math.isfinite(timestamp):
-        logger.warning(
-            WEBHOOK_REPLAY_DETECTED,
-            reason="non-finite timestamp",
-        )
-        return False
-
-    if timestamp is not None and abs(now - timestamp) > self._window:
-        logger.warning(
-            WEBHOOK_REPLAY_DETECTED,
-            reason="timestamp outside window",
-            skew=abs(now - timestamp),
-        )
+    if not self.check_freshness(timestamp):
         return False
+    now = self._clock.now().timestamp()
+    return self._check_nonce(nonce=nonce, now=now)
+
+def _check_nonce(self, *, nonce: str | None, now: float) -> bool:
+    if nonce is None:
+        with self._lock:
+            self._evict_locked(now)
+        return True
+    if len(nonce) > MAX_NONCE_CHARS:
+        logger.warning(
+            WEBHOOK_REPLAY_DETECTED,
+            reason="nonce exceeds max size",
+            nonce_length=len(nonce),
+            max_nonce_chars=MAX_NONCE_CHARS,
+        )
+        return False
+    key = _fingerprint_nonce(nonce)
+    with self._lock:
+        self._evict_locked(now)
+        duplicate = key in self._seen
+        if not duplicate:
+            self._seen[key] = now
+            while len(self._seen) > self._max_entries:
+                self._seen.popitem(last=False)
+    if duplicate:
+        logger.warning(
+            WEBHOOK_REPLAY_DETECTED,
+            reason="duplicate nonce",
+            nonce_fingerprint=key[:16],
+        )
+        return False
+    return True

As per coding guidelines, "Functions must be < 50 lines."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/integrations/webhooks/replay_protection.py` around lines 131 -
220, The check() method currently mixes freshness validation and nonce cache
logic; extract and reuse the existing check_freshness(timestamp) for all
timestamp-related checks and move nonce-specific logic into a small helper
(e.g., _check_nonce(nonce, now) or inline after freshness delegation) so check()
stays under 50 lines. Concretely: have check() first fail-fast on both nonce and
timestamp missing (preserve the same logger call), then call
self.check_freshness(timestamp) (which should return bool or raise) and return
False on failure, then handle the nonce branch: if nonce is None acquire
self._lock, call self._evict_locked(now) and return True; otherwise keep the
oversized check using MAX_NONCE_CHARS and the logger, compute key =
_fingerprint_nonce(nonce), with self._lock call self._evict_locked(now) and the
duplicate detection/update of self._seen (respecting _max_entries and popitem),
and preserve the duplicate logger with nonce_fingerprint=key[:16]; keep use of
_lock, _evict_locked, _fingerprint_nonce, _seen and _max_entries to ensure
behavior is unchanged.
tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py (1)

41-105: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Cover the registry fallback path too.

These tests only exercise the registered keys. Add one assertion for an unknown backend/strategy value so the new registry contract still fails with ValueError if a key is misspelled or omitted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py`
around lines 41 - 105, Add tests that assert the registry fallback raises
ValueError for unknown keys: for build_escalation_queue_store call it with
EscalationQueueConfig(backend="unknown") (and persistence=None or a fake) and
assert raises ValueError with an appropriate message about unknown backend;
likewise for build_decision_processor call it with
EscalationQueueConfig(decision_strategy="unknown") and assert it raises
ValueError. Place these assertions alongside the existing TestQueueStoreRegistry
and TestDecisionProcessorRegistry tests to cover the fallback/unregistered-path
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/api/auth/ticket_store.py`:
- Around line 156-159: The create() path raises TicketLimitExceededError when
user_pending >= self._max_pending but does not log the rejection; add a
WARNING/ERROR log immediately before raising that includes the user identifier,
the current pending count and the max (e.g. use self._logger.warning(...) or
module logger.warning(...)) so the rejection is recorded (reference symbols:
create(), user_pending, self._max_pending, TicketLimitExceededError,
user.user_id).

In `@src/synthorg/backup/scheduler.py`:
- Around line 128-132: The lifecycle lock is being replaced after stop which
allows overlap between callers on the old and new locks; instead ensure a single
dedicated self._lifecycle_lock instance exists for the service lifetime (create
it once, e.g. in __init__ or at construction) and do not reassign or replace it
in stop() or start(); if you need to reinitialize synchronization state
keep/replace only events like self._stop_event or self._wake_event but never
rebind self._lifecycle_lock, and update start()/stop() to use the persistent
self._lifecycle_lock for mutual exclusion.

In `@src/synthorg/client/continuous.py`:
- Around line 11-15: The module docstring incorrectly states that
self._lifecycle_lock "spans the full body of start() and stop()"; update the
docstring to match the implementation by noting that self._lifecycle_lock is
only held briefly in start() to serialize the running-flag check (and not held
for the full start execution) and that stop() does not use this lock; reference
self._lifecycle_lock, start(), and stop() in the revised sentence so future
readers understand the actual locking scope and intent.

In `@src/synthorg/communication/event_stream/stream.py`:
- Around line 60-67: Validate constructor inputs for dedup parameters: in the
class initializer that accepts dedup_ttl_seconds and
dedup_max_entries_per_session (the shown constructor), check that
dedup_ttl_seconds is non-negative and dedup_max_entries_per_session is a
non-negative (or strictly positive if logical) integer and raise a ValueError
with a clear message if not; this prevents the trim loop in
_record_published_locked() from calling popitem() on an empty OrderedDict at
publish time. Update validation to coerce or reject bad types (e.g., non-int for
dedup_max_entries_per_session) and document the error messages so callers
fail-fast when creating the Stream instead of crashing later in
_record_published_locked().

In `@src/synthorg/hr/pruning/service.py`:
- Around line 209-213: The code currently replaces self._lifecycle_lock which
breaks synchronization for callers queued on the original lock; instead, do not
reassign self._lifecycle_lock during restart preparation — only recreate
loop-bound primitives like self._stop_event and self._wake_event after releasing
the lock. Concretely, remove or revert the assignment of self._lifecycle_lock =
asyncio.Lock() in the restart path, keep the existing self._lifecycle_lock used
by start()/stop(), and only set new asyncio.Event() instances for
self._stop_event and self._wake_event (and ensure this happens outside the held
lifecycle lock) so events rebind to a new loop while preserving the single
dedicated lifecycle lock.

In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 81-83: The duplicate-start branch in the ngrok adapter (the check
on self._tunnel in the method that starts the tunnel) raises RuntimeError
without logging; modify that branch to emit a warning or error log (e.g., use
the adapter or module logger such as self.logger or module-level logger) with
the message "ngrok tunnel already active on this adapter" and any relevant
context (adapter id/state) immediately before raising the RuntimeError to
satisfy the error-path observability rule.

In `@src/synthorg/meta/chief_of_staff/monitor.py`:
- Around line 152-155: The code currently reassigns self._lifecycle_lock during
stop which can split synchronization between old waiters and new entrants;
instead ensure self._lifecycle_lock is created once (e.g., in __init__) and
never reassigned, and only recreate or reset non-lock primitives (like
self._stop_event) if necessary for event-loop rebinding; update the code that
now sets self._lifecycle_lock = asyncio.Lock() to remove that reassignment, keep
start()/stop() protected by the single dedicated self._lifecycle_lock, and if
you must recreate loop-bound primitives do so for self._stop_event only while
leaving self._lifecycle_lock intact.

In `@src/synthorg/providers/health_prober.py`:
- Around line 264-267: The change replaces self._lifecycle_lock with a new
asyncio.Lock() during stop, which breaks serialization because callers may still
be awaiting the old lock; instead do not swap out self._lifecycle_lock — keep it
created once (e.g., at init) and never reassign it in stop()/start(); if you
need to reset state on stop, only recreate self._stop_event (or clear it) but
leave self._lifecycle_lock intact so start() and stop() still serialize
correctly.

In `@tests/unit/api/auth/test_ticket_store_threadsafety.py`:
- Around line 46-49: Tests currently start worker bodies opportunistically (the
attempt function) causing nondeterministic races; introduce a shared start gate
(e.g., threading.Event or threading.Barrier) named start_event that each worker
(the attempt function) waits on before proceeding, submit all futures with the
workers blocked, call start_event.set() once all tasks are submitted to release
them simultaneously, then collect results as before; apply the same pattern to
the other test blocks that create ThreadPoolExecutor/futures around the attempt
worker (the blocks at the other ranges).

In `@tests/unit/api/controllers/test_backup_required_idempotency.py`:
- Around line 42-55: The helper _make_state uses bare AsyncMock/MagicMock
instances; replace them with spec-bound mocks to prevent interface drift: set
service = AsyncMock(spec=BackupServiceInterface) (or the concrete backup service
class used in production) and service.create_backup = AsyncMock(spec=Callable,
return_value=_make_manifest()), set app_state = MagicMock(spec=AppStateClass),
idempotency_service = MagicMock(spec=IdempotencyServiceClass) and assign
idempotency_service.run_idempotent = AsyncMock(spec=Callable) (or use
spec_set=... if stricter), and keep CursorSecret.from_key unchanged; update the
other mock site around lines 80-84 similarly to use the same spec classes
referenced here.

In `@tests/unit/api/controllers/test_backup.py`:
- Around line 76-96: The test uses a bare MagicMock for idempotency_service
which can mask interface drift; replace it with a specced async double by
creating idempotency_service = MagicMock(spec=IdempotencyService) (or the
concrete class used in production) and implement run_idempotent as an async def
_run_idempotent(...) that accepts only the expected signature, uses an AsyncMock
for the callback (or awaits the provided callback to enforce the awaitable
contract), sets outcome.timed_out/result/fresh as before, assign
idempotency_service.run_idempotent = _run_idempotent, and set
app_state.idempotency_service = idempotency_service so the test enforces the
real interface for idempotency_service and its run_idempotent method.

In `@tests/unit/api/controllers/test_simulations_idempotency.py`:
- Around line 33-60: The helpers _make_state and _make_request create bare
MagicMock/AsyncMock instances (e.g., sim_state, app_state, state, and the return
of _make_request) which violates the spec= requirement; replace each bare
MagicMock()/AsyncMock() call with mocks that declare the concrete interface via
spec= (for example spec=SimulationState, spec=SimulationStore, spec=AppState,
spec=PoolClientList, etc.) and ensure async methods use AsyncMock(spec=...)
where appropriate (e.g., register_if_absent, save, list_clients); update
_make_request to return a MagicMock(spec=IncomingRequestClass) and then remove
the test-level baseline suppressions that were added to silence bare-mock
checks.

In `@tests/unit/backup/test_scheduler_lifecycle.py`:
- Around line 21-23: The test helper _make_scheduler creates a bare MagicMock
and AsyncMock; replace them with spec-bound mocks of the concrete service type
used by BackupScheduler (e.g., use MagicMock(spec=BackupService) and set
service.create_backup = AsyncMock(spec=Callable or the concrete method
signature) or AsyncMock(spec=BackupService.create_backup)) so the mock enforces
the real interface for BackupScheduler and its create_backup method; update
_make_scheduler to construct service = MagicMock(spec=BackupService) and assign
service.create_backup = AsyncMock(spec=BackupService.create_backup) (or the
appropriate concrete class/method) before returning BackupScheduler(service,
interval_hours=1).

In `@tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py`:
- Around line 21-23: Replace the bare AsyncMock with a spec-bound mock to
SnapshotBuilder and stop reassigning build: create builder =
AsyncMock(spec=SnapshotBuilder) (or use create_autospec/Sentinel as your test
helpers) and set builder.build.return_value to the desired value instead of
reassigning builder.build; then pass that builder into OrgInflectionMonitor so
the build method stays an AsyncMock consistent with the SnapshotBuilder
interface.

In `@tests/unit/settings/test_backup_subscriber.py`:
- Around line 30-34: The scheduler test double is too loose: tighten it by
creating the scheduler mock with a concrete spec (e.g., scheduler =
MagicMock(spec=Scheduler)) and give its async methods proper AsyncMock specs
(scheduler.start = AsyncMock(spec=Scheduler.start), scheduler.stop =
AsyncMock(spec=Scheduler.stop)) and reschedule as
MagicMock(spec=Scheduler.reschedule); keep type(scheduler).is_running =
PropertyMock(...) for the property. Finally update the enabled-path assertion to
wait for the coroutine by using scheduler.start.assert_awaited_once() (and
similarly use assert_awaited_once() for stop where applicable).

In `@tests/unit/tools/mcp/test_cache_threadsafety.py`:
- Around line 22-38: The test test_concurrent_get_put_no_corruption doesn't
exercise cache hits because writer and reader use different payloads; update the
reader in that test so it queries the exact same key+input the preceding writer
used (e.g., change reader to call cache.get(f"tool-{i % 8}", {"i": i-1}) when i
is odd) so the get frequently hits the existing entry and exercises the
MCPResultCache hit path (move_to_end + deepcopy) modified in this PR.

In `@web/src/api/endpoints/backup.ts`:
- Around line 13-17: The idempotencyKey is currently accepted as-is (using ??)
which preserves empty or whitespace-only strings; change the logic that defines
key (the variable assigned from idempotencyKey) to treat blank/whitespace-only
values as "not provided" and generate a UUID via crypto.randomUUID() instead.
Specifically, update the assignment that sets key (used in the headers for
apiClient.post) to check idempotencyKey.trim() (or equivalent) and only reuse it
when non-empty; otherwise call crypto.randomUUID() so no empty Idempotency-Key
is sent.

---

Outside diff comments:
In `@src/synthorg/integrations/webhooks/replay_protection.py`:
- Around line 131-220: The check() method currently mixes freshness validation
and nonce cache logic; extract and reuse the existing check_freshness(timestamp)
for all timestamp-related checks and move nonce-specific logic into a small
helper (e.g., _check_nonce(nonce, now) or inline after freshness delegation) so
check() stays under 50 lines. Concretely: have check() first fail-fast on both
nonce and timestamp missing (preserve the same logger call), then call
self.check_freshness(timestamp) (which should return bool or raise) and return
False on failure, then handle the nonce branch: if nonce is None acquire
self._lock, call self._evict_locked(now) and return True; otherwise keep the
oversized check using MAX_NONCE_CHARS and the logger, compute key =
_fingerprint_nonce(nonce), with self._lock call self._evict_locked(now) and the
duplicate detection/update of self._seen (respecting _max_entries and popitem),
and preserve the duplicate logger with nonce_fingerprint=key[:16]; keep use of
_lock, _evict_locked, _fingerprint_nonce, _seen and _max_entries to ensure
behavior is unchanged.

In
`@tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py`:
- Around line 41-105: Add tests that assert the registry fallback raises
ValueError for unknown keys: for build_escalation_queue_store call it with
EscalationQueueConfig(backend="unknown") (and persistence=None or a fake) and
assert raises ValueError with an appropriate message about unknown backend;
likewise for build_decision_processor call it with
EscalationQueueConfig(decision_strategy="unknown") and assert it raises
ValueError. Place these assertions alongside the existing TestQueueStoreRegistry
and TestDecisionProcessorRegistry tests to cover the fallback/unregistered-path
behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: afa59223-3a93-4652-a57f-3dd9e2165a11

📥 Commits

Reviewing files that changed from the base of the PR and between ade03b7 and aef7ead.

📒 Files selected for processing (46)
  • docs/design/backup.md
  • docs/design/client-simulation.md
  • docs/design/observability.md
  • docs/licensing.md
  • docs/reference/lifecycle-sync.md
  • docs/research/lgpl-postgres-driver-decision.md
  • scripts/mock_spec_baseline.txt
  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/backup/service.py
  • src/synthorg/budget/trends.py
  • src/synthorg/client/continuous.py
  • src/synthorg/client/store.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/observability/events/event_stream.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/tools/mcp/cache.py
  • tests/integration/api/controllers/test_client_simulation.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/api/controllers/test_backup.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/backup/test_scheduler.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/budget/test_trends_currency.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/communication/event_stream/test_stream_dedup.py
  • tests/unit/hr/pruning/test_service.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • web/src/api/endpoints/backup.ts
  • web/src/api/endpoints/clients.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Build Backend
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Dashboard Test
  • GitHub Check: Dashboard Type Check
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Type Check
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Lighthouse Site
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (12)
web/src/**/!(logger).ts?(x)

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/!(logger).ts?(x): Always use createLogger from @/lib/logger instead of bare console.warn, console.error, or console.debug in application code; use variable name log
Pass dynamic/untrusted values as separate args to log methods, not interpolated into the message string, so they go through sanitizeArg
Wrap attacker-controlled fields inside structured objects with sanitizeForLog() before embedding in logs

Files:

  • web/src/api/endpoints/clients.ts
  • web/src/api/endpoints/backup.ts
web/src/**/*.ts?(x)

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.ts?(x): Callers MUST NOT wrap store mutation calls in try / catch; the store owns the error UX
NEVER hardcode Motion transition durations; use presets from @/lib/motion
NEVER hardcode BCP 47 locale literals ('en-US'); use helpers from @/utils/format and design tokens
NEVER hardcode currency symbols or codes; use DEFAULT_CURRENCY from @/utils/currencies

Files:

  • web/src/api/endpoints/clients.ts
  • web/src/api/endpoints/backup.ts
web/src/api/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Use getLiveness() for 200 always-alive checks and getReadiness() for binary 'ok' | 'unavailable' health checks (no tri-state); any new caller must handle the 503 path explicitly

Files:

  • web/src/api/endpoints/clients.ts
  • web/src/api/endpoints/backup.ts
web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

web/src/**/*.{ts,tsx}: Reuse components from web/src/components/ui/ before creating new ones. Never hardcode hex colors, font-family, pixel spacing, Motion transitions, or BCP 47 locale strings; use design tokens, @/lib/motion presets, and the helpers in @/utils/format. Enforced by scripts/check_web_design_system.py (PostToolUse hook on every web/src/ edit). See web/CLAUDE.md for the component inventory, token rules, and post-training references (TS6, Storybook 10).
Never hardcode BCP 47 locale strings or call bare .toLocaleString() / .toLocaleDateString() / .toLocaleTimeString(). Use helpers in @/utils/format which read getLocale() from @/utils/locale.
Currency: never hardcode ISO 4217 codes or symbols. Frontend: DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency. Field naming: no _usd suffix on money fields anywhere.
Timezone: store UTC; render via Intl without passing timeZone (browser tz wins). Date / number format: always via Intl; no hand-rolled templates. Units: metric only.

Files:

  • web/src/api/endpoints/clients.ts
  • web/src/api/endpoints/backup.ts
web/**/*.{ts,tsx,js,json}

📄 CodeRabbit inference engine (CLAUDE.md)

Web Dashboard: see web/CLAUDE.md for commands, design system, and component inventory. The CI-matching full-suite leak check is npm --prefix web run test -- --coverage --detect-async-leaks (the Leaks N leaks summary must stay at or below the MAX_ASYNC_LEAKS ceiling in .github/workflows/ci.yml; any new store that schedules timers must expose a teardown hook per web/CLAUDE.md).

Files:

  • web/src/api/endpoints/clients.ts
  • web/src/api/endpoints/backup.ts
docs/design/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

docs/design/**/*.md: ALWAYS read the relevant docs/design/ page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking the design pages under docs/design/. The design spec is the starting point for architecture, data models, and behavior.
If implementation deviates from the spec (better approach found, scope evolved, etc.), alert the user and explain why; the user decides whether to proceed or update the spec. Do NOT silently diverge; every deviation needs explicit user approval. When approved deviations occur, update the relevant docs/design/ page to reflect the new reality.

Files:

  • docs/design/backup.md
  • docs/design/observability.md
  • docs/design/client-simulation.md
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Use D2 (\``d2) for architecture diagrams, nested container layouts, complex entity relationships. Use Mermaid (```mermaid) for flowcharts, sequence diagrams, simple hierarchies, pipelines. Use Markdown tables for grid/matrix data that is semantically tabular. Never use ```textblocks with ASCII/Unicode box-drawing characters for diagrams. D2 uses theme 200 (Dark Mauve), configured globally inmkdocs.yml. Review agent diagram-syntax-validatorruns in/pre-pr-reviewand/aurelio-review-pr` pipelines.

Files:

  • docs/design/backup.md
  • docs/design/observability.md
  • docs/licensing.md
  • docs/reference/lifecycle-sync.md
  • docs/research/lgpl-postgres-driver-decision.md
  • docs/design/client-simulation.md
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always read the relevant docs/design/ page before implementing any feature or planning any issue. The design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec.
Present every implementation plan to the user for accept/deny before coding starts. At every phase of planning and implementation, actively look for ways to improve the design; surface improvements as suggestions, not silent changes.
Comments explain WHY only, never origin/review/issue context. Forbidden: reviewer citations, in-code issue/PR back-references, cryptic internal-taxonomy shorthand, migration/rebrand framing, round/iteration narrative, self-evident restatements. What stays: hidden constraints, subtle invariants, workarounds for specific upstream bugs (with stable bug-tracker URL), and why a non-obvious choice was made.
No from __future__ import annotations; Python 3.14 has PEP 649.
Use PEP 758 except syntax: except A, B: (no parens) when not binding to a name; as exc requires parens (except (A, B) as exc:). Enforced by ruff on 3.14.
All public functions and public classes require type hints; mypy strict mode is enforced. Google-style docstrings are required on public classes and functions (ruff D rules).
Create new objects, never mutate existing ones. Use frozen Pydantic models for config/identity; for non-Pydantic registries use copy.deepcopy() at construction and wrap with MappingProxyType; deepcopy at system boundaries (tool execution, provider serialization, persistence).
Distinguish config vs runtime state: frozen models for config/identity; separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.
Every Pydantic model defaults to ConfigDict(frozen=True, allow_inf_nan=False) unless documented otherwise; mutations go thro...

Files:

  • src/synthorg/api/auth/ticket_store.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/budget/test_trends_currency.py
  • tests/unit/settings/test_backup_subscriber.py
  • src/synthorg/client/store.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/communication/event_stream/test_stream_dedup.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • tests/unit/hr/pruning/test_service.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • src/synthorg/observability/events/event_stream.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/api/controllers/backup.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/integration/api/controllers/test_client_simulation.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • src/synthorg/backup/service.py
  • src/synthorg/tools/mcp/cache.py
  • tests/unit/backup/test_scheduler.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • src/synthorg/client/continuous.py
  • tests/unit/api/controllers/test_backup.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/budget/trends.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every business-logic module must have from synthorg.observability import get_logger then logger = get_logger(__name__) at module level. Variable name is always logger. Carve-outs (e.g. fixed-key logging) are documented in the module docstring.
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py for handler-construction and bootstrap code.
Event names must always import constants from synthorg.observability.events.<domain>; never use string literals. See conventions.md §13 for the domain inventory and the split between TELEMETRY_* (log events) and TELEMETRY_EVENT_* (payload types).
Always log structured kwargs: logger.info(EVENT, key=value). Never use logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising. Never use logger.exception(EVENT, error=str(exc)), logger.warning(EVENT, error=str(exc)), or logger.error(EVENT, error=str(exc)); use structured logging with error_type=type(exc).__name__ and error=safe_error_description(exc) (SEC-1). Keep severity appropriate to context.
Every status-enum transition (including non-terminal hops like PENDING -> RUNNING) must log at INFO using a domain-scoped *_STATUS_TRANSITIONED constant carrying from_status/to_status/domain id, AFTER the persistence write succeeds.
DEBUG log level is for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.
Every settings read emits one INFO settings.value.resolved event on first cold read per process, carrying source and yaml_path when applicable.
All provider calls must go through BaseCompletionProvider, which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code. Configure RetryConfig and RateLimiterConfig per-provider in ProviderConfig....

Files:

  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/client/store.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/observability/events/event_stream.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/backup/service.py
  • src/synthorg/tools/mcp/cache.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/client/continuous.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/budget/trends.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/client/store.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/observability/events/event_stream.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/backup/service.py
  • src/synthorg/tools/mcp/cache.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/client/continuous.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/budget/trends.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Every Mock() / AsyncMock() / MagicMock() MUST declare the interface via spec=ConcreteClass (Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen in scripts/mock_spec_baseline.txt. Regenerate baseline only via uv run python scripts/check_mock_spec.py --update and commit the diff so the change is reviewable.
Shared mocks: tests/conftest.py exposes mock_dispatcher (an AsyncMock(spec=NotificationDispatcher)); use it instead of building the spec inline.
Time-driven tests: import FakeClock from tests._shared.fake_clock (NOT from any rollout-subsystem path) and inject it into the class under test. Use FakeClock.sleep which advances virtual time AND yields via asyncio.sleep(0). For tests driving cooperative tasks, use await clock.advance_async(seconds). FakeClock-first: when the class under test accepts a clock= parameter, always inject FakeClock rather than monkey-patching globals.
Tests must use test-provider, test-small-001, etc. for vendor-agnostic naming. NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings, comments, or tests.
When Hypothesis finds a failure, read the shrunk example, fix the underlying bug, and add an explicit @example(...) decorator so the case is permanently covered. Flaky tests: NEVER skip, dismiss, or ignore them; always fix them fully and fundamentally. For timing-sensitive tests, use FakeClock-first; patch time.monotonic()/asyncio.sleep() only for legacy code paths without a Clock seam. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).
Coverage: 80% minimum (enforced in CI; benchmarks excluded via --ignore=tests/benchmarks/). Test timeout: 30 seconds per test (global in pyproject.toml). Parallelism: pytest-xdist via -n 8. ALWAYS include -n 8 when running pytest locally, never run tests ...

Files:

  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/budget/test_trends_currency.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/communication/event_stream/test_stream_dedup.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/hr/pruning/test_service.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/integration/api/controllers/test_client_simulation.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • tests/unit/backup/test_scheduler.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • tests/unit/api/controllers/test_backup.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/budget/test_trends_currency.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/communication/event_stream/test_stream_dedup.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/hr/pruning/test_service.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/integration/api/controllers/test_client_simulation.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • tests/unit/backup/test_scheduler.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • tests/unit/api/controllers/test_backup.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Prefer @pytest.mark.parametrize for testing similar cases.

Files:

  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/budget/test_trends_currency.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/communication/event_stream/test_stream_dedup.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/hr/pruning/test_service.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • tests/unit/backup/test_scheduler.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • tests/unit/api/controllers/test_backup.py
🪛 LanguageTool
docs/design/observability.md

[grammar] ~205-~205: Ensure spelling is correct
Context: ...-window TTL (default 60s). The hub keys dedup on event.id; identical ids within the...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (28)
docs/research/lgpl-postgres-driver-decision.md (1)

1-80: Well-structured ADR and audit traceability are solid.

The decision record is clear, scoped, and operationally actionable, with good linkage to findings and supporting references.

docs/licensing.md (1)

112-123: Licensing addendum is clear and consistent with the ADR linkage.

The new section cleanly distinguishes SQLite default licensing from optional Postgres LGPL obligations and provides a direct path to the decision record.

src/synthorg/budget/trends.py (2)

216-241: Currency-invariant enforcement looks correct here.

Validating the full record set before bucketing prevents a mixed-currency aggregate from being produced, and the new Raises: docstring matches the behavior.


413-460: Projection logic now respects the same invariant.

The mixed-currency check happens before daily spend and exhaustion calculations, so the forecast can no longer conflate currencies. The empty-input fast path still behaves sensibly.

tests/unit/budget/test_trends_currency.py (2)

27-83: Good coverage for the bucketing path.

The helper keeps the fixtures concise, and the tests cover single-currency aggregation, mixed-currency rejection, and the empty-input case.


85-108: Good coverage for the projection path.

These cases exercise the new currency guard plus the empty-input behavior, which is enough to catch regressions in the new invariant.

tests/unit/integrations/test_replay_protection_threadsafety.py (1)

25-75: Strong concurrency regression coverage.

These tests cleanly validate single-accept duplicate handling and bounded cache behavior under concurrent pressure.

src/synthorg/api/auth/ticket_store.py (1)

149-166: Lock scope is correctly applied to all mutation-critical sections.

The lock now guards the exact TOCTOU-sensitive blocks (count+insert, pop+consume, snapshot+delete), which aligns with the threadpool-concurrency objective.

Also applies to: 187-189, 221-226

tests/unit/api/auth/test_ticket_store_threadsafety.py (1)

35-106: Great coverage of the thread-safety invariants.

The four tests map cleanly to the race fixes (cap enforcement, user isolation, single-consumer guarantee, and mixed-operation stability).

src/synthorg/observability/events/event_stream.py (1)

20-20: Event constant addition looks correct.

EVENT_STREAM_HUB_PUBLISH_DEDUPED is consistent with the existing event taxonomy and supports the new dedup log path cleanly.

docs/design/observability.md (1)

200-206: Event-stream observability docs are clear and aligned.

This section documents both failure and dedup outcomes with concrete runtime semantics (TTL + bounded state), which matches the implemented hub behavior.

src/synthorg/communication/event_stream/stream.py (1)

146-207: Dedup lock-scoped check/record flow is solid.

The duplicate check + record under a single lock and bounded per-session eviction is a good, race-safe implementation.

tests/unit/communication/event_stream/test_stream_dedup.py (1)

39-139: Test coverage for dedup behavior is thorough.

These cases exercise the core invariants (TTL window, per-session isolation, bounded memory, and unsubscribe cleanup) and give good confidence in the new publish semantics.

src/synthorg/tools/mcp/cache.py (1)

8-8: Thread-safety update looks correct.

The lock scopes keep get, put, and invalidate atomic without holding the mutex across logging/metrics, and the deepcopy-on-return preserves the cache’s immutability contract.

Also applies to: 30-37, 58-58, 78-105, 124-142, 154-161

src/synthorg/client/store.py (1)

177-193: Atomic registration logic is solid.

This lock-guarded register_if_absent implementation correctly prevents duplicate runner claims under concurrency.

src/synthorg/api/controllers/simulations.py (1)

292-297: Idempotency gate is placed correctly before runner spawn.

Rejecting duplicate claims here prevents double-execution and preserves run integrity.

src/synthorg/communication/conflict_resolution/escalation/factory.py (1)

150-158: Registry dispatch refactor is clean and safer to extend.

Centralized factory maps plus explicit unknown-key errors improve maintainability without changing public behavior.

Also applies to: 256-264

tests/integration/api/controllers/test_client_simulation.py (1)

318-347: Good integration coverage for duplicate-start idempotency.

This test validates the externally visible 201 then 409 contract for repeated simulation_id.

docs/design/client-simulation.md (1)

227-237: Idempotency spec update is clear and implementation-aligned.

The section accurately documents duplicate simulation_id conflict semantics and retry behavior.

web/src/api/endpoints/clients.ts (1)

267-284: 409 fallback to getSimulation() is a good idempotent client behavior.

This keeps retries user-safe while preserving normal error propagation for non-conflict failures.

src/synthorg/backup/service.py (1)

94-97: Async scheduler startup is wired through correctly.

Awaiting self._scheduler.start() here matches the refactored scheduler lifecycle and keeps service startup consistent.

tests/unit/backup/test_scheduler.py (1)

36-126: Async start/idempotency coverage looks good.

The updated tests now await start() and verify the no-op second call, which is the right contract for the scheduler rewrite.

tests/unit/hr/pruning/test_service.py (1)

767-844: Lifecycle await updates look consistent.

These tests now await service.start() before asserting is_running and waking the loop, which matches the async lifecycle change.

src/synthorg/settings/subscribers/backup_subscriber.py (1)

112-123: Async scheduler startup is wired through correctly.

Awaiting scheduler.start() here keeps the subscriber aligned with the async lifecycle implementation.

docs/design/backup.md (1)

29-29: Spec update looks aligned with the controller change.

The manual backup section now clearly captures the required Idempotency-Key behavior and the 24h retry semantics.

docs/reference/lifecycle-sync.md (1)

34-37: Clear in-place runner exception.

The new ContinuousMode section cleanly distinguishes lock scope for in-place loops vs background-task services, which prevents lifecycle pattern misuse.

src/synthorg/api/controllers/backup.py (1)

88-107: Idempotency enforcement path looks solid.

Required header validation plus unconditional run_idempotent() with a dedicated in-flight 409 branch is a good hardening of the backup create flow.

Also applies to: 143-159

tests/unit/hr/pruning/test_service_lifecycle.py (1)

45-90: Lifecycle coverage here is strong.

These tests exercise the key concurrency and unrestartable-state transitions expected from the lifecycle contract.

Comment thread src/synthorg/api/auth/ticket_store.py
Comment thread src/synthorg/backup/scheduler.py Outdated
Comment thread src/synthorg/client/continuous.py
Comment thread src/synthorg/communication/event_stream/stream.py
Comment thread src/synthorg/hr/pruning/service.py Outdated
Comment thread tests/unit/backup/test_scheduler_lifecycle.py Outdated
Comment thread tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py Outdated
Comment thread tests/unit/settings/test_backup_subscriber.py Outdated
Comment thread tests/unit/tools/mcp/test_cache_threadsafety.py
Comment thread web/src/api/endpoints/backup.ts Outdated
Aureliolo added a commit that referenced this pull request May 2, 2026
…iff)

Findings (per the rolling CodeRabbit review on PR #1717):

Concurrency:
- 5 services: stop no longer reassigns _lifecycle_lock; only the
  loop-bound stop_event / wake_event is recreated. Replacing the lock
  let a caller queued on the old lock and a fresh caller acquiring
  the new lock both proceed concurrently. Affects scheduler.py,
  monitor.py, health_prober.py, hr/pruning/service.py, and the
  canonical sweeper.py (same pattern, same bug).
- stream.py: validate dedup_ttl_seconds and
  dedup_max_entries_per_session at construction so the trim loop
  cannot popitem an empty OrderedDict at publish time.

Logging:
- ticket_store.py: log API_WS_TICKET_LIMIT_EXCEEDED before raising
  TicketLimitExceededError so cap rejections appear in audit logs
  (new event constant added to events/api.py).
- ngrok_adapter.py: log TUNNEL_ERROR before raising on duplicate
  start.

Refactors:
- replay_protection.py: split check() into check() + _check_nonce()
  reusing check_freshness() so the public method stays under 50
  lines.

Docstring:
- continuous.py: module docstring now matches implementation
  (lifecycle lock guards the running-flag transition, not the loop
  body; stop() does not acquire the lock).

Tests:
- 6 test files: add spec=ConcreteClass to bare Mock/AsyncMock
  call sites (test_scheduler_lifecycle, test_monitor_lifecycle,
  test_backup_subscriber, test_backup_required_idempotency,
  test_backup, test_simulations_idempotency). assert_awaited_once
  replaces assert_called_once where applicable.
- test_ticket_store_threadsafety.py: threading.Event start gate so
  worker bodies hit store.create together rather than as the pool
  fills (tighter contention on the lock).
- test_cache_threadsafety.py: seed a shared key so the reader path
  exercises the locked hit branch (move_to_end + deepcopy).
- test_factory_registry.py: cover the unknown-key fallback (both
  registries) via model_construct bypassing Pydantic literal
  validation.
- test_stream_dedup.py: 2 new tests for unsubscribe dedup-window
  cleanup (already in the prior PR commit).

Frontend:
- backup.ts: trim whitespace-only Idempotency-Key inputs and fall
  through to a fresh UUID; ?? alone would forward an empty string
  and the server rejects it.

26138 unit tests + web type-check + web lint clean; mypy + ruff
clean.
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 2, 2026 11:56 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/communication/event_stream/stream.py (1)

40-44: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Document the new public constructor parameters.

The class docstring still only describes max_queue_size, but EventStreamHub now publicly exposes dedup_ttl_seconds, dedup_max_entries_per_session, and clock. Please add them to the Args: block so the API contract stays discoverable.

As per coding guidelines, "All public functions and public classes require type hints; mypy strict mode is enforced. Google-style docstrings are required on public classes and functions (ruff D rules)."

Also applies to: 56-63

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/communication/event_stream/stream.py` around lines 40 - 44,
Update the EventStreamHub public class docstring Args block to document the
newly exposed constructor parameters dedup_ttl_seconds (float|int - TTL for
dedup cache in seconds), dedup_max_entries_per_session (int - max dedup entries
kept per session), and clock (callable/Clock-like object used for time)
alongside the existing max_queue_size entry; ensure the docstring follows
Google-style formatting and includes type hints in the descriptions so it
satisfies mypy/ruff D rules, and apply the same addition to the secondary
docstring region referenced around lines 56-63.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/mock_spec_baseline.txt`:
- Around line 304-326: The PR added new bare Mock/AsyncMock/MagicMock usages in
the backup test modules that were incorrectly added to the frozen mock baseline;
convert each bare double in the affected backup tests to a spec-bound double
(e.g., Mock(spec=MyClass) or AsyncMock(spec=MyAsyncClass) or
MagicMock(spec=MyClass)) by identifying the mocks in the backup test modules and
replacing their constructors, then remove the corresponding new entries from the
mock_spec_baseline so the pre-commit check_mock_spec.py still enforces
spec-bound mocks rather than grandfathering these new sites.

In `@src/synthorg/communication/conflict_resolution/escalation/sweeper.py`:
- Around line 203-216: The stop() implementation should recreate the loop-bound
self._stop_event while holding self._lifecycle_lock to match the canonical
pattern used by PruningService/BackupScheduler: implement a private helper (e.g.
_drain_and_reset()) that is called inside the async with self._lifecycle_lock in
stop(), which drains or signals the existing self._stop_event as needed and then
assigns a fresh asyncio.Event() to self._stop_event before releasing the lock;
keep _lifecycle_lock as the same instance, ensure _run() continues to read
self._stop_event each iteration (no captured reference), and update stop() to
call this helper inside the lock rather than reassigning self._stop_event after
the lock.

In `@src/synthorg/communication/event_stream/stream.py`:
- Around line 165-177: The publish() logic currently calls
_record_published_locked(event, now) before checking subscribers, causing events
to be marked deduped even when there are no live subscribers; change the flow in
publish() so you first (inside the same async with self._lock) obtain
queues_snapshot = list(self._subscribers.get(event.session_id, ())) and if
queues_snapshot is empty return without calling _record_published_locked or
logging EVENT_STREAM_HUB_PUBLISH_DEDUPED; only when queues_snapshot is non-empty
call _record_published_locked(event, now) and then proceed to release the lock
and publish to the queues, ensuring _is_duplicate_locked and
_record_published_locked remain protected by the same lock to avoid races and
orphaned entries in _seen_event_ids and incorrect use of _dedup_ttl_seconds.

In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 81-89: Replace the generic RuntimeError raised when a start is
attempted on an already-active adapter with a domain-specific non-retryable
error: define a new TunnelAlreadyActiveError that inherits from the appropriate
domain base (e.g., IntegrationError or a domain-specific DomainError) and raise
that instead of RuntimeError in the block that checks self._tunnel (the branch
that logs TUNNEL_ERROR and sets msg "ngrok tunnel already active on this
adapter"); ensure the new class name follows the <Domain><Condition>Error
pattern and does not inherit from TunnelError (which is retryable).
- Around line 90-95: The code currently mutates pyngrok's global config via
conf.get_default().auth_token which can leak/stomp tokens across NgrokAdapter
instances; instead, read the token from self._auth_token_env, build a
conf.PyngrokConfig instance (using the token only when non-empty) and pass that
instance to ngrok.connect as the pyngrok_config argument (call site:
ngrok.connect in NgrokAdapter), removing any assignment to
conf.get_default().auth_token so tokens remain instance-local and blank tokens
do not reuse stale global credentials.

In `@src/synthorg/integrations/webhooks/replay_protection.py`:
- Around line 9-11: Docstring currently references hardcoded line numbers (e.g.,
"192" and "199") which will drift; edit the module docstring in
src/synthorg/integrations/webhooks/replay_protection.py to remove those numeric
citations and instead describe the race condition generically (mention the
duplicate check and the insertion of the nonce/record such that simultaneous
deliveries can both see the nonce as fresh and proceed). Keep the race-condition
explanation and any references to the nonce/duplicate check logic but do not
include specific line numbers.

In `@src/synthorg/meta/chief_of_staff/monitor.py`:
- Around line 152-158: The bug is replacing self._stop_event while still holding
self._lifecycle_lock which lets a racing start() create a monitor task bound to
the old event and then stop() swap the event so the task never sees the signal;
fix stop() so you never assign to self._stop_event while the lifecycle lock is
held: signal/await the current event and shut down tasks while holding
self._lifecycle_lock, release the lock, then create/assign a fresh
asyncio.Event() to self._stop_event (or create the new event beforehand and only
assign it after releasing the lock); update the stop() implementation and any
related shutdown sequence that references _stop_event, _lifecycle_lock, start(),
and the monitor task to follow this ordering.

In `@src/synthorg/providers/health_prober.py`:
- Around line 264-271: The _stop_event is being recreated after releasing
self._lifecycle_lock which allows a racing start() to spawn tasks waiting on the
old event; move the creation of self._stop_event = asyncio.Event() so it happens
while holding self._lifecycle_lock (i.e. before releasing the lock in stop()),
ensuring the same self._lifecycle_lock instance is kept and only the event
object is swapped while still under the lock so concurrent start()/stop() cannot
race with probe tasks waiting on the old event.

In `@tests/unit/api/controllers/test_backup_required_idempotency.py`:
- Around line 75-101: The test currently doesn't verify that the callback
actually calls BackupService.create_backup; update
test_handler_invokes_idempotency_service to inject a mocked backup_service with
create_backup set to an AsyncMock (or MagicMock) and inside
fake_run_idempotent’s callback await that mock via the controller path, then
after calling ctrl.create_backup.fn assert the mock was awaited exactly once
(e.g. mock_create_backup.assert_awaited_once() or assert
mock_create_backup.await_count == 1) to ensure the callback wiring invokes
BackupService.create_backup; reference BackupController.create_backup.fn,
fake_run_idempotent, and BackupService.create_backup when making the changes.

In `@tests/unit/api/controllers/test_simulations_idempotency.py`:
- Around line 105-109: The test currently uses contextlib.suppress(Exception)
which swallows all downstream failures; replace that with a targeted fix: either
stub/mock the runner-spawning collaborator so the handler completes without
raising (e.g., mock the method that performs spawn — spawn_runner /
SimulationRunner.start / RunnerManager.start depending on which symbol exists —
to return successfully), or narrow the suppression to the single expected
exception type (e.g., contextlib.suppress(RunnerSpawnError)) and import that
specific exception class; remove broad Exception suppression and ensure the test
only tolerates the known downstream error or fully fakes the collaborator so no
error is raised.

---

Outside diff comments:
In `@src/synthorg/communication/event_stream/stream.py`:
- Around line 40-44: Update the EventStreamHub public class docstring Args block
to document the newly exposed constructor parameters dedup_ttl_seconds
(float|int - TTL for dedup cache in seconds), dedup_max_entries_per_session (int
- max dedup entries kept per session), and clock (callable/Clock-like object
used for time) alongside the existing max_queue_size entry; ensure the docstring
follows Google-style formatting and includes type hints in the descriptions so
it satisfies mypy/ruff D rules, and apply the same addition to the secondary
docstring region referenced around lines 56-63.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 823be26c-de35-417a-ac70-d22ce689ad97

📥 Commits

Reviewing files that changed from the base of the PR and between aef7ead and b7012e7.

📒 Files selected for processing (22)
  • scripts/mock_spec_baseline.txt
  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/client/continuous.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/providers/health_prober.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/api/controllers/test_backup.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • web/src/api/endpoints/backup.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Lighthouse Site
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Dashboard Type Check
  • GitHub Check: Dashboard Test
  • GitHub Check: Type Check
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (10)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always read the relevant docs/design/ page before implementing any feature or planning any issue. The design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec.
Present every implementation plan to the user for accept/deny before coding starts. At every phase of planning and implementation, actively look for ways to improve the design; surface improvements as suggestions, not silent changes.
Comments explain WHY only, never origin/review/issue context. Forbidden: reviewer citations, in-code issue/PR back-references, cryptic internal-taxonomy shorthand, migration/rebrand framing, round/iteration narrative, self-evident restatements. What stays: hidden constraints, subtle invariants, workarounds for specific upstream bugs (with stable bug-tracker URL), and why a non-obvious choice was made.
No from __future__ import annotations; Python 3.14 has PEP 649.
Use PEP 758 except syntax: except A, B: (no parens) when not binding to a name; as exc requires parens (except (A, B) as exc:). Enforced by ruff on 3.14.
All public functions and public classes require type hints; mypy strict mode is enforced. Google-style docstrings are required on public classes and functions (ruff D rules).
Create new objects, never mutate existing ones. Use frozen Pydantic models for config/identity; for non-Pydantic registries use copy.deepcopy() at construction and wrap with MappingProxyType; deepcopy at system boundaries (tool execution, provider serialization, persistence).
Distinguish config vs runtime state: frozen models for config/identity; separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.
Every Pydantic model defaults to ConfigDict(frozen=True, allow_inf_nan=False) unless documented otherwise; mutations go thro...

Files:

  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • src/synthorg/observability/events/api.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • src/synthorg/client/continuous.py
  • src/synthorg/providers/health_prober.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/api/controllers/test_backup.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • src/synthorg/hr/pruning/service.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Every Mock() / AsyncMock() / MagicMock() MUST declare the interface via spec=ConcreteClass (Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen in scripts/mock_spec_baseline.txt. Regenerate baseline only via uv run python scripts/check_mock_spec.py --update and commit the diff so the change is reviewable.
Shared mocks: tests/conftest.py exposes mock_dispatcher (an AsyncMock(spec=NotificationDispatcher)); use it instead of building the spec inline.
Time-driven tests: import FakeClock from tests._shared.fake_clock (NOT from any rollout-subsystem path) and inject it into the class under test. Use FakeClock.sleep which advances virtual time AND yields via asyncio.sleep(0). For tests driving cooperative tasks, use await clock.advance_async(seconds). FakeClock-first: when the class under test accepts a clock= parameter, always inject FakeClock rather than monkey-patching globals.
Tests must use test-provider, test-small-001, etc. for vendor-agnostic naming. NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings, comments, or tests.
When Hypothesis finds a failure, read the shrunk example, fix the underlying bug, and add an explicit @example(...) decorator so the case is permanently covered. Flaky tests: NEVER skip, dismiss, or ignore them; always fix them fully and fundamentally. For timing-sensitive tests, use FakeClock-first; patch time.monotonic()/asyncio.sleep() only for legacy code paths without a Clock seam. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).
Coverage: 80% minimum (enforced in CI; benchmarks excluded via --ignore=tests/benchmarks/). Test timeout: 30 seconds per test (global in pyproject.toml). Parallelism: pytest-xdist via -n 8. ALWAYS include -n 8 when running pytest locally, never run tests ...

Files:

  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/api/controllers/test_backup.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/api/controllers/test_backup.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Prefer @pytest.mark.parametrize for testing similar cases.

Files:

  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/api/controllers/test_backup.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
web/src/**/!(logger).ts?(x)

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/!(logger).ts?(x): Always use createLogger from @/lib/logger instead of bare console.warn, console.error, or console.debug in application code; use variable name log
Pass dynamic/untrusted values as separate args to log methods, not interpolated into the message string, so they go through sanitizeArg
Wrap attacker-controlled fields inside structured objects with sanitizeForLog() before embedding in logs

Files:

  • web/src/api/endpoints/backup.ts
web/src/**/*.ts?(x)

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.ts?(x): Callers MUST NOT wrap store mutation calls in try / catch; the store owns the error UX
NEVER hardcode Motion transition durations; use presets from @/lib/motion
NEVER hardcode BCP 47 locale literals ('en-US'); use helpers from @/utils/format and design tokens
NEVER hardcode currency symbols or codes; use DEFAULT_CURRENCY from @/utils/currencies

Files:

  • web/src/api/endpoints/backup.ts
web/src/api/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Use getLiveness() for 200 always-alive checks and getReadiness() for binary 'ok' | 'unavailable' health checks (no tri-state); any new caller must handle the 503 path explicitly

Files:

  • web/src/api/endpoints/backup.ts
web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

web/src/**/*.{ts,tsx}: Reuse components from web/src/components/ui/ before creating new ones. Never hardcode hex colors, font-family, pixel spacing, Motion transitions, or BCP 47 locale strings; use design tokens, @/lib/motion presets, and the helpers in @/utils/format. Enforced by scripts/check_web_design_system.py (PostToolUse hook on every web/src/ edit). See web/CLAUDE.md for the component inventory, token rules, and post-training references (TS6, Storybook 10).
Never hardcode BCP 47 locale strings or call bare .toLocaleString() / .toLocaleDateString() / .toLocaleTimeString(). Use helpers in @/utils/format which read getLocale() from @/utils/locale.
Currency: never hardcode ISO 4217 codes or symbols. Frontend: DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency. Field naming: no _usd suffix on money fields anywhere.
Timezone: store UTC; render via Intl without passing timeZone (browser tz wins). Date / number format: always via Intl; no hand-rolled templates. Units: metric only.

Files:

  • web/src/api/endpoints/backup.ts
web/**/*.{ts,tsx,js,json}

📄 CodeRabbit inference engine (CLAUDE.md)

Web Dashboard: see web/CLAUDE.md for commands, design system, and component inventory. The CI-matching full-suite leak check is npm --prefix web run test -- --coverage --detect-async-leaks (the Leaks N leaks summary must stay at or below the MAX_ASYNC_LEAKS ceiling in .github/workflows/ci.yml; any new store that schedules timers must expose a teardown hook per web/CLAUDE.md).

Files:

  • web/src/api/endpoints/backup.ts
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every business-logic module must have from synthorg.observability import get_logger then logger = get_logger(__name__) at module level. Variable name is always logger. Carve-outs (e.g. fixed-key logging) are documented in the module docstring.
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py for handler-construction and bootstrap code.
Event names must always import constants from synthorg.observability.events.<domain>; never use string literals. See conventions.md §13 for the domain inventory and the split between TELEMETRY_* (log events) and TELEMETRY_EVENT_* (payload types).
Always log structured kwargs: logger.info(EVENT, key=value). Never use logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising. Never use logger.exception(EVENT, error=str(exc)), logger.warning(EVENT, error=str(exc)), or logger.error(EVENT, error=str(exc)); use structured logging with error_type=type(exc).__name__ and error=safe_error_description(exc) (SEC-1). Keep severity appropriate to context.
Every status-enum transition (including non-terminal hops like PENDING -> RUNNING) must log at INFO using a domain-scoped *_STATUS_TRANSITIONED constant carrying from_status/to_status/domain id, AFTER the persistence write succeeds.
DEBUG log level is for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.
Every settings read emits one INFO settings.value.resolved event on first cold read per process, carrying source and yaml_path when applicable.
All provider calls must go through BaseCompletionProvider, which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code. Configure RetryConfig and RateLimiterConfig per-provider in ProviderConfig....

Files:

  • src/synthorg/observability/events/api.py
  • src/synthorg/client/continuous.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/hr/pruning/service.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/api.py
  • src/synthorg/client/continuous.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/hr/pruning/service.py
🔇 Additional comments (31)
src/synthorg/observability/events/api.py (1)

80-80: Event constant addition is consistent and correctly scoped.

API_WS_TICKET_LIMIT_EXCEEDED follows the existing API event naming/value convention and matches the emitted event in src/synthorg/api/auth/ticket_store.py (Line 141-181).

tests/unit/tools/mcp/test_cache_threadsafety.py (3)

22-48: Good fix: this now truly exercises the locked cache-hit path under contention.

Seeding and repeatedly reading the shared key makes this test validate the exact get path changed in the cache implementation.


49-68: Concurrency coverage for invalidate() vs put() looks solid.

Collecting all futures and calling result() ensures threadpool exceptions are surfaced instead of being silently ignored.


69-80: Eviction-bound assertion under concurrent writes is well targeted.

This directly checks the LRU capacity invariant remains enforced under load.

src/synthorg/api/auth/ticket_store.py (3)

150-174: Atomic create path now correctly closes the cap-check TOCTOU window.

Great fix: Line 150 through Line 174 keeps count, limit check, and insert in one
critical section, and logs the rejection path before raising.


195-197: Single-consumer guarantee is properly enforced.

Locking around dict.pop() on Line 195 through Line 197 gives clean
single-winner semantics for concurrent validate_and_consume() calls.


229-239: Cleanup pass is safely synchronized and logs consistent counts.

Holding the lock for snapshot/delete and computing remaining inside the same
critical section avoids iteration/mutation races and inconsistent telemetry.

tests/unit/api/auth/test_ticket_store_threadsafety.py (2)

40-57: Concurrency harness is strong and deterministic.

Using a shared start gate before releasing worker threads is a solid improvement
for reproducible contention across all four threadpool scenarios.

Also applies to: 66-79, 89-99, 107-121


104-124: Mixed create/cleanup stress test is valuable coverage for dict safety.

This case directly exercises the historical corruption surface under concurrent
mutation and cleanup and is well-targeted for regression protection.

src/synthorg/client/continuous.py (5)

1-21: LGTM — docstring now accurately reflects locking behavior.

The updated module documentation correctly describes that _lifecycle_lock is held briefly for _running flag transitions (not the full loop body), and that stop() is synchronous without lock acquisition. This resolves the prior review concern about docstring/implementation mismatch.


66-71: LGTM — lifecycle lock properly named and documented.

The comment explains the naming rationale (uniformity with other services per lifecycle reference doc), and the lock is correctly initialized as an asyncio.Lock().


98-113: LGTM — brief lock acquisition correctly gates _running transition.

The lock is held only to atomically check and set _running, preventing concurrent start() calls from racing. The explanatory comment clearly articulates why holding the lock for the full loop would cause deadlock.


133-136: LGTM — finally block correctly clears _running under lock.

Re-acquiring _lifecycle_lock in the finally ensures _running is cleared atomically even if an exception or cancellation occurs mid-loop, preventing stale state from blocking future start() calls.


138-147: LGTM — synchronous stop() correctly signals without lock contention.

The docstring accurately explains the design: stop() only sets the event, leaving the running start() coroutine to observe it on the next loop iteration. Not acquiring the lock here is correct since the lock guards only _running flag transitions.

src/synthorg/integrations/webhooks/replay_protection.py (6)

14-46: LGTM!

Imports are appropriate, constants are well-documented with security rationale, and _fingerprint_nonce correctly bounds cache key size using SHA-256.


70-92: LGTM!

Constructor correctly validates configuration upfront (preventing silent misconfiguration), initializes the threading.Lock for thread safety, and follows the Clock injection pattern for testability.


94-129: LGTM!

Method correctly handles the None case for callers delegating dedup to durable stores, rejects non-finite timestamps, and reads the clock once to ensure the comparison and logged skew agree.


131-166: LGTM!

The fail-closed behavior when both nonce and timestamp are missing is correct for security. The clock is intentionally read twice—once for timestamp validation, once for nonce tracking—since these serve different purposes and the small delta is negligible relative to the 300s default window.


168-215: LGTM!

Thread-safety pattern is correct: all mutations to _seen occur under the lock, eliminating the TOCTOU race. Logging outside the lock minimizes hold time while the captured duplicate boolean remains valid. Oversized nonce rejection before hashing is good DoS mitigation.


217-229: LGTM!

Method correctly documents the caller lock requirement, leverages OrderedDict insertion order for efficient oldest-first eviction, and stops early since insertions are monotonically ordered by time.

src/synthorg/communication/conflict_resolution/escalation/sweeper.py (1)

209-214: Lock preservation invariant is correctly documented.

The comment clearly articulates the critical invariant: _lifecycle_lock must remain the same instance for the service's lifetime. Replacing it would allow concurrent callers—one queued on the old lock, one acquiring the new—to proceed simultaneously, breaking start/stop serialization. This matches the canonical pattern in PruningService (context snippet 1, lines 211–214) and BackupScheduler (context snippet 2, lines 130–133).

src/synthorg/hr/pruning/service.py (2)

97-104: Lifecycle lock correctly preserved across service lifetime.

The past review concern has been addressed: _lifecycle_lock remains the same instance for the service's lifetime (lines 211-214 explicitly document this constraint), while only the loop-bound _stop_event and _wake_event are recreated after releasing the lock to support rebinding on a different event loop.

Also applies to: 209-216


747-783: Clean cooperative shutdown pattern.

The loop correctly honors _stop_event for graceful shutdown, re-checks the stop flag after waking (line 765-766), and properly propagates CancelledError to support task cancellation.

src/synthorg/backup/scheduler.py (2)

128-135: Lifecycle lock correctly preserved.

The past review concern has been addressed: _lifecycle_lock remains the same instance (documented in lines 130-133), while only the loop-bound events are recreated for event-loop rebinding.


47-78: All BackupScheduler.start() callers correctly use await.

Verification confirms all five call sites of BackupScheduler.start() in production and test code use await: backup/service.py:97, backup_subscriber.py:113, and three test contexts in test_scheduler_lifecycle.py and test_scheduler.py. The async transition is complete.

tests/unit/settings/test_backup_subscriber.py (2)

33-42: LGTM!

All mocks are now spec-bound (spec=BackupScheduler, spec=BackupService, spec=SettingsService), and async methods correctly use AsyncMock(spec=...). This addresses the past review concern and enforces interface correctness.


101-106: Correct use of assert_awaited_once().

Using assert_awaited_once() instead of assert_called_once() correctly catches the regression where a coroutine is created but never awaited, as documented in the comment.

tests/unit/backup/test_scheduler_lifecycle.py (2)

21-24: Spec-bound mocks correctly address past review concern.

The helper now uses MagicMock(spec=BackupService) and AsyncMock(spec=BackupService.create_backup), enforcing interface correctness per coding guidelines.


30-71: Comprehensive lifecycle-lock coverage.

The tests correctly verify:

  1. Concurrent start() calls spawn only one task (idempotency under contention)
  2. Clean restart works after a successful stop()
  3. A drain timeout marks the scheduler unrestartable, and subsequent start() raises RuntimeError

The hung-loop simulation (lines 55-60) properly ignores CancelledError to force the timeout path.

tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py (2)

34-38: Spec-bound mock for the protocol, return value acceptable.

The backend mock correctly uses spec=PersistenceBackend. The build_escalations return value is a bare MagicMock(), but this is acceptable since tests only assert on call arguments, not the return value's interface.


107-128: Good defensive testing via model_construct bypass.

Using model_construct() to bypass Pydantic's literal validation is the right approach to verify the factory's defensive ValueError path. This ensures operators see a helpful error message if an invalid backend somehow reaches the factory.

Comment thread scripts/mock_spec_baseline.txt Outdated
Comment thread src/synthorg/communication/conflict_resolution/escalation/sweeper.py Outdated
Comment thread src/synthorg/communication/event_stream/stream.py Outdated
Comment thread src/synthorg/integrations/tunnel/ngrok_adapter.py Outdated
Comment thread src/synthorg/integrations/tunnel/ngrok_adapter.py Outdated
Comment thread src/synthorg/integrations/webhooks/replay_protection.py Outdated
Comment thread src/synthorg/meta/chief_of_staff/monitor.py Outdated
Comment thread src/synthorg/providers/health_prober.py Outdated
Comment thread tests/unit/api/controllers/test_backup_required_idempotency.py
Comment thread tests/unit/api/controllers/test_simulations_idempotency.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/integrations/tunnel/ngrok_adapter.py (1)

75-82: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the Raises docstring to match actual exception type.

Line 80 still lists RuntimeError, but Line 92 raises TunnelAlreadyActiveError. Please update the docstring to keep API expectations accurate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/integrations/tunnel/ngrok_adapter.py` around lines 75 - 82, The
docstring's Raises section is inaccurate: replace the listed RuntimeError with
TunnelAlreadyActiveError to match the actual exception raised; update the Raises
block in the docstring for the method that starts the ngrok tunnel (the
docstring surrounding the start/start_tunnel method in ngrok_adapter.py) so it
lists TunnelAlreadyActiveError (and keeps TunnelError for other failures) and
ensure the exception name matches the custom exception class
TunnelAlreadyActiveError used in the code.
♻️ Duplicate comments (1)
scripts/mock_spec_baseline.txt (1)

304-305: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove new baseline suppressions for touched tests.

Line 304, Line 305, and Line 395 should not be grandfathered here; these suppressions weaken enforcement for tests modified in this PR. Fix the bare-mock sites in those tests and drop these new baseline entries.

♻️ Suggested patch
-tests/unit/api/controllers/test_backup.py:112:12
-tests/unit/api/controllers/test_backup_required_idempotency.py:72:12
 ...
-tests/unit/api/controllers/test_simulations_idempotency.py:69:12

As per coding guidelines, "Every Mock() / AsyncMock() / MagicMock() MUST declare the interface via spec=ConcreteClass ... pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen in scripts/mock_spec_baseline.txt."

Also applies to: 395-395

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/mock_spec_baseline.txt` around lines 304 - 305, The new baseline
entries were added to whitelist bare Mock()/AsyncMock()/MagicMock() calls for
tests modified in this PR; instead, update the tests
tests/unit/api/controllers/test_backup.py and
tests/unit/api/controllers/test_backup_required_idempotency.py to replace bare
Mock()/AsyncMock()/MagicMock() usages with mocks that declare the concrete
interface via spec=ConcreteClass (or spec_set) for the object being mocked
(e.g., Mock(spec=MyClass) or AsyncMock(spec=MyAsyncClass)), run the pre-commit
checker (scripts/check_mock_spec.py) to ensure no bare-call sites remain, and
then remove the corresponding lines (304, 305 and the other added line around
395) from scripts/mock_spec_baseline.txt so no new suppressions are introduced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/communication/event_stream/stream.py`:
- Around line 222-227: The TTL eviction loop treats dedup_ttl_seconds == 0 as
immediate expiry; change the logic in the eviction block so that TTL eviction
only runs when self._dedup_ttl_seconds > 0: wrap the computation of cutoff = now
- self._dedup_ttl_seconds and the while seen: ... eviction loop in a conditional
(e.g., if self._dedup_ttl_seconds > 0:) so entries in seen (the dedup map used
for event.id) are not removed when TTL is disabled; keep the existing eviction
behavior unchanged for positive TTLs.

In `@src/synthorg/integrations/errors.py`:
- Around line 195-206: TunnelAlreadyActiveError currently inherits
IntegrationError defaults (502/INTEGRATION_ERROR) but represents a conflict;
update the class to override the integration error metadata to reflect an HTTP
409 conflict and a conflict-style error code. In the TunnelAlreadyActiveError
class (subclassing IntegrationError) add/override the appropriate metadata
attributes (e.g. http status/status_code = 409 and an error code like code =
"ALREADY_ACTIVE" or code = "CONFLICT") so the API surfaces a 409-style domain
error instead of the 502 integration default.

In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 84-92: The new start() raises TunnelAlreadyActiveError when
self._tunnel is set, breaking the tunnel facade's expectation of idempotent
start/connect; modify ngrok_adapter.start() (the block checking self._tunnel) to
preserve idempotency: keep the warning log (TUNNEL_ERROR, phase="start",
reason="already_active", port=self._port) but do not raise
TunnelAlreadyActiveError—simply return the existing tunnel (or None if that
matches the adapter API) so repeated calls behave like a no-op and align with
mcp_service.connect()'s expected reconnect semantics.

In `@tests/unit/api/controllers/test_backup.py`:
- Around line 107-114: The comment explains why
SimpleNamespace(app_state=app_state) is used as the test state carrier; rewrite
it to focus on the technical rationale only: state.app_state must return the
assigned AppState object reliably, and SimpleNamespace has no auto-mocking
behavior so it satisfies that; MagicMock(spec=State) could trigger
State.__getattr__ and return an unintended mock, and a plain MagicMock() would
also be inappropriate—update the comment near the return
SimpleNamespace(app_state=app_state), service to this concise rationale without
any mention of gates, review processes, or internal taxonomy.

In `@tests/unit/api/controllers/test_simulations_idempotency.py`:
- Around line 64-69: The comment block around the SimpleNamespace sentinel
contains review-history/internal-gate phrasing; replace it with a concise
behavior-focused explanation: state that SimpleNamespace is used as the sentinel
because it returns the assigned object for state.app_state without Litestar's
State.__getattr__ interception (unlike MagicMock(spec=State)) and avoids
bare-mock issues present with MagicMock(); remove any wording about internal
taxonomy, review history, or gates. Ensure the same change is applied to the
similar comment at lines referencing the same rationale (the block around
SimpleNamespace and the mention of MagicMock/spec).

In
`@tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py`:
- Around line 62-83: Add a unit test that exercises the "auto" branch of
cross_instance_notify: create an EscalationQueueConfig with backend="postgres",
cross_instance_notify="auto", and notify_channel="escalations", use
_fake_persistence("postgres"), call build_escalation_queue_store(config,
backend), and assert
backend.build_escalations.assert_called_once_with(notify_channel="escalations")
to ensure the "auto" path behaves the same as "on".
- Around line 124-135: The tests test_unknown_queue_backend_raises_value_error
and test_unknown_decision_strategy_raises_value_error currently only assert that
a ValueError is raised; update them to also assert that the exception message
includes the available registry names so callers know which backends/strategies
are registered. Specifically, when calling build_escalation_queue_store with
EscalationQueueConfig.model_construct(backend="unknown"), capture the raised
ValueError and assert its string contains the registry keys (e.g., from the
escalation queue registry used by build_escalation_queue_store); do the same for
build_decision_processor with decision_strategy="unknown" and assert the error
message includes the decision processor registry keys. Ensure you reference the
same registry identifiers used by those factory functions when building the
expected substring.

---

Outside diff comments:
In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 75-82: The docstring's Raises section is inaccurate: replace the
listed RuntimeError with TunnelAlreadyActiveError to match the actual exception
raised; update the Raises block in the docstring for the method that starts the
ngrok tunnel (the docstring surrounding the start/start_tunnel method in
ngrok_adapter.py) so it lists TunnelAlreadyActiveError (and keeps TunnelError
for other failures) and ensure the exception name matches the custom exception
class TunnelAlreadyActiveError used in the code.

---

Duplicate comments:
In `@scripts/mock_spec_baseline.txt`:
- Around line 304-305: The new baseline entries were added to whitelist bare
Mock()/AsyncMock()/MagicMock() calls for tests modified in this PR; instead,
update the tests tests/unit/api/controllers/test_backup.py and
tests/unit/api/controllers/test_backup_required_idempotency.py to replace bare
Mock()/AsyncMock()/MagicMock() usages with mocks that declare the concrete
interface via spec=ConcreteClass (or spec_set) for the object being mocked
(e.g., Mock(spec=MyClass) or AsyncMock(spec=MyAsyncClass)), run the pre-commit
checker (scripts/check_mock_spec.py) to ensure no bare-call sites remain, and
then remove the corresponding lines (304, 305 and the other added line around
395) from scripts/mock_spec_baseline.txt so no new suppressions are introduced.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6caec15d-c417-426e-b6eb-583bd3f935c5

📥 Commits

Reviewing files that changed from the base of the PR and between b7012e7 and 877bbd8.

📒 Files selected for processing (16)
  • scripts/mock_spec_baseline.txt
  • src/synthorg/backup/scheduler.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/errors.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/providers/health_prober.py
  • tests/unit/api/controllers/test_backup.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • tests/unit/settings/test_backup_subscriber.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Deploy Preview
  • GitHub Check: Build Backend
  • GitHub Check: Lighthouse Site
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Dashboard Test
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always read the relevant docs/design/ page before implementing any feature or planning any issue. The design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec.
Present every implementation plan to the user for accept/deny before coding starts. At every phase of planning and implementation, actively look for ways to improve the design; surface improvements as suggestions, not silent changes.
Comments explain WHY only, never origin/review/issue context. Forbidden: reviewer citations, in-code issue/PR back-references, cryptic internal-taxonomy shorthand, migration/rebrand framing, round/iteration narrative, self-evident restatements. What stays: hidden constraints, subtle invariants, workarounds for specific upstream bugs (with stable bug-tracker URL), and why a non-obvious choice was made.
No from __future__ import annotations; Python 3.14 has PEP 649.
Use PEP 758 except syntax: except A, B: (no parens) when not binding to a name; as exc requires parens (except (A, B) as exc:). Enforced by ruff on 3.14.
All public functions and public classes require type hints; mypy strict mode is enforced. Google-style docstrings are required on public classes and functions (ruff D rules).
Create new objects, never mutate existing ones. Use frozen Pydantic models for config/identity; for non-Pydantic registries use copy.deepcopy() at construction and wrap with MappingProxyType; deepcopy at system boundaries (tool execution, provider serialization, persistence).
Distinguish config vs runtime state: frozen models for config/identity; separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.
Every Pydantic model defaults to ConfigDict(frozen=True, allow_inf_nan=False) unless documented otherwise; mutations go thro...

Files:

  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/settings/test_backup_subscriber.py
  • src/synthorg/integrations/errors.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/providers/health_prober.py
  • tests/unit/api/controllers/test_backup.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every business-logic module must have from synthorg.observability import get_logger then logger = get_logger(__name__) at module level. Variable name is always logger. Carve-outs (e.g. fixed-key logging) are documented in the module docstring.
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py for handler-construction and bootstrap code.
Event names must always import constants from synthorg.observability.events.<domain>; never use string literals. See conventions.md §13 for the domain inventory and the split between TELEMETRY_* (log events) and TELEMETRY_EVENT_* (payload types).
Always log structured kwargs: logger.info(EVENT, key=value). Never use logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising. Never use logger.exception(EVENT, error=str(exc)), logger.warning(EVENT, error=str(exc)), or logger.error(EVENT, error=str(exc)); use structured logging with error_type=type(exc).__name__ and error=safe_error_description(exc) (SEC-1). Keep severity appropriate to context.
Every status-enum transition (including non-terminal hops like PENDING -> RUNNING) must log at INFO using a domain-scoped *_STATUS_TRANSITIONED constant carrying from_status/to_status/domain id, AFTER the persistence write succeeds.
DEBUG log level is for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.
Every settings read emits one INFO settings.value.resolved event on first cold read per process, carrying source and yaml_path when applicable.
All provider calls must go through BaseCompletionProvider, which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code. Configure RetryConfig and RateLimiterConfig per-provider in ProviderConfig....

Files:

  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/integrations/errors.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/providers/health_prober.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/integrations/errors.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/providers/health_prober.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Every Mock() / AsyncMock() / MagicMock() MUST declare the interface via spec=ConcreteClass (Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen in scripts/mock_spec_baseline.txt. Regenerate baseline only via uv run python scripts/check_mock_spec.py --update and commit the diff so the change is reviewable.
Shared mocks: tests/conftest.py exposes mock_dispatcher (an AsyncMock(spec=NotificationDispatcher)); use it instead of building the spec inline.
Time-driven tests: import FakeClock from tests._shared.fake_clock (NOT from any rollout-subsystem path) and inject it into the class under test. Use FakeClock.sleep which advances virtual time AND yields via asyncio.sleep(0). For tests driving cooperative tasks, use await clock.advance_async(seconds). FakeClock-first: when the class under test accepts a clock= parameter, always inject FakeClock rather than monkey-patching globals.
Tests must use test-provider, test-small-001, etc. for vendor-agnostic naming. NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings, comments, or tests.
When Hypothesis finds a failure, read the shrunk example, fix the underlying bug, and add an explicit @example(...) decorator so the case is permanently covered. Flaky tests: NEVER skip, dismiss, or ignore them; always fix them fully and fundamentally. For timing-sensitive tests, use FakeClock-first; patch time.monotonic()/asyncio.sleep() only for legacy code paths without a Clock seam. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).
Coverage: 80% minimum (enforced in CI; benchmarks excluded via --ignore=tests/benchmarks/). Test timeout: 30 seconds per test (global in pyproject.toml). Parallelism: pytest-xdist via -n 8. ALWAYS include -n 8 when running pytest locally, never run tests ...

Files:

  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/api/controllers/test_backup.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/api/controllers/test_backup.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Prefer @pytest.mark.parametrize for testing similar cases.

Files:

  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/api/controllers/test_backup.py
🔇 Additional comments (8)
src/synthorg/integrations/webhooks/replay_protection.py (1)

92-93: Thread-safe replay check refactor looks solid.

Locking the nonce mutation path and separating freshness from nonce dedup removes the race window while keeping fail-closed behavior explicit when both signals are missing.

Also applies to: 157-167, 168-215

tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py (2)

37-60: Helper and primary queue-store dispatch tests look solid.

The mocked backend is strict enough to catch call-shape drift, and the memory/sqlite coverage matches the factory branches.


103-111: Decision-processor dispatch coverage looks good.

These assertions line up with the registry-backed factory behavior and should catch accidental dispatcher regressions.

src/synthorg/communication/conflict_resolution/escalation/sweeper.py (1)

202-214: Canonical lifecycle race fix looks correct.

Recreating self._stop_event inside the lifecycle lock matches the intended pattern and prevents start/stop event-binding races.

src/synthorg/providers/health_prober.py (1)

185-215: Lifecycle refactor is consistent and race-safe.

The lock-guarded async start()/stop(), timeout-to-unrestartable behavior, and in-lock _stop_event swap are correctly implemented.

Also applies to: 218-278

src/synthorg/meta/chief_of_staff/monitor.py (1)

73-105: Monitor lifecycle changes look solid.

The service now follows the same safe async lifecycle contract (serialized start/stop, hard drain deadline, cooperative loop wake/exit).

Also applies to: 107-160, 163-187

src/synthorg/hr/pruning/service.py (1)

135-163: PruningService lifecycle conversion is well implemented.

Async start()/stop() serialization, timeout drain behavior, and cooperative loop shutdown are all aligned with the intended contract.

Also applies to: 165-218, 750-785

tests/unit/api/controllers/test_backup_required_idempotency.py (1)

49-76: Good test wiring and mock-spec discipline.

Using spec-bound mocks plus the explicit create_backup.assert_awaited_once_with(...) check gives strong coverage for idempotency callback delegation.

Also applies to: 88-123

Comment thread src/synthorg/communication/event_stream/stream.py Outdated
Comment thread src/synthorg/integrations/errors.py Outdated
Comment on lines +195 to +206
class TunnelAlreadyActiveError(IntegrationError):
"""A tunnel is already active on the adapter; refuse re-start.

Lifecycle conflict, not a transient I/O error: the operator must
``stop()`` the running tunnel before issuing a fresh ``start()``.
Marked non-retryable so the resilience layer does not loop on a
permanent state error and so the API surface returns a 409-style
domain error instead of a generic 500.
"""

is_retryable = False
retryable: ClassVar[bool] = False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Set conflict HTTP/error metadata on TunnelAlreadyActiveError.

On Line 195, this class is documented as a conflict-state error, but it inherits IntegrationError defaults (502 / INTEGRATION_ERROR). That will surface the wrong API semantics for “already active”.

Proposed fix
 class TunnelAlreadyActiveError(IntegrationError):
@@
     is_retryable = False
     retryable: ClassVar[bool] = False
+    status_code: ClassVar[int] = 409
+    error_code: ClassVar[ErrorCode] = ErrorCode.RESOURCE_CONFLICT
+    error_category: ClassVar[ErrorCategory] = ErrorCategory.CONFLICT
+    default_message: ClassVar[str] = "Tunnel already active"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/integrations/errors.py` around lines 195 - 206,
TunnelAlreadyActiveError currently inherits IntegrationError defaults
(502/INTEGRATION_ERROR) but represents a conflict; update the class to override
the integration error metadata to reflect an HTTP 409 conflict and a
conflict-style error code. In the TunnelAlreadyActiveError class (subclassing
IntegrationError) add/override the appropriate metadata attributes (e.g. http
status/status_code = 409 and an error code like code = "ALREADY_ACTIVE" or code
= "CONFLICT") so the API surfaces a 409-style domain error instead of the 502
integration default.

Comment thread src/synthorg/integrations/tunnel/ngrok_adapter.py Outdated
Comment thread tests/unit/api/controllers/test_backup.py Outdated
Comment thread tests/unit/api/controllers/test_simulations_idempotency.py Outdated
Aureliolo added a commit that referenced this pull request May 2, 2026
…iff)

Findings (per the rolling CodeRabbit review on PR #1717):

Concurrency:
- 5 services: stop no longer reassigns _lifecycle_lock; only the
  loop-bound stop_event / wake_event is recreated. Replacing the lock
  let a caller queued on the old lock and a fresh caller acquiring
  the new lock both proceed concurrently. Affects scheduler.py,
  monitor.py, health_prober.py, hr/pruning/service.py, and the
  canonical sweeper.py (same pattern, same bug).
- stream.py: validate dedup_ttl_seconds and
  dedup_max_entries_per_session at construction so the trim loop
  cannot popitem an empty OrderedDict at publish time.

Logging:
- ticket_store.py: log API_WS_TICKET_LIMIT_EXCEEDED before raising
  TicketLimitExceededError so cap rejections appear in audit logs
  (new event constant added to events/api.py).
- ngrok_adapter.py: log TUNNEL_ERROR before raising on duplicate
  start.

Refactors:
- replay_protection.py: split check() into check() + _check_nonce()
  reusing check_freshness() so the public method stays under 50
  lines.

Docstring:
- continuous.py: module docstring now matches implementation
  (lifecycle lock guards the running-flag transition, not the loop
  body; stop() does not acquire the lock).

Tests:
- 6 test files: add spec=ConcreteClass to bare Mock/AsyncMock
  call sites (test_scheduler_lifecycle, test_monitor_lifecycle,
  test_backup_subscriber, test_backup_required_idempotency,
  test_backup, test_simulations_idempotency). assert_awaited_once
  replaces assert_called_once where applicable.
- test_ticket_store_threadsafety.py: threading.Event start gate so
  worker bodies hit store.create together rather than as the pool
  fills (tighter contention on the lock).
- test_cache_threadsafety.py: seed a shared key so the reader path
  exercises the locked hit branch (move_to_end + deepcopy).
- test_factory_registry.py: cover the unknown-key fallback (both
  registries) via model_construct bypassing Pydantic literal
  validation.
- test_stream_dedup.py: 2 new tests for unsubscribe dedup-window
  cleanup (already in the prior PR commit).

Frontend:
- backup.ts: trim whitespace-only Idempotency-Key inputs and fall
  through to a fresh UUID; ?? alone would forward an empty string
  and the server rejects it.

26138 unit tests + web type-check + web lint clean; mypy + ruff
clean.
@Aureliolo Aureliolo force-pushed the fix/audit-cleanup-c-persistence branch from 877bbd8 to 5b1dada Compare May 2, 2026 13:24
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 2, 2026 13:25 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/synthorg/budget/trends.py (1)

241-261: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate only the records that actually contribute to the bucket.

Calling _assert_single_currency(records) before the [start, end) filter means bucket_cost_records() now fails if the input contains mixed currencies outside the requested window, even though those rows are ignored by the aggregation. That is stricter than the function’s semantics and can break valid partial-range queries.

♻️ Suggested fix
 def bucket_cost_records(
     records: Sequence[CostRecord],
     start: datetime,
     end: datetime,
     bucket_size: BucketSize,
 ) -> tuple[TrendDataPoint, ...]:
-    _assert_single_currency(records)
     bucket_starts = generate_bucket_starts(start, end, bucket_size)
+    in_window_records = tuple(
+        record for record in records
+        if start <= record.timestamp < end
+    )
+    _assert_single_currency(in_window_records)
     sums: dict[datetime, list[float]] = defaultdict(list)
 
-    for record in records:
-        ts = record.timestamp
-        if ts < start or ts >= end:
-            continue
-        key = _bucket_key(ts, bucket_size)
+    for record in in_window_records:
+        key = _bucket_key(record.timestamp, bucket_size)
         sums[key].append(record.cost)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/budget/trends.py` around lines 241 - 261, The currency assertion
is applied to all input records before filtering by the requested time window,
causing bucket_cost_records to reject inputs that include out-of-range
mixed-currency rows; change the logic so you first filter records to the [start,
end) window (using the same condition as the loop that computes key =
_bucket_key(ts, bucket_size)) and only then call _assert_single_currency on the
filtered list (or alternatively validate currency per-record when adding to
sums), so generate_bucket_starts, the sums aggregation and TrendDataPoint
creation remain unchanged.
tests/unit/hr/pruning/test_service_lifecycle.py (1)

69-90: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clean up the intentionally hung task after the timeout assertion.

This path leaves service._task alive for about a second after Line 85, which makes the test wall-clock sensitive and can leak a pending task into later cases. Use an asyncio.Event gate in hung_loop, wait for the timeout, then release and await the task in the test.

Patch sketch
     async def test_unrestartable_after_drain_timeout(self) -> None:
         service = _make_service()
         service._stop_drain_timeout_seconds = 0.05
+        entered = asyncio.Event()
+        release = asyncio.Event()

         async def hung_loop(self: PruningService) -> None:
             del self
+            entered.set()
             try:
                 await asyncio.Event().wait()
             except asyncio.CancelledError:
-                await asyncio.sleep(1.0)
+                await release.wait()

         with patch.object(PruningService, "_run_loop", hung_loop):
             await service.start()
-            await asyncio.sleep(0)
+            await entered.wait()

             with pytest.raises(TimeoutError):
                 await service.stop()
             assert service._stop_failed is True
+            task = service._task
+            assert task is not None
+            release.set()
+            await task

         with pytest.raises(RuntimeError, match="unrestartable"):
             await service.start()

As per coding guidelines, "For tasks that must block indefinitely until cancelled (e.g. simulating a slow provider or stubborn coroutine), use asyncio.Event().wait() instead of asyncio.sleep(large_number); it is cancellation-safe and carries no timing assumptions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/hr/pruning/test_service_lifecycle.py` around lines 69 - 90, The
test leaves the intentionally hung task running after asserting the
TimeoutError; modify the test_unrestartable_after_drain_timeout to create an
asyncio.Event gate and patch PruningService._run_loop with a hung_loop that
awaits that event (i.e., await gate.wait()) so the test can release the gate
after the timeout assertion, then await service._task to ensure the hung
coroutine finishes; specifically, reference the patched
PruningService._run_loop, the local hung_loop, the created gate Event, and the
service._task to locate where to add waiting/release and the final await to
clean up the task.
tests/unit/api/controllers/test_backup.py (2)

190-193: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Compare ApiResponse.data by value here, not identity.

These checks are brittle: if the controller ever clones or serializes the payload, is will fail even though the response is still correct. == keeps the tests focused on behavior.

♻️ Suggested fix
-        assert result.data is manifest
+        assert result.data == manifest
@@
-        assert result.data is response
+        assert result.data == response

Also applies to: 258-263

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/api/controllers/test_backup.py` around lines 190 - 193, Replace
identity checks with value equality in the backup controller tests: instead of
asserting ApiResponse.data is manifest, use equality (==) so the test compares
payloads by value not identity; update the assertions around result, ApiResponse
and manifest in the block that calls
service.get_backup.assert_awaited_once_with("abc123def456") and the similar
block at the 258-263 range so they use assert result.data == manifest while
keeping assert isinstance(result, ApiResponse) and the get_backup await
assertion unchanged.

367-368: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use assert_not_called() for the confirm=false gate.

assert_not_awaited() misses an unawaited call, so this test could still pass if the controller touched restore_from_backup and forgot to await it. Since confirm=False should block any service interaction, assert that the method was never called at all.

♻️ Suggested fix
-        service.restore_from_backup.assert_not_awaited()
+        service.restore_from_backup.assert_not_called()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/api/controllers/test_backup.py` around lines 367 - 368, The test
currently uses service.restore_from_backup.assert_not_awaited() which can miss
an un-awaited invocation; change the assertion to
service.restore_from_backup.assert_not_called() so the test verifies the
confirm=False gate prevents any call at all (replace the assert_not_awaited()
call in the test for the confirm=false scenario with assert_not_called() on the
restore_from_backup mock).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/api/controllers/simulations.py`:
- Around line 292-298: The code currently claims the simulation id via
sim_state.simulation_store.register_if_absent(record) and then calls
_publish_event(WsEventType.SIMULATION_STARTED, record) (and likely spawns the
runner) without rollback; if any subsequent exception occurs the record remains
claimed and blocks retries. Wrap the post-claim steps (the call to
_publish_event and any runner/task creation) in a try/except/finally and on
failure call the store rollback/unregister method (e.g.,
simulation_store.unregister or simulation_store.deregister(record.simulation_id)
— whatever the store exposes) or mark the record as not-running, then re-raise
the exception; ensure you reference
sim_state.simulation_store.register_if_absent, _publish_event, and
WsEventType.SIMULATION_STARTED when adding the rollback logic.

In `@src/synthorg/backup/scheduler.py`:
- Around line 71-78: The spawned scheduler task created in the constructor
(self._task = asyncio.create_task(self._run_loop(), name="backup-scheduler"))
can die silently if _run_loop() raises; attach a done callback like
OrgInflectionMonitor does that inspects the completed task, calls
task.exception() to retrieve any exception, and logs an error (including
traceback/context) if one exists so unexpected task deaths are surfaced and
handled; add the callback registration immediately after creating self._task and
reference self._task and _run_loop in the change.

In `@src/synthorg/communication/conflict_resolution/escalation/factory.py`:
- Around line 150-158: The branch that looks up factory =
_QUEUE_STORE_FACTORIES.get(config.backend) (and the similar branch at the other
location) currently raises ValueError without logging; update both places to
emit a warning or error log with context (including config.backend and any
strategy identifier) using the module/class logger before raising the
ValueError, e.g., logger.warning("Unknown escalation queue backend=%r;
registered=%s", config.backend, sorted(_QUEUE_STORE_FACTORIES) or ["(none)"])
and then raise the ValueError as before so the misconfiguration is recorded in
logs prior to exception propagation.

In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 107-112: The start() method in ngrok_adapter.py currently reads
the ngrok token directly via os.environ.get(self._auth_token_env), which
violates the settings resolution policy; replace this direct environment access
by resolving the token through the application's settings resolver (e.g., use
ConfigResolver or SettingsService to call the appropriate get_* method for the
registered ngrok auth token setting) and pass the resolved token into
conf.PyngrokConfig(auth_token=...) (fall back to conf.PyngrokConfig() when the
resolver returns empty); update any references to self._auth_token_env to use
the canonical setting name registered under src/synthorg/settings/definitions
and ensure the code path remains in start() and preserves the existing
conditional behavior.

In `@tests/unit/client/test_continuous_lifecycle.py`:
- Around line 22-44: Replace the ad-hoc asyncio.sleep calls with a runner-driven
asyncio.Event: modify _FakeRunner to create an asyncio.Event (e.g., self.ready =
asyncio.Event()) and in run() set the event on its first entry
(self.ready.set()) then await self.ready.wait() at the point where the test
previously relied on sleeps so the test drives the second start/stop by clearing
or recreating the event; update the tests that call _FakeRunner.run() to wait
for runner.ready.wait() to observe the first invocation and then set/clear the
event to allow the runner to proceed instead of using await asyncio.sleep(0) /
await asyncio.sleep(0.01).

In `@tests/unit/integrations/test_ngrok_adapter_lifecycle.py`:
- Around line 95-100: Replace the swallowing fake with a Mock and assert it was
never called: patch
"synthorg.integrations.tunnel.ngrok_adapter.ngrok.disconnect" with a Mock
(instead of _fake_disconnect) around the await adapter.stop() call, then assert
the mock's assert_not_called() (or assert mock.call_count == 0) to ensure
ngrok.disconnect is not invoked on the no-start path; keep the existing final
assertion that adapter._tunnel is None.

In `@tests/unit/integrations/test_replay_protection_threadsafety.py`:
- Around line 33-35: The test's ThreadPoolExecutor worker tasks (the attempt
callable submitted in the pool) are currently started opportunistically; modify
the test to add a shared start gate (e.g., threading.Event or threading.Barrier)
so each worker waits on the gate inside attempt before doing the critical work,
then trigger the gate after all futures are submitted to start them
simultaneously—apply the same change to the other ThreadPoolExecutor blocks that
create futures and call f.result() so the race path is consistently stressed.

In `@tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py`:
- Around line 58-79: The test leaves a hung monitor task alive causing
timing-sensitive leaks; modify the patched OrgInflectionMonitor._loop (used in
test_unrestartable_after_drain_timeout) to wait on a dedicated asyncio.Event
instead of sleeping, expose that event from the test scope so after
cancelling/triggering the drain timeout you set the event to let the fake _loop
finish, then await monitor._task to ensure the task has completed before exiting
the patch scope and before calling monitor.start() again; ensure you still
simulate suppression of CancelledError inside the fake _loop to reproduce the
stuck-drain behavior while making teardown deterministic.

In `@tests/unit/settings/test_backup_subscriber.py`:
- Around line 103-108: The test uses assert_not_awaited() for scheduler methods
which can falsely pass if the async method was called but its coroutine never
awaited; update the assertions to use assert_not_called() for the scheduler
paths that must remain untouched: replace
service.scheduler.stop.assert_not_awaited() (and any similar occurrences like
the other two assertions around service.scheduler.stop/assert_not_awaited at the
other test cases) with service.scheduler.stop.assert_not_called(), and ensure
service.scheduler.start still uses service.scheduler.start.assert_awaited_once()
where appropriate.

---

Outside diff comments:
In `@src/synthorg/budget/trends.py`:
- Around line 241-261: The currency assertion is applied to all input records
before filtering by the requested time window, causing bucket_cost_records to
reject inputs that include out-of-range mixed-currency rows; change the logic so
you first filter records to the [start, end) window (using the same condition as
the loop that computes key = _bucket_key(ts, bucket_size)) and only then call
_assert_single_currency on the filtered list (or alternatively validate currency
per-record when adding to sums), so generate_bucket_starts, the sums aggregation
and TrendDataPoint creation remain unchanged.

In `@tests/unit/api/controllers/test_backup.py`:
- Around line 190-193: Replace identity checks with value equality in the backup
controller tests: instead of asserting ApiResponse.data is manifest, use
equality (==) so the test compares payloads by value not identity; update the
assertions around result, ApiResponse and manifest in the block that calls
service.get_backup.assert_awaited_once_with("abc123def456") and the similar
block at the 258-263 range so they use assert result.data == manifest while
keeping assert isinstance(result, ApiResponse) and the get_backup await
assertion unchanged.
- Around line 367-368: The test currently uses
service.restore_from_backup.assert_not_awaited() which can miss an un-awaited
invocation; change the assertion to
service.restore_from_backup.assert_not_called() so the test verifies the
confirm=False gate prevents any call at all (replace the assert_not_awaited()
call in the test for the confirm=false scenario with assert_not_called() on the
restore_from_backup mock).

In `@tests/unit/hr/pruning/test_service_lifecycle.py`:
- Around line 69-90: The test leaves the intentionally hung task running after
asserting the TimeoutError; modify the test_unrestartable_after_drain_timeout to
create an asyncio.Event gate and patch PruningService._run_loop with a hung_loop
that awaits that event (i.e., await gate.wait()) so the test can release the
gate after the timeout assertion, then await service._task to ensure the hung
coroutine finishes; specifically, reference the patched
PruningService._run_loop, the local hung_loop, the created gate Event, and the
service._task to locate where to add waiting/release and the final await to
clean up the task.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cc4e1775-c0ef-4a5d-bd43-757bc816ccf7

📥 Commits

Reviewing files that changed from the base of the PR and between 877bbd8 and 5b1dada.

📒 Files selected for processing (48)
  • docs/design/backup.md
  • docs/design/client-simulation.md
  • docs/design/observability.md
  • docs/licensing.md
  • docs/reference/lifecycle-sync.md
  • docs/research/lgpl-postgres-driver-decision.md
  • scripts/mock_spec_baseline.txt
  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/backup/service.py
  • src/synthorg/budget/trends.py
  • src/synthorg/client/continuous.py
  • src/synthorg/client/store.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/observability/events/event_stream.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/tools/mcp/cache.py
  • tests/integration/api/controllers/test_client_simulation.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/api/controllers/test_backup.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/backup/test_scheduler.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/budget/test_trends_currency.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/communication/event_stream/test_stream_dedup.py
  • tests/unit/hr/pruning/test_service.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • web/src/api/endpoints/backup.ts
  • web/src/api/endpoints/clients.ts
💤 Files with no reviewable changes (1)
  • scripts/mock_spec_baseline.txt
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Deploy Preview
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Dashboard Test
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Lighthouse Site
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (13)
web/src/**/!(logger).ts?(x)

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/!(logger).ts?(x): Always use createLogger from @/lib/logger instead of bare console.warn, console.error, or console.debug in application code; use variable name log
Pass dynamic/untrusted values as separate args to log methods, not interpolated into the message string, so they go through sanitizeArg
Wrap attacker-controlled fields inside structured objects with sanitizeForLog() before embedding in logs

Files:

  • web/src/api/endpoints/backup.ts
  • web/src/api/endpoints/clients.ts
web/src/**/*.ts?(x)

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.ts?(x): Callers MUST NOT wrap store mutation calls in try / catch; the store owns the error UX
NEVER hardcode Motion transition durations; use presets from @/lib/motion
NEVER hardcode BCP 47 locale literals ('en-US'); use helpers from @/utils/format and design tokens
NEVER hardcode currency symbols or codes; use DEFAULT_CURRENCY from @/utils/currencies

Files:

  • web/src/api/endpoints/backup.ts
  • web/src/api/endpoints/clients.ts
web/src/api/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Use getLiveness() for 200 always-alive checks and getReadiness() for binary 'ok' | 'unavailable' health checks (no tri-state); any new caller must handle the 503 path explicitly

Files:

  • web/src/api/endpoints/backup.ts
  • web/src/api/endpoints/clients.ts
**/*.{py,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend: DEFAULT_CURRENCY from synthorg.budget.currency or the runtime budget.currency setting. Frontend: DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency.
Field naming: no _usd suffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare .toLocaleString() / .toLocaleDateString() / .toLocaleTimeString(). Use helpers in @/utils/format which read getLocale() from @/utils/locale. The backend has no operator-tunable locale setting; backend Intl formatting uses the system locale plus the browser timezone. The company.name_locales list controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render via Intl without passing timeZone (browser tz wins).
Date / number format: always via Intl; no hand-rolled templates.

Files:

  • web/src/api/endpoints/backup.ts
  • src/synthorg/observability/events/event_stream.py
  • web/src/api/endpoints/clients.ts
  • src/synthorg/client/store.py
  • src/synthorg/backup/service.py
  • tests/unit/client/test_continuous_lifecycle.py
  • src/synthorg/api/controllers/simulations.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/integration/api/controllers/test_client_simulation.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/observability/events/api.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/budget/trends.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/hr/pruning/test_service.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • src/synthorg/tools/mcp/cache.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/backup/test_scheduler.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • tests/unit/budget/test_trends_currency.py
  • tests/unit/communication/event_stream/test_stream_dedup.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/providers/health_prober.py
  • tests/unit/api/controllers/test_backup.py
  • src/synthorg/client/continuous.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/api/auth/ticket_store.py
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Units: metric only. Spelling: International / British English UI default (colour, behaviour, organise, centred, analyse, cancelled); document deviations. Spelling here is an editorial / UI-copy decision only; it does not affect runtime locale-sensitive formatting. Numbers, dates, times, currencies, and units still resolve via the user / company / browser / system fallback through @/utils/format, @/utils/locale, DEFAULT_CURRENCY, and useSettingsStore().currency, with no contradiction to the locale-neutral defaults above.

Files:

  • web/src/api/endpoints/backup.ts
  • web/src/api/endpoints/clients.ts
web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse components from web/src/components/ui/ before creating new ones. Never hardcode hex colors, font-family, pixel spacing, Motion transitions, or BCP 47 locale strings; use design tokens, @/lib/motion presets, and the helpers in @/utils/format. Enforced by scripts/check_web_design_system.py (PostToolUse hook on every web/src/ edit).

Files:

  • web/src/api/endpoints/backup.ts
  • web/src/api/endpoints/clients.ts
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

docs/**/*.md: Use Mermaid (\``mermaid): flowcharts, sequence diagrams, simple hierarchies, pipelines. Rendered client-side via pymdownx.superfences. Use **Markdown tables**: grid/matrix data that is semantically tabular (not diagrams). D2 uses theme 200 (Dark Mauve), dark-only render, configured globally in mkdocs.yml. Never use ```text` blocks with ASCII/Unicode box-drawing characters for diagrams.

Files:

  • docs/research/lgpl-postgres-driver-decision.md
  • docs/licensing.md
  • docs/design/backup.md
  • docs/design/client-simulation.md
  • docs/design/observability.md
  • docs/reference/lifecycle-sync.md
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Monetary models: every cost-bearing Pydantic model carries currency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raises MixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/ is the only place that may import aiosqlite, sqlite3, psycopg, or psycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced by scripts/check_persistence_boundary.py (pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g. ArtifactService, WorkflowService, MemoryService, CustomRulesService, UserService, ProjectService, SsrfViolationService, SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced by scripts/check_persistence_boundary.py).
Per-line opt-out: # lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved through SettingsService / ConfigResolver. First-cold-read emits one INFO settings.value.resolved carrying source + yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered with read_only_post_init=True for /settings discoverability; SettingsService.set() raises SettingReadOnlyError).
Direct os.environ.get(...) reads in application code outside startup are forbidden. New settings register in src/synthorg/settings/definitions/<namespace>.py and are consumed via ConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...

Files:

  • src/synthorg/observability/events/event_stream.py
  • src/synthorg/client/store.py
  • src/synthorg/backup/service.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/budget/trends.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/tools/mcp/cache.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/client/continuous.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/api/auth/ticket_store.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/event_stream.py
  • src/synthorg/client/store.py
  • src/synthorg/backup/service.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/budget/trends.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/tools/mcp/cache.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/client/continuous.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/api/auth/ticket_store.py
src/synthorg/!(api)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Domain error class naming: error classes in domain modules use <Domain><Condition>Error and inherit from DomainError (or a domain-scoped intermediate that itself inherits DomainError). Bare Exception / RuntimeError at domain boundaries is forbidden; domain errors flow through EXCEPTION_HANDLERS for centralised RFC 9457 routing.

Files:

  • src/synthorg/observability/events/event_stream.py
  • src/synthorg/client/store.py
  • src/synthorg/backup/service.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/budget/trends.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/tools/mcp/cache.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/client/continuous.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/backup/scheduler.py
docs/design/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Use D2 (\``d2): architecture diagrams, nested container layouts, complex entity relationships. Rendered at build time via mkdocs-d2-plugin(dagre layout). Requires the [D2 CLI](https://d2lang.com/tour/install) onPATHlocally and in CI (pinned to v0.7.1 via.github/workflows/pages.yml`).

Files:

  • docs/design/backup.md
  • docs/design/client-simulation.md
  • docs/design/observability.md
src/synthorg/!(observability)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/!(observability)/**/*.py: Every business-logic module has from synthorg.observability import get_logger then logger = get_logger(__name__). Variable name is always logger. Carve-outs (e.g. meta/mcp/handlers/common_logging.py keying at a fixed string) are documented in the module docstring.
Never use import logging / logging.getLogger() / print() in application code (exception: observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py for handler-construction / bootstrap code).

Files:

  • src/synthorg/client/store.py
  • src/synthorg/backup/service.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/budget/trends.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/tools/mcp/cache.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/client/continuous.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/api/auth/ticket_store.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Mock-spec gate (#1604): every Mock() / AsyncMock() / MagicMock() in tests/ MUST declare the interface it stands for via spec=ConcreteClass (Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen in scripts/mock_spec_baseline.txt. Regenerate the baseline only via uv run python scripts/check_mock_spec.py --update and commit the diff so the change is reviewable. Without spec= the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from #1604).
Shared mocks: tests/conftest.py exposes mock_dispatcher (an AsyncMock(spec=NotificationDispatcher) covering register / start / aclose / dispatch); use it instead of building the spec inline.
Time-driven tests: import FakeClock from tests._shared.fake_clock (NOT from any rollout-subsystem path) and inject it into the class under test. FakeClock.sleep advances virtual time AND yields once via asyncio.sleep(0) so cancellation on awaiting tasks propagates the same way it does under SystemClock. For tests that need to drive cooperative tasks waiting on the loop, await clock.advance_async(seconds). FakeClock-first: patch time.monotonic() / asyncio.sleep() globals only for legacy code paths that don't have a Clock seam (see ## Code Conventions for the legacy-Callable list); when the class under test accepts a clock= parameter, always inject FakeClock rather than monkey-patching globals.
Async: asyncio_mode = "auto"; no manual @pytest.mark.asyncio needed.
Timeout: 30 seconds per test (global in pyproject.toml; do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed).
Parametrize: Prefer @pytest.mark.parametrize for...

Files:

  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/integration/api/controllers/test_client_simulation.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/hr/pruning/test_service.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/backup/test_scheduler.py
  • tests/unit/budget/test_trends_currency.py
  • tests/unit/communication/event_stream/test_stream_dedup.py
  • tests/unit/api/controllers/test_backup.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/integration/api/controllers/test_client_simulation.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/hr/pruning/test_service.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/backup/test_scheduler.py
  • tests/unit/budget/test_trends_currency.py
  • tests/unit/communication/event_stream/test_stream_dedup.py
  • tests/unit/api/controllers/test_backup.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceed api.ws_frame_timeout_seconds (default 30s) without sending a frame. Wraps socket.receive_text() in asyncio.wait_for(...) so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a _SlidingWindowRateLimiter (api.ws_revalidation_window_seconds default 60s, api.ws_revalidation_max_failures default 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.

Files:

  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/api/auth/ticket_store.py
🪛 LanguageTool
docs/design/observability.md

[grammar] ~205-~205: Ensure spelling is correct
Context: ...-window TTL (default 60s). The hub keys dedup on event.id; identical ids within the...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (25)
docs/research/lgpl-postgres-driver-decision.md (1)

1-80: Well-structured ADR documentation and guideline-compliant formatting.

Clear decision framing, traceable audit linkage, and correct use of Markdown tables for trade-offs; no docs-format guideline violations found.

docs/licensing.md (1)

112-124: Good operator-facing licensing clarification for the optional Postgres path.

This section cleanly distinguishes default vs optional dependency obligations and links to the deeper ADR context.

src/synthorg/observability/events/event_stream.py (1)

20-20: Good event constant addition.

EVENT_STREAM_HUB_PUBLISH_DEDUPED follows the existing event naming scheme and keeps observability events centralized.

src/synthorg/observability/events/api.py (1)

80-80: Looks good.

API_WS_TICKET_LIMIT_EXCEEDED is consistent with adjacent WS ticket event constants and improves event coverage cleanly.

src/synthorg/communication/event_stream/stream.py (2)

69-107: Constructor hardening is solid.

The validation and injected Clock seam make dedup behavior fail-fast, testable, and safer under misconfiguration.


177-256: Dedup publish path is correctly synchronized and bounded.

The lock-scoped duplicate check/record flow, no-subscriber short-circuit, and per-session TTL/size eviction logic are implemented cleanly.

docs/design/observability.md (1)

200-206: Nice observability doc update.

The new event-stream section accurately documents dedup behavior and warning events in a clear, operator-friendly way.

tests/unit/communication/event_stream/test_stream_dedup.py (1)

39-165: Strong dedup regression coverage.

These tests exercise the key concurrency-safe dedup invariants (TTL, zero-TTL mode, per-session isolation, bounded state, and unsubscribe cleanup) with proper FakeClock usage.

src/synthorg/api/auth/ticket_store.py (2)

150-174: Atomic cap-check + insert path looks correct.

The lock now covers the full count-and-insert critical section, which closes the concurrent over-issuance race.


195-241: Consume and cleanup mutation paths are now safely serialized.

pop and bulk-expiry deletion are both protected, preventing concurrent dictionary mutation hazards.

src/synthorg/tools/mcp/cache.py (2)

77-106: Locked get-path refactor is solid.

The lookup, TTL decision, and LRU move are now atomic under contention while preserving copy-on-read isolation.


125-162: Put/invalidate locking correctly protects cache state.

This closes concurrent mutation races around eviction, replacement, and targeted invalidation.

tests/unit/api/auth/test_ticket_store_threadsafety.py (2)

36-61: Strong contention test for per-user cap.

Using a start gate here materially improves the chance of surfacing cap-related races.


104-124: Create/cleanup mixed workload coverage is valuable.

Good stress case for dictionary-mutation safety under concurrent operations.

tests/unit/tools/mcp/test_cache_threadsafety.py (2)

22-48: Nice improvement: concurrent hit-path is now explicitly exercised.

Seeding + shared-key reads ensures the locked hit branch is tested, not only miss behavior.


69-80: Bounded-size assertion is a good integrity guard.

This directly validates LRU capacity under concurrent insert pressure.

src/synthorg/integrations/webhooks/replay_protection.py (2)

94-167: Freshness/nonce split is clean and improves safety readability.

The fail-closed path plus explicit freshness validation makes behavior easier to reason about.


200-215: Atomic duplicate-check + insert is correctly enforced.

Locking this block closes the replay race where identical nonces arrive concurrently.

tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py (1)

37-160: Coverage looks solid.

The suite exercises all registry branches, including the auto notify case and the model_construct fallback checks for the expanded error messages.

src/synthorg/client/store.py (1)

177-193: Atomic check-and-insert is implemented correctly.

register_if_absent() closes the race window cleanly under self._lock and matches the intended idempotency contract.

web/src/api/endpoints/clients.ts (1)

1-1: Good idempotency fallback on HTTP 409.

The axios.isAxiosError(...status===409) branch correctly converges retries to getSimulation(simulation_id) and preserves caller flow.

Also applies to: 267-285

tests/integration/api/controllers/test_client_simulation.py (1)

318-347: Strong integration coverage for duplicate-start rejection.

This test validates the wire-level 201→409 sequence for same simulation_id, which is the critical idempotency contract.

tests/unit/api/controllers/test_simulations_idempotency.py (1)

1-147: Unit tests are focused and well-isolated.

The success/failure claim branches are covered, and the patched collaborators avoid masking unrelated failures while keeping the idempotency assertion precise.

docs/design/client-simulation.md (1)

227-237: Documentation update is clear and aligned with runtime behavior.

The idempotency section accurately explains atomic registration and the 409-on-duplicate contract.

src/synthorg/client/continuous.py (1)

134-135: The concern about _running staying True after cancellation is unfounded.

The finally block executes completely and acquires the lock successfully even when the task is cancelled. In Python's asyncio, a CancelledError does not interrupt lock acquisition in a finally block—the finally block runs to completion, acquires the lock, resets _running = False, and only then does the exception propagate. The current code is correct and requires no changes.

			> Likely an incorrect or invalid review comment.

Comment thread src/synthorg/api/controllers/simulations.py Outdated
Comment thread src/synthorg/backup/scheduler.py
Comment thread src/synthorg/integrations/tunnel/ngrok_adapter.py Outdated
Comment thread tests/unit/client/test_continuous_lifecycle.py
Comment thread tests/unit/integrations/test_ngrok_adapter_lifecycle.py
Comment thread tests/unit/integrations/test_replay_protection_threadsafety.py Outdated
Comment thread tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
Comment thread tests/unit/settings/test_backup_subscriber.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/api/controllers/simulations.py`:
- Around line 353-379: If create_task(...) for runner_task succeeds but a later
setup step fails, the except BaseException path currently unregisters the
simulation without cancelling the spawned task; fix this by detecting the
created task (the local task from asyncio.create_task called for runner_task),
and in the except BaseException handler cancel that task (task.cancel()), await
it (catch asyncio.CancelledError and ignore) or await asyncio.shield(task) with
a timeout, then proceed to await
sim_state.simulation_store.unregister(record.simulation_id) and re-raise; ensure
you still attach log_task_exceptions and sim_state.background_tasks.add/discard
only when setup completes successfully to avoid double-registering the task.

In `@src/synthorg/budget/trends.py`:
- Around line 237-239: The docstring incorrectly states
MixedCurrencyAggregationError is raised when the input parameter records span
multiple currencies, but the implementation checks only the filtered variable
in_window_records (see in_window_records check around where
MixedCurrencyAggregationError is raised); update the docstring wording to state
the error is raised when the records remaining after window filtering
(in_window_records) span multiple currencies, or otherwise make the check match
the docstring by inspecting the original records—mention the symbols
MixedCurrencyAggregationError, records, and in_window_records so reviewers can
find and align the docstring with actual behavior.

In `@tests/unit/client/test_continuous_lifecycle.py`:
- Around line 3-8: Update the docstring/comments to stop claiming the lifecycle
lock spans the full run loop: state that ContinuousMode.start() only acquires
_lifecycle_lock for the _running check/set transition (it sets _running
true/raises if already running) and then releases the lock before entering the
synchronous run loop; remove or reword any sentence that says the lock is held
"through the loop" or "spans the full body" and ensure references to stop() and
the semantics of _running remain accurate (i.e., stop() signals termination
while the loop runs without holding _lifecycle_lock).

In `@tests/unit/hr/pruning/test_service_lifecycle.py`:
- Around line 45-53: The test test_concurrent_starts_spawn_one_task only asserts
service.is_running which doesn't prove only one run-loop task was spawned;
modify the test to explicitly verify a single task was created by either (a)
asserting that service._run_task (or the concrete attribute holding the created
Task) is non-None and that repeated start() calls return the same Task object
(e.g., compare identity), or (b) monkeypatching/patching asyncio.create_task (or
the service's internal task-creation hook) to count invocations and assert it
was called exactly once when calling service.start() concurrently; target
symbols: test_concurrent_starts_spawn_one_task, service.start(), and the service
attribute that stores the Task (e.g., service._run_task) or asyncio.create_task.

In `@tests/unit/integrations/test_replay_protection_threadsafety.py`:
- Around line 40-41: Add a per-wait timeout to the Barrier.wait() calls to avoid
tests hanging: change each barrier.wait() (the ones surrounding the
protector.check(nonce="duplicate-nonce", timestamp=ts) call and the other two
occurrences) to barrier.wait(timeout=5) (or another small reasonable value) and
handle a failed wait by catching threading.BrokenBarrierError (or failing the
test) so a timeout produces a fast, diagnosable failure instead of a long
deadlock-style hang.

In `@tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py`:
- Around line 37-47: The test can race-fail because the first spawned task may
finish before other start() calls run; modify the test to ensure the first task
remains running while the concurrent start() calls happen by making
builder.build return a coroutine that blocks on a controllable asyncio.Event (or
similar) so the spawned task does not complete immediately. Update
test_concurrent_starts_spawn_one_task to create an Event, patch or pass a
builder into _make_monitor so its build returns a coroutine that awaits that
Event, run asyncio.gather(monitor.start(), monitor.start(), monitor.start()),
assert monitor._task is not None and that all start() calls did not create
multiple tasks (e.g., compare identity of monitor._task or count spawns), then
set the Event to allow the task to finish and await monitor.stop(); reference
monitor.start(), monitor._task, _make_monitor, and builder.build to locate
changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c4a833d9-d9ad-4f1c-adfa-7fb383421533

📥 Commits

Reviewing files that changed from the base of the PR and between 5b1dada and fb3db73.

📒 Files selected for processing (13)
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/budget/trends.py
  • src/synthorg/client/store.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • tests/unit/api/controllers/test_backup.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/settings/test_backup_subscriber.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Build Backend
  • GitHub Check: Dashboard Test
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Type Check
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Lighthouse Site
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend: DEFAULT_CURRENCY from synthorg.budget.currency or the runtime budget.currency setting. Frontend: DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency.
Field naming: no _usd suffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare .toLocaleString() / .toLocaleDateString() / .toLocaleTimeString(). Use helpers in @/utils/format which read getLocale() from @/utils/locale. The backend has no operator-tunable locale setting; backend Intl formatting uses the system locale plus the browser timezone. The company.name_locales list controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render via Intl without passing timeZone (browser tz wins).
Date / number format: always via Intl; no hand-rolled templates.

Files:

  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • src/synthorg/budget/trends.py
  • tests/unit/settings/test_backup_subscriber.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • src/synthorg/client/store.py
  • tests/unit/client/test_continuous_lifecycle.py
  • src/synthorg/backup/scheduler.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/api/controllers/test_backup.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Mock-spec gate (#1604): every Mock() / AsyncMock() / MagicMock() in tests/ MUST declare the interface it stands for via spec=ConcreteClass (Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen in scripts/mock_spec_baseline.txt. Regenerate the baseline only via uv run python scripts/check_mock_spec.py --update and commit the diff so the change is reviewable. Without spec= the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from #1604).
Shared mocks: tests/conftest.py exposes mock_dispatcher (an AsyncMock(spec=NotificationDispatcher) covering register / start / aclose / dispatch); use it instead of building the spec inline.
Time-driven tests: import FakeClock from tests._shared.fake_clock (NOT from any rollout-subsystem path) and inject it into the class under test. FakeClock.sleep advances virtual time AND yields once via asyncio.sleep(0) so cancellation on awaiting tasks propagates the same way it does under SystemClock. For tests that need to drive cooperative tasks waiting on the loop, await clock.advance_async(seconds). FakeClock-first: patch time.monotonic() / asyncio.sleep() globals only for legacy code paths that don't have a Clock seam (see ## Code Conventions for the legacy-Callable list); when the class under test accepts a clock= parameter, always inject FakeClock rather than monkey-patching globals.
Async: asyncio_mode = "auto"; no manual @pytest.mark.asyncio needed.
Timeout: 30 seconds per test (global in pyproject.toml; do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed).
Parametrize: Prefer @pytest.mark.parametrize for...

Files:

  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/api/controllers/test_backup.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/api/controllers/test_backup.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Monetary models: every cost-bearing Pydantic model carries currency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raises MixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/ is the only place that may import aiosqlite, sqlite3, psycopg, or psycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced by scripts/check_persistence_boundary.py (pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g. ArtifactService, WorkflowService, MemoryService, CustomRulesService, UserService, ProjectService, SsrfViolationService, SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced by scripts/check_persistence_boundary.py).
Per-line opt-out: # lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved through SettingsService / ConfigResolver. First-cold-read emits one INFO settings.value.resolved carrying source + yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered with read_only_post_init=True for /settings discoverability; SettingsService.set() raises SettingReadOnlyError).
Direct os.environ.get(...) reads in application code outside startup are forbidden. New settings register in src/synthorg/settings/definitions/<namespace>.py and are consumed via ConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...

Files:

  • src/synthorg/budget/trends.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/client/store.py
  • src/synthorg/backup/scheduler.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/budget/trends.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/client/store.py
  • src/synthorg/backup/scheduler.py
src/synthorg/!(api)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Domain error class naming: error classes in domain modules use <Domain><Condition>Error and inherit from DomainError (or a domain-scoped intermediate that itself inherits DomainError). Bare Exception / RuntimeError at domain boundaries is forbidden; domain errors flow through EXCEPTION_HANDLERS for centralised RFC 9457 routing.

Files:

  • src/synthorg/budget/trends.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/client/store.py
  • src/synthorg/backup/scheduler.py
src/synthorg/!(observability)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/!(observability)/**/*.py: Every business-logic module has from synthorg.observability import get_logger then logger = get_logger(__name__). Variable name is always logger. Carve-outs (e.g. meta/mcp/handlers/common_logging.py keying at a fixed string) are documented in the module docstring.
Never use import logging / logging.getLogger() / print() in application code (exception: observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py for handler-construction / bootstrap code).

Files:

  • src/synthorg/budget/trends.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/client/store.py
  • src/synthorg/backup/scheduler.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceed api.ws_frame_timeout_seconds (default 30s) without sending a frame. Wraps socket.receive_text() in asyncio.wait_for(...) so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a _SlidingWindowRateLimiter (api.ws_revalidation_window_seconds default 60s, api.ws_revalidation_max_failures default 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.

Files:

  • src/synthorg/api/controllers/simulations.py
🔇 Additional comments (25)
tests/unit/integrations/test_replay_protection_threadsafety.py (1)

14-17: Good test harness setup for deterministic thread-safety checks.

Using FakeClock from the shared test seam and a unit marker keeps these tests aligned with repo test conventions.

tests/unit/client/test_continuous_lifecycle.py (1)

22-132: Good event-driven lifecycle tests and unit marker usage.

Nice job replacing scheduler-dependent sleeps with runner signaling; this makes the concurrency assertions deterministic and aligned with the lock-gated start/stop behavior.

src/synthorg/communication/conflict_resolution/escalation/factory.py (3)

52-89: Strong defensive backend validation with pre-raise logging.

_require_persistence now fails fast on missing/mismatched backend and logs structured context before raising, which aligns well with startup diagnosability requirements.


150-169: Unknown queue backend handling is now robust and operator-friendly.

Good improvement: enumerating registered backends in both the error and structured warning makes misconfiguration triage much faster.


242-283: Decision-processor registry dispatch is clean and extensible.

The MappingProxyType registry + explicit unknown-strategy warning/raise path is a solid replacement for conditional branching.

src/synthorg/backup/scheduler.py (4)

36-41: LGTM!

Lifecycle primitives are correctly initialized in __init__: dedicated _lifecycle_lock (retained for service lifetime per past review), _stop_event for cooperative shutdown, _stop_failed flag for unrestartable state, and configurable drain timeout. This aligns with the canonical lifecycle pattern documented in docs/reference/lifecycle-sync.md.


48-86: LGTM!

The start() method correctly implements the canonical async lifecycle pattern: guarded by _lifecycle_lock, idempotent when already running, refuses restart after a timed-out stop, and attaches log_task_exceptions callback to surface unexpected loop termination. This addresses the past review concern about silent scheduler deaths.


88-145: LGTM!

The stop() method correctly implements coordinated shutdown: signals both events, cancels the task, drains with a shielded timeout, and marks the scheduler unrestartable on timeout to prevent stacking loops on orphan tasks. Events are recreated while holding _lifecycle_lock (lines 135-144), preventing the race condition where a concurrent start() could bind to stale events. The lock itself is correctly preserved across the service lifetime.


169-202: LGTM!

The _run_loop() correctly honors cooperative shutdown via _stop_event, includes a post-wake stop check (lines 187-188) to prevent firing after shutdown is requested, and uses proper structured logging with safe_error_description for exception handling. The PEP 758 except syntax and explicit CancelledError propagation are correct.

tests/unit/settings/test_backup_subscriber.py (3)

34-56: LGTM!

The _make_subscriber helper correctly specs all mocks per the coding guidelines: BackupScheduler, BackupService, SettingsService, and their async methods. The start and stop methods are AsyncMock(spec=...) to enforce the awaitable contract, while reschedule is a synchronous MagicMock(spec=...).


95-133: LGTM!

The test assertions correctly use assert_awaited_once() for async methods that must be awaited and assert_not_called() for paths that must avoid the scheduler entirely. The inline comments (lines 103-111) clearly document why these stricter assertions catch regressions that assert_called_once() or assert_not_awaited() would miss.


136-176: LGTM!

Advisory key tests correctly use assert_not_called() for both start and stop methods, ensuring the scheduler is never touched (not even with an unawaited coroutine) when processing advisory settings like compression, on_shutdown, or on_startup.

tests/unit/api/controllers/test_backup.py (4)

74-113: LGTM!

The _make_state_and_service helper correctly implements spec'd mocks for BackupService, AppState, and IdempotencyService. The run_idempotent is implemented as an inline async helper that exercises the awaitable contract by calling await callback(). The comment explaining SimpleNamespace usage (lines 107-112) focuses on technical rationale without internal gate terminology, addressing past review feedback.


125-153: LGTM!

The tests correctly pass idempotency_key to create_backup.fn(), aligning with the controller now requiring the Idempotency-Key header. The assertion uses equality (==) rather than identity (is) for comparing result.data to the manifest, which is appropriate for value comparison.


354-372: LGTM!

The restore confirm gate test correctly uses assert_not_called() with a clear comment explaining why this is stricter than assert_not_awaited() — it catches regressions where the controller creates but forgets to await the service call.


395-403: LGTM!

The guard test fixture correctly specs BackupService and BackupScheduler mocks, with scheduler.stop as an AsyncMock(spec=BackupScheduler.stop) to match the now-async lifecycle method.

src/synthorg/budget/trends.py (2)

242-256: Good ordering: window filter before currency assertion.

This sequence is correct: Line 249 filters contributing rows first, then Line 252 validates currency on the actual aggregation set. Nice data-integrity fix for partial-window queries.


467-467: HTTP 409 translation for MixedCurrencyAggregationError is correctly implemented.

The exception raised at line 467 in project_daily_spend is properly mapped:

  • MixedCurrencyAggregationError declares status_code: ClassVar[int] = 409 in its definition (src/synthorg/budget/errors.py:144).
  • The handler is registered in EXCEPTION_HANDLERS (src/synthorg/api/exception_handlers.py:853) and dispatched to handle_domain_error, which reads the status_code ClassVar via getattr(exc, "status_code", 500) at line 617.
  • analytics.py imports and calls project_daily_spend; the exception bubbles up through the controller to the handler without being caught.

No changes needed.

tests/unit/hr/pruning/test_service_lifecycle.py (1)

72-99: Good cancellation-safe timeout harness.

Using asyncio.Event().wait() plus the explicit release gate is a solid way to avoid flaky wall-clock sleeps and task leaks in this timeout-path test.

src/synthorg/integrations/tunnel/ngrok_adapter.py (3)

47-74: LGTM on initialization and lifecycle setup.

The auth token is correctly captured at init time (the sanctioned "init-time only" exception for bootstrap secrets), keeping the runtime start() path off the os.environ API. The lifecycle lock and state variables are properly initialized.


95-150: LGTM on idempotent start() with proper concurrency handling.

The implementation correctly:

  • Serializes access with _lifecycle_lock
  • Returns the existing URL if already active (idempotent behavior)
  • Builds a per-call PyngrokConfig instead of mutating the process-global config
  • Uses safe_error_description(exc) for scrubbing sensitive error details before logging/raising

This addresses all the past review concerns about global config mutation, idempotency, and the settings resolution path.


163-177: LGTM on best-effort stop() cleanup.

The stop method properly:

  • Acquires the lifecycle lock
  • Returns early if no tunnel is active (no-op case)
  • Performs best-effort disconnect with scrubbed error logging
  • Clears state even on disconnect failure (documented intentional behavior)
tests/unit/integrations/test_ngrok_adapter_lifecycle.py (3)

38-65: LGTM on idempotency test.

The test correctly verifies that a second start() while active reuses the existing tunnel rather than invoking a second upstream connect.


67-91: LGTM on concurrency test.

The asyncio.gather approach validates that the lifecycle lock serializes concurrent start() calls, resulting in only one tunnel creation.


93-107: LGTM on stop-without-start test.

The test now uses MagicMock(spec=ngrok.disconnect) and assert_not_called() to verify the no-op path, addressing the past review feedback about proving disconnect is never invoked.

Comment thread src/synthorg/api/controllers/simulations.py
Comment thread src/synthorg/budget/trends.py Outdated
Comment thread tests/unit/client/test_continuous_lifecycle.py
Comment thread tests/unit/hr/pruning/test_service_lifecycle.py
Comment thread tests/unit/integrations/test_replay_protection_threadsafety.py Outdated
Comment thread tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
tests/unit/client/test_continuous_lifecycle.py (1)

67-94: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reword the lock-scope comment.

await runner.ready.wait() only tells you run() has started; it does not mean _lifecycle_lock is still held. The current wording repeats the stale lock-scope claim and could mislead future concurrency changes.

Suggested wording
-        # Wait until the runner reports it has entered ``run()``; this
-        # guarantees the first ``start()`` already holds the lifecycle
-        # lock before the second call below races for it.
+        # Wait until the runner reports it has entered ``run()``; this
+        # guarantees the first ``start()`` has set ``_running=True``
+        # before the second call below races for it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/client/test_continuous_lifecycle.py` around lines 67 - 94, The
comment incorrectly claims that awaiting runner.ready.wait() guarantees the
lifecycle lock is still held; update the comment near the
test_double_start_raises_when_already_running to accurately state that awaiting
runner.ready.wait() only ensures the runner's run() has begun (i.e., run()
entered and set ready), but does not assert that ContinuousMode's
_lifecycle_lock is still held—explain that this timing merely increases the
chance the first start() acquired the lock before the second start() races for
it, and reference mode.start(), runner.ready.wait(), run(), and _lifecycle_lock
in the reworded comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/api/controllers/simulations.py`:
- Around line 428-434: The exception handler around the post-claim setup
currently rolls back via _rollback_register_if_absent and re-raises without
logging the original error; modify the except BaseException block to capture the
exception (e.g., "except BaseException as exc") and log it at WARNING or ERROR
with context including spawned_task, sim_state and record.simulation_id (use the
module/class logger and include exc or exc_info=True) before calling
_rollback_register_if_absent and re-raising so the root cause is recorded for
failed starts.

In `@tests/unit/hr/pruning/test_service_lifecycle.py`:
- Around line 80-90: Add a positive running-state assertion after the second
await service.start() in test_restart_after_clean_stop to ensure the restart
actually takes effect: after calling await service.start() the second time,
assert service.is_running is True (do not access is_running before the second
start to avoid mypy narrowing); you can then await service.stop() as before and
optionally assert not service.is_running afterward.

In `@tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py`:
- Around line 122-128: The test can leak the monitor task when the
timeout/assertion raises; wrap the timeout/assertion block in a try/finally so
cleanup always runs: call await monitor.stop() inside the try, assert
monitor._stop_failed and capture monitor._task, and in the finally always call
release.set() and await the saved task (if not None) to ensure the hung monitor
is released; reference monitor.stop, monitor._stop_failed, monitor._task,
release.set and awaiting the task when implementing the change.

---

Duplicate comments:
In `@tests/unit/client/test_continuous_lifecycle.py`:
- Around line 67-94: The comment incorrectly claims that awaiting
runner.ready.wait() guarantees the lifecycle lock is still held; update the
comment near the test_double_start_raises_when_already_running to accurately
state that awaiting runner.ready.wait() only ensures the runner's run() has
begun (i.e., run() entered and set ready), but does not assert that
ContinuousMode's _lifecycle_lock is still held—explain that this timing merely
increases the chance the first start() acquired the lock before the second
start() races for it, and reference mode.start(), runner.ready.wait(), run(),
and _lifecycle_lock in the reworded comment.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 34c568eb-986f-4875-be80-d684a16e0645

📥 Commits

Reviewing files that changed from the base of the PR and between fb3db73 and 82fabf6.

📒 Files selected for processing (6)
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/budget/trends.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: Build Backend
  • GitHub Check: Dashboard Test
  • GitHub Check: Dashboard Build
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Type Check
  • GitHub Check: Dashboard Type Check
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Lighthouse Site
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend: DEFAULT_CURRENCY from synthorg.budget.currency or the runtime budget.currency setting. Frontend: DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency.
Field naming: no _usd suffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare .toLocaleString() / .toLocaleDateString() / .toLocaleTimeString(). Use helpers in @/utils/format which read getLocale() from @/utils/locale. The backend has no operator-tunable locale setting; backend Intl formatting uses the system locale plus the browser timezone. The company.name_locales list controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render via Intl without passing timeZone (browser tz wins).
Date / number format: always via Intl; no hand-rolled templates.

Files:

  • src/synthorg/budget/trends.py
  • tests/unit/client/test_continuous_lifecycle.py
  • src/synthorg/api/controllers/simulations.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Monetary models: every cost-bearing Pydantic model carries currency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raises MixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/ is the only place that may import aiosqlite, sqlite3, psycopg, or psycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced by scripts/check_persistence_boundary.py (pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g. ArtifactService, WorkflowService, MemoryService, CustomRulesService, UserService, ProjectService, SsrfViolationService, SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced by scripts/check_persistence_boundary.py).
Per-line opt-out: # lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved through SettingsService / ConfigResolver. First-cold-read emits one INFO settings.value.resolved carrying source + yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered with read_only_post_init=True for /settings discoverability; SettingsService.set() raises SettingReadOnlyError).
Direct os.environ.get(...) reads in application code outside startup are forbidden. New settings register in src/synthorg/settings/definitions/<namespace>.py and are consumed via ConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...

Files:

  • src/synthorg/budget/trends.py
  • src/synthorg/api/controllers/simulations.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/budget/trends.py
  • src/synthorg/api/controllers/simulations.py
src/synthorg/!(api)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Domain error class naming: error classes in domain modules use <Domain><Condition>Error and inherit from DomainError (or a domain-scoped intermediate that itself inherits DomainError). Bare Exception / RuntimeError at domain boundaries is forbidden; domain errors flow through EXCEPTION_HANDLERS for centralised RFC 9457 routing.

Files:

  • src/synthorg/budget/trends.py
src/synthorg/!(observability)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/!(observability)/**/*.py: Every business-logic module has from synthorg.observability import get_logger then logger = get_logger(__name__). Variable name is always logger. Carve-outs (e.g. meta/mcp/handlers/common_logging.py keying at a fixed string) are documented in the module docstring.
Never use import logging / logging.getLogger() / print() in application code (exception: observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py for handler-construction / bootstrap code).

Files:

  • src/synthorg/budget/trends.py
  • src/synthorg/api/controllers/simulations.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Mock-spec gate (#1604): every Mock() / AsyncMock() / MagicMock() in tests/ MUST declare the interface it stands for via spec=ConcreteClass (Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen in scripts/mock_spec_baseline.txt. Regenerate the baseline only via uv run python scripts/check_mock_spec.py --update and commit the diff so the change is reviewable. Without spec= the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from #1604).
Shared mocks: tests/conftest.py exposes mock_dispatcher (an AsyncMock(spec=NotificationDispatcher) covering register / start / aclose / dispatch); use it instead of building the spec inline.
Time-driven tests: import FakeClock from tests._shared.fake_clock (NOT from any rollout-subsystem path) and inject it into the class under test. FakeClock.sleep advances virtual time AND yields once via asyncio.sleep(0) so cancellation on awaiting tasks propagates the same way it does under SystemClock. For tests that need to drive cooperative tasks waiting on the loop, await clock.advance_async(seconds). FakeClock-first: patch time.monotonic() / asyncio.sleep() globals only for legacy code paths that don't have a Clock seam (see ## Code Conventions for the legacy-Callable list); when the class under test accepts a clock= parameter, always inject FakeClock rather than monkey-patching globals.
Async: asyncio_mode = "auto"; no manual @pytest.mark.asyncio needed.
Timeout: 30 seconds per test (global in pyproject.toml; do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed).
Parametrize: Prefer @pytest.mark.parametrize for...

Files:

  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/hr/pruning/test_service_lifecycle.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceed api.ws_frame_timeout_seconds (default 30s) without sending a frame. Wraps socket.receive_text() in asyncio.wait_for(...) so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a _SlidingWindowRateLimiter (api.ws_revalidation_window_seconds default 60s, api.ws_revalidation_max_failures default 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.

Files:

  • src/synthorg/api/controllers/simulations.py
🔇 Additional comments (15)
tests/unit/integrations/test_replay_protection_threadsafety.py (1)

16-94: Looks good.

The concurrency tests are properly synchronized, deterministic, and cover the intended deduplication and bounded-eviction paths under contention.

src/synthorg/budget/trends.py (3)

17-17: No concern.
The helper import matches the updated validation flow.


236-270: Window-scoped currency validation is correct.
Filtering before _assert_single_currency() preserves partial-range queries while still rejecting mixed-currency buckets inside the requested window.


451-487: Single-currency enforcement is in the right place.
Validating after the empty-input fast path keeps the existing zero-input behavior and still blocks mixed-currency projections for real data.

src/synthorg/api/controllers/simulations.py (3)

115-139: Good callback ordering to avoid silent task-failure drops.

This sequence correctly registers exception logging before discard/add bookkeeping, which protects against fast-completing tasks.


141-174: Rollback sequencing is correct (cancel/drain before unregister).

Cancelling and draining the spawned runner before unregistering the claim avoids the runner/store race you were targeting.


344-358: Atomic claim + explicit 409 conflict handling looks correct.

Using register_if_absent() here closes the duplicate-runner TOCTOU gap and cleanly enforces idempotency semantics.

tests/unit/client/test_continuous_lifecycle.py (4)

1-64: Looks good.

The docstring, imports, and _FakeRunner coordination helper are clean and avoid the old sleep-based synchronization.


95-103: Looks good.

The disabled-path short-circuit is straightforward and the runner stays untouched as expected.


105-126: Looks good.

The stop/release test is deterministic enough here and uses the runner event instead of ad-hoc sleeps.


128-135: Looks good.

The _lifecycle_lock smoke check matches the new lifecycle contract.

tests/unit/hr/pruning/test_service_lifecycle.py (2)

45-78: Good fix for the concurrent-start race.

Counting asyncio.create_task at the service call site directly proves only one run-loop task is spawned and closes the gap from the previous review.


92-124: Good coverage for the timeout and unrestartable path.

This exercises the drain timeout, confirms _stop_failed is set, and verifies a subsequent start() is rejected.

tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py (2)

21-31: Spec-bound builder mock is exactly right.

This follows the mock-spec gate and keeps SnapshotBuilder.build mocked with the right interface.


37-95: Good coverage for concurrent start and clean restart.

The blocked build() plus create_task counting makes the single-spawn assertion deterministic, and the restart-after-stop path is exercised cleanly.

Comment thread src/synthorg/api/controllers/simulations.py Outdated
Comment thread tests/unit/hr/pruning/test_service_lifecycle.py
Comment thread tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py`:
- Around line 87-94: The test_restart_after_clean_stop currently restarts the
monitor but lacks a positive assertion that the restart took effect; after the
second await monitor.start() add an assertion like assert monitor.is_running (or
the monitor's equivalent running-state property) to mirror the PruningService
test and ensure start() actually transitions the monitor back to running; locate
the test function test_restart_after_clean_stop and update it to assert the
monitor's running state after the second start.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fff25349-f799-4cbe-b5fa-ffe3519fc0b3

📥 Commits

Reviewing files that changed from the base of the PR and between 82fabf6 and 3503e6d.

📒 Files selected for processing (4)
  • src/synthorg/api/controllers/simulations.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: Build Backend
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Dashboard Build
  • GitHub Check: Dashboard Type Check
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Dashboard Test
  • GitHub Check: Type Check
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Lighthouse Site
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{py,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend: DEFAULT_CURRENCY from synthorg.budget.currency or the runtime budget.currency setting. Frontend: DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency.
Field naming: no _usd suffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare .toLocaleString() / .toLocaleDateString() / .toLocaleTimeString(). Use helpers in @/utils/format which read getLocale() from @/utils/locale. The backend has no operator-tunable locale setting; backend Intl formatting uses the system locale plus the browser timezone. The company.name_locales list controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render via Intl without passing timeZone (browser tz wins).
Date / number format: always via Intl; no hand-rolled templates.

Files:

  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/client/test_continuous_lifecycle.py
  • src/synthorg/api/controllers/simulations.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Mock-spec gate (#1604): every Mock() / AsyncMock() / MagicMock() in tests/ MUST declare the interface it stands for via spec=ConcreteClass (Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen in scripts/mock_spec_baseline.txt. Regenerate the baseline only via uv run python scripts/check_mock_spec.py --update and commit the diff so the change is reviewable. Without spec= the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from #1604).
Shared mocks: tests/conftest.py exposes mock_dispatcher (an AsyncMock(spec=NotificationDispatcher) covering register / start / aclose / dispatch); use it instead of building the spec inline.
Time-driven tests: import FakeClock from tests._shared.fake_clock (NOT from any rollout-subsystem path) and inject it into the class under test. FakeClock.sleep advances virtual time AND yields once via asyncio.sleep(0) so cancellation on awaiting tasks propagates the same way it does under SystemClock. For tests that need to drive cooperative tasks waiting on the loop, await clock.advance_async(seconds). FakeClock-first: patch time.monotonic() / asyncio.sleep() globals only for legacy code paths that don't have a Clock seam (see ## Code Conventions for the legacy-Callable list); when the class under test accepts a clock= parameter, always inject FakeClock rather than monkey-patching globals.
Async: asyncio_mode = "auto"; no manual @pytest.mark.asyncio needed.
Timeout: 30 seconds per test (global in pyproject.toml; do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed).
Parametrize: Prefer @pytest.mark.parametrize for...

Files:

  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Monetary models: every cost-bearing Pydantic model carries currency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raises MixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/ is the only place that may import aiosqlite, sqlite3, psycopg, or psycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced by scripts/check_persistence_boundary.py (pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g. ArtifactService, WorkflowService, MemoryService, CustomRulesService, UserService, ProjectService, SsrfViolationService, SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced by scripts/check_persistence_boundary.py).
Per-line opt-out: # lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved through SettingsService / ConfigResolver. First-cold-read emits one INFO settings.value.resolved carrying source + yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered with read_only_post_init=True for /settings discoverability; SettingsService.set() raises SettingReadOnlyError).
Direct os.environ.get(...) reads in application code outside startup are forbidden. New settings register in src/synthorg/settings/definitions/<namespace>.py and are consumed via ConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...

Files:

  • src/synthorg/api/controllers/simulations.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/api/controllers/simulations.py
src/synthorg/!(observability)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/!(observability)/**/*.py: Every business-logic module has from synthorg.observability import get_logger then logger = get_logger(__name__). Variable name is always logger. Carve-outs (e.g. meta/mcp/handlers/common_logging.py keying at a fixed string) are documented in the module docstring.
Never use import logging / logging.getLogger() / print() in application code (exception: observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py for handler-construction / bootstrap code).

Files:

  • src/synthorg/api/controllers/simulations.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceed api.ws_frame_timeout_seconds (default 30s) without sending a frame. Wraps socket.receive_text() in asyncio.wait_for(...) so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a _SlidingWindowRateLimiter (api.ws_revalidation_window_seconds default 60s, api.ws_revalidation_max_failures default 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.

Files:

  • src/synthorg/api/controllers/simulations.py
🔇 Additional comments (9)
tests/unit/client/test_continuous_lifecycle.py (3)

3-12: Docstring now matches ContinuousMode.start() lock semantics.

The lock scope and _running transition behavior are described precisely and align with the current implementation.


26-34: Good shift to event-driven synchronization in lifecycle tests.

Using runner.ready.wait() instead of scheduler-timing sleeps makes these assertions deterministic and less flaky under CI contention.

Also applies to: 84-92, 123-127


70-98: Lifecycle coverage is solid across conflict, disabled path, and cooperative stop.

These tests exercise the important control-flow branches and validate both behavior and state (runs_completed) without leaking tasks.

Also applies to: 99-107, 109-131

src/synthorg/api/controllers/simulations.py (4)

115-139: LGTM!

The callback ordering logic is correctly implemented—attaching the exception logger before the discard callback ensures fast-completing failures are surfaced. The docstring clearly explains this subtlety.


141-174: LGTM!

The rollback helper correctly sequences: cancel → drain → unregister. Drain failures are logged without blocking the unregister, preventing orphaned claims. This addresses the past concern about leaving tasks running after claim removal.


344-358: LGTM!

The atomic registration via register_if_absent with proper 409 response for duplicates closes the idempotency gap. The detailed comment explaining the JetStream redelivery / HTTP retry scenario provides excellent context for future maintainers.


409-447: LGTM!

The rollback handling correctly addresses the previous review concerns:

  • Logs the original exception with context before rollback (lines 434-441)
  • Cancels/drains the spawned task before unregistering via _rollback_register_if_absent
  • The spawned_task boolean in the log aids diagnosis of pre- vs post-spawn failures

The MemoryError, RecursionError passthrough (PEP 758) correctly propagates system errors without masking them.

tests/unit/hr/pruning/test_service_lifecycle.py (1)

1-135: LGTM!

The test suite comprehensively validates the canonical lifecycle pattern:

  • test_concurrent_starts_spawn_one_task: Correctly patches asyncio.create_task to count spawns and asserts exactly one task is created under concurrent start() calls.
  • test_restart_after_clean_stop: Properly asserts service.is_running after the second start() to verify restart took effect.
  • test_unrestartable_after_drain_timeout: Uses asyncio.Event().wait() for deterministic blocking (per guidelines), captures saved_task before assertions, and uses try/finally to ensure cleanup even if assertions fail.

Past review feedback has been addressed.

tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py (1)

21-31: LGTM on mock-spec compliance and lifecycle test patterns.

All AsyncMock usages properly specify spec=SnapshotBuilder per the mock-spec gate requirement. The tests correctly use asyncio.Event().wait() for indefinite blocking and try/finally for deterministic cleanup, addressing all prior review feedback.

Also applies to: 37-85, 96-138

Comment thread tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
Aureliolo added 4 commits May 2, 2026 18:13
Source code:

- simulations.py: roll back register_if_absent claim if post-claim setup raises (publish/spawn/callback registration); add SimulationStore.unregister

- backup/scheduler.py: add log_task_exceptions done_callback so silent _run_loop deaths are surfaced via BACKUP_FAILED

- escalation/factory.py: log API_APP_STARTUP warning with config value + registered keys before raising ValueError on unknown backend / strategy

- ngrok_adapter.py: read NGROK_AUTHTOKEN at __init__ time per CLAUDE.md bootstrap-secret exception (no os.environ at runtime)

- budget/trends.py: filter records to [start, end) before _assert_single_currency so out-of-window mixed-currency rows do not reject valid partial-range queries

Tests:

- test_backup.py: == not is for ApiResponse.data; assert_not_called for confirm=false gate

- test_continuous_lifecycle: Event-driven runner, drop wall-clock asyncio.sleep gates

- test_ngrok_adapter_lifecycle: replace swallowing fake disconnect with strict MagicMock(spec=ngrok.disconnect) + assert_not_called on stop-without-start path

- test_replay_protection_threadsafety: threading.Barrier across all 3 ThreadPoolExecutor blocks so workers all hit the critical section simultaneously

- test_monitor_lifecycle / test_pruning_service_lifecycle: asyncio.Event gate around hung loops; release+await task after timeout assertion (no leaked tasks)

- test_backup_subscriber: assert_not_called instead of assert_not_awaited on scheduler.stop (catches unawaited coroutine creation)
Source code:

- simulations.py: cancel + drain spawned runner_task before unregister on rollback path; extract _attach_runner_callbacks and _rollback_register_if_absent helpers

- budget/trends.py: docstring matches in_window_records check (currency error fires on the post-filter set, not raw input)

Tests:

- test_continuous_lifecycle: module docstring reflects actual lock semantics (released before run loop body)

- test_pruning_service_lifecycle: patch asyncio.create_task and assert exactly one spawn under concurrent start() callers

- test_replay_protection: threading.Barrier.wait(timeout=5) on all three pools so a stalled worker fails fast instead of hanging the suite

- test_monitor_lifecycle: concurrent-starts now uses a blocking builder.build (asyncio.Event) and counts asyncio.create_task spawns to assert exactly one task
Source code:

- simulations.py: capture and log original exception in except BaseException before _rollback_register_if_absent + re-raise; without this the rollback drain log was the only trace of a failed start

Tests:

- test_continuous_lifecycle: reword sequencing comment to drop misleading 'lock still held' claim (lock released before run loop body; ready.wait() only proves run() has entered)

- test_pruning_service_lifecycle.test_restart_after_clean_stop: assert is_running is True after the second start() so a silent restart no-op cannot pass

- test_pruning_service_lifecycle.test_unrestartable_after_drain_timeout / test_monitor_lifecycle.test_unrestartable_after_drain_timeout: wrap timeout/assertion blocks in try/finally so release.set() + saved_task await always run, even when an assertion above raises (no orphan task leaks into next test)
- test_monitor_lifecycle.test_restart_after_clean_stop: assert monitor._task is not None after the second start() so a silent restart no-op cannot pass the test (mirrors the PruningService equivalent)
@Aureliolo Aureliolo force-pushed the fix/audit-cleanup-c-persistence branch from 1463bd9 to 03887ab Compare May 2, 2026 16:15
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 2, 2026 16:18 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/api/controllers/backup.py`:
- Around line 88-107: Change the idempotency_key parameter type from plain str
to NotBlankStr to ensure whitespace-only Idempotency-Key header values are
rejected at parameter parsing; update the Annotated type for idempotency_key in
the function signature (the parameter named idempotency_key) and add the
necessary import for NotBlankStr from core.types so the validation runs before
any downstream logic that currently re-wraps the value.
- Around line 143-159: The cached idempotency result (outcome.result) is assumed
to always validate to BackupManifest but a corrupt/stale payload can make
BackupManifest.model_validate(...) raise and bypass the BACKUP_FAILED logging;
wrap the call to BackupManifest.model_validate(outcome.result) in a try/except
that catches the pydantic/validation error (and any deserialization error), log
an ERROR/WARNING using BACKUP_FAILED with context (scope="backup",
idempotency_key, endpoint="backup.create") and then raise an appropriate
InternalError/ServiceError (instead of letting the raw validation exception
leak); place this handling immediately after obtaining outcome from
app_state.idempotency_service.run_idempotent so the failure path from cached
manifests is translated and logged.

In `@src/synthorg/backup/scheduler.py`:
- Around line 57-67: The code raises a bare RuntimeError when self._stop_failed
in BackupScheduler.start(); define a domain-specific exception (e.g.,
BackupUnrestartableError) that inherits from DomainError and replace the
RuntimeError raise with raising this new exception, keeping the existing
BACKUP_FAILED logger call and message; ensure the new class follows the naming
pattern <Domain><Condition>Error and is exported/used by callers so the
backup-domain exception family flows through EXCEPTION_HANDLERS.

In `@src/synthorg/client/store.py`:
- Around line 194-205: unregister currently unconditionally deletes whatever is
at _runs[simulation_id]; change it to a compare-and-delete so it only removes
the entry when it still equals the originally claimed snapshot: add an optional
parameter expected (e.g. expected: Optional[RegistrationRecord]=None) to
unregister, and inside the async with self._lock check current =
self._runs.get(simulation_id) and only pop(simulation_id) and return True if
current is expected (or if expected is None preserve existing behavior),
otherwise return False; update the rollback caller in
start_simulation/register_if_absent to pass the originally claimed record as
expected so you don’t delete a later real run.

In `@src/synthorg/communication/event_stream/stream.py`:
- Around line 199-200: Remove the redundant unreachable early return that checks
"if not queues_snapshot: return" (the second check after the lock) — the
empty-queues case is already handled inside the lock when the snapshot is
created, so delete this dead branch referencing queues_snapshot to avoid
unreachable code in the event stream logic.

In `@src/synthorg/hr/pruning/service.py`:
- Around line 143-153: Replace the bare RuntimeError raised in
PruningService.start() when self._stop_failed is true with a pruning-domain
error: define a new DomainError subclass named PruningUnrestartableError
(following the <Domain><Condition>Error pattern) that inherits from DomainError,
import it into service.py, and raise PruningUnrestartableError(msg) instead of
RuntimeError(msg); keep the existing logger.warning call but ensure the raised
exception is the new pruning-scoped error so it flows through the centralized
EXCEPTION_HANDLERS and domain error handling.

In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 123-131: The adapter assigns self._tunnel before materializing the
tunnel.public_url, which can leave the object half-started if converting
public_url raises; update the start logic in ngrok_adapter (the async block that
calls ngrok.connect and assigns tunnel/self._tunnel) to first compute/convert
the public URL from the returned tunnel (e.g., materialize public_url into a
local variable or set self._public_url) and only after that assign self._tunnel
and publish state; additionally wrap the conversion in a try/except to
close/disconnect the newly created tunnel on failure so stop() won't later try
to disconnect a None/second tunnel.

In `@src/synthorg/integrations/webhooks/replay_protection.py`:
- Around line 163-166: The code samples the clock twice causing a boundary
replay gap; fix by sampling now once in check() and using that same now for
freshness and nonce eviction: add a helper or overload (e.g. implement
_check_freshness_at(timestamp, now) or extend check_freshness to accept an
optional now), change check() to do now = self._clock.now().timestamp() once,
call self._check_freshness_at(timestamp, now) (or
self.check_freshness(timestamp, now)) and then call
self._check_nonce(nonce=nonce, now=now); update any internal callers to use the
new helper/signature so all freshness and nonce decisions use the identical
timestamp.

In `@src/synthorg/meta/chief_of_staff/monitor.py`:
- Around line 80-90: Replace the bare RuntimeError raised in
OrgInflectionMonitor when self._stop_failed is true with a domain-scoped
lifecycle error: define a new exception class named
InflectionMonitorLifecycleError (or OrgInflectionLifecycleError) that subclasses
DomainError, import DomainError into this module, and raise that new class with
the same msg; keep the existing logger.warning call (using
COS_INFLECTION_CHECK_FAILED) but ensure the raised exception type is the new
InflectionMonitorLifecycleError so the failure participates in the domain error
hierarchy and centralized EXCEPTION_HANDLERS routing.

In `@src/synthorg/providers/health_prober.py`:
- Around line 193-204: Replace the bare RuntimeError raised in
ProviderHealthProber when self._stop_failed is true with a provider-scoped
domain error: define a new error class (e.g. ProviderLifecycleConflictError)
that inherits from DomainError following the "<Domain><Condition>Error" naming
convention, import that class into health_prober.py, and raise
ProviderLifecycleConflictError(msg) instead of RuntimeError(msg); keep the
existing logger call (PROVIDER_HEALTH_PROBER_CYCLE_FAILED) and message intact so
the domain error is typed consistently across API/controller boundaries.

In `@src/synthorg/settings/subscribers/backup_subscriber.py`:
- Around line 112-113: Wrap the await scheduler.start() call in a try/except
inside the subscriber so any exception from BackupScheduler.start() is logged
with context before being allowed to propagate; catch Exception as exc, call the
module logger with structured fields error_type=type(exc).__name__ and
error=safe_error_description(exc) plus context like subscriber name/setting and
scheduler id, then re-raise (or translate if needed) so the failure still
surfaces. Ensure you reference the existing scheduler.start() call and use the
safe_error_description helper when logging.

In `@tests/unit/backup/test_scheduler_lifecycle.py`:
- Around line 55-68: The test's hung_loop implementation uses a fixed
asyncio.sleep(1.0) in the CancelledError path making the test slow/flaky; update
hung_loop (the patched BackupScheduler._run_loop) to use an asyncio.Event
created outside the coroutine and await event.wait() in the except/finally path
so the test can release the event deterministically (from the test body) before
asserting scheduler.stop() raises TimeoutError and checking
scheduler._stop_failed; ensure the Event is set in a finally block or by the
test to unblock the hung task rather than sleeping.
- Around line 30-39: The test test_concurrent_starts_spawn_one_task currently
only checks scheduler._task is not None which misses multiple creations; replace
this with a deterministic assertion that only one underlying task was created by
spying/mocking asyncio.create_task during the concurrent scheduler.start() calls
(or instrumenting scheduler to append created tasks to a list) and then assert
that the spy was called exactly once and that scheduler._task is the single
created task; target the scheduler.start() calls and scheduler._task symbol for
locating where to add the create_task spy/collector and the exact-one assertion.

In `@tests/unit/budget/test_trends_currency.py`:
- Around line 61-83: Add a regression test that ensures bucket_cost_records does
NOT raise MixedCurrencyAggregationError when mixed-currency records are all
outside the requested time window: create a new test (e.g.,
test_out_of_window_mixed_currency_no_error) that builds mixed-currency records
with timestamps outside [start, end) using the existing _record helper and _NOW
constant, call bucket_cost_records with that start/end and BucketSize.HOUR, and
assert it returns points (or zeros) without raising; follow the style of
test_mixed_currency_raises/test_empty_records_no_error and ensure you do not
change bucket_cost_records itself or other tests.
- Around line 50-51: Tests are hardcoding ISO codes ("EUR"/"USD") in
fixtures/assertions; import and use the project's Currency type/constant instead
(e.g., Currency.EUR or the project's equivalent factory) and replace the string
literals in the _record calls and all other occurrences in the test (references:
_record, the test_trends_currency test cases) so the tests derive currency
values from the shared Currency type rather than literal strings; ensure you add
the necessary import for Currency at the top of the test file and update every
"EUR"/"USD" occurrence indicated to use the Currency constant/type.

In `@tests/unit/providers/test_health_prober_lifecycle.py`:
- Around line 78-102: The patched ProviderHealthProber._run_loop in the test
suppresses cancellation and uses asyncio.sleep(1.0), leaving the background task
alive after the TimeoutError and leaking tasks; change the hung_loop to wait on
an asyncio.Event (e.g., cancel_started and a new release event), set
cancel_started on CancelledError, then in test cleanup set the release event and
await prober._task to ensure the background task finishes; specifically update
the patched _run_loop used in the with patch.object(ProviderHealthProber,
"_run_loop", hung_loop) block to gate the post-cancel wait on the release event,
call release.set() after the TimeoutError assertion, and await prober._task
before asserting the prober is unrestartable.

In `@web/src/api/endpoints/clients.ts`:
- Around line 273-283: When catching the 409 in the create path, don't
unconditionally return getSimulation(config.simulation_id); instead fetch the
existing run (via getSimulation(config.simulation_id)), compare its stored
config object to the requested config (use a deep-equality check such as
lodash/isEqual or an equivalent deepEqual helper) and only return the existing
run if the configs are identical; if they differ, rethrow the original err so
the 409 surfaces. Ensure you still detect axios.isAxiosError(err) &&
err.response?.status === 409, preserve the original error variable for rethrow,
and perform the comparison against the same fields used to create the simulation
(config) before returning.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4638b438-f9e6-4733-9fac-75efd88ff139

📥 Commits

Reviewing files that changed from the base of the PR and between 1463bd9 and 03887ab.

📒 Files selected for processing (48)
  • docs/design/backup.md
  • docs/design/client-simulation.md
  • docs/design/observability.md
  • docs/licensing.md
  • docs/reference/lifecycle-sync.md
  • docs/research/lgpl-postgres-driver-decision.md
  • scripts/mock_spec_baseline.txt
  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/backup/service.py
  • src/synthorg/budget/trends.py
  • src/synthorg/client/continuous.py
  • src/synthorg/client/store.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/observability/events/event_stream.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/tools/mcp/cache.py
  • tests/integration/api/controllers/test_client_simulation.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/api/controllers/test_backup.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/backup/test_scheduler.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/budget/test_trends_currency.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/communication/event_stream/test_stream_dedup.py
  • tests/unit/hr/pruning/test_service.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • web/src/api/endpoints/backup.ts
  • web/src/api/endpoints/clients.ts
💤 Files with no reviewable changes (1)
  • scripts/mock_spec_baseline.txt
📜 Review details
🧰 Additional context used
📓 Path-based instructions (12)
**/*.{py,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend: DEFAULT_CURRENCY from synthorg.budget.currency or the runtime budget.currency setting. Frontend: DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency.
Field naming: no _usd suffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare .toLocaleString() / .toLocaleDateString() / .toLocaleTimeString(). Use helpers in @/utils/format which read getLocale() from @/utils/locale. The backend has no operator-tunable locale setting; backend Intl formatting uses the system locale plus the browser timezone. The company.name_locales list controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render via Intl without passing timeZone (browser tz wins).
Date / number format: always via Intl; no hand-rolled templates.

Files:

  • src/synthorg/settings/subscribers/backup_subscriber.py
  • web/src/api/endpoints/backup.ts
  • src/synthorg/observability/events/event_stream.py
  • web/src/api/endpoints/clients.ts
  • src/synthorg/backup/service.py
  • src/synthorg/client/store.py
  • tests/integration/api/controllers/test_client_simulation.py
  • src/synthorg/budget/trends.py
  • tests/unit/backup/test_scheduler.py
  • src/synthorg/tools/mcp/cache.py
  • src/synthorg/observability/events/api.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/providers/health_prober.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/budget/test_trends_currency.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/client/continuous.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • tests/unit/hr/pruning/test_service.py
  • tests/unit/communication/event_stream/test_stream_dedup.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • tests/unit/api/controllers/test_backup.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
  • src/synthorg/hr/pruning/service.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Monetary models: every cost-bearing Pydantic model carries currency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raises MixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/ is the only place that may import aiosqlite, sqlite3, psycopg, or psycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced by scripts/check_persistence_boundary.py (pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g. ArtifactService, WorkflowService, MemoryService, CustomRulesService, UserService, ProjectService, SsrfViolationService, SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced by scripts/check_persistence_boundary.py).
Per-line opt-out: # lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved through SettingsService / ConfigResolver. First-cold-read emits one INFO settings.value.resolved carrying source + yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered with read_only_post_init=True for /settings discoverability; SettingsService.set() raises SettingReadOnlyError).
Direct os.environ.get(...) reads in application code outside startup are forbidden. New settings register in src/synthorg/settings/definitions/<namespace>.py and are consumed via ConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...

Files:

  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/observability/events/event_stream.py
  • src/synthorg/backup/service.py
  • src/synthorg/client/store.py
  • src/synthorg/budget/trends.py
  • src/synthorg/tools/mcp/cache.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/client/continuous.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/hr/pruning/service.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/observability/events/event_stream.py
  • src/synthorg/backup/service.py
  • src/synthorg/client/store.py
  • src/synthorg/budget/trends.py
  • src/synthorg/tools/mcp/cache.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/client/continuous.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/hr/pruning/service.py
src/synthorg/!(api)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Domain error class naming: error classes in domain modules use <Domain><Condition>Error and inherit from DomainError (or a domain-scoped intermediate that itself inherits DomainError). Bare Exception / RuntimeError at domain boundaries is forbidden; domain errors flow through EXCEPTION_HANDLERS for centralised RFC 9457 routing.

Files:

  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/observability/events/event_stream.py
  • src/synthorg/backup/service.py
  • src/synthorg/client/store.py
  • src/synthorg/budget/trends.py
  • src/synthorg/tools/mcp/cache.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/client/continuous.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/hr/pruning/service.py
src/synthorg/!(observability)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/!(observability)/**/*.py: Every business-logic module has from synthorg.observability import get_logger then logger = get_logger(__name__). Variable name is always logger. Carve-outs (e.g. meta/mcp/handlers/common_logging.py keying at a fixed string) are documented in the module docstring.
Never use import logging / logging.getLogger() / print() in application code (exception: observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py for handler-construction / bootstrap code).

Files:

  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/backup/service.py
  • src/synthorg/client/store.py
  • src/synthorg/budget/trends.py
  • src/synthorg/tools/mcp/cache.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/client/continuous.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/communication/conflict_resolution/escalation/sweeper.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/communication/conflict_resolution/escalation/factory.py
  • src/synthorg/hr/pruning/service.py
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Units: metric only. Spelling: International / British English UI default (colour, behaviour, organise, centred, analyse, cancelled); document deviations. Spelling here is an editorial / UI-copy decision only; it does not affect runtime locale-sensitive formatting. Numbers, dates, times, currencies, and units still resolve via the user / company / browser / system fallback through @/utils/format, @/utils/locale, DEFAULT_CURRENCY, and useSettingsStore().currency, with no contradiction to the locale-neutral defaults above.

Files:

  • web/src/api/endpoints/backup.ts
  • web/src/api/endpoints/clients.ts
web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse components from web/src/components/ui/ before creating new ones. Never hardcode hex colors, font-family, pixel spacing, Motion transitions, or BCP 47 locale strings; use design tokens, @/lib/motion presets, and the helpers in @/utils/format. Enforced by scripts/check_web_design_system.py (PostToolUse hook on every web/src/ edit).

web/src/**/*.{ts,tsx}: Always use createLogger from @/lib/logger; never use bare console.warn, console.error, or console.debug in application code
Logger variable name must always be log (e.g., const log = createLogger('module-name'))
Use logger levels: log.debug() (DEV-only, stripped in production), log.warn(), log.error()
Pass dynamic/untrusted values as separate arguments to logger (not interpolated into the message string) so they go through sanitizeArg
Wrap attacker-controlled fields inside structured objects with sanitizeForLog() before embedding in logs
Callers MUST NOT wrap store mutation calls in try / catch; the store owns the error UX
Display counts in paginated lists come from data.length; the wire envelope no longer carries total
Any new caller of health/readiness endpoints must handle the 503 path explicitly
Use sanitizeWsString() and sanitizeWsEnum() from web/src/utils/ws-sanitize.ts (pure helpers, re-exported from @/stores/notifications) for WebSocket payload sanitization
Any new WS payload handler that ingests untrusted strings MUST route through sanitizeWsString() or sanitizeWsEnum(); raw (sanitizeWsString(x, n) ?? '') as EnumType casts are forbidden
NEVER hardcode Motion transition durations; use @/lib/motion presets instead
NEVER hardcode BCP 47 locale literals ('en-US'); use helpers from @/utils/format instead
NEVER hardcode currency symbols / codes; use DEFAULT_CURRENCY from @/utils/currencies instead
Use <StatusBadge> instead of inline status dots (defaults to role="img" with aria-label; decorative for adjacent-labeled, announce for liv...

Files:

  • web/src/api/endpoints/backup.ts
  • web/src/api/endpoints/clients.ts
web/src/api/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

getLiveness() is always 200 while the process is alive; getReadiness() is 200 healthy / 503 unavailable (binary 'ok' | 'unavailable' outcome, no tri-state)

Files:

  • web/src/api/endpoints/backup.ts
  • web/src/api/endpoints/clients.ts
web/src/**/*.{ts,tsx,css}

📄 CodeRabbit inference engine (web/CLAUDE.md)

NEVER hardcode hex colors, font-family declarations, or pixel spacing; use design tokens instead

Files:

  • web/src/api/endpoints/backup.ts
  • web/src/api/endpoints/clients.ts
docs/design/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Use D2 (\``d2): architecture diagrams, nested container layouts, complex entity relationships. Rendered at build time via mkdocs-d2-plugin(dagre layout). Requires the [D2 CLI](https://d2lang.com/tour/install) onPATHlocally and in CI (pinned to v0.7.1 via.github/workflows/pages.yml`).

Files:

  • docs/design/observability.md
  • docs/design/client-simulation.md
  • docs/design/backup.md
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

docs/**/*.md: Use Mermaid (\``mermaid): flowcharts, sequence diagrams, simple hierarchies, pipelines. Rendered client-side via pymdownx.superfences. Use **Markdown tables**: grid/matrix data that is semantically tabular (not diagrams). D2 uses theme 200 (Dark Mauve), dark-only render, configured globally in mkdocs.yml. Never use ```text` blocks with ASCII/Unicode box-drawing characters for diagrams.

Files:

  • docs/design/observability.md
  • docs/design/client-simulation.md
  • docs/design/backup.md
  • docs/licensing.md
  • docs/reference/lifecycle-sync.md
  • docs/research/lgpl-postgres-driver-decision.md
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Mock-spec gate (#1604): every Mock() / AsyncMock() / MagicMock() in tests/ MUST declare the interface it stands for via spec=ConcreteClass (Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen in scripts/mock_spec_baseline.txt. Regenerate the baseline only via uv run python scripts/check_mock_spec.py --update and commit the diff so the change is reviewable. Without spec= the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from #1604).
Shared mocks: tests/conftest.py exposes mock_dispatcher (an AsyncMock(spec=NotificationDispatcher) covering register / start / aclose / dispatch); use it instead of building the spec inline.
Time-driven tests: import FakeClock from tests._shared.fake_clock (NOT from any rollout-subsystem path) and inject it into the class under test. FakeClock.sleep advances virtual time AND yields once via asyncio.sleep(0) so cancellation on awaiting tasks propagates the same way it does under SystemClock. For tests that need to drive cooperative tasks waiting on the loop, await clock.advance_async(seconds). FakeClock-first: patch time.monotonic() / asyncio.sleep() globals only for legacy code paths that don't have a Clock seam (see ## Code Conventions for the legacy-Callable list); when the class under test accepts a clock= parameter, always inject FakeClock rather than monkey-patching globals.
Async: asyncio_mode = "auto"; no manual @pytest.mark.asyncio needed.
Timeout: 30 seconds per test (global in pyproject.toml; do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed).
Parametrize: Prefer @pytest.mark.parametrize for...

Files:

  • tests/integration/api/controllers/test_client_simulation.py
  • tests/unit/backup/test_scheduler.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/budget/test_trends_currency.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • tests/unit/hr/pruning/test_service.py
  • tests/unit/communication/event_stream/test_stream_dedup.py
  • tests/unit/api/controllers/test_backup.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/integration/api/controllers/test_client_simulation.py
  • tests/unit/backup/test_scheduler.py
  • tests/unit/tools/mcp/test_cache_threadsafety.py
  • tests/unit/api/controllers/test_backup_required_idempotency.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/api/auth/test_ticket_store_threadsafety.py
  • tests/unit/client/test_continuous_lifecycle.py
  • tests/unit/settings/test_backup_subscriber.py
  • tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py
  • tests/unit/integrations/test_replay_protection_threadsafety.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/budget/test_trends_currency.py
  • tests/unit/api/controllers/test_simulations_idempotency.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • tests/unit/hr/pruning/test_service.py
  • tests/unit/communication/event_stream/test_stream_dedup.py
  • tests/unit/api/controllers/test_backup.py
  • tests/unit/integrations/test_ngrok_adapter_lifecycle.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceed api.ws_frame_timeout_seconds (default 30s) without sending a frame. Wraps socket.receive_text() in asyncio.wait_for(...) so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a _SlidingWindowRateLimiter (api.ws_revalidation_window_seconds default 60s, api.ws_revalidation_max_failures default 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.

Files:

  • src/synthorg/api/controllers/backup.py
  • src/synthorg/api/auth/ticket_store.py
  • src/synthorg/api/controllers/simulations.py
🪛 LanguageTool
docs/design/observability.md

[grammar] ~205-~205: Ensure spelling is correct
Context: ...-window TTL (default 60s). The hub keys dedup on event.id; identical ids within the...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (24)
src/synthorg/observability/events/api.py (1)

80-80: Event constant addition is consistent and correct.

API_WS_TICKET_LIMIT_EXCEEDED follows the existing API_WS_TICKET_* naming/key convention and keeps the event catalog centralized.

docs/research/lgpl-postgres-driver-decision.md (1)

24-29: Decision matrix formatting is correct and guideline-compliant.

Using a Markdown table for the option trade-off section is the right format and keeps the ADR easy to scan.

As per coding guidelines, “Use Markdown tables: grid/matrix data that is semantically tabular (not diagrams).”

docs/licensing.md (1)

112-122: Third-party license section is clear and well-grounded.

The new section cleanly scopes default vs optional install obligations and points to the decision record for traceability.

src/synthorg/communication/conflict_resolution/escalation/factory.py (2)

150-169: Defensive fallback behavior is solid.

Unknown backend/strategy paths now log contextual startup warnings (including registered keys) before raising, which is exactly what operators need during misconfiguration triage.

Also applies to: 267-282


122-129: Registry refactor is clean and immutable.

Using MappingProxyType for dispatch tables keeps the extension point explicit and prevents accidental runtime mutation.

Also applies to: 242-250

tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py (2)

85-98: Good cross_instance_notify="auto" regression coverage.

This explicitly protects the shared {"auto","on"} branch from narrowing regressions.


139-160: Fallback tests are well tightened.

Asserting registry option names in the ValueError messages validates the operator-facing diagnostics, not just exception type.

src/synthorg/tools/mcp/cache.py (4)

58-58: LGTM!

Adding threading.Lock for thread-safety is appropriate for a cache accessed from thread pools.


78-105: LGTM!

The lock correctly guards the read-decision-write sequence (TTL check, deletion, move_to_end). Logging/metrics are wisely moved outside the lock to minimize hold time. The deepcopy on the local hit reference outside the lock is safe since hit holds a valid reference regardless of concurrent cache mutations.


124-142: LGTM!

The lock correctly guards the entire put sequence (delete existing → evict oldest → insert). The eviction loop handles edge cases properly, including max_size=0 (disables caching). Logging evicted keys outside the lock minimizes contention.


154-161: LGTM!

Lock guards all invalidation paths. Building the key list before deletion avoids "dictionary changed size during iteration" errors.

tests/unit/tools/mcp/test_cache_threadsafety.py (3)

22-47: LGTM!

The test correctly exercises the cache hit path under contention by seeding a shared key and having readers query it while writers update it concurrently. The implicit "no exception raised" assertion is appropriate for thread-safety verification.


49-67: LGTM!

Good coverage for concurrent put + tool-specific invalidate operations. The mix of writers and invalidators stresses the lock-guarded iteration logic.


69-80: LGTM!

The test effectively stresses eviction under high concurrency with many more unique keys than max_size. Asserting <= max_size correctly validates the bounded-cache invariant.

src/synthorg/budget/trends.py (1)

244-255: Currency invariant is enforced at the right aggregation boundaries.

Filtering first (Line 251) and then validating currency uniformity, plus enforcing it in project_daily_spend (Line 469), correctly prevents mixed-currency sums without rejecting irrelevant out-of-window rows.

Also applies to: 469-470

src/synthorg/api/auth/ticket_store.py (1)

8-14: Lock scope and rejection logging look correct.

The updated lock coverage closes the count-and-insert, pop-and-validate, and snapshot/delete race windows, and the cap-rejection path now logs before raising. This matches the audit goal cleanly.

Also applies to: 26-40, 118-241

tests/unit/api/auth/test_ticket_store_threadsafety.py (1)

32-124: The new contention tests exercise the right invariants.

Using a shared start gate plus threadpool workers gives these tests meaningful concurrency pressure, and the four cases map well to the store's cap, isolation, single-consume, and cleanup guarantees.

src/synthorg/observability/events/event_stream.py (1)

20-20: LGTM!

The new event constant follows the established naming convention (<domain>.<noun>.<verb>) and is consistent with the existing EVENT_STREAM_HUB_* constants in this module.

src/synthorg/communication/event_stream/stream.py (4)

69-107: LGTM!

The constructor properly validates inputs fail-fast (preventing runtime crashes in _record_published_locked), implements the Clock seam pattern for testability, and uses __slots__ for memory efficiency. The validation bounds are correct: max_queue_size >= 1, dedup_ttl_seconds >= 0 (allowing 0 to disable TTL-based eviction), and dedup_max_entries_per_session >= 1.


139-155: LGTM!

The cleanup logic correctly removes per-session dedup state when the last subscriber for that session unsubscribes, preventing memory leaks for long-lived hubs with session churn. Using pop(session_id, None) handles sessions that never had any published events gracefully.


212-239: LGTM!

The duplicate-check logic correctly handles the dedup_ttl_seconds=0 case by skipping time-based eviction entirely (entries only fall out via size-bound trimming). The eviction loop properly walks oldest-first thanks to OrderedDict insertion order, and the cleanup at lines 236-238 prevents empty dicts from accumulating.


241-255: LGTM!

The bounded recording logic is correct: insert the new entry, then trim oldest entries via popitem(last=False) until within bounds. The constructor's validation of dedup_max_entries_per_session >= 1 ensures the trim loop never calls popitem() on an empty dict.

tests/unit/communication/event_stream/test_stream_dedup.py (1)

1-164: LGTM!

Comprehensive test coverage for the dedup functionality:

  • Duplicate suppression within TTL
  • Redelivery after TTL expiry
  • Distinct event IDs all delivered
  • Per-session dedup isolation
  • Bounded window size enforcement
  • Cleanup on last unsubscribe
  • Partial unsubscribe preserves dedup
  • dedup_ttl_seconds=0 disables time-based eviction

The tests follow guidelines: FakeClock injection via the clock= parameter (FakeClock-first), pytestmark for unit marker, and no mocks needed since real objects are used.

docs/design/observability.md (1)

200-206: LGTM!

The documentation accurately describes both event-stream hub warning events, including the new EVENT_STREAM_HUB_PUBLISH_DEDUPED. The description correctly captures the per-session sliding-window behavior, default TTL (60s), and bounded size (1024 entries). The static analysis hint about "dedup" is a false positive—it's standard technical shorthand.

Comment thread src/synthorg/api/controllers/backup.py
Comment thread src/synthorg/api/controllers/backup.py Outdated
Comment thread src/synthorg/backup/scheduler.py Outdated
Comment thread src/synthorg/client/store.py Outdated
Comment thread src/synthorg/communication/event_stream/stream.py Outdated
Comment thread tests/unit/backup/test_scheduler_lifecycle.py Outdated
Comment thread tests/unit/budget/test_trends_currency.py Outdated
Comment thread tests/unit/budget/test_trends_currency.py
Comment thread tests/unit/providers/test_health_prober_lifecycle.py
Comment thread web/src/api/endpoints/clients.ts
Domain errors (CLAUDE.md naming convention):

- BackupUnrestartableError (backup/errors.py) replaces RuntimeError in BackupScheduler.start()

- PruningUnrestartableError (hr/errors.py) replaces RuntimeError in PruningService.start()

- InflectionMonitorLifecycleError (meta/chief_of_staff/monitor.py) replaces RuntimeError in OrgInflectionMonitor.start()

- ProviderLifecycleConflictError (providers/errors.py) replaces RuntimeError in ProviderHealthProber.start()

Source code:

- backup.py controller: idempotency_key str -> NotBlankStr; cached BackupManifest.model_validate wrapped in try/except + BACKUP_FAILED log + InternalServerException

- SimulationStore.unregister: compare-and-delete via expected= snapshot so a fresh retry winning the slot is preserved instead of being silently deleted by the loser's rollback

- event_stream/stream.py: drop dead post-lock if-not-queues branch (already handled inside lock)

- ngrok_adapter: materialise public_url BEFORE assigning _tunnel; cleanup-disconnect on conversion failure

- replay_protection: sample clock once per check() and pass the same now to freshness + nonce eviction (closes boundary replay gap)

- backup_subscriber: try/except around scheduler.start() with structured log + re-raise

Tests:

- test_scheduler_lifecycle: spawn-count assertion + Event-driven hung loop with finally cleanup

- test_trends_currency: derive currencies from DEFAULT_CURRENCY (validator-allowlist forces a real ISO code for the alternate); add out-of-window mixed-currency regression test

- test_health_prober_lifecycle: Event-driven hung loop with finally release+await

- All four lifecycle tests: pytest.raises now expects the new domain-error class instead of RuntimeError

Frontend:

- clients.ts startSimulation 409 path: deep-compare existing config to incoming config (field-by-field, type-safe); only inherit the in-flight run when configs match, else rethrow
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/providers/health_prober.py (1)

158-183: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Add a Clock seam before the new timeout behavior hardens around wall time.

ProviderHealthProber now owns more timing-sensitive lifecycle logic, but it still bakes in wall-clock reads/deadlines. That makes these timeout paths harder to drive deterministically and forces the tests onto real time. Please thread clock: Clock | None = None through the constructor and route the interval/drain timing through it so the lifecycle tests can use FakeClock.

As per coding guidelines, "Time injection (Clock seam): classes that read time or sleep cooperatively take clock: Clock | None = None defaulting to SystemClock() (synthorg.core.clock); tests inject FakeClock."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/providers/health_prober.py` around lines 158 - 183, Add a Clock
seam to ProviderHealthProber by adding a new constructor parameter clock: Clock
| None = None (defaulting to SystemClock()) in __init__, import Clock and
SystemClock from synthorg.core.clock, assign self._clock = clock or
SystemClock(), and replace any direct wall-clock reads/sleeps or uses of
_interval/_stop_drain_timeout_seconds in the class (e.g., any await
asyncio.sleep(self._interval) or time-based deadline calculations) to use the
injected clock for sleeping and deadline computation so tests can inject
FakeClock; keep existing validation of interval_seconds and current attributes
(_interval, _stop_drain_timeout_seconds, _lifecycle_lock, _stop_event, _task,
_stop_failed) otherwise unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/api/controllers/backup.py`:
- Around line 143-145: The code is incorrectly calling NotBlankStr(...) at
runtime which raises a TypeError; update the
app_state.idempotency_service.run_idempotent call to pass plain validated
strings (use scope="backup" and key=idempotency_key) instead of constructing
NotBlankStr("backup") or NotBlankStr(idempotency_key), relying on the type
annotations (NotBlankStr) in the service signature for validation rather than
runtime wrapping.

In `@src/synthorg/hr/pruning/service.py`:
- Around line 171-177: In stop(), don't call self._task.cancel() immediately;
instead set self._stop_event and self._wake_event, copy task = self._task,
release the _lifecycle_lock, then await the task to drain cooperatively (e.g.
await asyncio.wait_for(task, timeout=self._graceful_shutdown_timeout)) so
run_pruning_cycle() can finish via the stop/wake path; only on
asyncio.TimeoutError (hard deadline) call task.cancel() and await it to ensure
proper cancellation. Use _lifecycle_lock, _stop_event, _wake_event, _task,
stop(), and run_pruning_cycle() as the referenced symbols.

In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 101-107: The start() method currently logs TUNNEL_ERROR at WARNING
when self._public_url is already set (the idempotent no-op path); change this to
a non-error event/severity (e.g., use logger.info or logger.debug and a new
message/event like "tunnel_already_active" instead of TUNNEL_ERROR) so repeated
legitimate calls to start() (which check self._public_url and self._port) do not
emit tunnel-failure alerts; update the logging invocation in start() that
references self._public_url, self._port, and TUNNEL_ERROR accordingly and keep
the same contextual fields (phase="start", reason="already_active", port=...)
but with the lowered severity/event name.

In `@src/synthorg/meta/chief_of_staff/monitor.py`:
- Around line 135-163: The created helper task pattern leaks when wait_for times
out; replace creating drain_task and shielding it with directly awaiting the
_drain() coroutine under asyncio.wait_for so cancellation propagates and no
orphaned tasks remain: remove asyncio.create_task(_drain()) and
asyncio.shield(drain_task), call await asyncio.wait_for(_drain(),
timeout=self._stop_drain_timeout_seconds) instead, keeping the existing
try/except handling (including handling asyncio.CancelledError,
MemoryError/RecursionError re-raises, and logging via
COS_INFLECTION_CHECK_FAILED with safe_error_description) and preserving the
TimeoutError branch that sets self._stop_failed and logs the error.

---

Outside diff comments:
In `@src/synthorg/providers/health_prober.py`:
- Around line 158-183: Add a Clock seam to ProviderHealthProber by adding a new
constructor parameter clock: Clock | None = None (defaulting to SystemClock())
in __init__, import Clock and SystemClock from synthorg.core.clock, assign
self._clock = clock or SystemClock(), and replace any direct wall-clock
reads/sleeps or uses of _interval/_stop_drain_timeout_seconds in the class
(e.g., any await asyncio.sleep(self._interval) or time-based deadline
calculations) to use the injected clock for sleeping and deadline computation so
tests can inject FakeClock; keep existing validation of interval_seconds and
current attributes (_interval, _stop_drain_timeout_seconds, _lifecycle_lock,
_stop_event, _task, _stop_failed) otherwise unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: bd57e3ce-5e15-481e-a5af-0d70a13fb224

📥 Commits

Reviewing files that changed from the base of the PR and between 03887ab and df6b618.

📒 Files selected for processing (20)
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/backup/errors.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/client/store.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/hr/errors.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/providers/errors.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/budget/test_trends_currency.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • web/src/api/endpoints/clients.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: Dashboard Test
  • GitHub Check: Type Check
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Lighthouse Site
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (10)
**/*.{py,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend: DEFAULT_CURRENCY from synthorg.budget.currency or the runtime budget.currency setting. Frontend: DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency.
Field naming: no _usd suffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare .toLocaleString() / .toLocaleDateString() / .toLocaleTimeString(). Use helpers in @/utils/format which read getLocale() from @/utils/locale. The backend has no operator-tunable locale setting; backend Intl formatting uses the system locale plus the browser timezone. The company.name_locales list controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render via Intl without passing timeZone (browser tz wins).
Date / number format: always via Intl; no hand-rolled templates.

Files:

  • web/src/api/endpoints/clients.ts
  • src/synthorg/backup/errors.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/providers/errors.py
  • tests/unit/budget/test_trends_currency.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/client/store.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/hr/errors.py
  • src/synthorg/api/controllers/backup.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/api/controllers/simulations.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/hr/pruning/service.py
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Units: metric only. Spelling: International / British English UI default (colour, behaviour, organise, centred, analyse, cancelled); document deviations. Spelling here is an editorial / UI-copy decision only; it does not affect runtime locale-sensitive formatting. Numbers, dates, times, currencies, and units still resolve via the user / company / browser / system fallback through @/utils/format, @/utils/locale, DEFAULT_CURRENCY, and useSettingsStore().currency, with no contradiction to the locale-neutral defaults above.

Files:

  • web/src/api/endpoints/clients.ts
web/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Reuse components from web/src/components/ui/ before creating new ones. Never hardcode hex colors, font-family, pixel spacing, Motion transitions, or BCP 47 locale strings; use design tokens, @/lib/motion presets, and the helpers in @/utils/format. Enforced by scripts/check_web_design_system.py (PostToolUse hook on every web/src/ edit).

web/src/**/*.{ts,tsx}: Always use createLogger from @/lib/logger; never use bare console.warn, console.error, or console.debug in application code
Logger variable name must always be log (e.g., const log = createLogger('module-name'))
Use logger levels: log.debug() (DEV-only, stripped in production), log.warn(), log.error()
Pass dynamic/untrusted values as separate arguments to logger (not interpolated into the message string) so they go through sanitizeArg
Wrap attacker-controlled fields inside structured objects with sanitizeForLog() before embedding in logs
Callers MUST NOT wrap store mutation calls in try / catch; the store owns the error UX
Display counts in paginated lists come from data.length; the wire envelope no longer carries total
Any new caller of health/readiness endpoints must handle the 503 path explicitly
Use sanitizeWsString() and sanitizeWsEnum() from web/src/utils/ws-sanitize.ts (pure helpers, re-exported from @/stores/notifications) for WebSocket payload sanitization
Any new WS payload handler that ingests untrusted strings MUST route through sanitizeWsString() or sanitizeWsEnum(); raw (sanitizeWsString(x, n) ?? '') as EnumType casts are forbidden
NEVER hardcode Motion transition durations; use @/lib/motion presets instead
NEVER hardcode BCP 47 locale literals ('en-US'); use helpers from @/utils/format instead
NEVER hardcode currency symbols / codes; use DEFAULT_CURRENCY from @/utils/currencies instead
Use <StatusBadge> instead of inline status dots (defaults to role="img" with aria-label; decorative for adjacent-labeled, announce for liv...

Files:

  • web/src/api/endpoints/clients.ts
web/src/api/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

getLiveness() is always 200 while the process is alive; getReadiness() is 200 healthy / 503 unavailable (binary 'ok' | 'unavailable' outcome, no tri-state)

Files:

  • web/src/api/endpoints/clients.ts
web/src/**/*.{ts,tsx,css}

📄 CodeRabbit inference engine (web/CLAUDE.md)

NEVER hardcode hex colors, font-family declarations, or pixel spacing; use design tokens instead

Files:

  • web/src/api/endpoints/clients.ts
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Monetary models: every cost-bearing Pydantic model carries currency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raises MixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/ is the only place that may import aiosqlite, sqlite3, psycopg, or psycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced by scripts/check_persistence_boundary.py (pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g. ArtifactService, WorkflowService, MemoryService, CustomRulesService, UserService, ProjectService, SsrfViolationService, SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced by scripts/check_persistence_boundary.py).
Per-line opt-out: # lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved through SettingsService / ConfigResolver. First-cold-read emits one INFO settings.value.resolved carrying source + yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered with read_only_post_init=True for /settings discoverability; SettingsService.set() raises SettingReadOnlyError).
Direct os.environ.get(...) reads in application code outside startup are forbidden. New settings register in src/synthorg/settings/definitions/<namespace>.py and are consumed via ConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...

Files:

  • src/synthorg/backup/errors.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/providers/errors.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/client/store.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/hr/errors.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/hr/pruning/service.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/backup/errors.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/providers/errors.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/client/store.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/hr/errors.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/hr/pruning/service.py
src/synthorg/!(api)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Domain error class naming: error classes in domain modules use <Domain><Condition>Error and inherit from DomainError (or a domain-scoped intermediate that itself inherits DomainError). Bare Exception / RuntimeError at domain boundaries is forbidden; domain errors flow through EXCEPTION_HANDLERS for centralised RFC 9457 routing.

Files:

  • src/synthorg/backup/errors.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/providers/errors.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/client/store.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/hr/errors.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/hr/pruning/service.py
src/synthorg/!(observability)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/!(observability)/**/*.py: Every business-logic module has from synthorg.observability import get_logger then logger = get_logger(__name__). Variable name is always logger. Carve-outs (e.g. meta/mcp/handlers/common_logging.py keying at a fixed string) are documented in the module docstring.
Never use import logging / logging.getLogger() / print() in application code (exception: observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py for handler-construction / bootstrap code).

Files:

  • src/synthorg/backup/errors.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/providers/errors.py
  • src/synthorg/settings/subscribers/backup_subscriber.py
  • src/synthorg/client/store.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/hr/errors.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/communication/event_stream/stream.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/api/controllers/simulations.py
  • src/synthorg/backup/scheduler.py
  • src/synthorg/hr/pruning/service.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Mock-spec gate (#1604): every Mock() / AsyncMock() / MagicMock() in tests/ MUST declare the interface it stands for via spec=ConcreteClass (Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen in scripts/mock_spec_baseline.txt. Regenerate the baseline only via uv run python scripts/check_mock_spec.py --update and commit the diff so the change is reviewable. Without spec= the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from #1604).
Shared mocks: tests/conftest.py exposes mock_dispatcher (an AsyncMock(spec=NotificationDispatcher) covering register / start / aclose / dispatch); use it instead of building the spec inline.
Time-driven tests: import FakeClock from tests._shared.fake_clock (NOT from any rollout-subsystem path) and inject it into the class under test. FakeClock.sleep advances virtual time AND yields once via asyncio.sleep(0) so cancellation on awaiting tasks propagates the same way it does under SystemClock. For tests that need to drive cooperative tasks waiting on the loop, await clock.advance_async(seconds). FakeClock-first: patch time.monotonic() / asyncio.sleep() globals only for legacy code paths that don't have a Clock seam (see ## Code Conventions for the legacy-Callable list); when the class under test accepts a clock= parameter, always inject FakeClock rather than monkey-patching globals.
Async: asyncio_mode = "auto"; no manual @pytest.mark.asyncio needed.
Timeout: 30 seconds per test (global in pyproject.toml; do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed).
Parametrize: Prefer @pytest.mark.parametrize for...

Files:

  • tests/unit/budget/test_trends_currency.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/budget/test_trends_currency.py
  • tests/unit/providers/test_health_prober_lifecycle.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • tests/unit/backup/test_scheduler_lifecycle.py
  • tests/unit/hr/pruning/test_service_lifecycle.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceed api.ws_frame_timeout_seconds (default 30s) without sending a frame. Wraps socket.receive_text() in asyncio.wait_for(...) so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a _SlidingWindowRateLimiter (api.ws_revalidation_window_seconds default 60s, api.ws_revalidation_max_failures default 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.

Files:

  • src/synthorg/api/controllers/backup.py
  • src/synthorg/api/controllers/simulations.py
🔇 Additional comments (28)
tests/unit/budget/test_trends_currency.py (2)

26-33: Hardcoded ISO currency literals are still present in alternate-currency selection.

_ALTERNATE_CURRENCY still uses "USD" / "EUR" literals; this was previously flagged and should be switched to project currency constants/types in tests as well.

As per coding guidelines, Currency: never hardcode ISO 4217 codes or symbols.


94-123: Nice coverage on windowed mixed-currency behavior and forecast edge cases.

The out-of-window regression plus mixed-currency/empty-input assertions for both bucket_cost_records and project_daily_spend are aligned with the new invariants.

Also applies to: 136-150

src/synthorg/client/store.py (2)

177-192: LGTM!

The register_if_absent implementation correctly performs atomic check-and-insert under the lock, preventing TOCTOU races where two concurrent callers could both observe absence and proceed.


194-224: LGTM!

The compare-and-delete semantics using identity comparison (current is not expected) are correct for this use case. Since SimulationRecord is frozen, the caller's original claim object remains stable, and identity comparison ensures that a fast retry that won the slot between failure and rollback is preserved rather than silently deleted.

src/synthorg/api/controllers/simulations.py (4)

115-138: LGTM!

The callback attachment order is correct: registering the exception logger before the discard callback ensures that a fast-completing failure surfaces its error before being removed from the background task set.


141-178: LGTM!

The rollback helper correctly handles the task lifecycle: it cancels, awaits drain (catching CancelledError), logs drain failures, and then unregisters with compare-and-delete semantics by passing expected=record. The MemoryError, RecursionError propagation follows project convention for system-fatal exceptions.


349-363: LGTM!

The atomic claim flow correctly uses register_if_absent to prevent duplicate runners. The comment clearly explains the JetStream redelivery / HTTP 5xx retry scenario that motivates the 409-on-duplicate guard.


414-452: LGTM!

The post-claim setup is properly wrapped with rollback handling. The implementation:

  • Tracks whether the task was spawned before failure
  • Logs the failure cause with structured context before rollback
  • Propagates system-fatal exceptions (MemoryError, RecursionError) without rollback
  • Passes the original record to unregister for compare-and-delete safety

This addresses all the concerns from past reviews.

web/src/api/endpoints/clients.ts (2)

264-275: LGTM!

The configsEqual helper correctly performs field-by-field comparison of all five SimulationConfig fields. This is more robust than JSON.stringify which is sensitive to key order, and the explicit field list serves as documentation of the comparison contract.


280-302: LGTM!

The 409 handling correctly implements idempotent retry semantics:

  • Detects Axios errors with status 409
  • Fetches the existing simulation to compare configs
  • Returns the existing result only when configs match exactly
  • Rethrows the original error when configs differ, ensuring a true collision surfaces to the caller

This addresses the past review concern about silently inheriting an unrelated in-flight runner.

src/synthorg/communication/event_stream/stream.py (5)

1-30: LGTM!

Imports are well-organized, event constants are properly imported from the observability module, and default values are sensibly chosen.


69-107: LGTM!

The constructor correctly implements fail-fast validation for all parameters and follows the clock injection pattern per guidelines. The docstring is comprehensive and accurately describes the behavior of dedup_ttl_seconds=0 (disables time-based eviction, size-bound only).


139-155: LGTM!

The dedup-window cleanup on last unsubscribe correctly prevents memory leaks for sessions with no active subscribers. The pop(session_id, None) is safe and the rationale is well-documented.


157-209: LGTM!

The publish flow correctly addresses the past review concerns: subscribers are checked before recording dedup state (preventing false suppression on retry after reconnect), and the dedup window is cleaned up for sessions with no subscribers. The lock scope is minimal—delivery happens outside the lock to avoid serializing publishers.


210-253: LGTM!

The helper methods correctly implement the dedup logic:

  • _is_duplicate_locked properly handles TTL=0 by skipping time-based eviction (per the documented behavior).
  • _record_published_locked bounds entries with FIFO eviction via popitem(last=False).
  • Both methods are correctly marked as requiring the caller to hold self._lock.
src/synthorg/backup/errors.py (1)

40-52: LGTM!

The BackupUnrestartableError follows the established lifecycle error pattern (matching ProviderLifecycleConflictError, InflectionMonitorLifecycleError, PruningUnrestartableError), correctly inherits from ConflictError for HTTP 409 semantics, and includes a clear docstring explaining the lifecycle scenario.

src/synthorg/backup/scheduler.py (4)

37-42: LGTM!

The lifecycle primitives are correctly initialized following the canonical pattern from docs/reference/lifecycle-sync.md: dedicated lock, stop event, drain timeout, and unrestartable flag.


49-87: LGTM!

The start() method correctly implements the canonical async lifecycle pattern:

  • Protected by _lifecycle_lock for concurrent safety
  • Refuses to start when _stop_failed is set (raising domain-specific BackupUnrestartableError)
  • Idempotent via is_running check
  • Clears events before spawning the task
  • Attaches log_task_exceptions callback to surface unexpected loop deaths

89-146: LGTM!

The stop() method correctly implements coordinated shutdown:

  • Protected by _lifecycle_lock
  • Signals stop via both events and task cancellation
  • Shielded drain with hard timeout that marks the scheduler unrestartable on failure
  • Events are recreated while holding the lock to prevent racing start() calls from binding to stale events
  • The lock instance itself is preserved (addressing the past review concern)

170-203: LGTM!

The _run_loop() correctly honors cooperative shutdown:

  • Loops while _stop_event is not set
  • Re-checks _stop_event after the sleep/wait before firing a backup
  • Uses structured logging with safe_error_description for backup failures
src/synthorg/settings/subscribers/backup_subscriber.py (1)

112-134: LGTM!

The error handling for scheduler.start() correctly follows the coding guidelines:

  • MemoryError/RecursionError are re-raised immediately
  • Other exceptions are logged with structured fields (error_type, error=safe_error_description(exc)) plus context (subscriber, namespace, key, note) before re-raising
  • This addresses the past review comment about surfacing startup failures with context
tests/unit/backup/test_scheduler_lifecycle.py (3)

22-25: LGTM!

The _make_scheduler helper correctly uses spec-bound mocks per the mock-spec gate requirement, addressing the past review comment.


31-61: LGTM!

The test correctly verifies the canonical lifecycle contract by:

  • Counting actual create_task calls via a patched wrapper
  • Asserting exactly one task is spawned across three concurrent start() calls
  • Properly cleaning up with a finally block

This addresses the past review comment about explicitly asserting single task creation.


72-108: LGTM!

The test correctly verifies the unrestartable-after-drain-timeout behavior:

  • Uses asyncio.Event().wait() for indefinite blocking (not wall-clock sleep)
  • Uses entered and release events for deterministic test control
  • Properly drains the orphan task in finally to prevent leakage
  • Asserts _stop_failed is set and subsequent start() raises BackupUnrestartableError

This addresses the past review comments about event-gated cancel paths.

src/synthorg/providers/health_prober.py (1)

194-279: Nice lifecycle race fix.

Keeping _stop_event recreation under _lifecycle_lock closes the earlier start/stop gap cleanly and matches the canonical unrestartable-on-timeout pattern.

tests/unit/providers/test_health_prober_lifecycle.py (2)

25-42: ⚡ Quick win

Keep the lifecycle helper hermetic; it currently opens real HTTP probes.

_make_prober() returns an eligible localhost provider, and start() immediately schedules _probe_all() -> _execute_probe(). That means these unit tests depend on whatever is listening on 127.0.0.1:11434, or on a real connect failure when nothing is there. For lifecycle-only coverage, return no providers here or stub _execute_probe().

[sraise_major_issue]

Suggested patch
-from synthorg.config.schema import ProviderConfig
 ...
 def _make_prober() -> ProviderHealthProber:
-    config = MagicMock(spec=ProviderConfig)
-    config.base_url = "http://localhost:11434"
-    config.litellm_provider = "ollama"
-    config.auth_type = "none"
-    config.api_key = None
     resolver = MagicMock(spec=ConfigResolver)
     resolver.get_provider_configs = AsyncMock(
         spec=ConfigResolver.get_provider_configs,
-        return_value={"test-local": config},
+        return_value={},
     )
     resolver.get_int = AsyncMock(spec=ConfigResolver.get_int, return_value=11434)

As per coding guidelines, "Flaky tests: NEVER skip, dismiss, or ignore flaky tests; always fix them fully and fundamentally."


74-112: Good fix on the timeout-path cleanup.

Gating the stuck loop on release and awaiting the saved task removes the pending-task leak from the earlier version.

src/synthorg/integrations/webhooks/replay_protection.py (1)

124-135: The WEBHOOK_REPLAY_DETECTED event does not require _ALLOWED_PROPERTIES registration.

WEBHOOK_REPLAY_DETECTED is an observability event (from synthorg.observability.events.integrations), not a telemetry event. Observability logs use the structlog processor pipeline (sanitize_sensitive_fields, scrub_event_fields) for field sanitization; they do not go through the telemetry.privacy.PrivacyScrubber or its _ALLOWED_PROPERTIES allowlist. The privacy validation framework in src/synthorg/telemetry/privacy.py applies only to TelemetryEvent objects sent through the telemetry reporting pipeline, not to structured observability logs. The new fields (skew, nonce_length, max_nonce_chars, nonce_fingerprint) are safe to log without registry updates.

Comment thread src/synthorg/api/controllers/backup.py Outdated
Comment thread src/synthorg/hr/pruning/service.py Outdated
Comment thread src/synthorg/integrations/tunnel/ngrok_adapter.py
Comment thread src/synthorg/integrations/webhooks/replay_protection.py
Comment thread src/synthorg/meta/chief_of_staff/monitor.py Outdated
- backup.py: drop runtime NotBlankStr() wrapping in run_idempotent (Annotated alias is not a constructor; type validation runs through Pydantic at the boundary, not on a no-op call)

- pruning/service.py stop(): cooperative drain (await wait_for(shield(task), timeout)) before escalating to task.cancel(); only cancel + mark unrestartable on TimeoutError

- ngrok_adapter idempotent-start: log TUNNEL_ALREADY_ACTIVE at INFO instead of TUNNEL_ERROR at WARNING (legitimate reconnects are not failures and were triggering tunnel-failure alerts)

- monitor.py stop(): replace shield+create_task drain helper with direct wait_for(shield(task)); add explicit cancel + unrestartable mark on TimeoutError; cooperative cancel handling for the no-deadline path

- health_prober: add Clock seam (clock=None constructor param defaulting to SystemClock); convert _execute_probe from staticmethod to instance method to use the injected clock for monotonic timing

- replay_protection _evict_locked: walk every entry instead of early-exiting on the first non-expired one (round 9's sample-outside-lock change broke the insertion-order = timestamp-order assumption under contention)

- test_monitor_lifecycle.test_unrestartable_after_drain_timeout: suppress CancelledError on the post-cleanup task await (Python 3.11+ retains the cancelled state even when the coroutine catches and returns)
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/api/controllers/backup.py`:
- Line 168: Replace the parenthesized exception tuple usage "except (ValueError,
TypeError) as exc" with the project's mandated PEP 758 comma-separated form
"except ValueError, TypeError as exc" in the backup exception handler; locate
the handler in src/synthorg/api/controllers/backup.py (the except block shown in
the diff) and update the syntax accordingly so it no longer uses parentheses.

In `@src/synthorg/hr/pruning/service.py`:
- Around line 183-225: The CancelledError handler in stop() currently treats any
CancelledError as the scheduler having drained, but that also catches external
cancellation of the stop() coroutine; change the handler to distinguish the two
by checking the run task's state (self._task or local task variable): if task is
not done (i.e., still running) re-raise the CancelledError so the external
cancellation propagates, otherwise treat it as successful drain; ensure you do
not clear self._task or recreate events when re-raising. Target the cancel
handling around the await asyncio.wait_for(asyncio.shield(task), ...) in stop(),
and use task.done()/task.cancelled() (or equivalent) on the task created by
_run_loop() to decide whether to re-raise.

In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 177-183: The adapter currently emits the global TUNNEL_STARTED
event from ngrok_adapter.start() (the logger.info call that logs TUNNEL_STARTED
and returns self._public_url), which duplicates the same event emitted by
src/synthorg/api/controllers/tunnel.py after await tunnel.start(); remove or
change that logger.info to avoid double-counting: either delete the
TUNNEL_STARTED emit from ngrok_adapter.start(), or replace it with a
provider-scoped event name (e.g., NGROK_TUNNEL_STARTED) or a lower-severity log
(debug) so only the controller emits the global TUNNEL_STARTED while the adapter
keeps internal/provider-level logging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: aa4acf3e-07ed-4d18-a307-e5f39aec797d

📥 Commits

Reviewing files that changed from the base of the PR and between df6b618 and dafc6f4.

📒 Files selected for processing (8)
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/integrations/webhooks/replay_protection.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/observability/events/integrations.py
  • src/synthorg/providers/health_prober.py
  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Deploy Preview
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Lighthouse Site
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Dashboard Test
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Type Check
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend: DEFAULT_CURRENCY from synthorg.budget.currency or the runtime budget.currency setting. Frontend: DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency.
Field naming: no _usd suffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare .toLocaleString() / .toLocaleDateString() / .toLocaleTimeString(). Use helpers in @/utils/format which read getLocale() from @/utils/locale. The backend has no operator-tunable locale setting; backend Intl formatting uses the system locale plus the browser timezone. The company.name_locales list controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render via Intl without passing timeZone (browser tz wins).
Date / number format: always via Intl; no hand-rolled templates.

Files:

  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
  • src/synthorg/providers/health_prober.py
  • src/synthorg/observability/events/integrations.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/integrations/webhooks/replay_protection.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Mock-spec gate (#1604): every Mock() / AsyncMock() / MagicMock() in tests/ MUST declare the interface it stands for via spec=ConcreteClass (Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen in scripts/mock_spec_baseline.txt. Regenerate the baseline only via uv run python scripts/check_mock_spec.py --update and commit the diff so the change is reviewable. Without spec= the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from #1604).
Shared mocks: tests/conftest.py exposes mock_dispatcher (an AsyncMock(spec=NotificationDispatcher) covering register / start / aclose / dispatch); use it instead of building the spec inline.
Time-driven tests: import FakeClock from tests._shared.fake_clock (NOT from any rollout-subsystem path) and inject it into the class under test. FakeClock.sleep advances virtual time AND yields once via asyncio.sleep(0) so cancellation on awaiting tasks propagates the same way it does under SystemClock. For tests that need to drive cooperative tasks waiting on the loop, await clock.advance_async(seconds). FakeClock-first: patch time.monotonic() / asyncio.sleep() globals only for legacy code paths that don't have a Clock seam (see ## Code Conventions for the legacy-Callable list); when the class under test accepts a clock= parameter, always inject FakeClock rather than monkey-patching globals.
Async: asyncio_mode = "auto"; no manual @pytest.mark.asyncio needed.
Timeout: 30 seconds per test (global in pyproject.toml; do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed).
Parametrize: Prefer @pytest.mark.parametrize for...

Files:

  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Monetary models: every cost-bearing Pydantic model carries currency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raises MixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/ is the only place that may import aiosqlite, sqlite3, psycopg, or psycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced by scripts/check_persistence_boundary.py (pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g. ArtifactService, WorkflowService, MemoryService, CustomRulesService, UserService, ProjectService, SsrfViolationService, SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced by scripts/check_persistence_boundary.py).
Per-line opt-out: # lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved through SettingsService / ConfigResolver. First-cold-read emits one INFO settings.value.resolved carrying source + yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered with read_only_post_init=True for /settings discoverability; SettingsService.set() raises SettingReadOnlyError).
Direct os.environ.get(...) reads in application code outside startup are forbidden. New settings register in src/synthorg/settings/definitions/<namespace>.py and are consumed via ConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...

Files:

  • src/synthorg/providers/health_prober.py
  • src/synthorg/observability/events/integrations.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/integrations/webhooks/replay_protection.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/providers/health_prober.py
  • src/synthorg/observability/events/integrations.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/integrations/webhooks/replay_protection.py
src/synthorg/!(api)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Domain error class naming: error classes in domain modules use <Domain><Condition>Error and inherit from DomainError (or a domain-scoped intermediate that itself inherits DomainError). Bare Exception / RuntimeError at domain boundaries is forbidden; domain errors flow through EXCEPTION_HANDLERS for centralised RFC 9457 routing.

Files:

  • src/synthorg/providers/health_prober.py
  • src/synthorg/observability/events/integrations.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/integrations/webhooks/replay_protection.py
src/synthorg/!(observability)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/!(observability)/**/*.py: Every business-logic module has from synthorg.observability import get_logger then logger = get_logger(__name__). Variable name is always logger. Carve-outs (e.g. meta/mcp/handlers/common_logging.py keying at a fixed string) are documented in the module docstring.
Never use import logging / logging.getLogger() / print() in application code (exception: observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py for handler-construction / bootstrap code).

Files:

  • src/synthorg/providers/health_prober.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/api/controllers/backup.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/meta/chief_of_staff/monitor.py
  • src/synthorg/integrations/webhooks/replay_protection.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceed api.ws_frame_timeout_seconds (default 30s) without sending a frame. Wraps socket.receive_text() in asyncio.wait_for(...) so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a _SlidingWindowRateLimiter (api.ws_revalidation_window_seconds default 60s, api.ws_revalidation_max_failures default 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.

Files:

  • src/synthorg/api/controllers/backup.py
🔇 Additional comments (26)
src/synthorg/integrations/webhooks/replay_protection.py (6)

230-248: Previous review concern is addressed.

The eviction now correctly walks all entries via list comprehension (lines 246-248) instead of the previous early-exit approach. This fixes the issue where sampling now before acquiring the lock could leave expired entries behind when insertion order diverges from timestamp order under contention. The docstring thoroughly documents this design decision.


5-16: LGTM!

The module docstring now correctly explains the race condition without hardcoded line numbers that would drift with edits. The threading import properly supports the new concurrency controls.


92-92: LGTM!

The threading.Lock instance correctly provides thread-safe access to the shared _seen cache for concurrent webhook handlers.


110-137: LGTM!

The _check_freshness_at helper enables check() to pass a single clock sample through both freshness validation and nonce eviction, closing the boundary replay gap identified in past review. The public check_freshness() method correctly delegates while maintaining backward compatibility for callers using durable dedup.


171-179: LGTM!

Sampling now exactly once per check() call and passing the same snapshot to both freshness and nonce-eviction decisions correctly eliminates the boundary replay gap from the earlier implementation.


181-228: LGTM!

Thread safety is correctly implemented:

  • The lock guards all accesses to self._seen (eviction, duplicate check, insertion, overflow handling)
  • The duplicate local variable captures state while holding the lock; reading it after release is safe
  • Computing the fingerprint outside the lock is correct since _fingerprint_nonce is a pure function with no shared state

The oversized nonce rejection at lines 199-206 aligns with the same MAX_NONCE_CHARS guard applied in the webhook controller before durable dedup.

src/synthorg/providers/health_prober.py (5)

17-18: LGTM!

Imports correctly bring in the Clock seam for testability, safe_error_description for secure error logging, and ProviderLifecycleConflictError as the domain-scoped error for lifecycle conflicts.

Also applies to: 32-32


146-189: LGTM!

The lifecycle infrastructure is correctly initialized:

  • Clock seam follows the documented pattern with SystemClock() default.
  • _lifecycle_lock is constructed eagerly and never replaced (per lifecycle-sync.md).
  • _stop_failed and _stop_drain_timeout_seconds support the unrestartable-on-timeout behavior.

191-222: LGTM!

The start() method correctly implements the canonical lifecycle pattern:

  • Serializes concurrent calls via _lifecycle_lock.
  • Refuses restart after a timed-out stop with a domain-scoped ProviderLifecycleConflictError.
  • Logs at WARNING before raising (per guidelines).
  • Idempotent when already running.

224-285: LGTM!

The stop() method is well-implemented:

  • Holds _lifecycle_lock throughout to prevent racing start() calls.
  • Shielded drain with bounded timeout; on timeout marks prober unrestartable and propagates the error.
  • Recreates _stop_event inside the lock (fixing the past race condition).
  • Uses safe_error_description for secure error logging.
  • _lifecycle_lock remains a stable instance throughout the service lifetime.

436-475: LGTM!

The _execute_probe method now uses the injected Clock seam for timing, enabling tests to drive probe timing via FakeClock instead of relying on wall time. The Clock protocol correctly defines monotonic() -> float, and HealthProber properly initializes it with a SystemClock default.

src/synthorg/observability/events/integrations.py (1)

113-113: Good addition of a non-error tunnel lifecycle event.

This gives the adapter a dedicated event for the idempotent already-active path instead of overloading TUNNEL_ERROR.

src/synthorg/api/controllers/backup.py (2)

88-107: LGTM!

The idempotency_key parameter changes are well-implemented: NotBlankStr type validates non-whitespace input at the boundary, required=True enforces the header presence, and max_length=255 bounds storage requirements with a clear security rationale in the comment.


143-165: LGTM!

The idempotency flow correctly passes plain strings (the literal "backup" and the already-validated header value) to run_idempotent, avoiding the previous runtime NotBlankStr(...) call issue. The timeout path properly discriminates an in-flight claim from a legitimate None return and surfaces HTTP 409 via ConflictError.

src/synthorg/meta/chief_of_staff/monitor.py (6)

34-46: LGTM!

The InflectionMonitorLifecycleError properly inherits from ConflictError (which is part of the DomainError hierarchy) and follows the domain error naming convention. The ClassVar default_message is well-defined.


79-85: LGTM!

Lifecycle primitives are correctly constructed eagerly in __init__, following the canonical lifecycle pattern documented in docs/reference/lifecycle-sync.md. This prevents a racing stop() from observing a half-initialized lock.


87-119: LGTM!

The start() method correctly implements the canonical lifecycle pattern: serializes on _lifecycle_lock, rejects restarts after timed-out stops with a proper domain error, and is idempotent for concurrent calls. The task naming and done-callback for exception logging are good practices.


121-182: LGTM!

The stop() method correctly implements the canonical lifecycle pattern:

  • Serializes on _lifecycle_lock to prevent concurrent start() from recreating the task mid-stop.
  • Uses shield to allow in-flight checks to complete naturally while respecting the drain timeout.
  • On timeout, explicitly cancels and marks unrestartable before re-raising.
  • Recreates _stop_event inside the lock, addressing the race condition identified in earlier reviews.

The exception handling properly uses safe_error_description and follows the PEP 758 except MemoryError, RecursionError: pattern.


184-210: LGTM!

The _loop() method correctly uses wait_for(_stop_event.wait(), timeout=interval) instead of bare asyncio.sleep(). This allows cooperative shutdown to wake the loop immediately when _stop_event is set, rather than waiting for the full interval to elapse.


234-251: LGTM!

The _emit_to_sinks() method properly uses asyncio.TaskGroup for parallel fan-out as recommended by the coding guidelines. The per-sink exception handling ensures one sink's failure doesn't block emissions to other sinks.

tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py (4)

25-36: LGTM!

The _make_monitor() helper correctly uses AsyncMock(spec=SnapshotBuilder) and sets builder.build.return_value instead of reassigning the method, addressing the past review feedback about spec-bound mocks.


41-89: LGTM!

The test properly addresses the past review feedback by using a controllable blocking_build coroutine to ensure the first task remains alive while concurrent start() calls run. The asyncio.create_task counting pattern and final assertion len(spawned) == 1 definitively prove the single-spawn invariant. Cleanup in the finally block ensures no task leaks.


91-104: LGTM!

The test includes the positive assertion assert monitor._task is not None after the second start(), which addresses past review feedback. This ensures a regression where start() silently no-ops after a clean stop would be detected.


106-156: LGTM!

The test addresses all past review feedback:

  • Uses asyncio.Event().wait() instead of wall-clock-sensitive asyncio.sleep() for the hung loop, following the guidelines for blocking indefinitely until cancelled.
  • Uses entered/release events for deterministic control flow.
  • Wraps cleanup in finally so a failed assertion still releases the hung loop and drains the task.
  • Uses contextlib.suppress(asyncio.CancelledError) when awaiting the cancelled task, handling the expected residual cancellation state.

The final assertion correctly expects InflectionMonitorLifecycleError with a message containing "unrestartable".

src/synthorg/hr/pruning/service.py (2)

143-163: Canonical start() serialization looks good.

Using a dedicated _lifecycle_lock together with PruningUnrestartableError makes concurrent start() calls idempotent and preserves the unrestartable contract after a timed-out stop.


763-775: Good stop check after wakeup.

Re-checking _stop_event before run_pruning_cycle() avoids launching a fresh cycle after shutdown has already been requested.

raise ConflictError(msg)
try:
manifest = BackupManifest.model_validate(outcome.result)
except (ValueError, TypeError) as exc:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Use PEP 758 except syntax without parentheses.

The project mandates PEP 758 comma-separated exception syntax without parentheses.

Suggested fix
-        except (ValueError, TypeError) as exc:
+        except ValueError, TypeError as exc:

As per coding guidelines, "This project uses Python 3.14+ with PEP 758 except syntax: 'except A, B:' (comma-separated, no parentheses) is correct and mandatory."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except (ValueError, TypeError) as exc:
except ValueError, TypeError as exc:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/api/controllers/backup.py` at line 168, Replace the
parenthesized exception tuple usage "except (ValueError, TypeError) as exc" with
the project's mandated PEP 758 comma-separated form "except ValueError,
TypeError as exc" in the backup exception handler; locate the handler in
src/synthorg/api/controllers/backup.py (the except block shown in the diff) and
update the syntax accordingly so it no longer uses parentheses.

Comment thread src/synthorg/hr/pruning/service.py
Comment thread src/synthorg/integrations/tunnel/ngrok_adapter.py
…ly wrong)

- pruning/service.py stop(): distinguish external CancelledError from drain-success. If task is still running when CancelledError is caught, the cancel was aimed at stop() (not the loop); re-raise so the external cancellation propagates instead of pretending the service drained

- ngrok_adapter: emit NGROK_TUNNEL_STARTED (provider-scoped) instead of the global TUNNEL_STARTED. The tunnel controller already emits the global event after the adapter returns; emitting both inflates metrics keyed on the global event

Skipped (factually wrong, logged):

- backup.py:168 'use PEP 758 except ValueError, TypeError as exc' is invalid Python. CLAUDE.md explicitly says: 'as exc requires parens (except (A, B) as exc:)'. The current parenthesized form is correct.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 157-163: Replace the fragile locals().get("tunnel") lookup by
explicitly initializing the tunnel variable to None before the try/cleanup block
and then use that tunnel variable (or local_tunnel) to call ngrok.disconnect; in
other words, set tunnel = None above the try, ensure any assignment to the
tunnel reference within the function updates that variable, and then call await
asyncio.to_thread(ngrok.disconnect, getattr(tunnel, "public_url", None)) inside
the cleanup so the code uses an explicit, well-scoped symbol instead of relying
on locals().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: adf51712-250d-48f6-984e-550c5f682cc8

📥 Commits

Reviewing files that changed from the base of the PR and between dafc6f4 and d003360.

📒 Files selected for processing (3)
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/observability/events/integrations.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: Build Backend
  • GitHub Check: CodSpeed Web benchmarks
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Lighthouse Site
  • GitHub Check: Dashboard Test
  • GitHub Check: Dashboard Build
  • GitHub Check: Dashboard Type Check
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Type Check
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{py,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend: DEFAULT_CURRENCY from synthorg.budget.currency or the runtime budget.currency setting. Frontend: DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency.
Field naming: no _usd suffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare .toLocaleString() / .toLocaleDateString() / .toLocaleTimeString(). Use helpers in @/utils/format which read getLocale() from @/utils/locale. The backend has no operator-tunable locale setting; backend Intl formatting uses the system locale plus the browser timezone. The company.name_locales list controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render via Intl without passing timeZone (browser tz wins).
Date / number format: always via Intl; no hand-rolled templates.

Files:

  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/observability/events/integrations.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Monetary models: every cost-bearing Pydantic model carries currency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raises MixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/ is the only place that may import aiosqlite, sqlite3, psycopg, or psycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced by scripts/check_persistence_boundary.py (pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g. ArtifactService, WorkflowService, MemoryService, CustomRulesService, UserService, ProjectService, SsrfViolationService, SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced by scripts/check_persistence_boundary.py).
Per-line opt-out: # lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved through SettingsService / ConfigResolver. First-cold-read emits one INFO settings.value.resolved carrying source + yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered with read_only_post_init=True for /settings discoverability; SettingsService.set() raises SettingReadOnlyError).
Direct os.environ.get(...) reads in application code outside startup are forbidden. New settings register in src/synthorg/settings/definitions/<namespace>.py and are consumed via ConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...

Files:

  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/observability/events/integrations.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/observability/events/integrations.py
src/synthorg/!(api)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Domain error class naming: error classes in domain modules use <Domain><Condition>Error and inherit from DomainError (or a domain-scoped intermediate that itself inherits DomainError). Bare Exception / RuntimeError at domain boundaries is forbidden; domain errors flow through EXCEPTION_HANDLERS for centralised RFC 9457 routing.

Files:

  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/hr/pruning/service.py
  • src/synthorg/observability/events/integrations.py
src/synthorg/!(observability)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/!(observability)/**/*.py: Every business-logic module has from synthorg.observability import get_logger then logger = get_logger(__name__). Variable name is always logger. Carve-outs (e.g. meta/mcp/handlers/common_logging.py keying at a fixed string) are documented in the module docstring.
Never use import logging / logging.getLogger() / print() in application code (exception: observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py for handler-construction / bootstrap code).

Files:

  • src/synthorg/integrations/tunnel/ngrok_adapter.py
  • src/synthorg/hr/pruning/service.py
🔇 Additional comments (11)
src/synthorg/observability/events/integrations.py (1)

113-120: LGTM!

The new event constants follow established naming conventions and are well documented. The provider-scoped NGROK_TUNNEL_STARTED correctly complements the controller-level TUNNEL_STARTED without double-counting, as confirmed by the usage in ngrok_adapter.py.

src/synthorg/integrations/tunnel/ngrok_adapter.py (6)

20-25: LGTM!

New event constants are properly imported and used throughout the module.


48-75: LGTM!

The constructor correctly handles the auth token as a bootstrap secret (the sanctioned init-time exception), and the lifecycle lock documentation appropriately explains why no drain timeout is needed for this adapter pattern.


96-112: LGTM!

Idempotent reconnect path correctly uses INFO-level logging with the non-error TUNNEL_ALREADY_ACTIVE event, avoiding false tunnel-failure alerts for legitimate retry scenarios.


113-125: LGTM!

Per-call PyngrokConfig avoids global state mutation and keeps auth tokens instance-local, correctly addressing the process-global token overwrite concern.


174-187: LGTM!

State is assigned atomically within the lock, and the provider-scoped NGROK_TUNNEL_STARTED event correctly avoids double-counting with the controller's TUNNEL_STARTED.


189-214: LGTM!

The stop method correctly implements best-effort cleanup semantics: it logs disconnect failures without propagating them and unconditionally clears state to prevent stale handles from blocking subsequent operations.

src/synthorg/hr/pruning/service.py (4)

24-24: Good call using a pruning-scoped unrestartable error.

This keeps the lifecycle failure on the domain error path instead of bubbling a bare runtime exception.


98-105: The dedicated lifecycle state is wired up cleanly.

Keeping _lifecycle_lock separate from _processing_lock and tracking _stop_failed / drain timeout here matches the canonical async lifecycle pattern.


136-231: The async start/stop flow looks solid.

The lock-guarded spawn, cooperative drain, timeout-to-unrestartable transition, and external-cancellation re-raise cover the lifecycle edge cases that were risky in this area.


763-788: The scheduler loop now cooperates with shutdown correctly.

Re-checking _stop_event before each cycle and preserving CancelledError propagation keeps stop semantics predictable while still allowing wake() to short-circuit the sleep.

Comment thread src/synthorg/integrations/tunnel/ngrok_adapter.py Outdated
- ngrok_adapter.start(): replace fragile locals().get('tunnel') lookup with explicit tunnel: Any = None initialised before the try block. The cleanup path now references an unconditionally-defined symbol; the previous form would have raised NameError if a tooling refactor renamed 'tunnel' without updating the locals() string. Type is Any because pyngrok ships untyped stubs.
@Aureliolo
Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 2, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Aureliolo Aureliolo merged commit bcce097 into main May 2, 2026
77 checks passed
@Aureliolo Aureliolo deleted the fix/audit-cleanup-c-persistence branch May 2, 2026 18:44
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 2, 2026 18:44 — with GitHub Actions Inactive
Aureliolo pushed a commit that referenced this pull request May 3, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Frontend and UX polishing improves user interface responsiveness and
visual consistency.
- API hygiene and validation enhancements provide smoother and more
reliable interactions.

### What's new
- Introduced typed-boundary helpers enabling better type safety and
parse_typed workflows.
- Added codebase-audit skill prompt tuning for improved project
auditing.

### Under the hood
- Eliminated flaky tests caused by module-level state for more stable
test outcomes.
- Unified image tag management under CLI and Renovate for consistent
dependency updates.
- Added cross-PR file-overlap analysis to the review dependency pull
request skill.
- Updated multiple dependencies including Python, Web, CLI, and
container libraries.
- Improved CI tooling and lock file maintenance for better build
reliability.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


##
[0.7.8](v0.7.7...v0.7.8)
(2026-05-03)


### Features

* **api:** typed-boundary helper + codebase-audit skill prompt tuning
([#1712](#1712))
([40ee65b](40ee65b))
* **boundary:** RFC
[#1711](#1711) Phases 2 + 3
— typed boundaries via parse_typed
([#1720](#1720))
([7b9f409](7b9f409))


### Bug Fixes

* **api:** audit cleanup B -- API hygiene & validation
([#1719](#1719))
([3d790d9](3d790d9))
* audit cleanup C - persistence, concurrency & data integrity
([#1708](#1708))
([#1717](#1717))
([bcce097](bcce097))
* **test:** exterminate xdist-flaky tests with module-level state
([#1713](#1713))
([#1721](#1721))
([8d258dd](8d258dd))
* **web:** audit cleanup E -- frontend & UX polish
([#1710](#1710))
([#1718](#1718))
([3a3591a](3a3591a))


### Refactoring

* **cli:** single source of truth for DHI image tags + Renovate manager
([#1723](#1723))
([57980a2](57980a2))


### Documentation

* audit cleanup D -- public-facing & docs sync
([#1709](#1709))
([#1715](#1715))
([ade03b7](ade03b7))


### Tests

* **engine:** make TestDrainTimeout deterministic + preserve subclass
type in [@Ontology](https://github.com/ontology)_entity
([#1729](#1729))
([b00fb05](b00fb05))


### CI/CD

* Update CI tool dependencies
([#1703](#1703))
([355a9ff](355a9ff))


### Maintenance

* add cross-PR file-overlap analysis to review-dep-pr skill
([#1722](#1722))
([3861d8a](3861d8a))
* **ci:** unify apko-version under workflow env so Renovate manages it
everywhere ([#1724](#1724))
([9c0a7fd](9c0a7fd))
* consolidate DHI image-pin custom regex managers
([#1726](#1726))
([b8b0cba](b8b0cba))
* **deps:** update dependency chainguard-dev/melange to v0.50.4
([#1701](#1701))
([8cbf83a](8cbf83a))
* Lock file maintenance
([#1705](#1705))
([414cfea](414cfea))
* Lock file maintenance
([#1727](#1727))
([5cb1212](5cb1212))
* Update CLI dependencies
([#1702](#1702))
([9fb57b9](9fb57b9))
* Update Container dependencies
([#1698](#1698))
([6d24fd6](6d24fd6))
* Update dependency @eslint-react/eslint-plugin to v5
([#1704](#1704))
([1cb1294](1cb1294))
* Update Python dependencies
([#1699](#1699))
([8e7af3a](8e7af3a))
* Update Python dependencies to v4.15.0
([#1725](#1725))
([69164c8](69164c8))
* Update Web dependencies
([#1700](#1700))
([715300d](715300d))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Audit cleanup C: persistence, concurrency & data integrity

1 participant