fix: audit cleanup C - persistence, concurrency & data integrity (#1708)#1717
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
🧰 Additional context used📓 Path-based instructions (4)**/*.{py,ts,tsx}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
⚙️ CodeRabbit configuration file
Files:
src/synthorg/!(api)/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/synthorg/!(observability)/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🔇 Additional comments (1)
WalkthroughThis PR applies concurrency, lifecycle, and idempotency fixes across the codebase and updates documentation, tests, and frontend callers. Backup endpoints now require an |
There was a problem hiding this comment.
Code Review
This pull request implements comprehensive concurrency safety, idempotency, and lifecycle management across the system. Key changes include the adoption of a canonical lifecycle pattern for background services (scheduler, prober, monitor, pruning), thread-safety enhancements for in-memory stores (ticket store, replay protection, MCP cache), and the introduction of mandatory idempotency keys for manual backups. Additionally, it adds per-session event deduplication in the event stream hub, currency validation in budget aggregation, and refactors escalation factories to use a registry-based dispatch. Documentation has been updated with a new ADR regarding LGPL Postgres drivers and design notes on idempotency. I have no feedback to provide as there are no review comments.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1717 +/- ##
==========================================
+ Coverage 84.62% 84.65% +0.02%
==========================================
Files 1782 1782
Lines 101896 102180 +284
Branches 8968 8992 +24
==========================================
+ Hits 86227 86497 +270
- Misses 13476 13491 +15
+ Partials 2193 2192 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/synthorg/integrations/webhooks/replay_protection.py (1)
131-220: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winSplit
check()and reusecheck_freshness()to avoid logic drift.Line 131 keeps
check()well over the 50-line limit and duplicates freshness checks already implemented incheck_freshness(). Extracting nonce handling into a helper and delegating freshness tocheck_freshness()will keep behavior aligned and reduce maintenance risk.♻️ Proposed refactor direction
def check( self, *, nonce: str | None, timestamp: float | None, ) -> bool: @@ - now = self._clock.now().timestamp() - if nonce is None and timestamp is None: logger.warning( WEBHOOK_REPLAY_DETECTED, reason="no freshness signal (nonce and timestamp both missing)", ) return False - - if timestamp is not None and not math.isfinite(timestamp): - logger.warning( - WEBHOOK_REPLAY_DETECTED, - reason="non-finite timestamp", - ) - return False - - if timestamp is not None and abs(now - timestamp) > self._window: - logger.warning( - WEBHOOK_REPLAY_DETECTED, - reason="timestamp outside window", - skew=abs(now - timestamp), - ) + if not self.check_freshness(timestamp): return False + now = self._clock.now().timestamp() + return self._check_nonce(nonce=nonce, now=now) + +def _check_nonce(self, *, nonce: str | None, now: float) -> bool: + if nonce is None: + with self._lock: + self._evict_locked(now) + return True + if len(nonce) > MAX_NONCE_CHARS: + logger.warning( + WEBHOOK_REPLAY_DETECTED, + reason="nonce exceeds max size", + nonce_length=len(nonce), + max_nonce_chars=MAX_NONCE_CHARS, + ) + return False + key = _fingerprint_nonce(nonce) + with self._lock: + self._evict_locked(now) + duplicate = key in self._seen + if not duplicate: + self._seen[key] = now + while len(self._seen) > self._max_entries: + self._seen.popitem(last=False) + if duplicate: + logger.warning( + WEBHOOK_REPLAY_DETECTED, + reason="duplicate nonce", + nonce_fingerprint=key[:16], + ) + return False + return TrueAs per coding guidelines, "Functions must be < 50 lines."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/integrations/webhooks/replay_protection.py` around lines 131 - 220, The check() method currently mixes freshness validation and nonce cache logic; extract and reuse the existing check_freshness(timestamp) for all timestamp-related checks and move nonce-specific logic into a small helper (e.g., _check_nonce(nonce, now) or inline after freshness delegation) so check() stays under 50 lines. Concretely: have check() first fail-fast on both nonce and timestamp missing (preserve the same logger call), then call self.check_freshness(timestamp) (which should return bool or raise) and return False on failure, then handle the nonce branch: if nonce is None acquire self._lock, call self._evict_locked(now) and return True; otherwise keep the oversized check using MAX_NONCE_CHARS and the logger, compute key = _fingerprint_nonce(nonce), with self._lock call self._evict_locked(now) and the duplicate detection/update of self._seen (respecting _max_entries and popitem), and preserve the duplicate logger with nonce_fingerprint=key[:16]; keep use of _lock, _evict_locked, _fingerprint_nonce, _seen and _max_entries to ensure behavior is unchanged.tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py (1)
41-105:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCover the registry fallback path too.
These tests only exercise the registered keys. Add one assertion for an unknown backend/strategy value so the new registry contract still fails with
ValueErrorif a key is misspelled or omitted.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py` around lines 41 - 105, Add tests that assert the registry fallback raises ValueError for unknown keys: for build_escalation_queue_store call it with EscalationQueueConfig(backend="unknown") (and persistence=None or a fake) and assert raises ValueError with an appropriate message about unknown backend; likewise for build_decision_processor call it with EscalationQueueConfig(decision_strategy="unknown") and assert it raises ValueError. Place these assertions alongside the existing TestQueueStoreRegistry and TestDecisionProcessorRegistry tests to cover the fallback/unregistered-path behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/api/auth/ticket_store.py`:
- Around line 156-159: The create() path raises TicketLimitExceededError when
user_pending >= self._max_pending but does not log the rejection; add a
WARNING/ERROR log immediately before raising that includes the user identifier,
the current pending count and the max (e.g. use self._logger.warning(...) or
module logger.warning(...)) so the rejection is recorded (reference symbols:
create(), user_pending, self._max_pending, TicketLimitExceededError,
user.user_id).
In `@src/synthorg/backup/scheduler.py`:
- Around line 128-132: The lifecycle lock is being replaced after stop which
allows overlap between callers on the old and new locks; instead ensure a single
dedicated self._lifecycle_lock instance exists for the service lifetime (create
it once, e.g. in __init__ or at construction) and do not reassign or replace it
in stop() or start(); if you need to reinitialize synchronization state
keep/replace only events like self._stop_event or self._wake_event but never
rebind self._lifecycle_lock, and update start()/stop() to use the persistent
self._lifecycle_lock for mutual exclusion.
In `@src/synthorg/client/continuous.py`:
- Around line 11-15: The module docstring incorrectly states that
self._lifecycle_lock "spans the full body of start() and stop()"; update the
docstring to match the implementation by noting that self._lifecycle_lock is
only held briefly in start() to serialize the running-flag check (and not held
for the full start execution) and that stop() does not use this lock; reference
self._lifecycle_lock, start(), and stop() in the revised sentence so future
readers understand the actual locking scope and intent.
In `@src/synthorg/communication/event_stream/stream.py`:
- Around line 60-67: Validate constructor inputs for dedup parameters: in the
class initializer that accepts dedup_ttl_seconds and
dedup_max_entries_per_session (the shown constructor), check that
dedup_ttl_seconds is non-negative and dedup_max_entries_per_session is a
non-negative (or strictly positive if logical) integer and raise a ValueError
with a clear message if not; this prevents the trim loop in
_record_published_locked() from calling popitem() on an empty OrderedDict at
publish time. Update validation to coerce or reject bad types (e.g., non-int for
dedup_max_entries_per_session) and document the error messages so callers
fail-fast when creating the Stream instead of crashing later in
_record_published_locked().
In `@src/synthorg/hr/pruning/service.py`:
- Around line 209-213: The code currently replaces self._lifecycle_lock which
breaks synchronization for callers queued on the original lock; instead, do not
reassign self._lifecycle_lock during restart preparation — only recreate
loop-bound primitives like self._stop_event and self._wake_event after releasing
the lock. Concretely, remove or revert the assignment of self._lifecycle_lock =
asyncio.Lock() in the restart path, keep the existing self._lifecycle_lock used
by start()/stop(), and only set new asyncio.Event() instances for
self._stop_event and self._wake_event (and ensure this happens outside the held
lifecycle lock) so events rebind to a new loop while preserving the single
dedicated lifecycle lock.
In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 81-83: The duplicate-start branch in the ngrok adapter (the check
on self._tunnel in the method that starts the tunnel) raises RuntimeError
without logging; modify that branch to emit a warning or error log (e.g., use
the adapter or module logger such as self.logger or module-level logger) with
the message "ngrok tunnel already active on this adapter" and any relevant
context (adapter id/state) immediately before raising the RuntimeError to
satisfy the error-path observability rule.
In `@src/synthorg/meta/chief_of_staff/monitor.py`:
- Around line 152-155: The code currently reassigns self._lifecycle_lock during
stop which can split synchronization between old waiters and new entrants;
instead ensure self._lifecycle_lock is created once (e.g., in __init__) and
never reassigned, and only recreate or reset non-lock primitives (like
self._stop_event) if necessary for event-loop rebinding; update the code that
now sets self._lifecycle_lock = asyncio.Lock() to remove that reassignment, keep
start()/stop() protected by the single dedicated self._lifecycle_lock, and if
you must recreate loop-bound primitives do so for self._stop_event only while
leaving self._lifecycle_lock intact.
In `@src/synthorg/providers/health_prober.py`:
- Around line 264-267: The change replaces self._lifecycle_lock with a new
asyncio.Lock() during stop, which breaks serialization because callers may still
be awaiting the old lock; instead do not swap out self._lifecycle_lock — keep it
created once (e.g., at init) and never reassign it in stop()/start(); if you
need to reset state on stop, only recreate self._stop_event (or clear it) but
leave self._lifecycle_lock intact so start() and stop() still serialize
correctly.
In `@tests/unit/api/auth/test_ticket_store_threadsafety.py`:
- Around line 46-49: Tests currently start worker bodies opportunistically (the
attempt function) causing nondeterministic races; introduce a shared start gate
(e.g., threading.Event or threading.Barrier) named start_event that each worker
(the attempt function) waits on before proceeding, submit all futures with the
workers blocked, call start_event.set() once all tasks are submitted to release
them simultaneously, then collect results as before; apply the same pattern to
the other test blocks that create ThreadPoolExecutor/futures around the attempt
worker (the blocks at the other ranges).
In `@tests/unit/api/controllers/test_backup_required_idempotency.py`:
- Around line 42-55: The helper _make_state uses bare AsyncMock/MagicMock
instances; replace them with spec-bound mocks to prevent interface drift: set
service = AsyncMock(spec=BackupServiceInterface) (or the concrete backup service
class used in production) and service.create_backup = AsyncMock(spec=Callable,
return_value=_make_manifest()), set app_state = MagicMock(spec=AppStateClass),
idempotency_service = MagicMock(spec=IdempotencyServiceClass) and assign
idempotency_service.run_idempotent = AsyncMock(spec=Callable) (or use
spec_set=... if stricter), and keep CursorSecret.from_key unchanged; update the
other mock site around lines 80-84 similarly to use the same spec classes
referenced here.
In `@tests/unit/api/controllers/test_backup.py`:
- Around line 76-96: The test uses a bare MagicMock for idempotency_service
which can mask interface drift; replace it with a specced async double by
creating idempotency_service = MagicMock(spec=IdempotencyService) (or the
concrete class used in production) and implement run_idempotent as an async def
_run_idempotent(...) that accepts only the expected signature, uses an AsyncMock
for the callback (or awaits the provided callback to enforce the awaitable
contract), sets outcome.timed_out/result/fresh as before, assign
idempotency_service.run_idempotent = _run_idempotent, and set
app_state.idempotency_service = idempotency_service so the test enforces the
real interface for idempotency_service and its run_idempotent method.
In `@tests/unit/api/controllers/test_simulations_idempotency.py`:
- Around line 33-60: The helpers _make_state and _make_request create bare
MagicMock/AsyncMock instances (e.g., sim_state, app_state, state, and the return
of _make_request) which violates the spec= requirement; replace each bare
MagicMock()/AsyncMock() call with mocks that declare the concrete interface via
spec= (for example spec=SimulationState, spec=SimulationStore, spec=AppState,
spec=PoolClientList, etc.) and ensure async methods use AsyncMock(spec=...)
where appropriate (e.g., register_if_absent, save, list_clients); update
_make_request to return a MagicMock(spec=IncomingRequestClass) and then remove
the test-level baseline suppressions that were added to silence bare-mock
checks.
In `@tests/unit/backup/test_scheduler_lifecycle.py`:
- Around line 21-23: The test helper _make_scheduler creates a bare MagicMock
and AsyncMock; replace them with spec-bound mocks of the concrete service type
used by BackupScheduler (e.g., use MagicMock(spec=BackupService) and set
service.create_backup = AsyncMock(spec=Callable or the concrete method
signature) or AsyncMock(spec=BackupService.create_backup)) so the mock enforces
the real interface for BackupScheduler and its create_backup method; update
_make_scheduler to construct service = MagicMock(spec=BackupService) and assign
service.create_backup = AsyncMock(spec=BackupService.create_backup) (or the
appropriate concrete class/method) before returning BackupScheduler(service,
interval_hours=1).
In `@tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py`:
- Around line 21-23: Replace the bare AsyncMock with a spec-bound mock to
SnapshotBuilder and stop reassigning build: create builder =
AsyncMock(spec=SnapshotBuilder) (or use create_autospec/Sentinel as your test
helpers) and set builder.build.return_value to the desired value instead of
reassigning builder.build; then pass that builder into OrgInflectionMonitor so
the build method stays an AsyncMock consistent with the SnapshotBuilder
interface.
In `@tests/unit/settings/test_backup_subscriber.py`:
- Around line 30-34: The scheduler test double is too loose: tighten it by
creating the scheduler mock with a concrete spec (e.g., scheduler =
MagicMock(spec=Scheduler)) and give its async methods proper AsyncMock specs
(scheduler.start = AsyncMock(spec=Scheduler.start), scheduler.stop =
AsyncMock(spec=Scheduler.stop)) and reschedule as
MagicMock(spec=Scheduler.reschedule); keep type(scheduler).is_running =
PropertyMock(...) for the property. Finally update the enabled-path assertion to
wait for the coroutine by using scheduler.start.assert_awaited_once() (and
similarly use assert_awaited_once() for stop where applicable).
In `@tests/unit/tools/mcp/test_cache_threadsafety.py`:
- Around line 22-38: The test test_concurrent_get_put_no_corruption doesn't
exercise cache hits because writer and reader use different payloads; update the
reader in that test so it queries the exact same key+input the preceding writer
used (e.g., change reader to call cache.get(f"tool-{i % 8}", {"i": i-1}) when i
is odd) so the get frequently hits the existing entry and exercises the
MCPResultCache hit path (move_to_end + deepcopy) modified in this PR.
In `@web/src/api/endpoints/backup.ts`:
- Around line 13-17: The idempotencyKey is currently accepted as-is (using ??)
which preserves empty or whitespace-only strings; change the logic that defines
key (the variable assigned from idempotencyKey) to treat blank/whitespace-only
values as "not provided" and generate a UUID via crypto.randomUUID() instead.
Specifically, update the assignment that sets key (used in the headers for
apiClient.post) to check idempotencyKey.trim() (or equivalent) and only reuse it
when non-empty; otherwise call crypto.randomUUID() so no empty Idempotency-Key
is sent.
---
Outside diff comments:
In `@src/synthorg/integrations/webhooks/replay_protection.py`:
- Around line 131-220: The check() method currently mixes freshness validation
and nonce cache logic; extract and reuse the existing check_freshness(timestamp)
for all timestamp-related checks and move nonce-specific logic into a small
helper (e.g., _check_nonce(nonce, now) or inline after freshness delegation) so
check() stays under 50 lines. Concretely: have check() first fail-fast on both
nonce and timestamp missing (preserve the same logger call), then call
self.check_freshness(timestamp) (which should return bool or raise) and return
False on failure, then handle the nonce branch: if nonce is None acquire
self._lock, call self._evict_locked(now) and return True; otherwise keep the
oversized check using MAX_NONCE_CHARS and the logger, compute key =
_fingerprint_nonce(nonce), with self._lock call self._evict_locked(now) and the
duplicate detection/update of self._seen (respecting _max_entries and popitem),
and preserve the duplicate logger with nonce_fingerprint=key[:16]; keep use of
_lock, _evict_locked, _fingerprint_nonce, _seen and _max_entries to ensure
behavior is unchanged.
In
`@tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py`:
- Around line 41-105: Add tests that assert the registry fallback raises
ValueError for unknown keys: for build_escalation_queue_store call it with
EscalationQueueConfig(backend="unknown") (and persistence=None or a fake) and
assert raises ValueError with an appropriate message about unknown backend;
likewise for build_decision_processor call it with
EscalationQueueConfig(decision_strategy="unknown") and assert it raises
ValueError. Place these assertions alongside the existing TestQueueStoreRegistry
and TestDecisionProcessorRegistry tests to cover the fallback/unregistered-path
behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: afa59223-3a93-4652-a57f-3dd9e2165a11
📒 Files selected for processing (46)
docs/design/backup.mddocs/design/client-simulation.mddocs/design/observability.mddocs/licensing.mddocs/reference/lifecycle-sync.mddocs/research/lgpl-postgres-driver-decision.mdscripts/mock_spec_baseline.txtsrc/synthorg/api/auth/ticket_store.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/backup/scheduler.pysrc/synthorg/backup/service.pysrc/synthorg/budget/trends.pysrc/synthorg/client/continuous.pysrc/synthorg/client/store.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/observability/events/event_stream.pysrc/synthorg/providers/health_prober.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/tools/mcp/cache.pytests/integration/api/controllers/test_client_simulation.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/api/controllers/test_backup.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/backup/test_scheduler.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/budget/test_trends_currency.pytests/unit/client/test_continuous_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/communication/event_stream/test_stream_dedup.pytests/unit/hr/pruning/test_service.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/providers/test_health_prober_lifecycle.pytests/unit/settings/test_backup_subscriber.pytests/unit/tools/mcp/test_cache_threadsafety.pyweb/src/api/endpoints/backup.tsweb/src/api/endpoints/clients.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Build Backend
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Dashboard Test
- GitHub Check: Dashboard Type Check
- GitHub Check: Test (Python 3.14)
- GitHub Check: Type Check
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Lighthouse Site
- GitHub Check: Build Preview
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (12)
web/src/**/!(logger).ts?(x)
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/!(logger).ts?(x): Always usecreateLoggerfrom@/lib/loggerinstead of bareconsole.warn,console.error, orconsole.debugin application code; use variable namelog
Pass dynamic/untrusted values as separate args to log methods, not interpolated into the message string, so they go throughsanitizeArg
Wrap attacker-controlled fields inside structured objects withsanitizeForLog()before embedding in logs
Files:
web/src/api/endpoints/clients.tsweb/src/api/endpoints/backup.ts
web/src/**/*.ts?(x)
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.ts?(x): Callers MUST NOT wrap store mutation calls intry/catch; the store owns the error UX
NEVER hardcode Motion transition durations; use presets from@/lib/motion
NEVER hardcode BCP 47 locale literals ('en-US'); use helpers from@/utils/formatand design tokens
NEVER hardcode currency symbols or codes; useDEFAULT_CURRENCYfrom@/utils/currencies
Files:
web/src/api/endpoints/clients.tsweb/src/api/endpoints/backup.ts
web/src/api/**/*.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
Use
getLiveness()for 200 always-alive checks andgetReadiness()for binary'ok' | 'unavailable'health checks (no tri-state); any new caller must handle the 503 path explicitly
Files:
web/src/api/endpoints/clients.tsweb/src/api/endpoints/backup.ts
web/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
web/src/**/*.{ts,tsx}: Reuse components fromweb/src/components/ui/before creating new ones. Never hardcode hex colors, font-family, pixel spacing, Motion transitions, or BCP 47 locale strings; use design tokens,@/lib/motionpresets, and the helpers in@/utils/format. Enforced byscripts/check_web_design_system.py(PostToolUse hook on everyweb/src/edit). Seeweb/CLAUDE.mdfor the component inventory, token rules, and post-training references (TS6, Storybook 10).
Never hardcode BCP 47 locale strings or call bare.toLocaleString()/.toLocaleDateString()/.toLocaleTimeString(). Use helpers in@/utils/formatwhich readgetLocale()from@/utils/locale.
Currency: never hardcode ISO 4217 codes or symbols. Frontend:DEFAULT_CURRENCYfrom@/utils/currenciesoruseSettingsStore().currency. Field naming: no_usdsuffix on money fields anywhere.
Timezone: store UTC; render viaIntlwithout passingtimeZone(browser tz wins). Date / number format: always viaIntl; no hand-rolled templates. Units: metric only.
Files:
web/src/api/endpoints/clients.tsweb/src/api/endpoints/backup.ts
web/**/*.{ts,tsx,js,json}
📄 CodeRabbit inference engine (CLAUDE.md)
Web Dashboard: see
web/CLAUDE.mdfor commands, design system, and component inventory. The CI-matching full-suite leak check isnpm --prefix web run test -- --coverage --detect-async-leaks(theLeaks N leakssummary must stay at or below theMAX_ASYNC_LEAKSceiling in.github/workflows/ci.yml; any new store that schedules timers must expose a teardown hook perweb/CLAUDE.md).
Files:
web/src/api/endpoints/clients.tsweb/src/api/endpoints/backup.ts
docs/design/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
docs/design/**/*.md: ALWAYS read the relevantdocs/design/page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking the design pages underdocs/design/. The design spec is the starting point for architecture, data models, and behavior.
If implementation deviates from the spec (better approach found, scope evolved, etc.), alert the user and explain why; the user decides whether to proceed or update the spec. Do NOT silently diverge; every deviation needs explicit user approval. When approved deviations occur, update the relevantdocs/design/page to reflect the new reality.
Files:
docs/design/backup.mddocs/design/observability.mddocs/design/client-simulation.md
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Use D2 (
\``d2) for architecture diagrams, nested container layouts, complex entity relationships. Use Mermaid (```mermaid) for flowcharts, sequence diagrams, simple hierarchies, pipelines. Use Markdown tables for grid/matrix data that is semantically tabular. Never use```textblocks with ASCII/Unicode box-drawing characters for diagrams. D2 uses theme 200 (Dark Mauve), configured globally inmkdocs.yml. Review agentdiagram-syntax-validatorruns in/pre-pr-reviewand/aurelio-review-pr` pipelines.
Files:
docs/design/backup.mddocs/design/observability.mddocs/licensing.mddocs/reference/lifecycle-sync.mddocs/research/lgpl-postgres-driver-decision.mddocs/design/client-simulation.md
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always read the relevantdocs/design/page before implementing any feature or planning any issue. The design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec.
Present every implementation plan to the user for accept/deny before coding starts. At every phase of planning and implementation, actively look for ways to improve the design; surface improvements as suggestions, not silent changes.
Comments explain WHY only, never origin/review/issue context. Forbidden: reviewer citations, in-code issue/PR back-references, cryptic internal-taxonomy shorthand, migration/rebrand framing, round/iteration narrative, self-evident restatements. What stays: hidden constraints, subtle invariants, workarounds for specific upstream bugs (with stable bug-tracker URL), and why a non-obvious choice was made.
Nofrom __future__ import annotations; Python 3.14 has PEP 649.
Use PEP 758 except syntax:except A, B:(no parens) when not binding to a name;as excrequires parens (except (A, B) as exc:). Enforced by ruff on 3.14.
All public functions and public classes require type hints; mypy strict mode is enforced. Google-style docstrings are required on public classes and functions (ruff D rules).
Create new objects, never mutate existing ones. Use frozen Pydantic models for config/identity; for non-Pydantic registries usecopy.deepcopy()at construction and wrap withMappingProxyType; deepcopy at system boundaries (tool execution, provider serialization, persistence).
Distinguish config vs runtime state: frozen models for config/identity; separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.
Every Pydantic model defaults toConfigDict(frozen=True, allow_inf_nan=False)unless documented otherwise; mutations go thro...
Files:
src/synthorg/api/auth/ticket_store.pytests/unit/tools/mcp/test_cache_threadsafety.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/budget/test_trends_currency.pytests/unit/settings/test_backup_subscriber.pysrc/synthorg/client/store.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/communication/event_stream/test_stream_dedup.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pytests/unit/api/controllers/test_backup_required_idempotency.pysrc/synthorg/settings/subscribers/backup_subscriber.pytests/unit/hr/pruning/test_service.pytests/unit/backup/test_scheduler_lifecycle.pysrc/synthorg/observability/events/event_stream.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pysrc/synthorg/providers/health_prober.pysrc/synthorg/api/controllers/backup.pytests/unit/client/test_continuous_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.pytests/integration/api/controllers/test_client_simulation.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/providers/test_health_prober_lifecycle.pysrc/synthorg/backup/service.pysrc/synthorg/tools/mcp/cache.pytests/unit/backup/test_scheduler.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pysrc/synthorg/client/continuous.pytests/unit/api/controllers/test_backup.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/backup/scheduler.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/budget/trends.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every business-logic module must havefrom synthorg.observability import get_loggerthenlogger = get_logger(__name__)at module level. Variable name is alwayslogger. Carve-outs (e.g. fixed-key logging) are documented in the module docstring.
Never useimport logging/logging.getLogger()/print()in application code. Exception:observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.pyfor handler-construction and bootstrap code.
Event names must always import constants fromsynthorg.observability.events.<domain>; never use string literals. See conventions.md §13 for the domain inventory and the split betweenTELEMETRY_*(log events) andTELEMETRY_EVENT_*(payload types).
Always log structured kwargs:logger.info(EVENT, key=value). Never uselogger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising. Never uselogger.exception(EVENT, error=str(exc)),logger.warning(EVENT, error=str(exc)), orlogger.error(EVENT, error=str(exc)); use structured logging witherror_type=type(exc).__name__anderror=safe_error_description(exc)(SEC-1). Keep severity appropriate to context.
Every status-enum transition (including non-terminal hops likePENDING -> RUNNING) must log at INFO using a domain-scoped*_STATUS_TRANSITIONEDconstant carryingfrom_status/to_status/domain id, AFTER the persistence write succeeds.
DEBUG log level is for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.
Every settings read emits one INFOsettings.value.resolvedevent on first cold read per process, carryingsourceandyaml_pathwhen applicable.
All provider calls must go throughBaseCompletionProvider, which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code. ConfigureRetryConfigandRateLimiterConfigper-provider inProviderConfig....
Files:
src/synthorg/api/auth/ticket_store.pysrc/synthorg/client/store.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/observability/events/event_stream.pysrc/synthorg/providers/health_prober.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/backup/service.pysrc/synthorg/tools/mcp/cache.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/client/continuous.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/backup/scheduler.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/budget/trends.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/api/auth/ticket_store.pysrc/synthorg/client/store.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/observability/events/event_stream.pysrc/synthorg/providers/health_prober.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/backup/service.pysrc/synthorg/tools/mcp/cache.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/client/continuous.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/backup/scheduler.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/budget/trends.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: EveryMock()/AsyncMock()/MagicMock()MUST declare the interface viaspec=ConcreteClass(Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen inscripts/mock_spec_baseline.txt. Regenerate baseline only viauv run python scripts/check_mock_spec.py --updateand commit the diff so the change is reviewable.
Shared mocks:tests/conftest.pyexposesmock_dispatcher(anAsyncMock(spec=NotificationDispatcher)); use it instead of building the spec inline.
Time-driven tests: importFakeClockfromtests._shared.fake_clock(NOT from any rollout-subsystem path) and inject it into the class under test. UseFakeClock.sleepwhich advances virtual time AND yields viaasyncio.sleep(0). For tests driving cooperative tasks, useawait clock.advance_async(seconds). FakeClock-first: when the class under test accepts aclock=parameter, always injectFakeClockrather than monkey-patching globals.
Tests must usetest-provider,test-small-001, etc. for vendor-agnostic naming. NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings, comments, or tests.
When Hypothesis finds a failure, read the shrunk example, fix the underlying bug, and add an explicit@example(...)decorator so the case is permanently covered. Flaky tests: NEVER skip, dismiss, or ignore them; always fix them fully and fundamentally. For timing-sensitive tests, use FakeClock-first; patchtime.monotonic()/asyncio.sleep()only for legacy code paths without aClockseam. For tasks that must block indefinitely until cancelled, useasyncio.Event().wait()instead ofasyncio.sleep(large_number).
Coverage: 80% minimum (enforced in CI; benchmarks excluded via--ignore=tests/benchmarks/). Test timeout: 30 seconds per test (global inpyproject.toml). Parallelism:pytest-xdistvia-n 8. ALWAYS include-n 8when running pytest locally, never run tests ...
Files:
tests/unit/tools/mcp/test_cache_threadsafety.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/budget/test_trends_currency.pytests/unit/settings/test_backup_subscriber.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/communication/event_stream/test_stream_dedup.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/hr/pruning/test_service.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/client/test_continuous_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.pytests/integration/api/controllers/test_client_simulation.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/providers/test_health_prober_lifecycle.pytests/unit/backup/test_scheduler.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pytests/unit/api/controllers/test_backup.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/tools/mcp/test_cache_threadsafety.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/budget/test_trends_currency.pytests/unit/settings/test_backup_subscriber.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/communication/event_stream/test_stream_dedup.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/hr/pruning/test_service.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/client/test_continuous_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.pytests/integration/api/controllers/test_client_simulation.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/providers/test_health_prober_lifecycle.pytests/unit/backup/test_scheduler.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pytests/unit/api/controllers/test_backup.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Prefer
@pytest.mark.parametrizefor testing similar cases.
Files:
tests/unit/tools/mcp/test_cache_threadsafety.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/budget/test_trends_currency.pytests/unit/settings/test_backup_subscriber.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/communication/event_stream/test_stream_dedup.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/hr/pruning/test_service.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/client/test_continuous_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/providers/test_health_prober_lifecycle.pytests/unit/backup/test_scheduler.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pytests/unit/api/controllers/test_backup.py
🪛 LanguageTool
docs/design/observability.md
[grammar] ~205-~205: Ensure spelling is correct
Context: ...-window TTL (default 60s). The hub keys dedup on event.id; identical ids within the...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (28)
docs/research/lgpl-postgres-driver-decision.md (1)
1-80: Well-structured ADR and audit traceability are solid.The decision record is clear, scoped, and operationally actionable, with good linkage to findings and supporting references.
docs/licensing.md (1)
112-123: Licensing addendum is clear and consistent with the ADR linkage.The new section cleanly distinguishes SQLite default licensing from optional Postgres LGPL obligations and provides a direct path to the decision record.
src/synthorg/budget/trends.py (2)
216-241: Currency-invariant enforcement looks correct here.Validating the full record set before bucketing prevents a mixed-currency aggregate from being produced, and the new
Raises:docstring matches the behavior.
413-460: Projection logic now respects the same invariant.The mixed-currency check happens before daily spend and exhaustion calculations, so the forecast can no longer conflate currencies. The empty-input fast path still behaves sensibly.
tests/unit/budget/test_trends_currency.py (2)
27-83: Good coverage for the bucketing path.The helper keeps the fixtures concise, and the tests cover single-currency aggregation, mixed-currency rejection, and the empty-input case.
85-108: Good coverage for the projection path.These cases exercise the new currency guard plus the empty-input behavior, which is enough to catch regressions in the new invariant.
tests/unit/integrations/test_replay_protection_threadsafety.py (1)
25-75: Strong concurrency regression coverage.These tests cleanly validate single-accept duplicate handling and bounded cache behavior under concurrent pressure.
src/synthorg/api/auth/ticket_store.py (1)
149-166: Lock scope is correctly applied to all mutation-critical sections.The lock now guards the exact TOCTOU-sensitive blocks (count+insert, pop+consume, snapshot+delete), which aligns with the threadpool-concurrency objective.
Also applies to: 187-189, 221-226
tests/unit/api/auth/test_ticket_store_threadsafety.py (1)
35-106: Great coverage of the thread-safety invariants.The four tests map cleanly to the race fixes (cap enforcement, user isolation, single-consumer guarantee, and mixed-operation stability).
src/synthorg/observability/events/event_stream.py (1)
20-20: Event constant addition looks correct.
EVENT_STREAM_HUB_PUBLISH_DEDUPEDis consistent with the existing event taxonomy and supports the new dedup log path cleanly.docs/design/observability.md (1)
200-206: Event-stream observability docs are clear and aligned.This section documents both failure and dedup outcomes with concrete runtime semantics (TTL + bounded state), which matches the implemented hub behavior.
src/synthorg/communication/event_stream/stream.py (1)
146-207: Dedup lock-scoped check/record flow is solid.The duplicate check + record under a single lock and bounded per-session eviction is a good, race-safe implementation.
tests/unit/communication/event_stream/test_stream_dedup.py (1)
39-139: Test coverage for dedup behavior is thorough.These cases exercise the core invariants (TTL window, per-session isolation, bounded memory, and unsubscribe cleanup) and give good confidence in the new publish semantics.
src/synthorg/tools/mcp/cache.py (1)
8-8: Thread-safety update looks correct.The lock scopes keep
get,put, andinvalidateatomic without holding the mutex across logging/metrics, and the deepcopy-on-return preserves the cache’s immutability contract.Also applies to: 30-37, 58-58, 78-105, 124-142, 154-161
src/synthorg/client/store.py (1)
177-193: Atomic registration logic is solid.This lock-guarded
register_if_absentimplementation correctly prevents duplicate runner claims under concurrency.src/synthorg/api/controllers/simulations.py (1)
292-297: Idempotency gate is placed correctly before runner spawn.Rejecting duplicate claims here prevents double-execution and preserves run integrity.
src/synthorg/communication/conflict_resolution/escalation/factory.py (1)
150-158: Registry dispatch refactor is clean and safer to extend.Centralized factory maps plus explicit unknown-key errors improve maintainability without changing public behavior.
Also applies to: 256-264
tests/integration/api/controllers/test_client_simulation.py (1)
318-347: Good integration coverage for duplicate-start idempotency.This test validates the externally visible
201then409contract for repeatedsimulation_id.docs/design/client-simulation.md (1)
227-237: Idempotency spec update is clear and implementation-aligned.The section accurately documents duplicate
simulation_idconflict semantics and retry behavior.web/src/api/endpoints/clients.ts (1)
267-284: 409 fallback togetSimulation()is a good idempotent client behavior.This keeps retries user-safe while preserving normal error propagation for non-conflict failures.
src/synthorg/backup/service.py (1)
94-97: Async scheduler startup is wired through correctly.Awaiting
self._scheduler.start()here matches the refactored scheduler lifecycle and keeps service startup consistent.tests/unit/backup/test_scheduler.py (1)
36-126: Async start/idempotency coverage looks good.The updated tests now await
start()and verify the no-op second call, which is the right contract for the scheduler rewrite.tests/unit/hr/pruning/test_service.py (1)
767-844: Lifecycle await updates look consistent.These tests now await
service.start()before assertingis_runningand waking the loop, which matches the async lifecycle change.src/synthorg/settings/subscribers/backup_subscriber.py (1)
112-123: Async scheduler startup is wired through correctly.Awaiting
scheduler.start()here keeps the subscriber aligned with the async lifecycle implementation.docs/design/backup.md (1)
29-29: Spec update looks aligned with the controller change.The manual backup section now clearly captures the required
Idempotency-Keybehavior and the 24h retry semantics.docs/reference/lifecycle-sync.md (1)
34-37: Clear in-place runner exception.The new
ContinuousModesection cleanly distinguishes lock scope for in-place loops vs background-task services, which prevents lifecycle pattern misuse.src/synthorg/api/controllers/backup.py (1)
88-107: Idempotency enforcement path looks solid.Required header validation plus unconditional
run_idempotent()with a dedicated in-flight409branch is a good hardening of the backup create flow.Also applies to: 143-159
tests/unit/hr/pruning/test_service_lifecycle.py (1)
45-90: Lifecycle coverage here is strong.These tests exercise the key concurrency and unrestartable-state transitions expected from the lifecycle contract.
…iff) Findings (per the rolling CodeRabbit review on PR #1717): Concurrency: - 5 services: stop no longer reassigns _lifecycle_lock; only the loop-bound stop_event / wake_event is recreated. Replacing the lock let a caller queued on the old lock and a fresh caller acquiring the new lock both proceed concurrently. Affects scheduler.py, monitor.py, health_prober.py, hr/pruning/service.py, and the canonical sweeper.py (same pattern, same bug). - stream.py: validate dedup_ttl_seconds and dedup_max_entries_per_session at construction so the trim loop cannot popitem an empty OrderedDict at publish time. Logging: - ticket_store.py: log API_WS_TICKET_LIMIT_EXCEEDED before raising TicketLimitExceededError so cap rejections appear in audit logs (new event constant added to events/api.py). - ngrok_adapter.py: log TUNNEL_ERROR before raising on duplicate start. Refactors: - replay_protection.py: split check() into check() + _check_nonce() reusing check_freshness() so the public method stays under 50 lines. Docstring: - continuous.py: module docstring now matches implementation (lifecycle lock guards the running-flag transition, not the loop body; stop() does not acquire the lock). Tests: - 6 test files: add spec=ConcreteClass to bare Mock/AsyncMock call sites (test_scheduler_lifecycle, test_monitor_lifecycle, test_backup_subscriber, test_backup_required_idempotency, test_backup, test_simulations_idempotency). assert_awaited_once replaces assert_called_once where applicable. - test_ticket_store_threadsafety.py: threading.Event start gate so worker bodies hit store.create together rather than as the pool fills (tighter contention on the lock). - test_cache_threadsafety.py: seed a shared key so the reader path exercises the locked hit branch (move_to_end + deepcopy). - test_factory_registry.py: cover the unknown-key fallback (both registries) via model_construct bypassing Pydantic literal validation. - test_stream_dedup.py: 2 new tests for unsubscribe dedup-window cleanup (already in the prior PR commit). Frontend: - backup.ts: trim whitespace-only Idempotency-Key inputs and fall through to a fresh UUID; ?? alone would forward an empty string and the server rejects it. 26138 unit tests + web type-check + web lint clean; mypy + ruff clean.
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/communication/event_stream/stream.py (1)
40-44: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winDocument the new public constructor parameters.
The class docstring still only describes
max_queue_size, butEventStreamHubnow publicly exposesdedup_ttl_seconds,dedup_max_entries_per_session, andclock. Please add them to theArgs:block so the API contract stays discoverable.As per coding guidelines, "All public functions and public classes require type hints; mypy strict mode is enforced. Google-style docstrings are required on public classes and functions (ruff D rules)."
Also applies to: 56-63
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/communication/event_stream/stream.py` around lines 40 - 44, Update the EventStreamHub public class docstring Args block to document the newly exposed constructor parameters dedup_ttl_seconds (float|int - TTL for dedup cache in seconds), dedup_max_entries_per_session (int - max dedup entries kept per session), and clock (callable/Clock-like object used for time) alongside the existing max_queue_size entry; ensure the docstring follows Google-style formatting and includes type hints in the descriptions so it satisfies mypy/ruff D rules, and apply the same addition to the secondary docstring region referenced around lines 56-63.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/mock_spec_baseline.txt`:
- Around line 304-326: The PR added new bare Mock/AsyncMock/MagicMock usages in
the backup test modules that were incorrectly added to the frozen mock baseline;
convert each bare double in the affected backup tests to a spec-bound double
(e.g., Mock(spec=MyClass) or AsyncMock(spec=MyAsyncClass) or
MagicMock(spec=MyClass)) by identifying the mocks in the backup test modules and
replacing their constructors, then remove the corresponding new entries from the
mock_spec_baseline so the pre-commit check_mock_spec.py still enforces
spec-bound mocks rather than grandfathering these new sites.
In `@src/synthorg/communication/conflict_resolution/escalation/sweeper.py`:
- Around line 203-216: The stop() implementation should recreate the loop-bound
self._stop_event while holding self._lifecycle_lock to match the canonical
pattern used by PruningService/BackupScheduler: implement a private helper (e.g.
_drain_and_reset()) that is called inside the async with self._lifecycle_lock in
stop(), which drains or signals the existing self._stop_event as needed and then
assigns a fresh asyncio.Event() to self._stop_event before releasing the lock;
keep _lifecycle_lock as the same instance, ensure _run() continues to read
self._stop_event each iteration (no captured reference), and update stop() to
call this helper inside the lock rather than reassigning self._stop_event after
the lock.
In `@src/synthorg/communication/event_stream/stream.py`:
- Around line 165-177: The publish() logic currently calls
_record_published_locked(event, now) before checking subscribers, causing events
to be marked deduped even when there are no live subscribers; change the flow in
publish() so you first (inside the same async with self._lock) obtain
queues_snapshot = list(self._subscribers.get(event.session_id, ())) and if
queues_snapshot is empty return without calling _record_published_locked or
logging EVENT_STREAM_HUB_PUBLISH_DEDUPED; only when queues_snapshot is non-empty
call _record_published_locked(event, now) and then proceed to release the lock
and publish to the queues, ensuring _is_duplicate_locked and
_record_published_locked remain protected by the same lock to avoid races and
orphaned entries in _seen_event_ids and incorrect use of _dedup_ttl_seconds.
In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 81-89: Replace the generic RuntimeError raised when a start is
attempted on an already-active adapter with a domain-specific non-retryable
error: define a new TunnelAlreadyActiveError that inherits from the appropriate
domain base (e.g., IntegrationError or a domain-specific DomainError) and raise
that instead of RuntimeError in the block that checks self._tunnel (the branch
that logs TUNNEL_ERROR and sets msg "ngrok tunnel already active on this
adapter"); ensure the new class name follows the <Domain><Condition>Error
pattern and does not inherit from TunnelError (which is retryable).
- Around line 90-95: The code currently mutates pyngrok's global config via
conf.get_default().auth_token which can leak/stomp tokens across NgrokAdapter
instances; instead, read the token from self._auth_token_env, build a
conf.PyngrokConfig instance (using the token only when non-empty) and pass that
instance to ngrok.connect as the pyngrok_config argument (call site:
ngrok.connect in NgrokAdapter), removing any assignment to
conf.get_default().auth_token so tokens remain instance-local and blank tokens
do not reuse stale global credentials.
In `@src/synthorg/integrations/webhooks/replay_protection.py`:
- Around line 9-11: Docstring currently references hardcoded line numbers (e.g.,
"192" and "199") which will drift; edit the module docstring in
src/synthorg/integrations/webhooks/replay_protection.py to remove those numeric
citations and instead describe the race condition generically (mention the
duplicate check and the insertion of the nonce/record such that simultaneous
deliveries can both see the nonce as fresh and proceed). Keep the race-condition
explanation and any references to the nonce/duplicate check logic but do not
include specific line numbers.
In `@src/synthorg/meta/chief_of_staff/monitor.py`:
- Around line 152-158: The bug is replacing self._stop_event while still holding
self._lifecycle_lock which lets a racing start() create a monitor task bound to
the old event and then stop() swap the event so the task never sees the signal;
fix stop() so you never assign to self._stop_event while the lifecycle lock is
held: signal/await the current event and shut down tasks while holding
self._lifecycle_lock, release the lock, then create/assign a fresh
asyncio.Event() to self._stop_event (or create the new event beforehand and only
assign it after releasing the lock); update the stop() implementation and any
related shutdown sequence that references _stop_event, _lifecycle_lock, start(),
and the monitor task to follow this ordering.
In `@src/synthorg/providers/health_prober.py`:
- Around line 264-271: The _stop_event is being recreated after releasing
self._lifecycle_lock which allows a racing start() to spawn tasks waiting on the
old event; move the creation of self._stop_event = asyncio.Event() so it happens
while holding self._lifecycle_lock (i.e. before releasing the lock in stop()),
ensuring the same self._lifecycle_lock instance is kept and only the event
object is swapped while still under the lock so concurrent start()/stop() cannot
race with probe tasks waiting on the old event.
In `@tests/unit/api/controllers/test_backup_required_idempotency.py`:
- Around line 75-101: The test currently doesn't verify that the callback
actually calls BackupService.create_backup; update
test_handler_invokes_idempotency_service to inject a mocked backup_service with
create_backup set to an AsyncMock (or MagicMock) and inside
fake_run_idempotent’s callback await that mock via the controller path, then
after calling ctrl.create_backup.fn assert the mock was awaited exactly once
(e.g. mock_create_backup.assert_awaited_once() or assert
mock_create_backup.await_count == 1) to ensure the callback wiring invokes
BackupService.create_backup; reference BackupController.create_backup.fn,
fake_run_idempotent, and BackupService.create_backup when making the changes.
In `@tests/unit/api/controllers/test_simulations_idempotency.py`:
- Around line 105-109: The test currently uses contextlib.suppress(Exception)
which swallows all downstream failures; replace that with a targeted fix: either
stub/mock the runner-spawning collaborator so the handler completes without
raising (e.g., mock the method that performs spawn — spawn_runner /
SimulationRunner.start / RunnerManager.start depending on which symbol exists —
to return successfully), or narrow the suppression to the single expected
exception type (e.g., contextlib.suppress(RunnerSpawnError)) and import that
specific exception class; remove broad Exception suppression and ensure the test
only tolerates the known downstream error or fully fakes the collaborator so no
error is raised.
---
Outside diff comments:
In `@src/synthorg/communication/event_stream/stream.py`:
- Around line 40-44: Update the EventStreamHub public class docstring Args block
to document the newly exposed constructor parameters dedup_ttl_seconds
(float|int - TTL for dedup cache in seconds), dedup_max_entries_per_session (int
- max dedup entries kept per session), and clock (callable/Clock-like object
used for time) alongside the existing max_queue_size entry; ensure the docstring
follows Google-style formatting and includes type hints in the descriptions so
it satisfies mypy/ruff D rules, and apply the same addition to the secondary
docstring region referenced around lines 56-63.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 823be26c-de35-417a-ac70-d22ce689ad97
📒 Files selected for processing (22)
scripts/mock_spec_baseline.txtsrc/synthorg/api/auth/ticket_store.pysrc/synthorg/backup/scheduler.pysrc/synthorg/client/continuous.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/observability/events/api.pysrc/synthorg/providers/health_prober.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/api/controllers/test_backup.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/settings/test_backup_subscriber.pytests/unit/tools/mcp/test_cache_threadsafety.pyweb/src/api/endpoints/backup.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Lighthouse Site
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: Test (Python 3.14)
- GitHub Check: Dashboard Type Check
- GitHub Check: Dashboard Test
- GitHub Check: Type Check
- GitHub Check: Build Preview
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (10)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always read the relevantdocs/design/page before implementing any feature or planning any issue. The design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec.
Present every implementation plan to the user for accept/deny before coding starts. At every phase of planning and implementation, actively look for ways to improve the design; surface improvements as suggestions, not silent changes.
Comments explain WHY only, never origin/review/issue context. Forbidden: reviewer citations, in-code issue/PR back-references, cryptic internal-taxonomy shorthand, migration/rebrand framing, round/iteration narrative, self-evident restatements. What stays: hidden constraints, subtle invariants, workarounds for specific upstream bugs (with stable bug-tracker URL), and why a non-obvious choice was made.
Nofrom __future__ import annotations; Python 3.14 has PEP 649.
Use PEP 758 except syntax:except A, B:(no parens) when not binding to a name;as excrequires parens (except (A, B) as exc:). Enforced by ruff on 3.14.
All public functions and public classes require type hints; mypy strict mode is enforced. Google-style docstrings are required on public classes and functions (ruff D rules).
Create new objects, never mutate existing ones. Use frozen Pydantic models for config/identity; for non-Pydantic registries usecopy.deepcopy()at construction and wrap withMappingProxyType; deepcopy at system boundaries (tool execution, provider serialization, persistence).
Distinguish config vs runtime state: frozen models for config/identity; separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.
Every Pydantic model defaults toConfigDict(frozen=True, allow_inf_nan=False)unless documented otherwise; mutations go thro...
Files:
tests/unit/settings/test_backup_subscriber.pytests/unit/api/auth/test_ticket_store_threadsafety.pysrc/synthorg/observability/events/api.pytests/unit/tools/mcp/test_cache_threadsafety.pysrc/synthorg/client/continuous.pysrc/synthorg/providers/health_prober.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/api/controllers/test_backup.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pytests/unit/backup/test_scheduler_lifecycle.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/backup/scheduler.pysrc/synthorg/api/auth/ticket_store.pysrc/synthorg/integrations/webhooks/replay_protection.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/api/controllers/test_backup_required_idempotency.pysrc/synthorg/hr/pruning/service.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: EveryMock()/AsyncMock()/MagicMock()MUST declare the interface viaspec=ConcreteClass(Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen inscripts/mock_spec_baseline.txt. Regenerate baseline only viauv run python scripts/check_mock_spec.py --updateand commit the diff so the change is reviewable.
Shared mocks:tests/conftest.pyexposesmock_dispatcher(anAsyncMock(spec=NotificationDispatcher)); use it instead of building the spec inline.
Time-driven tests: importFakeClockfromtests._shared.fake_clock(NOT from any rollout-subsystem path) and inject it into the class under test. UseFakeClock.sleepwhich advances virtual time AND yields viaasyncio.sleep(0). For tests driving cooperative tasks, useawait clock.advance_async(seconds). FakeClock-first: when the class under test accepts aclock=parameter, always injectFakeClockrather than monkey-patching globals.
Tests must usetest-provider,test-small-001, etc. for vendor-agnostic naming. NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings, comments, or tests.
When Hypothesis finds a failure, read the shrunk example, fix the underlying bug, and add an explicit@example(...)decorator so the case is permanently covered. Flaky tests: NEVER skip, dismiss, or ignore them; always fix them fully and fundamentally. For timing-sensitive tests, use FakeClock-first; patchtime.monotonic()/asyncio.sleep()only for legacy code paths without aClockseam. For tasks that must block indefinitely until cancelled, useasyncio.Event().wait()instead ofasyncio.sleep(large_number).
Coverage: 80% minimum (enforced in CI; benchmarks excluded via--ignore=tests/benchmarks/). Test timeout: 30 seconds per test (global inpyproject.toml). Parallelism:pytest-xdistvia-n 8. ALWAYS include-n 8when running pytest locally, never run tests ...
Files:
tests/unit/settings/test_backup_subscriber.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/tools/mcp/test_cache_threadsafety.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/api/controllers/test_backup.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/api/controllers/test_backup_required_idempotency.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/settings/test_backup_subscriber.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/tools/mcp/test_cache_threadsafety.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/api/controllers/test_backup.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/api/controllers/test_backup_required_idempotency.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Prefer
@pytest.mark.parametrizefor testing similar cases.
Files:
tests/unit/settings/test_backup_subscriber.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/tools/mcp/test_cache_threadsafety.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/api/controllers/test_backup.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/api/controllers/test_backup_required_idempotency.py
web/src/**/!(logger).ts?(x)
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/!(logger).ts?(x): Always usecreateLoggerfrom@/lib/loggerinstead of bareconsole.warn,console.error, orconsole.debugin application code; use variable namelog
Pass dynamic/untrusted values as separate args to log methods, not interpolated into the message string, so they go throughsanitizeArg
Wrap attacker-controlled fields inside structured objects withsanitizeForLog()before embedding in logs
Files:
web/src/api/endpoints/backup.ts
web/src/**/*.ts?(x)
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.ts?(x): Callers MUST NOT wrap store mutation calls intry/catch; the store owns the error UX
NEVER hardcode Motion transition durations; use presets from@/lib/motion
NEVER hardcode BCP 47 locale literals ('en-US'); use helpers from@/utils/formatand design tokens
NEVER hardcode currency symbols or codes; useDEFAULT_CURRENCYfrom@/utils/currencies
Files:
web/src/api/endpoints/backup.ts
web/src/api/**/*.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
Use
getLiveness()for 200 always-alive checks andgetReadiness()for binary'ok' | 'unavailable'health checks (no tri-state); any new caller must handle the 503 path explicitly
Files:
web/src/api/endpoints/backup.ts
web/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
web/src/**/*.{ts,tsx}: Reuse components fromweb/src/components/ui/before creating new ones. Never hardcode hex colors, font-family, pixel spacing, Motion transitions, or BCP 47 locale strings; use design tokens,@/lib/motionpresets, and the helpers in@/utils/format. Enforced byscripts/check_web_design_system.py(PostToolUse hook on everyweb/src/edit). Seeweb/CLAUDE.mdfor the component inventory, token rules, and post-training references (TS6, Storybook 10).
Never hardcode BCP 47 locale strings or call bare.toLocaleString()/.toLocaleDateString()/.toLocaleTimeString(). Use helpers in@/utils/formatwhich readgetLocale()from@/utils/locale.
Currency: never hardcode ISO 4217 codes or symbols. Frontend:DEFAULT_CURRENCYfrom@/utils/currenciesoruseSettingsStore().currency. Field naming: no_usdsuffix on money fields anywhere.
Timezone: store UTC; render viaIntlwithout passingtimeZone(browser tz wins). Date / number format: always viaIntl; no hand-rolled templates. Units: metric only.
Files:
web/src/api/endpoints/backup.ts
web/**/*.{ts,tsx,js,json}
📄 CodeRabbit inference engine (CLAUDE.md)
Web Dashboard: see
web/CLAUDE.mdfor commands, design system, and component inventory. The CI-matching full-suite leak check isnpm --prefix web run test -- --coverage --detect-async-leaks(theLeaks N leakssummary must stay at or below theMAX_ASYNC_LEAKSceiling in.github/workflows/ci.yml; any new store that schedules timers must expose a teardown hook perweb/CLAUDE.md).
Files:
web/src/api/endpoints/backup.ts
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every business-logic module must havefrom synthorg.observability import get_loggerthenlogger = get_logger(__name__)at module level. Variable name is alwayslogger. Carve-outs (e.g. fixed-key logging) are documented in the module docstring.
Never useimport logging/logging.getLogger()/print()in application code. Exception:observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.pyfor handler-construction and bootstrap code.
Event names must always import constants fromsynthorg.observability.events.<domain>; never use string literals. See conventions.md §13 for the domain inventory and the split betweenTELEMETRY_*(log events) andTELEMETRY_EVENT_*(payload types).
Always log structured kwargs:logger.info(EVENT, key=value). Never uselogger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising. Never uselogger.exception(EVENT, error=str(exc)),logger.warning(EVENT, error=str(exc)), orlogger.error(EVENT, error=str(exc)); use structured logging witherror_type=type(exc).__name__anderror=safe_error_description(exc)(SEC-1). Keep severity appropriate to context.
Every status-enum transition (including non-terminal hops likePENDING -> RUNNING) must log at INFO using a domain-scoped*_STATUS_TRANSITIONEDconstant carryingfrom_status/to_status/domain id, AFTER the persistence write succeeds.
DEBUG log level is for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.
Every settings read emits one INFOsettings.value.resolvedevent on first cold read per process, carryingsourceandyaml_pathwhen applicable.
All provider calls must go throughBaseCompletionProvider, which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code. ConfigureRetryConfigandRateLimiterConfigper-provider inProviderConfig....
Files:
src/synthorg/observability/events/api.pysrc/synthorg/client/continuous.pysrc/synthorg/providers/health_prober.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/backup/scheduler.pysrc/synthorg/api/auth/ticket_store.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/hr/pruning/service.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/events/api.pysrc/synthorg/client/continuous.pysrc/synthorg/providers/health_prober.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/backup/scheduler.pysrc/synthorg/api/auth/ticket_store.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/hr/pruning/service.py
🔇 Additional comments (31)
src/synthorg/observability/events/api.py (1)
80-80: Event constant addition is consistent and correctly scoped.
API_WS_TICKET_LIMIT_EXCEEDEDfollows the existing API event naming/value convention and matches the emitted event insrc/synthorg/api/auth/ticket_store.py(Line 141-181).tests/unit/tools/mcp/test_cache_threadsafety.py (3)
22-48: Good fix: this now truly exercises the locked cache-hit path under contention.Seeding and repeatedly reading the shared key makes this test validate the exact
getpath changed in the cache implementation.
49-68: Concurrency coverage forinvalidate()vsput()looks solid.Collecting all futures and calling
result()ensures threadpool exceptions are surfaced instead of being silently ignored.
69-80: Eviction-bound assertion under concurrent writes is well targeted.This directly checks the LRU capacity invariant remains enforced under load.
src/synthorg/api/auth/ticket_store.py (3)
150-174: Atomic create path now correctly closes the cap-check TOCTOU window.Great fix: Line 150 through Line 174 keeps count, limit check, and insert in one
critical section, and logs the rejection path before raising.
195-197: Single-consumer guarantee is properly enforced.Locking around
dict.pop()on Line 195 through Line 197 gives clean
single-winner semantics for concurrentvalidate_and_consume()calls.
229-239: Cleanup pass is safely synchronized and logs consistent counts.Holding the lock for snapshot/delete and computing
remaininginside the same
critical section avoids iteration/mutation races and inconsistent telemetry.tests/unit/api/auth/test_ticket_store_threadsafety.py (2)
40-57: Concurrency harness is strong and deterministic.Using a shared start gate before releasing worker threads is a solid improvement
for reproducible contention across all four threadpool scenarios.Also applies to: 66-79, 89-99, 107-121
104-124: Mixed create/cleanup stress test is valuable coverage for dict safety.This case directly exercises the historical corruption surface under concurrent
mutation and cleanup and is well-targeted for regression protection.src/synthorg/client/continuous.py (5)
1-21: LGTM — docstring now accurately reflects locking behavior.The updated module documentation correctly describes that
_lifecycle_lockis held briefly for_runningflag transitions (not the full loop body), and thatstop()is synchronous without lock acquisition. This resolves the prior review concern about docstring/implementation mismatch.
66-71: LGTM — lifecycle lock properly named and documented.The comment explains the naming rationale (uniformity with other services per lifecycle reference doc), and the lock is correctly initialized as an
asyncio.Lock().
98-113: LGTM — brief lock acquisition correctly gates_runningtransition.The lock is held only to atomically check and set
_running, preventing concurrentstart()calls from racing. The explanatory comment clearly articulates why holding the lock for the full loop would cause deadlock.
133-136: LGTM — finally block correctly clears_runningunder lock.Re-acquiring
_lifecycle_lockin thefinallyensures_runningis cleared atomically even if an exception or cancellation occurs mid-loop, preventing stale state from blocking futurestart()calls.
138-147: LGTM — synchronousstop()correctly signals without lock contention.The docstring accurately explains the design:
stop()only sets the event, leaving the runningstart()coroutine to observe it on the next loop iteration. Not acquiring the lock here is correct since the lock guards only_runningflag transitions.src/synthorg/integrations/webhooks/replay_protection.py (6)
14-46: LGTM!Imports are appropriate, constants are well-documented with security rationale, and
_fingerprint_noncecorrectly bounds cache key size using SHA-256.
70-92: LGTM!Constructor correctly validates configuration upfront (preventing silent misconfiguration), initializes the threading.Lock for thread safety, and follows the Clock injection pattern for testability.
94-129: LGTM!Method correctly handles the None case for callers delegating dedup to durable stores, rejects non-finite timestamps, and reads the clock once to ensure the comparison and logged skew agree.
131-166: LGTM!The fail-closed behavior when both nonce and timestamp are missing is correct for security. The clock is intentionally read twice—once for timestamp validation, once for nonce tracking—since these serve different purposes and the small delta is negligible relative to the 300s default window.
168-215: LGTM!Thread-safety pattern is correct: all mutations to
_seenoccur under the lock, eliminating the TOCTOU race. Logging outside the lock minimizes hold time while the capturedduplicateboolean remains valid. Oversized nonce rejection before hashing is good DoS mitigation.
217-229: LGTM!Method correctly documents the caller lock requirement, leverages OrderedDict insertion order for efficient oldest-first eviction, and stops early since insertions are monotonically ordered by time.
src/synthorg/communication/conflict_resolution/escalation/sweeper.py (1)
209-214: Lock preservation invariant is correctly documented.The comment clearly articulates the critical invariant:
_lifecycle_lockmust remain the same instance for the service's lifetime. Replacing it would allow concurrent callers—one queued on the old lock, one acquiring the new—to proceed simultaneously, breaking start/stop serialization. This matches the canonical pattern inPruningService(context snippet 1, lines 211–214) andBackupScheduler(context snippet 2, lines 130–133).src/synthorg/hr/pruning/service.py (2)
97-104: Lifecycle lock correctly preserved across service lifetime.The past review concern has been addressed:
_lifecycle_lockremains the same instance for the service's lifetime (lines 211-214 explicitly document this constraint), while only the loop-bound_stop_eventand_wake_eventare recreated after releasing the lock to support rebinding on a different event loop.Also applies to: 209-216
747-783: Clean cooperative shutdown pattern.The loop correctly honors
_stop_eventfor graceful shutdown, re-checks the stop flag after waking (line 765-766), and properly propagatesCancelledErrorto support task cancellation.src/synthorg/backup/scheduler.py (2)
128-135: Lifecycle lock correctly preserved.The past review concern has been addressed:
_lifecycle_lockremains the same instance (documented in lines 130-133), while only the loop-bound events are recreated for event-loop rebinding.
47-78: All BackupScheduler.start() callers correctly use await.Verification confirms all five call sites of
BackupScheduler.start()in production and test code useawait:backup/service.py:97,backup_subscriber.py:113, and three test contexts intest_scheduler_lifecycle.pyandtest_scheduler.py. The async transition is complete.tests/unit/settings/test_backup_subscriber.py (2)
33-42: LGTM!All mocks are now spec-bound (
spec=BackupScheduler,spec=BackupService,spec=SettingsService), and async methods correctly useAsyncMock(spec=...). This addresses the past review concern and enforces interface correctness.
101-106: Correct use ofassert_awaited_once().Using
assert_awaited_once()instead ofassert_called_once()correctly catches the regression where a coroutine is created but never awaited, as documented in the comment.tests/unit/backup/test_scheduler_lifecycle.py (2)
21-24: Spec-bound mocks correctly address past review concern.The helper now uses
MagicMock(spec=BackupService)andAsyncMock(spec=BackupService.create_backup), enforcing interface correctness per coding guidelines.
30-71: Comprehensive lifecycle-lock coverage.The tests correctly verify:
- Concurrent
start()calls spawn only one task (idempotency under contention)- Clean restart works after a successful
stop()- A drain timeout marks the scheduler unrestartable, and subsequent
start()raisesRuntimeErrorThe hung-loop simulation (lines 55-60) properly ignores
CancelledErrorto force the timeout path.tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py (2)
34-38: Spec-bound mock for the protocol, return value acceptable.The
backendmock correctly usesspec=PersistenceBackend. Thebuild_escalationsreturn value is a bareMagicMock(), but this is acceptable since tests only assert on call arguments, not the return value's interface.
107-128: Good defensive testing viamodel_constructbypass.Using
model_construct()to bypass Pydantic's literal validation is the right approach to verify the factory's defensiveValueErrorpath. This ensures operators see a helpful error message if an invalid backend somehow reaches the factory.
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/integrations/tunnel/ngrok_adapter.py (1)
75-82:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate the
Raisesdocstring to match actual exception type.Line 80 still lists
RuntimeError, but Line 92 raisesTunnelAlreadyActiveError. Please update the docstring to keep API expectations accurate.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/integrations/tunnel/ngrok_adapter.py` around lines 75 - 82, The docstring's Raises section is inaccurate: replace the listed RuntimeError with TunnelAlreadyActiveError to match the actual exception raised; update the Raises block in the docstring for the method that starts the ngrok tunnel (the docstring surrounding the start/start_tunnel method in ngrok_adapter.py) so it lists TunnelAlreadyActiveError (and keeps TunnelError for other failures) and ensure the exception name matches the custom exception class TunnelAlreadyActiveError used in the code.
♻️ Duplicate comments (1)
scripts/mock_spec_baseline.txt (1)
304-305:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove new baseline suppressions for touched tests.
Line 304, Line 305, and Line 395 should not be grandfathered here; these suppressions weaken enforcement for tests modified in this PR. Fix the bare-mock sites in those tests and drop these new baseline entries.
♻️ Suggested patch
-tests/unit/api/controllers/test_backup.py:112:12 -tests/unit/api/controllers/test_backup_required_idempotency.py:72:12 ... -tests/unit/api/controllers/test_simulations_idempotency.py:69:12As per coding guidelines, "Every
Mock()/AsyncMock()/MagicMock()MUST declare the interface viaspec=ConcreteClass... pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen inscripts/mock_spec_baseline.txt."Also applies to: 395-395
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/mock_spec_baseline.txt` around lines 304 - 305, The new baseline entries were added to whitelist bare Mock()/AsyncMock()/MagicMock() calls for tests modified in this PR; instead, update the tests tests/unit/api/controllers/test_backup.py and tests/unit/api/controllers/test_backup_required_idempotency.py to replace bare Mock()/AsyncMock()/MagicMock() usages with mocks that declare the concrete interface via spec=ConcreteClass (or spec_set) for the object being mocked (e.g., Mock(spec=MyClass) or AsyncMock(spec=MyAsyncClass)), run the pre-commit checker (scripts/check_mock_spec.py) to ensure no bare-call sites remain, and then remove the corresponding lines (304, 305 and the other added line around 395) from scripts/mock_spec_baseline.txt so no new suppressions are introduced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/communication/event_stream/stream.py`:
- Around line 222-227: The TTL eviction loop treats dedup_ttl_seconds == 0 as
immediate expiry; change the logic in the eviction block so that TTL eviction
only runs when self._dedup_ttl_seconds > 0: wrap the computation of cutoff = now
- self._dedup_ttl_seconds and the while seen: ... eviction loop in a conditional
(e.g., if self._dedup_ttl_seconds > 0:) so entries in seen (the dedup map used
for event.id) are not removed when TTL is disabled; keep the existing eviction
behavior unchanged for positive TTLs.
In `@src/synthorg/integrations/errors.py`:
- Around line 195-206: TunnelAlreadyActiveError currently inherits
IntegrationError defaults (502/INTEGRATION_ERROR) but represents a conflict;
update the class to override the integration error metadata to reflect an HTTP
409 conflict and a conflict-style error code. In the TunnelAlreadyActiveError
class (subclassing IntegrationError) add/override the appropriate metadata
attributes (e.g. http status/status_code = 409 and an error code like code =
"ALREADY_ACTIVE" or code = "CONFLICT") so the API surfaces a 409-style domain
error instead of the 502 integration default.
In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 84-92: The new start() raises TunnelAlreadyActiveError when
self._tunnel is set, breaking the tunnel facade's expectation of idempotent
start/connect; modify ngrok_adapter.start() (the block checking self._tunnel) to
preserve idempotency: keep the warning log (TUNNEL_ERROR, phase="start",
reason="already_active", port=self._port) but do not raise
TunnelAlreadyActiveError—simply return the existing tunnel (or None if that
matches the adapter API) so repeated calls behave like a no-op and align with
mcp_service.connect()'s expected reconnect semantics.
In `@tests/unit/api/controllers/test_backup.py`:
- Around line 107-114: The comment explains why
SimpleNamespace(app_state=app_state) is used as the test state carrier; rewrite
it to focus on the technical rationale only: state.app_state must return the
assigned AppState object reliably, and SimpleNamespace has no auto-mocking
behavior so it satisfies that; MagicMock(spec=State) could trigger
State.__getattr__ and return an unintended mock, and a plain MagicMock() would
also be inappropriate—update the comment near the return
SimpleNamespace(app_state=app_state), service to this concise rationale without
any mention of gates, review processes, or internal taxonomy.
In `@tests/unit/api/controllers/test_simulations_idempotency.py`:
- Around line 64-69: The comment block around the SimpleNamespace sentinel
contains review-history/internal-gate phrasing; replace it with a concise
behavior-focused explanation: state that SimpleNamespace is used as the sentinel
because it returns the assigned object for state.app_state without Litestar's
State.__getattr__ interception (unlike MagicMock(spec=State)) and avoids
bare-mock issues present with MagicMock(); remove any wording about internal
taxonomy, review history, or gates. Ensure the same change is applied to the
similar comment at lines referencing the same rationale (the block around
SimpleNamespace and the mention of MagicMock/spec).
In
`@tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py`:
- Around line 62-83: Add a unit test that exercises the "auto" branch of
cross_instance_notify: create an EscalationQueueConfig with backend="postgres",
cross_instance_notify="auto", and notify_channel="escalations", use
_fake_persistence("postgres"), call build_escalation_queue_store(config,
backend), and assert
backend.build_escalations.assert_called_once_with(notify_channel="escalations")
to ensure the "auto" path behaves the same as "on".
- Around line 124-135: The tests test_unknown_queue_backend_raises_value_error
and test_unknown_decision_strategy_raises_value_error currently only assert that
a ValueError is raised; update them to also assert that the exception message
includes the available registry names so callers know which backends/strategies
are registered. Specifically, when calling build_escalation_queue_store with
EscalationQueueConfig.model_construct(backend="unknown"), capture the raised
ValueError and assert its string contains the registry keys (e.g., from the
escalation queue registry used by build_escalation_queue_store); do the same for
build_decision_processor with decision_strategy="unknown" and assert the error
message includes the decision processor registry keys. Ensure you reference the
same registry identifiers used by those factory functions when building the
expected substring.
---
Outside diff comments:
In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 75-82: The docstring's Raises section is inaccurate: replace the
listed RuntimeError with TunnelAlreadyActiveError to match the actual exception
raised; update the Raises block in the docstring for the method that starts the
ngrok tunnel (the docstring surrounding the start/start_tunnel method in
ngrok_adapter.py) so it lists TunnelAlreadyActiveError (and keeps TunnelError
for other failures) and ensure the exception name matches the custom exception
class TunnelAlreadyActiveError used in the code.
---
Duplicate comments:
In `@scripts/mock_spec_baseline.txt`:
- Around line 304-305: The new baseline entries were added to whitelist bare
Mock()/AsyncMock()/MagicMock() calls for tests modified in this PR; instead,
update the tests tests/unit/api/controllers/test_backup.py and
tests/unit/api/controllers/test_backup_required_idempotency.py to replace bare
Mock()/AsyncMock()/MagicMock() usages with mocks that declare the concrete
interface via spec=ConcreteClass (or spec_set) for the object being mocked
(e.g., Mock(spec=MyClass) or AsyncMock(spec=MyAsyncClass)), run the pre-commit
checker (scripts/check_mock_spec.py) to ensure no bare-call sites remain, and
then remove the corresponding lines (304, 305 and the other added line around
395) from scripts/mock_spec_baseline.txt so no new suppressions are introduced.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6caec15d-c417-426e-b6eb-583bd3f935c5
📒 Files selected for processing (16)
scripts/mock_spec_baseline.txtsrc/synthorg/backup/scheduler.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/integrations/errors.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/providers/health_prober.pytests/unit/api/controllers/test_backup.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pytests/unit/settings/test_backup_subscriber.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Deploy Preview
- GitHub Check: Build Backend
- GitHub Check: Lighthouse Site
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Dashboard Test
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always read the relevantdocs/design/page before implementing any feature or planning any issue. The design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec.
Present every implementation plan to the user for accept/deny before coding starts. At every phase of planning and implementation, actively look for ways to improve the design; surface improvements as suggestions, not silent changes.
Comments explain WHY only, never origin/review/issue context. Forbidden: reviewer citations, in-code issue/PR back-references, cryptic internal-taxonomy shorthand, migration/rebrand framing, round/iteration narrative, self-evident restatements. What stays: hidden constraints, subtle invariants, workarounds for specific upstream bugs (with stable bug-tracker URL), and why a non-obvious choice was made.
Nofrom __future__ import annotations; Python 3.14 has PEP 649.
Use PEP 758 except syntax:except A, B:(no parens) when not binding to a name;as excrequires parens (except (A, B) as exc:). Enforced by ruff on 3.14.
All public functions and public classes require type hints; mypy strict mode is enforced. Google-style docstrings are required on public classes and functions (ruff D rules).
Create new objects, never mutate existing ones. Use frozen Pydantic models for config/identity; for non-Pydantic registries usecopy.deepcopy()at construction and wrap withMappingProxyType; deepcopy at system boundaries (tool execution, provider serialization, persistence).
Distinguish config vs runtime state: frozen models for config/identity; separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.
Every Pydantic model defaults toConfigDict(frozen=True, allow_inf_nan=False)unless documented otherwise; mutations go thro...
Files:
src/synthorg/communication/conflict_resolution/escalation/sweeper.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/settings/test_backup_subscriber.pysrc/synthorg/integrations/errors.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pysrc/synthorg/integrations/webhooks/replay_protection.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/api/controllers/test_backup_required_idempotency.pysrc/synthorg/backup/scheduler.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/providers/health_prober.pytests/unit/api/controllers/test_backup.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every business-logic module must havefrom synthorg.observability import get_loggerthenlogger = get_logger(__name__)at module level. Variable name is alwayslogger. Carve-outs (e.g. fixed-key logging) are documented in the module docstring.
Never useimport logging/logging.getLogger()/print()in application code. Exception:observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.pyfor handler-construction and bootstrap code.
Event names must always import constants fromsynthorg.observability.events.<domain>; never use string literals. See conventions.md §13 for the domain inventory and the split betweenTELEMETRY_*(log events) andTELEMETRY_EVENT_*(payload types).
Always log structured kwargs:logger.info(EVENT, key=value). Never uselogger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising. Never uselogger.exception(EVENT, error=str(exc)),logger.warning(EVENT, error=str(exc)), orlogger.error(EVENT, error=str(exc)); use structured logging witherror_type=type(exc).__name__anderror=safe_error_description(exc)(SEC-1). Keep severity appropriate to context.
Every status-enum transition (including non-terminal hops likePENDING -> RUNNING) must log at INFO using a domain-scoped*_STATUS_TRANSITIONEDconstant carryingfrom_status/to_status/domain id, AFTER the persistence write succeeds.
DEBUG log level is for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.
Every settings read emits one INFOsettings.value.resolvedevent on first cold read per process, carryingsourceandyaml_pathwhen applicable.
All provider calls must go throughBaseCompletionProvider, which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code. ConfigureRetryConfigandRateLimiterConfigper-provider inProviderConfig....
Files:
src/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/integrations/errors.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/backup/scheduler.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/providers/health_prober.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/integrations/errors.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/backup/scheduler.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/providers/health_prober.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: EveryMock()/AsyncMock()/MagicMock()MUST declare the interface viaspec=ConcreteClass(Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen inscripts/mock_spec_baseline.txt. Regenerate baseline only viauv run python scripts/check_mock_spec.py --updateand commit the diff so the change is reviewable.
Shared mocks:tests/conftest.pyexposesmock_dispatcher(anAsyncMock(spec=NotificationDispatcher)); use it instead of building the spec inline.
Time-driven tests: importFakeClockfromtests._shared.fake_clock(NOT from any rollout-subsystem path) and inject it into the class under test. UseFakeClock.sleepwhich advances virtual time AND yields viaasyncio.sleep(0). For tests driving cooperative tasks, useawait clock.advance_async(seconds). FakeClock-first: when the class under test accepts aclock=parameter, always injectFakeClockrather than monkey-patching globals.
Tests must usetest-provider,test-small-001, etc. for vendor-agnostic naming. NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings, comments, or tests.
When Hypothesis finds a failure, read the shrunk example, fix the underlying bug, and add an explicit@example(...)decorator so the case is permanently covered. Flaky tests: NEVER skip, dismiss, or ignore them; always fix them fully and fundamentally. For timing-sensitive tests, use FakeClock-first; patchtime.monotonic()/asyncio.sleep()only for legacy code paths without aClockseam. For tasks that must block indefinitely until cancelled, useasyncio.Event().wait()instead ofasyncio.sleep(large_number).
Coverage: 80% minimum (enforced in CI; benchmarks excluded via--ignore=tests/benchmarks/). Test timeout: 30 seconds per test (global inpyproject.toml). Parallelism:pytest-xdistvia-n 8. ALWAYS include-n 8when running pytest locally, never run tests ...
Files:
tests/unit/api/controllers/test_simulations_idempotency.pytests/unit/settings/test_backup_subscriber.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/api/controllers/test_backup.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/api/controllers/test_simulations_idempotency.pytests/unit/settings/test_backup_subscriber.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/api/controllers/test_backup.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Prefer
@pytest.mark.parametrizefor testing similar cases.
Files:
tests/unit/api/controllers/test_simulations_idempotency.pytests/unit/settings/test_backup_subscriber.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/api/controllers/test_backup.py
🔇 Additional comments (8)
src/synthorg/integrations/webhooks/replay_protection.py (1)
92-93: Thread-safe replay check refactor looks solid.Locking the nonce mutation path and separating freshness from nonce dedup removes the race window while keeping fail-closed behavior explicit when both signals are missing.
Also applies to: 157-167, 168-215
tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py (2)
37-60: Helper and primary queue-store dispatch tests look solid.The mocked backend is strict enough to catch call-shape drift, and the memory/sqlite coverage matches the factory branches.
103-111: Decision-processor dispatch coverage looks good.These assertions line up with the registry-backed factory behavior and should catch accidental dispatcher regressions.
src/synthorg/communication/conflict_resolution/escalation/sweeper.py (1)
202-214: Canonical lifecycle race fix looks correct.Recreating
self._stop_eventinside the lifecycle lock matches the intended pattern and prevents start/stop event-binding races.src/synthorg/providers/health_prober.py (1)
185-215: Lifecycle refactor is consistent and race-safe.The lock-guarded async
start()/stop(), timeout-to-unrestartable behavior, and in-lock_stop_eventswap are correctly implemented.Also applies to: 218-278
src/synthorg/meta/chief_of_staff/monitor.py (1)
73-105: Monitor lifecycle changes look solid.The service now follows the same safe async lifecycle contract (serialized start/stop, hard drain deadline, cooperative loop wake/exit).
Also applies to: 107-160, 163-187
src/synthorg/hr/pruning/service.py (1)
135-163: PruningService lifecycle conversion is well implemented.Async
start()/stop()serialization, timeout drain behavior, and cooperative loop shutdown are all aligned with the intended contract.Also applies to: 165-218, 750-785
tests/unit/api/controllers/test_backup_required_idempotency.py (1)
49-76: Good test wiring and mock-spec discipline.Using spec-bound mocks plus the explicit
create_backup.assert_awaited_once_with(...)check gives strong coverage for idempotency callback delegation.Also applies to: 88-123
| class TunnelAlreadyActiveError(IntegrationError): | ||
| """A tunnel is already active on the adapter; refuse re-start. | ||
|
|
||
| Lifecycle conflict, not a transient I/O error: the operator must | ||
| ``stop()`` the running tunnel before issuing a fresh ``start()``. | ||
| Marked non-retryable so the resilience layer does not loop on a | ||
| permanent state error and so the API surface returns a 409-style | ||
| domain error instead of a generic 500. | ||
| """ | ||
|
|
||
| is_retryable = False | ||
| retryable: ClassVar[bool] = False |
There was a problem hiding this comment.
Set conflict HTTP/error metadata on TunnelAlreadyActiveError.
On Line 195, this class is documented as a conflict-state error, but it inherits IntegrationError defaults (502 / INTEGRATION_ERROR). That will surface the wrong API semantics for “already active”.
Proposed fix
class TunnelAlreadyActiveError(IntegrationError):
@@
is_retryable = False
retryable: ClassVar[bool] = False
+ status_code: ClassVar[int] = 409
+ error_code: ClassVar[ErrorCode] = ErrorCode.RESOURCE_CONFLICT
+ error_category: ClassVar[ErrorCategory] = ErrorCategory.CONFLICT
+ default_message: ClassVar[str] = "Tunnel already active"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/integrations/errors.py` around lines 195 - 206,
TunnelAlreadyActiveError currently inherits IntegrationError defaults
(502/INTEGRATION_ERROR) but represents a conflict; update the class to override
the integration error metadata to reflect an HTTP 409 conflict and a
conflict-style error code. In the TunnelAlreadyActiveError class (subclassing
IntegrationError) add/override the appropriate metadata attributes (e.g. http
status/status_code = 409 and an error code like code = "ALREADY_ACTIVE" or code
= "CONFLICT") so the API surfaces a 409-style domain error instead of the 502
integration default.
…iff) Findings (per the rolling CodeRabbit review on PR #1717): Concurrency: - 5 services: stop no longer reassigns _lifecycle_lock; only the loop-bound stop_event / wake_event is recreated. Replacing the lock let a caller queued on the old lock and a fresh caller acquiring the new lock both proceed concurrently. Affects scheduler.py, monitor.py, health_prober.py, hr/pruning/service.py, and the canonical sweeper.py (same pattern, same bug). - stream.py: validate dedup_ttl_seconds and dedup_max_entries_per_session at construction so the trim loop cannot popitem an empty OrderedDict at publish time. Logging: - ticket_store.py: log API_WS_TICKET_LIMIT_EXCEEDED before raising TicketLimitExceededError so cap rejections appear in audit logs (new event constant added to events/api.py). - ngrok_adapter.py: log TUNNEL_ERROR before raising on duplicate start. Refactors: - replay_protection.py: split check() into check() + _check_nonce() reusing check_freshness() so the public method stays under 50 lines. Docstring: - continuous.py: module docstring now matches implementation (lifecycle lock guards the running-flag transition, not the loop body; stop() does not acquire the lock). Tests: - 6 test files: add spec=ConcreteClass to bare Mock/AsyncMock call sites (test_scheduler_lifecycle, test_monitor_lifecycle, test_backup_subscriber, test_backup_required_idempotency, test_backup, test_simulations_idempotency). assert_awaited_once replaces assert_called_once where applicable. - test_ticket_store_threadsafety.py: threading.Event start gate so worker bodies hit store.create together rather than as the pool fills (tighter contention on the lock). - test_cache_threadsafety.py: seed a shared key so the reader path exercises the locked hit branch (move_to_end + deepcopy). - test_factory_registry.py: cover the unknown-key fallback (both registries) via model_construct bypassing Pydantic literal validation. - test_stream_dedup.py: 2 new tests for unsubscribe dedup-window cleanup (already in the prior PR commit). Frontend: - backup.ts: trim whitespace-only Idempotency-Key inputs and fall through to a fresh UUID; ?? alone would forward an empty string and the server rejects it. 26138 unit tests + web type-check + web lint clean; mypy + ruff clean.
877bbd8 to
5b1dada
Compare
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/synthorg/budget/trends.py (1)
241-261:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winValidate only the records that actually contribute to the bucket.
Calling
_assert_single_currency(records)before the[start, end)filter meansbucket_cost_records()now fails if the input contains mixed currencies outside the requested window, even though those rows are ignored by the aggregation. That is stricter than the function’s semantics and can break valid partial-range queries.♻️ Suggested fix
def bucket_cost_records( records: Sequence[CostRecord], start: datetime, end: datetime, bucket_size: BucketSize, ) -> tuple[TrendDataPoint, ...]: - _assert_single_currency(records) bucket_starts = generate_bucket_starts(start, end, bucket_size) + in_window_records = tuple( + record for record in records + if start <= record.timestamp < end + ) + _assert_single_currency(in_window_records) sums: dict[datetime, list[float]] = defaultdict(list) - for record in records: - ts = record.timestamp - if ts < start or ts >= end: - continue - key = _bucket_key(ts, bucket_size) + for record in in_window_records: + key = _bucket_key(record.timestamp, bucket_size) sums[key].append(record.cost)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/budget/trends.py` around lines 241 - 261, The currency assertion is applied to all input records before filtering by the requested time window, causing bucket_cost_records to reject inputs that include out-of-range mixed-currency rows; change the logic so you first filter records to the [start, end) window (using the same condition as the loop that computes key = _bucket_key(ts, bucket_size)) and only then call _assert_single_currency on the filtered list (or alternatively validate currency per-record when adding to sums), so generate_bucket_starts, the sums aggregation and TrendDataPoint creation remain unchanged.tests/unit/hr/pruning/test_service_lifecycle.py (1)
69-90:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winClean up the intentionally hung task after the timeout assertion.
This path leaves
service._taskalive for about a second after Line 85, which makes the test wall-clock sensitive and can leak a pending task into later cases. Use anasyncio.Eventgate inhung_loop, wait for the timeout, then release and await the task in the test.Patch sketch
async def test_unrestartable_after_drain_timeout(self) -> None: service = _make_service() service._stop_drain_timeout_seconds = 0.05 + entered = asyncio.Event() + release = asyncio.Event() async def hung_loop(self: PruningService) -> None: del self + entered.set() try: await asyncio.Event().wait() except asyncio.CancelledError: - await asyncio.sleep(1.0) + await release.wait() with patch.object(PruningService, "_run_loop", hung_loop): await service.start() - await asyncio.sleep(0) + await entered.wait() with pytest.raises(TimeoutError): await service.stop() assert service._stop_failed is True + task = service._task + assert task is not None + release.set() + await task with pytest.raises(RuntimeError, match="unrestartable"): await service.start()As per coding guidelines, "For tasks that must block indefinitely until cancelled (e.g. simulating a slow provider or stubborn coroutine), use
asyncio.Event().wait()instead ofasyncio.sleep(large_number); it is cancellation-safe and carries no timing assumptions."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/hr/pruning/test_service_lifecycle.py` around lines 69 - 90, The test leaves the intentionally hung task running after asserting the TimeoutError; modify the test_unrestartable_after_drain_timeout to create an asyncio.Event gate and patch PruningService._run_loop with a hung_loop that awaits that event (i.e., await gate.wait()) so the test can release the gate after the timeout assertion, then await service._task to ensure the hung coroutine finishes; specifically, reference the patched PruningService._run_loop, the local hung_loop, the created gate Event, and the service._task to locate where to add waiting/release and the final await to clean up the task.tests/unit/api/controllers/test_backup.py (2)
190-193:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winCompare
ApiResponse.databy value here, not identity.These checks are brittle: if the controller ever clones or serializes the payload,
iswill fail even though the response is still correct.==keeps the tests focused on behavior.♻️ Suggested fix
- assert result.data is manifest + assert result.data == manifest @@ - assert result.data is response + assert result.data == responseAlso applies to: 258-263
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/api/controllers/test_backup.py` around lines 190 - 193, Replace identity checks with value equality in the backup controller tests: instead of asserting ApiResponse.data is manifest, use equality (==) so the test compares payloads by value not identity; update the assertions around result, ApiResponse and manifest in the block that calls service.get_backup.assert_awaited_once_with("abc123def456") and the similar block at the 258-263 range so they use assert result.data == manifest while keeping assert isinstance(result, ApiResponse) and the get_backup await assertion unchanged.
367-368:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse
assert_not_called()for the confirm=false gate.
assert_not_awaited()misses an unawaited call, so this test could still pass if the controller touchedrestore_from_backupand forgot to await it. Sinceconfirm=Falseshould block any service interaction, assert that the method was never called at all.♻️ Suggested fix
- service.restore_from_backup.assert_not_awaited() + service.restore_from_backup.assert_not_called()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/api/controllers/test_backup.py` around lines 367 - 368, The test currently uses service.restore_from_backup.assert_not_awaited() which can miss an un-awaited invocation; change the assertion to service.restore_from_backup.assert_not_called() so the test verifies the confirm=False gate prevents any call at all (replace the assert_not_awaited() call in the test for the confirm=false scenario with assert_not_called() on the restore_from_backup mock).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/api/controllers/simulations.py`:
- Around line 292-298: The code currently claims the simulation id via
sim_state.simulation_store.register_if_absent(record) and then calls
_publish_event(WsEventType.SIMULATION_STARTED, record) (and likely spawns the
runner) without rollback; if any subsequent exception occurs the record remains
claimed and blocks retries. Wrap the post-claim steps (the call to
_publish_event and any runner/task creation) in a try/except/finally and on
failure call the store rollback/unregister method (e.g.,
simulation_store.unregister or simulation_store.deregister(record.simulation_id)
— whatever the store exposes) or mark the record as not-running, then re-raise
the exception; ensure you reference
sim_state.simulation_store.register_if_absent, _publish_event, and
WsEventType.SIMULATION_STARTED when adding the rollback logic.
In `@src/synthorg/backup/scheduler.py`:
- Around line 71-78: The spawned scheduler task created in the constructor
(self._task = asyncio.create_task(self._run_loop(), name="backup-scheduler"))
can die silently if _run_loop() raises; attach a done callback like
OrgInflectionMonitor does that inspects the completed task, calls
task.exception() to retrieve any exception, and logs an error (including
traceback/context) if one exists so unexpected task deaths are surfaced and
handled; add the callback registration immediately after creating self._task and
reference self._task and _run_loop in the change.
In `@src/synthorg/communication/conflict_resolution/escalation/factory.py`:
- Around line 150-158: The branch that looks up factory =
_QUEUE_STORE_FACTORIES.get(config.backend) (and the similar branch at the other
location) currently raises ValueError without logging; update both places to
emit a warning or error log with context (including config.backend and any
strategy identifier) using the module/class logger before raising the
ValueError, e.g., logger.warning("Unknown escalation queue backend=%r;
registered=%s", config.backend, sorted(_QUEUE_STORE_FACTORIES) or ["(none)"])
and then raise the ValueError as before so the misconfiguration is recorded in
logs prior to exception propagation.
In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 107-112: The start() method in ngrok_adapter.py currently reads
the ngrok token directly via os.environ.get(self._auth_token_env), which
violates the settings resolution policy; replace this direct environment access
by resolving the token through the application's settings resolver (e.g., use
ConfigResolver or SettingsService to call the appropriate get_* method for the
registered ngrok auth token setting) and pass the resolved token into
conf.PyngrokConfig(auth_token=...) (fall back to conf.PyngrokConfig() when the
resolver returns empty); update any references to self._auth_token_env to use
the canonical setting name registered under src/synthorg/settings/definitions
and ensure the code path remains in start() and preserves the existing
conditional behavior.
In `@tests/unit/client/test_continuous_lifecycle.py`:
- Around line 22-44: Replace the ad-hoc asyncio.sleep calls with a runner-driven
asyncio.Event: modify _FakeRunner to create an asyncio.Event (e.g., self.ready =
asyncio.Event()) and in run() set the event on its first entry
(self.ready.set()) then await self.ready.wait() at the point where the test
previously relied on sleeps so the test drives the second start/stop by clearing
or recreating the event; update the tests that call _FakeRunner.run() to wait
for runner.ready.wait() to observe the first invocation and then set/clear the
event to allow the runner to proceed instead of using await asyncio.sleep(0) /
await asyncio.sleep(0.01).
In `@tests/unit/integrations/test_ngrok_adapter_lifecycle.py`:
- Around line 95-100: Replace the swallowing fake with a Mock and assert it was
never called: patch
"synthorg.integrations.tunnel.ngrok_adapter.ngrok.disconnect" with a Mock
(instead of _fake_disconnect) around the await adapter.stop() call, then assert
the mock's assert_not_called() (or assert mock.call_count == 0) to ensure
ngrok.disconnect is not invoked on the no-start path; keep the existing final
assertion that adapter._tunnel is None.
In `@tests/unit/integrations/test_replay_protection_threadsafety.py`:
- Around line 33-35: The test's ThreadPoolExecutor worker tasks (the attempt
callable submitted in the pool) are currently started opportunistically; modify
the test to add a shared start gate (e.g., threading.Event or threading.Barrier)
so each worker waits on the gate inside attempt before doing the critical work,
then trigger the gate after all futures are submitted to start them
simultaneously—apply the same change to the other ThreadPoolExecutor blocks that
create futures and call f.result() so the race path is consistently stressed.
In `@tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py`:
- Around line 58-79: The test leaves a hung monitor task alive causing
timing-sensitive leaks; modify the patched OrgInflectionMonitor._loop (used in
test_unrestartable_after_drain_timeout) to wait on a dedicated asyncio.Event
instead of sleeping, expose that event from the test scope so after
cancelling/triggering the drain timeout you set the event to let the fake _loop
finish, then await monitor._task to ensure the task has completed before exiting
the patch scope and before calling monitor.start() again; ensure you still
simulate suppression of CancelledError inside the fake _loop to reproduce the
stuck-drain behavior while making teardown deterministic.
In `@tests/unit/settings/test_backup_subscriber.py`:
- Around line 103-108: The test uses assert_not_awaited() for scheduler methods
which can falsely pass if the async method was called but its coroutine never
awaited; update the assertions to use assert_not_called() for the scheduler
paths that must remain untouched: replace
service.scheduler.stop.assert_not_awaited() (and any similar occurrences like
the other two assertions around service.scheduler.stop/assert_not_awaited at the
other test cases) with service.scheduler.stop.assert_not_called(), and ensure
service.scheduler.start still uses service.scheduler.start.assert_awaited_once()
where appropriate.
---
Outside diff comments:
In `@src/synthorg/budget/trends.py`:
- Around line 241-261: The currency assertion is applied to all input records
before filtering by the requested time window, causing bucket_cost_records to
reject inputs that include out-of-range mixed-currency rows; change the logic so
you first filter records to the [start, end) window (using the same condition as
the loop that computes key = _bucket_key(ts, bucket_size)) and only then call
_assert_single_currency on the filtered list (or alternatively validate currency
per-record when adding to sums), so generate_bucket_starts, the sums aggregation
and TrendDataPoint creation remain unchanged.
In `@tests/unit/api/controllers/test_backup.py`:
- Around line 190-193: Replace identity checks with value equality in the backup
controller tests: instead of asserting ApiResponse.data is manifest, use
equality (==) so the test compares payloads by value not identity; update the
assertions around result, ApiResponse and manifest in the block that calls
service.get_backup.assert_awaited_once_with("abc123def456") and the similar
block at the 258-263 range so they use assert result.data == manifest while
keeping assert isinstance(result, ApiResponse) and the get_backup await
assertion unchanged.
- Around line 367-368: The test currently uses
service.restore_from_backup.assert_not_awaited() which can miss an un-awaited
invocation; change the assertion to
service.restore_from_backup.assert_not_called() so the test verifies the
confirm=False gate prevents any call at all (replace the assert_not_awaited()
call in the test for the confirm=false scenario with assert_not_called() on the
restore_from_backup mock).
In `@tests/unit/hr/pruning/test_service_lifecycle.py`:
- Around line 69-90: The test leaves the intentionally hung task running after
asserting the TimeoutError; modify the test_unrestartable_after_drain_timeout to
create an asyncio.Event gate and patch PruningService._run_loop with a hung_loop
that awaits that event (i.e., await gate.wait()) so the test can release the
gate after the timeout assertion, then await service._task to ensure the hung
coroutine finishes; specifically, reference the patched
PruningService._run_loop, the local hung_loop, the created gate Event, and the
service._task to locate where to add waiting/release and the final await to
clean up the task.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: cc4e1775-c0ef-4a5d-bd43-757bc816ccf7
📒 Files selected for processing (48)
docs/design/backup.mddocs/design/client-simulation.mddocs/design/observability.mddocs/licensing.mddocs/reference/lifecycle-sync.mddocs/research/lgpl-postgres-driver-decision.mdscripts/mock_spec_baseline.txtsrc/synthorg/api/auth/ticket_store.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/backup/scheduler.pysrc/synthorg/backup/service.pysrc/synthorg/budget/trends.pysrc/synthorg/client/continuous.pysrc/synthorg/client/store.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/observability/events/api.pysrc/synthorg/observability/events/event_stream.pysrc/synthorg/providers/health_prober.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/tools/mcp/cache.pytests/integration/api/controllers/test_client_simulation.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/api/controllers/test_backup.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/backup/test_scheduler.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/budget/test_trends_currency.pytests/unit/client/test_continuous_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/communication/event_stream/test_stream_dedup.pytests/unit/hr/pruning/test_service.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/providers/test_health_prober_lifecycle.pytests/unit/settings/test_backup_subscriber.pytests/unit/tools/mcp/test_cache_threadsafety.pyweb/src/api/endpoints/backup.tsweb/src/api/endpoints/clients.ts
💤 Files with no reviewable changes (1)
- scripts/mock_spec_baseline.txt
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Deploy Preview
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Dashboard Test
- GitHub Check: Test (Python 3.14)
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Lighthouse Site
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (13)
web/src/**/!(logger).ts?(x)
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/!(logger).ts?(x): Always usecreateLoggerfrom@/lib/loggerinstead of bareconsole.warn,console.error, orconsole.debugin application code; use variable namelog
Pass dynamic/untrusted values as separate args to log methods, not interpolated into the message string, so they go throughsanitizeArg
Wrap attacker-controlled fields inside structured objects withsanitizeForLog()before embedding in logs
Files:
web/src/api/endpoints/backup.tsweb/src/api/endpoints/clients.ts
web/src/**/*.ts?(x)
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.ts?(x): Callers MUST NOT wrap store mutation calls intry/catch; the store owns the error UX
NEVER hardcode Motion transition durations; use presets from@/lib/motion
NEVER hardcode BCP 47 locale literals ('en-US'); use helpers from@/utils/formatand design tokens
NEVER hardcode currency symbols or codes; useDEFAULT_CURRENCYfrom@/utils/currencies
Files:
web/src/api/endpoints/backup.tsweb/src/api/endpoints/clients.ts
web/src/api/**/*.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
Use
getLiveness()for 200 always-alive checks andgetReadiness()for binary'ok' | 'unavailable'health checks (no tri-state); any new caller must handle the 503 path explicitly
Files:
web/src/api/endpoints/backup.tsweb/src/api/endpoints/clients.ts
**/*.{py,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend:DEFAULT_CURRENCYfromsynthorg.budget.currencyor the runtimebudget.currencysetting. Frontend:DEFAULT_CURRENCYfrom@/utils/currenciesoruseSettingsStore().currency.
Field naming: no_usdsuffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare.toLocaleString()/.toLocaleDateString()/.toLocaleTimeString(). Use helpers in@/utils/formatwhich readgetLocale()from@/utils/locale. The backend has no operator-tunable locale setting; backendIntlformatting uses the system locale plus the browser timezone. Thecompany.name_localeslist controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render viaIntlwithout passingtimeZone(browser tz wins).
Date / number format: always viaIntl; no hand-rolled templates.
Files:
web/src/api/endpoints/backup.tssrc/synthorg/observability/events/event_stream.pyweb/src/api/endpoints/clients.tssrc/synthorg/client/store.pysrc/synthorg/backup/service.pytests/unit/client/test_continuous_lifecycle.pysrc/synthorg/api/controllers/simulations.pytests/unit/settings/test_backup_subscriber.pytests/integration/api/controllers/test_client_simulation.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/observability/events/api.pytests/unit/backup/test_scheduler_lifecycle.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/budget/trends.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/hr/pruning/test_service.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/tools/mcp/test_cache_threadsafety.pytests/unit/api/controllers/test_backup_required_idempotency.pysrc/synthorg/meta/chief_of_staff/monitor.pytests/unit/providers/test_health_prober_lifecycle.pysrc/synthorg/tools/mcp/cache.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/backup/test_scheduler.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pytests/unit/budget/test_trends_currency.pytests/unit/communication/event_stream/test_stream_dedup.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/providers/health_prober.pytests/unit/api/controllers/test_backup.pysrc/synthorg/client/continuous.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/backup/scheduler.pysrc/synthorg/api/auth/ticket_store.py
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Units: metric only. Spelling: International / British English UI default (
colour,behaviour,organise,centred,analyse,cancelled); document deviations. Spelling here is an editorial / UI-copy decision only; it does not affect runtime locale-sensitive formatting. Numbers, dates, times, currencies, and units still resolve via the user / company / browser / system fallback through@/utils/format,@/utils/locale,DEFAULT_CURRENCY, anduseSettingsStore().currency, with no contradiction to the locale-neutral defaults above.
Files:
web/src/api/endpoints/backup.tsweb/src/api/endpoints/clients.ts
web/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Reuse components from
web/src/components/ui/before creating new ones. Never hardcode hex colors, font-family, pixel spacing, Motion transitions, or BCP 47 locale strings; use design tokens,@/lib/motionpresets, and the helpers in@/utils/format. Enforced byscripts/check_web_design_system.py(PostToolUse hook on everyweb/src/edit).
Files:
web/src/api/endpoints/backup.tsweb/src/api/endpoints/clients.ts
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
docs/**/*.md: Use Mermaid (\``mermaid): flowcharts, sequence diagrams, simple hierarchies, pipelines. Rendered client-side viapymdownx.superfences. Use **Markdown tables**: grid/matrix data that is semantically tabular (not diagrams). D2 uses theme 200 (Dark Mauve), dark-only render, configured globally inmkdocs.yml. Never use```text` blocks with ASCII/Unicode box-drawing characters for diagrams.
Files:
docs/research/lgpl-postgres-driver-decision.mddocs/licensing.mddocs/design/backup.mddocs/design/client-simulation.mddocs/design/observability.mddocs/reference/lifecycle-sync.md
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Monetary models: every cost-bearing Pydantic model carriescurrency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raisesMixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/is the only place that may importaiosqlite,sqlite3,psycopg, orpsycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced byscripts/check_persistence_boundary.py(pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g.ArtifactService,WorkflowService,MemoryService,CustomRulesService,UserService,ProjectService,SsrfViolationService,SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced byscripts/check_persistence_boundary.py).
Per-line opt-out:# lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved throughSettingsService/ConfigResolver. First-cold-read emits one INFOsettings.value.resolvedcarryingsource+yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered withread_only_post_init=Truefor /settings discoverability;SettingsService.set()raisesSettingReadOnlyError).
Directos.environ.get(...)reads in application code outside startup are forbidden. New settings register insrc/synthorg/settings/definitions/<namespace>.pyand are consumed viaConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...
Files:
src/synthorg/observability/events/event_stream.pysrc/synthorg/client/store.pysrc/synthorg/backup/service.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/observability/events/api.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/budget/trends.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/tools/mcp/cache.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/providers/health_prober.pysrc/synthorg/client/continuous.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/backup/scheduler.pysrc/synthorg/api/auth/ticket_store.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/events/event_stream.pysrc/synthorg/client/store.pysrc/synthorg/backup/service.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/observability/events/api.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/budget/trends.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/tools/mcp/cache.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/providers/health_prober.pysrc/synthorg/client/continuous.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/backup/scheduler.pysrc/synthorg/api/auth/ticket_store.py
src/synthorg/!(api)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Domain error class naming: error classes in domain modules use
<Domain><Condition>Errorand inherit fromDomainError(or a domain-scoped intermediate that itself inheritsDomainError). BareException/RuntimeErrorat domain boundaries is forbidden; domain errors flow throughEXCEPTION_HANDLERSfor centralised RFC 9457 routing.
Files:
src/synthorg/observability/events/event_stream.pysrc/synthorg/client/store.pysrc/synthorg/backup/service.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/observability/events/api.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/budget/trends.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/tools/mcp/cache.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/providers/health_prober.pysrc/synthorg/client/continuous.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/backup/scheduler.py
docs/design/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Use D2 (
\``d2): architecture diagrams, nested container layouts, complex entity relationships. Rendered at build time viamkdocs-d2-plugin(dagre layout). Requires the [D2 CLI](https://d2lang.com/tour/install) onPATHlocally and in CI (pinned to v0.7.1 via.github/workflows/pages.yml`).
Files:
docs/design/backup.mddocs/design/client-simulation.mddocs/design/observability.md
src/synthorg/!(observability)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/!(observability)/**/*.py: Every business-logic module hasfrom synthorg.observability import get_loggerthenlogger = get_logger(__name__). Variable name is alwayslogger. Carve-outs (e.g.meta/mcp/handlers/common_logging.pykeying at a fixed string) are documented in the module docstring.
Never useimport logging/logging.getLogger()/print()in application code (exception:observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.pyfor handler-construction / bootstrap code).
Files:
src/synthorg/client/store.pysrc/synthorg/backup/service.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/budget/trends.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/tools/mcp/cache.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/providers/health_prober.pysrc/synthorg/client/continuous.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/backup/scheduler.pysrc/synthorg/api/auth/ticket_store.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow.
Mock-spec gate (#1604): everyMock()/AsyncMock()/MagicMock()intests/MUST declare the interface it stands for viaspec=ConcreteClass(Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen inscripts/mock_spec_baseline.txt. Regenerate the baseline only viauv run python scripts/check_mock_spec.py --updateand commit the diff so the change is reviewable. Withoutspec=the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from#1604).
Shared mocks:tests/conftest.pyexposesmock_dispatcher(anAsyncMock(spec=NotificationDispatcher)coveringregister/start/aclose/dispatch); use it instead of building the spec inline.
Time-driven tests: importFakeClockfromtests._shared.fake_clock(NOT from any rollout-subsystem path) and inject it into the class under test.FakeClock.sleepadvances virtual time AND yields once viaasyncio.sleep(0)so cancellation on awaiting tasks propagates the same way it does underSystemClock. For tests that need to drive cooperative tasks waiting on the loop,await clock.advance_async(seconds). FakeClock-first: patchtime.monotonic()/asyncio.sleep()globals only for legacy code paths that don't have aClockseam (see## Code Conventionsfor the legacy-Callable list); when the class under test accepts aclock=parameter, always injectFakeClockrather than monkey-patching globals.
Async:asyncio_mode = "auto"; no manual@pytest.mark.asyncioneeded.
Timeout: 30 seconds per test (global inpyproject.toml; do not add per-filepytest.mark.timeout(30)markers; non-default overrides liketimeout(60)are allowed).
Parametrize: Prefer@pytest.mark.parametrizefor...
Files:
tests/unit/client/test_continuous_lifecycle.pytests/unit/settings/test_backup_subscriber.pytests/integration/api/controllers/test_client_simulation.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/hr/pruning/test_service.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/tools/mcp/test_cache_threadsafety.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/providers/test_health_prober_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/backup/test_scheduler.pytests/unit/budget/test_trends_currency.pytests/unit/communication/event_stream/test_stream_dedup.pytests/unit/api/controllers/test_backup.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/integrations/test_ngrok_adapter_lifecycle.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/client/test_continuous_lifecycle.pytests/unit/settings/test_backup_subscriber.pytests/integration/api/controllers/test_client_simulation.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/hr/pruning/test_service.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/tools/mcp/test_cache_threadsafety.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/providers/test_health_prober_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/backup/test_scheduler.pytests/unit/budget/test_trends_currency.pytests/unit/communication/event_stream/test_stream_dedup.pytests/unit/api/controllers/test_backup.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/integrations/test_ngrok_adapter_lifecycle.py
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceedapi.ws_frame_timeout_seconds(default 30s) without sending a frame. Wrapssocket.receive_text()inasyncio.wait_for(...)so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a_SlidingWindowRateLimiter(api.ws_revalidation_window_secondsdefault 60s,api.ws_revalidation_max_failuresdefault 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.
Files:
src/synthorg/api/controllers/simulations.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/api/auth/ticket_store.py
🪛 LanguageTool
docs/design/observability.md
[grammar] ~205-~205: Ensure spelling is correct
Context: ...-window TTL (default 60s). The hub keys dedup on event.id; identical ids within the...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (25)
docs/research/lgpl-postgres-driver-decision.md (1)
1-80: Well-structured ADR documentation and guideline-compliant formatting.Clear decision framing, traceable audit linkage, and correct use of Markdown tables for trade-offs; no docs-format guideline violations found.
docs/licensing.md (1)
112-124: Good operator-facing licensing clarification for the optional Postgres path.This section cleanly distinguishes default vs optional dependency obligations and links to the deeper ADR context.
src/synthorg/observability/events/event_stream.py (1)
20-20: Good event constant addition.
EVENT_STREAM_HUB_PUBLISH_DEDUPEDfollows the existing event naming scheme and keeps observability events centralized.src/synthorg/observability/events/api.py (1)
80-80: Looks good.
API_WS_TICKET_LIMIT_EXCEEDEDis consistent with adjacent WS ticket event constants and improves event coverage cleanly.src/synthorg/communication/event_stream/stream.py (2)
69-107: Constructor hardening is solid.The validation and injected
Clockseam make dedup behavior fail-fast, testable, and safer under misconfiguration.
177-256: Dedup publish path is correctly synchronized and bounded.The lock-scoped duplicate check/record flow, no-subscriber short-circuit, and per-session TTL/size eviction logic are implemented cleanly.
docs/design/observability.md (1)
200-206: Nice observability doc update.The new event-stream section accurately documents dedup behavior and warning events in a clear, operator-friendly way.
tests/unit/communication/event_stream/test_stream_dedup.py (1)
39-165: Strong dedup regression coverage.These tests exercise the key concurrency-safe dedup invariants (TTL, zero-TTL mode, per-session isolation, bounded state, and unsubscribe cleanup) with proper
FakeClockusage.src/synthorg/api/auth/ticket_store.py (2)
150-174: Atomic cap-check + insert path looks correct.The lock now covers the full count-and-insert critical section, which closes the concurrent over-issuance race.
195-241: Consume and cleanup mutation paths are now safely serialized.
popand bulk-expiry deletion are both protected, preventing concurrent dictionary mutation hazards.src/synthorg/tools/mcp/cache.py (2)
77-106: Locked get-path refactor is solid.The lookup, TTL decision, and LRU move are now atomic under contention while preserving copy-on-read isolation.
125-162: Put/invalidate locking correctly protects cache state.This closes concurrent mutation races around eviction, replacement, and targeted invalidation.
tests/unit/api/auth/test_ticket_store_threadsafety.py (2)
36-61: Strong contention test for per-user cap.Using a start gate here materially improves the chance of surfacing cap-related races.
104-124: Create/cleanup mixed workload coverage is valuable.Good stress case for dictionary-mutation safety under concurrent operations.
tests/unit/tools/mcp/test_cache_threadsafety.py (2)
22-48: Nice improvement: concurrent hit-path is now explicitly exercised.Seeding + shared-key reads ensures the locked hit branch is tested, not only miss behavior.
69-80: Bounded-size assertion is a good integrity guard.This directly validates LRU capacity under concurrent insert pressure.
src/synthorg/integrations/webhooks/replay_protection.py (2)
94-167: Freshness/nonce split is clean and improves safety readability.The fail-closed path plus explicit freshness validation makes behavior easier to reason about.
200-215: Atomic duplicate-check + insert is correctly enforced.Locking this block closes the replay race where identical nonces arrive concurrently.
tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py (1)
37-160: Coverage looks solid.The suite exercises all registry branches, including the
autonotify case and themodel_constructfallback checks for the expanded error messages.src/synthorg/client/store.py (1)
177-193: Atomic check-and-insert is implemented correctly.
register_if_absent()closes the race window cleanly underself._lockand matches the intended idempotency contract.web/src/api/endpoints/clients.ts (1)
1-1: Good idempotency fallback on HTTP 409.The
axios.isAxiosError(...status===409)branch correctly converges retries togetSimulation(simulation_id)and preserves caller flow.Also applies to: 267-285
tests/integration/api/controllers/test_client_simulation.py (1)
318-347: Strong integration coverage for duplicate-start rejection.This test validates the wire-level 201→409 sequence for same
simulation_id, which is the critical idempotency contract.tests/unit/api/controllers/test_simulations_idempotency.py (1)
1-147: Unit tests are focused and well-isolated.The success/failure claim branches are covered, and the patched collaborators avoid masking unrelated failures while keeping the idempotency assertion precise.
docs/design/client-simulation.md (1)
227-237: Documentation update is clear and aligned with runtime behavior.The idempotency section accurately explains atomic registration and the 409-on-duplicate contract.
src/synthorg/client/continuous.py (1)
134-135: The concern about_runningstaying True after cancellation is unfounded.The
finallyblock executes completely and acquires the lock successfully even when the task is cancelled. In Python's asyncio, aCancelledErrordoes not interrupt lock acquisition in afinallyblock—the finally block runs to completion, acquires the lock, resets_running = False, and only then does the exception propagate. The current code is correct and requires no changes.> Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/api/controllers/simulations.py`:
- Around line 353-379: If create_task(...) for runner_task succeeds but a later
setup step fails, the except BaseException path currently unregisters the
simulation without cancelling the spawned task; fix this by detecting the
created task (the local task from asyncio.create_task called for runner_task),
and in the except BaseException handler cancel that task (task.cancel()), await
it (catch asyncio.CancelledError and ignore) or await asyncio.shield(task) with
a timeout, then proceed to await
sim_state.simulation_store.unregister(record.simulation_id) and re-raise; ensure
you still attach log_task_exceptions and sim_state.background_tasks.add/discard
only when setup completes successfully to avoid double-registering the task.
In `@src/synthorg/budget/trends.py`:
- Around line 237-239: The docstring incorrectly states
MixedCurrencyAggregationError is raised when the input parameter records span
multiple currencies, but the implementation checks only the filtered variable
in_window_records (see in_window_records check around where
MixedCurrencyAggregationError is raised); update the docstring wording to state
the error is raised when the records remaining after window filtering
(in_window_records) span multiple currencies, or otherwise make the check match
the docstring by inspecting the original records—mention the symbols
MixedCurrencyAggregationError, records, and in_window_records so reviewers can
find and align the docstring with actual behavior.
In `@tests/unit/client/test_continuous_lifecycle.py`:
- Around line 3-8: Update the docstring/comments to stop claiming the lifecycle
lock spans the full run loop: state that ContinuousMode.start() only acquires
_lifecycle_lock for the _running check/set transition (it sets _running
true/raises if already running) and then releases the lock before entering the
synchronous run loop; remove or reword any sentence that says the lock is held
"through the loop" or "spans the full body" and ensure references to stop() and
the semantics of _running remain accurate (i.e., stop() signals termination
while the loop runs without holding _lifecycle_lock).
In `@tests/unit/hr/pruning/test_service_lifecycle.py`:
- Around line 45-53: The test test_concurrent_starts_spawn_one_task only asserts
service.is_running which doesn't prove only one run-loop task was spawned;
modify the test to explicitly verify a single task was created by either (a)
asserting that service._run_task (or the concrete attribute holding the created
Task) is non-None and that repeated start() calls return the same Task object
(e.g., compare identity), or (b) monkeypatching/patching asyncio.create_task (or
the service's internal task-creation hook) to count invocations and assert it
was called exactly once when calling service.start() concurrently; target
symbols: test_concurrent_starts_spawn_one_task, service.start(), and the service
attribute that stores the Task (e.g., service._run_task) or asyncio.create_task.
In `@tests/unit/integrations/test_replay_protection_threadsafety.py`:
- Around line 40-41: Add a per-wait timeout to the Barrier.wait() calls to avoid
tests hanging: change each barrier.wait() (the ones surrounding the
protector.check(nonce="duplicate-nonce", timestamp=ts) call and the other two
occurrences) to barrier.wait(timeout=5) (or another small reasonable value) and
handle a failed wait by catching threading.BrokenBarrierError (or failing the
test) so a timeout produces a fast, diagnosable failure instead of a long
deadlock-style hang.
In `@tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py`:
- Around line 37-47: The test can race-fail because the first spawned task may
finish before other start() calls run; modify the test to ensure the first task
remains running while the concurrent start() calls happen by making
builder.build return a coroutine that blocks on a controllable asyncio.Event (or
similar) so the spawned task does not complete immediately. Update
test_concurrent_starts_spawn_one_task to create an Event, patch or pass a
builder into _make_monitor so its build returns a coroutine that awaits that
Event, run asyncio.gather(monitor.start(), monitor.start(), monitor.start()),
assert monitor._task is not None and that all start() calls did not create
multiple tasks (e.g., compare identity of monitor._task or count spawns), then
set the Event to allow the task to finish and await monitor.stop(); reference
monitor.start(), monitor._task, _make_monitor, and builder.build to locate
changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c4a833d9-d9ad-4f1c-adfa-7fb383421533
📒 Files selected for processing (13)
src/synthorg/api/controllers/simulations.pysrc/synthorg/backup/scheduler.pysrc/synthorg/budget/trends.pysrc/synthorg/client/store.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pytests/unit/api/controllers/test_backup.pytests/unit/client/test_continuous_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/settings/test_backup_subscriber.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Build Backend
- GitHub Check: Dashboard Test
- GitHub Check: Test (Python 3.14)
- GitHub Check: Type Check
- GitHub Check: Build Web Assets (melange)
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Lighthouse Site
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Build Preview
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend:DEFAULT_CURRENCYfromsynthorg.budget.currencyor the runtimebudget.currencysetting. Frontend:DEFAULT_CURRENCYfrom@/utils/currenciesoruseSettingsStore().currency.
Field naming: no_usdsuffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare.toLocaleString()/.toLocaleDateString()/.toLocaleTimeString(). Use helpers in@/utils/formatwhich readgetLocale()from@/utils/locale. The backend has no operator-tunable locale setting; backendIntlformatting uses the system locale plus the browser timezone. Thecompany.name_localeslist controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render viaIntlwithout passingtimeZone(browser tz wins).
Date / number format: always viaIntl; no hand-rolled templates.
Files:
tests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pysrc/synthorg/budget/trends.pytests/unit/settings/test_backup_subscriber.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pysrc/synthorg/client/store.pytests/unit/client/test_continuous_lifecycle.pysrc/synthorg/backup/scheduler.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/api/controllers/test_backup.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow.
Mock-spec gate (#1604): everyMock()/AsyncMock()/MagicMock()intests/MUST declare the interface it stands for viaspec=ConcreteClass(Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen inscripts/mock_spec_baseline.txt. Regenerate the baseline only viauv run python scripts/check_mock_spec.py --updateand commit the diff so the change is reviewable. Withoutspec=the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from#1604).
Shared mocks:tests/conftest.pyexposesmock_dispatcher(anAsyncMock(spec=NotificationDispatcher)coveringregister/start/aclose/dispatch); use it instead of building the spec inline.
Time-driven tests: importFakeClockfromtests._shared.fake_clock(NOT from any rollout-subsystem path) and inject it into the class under test.FakeClock.sleepadvances virtual time AND yields once viaasyncio.sleep(0)so cancellation on awaiting tasks propagates the same way it does underSystemClock. For tests that need to drive cooperative tasks waiting on the loop,await clock.advance_async(seconds). FakeClock-first: patchtime.monotonic()/asyncio.sleep()globals only for legacy code paths that don't have aClockseam (see## Code Conventionsfor the legacy-Callable list); when the class under test accepts aclock=parameter, always injectFakeClockrather than monkey-patching globals.
Async:asyncio_mode = "auto"; no manual@pytest.mark.asyncioneeded.
Timeout: 30 seconds per test (global inpyproject.toml; do not add per-filepytest.mark.timeout(30)markers; non-default overrides liketimeout(60)are allowed).
Parametrize: Prefer@pytest.mark.parametrizefor...
Files:
tests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pytests/unit/settings/test_backup_subscriber.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/client/test_continuous_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/api/controllers/test_backup.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pytests/unit/settings/test_backup_subscriber.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/client/test_continuous_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/api/controllers/test_backup.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Monetary models: every cost-bearing Pydantic model carriescurrency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raisesMixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/is the only place that may importaiosqlite,sqlite3,psycopg, orpsycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced byscripts/check_persistence_boundary.py(pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g.ArtifactService,WorkflowService,MemoryService,CustomRulesService,UserService,ProjectService,SsrfViolationService,SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced byscripts/check_persistence_boundary.py).
Per-line opt-out:# lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved throughSettingsService/ConfigResolver. First-cold-read emits one INFOsettings.value.resolvedcarryingsource+yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered withread_only_post_init=Truefor /settings discoverability;SettingsService.set()raisesSettingReadOnlyError).
Directos.environ.get(...)reads in application code outside startup are forbidden. New settings register insrc/synthorg/settings/definitions/<namespace>.pyand are consumed viaConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...
Files:
src/synthorg/budget/trends.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/client/store.pysrc/synthorg/backup/scheduler.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/budget/trends.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/client/store.pysrc/synthorg/backup/scheduler.py
src/synthorg/!(api)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Domain error class naming: error classes in domain modules use
<Domain><Condition>Errorand inherit fromDomainError(or a domain-scoped intermediate that itself inheritsDomainError). BareException/RuntimeErrorat domain boundaries is forbidden; domain errors flow throughEXCEPTION_HANDLERSfor centralised RFC 9457 routing.
Files:
src/synthorg/budget/trends.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/client/store.pysrc/synthorg/backup/scheduler.py
src/synthorg/!(observability)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/!(observability)/**/*.py: Every business-logic module hasfrom synthorg.observability import get_loggerthenlogger = get_logger(__name__). Variable name is alwayslogger. Carve-outs (e.g.meta/mcp/handlers/common_logging.pykeying at a fixed string) are documented in the module docstring.
Never useimport logging/logging.getLogger()/print()in application code (exception:observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.pyfor handler-construction / bootstrap code).
Files:
src/synthorg/budget/trends.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/client/store.pysrc/synthorg/backup/scheduler.py
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceedapi.ws_frame_timeout_seconds(default 30s) without sending a frame. Wrapssocket.receive_text()inasyncio.wait_for(...)so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a_SlidingWindowRateLimiter(api.ws_revalidation_window_secondsdefault 60s,api.ws_revalidation_max_failuresdefault 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.
Files:
src/synthorg/api/controllers/simulations.py
🔇 Additional comments (25)
tests/unit/integrations/test_replay_protection_threadsafety.py (1)
14-17: Good test harness setup for deterministic thread-safety checks.Using
FakeClockfrom the shared test seam and a unit marker keeps these tests aligned with repo test conventions.tests/unit/client/test_continuous_lifecycle.py (1)
22-132: Good event-driven lifecycle tests and unit marker usage.Nice job replacing scheduler-dependent sleeps with runner signaling; this makes the concurrency assertions deterministic and aligned with the lock-gated start/stop behavior.
src/synthorg/communication/conflict_resolution/escalation/factory.py (3)
52-89: Strong defensive backend validation with pre-raise logging.
_require_persistencenow fails fast on missing/mismatched backend and logs structured context before raising, which aligns well with startup diagnosability requirements.
150-169: Unknown queue backend handling is now robust and operator-friendly.Good improvement: enumerating registered backends in both the error and structured warning makes misconfiguration triage much faster.
242-283: Decision-processor registry dispatch is clean and extensible.The
MappingProxyTyperegistry + explicit unknown-strategy warning/raise path is a solid replacement for conditional branching.src/synthorg/backup/scheduler.py (4)
36-41: LGTM!Lifecycle primitives are correctly initialized in
__init__: dedicated_lifecycle_lock(retained for service lifetime per past review),_stop_eventfor cooperative shutdown,_stop_failedflag for unrestartable state, and configurable drain timeout. This aligns with the canonical lifecycle pattern documented indocs/reference/lifecycle-sync.md.
48-86: LGTM!The
start()method correctly implements the canonical async lifecycle pattern: guarded by_lifecycle_lock, idempotent when already running, refuses restart after a timed-out stop, and attacheslog_task_exceptionscallback to surface unexpected loop termination. This addresses the past review concern about silent scheduler deaths.
88-145: LGTM!The
stop()method correctly implements coordinated shutdown: signals both events, cancels the task, drains with a shielded timeout, and marks the scheduler unrestartable on timeout to prevent stacking loops on orphan tasks. Events are recreated while holding_lifecycle_lock(lines 135-144), preventing the race condition where a concurrentstart()could bind to stale events. The lock itself is correctly preserved across the service lifetime.
169-202: LGTM!The
_run_loop()correctly honors cooperative shutdown via_stop_event, includes a post-wake stop check (lines 187-188) to prevent firing after shutdown is requested, and uses proper structured logging withsafe_error_descriptionfor exception handling. The PEP 758 except syntax and explicitCancelledErrorpropagation are correct.tests/unit/settings/test_backup_subscriber.py (3)
34-56: LGTM!The
_make_subscriberhelper correctly specs all mocks per the coding guidelines:BackupScheduler,BackupService,SettingsService, and their async methods. Thestartandstopmethods areAsyncMock(spec=...)to enforce the awaitable contract, whilerescheduleis a synchronousMagicMock(spec=...).
95-133: LGTM!The test assertions correctly use
assert_awaited_once()for async methods that must be awaited andassert_not_called()for paths that must avoid the scheduler entirely. The inline comments (lines 103-111) clearly document why these stricter assertions catch regressions thatassert_called_once()orassert_not_awaited()would miss.
136-176: LGTM!Advisory key tests correctly use
assert_not_called()for bothstartandstopmethods, ensuring the scheduler is never touched (not even with an unawaited coroutine) when processing advisory settings likecompression,on_shutdown, oron_startup.tests/unit/api/controllers/test_backup.py (4)
74-113: LGTM!The
_make_state_and_servicehelper correctly implements spec'd mocks forBackupService,AppState, andIdempotencyService. Therun_idempotentis implemented as an inline async helper that exercises the awaitable contract by callingawait callback(). The comment explainingSimpleNamespaceusage (lines 107-112) focuses on technical rationale without internal gate terminology, addressing past review feedback.
125-153: LGTM!The tests correctly pass
idempotency_keytocreate_backup.fn(), aligning with the controller now requiring theIdempotency-Keyheader. The assertion uses equality (==) rather than identity (is) for comparingresult.datato the manifest, which is appropriate for value comparison.
354-372: LGTM!The restore confirm gate test correctly uses
assert_not_called()with a clear comment explaining why this is stricter thanassert_not_awaited()— it catches regressions where the controller creates but forgets to await the service call.
395-403: LGTM!The guard test fixture correctly specs
BackupServiceandBackupSchedulermocks, withscheduler.stopas anAsyncMock(spec=BackupScheduler.stop)to match the now-async lifecycle method.src/synthorg/budget/trends.py (2)
242-256: Good ordering: window filter before currency assertion.This sequence is correct: Line 249 filters contributing rows first, then Line 252 validates currency on the actual aggregation set. Nice data-integrity fix for partial-window queries.
467-467: HTTP 409 translation forMixedCurrencyAggregationErroris correctly implemented.The exception raised at line 467 in
project_daily_spendis properly mapped:
MixedCurrencyAggregationErrordeclaresstatus_code: ClassVar[int] = 409in its definition (src/synthorg/budget/errors.py:144).- The handler is registered in
EXCEPTION_HANDLERS(src/synthorg/api/exception_handlers.py:853) and dispatched tohandle_domain_error, which reads thestatus_codeClassVar viagetattr(exc, "status_code", 500)at line 617.analytics.pyimports and callsproject_daily_spend; the exception bubbles up through the controller to the handler without being caught.No changes needed.
tests/unit/hr/pruning/test_service_lifecycle.py (1)
72-99: Good cancellation-safe timeout harness.Using
asyncio.Event().wait()plus the explicitreleasegate is a solid way to avoid flaky wall-clock sleeps and task leaks in this timeout-path test.src/synthorg/integrations/tunnel/ngrok_adapter.py (3)
47-74: LGTM on initialization and lifecycle setup.The auth token is correctly captured at init time (the sanctioned "init-time only" exception for bootstrap secrets), keeping the runtime
start()path off theos.environAPI. The lifecycle lock and state variables are properly initialized.
95-150: LGTM on idempotent start() with proper concurrency handling.The implementation correctly:
- Serializes access with
_lifecycle_lock- Returns the existing URL if already active (idempotent behavior)
- Builds a per-call
PyngrokConfiginstead of mutating the process-global config- Uses
safe_error_description(exc)for scrubbing sensitive error details before logging/raisingThis addresses all the past review concerns about global config mutation, idempotency, and the settings resolution path.
163-177: LGTM on best-effort stop() cleanup.The stop method properly:
- Acquires the lifecycle lock
- Returns early if no tunnel is active (no-op case)
- Performs best-effort disconnect with scrubbed error logging
- Clears state even on disconnect failure (documented intentional behavior)
tests/unit/integrations/test_ngrok_adapter_lifecycle.py (3)
38-65: LGTM on idempotency test.The test correctly verifies that a second
start()while active reuses the existing tunnel rather than invoking a second upstream connect.
67-91: LGTM on concurrency test.The
asyncio.gatherapproach validates that the lifecycle lock serializes concurrentstart()calls, resulting in only one tunnel creation.
93-107: LGTM on stop-without-start test.The test now uses
MagicMock(spec=ngrok.disconnect)andassert_not_called()to verify the no-op path, addressing the past review feedback about proving disconnect is never invoked.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
tests/unit/client/test_continuous_lifecycle.py (1)
67-94:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReword the lock-scope comment.
await runner.ready.wait()only tells yourun()has started; it does not mean_lifecycle_lockis still held. The current wording repeats the stale lock-scope claim and could mislead future concurrency changes.Suggested wording
- # Wait until the runner reports it has entered ``run()``; this - # guarantees the first ``start()`` already holds the lifecycle - # lock before the second call below races for it. + # Wait until the runner reports it has entered ``run()``; this + # guarantees the first ``start()`` has set ``_running=True`` + # before the second call below races for it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/client/test_continuous_lifecycle.py` around lines 67 - 94, The comment incorrectly claims that awaiting runner.ready.wait() guarantees the lifecycle lock is still held; update the comment near the test_double_start_raises_when_already_running to accurately state that awaiting runner.ready.wait() only ensures the runner's run() has begun (i.e., run() entered and set ready), but does not assert that ContinuousMode's _lifecycle_lock is still held—explain that this timing merely increases the chance the first start() acquired the lock before the second start() races for it, and reference mode.start(), runner.ready.wait(), run(), and _lifecycle_lock in the reworded comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/api/controllers/simulations.py`:
- Around line 428-434: The exception handler around the post-claim setup
currently rolls back via _rollback_register_if_absent and re-raises without
logging the original error; modify the except BaseException block to capture the
exception (e.g., "except BaseException as exc") and log it at WARNING or ERROR
with context including spawned_task, sim_state and record.simulation_id (use the
module/class logger and include exc or exc_info=True) before calling
_rollback_register_if_absent and re-raising so the root cause is recorded for
failed starts.
In `@tests/unit/hr/pruning/test_service_lifecycle.py`:
- Around line 80-90: Add a positive running-state assertion after the second
await service.start() in test_restart_after_clean_stop to ensure the restart
actually takes effect: after calling await service.start() the second time,
assert service.is_running is True (do not access is_running before the second
start to avoid mypy narrowing); you can then await service.stop() as before and
optionally assert not service.is_running afterward.
In `@tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py`:
- Around line 122-128: The test can leak the monitor task when the
timeout/assertion raises; wrap the timeout/assertion block in a try/finally so
cleanup always runs: call await monitor.stop() inside the try, assert
monitor._stop_failed and capture monitor._task, and in the finally always call
release.set() and await the saved task (if not None) to ensure the hung monitor
is released; reference monitor.stop, monitor._stop_failed, monitor._task,
release.set and awaiting the task when implementing the change.
---
Duplicate comments:
In `@tests/unit/client/test_continuous_lifecycle.py`:
- Around line 67-94: The comment incorrectly claims that awaiting
runner.ready.wait() guarantees the lifecycle lock is still held; update the
comment near the test_double_start_raises_when_already_running to accurately
state that awaiting runner.ready.wait() only ensures the runner's run() has
begun (i.e., run() entered and set ready), but does not assert that
ContinuousMode's _lifecycle_lock is still held—explain that this timing merely
increases the chance the first start() acquired the lock before the second
start() races for it, and reference mode.start(), runner.ready.wait(), run(),
and _lifecycle_lock in the reworded comment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 34c568eb-986f-4875-be80-d684a16e0645
📒 Files selected for processing (6)
src/synthorg/api/controllers/simulations.pysrc/synthorg/budget/trends.pytests/unit/client/test_continuous_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: Build Backend
- GitHub Check: Dashboard Test
- GitHub Check: Dashboard Build
- GitHub Check: Test (Python 3.14)
- GitHub Check: Type Check
- GitHub Check: Dashboard Type Check
- GitHub Check: Build Web Assets (melange)
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Lighthouse Site
- GitHub Check: Build Preview
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend:DEFAULT_CURRENCYfromsynthorg.budget.currencyor the runtimebudget.currencysetting. Frontend:DEFAULT_CURRENCYfrom@/utils/currenciesoruseSettingsStore().currency.
Field naming: no_usdsuffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare.toLocaleString()/.toLocaleDateString()/.toLocaleTimeString(). Use helpers in@/utils/formatwhich readgetLocale()from@/utils/locale. The backend has no operator-tunable locale setting; backendIntlformatting uses the system locale plus the browser timezone. Thecompany.name_localeslist controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render viaIntlwithout passingtimeZone(browser tz wins).
Date / number format: always viaIntl; no hand-rolled templates.
Files:
src/synthorg/budget/trends.pytests/unit/client/test_continuous_lifecycle.pysrc/synthorg/api/controllers/simulations.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/hr/pruning/test_service_lifecycle.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Monetary models: every cost-bearing Pydantic model carriescurrency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raisesMixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/is the only place that may importaiosqlite,sqlite3,psycopg, orpsycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced byscripts/check_persistence_boundary.py(pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g.ArtifactService,WorkflowService,MemoryService,CustomRulesService,UserService,ProjectService,SsrfViolationService,SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced byscripts/check_persistence_boundary.py).
Per-line opt-out:# lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved throughSettingsService/ConfigResolver. First-cold-read emits one INFOsettings.value.resolvedcarryingsource+yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered withread_only_post_init=Truefor /settings discoverability;SettingsService.set()raisesSettingReadOnlyError).
Directos.environ.get(...)reads in application code outside startup are forbidden. New settings register insrc/synthorg/settings/definitions/<namespace>.pyand are consumed viaConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...
Files:
src/synthorg/budget/trends.pysrc/synthorg/api/controllers/simulations.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/budget/trends.pysrc/synthorg/api/controllers/simulations.py
src/synthorg/!(api)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Domain error class naming: error classes in domain modules use
<Domain><Condition>Errorand inherit fromDomainError(or a domain-scoped intermediate that itself inheritsDomainError). BareException/RuntimeErrorat domain boundaries is forbidden; domain errors flow throughEXCEPTION_HANDLERSfor centralised RFC 9457 routing.
Files:
src/synthorg/budget/trends.py
src/synthorg/!(observability)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/!(observability)/**/*.py: Every business-logic module hasfrom synthorg.observability import get_loggerthenlogger = get_logger(__name__). Variable name is alwayslogger. Carve-outs (e.g.meta/mcp/handlers/common_logging.pykeying at a fixed string) are documented in the module docstring.
Never useimport logging/logging.getLogger()/print()in application code (exception:observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.pyfor handler-construction / bootstrap code).
Files:
src/synthorg/budget/trends.pysrc/synthorg/api/controllers/simulations.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow.
Mock-spec gate (#1604): everyMock()/AsyncMock()/MagicMock()intests/MUST declare the interface it stands for viaspec=ConcreteClass(Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen inscripts/mock_spec_baseline.txt. Regenerate the baseline only viauv run python scripts/check_mock_spec.py --updateand commit the diff so the change is reviewable. Withoutspec=the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from#1604).
Shared mocks:tests/conftest.pyexposesmock_dispatcher(anAsyncMock(spec=NotificationDispatcher)coveringregister/start/aclose/dispatch); use it instead of building the spec inline.
Time-driven tests: importFakeClockfromtests._shared.fake_clock(NOT from any rollout-subsystem path) and inject it into the class under test.FakeClock.sleepadvances virtual time AND yields once viaasyncio.sleep(0)so cancellation on awaiting tasks propagates the same way it does underSystemClock. For tests that need to drive cooperative tasks waiting on the loop,await clock.advance_async(seconds). FakeClock-first: patchtime.monotonic()/asyncio.sleep()globals only for legacy code paths that don't have aClockseam (see## Code Conventionsfor the legacy-Callable list); when the class under test accepts aclock=parameter, always injectFakeClockrather than monkey-patching globals.
Async:asyncio_mode = "auto"; no manual@pytest.mark.asyncioneeded.
Timeout: 30 seconds per test (global inpyproject.toml; do not add per-filepytest.mark.timeout(30)markers; non-default overrides liketimeout(60)are allowed).
Parametrize: Prefer@pytest.mark.parametrizefor...
Files:
tests/unit/client/test_continuous_lifecycle.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/hr/pruning/test_service_lifecycle.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/client/test_continuous_lifecycle.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/hr/pruning/test_service_lifecycle.py
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceedapi.ws_frame_timeout_seconds(default 30s) without sending a frame. Wrapssocket.receive_text()inasyncio.wait_for(...)so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a_SlidingWindowRateLimiter(api.ws_revalidation_window_secondsdefault 60s,api.ws_revalidation_max_failuresdefault 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.
Files:
src/synthorg/api/controllers/simulations.py
🔇 Additional comments (15)
tests/unit/integrations/test_replay_protection_threadsafety.py (1)
16-94: Looks good.The concurrency tests are properly synchronized, deterministic, and cover the intended deduplication and bounded-eviction paths under contention.
src/synthorg/budget/trends.py (3)
17-17: No concern.
The helper import matches the updated validation flow.
236-270: Window-scoped currency validation is correct.
Filtering before_assert_single_currency()preserves partial-range queries while still rejecting mixed-currency buckets inside the requested window.
451-487: Single-currency enforcement is in the right place.
Validating after the empty-input fast path keeps the existing zero-input behavior and still blocks mixed-currency projections for real data.src/synthorg/api/controllers/simulations.py (3)
115-139: Good callback ordering to avoid silent task-failure drops.This sequence correctly registers exception logging before discard/add bookkeeping, which protects against fast-completing tasks.
141-174: Rollback sequencing is correct (cancel/drain before unregister).Cancelling and draining the spawned runner before unregistering the claim avoids the runner/store race you were targeting.
344-358: Atomic claim + explicit 409 conflict handling looks correct.Using
register_if_absent()here closes the duplicate-runner TOCTOU gap and cleanly enforces idempotency semantics.tests/unit/client/test_continuous_lifecycle.py (4)
1-64: Looks good.The docstring, imports, and
_FakeRunnercoordination helper are clean and avoid the old sleep-based synchronization.
95-103: Looks good.The disabled-path short-circuit is straightforward and the runner stays untouched as expected.
105-126: Looks good.The stop/release test is deterministic enough here and uses the runner event instead of ad-hoc sleeps.
128-135: Looks good.The
_lifecycle_locksmoke check matches the new lifecycle contract.tests/unit/hr/pruning/test_service_lifecycle.py (2)
45-78: Good fix for the concurrent-start race.Counting
asyncio.create_taskat the service call site directly proves only one run-loop task is spawned and closes the gap from the previous review.
92-124: Good coverage for the timeout and unrestartable path.This exercises the drain timeout, confirms
_stop_failedis set, and verifies a subsequentstart()is rejected.tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py (2)
21-31: Spec-bound builder mock is exactly right.This follows the mock-spec gate and keeps
SnapshotBuilder.buildmocked with the right interface.
37-95: Good coverage for concurrent start and clean restart.The blocked
build()pluscreate_taskcounting makes the single-spawn assertion deterministic, and the restart-after-stop path is exercised cleanly.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py`:
- Around line 87-94: The test_restart_after_clean_stop currently restarts the
monitor but lacks a positive assertion that the restart took effect; after the
second await monitor.start() add an assertion like assert monitor.is_running (or
the monitor's equivalent running-state property) to mirror the PruningService
test and ensure start() actually transitions the monitor back to running; locate
the test function test_restart_after_clean_stop and update it to assert the
monitor's running state after the second start.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fff25349-f799-4cbe-b5fa-ffe3519fc0b3
📒 Files selected for processing (4)
src/synthorg/api/controllers/simulations.pytests/unit/client/test_continuous_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: Build Backend
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Dashboard Build
- GitHub Check: Dashboard Type Check
- GitHub Check: Test (Python 3.14)
- GitHub Check: Dashboard Test
- GitHub Check: Type Check
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Lighthouse Site
- GitHub Check: Build Preview
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{py,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend:DEFAULT_CURRENCYfromsynthorg.budget.currencyor the runtimebudget.currencysetting. Frontend:DEFAULT_CURRENCYfrom@/utils/currenciesoruseSettingsStore().currency.
Field naming: no_usdsuffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare.toLocaleString()/.toLocaleDateString()/.toLocaleTimeString(). Use helpers in@/utils/formatwhich readgetLocale()from@/utils/locale. The backend has no operator-tunable locale setting; backendIntlformatting uses the system locale plus the browser timezone. Thecompany.name_localeslist controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render viaIntlwithout passingtimeZone(browser tz wins).
Date / number format: always viaIntl; no hand-rolled templates.
Files:
tests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/client/test_continuous_lifecycle.pysrc/synthorg/api/controllers/simulations.pytests/unit/hr/pruning/test_service_lifecycle.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow.
Mock-spec gate (#1604): everyMock()/AsyncMock()/MagicMock()intests/MUST declare the interface it stands for viaspec=ConcreteClass(Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen inscripts/mock_spec_baseline.txt. Regenerate the baseline only viauv run python scripts/check_mock_spec.py --updateand commit the diff so the change is reviewable. Withoutspec=the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from#1604).
Shared mocks:tests/conftest.pyexposesmock_dispatcher(anAsyncMock(spec=NotificationDispatcher)coveringregister/start/aclose/dispatch); use it instead of building the spec inline.
Time-driven tests: importFakeClockfromtests._shared.fake_clock(NOT from any rollout-subsystem path) and inject it into the class under test.FakeClock.sleepadvances virtual time AND yields once viaasyncio.sleep(0)so cancellation on awaiting tasks propagates the same way it does underSystemClock. For tests that need to drive cooperative tasks waiting on the loop,await clock.advance_async(seconds). FakeClock-first: patchtime.monotonic()/asyncio.sleep()globals only for legacy code paths that don't have aClockseam (see## Code Conventionsfor the legacy-Callable list); when the class under test accepts aclock=parameter, always injectFakeClockrather than monkey-patching globals.
Async:asyncio_mode = "auto"; no manual@pytest.mark.asyncioneeded.
Timeout: 30 seconds per test (global inpyproject.toml; do not add per-filepytest.mark.timeout(30)markers; non-default overrides liketimeout(60)are allowed).
Parametrize: Prefer@pytest.mark.parametrizefor...
Files:
tests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/client/test_continuous_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/client/test_continuous_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Monetary models: every cost-bearing Pydantic model carriescurrency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raisesMixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/is the only place that may importaiosqlite,sqlite3,psycopg, orpsycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced byscripts/check_persistence_boundary.py(pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g.ArtifactService,WorkflowService,MemoryService,CustomRulesService,UserService,ProjectService,SsrfViolationService,SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced byscripts/check_persistence_boundary.py).
Per-line opt-out:# lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved throughSettingsService/ConfigResolver. First-cold-read emits one INFOsettings.value.resolvedcarryingsource+yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered withread_only_post_init=Truefor /settings discoverability;SettingsService.set()raisesSettingReadOnlyError).
Directos.environ.get(...)reads in application code outside startup are forbidden. New settings register insrc/synthorg/settings/definitions/<namespace>.pyand are consumed viaConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...
Files:
src/synthorg/api/controllers/simulations.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/api/controllers/simulations.py
src/synthorg/!(observability)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/!(observability)/**/*.py: Every business-logic module hasfrom synthorg.observability import get_loggerthenlogger = get_logger(__name__). Variable name is alwayslogger. Carve-outs (e.g.meta/mcp/handlers/common_logging.pykeying at a fixed string) are documented in the module docstring.
Never useimport logging/logging.getLogger()/print()in application code (exception:observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.pyfor handler-construction / bootstrap code).
Files:
src/synthorg/api/controllers/simulations.py
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceedapi.ws_frame_timeout_seconds(default 30s) without sending a frame. Wrapssocket.receive_text()inasyncio.wait_for(...)so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a_SlidingWindowRateLimiter(api.ws_revalidation_window_secondsdefault 60s,api.ws_revalidation_max_failuresdefault 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.
Files:
src/synthorg/api/controllers/simulations.py
🔇 Additional comments (9)
tests/unit/client/test_continuous_lifecycle.py (3)
3-12: Docstring now matchesContinuousMode.start()lock semantics.The lock scope and
_runningtransition behavior are described precisely and align with the current implementation.
26-34: Good shift to event-driven synchronization in lifecycle tests.Using
runner.ready.wait()instead of scheduler-timing sleeps makes these assertions deterministic and less flaky under CI contention.Also applies to: 84-92, 123-127
70-98: Lifecycle coverage is solid across conflict, disabled path, and cooperative stop.These tests exercise the important control-flow branches and validate both behavior and state (
runs_completed) without leaking tasks.Also applies to: 99-107, 109-131
src/synthorg/api/controllers/simulations.py (4)
115-139: LGTM!The callback ordering logic is correctly implemented—attaching the exception logger before the discard callback ensures fast-completing failures are surfaced. The docstring clearly explains this subtlety.
141-174: LGTM!The rollback helper correctly sequences: cancel → drain → unregister. Drain failures are logged without blocking the unregister, preventing orphaned claims. This addresses the past concern about leaving tasks running after claim removal.
344-358: LGTM!The atomic registration via
register_if_absentwith proper 409 response for duplicates closes the idempotency gap. The detailed comment explaining the JetStream redelivery / HTTP retry scenario provides excellent context for future maintainers.
409-447: LGTM!The rollback handling correctly addresses the previous review concerns:
- Logs the original exception with context before rollback (lines 434-441)
- Cancels/drains the spawned task before unregistering via
_rollback_register_if_absent- The
spawned_taskboolean in the log aids diagnosis of pre- vs post-spawn failuresThe
MemoryError, RecursionErrorpassthrough (PEP 758) correctly propagates system errors without masking them.tests/unit/hr/pruning/test_service_lifecycle.py (1)
1-135: LGTM!The test suite comprehensively validates the canonical lifecycle pattern:
test_concurrent_starts_spawn_one_task: Correctly patchesasyncio.create_taskto count spawns and asserts exactly one task is created under concurrentstart()calls.test_restart_after_clean_stop: Properly assertsservice.is_runningafter the secondstart()to verify restart took effect.test_unrestartable_after_drain_timeout: Usesasyncio.Event().wait()for deterministic blocking (per guidelines), capturessaved_taskbefore assertions, and usestry/finallyto ensure cleanup even if assertions fail.Past review feedback has been addressed.
tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py (1)
21-31: LGTM on mock-spec compliance and lifecycle test patterns.All
AsyncMockusages properly specifyspec=SnapshotBuilderper the mock-spec gate requirement. The tests correctly useasyncio.Event().wait()for indefinite blocking andtry/finallyfor deterministic cleanup, addressing all prior review feedback.Also applies to: 37-85, 96-138
Source code: - simulations.py: roll back register_if_absent claim if post-claim setup raises (publish/spawn/callback registration); add SimulationStore.unregister - backup/scheduler.py: add log_task_exceptions done_callback so silent _run_loop deaths are surfaced via BACKUP_FAILED - escalation/factory.py: log API_APP_STARTUP warning with config value + registered keys before raising ValueError on unknown backend / strategy - ngrok_adapter.py: read NGROK_AUTHTOKEN at __init__ time per CLAUDE.md bootstrap-secret exception (no os.environ at runtime) - budget/trends.py: filter records to [start, end) before _assert_single_currency so out-of-window mixed-currency rows do not reject valid partial-range queries Tests: - test_backup.py: == not is for ApiResponse.data; assert_not_called for confirm=false gate - test_continuous_lifecycle: Event-driven runner, drop wall-clock asyncio.sleep gates - test_ngrok_adapter_lifecycle: replace swallowing fake disconnect with strict MagicMock(spec=ngrok.disconnect) + assert_not_called on stop-without-start path - test_replay_protection_threadsafety: threading.Barrier across all 3 ThreadPoolExecutor blocks so workers all hit the critical section simultaneously - test_monitor_lifecycle / test_pruning_service_lifecycle: asyncio.Event gate around hung loops; release+await task after timeout assertion (no leaked tasks) - test_backup_subscriber: assert_not_called instead of assert_not_awaited on scheduler.stop (catches unawaited coroutine creation)
Source code: - simulations.py: cancel + drain spawned runner_task before unregister on rollback path; extract _attach_runner_callbacks and _rollback_register_if_absent helpers - budget/trends.py: docstring matches in_window_records check (currency error fires on the post-filter set, not raw input) Tests: - test_continuous_lifecycle: module docstring reflects actual lock semantics (released before run loop body) - test_pruning_service_lifecycle: patch asyncio.create_task and assert exactly one spawn under concurrent start() callers - test_replay_protection: threading.Barrier.wait(timeout=5) on all three pools so a stalled worker fails fast instead of hanging the suite - test_monitor_lifecycle: concurrent-starts now uses a blocking builder.build (asyncio.Event) and counts asyncio.create_task spawns to assert exactly one task
Source code: - simulations.py: capture and log original exception in except BaseException before _rollback_register_if_absent + re-raise; without this the rollback drain log was the only trace of a failed start Tests: - test_continuous_lifecycle: reword sequencing comment to drop misleading 'lock still held' claim (lock released before run loop body; ready.wait() only proves run() has entered) - test_pruning_service_lifecycle.test_restart_after_clean_stop: assert is_running is True after the second start() so a silent restart no-op cannot pass - test_pruning_service_lifecycle.test_unrestartable_after_drain_timeout / test_monitor_lifecycle.test_unrestartable_after_drain_timeout: wrap timeout/assertion blocks in try/finally so release.set() + saved_task await always run, even when an assertion above raises (no orphan task leaks into next test)
- test_monitor_lifecycle.test_restart_after_clean_stop: assert monitor._task is not None after the second start() so a silent restart no-op cannot pass the test (mirrors the PruningService equivalent)
1463bd9 to
03887ab
Compare
There was a problem hiding this comment.
Actionable comments posted: 17
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/api/controllers/backup.py`:
- Around line 88-107: Change the idempotency_key parameter type from plain str
to NotBlankStr to ensure whitespace-only Idempotency-Key header values are
rejected at parameter parsing; update the Annotated type for idempotency_key in
the function signature (the parameter named idempotency_key) and add the
necessary import for NotBlankStr from core.types so the validation runs before
any downstream logic that currently re-wraps the value.
- Around line 143-159: The cached idempotency result (outcome.result) is assumed
to always validate to BackupManifest but a corrupt/stale payload can make
BackupManifest.model_validate(...) raise and bypass the BACKUP_FAILED logging;
wrap the call to BackupManifest.model_validate(outcome.result) in a try/except
that catches the pydantic/validation error (and any deserialization error), log
an ERROR/WARNING using BACKUP_FAILED with context (scope="backup",
idempotency_key, endpoint="backup.create") and then raise an appropriate
InternalError/ServiceError (instead of letting the raw validation exception
leak); place this handling immediately after obtaining outcome from
app_state.idempotency_service.run_idempotent so the failure path from cached
manifests is translated and logged.
In `@src/synthorg/backup/scheduler.py`:
- Around line 57-67: The code raises a bare RuntimeError when self._stop_failed
in BackupScheduler.start(); define a domain-specific exception (e.g.,
BackupUnrestartableError) that inherits from DomainError and replace the
RuntimeError raise with raising this new exception, keeping the existing
BACKUP_FAILED logger call and message; ensure the new class follows the naming
pattern <Domain><Condition>Error and is exported/used by callers so the
backup-domain exception family flows through EXCEPTION_HANDLERS.
In `@src/synthorg/client/store.py`:
- Around line 194-205: unregister currently unconditionally deletes whatever is
at _runs[simulation_id]; change it to a compare-and-delete so it only removes
the entry when it still equals the originally claimed snapshot: add an optional
parameter expected (e.g. expected: Optional[RegistrationRecord]=None) to
unregister, and inside the async with self._lock check current =
self._runs.get(simulation_id) and only pop(simulation_id) and return True if
current is expected (or if expected is None preserve existing behavior),
otherwise return False; update the rollback caller in
start_simulation/register_if_absent to pass the originally claimed record as
expected so you don’t delete a later real run.
In `@src/synthorg/communication/event_stream/stream.py`:
- Around line 199-200: Remove the redundant unreachable early return that checks
"if not queues_snapshot: return" (the second check after the lock) — the
empty-queues case is already handled inside the lock when the snapshot is
created, so delete this dead branch referencing queues_snapshot to avoid
unreachable code in the event stream logic.
In `@src/synthorg/hr/pruning/service.py`:
- Around line 143-153: Replace the bare RuntimeError raised in
PruningService.start() when self._stop_failed is true with a pruning-domain
error: define a new DomainError subclass named PruningUnrestartableError
(following the <Domain><Condition>Error pattern) that inherits from DomainError,
import it into service.py, and raise PruningUnrestartableError(msg) instead of
RuntimeError(msg); keep the existing logger.warning call but ensure the raised
exception is the new pruning-scoped error so it flows through the centralized
EXCEPTION_HANDLERS and domain error handling.
In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 123-131: The adapter assigns self._tunnel before materializing the
tunnel.public_url, which can leave the object half-started if converting
public_url raises; update the start logic in ngrok_adapter (the async block that
calls ngrok.connect and assigns tunnel/self._tunnel) to first compute/convert
the public URL from the returned tunnel (e.g., materialize public_url into a
local variable or set self._public_url) and only after that assign self._tunnel
and publish state; additionally wrap the conversion in a try/except to
close/disconnect the newly created tunnel on failure so stop() won't later try
to disconnect a None/second tunnel.
In `@src/synthorg/integrations/webhooks/replay_protection.py`:
- Around line 163-166: The code samples the clock twice causing a boundary
replay gap; fix by sampling now once in check() and using that same now for
freshness and nonce eviction: add a helper or overload (e.g. implement
_check_freshness_at(timestamp, now) or extend check_freshness to accept an
optional now), change check() to do now = self._clock.now().timestamp() once,
call self._check_freshness_at(timestamp, now) (or
self.check_freshness(timestamp, now)) and then call
self._check_nonce(nonce=nonce, now=now); update any internal callers to use the
new helper/signature so all freshness and nonce decisions use the identical
timestamp.
In `@src/synthorg/meta/chief_of_staff/monitor.py`:
- Around line 80-90: Replace the bare RuntimeError raised in
OrgInflectionMonitor when self._stop_failed is true with a domain-scoped
lifecycle error: define a new exception class named
InflectionMonitorLifecycleError (or OrgInflectionLifecycleError) that subclasses
DomainError, import DomainError into this module, and raise that new class with
the same msg; keep the existing logger.warning call (using
COS_INFLECTION_CHECK_FAILED) but ensure the raised exception type is the new
InflectionMonitorLifecycleError so the failure participates in the domain error
hierarchy and centralized EXCEPTION_HANDLERS routing.
In `@src/synthorg/providers/health_prober.py`:
- Around line 193-204: Replace the bare RuntimeError raised in
ProviderHealthProber when self._stop_failed is true with a provider-scoped
domain error: define a new error class (e.g. ProviderLifecycleConflictError)
that inherits from DomainError following the "<Domain><Condition>Error" naming
convention, import that class into health_prober.py, and raise
ProviderLifecycleConflictError(msg) instead of RuntimeError(msg); keep the
existing logger call (PROVIDER_HEALTH_PROBER_CYCLE_FAILED) and message intact so
the domain error is typed consistently across API/controller boundaries.
In `@src/synthorg/settings/subscribers/backup_subscriber.py`:
- Around line 112-113: Wrap the await scheduler.start() call in a try/except
inside the subscriber so any exception from BackupScheduler.start() is logged
with context before being allowed to propagate; catch Exception as exc, call the
module logger with structured fields error_type=type(exc).__name__ and
error=safe_error_description(exc) plus context like subscriber name/setting and
scheduler id, then re-raise (or translate if needed) so the failure still
surfaces. Ensure you reference the existing scheduler.start() call and use the
safe_error_description helper when logging.
In `@tests/unit/backup/test_scheduler_lifecycle.py`:
- Around line 55-68: The test's hung_loop implementation uses a fixed
asyncio.sleep(1.0) in the CancelledError path making the test slow/flaky; update
hung_loop (the patched BackupScheduler._run_loop) to use an asyncio.Event
created outside the coroutine and await event.wait() in the except/finally path
so the test can release the event deterministically (from the test body) before
asserting scheduler.stop() raises TimeoutError and checking
scheduler._stop_failed; ensure the Event is set in a finally block or by the
test to unblock the hung task rather than sleeping.
- Around line 30-39: The test test_concurrent_starts_spawn_one_task currently
only checks scheduler._task is not None which misses multiple creations; replace
this with a deterministic assertion that only one underlying task was created by
spying/mocking asyncio.create_task during the concurrent scheduler.start() calls
(or instrumenting scheduler to append created tasks to a list) and then assert
that the spy was called exactly once and that scheduler._task is the single
created task; target the scheduler.start() calls and scheduler._task symbol for
locating where to add the create_task spy/collector and the exact-one assertion.
In `@tests/unit/budget/test_trends_currency.py`:
- Around line 61-83: Add a regression test that ensures bucket_cost_records does
NOT raise MixedCurrencyAggregationError when mixed-currency records are all
outside the requested time window: create a new test (e.g.,
test_out_of_window_mixed_currency_no_error) that builds mixed-currency records
with timestamps outside [start, end) using the existing _record helper and _NOW
constant, call bucket_cost_records with that start/end and BucketSize.HOUR, and
assert it returns points (or zeros) without raising; follow the style of
test_mixed_currency_raises/test_empty_records_no_error and ensure you do not
change bucket_cost_records itself or other tests.
- Around line 50-51: Tests are hardcoding ISO codes ("EUR"/"USD") in
fixtures/assertions; import and use the project's Currency type/constant instead
(e.g., Currency.EUR or the project's equivalent factory) and replace the string
literals in the _record calls and all other occurrences in the test (references:
_record, the test_trends_currency test cases) so the tests derive currency
values from the shared Currency type rather than literal strings; ensure you add
the necessary import for Currency at the top of the test file and update every
"EUR"/"USD" occurrence indicated to use the Currency constant/type.
In `@tests/unit/providers/test_health_prober_lifecycle.py`:
- Around line 78-102: The patched ProviderHealthProber._run_loop in the test
suppresses cancellation and uses asyncio.sleep(1.0), leaving the background task
alive after the TimeoutError and leaking tasks; change the hung_loop to wait on
an asyncio.Event (e.g., cancel_started and a new release event), set
cancel_started on CancelledError, then in test cleanup set the release event and
await prober._task to ensure the background task finishes; specifically update
the patched _run_loop used in the with patch.object(ProviderHealthProber,
"_run_loop", hung_loop) block to gate the post-cancel wait on the release event,
call release.set() after the TimeoutError assertion, and await prober._task
before asserting the prober is unrestartable.
In `@web/src/api/endpoints/clients.ts`:
- Around line 273-283: When catching the 409 in the create path, don't
unconditionally return getSimulation(config.simulation_id); instead fetch the
existing run (via getSimulation(config.simulation_id)), compare its stored
config object to the requested config (use a deep-equality check such as
lodash/isEqual or an equivalent deepEqual helper) and only return the existing
run if the configs are identical; if they differ, rethrow the original err so
the 409 surfaces. Ensure you still detect axios.isAxiosError(err) &&
err.response?.status === 409, preserve the original error variable for rethrow,
and perform the comparison against the same fields used to create the simulation
(config) before returning.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4638b438-f9e6-4733-9fac-75efd88ff139
📒 Files selected for processing (48)
docs/design/backup.mddocs/design/client-simulation.mddocs/design/observability.mddocs/licensing.mddocs/reference/lifecycle-sync.mddocs/research/lgpl-postgres-driver-decision.mdscripts/mock_spec_baseline.txtsrc/synthorg/api/auth/ticket_store.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/backup/scheduler.pysrc/synthorg/backup/service.pysrc/synthorg/budget/trends.pysrc/synthorg/client/continuous.pysrc/synthorg/client/store.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/observability/events/api.pysrc/synthorg/observability/events/event_stream.pysrc/synthorg/providers/health_prober.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/tools/mcp/cache.pytests/integration/api/controllers/test_client_simulation.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/api/controllers/test_backup.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/backup/test_scheduler.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/budget/test_trends_currency.pytests/unit/client/test_continuous_lifecycle.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/communication/event_stream/test_stream_dedup.pytests/unit/hr/pruning/test_service.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/providers/test_health_prober_lifecycle.pytests/unit/settings/test_backup_subscriber.pytests/unit/tools/mcp/test_cache_threadsafety.pyweb/src/api/endpoints/backup.tsweb/src/api/endpoints/clients.ts
💤 Files with no reviewable changes (1)
- scripts/mock_spec_baseline.txt
📜 Review details
🧰 Additional context used
📓 Path-based instructions (12)
**/*.{py,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend:DEFAULT_CURRENCYfromsynthorg.budget.currencyor the runtimebudget.currencysetting. Frontend:DEFAULT_CURRENCYfrom@/utils/currenciesoruseSettingsStore().currency.
Field naming: no_usdsuffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare.toLocaleString()/.toLocaleDateString()/.toLocaleTimeString(). Use helpers in@/utils/formatwhich readgetLocale()from@/utils/locale. The backend has no operator-tunable locale setting; backendIntlformatting uses the system locale plus the browser timezone. Thecompany.name_localeslist controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render viaIntlwithout passingtimeZone(browser tz wins).
Date / number format: always viaIntl; no hand-rolled templates.
Files:
src/synthorg/settings/subscribers/backup_subscriber.pyweb/src/api/endpoints/backup.tssrc/synthorg/observability/events/event_stream.pyweb/src/api/endpoints/clients.tssrc/synthorg/backup/service.pysrc/synthorg/client/store.pytests/integration/api/controllers/test_client_simulation.pysrc/synthorg/budget/trends.pytests/unit/backup/test_scheduler.pysrc/synthorg/tools/mcp/cache.pysrc/synthorg/observability/events/api.pytests/unit/tools/mcp/test_cache_threadsafety.pytests/unit/api/controllers/test_backup_required_idempotency.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/api/auth/test_ticket_store_threadsafety.pysrc/synthorg/meta/chief_of_staff/monitor.pytests/unit/client/test_continuous_lifecycle.pytests/unit/settings/test_backup_subscriber.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/hr/pruning/test_service_lifecycle.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/api/auth/ticket_store.pysrc/synthorg/providers/health_prober.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/budget/test_trends_currency.pysrc/synthorg/backup/scheduler.pysrc/synthorg/client/continuous.pysrc/synthorg/integrations/webhooks/replay_protection.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/providers/test_health_prober_lifecycle.pytests/unit/hr/pruning/test_service.pytests/unit/communication/event_stream/test_stream_dedup.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pytests/unit/api/controllers/test_backup.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pytests/unit/integrations/test_ngrok_adapter_lifecycle.pysrc/synthorg/hr/pruning/service.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Monetary models: every cost-bearing Pydantic model carriescurrency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raisesMixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/is the only place that may importaiosqlite,sqlite3,psycopg, orpsycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced byscripts/check_persistence_boundary.py(pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g.ArtifactService,WorkflowService,MemoryService,CustomRulesService,UserService,ProjectService,SsrfViolationService,SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced byscripts/check_persistence_boundary.py).
Per-line opt-out:# lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved throughSettingsService/ConfigResolver. First-cold-read emits one INFOsettings.value.resolvedcarryingsource+yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered withread_only_post_init=Truefor /settings discoverability;SettingsService.set()raisesSettingReadOnlyError).
Directos.environ.get(...)reads in application code outside startup are forbidden. New settings register insrc/synthorg/settings/definitions/<namespace>.pyand are consumed viaConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...
Files:
src/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/observability/events/event_stream.pysrc/synthorg/backup/service.pysrc/synthorg/client/store.pysrc/synthorg/budget/trends.pysrc/synthorg/tools/mcp/cache.pysrc/synthorg/observability/events/api.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/api/auth/ticket_store.pysrc/synthorg/providers/health_prober.pysrc/synthorg/backup/scheduler.pysrc/synthorg/client/continuous.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/hr/pruning/service.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/observability/events/event_stream.pysrc/synthorg/backup/service.pysrc/synthorg/client/store.pysrc/synthorg/budget/trends.pysrc/synthorg/tools/mcp/cache.pysrc/synthorg/observability/events/api.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/api/auth/ticket_store.pysrc/synthorg/providers/health_prober.pysrc/synthorg/backup/scheduler.pysrc/synthorg/client/continuous.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/hr/pruning/service.py
src/synthorg/!(api)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Domain error class naming: error classes in domain modules use
<Domain><Condition>Errorand inherit fromDomainError(or a domain-scoped intermediate that itself inheritsDomainError). BareException/RuntimeErrorat domain boundaries is forbidden; domain errors flow throughEXCEPTION_HANDLERSfor centralised RFC 9457 routing.
Files:
src/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/observability/events/event_stream.pysrc/synthorg/backup/service.pysrc/synthorg/client/store.pysrc/synthorg/budget/trends.pysrc/synthorg/tools/mcp/cache.pysrc/synthorg/observability/events/api.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/providers/health_prober.pysrc/synthorg/backup/scheduler.pysrc/synthorg/client/continuous.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/hr/pruning/service.py
src/synthorg/!(observability)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/!(observability)/**/*.py: Every business-logic module hasfrom synthorg.observability import get_loggerthenlogger = get_logger(__name__). Variable name is alwayslogger. Carve-outs (e.g.meta/mcp/handlers/common_logging.pykeying at a fixed string) are documented in the module docstring.
Never useimport logging/logging.getLogger()/print()in application code (exception:observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.pyfor handler-construction / bootstrap code).
Files:
src/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/backup/service.pysrc/synthorg/client/store.pysrc/synthorg/budget/trends.pysrc/synthorg/tools/mcp/cache.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/api/auth/ticket_store.pysrc/synthorg/providers/health_prober.pysrc/synthorg/backup/scheduler.pysrc/synthorg/client/continuous.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/communication/conflict_resolution/escalation/sweeper.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/communication/conflict_resolution/escalation/factory.pysrc/synthorg/hr/pruning/service.py
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Units: metric only. Spelling: International / British English UI default (
colour,behaviour,organise,centred,analyse,cancelled); document deviations. Spelling here is an editorial / UI-copy decision only; it does not affect runtime locale-sensitive formatting. Numbers, dates, times, currencies, and units still resolve via the user / company / browser / system fallback through@/utils/format,@/utils/locale,DEFAULT_CURRENCY, anduseSettingsStore().currency, with no contradiction to the locale-neutral defaults above.
Files:
web/src/api/endpoints/backup.tsweb/src/api/endpoints/clients.ts
web/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Reuse components from
web/src/components/ui/before creating new ones. Never hardcode hex colors, font-family, pixel spacing, Motion transitions, or BCP 47 locale strings; use design tokens,@/lib/motionpresets, and the helpers in@/utils/format. Enforced byscripts/check_web_design_system.py(PostToolUse hook on everyweb/src/edit).
web/src/**/*.{ts,tsx}: Always usecreateLoggerfrom@/lib/logger; never use bareconsole.warn,console.error, orconsole.debugin application code
Logger variable name must always belog(e.g.,const log = createLogger('module-name'))
Use logger levels:log.debug()(DEV-only, stripped in production),log.warn(),log.error()
Pass dynamic/untrusted values as separate arguments to logger (not interpolated into the message string) so they go throughsanitizeArg
Wrap attacker-controlled fields inside structured objects withsanitizeForLog()before embedding in logs
Callers MUST NOT wrap store mutation calls intry/catch; the store owns the error UX
Display counts in paginated lists come fromdata.length; the wire envelope no longer carriestotal
Any new caller of health/readiness endpoints must handle the 503 path explicitly
UsesanitizeWsString()andsanitizeWsEnum()fromweb/src/utils/ws-sanitize.ts(pure helpers, re-exported from@/stores/notifications) for WebSocket payload sanitization
Any new WS payload handler that ingests untrusted strings MUST route throughsanitizeWsString()orsanitizeWsEnum(); raw(sanitizeWsString(x, n) ?? '') as EnumTypecasts are forbidden
NEVER hardcode Motion transition durations; use@/lib/motionpresets instead
NEVER hardcode BCP 47 locale literals ('en-US'); use helpers from@/utils/formatinstead
NEVER hardcode currency symbols / codes; useDEFAULT_CURRENCYfrom@/utils/currenciesinstead
Use<StatusBadge>instead of inline status dots (defaults torole="img"with aria-label;decorativefor adjacent-labeled,announcefor liv...
Files:
web/src/api/endpoints/backup.tsweb/src/api/endpoints/clients.ts
web/src/api/**/*.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
getLiveness()is always 200 while the process is alive;getReadiness()is 200 healthy / 503 unavailable (binary'ok' | 'unavailable'outcome, no tri-state)
Files:
web/src/api/endpoints/backup.tsweb/src/api/endpoints/clients.ts
web/src/**/*.{ts,tsx,css}
📄 CodeRabbit inference engine (web/CLAUDE.md)
NEVER hardcode hex colors, font-family declarations, or pixel spacing; use design tokens instead
Files:
web/src/api/endpoints/backup.tsweb/src/api/endpoints/clients.ts
docs/design/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Use D2 (
\``d2): architecture diagrams, nested container layouts, complex entity relationships. Rendered at build time viamkdocs-d2-plugin(dagre layout). Requires the [D2 CLI](https://d2lang.com/tour/install) onPATHlocally and in CI (pinned to v0.7.1 via.github/workflows/pages.yml`).
Files:
docs/design/observability.mddocs/design/client-simulation.mddocs/design/backup.md
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
docs/**/*.md: Use Mermaid (\``mermaid): flowcharts, sequence diagrams, simple hierarchies, pipelines. Rendered client-side viapymdownx.superfences. Use **Markdown tables**: grid/matrix data that is semantically tabular (not diagrams). D2 uses theme 200 (Dark Mauve), dark-only render, configured globally inmkdocs.yml. Never use```text` blocks with ASCII/Unicode box-drawing characters for diagrams.
Files:
docs/design/observability.mddocs/design/client-simulation.mddocs/design/backup.mddocs/licensing.mddocs/reference/lifecycle-sync.mddocs/research/lgpl-postgres-driver-decision.md
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow.
Mock-spec gate (#1604): everyMock()/AsyncMock()/MagicMock()intests/MUST declare the interface it stands for viaspec=ConcreteClass(Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen inscripts/mock_spec_baseline.txt. Regenerate the baseline only viauv run python scripts/check_mock_spec.py --updateand commit the diff so the change is reviewable. Withoutspec=the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from#1604).
Shared mocks:tests/conftest.pyexposesmock_dispatcher(anAsyncMock(spec=NotificationDispatcher)coveringregister/start/aclose/dispatch); use it instead of building the spec inline.
Time-driven tests: importFakeClockfromtests._shared.fake_clock(NOT from any rollout-subsystem path) and inject it into the class under test.FakeClock.sleepadvances virtual time AND yields once viaasyncio.sleep(0)so cancellation on awaiting tasks propagates the same way it does underSystemClock. For tests that need to drive cooperative tasks waiting on the loop,await clock.advance_async(seconds). FakeClock-first: patchtime.monotonic()/asyncio.sleep()globals only for legacy code paths that don't have aClockseam (see## Code Conventionsfor the legacy-Callable list); when the class under test accepts aclock=parameter, always injectFakeClockrather than monkey-patching globals.
Async:asyncio_mode = "auto"; no manual@pytest.mark.asyncioneeded.
Timeout: 30 seconds per test (global inpyproject.toml; do not add per-filepytest.mark.timeout(30)markers; non-default overrides liketimeout(60)are allowed).
Parametrize: Prefer@pytest.mark.parametrizefor...
Files:
tests/integration/api/controllers/test_client_simulation.pytests/unit/backup/test_scheduler.pytests/unit/tools/mcp/test_cache_threadsafety.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/client/test_continuous_lifecycle.pytests/unit/settings/test_backup_subscriber.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/budget/test_trends_currency.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/providers/test_health_prober_lifecycle.pytests/unit/hr/pruning/test_service.pytests/unit/communication/event_stream/test_stream_dedup.pytests/unit/api/controllers/test_backup.pytests/unit/integrations/test_ngrok_adapter_lifecycle.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/integration/api/controllers/test_client_simulation.pytests/unit/backup/test_scheduler.pytests/unit/tools/mcp/test_cache_threadsafety.pytests/unit/api/controllers/test_backup_required_idempotency.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/api/auth/test_ticket_store_threadsafety.pytests/unit/client/test_continuous_lifecycle.pytests/unit/settings/test_backup_subscriber.pytests/unit/communication/conflict_resolution/escalation/test_factory_registry.pytests/unit/integrations/test_replay_protection_threadsafety.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/budget/test_trends_currency.pytests/unit/api/controllers/test_simulations_idempotency.pytests/unit/providers/test_health_prober_lifecycle.pytests/unit/hr/pruning/test_service.pytests/unit/communication/event_stream/test_stream_dedup.pytests/unit/api/controllers/test_backup.pytests/unit/integrations/test_ngrok_adapter_lifecycle.py
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceedapi.ws_frame_timeout_seconds(default 30s) without sending a frame. Wrapssocket.receive_text()inasyncio.wait_for(...)so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a_SlidingWindowRateLimiter(api.ws_revalidation_window_secondsdefault 60s,api.ws_revalidation_max_failuresdefault 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.
Files:
src/synthorg/api/controllers/backup.pysrc/synthorg/api/auth/ticket_store.pysrc/synthorg/api/controllers/simulations.py
🪛 LanguageTool
docs/design/observability.md
[grammar] ~205-~205: Ensure spelling is correct
Context: ...-window TTL (default 60s). The hub keys dedup on event.id; identical ids within the...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (24)
src/synthorg/observability/events/api.py (1)
80-80: Event constant addition is consistent and correct.
API_WS_TICKET_LIMIT_EXCEEDEDfollows the existingAPI_WS_TICKET_*naming/key convention and keeps the event catalog centralized.docs/research/lgpl-postgres-driver-decision.md (1)
24-29: Decision matrix formatting is correct and guideline-compliant.Using a Markdown table for the option trade-off section is the right format and keeps the ADR easy to scan.
As per coding guidelines, “Use Markdown tables: grid/matrix data that is semantically tabular (not diagrams).”
docs/licensing.md (1)
112-122: Third-party license section is clear and well-grounded.The new section cleanly scopes default vs optional install obligations and points to the decision record for traceability.
src/synthorg/communication/conflict_resolution/escalation/factory.py (2)
150-169: Defensive fallback behavior is solid.Unknown backend/strategy paths now log contextual startup warnings (including registered keys) before raising, which is exactly what operators need during misconfiguration triage.
Also applies to: 267-282
122-129: Registry refactor is clean and immutable.Using
MappingProxyTypefor dispatch tables keeps the extension point explicit and prevents accidental runtime mutation.Also applies to: 242-250
tests/unit/communication/conflict_resolution/escalation/test_factory_registry.py (2)
85-98: Goodcross_instance_notify="auto"regression coverage.This explicitly protects the shared
{"auto","on"}branch from narrowing regressions.
139-160: Fallback tests are well tightened.Asserting registry option names in the
ValueErrormessages validates the operator-facing diagnostics, not just exception type.src/synthorg/tools/mcp/cache.py (4)
58-58: LGTM!Adding
threading.Lockfor thread-safety is appropriate for a cache accessed from thread pools.
78-105: LGTM!The lock correctly guards the read-decision-write sequence (TTL check, deletion,
move_to_end). Logging/metrics are wisely moved outside the lock to minimize hold time. Thedeepcopyon the localhitreference outside the lock is safe sincehitholds a valid reference regardless of concurrent cache mutations.
124-142: LGTM!The lock correctly guards the entire put sequence (delete existing → evict oldest → insert). The eviction loop handles edge cases properly, including
max_size=0(disables caching). Logging evicted keys outside the lock minimizes contention.
154-161: LGTM!Lock guards all invalidation paths. Building the key list before deletion avoids "dictionary changed size during iteration" errors.
tests/unit/tools/mcp/test_cache_threadsafety.py (3)
22-47: LGTM!The test correctly exercises the cache hit path under contention by seeding a shared key and having readers query it while writers update it concurrently. The implicit "no exception raised" assertion is appropriate for thread-safety verification.
49-67: LGTM!Good coverage for concurrent
put+ tool-specificinvalidateoperations. The mix of writers and invalidators stresses the lock-guarded iteration logic.
69-80: LGTM!The test effectively stresses eviction under high concurrency with many more unique keys than
max_size. Asserting<= max_sizecorrectly validates the bounded-cache invariant.src/synthorg/budget/trends.py (1)
244-255: Currency invariant is enforced at the right aggregation boundaries.Filtering first (Line 251) and then validating currency uniformity, plus enforcing it in
project_daily_spend(Line 469), correctly prevents mixed-currency sums without rejecting irrelevant out-of-window rows.Also applies to: 469-470
src/synthorg/api/auth/ticket_store.py (1)
8-14: Lock scope and rejection logging look correct.The updated lock coverage closes the count-and-insert, pop-and-validate, and snapshot/delete race windows, and the cap-rejection path now logs before raising. This matches the audit goal cleanly.
Also applies to: 26-40, 118-241
tests/unit/api/auth/test_ticket_store_threadsafety.py (1)
32-124: The new contention tests exercise the right invariants.Using a shared start gate plus threadpool workers gives these tests meaningful concurrency pressure, and the four cases map well to the store's cap, isolation, single-consume, and cleanup guarantees.
src/synthorg/observability/events/event_stream.py (1)
20-20: LGTM!The new event constant follows the established naming convention (
<domain>.<noun>.<verb>) and is consistent with the existingEVENT_STREAM_HUB_*constants in this module.src/synthorg/communication/event_stream/stream.py (4)
69-107: LGTM!The constructor properly validates inputs fail-fast (preventing runtime crashes in
_record_published_locked), implements the Clock seam pattern for testability, and uses__slots__for memory efficiency. The validation bounds are correct:max_queue_size >= 1,dedup_ttl_seconds >= 0(allowing 0 to disable TTL-based eviction), anddedup_max_entries_per_session >= 1.
139-155: LGTM!The cleanup logic correctly removes per-session dedup state when the last subscriber for that session unsubscribes, preventing memory leaks for long-lived hubs with session churn. Using
pop(session_id, None)handles sessions that never had any published events gracefully.
212-239: LGTM!The duplicate-check logic correctly handles the
dedup_ttl_seconds=0case by skipping time-based eviction entirely (entries only fall out via size-bound trimming). The eviction loop properly walks oldest-first thanks toOrderedDictinsertion order, and the cleanup at lines 236-238 prevents empty dicts from accumulating.
241-255: LGTM!The bounded recording logic is correct: insert the new entry, then trim oldest entries via
popitem(last=False)until within bounds. The constructor's validation ofdedup_max_entries_per_session >= 1ensures the trim loop never callspopitem()on an empty dict.tests/unit/communication/event_stream/test_stream_dedup.py (1)
1-164: LGTM!Comprehensive test coverage for the dedup functionality:
- Duplicate suppression within TTL
- Redelivery after TTL expiry
- Distinct event IDs all delivered
- Per-session dedup isolation
- Bounded window size enforcement
- Cleanup on last unsubscribe
- Partial unsubscribe preserves dedup
dedup_ttl_seconds=0disables time-based evictionThe tests follow guidelines:
FakeClockinjection via theclock=parameter (FakeClock-first),pytestmarkfor unit marker, and no mocks needed since real objects are used.docs/design/observability.md (1)
200-206: LGTM!The documentation accurately describes both event-stream hub warning events, including the new
EVENT_STREAM_HUB_PUBLISH_DEDUPED. The description correctly captures the per-session sliding-window behavior, default TTL (60s), and bounded size (1024 entries). The static analysis hint about "dedup" is a false positive—it's standard technical shorthand.
Domain errors (CLAUDE.md naming convention): - BackupUnrestartableError (backup/errors.py) replaces RuntimeError in BackupScheduler.start() - PruningUnrestartableError (hr/errors.py) replaces RuntimeError in PruningService.start() - InflectionMonitorLifecycleError (meta/chief_of_staff/monitor.py) replaces RuntimeError in OrgInflectionMonitor.start() - ProviderLifecycleConflictError (providers/errors.py) replaces RuntimeError in ProviderHealthProber.start() Source code: - backup.py controller: idempotency_key str -> NotBlankStr; cached BackupManifest.model_validate wrapped in try/except + BACKUP_FAILED log + InternalServerException - SimulationStore.unregister: compare-and-delete via expected= snapshot so a fresh retry winning the slot is preserved instead of being silently deleted by the loser's rollback - event_stream/stream.py: drop dead post-lock if-not-queues branch (already handled inside lock) - ngrok_adapter: materialise public_url BEFORE assigning _tunnel; cleanup-disconnect on conversion failure - replay_protection: sample clock once per check() and pass the same now to freshness + nonce eviction (closes boundary replay gap) - backup_subscriber: try/except around scheduler.start() with structured log + re-raise Tests: - test_scheduler_lifecycle: spawn-count assertion + Event-driven hung loop with finally cleanup - test_trends_currency: derive currencies from DEFAULT_CURRENCY (validator-allowlist forces a real ISO code for the alternate); add out-of-window mixed-currency regression test - test_health_prober_lifecycle: Event-driven hung loop with finally release+await - All four lifecycle tests: pytest.raises now expects the new domain-error class instead of RuntimeError Frontend: - clients.ts startSimulation 409 path: deep-compare existing config to incoming config (field-by-field, type-safe); only inherit the in-flight run when configs match, else rethrow
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/providers/health_prober.py (1)
158-183: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftAdd a
Clockseam before the new timeout behavior hardens around wall time.
ProviderHealthProbernow owns more timing-sensitive lifecycle logic, but it still bakes in wall-clock reads/deadlines. That makes these timeout paths harder to drive deterministically and forces the tests onto real time. Please threadclock: Clock | None = Nonethrough the constructor and route the interval/drain timing through it so the lifecycle tests can useFakeClock.As per coding guidelines, "Time injection (Clock seam): classes that read time or sleep cooperatively take
clock: Clock | None = Nonedefaulting toSystemClock()(synthorg.core.clock); tests injectFakeClock."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/providers/health_prober.py` around lines 158 - 183, Add a Clock seam to ProviderHealthProber by adding a new constructor parameter clock: Clock | None = None (defaulting to SystemClock()) in __init__, import Clock and SystemClock from synthorg.core.clock, assign self._clock = clock or SystemClock(), and replace any direct wall-clock reads/sleeps or uses of _interval/_stop_drain_timeout_seconds in the class (e.g., any await asyncio.sleep(self._interval) or time-based deadline calculations) to use the injected clock for sleeping and deadline computation so tests can inject FakeClock; keep existing validation of interval_seconds and current attributes (_interval, _stop_drain_timeout_seconds, _lifecycle_lock, _stop_event, _task, _stop_failed) otherwise unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/api/controllers/backup.py`:
- Around line 143-145: The code is incorrectly calling NotBlankStr(...) at
runtime which raises a TypeError; update the
app_state.idempotency_service.run_idempotent call to pass plain validated
strings (use scope="backup" and key=idempotency_key) instead of constructing
NotBlankStr("backup") or NotBlankStr(idempotency_key), relying on the type
annotations (NotBlankStr) in the service signature for validation rather than
runtime wrapping.
In `@src/synthorg/hr/pruning/service.py`:
- Around line 171-177: In stop(), don't call self._task.cancel() immediately;
instead set self._stop_event and self._wake_event, copy task = self._task,
release the _lifecycle_lock, then await the task to drain cooperatively (e.g.
await asyncio.wait_for(task, timeout=self._graceful_shutdown_timeout)) so
run_pruning_cycle() can finish via the stop/wake path; only on
asyncio.TimeoutError (hard deadline) call task.cancel() and await it to ensure
proper cancellation. Use _lifecycle_lock, _stop_event, _wake_event, _task,
stop(), and run_pruning_cycle() as the referenced symbols.
In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 101-107: The start() method currently logs TUNNEL_ERROR at WARNING
when self._public_url is already set (the idempotent no-op path); change this to
a non-error event/severity (e.g., use logger.info or logger.debug and a new
message/event like "tunnel_already_active" instead of TUNNEL_ERROR) so repeated
legitimate calls to start() (which check self._public_url and self._port) do not
emit tunnel-failure alerts; update the logging invocation in start() that
references self._public_url, self._port, and TUNNEL_ERROR accordingly and keep
the same contextual fields (phase="start", reason="already_active", port=...)
but with the lowered severity/event name.
In `@src/synthorg/meta/chief_of_staff/monitor.py`:
- Around line 135-163: The created helper task pattern leaks when wait_for times
out; replace creating drain_task and shielding it with directly awaiting the
_drain() coroutine under asyncio.wait_for so cancellation propagates and no
orphaned tasks remain: remove asyncio.create_task(_drain()) and
asyncio.shield(drain_task), call await asyncio.wait_for(_drain(),
timeout=self._stop_drain_timeout_seconds) instead, keeping the existing
try/except handling (including handling asyncio.CancelledError,
MemoryError/RecursionError re-raises, and logging via
COS_INFLECTION_CHECK_FAILED with safe_error_description) and preserving the
TimeoutError branch that sets self._stop_failed and logs the error.
---
Outside diff comments:
In `@src/synthorg/providers/health_prober.py`:
- Around line 158-183: Add a Clock seam to ProviderHealthProber by adding a new
constructor parameter clock: Clock | None = None (defaulting to SystemClock())
in __init__, import Clock and SystemClock from synthorg.core.clock, assign
self._clock = clock or SystemClock(), and replace any direct wall-clock
reads/sleeps or uses of _interval/_stop_drain_timeout_seconds in the class
(e.g., any await asyncio.sleep(self._interval) or time-based deadline
calculations) to use the injected clock for sleeping and deadline computation so
tests can inject FakeClock; keep existing validation of interval_seconds and
current attributes (_interval, _stop_drain_timeout_seconds, _lifecycle_lock,
_stop_event, _task, _stop_failed) otherwise unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: bd57e3ce-5e15-481e-a5af-0d70a13fb224
📒 Files selected for processing (20)
src/synthorg/api/controllers/backup.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/backup/errors.pysrc/synthorg/backup/scheduler.pysrc/synthorg/client/store.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/hr/errors.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/providers/errors.pysrc/synthorg/providers/health_prober.pysrc/synthorg/settings/subscribers/backup_subscriber.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/budget/test_trends_currency.pytests/unit/hr/pruning/test_service_lifecycle.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/providers/test_health_prober_lifecycle.pyweb/src/api/endpoints/clients.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: Dashboard Test
- GitHub Check: Type Check
- GitHub Check: Test (Python 3.14)
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Lighthouse Site
- GitHub Check: Build Preview
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (10)
**/*.{py,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend:DEFAULT_CURRENCYfromsynthorg.budget.currencyor the runtimebudget.currencysetting. Frontend:DEFAULT_CURRENCYfrom@/utils/currenciesoruseSettingsStore().currency.
Field naming: no_usdsuffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare.toLocaleString()/.toLocaleDateString()/.toLocaleTimeString(). Use helpers in@/utils/formatwhich readgetLocale()from@/utils/locale. The backend has no operator-tunable locale setting; backendIntlformatting uses the system locale plus the browser timezone. Thecompany.name_localeslist controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render viaIntlwithout passingtimeZone(browser tz wins).
Date / number format: always viaIntl; no hand-rolled templates.
Files:
web/src/api/endpoints/clients.tssrc/synthorg/backup/errors.pysrc/synthorg/providers/health_prober.pysrc/synthorg/providers/errors.pytests/unit/budget/test_trends_currency.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/client/store.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/hr/errors.pysrc/synthorg/api/controllers/backup.pytests/unit/providers/test_health_prober_lifecycle.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/api/controllers/simulations.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.pysrc/synthorg/backup/scheduler.pysrc/synthorg/hr/pruning/service.py
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Units: metric only. Spelling: International / British English UI default (
colour,behaviour,organise,centred,analyse,cancelled); document deviations. Spelling here is an editorial / UI-copy decision only; it does not affect runtime locale-sensitive formatting. Numbers, dates, times, currencies, and units still resolve via the user / company / browser / system fallback through@/utils/format,@/utils/locale,DEFAULT_CURRENCY, anduseSettingsStore().currency, with no contradiction to the locale-neutral defaults above.
Files:
web/src/api/endpoints/clients.ts
web/src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Reuse components from
web/src/components/ui/before creating new ones. Never hardcode hex colors, font-family, pixel spacing, Motion transitions, or BCP 47 locale strings; use design tokens,@/lib/motionpresets, and the helpers in@/utils/format. Enforced byscripts/check_web_design_system.py(PostToolUse hook on everyweb/src/edit).
web/src/**/*.{ts,tsx}: Always usecreateLoggerfrom@/lib/logger; never use bareconsole.warn,console.error, orconsole.debugin application code
Logger variable name must always belog(e.g.,const log = createLogger('module-name'))
Use logger levels:log.debug()(DEV-only, stripped in production),log.warn(),log.error()
Pass dynamic/untrusted values as separate arguments to logger (not interpolated into the message string) so they go throughsanitizeArg
Wrap attacker-controlled fields inside structured objects withsanitizeForLog()before embedding in logs
Callers MUST NOT wrap store mutation calls intry/catch; the store owns the error UX
Display counts in paginated lists come fromdata.length; the wire envelope no longer carriestotal
Any new caller of health/readiness endpoints must handle the 503 path explicitly
UsesanitizeWsString()andsanitizeWsEnum()fromweb/src/utils/ws-sanitize.ts(pure helpers, re-exported from@/stores/notifications) for WebSocket payload sanitization
Any new WS payload handler that ingests untrusted strings MUST route throughsanitizeWsString()orsanitizeWsEnum(); raw(sanitizeWsString(x, n) ?? '') as EnumTypecasts are forbidden
NEVER hardcode Motion transition durations; use@/lib/motionpresets instead
NEVER hardcode BCP 47 locale literals ('en-US'); use helpers from@/utils/formatinstead
NEVER hardcode currency symbols / codes; useDEFAULT_CURRENCYfrom@/utils/currenciesinstead
Use<StatusBadge>instead of inline status dots (defaults torole="img"with aria-label;decorativefor adjacent-labeled,announcefor liv...
Files:
web/src/api/endpoints/clients.ts
web/src/api/**/*.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
getLiveness()is always 200 while the process is alive;getReadiness()is 200 healthy / 503 unavailable (binary'ok' | 'unavailable'outcome, no tri-state)
Files:
web/src/api/endpoints/clients.ts
web/src/**/*.{ts,tsx,css}
📄 CodeRabbit inference engine (web/CLAUDE.md)
NEVER hardcode hex colors, font-family declarations, or pixel spacing; use design tokens instead
Files:
web/src/api/endpoints/clients.ts
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Monetary models: every cost-bearing Pydantic model carriescurrency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raisesMixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/is the only place that may importaiosqlite,sqlite3,psycopg, orpsycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced byscripts/check_persistence_boundary.py(pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g.ArtifactService,WorkflowService,MemoryService,CustomRulesService,UserService,ProjectService,SsrfViolationService,SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced byscripts/check_persistence_boundary.py).
Per-line opt-out:# lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved throughSettingsService/ConfigResolver. First-cold-read emits one INFOsettings.value.resolvedcarryingsource+yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered withread_only_post_init=Truefor /settings discoverability;SettingsService.set()raisesSettingReadOnlyError).
Directos.environ.get(...)reads in application code outside startup are forbidden. New settings register insrc/synthorg/settings/definitions/<namespace>.pyand are consumed viaConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...
Files:
src/synthorg/backup/errors.pysrc/synthorg/providers/health_prober.pysrc/synthorg/providers/errors.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/client/store.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/hr/errors.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/backup/scheduler.pysrc/synthorg/hr/pruning/service.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/backup/errors.pysrc/synthorg/providers/health_prober.pysrc/synthorg/providers/errors.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/client/store.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/hr/errors.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/backup/scheduler.pysrc/synthorg/hr/pruning/service.py
src/synthorg/!(api)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Domain error class naming: error classes in domain modules use
<Domain><Condition>Errorand inherit fromDomainError(or a domain-scoped intermediate that itself inheritsDomainError). BareException/RuntimeErrorat domain boundaries is forbidden; domain errors flow throughEXCEPTION_HANDLERSfor centralised RFC 9457 routing.
Files:
src/synthorg/backup/errors.pysrc/synthorg/providers/health_prober.pysrc/synthorg/providers/errors.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/client/store.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/hr/errors.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/backup/scheduler.pysrc/synthorg/hr/pruning/service.py
src/synthorg/!(observability)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/!(observability)/**/*.py: Every business-logic module hasfrom synthorg.observability import get_loggerthenlogger = get_logger(__name__). Variable name is alwayslogger. Carve-outs (e.g.meta/mcp/handlers/common_logging.pykeying at a fixed string) are documented in the module docstring.
Never useimport logging/logging.getLogger()/print()in application code (exception:observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.pyfor handler-construction / bootstrap code).
Files:
src/synthorg/backup/errors.pysrc/synthorg/providers/health_prober.pysrc/synthorg/providers/errors.pysrc/synthorg/settings/subscribers/backup_subscriber.pysrc/synthorg/client/store.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/hr/errors.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/communication/event_stream/stream.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/api/controllers/simulations.pysrc/synthorg/backup/scheduler.pysrc/synthorg/hr/pruning/service.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow.
Mock-spec gate (#1604): everyMock()/AsyncMock()/MagicMock()intests/MUST declare the interface it stands for viaspec=ConcreteClass(Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen inscripts/mock_spec_baseline.txt. Regenerate the baseline only viauv run python scripts/check_mock_spec.py --updateand commit the diff so the change is reviewable. Withoutspec=the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from#1604).
Shared mocks:tests/conftest.pyexposesmock_dispatcher(anAsyncMock(spec=NotificationDispatcher)coveringregister/start/aclose/dispatch); use it instead of building the spec inline.
Time-driven tests: importFakeClockfromtests._shared.fake_clock(NOT from any rollout-subsystem path) and inject it into the class under test.FakeClock.sleepadvances virtual time AND yields once viaasyncio.sleep(0)so cancellation on awaiting tasks propagates the same way it does underSystemClock. For tests that need to drive cooperative tasks waiting on the loop,await clock.advance_async(seconds). FakeClock-first: patchtime.monotonic()/asyncio.sleep()globals only for legacy code paths that don't have aClockseam (see## Code Conventionsfor the legacy-Callable list); when the class under test accepts aclock=parameter, always injectFakeClockrather than monkey-patching globals.
Async:asyncio_mode = "auto"; no manual@pytest.mark.asyncioneeded.
Timeout: 30 seconds per test (global inpyproject.toml; do not add per-filepytest.mark.timeout(30)markers; non-default overrides liketimeout(60)are allowed).
Parametrize: Prefer@pytest.mark.parametrizefor...
Files:
tests/unit/budget/test_trends_currency.pytests/unit/providers/test_health_prober_lifecycle.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/budget/test_trends_currency.pytests/unit/providers/test_health_prober_lifecycle.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.pytests/unit/backup/test_scheduler_lifecycle.pytests/unit/hr/pruning/test_service_lifecycle.py
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceedapi.ws_frame_timeout_seconds(default 30s) without sending a frame. Wrapssocket.receive_text()inasyncio.wait_for(...)so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a_SlidingWindowRateLimiter(api.ws_revalidation_window_secondsdefault 60s,api.ws_revalidation_max_failuresdefault 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.
Files:
src/synthorg/api/controllers/backup.pysrc/synthorg/api/controllers/simulations.py
🔇 Additional comments (28)
tests/unit/budget/test_trends_currency.py (2)
26-33: Hardcoded ISO currency literals are still present in alternate-currency selection.
_ALTERNATE_CURRENCYstill uses"USD"/"EUR"literals; this was previously flagged and should be switched to project currency constants/types in tests as well.As per coding guidelines, Currency: never hardcode ISO 4217 codes or symbols.
94-123: Nice coverage on windowed mixed-currency behavior and forecast edge cases.The out-of-window regression plus mixed-currency/empty-input assertions for both
bucket_cost_recordsandproject_daily_spendare aligned with the new invariants.Also applies to: 136-150
src/synthorg/client/store.py (2)
177-192: LGTM!The
register_if_absentimplementation correctly performs atomic check-and-insert under the lock, preventing TOCTOU races where two concurrent callers could both observe absence and proceed.
194-224: LGTM!The compare-and-delete semantics using identity comparison (
current is not expected) are correct for this use case. SinceSimulationRecordis frozen, the caller's original claim object remains stable, and identity comparison ensures that a fast retry that won the slot between failure and rollback is preserved rather than silently deleted.src/synthorg/api/controllers/simulations.py (4)
115-138: LGTM!The callback attachment order is correct: registering the exception logger before the discard callback ensures that a fast-completing failure surfaces its error before being removed from the background task set.
141-178: LGTM!The rollback helper correctly handles the task lifecycle: it cancels, awaits drain (catching
CancelledError), logs drain failures, and then unregisters with compare-and-delete semantics by passingexpected=record. TheMemoryError, RecursionErrorpropagation follows project convention for system-fatal exceptions.
349-363: LGTM!The atomic claim flow correctly uses
register_if_absentto prevent duplicate runners. The comment clearly explains the JetStream redelivery / HTTP 5xx retry scenario that motivates the 409-on-duplicate guard.
414-452: LGTM!The post-claim setup is properly wrapped with rollback handling. The implementation:
- Tracks whether the task was spawned before failure
- Logs the failure cause with structured context before rollback
- Propagates system-fatal exceptions (
MemoryError,RecursionError) without rollback- Passes the original
recordtounregisterfor compare-and-delete safetyThis addresses all the concerns from past reviews.
web/src/api/endpoints/clients.ts (2)
264-275: LGTM!The
configsEqualhelper correctly performs field-by-field comparison of all fiveSimulationConfigfields. This is more robust thanJSON.stringifywhich is sensitive to key order, and the explicit field list serves as documentation of the comparison contract.
280-302: LGTM!The 409 handling correctly implements idempotent retry semantics:
- Detects Axios errors with status 409
- Fetches the existing simulation to compare configs
- Returns the existing result only when configs match exactly
- Rethrows the original error when configs differ, ensuring a true collision surfaces to the caller
This addresses the past review concern about silently inheriting an unrelated in-flight runner.
src/synthorg/communication/event_stream/stream.py (5)
1-30: LGTM!Imports are well-organized, event constants are properly imported from the observability module, and default values are sensibly chosen.
69-107: LGTM!The constructor correctly implements fail-fast validation for all parameters and follows the clock injection pattern per guidelines. The docstring is comprehensive and accurately describes the behavior of
dedup_ttl_seconds=0(disables time-based eviction, size-bound only).
139-155: LGTM!The dedup-window cleanup on last unsubscribe correctly prevents memory leaks for sessions with no active subscribers. The
pop(session_id, None)is safe and the rationale is well-documented.
157-209: LGTM!The publish flow correctly addresses the past review concerns: subscribers are checked before recording dedup state (preventing false suppression on retry after reconnect), and the dedup window is cleaned up for sessions with no subscribers. The lock scope is minimal—delivery happens outside the lock to avoid serializing publishers.
210-253: LGTM!The helper methods correctly implement the dedup logic:
_is_duplicate_lockedproperly handlesTTL=0by skipping time-based eviction (per the documented behavior)._record_published_lockedbounds entries with FIFO eviction viapopitem(last=False).- Both methods are correctly marked as requiring the caller to hold
self._lock.src/synthorg/backup/errors.py (1)
40-52: LGTM!The
BackupUnrestartableErrorfollows the established lifecycle error pattern (matchingProviderLifecycleConflictError,InflectionMonitorLifecycleError,PruningUnrestartableError), correctly inherits fromConflictErrorfor HTTP 409 semantics, and includes a clear docstring explaining the lifecycle scenario.src/synthorg/backup/scheduler.py (4)
37-42: LGTM!The lifecycle primitives are correctly initialized following the canonical pattern from
docs/reference/lifecycle-sync.md: dedicated lock, stop event, drain timeout, and unrestartable flag.
49-87: LGTM!The
start()method correctly implements the canonical async lifecycle pattern:
- Protected by
_lifecycle_lockfor concurrent safety- Refuses to start when
_stop_failedis set (raising domain-specificBackupUnrestartableError)- Idempotent via
is_runningcheck- Clears events before spawning the task
- Attaches
log_task_exceptionscallback to surface unexpected loop deaths
89-146: LGTM!The
stop()method correctly implements coordinated shutdown:
- Protected by
_lifecycle_lock- Signals stop via both events and task cancellation
- Shielded drain with hard timeout that marks the scheduler unrestartable on failure
- Events are recreated while holding the lock to prevent racing
start()calls from binding to stale events- The lock instance itself is preserved (addressing the past review concern)
170-203: LGTM!The
_run_loop()correctly honors cooperative shutdown:
- Loops while
_stop_eventis not set- Re-checks
_stop_eventafter the sleep/wait before firing a backup- Uses structured logging with
safe_error_descriptionfor backup failuressrc/synthorg/settings/subscribers/backup_subscriber.py (1)
112-134: LGTM!The error handling for
scheduler.start()correctly follows the coding guidelines:
MemoryError/RecursionErrorare re-raised immediately- Other exceptions are logged with structured fields (
error_type,error=safe_error_description(exc)) plus context (subscriber, namespace, key, note) before re-raising- This addresses the past review comment about surfacing startup failures with context
tests/unit/backup/test_scheduler_lifecycle.py (3)
22-25: LGTM!The
_make_schedulerhelper correctly uses spec-bound mocks per the mock-spec gate requirement, addressing the past review comment.
31-61: LGTM!The test correctly verifies the canonical lifecycle contract by:
- Counting actual
create_taskcalls via a patched wrapper- Asserting exactly one task is spawned across three concurrent
start()calls- Properly cleaning up with a
finallyblockThis addresses the past review comment about explicitly asserting single task creation.
72-108: LGTM!The test correctly verifies the unrestartable-after-drain-timeout behavior:
- Uses
asyncio.Event().wait()for indefinite blocking (not wall-clock sleep)- Uses
enteredandreleaseevents for deterministic test control- Properly drains the orphan task in
finallyto prevent leakage- Asserts
_stop_failedis set and subsequentstart()raisesBackupUnrestartableErrorThis addresses the past review comments about event-gated cancel paths.
src/synthorg/providers/health_prober.py (1)
194-279: Nice lifecycle race fix.Keeping
_stop_eventrecreation under_lifecycle_lockcloses the earlier start/stop gap cleanly and matches the canonical unrestartable-on-timeout pattern.tests/unit/providers/test_health_prober_lifecycle.py (2)
25-42: ⚡ Quick winKeep the lifecycle helper hermetic; it currently opens real HTTP probes.
_make_prober()returns an eligible localhost provider, andstart()immediately schedules_probe_all()->_execute_probe(). That means these unit tests depend on whatever is listening on127.0.0.1:11434, or on a real connect failure when nothing is there. For lifecycle-only coverage, return no providers here or stub_execute_probe().[sraise_major_issue]
Suggested patch
-from synthorg.config.schema import ProviderConfig ... def _make_prober() -> ProviderHealthProber: - config = MagicMock(spec=ProviderConfig) - config.base_url = "http://localhost:11434" - config.litellm_provider = "ollama" - config.auth_type = "none" - config.api_key = None resolver = MagicMock(spec=ConfigResolver) resolver.get_provider_configs = AsyncMock( spec=ConfigResolver.get_provider_configs, - return_value={"test-local": config}, + return_value={}, ) resolver.get_int = AsyncMock(spec=ConfigResolver.get_int, return_value=11434)As per coding guidelines, "Flaky tests: NEVER skip, dismiss, or ignore flaky tests; always fix them fully and fundamentally."
74-112: Good fix on the timeout-path cleanup.Gating the stuck loop on
releaseand awaiting the saved task removes the pending-task leak from the earlier version.src/synthorg/integrations/webhooks/replay_protection.py (1)
124-135: TheWEBHOOK_REPLAY_DETECTEDevent does not require_ALLOWED_PROPERTIESregistration.
WEBHOOK_REPLAY_DETECTEDis an observability event (fromsynthorg.observability.events.integrations), not a telemetry event. Observability logs use the structlog processor pipeline (sanitize_sensitive_fields,scrub_event_fields) for field sanitization; they do not go through thetelemetry.privacy.PrivacyScrubberor its_ALLOWED_PROPERTIESallowlist. The privacy validation framework insrc/synthorg/telemetry/privacy.pyapplies only toTelemetryEventobjects sent through the telemetry reporting pipeline, not to structured observability logs. The new fields (skew,nonce_length,max_nonce_chars,nonce_fingerprint) are safe to log without registry updates.
- backup.py: drop runtime NotBlankStr() wrapping in run_idempotent (Annotated alias is not a constructor; type validation runs through Pydantic at the boundary, not on a no-op call) - pruning/service.py stop(): cooperative drain (await wait_for(shield(task), timeout)) before escalating to task.cancel(); only cancel + mark unrestartable on TimeoutError - ngrok_adapter idempotent-start: log TUNNEL_ALREADY_ACTIVE at INFO instead of TUNNEL_ERROR at WARNING (legitimate reconnects are not failures and were triggering tunnel-failure alerts) - monitor.py stop(): replace shield+create_task drain helper with direct wait_for(shield(task)); add explicit cancel + unrestartable mark on TimeoutError; cooperative cancel handling for the no-deadline path - health_prober: add Clock seam (clock=None constructor param defaulting to SystemClock); convert _execute_probe from staticmethod to instance method to use the injected clock for monotonic timing - replay_protection _evict_locked: walk every entry instead of early-exiting on the first non-expired one (round 9's sample-outside-lock change broke the insertion-order = timestamp-order assumption under contention) - test_monitor_lifecycle.test_unrestartable_after_drain_timeout: suppress CancelledError on the post-cleanup task await (Python 3.11+ retains the cancelled state even when the coroutine catches and returns)
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/api/controllers/backup.py`:
- Line 168: Replace the parenthesized exception tuple usage "except (ValueError,
TypeError) as exc" with the project's mandated PEP 758 comma-separated form
"except ValueError, TypeError as exc" in the backup exception handler; locate
the handler in src/synthorg/api/controllers/backup.py (the except block shown in
the diff) and update the syntax accordingly so it no longer uses parentheses.
In `@src/synthorg/hr/pruning/service.py`:
- Around line 183-225: The CancelledError handler in stop() currently treats any
CancelledError as the scheduler having drained, but that also catches external
cancellation of the stop() coroutine; change the handler to distinguish the two
by checking the run task's state (self._task or local task variable): if task is
not done (i.e., still running) re-raise the CancelledError so the external
cancellation propagates, otherwise treat it as successful drain; ensure you do
not clear self._task or recreate events when re-raising. Target the cancel
handling around the await asyncio.wait_for(asyncio.shield(task), ...) in stop(),
and use task.done()/task.cancelled() (or equivalent) on the task created by
_run_loop() to decide whether to re-raise.
In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 177-183: The adapter currently emits the global TUNNEL_STARTED
event from ngrok_adapter.start() (the logger.info call that logs TUNNEL_STARTED
and returns self._public_url), which duplicates the same event emitted by
src/synthorg/api/controllers/tunnel.py after await tunnel.start(); remove or
change that logger.info to avoid double-counting: either delete the
TUNNEL_STARTED emit from ngrok_adapter.start(), or replace it with a
provider-scoped event name (e.g., NGROK_TUNNEL_STARTED) or a lower-severity log
(debug) so only the controller emits the global TUNNEL_STARTED while the adapter
keeps internal/provider-level logging.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: aa4acf3e-07ed-4d18-a307-e5f39aec797d
📒 Files selected for processing (8)
src/synthorg/api/controllers/backup.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/integrations/webhooks/replay_protection.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/observability/events/integrations.pysrc/synthorg/providers/health_prober.pytests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Deploy Preview
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Lighthouse Site
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Dashboard Test
- GitHub Check: Test (Python 3.14)
- GitHub Check: Type Check
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend:DEFAULT_CURRENCYfromsynthorg.budget.currencyor the runtimebudget.currencysetting. Frontend:DEFAULT_CURRENCYfrom@/utils/currenciesoruseSettingsStore().currency.
Field naming: no_usdsuffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare.toLocaleString()/.toLocaleDateString()/.toLocaleTimeString(). Use helpers in@/utils/formatwhich readgetLocale()from@/utils/locale. The backend has no operator-tunable locale setting; backendIntlformatting uses the system locale plus the browser timezone. Thecompany.name_localeslist controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render viaIntlwithout passingtimeZone(browser tz wins).
Date / number format: always viaIntl; no hand-rolled templates.
Files:
tests/unit/meta/chief_of_staff/test_monitor_lifecycle.pysrc/synthorg/providers/health_prober.pysrc/synthorg/observability/events/integrations.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/integrations/webhooks/replay_protection.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow.
Mock-spec gate (#1604): everyMock()/AsyncMock()/MagicMock()intests/MUST declare the interface it stands for viaspec=ConcreteClass(Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen inscripts/mock_spec_baseline.txt. Regenerate the baseline only viauv run python scripts/check_mock_spec.py --updateand commit the diff so the change is reviewable. Withoutspec=the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test (the original "mock drift" finding from#1604).
Shared mocks:tests/conftest.pyexposesmock_dispatcher(anAsyncMock(spec=NotificationDispatcher)coveringregister/start/aclose/dispatch); use it instead of building the spec inline.
Time-driven tests: importFakeClockfromtests._shared.fake_clock(NOT from any rollout-subsystem path) and inject it into the class under test.FakeClock.sleepadvances virtual time AND yields once viaasyncio.sleep(0)so cancellation on awaiting tasks propagates the same way it does underSystemClock. For tests that need to drive cooperative tasks waiting on the loop,await clock.advance_async(seconds). FakeClock-first: patchtime.monotonic()/asyncio.sleep()globals only for legacy code paths that don't have aClockseam (see## Code Conventionsfor the legacy-Callable list); when the class under test accepts aclock=parameter, always injectFakeClockrather than monkey-patching globals.
Async:asyncio_mode = "auto"; no manual@pytest.mark.asyncioneeded.
Timeout: 30 seconds per test (global inpyproject.toml; do not add per-filepytest.mark.timeout(30)markers; non-default overrides liketimeout(60)are allowed).
Parametrize: Prefer@pytest.mark.parametrizefor...
Files:
tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Monetary models: every cost-bearing Pydantic model carriescurrency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raisesMixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/is the only place that may importaiosqlite,sqlite3,psycopg, orpsycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced byscripts/check_persistence_boundary.py(pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g.ArtifactService,WorkflowService,MemoryService,CustomRulesService,UserService,ProjectService,SsrfViolationService,SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced byscripts/check_persistence_boundary.py).
Per-line opt-out:# lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved throughSettingsService/ConfigResolver. First-cold-read emits one INFOsettings.value.resolvedcarryingsource+yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered withread_only_post_init=Truefor /settings discoverability;SettingsService.set()raisesSettingReadOnlyError).
Directos.environ.get(...)reads in application code outside startup are forbidden. New settings register insrc/synthorg/settings/definitions/<namespace>.pyand are consumed viaConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...
Files:
src/synthorg/providers/health_prober.pysrc/synthorg/observability/events/integrations.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/integrations/webhooks/replay_protection.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/providers/health_prober.pysrc/synthorg/observability/events/integrations.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/integrations/webhooks/replay_protection.py
src/synthorg/!(api)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Domain error class naming: error classes in domain modules use
<Domain><Condition>Errorand inherit fromDomainError(or a domain-scoped intermediate that itself inheritsDomainError). BareException/RuntimeErrorat domain boundaries is forbidden; domain errors flow throughEXCEPTION_HANDLERSfor centralised RFC 9457 routing.
Files:
src/synthorg/providers/health_prober.pysrc/synthorg/observability/events/integrations.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/integrations/webhooks/replay_protection.py
src/synthorg/!(observability)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/!(observability)/**/*.py: Every business-logic module hasfrom synthorg.observability import get_loggerthenlogger = get_logger(__name__). Variable name is alwayslogger. Carve-outs (e.g.meta/mcp/handlers/common_logging.pykeying at a fixed string) are documented in the module docstring.
Never useimport logging/logging.getLogger()/print()in application code (exception:observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.pyfor handler-construction / bootstrap code).
Files:
src/synthorg/providers/health_prober.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/api/controllers/backup.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/meta/chief_of_staff/monitor.pysrc/synthorg/integrations/webhooks/replay_protection.py
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/api/**/*.py: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceedapi.ws_frame_timeout_seconds(default 30s) without sending a frame. Wrapssocket.receive_text()inasyncio.wait_for(...)so a connected-but-silent peer cannot indefinitely hold a slot.
WebSocket revalidation sliding window: persistence-backend failures during the periodic revalidation are tracked via a_SlidingWindowRateLimiter(api.ws_revalidation_window_secondsdefault 60s,api.ws_revalidation_max_failuresdefault 5) instead of a reset-on-success streak counter. A flaky persistence layer that returns one good response between every failure cluster cannot indefinitely keep stale-auth connections alive; once the window saturates, the socket closes with code 4011.
Files:
src/synthorg/api/controllers/backup.py
🔇 Additional comments (26)
src/synthorg/integrations/webhooks/replay_protection.py (6)
230-248: Previous review concern is addressed.The eviction now correctly walks all entries via list comprehension (lines 246-248) instead of the previous early-exit approach. This fixes the issue where sampling
nowbefore acquiring the lock could leave expired entries behind when insertion order diverges from timestamp order under contention. The docstring thoroughly documents this design decision.
5-16: LGTM!The module docstring now correctly explains the race condition without hardcoded line numbers that would drift with edits. The
threadingimport properly supports the new concurrency controls.
92-92: LGTM!The
threading.Lockinstance correctly provides thread-safe access to the shared_seencache for concurrent webhook handlers.
110-137: LGTM!The
_check_freshness_athelper enablescheck()to pass a single clock sample through both freshness validation and nonce eviction, closing the boundary replay gap identified in past review. The publiccheck_freshness()method correctly delegates while maintaining backward compatibility for callers using durable dedup.
171-179: LGTM!Sampling
nowexactly once percheck()call and passing the same snapshot to both freshness and nonce-eviction decisions correctly eliminates the boundary replay gap from the earlier implementation.
181-228: LGTM!Thread safety is correctly implemented:
- The lock guards all accesses to
self._seen(eviction, duplicate check, insertion, overflow handling)- The
duplicatelocal variable captures state while holding the lock; reading it after release is safe- Computing the fingerprint outside the lock is correct since
_fingerprint_nonceis a pure function with no shared stateThe oversized nonce rejection at lines 199-206 aligns with the same
MAX_NONCE_CHARSguard applied in the webhook controller before durable dedup.src/synthorg/providers/health_prober.py (5)
17-18: LGTM!Imports correctly bring in the Clock seam for testability,
safe_error_descriptionfor secure error logging, andProviderLifecycleConflictErroras the domain-scoped error for lifecycle conflicts.Also applies to: 32-32
146-189: LGTM!The lifecycle infrastructure is correctly initialized:
- Clock seam follows the documented pattern with
SystemClock()default._lifecycle_lockis constructed eagerly and never replaced (per lifecycle-sync.md)._stop_failedand_stop_drain_timeout_secondssupport the unrestartable-on-timeout behavior.
191-222: LGTM!The
start()method correctly implements the canonical lifecycle pattern:
- Serializes concurrent calls via
_lifecycle_lock.- Refuses restart after a timed-out stop with a domain-scoped
ProviderLifecycleConflictError.- Logs at WARNING before raising (per guidelines).
- Idempotent when already running.
224-285: LGTM!The
stop()method is well-implemented:
- Holds
_lifecycle_lockthroughout to prevent racingstart()calls.- Shielded drain with bounded timeout; on timeout marks prober unrestartable and propagates the error.
- Recreates
_stop_eventinside the lock (fixing the past race condition).- Uses
safe_error_descriptionfor secure error logging._lifecycle_lockremains a stable instance throughout the service lifetime.
436-475: LGTM!The
_execute_probemethod now uses the injectedClockseam for timing, enabling tests to drive probe timing viaFakeClockinstead of relying on wall time. TheClockprotocol correctly definesmonotonic() -> float, andHealthProberproperly initializes it with aSystemClockdefault.src/synthorg/observability/events/integrations.py (1)
113-113: Good addition of a non-error tunnel lifecycle event.This gives the adapter a dedicated event for the idempotent already-active path instead of overloading
TUNNEL_ERROR.src/synthorg/api/controllers/backup.py (2)
88-107: LGTM!The
idempotency_keyparameter changes are well-implemented:NotBlankStrtype validates non-whitespace input at the boundary,required=Trueenforces the header presence, andmax_length=255bounds storage requirements with a clear security rationale in the comment.
143-165: LGTM!The idempotency flow correctly passes plain strings (the literal
"backup"and the already-validated header value) torun_idempotent, avoiding the previous runtimeNotBlankStr(...)call issue. The timeout path properly discriminates an in-flight claim from a legitimateNonereturn and surfaces HTTP 409 viaConflictError.src/synthorg/meta/chief_of_staff/monitor.py (6)
34-46: LGTM!The
InflectionMonitorLifecycleErrorproperly inherits fromConflictError(which is part of theDomainErrorhierarchy) and follows the domain error naming convention. TheClassVardefault_message is well-defined.
79-85: LGTM!Lifecycle primitives are correctly constructed eagerly in
__init__, following the canonical lifecycle pattern documented indocs/reference/lifecycle-sync.md. This prevents a racingstop()from observing a half-initialized lock.
87-119: LGTM!The
start()method correctly implements the canonical lifecycle pattern: serializes on_lifecycle_lock, rejects restarts after timed-out stops with a proper domain error, and is idempotent for concurrent calls. The task naming and done-callback for exception logging are good practices.
121-182: LGTM!The
stop()method correctly implements the canonical lifecycle pattern:
- Serializes on
_lifecycle_lockto prevent concurrentstart()from recreating the task mid-stop.- Uses
shieldto allow in-flight checks to complete naturally while respecting the drain timeout.- On timeout, explicitly cancels and marks unrestartable before re-raising.
- Recreates
_stop_eventinside the lock, addressing the race condition identified in earlier reviews.The exception handling properly uses
safe_error_descriptionand follows the PEP 758except MemoryError, RecursionError:pattern.
184-210: LGTM!The
_loop()method correctly useswait_for(_stop_event.wait(), timeout=interval)instead of bareasyncio.sleep(). This allows cooperative shutdown to wake the loop immediately when_stop_eventis set, rather than waiting for the full interval to elapse.
234-251: LGTM!The
_emit_to_sinks()method properly usesasyncio.TaskGroupfor parallel fan-out as recommended by the coding guidelines. The per-sink exception handling ensures one sink's failure doesn't block emissions to other sinks.tests/unit/meta/chief_of_staff/test_monitor_lifecycle.py (4)
25-36: LGTM!The
_make_monitor()helper correctly usesAsyncMock(spec=SnapshotBuilder)and setsbuilder.build.return_valueinstead of reassigning the method, addressing the past review feedback about spec-bound mocks.
41-89: LGTM!The test properly addresses the past review feedback by using a controllable
blocking_buildcoroutine to ensure the first task remains alive while concurrentstart()calls run. Theasyncio.create_taskcounting pattern and final assertionlen(spawned) == 1definitively prove the single-spawn invariant. Cleanup in thefinallyblock ensures no task leaks.
91-104: LGTM!The test includes the positive assertion
assert monitor._task is not Noneafter the secondstart(), which addresses past review feedback. This ensures a regression wherestart()silently no-ops after a clean stop would be detected.
106-156: LGTM!The test addresses all past review feedback:
- Uses
asyncio.Event().wait()instead of wall-clock-sensitiveasyncio.sleep()for the hung loop, following the guidelines for blocking indefinitely until cancelled.- Uses
entered/releaseevents for deterministic control flow.- Wraps cleanup in
finallyso a failed assertion still releases the hung loop and drains the task.- Uses
contextlib.suppress(asyncio.CancelledError)when awaiting the cancelled task, handling the expected residual cancellation state.The final assertion correctly expects
InflectionMonitorLifecycleErrorwith a message containing "unrestartable".src/synthorg/hr/pruning/service.py (2)
143-163: Canonicalstart()serialization looks good.Using a dedicated
_lifecycle_locktogether withPruningUnrestartableErrormakes concurrentstart()calls idempotent and preserves the unrestartable contract after a timed-out stop.
763-775: Good stop check after wakeup.Re-checking
_stop_eventbeforerun_pruning_cycle()avoids launching a fresh cycle after shutdown has already been requested.
| raise ConflictError(msg) | ||
| try: | ||
| manifest = BackupManifest.model_validate(outcome.result) | ||
| except (ValueError, TypeError) as exc: |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Use PEP 758 except syntax without parentheses.
The project mandates PEP 758 comma-separated exception syntax without parentheses.
Suggested fix
- except (ValueError, TypeError) as exc:
+ except ValueError, TypeError as exc:As per coding guidelines, "This project uses Python 3.14+ with PEP 758 except syntax: 'except A, B:' (comma-separated, no parentheses) is correct and mandatory."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except (ValueError, TypeError) as exc: | |
| except ValueError, TypeError as exc: |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/api/controllers/backup.py` at line 168, Replace the
parenthesized exception tuple usage "except (ValueError, TypeError) as exc" with
the project's mandated PEP 758 comma-separated form "except ValueError,
TypeError as exc" in the backup exception handler; locate the handler in
src/synthorg/api/controllers/backup.py (the except block shown in the diff) and
update the syntax accordingly so it no longer uses parentheses.
…ly wrong) - pruning/service.py stop(): distinguish external CancelledError from drain-success. If task is still running when CancelledError is caught, the cancel was aimed at stop() (not the loop); re-raise so the external cancellation propagates instead of pretending the service drained - ngrok_adapter: emit NGROK_TUNNEL_STARTED (provider-scoped) instead of the global TUNNEL_STARTED. The tunnel controller already emits the global event after the adapter returns; emitting both inflates metrics keyed on the global event Skipped (factually wrong, logged): - backup.py:168 'use PEP 758 except ValueError, TypeError as exc' is invalid Python. CLAUDE.md explicitly says: 'as exc requires parens (except (A, B) as exc:)'. The current parenthesized form is correct.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/integrations/tunnel/ngrok_adapter.py`:
- Around line 157-163: Replace the fragile locals().get("tunnel") lookup by
explicitly initializing the tunnel variable to None before the try/cleanup block
and then use that tunnel variable (or local_tunnel) to call ngrok.disconnect; in
other words, set tunnel = None above the try, ensure any assignment to the
tunnel reference within the function updates that variable, and then call await
asyncio.to_thread(ngrok.disconnect, getattr(tunnel, "public_url", None)) inside
the cleanup so the code uses an explicit, well-scoped symbol instead of relying
on locals().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: adf51712-250d-48f6-984e-550c5f682cc8
📒 Files selected for processing (3)
src/synthorg/hr/pruning/service.pysrc/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/observability/events/integrations.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: Build Backend
- GitHub Check: CodSpeed Web benchmarks
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Lighthouse Site
- GitHub Check: Dashboard Test
- GitHub Check: Dashboard Build
- GitHub Check: Dashboard Type Check
- GitHub Check: Test (Python 3.14)
- GitHub Check: Type Check
- GitHub Check: Build Preview
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{py,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.{py,ts,tsx}: No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback.
Currency: never hardcode ISO 4217 codes or symbols. Backend:DEFAULT_CURRENCYfromsynthorg.budget.currencyor the runtimebudget.currencysetting. Frontend:DEFAULT_CURRENCYfrom@/utils/currenciesoruseSettingsStore().currency.
Field naming: no_usdsuffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency.
Locale: never hardcode BCP 47 tags or call bare.toLocaleString()/.toLocaleDateString()/.toLocaleTimeString(). Use helpers in@/utils/formatwhich readgetLocale()from@/utils/locale. The backend has no operator-tunable locale setting; backendIntlformatting uses the system locale plus the browser timezone. Thecompany.name_localeslist controls procedural-name generation only; it does not feed number / date / time formatting.
Timezone: store UTC; render viaIntlwithout passingtimeZone(browser tz wins).
Date / number format: always viaIntl; no hand-rolled templates.
Files:
src/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/observability/events/integrations.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Monetary models: every cost-bearing Pydantic model carriescurrency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raisesMixedCurrencyAggregationError, HTTP 409).
src/synthorg/persistence/is the only place that may importaiosqlite,sqlite3,psycopg, orpsycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Enforced byscripts/check_persistence_boundary.py(pre-push + CI).
Controllers and API endpoints access persistence through domain-scoped service layers (e.g.ArtifactService,WorkflowService,MemoryService,CustomRulesService,UserService,ProjectService,SsrfViolationService,SettingsService; list non-exhaustive), never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories must not log mutations themselves (enforced byscripts/check_persistence_boundary.py).
Per-line opt-out:# lint-allow: persistence-boundary -- <required justification>.
For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved throughSettingsService/ConfigResolver. First-cold-read emits one INFOsettings.value.resolvedcarryingsource+yaml_path; subsequent reads stay at DEBUG.
Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered withread_only_post_init=Truefor /settings discoverability;SettingsService.set()raisesSettingReadOnlyError).
Directos.environ.get(...)reads in application code outside startup are forbidden. New settings register insrc/synthorg/settings/definitions/<namespace>.pyand are consumed viaConfigResolver.get_*.
Type hints: all public functions, mypy strict mode.
Docstrings: Google style, required on public classes / functions (ruff D rules).
Immutability: create new objects, never muta...
Files:
src/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/observability/events/integrations.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/observability/events/integrations.py
src/synthorg/!(api)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Domain error class naming: error classes in domain modules use
<Domain><Condition>Errorand inherit fromDomainError(or a domain-scoped intermediate that itself inheritsDomainError). BareException/RuntimeErrorat domain boundaries is forbidden; domain errors flow throughEXCEPTION_HANDLERSfor centralised RFC 9457 routing.
Files:
src/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/hr/pruning/service.pysrc/synthorg/observability/events/integrations.py
src/synthorg/!(observability)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/!(observability)/**/*.py: Every business-logic module hasfrom synthorg.observability import get_loggerthenlogger = get_logger(__name__). Variable name is alwayslogger. Carve-outs (e.g.meta/mcp/handlers/common_logging.pykeying at a fixed string) are documented in the module docstring.
Never useimport logging/logging.getLogger()/print()in application code (exception:observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.pyfor handler-construction / bootstrap code).
Files:
src/synthorg/integrations/tunnel/ngrok_adapter.pysrc/synthorg/hr/pruning/service.py
🔇 Additional comments (11)
src/synthorg/observability/events/integrations.py (1)
113-120: LGTM!The new event constants follow established naming conventions and are well documented. The provider-scoped
NGROK_TUNNEL_STARTEDcorrectly complements the controller-levelTUNNEL_STARTEDwithout double-counting, as confirmed by the usage inngrok_adapter.py.src/synthorg/integrations/tunnel/ngrok_adapter.py (6)
20-25: LGTM!New event constants are properly imported and used throughout the module.
48-75: LGTM!The constructor correctly handles the auth token as a bootstrap secret (the sanctioned init-time exception), and the lifecycle lock documentation appropriately explains why no drain timeout is needed for this adapter pattern.
96-112: LGTM!Idempotent reconnect path correctly uses INFO-level logging with the non-error
TUNNEL_ALREADY_ACTIVEevent, avoiding false tunnel-failure alerts for legitimate retry scenarios.
113-125: LGTM!Per-call
PyngrokConfigavoids global state mutation and keeps auth tokens instance-local, correctly addressing the process-global token overwrite concern.
174-187: LGTM!State is assigned atomically within the lock, and the provider-scoped
NGROK_TUNNEL_STARTEDevent correctly avoids double-counting with the controller'sTUNNEL_STARTED.
189-214: LGTM!The stop method correctly implements best-effort cleanup semantics: it logs disconnect failures without propagating them and unconditionally clears state to prevent stale handles from blocking subsequent operations.
src/synthorg/hr/pruning/service.py (4)
24-24: Good call using a pruning-scoped unrestartable error.This keeps the lifecycle failure on the domain error path instead of bubbling a bare runtime exception.
98-105: The dedicated lifecycle state is wired up cleanly.Keeping
_lifecycle_lockseparate from_processing_lockand tracking_stop_failed/ drain timeout here matches the canonical async lifecycle pattern.
136-231: The async start/stop flow looks solid.The lock-guarded spawn, cooperative drain, timeout-to-unrestartable transition, and external-cancellation re-raise cover the lifecycle edge cases that were risky in this area.
763-788: The scheduler loop now cooperates with shutdown correctly.Re-checking
_stop_eventbefore each cycle and preservingCancelledErrorpropagation keeps stop semantics predictable while still allowingwake()to short-circuit the sleep.
- ngrok_adapter.start(): replace fragile locals().get('tunnel') lookup with explicit tunnel: Any = None initialised before the try block. The cleanup path now references an unconditionally-defined symbol; the previous form would have raised NameError if a tooling refactor renamed 'tunnel' without updating the locals() string. Type is Any because pyngrok ships untyped stubs.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
<!-- HIGHLIGHTS_START --> ## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - Frontend and UX polishing improves user interface responsiveness and visual consistency. - API hygiene and validation enhancements provide smoother and more reliable interactions. ### What's new - Introduced typed-boundary helpers enabling better type safety and parse_typed workflows. - Added codebase-audit skill prompt tuning for improved project auditing. ### Under the hood - Eliminated flaky tests caused by module-level state for more stable test outcomes. - Unified image tag management under CLI and Renovate for consistent dependency updates. - Added cross-PR file-overlap analysis to the review dependency pull request skill. - Updated multiple dependencies including Python, Web, CLI, and container libraries. - Improved CI tooling and lock file maintenance for better build reliability. <!-- HIGHLIGHTS_END --> :robot: I have created a release *beep* *boop* --- ## [0.7.8](v0.7.7...v0.7.8) (2026-05-03) ### Features * **api:** typed-boundary helper + codebase-audit skill prompt tuning ([#1712](#1712)) ([40ee65b](40ee65b)) * **boundary:** RFC [#1711](#1711) Phases 2 + 3 — typed boundaries via parse_typed ([#1720](#1720)) ([7b9f409](7b9f409)) ### Bug Fixes * **api:** audit cleanup B -- API hygiene & validation ([#1719](#1719)) ([3d790d9](3d790d9)) * audit cleanup C - persistence, concurrency & data integrity ([#1708](#1708)) ([#1717](#1717)) ([bcce097](bcce097)) * **test:** exterminate xdist-flaky tests with module-level state ([#1713](#1713)) ([#1721](#1721)) ([8d258dd](8d258dd)) * **web:** audit cleanup E -- frontend & UX polish ([#1710](#1710)) ([#1718](#1718)) ([3a3591a](3a3591a)) ### Refactoring * **cli:** single source of truth for DHI image tags + Renovate manager ([#1723](#1723)) ([57980a2](57980a2)) ### Documentation * audit cleanup D -- public-facing & docs sync ([#1709](#1709)) ([#1715](#1715)) ([ade03b7](ade03b7)) ### Tests * **engine:** make TestDrainTimeout deterministic + preserve subclass type in [@Ontology](https://github.com/ontology)_entity ([#1729](#1729)) ([b00fb05](b00fb05)) ### CI/CD * Update CI tool dependencies ([#1703](#1703)) ([355a9ff](355a9ff)) ### Maintenance * add cross-PR file-overlap analysis to review-dep-pr skill ([#1722](#1722)) ([3861d8a](3861d8a)) * **ci:** unify apko-version under workflow env so Renovate manages it everywhere ([#1724](#1724)) ([9c0a7fd](9c0a7fd)) * consolidate DHI image-pin custom regex managers ([#1726](#1726)) ([b8b0cba](b8b0cba)) * **deps:** update dependency chainguard-dev/melange to v0.50.4 ([#1701](#1701)) ([8cbf83a](8cbf83a)) * Lock file maintenance ([#1705](#1705)) ([414cfea](414cfea)) * Lock file maintenance ([#1727](#1727)) ([5cb1212](5cb1212)) * Update CLI dependencies ([#1702](#1702)) ([9fb57b9](9fb57b9)) * Update Container dependencies ([#1698](#1698)) ([6d24fd6](6d24fd6)) * Update dependency @eslint-react/eslint-plugin to v5 ([#1704](#1704)) ([1cb1294](1cb1294)) * Update Python dependencies ([#1699](#1699)) ([8e7af3a](8e7af3a)) * Update Python dependencies to v4.15.0 ([#1725](#1725)) ([69164c8](69164c8)) * Update Web dependencies ([#1700](#1700)) ([715300d](715300d)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Closes #1708.
Implements the persistence / concurrency / data-integrity cleanup bundle from the 2026-05-01 codebase audit (
_audit/runs/2026-05-01-225703/).Summary
Lifecycle (canonical pattern per
docs/reference/lifecycle-sync.md)providers/health_prober.py,meta/chief_of_staff/monitor.py: add_lifecycle_lock+_stop_failed+ drain timeout. Cooperative loop honours_stop_event.backup/scheduler.py,hr/pruning/service.py: convert syncstart()→ async, add canonical pattern. Callers updated toawait.integrations/tunnel/ngrok_adapter.py: add_lifecycle_lockfor uniformity (no spawned task).client/continuous.py: rename_lock→_lifecycle_lock. Documents the in-place runner variant where the lock guards the_runningflag transition only (not the loop body).Race conditions (
threading.Lockfor sync methods callable from threadpool)api/auth/ticket_store.py: lock count-and-insert increate(), plus pop-and-validate and bulk eviction.tools/mcp/cache.py: lockget/put/invalidate.integrations/webhooks/replay_protection.py: lock nonce dedup incheck().Currency aggregation invariant
budget/trends.py:_assert_single_currencyinbucket_cost_recordsandproject_daily_spend(raisesMixedCurrencyAggregationErroron mixed input).Idempotency
api/controllers/simulations.py+client/store.py: new atomicSimulationStore.register_if_absentreplaces the racy get-then-save; concurrent requests with the samesimulation_idget HTTP 409 Conflict instead of both spawning runners.communication/event_stream/stream.py: per-session sliding dedup window keyed byevent.id(60s TTL, bounded per session). Cleaned up on last unsubscribe.api/controllers/backup.py:Idempotency-Keyheader is now mandatory (HTTP 400 on missing,max_length=255).Escalation factory
communication/conflict_resolution/escalation/factory.py: refactor if/elif dispatch into registry maps (MappingProxyType) for queue store and decision processor — shape mirrorsPersistenceBackendRegistry.Frontend
web/src/api/endpoints/backup.ts:createBackup()sends a generatedIdempotency-Keyheader (or accepts a caller-supplied one).web/src/api/endpoints/clients.ts:startSimulation()falls back togetSimulation()on HTTP 409 so retries observe the in-flight runner.Documentation
docs/research/lgpl-postgres-driver-decision.md(accept-with-ADR for LGPL psycopg + psycopg-pool in the optionalpostgresextra; resolves audit findings Implement Budget and Cost domain models #61 and docs: add crash recovery, sandboxing, analytics, and testing decisions #127 false-positives in passing).docs/licensing.md: third-party LGPL note for thepostgresextra.docs/reference/lifecycle-sync.md: appended the 6 newly-compliant services + in-place runner variant.docs/design/observability.md: enumeratedEVENT_STREAM_HUB_PUBLISH_DEDUPED.docs/design/backup.md,docs/design/client-simulation.md: documented the new mandatory-key and 409 contracts.docs/openapi/openapi.json: regenerated.Test plan
uv run python -m pytest tests/ -m unit -n 8.test_start_simulation_duplicate_id_returns_409).npm --prefix web run lint+type-check+test(2966 tests) all clean.check_persistence_boundary.py,check_backend_regional_defaults.py,check_forbidden_literals.py) pass.atlas migrate validate --env sqlitepasses; schema parity confirmed (the audit-flagged Postgres-only migration was already absorbed into the SQLite squashed baseline — documented in the ADR).Pre-PR review
Reviewed by 14 of 16 spawned agents (
/pre-pr-review). Triage at_audit/pre-pr-review/triage.md. CRITICAL TOCTOU finding from security-reviewer (atomic check-and-register on simulations) addressed in the follow-up commit. PEP 758 false positives from 4 agents ignored:except A, B:(no parens, no binding) is the correct Python 3.14 syntax mandated byCLAUDE.md.