fix(test): exterminate xdist-flaky tests with module-level state (#1713)#1721
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.OpenSSF Scorecard
Scanned Files
|
WalkthroughThis change centralizes per-request locking in AppState (new lock registry, get/create, async acquire, and conditional eviction helpers) and moves background cost-record task ownership into CostTracker with tracking and drain APIs. Tests add three autouse fixtures to reset process-global state and update many tests to use instance-scoped drains and direct attribute spying on lazy logger proxies. Test tooling: pytest xdist scheduling switched to 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Review rate limit: 1/5 review remaining, refill in 43 minutes and 57 seconds. Comment |
Merging this PR will improve performance by 12.62%
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| ⚡ | test_scrub_small_clean |
110.7 µs | 98.3 µs | +12.62% |
Comparing fix/test-isolation-hunt (c631d0a) with main (355a9ff)2
Footnotes
-
21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
-
No successful run was found on
main(8cbf83a) during the generation of this report, so 355a9ff was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩
There was a problem hiding this comment.
Code Review
This pull request addresses test isolation regressions (Issue #1713) by migrating module-level state, including request locks and background task sets, into instance-bound registries within AppState and CostTracker. It introduces global autouse fixtures to reset shared state like structlog and internal caches between tests, and updates the test runner with an isolation regression gate using pytest-repeat. Feedback suggests refining the task drain logic to handle asyncio.CancelledError separately and improving the test fake's interface to avoid direct manipulation of internal connection state.
| if isinstance(outcome, BaseException): | ||
| # ``_record_cost_in_background`` already logs + swallows | ||
| # recoverable failures, so reaching this branch means | ||
| # something downstream raised without going through the | ||
| # documented logging path. Surface defensively at WARN | ||
| # so the regression is visible in test output rather | ||
| # than silently dropped by ``return_exceptions=True``. | ||
| logger.warning( | ||
| BUDGET_PENDING_RECORD_DRAIN_UNEXPECTED, | ||
| error_type=type(outcome).__name__, | ||
| error=safe_error_description(outcome), | ||
| ) |
There was a problem hiding this comment.
The drain_pending_records method logs unexpected exceptions at the WARNING level. While this is appropriate for surfacing regressions in tests, consider if asyncio.CancelledError should be handled separately to avoid noisy logs during graceful shutdowns or test cancellations, as it is a BaseException but often expected.
| fake_persistence = FakePersistenceBackend() | ||
| fake_persistence._connected = True |
There was a problem hiding this comment.
Manually setting _connected = True on the FakePersistenceBackend is a fragile pattern as it relies on internal implementation details of the fake. If the fake's connection logic changes, this test might break or pass incorrectly. Consider adding a public connect() method to the fake if it doesn't already exist.
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/run_affected_tests.py (1)
515-519:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTreat
pyproject.tomlchanges as test-affecting.Right now a config-only change skips the hook completely because everything non-
.pyis discarded here. In this PR, changingpyproject.tomlwould print “No Python files changed -- skipping unit tests,” so the new pytest addopts / xdist /pytest-repeatbehavior never gets validated before push.Suggested fix
+ if "pyproject.toml" in changed: + print("pyproject.toml changed -- running full unit suite.") + return _run_pytest(["tests/unit/"], run_all=True) + # Filter to Python files only. py_changed = [f for f in changed if f.endswith(".py")] if not py_changed: print("No Python files changed -- skipping unit tests.") return 0🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/run_affected_tests.py` around lines 515 - 519, The current filter only treats files ending with ".py" as test-affecting (py_changed = [f for f in changed if f.endswith(".py")]), which causes config-only changes like pyproject.toml to skip tests; update the logic to consider pyproject.toml as affecting tests by either including it in the py_changed collection (e.g., include f == "pyproject.toml" or f.endswith("pyproject.toml")) or add an explicit check on the changed list for "pyproject.toml" and treat that as a trigger to run tests (so the "No Python files changed -- skipping unit tests." path is not taken when pyproject.toml changed). Ensure references to the py_changed variable and the early return remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/run_affected_tests.py`:
- Around line 13-17: Replace hardcoded issue references like "#1713" in the
module docstring and runtime banner with stable wording that describes the
failure mode or links to a persistent docs page; update the explanatory text
near the environment flag SYNTHORG_SKIP_ISOLATION_GATE and any banner generation
logic (the strings in scripts/run_affected_tests.py that mention the isolation
regression gate and the two-run pytest command) to say e.g. "an isolation
regression gate runs (double-run pytest to detect module-level state leaks)" or
point to a documented URL, and remove any "(`#NNNN`)" style references in the
other occurrences noted (lines ~450-454, ~491-494) so comments only explain
what/why not origin/issue IDs.
In `@src/synthorg/api/state_services.py`:
- Around line 456-490: get_or_create_request_lock has a race where the returned
asyncio.Lock can be evicted before the caller enters "async with", so add a
lightweight refcount like _request_lock_refs (dict) guarded by
_request_locks_guard; in get_or_create_request_lock (and where you create/reuse
an entry in _request_locks) increment _request_lock_refs[request_id] before
returning the Lock, and ensure the caller decrements the refcount when it
actually acquires/releases the lock (provide a helper or require using an async
context manager wrapper that decrements on enter/exit). Update
_evict_idle_request_locks_locked to only evict an entry when both not
lock.locked() and _request_lock_refs.get(request_id, 0) == 0, and keep all
operations touching _request_locks and _request_lock_refs under
_request_locks_guard; reference functions/attrs: get_or_create_request_lock,
_evict_idle_request_locks_locked, _request_locks, _request_locks_guard, and
_MAX_REQUEST_LOCKS.
In `@tests/unit/api/controllers/test_departments_health.py`:
- Line 110: Remove the in-code issue reference "(`#1713`)" from the docstring in
the test_departments_health module (tests unit module named
test_departments_health); locate the docstring that contains "(`#1713`)" and
delete that substring so the comment/docstring no longer references an issue
number, leaving the rest of the explanatory text intact or rephrasing it to be
issue-free.
In `@tests/unit/api/controllers/test_memory_admin.py`:
- Around line 252-255: The comment in
tests/unit/api/controllers/test_memory_admin.py uses an internal shorthand
"SEC-1" instead of a clear rationale; update the comment around the assertion
that "exc_info" is not in kwargs to explicitly state that including the full
traceback would bypass safe_error_description and could leak filesystem paths or
backend metadata (sensitive environment details), so exc_info must not be
set—refer to the assertion and the safe_error_description behavior to make the
safety rationale obvious to future readers.
In `@tests/unit/api/test_state.py`:
- Around line 450-474: Add an async unit test (e.g.,
test_eviction_preserves_held_lock) that patches _ss._MAX_REQUEST_LOCKS to a
small value (3), creates a state via _make_state, obtains and acquires the
oldest lock via state.get_or_create_request_lock("oldest") using an async with
block, then fill the registry to the cap with additional keys and call
state.get_or_create_request_lock("trigger") to force eviction; assert that
"oldest" remains in state._request_locks and that the stored object is the same
instance as oldest_lock to verify _evict_idle_request_locks_locked preserves
held locks. Ensure you import synthorg.api.state_services as _ss and use
state._request_locks for the assertions.
In `@tests/unit/tools/sandbox/test_docker_config.py`:
- Around line 104-110: Edit the inline comment in
tests/unit/tools/sandbox/test_docker_config.py that explains why direct setattr
+ try/finally delattr is used for module.logger (a BoundLoggerLazyProxy) to
remove the repository tracker reference "(issue `#1713`)" while keeping the
explanatory rationale about __getattr__, monkeypatch.setattr restoring __dict__
and breaking capture_logs; ensure the comment still mentions module.logger,
BoundLoggerLazyProxy, __getattr__, monkeypatch.setattr, and capture_logs so the
reasoning remains clear but contains no issue/PR ID.
---
Outside diff comments:
In `@scripts/run_affected_tests.py`:
- Around line 515-519: The current filter only treats files ending with ".py" as
test-affecting (py_changed = [f for f in changed if f.endswith(".py")]), which
causes config-only changes like pyproject.toml to skip tests; update the logic
to consider pyproject.toml as affecting tests by either including it in the
py_changed collection (e.g., include f == "pyproject.toml" or
f.endswith("pyproject.toml")) or add an explicit check on the changed list for
"pyproject.toml" and treat that as a trigger to run tests (so the "No Python
files changed -- skipping unit tests." path is not taken when pyproject.toml
changed). Ensure references to the py_changed variable and the early return
remain consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: cad69896-b1dc-456b-91cf-d2039ffc20e4
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (24)
CLAUDE.mdpyproject.tomlscripts/mock_spec_baseline.txtscripts/run_affected_tests.pysrc/synthorg/api/controllers/requests.pysrc/synthorg/api/state.pysrc/synthorg/api/state_services.pysrc/synthorg/budget/tracker.pysrc/synthorg/observability/events/api.pysrc/synthorg/observability/events/budget.pysrc/synthorg/providers/cost_recording.pytests/conftest.pytests/unit/a2a/test_well_known.pytests/unit/api/controllers/test_departments_health.pytests/unit/api/controllers/test_memory_admin.pytests/unit/api/test_module_cache_isolation.pytests/unit/api/test_state.pytests/unit/hr/performance/test_llm_judge_quality_strategy.pytests/unit/observability/conftest.pytests/unit/providers/management/test_probe_cost_recording.pytests/unit/providers/test_cost_recording_chokepoint.pytests/unit/settings/test_service.pytests/unit/telemetry/test_event_counter_race.pytests/unit/tools/sandbox/test_docker_config.py
💤 Files with no reviewable changes (2)
- tests/unit/observability/conftest.py
- tests/unit/a2a/test_well_known.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Doc Drift Gate
- GitHub Check: Build Backend Base (apko)
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{md,txt}
📄 CodeRabbit inference engine (CLAUDE.md)
Use D2 (
\``d2) for architecture diagrams, nested container layouts, and complex entity relationships. Use Mermaid (```mermaid) for flowcharts, sequence diagrams, simple hierarchies, and pipelines. Use Markdown tables for grid/matrix data that is semantically tabular. Never use```text` blocks with ASCII/Unicode box-drawing characters for diagrams.
Files:
CLAUDE.mdscripts/mock_spec_baseline.txt
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Read the relevant
docs/design/page before implementing any feature or planning any issue; the design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec.Present every implementation plan to the user for accept/deny before coding starts. At every phase of planning and implementation, be critical and actively look for design improvements; surface improvements as suggestions, not silent changes.
Comments explain WHY only, never origin/review/issue context. Forbidden: reviewer citations (
pre-PR review#N``), in-code issue/PR back-references ((#NNNN`)`), cryptic internal-taxonomy shorthand (e.g. `SEC-1` naked), migration/rebrand framing (`ported from`), round/iteration narrative, self-evident restatements. What stays: hidden constraints, subtle invariants, workarounds for upstream bugs (with stable URL, not internal back-refs), and why a non-obvious choice was made framed in terms the next reader can verify.No
from __future__ import annotations; Python 3.14 has PEP 649.PEP 758 except syntax: use
except A, B:(no parens) when not binding to a name;as excrequires parens (except (A, B) as exc:). Ruff enforces this on 3.14.Type hints: all public functions must have type hints. Use mypy strict mode.
Docstrings: Google style, required on public classes and functions. Ruff enforces D rules.
Immutability: create new objects, never mutate existing ones. Use frozen Pydantic models for config/identity; for non-Pydantic registries use
copy.deepcopy()at construction +MappingProxyTypewrapping; deepcopy at system boundaries (tool execution, provider serialization, persistence).Config vs runtime state: frozen models for config/identity; separate mutable-via-copy models (
model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.Pydantic v2 conv...
Files:
tests/unit/tools/sandbox/test_docker_config.pytests/unit/api/test_module_cache_isolation.pysrc/synthorg/api/state.pytests/unit/telemetry/test_event_counter_race.pytests/unit/api/controllers/test_memory_admin.pytests/unit/settings/test_service.pytests/unit/providers/management/test_probe_cost_recording.pyscripts/run_affected_tests.pysrc/synthorg/observability/events/budget.pysrc/synthorg/api/state_services.pysrc/synthorg/api/controllers/requests.pytests/unit/hr/performance/test_llm_judge_quality_strategy.pysrc/synthorg/observability/events/api.pysrc/synthorg/providers/cost_recording.pytests/unit/providers/test_cost_recording_chokepoint.pysrc/synthorg/budget/tracker.pytests/unit/api/test_state.pytests/unit/api/controllers/test_departments_health.pytests/conftest.py
**/*.{py,ts,tsx,go,sh}
📄 CodeRabbit inference engine (CLAUDE.md)
Never use
cdin commands; the working directory is already set to the project root. Use absolute paths or run commands directly. Exception:bash -c "cd <dir> && <cmd>"is safe (runs in a child process). Usego -C cli(nevercd cli). Never use Bash to write or modify files; use the Write or Edit tools instead.
Files:
tests/unit/tools/sandbox/test_docker_config.pytests/unit/api/test_module_cache_isolation.pysrc/synthorg/api/state.pytests/unit/telemetry/test_event_counter_race.pytests/unit/api/controllers/test_memory_admin.pytests/unit/settings/test_service.pytests/unit/providers/management/test_probe_cost_recording.pyscripts/run_affected_tests.pysrc/synthorg/observability/events/budget.pysrc/synthorg/api/state_services.pysrc/synthorg/api/controllers/requests.pytests/unit/hr/performance/test_llm_judge_quality_strategy.pysrc/synthorg/observability/events/api.pysrc/synthorg/providers/cost_recording.pytests/unit/providers/test_cost_recording_chokepoint.pysrc/synthorg/budget/tracker.pytests/unit/api/test_state.pytests/unit/api/controllers/test_departments_health.pytests/conftest.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Test regression (MANDATORY): NEVER delete, skip, or mark tests
xfailto fix slowness. NEVER use--no-verifyto bypass pre-push hooks. NEVER modifytests/baselines/unit_timing.json. First run: identify slow tests viauv run python -m pytest tests/unit/ -m unit -n 8 --durations=50 --durations-min=0.5 -q. Then compare against baseline. If suite time exceedsbaseline * 1.3, this is a source code regression; fix the source code, not the tests.Testing markers:
@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow.Mock-spec gate: every
Mock()/AsyncMock()/MagicMock()MUST declare the interface viaspec=ConcreteClass(Protocol or class). Pre-commit gate (scripts/check_mock_spec.py) blocks new bare sites. Shared mocks: usemock_dispatcherfromtests/conftest.pyinstead of building inline.Time-driven tests: import
FakeClockfromtests._shared.fake_clockand inject into the class under test. Useclock.advance(...)/await clock.advance_async(...)to drive virtual time. For tasks that must block indefinitely until cancelled, useasyncio.Event().wait()instead ofasyncio.sleep(large_number). FakeClock-first: patchtime.monotonic()/asyncio.sleep()at module level only for legacy code paths without aClockseam.Heap-ceiling tests (peak-heap assertions via
tracemalloc) live undertests/unit/perf/and are marked@pytest.mark.unitbecause they are real assertions on captured peak-heap, not throughput measurements.Coverage: 80% minimum (enforced in CI; benchmarks excluded via
--ignore=tests/benchmarks/). Async:asyncio_mode = "auto"; no manual@pytest.mark.asyncio. Timeout: 30 seconds per test (global; non-default overrides liketimeout(60)allowed). Parallelism:pytest-xdistvia-n 8, distribution--dist=loadfile. ALWAYS include-n 8locally, never run tests sequentially. Isolation regression gate: affected-tests pre-push runner runs affected subset twice via `pytest-...
Files:
tests/unit/tools/sandbox/test_docker_config.pytests/unit/api/test_module_cache_isolation.pytests/unit/telemetry/test_event_counter_race.pytests/unit/api/controllers/test_memory_admin.pytests/unit/settings/test_service.pytests/unit/providers/management/test_probe_cost_recording.pytests/unit/hr/performance/test_llm_judge_quality_strategy.pytests/unit/providers/test_cost_recording_chokepoint.pytests/unit/api/test_state.pytests/unit/api/controllers/test_departments_health.pytests/conftest.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/tools/sandbox/test_docker_config.pytests/unit/api/test_module_cache_isolation.pytests/unit/telemetry/test_event_counter_race.pytests/unit/api/controllers/test_memory_admin.pytests/unit/settings/test_service.pytests/unit/providers/management/test_probe_cost_recording.pytests/unit/hr/performance/test_llm_judge_quality_strategy.pytests/unit/providers/test_cost_recording_chokepoint.pytests/unit/api/test_state.pytests/unit/api/controllers/test_departments_health.pytests/conftest.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Regional defaults (MANDATORY): No default may privilege a single region, currency, or locale. Backend: use
DEFAULT_CURRENCYfromsynthorg.budget.currencyor the runtimebudget.currencysetting. Never hardcode ISO 4217 codes or symbols; never use_usdsuffix on money fields. Every cost-bearing Pydantic model carriescurrency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raisesMixedCurrencyAggregationError, HTTP 409).Configuration precedence (MANDATORY): For every mutable setting, resolve through: DB > env (
SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default. Resolve viaSettingsService/ConfigResolver. First-cold-read emits one INFOsettings.value.resolvedwithsource+yaml_path; subsequent reads stay at DEBUG. Do NOT use directos.environ.get(...)outside startup. New settings register insrc/synthorg/settings/definitions/<namespace>.py.Domain error class naming: error classes in domain modules use
<Domain><Condition>Errorand inherit fromDomainError(or a domain-scoped intermediate that itself inheritsDomainError). BareException/RuntimeErrorat domain boundaries is forbidden.Repository CRUD vocabulary: persistence repositories use
save(entity) -> None(insert-or-update, idempotent),get(id) -> Entity | None(None on miss, never raises),delete(id) -> bool(True on removal, False if absent),list_items(...) -> tuple[Entity, ...](paginated/filtered), andquery(...) -> tuple[Entity, ...]. Query methods always return tuples, never lists.Logging (MANDATORY): Every business-logic module has
from synthorg.observability import get_loggerthenlogger = get_logger(__name__). Variable name is alwayslogger. Never useimport logging/logging.getLogger()/print()in application code (exception:observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py). Event names always import constants fromsynthorg.observability.events.<domain>; never use string lite...
Files:
src/synthorg/api/state.pysrc/synthorg/observability/events/budget.pysrc/synthorg/api/state_services.pysrc/synthorg/api/controllers/requests.pysrc/synthorg/observability/events/api.pysrc/synthorg/providers/cost_recording.pysrc/synthorg/budget/tracker.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/api/state.pysrc/synthorg/observability/events/budget.pysrc/synthorg/api/state_services.pysrc/synthorg/api/controllers/requests.pysrc/synthorg/observability/events/api.pysrc/synthorg/providers/cost_recording.pysrc/synthorg/budget/tracker.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T07:38:14.533Z
Learning: Prioritize issues by dependency order, not priority labels; unblocked dependencies come first.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T07:38:14.533Z
Learning: Post-implementation (MANDATORY): After finishing an issue implementation, always create a feature branch (`<type>/<slug>`), commit, and push; do NOT create a PR automatically. Do NOT leave work uncommitted on main; branch, commit, push immediately after finishing.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T07:38:14.533Z
Learning: Pre-PR review (MANDATORY): NEVER create a PR directly via `gh pr create` (blocked by hookify). ALWAYS use `/pre-pr-review` to create PRs; it runs automated checks + review agents + fixes before creating the PR. For trivial/docs-only changes: `/pre-pr-review quick` skips agents but runs automated checks. After the PR exists, use `/aurelio-review-pr` for external reviewer feedback. Fix everything valid, never skip; no deferring or out-of-scope skipping.
🪛 LanguageTool
CLAUDE.md
[style] ~213-~213: Consider using “who” when you are referring to a person instead of an object.
Context: ...r__is shadowed. Use a context manager that wraps directsetattr+try/finally d...
(THAT_WHO)
🔇 Additional comments (30)
src/synthorg/observability/events/api.py (1)
234-237: LGTM!The new event constant follows the established naming convention and domain namespace pattern. It correctly supports the observability requirement for logging when lock release is skipped due to the lock still being held.
src/synthorg/api/state_services.py (3)
106-112: LGTM!The defence-in-depth cap is well-documented with a clear rationale for the 10k limit. The docstring explains the edge case (scope-but-never-advance) that necessitates the bound.
492-506: LGTM!The eviction logic correctly:
- Snapshots keys before iteration to avoid mutation during traversal
- Checks
lock.locked()to preserve in-flight locks- Uses insertion order (oldest-first) via
OrderedDictThe potential gap (locks returned but not yet acquired) is noted in the previous comment.
508-536: LGTM!The release method correctly:
- Documents the contract (call after releasing the lock)
- Handles contract violations gracefully with a DEBUG log (not ERROR, since the no-op is safe)
- Uses the proper event constant for observability
src/synthorg/api/state.py (3)
9-10: LGTM!The imports are correctly added at the top level (not under
TYPE_CHECKING) since boththreading.LockandOrderedDictare instantiated at runtime in__init__.
235-236: LGTM!The
__slots__entries are correctly added in alphabetical order within the existing slot list, following the file's convention.
472-491: LGTM!The docstring thoroughly explains:
- The isolation rationale (xdist workers /
--count Nrepeat runs)- Why
threading.Lockguardsasyncio.Lockcreation- Why
OrderedDictenables O(1) oldest-first eviction- The defence-in-depth cap for abandoned requests
The initialization correctly uses
OrderedDict()(notdict) to preserve insertion order for eviction semantics.tests/unit/api/test_state.py (2)
356-411: LGTM!The test class provides solid coverage of the core lock registry behaviors:
- Identity caching ensures serialization works
- Per-AppState isolation validates the xdist fix
- Release semantics tests cover both idle and held states
- Edge case for unknown IDs ensures defensive behavior
413-448: LGTM!The concurrent creation test (
test_concurrent_create_returns_same_lock) effectively validates the double-checked locking pattern by racing 16 threads. The assertion that all returned locks share identity confirms the guard works correctly.src/synthorg/api/controllers/requests.py (3)
183-183: LGTM!
scope_requestcorrectly acquires the per-request lock but does not release it, sinceSCOPINGis a non-terminal state. The subsequentapprove_requestorreject_requestwill reuse the same lock instance from the registry.
265-292: LGTM!
approve_requestcorrectly:
- Serializes the get/check/save critical section under the lock
- Publishes the WS event inside the lock (preserving ordering)
- Calls
release_request_lock_if_idleafter exiting theasync withblock, satisfying the documented contract
311-336: LGTM!
reject_requestfollows the same correct pattern asapprove_request-- terminal transition with release outside the lock scope.tests/unit/settings/test_service.py (2)
936-977: LGTM! Canonical logger spy pattern correctly implemented.The context manager properly avoids the
monkeypatch.setattrantipattern onBoundLoggerLazyProxyby using direct attribute assignment anddelattrcleanup. The docstring thoroughly documents why this pattern is necessary (preventing stale bound method caching that bypassesstructlog.testing.capture_logs()).
1135-1141: LGTM! Clean migration to context manager pattern.The test correctly uses the new
_logger_info_spycontext manager, properly scoping the spy lifetime and asserting on captured events within the context block.tests/conftest.py (4)
114-116: LGTM! Per-worktree namespacing prevents concurrent access conflicts.Namespacing the shared Hypothesis examples directory by
Path.cwd().resolve().nameisolates pre-push runs across concurrent worktrees on the same machine, avoiding WindowsWinError 32sharing violations.
455-491: LGTM! Structlog isolation fixture correctly scoped.The fixture appropriately resets structlog defaults and sets root logger level to
NOTSET(notWARNING) so INFO events from production code aren't silently filtered. The docstring clearly explains why handler cleanup is intentionally omitted (observability tests have their own broader reset).
494-509: LGTM! A2A card cache isolation fixture.The before-and-after
clear()pattern ensures both that each test starts with an empty cache and that mutations don't leak to the next test in the same xdist worker.
512-530: LGTM! Prometheus label snapshot isolation fixture.Correctly resets the label validator snapshot before and after each test to prevent cross-test pollution from
PrometheusCollector.refresh()calls.tests/unit/api/test_module_cache_isolation.py (1)
1-91: LGTM! Well-designed regression coverage for global cache resets.The tests are strategically placed outside
tests/unit/a2a/andtests/unit/observability/to catch any future refactor that accidentally re-scopes the autouse fixtures to those directories. The pollution-leak test pairs verify that mutations in one test don't affect the next.One minor observation: these tests are marked
asyncbut don't perform any async operations. This is harmless underasyncio_mode = "auto"but could be simplified to sync if preferred.src/synthorg/budget/tracker.py (2)
578-595: LGTM! Clean pending task tracking with self-eviction.The
track_pending_recordmethod correctly usesadd_done_callbackto auto-remove completed tasks, ensuring the set never grows unbounded. Per-instance ownership onCostTrackerprovides automatic xdist test isolation.
597-634: LGTM! Robust drain implementation with proper exception handling.The drain correctly:
- Snapshots the set before awaiting to avoid mutation-during-iteration
- Uses
return_exceptions=Trueto collect all outcomes- Re-raises
MemoryError/RecursionError(interpreter-fatal signals)- Logs unexpected exceptions at WARNING with
safe_error_description(SEC-1 compliant)src/synthorg/providers/cost_recording.py (1)
382-387: LGTM! Task tracking correctly delegated to CostTracker instance.Moving task tracking from module-level to
CostTrackerinstance provides automatic xdist test isolation — each test's tracker owns its own pending task set, preventing cross-test leakage of tasks bound to closed event loops.tests/unit/providers/management/test_probe_cost_recording.py (2)
80-80: LGTM! Correctly migrated to instance-based draining.The test properly awaits
tracker.drain_pending_records()before asserting on tracker state, ensuring background recording tasks have completed.
120-122: LGTM! Clear documentation of no-tracker contract.The comment correctly explains that when no tracker is passed, no cost-recording scope is opened, so there's nothing to drain.
tests/unit/hr/performance/test_llm_judge_quality_strategy.py (2)
403-403: LGTM! Correctly migrated to instance-based draining.
454-454: LGTM! Consistent drain pattern across cost tracking tests.tests/unit/telemetry/test_event_counter_race.py (1)
69-115: LGTM! Correctly implements logger spy pattern without monkeypatch.The implementation properly avoids the
monkeypatch.setattrantipattern onBoundLoggerLazyProxyby using direct attribute assignment and cleanup viadelin afinallyblock. The detailed comment explains the underlying issue clearly.tests/unit/providers/test_cost_recording_chokepoint.py (2)
121-278: Good migration to per-tracker draining for deterministic isolation.Using
await tracker.drain_pending_records()on the instance under test removes dependence on module-level state and makes these tests stable under repeat/xdist execution.
353-474: Strong regression coverage for pending-task isolation and drain contracts.The new suite cleanly pins per-tracker ownership, repeated-drain behavior, fatal-error propagation, and defensive logging for unexpected exceptions.
tests/unit/api/controllers/test_departments_health.py (1)
94-127: Nice isolation fix in_build_dept_client.Creating and seeding a fresh
FakePersistenceBackendper call is a solid way to prevent cross-test state bleed in this endpoint suite.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1721 +/- ##
=======================================
Coverage 84.70% 84.70%
=======================================
Files 1786 1786
Lines 102259 102311 +52
Branches 8980 8991 +11
=======================================
+ Hits 86614 86665 +51
Misses 13457 13457
- Partials 2188 2189 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Part of #1713 test isolation hunt. Pre-flight setup for the debugging window: - Flip .claude/hookify.enforce-parallel-tests.md to enabled:false so reconnaissance runs (--count 5, -n 0, -n 4) are not blocked. Re-enabled in the cleanup phase before the PR lands. - Add pytest-repeat==0.9.4 to the test dependency group so --count N becomes available for surfacing intermittent isolation flakes.
Part of #1713 test isolation hunt. The cost-recording chokepoint held a process-level set of strong references to in-flight background recording tasks; under pytest-xdist a task scheduled in test N could outlive the test's event loop and leak into test N+1's loop, surfacing as 'Task attached to a different loop' on the next read. Move ownership to CostTracker (one per AppState, fresh per test) so isolation comes for free without threading the manager through 30+ cost_recording_scope callsites. CostTracker is the natural sink: it already receives the recorded values; tracking the in-flight tasks that feed it keeps the strong-ref discipline next to the data flow it protects. Tests construct fresh trackers per test, so the per-instance set is automatically per-test. - Delete module-level _pending_record_tasks set and the test-only drain_pending_cost_records() free function from cost_recording.py. - Add CostTracker.track_pending_record() and async drain_pending_records() with the same self-eviction (add_done_callback) discipline and the same MemoryError/RecursionError propagation rule. - Update emit_cost_record_from_context() to call ctx.cost_tracker.track_pending_record(task). - Update the 13 test callsites of drain_pending_cost_records() to call the per-tracker drain. - Add TestPendingRecordIsolation regression class: - drain leaves pending set empty - pending set is per-tracker (no cross-tracker bleed) - repeated emission on same tracker drains clean (mirrors --count 2) - Mock-spec baseline: two pre-existing bare-mock entries in test_llm_judge_quality_strategy.py shift by one line because an import was removed; refresh via scripts/check_mock_spec.py --update. Verified: 90/90 tests pass under -n 0 --count 2 and -n 8 --count 2; mypy clean.
Part of #1713 test isolation hunt. The request controller held a process-level dict of asyncio.Lock objects keyed by request id; under pytest-xdist a Lock created in test N's event loop could be reused in test N+1's loop, surfacing as 'Task attached to a different loop' on the first acquire. Move ownership to AppState so each app instance (one per test) has its own dict. The helpers move to AppStateServicesMixin as get_or_create_request_lock() and release_request_lock_if_idle(), following the same encapsulation contract as the existing ticket store and presence accessors. Tests construct fresh AppState per test, so the per-instance dict is automatically per-test. - Add _request_locks (dict[str, asyncio.Lock]) and _request_locks_guard (threading.Lock) slots + init in api/state.py. - Add type annotations and the two helper methods to api/state_services.py mixin. - Delete module-level _REQUEST_LOCKS / _REQUEST_LOCKS_GUARD and the _lock_for_request / _release_request_lock free functions from api/controllers/requests.py. - Update scope_request, approve_request, reject_request to call the AppState methods. - Add TestAppStateRequestLocks regression class: - same id returns the same Lock identity - locks are per-AppState (no cross-state bleed) - release evicts idle locks - release keeps locked entries (no waiter strand) - release is a no-op for unknown ids - repeat create/release cycle drains clean (mirrors --count 2) Verified: 102/102 unit-state tests pass under -n 0 --count 2; 3/3 TestRequestController integration tests still pass; mypy + ruff clean.
Part of #1713 test isolation hunt. Two subsystems hold a process-global cache by design (the Agent Card cache and the Prometheus label-validator snapshot). Both had local autouse-reset fixtures scoped to a single test directory: - synthorg.a2a.well_known._card_cache was cleared only inside tests/unit/a2a/test_well_known.py. - synthorg.observability.prometheus_labels._snapshot was reset only inside tests/unit/observability/conftest.py. When a test outside those directories imports the subsystem, it observes whatever the prior test in the same xdist worker last wrote. Promote both resets to tests/conftest.py so they fire for every unit test. Both fixtures are O(1) (a dict .clear() and a single module-global rebind) so the suite-wide cost is negligible. - Add _reset_a2a_card_cache and _reset_prometheus_label_snapshot autouse fixtures to tests/conftest.py. - Remove the local autouse from tests/unit/a2a/test_well_known.py and the (now duplicate) _reset_label_snapshot from tests/unit/observability/conftest.py. - Add tests/unit/api/test_module_cache_isolation.py with two regression classes that assert the caches are empty when accessed outside their original directory, plus a populate-then-clear pair that would only pass when the fixture fires globally. Verified: 30/30 isolation tests pass under -n 0 --count 5 with random ordering; 100/100 pass when combined with the original a2a + prometheus test suites under --count 2.
Part of #1713 test isolation hunt. The shared Hypothesis failure log at ~/.synthorg/hypothesis-examples/ was the only filesystem path under Path.home() that the unit suite writes to and that crosses worktree boundaries. Pre-push hooks across multiple concurrent worktrees on the same machine all wrote to the same directory tree; on Windows, two pytest sessions writing to the same hypothesis-examples folder can hit WinError 32 sharing-violation races during DirectoryBasedExampleDatabase save. Append Path.cwd().resolve().name to the path so each worktree gets its own subdirectory: ~/.synthorg/hypothesis-examples/WORKTREE_BASENAME/. The path stays under ~/.synthorg/ so every worktree failure log still survives git worktree remove (the original reason the dir lives outside the working tree). The OSError fallback to local-only DB is unchanged. Verified: tests/unit/api/test_module_cache_isolation.py runs green; the new synthorg-wt-test-isolation-hunt subdirectory is created on first run.
Part of #1713 test isolation hunt. Two changes that close the loop on the cross-test isolation contract. 1. structlog defaults reset (Vector C, surfaced during reconnaissance) structlog defaults (processors, wrapper class, context-vars binding) are process-level. A test that calls structlog.configure() or holds structlog.testing.capture_logs() open across an unexpected exit leaves residual state for the next test in the same xdist worker. The canonical symptom hit during reconnaissance was the tests/unit/settings/test_source_resolution_log.py failure described in #1713: 5/6 tests pass alone but fail under -n 8 because a prior observability test left structlog wired to a filter that swallows the INFO emission the settings code makes. Promote the existing clear_logging_state() helper to a global autouse _reset_structlog_state fixture in tests/conftest.py so every unit test starts and ends on library defaults. The tests/unit/observability/conftest.py local _reset_logging is now redundant; it stays for documentation but is shadowed by the global. Verified: 5457/5457 unit tests across settings, observability, api, a2a, providers pass under -n 8 --dist loadscope (the previous order surfaced 6 failures). 2. Isolation regression gate (scripts/run_affected_tests.py) After a green primary run on affected paths, run pytest --count 2 -x over the same selection. A test that passes once but fails on repeat almost always means a fixture leaked process-global state. Catches a new module-level-state offender the moment it lands, before push. - Skipped when run_all=True (full-suite + count would double a multi-minute run for marginal gain on routine pushes). - Skipped when paths is empty. - Operator escape hatch: SYNTHORG_SKIP_ISOLATION_GATE=1 (per-push, not durable; the primary run still gates first-run correctness). - On regression: prints a clear ISOLATION REGRESSION banner with a pointer to #1713 for canonical offenders + fix patterns.
Part of #1713 test isolation hunt. Cleanup phase. - Re-enable hookify.enforce-parallel-tests (flipped to enabled:false in the Phase 0 setup commit so reconnaissance and per-fix verification could run --count 5 / -n 0 / -n 4 variants). The fixes are committed and verified; the parallel-tests hook returns to its normal duty of blocking pytest invocations missing -n 8. - Register the @pytest.mark.serial marker in pyproject.toml so future tests that legitimately cannot run under xdist (the issue calls this out as out of scope for the current PR) have a sanctioned escape hatch instead of relying on undocumented marker strings. The marker description carries a do-not-apply-without-approval note so it is not silently used to paper over isolation regressions the gate at scripts/run_affected_tests.py would otherwise catch.
Refinements to the structlog reset fixture and Hypothesis CI profile introduced in commit 0325ba6 after running --count 2 across the full unit suite (52,332 tests). The first pass closed all stdlib root handlers in addition to resetting structlog defaults; tests that import synthorg.observability at module load time hold long-lived handler references, so the close-on-teardown made a second iteration of the same test see closed handlers. Narrow the reset to structlog state only (defaults + contextvars). Observability tests retain their dedicated _reset_logging autouse in tests/unit/observability/conftest.py for the broader stdlib-root reset they actually need. The Hypothesis ci profile now also suppresses HealthCheck.differing _executors, a known false positive when pytest-repeat runs the same property test method from two separate collection items. The suppression is safe because our property tests use derandomize=True (fixed seed) and the example database is write-only via _WriteOnlyDatabase, so the warned-against database-persistence race cannot apply. Validated: full-suite --count 2 dropped from 189 failures (after the first pass) to 26 (after this refinement). Remaining residual flakes are deeper xdist-loadscope-affinity issues that require per-test investigation beyond the four confirmed module-level offenders fixed earlier in this PR.
Two final isolation refinements after running the affected-tests runner end-to-end on the rebased branch. 1. Stdlib root level reset (tests/conftest.py) Production setup_logging() may set logging.root.level to WARNING; a later test that uses structlog.testing.capture_logs() then sees only DEBUG events because the BoundLogger writes through stdlib, which filters INFO out at the root level. The narrower _reset_structlog_state autouse from commit 7910c6d touched only structlog defaults and contextvars; extend it to reset logging.root.level back to NOTSET (without closing handlers, which broke 822 tests in earlier iterations). Settles the canonical tests/unit/settings/test_source_resolution_log.py and tests/unit/settings/test_service.py::test_emits_audit_event failures referenced in #1713. 2. Fresh fake_persistence per _build_dept_client call (tests/unit/api/controllers/test_departments_health.py) _build_dept_client previously took the session-scoped fake_persistence fixture; settings written by other tests through _shared_app consumers leaked into the config-resolver lookup, producing spurious 404s on departments declared in the test config. Construct a fresh FakePersistenceBackend inside the helper instead, ignoring the shared fixture. Drops the fake_persistence parameter from the helper and from all five test method signatures that called it. Verified: full unit suite (26,329 tests) passes under -n 8 --dist loadscope across multiple runs; mypy + ruff clean. Mock-spec baseline refreshed for the bare AsyncMock at line 504 (was 503).
Pre-reviewed by 16 agents, 11 findings addressed. Root cause: 4 sites used monkeypatch.setattr(module.logger, level, spy) on a BoundLoggerLazyProxy. The proxy serves log methods via __getattr__ (per-call rebind); monkeypatch snapshots the bound method getattr returns and restores it via setattr at undo, permanently caching a stale bound method into proxy.__dict__ and shadowing __getattr__ for the lifetime of the proxy. Subsequent capture_logs() calls then route around the cached BoundLogger (which holds a stale processor list), so events bypass the capture buffer. The 5/6 settings tests in test_source_resolution_log.py + the agent test_factory all trace to this. Fix sites (direct setattr + try/finally delattr): tests/unit/settings/test_service.py (_logger_info_spy contextmgr), tests/unit/telemetry/test_event_counter_race.py, tests/unit/api/controllers/test_memory_admin.py, tests/unit/tools/sandbox/test_docker_config.py. Triage findings implemented: request_locks dict bounded by _MAX_REQUEST_LOCKS=10000 with idle-only LRU eviction; release_request_lock_if_idle now logs DEBUG when caller still holds the lock; drain_pending_records logs WARN on unexpected non-fatal exceptions; 3 fixtures unified to Iterator[None] yield form; 3 sync tests no longer marked async def; 2 propagation tests for MemoryError/RecursionError; 1 thread-race test for the DCL guard; 1 eviction-cap test; 1 drain_unexpected log-capture test; CLAUDE.md Testing section now documents pytest-repeat + --dist=loadfile + the monkeypatch-on-LazyProxy antipattern; 2 test comments trimmed to technical WHY; conftest.py baseline-loader docstring rewritten. Verification: ruff + mypy --strict clean across 3616 source files; pytest -m unit -n 8 --dist=loadfile passes 26343/26343 in ~95s; canonical 6-test --count 5 passes 35/35. Closes #1713
Round 1 of /babysit-pr applied to PR #1721: - state_services.py: add acquire_request_lock async context manager that bumps a per-id refcount before returning the Lock; eviction sweep now skips entries with refs > 0, closing the window between get_or_create_request_lock returning and the caller entering async with where the next caller would otherwise mint a different Lock for the same id. - budget/tracker.py: drain_pending_records re-raises asyncio.CancelledError instead of WARN-logging it; cancellation is the expected outcome of graceful shutdown, not a regression. - run_affected_tests.py: pyproject.toml changes now trigger the full unit suite so pytest config edits ship verified. - fakes_backend.py: FakePersistenceBackend.mark_connected() public sync helper replaces private _connected mutation in test scaffolding. - conftest.py, test_*.py: strip in-code issue/PR back-references and SEC-1 shorthand; rewrite docstrings to explain failure modes in terms the next reader can verify against the code. - test_state.py: add test_eviction_preserves_held_lock and test_eviction_preserves_referenced_lock covering both the lock.locked() and refcount eviction-skip branches. - test_cost_recording_chokepoint.py: add test_drain_propagates_cancellation_without_warning.
9d05c14 to
c631d0a
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/api/controllers/requests.py (1)
183-237:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRelease request-lock entries on failure paths too.
acquire_request_lock()registers everyrequest_id, including unknown IDs.scope_request()never evicts those entries onNotFoundError/ConflictError, andapprove_request()/reject_request()only evict on the success path. A stream of bad IDs or mid-handler failures will keep idle locks around until the 10k sweep and can churn legitimate retained entries.Suggested cleanup pattern
async def scope_request( self, request: Request[Any, Any, Any], state: State, request_id: str, data: ScopingPayload, ) -> ApiResponse[ClientRequest]: @@ - async with app_state.acquire_request_lock(request_id): - try: - stored = await sim_state.request_store.get(request_id) - except KeyError as exc: - msg = f"Request {request_id!r} not found" - raise NotFoundError(msg) from exc - if stored.status not in { - RequestStatus.SUBMITTED, - RequestStatus.TRIAGING, - }: - msg = ( - f"Request {request_id!r} cannot be scoped from " - f"status {stored.status.value!r}" - ) - raise ConflictError(msg) - metadata = dict(stored.metadata) - metadata["scoping_notes"] = data.notes - requirement = stored.requirement - overrides: dict[str, Any] = {} - if ( - data.refined_title is not None - or data.refined_description is not None - or data.refined_acceptance_criteria is not None - ): - overrides["requirement"] = requirement.model_copy( - update={ - k: v - for k, v in { - "title": data.refined_title, - "description": data.refined_description, - "acceptance_criteria": data.refined_acceptance_criteria, - }.items() - if v is not None - }, - ) - walked = stored - if walked.status == RequestStatus.SUBMITTED: - walked = walked.with_status( - RequestStatus.TRIAGING, - metadata=metadata, - ) - scoped = walked.with_status( - RequestStatus.SCOPING, - metadata=metadata, - **overrides, - ) - await sim_state.request_store.save(scoped) - _publish(request, WsEventType.REQUEST_SCOPED, scoped) + keep_lock = False + try: + async with app_state.acquire_request_lock(request_id): + try: + stored = await sim_state.request_store.get(request_id) + except KeyError as exc: + msg = f"Request {request_id!r} not found" + raise NotFoundError(msg) from exc + if stored.status not in { + RequestStatus.SUBMITTED, + RequestStatus.TRIAGING, + }: + msg = ( + f"Request {request_id!r} cannot be scoped from " + f"status {stored.status.value!r}" + ) + raise ConflictError(msg) + metadata = dict(stored.metadata) + metadata["scoping_notes"] = data.notes + requirement = stored.requirement + overrides: dict[str, Any] = {} + if ( + data.refined_title is not None + or data.refined_description is not None + or data.refined_acceptance_criteria is not None + ): + overrides["requirement"] = requirement.model_copy( + update={ + k: v + for k, v in { + "title": data.refined_title, + "description": data.refined_description, + "acceptance_criteria": data.refined_acceptance_criteria, + }.items() + if v is not None + }, + ) + walked = stored + if walked.status == RequestStatus.SUBMITTED: + walked = walked.with_status( + RequestStatus.TRIAGING, + metadata=metadata, + ) + scoped = walked.with_status( + RequestStatus.SCOPING, + metadata=metadata, + **overrides, + ) + await sim_state.request_store.save(scoped) + _publish(request, WsEventType.REQUEST_SCOPED, scoped) + keep_lock = True + finally: + if not keep_lock: + app_state.release_request_lock_if_idle(request_id) return ApiResponse(data=scoped)- async with app_state.acquire_request_lock(request_id): - ... - app_state.release_request_lock_if_idle(request_id) + try: + async with app_state.acquire_request_lock(request_id): + ... + finally: + app_state.release_request_lock_if_idle(request_id)Also applies to: 265-292, 311-336
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/api/controllers/requests.py` around lines 183 - 237, The handler acquires a request lock via acquire_request_lock(request_id) but returns/raises on NotFoundError or ConflictError without evicting the registered lock; update scope_request (and similarly approve_request/reject_request at the other sites) to ensure the lock is released on all failure paths by wrapping the locked section in try/finally: after async with app_state.acquire_request_lock(request_id): immediately enter a try block for the existing logic and in the finally call the lock-release helper (e.g. app_state.release_request_lock(request_id) or the corresponding eviction method) so that any exception (from sim_state.request_store.get, validation, or save) always triggers lock cleanup. Ensure you reference the exact symbols acquire_request_lock, release_request_lock (or the existing eviction method), scope_request, approve_request, and reject_request when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/synthorg/api/controllers/requests.py`:
- Around line 183-237: The handler acquires a request lock via
acquire_request_lock(request_id) but returns/raises on NotFoundError or
ConflictError without evicting the registered lock; update scope_request (and
similarly approve_request/reject_request at the other sites) to ensure the lock
is released on all failure paths by wrapping the locked section in try/finally:
after async with app_state.acquire_request_lock(request_id): immediately enter a
try block for the existing logic and in the finally call the lock-release helper
(e.g. app_state.release_request_lock(request_id) or the corresponding eviction
method) so that any exception (from sim_state.request_store.get, validation, or
save) always triggers lock cleanup. Ensure you reference the exact symbols
acquire_request_lock, release_request_lock (or the existing eviction method),
scope_request, approve_request, and reject_request when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: db3fa771-d140-4a14-8d99-5140fa9b4755
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (25)
CLAUDE.mdpyproject.tomlscripts/mock_spec_baseline.txtscripts/run_affected_tests.pysrc/synthorg/api/controllers/requests.pysrc/synthorg/api/state.pysrc/synthorg/api/state_services.pysrc/synthorg/budget/tracker.pysrc/synthorg/observability/events/api.pysrc/synthorg/observability/events/budget.pysrc/synthorg/providers/cost_recording.pytests/conftest.pytests/unit/a2a/test_well_known.pytests/unit/api/controllers/test_departments_health.pytests/unit/api/controllers/test_memory_admin.pytests/unit/api/fakes_backend.pytests/unit/api/test_module_cache_isolation.pytests/unit/api/test_state.pytests/unit/hr/performance/test_llm_judge_quality_strategy.pytests/unit/observability/conftest.pytests/unit/providers/management/test_probe_cost_recording.pytests/unit/providers/test_cost_recording_chokepoint.pytests/unit/settings/test_service.pytests/unit/telemetry/test_event_counter_race.pytests/unit/tools/sandbox/test_docker_config.py
💤 Files with no reviewable changes (2)
- tests/unit/a2a/test_well_known.py
- tests/unit/observability/conftest.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Build Fine-Tune (gpu, fine-tune-gpu)
- GitHub Check: Build Fine-Tune (cpu, fine-tune-cpu)
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Dashboard Test
- GitHub Check: Test (Python 3.14)
- GitHub Check: Socket Security: Pull Request Alerts
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Always read the relevant
docs/design/page before implementing any feature or planning any issue. If implementation deviates from the spec (better approach found, scope evolved, etc.), alert the user and explain why; do NOT silently diverge without explicit user approval.No
from __future__ import annotations: Python 3.14 has PEP 649.Use PEP 758 except syntax:
except A, B:(no parens) when not binding to a name;as excrequires parens (except (A, B) as exc:).Type hints required on all public functions; mypy strict mode enforced.
Docstrings required on public classes and functions using Google style.
Create new objects instead of mutating existing ones; use frozen Pydantic models for config/identity; for non-Pydantic registries use
copy.deepcopy()at construction +MappingProxyTypewrapping.Use frozen Pydantic models (
ConfigDict(frozen=True, allow_inf_nan=False)) for config/identity; separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.Use
ConfigDict(frozen=True, allow_inf_nan=False)everywhere on Pydantic models; useextra="forbid"on request DTOs; use@computed_fieldfor derived values; useNotBlankStrfromcore.typesfor identifier/name fields.Prefer
asyncio.TaskGroupfor fan-out/fan-in async concurrency; wrap independent task bodies inasync defhelpers that catchException(re-raise onlyMemoryError/RecursionError) so one failure doesn't unwind the group.Line length: 88 (ruff); functions: < 50 lines; files: < 800 lines.
Handle errors explicitly, never swallow. Domain error families register a base-class entry in
EXCEPTION_HANDLERSso subtypes get correct status codes.
Files:
src/synthorg/observability/events/budget.pysrc/synthorg/observability/events/api.pytests/unit/telemetry/test_event_counter_race.pytests/unit/api/test_module_cache_isolation.pysrc/synthorg/api/state.pysrc/synthorg/providers/cost_recording.pytests/unit/api/fakes_backend.pysrc/synthorg/api/controllers/requests.pytests/unit/tools/sandbox/test_docker_config.pytests/unit/hr/performance/test_llm_judge_quality_strategy.pytests/unit/api/test_state.pytests/unit/providers/management/test_probe_cost_recording.pysrc/synthorg/budget/tracker.pytests/unit/settings/test_service.pytests/unit/api/controllers/test_departments_health.pysrc/synthorg/api/state_services.pytests/unit/api/controllers/test_memory_admin.pytests/unit/providers/test_cost_recording_chokepoint.pytests/conftest.pyscripts/run_affected_tests.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Every
BaseToolsubclass, MCP tool registration, A2A RPC method, and WebSocket event must declare a typed Pydantic args model and be validated before dispatch.When migrating an entry-point from raw
dict[str, Any]to typed Pydantic validation (MCP handler args, JWT decode, WebSocket control message, audit-chain payload, A2A JSON-RPC params, settings security export), callparse_typed()fromsynthorg.api.boundarywith a hardcoded literal boundary label; the helper validates against the boundary's typed model and emitsAPI_BOUNDARY_VALIDATION_FAILEDon failure.Classes that read time or sleep cooperatively must take
clock: Clock | None = Nonedefaulting toSystemClock()(synthorg.core.clock); tests injectFakeClock. Patchtime.monotonic()/asyncio.sleep()globals only for legacy code paths that don't have aClockseam.Async
start()/stop()services must own a dedicatedself._lifecycle_lock; timed-out stops must mark the service unrestartable.Wrap attacker-controllable strings at LLM call sites via
wrap_untrusted()fromsynthorg.engine.prompt_safety; appenduntrusted_content_directive(tags)to the enclosing system prompt (SEC-1 untrusted-content fences).Never call
lxml.html.fromstringon attacker input; useHTMLParseGuard(synthorg.tools.html_parse_guard) (SEC-1 HTML parsing).Error classes in domain modules must use
<Domain><Condition>Errorand inherit fromDomainError(or a domain-scoped intermediate that itself inheritsDomainError). BareException/RuntimeErrorat domain boundaries is forbidden.Every Pydantic model must be
ConfigDict(frozen=True, ...)unless documented otherwise; mutations go throughmodel_copy(update=...), never direct attribute assignment. Runtime-state models that must mutate (rare) must document the deviation in a module-level comment.Validate at system boundaries (user input, external APIs, config files).
Every business-logic module must have `from synthorg.observability i...
Files:
src/synthorg/observability/events/budget.pysrc/synthorg/observability/events/api.pysrc/synthorg/api/state.pysrc/synthorg/providers/cost_recording.pysrc/synthorg/api/controllers/requests.pysrc/synthorg/budget/tracker.pysrc/synthorg/api/state_services.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/events/budget.pysrc/synthorg/observability/events/api.pysrc/synthorg/api/state.pysrc/synthorg/providers/cost_recording.pysrc/synthorg/api/controllers/requests.pysrc/synthorg/budget/tracker.pysrc/synthorg/api/state_services.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Every
Mock()/AsyncMock()/MagicMock()intests/MUST declare the interface it stands for viaspec=ConcreteClass(Protocol or class). A pre-commit gate blocks new bare-call sites. Pre-existing sites are frozen inscripts/mock_spec_baseline.txt. Regenerate viauv run python scripts/check_mock_spec.py --update(mock-spec gate#1604).Use
mock_dispatcher(anAsyncMock(spec=NotificationDispatcher)coveringregister/start/aclose/dispatch) fromtests/conftest.pyinstead of building the spec inline.For time-driven tests, import
FakeClockfromtests._shared.fake_clock(NOT from any rollout-subsystem path) and inject it into the class under test.FakeClock.sleepadvances virtual time AND yields once viaasyncio.sleep(0). For tests that need to drive cooperative tasks waiting on the loop,await clock.advance_async(seconds). FakeClock-first: patchtime.monotonic()/asyncio.sleep()globals only for legacy code paths that don't have aClockseam.Coverage minimum: 80% (enforced in CI; benchmarks excluded via
--ignore=tests/benchmarks/in coverage runs).Use
asyncio_mode = "auto"; no manual@pytest.mark.asyncioneeded.Timeout: 30 seconds per test (global in
pyproject.toml; do not add per-filepytest.mark.timeout(30)markers; non-default overrides liketimeout(60)are allowed).Parallelism:
pytest-xdistvia-n 8, distribution--dist=loadfile(default). ALWAYS include-n 8when running pytest locally, never run tests sequentially.loadfilekeeps every test in a file pinned to the same xdist worker.Never use
monkeypatch.setattr(module.logger, "info", spy)(logger spying antipattern). TheBoundLoggerLazyProxyserves log methods via__getattr__and caches stale bound methods. Use a context manager that wraps directsetattr+try/finally del proxy.<level>instead (canonical pattern intests/unit/settings/test_service.py).Prefer
@pytest.mark.parametrizefor testing similar cases.Use ge...
Files:
tests/unit/telemetry/test_event_counter_race.pytests/unit/api/test_module_cache_isolation.pytests/unit/api/fakes_backend.pytests/unit/tools/sandbox/test_docker_config.pytests/unit/hr/performance/test_llm_judge_quality_strategy.pytests/unit/api/test_state.pytests/unit/providers/management/test_probe_cost_recording.pytests/unit/settings/test_service.pytests/unit/api/controllers/test_departments_health.pytests/unit/api/controllers/test_memory_admin.pytests/unit/providers/test_cost_recording_chokepoint.pytests/conftest.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/telemetry/test_event_counter_race.pytests/unit/api/test_module_cache_isolation.pytests/unit/api/fakes_backend.pytests/unit/tools/sandbox/test_docker_config.pytests/unit/hr/performance/test_llm_judge_quality_strategy.pytests/unit/api/test_state.pytests/unit/providers/management/test_probe_cost_recording.pytests/unit/settings/test_service.pytests/unit/api/controllers/test_departments_health.pytests/unit/api/controllers/test_memory_admin.pytests/unit/providers/test_cost_recording_chokepoint.pytests/conftest.py
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceed
api.ws_frame_timeout_seconds(default 30s) without sending a frame. Wrapssocket.receive_text()inasyncio.wait_for(...).WebSocket revalidation sliding window: persistence-backend failures during periodic revalidation are tracked via
_SlidingWindowRateLimiter(api.ws_revalidation_window_secondsdefault 60s,api.ws_revalidation_max_failuresdefault 5) instead of a reset-on-success streak counter. Once the window saturates, the socket closes with code 4011.
Files:
src/synthorg/api/state.pysrc/synthorg/api/controllers/requests.pysrc/synthorg/api/state_services.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:18:35.257Z
Learning: Every implementation plan must be presented to the user for accept/deny before coding starts (MANDATORY Planning rule).
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:18:35.257Z
Learning: At every phase of planning and implementation, be critical and actively look for ways to improve the design in the spirit of robustness, correctness, simplicity, and future-proofing. Surface improvements as suggestions, not silent changes; the user decides.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:18:35.257Z
Learning: Prioritize issues by dependency order, not priority labels; unblocked dependencies come first (MANDATORY Planning rule).
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:18:35.257Z
Learning: After finishing an issue implementation, always create a feature branch (`<type>/<slug>`), commit, and push; do NOT create a PR automatically. Do NOT leave work uncommitted on main; branch, commit, push immediately after finishing (Post-Implementation MANDATORY rule).
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:18:35.257Z
Learning: NEVER create a PR directly (gh pr create is blocked by hookify). ALWAYS use `/pre-pr-review` to create PRs; it runs automated checks + review agents + fixes before creating the PR. For trivial/docs-only changes: `/pre-pr-review quick` skips agents. After the PR exists, use `/aurelio-review-pr` for external reviewer feedback. Fix everything valid, never skip (Pre-PR Review MANDATORY rule).
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:18:35.257Z
Learning: Git commits: `<type>: <description>`. Types: feat, fix, refactor, docs, test, chore, perf, ci. Enforced by commitizen (commit-msg hook). Signed commits required on `main` and all other refs via branch protection. Exception: GitHub App-signed commits from `synthorg-repo-bot` via Git Data API also satisfy `required_signatures`.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:18:35.257Z
Learning: Branches: `<type>/<slug>` from main.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:18:35.257Z
Learning: Never use `cd` in Bash commands; the working directory is already set to the project root. Use absolute paths or run commands directly. Exception: `bash -c "cd <dir> && <cmd>"` is safe (runs in a child process). Use this for tools without a `-C` flag, e.g. `bash -c "cd web && npm install"`.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:18:35.257Z
Learning: Never use Bash to write or modify files; use the Write or Edit tools. Do not use `cat >`, `cat << EOF`, `echo >`, `echo >>`, `sed -i`, `python -c "open(...).write(...)"`', or `tee` to create/modify files (read-only/inspection uses like piping to stdout are fine). This applies to all files and all subagents.
🪛 LanguageTool
CLAUDE.md
[style] ~213-~213: Consider using “who” when you are referring to a person instead of an object.
Context: ...r__is shadowed. Use a context manager that wraps directsetattr+try/finally d...
(THAT_WHO)
🔇 Additional comments (2)
tests/unit/api/controllers/test_memory_admin.py (1)
231-265: Good fix for the LazyProxy logger-spy flake.This direct
setattr+finally: del proxy.warningpattern avoids caching a stale bound method on the proxy and keeps latercapture_logs()assertions isolated. Based on learnings, the repo now treatsmonkeypatch.setattr(module.logger, "<level>", spy)on aBoundLoggerLazyProxyas a test-isolation antipattern.pyproject.toml (1)
352-362: Add routing infrastructure before applying@pytest.mark.serialto any tests.Line 360 declares the
serialmarker with a promise to route tests to a no-xdist suite, but no such routing exists in conftest.py, CI workflows, or test runner scripts. Currently, no tests use the marker, so this is not blocking. However, before any test is marked@pytest.mark.serial, you must implement one of the following:
- A conftest fixture or pytest hook that deselects serial-marked tests from the default
pytest tests/run- A separate CI job or runner script that explicitly excludes
-n(xdist) for serial tests- A wrapper that splits
serialand non-serialtest suites into separate invocationsDocument the chosen approach so developers know how to correctly apply the marker.
<!-- HIGHLIGHTS_START --> ## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - Frontend and UX polishing improves user interface responsiveness and visual consistency. - API hygiene and validation enhancements provide smoother and more reliable interactions. ### What's new - Introduced typed-boundary helpers enabling better type safety and parse_typed workflows. - Added codebase-audit skill prompt tuning for improved project auditing. ### Under the hood - Eliminated flaky tests caused by module-level state for more stable test outcomes. - Unified image tag management under CLI and Renovate for consistent dependency updates. - Added cross-PR file-overlap analysis to the review dependency pull request skill. - Updated multiple dependencies including Python, Web, CLI, and container libraries. - Improved CI tooling and lock file maintenance for better build reliability. <!-- HIGHLIGHTS_END --> :robot: I have created a release *beep* *boop* --- ## [0.7.8](v0.7.7...v0.7.8) (2026-05-03) ### Features * **api:** typed-boundary helper + codebase-audit skill prompt tuning ([#1712](#1712)) ([40ee65b](40ee65b)) * **boundary:** RFC [#1711](#1711) Phases 2 + 3 — typed boundaries via parse_typed ([#1720](#1720)) ([7b9f409](7b9f409)) ### Bug Fixes * **api:** audit cleanup B -- API hygiene & validation ([#1719](#1719)) ([3d790d9](3d790d9)) * audit cleanup C - persistence, concurrency & data integrity ([#1708](#1708)) ([#1717](#1717)) ([bcce097](bcce097)) * **test:** exterminate xdist-flaky tests with module-level state ([#1713](#1713)) ([#1721](#1721)) ([8d258dd](8d258dd)) * **web:** audit cleanup E -- frontend & UX polish ([#1710](#1710)) ([#1718](#1718)) ([3a3591a](3a3591a)) ### Refactoring * **cli:** single source of truth for DHI image tags + Renovate manager ([#1723](#1723)) ([57980a2](57980a2)) ### Documentation * audit cleanup D -- public-facing & docs sync ([#1709](#1709)) ([#1715](#1715)) ([ade03b7](ade03b7)) ### Tests * **engine:** make TestDrainTimeout deterministic + preserve subclass type in [@Ontology](https://github.com/ontology)_entity ([#1729](#1729)) ([b00fb05](b00fb05)) ### CI/CD * Update CI tool dependencies ([#1703](#1703)) ([355a9ff](355a9ff)) ### Maintenance * add cross-PR file-overlap analysis to review-dep-pr skill ([#1722](#1722)) ([3861d8a](3861d8a)) * **ci:** unify apko-version under workflow env so Renovate manages it everywhere ([#1724](#1724)) ([9c0a7fd](9c0a7fd)) * consolidate DHI image-pin custom regex managers ([#1726](#1726)) ([b8b0cba](b8b0cba)) * **deps:** update dependency chainguard-dev/melange to v0.50.4 ([#1701](#1701)) ([8cbf83a](8cbf83a)) * Lock file maintenance ([#1705](#1705)) ([414cfea](414cfea)) * Lock file maintenance ([#1727](#1727)) ([5cb1212](5cb1212)) * Update CLI dependencies ([#1702](#1702)) ([9fb57b9](9fb57b9)) * Update Container dependencies ([#1698](#1698)) ([6d24fd6](6d24fd6)) * Update dependency @eslint-react/eslint-plugin to v5 ([#1704](#1704)) ([1cb1294](1cb1294)) * Update Python dependencies ([#1699](#1699)) ([8e7af3a](8e7af3a)) * Update Python dependencies to v4.15.0 ([#1725](#1725)) ([69164c8](69164c8)) * Update Web dependencies ([#1700](#1700)) ([715300d](715300d)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
Closes #1713.
The 5/6 settings tests in
tests/unit/settings/test_source_resolution_log.py(and the agenttest_factory) flake underpytest -m unit -n 8because four test files usemonkeypatch.setattr(module.logger, "<level>", spy)on aBoundLoggerLazyProxy. The proxy serves log methods via__getattr__(per-call rebind) and stores nothing on the instance dict.monkeypatch.setattrsnapshots the bound methodgetattrreturns and "restores" it viasetattrat undo, permanently caching the stale bound method intoproxy.__dict__and shadowing__getattr__for the lifetime of the proxy (process-global). Subsequentstructlog.testing.capture_logs()calls then route around the cachedBoundLogger(which holds a stale processor list), so events bypass the capture buffer.I traced this with instrumentation that printed proxy state at the fire site:
'info' in logger.__dict__was true, andbound._processorswas the original (pre-capture_logs) processor list whilecfg["processors"]was[LogCapture].Fix
Direct
setattr+try/finally del proxy.<level>cleanup in 4 sites, so__getattr__resumes serving fresh bound loggers between tests:tests/unit/settings/test_service.py—_install_logger_info_spyhelper replaced with the_logger_info_spycontext managertests/unit/telemetry/test_event_counter_race.pytests/unit/api/controllers/test_memory_admin.pytests/unit/tools/sandbox/test_docker_config.pyDefence in depth
The PR also implements every actionable finding from the 16 review agents (full triage in
_audit/pre-pr-review/triage.md):_pending_record_tasksset onCostTracker(was module-level incost_recording.py)request_locks: OrderedDictonAppState(was module-level inrequests.py)_card_cacheand_snapshotreset via global autouse fixtures_request_locksbounded by_MAX_REQUEST_LOCKS=10000with idle-only LRU eviction (defence-in-depth against authenticated unbounded growth via abandoned scopings — pre-existing leak the refactor inherited)release_request_lock_if_idlenow logs DEBUG when caller still holds the lockdrain_pending_recordslogs WARN on unexpected non-fatal exceptions (defensively surfaces a regression in_record_cost_in_backgroundthat fails to log)Iterator[None]yield form (with post-yield re-clear for symmetry)async defMemoryError/RecursionErrorpropagation through drain, thread-race on the DCL guard, eviction-cap behaviour, drain-unexpected log captureInfrastructure
pyproject.toml: default--dist=worksteal→--dist=loadfile. Loadfile keeps every test in a file pinned to the same xdist worker, avoiding the cumulative resource leak that crashes workers under worksteal on Python 3.14 + Windows (separate, pre-existing infra issue per thepyproject.tomlfilterwarningsblock).scripts/run_affected_tests.py: new_run_isolation_gate()runspytest --count 2 -xover the affected subset after the primary green pass, catching the next module-level-state offender at PR time.SYNTHORG_SKIP_ISOLATION_GATE=1opt-out.pytest-repeat==0.9.4added to the test dependency group.Documentation
CLAUDE.md## Testingnow documents:--dist=loadfiledefault + rationalepytest-repeatand the isolation gatemonkeypatch.setattr(module.logger, ...)antipattern this PR fixes, with a forward link to_logger_info_spyVerification
uv run ruff check src/ tests/→ cleanuv run ruff format src/ tests/→ cleanuv run mypy src/ tests/→ 0 issues across 3616 source filesuv run python -m pytest tests/ -m unit -n 8→ 26343/26343 pass in ~95s (default--dist=loadfile)--count 5 -n 8→ 35/35 passTest plan
pytest -m unit -n 8 --count 5 tests/unit/settings/test_source_resolution_log.py tests/unit/core/test_agent.py::TestAgentIdentity::test_factorypasses 35/35 (issue's "Definition of done" criterion 1).scripts/run_affected_tests.pyisolation gate runs--count 2 -x(criterion 4).Mock()sites —mock_spec_baseline.txtregenerated only for shifted line numbers.Review coverage
16 specialist agents reviewed the diff:
docs-consistency,comment-quality-rot,code-reviewer,python-reviewer,pr-test-analyzer,silent-failure-hunter,comment-analyzer,type-design-analyzer,logging-audit,resilience-audit,conventions-enforcer,security-reviewer,api-contract-drift,test-quality-reviewer,async-concurrency-reviewer,issue-resolution-verifier. 11 actionable findings, 3 false positives (PEP 758except A, B:is correct on Python 3.14; the DCL guard is implemented correctly;_update_project_aggregate's broad except is pre-existing best-effort logging). All 11 actionable findings fixed; full table in_audit/pre-pr-review/triage.md.