Skip to content

fix(test): exterminate xdist-flaky tests with module-level state (#1713)#1721

Merged
Aureliolo merged 11 commits into
mainfrom
fix/test-isolation-hunt
May 3, 2026
Merged

fix(test): exterminate xdist-flaky tests with module-level state (#1713)#1721
Aureliolo merged 11 commits into
mainfrom
fix/test-isolation-hunt

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Closes #1713.

The 5/6 settings tests in tests/unit/settings/test_source_resolution_log.py (and the agent test_factory) flake under pytest -m unit -n 8 because four test files use monkeypatch.setattr(module.logger, "<level>", spy) on a BoundLoggerLazyProxy. The proxy serves log methods via __getattr__ (per-call rebind) and stores nothing on the instance dict. monkeypatch.setattr snapshots the bound method getattr returns and "restores" it via setattr at undo, permanently caching the stale bound method into proxy.__dict__ and shadowing __getattr__ for the lifetime of the proxy (process-global). Subsequent structlog.testing.capture_logs() calls then route around the cached BoundLogger (which holds a stale processor list), so events bypass the capture buffer.

I traced this with instrumentation that printed proxy state at the fire site: 'info' in logger.__dict__ was true, and bound._processors was the original (pre-capture_logs) processor list while cfg["processors"] was [LogCapture].

Fix

Direct setattr + try/finally del proxy.<level> cleanup in 4 sites, so __getattr__ resumes serving fresh bound loggers between tests:

  • tests/unit/settings/test_service.py_install_logger_info_spy helper replaced with the _logger_info_spy context manager
  • tests/unit/telemetry/test_event_counter_race.py
  • tests/unit/api/controllers/test_memory_admin.py
  • tests/unit/tools/sandbox/test_docker_config.py

Defence in depth

The PR also implements every actionable finding from the 16 review agents (full triage in _audit/pre-pr-review/triage.md):

  • Module-level state moved to per-instance state in earlier commits on this branch:
    • _pending_record_tasks set on CostTracker (was module-level in cost_recording.py)
    • request_locks: OrderedDict on AppState (was module-level in requests.py)
    • _card_cache and _snapshot reset via global autouse fixtures
    • Hypothesis examples DB namespaced by worktree
  • _request_locks bounded by _MAX_REQUEST_LOCKS=10000 with idle-only LRU eviction (defence-in-depth against authenticated unbounded growth via abandoned scopings — pre-existing leak the refactor inherited)
  • release_request_lock_if_idle now logs DEBUG when caller still holds the lock
  • drain_pending_records logs WARN on unexpected non-fatal exceptions (defensively surfaces a regression in _record_cost_in_background that fails to log)
  • 3 fixtures unified to Iterator[None] yield form (with post-yield re-clear for symmetry)
  • 3 sync tests no longer marked async def
  • New regression tests: MemoryError / RecursionError propagation through drain, thread-race on the DCL guard, eviction-cap behaviour, drain-unexpected log capture

Infrastructure

  • pyproject.toml: default --dist=worksteal--dist=loadfile. Loadfile keeps every test in a file pinned to the same xdist worker, avoiding the cumulative resource leak that crashes workers under worksteal on Python 3.14 + Windows (separate, pre-existing infra issue per the pyproject.toml filterwarnings block).
  • scripts/run_affected_tests.py: new _run_isolation_gate() runs pytest --count 2 -x over the affected subset after the primary green pass, catching the next module-level-state offender at PR time. SYNTHORG_SKIP_ISOLATION_GATE=1 opt-out.
  • pytest-repeat==0.9.4 added to the test dependency group.

Documentation

CLAUDE.md ## Testing now documents:

  • the new --dist=loadfile default + rationale
  • pytest-repeat and the isolation gate
  • the monkeypatch.setattr(module.logger, ...) antipattern this PR fixes, with a forward link to _logger_info_spy

Verification

  • uv run ruff check src/ tests/ → clean
  • uv run ruff format src/ tests/ → clean
  • uv run mypy src/ tests/ → 0 issues across 3616 source files
  • uv run python -m pytest tests/ -m unit -n 826343/26343 pass in ~95s (default --dist=loadfile)
  • Canonical settings + agent tests --count 5 -n 835/35 pass

Test plan

  • pytest -m unit -n 8 --count 5 tests/unit/settings/test_source_resolution_log.py tests/unit/core/test_agent.py::TestAgentIdentity::test_factory passes 35/35 (issue's "Definition of done" criterion 1).
  • scripts/run_affected_tests.py isolation gate runs --count 2 -x (criterion 4).
  • Pre-push hook never produces an xdist-isolation false-positive (verified locally; one-week observation window will validate).
  • No new bare-Mock() sites — mock_spec_baseline.txt regenerated only for shifted line numbers.
  • No persistence/migration changes; no API-contract drift.

Review coverage

16 specialist agents reviewed the diff: docs-consistency, comment-quality-rot, code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, conventions-enforcer, security-reviewer, api-contract-drift, test-quality-reviewer, async-concurrency-reviewer, issue-resolution-verifier. 11 actionable findings, 3 false positives (PEP 758 except A, B: is correct on Python 3.14; the DCL guard is implemented correctly; _update_project_aggregate's broad except is pre-existing best-effort logging). All 11 actionable findings fixed; full table in _audit/pre-pr-review/triage.md.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 3, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

OpenSSF Scorecard

PackageVersionScoreDetails
pip/pytest-repeat 0.9.4 UnknownUnknown

Scanned Files

  • uv.lock

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 3, 2026

Walkthrough

This change centralizes per-request locking in AppState (new lock registry, get/create, async acquire, and conditional eviction helpers) and moves background cost-record task ownership into CostTracker with tracking and drain APIs. Tests add three autouse fixtures to reset process-global state and update many tests to use instance-scoped drains and direct attribute spying on lazy logger proxies. Test tooling: pytest xdist scheduling switched to --dist=loadfile, pytest-repeat added, and scripts/run_affected_tests.py gains an isolation regression gate that reruns affected tests with pytest --count 2 -x (opt-out via env var).

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: fixing xdist-flaky tests caused by module-level state through direct setattr cleanup and infrastructure improvements.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the root cause (monkeypatch.setattr on BoundLoggerLazyProxy), the fix, and all audit-driven improvements.
Linked Issues check ✅ Passed The PR fully addresses issue #1713's objectives: fixes the identified xdist flakes (monkeypatch.setattr on loggers), moves module-level state to per-instance, adds isolation gates with pytest-repeat, and achieves consistent green results for the specified test runs.
Out of Scope Changes check ✅ Passed All changes are scoped to fixing module-level state isolation and infrastructure improvements. No unrelated refactors, feature additions, or scope drift detected beyond the linked issue's objectives.
Docstring Coverage ✅ Passed Docstring coverage is 50.00% which is sufficient. The required threshold is 40.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Review rate limit: 1/5 review remaining, refill in 43 minutes and 57 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 3, 2026

Merging this PR will improve performance by 12.62%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 1 improved benchmark
✅ 32 untouched benchmarks
⏩ 21 skipped benchmarks1

Performance Changes

Benchmark BASE HEAD Efficiency
test_scrub_small_clean 110.7 µs 98.3 µs +12.62%

Comparing fix/test-isolation-hunt (c631d0a) with main (355a9ff)2

Open in CodSpeed

Footnotes

  1. 21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on main (8cbf83a) during the generation of this report, so 355a9ff was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses test isolation regressions (Issue #1713) by migrating module-level state, including request locks and background task sets, into instance-bound registries within AppState and CostTracker. It introduces global autouse fixtures to reset shared state like structlog and internal caches between tests, and updates the test runner with an isolation regression gate using pytest-repeat. Feedback suggests refining the task drain logic to handle asyncio.CancelledError separately and improving the test fake's interface to avoid direct manipulation of internal connection state.

Comment on lines +623 to +634
if isinstance(outcome, BaseException):
# ``_record_cost_in_background`` already logs + swallows
# recoverable failures, so reaching this branch means
# something downstream raised without going through the
# documented logging path. Surface defensively at WARN
# so the regression is visible in test output rather
# than silently dropped by ``return_exceptions=True``.
logger.warning(
BUDGET_PENDING_RECORD_DRAIN_UNEXPECTED,
error_type=type(outcome).__name__,
error=safe_error_description(outcome),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The drain_pending_records method logs unexpected exceptions at the WARNING level. While this is appropriate for surfacing regressions in tests, consider if asyncio.CancelledError should be handled separately to avoid noisy logs during graceful shutdowns or test cancellations, as it is a BaseException but often expected.

Comment on lines +116 to +117
fake_persistence = FakePersistenceBackend()
fake_persistence._connected = True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Manually setting _connected = True on the FakePersistenceBackend is a fragile pattern as it relies on internal implementation details of the fake. If the fake's connection logic changes, this test might break or pass incorrectly. Consider adding a public connect() method to the fake if it doesn't already exist.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/run_affected_tests.py (1)

515-519: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat pyproject.toml changes as test-affecting.

Right now a config-only change skips the hook completely because everything non-.py is discarded here. In this PR, changing pyproject.toml would print “No Python files changed -- skipping unit tests,” so the new pytest addopts / xdist / pytest-repeat behavior never gets validated before push.

Suggested fix
+    if "pyproject.toml" in changed:
+        print("pyproject.toml changed -- running full unit suite.")
+        return _run_pytest(["tests/unit/"], run_all=True)
+
     # Filter to Python files only.
     py_changed = [f for f in changed if f.endswith(".py")]
     if not py_changed:
         print("No Python files changed -- skipping unit tests.")
         return 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/run_affected_tests.py` around lines 515 - 519, The current filter
only treats files ending with ".py" as test-affecting (py_changed = [f for f in
changed if f.endswith(".py")]), which causes config-only changes like
pyproject.toml to skip tests; update the logic to consider pyproject.toml as
affecting tests by either including it in the py_changed collection (e.g.,
include f == "pyproject.toml" or f.endswith("pyproject.toml")) or add an
explicit check on the changed list for "pyproject.toml" and treat that as a
trigger to run tests (so the "No Python files changed -- skipping unit tests."
path is not taken when pyproject.toml changed). Ensure references to the
py_changed variable and the early return remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/run_affected_tests.py`:
- Around line 13-17: Replace hardcoded issue references like "#1713" in the
module docstring and runtime banner with stable wording that describes the
failure mode or links to a persistent docs page; update the explanatory text
near the environment flag SYNTHORG_SKIP_ISOLATION_GATE and any banner generation
logic (the strings in scripts/run_affected_tests.py that mention the isolation
regression gate and the two-run pytest command) to say e.g. "an isolation
regression gate runs (double-run pytest to detect module-level state leaks)" or
point to a documented URL, and remove any "(`#NNNN`)" style references in the
other occurrences noted (lines ~450-454, ~491-494) so comments only explain
what/why not origin/issue IDs.

In `@src/synthorg/api/state_services.py`:
- Around line 456-490: get_or_create_request_lock has a race where the returned
asyncio.Lock can be evicted before the caller enters "async with", so add a
lightweight refcount like _request_lock_refs (dict) guarded by
_request_locks_guard; in get_or_create_request_lock (and where you create/reuse
an entry in _request_locks) increment _request_lock_refs[request_id] before
returning the Lock, and ensure the caller decrements the refcount when it
actually acquires/releases the lock (provide a helper or require using an async
context manager wrapper that decrements on enter/exit). Update
_evict_idle_request_locks_locked to only evict an entry when both not
lock.locked() and _request_lock_refs.get(request_id, 0) == 0, and keep all
operations touching _request_locks and _request_lock_refs under
_request_locks_guard; reference functions/attrs: get_or_create_request_lock,
_evict_idle_request_locks_locked, _request_locks, _request_locks_guard, and
_MAX_REQUEST_LOCKS.

In `@tests/unit/api/controllers/test_departments_health.py`:
- Line 110: Remove the in-code issue reference "(`#1713`)" from the docstring in
the test_departments_health module (tests unit module named
test_departments_health); locate the docstring that contains "(`#1713`)" and
delete that substring so the comment/docstring no longer references an issue
number, leaving the rest of the explanatory text intact or rephrasing it to be
issue-free.

In `@tests/unit/api/controllers/test_memory_admin.py`:
- Around line 252-255: The comment in
tests/unit/api/controllers/test_memory_admin.py uses an internal shorthand
"SEC-1" instead of a clear rationale; update the comment around the assertion
that "exc_info" is not in kwargs to explicitly state that including the full
traceback would bypass safe_error_description and could leak filesystem paths or
backend metadata (sensitive environment details), so exc_info must not be
set—refer to the assertion and the safe_error_description behavior to make the
safety rationale obvious to future readers.

In `@tests/unit/api/test_state.py`:
- Around line 450-474: Add an async unit test (e.g.,
test_eviction_preserves_held_lock) that patches _ss._MAX_REQUEST_LOCKS to a
small value (3), creates a state via _make_state, obtains and acquires the
oldest lock via state.get_or_create_request_lock("oldest") using an async with
block, then fill the registry to the cap with additional keys and call
state.get_or_create_request_lock("trigger") to force eviction; assert that
"oldest" remains in state._request_locks and that the stored object is the same
instance as oldest_lock to verify _evict_idle_request_locks_locked preserves
held locks. Ensure you import synthorg.api.state_services as _ss and use
state._request_locks for the assertions.

In `@tests/unit/tools/sandbox/test_docker_config.py`:
- Around line 104-110: Edit the inline comment in
tests/unit/tools/sandbox/test_docker_config.py that explains why direct setattr
+ try/finally delattr is used for module.logger (a BoundLoggerLazyProxy) to
remove the repository tracker reference "(issue `#1713`)" while keeping the
explanatory rationale about __getattr__, monkeypatch.setattr restoring __dict__
and breaking capture_logs; ensure the comment still mentions module.logger,
BoundLoggerLazyProxy, __getattr__, monkeypatch.setattr, and capture_logs so the
reasoning remains clear but contains no issue/PR ID.

---

Outside diff comments:
In `@scripts/run_affected_tests.py`:
- Around line 515-519: The current filter only treats files ending with ".py" as
test-affecting (py_changed = [f for f in changed if f.endswith(".py")]), which
causes config-only changes like pyproject.toml to skip tests; update the logic
to consider pyproject.toml as affecting tests by either including it in the
py_changed collection (e.g., include f == "pyproject.toml" or
f.endswith("pyproject.toml")) or add an explicit check on the changed list for
"pyproject.toml" and treat that as a trigger to run tests (so the "No Python
files changed -- skipping unit tests." path is not taken when pyproject.toml
changed). Ensure references to the py_changed variable and the early return
remain consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cad69896-b1dc-456b-91cf-d2039ffc20e4

📥 Commits

Reviewing files that changed from the base of the PR and between 9fb57b9 and 9d05c14.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (24)
  • CLAUDE.md
  • pyproject.toml
  • scripts/mock_spec_baseline.txt
  • scripts/run_affected_tests.py
  • src/synthorg/api/controllers/requests.py
  • src/synthorg/api/state.py
  • src/synthorg/api/state_services.py
  • src/synthorg/budget/tracker.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/observability/events/budget.py
  • src/synthorg/providers/cost_recording.py
  • tests/conftest.py
  • tests/unit/a2a/test_well_known.py
  • tests/unit/api/controllers/test_departments_health.py
  • tests/unit/api/controllers/test_memory_admin.py
  • tests/unit/api/test_module_cache_isolation.py
  • tests/unit/api/test_state.py
  • tests/unit/hr/performance/test_llm_judge_quality_strategy.py
  • tests/unit/observability/conftest.py
  • tests/unit/providers/management/test_probe_cost_recording.py
  • tests/unit/providers/test_cost_recording_chokepoint.py
  • tests/unit/settings/test_service.py
  • tests/unit/telemetry/test_event_counter_race.py
  • tests/unit/tools/sandbox/test_docker_config.py
💤 Files with no reviewable changes (2)
  • tests/unit/observability/conftest.py
  • tests/unit/a2a/test_well_known.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Doc Drift Gate
  • GitHub Check: Build Backend Base (apko)
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{md,txt}

📄 CodeRabbit inference engine (CLAUDE.md)

Use D2 (\``d2) for architecture diagrams, nested container layouts, and complex entity relationships. Use Mermaid (```mermaid) for flowcharts, sequence diagrams, simple hierarchies, and pipelines. Use Markdown tables for grid/matrix data that is semantically tabular. Never use ```text` blocks with ASCII/Unicode box-drawing characters for diagrams.

Files:

  • CLAUDE.md
  • scripts/mock_spec_baseline.txt
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Read the relevant docs/design/ page before implementing any feature or planning any issue; the design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec.

Present every implementation plan to the user for accept/deny before coding starts. At every phase of planning and implementation, be critical and actively look for design improvements; surface improvements as suggestions, not silent changes.

Comments explain WHY only, never origin/review/issue context. Forbidden: reviewer citations (pre-PR review #N``), in-code issue/PR back-references ((#NNNN`)`), cryptic internal-taxonomy shorthand (e.g. `SEC-1` naked), migration/rebrand framing (`ported from`), round/iteration narrative, self-evident restatements. What stays: hidden constraints, subtle invariants, workarounds for upstream bugs (with stable URL, not internal back-refs), and why a non-obvious choice was made framed in terms the next reader can verify.

No from __future__ import annotations; Python 3.14 has PEP 649.

PEP 758 except syntax: use except A, B: (no parens) when not binding to a name; as exc requires parens (except (A, B) as exc:). Ruff enforces this on 3.14.

Type hints: all public functions must have type hints. Use mypy strict mode.

Docstrings: Google style, required on public classes and functions. Ruff enforces D rules.

Immutability: create new objects, never mutate existing ones. Use frozen Pydantic models for config/identity; for non-Pydantic registries use copy.deepcopy() at construction + MappingProxyType wrapping; deepcopy at system boundaries (tool execution, provider serialization, persistence).

Config vs runtime state: frozen models for config/identity; separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.

Pydantic v2 conv...

Files:

  • tests/unit/tools/sandbox/test_docker_config.py
  • tests/unit/api/test_module_cache_isolation.py
  • src/synthorg/api/state.py
  • tests/unit/telemetry/test_event_counter_race.py
  • tests/unit/api/controllers/test_memory_admin.py
  • tests/unit/settings/test_service.py
  • tests/unit/providers/management/test_probe_cost_recording.py
  • scripts/run_affected_tests.py
  • src/synthorg/observability/events/budget.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/controllers/requests.py
  • tests/unit/hr/performance/test_llm_judge_quality_strategy.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/providers/cost_recording.py
  • tests/unit/providers/test_cost_recording_chokepoint.py
  • src/synthorg/budget/tracker.py
  • tests/unit/api/test_state.py
  • tests/unit/api/controllers/test_departments_health.py
  • tests/conftest.py
**/*.{py,ts,tsx,go,sh}

📄 CodeRabbit inference engine (CLAUDE.md)

Never use cd in commands; the working directory is already set to the project root. Use absolute paths or run commands directly. Exception: bash -c "cd <dir> && <cmd>" is safe (runs in a child process). Use go -C cli (never cd cli). Never use Bash to write or modify files; use the Write or Edit tools instead.

Files:

  • tests/unit/tools/sandbox/test_docker_config.py
  • tests/unit/api/test_module_cache_isolation.py
  • src/synthorg/api/state.py
  • tests/unit/telemetry/test_event_counter_race.py
  • tests/unit/api/controllers/test_memory_admin.py
  • tests/unit/settings/test_service.py
  • tests/unit/providers/management/test_probe_cost_recording.py
  • scripts/run_affected_tests.py
  • src/synthorg/observability/events/budget.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/controllers/requests.py
  • tests/unit/hr/performance/test_llm_judge_quality_strategy.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/providers/cost_recording.py
  • tests/unit/providers/test_cost_recording_chokepoint.py
  • src/synthorg/budget/tracker.py
  • tests/unit/api/test_state.py
  • tests/unit/api/controllers/test_departments_health.py
  • tests/conftest.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Test regression (MANDATORY): NEVER delete, skip, or mark tests xfail to fix slowness. NEVER use --no-verify to bypass pre-push hooks. NEVER modify tests/baselines/unit_timing.json. First run: identify slow tests via uv run python -m pytest tests/unit/ -m unit -n 8 --durations=50 --durations-min=0.5 -q. Then compare against baseline. If suite time exceeds baseline * 1.3, this is a source code regression; fix the source code, not the tests.

Testing markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.

Mock-spec gate: every Mock() / AsyncMock() / MagicMock() MUST declare the interface via spec=ConcreteClass (Protocol or class). Pre-commit gate (scripts/check_mock_spec.py) blocks new bare sites. Shared mocks: use mock_dispatcher from tests/conftest.py instead of building inline.

Time-driven tests: import FakeClock from tests._shared.fake_clock and inject into the class under test. Use clock.advance(...) / await clock.advance_async(...) to drive virtual time. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number). FakeClock-first: patch time.monotonic() / asyncio.sleep() at module level only for legacy code paths without a Clock seam.

Heap-ceiling tests (peak-heap assertions via tracemalloc) live under tests/unit/perf/ and are marked @pytest.mark.unit because they are real assertions on captured peak-heap, not throughput measurements.

Coverage: 80% minimum (enforced in CI; benchmarks excluded via --ignore=tests/benchmarks/). Async: asyncio_mode = "auto"; no manual @pytest.mark.asyncio. Timeout: 30 seconds per test (global; non-default overrides like timeout(60) allowed). Parallelism: pytest-xdist via -n 8, distribution --dist=loadfile. ALWAYS include -n 8 locally, never run tests sequentially. Isolation regression gate: affected-tests pre-push runner runs affected subset twice via `pytest-...

Files:

  • tests/unit/tools/sandbox/test_docker_config.py
  • tests/unit/api/test_module_cache_isolation.py
  • tests/unit/telemetry/test_event_counter_race.py
  • tests/unit/api/controllers/test_memory_admin.py
  • tests/unit/settings/test_service.py
  • tests/unit/providers/management/test_probe_cost_recording.py
  • tests/unit/hr/performance/test_llm_judge_quality_strategy.py
  • tests/unit/providers/test_cost_recording_chokepoint.py
  • tests/unit/api/test_state.py
  • tests/unit/api/controllers/test_departments_health.py
  • tests/conftest.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/tools/sandbox/test_docker_config.py
  • tests/unit/api/test_module_cache_isolation.py
  • tests/unit/telemetry/test_event_counter_race.py
  • tests/unit/api/controllers/test_memory_admin.py
  • tests/unit/settings/test_service.py
  • tests/unit/providers/management/test_probe_cost_recording.py
  • tests/unit/hr/performance/test_llm_judge_quality_strategy.py
  • tests/unit/providers/test_cost_recording_chokepoint.py
  • tests/unit/api/test_state.py
  • tests/unit/api/controllers/test_departments_health.py
  • tests/conftest.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Regional defaults (MANDATORY): No default may privilege a single region, currency, or locale. Backend: use DEFAULT_CURRENCY from synthorg.budget.currency or the runtime budget.currency setting. Never hardcode ISO 4217 codes or symbols; never use _usd suffix on money fields. Every cost-bearing Pydantic model carries currency: CurrencyCode; aggregation sites enforce a same-currency invariant (mixing raises MixedCurrencyAggregationError, HTTP 409).

Configuration precedence (MANDATORY): For every mutable setting, resolve through: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default. Resolve via SettingsService / ConfigResolver. First-cold-read emits one INFO settings.value.resolved with source + yaml_path; subsequent reads stay at DEBUG. Do NOT use direct os.environ.get(...) outside startup. New settings register in src/synthorg/settings/definitions/<namespace>.py.

Domain error class naming: error classes in domain modules use <Domain><Condition>Error and inherit from DomainError (or a domain-scoped intermediate that itself inherits DomainError). Bare Exception/RuntimeError at domain boundaries is forbidden.

Repository CRUD vocabulary: persistence repositories use save(entity) -> None (insert-or-update, idempotent), get(id) -> Entity | None (None on miss, never raises), delete(id) -> bool (True on removal, False if absent), list_items(...) -> tuple[Entity, ...] (paginated/filtered), and query(...) -> tuple[Entity, ...]. Query methods always return tuples, never lists.

Logging (MANDATORY): Every business-logic module has from synthorg.observability import get_logger then logger = get_logger(__name__). Variable name is always logger. Never use import logging / logging.getLogger() / print() in application code (exception: observability/{setup,sinks,syslog_handler,http_handler,otlp_handler}.py). Event names always import constants from synthorg.observability.events.<domain>; never use string lite...

Files:

  • src/synthorg/api/state.py
  • src/synthorg/observability/events/budget.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/controllers/requests.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/providers/cost_recording.py
  • src/synthorg/budget/tracker.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/api/state.py
  • src/synthorg/observability/events/budget.py
  • src/synthorg/api/state_services.py
  • src/synthorg/api/controllers/requests.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/providers/cost_recording.py
  • src/synthorg/budget/tracker.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T07:38:14.533Z
Learning: Prioritize issues by dependency order, not priority labels; unblocked dependencies come first.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T07:38:14.533Z
Learning: Post-implementation (MANDATORY): After finishing an issue implementation, always create a feature branch (`<type>/<slug>`), commit, and push; do NOT create a PR automatically. Do NOT leave work uncommitted on main; branch, commit, push immediately after finishing.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T07:38:14.533Z
Learning: Pre-PR review (MANDATORY): NEVER create a PR directly via `gh pr create` (blocked by hookify). ALWAYS use `/pre-pr-review` to create PRs; it runs automated checks + review agents + fixes before creating the PR. For trivial/docs-only changes: `/pre-pr-review quick` skips agents but runs automated checks. After the PR exists, use `/aurelio-review-pr` for external reviewer feedback. Fix everything valid, never skip; no deferring or out-of-scope skipping.
🪛 LanguageTool
CLAUDE.md

[style] ~213-~213: Consider using “who” when you are referring to a person instead of an object.
Context: ...r__is shadowed. Use a context manager that wraps directsetattr+try/finally d...

(THAT_WHO)

🔇 Additional comments (30)
src/synthorg/observability/events/api.py (1)

234-237: LGTM!

The new event constant follows the established naming convention and domain namespace pattern. It correctly supports the observability requirement for logging when lock release is skipped due to the lock still being held.

src/synthorg/api/state_services.py (3)

106-112: LGTM!

The defence-in-depth cap is well-documented with a clear rationale for the 10k limit. The docstring explains the edge case (scope-but-never-advance) that necessitates the bound.


492-506: LGTM!

The eviction logic correctly:

  • Snapshots keys before iteration to avoid mutation during traversal
  • Checks lock.locked() to preserve in-flight locks
  • Uses insertion order (oldest-first) via OrderedDict

The potential gap (locks returned but not yet acquired) is noted in the previous comment.


508-536: LGTM!

The release method correctly:

  • Documents the contract (call after releasing the lock)
  • Handles contract violations gracefully with a DEBUG log (not ERROR, since the no-op is safe)
  • Uses the proper event constant for observability
src/synthorg/api/state.py (3)

9-10: LGTM!

The imports are correctly added at the top level (not under TYPE_CHECKING) since both threading.Lock and OrderedDict are instantiated at runtime in __init__.


235-236: LGTM!

The __slots__ entries are correctly added in alphabetical order within the existing slot list, following the file's convention.


472-491: LGTM!

The docstring thoroughly explains:

  • The isolation rationale (xdist workers / --count N repeat runs)
  • Why threading.Lock guards asyncio.Lock creation
  • Why OrderedDict enables O(1) oldest-first eviction
  • The defence-in-depth cap for abandoned requests

The initialization correctly uses OrderedDict() (not dict) to preserve insertion order for eviction semantics.

tests/unit/api/test_state.py (2)

356-411: LGTM!

The test class provides solid coverage of the core lock registry behaviors:

  • Identity caching ensures serialization works
  • Per-AppState isolation validates the xdist fix
  • Release semantics tests cover both idle and held states
  • Edge case for unknown IDs ensures defensive behavior

413-448: LGTM!

The concurrent creation test (test_concurrent_create_returns_same_lock) effectively validates the double-checked locking pattern by racing 16 threads. The assertion that all returned locks share identity confirms the guard works correctly.

src/synthorg/api/controllers/requests.py (3)

183-183: LGTM!

scope_request correctly acquires the per-request lock but does not release it, since SCOPING is a non-terminal state. The subsequent approve_request or reject_request will reuse the same lock instance from the registry.


265-292: LGTM!

approve_request correctly:

  • Serializes the get/check/save critical section under the lock
  • Publishes the WS event inside the lock (preserving ordering)
  • Calls release_request_lock_if_idle after exiting the async with block, satisfying the documented contract

311-336: LGTM!

reject_request follows the same correct pattern as approve_request -- terminal transition with release outside the lock scope.

tests/unit/settings/test_service.py (2)

936-977: LGTM! Canonical logger spy pattern correctly implemented.

The context manager properly avoids the monkeypatch.setattr antipattern on BoundLoggerLazyProxy by using direct attribute assignment and delattr cleanup. The docstring thoroughly documents why this pattern is necessary (preventing stale bound method caching that bypasses structlog.testing.capture_logs()).


1135-1141: LGTM! Clean migration to context manager pattern.

The test correctly uses the new _logger_info_spy context manager, properly scoping the spy lifetime and asserting on captured events within the context block.

tests/conftest.py (4)

114-116: LGTM! Per-worktree namespacing prevents concurrent access conflicts.

Namespacing the shared Hypothesis examples directory by Path.cwd().resolve().name isolates pre-push runs across concurrent worktrees on the same machine, avoiding Windows WinError 32 sharing violations.


455-491: LGTM! Structlog isolation fixture correctly scoped.

The fixture appropriately resets structlog defaults and sets root logger level to NOTSET (not WARNING) so INFO events from production code aren't silently filtered. The docstring clearly explains why handler cleanup is intentionally omitted (observability tests have their own broader reset).


494-509: LGTM! A2A card cache isolation fixture.

The before-and-after clear() pattern ensures both that each test starts with an empty cache and that mutations don't leak to the next test in the same xdist worker.


512-530: LGTM! Prometheus label snapshot isolation fixture.

Correctly resets the label validator snapshot before and after each test to prevent cross-test pollution from PrometheusCollector.refresh() calls.

tests/unit/api/test_module_cache_isolation.py (1)

1-91: LGTM! Well-designed regression coverage for global cache resets.

The tests are strategically placed outside tests/unit/a2a/ and tests/unit/observability/ to catch any future refactor that accidentally re-scopes the autouse fixtures to those directories. The pollution-leak test pairs verify that mutations in one test don't affect the next.

One minor observation: these tests are marked async but don't perform any async operations. This is harmless under asyncio_mode = "auto" but could be simplified to sync if preferred.

src/synthorg/budget/tracker.py (2)

578-595: LGTM! Clean pending task tracking with self-eviction.

The track_pending_record method correctly uses add_done_callback to auto-remove completed tasks, ensuring the set never grows unbounded. Per-instance ownership on CostTracker provides automatic xdist test isolation.


597-634: LGTM! Robust drain implementation with proper exception handling.

The drain correctly:

  1. Snapshots the set before awaiting to avoid mutation-during-iteration
  2. Uses return_exceptions=True to collect all outcomes
  3. Re-raises MemoryError/RecursionError (interpreter-fatal signals)
  4. Logs unexpected exceptions at WARNING with safe_error_description (SEC-1 compliant)
src/synthorg/providers/cost_recording.py (1)

382-387: LGTM! Task tracking correctly delegated to CostTracker instance.

Moving task tracking from module-level to CostTracker instance provides automatic xdist test isolation — each test's tracker owns its own pending task set, preventing cross-test leakage of tasks bound to closed event loops.

tests/unit/providers/management/test_probe_cost_recording.py (2)

80-80: LGTM! Correctly migrated to instance-based draining.

The test properly awaits tracker.drain_pending_records() before asserting on tracker state, ensuring background recording tasks have completed.


120-122: LGTM! Clear documentation of no-tracker contract.

The comment correctly explains that when no tracker is passed, no cost-recording scope is opened, so there's nothing to drain.

tests/unit/hr/performance/test_llm_judge_quality_strategy.py (2)

403-403: LGTM! Correctly migrated to instance-based draining.


454-454: LGTM! Consistent drain pattern across cost tracking tests.

tests/unit/telemetry/test_event_counter_race.py (1)

69-115: LGTM! Correctly implements logger spy pattern without monkeypatch.

The implementation properly avoids the monkeypatch.setattr antipattern on BoundLoggerLazyProxy by using direct attribute assignment and cleanup via del in a finally block. The detailed comment explains the underlying issue clearly.

tests/unit/providers/test_cost_recording_chokepoint.py (2)

121-278: Good migration to per-tracker draining for deterministic isolation.

Using await tracker.drain_pending_records() on the instance under test removes dependence on module-level state and makes these tests stable under repeat/xdist execution.


353-474: Strong regression coverage for pending-task isolation and drain contracts.

The new suite cleanly pins per-tracker ownership, repeated-drain behavior, fatal-error propagation, and defensive logging for unexpected exceptions.

tests/unit/api/controllers/test_departments_health.py (1)

94-127: Nice isolation fix in _build_dept_client.

Creating and seeding a fresh FakePersistenceBackend per call is a solid way to prevent cross-test state bleed in this endpoint suite.

Comment thread scripts/run_affected_tests.py
Comment thread src/synthorg/api/state_services.py
Comment thread tests/unit/api/controllers/test_departments_health.py Outdated
Comment thread tests/unit/api/controllers/test_memory_admin.py Outdated
Comment thread tests/unit/api/test_state.py
Comment thread tests/unit/tools/sandbox/test_docker_config.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 3, 2026

Codecov Report

❌ Patch coverage is 89.65517% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.70%. Comparing base (355a9ff) to head (c631d0a).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/api/state_services.py 87.03% 2 Missing and 5 partials ⚠️
src/synthorg/api/controllers/requests.py 60.00% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1721   +/-   ##
=======================================
  Coverage   84.70%   84.70%           
=======================================
  Files        1786     1786           
  Lines      102259   102311   +52     
  Branches     8980     8991   +11     
=======================================
+ Hits        86614    86665   +51     
  Misses      13457    13457           
- Partials     2188     2189    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Aureliolo added 11 commits May 3, 2026 10:13
Part of #1713 test isolation hunt. Pre-flight setup for the debugging
window:

- Flip .claude/hookify.enforce-parallel-tests.md to enabled:false so
  reconnaissance runs (--count 5, -n 0, -n 4) are not blocked. Re-enabled
  in the cleanup phase before the PR lands.
- Add pytest-repeat==0.9.4 to the test dependency group so --count N
  becomes available for surfacing intermittent isolation flakes.
Part of #1713 test isolation hunt. The cost-recording chokepoint held a
process-level set of strong references to in-flight background recording
tasks; under pytest-xdist a task scheduled in test N could outlive the
test's event loop and leak into test N+1's loop, surfacing as 'Task
attached to a different loop' on the next read.

Move ownership to CostTracker (one per AppState, fresh per test) so
isolation comes for free without threading the manager through 30+
cost_recording_scope callsites. CostTracker is the natural sink: it
already receives the recorded values; tracking the in-flight tasks that
feed it keeps the strong-ref discipline next to the data flow it
protects. Tests construct fresh trackers per test, so the per-instance
set is automatically per-test.

- Delete module-level _pending_record_tasks set and the test-only
  drain_pending_cost_records() free function from cost_recording.py.
- Add CostTracker.track_pending_record() and async drain_pending_records()
  with the same self-eviction (add_done_callback) discipline and the
  same MemoryError/RecursionError propagation rule.
- Update emit_cost_record_from_context() to call
  ctx.cost_tracker.track_pending_record(task).
- Update the 13 test callsites of drain_pending_cost_records() to call
  the per-tracker drain.
- Add TestPendingRecordIsolation regression class:
  - drain leaves pending set empty
  - pending set is per-tracker (no cross-tracker bleed)
  - repeated emission on same tracker drains clean (mirrors --count 2)
- Mock-spec baseline: two pre-existing bare-mock entries in
  test_llm_judge_quality_strategy.py shift by one line because an import
  was removed; refresh via scripts/check_mock_spec.py --update.

Verified: 90/90 tests pass under -n 0 --count 2 and -n 8 --count 2; mypy
clean.
Part of #1713 test isolation hunt. The request controller held a
process-level dict of asyncio.Lock objects keyed by request id; under
pytest-xdist a Lock created in test N's event loop could be reused in
test N+1's loop, surfacing as 'Task attached to a different loop' on
the first acquire.

Move ownership to AppState so each app instance (one per test) has its
own dict. The helpers move to AppStateServicesMixin as
get_or_create_request_lock() and release_request_lock_if_idle(),
following the same encapsulation contract as the existing ticket store
and presence accessors. Tests construct fresh AppState per test, so the
per-instance dict is automatically per-test.

- Add _request_locks (dict[str, asyncio.Lock]) and _request_locks_guard
  (threading.Lock) slots + init in api/state.py.
- Add type annotations and the two helper methods to
  api/state_services.py mixin.
- Delete module-level _REQUEST_LOCKS / _REQUEST_LOCKS_GUARD and the
  _lock_for_request / _release_request_lock free functions from
  api/controllers/requests.py.
- Update scope_request, approve_request, reject_request to call the
  AppState methods.
- Add TestAppStateRequestLocks regression class:
  - same id returns the same Lock identity
  - locks are per-AppState (no cross-state bleed)
  - release evicts idle locks
  - release keeps locked entries (no waiter strand)
  - release is a no-op for unknown ids
  - repeat create/release cycle drains clean (mirrors --count 2)

Verified: 102/102 unit-state tests pass under -n 0 --count 2; 3/3
TestRequestController integration tests still pass; mypy + ruff clean.
Part of #1713 test isolation hunt. Two subsystems hold a process-global
cache by design (the Agent Card cache and the Prometheus
label-validator snapshot). Both had local autouse-reset fixtures scoped
to a single test directory:

- synthorg.a2a.well_known._card_cache was cleared only inside
  tests/unit/a2a/test_well_known.py.
- synthorg.observability.prometheus_labels._snapshot was reset only
  inside tests/unit/observability/conftest.py.

When a test outside those directories imports the subsystem, it
observes whatever the prior test in the same xdist worker last wrote.
Promote both resets to tests/conftest.py so they fire for every
unit test. Both fixtures are O(1) (a dict .clear() and a single
module-global rebind) so the suite-wide cost is negligible.

- Add _reset_a2a_card_cache and _reset_prometheus_label_snapshot
  autouse fixtures to tests/conftest.py.
- Remove the local autouse from tests/unit/a2a/test_well_known.py and
  the (now duplicate) _reset_label_snapshot from
  tests/unit/observability/conftest.py.
- Add tests/unit/api/test_module_cache_isolation.py with two regression
  classes that assert the caches are empty when accessed outside their
  original directory, plus a populate-then-clear pair that would only
  pass when the fixture fires globally.

Verified: 30/30 isolation tests pass under -n 0 --count 5 with random
ordering; 100/100 pass when combined with the original a2a + prometheus
test suites under --count 2.
Part of #1713 test isolation hunt. The shared Hypothesis failure log at
~/.synthorg/hypothesis-examples/ was the only filesystem path under
Path.home() that the unit suite writes to and that crosses worktree
boundaries. Pre-push hooks across multiple concurrent worktrees on the
same machine all wrote to the same directory tree; on Windows, two
pytest sessions writing to the same hypothesis-examples folder can hit
WinError 32 sharing-violation races during DirectoryBasedExampleDatabase
save.

Append Path.cwd().resolve().name to the path so each worktree gets its
own subdirectory: ~/.synthorg/hypothesis-examples/WORKTREE_BASENAME/.

The path stays under ~/.synthorg/ so every worktree failure log still
survives git worktree remove (the original reason the dir lives outside
the working tree). The OSError fallback to local-only DB is unchanged.

Verified: tests/unit/api/test_module_cache_isolation.py runs green; the
new synthorg-wt-test-isolation-hunt subdirectory is created on first
run.
Part of #1713 test isolation hunt. Two changes that close the loop on
the cross-test isolation contract.

1. structlog defaults reset (Vector C, surfaced during reconnaissance)

   structlog defaults (processors, wrapper class, context-vars
   binding) are process-level. A test that calls structlog.configure()
   or holds structlog.testing.capture_logs() open across an unexpected
   exit leaves residual state for the next test in the same xdist
   worker. The canonical symptom hit during reconnaissance was the
   tests/unit/settings/test_source_resolution_log.py failure described
   in #1713: 5/6 tests pass alone but fail under -n 8 because a prior
   observability test left structlog wired to a filter that swallows
   the INFO emission the settings code makes.

   Promote the existing clear_logging_state() helper to a global
   autouse _reset_structlog_state fixture in tests/conftest.py so
   every unit test starts and ends on library defaults. The
   tests/unit/observability/conftest.py local _reset_logging is now
   redundant; it stays for documentation but is shadowed by the
   global. Verified: 5457/5457 unit tests across settings,
   observability, api, a2a, providers pass under -n 8 --dist loadscope
   (the previous order surfaced 6 failures).

2. Isolation regression gate (scripts/run_affected_tests.py)

   After a green primary run on affected paths, run pytest --count 2
   -x over the same selection. A test that passes once but fails on
   repeat almost always means a fixture leaked process-global state.
   Catches a new module-level-state offender the moment it lands,
   before push.

   - Skipped when run_all=True (full-suite + count would double a
     multi-minute run for marginal gain on routine pushes).
   - Skipped when paths is empty.
   - Operator escape hatch: SYNTHORG_SKIP_ISOLATION_GATE=1 (per-push,
     not durable; the primary run still gates first-run correctness).
   - On regression: prints a clear ISOLATION REGRESSION banner with a
     pointer to #1713 for canonical offenders + fix patterns.
Part of #1713 test isolation hunt. Cleanup phase.

- Re-enable hookify.enforce-parallel-tests (flipped to enabled:false in
  the Phase 0 setup commit so reconnaissance and per-fix verification
  could run --count 5 / -n 0 / -n 4 variants). The fixes are committed
  and verified; the parallel-tests hook returns to its normal duty of
  blocking pytest invocations missing -n 8.
- Register the @pytest.mark.serial marker in pyproject.toml so future
  tests that legitimately cannot run under xdist (the issue calls this
  out as out of scope for the current PR) have a sanctioned escape
  hatch instead of relying on undocumented marker strings. The marker
  description carries a do-not-apply-without-approval note so it is
  not silently used to paper over isolation regressions the gate at
  scripts/run_affected_tests.py would otherwise catch.
Refinements to the structlog reset fixture and Hypothesis CI profile
introduced in commit 0325ba6 after running --count 2 across the full
unit suite (52,332 tests).

The first pass closed all stdlib root handlers in addition to resetting
structlog defaults; tests that import synthorg.observability at module
load time hold long-lived handler references, so the close-on-teardown
made a second iteration of the same test see closed handlers. Narrow
the reset to structlog state only (defaults + contextvars). Observability
tests retain their dedicated _reset_logging autouse in
tests/unit/observability/conftest.py for the broader stdlib-root reset
they actually need.

The Hypothesis ci profile now also suppresses HealthCheck.differing
_executors, a known false positive when pytest-repeat runs the same
property test method from two separate collection items. The
suppression is safe because our property tests use derandomize=True
(fixed seed) and the example database is write-only via
_WriteOnlyDatabase, so the warned-against database-persistence race
cannot apply.

Validated: full-suite --count 2 dropped from 189 failures (after the
first pass) to 26 (after this refinement). Remaining residual flakes
are deeper xdist-loadscope-affinity issues that require per-test
investigation beyond the four confirmed module-level offenders fixed
earlier in this PR.
Two final isolation refinements after running the affected-tests runner
end-to-end on the rebased branch.

1. Stdlib root level reset (tests/conftest.py)

   Production setup_logging() may set logging.root.level to WARNING; a
   later test that uses structlog.testing.capture_logs() then sees
   only DEBUG events because the BoundLogger writes through stdlib,
   which filters INFO out at the root level. The narrower
   _reset_structlog_state autouse from commit 7910c6d touched only
   structlog defaults and contextvars; extend it to reset
   logging.root.level back to NOTSET (without closing handlers, which
   broke 822 tests in earlier iterations). Settles the canonical
   tests/unit/settings/test_source_resolution_log.py and
   tests/unit/settings/test_service.py::test_emits_audit_event
   failures referenced in #1713.

2. Fresh fake_persistence per _build_dept_client call
   (tests/unit/api/controllers/test_departments_health.py)

   _build_dept_client previously took the session-scoped
   fake_persistence fixture; settings written by other tests through
   _shared_app consumers leaked into the config-resolver lookup,
   producing spurious 404s on departments declared in the test
   config. Construct a fresh FakePersistenceBackend inside the helper
   instead, ignoring the shared fixture. Drops the fake_persistence
   parameter from the helper and from all five test method signatures
   that called it.

Verified: full unit suite (26,329 tests) passes under -n 8
--dist loadscope across multiple runs; mypy + ruff clean. Mock-spec
baseline refreshed for the bare AsyncMock at line 504 (was 503).
Pre-reviewed by 16 agents, 11 findings addressed.

Root cause: 4 sites used monkeypatch.setattr(module.logger, level, spy) on a BoundLoggerLazyProxy. The proxy serves log methods via __getattr__ (per-call rebind); monkeypatch snapshots the bound method getattr returns and restores it via setattr at undo, permanently caching a stale bound method into proxy.__dict__ and shadowing __getattr__ for the lifetime of the proxy. Subsequent capture_logs() calls then route around the cached BoundLogger (which holds a stale processor list), so events bypass the capture buffer. The 5/6 settings tests in test_source_resolution_log.py + the agent test_factory all trace to this.

Fix sites (direct setattr + try/finally delattr): tests/unit/settings/test_service.py (_logger_info_spy contextmgr), tests/unit/telemetry/test_event_counter_race.py, tests/unit/api/controllers/test_memory_admin.py, tests/unit/tools/sandbox/test_docker_config.py.

Triage findings implemented: request_locks dict bounded by _MAX_REQUEST_LOCKS=10000 with idle-only LRU eviction; release_request_lock_if_idle now logs DEBUG when caller still holds the lock; drain_pending_records logs WARN on unexpected non-fatal exceptions; 3 fixtures unified to Iterator[None] yield form; 3 sync tests no longer marked async def; 2 propagation tests for MemoryError/RecursionError; 1 thread-race test for the DCL guard; 1 eviction-cap test; 1 drain_unexpected log-capture test; CLAUDE.md Testing section now documents pytest-repeat + --dist=loadfile + the monkeypatch-on-LazyProxy antipattern; 2 test comments trimmed to technical WHY; conftest.py baseline-loader docstring rewritten.

Verification: ruff + mypy --strict clean across 3616 source files; pytest -m unit -n 8 --dist=loadfile passes 26343/26343 in ~95s; canonical 6-test --count 5 passes 35/35.

Closes #1713
Round 1 of /babysit-pr applied to PR #1721:

- state_services.py: add acquire_request_lock async context manager that bumps a per-id refcount before returning the Lock; eviction sweep now skips entries with refs > 0, closing the window between get_or_create_request_lock returning and the caller entering async with where the next caller would otherwise mint a different Lock for the same id.

- budget/tracker.py: drain_pending_records re-raises asyncio.CancelledError instead of WARN-logging it; cancellation is the expected outcome of graceful shutdown, not a regression.

- run_affected_tests.py: pyproject.toml changes now trigger the full unit suite so pytest config edits ship verified.

- fakes_backend.py: FakePersistenceBackend.mark_connected() public sync helper replaces private _connected mutation in test scaffolding.

- conftest.py, test_*.py: strip in-code issue/PR back-references and SEC-1 shorthand; rewrite docstrings to explain failure modes in terms the next reader can verify against the code.

- test_state.py: add test_eviction_preserves_held_lock and test_eviction_preserves_referenced_lock covering both the lock.locked() and refcount eviction-skip branches.

- test_cost_recording_chokepoint.py: add test_drain_propagates_cancellation_without_warning.
@Aureliolo Aureliolo force-pushed the fix/test-isolation-hunt branch from 9d05c14 to c631d0a Compare May 3, 2026 08:16
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 3, 2026 08:18 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/api/controllers/requests.py (1)

183-237: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Release request-lock entries on failure paths too.

acquire_request_lock() registers every request_id, including unknown IDs. scope_request() never evicts those entries on NotFoundError/ConflictError, and approve_request() / reject_request() only evict on the success path. A stream of bad IDs or mid-handler failures will keep idle locks around until the 10k sweep and can churn legitimate retained entries.

Suggested cleanup pattern
     async def scope_request(
         self,
         request: Request[Any, Any, Any],
         state: State,
         request_id: str,
         data: ScopingPayload,
     ) -> ApiResponse[ClientRequest]:
@@
-        async with app_state.acquire_request_lock(request_id):
-            try:
-                stored = await sim_state.request_store.get(request_id)
-            except KeyError as exc:
-                msg = f"Request {request_id!r} not found"
-                raise NotFoundError(msg) from exc
-            if stored.status not in {
-                RequestStatus.SUBMITTED,
-                RequestStatus.TRIAGING,
-            }:
-                msg = (
-                    f"Request {request_id!r} cannot be scoped from "
-                    f"status {stored.status.value!r}"
-                )
-                raise ConflictError(msg)
-            metadata = dict(stored.metadata)
-            metadata["scoping_notes"] = data.notes
-            requirement = stored.requirement
-            overrides: dict[str, Any] = {}
-            if (
-                data.refined_title is not None
-                or data.refined_description is not None
-                or data.refined_acceptance_criteria is not None
-            ):
-                overrides["requirement"] = requirement.model_copy(
-                    update={
-                        k: v
-                        for k, v in {
-                            "title": data.refined_title,
-                            "description": data.refined_description,
-                            "acceptance_criteria": data.refined_acceptance_criteria,
-                        }.items()
-                        if v is not None
-                    },
-                )
-            walked = stored
-            if walked.status == RequestStatus.SUBMITTED:
-                walked = walked.with_status(
-                    RequestStatus.TRIAGING,
-                    metadata=metadata,
-                )
-            scoped = walked.with_status(
-                RequestStatus.SCOPING,
-                metadata=metadata,
-                **overrides,
-            )
-            await sim_state.request_store.save(scoped)
-            _publish(request, WsEventType.REQUEST_SCOPED, scoped)
+        keep_lock = False
+        try:
+            async with app_state.acquire_request_lock(request_id):
+                try:
+                    stored = await sim_state.request_store.get(request_id)
+                except KeyError as exc:
+                    msg = f"Request {request_id!r} not found"
+                    raise NotFoundError(msg) from exc
+                if stored.status not in {
+                    RequestStatus.SUBMITTED,
+                    RequestStatus.TRIAGING,
+                }:
+                    msg = (
+                        f"Request {request_id!r} cannot be scoped from "
+                        f"status {stored.status.value!r}"
+                    )
+                    raise ConflictError(msg)
+                metadata = dict(stored.metadata)
+                metadata["scoping_notes"] = data.notes
+                requirement = stored.requirement
+                overrides: dict[str, Any] = {}
+                if (
+                    data.refined_title is not None
+                    or data.refined_description is not None
+                    or data.refined_acceptance_criteria is not None
+                ):
+                    overrides["requirement"] = requirement.model_copy(
+                        update={
+                            k: v
+                            for k, v in {
+                                "title": data.refined_title,
+                                "description": data.refined_description,
+                                "acceptance_criteria": data.refined_acceptance_criteria,
+                            }.items()
+                            if v is not None
+                        },
+                    )
+                walked = stored
+                if walked.status == RequestStatus.SUBMITTED:
+                    walked = walked.with_status(
+                        RequestStatus.TRIAGING,
+                        metadata=metadata,
+                    )
+                scoped = walked.with_status(
+                    RequestStatus.SCOPING,
+                    metadata=metadata,
+                    **overrides,
+                )
+                await sim_state.request_store.save(scoped)
+                _publish(request, WsEventType.REQUEST_SCOPED, scoped)
+                keep_lock = True
+        finally:
+            if not keep_lock:
+                app_state.release_request_lock_if_idle(request_id)
         return ApiResponse(data=scoped)
-        async with app_state.acquire_request_lock(request_id):
-            ...
-        app_state.release_request_lock_if_idle(request_id)
+        try:
+            async with app_state.acquire_request_lock(request_id):
+                ...
+        finally:
+            app_state.release_request_lock_if_idle(request_id)

Also applies to: 265-292, 311-336

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/api/controllers/requests.py` around lines 183 - 237, The handler
acquires a request lock via acquire_request_lock(request_id) but returns/raises
on NotFoundError or ConflictError without evicting the registered lock; update
scope_request (and similarly approve_request/reject_request at the other sites)
to ensure the lock is released on all failure paths by wrapping the locked
section in try/finally: after async with
app_state.acquire_request_lock(request_id): immediately enter a try block for
the existing logic and in the finally call the lock-release helper (e.g.
app_state.release_request_lock(request_id) or the corresponding eviction method)
so that any exception (from sim_state.request_store.get, validation, or save)
always triggers lock cleanup. Ensure you reference the exact symbols
acquire_request_lock, release_request_lock (or the existing eviction method),
scope_request, approve_request, and reject_request when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/synthorg/api/controllers/requests.py`:
- Around line 183-237: The handler acquires a request lock via
acquire_request_lock(request_id) but returns/raises on NotFoundError or
ConflictError without evicting the registered lock; update scope_request (and
similarly approve_request/reject_request at the other sites) to ensure the lock
is released on all failure paths by wrapping the locked section in try/finally:
after async with app_state.acquire_request_lock(request_id): immediately enter a
try block for the existing logic and in the finally call the lock-release helper
(e.g. app_state.release_request_lock(request_id) or the corresponding eviction
method) so that any exception (from sim_state.request_store.get, validation, or
save) always triggers lock cleanup. Ensure you reference the exact symbols
acquire_request_lock, release_request_lock (or the existing eviction method),
scope_request, approve_request, and reject_request when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: db3fa771-d140-4a14-8d99-5140fa9b4755

📥 Commits

Reviewing files that changed from the base of the PR and between 9d05c14 and c631d0a.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (25)
  • CLAUDE.md
  • pyproject.toml
  • scripts/mock_spec_baseline.txt
  • scripts/run_affected_tests.py
  • src/synthorg/api/controllers/requests.py
  • src/synthorg/api/state.py
  • src/synthorg/api/state_services.py
  • src/synthorg/budget/tracker.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/observability/events/budget.py
  • src/synthorg/providers/cost_recording.py
  • tests/conftest.py
  • tests/unit/a2a/test_well_known.py
  • tests/unit/api/controllers/test_departments_health.py
  • tests/unit/api/controllers/test_memory_admin.py
  • tests/unit/api/fakes_backend.py
  • tests/unit/api/test_module_cache_isolation.py
  • tests/unit/api/test_state.py
  • tests/unit/hr/performance/test_llm_judge_quality_strategy.py
  • tests/unit/observability/conftest.py
  • tests/unit/providers/management/test_probe_cost_recording.py
  • tests/unit/providers/test_cost_recording_chokepoint.py
  • tests/unit/settings/test_service.py
  • tests/unit/telemetry/test_event_counter_race.py
  • tests/unit/tools/sandbox/test_docker_config.py
💤 Files with no reviewable changes (2)
  • tests/unit/a2a/test_well_known.py
  • tests/unit/observability/conftest.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Build Fine-Tune (gpu, fine-tune-gpu)
  • GitHub Check: Build Fine-Tune (cpu, fine-tune-cpu)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Dashboard Test
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Socket Security: Pull Request Alerts
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Always read the relevant docs/design/ page before implementing any feature or planning any issue. If implementation deviates from the spec (better approach found, scope evolved, etc.), alert the user and explain why; do NOT silently diverge without explicit user approval.

No from __future__ import annotations: Python 3.14 has PEP 649.

Use PEP 758 except syntax: except A, B: (no parens) when not binding to a name; as exc requires parens (except (A, B) as exc:).

Type hints required on all public functions; mypy strict mode enforced.

Docstrings required on public classes and functions using Google style.

Create new objects instead of mutating existing ones; use frozen Pydantic models for config/identity; for non-Pydantic registries use copy.deepcopy() at construction + MappingProxyType wrapping.

Use frozen Pydantic models (ConfigDict(frozen=True, allow_inf_nan=False)) for config/identity; separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.

Use ConfigDict(frozen=True, allow_inf_nan=False) everywhere on Pydantic models; use extra="forbid" on request DTOs; use @computed_field for derived values; use NotBlankStr from core.types for identifier/name fields.

Prefer asyncio.TaskGroup for fan-out/fan-in async concurrency; wrap independent task bodies in async def helpers that catch Exception (re-raise only MemoryError / RecursionError) so one failure doesn't unwind the group.

Line length: 88 (ruff); functions: < 50 lines; files: < 800 lines.

Handle errors explicitly, never swallow. Domain error families register a base-class entry in EXCEPTION_HANDLERS so subtypes get correct status codes.

Files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/observability/events/api.py
  • tests/unit/telemetry/test_event_counter_race.py
  • tests/unit/api/test_module_cache_isolation.py
  • src/synthorg/api/state.py
  • src/synthorg/providers/cost_recording.py
  • tests/unit/api/fakes_backend.py
  • src/synthorg/api/controllers/requests.py
  • tests/unit/tools/sandbox/test_docker_config.py
  • tests/unit/hr/performance/test_llm_judge_quality_strategy.py
  • tests/unit/api/test_state.py
  • tests/unit/providers/management/test_probe_cost_recording.py
  • src/synthorg/budget/tracker.py
  • tests/unit/settings/test_service.py
  • tests/unit/api/controllers/test_departments_health.py
  • src/synthorg/api/state_services.py
  • tests/unit/api/controllers/test_memory_admin.py
  • tests/unit/providers/test_cost_recording_chokepoint.py
  • tests/conftest.py
  • scripts/run_affected_tests.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Every BaseTool subclass, MCP tool registration, A2A RPC method, and WebSocket event must declare a typed Pydantic args model and be validated before dispatch.

When migrating an entry-point from raw dict[str, Any] to typed Pydantic validation (MCP handler args, JWT decode, WebSocket control message, audit-chain payload, A2A JSON-RPC params, settings security export), call parse_typed() from synthorg.api.boundary with a hardcoded literal boundary label; the helper validates against the boundary's typed model and emits API_BOUNDARY_VALIDATION_FAILED on failure.

Classes that read time or sleep cooperatively must take clock: Clock | None = None defaulting to SystemClock() (synthorg.core.clock); tests inject FakeClock. Patch time.monotonic() / asyncio.sleep() globals only for legacy code paths that don't have a Clock seam.

Async start() / stop() services must own a dedicated self._lifecycle_lock; timed-out stops must mark the service unrestartable.

Wrap attacker-controllable strings at LLM call sites via wrap_untrusted() from synthorg.engine.prompt_safety; append untrusted_content_directive(tags) to the enclosing system prompt (SEC-1 untrusted-content fences).

Never call lxml.html.fromstring on attacker input; use HTMLParseGuard (synthorg.tools.html_parse_guard) (SEC-1 HTML parsing).

Error classes in domain modules must use <Domain><Condition>Error and inherit from DomainError (or a domain-scoped intermediate that itself inherits DomainError). Bare Exception / RuntimeError at domain boundaries is forbidden.

Every Pydantic model must be ConfigDict(frozen=True, ...) unless documented otherwise; mutations go through model_copy(update=...), never direct attribute assignment. Runtime-state models that must mutate (rare) must document the deviation in a module-level comment.

Validate at system boundaries (user input, external APIs, config files).

Every business-logic module must have `from synthorg.observability i...

Files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/api/state.py
  • src/synthorg/providers/cost_recording.py
  • src/synthorg/api/controllers/requests.py
  • src/synthorg/budget/tracker.py
  • src/synthorg/api/state_services.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/observability/events/api.py
  • src/synthorg/api/state.py
  • src/synthorg/providers/cost_recording.py
  • src/synthorg/api/controllers/requests.py
  • src/synthorg/budget/tracker.py
  • src/synthorg/api/state_services.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Every Mock() / AsyncMock() / MagicMock() in tests/ MUST declare the interface it stands for via spec=ConcreteClass (Protocol or class). A pre-commit gate blocks new bare-call sites. Pre-existing sites are frozen in scripts/mock_spec_baseline.txt. Regenerate via uv run python scripts/check_mock_spec.py --update (mock-spec gate #1604).

Use mock_dispatcher (an AsyncMock(spec=NotificationDispatcher) covering register / start / aclose / dispatch) from tests/conftest.py instead of building the spec inline.

For time-driven tests, import FakeClock from tests._shared.fake_clock (NOT from any rollout-subsystem path) and inject it into the class under test. FakeClock.sleep advances virtual time AND yields once via asyncio.sleep(0). For tests that need to drive cooperative tasks waiting on the loop, await clock.advance_async(seconds). FakeClock-first: patch time.monotonic() / asyncio.sleep() globals only for legacy code paths that don't have a Clock seam.

Coverage minimum: 80% (enforced in CI; benchmarks excluded via --ignore=tests/benchmarks/ in coverage runs).

Use asyncio_mode = "auto"; no manual @pytest.mark.asyncio needed.

Timeout: 30 seconds per test (global in pyproject.toml; do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed).

Parallelism: pytest-xdist via -n 8, distribution --dist=loadfile (default). ALWAYS include -n 8 when running pytest locally, never run tests sequentially. loadfile keeps every test in a file pinned to the same xdist worker.

Never use monkeypatch.setattr(module.logger, "info", spy) (logger spying antipattern). The BoundLoggerLazyProxy serves log methods via __getattr__ and caches stale bound methods. Use a context manager that wraps direct setattr + try/finally del proxy.<level> instead (canonical pattern in tests/unit/settings/test_service.py).

Prefer @pytest.mark.parametrize for testing similar cases.

Use ge...

Files:

  • tests/unit/telemetry/test_event_counter_race.py
  • tests/unit/api/test_module_cache_isolation.py
  • tests/unit/api/fakes_backend.py
  • tests/unit/tools/sandbox/test_docker_config.py
  • tests/unit/hr/performance/test_llm_judge_quality_strategy.py
  • tests/unit/api/test_state.py
  • tests/unit/providers/management/test_probe_cost_recording.py
  • tests/unit/settings/test_service.py
  • tests/unit/api/controllers/test_departments_health.py
  • tests/unit/api/controllers/test_memory_admin.py
  • tests/unit/providers/test_cost_recording_chokepoint.py
  • tests/conftest.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/telemetry/test_event_counter_race.py
  • tests/unit/api/test_module_cache_isolation.py
  • tests/unit/api/fakes_backend.py
  • tests/unit/tools/sandbox/test_docker_config.py
  • tests/unit/hr/performance/test_llm_judge_quality_strategy.py
  • tests/unit/api/test_state.py
  • tests/unit/providers/management/test_probe_cost_recording.py
  • tests/unit/settings/test_service.py
  • tests/unit/api/controllers/test_departments_health.py
  • tests/unit/api/controllers/test_memory_admin.py
  • tests/unit/providers/test_cost_recording_chokepoint.py
  • tests/conftest.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceed api.ws_frame_timeout_seconds (default 30s) without sending a frame. Wraps socket.receive_text() in asyncio.wait_for(...).

WebSocket revalidation sliding window: persistence-backend failures during periodic revalidation are tracked via _SlidingWindowRateLimiter (api.ws_revalidation_window_seconds default 60s, api.ws_revalidation_max_failures default 5) instead of a reset-on-success streak counter. Once the window saturates, the socket closes with code 4011.

Files:

  • src/synthorg/api/state.py
  • src/synthorg/api/controllers/requests.py
  • src/synthorg/api/state_services.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:18:35.257Z
Learning: Every implementation plan must be presented to the user for accept/deny before coding starts (MANDATORY Planning rule).
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:18:35.257Z
Learning: At every phase of planning and implementation, be critical and actively look for ways to improve the design in the spirit of robustness, correctness, simplicity, and future-proofing. Surface improvements as suggestions, not silent changes; the user decides.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:18:35.257Z
Learning: Prioritize issues by dependency order, not priority labels; unblocked dependencies come first (MANDATORY Planning rule).
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:18:35.257Z
Learning: After finishing an issue implementation, always create a feature branch (`<type>/<slug>`), commit, and push; do NOT create a PR automatically. Do NOT leave work uncommitted on main; branch, commit, push immediately after finishing (Post-Implementation MANDATORY rule).
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:18:35.257Z
Learning: NEVER create a PR directly (gh pr create is blocked by hookify). ALWAYS use `/pre-pr-review` to create PRs; it runs automated checks + review agents + fixes before creating the PR. For trivial/docs-only changes: `/pre-pr-review quick` skips agents. After the PR exists, use `/aurelio-review-pr` for external reviewer feedback. Fix everything valid, never skip (Pre-PR Review MANDATORY rule).
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:18:35.257Z
Learning: Git commits: `<type>: <description>`. Types: feat, fix, refactor, docs, test, chore, perf, ci. Enforced by commitizen (commit-msg hook). Signed commits required on `main` and all other refs via branch protection. Exception: GitHub App-signed commits from `synthorg-repo-bot` via Git Data API also satisfy `required_signatures`.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:18:35.257Z
Learning: Branches: `<type>/<slug>` from main.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:18:35.257Z
Learning: Never use `cd` in Bash commands; the working directory is already set to the project root. Use absolute paths or run commands directly. Exception: `bash -c "cd <dir> && <cmd>"` is safe (runs in a child process). Use this for tools without a `-C` flag, e.g. `bash -c "cd web && npm install"`.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:18:35.257Z
Learning: Never use Bash to write or modify files; use the Write or Edit tools. Do not use `cat >`, `cat << EOF`, `echo >`, `echo >>`, `sed -i`, `python -c "open(...).write(...)"`', or `tee` to create/modify files (read-only/inspection uses like piping to stdout are fine). This applies to all files and all subagents.
🪛 LanguageTool
CLAUDE.md

[style] ~213-~213: Consider using “who” when you are referring to a person instead of an object.
Context: ...r__is shadowed. Use a context manager that wraps directsetattr+try/finally d...

(THAT_WHO)

🔇 Additional comments (2)
tests/unit/api/controllers/test_memory_admin.py (1)

231-265: Good fix for the LazyProxy logger-spy flake.

This direct setattr + finally: del proxy.warning pattern avoids caching a stale bound method on the proxy and keeps later capture_logs() assertions isolated. Based on learnings, the repo now treats monkeypatch.setattr(module.logger, "<level>", spy) on a BoundLoggerLazyProxy as a test-isolation antipattern.

pyproject.toml (1)

352-362: Add routing infrastructure before applying @pytest.mark.serial to any tests.

Line 360 declares the serial marker with a promise to route tests to a no-xdist suite, but no such routing exists in conftest.py, CI workflows, or test runner scripts. Currently, no tests use the marker, so this is not blocking. However, before any test is marked @pytest.mark.serial, you must implement one of the following:

  • A conftest fixture or pytest hook that deselects serial-marked tests from the default pytest tests/ run
  • A separate CI job or runner script that explicitly excludes -n (xdist) for serial tests
  • A wrapper that splits serial and non-serial test suites into separate invocations

Document the chosen approach so developers know how to correctly apply the marker.

@Aureliolo Aureliolo merged commit 8d258dd into main May 3, 2026
78 checks passed
@Aureliolo Aureliolo deleted the fix/test-isolation-hunt branch May 3, 2026 08:33
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 3, 2026 08:33 — with GitHub Actions Inactive
Aureliolo pushed a commit that referenced this pull request May 3, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Frontend and UX polishing improves user interface responsiveness and
visual consistency.
- API hygiene and validation enhancements provide smoother and more
reliable interactions.

### What's new
- Introduced typed-boundary helpers enabling better type safety and
parse_typed workflows.
- Added codebase-audit skill prompt tuning for improved project
auditing.

### Under the hood
- Eliminated flaky tests caused by module-level state for more stable
test outcomes.
- Unified image tag management under CLI and Renovate for consistent
dependency updates.
- Added cross-PR file-overlap analysis to the review dependency pull
request skill.
- Updated multiple dependencies including Python, Web, CLI, and
container libraries.
- Improved CI tooling and lock file maintenance for better build
reliability.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


##
[0.7.8](v0.7.7...v0.7.8)
(2026-05-03)


### Features

* **api:** typed-boundary helper + codebase-audit skill prompt tuning
([#1712](#1712))
([40ee65b](40ee65b))
* **boundary:** RFC
[#1711](#1711) Phases 2 + 3
— typed boundaries via parse_typed
([#1720](#1720))
([7b9f409](7b9f409))


### Bug Fixes

* **api:** audit cleanup B -- API hygiene & validation
([#1719](#1719))
([3d790d9](3d790d9))
* audit cleanup C - persistence, concurrency & data integrity
([#1708](#1708))
([#1717](#1717))
([bcce097](bcce097))
* **test:** exterminate xdist-flaky tests with module-level state
([#1713](#1713))
([#1721](#1721))
([8d258dd](8d258dd))
* **web:** audit cleanup E -- frontend & UX polish
([#1710](#1710))
([#1718](#1718))
([3a3591a](3a3591a))


### Refactoring

* **cli:** single source of truth for DHI image tags + Renovate manager
([#1723](#1723))
([57980a2](57980a2))


### Documentation

* audit cleanup D -- public-facing & docs sync
([#1709](#1709))
([#1715](#1715))
([ade03b7](ade03b7))


### Tests

* **engine:** make TestDrainTimeout deterministic + preserve subclass
type in [@Ontology](https://github.com/ontology)_entity
([#1729](#1729))
([b00fb05](b00fb05))


### CI/CD

* Update CI tool dependencies
([#1703](#1703))
([355a9ff](355a9ff))


### Maintenance

* add cross-PR file-overlap analysis to review-dep-pr skill
([#1722](#1722))
([3861d8a](3861d8a))
* **ci:** unify apko-version under workflow env so Renovate manages it
everywhere ([#1724](#1724))
([9c0a7fd](9c0a7fd))
* consolidate DHI image-pin custom regex managers
([#1726](#1726))
([b8b0cba](b8b0cba))
* **deps:** update dependency chainguard-dev/melange to v0.50.4
([#1701](#1701))
([8cbf83a](8cbf83a))
* Lock file maintenance
([#1705](#1705))
([414cfea](414cfea))
* Lock file maintenance
([#1727](#1727))
([5cb1212](5cb1212))
* Update CLI dependencies
([#1702](#1702))
([9fb57b9](9fb57b9))
* Update Container dependencies
([#1698](#1698))
([6d24fd6](6d24fd6))
* Update dependency @eslint-react/eslint-plugin to v5
([#1704](#1704))
([1cb1294](1cb1294))
* Update Python dependencies
([#1699](#1699))
([8e7af3a](8e7af3a))
* Update Python dependencies to v4.15.0
([#1725](#1725))
([69164c8](69164c8))
* Update Web dependencies
([#1700](#1700))
([715300d](715300d))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test isolation hunt: exterminate xdist-flaky tests with module-level state

1 participant