test: replace test placeholders with real subsystem wiring by Aureliolo · Pull Request #1845 · Aureliolo/synthorg

Aureliolo · 2026-05-09T19:50:50Z

Closes #1837.

Two tests carried pytest.skip / model_copy workarounds with TODO comments. Both now exercise their real subsystems.

What changed

Sub-task A — real LLM provider e2e (`tests/e2e/test_single_agent_e2e.py`)

TestRealLLMIntegration.test_real_provider_text_completion no longer unconditionally calls pytest.skip("Real LLM provider integration not yet wired"). It now constructs a real LiteLLMDriver with a minimal ProviderConfig and runs AgentEngine.run end-to-end against the configured model.

Env-var gates (skip if any missing, CI-safe by default):

Var	Purpose
`REAL_LLM_TEST=1`	Class-level master switch (preserved)
`REAL_LLM_MODEL`	Model id (e.g. `llama3.2:1b`, `gpt-5-mini`)
`REAL_LLM_PROVIDER`	LiteLLM routing key (`openai`, `anthropic`, `ollama`, …)
`REAL_LLM_API_KEY`	Provider API key — required for hosted providers
`REAL_LLM_BASE_URL`	Endpoint URL — required for local providers (Ollama, LM Studio, vLLM)

At least one of REAL_LLM_API_KEY or REAL_LLM_BASE_URL must be set; both unset still skips cleanly. Empty-string env vars are normalised to None so ProviderConfig.NotBlankStr validation accepts them.

Sub-task B — ApprovalStore wiring in meta-cycle test (`tests/integration/meta/test_meta_cycle.py`)

test_proposal_rollout_succeeds no longer mutates the proposal via proposal.model_copy(update={"status": APPROVED, ...}) in isolation. The approval round-trips through the real ApprovalStore:

run_cycle triggers ApprovalGateGuard.evaluate, which registers an ApprovalItem with deterministic id uuid5(NAMESPACE_URL, "proposal:<id>").
The test fetches that item via approval_store.get(approval_id) and asserts is not None + status == PENDING.
The test approves via approval_store.save_if_pending(...) — same first-writer-wins path as synthorg.api.controllers.approvals._decide and synthorg.meta.mcp.handlers.approvals._decide.
The proposal handed to execute_rollout carries decision metadata (decided_at / decided_by / decision_reason) sourced from the store-resident ApprovalItem, not from a free-standing test mutation.

A regression in the guard (different id derivation, failure to register, store concurrency bug) now surfaces as a failed assert item is not None rather than a silently-bypassed flow. Uses in-memory ApprovalStore() (no persistence wiring) — persistence parity is covered by tests/conformance/persistence/.

Test plan

uv run ruff check src/ tests/ --fix — clean
uv run ruff format src/ tests/ — clean (3689 unchanged)
uv run mypy src/ tests/ — Success: no issues found in 3689 source files
uv run python -m pytest tests/integration/meta/ tests/e2e/ tests/unit/api/ -n 0 — 3253 passed, 4 skipped (one xdist scheduler crash on the full unit suite was unrelated worker churn — affected-module pre-push gate stayed clean)
Real LLM smoke test ran end-to-end against a local Ollama with llama3.2:1b (REAL_LLM_TEST=1, REAL_LLM_PROVIDER=ollama, REAL_LLM_BASE_URL=http://localhost:11434, no API key) — passed in 31s on the first run, 10s on the warm re-run after a docstring tweak.
CI-safe skip path verified: with all env vars unset and just REAL_LLM_TEST=1 set, the test skips with the precise method-level reason instead of running.

Review coverage

Code-reviewer pass on the initial two commits returned no findings.
Code-reviewer pass on the local-provider extension (commit 3) flagged a misleading docstring claim ("API key wins") — fixed in commit 4 with an accurate description (api_key for auth, base_url as endpoint, both forwarded together).

Commits

137ff3f — wire real LiteLLMDriver in single-agent e2e test
5694937 — route meta-cycle proposal approval through real ApprovalStore
8bcd1b5 — support local LLM providers (api_key optional, REAL_LLM_BASE_URL)
ca6ab9a — clarify auth-config docstring + comment in real-LLM test

Make REAL_LLM_API_KEY optional so the test can target local providers (Ollama, LM Studio, vLLM) reached via REAL_LLM_BASE_URL. At least one of the two must be set; both unset still skips cleanly with a documented reason.

github-actions · 2026-05-09T19:51:03Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

coderabbitai · 2026-05-09T19:51:05Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 776c5367-f059-4021-b102-289c13441d94

📥 Commits

Reviewing files that changed from the base of the PR and between ca6ab9a and e64d1df.

📒 Files selected for processing (2)

tests/e2e/test_single_agent_e2e.py
tests/integration/meta/test_meta_cycle.py

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Test (Python 3.14)
GitHub Check: Analyze (python)
GitHub Check: Analyze (javascript-typescript)

🧰 Additional context used

📓 Path-based instructions (2)

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: use @pytest.mark.{unit,integration,e2e,slow}. Async tests use auto. Global timeout 30s. Coverage 80% minimum.
xdist: use -n 8 --dist=loadfile (auto-applied via pyproject addopts; loadfile prevents 3.14+ Windows ProactorEventLoop leak).
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Mock-spec: every Mock declares spec=ConcreteClass; baseline at scripts/mock_spec_baseline.txt.
FakeClock: import from tests._shared.fake_clock; inject via clock= parameter.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Never skip/xfail for flaky tests; fix fundamentally.
Use asyncio.Event().wait() instead of sleep(large) to avoid flaky tests.

Files:

tests/integration/meta/test_meta_cycle.py
tests/e2e/test_single_agent_e2e.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/integration/meta/test_meta_cycle.py
tests/e2e/test_single_agent_e2e.py

{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic naming: NEVER use real vendor names in project code/tests. Use example-provider, test-provider, example-{large,medium,small}-001. Allowed only in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/.

Files:

tests/integration/meta/test_meta_cycle.py
tests/e2e/test_single_agent_e2e.py

🧠 Learnings (1)

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

tests/integration/meta/test_meta_cycle.py
tests/e2e/test_single_agent_e2e.py

🔇 Additional comments (6)

tests/e2e/test_single_agent_e2e.py (3)

17-22: LGTM!

Import additions are appropriate for constructing the real provider configuration and driver.

386-392: LGTM!

Docstring correctly uses vendor-agnostic phrasing ("configured provider driver", "local / self-hosted providers"), addressing the prior review feedback.

395-461: LGTM!

The test implementation correctly:

Normalizes empty-string env vars to None (lines 412-413).

Requires at least one of api_key or base_url (lines 414-418).

Constructs a minimal ProviderConfig and LiteLLMDriver.

Uses defensive assertions (>= 0 for cost to accommodate zero-cost local providers).

Ties cost record count to actual turns completed.

The vendor-agnostic wording in the docstring and skip messages is compliant with coding guidelines.

tests/integration/meta/test_meta_cycle.py (3)

7-8: LGTM!

Import additions support deterministic approval ID derivation (uuid5, NAMESPACE_URL) and typed approval handling (ApprovalStatus, NotBlankStr).

Also applies to: 12-13

164-195: LGTM!

The approval flow correctly:

Uses clock.now() from the shared FakeClock for deterministic timestamps.

Exercises save_if_pending (the same first-writer-wins path as API/MCP handlers).

Sources decision metadata from the store-resident ApprovalItem for the proposal.

Clearly documents why the proposal model_copy is mechanical (no adapter exists).

This ensures regressions in the store's concurrency model or approval registration surface as explicit test failures.

150-156: The uuid5 derivation pattern is correct and matches the guard implementation.

The test's uuid5(NAMESPACE_URL, f"proposal:{proposal.id}") is equivalent to the guard's uuid5(NAMESPACE_URL, f"proposal:{str(proposal.id)}") (guard line 101 converts proposal.id to string). Both produce identical hashes, so the test will catch any regression in the guard's approval ID generation as intended.

Walkthrough

Two integration test placeholders are converted to real subsystem flows. The e2e test's real LLM provider integration now reads provider configuration from environment variables, instantiates LiteLLMDriver, runs AgentEngine, and validates cost tracking and termination behavior. The meta-cycle test's approval flow is rewired to use the real ApprovalStore: it computes a deterministic approval id, fetches the pending approval item, approves it via save_if_pending, and sources the decision metadata from the stored approval before executing the rollout, eliminating the prior direct proposal mutation approach.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the primary change: replacing test placeholders with real subsystem wiring, which matches the changeset's core objective.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, detailing the two sub-tasks, environment variables, test execution results, and commit structure.
Linked Issues check	✅ Passed	All objectives from issue `#1837` are met: the real-LLM provider e2e test now exercises the actual provider integration path with environment-variable gating [`#1837`], and the meta-cycle test routes approval through the real ApprovalStore instead of isolated model_copy mutation [`#1837`].
Out of Scope Changes check	✅ Passed	All changes are scoped to the two target test files and directly address the removal of placeholders and TODO comments specified in issue `#1837`; no extraneous changes detected.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 40.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request implements end-to-end smoke tests for real LLM providers using LiteLLMDriver and enhances integration tests for the SelfImprovementService by integrating a real ApprovalStore workflow. Feedback focuses on improving test consistency by sharing a FakeClock instance between components to ensure synchronized time-dependent logic and timestamps.

gemini-code-assist · 2026-05-09T19:52:24Z

+        approval_store = ApprovalStore()
        svc = SelfImprovementService(
            config=SelfImprovementConfig(
                enabled=True,
                config_tuning_enabled=True,
            ),
            clock=FakeClock(),
            snapshot_builder=snapshot_builder,
-            approval_store=ApprovalStore(),
+            approval_store=approval_store,
        )


The ApprovalStore and SelfImprovementService are initialized with different clock sources. While SelfImprovementService is explicitly given a FakeClock, ApprovalStore defaults to SystemClock. For integration tests using a fake clock, it is safer and more consistent to share the same clock instance across all components that support it. This ensures that any time-dependent logic (like expiration checks in the store) remains synchronized with the test's simulated time.

Suggested change

approval_store = ApprovalStore()

svc = SelfImprovementService(

config=SelfImprovementConfig(

enabled=True,

config_tuning_enabled=True,

),

clock=FakeClock(),

snapshot_builder=snapshot_builder,

approval_store=ApprovalStore(),

approval_store=approval_store,

)

clock = FakeClock()

approval_store = ApprovalStore(clock=clock)

svc = SelfImprovementService(

config=SelfImprovementConfig(

enabled=True,

config_tuning_enabled=True,

),

clock=clock,

snapshot_builder=snapshot_builder,

approval_store=approval_store,

)

gemini-code-assist · 2026-05-09T19:52:24Z

+        # first-writer-wins path the API and MCP approval handlers
+        # take, so a regression in the store's concurrency model also
+        # surfaces here.
+        decided_at = datetime.now(UTC)


Since the test is being updated to use a shared clock variable (per the suggestion for lines 131-140), the decision timestamp should also be sourced from that clock to maintain consistency within the test scenario.

Suggested change

decided_at = datetime.now(UTC)

decided_at = clock.now()

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/e2e/test_single_agent_e2e.py`:
- Around line 386-392: The test docstrings/messages currently use vendor-branded
names ("LiteLLM", "Ollama", "LM Studio", "vLLM"); replace those with
vendor-agnostic phrases such as "configured provider driver", "local provider
endpoint", or "local/self-hosted provider" in the text that explains
REAL_LLM_MODEL, REAL_LLM_PROVIDER, REAL_LLM_API_KEY, and REAL_LLM_BASE_URL and
in the other occurrences near those same docstrings (the nearby strings
referenced in the comment). Ensure the meaning stays the same (auth via api_key
and endpoint via base_url) while removing specific vendor names.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6e660204-71a0-4911-8075-1b3897b9daa9

📥 Commits

Reviewing files that changed from the base of the PR and between 099b871 and ca6ab9a.

📒 Files selected for processing (2)

tests/e2e/test_single_agent_e2e.py
tests/integration/meta/test_meta_cycle.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Type Check
GitHub Check: Test (Python 3.14)
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Analyze (go)
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (2)

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: use @pytest.mark.{unit,integration,e2e,slow}. Async tests use auto. Global timeout 30s. Coverage 80% minimum.
xdist: use -n 8 --dist=loadfile (auto-applied via pyproject addopts; loadfile prevents 3.14+ Windows ProactorEventLoop leak).
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back.
Mock-spec: every Mock declares spec=ConcreteClass; baseline at scripts/mock_spec_baseline.txt.
FakeClock: import from tests._shared.fake_clock; inject via clock= parameter.
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Never skip/xfail for flaky tests; fix fundamentally.
Use asyncio.Event().wait() instead of sleep(large) to avoid flaky tests.

Files:

tests/e2e/test_single_agent_e2e.py
tests/integration/meta/test_meta_cycle.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/e2e/test_single_agent_e2e.py
tests/integration/meta/test_meta_cycle.py

{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic naming: NEVER use real vendor names in project code/tests. Use example-provider, test-provider, example-{large,medium,small}-001. Allowed only in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/.

Files:

tests/e2e/test_single_agent_e2e.py
tests/integration/meta/test_meta_cycle.py

🧠 Learnings (1)

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

tests/e2e/test_single_agent_e2e.py
tests/integration/meta/test_meta_cycle.py

🔇 Additional comments (1)

tests/integration/meta/test_meta_cycle.py (1)

131-195: Nice integration hardening of the approval path.

This now validates the real ApprovalStore lifecycle (register → fetch pending → save_if_pending → rollout with persisted decision metadata), which gives strong regression signal for guard/store coupling.

codecov · 2026-05-09T20:00:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.78%. Comparing base (099b871) to head (e64d1df).
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1845      +/-   ##
==========================================
- Coverage   84.78%   84.78%   -0.01%     
==========================================
  Files        1800     1800              
  Lines      104864   104864              
  Branches     9202     9202              
==========================================
- Hits        88910    88904       -6     
- Misses      13716    13722       +6     
  Partials     2238     2238

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- tests/integration/meta/test_meta_cycle.py: share a single FakeClock between ApprovalStore and SelfImprovementService and source decided_at from clock.now() so time-dependent paths in the store (expiration, decision timestamps) stay in lockstep with the service's simulated time (Gemini #3213807073, #3213807075). - tests/e2e/test_single_agent_e2e.py: drop vendor-branded names (LiteLLM, Ollama, LM Studio, vLLM) from docstrings and skip messages in TestRealLLMIntegration; use vendor-agnostic phrasing (configured provider driver, local / self-hosted providers, provider routing key) per CLAUDE.md vendor-agnostic naming (CodeRabbit #3213808360). The LiteLLMDriver class identifier is retained: it is the project-internal wrapper around the third-party litellm package, which CLAUDE.md exempts under third-party imports.

## Summary `scripts/check_setting_to_startup_trace.py` froze 8 ghost-wired settings: 7 × `backup.*` and `security.timeout_check_interval_seconds`. Both owning services were either factory-gated to `None` in default config (`BackupService`) or bootstrapped with a hardcoded interval and never re-read at runtime (`ApprovalTimeoutScheduler`). The registered settings had no live consumer at boot. This branch wires both services through the canonical DB > env > YAML > default chain and regenerates the baseline to header-only. ## Changes **Phase 1 — `BackupService`** - Drop the `if not backup_config.enabled: return None` early return in `build_backup_service`. The factory always returns a real `BackupService`. `BackupService.start()` already gates internally on `self._config.enabled`, so a deployment with `backup.enabled=false` still avoids scheduler ticks. Adds an INFO log on the dormant path so operators can confirm the deliberate state from the boot log. **Phase 2 — `ApprovalTimeoutScheduler`** - New `_apply_security_timeout_interval` helper in `lifecycle_helpers.py`, called after `_apply_bridge_config`, resolves `security.timeout_check_interval_seconds` via `ConfigResolver` and calls `scheduler.reschedule(...)`. - New `SecurityTimeoutSettingsSubscriber` (mirrors `BackupSettingsSubscriber._reschedule`) handles live operator changes via the dispatcher. - `_build_settings_dispatcher` + `BuildDispatcherFn` Protocol + `auto_wire_settings` extended with the new optional `approval_timeout_scheduler` parameter; scheduler construction in `app.py` moved earlier so the dispatcher can wire the matching subscriber against the same instance. **Phase 3 — Baseline** - `scripts/setting_to_startup_trace_baseline.txt` regenerated to header-only (`uv run python scripts/check_setting_to_startup_trace.py --update-baseline`). - Smoke test `test_real_repo_violations_match_expected` updated to assert empty-violation set as the new ground truth. - `scripts/loop_bound_init_baseline.txt` line 74 → 75 to track the shifted import in `backup/service.py`. **Pre-PR review fixes (commit 4)** - Drop now-stale ghost-service examples in `docs/reference/configuration-precedence.md`; replace with a generic "fixing a ghost-wired service" guide referencing both as end-to-end examples. - Rewrite two app.py comments to explain the current architectural reason instead of framing the change relative to the prior site (CLAUDE.md no-migration-framing). - Bind exception in `build_backup_service` failure path; log via `safe_error_description` + `error_type` per SEC-1 redaction. - Bind float-parse failure in `SecurityTimeoutSettingsSubscriber`; add `error_type` so operators can distinguish a non-numeric value from a None value. - Document `BACKUP_SCHEDULER_STARTED` as the started-branch's observable signal in `BackupService.start`'s docstring. ## Test Plan - 16 new tests across 4 files: factory (3), helper (5), subscriber (8); plus updated lint smoke test. - Full unit suite: 29,463 passed (50 skipped Windows / opt-in markers). - Focused integration: 55 passed (3 skipped — Docker unavailable). - Gates clean: `check_setting_to_startup_trace.py`, `check_no_loop_bound_init.py`, `check_logger_exception_str_exc.py`, `check_no_magic_numbers.py`, `check_mock_spec.py`. - ruff + mypy strict + ruff format: clean. ## Review coverage `/pre-pr-review` ran 17 specialized agents in parallel (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, resilience-audit, conventions-enforcer, security-reviewer, test-quality-reviewer, async-concurrency-reviewer, comment-analyzer, docs-consistency, comment-quality-rot, api-contract-drift, plus 5 mini-pass agents from the codebase audit: missing-logger, missing-event-constants, missing-state-transition-log, unwired-settings, race-conditions). 6 valid findings retained and fixed; 1 false-positive cluster (PEP 758 `except A, B:` flagged as Python 2 syntax) discarded with rationale (CLAUDE.md explicitly permits this form). ## Notes - No PR-tracked GitHub issue (per /pre-pr-review prompt: user chose to proceed without). - Rebased onto `origin/main` to pull in `#1845` (unrelated test refactor) before pushing.

## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - Improved error logging and Prometheus instrumentation provide better system monitoring. - Eliminated race conditions in CI tagging for more reliable development releases. - Fixed critical configuration access and kill-switch bugs to enhance system stability. - Enhanced client experience with retry-after headers and better websocket reconnect behavior. ### What's new - Introduced composite indexes and cursor pagination for faster data queries. - Added server-sent events rate limiting and Ollama input sanitization for improved security. ### Under the hood - Centralized workflow error mappings to standardize error handling. - Refactored API lifecycle fallback to use a configuration snapshot for consistency. - Tightened startup settings baseline and reduced controller error baseline to zero. - Replaced flaky contributor-assistant GitHub action with a custom stable step. - Consolidated Renovate dependency groups to avoid update conflicts. - Upgraded in-toto-golang dependency to fix security vulnerabilities and dropped unnecessary CVE waivers. - Extensive lock file maintenance and multiple infrastructure and Python dependency updates.  :robot: I have created a release *beep* *boop* --- ## [0.8.2](v0.8.1...v0.8.2) (2026-05-10) ### Features * close audit gaps in error logging and Prometheus instrumentation ([#1821](#1821)) ([ef00fdc](ef00fdc)) ### Bug Fixes * **ci:** eliminate dev-release tag-vs-downstream race + CI hygiene audit ([#1827](#1827)) ([b7b9a59](b7b9a59)) * **config:** close 6 settings reachability + kill-switch gaps ([#1798](#1798)) ([410cb3b](410cb3b)) * correctness / safety fixes from 2026-05-05 audit (Wave 28) ([#1823](#1823)) ([d01e624](d01e624)) ### Performance * composite indexes + cursor pagination + clock seam + SSE rate-limit + Ollama sanitization + retry-after web client + WS reconnect jitter ([#1822](#1822)) ([d1faf86](d1faf86)) ### Refactoring * **api:** move activities lifecycle-cap fallback to ApiBridgeConfig snapshot ([#1840](#1840)) ([7a56e9c](7a56e9c)) * centralise workflow error mapping and shared error codes ([#1778](#1778) sub-tasks A + E) ([#1843](#1843)) ([11132cd](11132cd)) * drive controller-error baseline to zero ([#1778](#1778) sub-task A tail) ([#1846](#1846)) ([e96ae20](e96ae20)) * slim CLAUDE.md, port pr-review-toolkit agents, sync .opencode parity ([#1833](#1833)) ([e6372b8](e6372b8)) * tighten settings → startup-trace baseline (8 → 0) ([#1847](#1847)) ([3376ee2](3376ee2)) ### Documentation * fix CLAUDE.md inaccuracies and drop drift-prone counts ([#1844](#1844)) ([371925f](371925f)) ### Tests * replace test placeholders with real subsystem wiring ([#1845](#1845)) ([ddbb666](ddbb666)) ### CI/CD * **cla:** replace flaky contributor-assistant action with custom read-path step ([#1819](#1819)) ([11aeafe](11aeafe)) * tidy dev-release notes + stagger renovate lockfile day ([#1824](#1824)) ([ec746a9](ec746a9)) ### Maintenance * cleanup roundup, sub-tasks a/c/d/g/h/j/l/m of [#1781](#1781) ([#1838](#1838)) ([099b871](099b871)) * close remaining 5 sub-tasks of [#1781](#1781) (b/e/f/i/k) ([#1852](#1852)) ([59cf0b2](59cf0b2)) * collapse Renovate dep groups into Python / Web / Infrastructure to remove cross-PR overlap ([#1813](#1813)) ([4cbd857](4cbd857)) * **deps,security:** bump in-toto-golang v0.11.0 + drop two patched CVE waivers ([#1851](#1851)) ([0b8b5bb](0b8b5bb)) * disable Renovate vulnerabilityAlerts so security flows into normal updates ([#1834](#1834)) ([6b7d15f](6b7d15f)) * Lock file maintenance ([#1820](#1820)) ([ccbad73](ccbad73)) * Lock file maintenance ([#1842](#1842)) ([13b68a5](13b68a5)) * Lock file maintenance ([#1853](#1853)) ([db6650b](db6650b)) * Update dhi.io/nats:2.14-debian13 Docker digest to eb768bf ([#1841](#1841)) ([37f84fc](37f84fc)) * Update Infrastructure dependencies ([#1815](#1815)) ([75b12fe](75b12fe)) * Update Infrastructure dependencies ([#1831](#1831)) ([3f3c50b](3f3c50b)) * Update Python dependencies ([#1817](#1817)) ([e11332f](e11332f)) * Update Python dependencies ([#1832](#1832)) ([4515c8e](4515c8e)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Aureliolo added 4 commits May 9, 2026 21:05

test: wire real LiteLLM provider in single-agent e2e test

137ff3f

test: route meta-cycle proposal approval through real ApprovalStore

5694937

test: support local LLM providers in real-LLM e2e test

8bcd1b5

Make REAL_LLM_API_KEY optional so the test can target local providers (Ollama, LM Studio, vLLM) reached via REAL_LLM_BASE_URL. At least one of the two must be set; both unset still skips cleanly with a documented reason.

test: clarify auth-config docstring + comment in real-LLM test

ca6ab9a

Aureliolo temporarily deployed to atlas May 9, 2026 19:51 — with GitHub Actions Inactive

gemini-code-assist Bot reviewed May 9, 2026

View reviewed changes

coderabbitai Bot requested changes May 9, 2026

View reviewed changes

Comment thread tests/e2e/test_single_agent_e2e.py Outdated

Aureliolo temporarily deployed to atlas May 9, 2026 20:05 — with GitHub Actions Inactive

coderabbitai Bot approved these changes May 9, 2026

View reviewed changes

Aureliolo merged commit ddbb666 into main May 9, 2026
43 checks passed

Aureliolo deleted the chore/issue-1837-test-placeholders branch May 9, 2026 20:23

synthorg-repo-bot Bot mentioned this pull request May 9, 2026

chore(main): release 0.8.2 #1814

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: replace test placeholders with real subsystem wiring#1845

test: replace test placeholders with real subsystem wiring#1845
Aureliolo merged 5 commits into
mainfrom
chore/issue-1837-test-placeholders

Aureliolo commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 9, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 9, 2026

Uh oh!

gemini-code-assist Bot May 9, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

codecov Bot commented May 9, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Aureliolo commented May 9, 2026

What changed

Sub-task A — real LLM provider e2e (tests/e2e/test_single_agent_e2e.py)

Sub-task B — ApprovalStore wiring in meta-cycle test (tests/integration/meta/test_meta_cycle.py)

Test plan

Review coverage

Commits

Uh oh!

github-actions Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependency Review

Scanned Files

Uh oh!

coderabbitai Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Sub-task A — real LLM provider e2e (`tests/e2e/test_single_agent_e2e.py`)

Sub-task B — ApprovalStore wiring in meta-cycle test (`tests/integration/meta/test_meta_cycle.py`)

github-actions Bot commented May 9, 2026 •

edited

Loading

coderabbitai Bot commented May 9, 2026 •

edited

Loading

codecov Bot commented May 9, 2026 •

edited

Loading