Skip to content

feat: cost as a first-class dial (forecast gate, hard ceiling, Pareto)#2029

Merged
Aureliolo merged 30 commits into
mainfrom
feat/1982-cost-dial
May 21, 2026
Merged

feat: cost as a first-class dial (forecast gate, hard ceiling, Pareto)#2029
Aureliolo merged 30 commits into
mainfrom
feat/1982-cost-dial

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Makes cost a prospective, operator-facing control rather than a passive ledger (child of EPIC #1979).

  • Pre-flight forecast gateCostForecaster (hybrid static-prior + Bayesian-shrinkage estimate with an uncertainty band) feeds a ForecastGate at the work-entry seam. When forecast_required is set, a brief cannot dispatch without an operator-approved Forecast row; missing/pending yields a fresh pending row and a 402. New cost_forecasts table (dual-backend, conformance-tested) with a pending → approved | rejected | superseded state machine and a partial-unique-pending index.
  • Hard real-money kill-switch — the in-loop BudgetChecker raises RunHardCeilingExceededError (subclass of BudgetExhaustedError) the moment accumulated cost meets the per-task hard_ceiling (global run_hard_ceiling fallback). The engine routes to TerminationReason.PARKED via ApprovalGate.park_context, stamps a HaltContext onto the forecast row, and the operator raises the ceiling (raise_ceiling, rejected by RunHardCeilingTooLowError if it does not clear accumulated cost) to resume. The approved forecast_id + ceiling_amount are propagated onto the Task in the intake phase so the checker enforces the per-brief ceiling and the resume banner can render.
  • Cost / quality Pareto viewParetoAnalyzer + a contracted BenchmarkScoreProvider (calibrated stub pending Golden-company benchmark (eval spine) #1980) produce a frontier ranked by cost_saving_pct; every point carries a source field the dashboard surfaces so stub data is never mistaken for measured data.
  • DashboardBudgetForecastDialog, CostForecastApprovalCard, HardCeilingHaltedBanner, ParetoSection, and a budgetForecast Zustand store, wired into BudgetPage.

All four cost-dial symbols are ENFORCED in the ghost-wiring manifest.

Test plan

  • Unit: forecaster (cold-start vs warm blend, bounds), hard-ceiling enforcement + closure forecast_id propagation, park/halt-stamp degradation, Pareto frontier, domain-error hierarchy, model invariant validators, forecast-controller logic (503, raise-ceiling 422 + resume).
  • Conformance: cost_forecasts dual-backend (SQLite + Postgres) CRUD, state machine, same-currency invariant, halt_context round-trip, count().
  • Integration: tests/integration/test_cost_dial_e2e.py — full lifecycle (forecast → approve → dispatch → ceiling → PARKED → raise → resume → Pareto), validated locally on both backends.
  • Full mypy, ruff, web type-check/lint/tests, schema-drift, dto-sync, and every convention gate pass on push.

Review coverage

Pre-reviewed via /pre-pr-review (18 agents). 18 valid findings addressed (2 CRITICAL production-integration gaps — forecast_id + approved ceiling not reaching the Task; 5 MAJOR; 9 MEDIUM; 3 LOW); 4 false positives rejected with evidence. Two real e2e test bugs (un-awaited async handler, sync park_context double) caught and fixed.

Closes #1982

Aureliolo added 23 commits May 20, 2026 23:58
- align postgres cost_forecasts schema with sqlite (TEXT for ids/timestamps)
  so the schema-drift parity gate passes without a baseline change
- drop entity-mutation save log from cost_forecast repos (persistence-boundary
  gate forbids audit logs at the repo layer; service-layer log lands in #1982
  Phase 9 when the controllers are wired)
- replace hardcoded "USD" default in _enforcer_helpers with DEFAULT_CURRENCY
- annotate _assignments test helper return type
ForecastBudgetController exposes POST /budget/forecast, GET
/budget/forecasts/{id}, approve/reject/raise_ceiling decisions,
and GET /budget/pareto. AppState carries cost_forecaster,
cost_forecast_repo, pareto_analyzer, benchmark_provider, and
budget_config behind lock-protected swap_* methods mirroring the
existing worker / coordinator pattern; on_startup wires them
once persistence is connected. Three of the four PENDING
manifest lines flip to ENFORCED.
Adds web/src/pages/budget/ParetoSection.tsx + Storybook stories
covering stub / measured / empty / loading / unwired states.
Updates Task and BudgetConfig fixtures across the test + Storybook
suites for the new Task.hard_ceiling / Task.forecast_id fields and
the eight new BudgetConfig.forecast_* settings. Regenerates DTOs.
Extract _try_wire_cost_dial helper so a transient persistence-not-
yet-connected error during shared-app fixture boot does not poison
startup; controllers degrade to 503 when dependencies are absent.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request transforms cost management from a passive ledger into an active, operator-controlled dial. It introduces pre-flight forecasting to gate work entry, a hard ceiling mechanism to halt runaway costs mid-execution, and a Pareto-based analysis tool to help operators optimize cost versus quality. These changes provide granular control over spending and improve operational transparency.

Highlights

  • Pre-flight Forecast Gate: Introduced a ForecastGate that requires operator approval for briefs before dispatch, using a Bayesian-shrinkage estimation model to provide cost forecasts.
  • Hard Real-Money Ceiling: Implemented a per-task hard ceiling that halts execution mid-run if exceeded, allowing operators to raise the ceiling and resume via a new API endpoint.
  • Cost/Quality Pareto Analysis: Added a ParetoAnalyzer to surface cost-saving opportunities by downgrading models, supported by a StubBenchmarkScoreProvider for quality estimation.
  • Dashboard & API Integration: Wired new UI components (BudgetForecastDialog, HardCeilingHaltedBanner, ParetoSection) and API endpoints to manage forecasts and ceiling adjustments.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 20, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3b87062f-ffb1-4728-aaac-7a109a906e8c

📥 Commits

Reviewing files that changed from the base of the PR and between 40d76a3 and a0c612a.

📒 Files selected for processing (1)
  • tests/unit/engine/pipeline/test_forecast_gate.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Dashboard Test
  • GitHub Check: Test Unit
  • GitHub Check: Test E2E
  • GitHub Check: Test Integration
  • GitHub Check: Lighthouse Site
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Comments explain WHY only; no reviewer citations / issue back-refs / migration framing, enforced by check_no_review_origin_in_code.py + check_no_migration_framing.py

No from __future__ import annotations (3.14 has PEP 649); use PEP 758 except for except A, B: without parens unless binding

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
{src/**/*.py,tests/**/*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Clock seam: include clock: Clock | None = None parameter; tests inject FakeClock; services own _lifecycle_lock; timed-out stops mark unrestartable

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min

Test doubles ladder: FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary blocked by scripts/check_mock_spec.py (zero-tolerance)

FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript

Flaky tests: NEVER skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large)

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
{src,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic: NEVER use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001; allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
tests/**/*test*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: Read `docs/design/` page before implementing; deviations need approval per DESIGN_SPEC.md
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: Present every plan for accept/deny before coding
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: No region/currency/locale privileged; use metric units; British English per docs/reference/regional-defaults.md
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: Only `src/synthorg/persistence/` may import sqlite/psycopg or emit raw SQL; new repository protocols inherit from `persistence/_generics.py` generic categories per ADR-0001
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: Every convention PR ships its enforcement gate per docs/reference/convention-gates.md
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: Configuration precedence: DB > env > code default via `SettingsService`/`ConfigResolver` (Cat-1) or env > code default (Cat-2, `read_only_post_init`); Cat-3 bootstrap secrets pure env at boot site per docs/reference/configuration-precedence.md
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: Timeout/slow test failures = source-code regression; never edit `tests/baselines/unit_timing.json` or any `scripts/*_baseline.{txt,json}` / `scripts/_*_baseline.py` (PreToolUse-blocked); per-invocation bypass requires explicit user approval via `ALLOW_BASELINE_GROWTH=1 git commit`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: After issue: branch + commit + push (no auto-PR); use `/pre-pr-review` (`gh pr create` blocked by `scripts/check_no_pr_create.sh`); after PR use `/aurelio-review-pr` for external feedback; fix EVERYTHING valid, no deferring
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: xdist `-n 8 --dist=loadfile` auto-applied via pyproject `addopts` (`loadfile` prevents 3.14+Windows ProactorEventLoop leak)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: Commits: `<type>: <description>` (feat/fix/refactor/docs/test/chore/perf/ci); commitizen-enforced
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: Signed commits required on protected refs (GPG/SSH or GitHub App via `synthorg-repo-bot`)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: Branches: `<type>/<slug>` from main
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: Pre-commit/pre-push hooks: `.pre-commit-config.yaml`; tool-call gates: `.claude/settings.json` PreToolUse (`scripts/check_*.sh`/`.py`)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: Squash merge; PR body becomes squash commit; trailers (`Release-As`, `Closes `#N``) must be in PR body
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: GitHub queries: use `gh issue list` via Bash, NOT MCP `list_issues`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: After every squash merge → `/post-merge-cleanup`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:29:46.667Z
Learning: CLI is Docker-only (init/start/stop/status); features go in dashboard + REST API
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
🔇 Additional comments (1)
tests/unit/engine/pipeline/test_forecast_gate.py (1)

225-231: LGTM!

Also applies to: 247-252


</review_stack_artifact>

Walkthrough

The PR implements a full “cost dial” system: immutable forecast domain models, a CostForecaster, benchmark protocol and stub, ParetoAnalyzer, persisted Forecast repository implementations (SQLite/Postgres) and migrations, ForecastGate at entry, task stamping of forecast/ceiling, in-loop hard‑ceiling enforcement that raises and can park runs, engine parking/stamp handling, AppState/runtime/controller wiring, Litestar API endpoints for forecasts/pareto, web client types/UI/store/mocks, settings and observability events, and extensive unit/integration/conformance tests.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 20, 2026 22:07 — with GitHub Actions Inactive
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 20, 2026

Merging this PR will not alter performance

✅ 54 untouched benchmarks


Comparing feat/1982-cost-dial (a0c612a) with main (d2207e9)

Open in CodSpeed

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/synthorg/engine/pipeline/forecast_gate.py (1)

122-145: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reuse matching PENDING forecasts instead of minting a new pending row.

When a matching existing forecast is PENDING, this path still falls through to _issue_fresh_forecast(). That can trip the unique-pending constraint for the same brief_hash and return a persistence failure instead of a stable approval-required response.

Suggested fix
         if existing is not None and self._forecast_covers_brief(work_item, existing):
             if existing.decision is ForecastDecision.APPROVED:
@@
             if existing.decision is ForecastDecision.REJECTED:
@@
                 raise CostForecastRejectedError(
                     msg,
                     forecast_id=existing.forecast_id,
                     brief_hash=existing.brief_hash,
                 )
+            if existing.decision is ForecastDecision.PENDING:
+                self._log_approval_required(existing)
+                msg = (
+                    "Pre-flight cost forecast required: "
+                    f"estimated {existing.estimated_cost:.4f} {existing.currency} "
+                    "awaiting operator approval"
+                )
+                raise CostForecastApprovalRequiredError(
+                    msg,
+                    forecast_id=existing.forecast_id,
+                    brief_hash=existing.brief_hash,
+                    estimated_cost=existing.estimated_cost,
+                    currency=existing.currency,
+                )
 
         fresh = await self._issue_fresh_forecast(work_item)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/engine/pipeline/forecast_gate.py` around lines 122 - 145,
Existing PENDING forecasts that match the brief are currently ignored and a new
pending row is minted; modify the decision branch after
self._lookup_forecast(...) to handle ForecastDecision.PENDING by reusing the
existing forecast instead of calling _issue_fresh_forecast: when existing is not
None and _forecast_covers_brief(...) and existing.decision is
ForecastDecision.PENDING, call self._log_approval_required(existing) (same
logging used for newly issued pending forecasts) and return/exit the method so
the code does not fall through to _issue_fresh_forecast; preserve the current
APPROVED and REJECTED handling in _forecast_covers_brief path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/synthorg/engine/pipeline/forecast_gate.py`:
- Around line 122-145: Existing PENDING forecasts that match the brief are
currently ignored and a new pending row is minted; modify the decision branch
after self._lookup_forecast(...) to handle ForecastDecision.PENDING by reusing
the existing forecast instead of calling _issue_fresh_forecast: when existing is
not None and _forecast_covers_brief(...) and existing.decision is
ForecastDecision.PENDING, call self._log_approval_required(existing) (same
logging used for newly issued pending forecasts) and return/exit the method so
the code does not fall through to _issue_fresh_forecast; preserve the current
APPROVED and REJECTED handling in _forecast_covers_brief path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: f6312264-640b-45a8-8f36-0ae671b2712b

📥 Commits

Reviewing files that changed from the base of the PR and between 87c3e34 and 1ebff72.

📒 Files selected for processing (21)
  • src/synthorg/api/controllers/budget_forecast.py
  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/budget/benchmark_stub.py
  • src/synthorg/budget/forecast_models.py
  • src/synthorg/budget/forecaster.py
  • src/synthorg/budget/pareto.py
  • src/synthorg/engine/agent_engine_errors.py
  • src/synthorg/engine/pipeline/entry/boot.py
  • src/synthorg/engine/pipeline/forecast_gate.py
  • src/synthorg/persistence/postgres/cost_forecast_repo.py
  • src/synthorg/persistence/sqlite/cost_forecast_repo.py
  • tests/conformance/persistence/test_cost_forecast_repository.py
  • tests/unit/api/test_budget_forecast_controller.py
  • tests/unit/budget/test_forecast_model_invariants.py
  • tests/unit/budget/test_forecaster.py
  • tests/unit/budget/test_hard_ceiling.py
  • tests/unit/budget/test_pareto.py
  • tests/unit/engine/pipeline/test_forecast_gate.py
  • web/src/api/endpoints/budget.ts
  • web/src/mocks/handlers/budget.ts
  • web/src/stores/tasks.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: CodSpeed Web Pass
  • GitHub Check: Build Backend
  • GitHub Check: Lighthouse Site
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test Integration
  • GitHub Check: Dashboard Test
  • GitHub Check: Test E2E
  • GitHub Check: Test Unit
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (13)
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env only; no os.environ.get outside startup per docs/reference/configuration-precedence.md

No hardcoded numeric values; numerics live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py

No from __future__ import annotations in Python 3.14+; use PEP 758 except: except A, B: requires parens when binding

Type hints required on public functions; mypy strict mode; Google-style docstrings; line length 88; functions <50 lines; files <800 lines

Error classes must follow Error pattern and inherit from DomainError (never Exception/RuntimeError directly). Enforced by check_domain_error_hierarchy.py

Pydantic v2: use frozen + extra="forbid" on every frozen model; @computed_field for derived fields; NotBlankStr for identifiers. Per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore" boundaries. Enforced by check_frozen_model_extra_forbid.py

Args models required at every system boundary; use parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py

Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries

Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)

Clock seam: clock: Clock | None = None; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable

Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML

Use synthorg.observability.get_logger; variable always named 'logger'. Never use 'import logging' or print() in app code

Event names from observability.events. constants; stru...

Files:

  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/budget/forecast_models.py
  • src/synthorg/budget/benchmark_stub.py
  • src/synthorg/persistence/sqlite/cost_forecast_repo.py
  • src/synthorg/budget/forecaster.py
  • src/synthorg/engine/agent_engine_errors.py
  • src/synthorg/engine/pipeline/forecast_gate.py
  • src/synthorg/engine/pipeline/entry/boot.py
  • src/synthorg/budget/pareto.py
  • src/synthorg/api/controllers/budget_forecast.py
  • src/synthorg/persistence/postgres/cost_forecast_repo.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

API startup construction phase: synchronous service wiring in create_app body; on_startup wires persistence-dependent services. Ordering: agent_registry before auto_wire_meetings; tunnel_provider unconditional

API startup on_startup ordering: SettingsService auto-wire before WorkflowExecutionObserver; OntologyService after persistence.connect() via _wire_ontology_service

Cost-dial services wire via _try_wire_cost_dial after persistence connects (best-effort, idempotent); logs BUDGET_FORECAST_UNAVAILABLE if unavailable

Files:

  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/api/controllers/budget_forecast.py
src/synthorg/api/controllers/setup/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Setup completion: post_setup_reinit() propagates failures; settings_svc.set("api", "setup_complete", "true") only after clean reinit. Sequence serialized under COMPLETE_LOCK in src/synthorg/api/controllers/setup/agent_helpers.py

Files:

  • src/synthorg/api/controllers/setup/agent_helpers.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/budget/forecast_models.py
  • src/synthorg/budget/benchmark_stub.py
  • src/synthorg/persistence/sqlite/cost_forecast_repo.py
  • src/synthorg/budget/forecaster.py
  • src/synthorg/engine/agent_engine_errors.py
  • src/synthorg/engine/pipeline/forecast_gate.py
  • src/synthorg/engine/pipeline/entry/boot.py
  • src/synthorg/budget/pareto.py
  • src/synthorg/api/controllers/budget_forecast.py
  • src/synthorg/persistence/postgres/cost_forecast_repo.py
web/src/**/*.{js,jsx,ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{js,jsx,ts,tsx,mts}: Always use createLogger from @/lib/logger; never bare console.warn/console.error/console.debug in application code. Variable name must always be log. Only logger.ts itself may use bare console methods. Use log.debug() (DEV-only, stripped in production), log.warn(), log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go through sanitizeArg
Attacker-controlled fields inside structured objects must be wrapped in sanitizeForLog() before embedding in log calls
Error-code constants (MANDATORY): import ErrorCode and ErrorCategory from @/api/types/errors (re-exported from the generated web/src/api/types/error-codes.gen.ts). Discriminate on ErrorCode.<NAME>, never on raw integer literals.
Use @eslint-react/web-api-no-leaked-fetch to detect fetch() in effects without AbortController cleanup

Files:

  • web/src/api/endpoints/budget.ts
  • web/src/stores/tasks.ts
  • web/src/mocks/handlers/budget.ts
web/src/{api/endpoints,stores}/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Cursor pagination (MANDATORY): list endpoints must use opaque cursor-based paging via PaginationMeta. Stores must keep nextCursor + hasMore in state (not offset arithmetic) and early-return when !hasMore || !nextCursor. Display counts must come from data.length.

Files:

  • web/src/api/endpoints/budget.ts
  • web/src/stores/tasks.ts
web/src/api/endpoints/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

Health / readiness endpoints (MANDATORY): getLiveness() is always 200 while the process is alive; getReadiness() is 200 healthy / 503 unavailable (binary 'ok' | 'unavailable' outcome, no tri-state). Any new caller must handle the 503 path explicitly.

Files:

  • web/src/api/endpoints/budget.ts
web/src/**/*.{ts,tsx,mts}

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/**/*.{ts,tsx,mts}: Use @typescript-eslint/no-floating-promises to forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use @typescript-eslint/no-misused-promises (with checksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19 async event handlers stay allowed via the attributes: false exemption.

Files:

  • web/src/api/endpoints/budget.ts
  • web/src/stores/tasks.ts
  • web/src/mocks/handlers/budget.ts
web/src/stores/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/stores/**/*.ts: List reads (fetch*) must set error: string | null on the store instead of toasting
Test teardown (MANDATORY): any new store that schedules timers or attaches event listeners must expose an equivalent cleanup hook and register it in the global afterEach. The global afterEach in web/src/test-setup.tsx already calls useToastStore.getState().dismissAll(), cancelPendingPersist(), and useThemeStore.getState().teardown().

Files:

  • web/src/stores/tasks.ts
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Test markers: @pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min. xdist -n 8 --dist=loadfile auto-applied via pyproject addopts

Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back

Test doubles ladder: FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags. Bare MagicMock at typed boundary blocked by scripts/check_mock_spec.py (zero-tolerance)

FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript

Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)). Never skip/xfail flaky tests; fix fundamentally. Use asyncio.Event().wait() not sleep(large)

Files:

  • tests/unit/budget/test_hard_ceiling.py
  • tests/unit/api/test_budget_forecast_controller.py
  • tests/unit/engine/pipeline/test_forecast_gate.py
  • tests/unit/budget/test_pareto.py
  • tests/unit/budget/test_forecaster.py
  • tests/unit/budget/test_forecast_model_invariants.py
  • tests/conformance/persistence/test_cost_forecast_repository.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/budget/test_hard_ceiling.py
  • tests/unit/api/test_budget_forecast_controller.py
  • tests/unit/engine/pipeline/test_forecast_gate.py
  • tests/unit/budget/test_pareto.py
  • tests/unit/budget/test_forecaster.py
  • tests/unit/budget/test_forecast_model_invariants.py
  • tests/conformance/persistence/test_cost_forecast_repository.py
src/synthorg/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL; new repository protocols inherit from generic categories in persistence/_generics.py and stay @runtime_checkable per ADR-0001

Repository CRUD methods: save(entity), get(id), delete(id) -> bool, list_items(...), query(...) returning tuples

Datetime in persistence: parse_iso_utc / format_iso_utc from persistence._shared (reject naive); normalize_utc for already-typed

Files:

  • src/synthorg/persistence/sqlite/cost_forecast_repo.py
  • src/synthorg/persistence/postgres/cost_forecast_repo.py
web/src/mocks/handlers/**/*.ts

📄 CodeRabbit inference engine (web/CLAUDE.md)

web/src/mocks/handlers/**/*.ts: MSW handlers (MANDATORY): web/src/mocks/handlers/ must mirror web/src/api/endpoints/*.ts 1:1 with a default happy-path handler for every exported endpoint. Use onUnhandledRequest: 'error' in test setup; tests override per-case via server.use(...), never vi.mock('@/api/endpoints/*').
Use typed envelope helpers (successFor, paginatedFor, voidSuccess) to keep MSW handlers in lockstep with endpoint return types

Files:

  • web/src/mocks/handlers/budget.ts
tests/conformance/persistence/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Dual-backend conformance: tests/conformance/persistence/ consumes backend fixture (SQLite + Postgres). Enforced by check_dual_backend_test_parity.py

Files:

  • tests/conformance/persistence/test_cost_forecast_repository.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:16:50.000Z
Learning: Read design spec page before implementing; deviations require approval per DESIGN_SPEC.md
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:16:50.000Z
Learning: Present every plan for accept/deny before coding
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:16:50.000Z
Learning: No region/locale privileged; use metric units; British English per docs/reference/regional-defaults.md
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:16:50.000Z
Learning: Every convention PR ships its enforcement gate per docs/reference/convention-gates.md
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:16:50.000Z
Learning: Timeout/slow test failures indicate source-code regression; never edit tests/baselines/unit_timing.json or scripts/*_baseline.{txt,json} / scripts/_*_baseline.py; both families are PreToolUse-blocked. Per-invocation bypass: ALLOW_BASELINE_GROWTH=1 git commit (requires explicit user approval)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:16:50.000Z
Learning: After issue: branch + commit + push (no auto-PR); use /pre-pr-review; use /aurelio-review-pr for external feedback after PR; fix EVERYTHING valid, no deferring
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:16:50.000Z
Learning: Git commits: <type>: <description> (feat/fix/refactor/docs/test/chore/perf/ci) via commitizen. Signed commits required on protected refs. Branches: <type>/<slug> from main
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:16:50.000Z
Learning: Squash merge; PR body becomes squash commit; trailers (Release-As, Closes `#N`) must be in PR body
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:16:50.000Z
Learning: After every squash merge → /post-merge-cleanup
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:16:50.000Z
Learning: CLI is Docker-only (init/start/stop/status); features go in dashboard + REST API
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/api/controllers/setup/agent_helpers.py
  • src/synthorg/budget/forecast_models.py
  • tests/unit/budget/test_hard_ceiling.py
  • src/synthorg/budget/benchmark_stub.py
  • tests/unit/api/test_budget_forecast_controller.py
  • src/synthorg/persistence/sqlite/cost_forecast_repo.py
  • tests/unit/engine/pipeline/test_forecast_gate.py
  • tests/unit/budget/test_pareto.py
  • src/synthorg/budget/forecaster.py
  • src/synthorg/engine/agent_engine_errors.py
  • src/synthorg/engine/pipeline/forecast_gate.py
  • src/synthorg/engine/pipeline/entry/boot.py
  • src/synthorg/budget/pareto.py
  • tests/unit/budget/test_forecaster.py
  • tests/unit/budget/test_forecast_model_invariants.py
  • tests/conformance/persistence/test_cost_forecast_repository.py
  • src/synthorg/api/controllers/budget_forecast.py
  • src/synthorg/persistence/postgres/cost_forecast_repo.py
🔇 Additional comments (21)
src/synthorg/persistence/sqlite/cost_forecast_repo.py (1)

480-493: LGTM!

src/synthorg/api/controllers/budget_forecast.py (2)

22-22: LGTM!

Also applies to: 32-32, 139-139, 190-190, 221-221, 250-250


280-285: LGTM!

src/synthorg/api/controllers/setup/agent_helpers.py (1)

170-185: LGTM!

src/synthorg/budget/benchmark_stub.py (1)

87-94: LGTM!

src/synthorg/budget/forecast_models.py (1)

187-213: LGTM!

src/synthorg/budget/forecaster.py (1)

44-47: LGTM!

Also applies to: 148-152, 249-253

src/synthorg/budget/pareto.py (1)

257-264: LGTM!

Also applies to: 289-293

src/synthorg/engine/agent_engine_errors.py (1)

9-9: LGTM!

Also applies to: 297-304

src/synthorg/engine/pipeline/entry/boot.py (1)

26-27: LGTM!

Also applies to: 63-80, 130-131, 193-194, 294-295

src/synthorg/persistence/postgres/cost_forecast_repo.py (1)

445-457: LGTM!

tests/conformance/persistence/test_cost_forecast_repository.py (1)

198-199: LGTM!

Also applies to: 275-280, 382-394

tests/unit/api/test_budget_forecast_controller.py (1)

22-23: LGTM!

Also applies to: 152-167

tests/unit/budget/test_forecast_model_invariants.py (1)

57-79: LGTM!

tests/unit/budget/test_forecaster.py (1)

15-16: LGTM!

Also applies to: 61-63, 79-81, 88-90, 108-109, 132-133, 147-149, 160-162, 182-184, 191-193, 198-200, 231-233

tests/unit/budget/test_hard_ceiling.py (1)

163-168: LGTM!

tests/unit/budget/test_pareto.py (1)

15-15: LGTM!

Also applies to: 57-57, 60-65, 73-73, 92-92, 114-114, 131-131, 155-155, 158-163, 176-176, 179-179

tests/unit/engine/pipeline/test_forecast_gate.py (1)

15-15: LGTM!

Also applies to: 17-17, 26-26, 51-55, 148-148, 203-203, 231-231, 253-280, 285-285

web/src/api/endpoints/budget.ts (1)

78-79: LGTM!

Also applies to: 83-84, 88-88, 91-91, 98-98, 102-102, 109-109, 113-113, 120-120, 124-124

web/src/mocks/handlers/budget.ts (1)

4-5: LGTM!

Also applies to: 8-11, 126-128, 131-131, 134-138, 142-149, 154-161, 166-172

web/src/stores/tasks.ts (1)

322-326: LGTM!

Also applies to: 439-443

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/engine/pipeline/forecast_gate.py`:
- Around line 143-162: The current logic only reuses a PENDING forecast when the
caller provided the matching forecast_id; before calling _issue_fresh_forecast()
you must lookup any existing row with decision == ForecastDecision.PENDING for
the incoming brief_hash regardless of the caller-provided forecast_id, and if
found call _log_approval_required(existing) and raise
CostForecastApprovalRequiredError populated with existing.forecast_id,
existing.brief_hash, existing.estimated_cost and existing.currency so the stable
pending forecast is reused instead of attempting to create a duplicate;
implement this lookup where the current existing/decision check occurs and
branch to raise the existing pending error path before minting a fresh forecast.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 71500f71-316a-4509-8204-ba3ac1bb215c

📥 Commits

Reviewing files that changed from the base of the PR and between 1ebff72 and 0178d6f.

📒 Files selected for processing (2)
  • src/synthorg/engine/pipeline/forecast_gate.py
  • tests/unit/engine/pipeline/test_forecast_gate.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Backend
  • GitHub Check: Dashboard Test
  • GitHub Check: Test Integration
  • GitHub Check: Test E2E
  • GitHub Check: Test Unit
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Lighthouse Site
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Test markers: @pytest.mark.{unit,integration,e2e,slow}; async tests are auto; 30s global timeout; 80% minimum coverage required

Use xdist -n 8 --dist=loadfile (auto-applied via pyproject addopts); on Windows use WindowsSelectorEventLoopPolicy for unit tests; subprocess tests override back

Use FakeClock from tests._shared for Clock seam injection; use mock_of[T](**overrides) for typed-boundary substitutions; use SimpleNamespace for attribute-bags; bare MagicMock at typed boundaries is prohibited (enforced by scripts/check_mock_spec.py)

Hypothesis: 10 deterministic CI examples; test failures are real bugs (fix + add @example(...)); never skip/xfail flaky tests—fix fundamentally

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
!(web/public/provider-logos)/**

📄 CodeRabbit inference engine (CLAUDE.md)

Never use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001 (allowed in .claude/, third-party imports, providers/presets.py, web/public/provider-logos/)

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
  • src/synthorg/engine/pipeline/forecast_gate.py
src/synthorg/!(persistence)/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL

Files:

  • src/synthorg/engine/pipeline/forecast_gate.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env; no os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value

Files:

  • src/synthorg/engine/pipeline/forecast_gate.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics must live in settings/definitions/; allowlist 0, 1, -1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants of the form NAME: int|float|Final|Final[int]|Final[float] = literal

Use comments to explain WHY only; no reviewer citations, issue back-refs, or migration framing

Do not use from __future__ import annotations (Python 3.14 has PEP 649); use PEP 758 except A, B: syntax without parens unless binding

Type hints required on public functions; mypy strict mode enforced; use Google-style docstrings; maximum line length 88; functions under 50 lines; files under 800 lines

Define domain errors as <Domain><Condition>Error inheriting from DomainError, never from Exception/RuntimeError directly; enforced by check_domain_error_hierarchy.py

Pydantic v2: every frozen model must have extra="forbid"; @computed_field auto-exempt; per-line # lint-allow: frozen-extra-forbid -- <reason> for exceptions; use @computed_field for derived fields; use NotBlankStr for identifiers

Use args models at every system boundary; call parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py

Enforce immutability via model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries

Use asyncio.TaskGroup for async fan-out/fan-in; helpers must catch Exception and re-raise MemoryError/RecursionError

Use Clock seam pattern: clock: Clock | None = None; tests inject FakeClock; lifecycle services own _lifecycle_lock; timed-out stops mark unrestartable

Wrap untrusted content via wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML (SEC-1)

Use from synthorg.observability import get_logger with variable always named logger; never import logging or print() in app code

Use event names from observability.events.<domain> constants; log with structured kwargs (logger.info(EVENT, key=value))

Log error paths with WARNING/ERROR + context before raisi...

Files:

  • src/synthorg/engine/pipeline/forecast_gate.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/engine/pipeline/forecast_gate.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:34:24.704Z
Learning: Read design specification from `docs/design/` before implementing; deviations require approval
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:34:24.704Z
Learning: Present every plan for accept/deny before coding (MANDATORY Planning gate)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:34:24.704Z
Learning: No region/locale/currency should be privileged; use metric units; use British English
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:34:24.704Z
Learning: Every convention PR must ship its enforcement gate
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:34:24.704Z
Learning: Commit messages: `<type>: <description>` (feat/fix/refactor/docs/test/chore/perf/ci); commitizen-enforced; signed commits required on protected refs
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:34:24.704Z
Learning: Branch naming: `<type>/<slug>` from main
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:34:24.704Z
Learning: Squash merge with PR body becoming squash commit; include trailers (`Release-As`, `Closes `#N``) in PR body
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:34:24.704Z
Learning: Use `.pre-commit-config.yaml` for pre-commit/pre-push hooks; tool-call gates in `.claude/settings.json` PreToolUse (`scripts/check_*.sh`/`.py`)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:34:24.704Z
Learning: After every squash merge → `/post-merge-cleanup` command
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:34:24.704Z
Learning: CLI features go in Docker (init/start/stop/status commands); new features go in dashboard + REST API
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:34:24.704Z
Learning: Use `go -C cli` (never `cd cli`) for Go CLI commands; see `cli/CLAUDE.md` for CLI-specific guidelines
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:34:24.704Z
Learning: See `shell` rules in `~/.claude/rules/common/bash.md` (canonical for `cd` / `git -C` / Bash file-write rules)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
  • src/synthorg/engine/pipeline/forecast_gate.py

Comment thread src/synthorg/engine/pipeline/forecast_gate.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 83.89513% with 172 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.95%. Comparing base (d2207e9) to head (a0c612a).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/api/controllers/budget_forecast.py 66.66% 36 Missing and 5 partials ⚠️
.../synthorg/persistence/sqlite/cost_forecast_repo.py 78.52% 31 Missing and 1 partial ⚠️
...ynthorg/persistence/postgres/cost_forecast_repo.py 81.75% 24 Missing and 1 partial ⚠️
src/synthorg/api/state.py 62.22% 17 Missing ⚠️
src/synthorg/budget/pareto.py 88.23% 6 Missing and 6 partials ⚠️
src/synthorg/api/app.py 67.74% 9 Missing and 1 partial ⚠️
src/synthorg/engine/agent_engine_errors.py 78.72% 8 Missing and 2 partials ⚠️
src/synthorg/budget/forecaster.py 92.92% 5 Missing and 3 partials ⚠️
src/synthorg/engine/pipeline/service.py 40.00% 3 Missing and 3 partials ⚠️
src/synthorg/budget/benchmark_stub.py 84.00% 2 Missing and 2 partials ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2029      +/-   ##
==========================================
- Coverage   84.96%   84.95%   -0.01%     
==========================================
  Files        1981     1991      +10     
  Lines      117187   118250    +1063     
  Branches     9931    10014      +83     
==========================================
+ Hits        99569   100462     +893     
- Misses      15166    15309     +143     
- Partials     2452     2479      +27     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/engine/pipeline/forecast_gate.py (1)

151-180: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Bring ForecastGate.run() back under the 50-line body limit.

The new pending-reuse branch pushes run() to 61 body lines (Lines 120-180). Please collapse the duplicated approval-required path and hoist the shared signal/hash work so this method gets back under the repo limit.

As per coding guidelines: src/synthorg/**/*.py: Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/synthorg/engine/pipeline/forecast_gate.py` around lines 151 - 180,
ForecastGate.run has grown past the 50-line limit because the approval-required
flow is duplicated for pending vs fresh forecasts; refactor by hoisting the
shared signal and brief_hash work (use _signal_from_work_item and
compute_brief_hash once at the top), then centralize the approval path so both
cases funnel through the same approval handling: call
_pending_forecast_for_brief to get pending, if present set a local variable
approved_forecast = pending else call _forecaster.forecast and await
_forecast_repo.save to set approved_forecast; finally call the single approval
handler (_raise_approval_required or _log_approval_required as appropriate) and
raise CostForecastApprovalRequiredError using fields from approved_forecast
(forecast_id, brief_hash, estimated_cost, currency), keeping ForecastGate.run
under 50 lines and preserving existing behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/engine/pipeline/forecast_gate.py`:
- Around line 160-166: The TOCTOU occurs because after calling
_pending_forecast_for_brief and creating fresh via _forecaster.forecast,
_forecast_repo.save(fresh) can fail due to a duplicate pending-row insert;
change the save path so that when _forecast_repo.save(fresh) raises the
repository-specific duplicate-pending/conflict error (the same condition that
indicates the partial-unique pending index violation), catch that exception,
re-query using _pending_forecast_for_brief(compute_brief_hash(signal)) to get
the existing pending row, and then call _raise_approval_required(pending) with
that existing row instead of letting the save error bubble up; keep the normal
flow (saving and proceeding) for non-conflict errors.

---

Outside diff comments:
In `@src/synthorg/engine/pipeline/forecast_gate.py`:
- Around line 151-180: ForecastGate.run has grown past the 50-line limit because
the approval-required flow is duplicated for pending vs fresh forecasts;
refactor by hoisting the shared signal and brief_hash work (use
_signal_from_work_item and compute_brief_hash once at the top), then centralize
the approval path so both cases funnel through the same approval handling: call
_pending_forecast_for_brief to get pending, if present set a local variable
approved_forecast = pending else call _forecaster.forecast and await
_forecast_repo.save to set approved_forecast; finally call the single approval
handler (_raise_approval_required or _log_approval_required as appropriate) and
raise CostForecastApprovalRequiredError using fields from approved_forecast
(forecast_id, brief_hash, estimated_cost, currency), keeping ForecastGate.run
under 50 lines and preserving existing behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 069f5522-960c-4e1e-b60d-34649ca41277

📥 Commits

Reviewing files that changed from the base of the PR and between 0178d6f and 3e43d3d.

📒 Files selected for processing (2)
  • src/synthorg/engine/pipeline/forecast_gate.py
  • tests/unit/engine/pipeline/test_forecast_gate.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Backend
  • GitHub Check: Dashboard Test
  • GitHub Check: Test E2E
  • GitHub Check: Test Unit
  • GitHub Check: Test Integration
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Lighthouse Site
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Test markers: @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% min

Windows unit tests use WindowsSelectorEventLoopPolicy; subprocess tests override back (3.14 IOCP teardown race fix)

Test doubles: use FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary is blocked by check_mock_spec.py

FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript

Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)); never skip/xfail flaky tests, fix fundamentally

Use asyncio.Event().wait() not sleep(large) to avoid flaky async tests

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets are pure env at boot site; no os.environ.get outside startup

Numerics must live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)

Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines

Error classes must follow <Domain><Condition>Error naming and inherit from DomainError, never directly from Exception/RuntimeError

Pydantic v2 frozen models must have extra="forbid" on every frozen model; use @computed_field for derived values; use NotBlankStr for identifiers

Use args models at every system boundary; call parse_typed() for every external dict ingestion

For immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries

Use asyncio.TaskGroup for async fan-out/fan-in; helpers must catch Exception and re-raise MemoryError/RecursionError

Services must own _lifecycle_lock; timed-out stops mark services as unrestartable

Wrap untrusted content via wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML (SEC-1)

Use from synthorg.observability import get_logger; variable name always logger; never use import logging or print() in app code

Event names must come from observability.events.<domain> constants; use structured kwargs (logger.info(EVENT, key=value))

Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED AFTER persistence write

Never log error as error=str(exc) or interpolate {exc}; use error_type=type(exc).__name__ + error=safe_error_description(exc); never use exc_info=True; for OTel use `span.set_attribute("exception.m...

Files:

  • src/synthorg/engine/pipeline/forecast_gate.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/engine/pipeline/forecast_gate.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:52:45.960Z
Learning: Read `docs/design/` page before implementing; deviations need approval per DESIGN_SPEC.md
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:52:45.960Z
Learning: Present every plan for accept/deny before coding
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:52:45.960Z
Learning: No region/locale privileged; use metric units; British English for all documentation and code
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:52:45.960Z
Learning: Every convention PR must ship its enforcement gate per convention-gates.md
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:52:45.960Z
Learning: After issue: branch + commit + push (no auto-PR); use `/pre-pr-review`; `gh pr create` is blocked by `scripts/check_no_pr_create.sh`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:52:45.960Z
Learning: Commits: `<type>: <description>` (feat/fix/refactor/docs/test/chore/perf/ci); commitizen-enforced
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:52:45.960Z
Learning: Signed commits required on protected refs (GPG/SSH or GitHub App via `synthorg-repo-bot`)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:52:45.960Z
Learning: Branches: `<type>/<slug>` from main
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:52:45.960Z
Learning: See `web/CLAUDE.md` for web guidelines; see `cli/CLAUDE.md` for CLI guidelines (use `go -C cli`, never `cd cli`); see `~/.claude/rules/common/bash.md` for shell rules (canonical for `cd` / `git -C` / Bash file-write rules)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:52:45.960Z
Learning: Squash merge with PR body becoming squash commit; trailers (`Release-As`, `Closes `#N``) must be in PR body
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-20T23:52:45.960Z
Learning: After every squash merge → `/post-merge-cleanup`
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
  • src/synthorg/engine/pipeline/forecast_gate.py
🔇 Additional comments (1)
tests/unit/engine/pipeline/test_forecast_gate.py (1)

26-27: LGTM!

Also applies to: 121-136, 234-257

Comment thread src/synthorg/engine/pipeline/forecast_gate.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/engine/pipeline/test_forecast_gate.py (1)

225-242: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Test exercises brief-hash mismatch rather than PENDING-reuse path.

The forecast's brief_hash="a" * 64 differs from _BRIEF_HASH computed from the work item. This causes _forecast_covers_brief to return False, falling through to _forecast_for_brief() (minting a fresh row) rather than exercising the existing.decision is ForecastDecision.PENDING branch at line 136–137 in the implementation.

The actual PENDING-reuse scenario (matching brief_hash) is correctly covered by test_pending_forecast_covering_brief_is_reused. Consider renaming this test to clarify it exercises the "stale linked forecast" path, or update to use brief_hash=_BRIEF_HASH if the intent is to test PENDING with a matching brief.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/engine/pipeline/test_forecast_gate.py` around lines 225 - 242, The
test test_pending_forecast_raises_approval_required is exercising the
stale-linked-forecast path because the created Forecast uses brief_hash="a"*64
which doesn't match the work item's computed _BRIEF_HASH, causing
_forecast_covers_brief to return False and skipping the ForecastDecision.PENDING
reuse branch; to fix, either set the existing Forecast.brief_hash to _BRIEF_HASH
so the test exercises the PENDING reuse path (and asserts
CostForecastApprovalRequiredError) or, if the intent is to test the stale-link
path, rename the test to reflect that behavior (e.g.,
test_stale_linked_forecast_raises_approval_required) and adjust assertions
accordingly so the intent is clear when inspecting _forecast_covers_brief,
ForecastDecision.PENDING, and _forecast_for_brief.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@tests/unit/engine/pipeline/test_forecast_gate.py`:
- Around line 225-242: The test test_pending_forecast_raises_approval_required
is exercising the stale-linked-forecast path because the created Forecast uses
brief_hash="a"*64 which doesn't match the work item's computed _BRIEF_HASH,
causing _forecast_covers_brief to return False and skipping the
ForecastDecision.PENDING reuse branch; to fix, either set the existing
Forecast.brief_hash to _BRIEF_HASH so the test exercises the PENDING reuse path
(and asserts CostForecastApprovalRequiredError) or, if the intent is to test the
stale-link path, rename the test to reflect that behavior (e.g.,
test_stale_linked_forecast_raises_approval_required) and adjust assertions
accordingly so the intent is clear when inspecting _forecast_covers_brief,
ForecastDecision.PENDING, and _forecast_for_brief.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 86978907-d6ea-40cd-93d4-82ece6abc953

📥 Commits

Reviewing files that changed from the base of the PR and between 3e43d3d and 40d76a3.

📒 Files selected for processing (2)
  • src/synthorg/engine/pipeline/forecast_gate.py
  • tests/unit/engine/pipeline/test_forecast_gate.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test Unit
  • GitHub Check: Dashboard Test
  • GitHub Check: Test E2E
  • GitHub Check: Test Integration
  • GitHub Check: Lighthouse Site
  • GitHub Check: Lighthouse Dashboard
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,pyi}

📄 CodeRabbit inference engine (CLAUDE.md)

Read docs/design/ page before implementing; deviations need approval per DESIGN_SPEC.md

Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL; new repository protocols must inherit from generic categories in persistence/_generics.py (SingletonRepository, IdKeyedRepository, FilteredQueryRepository, AppendOnlyRepository, StatefulRepository, MVCCRepository)

No hardcoded numeric values; numerics must live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants of the form NAME: int|float|Final|Final[int]|Final[float] = literal; enforced by scripts/check_no_magic_numbers.py

Never use from __future__ import annotations (Python 3.14 has PEP 649); use PEP 758 except: except A, B: requires parens when binding

Type hints on public functions; mypy strict mode; Google-style docstrings; line length 88; functions <50 lines; files <800 lines

Error classes must follow <Domain><Condition>Error naming and inherit from DomainError; never inherit directly from Exception, RuntimeError, etc.; enforced by check_domain_error_hierarchy.py

All Pydantic v2 frozen models must have extra="forbid" project-wide; use @computed_field for derived fields; use NotBlankStr for identifiers; @computed_field is auto-exempt from frozen-extra-forbid gate; per-line # lint-allow: frozen-extra-forbid -- <reason> for extra="allow"/"ignore" boundaries; enforced by check_frozen_model_extra_forbid.py

Args models required at every system boundary; use parse_typed() for every external dict ingestion; enforced by check_boundary_typed.py

For immutability, use model_copy(update=...) or copy.deepcopy(); apply deepcopy at system boundaries

Async fan-out/fan-in must use asyncio.TaskGroup; helpers must catch Exception and re-raise MemoryError/RecursionError

Clock seam: include clock: Clock | None = None parameter; tests inject FakeClock; lifecycle: ...

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
  • src/synthorg/engine/pipeline/forecast_gate.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use markers: @pytest.mark.{unit,integration,e2e,slow}; async tests use auto; global timeout 30s; coverage minimum 80%; xdist -n 8 --dist=loadfile auto-applied via pyproject addopts

Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back

Test doubles ladder: FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at a typed boundary (constructor / fn arg / annotated local / typed fixture return) is blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline)

Import FakeClock and mock_of from tests._shared; inject via clock= and the helper's spec subscript

Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...)); never skip/xfail flaky tests; fix fundamentally; use asyncio.Event().wait() not sleep(large)

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
**/*.{py,js,jsx,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

No region/currency/locale privileged; use metric units; British English throughout; enforced per docs/reference/regional-defaults.md

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
  • src/synthorg/engine/pipeline/forecast_gate.py
**/*

📄 CodeRabbit inference engine (CLAUDE.md)

Git commits: <type>: <description> (feat/fix/refactor/docs/test/chore/perf/ci); commitizen-enforced; signed commits required on protected refs (GPG/SSH or GitHub App via synthorg-repo-bot); branches: <type>/<slug> from main

Files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
  • src/synthorg/engine/pipeline/forecast_gate.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

API startup lifecycle: two phases: construction (create_app body) wires synchronous services; on_startup (_build_lifecycle.on_startup) wires services needing connected persistence; ordering invariants: agent_registry before auto_wire_meetings, tunnel_provider unconditional; on-startup: SettingsService before WorkflowExecutionObserver, OntologyService after persistence.connect(), cost-dial services after persistence via _try_wire_cost_dial

Runtime services: synthorg.workers.runtime_builder.build_runtime_services selects behind ONE provider-present switch, returns RuntimeServices pair (worker execution service + multi-agent coordinator); _install_runtime_services hook installs both via AppState seams appended FIRST; empty-company rejects task creation (4014 AgentRuntimeNotConfiguredError) and /coordinate 503s; swap_* methods hold lock against lazy reads

Setup completion: post_setup_reinit() (provider reload, agent bootstrap, runtime-services rebuild + dual hot-swap) propagates failures; settings_svc.set() only if reinit clean; whole sequence under COMPLETE_LOCK to prevent concurrent race on flag write

Telemetry: opt-in, off by default; every event property must be in _ALLOWED_PROPERTIES; detail in telemetry.md

Provider resilience: calls go through BaseCompletionProvider (retry + rate limit); never implement retry in driver subclasses; retryable: RateLimitError, Provider{Timeout,Connection,Internal}Error; WebSocket: per-frame timeout closes silent peers (1008), revalidation saturation closes (4011)

Conversational propose: POST /meta/chat/propose is opt-in (meta.chief_of_staff.propose_enabled); ChiefOfStaffProposer built by build_chief_of_staff_proposer (ENFORCED manifest entry, EPIC #1955 discipline) and 503s when ANY of provider, persistence, or work pipeline missing; human content wrapped via wrap_untrusted(TAG_TASK_DATA, ...) (SEC-1)

Files:

  • src/synthorg/engine/pipeline/forecast_gate.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/engine/pipeline/forecast_gate.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:10:24.563Z
Learning: Present every plan for accept/deny before coding
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:10:24.563Z
Learning: After every issue: create branch + commit + push (no auto-PR); use `/pre-pr-review` (gh pr create blocked by scripts/check_no_pr_create.sh); after PR: use `/aurelio-review-pr` for external feedback; fix EVERYTHING valid; no deferring
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:10:24.563Z
Learning: After every squash merge, run `/post-merge-cleanup`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:10:24.563Z
Learning: Pre-commit/pre-push hooks: `.pre-commit-config.yaml`; tool-call gates: `.claude/settings.json` PreToolUse (`scripts/check_*.sh`/`.py`)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:10:24.563Z
Learning: Squash merge; PR body becomes squash commit; trailers (`Release-As`, `Closes `#N``) must be in PR body
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-21T00:10:24.563Z
Learning: CLI is Docker-only (init/start/stop/status); features go in dashboard and REST API
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • tests/unit/engine/pipeline/test_forecast_gate.py
  • src/synthorg/engine/pipeline/forecast_gate.py
🔇 Additional comments (7)
src/synthorg/engine/pipeline/forecast_gate.py (4)

26-26: LGTM!


135-156: LGTM!


158-182: LGTM!


200-208: LGTM!

tests/unit/engine/pipeline/test_forecast_gate.py (3)

17-17: LGTM!


140-176: LGTM!


298-319: LGTM!

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 21, 2026 00:30 — with GitHub Actions Inactive
@Aureliolo Aureliolo merged commit 700a59e into main May 21, 2026
82 checks passed
@Aureliolo Aureliolo deleted the feat/1982-cost-dial branch May 21, 2026 04:53
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 21, 2026 04:53 — with GitHub Actions Inactive
Aureliolo pushed a commit that referenced this pull request May 22, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Introduced conversational interface for direct clarify and propose
interactions.
- Cost management now includes forecast gates, hard ceilings, and Pareto
considerations.
- Added living documentation engine combining wiki and
retrieval-augmented generation features.
- Real intake engine is now operational for live data processing.
- Virtual desktop tool with vision verification gate available for
enhanced workspace control.

### What's new
- Per-project reproducible environments for consistent setups.
- Headless browser testing tool integrated for automated UI validation.
- Governed external API and data access tool introduced.
- Hardened external-remote git backend with sandbox mounts and
push-queue dispatching.
- Adversarial red-team gate subsystem for enhanced security testing.
- Self-extending toolkit to dynamically expand capabilities.
- Stakes-aware model routing enables prioritized processing.
- Task-board entry adapter connects live runtime with project
management.
- Persistent project workspace with pluggable git backend and
per-project push queues implemented.
- Knowledge and provenance substrate added to track data lineage.
- Scoring and data contract framework for golden-company benchmark
evaluations.

### Under the hood
- Desktop Dockerfile pinned by digest to improve build stability and
documented publishing gap fixed.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


##
[0.8.7](v0.8.6...v0.8.7)
(2026-05-22)


### Features

* conversational interface v1 - 1:1 clarify + propose
([#2019](#2019))
([216ef94](216ef94)),
closes [#1968](#1968)
* cost as a first-class dial (forecast gate, hard ceiling, Pareto)
([#2029](#2029))
([700a59e](700a59e)),
closes [#1982](#1982)
* **env:** reproducible per-project environments
([#2039](#2039))
([d2c0ef9](d2c0ef9)),
closes [#1994](#1994)
* **evals:** [#1980](#1980)
spine -- scoring + data contract for golden-company benchmark
([#2025](#2025))
([53108e8](53108e8))
* goal/objective entry adapter
([#1964](#1964))
([#2022](#2022))
([cb15c3c](cb15c3c))
* governed external API/data access tool
([#1991](#1991))
([#2032](#2032))
([e08b451](e08b451))
* harden external-remote git backend + per-project sandbox mount +
push-queue dispatch
([#2020](#2020))
([#2030](#2030))
([2fa2e1e](2fa2e1e))
* headless browser testing tool
([#1992](#1992))
([#2024](#2024))
([277b52a](277b52a))
* knowledge + provenance substrate
([#2036](#2036))
([48c897b](48c897b))
* living documentation engine (dual-purpose wiki + RAG namespace)
([#2028](#2028))
([3d10da9](3d10da9)),
closes [#1976](#1976)
* real intake engine online
([#2017](#2017))
([9d8eb34](9d8eb34))
* **redteam:** adversarial red-team gate subsystem
([#1986](#1986))
([#2026](#2026))
([d2207e9](d2207e9))
* self-extending toolkit
([#1995](#1995))
([#2035](#2035))
([5ffc545](5ffc545))
* stakes-aware model routing
([#1998](#1998))
([#2038](#2038))
([9b98312](9b98312))
* task-board entry adapter to live runtime
([#1963](#1963))
([#2023](#2023))
([a8f1eea](a8f1eea))
* virtual desktop tool and vision verifier gate
([#2031](#2031))
([dfe8b42](dfe8b42)),
closes [#1993](#1993)
* **workspace:** persistent project workspace + pluggable git backend +
per-project push queue
([#2021](#2021))
([ee58ee7](ee58ee7))


### Bug Fixes

* pin desktop Dockerfile by digest (Scorecard
[#309](#309)) + document
publish gap ([#2034](#2034))
([8fda188](8fda188))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cost as a first-class dial

1 participant