feat: cost as a first-class dial (forecast gate, hard ceiling, Pareto)#2029
Conversation
- align postgres cost_forecasts schema with sqlite (TEXT for ids/timestamps) so the schema-drift parity gate passes without a baseline change - drop entity-mutation save log from cost_forecast repos (persistence-boundary gate forbids audit logs at the repo layer; service-layer log lands in #1982 Phase 9 when the controllers are wired) - replace hardcoded "USD" default in _enforcer_helpers with DEFAULT_CURRENCY - annotate _assignments test helper return type
ForecastBudgetController exposes POST /budget/forecast, GET
/budget/forecasts/{id}, approve/reject/raise_ceiling decisions,
and GET /budget/pareto. AppState carries cost_forecaster,
cost_forecast_repo, pareto_analyzer, benchmark_provider, and
budget_config behind lock-protected swap_* methods mirroring the
existing worker / coordinator pattern; on_startup wires them
once persistence is connected. Three of the four PENDING
manifest lines flip to ENFORCED.
Adds web/src/pages/budget/ParetoSection.tsx + Storybook stories covering stub / measured / empty / loading / unwired states. Updates Task and BudgetConfig fixtures across the test + Storybook suites for the new Task.hard_ceiling / Task.forecast_id fields and the eight new BudgetConfig.forecast_* settings. Regenerates DTOs.
Extract _try_wire_cost_dial helper so a transient persistence-not- yet-connected error during shared-app fixture boot does not poison startup; controllers degrade to 503 when dependencies are absent.
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request transforms cost management from a passive ledger into an active, operator-controlled dial. It introduces pre-flight forecasting to gate work entry, a hard ceiling mechanism to halt runaway costs mid-execution, and a Pareto-based analysis tool to help operators optimize cost versus quality. These changes provide granular control over spending and improve operational transparency. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI (base), Organization UI (inherited) Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
🧰 Additional context used📓 Path-based instructions (5)**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
{src/**/*.py,tests/**/*.py}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
tests/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
⚙️ CodeRabbit configuration file
Files:
{src,tests}/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
tests/**/*test*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (2)📓 Common learnings📚 Learning: 2026-05-05T09:04:46.195ZApplied to files:
🔇 Additional comments (1)
</review_stack_artifact> WalkthroughThe PR implements a full “cost dial” system: immutable forecast domain models, a CostForecaster, benchmark protocol and stub, ParetoAnalyzer, persisted Forecast repository implementations (SQLite/Postgres) and migrations, ForecastGate at entry, task stamping of forecast/ceiling, in-loop hard‑ceiling enforcement that raises and can park runs, engine parking/stamp handling, AppState/runtime/controller wiring, Litestar API endpoints for forecasts/pareto, web client types/UI/store/mocks, settings and observability events, and extensive unit/integration/conformance tests. |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/synthorg/engine/pipeline/forecast_gate.py (1)
122-145:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReuse matching
PENDINGforecasts instead of minting a new pending row.When a matching existing forecast is
PENDING, this path still falls through to_issue_fresh_forecast(). That can trip the unique-pending constraint for the samebrief_hashand return a persistence failure instead of a stable approval-required response.Suggested fix
if existing is not None and self._forecast_covers_brief(work_item, existing): if existing.decision is ForecastDecision.APPROVED: @@ if existing.decision is ForecastDecision.REJECTED: @@ raise CostForecastRejectedError( msg, forecast_id=existing.forecast_id, brief_hash=existing.brief_hash, ) + if existing.decision is ForecastDecision.PENDING: + self._log_approval_required(existing) + msg = ( + "Pre-flight cost forecast required: " + f"estimated {existing.estimated_cost:.4f} {existing.currency} " + "awaiting operator approval" + ) + raise CostForecastApprovalRequiredError( + msg, + forecast_id=existing.forecast_id, + brief_hash=existing.brief_hash, + estimated_cost=existing.estimated_cost, + currency=existing.currency, + ) fresh = await self._issue_fresh_forecast(work_item)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/engine/pipeline/forecast_gate.py` around lines 122 - 145, Existing PENDING forecasts that match the brief are currently ignored and a new pending row is minted; modify the decision branch after self._lookup_forecast(...) to handle ForecastDecision.PENDING by reusing the existing forecast instead of calling _issue_fresh_forecast: when existing is not None and _forecast_covers_brief(...) and existing.decision is ForecastDecision.PENDING, call self._log_approval_required(existing) (same logging used for newly issued pending forecasts) and return/exit the method so the code does not fall through to _issue_fresh_forecast; preserve the current APPROVED and REJECTED handling in _forecast_covers_brief path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/synthorg/engine/pipeline/forecast_gate.py`:
- Around line 122-145: Existing PENDING forecasts that match the brief are
currently ignored and a new pending row is minted; modify the decision branch
after self._lookup_forecast(...) to handle ForecastDecision.PENDING by reusing
the existing forecast instead of calling _issue_fresh_forecast: when existing is
not None and _forecast_covers_brief(...) and existing.decision is
ForecastDecision.PENDING, call self._log_approval_required(existing) (same
logging used for newly issued pending forecasts) and return/exit the method so
the code does not fall through to _issue_fresh_forecast; preserve the current
APPROVED and REJECTED handling in _forecast_covers_brief path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: f6312264-640b-45a8-8f36-0ae671b2712b
📒 Files selected for processing (21)
src/synthorg/api/controllers/budget_forecast.pysrc/synthorg/api/controllers/setup/agent_helpers.pysrc/synthorg/budget/benchmark_stub.pysrc/synthorg/budget/forecast_models.pysrc/synthorg/budget/forecaster.pysrc/synthorg/budget/pareto.pysrc/synthorg/engine/agent_engine_errors.pysrc/synthorg/engine/pipeline/entry/boot.pysrc/synthorg/engine/pipeline/forecast_gate.pysrc/synthorg/persistence/postgres/cost_forecast_repo.pysrc/synthorg/persistence/sqlite/cost_forecast_repo.pytests/conformance/persistence/test_cost_forecast_repository.pytests/unit/api/test_budget_forecast_controller.pytests/unit/budget/test_forecast_model_invariants.pytests/unit/budget/test_forecaster.pytests/unit/budget/test_hard_ceiling.pytests/unit/budget/test_pareto.pytests/unit/engine/pipeline/test_forecast_gate.pyweb/src/api/endpoints/budget.tsweb/src/mocks/handlers/budget.tsweb/src/stores/tasks.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: CodSpeed Web Pass
- GitHub Check: Build Backend
- GitHub Check: Lighthouse Site
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Test Integration
- GitHub Check: Dashboard Test
- GitHub Check: Test E2E
- GitHub Check: Test Unit
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (13)
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env only; no os.environ.get outside startup per docs/reference/configuration-precedence.md
No hardcoded numeric values; numerics live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py
No
from __future__ import annotationsin Python 3.14+; use PEP 758 except:except A, B:requires parens when bindingType hints required on public functions; mypy strict mode; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Error classes must follow Error pattern and inherit from DomainError (never Exception/RuntimeError directly). Enforced by check_domain_error_hierarchy.py
Pydantic v2: use frozen + extra="forbid" on every frozen model;
@computed_fieldfor derived fields; NotBlankStr for identifiers. Per-line# lint-allow: frozen-extra-forbid -- <reason>for extra="allow"/"ignore" boundaries. Enforced by check_frozen_model_extra_forbid.pyArgs models required at every system boundary; use parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helpers catch Exception (re-raise MemoryError/RecursionError)
Clock seam: clock: Clock | None = None; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable
Untrusted content (SEC-1): wrap_untrusted() from engine.prompt_safety; HTMLParseGuard for HTML
Use synthorg.observability.get_logger; variable always named 'logger'. Never use 'import logging' or print() in app code
Event names from observability.events. constants; stru...
Files:
src/synthorg/api/controllers/setup/agent_helpers.pysrc/synthorg/budget/forecast_models.pysrc/synthorg/budget/benchmark_stub.pysrc/synthorg/persistence/sqlite/cost_forecast_repo.pysrc/synthorg/budget/forecaster.pysrc/synthorg/engine/agent_engine_errors.pysrc/synthorg/engine/pipeline/forecast_gate.pysrc/synthorg/engine/pipeline/entry/boot.pysrc/synthorg/budget/pareto.pysrc/synthorg/api/controllers/budget_forecast.pysrc/synthorg/persistence/postgres/cost_forecast_repo.py
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
API startup construction phase: synchronous service wiring in create_app body; on_startup wires persistence-dependent services. Ordering: agent_registry before auto_wire_meetings; tunnel_provider unconditional
API startup on_startup ordering: SettingsService auto-wire before WorkflowExecutionObserver; OntologyService after persistence.connect() via _wire_ontology_service
Cost-dial services wire via _try_wire_cost_dial after persistence connects (best-effort, idempotent); logs BUDGET_FORECAST_UNAVAILABLE if unavailable
Files:
src/synthorg/api/controllers/setup/agent_helpers.pysrc/synthorg/api/controllers/budget_forecast.py
src/synthorg/api/controllers/setup/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Setup completion: post_setup_reinit() propagates failures; settings_svc.set("api", "setup_complete", "true") only after clean reinit. Sequence serialized under COMPLETE_LOCK in src/synthorg/api/controllers/setup/agent_helpers.py
Files:
src/synthorg/api/controllers/setup/agent_helpers.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/api/controllers/setup/agent_helpers.pysrc/synthorg/budget/forecast_models.pysrc/synthorg/budget/benchmark_stub.pysrc/synthorg/persistence/sqlite/cost_forecast_repo.pysrc/synthorg/budget/forecaster.pysrc/synthorg/engine/agent_engine_errors.pysrc/synthorg/engine/pipeline/forecast_gate.pysrc/synthorg/engine/pipeline/entry/boot.pysrc/synthorg/budget/pareto.pysrc/synthorg/api/controllers/budget_forecast.pysrc/synthorg/persistence/postgres/cost_forecast_repo.py
web/src/**/*.{js,jsx,ts,tsx,mts}
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.{js,jsx,ts,tsx,mts}: Always usecreateLoggerfrom@/lib/logger; never bareconsole.warn/console.error/console.debugin application code. Variable name must always belog. Onlylogger.tsitself may use bare console methods. Uselog.debug()(DEV-only, stripped in production),log.warn(),log.error().
Pass dynamic/untrusted values as separate args to logger calls (not interpolated into the message string) so they go throughsanitizeArg
Attacker-controlled fields inside structured objects must be wrapped insanitizeForLog()before embedding in log calls
Error-code constants (MANDATORY): importErrorCodeandErrorCategoryfrom@/api/types/errors(re-exported from the generatedweb/src/api/types/error-codes.gen.ts). Discriminate onErrorCode.<NAME>, never on raw integer literals.
Use@eslint-react/web-api-no-leaked-fetchto detectfetch()in effects withoutAbortControllercleanup
Files:
web/src/api/endpoints/budget.tsweb/src/stores/tasks.tsweb/src/mocks/handlers/budget.ts
web/src/{api/endpoints,stores}/**/*.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
Cursor pagination (MANDATORY): list endpoints must use opaque cursor-based paging via
PaginationMeta. Stores must keepnextCursor+hasMorein state (not offset arithmetic) and early-return when!hasMore || !nextCursor. Display counts must come fromdata.length.
Files:
web/src/api/endpoints/budget.tsweb/src/stores/tasks.ts
web/src/api/endpoints/**/*.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
Health / readiness endpoints (MANDATORY):
getLiveness()is always 200 while the process is alive;getReadiness()is 200 healthy / 503 unavailable (binary'ok' | 'unavailable'outcome, no tri-state). Any new caller must handle the 503 path explicitly.
Files:
web/src/api/endpoints/budget.ts
web/src/**/*.{ts,tsx,mts}
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/**/*.{ts,tsx,mts}: Use@typescript-eslint/no-floating-promisesto forbid unawaited promises so async work cannot survive the test that scheduled it and trip the active-handle gate
Use@typescript-eslint/no-misused-promises(withchecksVoidReturn: { attributes: false }) to forbid passing async functions where the callsite ignores the returned promise. React 19asyncevent handlers stay allowed via theattributes: falseexemption.
Files:
web/src/api/endpoints/budget.tsweb/src/stores/tasks.tsweb/src/mocks/handlers/budget.ts
web/src/stores/**/*.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/stores/**/*.ts: List reads (fetch*) must seterror: string | nullon the store instead of toasting
Test teardown (MANDATORY): any new store that schedules timers or attaches event listeners must expose an equivalent cleanup hook and register it in the globalafterEach. The globalafterEachinweb/src/test-setup.tsxalready callsuseToastStore.getState().dismissAll(),cancelPendingPersist(), anduseThemeStore.getState().teardown().
Files:
web/src/stores/tasks.ts
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Test markers:
@pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min. xdist -n 8 --dist=loadfile auto-applied via pyproject addoptsWindows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back
Test doubles ladder: FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags. Bare MagicMock at typed boundary blocked by scripts/check_mock_spec.py (zero-tolerance)
FakeClock and mock_of import from tests._shared; inject via clock= and helper's spec subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add
@example(...)). Never skip/xfail flaky tests; fix fundamentally. Use asyncio.Event().wait() not sleep(large)
Files:
tests/unit/budget/test_hard_ceiling.pytests/unit/api/test_budget_forecast_controller.pytests/unit/engine/pipeline/test_forecast_gate.pytests/unit/budget/test_pareto.pytests/unit/budget/test_forecaster.pytests/unit/budget/test_forecast_model_invariants.pytests/conformance/persistence/test_cost_forecast_repository.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/budget/test_hard_ceiling.pytests/unit/api/test_budget_forecast_controller.pytests/unit/engine/pipeline/test_forecast_gate.pytests/unit/budget/test_pareto.pytests/unit/budget/test_forecaster.pytests/unit/budget/test_forecast_model_invariants.pytests/conformance/persistence/test_cost_forecast_repository.py
src/synthorg/persistence/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL; new repository protocols inherit from generic categories in persistence/_generics.py and stay
@runtime_checkableper ADR-0001Repository CRUD methods: save(entity), get(id), delete(id) -> bool, list_items(...), query(...) returning tuples
Datetime in persistence: parse_iso_utc / format_iso_utc from persistence._shared (reject naive); normalize_utc for already-typed
Files:
src/synthorg/persistence/sqlite/cost_forecast_repo.pysrc/synthorg/persistence/postgres/cost_forecast_repo.py
web/src/mocks/handlers/**/*.ts
📄 CodeRabbit inference engine (web/CLAUDE.md)
web/src/mocks/handlers/**/*.ts: MSW handlers (MANDATORY):web/src/mocks/handlers/must mirrorweb/src/api/endpoints/*.ts1:1 with a default happy-path handler for every exported endpoint. UseonUnhandledRequest: 'error'in test setup; tests override per-case viaserver.use(...), nevervi.mock('@/api/endpoints/*').
Use typed envelope helpers (successFor,paginatedFor,voidSuccess) to keep MSW handlers in lockstep with endpoint return types
Files:
web/src/mocks/handlers/budget.ts
tests/conformance/persistence/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Dual-backend conformance: tests/conformance/persistence/ consumes backend fixture (SQLite + Postgres). Enforced by check_dual_backend_test_parity.py
Files:
tests/conformance/persistence/test_cost_forecast_repository.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:16:50.000Z
Learning: Read design spec page before implementing; deviations require approval per DESIGN_SPEC.md
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:16:50.000Z
Learning: Present every plan for accept/deny before coding
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:16:50.000Z
Learning: No region/locale privileged; use metric units; British English per docs/reference/regional-defaults.md
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:16:50.000Z
Learning: Every convention PR ships its enforcement gate per docs/reference/convention-gates.md
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:16:50.000Z
Learning: Timeout/slow test failures indicate source-code regression; never edit tests/baselines/unit_timing.json or scripts/*_baseline.{txt,json} / scripts/_*_baseline.py; both families are PreToolUse-blocked. Per-invocation bypass: ALLOW_BASELINE_GROWTH=1 git commit (requires explicit user approval)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:16:50.000Z
Learning: After issue: branch + commit + push (no auto-PR); use /pre-pr-review; use /aurelio-review-pr for external feedback after PR; fix EVERYTHING valid, no deferring
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:16:50.000Z
Learning: Git commits: <type>: <description> (feat/fix/refactor/docs/test/chore/perf/ci) via commitizen. Signed commits required on protected refs. Branches: <type>/<slug> from main
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:16:50.000Z
Learning: Squash merge; PR body becomes squash commit; trailers (Release-As, Closes `#N`) must be in PR body
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:16:50.000Z
Learning: After every squash merge → /post-merge-cleanup
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:16:50.000Z
Learning: CLI is Docker-only (init/start/stop/status); features go in dashboard + REST API
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
src/synthorg/api/controllers/setup/agent_helpers.pysrc/synthorg/budget/forecast_models.pytests/unit/budget/test_hard_ceiling.pysrc/synthorg/budget/benchmark_stub.pytests/unit/api/test_budget_forecast_controller.pysrc/synthorg/persistence/sqlite/cost_forecast_repo.pytests/unit/engine/pipeline/test_forecast_gate.pytests/unit/budget/test_pareto.pysrc/synthorg/budget/forecaster.pysrc/synthorg/engine/agent_engine_errors.pysrc/synthorg/engine/pipeline/forecast_gate.pysrc/synthorg/engine/pipeline/entry/boot.pysrc/synthorg/budget/pareto.pytests/unit/budget/test_forecaster.pytests/unit/budget/test_forecast_model_invariants.pytests/conformance/persistence/test_cost_forecast_repository.pysrc/synthorg/api/controllers/budget_forecast.pysrc/synthorg/persistence/postgres/cost_forecast_repo.py
🔇 Additional comments (21)
src/synthorg/persistence/sqlite/cost_forecast_repo.py (1)
480-493: LGTM!src/synthorg/api/controllers/budget_forecast.py (2)
22-22: LGTM!Also applies to: 32-32, 139-139, 190-190, 221-221, 250-250
280-285: LGTM!src/synthorg/api/controllers/setup/agent_helpers.py (1)
170-185: LGTM!src/synthorg/budget/benchmark_stub.py (1)
87-94: LGTM!src/synthorg/budget/forecast_models.py (1)
187-213: LGTM!src/synthorg/budget/forecaster.py (1)
44-47: LGTM!Also applies to: 148-152, 249-253
src/synthorg/budget/pareto.py (1)
257-264: LGTM!Also applies to: 289-293
src/synthorg/engine/agent_engine_errors.py (1)
9-9: LGTM!Also applies to: 297-304
src/synthorg/engine/pipeline/entry/boot.py (1)
26-27: LGTM!Also applies to: 63-80, 130-131, 193-194, 294-295
src/synthorg/persistence/postgres/cost_forecast_repo.py (1)
445-457: LGTM!tests/conformance/persistence/test_cost_forecast_repository.py (1)
198-199: LGTM!Also applies to: 275-280, 382-394
tests/unit/api/test_budget_forecast_controller.py (1)
22-23: LGTM!Also applies to: 152-167
tests/unit/budget/test_forecast_model_invariants.py (1)
57-79: LGTM!tests/unit/budget/test_forecaster.py (1)
15-16: LGTM!Also applies to: 61-63, 79-81, 88-90, 108-109, 132-133, 147-149, 160-162, 182-184, 191-193, 198-200, 231-233
tests/unit/budget/test_hard_ceiling.py (1)
163-168: LGTM!tests/unit/budget/test_pareto.py (1)
15-15: LGTM!Also applies to: 57-57, 60-65, 73-73, 92-92, 114-114, 131-131, 155-155, 158-163, 176-176, 179-179
tests/unit/engine/pipeline/test_forecast_gate.py (1)
15-15: LGTM!Also applies to: 17-17, 26-26, 51-55, 148-148, 203-203, 231-231, 253-280, 285-285
web/src/api/endpoints/budget.ts (1)
78-79: LGTM!Also applies to: 83-84, 88-88, 91-91, 98-98, 102-102, 109-109, 113-113, 120-120, 124-124
web/src/mocks/handlers/budget.ts (1)
4-5: LGTM!Also applies to: 8-11, 126-128, 131-131, 134-138, 142-149, 154-161, 166-172
web/src/stores/tasks.ts (1)
322-326: LGTM!Also applies to: 439-443
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/synthorg/engine/pipeline/forecast_gate.py`:
- Around line 143-162: The current logic only reuses a PENDING forecast when the
caller provided the matching forecast_id; before calling _issue_fresh_forecast()
you must lookup any existing row with decision == ForecastDecision.PENDING for
the incoming brief_hash regardless of the caller-provided forecast_id, and if
found call _log_approval_required(existing) and raise
CostForecastApprovalRequiredError populated with existing.forecast_id,
existing.brief_hash, existing.estimated_cost and existing.currency so the stable
pending forecast is reused instead of attempting to create a duplicate;
implement this lookup where the current existing/decision check occurs and
branch to raise the existing pending error path before minting a fresh forecast.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: 71500f71-316a-4509-8204-ba3ac1bb215c
📒 Files selected for processing (2)
src/synthorg/engine/pipeline/forecast_gate.pytests/unit/engine/pipeline/test_forecast_gate.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build Backend
- GitHub Check: Dashboard Test
- GitHub Check: Test Integration
- GitHub Check: Test E2E
- GitHub Check: Test Unit
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Lighthouse Site
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Test markers:
@pytest.mark.{unit,integration,e2e,slow}; async tests are auto; 30s global timeout; 80% minimum coverage requiredUse xdist
-n 8 --dist=loadfile(auto-applied via pyprojectaddopts); on Windows useWindowsSelectorEventLoopPolicyfor unit tests; subprocess tests override backUse FakeClock from
tests._sharedfor Clock seam injection; usemock_of[T](**overrides)for typed-boundary substitutions; useSimpleNamespacefor attribute-bags; bareMagicMockat typed boundaries is prohibited (enforced byscripts/check_mock_spec.py)Hypothesis: 10 deterministic CI examples; test failures are real bugs (fix + add
@example(...)); never skip/xfail flaky tests—fix fundamentally
Files:
tests/unit/engine/pipeline/test_forecast_gate.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/engine/pipeline/test_forecast_gate.py
!(web/public/provider-logos)/**
📄 CodeRabbit inference engine (CLAUDE.md)
Never use real vendor names in project code/tests; use
example-provider,test-provider,example-{large,medium,small}-001(allowed in.claude/, third-party imports,providers/presets.py,web/public/provider-logos/)
Files:
tests/unit/engine/pipeline/test_forecast_gate.pysrc/synthorg/engine/pipeline/forecast_gate.py
src/synthorg/!(persistence)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only
src/synthorg/persistence/may import sqlite/psycopg or emit raw SQL
Files:
src/synthorg/engine/pipeline/forecast_gate.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Configuration precedence: DB > env > code default via
SettingsService/ConfigResolver(Cat-1) or env > code default (Cat-2,read_only_post_init); Cat-3 bootstrap secrets are pure env; noos.environ.getoutside startup; pre-init Cat-2 reads usesettings.bootstrap_resolver.resolve_init_value
Files:
src/synthorg/engine/pipeline/forecast_gate.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Numerics must live in
settings/definitions/; allowlist 0, 1, -1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants of the formNAME: int|float|Final|Final[int]|Final[float] = literalUse comments to explain WHY only; no reviewer citations, issue back-refs, or migration framing
Do not use
from __future__ import annotations(Python 3.14 has PEP 649); use PEP 758except A, B:syntax without parens unless bindingType hints required on public functions; mypy strict mode enforced; use Google-style docstrings; maximum line length 88; functions under 50 lines; files under 800 lines
Define domain errors as
<Domain><Condition>Errorinheriting fromDomainError, never fromException/RuntimeErrordirectly; enforced bycheck_domain_error_hierarchy.pyPydantic v2: every frozen model must have
extra="forbid";@computed_fieldauto-exempt; per-line# lint-allow: frozen-extra-forbid -- <reason>for exceptions; use@computed_fieldfor derived fields; useNotBlankStrfor identifiersUse args models at every system boundary; call
parse_typed()for every external dict ingestion; enforced bycheck_boundary_typed.pyEnforce immutability via
model_copy(update=...)orcopy.deepcopy(); deepcopy at system boundariesUse
asyncio.TaskGroupfor async fan-out/fan-in; helpers must catchExceptionand re-raiseMemoryError/RecursionErrorUse Clock seam pattern:
clock: Clock | None = None; tests injectFakeClock; lifecycle services own_lifecycle_lock; timed-out stops mark unrestartableWrap untrusted content via
wrap_untrusted()fromengine.prompt_safety; useHTMLParseGuardfor HTML (SEC-1)Use
from synthorg.observability import get_loggerwith variable always namedlogger; neverimport loggingorprint()in app codeUse event names from
observability.events.<domain>constants; log with structured kwargs (logger.info(EVENT, key=value))Log error paths with WARNING/ERROR + context before raisi...
Files:
src/synthorg/engine/pipeline/forecast_gate.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/engine/pipeline/forecast_gate.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:34:24.704Z
Learning: Read design specification from `docs/design/` before implementing; deviations require approval
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:34:24.704Z
Learning: Present every plan for accept/deny before coding (MANDATORY Planning gate)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:34:24.704Z
Learning: No region/locale/currency should be privileged; use metric units; use British English
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:34:24.704Z
Learning: Every convention PR must ship its enforcement gate
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:34:24.704Z
Learning: Commit messages: `<type>: <description>` (feat/fix/refactor/docs/test/chore/perf/ci); commitizen-enforced; signed commits required on protected refs
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:34:24.704Z
Learning: Branch naming: `<type>/<slug>` from main
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:34:24.704Z
Learning: Squash merge with PR body becoming squash commit; include trailers (`Release-As`, `Closes `#N``) in PR body
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:34:24.704Z
Learning: Use `.pre-commit-config.yaml` for pre-commit/pre-push hooks; tool-call gates in `.claude/settings.json` PreToolUse (`scripts/check_*.sh`/`.py`)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:34:24.704Z
Learning: After every squash merge → `/post-merge-cleanup` command
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:34:24.704Z
Learning: CLI features go in Docker (init/start/stop/status commands); new features go in dashboard + REST API
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:34:24.704Z
Learning: Use `go -C cli` (never `cd cli`) for Go CLI commands; see `cli/CLAUDE.md` for CLI-specific guidelines
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:34:24.704Z
Learning: See `shell` rules in `~/.claude/rules/common/bash.md` (canonical for `cd` / `git -C` / Bash file-write rules)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
tests/unit/engine/pipeline/test_forecast_gate.pysrc/synthorg/engine/pipeline/forecast_gate.py
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2029 +/- ##
==========================================
- Coverage 84.96% 84.95% -0.01%
==========================================
Files 1981 1991 +10
Lines 117187 118250 +1063
Branches 9931 10014 +83
==========================================
+ Hits 99569 100462 +893
- Misses 15166 15309 +143
- Partials 2452 2479 +27 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/engine/pipeline/forecast_gate.py (1)
151-180: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winBring
ForecastGate.run()back under the 50-line body limit.The new pending-reuse branch pushes
run()to 61 body lines (Lines 120-180). Please collapse the duplicated approval-required path and hoist the shared signal/hash work so this method gets back under the repo limit.As per coding guidelines:
src/synthorg/**/*.py: Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/synthorg/engine/pipeline/forecast_gate.py` around lines 151 - 180, ForecastGate.run has grown past the 50-line limit because the approval-required flow is duplicated for pending vs fresh forecasts; refactor by hoisting the shared signal and brief_hash work (use _signal_from_work_item and compute_brief_hash once at the top), then centralize the approval path so both cases funnel through the same approval handling: call _pending_forecast_for_brief to get pending, if present set a local variable approved_forecast = pending else call _forecaster.forecast and await _forecast_repo.save to set approved_forecast; finally call the single approval handler (_raise_approval_required or _log_approval_required as appropriate) and raise CostForecastApprovalRequiredError using fields from approved_forecast (forecast_id, brief_hash, estimated_cost, currency), keeping ForecastGate.run under 50 lines and preserving existing behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/synthorg/engine/pipeline/forecast_gate.py`:
- Around line 160-166: The TOCTOU occurs because after calling
_pending_forecast_for_brief and creating fresh via _forecaster.forecast,
_forecast_repo.save(fresh) can fail due to a duplicate pending-row insert;
change the save path so that when _forecast_repo.save(fresh) raises the
repository-specific duplicate-pending/conflict error (the same condition that
indicates the partial-unique pending index violation), catch that exception,
re-query using _pending_forecast_for_brief(compute_brief_hash(signal)) to get
the existing pending row, and then call _raise_approval_required(pending) with
that existing row instead of letting the save error bubble up; keep the normal
flow (saving and proceeding) for non-conflict errors.
---
Outside diff comments:
In `@src/synthorg/engine/pipeline/forecast_gate.py`:
- Around line 151-180: ForecastGate.run has grown past the 50-line limit because
the approval-required flow is duplicated for pending vs fresh forecasts;
refactor by hoisting the shared signal and brief_hash work (use
_signal_from_work_item and compute_brief_hash once at the top), then centralize
the approval path so both cases funnel through the same approval handling: call
_pending_forecast_for_brief to get pending, if present set a local variable
approved_forecast = pending else call _forecaster.forecast and await
_forecast_repo.save to set approved_forecast; finally call the single approval
handler (_raise_approval_required or _log_approval_required as appropriate) and
raise CostForecastApprovalRequiredError using fields from approved_forecast
(forecast_id, brief_hash, estimated_cost, currency), keeping ForecastGate.run
under 50 lines and preserving existing behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: 069f5522-960c-4e1e-b60d-34649ca41277
📒 Files selected for processing (2)
src/synthorg/engine/pipeline/forecast_gate.pytests/unit/engine/pipeline/test_forecast_gate.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build Backend
- GitHub Check: Dashboard Test
- GitHub Check: Test E2E
- GitHub Check: Test Unit
- GitHub Check: Test Integration
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Lighthouse Site
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Test markers:
@pytest.mark.{unit,integration,e2e,slow}; asyncauto; timeout 30s global; coverage 80% minWindows unit tests use
WindowsSelectorEventLoopPolicy; subprocess tests override back (3.14 IOCP teardown race fix)Test doubles: use FakeClock for Clock seam,
mock_of[T](**overrides)for typed-boundary substitutions,SimpleNamespacefor attribute-bags; bareMagicMockat typed boundary is blocked bycheck_mock_spec.py
FakeClockandmock_ofimport fromtests._shared; inject viaclock=and helper's spec subscriptHypothesis: 10 deterministic CI examples; failures are real bugs (fix + add
@example(...)); never skip/xfail flaky tests, fix fundamentallyUse
asyncio.Event().wait()notsleep(large)to avoid flaky async tests
Files:
tests/unit/engine/pipeline/test_forecast_gate.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/engine/pipeline/test_forecast_gate.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Configuration precedence: DB > env > code default via
SettingsService/ConfigResolver(Cat-1) or env > code default (Cat-2,read_only_post_init); Cat-3 bootstrap secrets are pure env at boot site; noos.environ.getoutside startupNumerics must live in
settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal)Type hints on public functions; mypy strict; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Error classes must follow
<Domain><Condition>Errornaming and inherit fromDomainError, never directly fromException/RuntimeErrorPydantic v2 frozen models must have
extra="forbid"on every frozen model; use@computed_fieldfor derived values; useNotBlankStrfor identifiersUse args models at every system boundary; call
parse_typed()for every external dict ingestionFor immutability: use
model_copy(update=...)orcopy.deepcopy(); deepcopy at system boundariesUse
asyncio.TaskGroupfor async fan-out/fan-in; helpers must catchExceptionand re-raiseMemoryError/RecursionErrorServices must own
_lifecycle_lock; timed-out stops mark services as unrestartableWrap untrusted content via
wrap_untrusted()fromengine.prompt_safety; useHTMLParseGuardfor HTML (SEC-1)Use
from synthorg.observability import get_logger; variable name alwayslogger; never useimport loggingorprint()in app codeEvent names must come from
observability.events.<domain>constants; use structured kwargs (logger.info(EVENT, key=value))Error paths log WARNING/ERROR with context before raising; state transitions log INFO via
*_STATUS_TRANSITIONEDAFTER persistence writeNever log error as
error=str(exc)or interpolate{exc}; useerror_type=type(exc).__name__+error=safe_error_description(exc); never useexc_info=True; for OTel use `span.set_attribute("exception.m...
Files:
src/synthorg/engine/pipeline/forecast_gate.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/engine/pipeline/forecast_gate.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:52:45.960Z
Learning: Read `docs/design/` page before implementing; deviations need approval per DESIGN_SPEC.md
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:52:45.960Z
Learning: Present every plan for accept/deny before coding
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:52:45.960Z
Learning: No region/locale privileged; use metric units; British English for all documentation and code
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:52:45.960Z
Learning: Every convention PR must ship its enforcement gate per convention-gates.md
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:52:45.960Z
Learning: After issue: branch + commit + push (no auto-PR); use `/pre-pr-review`; `gh pr create` is blocked by `scripts/check_no_pr_create.sh`
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:52:45.960Z
Learning: Commits: `<type>: <description>` (feat/fix/refactor/docs/test/chore/perf/ci); commitizen-enforced
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:52:45.960Z
Learning: Signed commits required on protected refs (GPG/SSH or GitHub App via `synthorg-repo-bot`)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:52:45.960Z
Learning: Branches: `<type>/<slug>` from main
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:52:45.960Z
Learning: See `web/CLAUDE.md` for web guidelines; see `cli/CLAUDE.md` for CLI guidelines (use `go -C cli`, never `cd cli`); see `~/.claude/rules/common/bash.md` for shell rules (canonical for `cd` / `git -C` / Bash file-write rules)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:52:45.960Z
Learning: Squash merge with PR body becoming squash commit; trailers (`Release-As`, `Closes `#N``) must be in PR body
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-20T23:52:45.960Z
Learning: After every squash merge → `/post-merge-cleanup`
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
tests/unit/engine/pipeline/test_forecast_gate.pysrc/synthorg/engine/pipeline/forecast_gate.py
🔇 Additional comments (1)
tests/unit/engine/pipeline/test_forecast_gate.py (1)
26-27: LGTM!Also applies to: 121-136, 234-257
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/engine/pipeline/test_forecast_gate.py (1)
225-242: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueTest exercises brief-hash mismatch rather than PENDING-reuse path.
The forecast's
brief_hash="a" * 64differs from_BRIEF_HASHcomputed from the work item. This causes_forecast_covers_briefto returnFalse, falling through to_forecast_for_brief()(minting a fresh row) rather than exercising theexisting.decision is ForecastDecision.PENDINGbranch at line 136–137 in the implementation.The actual PENDING-reuse scenario (matching brief_hash) is correctly covered by
test_pending_forecast_covering_brief_is_reused. Consider renaming this test to clarify it exercises the "stale linked forecast" path, or update to usebrief_hash=_BRIEF_HASHif the intent is to test PENDING with a matching brief.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/engine/pipeline/test_forecast_gate.py` around lines 225 - 242, The test test_pending_forecast_raises_approval_required is exercising the stale-linked-forecast path because the created Forecast uses brief_hash="a"*64 which doesn't match the work item's computed _BRIEF_HASH, causing _forecast_covers_brief to return False and skipping the ForecastDecision.PENDING reuse branch; to fix, either set the existing Forecast.brief_hash to _BRIEF_HASH so the test exercises the PENDING reuse path (and asserts CostForecastApprovalRequiredError) or, if the intent is to test the stale-link path, rename the test to reflect that behavior (e.g., test_stale_linked_forecast_raises_approval_required) and adjust assertions accordingly so the intent is clear when inspecting _forecast_covers_brief, ForecastDecision.PENDING, and _forecast_for_brief.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@tests/unit/engine/pipeline/test_forecast_gate.py`:
- Around line 225-242: The test test_pending_forecast_raises_approval_required
is exercising the stale-linked-forecast path because the created Forecast uses
brief_hash="a"*64 which doesn't match the work item's computed _BRIEF_HASH,
causing _forecast_covers_brief to return False and skipping the
ForecastDecision.PENDING reuse branch; to fix, either set the existing
Forecast.brief_hash to _BRIEF_HASH so the test exercises the PENDING reuse path
(and asserts CostForecastApprovalRequiredError) or, if the intent is to test the
stale-link path, rename the test to reflect that behavior (e.g.,
test_stale_linked_forecast_raises_approval_required) and adjust assertions
accordingly so the intent is clear when inspecting _forecast_covers_brief,
ForecastDecision.PENDING, and _forecast_for_brief.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: 86978907-d6ea-40cd-93d4-82ece6abc953
📒 Files selected for processing (2)
src/synthorg/engine/pipeline/forecast_gate.pytests/unit/engine/pipeline/test_forecast_gate.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Test Unit
- GitHub Check: Dashboard Test
- GitHub Check: Test E2E
- GitHub Check: Test Integration
- GitHub Check: Lighthouse Site
- GitHub Check: Lighthouse Dashboard
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,pyi}
📄 CodeRabbit inference engine (CLAUDE.md)
Read
docs/design/page before implementing; deviations need approval per DESIGN_SPEC.mdOnly
src/synthorg/persistence/may import sqlite/psycopg or emit raw SQL; new repository protocols must inherit from generic categories inpersistence/_generics.py(SingletonRepository,IdKeyedRepository,FilteredQueryRepository,AppendOnlyRepository,StatefulRepository,MVCCRepository)No hardcoded numeric values; numerics must live in
settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants of the formNAME: int|float|Final|Final[int]|Final[float] = literal; enforced byscripts/check_no_magic_numbers.pyNever use
from __future__ import annotations(Python 3.14 has PEP 649); use PEP 758 except:except A, B:requires parens when bindingType hints on public functions; mypy strict mode; Google-style docstrings; line length 88; functions <50 lines; files <800 lines
Error classes must follow
<Domain><Condition>Errornaming and inherit fromDomainError; never inherit directly fromException,RuntimeError, etc.; enforced bycheck_domain_error_hierarchy.pyAll Pydantic v2 frozen models must have
extra="forbid"project-wide; use@computed_fieldfor derived fields; useNotBlankStrfor identifiers;@computed_fieldis auto-exempt fromfrozen-extra-forbidgate; per-line# lint-allow: frozen-extra-forbid -- <reason>forextra="allow"/"ignore"boundaries; enforced bycheck_frozen_model_extra_forbid.pyArgs models required at every system boundary; use
parse_typed()for every external dict ingestion; enforced bycheck_boundary_typed.pyFor immutability, use
model_copy(update=...)orcopy.deepcopy(); apply deepcopy at system boundariesAsync fan-out/fan-in must use
asyncio.TaskGroup; helpers must catchExceptionand re-raiseMemoryError/RecursionErrorClock seam: include
clock: Clock | None = Noneparameter; tests injectFakeClock; lifecycle: ...
Files:
tests/unit/engine/pipeline/test_forecast_gate.pysrc/synthorg/engine/pipeline/forecast_gate.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use markers:
@pytest.mark.{unit,integration,e2e,slow}; async tests useauto; global timeout 30s; coverage minimum 80%; xdist-n 8 --dist=loadfileauto-applied via pyprojectaddoptsWindows: unit tests use
WindowsSelectorEventLoopPolicy(3.14 IOCP teardown race); subprocess tests override backTest doubles ladder:
FakeClockfor Clock seam,mock_of[T](**overrides)for typed-boundary substitutions,SimpleNamespacefor attribute-bags; bareMagicMockat a typed boundary (constructor / fn arg / annotated local / typed fixture return) is blocked byscripts/check_mock_spec.py(zero-tolerance, no baseline)Import
FakeClockandmock_offromtests._shared; inject viaclock=and the helper's spec subscriptHypothesis: 10 deterministic CI examples; failures are real bugs (fix + add
@example(...)); never skip/xfail flaky tests; fix fundamentally; useasyncio.Event().wait()notsleep(large)
Files:
tests/unit/engine/pipeline/test_forecast_gate.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/engine/pipeline/test_forecast_gate.py
**/*.{py,js,jsx,ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
No region/currency/locale privileged; use metric units; British English throughout; enforced per docs/reference/regional-defaults.md
Files:
tests/unit/engine/pipeline/test_forecast_gate.pysrc/synthorg/engine/pipeline/forecast_gate.py
**/*
📄 CodeRabbit inference engine (CLAUDE.md)
Git commits:
<type>: <description>(feat/fix/refactor/docs/test/chore/perf/ci); commitizen-enforced; signed commits required on protected refs (GPG/SSH or GitHub App via synthorg-repo-bot); branches:<type>/<slug>from main
Files:
tests/unit/engine/pipeline/test_forecast_gate.pysrc/synthorg/engine/pipeline/forecast_gate.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
API startup lifecycle: two phases: construction (
create_appbody) wires synchronous services; on_startup (_build_lifecycle.on_startup) wires services needing connected persistence; ordering invariants:agent_registrybeforeauto_wire_meetings,tunnel_providerunconditional; on-startup:SettingsServicebeforeWorkflowExecutionObserver,OntologyServiceafterpersistence.connect(), cost-dial services after persistence via_try_wire_cost_dialRuntime services:
synthorg.workers.runtime_builder.build_runtime_servicesselects behind ONE provider-present switch, returnsRuntimeServicespair (worker execution service + multi-agent coordinator);_install_runtime_serviceshook installs both viaAppStateseams appended FIRST; empty-company rejects task creation (4014 AgentRuntimeNotConfiguredError) and /coordinate 503s;swap_*methods hold lock against lazy readsSetup completion:
post_setup_reinit()(provider reload, agent bootstrap, runtime-services rebuild + dual hot-swap) propagates failures;settings_svc.set()only if reinit clean; whole sequence underCOMPLETE_LOCKto prevent concurrent race on flag writeTelemetry: opt-in, off by default; every event property must be in
_ALLOWED_PROPERTIES; detail in telemetry.mdProvider resilience: calls go through
BaseCompletionProvider(retry + rate limit); never implement retry in driver subclasses; retryable:RateLimitError,Provider{Timeout,Connection,Internal}Error; WebSocket: per-frame timeout closes silent peers (1008), revalidation saturation closes (4011)Conversational propose:
POST /meta/chat/proposeis opt-in (meta.chief_of_staff.propose_enabled);ChiefOfStaffProposerbuilt bybuild_chief_of_staff_proposer(ENFORCED manifest entry, EPIC#1955discipline) and 503s when ANY of provider, persistence, or work pipeline missing; human content wrapped viawrap_untrusted(TAG_TASK_DATA, ...)(SEC-1)
Files:
src/synthorg/engine/pipeline/forecast_gate.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/engine/pipeline/forecast_gate.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-21T00:10:24.563Z
Learning: Present every plan for accept/deny before coding
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-21T00:10:24.563Z
Learning: After every issue: create branch + commit + push (no auto-PR); use `/pre-pr-review` (gh pr create blocked by scripts/check_no_pr_create.sh); after PR: use `/aurelio-review-pr` for external feedback; fix EVERYTHING valid; no deferring
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-21T00:10:24.563Z
Learning: After every squash merge, run `/post-merge-cleanup`
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-21T00:10:24.563Z
Learning: Pre-commit/pre-push hooks: `.pre-commit-config.yaml`; tool-call gates: `.claude/settings.json` PreToolUse (`scripts/check_*.sh`/`.py`)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-21T00:10:24.563Z
Learning: Squash merge; PR body becomes squash commit; trailers (`Release-As`, `Closes `#N``) must be in PR body
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-21T00:10:24.563Z
Learning: CLI is Docker-only (init/start/stop/status); features go in dashboard and REST API
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
tests/unit/engine/pipeline/test_forecast_gate.pysrc/synthorg/engine/pipeline/forecast_gate.py
🔇 Additional comments (7)
src/synthorg/engine/pipeline/forecast_gate.py (4)
26-26: LGTM!
135-156: LGTM!
158-182: LGTM!
200-208: LGTM!tests/unit/engine/pipeline/test_forecast_gate.py (3)
17-17: LGTM!
140-176: LGTM!
298-319: LGTM!
<!-- HIGHLIGHTS_START --> ## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - Introduced conversational interface for direct clarify and propose interactions. - Cost management now includes forecast gates, hard ceilings, and Pareto considerations. - Added living documentation engine combining wiki and retrieval-augmented generation features. - Real intake engine is now operational for live data processing. - Virtual desktop tool with vision verification gate available for enhanced workspace control. ### What's new - Per-project reproducible environments for consistent setups. - Headless browser testing tool integrated for automated UI validation. - Governed external API and data access tool introduced. - Hardened external-remote git backend with sandbox mounts and push-queue dispatching. - Adversarial red-team gate subsystem for enhanced security testing. - Self-extending toolkit to dynamically expand capabilities. - Stakes-aware model routing enables prioritized processing. - Task-board entry adapter connects live runtime with project management. - Persistent project workspace with pluggable git backend and per-project push queues implemented. - Knowledge and provenance substrate added to track data lineage. - Scoring and data contract framework for golden-company benchmark evaluations. ### Under the hood - Desktop Dockerfile pinned by digest to improve build stability and documented publishing gap fixed. <!-- HIGHLIGHTS_END --> :robot: I have created a release *beep* *boop* --- ## [0.8.7](v0.8.6...v0.8.7) (2026-05-22) ### Features * conversational interface v1 - 1:1 clarify + propose ([#2019](#2019)) ([216ef94](216ef94)), closes [#1968](#1968) * cost as a first-class dial (forecast gate, hard ceiling, Pareto) ([#2029](#2029)) ([700a59e](700a59e)), closes [#1982](#1982) * **env:** reproducible per-project environments ([#2039](#2039)) ([d2c0ef9](d2c0ef9)), closes [#1994](#1994) * **evals:** [#1980](#1980) spine -- scoring + data contract for golden-company benchmark ([#2025](#2025)) ([53108e8](53108e8)) * goal/objective entry adapter ([#1964](#1964)) ([#2022](#2022)) ([cb15c3c](cb15c3c)) * governed external API/data access tool ([#1991](#1991)) ([#2032](#2032)) ([e08b451](e08b451)) * harden external-remote git backend + per-project sandbox mount + push-queue dispatch ([#2020](#2020)) ([#2030](#2030)) ([2fa2e1e](2fa2e1e)) * headless browser testing tool ([#1992](#1992)) ([#2024](#2024)) ([277b52a](277b52a)) * knowledge + provenance substrate ([#2036](#2036)) ([48c897b](48c897b)) * living documentation engine (dual-purpose wiki + RAG namespace) ([#2028](#2028)) ([3d10da9](3d10da9)), closes [#1976](#1976) * real intake engine online ([#2017](#2017)) ([9d8eb34](9d8eb34)) * **redteam:** adversarial red-team gate subsystem ([#1986](#1986)) ([#2026](#2026)) ([d2207e9](d2207e9)) * self-extending toolkit ([#1995](#1995)) ([#2035](#2035)) ([5ffc545](5ffc545)) * stakes-aware model routing ([#1998](#1998)) ([#2038](#2038)) ([9b98312](9b98312)) * task-board entry adapter to live runtime ([#1963](#1963)) ([#2023](#2023)) ([a8f1eea](a8f1eea)) * virtual desktop tool and vision verifier gate ([#2031](#2031)) ([dfe8b42](dfe8b42)), closes [#1993](#1993) * **workspace:** persistent project workspace + pluggable git backend + per-project push queue ([#2021](#2021)) ([ee58ee7](ee58ee7)) ### Bug Fixes * pin desktop Dockerfile by digest (Scorecard [#309](#309)) + document publish gap ([#2034](#2034)) ([8fda188](8fda188)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
Makes cost a prospective, operator-facing control rather than a passive ledger (child of EPIC #1979).
CostForecaster(hybrid static-prior + Bayesian-shrinkage estimate with an uncertainty band) feeds aForecastGateat the work-entry seam. Whenforecast_requiredis set, a brief cannot dispatch without an operator-approvedForecastrow; missing/pending yields a fresh pending row and a 402. Newcost_forecaststable (dual-backend, conformance-tested) with apending → approved | rejected | supersededstate machine and a partial-unique-pending index.BudgetCheckerraisesRunHardCeilingExceededError(subclass ofBudgetExhaustedError) the moment accumulated cost meets the per-taskhard_ceiling(globalrun_hard_ceilingfallback). The engine routes toTerminationReason.PARKEDviaApprovalGate.park_context, stamps aHaltContextonto the forecast row, and the operator raises the ceiling (raise_ceiling, rejected byRunHardCeilingTooLowErrorif it does not clear accumulated cost) to resume. The approvedforecast_id+ceiling_amountare propagated onto theTaskin the intake phase so the checker enforces the per-brief ceiling and the resume banner can render.ParetoAnalyzer+ a contractedBenchmarkScoreProvider(calibrated stub pending Golden-company benchmark (eval spine) #1980) produce a frontier ranked bycost_saving_pct; every point carries asourcefield the dashboard surfaces so stub data is never mistaken for measured data.BudgetForecastDialog,CostForecastApprovalCard,HardCeilingHaltedBanner,ParetoSection, and abudgetForecastZustand store, wired intoBudgetPage.All four cost-dial symbols are
ENFORCEDin the ghost-wiring manifest.Test plan
forecast_idpropagation, park/halt-stamp degradation, Pareto frontier, domain-error hierarchy, model invariant validators, forecast-controller logic (503, raise-ceiling 422 + resume).cost_forecastsdual-backend (SQLite + Postgres) CRUD, state machine, same-currency invariant,halt_contextround-trip,count().tests/integration/test_cost_dial_e2e.py— full lifecycle (forecast → approve → dispatch → ceiling → PARKED → raise → resume → Pareto), validated locally on both backends.mypy,ruff, web type-check/lint/tests, schema-drift, dto-sync, and every convention gate pass on push.Review coverage
Pre-reviewed via
/pre-pr-review(18 agents). 18 valid findings addressed (2 CRITICAL production-integration gaps —forecast_id+ approved ceiling not reaching theTask; 5 MAJOR; 9 MEDIUM; 3 LOW); 4 false positives rejected with evidence. Two real e2e test bugs (un-awaited async handler, syncpark_contextdouble) caught and fixed.Closes #1982