Merged
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -90,7 +90,7 @@ See `web/CLAUDE.md` for the full component inventory, design token rules, and po
- **Every module** with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`
- **Never** use `import logging` / `logging.getLogger()` / `print()` in application code (exception: `observability/setup.py`, `observability/sinks.py`, `observability/syslog_handler.py`, `observability/http_handler.py`, and `observability/otlp_handler.py` may use stdlib `logging` and `print(..., file=sys.stderr)` for handler construction, bootstrap, and error reporting code that runs before or during logging system configuration)
- **Variable name**: always `logger` (not `_logger`, not `log`)
- **Event names**: always use constants from the domain-specific module under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`, `GIT_COMMAND_START` from `events.git`, `CONTEXT_BUDGET_FILL_UPDATED`, `CONTEXT_BUDGET_COMPACTION_STARTED`, `CONTEXT_BUDGET_COMPACTION_COMPLETED`, `CONTEXT_BUDGET_COMPACTION_FAILED`, `CONTEXT_BUDGET_COMPACTION_SKIPPED`, `CONTEXT_BUDGET_COMPACTION_FALLBACK`, `CONTEXT_BUDGET_INDICATOR_INJECTED`, `CONTEXT_BUDGET_AGENT_COMPACTION_REQUESTED`, `CONTEXT_BUDGET_EPISTEMIC_MARKERS_PRESERVED` from `events.context_budget`, `BACKUP_STARTED` from `events.backup`, `SETUP_COMPLETED` from `events.setup`, `ROUTING_CANDIDATE_SELECTED` from `events.routing`, `SHIPPING_HTTP_BATCH_SENT` from `events.shipping`, `EVAL_REPORT_COMPUTED` from `events.evaluation`, `PROMPT_PROFILE_SELECTED` from `events.prompt`, `PROCEDURAL_MEMORY_START` from `events.procedural_memory`, `PERF_LLM_JUDGE_STARTED` from `events.performance`, `TASK_ENGINE_OBSERVER_FAILED` from `events.task_engine`, `WORKFLOW_EXEC_COMPLETED` from `events.workflow_execution`, `BLUEPRINT_INSTANTIATE_START` from `events.blueprint`, `WORKFLOW_DEF_ROLLED_BACK` from `events.workflow_definition`, `WORKFLOW_VERSION_SAVED` from `events.workflow_version`, `MEMORY_FINE_TUNE_STARTED`, `MEMORY_SELF_EDIT_TOOL_EXECUTE`, `MEMORY_SELF_EDIT_CORE_READ`, `MEMORY_SELF_EDIT_CORE_WRITE`, `MEMORY_SELF_EDIT_CORE_WRITE_REJECTED`, `MEMORY_SELF_EDIT_ARCHIVAL_SEARCH`, `MEMORY_SELF_EDIT_ARCHIVAL_WRITE`, `MEMORY_SELF_EDIT_RECALL_READ`, `MEMORY_SELF_EDIT_RECALL_WRITE`, `MEMORY_SELF_EDIT_WRITE_FAILED` from `events.memory`, `REPORTING_GENERATION_STARTED` from `events.reporting`, `RISK_BUDGET_SCORE_COMPUTED` from `events.risk_budget`, `LLM_STRATEGY_SYNTHESIZED` and `DISTILLATION_CAPTURED` from `events.consolidation`, `MEMORY_DIVERSITY_RERANKED`, `MEMORY_DIVERSITY_RERANK_FAILED`, and `MEMORY_REFORMULATION_ROUND` from `events.memory`, `NOTIFICATION_DISPATCHED` and 
`NOTIFICATION_DISPATCH_FAILED` from `events.notification`, `QUALITY_STEP_CLASSIFIED` from `events.quality`, `HEALTH_TICKET_EMITTED` from `events.health`, `TRAJECTORY_SCORING_START` from `events.trajectory`, `COORD_METRICS_AMDAHL_COMPUTED` from `events.coordination_metrics`, `COORDINATION_STARTED`, `COORDINATION_COMPLETED`, `COORDINATION_FAILED`, `COORDINATION_PHASE_STARTED`, `COORDINATION_PHASE_COMPLETED`, `COORDINATION_PHASE_FAILED`, `COORDINATION_WAVE_STARTED`, `COORDINATION_WAVE_COMPLETED`, `COORDINATION_TOPOLOGY_RESOLVED`, `COORDINATION_CLEANUP_STARTED`, `COORDINATION_CLEANUP_COMPLETED`, `COORDINATION_CLEANUP_FAILED`, `COORDINATION_WAVE_BUILT`, `COORDINATION_FACTORY_BUILT`, and `COORDINATION_ATTRIBUTION_BUILT` from `events.coordination`, `WEB_REQUEST_START` and `WEB_SSRF_BLOCKED` from `events.web`, `DB_QUERY_START` and `DB_WRITE_BLOCKED` from `events.database`, `TERMINAL_COMMAND_START` and `TERMINAL_COMMAND_BLOCKED` from `events.terminal`, `SUB_CONSTRAINT_RESOLVED` and `SUB_CONSTRAINT_DENIED` from `events.sub_constraint`, `VERSION_SAVED` and `VERSION_SNAPSHOT_FAILED` from `events.versioning`, `ANALYTICS_AGGREGATION_COMPUTED` and `ANALYTICS_RETRY_RATE_ALERT` from `events.analytics`, `CALL_CLASSIFICATION_COMPUTED` from `events.call_classification`, `QUOTA_THRESHOLD_ALERT` and `QUOTA_POLL_FAILED` from `events.quota`, `CONFLICT_DEBATE_EVALUATOR_FAILED` from `events.conflict`, `DELEGATION_LOOP_CIRCUIT_BACKOFF` and `DELEGATION_LOOP_CIRCUIT_PERSIST_FAILED` from `events.delegation`, `MEETING_EVENT_COOLDOWN_SKIPPED` and `MEETING_TASKS_CAPPED` from `events.meeting`, `PERSISTENCE_CIRCUIT_BREAKER_SAVED`, `PERSISTENCE_CIRCUIT_BREAKER_SAVE_FAILED`, `PERSISTENCE_CIRCUIT_BREAKER_LOADED`, `PERSISTENCE_CIRCUIT_BREAKER_LOAD_FAILED`, `PERSISTENCE_CIRCUIT_BREAKER_DELETED`, and `PERSISTENCE_CIRCUIT_BREAKER_DELETE_FAILED` from `events.persistence`, `METRICS_SCRAPE_COMPLETED`, `METRICS_SCRAPE_FAILED`, `METRICS_COLLECTOR_INITIALIZED`, `METRICS_COORDINATION_RECORDED`, 
`METRICS_OTLP_EXPORT_COMPLETED` and `METRICS_OTLP_FLUSHER_STOPPED` from `events.metrics`, `ORG_MEMORY_QUERY_START`, `ORG_MEMORY_QUERY_COMPLETE`, `ORG_MEMORY_QUERY_FAILED`, `ORG_MEMORY_WRITE_START`, `ORG_MEMORY_WRITE_COMPLETE`, `ORG_MEMORY_WRITE_DENIED`, `ORG_MEMORY_WRITE_FAILED`, `ORG_MEMORY_POLICIES_LISTED`, `ORG_MEMORY_BACKEND_CREATED`, `ORG_MEMORY_CONNECT_FAILED`, `ORG_MEMORY_DISCONNECT_FAILED`, `ORG_MEMORY_NOT_CONNECTED`, `ORG_MEMORY_ROW_PARSE_FAILED`, `ORG_MEMORY_CONFIG_INVALID`, `ORG_MEMORY_MODEL_INVALID`, `ORG_MEMORY_MVCC_PUBLISH_APPENDED`, `ORG_MEMORY_MVCC_RETRACT_APPENDED`, `ORG_MEMORY_MVCC_SNAPSHOT_AT_QUERIED`, and `ORG_MEMORY_MVCC_LOG_QUERIED` from `events.org_memory`). Each domain has its own module -- see `src/synthorg/observability/events/` for the full inventory of constants. Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
- **Event names**: always use constants from the domain-specific module under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`, `GIT_COMMAND_START` from `events.git`, `CONTEXT_BUDGET_FILL_UPDATED`, `CONTEXT_BUDGET_COMPACTION_STARTED`, `CONTEXT_BUDGET_COMPACTION_COMPLETED`, `CONTEXT_BUDGET_COMPACTION_FAILED`, `CONTEXT_BUDGET_COMPACTION_SKIPPED`, `CONTEXT_BUDGET_COMPACTION_FALLBACK`, `CONTEXT_BUDGET_INDICATOR_INJECTED`, `CONTEXT_BUDGET_AGENT_COMPACTION_REQUESTED`, `CONTEXT_BUDGET_EPISTEMIC_MARKERS_PRESERVED` from `events.context_budget`, `BACKUP_STARTED` from `events.backup`, `SETUP_COMPLETED` from `events.setup`, `ROUTING_CANDIDATE_SELECTED` from `events.routing`, `SHIPPING_HTTP_BATCH_SENT` from `events.shipping`, `EVAL_REPORT_COMPUTED` from `events.evaluation`, `PROMPT_PROFILE_SELECTED` from `events.prompt`, `PROCEDURAL_MEMORY_START` from `events.procedural_memory`, `PERF_LLM_JUDGE_STARTED` from `events.performance`, `TASK_ENGINE_OBSERVER_FAILED` from `events.task_engine`, `TASK_ASSIGNMENT_PROJECT_FILTERED` and `TASK_ASSIGNMENT_PROJECT_NO_ELIGIBLE` from `events.task_assignment`, `WORKFLOW_EXEC_COMPLETED` from `events.workflow_execution`, `BLUEPRINT_INSTANTIATE_START` from `events.blueprint`, `WORKFLOW_DEF_ROLLED_BACK` from `events.workflow_definition`, `WORKFLOW_VERSION_SAVED` from `events.workflow_version`, `MEMORY_FINE_TUNE_STARTED`, `MEMORY_SELF_EDIT_TOOL_EXECUTE`, `MEMORY_SELF_EDIT_CORE_READ`, `MEMORY_SELF_EDIT_CORE_WRITE`, `MEMORY_SELF_EDIT_CORE_WRITE_REJECTED`, `MEMORY_SELF_EDIT_ARCHIVAL_SEARCH`, `MEMORY_SELF_EDIT_ARCHIVAL_WRITE`, `MEMORY_SELF_EDIT_RECALL_READ`, `MEMORY_SELF_EDIT_RECALL_WRITE`, `MEMORY_SELF_EDIT_WRITE_FAILED` from `events.memory`, `REPORTING_GENERATION_STARTED` from `events.reporting`, `RISK_BUDGET_SCORE_COMPUTED` from `events.risk_budget`, `BUDGET_PROJECT_COST_QUERIED`, `BUDGET_PROJECT_RECORDS_QUERIED`, `BUDGET_PROJECT_BUDGET_EXCEEDED`, and 
`BUDGET_PROJECT_ENFORCEMENT_CHECK` from `events.budget`, `LLM_STRATEGY_SYNTHESIZED` and `DISTILLATION_CAPTURED` from `events.consolidation`, `MEMORY_DIVERSITY_RERANKED`, `MEMORY_DIVERSITY_RERANK_FAILED`, and `MEMORY_REFORMULATION_ROUND` from `events.memory`, `NOTIFICATION_DISPATCHED` and `NOTIFICATION_DISPATCH_FAILED` from `events.notification`, `QUALITY_STEP_CLASSIFIED` from `events.quality`, `HEALTH_TICKET_EMITTED` from `events.health`, `TRAJECTORY_SCORING_START` from `events.trajectory`, `COORD_METRICS_AMDAHL_COMPUTED` from `events.coordination_metrics`, `COORDINATION_STARTED`, `COORDINATION_COMPLETED`, `COORDINATION_FAILED`, `COORDINATION_PHASE_STARTED`, `COORDINATION_PHASE_COMPLETED`, `COORDINATION_PHASE_FAILED`, `COORDINATION_WAVE_STARTED`, `COORDINATION_WAVE_COMPLETED`, `COORDINATION_TOPOLOGY_RESOLVED`, `COORDINATION_CLEANUP_STARTED`, `COORDINATION_CLEANUP_COMPLETED`, `COORDINATION_CLEANUP_FAILED`, `COORDINATION_WAVE_BUILT`, `COORDINATION_FACTORY_BUILT`, and `COORDINATION_ATTRIBUTION_BUILT` from `events.coordination`, `WEB_REQUEST_START` and `WEB_SSRF_BLOCKED` from `events.web`, `DB_QUERY_START` and `DB_WRITE_BLOCKED` from `events.database`, `TERMINAL_COMMAND_START` and `TERMINAL_COMMAND_BLOCKED` from `events.terminal`, `SUB_CONSTRAINT_RESOLVED` and `SUB_CONSTRAINT_DENIED` from `events.sub_constraint`, `VERSION_SAVED` and `VERSION_SNAPSHOT_FAILED` from `events.versioning`, `ANALYTICS_AGGREGATION_COMPUTED` and `ANALYTICS_RETRY_RATE_ALERT` from `events.analytics`, `CALL_CLASSIFICATION_COMPUTED` from `events.call_classification`, `QUOTA_THRESHOLD_ALERT` and `QUOTA_POLL_FAILED` from `events.quota`, `CONFLICT_DEBATE_EVALUATOR_FAILED` from `events.conflict`, `DELEGATION_LOOP_CIRCUIT_BACKOFF` and `DELEGATION_LOOP_CIRCUIT_PERSIST_FAILED` from `events.delegation`, `MEETING_EVENT_COOLDOWN_SKIPPED` and `MEETING_TASKS_CAPPED` from `events.meeting`, `PERSISTENCE_CIRCUIT_BREAKER_SAVED`, `PERSISTENCE_CIRCUIT_BREAKER_SAVE_FAILED`, `PERSISTENCE_CIRCUIT_BREAKER_LOADED`, 
`PERSISTENCE_CIRCUIT_BREAKER_LOAD_FAILED`, `PERSISTENCE_CIRCUIT_BREAKER_DELETED`, and `PERSISTENCE_CIRCUIT_BREAKER_DELETE_FAILED` from `events.persistence`, `METRICS_SCRAPE_COMPLETED`, `METRICS_SCRAPE_FAILED`, `METRICS_COLLECTOR_INITIALIZED`, `METRICS_COORDINATION_RECORDED`, `METRICS_OTLP_EXPORT_COMPLETED` and `METRICS_OTLP_FLUSHER_STOPPED` from `events.metrics`, `EXECUTION_PROJECT_VALIDATION_FAILED` from `events.execution`, `ORG_MEMORY_QUERY_START`, `ORG_MEMORY_QUERY_COMPLETE`, `ORG_MEMORY_QUERY_FAILED`, `ORG_MEMORY_WRITE_START`, `ORG_MEMORY_WRITE_COMPLETE`, `ORG_MEMORY_WRITE_DENIED`, `ORG_MEMORY_WRITE_FAILED`, `ORG_MEMORY_POLICIES_LISTED`, `ORG_MEMORY_BACKEND_CREATED`, `ORG_MEMORY_CONNECT_FAILED`, `ORG_MEMORY_DISCONNECT_FAILED`, `ORG_MEMORY_NOT_CONNECTED`, `ORG_MEMORY_ROW_PARSE_FAILED`, `ORG_MEMORY_CONFIG_INVALID`, `ORG_MEMORY_MODEL_INVALID`, `ORG_MEMORY_MVCC_PUBLISH_APPENDED`, `ORG_MEMORY_MVCC_RETRACT_APPENDED`, `ORG_MEMORY_MVCC_SNAPSHOT_AT_QUERIED`, and `ORG_MEMORY_MVCC_LOG_QUERIED` from `events.org_memory`). Each domain has its own module -- see `src/synthorg/observability/events/` for the full inventory of constants. Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
- **Structured kwargs**: always `logger.info(EVENT, key=value)` -- never `logger.info("msg %s", val)`
- **All error paths** must log at WARNING or ERROR with context before raising
- **All state transitions** must log at INFO
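
The logging rules above can be sketched as follows. This is a minimal illustration, not the project's implementation: `get_logger` and `_StubLogger` are stand-ins so the snippet runs on its own (application code would import `get_logger` from `synthorg.observability`), the string value of the event constant is an assumption, and `charge` and `BudgetError` are hypothetical names.

```python
from typing import Any

# Stand-in for a constant from synthorg.observability.events.budget.
# Only the constant NAME appears in the diff; the string value is assumed.
BUDGET_PROJECT_BUDGET_EXCEEDED = "budget.project.budget_exceeded"


class _StubLogger:
    """Hypothetical stand-in for the logger returned by get_logger()."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: list[tuple[str, str, dict[str, Any]]] = []

    def info(self, event: str, **fields: Any) -> None:
        self.records.append(("info", event, fields))

    def warning(self, event: str, **fields: Any) -> None:
        self.records.append(("warning", event, fields))


def get_logger(name: str) -> _StubLogger:
    """Stand-in for synthorg.observability.get_logger (assumption)."""
    return _StubLogger(name)


logger = get_logger(__name__)  # the variable is always named `logger`


class BudgetError(Exception):
    """Hypothetical error type used only in this sketch."""


def charge(project_id: str, total: float, budget: float) -> None:
    """Log an event constant with structured kwargs before raising."""
    if total >= budget:
        # Error path: WARNING with context, event constant first,
        # key=value fields instead of a formatted message string.
        logger.warning(
            BUDGET_PROJECT_BUDGET_EXCEEDED,
            project_id=project_id,
            total_project=total,
            project_budget=budget,
        )
        raise BudgetError(project_id)
```

The stub only records calls, which keeps the example free of stdlib `logging` and `print()`, in line with the rule that application code must not use either.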
4 changes: 2 additions & 2 deletions README.md
@@ -73,9 +73,9 @@ curl http://localhost:3001/api/v1/health

## What's Inside

**[Agent Orchestration](https://synthorg.io/docs/design/engine/)** -- Task decomposition, 6 routing strategies, execution loops (ReAct, Plan-and-Execute, Hybrid, auto-selection by complexity), crash recovery with checkpoint resume, and multi-agent coordination.
**[Agent Orchestration](https://synthorg.io/docs/design/engine/)** -- Task decomposition, 6 routing strategies, execution loops (ReAct, Plan-and-Execute, Hybrid, auto-selection by complexity), crash recovery with checkpoint resume, multi-agent coordination, and multi-project support with project-scoped teams and isolated budgets.

**[Budget & Cost Management](https://synthorg.io/docs/design/operations/)** -- Per-agent cost limits with hierarchical cascading, auto-downgrade to cheaper models at task boundaries, spending reports, budget forecasting, and anomaly detection.
**[Budget & Cost Management](https://synthorg.io/docs/design/operations/)** -- Per-agent and per-project cost limits with hierarchical cascading, auto-downgrade to cheaper models at task boundaries, spending reports, budget forecasting, and anomaly detection.

**[Security & Trust](https://synthorg.io/docs/security/)** -- SecOps agent with fail-closed rule engine, progressive trust (4 strategies), configurable autonomy levels (4 tiers), approval gates, LLM fallback evaluator, and audit logging. Container images are cosign-signed with [SLSA L3](https://slsa.dev) provenance.

37 changes: 23 additions & 14 deletions docs/design/engine.md
@@ -650,7 +650,15 @@ async run(
monthly hard stop and daily limit via `check_can_execute()`, then apply
auto-downgrade via `resolve_model()`. Raises `BudgetExhaustedError` or
`DailyLimitExceededError` on violation.
3. **Build system prompt** -- calls `build_system_prompt()` with agent identity,
3. **Project validation** -- if `ProjectRepository` is provided, validate that the
task's project exists (`ProjectNotFoundError` if not) and that the agent is a
member of the project team (`ProjectAgentNotMemberError` if not; empty teams
allow any agent). When the project has a non-zero budget and `BudgetEnforcer`
is available, check project-level budget via `check_project_budget()`. Raises
`ProjectBudgetExhaustedError` when the project's accumulated cost has reached
its budget. Pre-flight project budget checks are best-effort under concurrency
(TOCTOU); the in-flight `BudgetChecker` closure provides the true safety net.
4. **Build system prompt** -- calls `build_system_prompt()` with agent identity,
task, and resolved model tier. The tier determines a `PromptProfile` that
controls prompt verbosity (see [Prompt Profiles](#prompt-profiles) below),
including personality token trimming when the section exceeds the profile's
@@ -661,17 +669,17 @@ async run(
Follows the **non-inferable-only principle**: system prompts include only
information the agent cannot discover by reading the codebase or environment
(role constraints, custom conventions, organizational policies).
4. **Create context** -- `AgentContext.from_identity()` with the configured
5. **Create context** -- `AgentContext.from_identity()` with the configured
`max_turns`.
5. **Seed conversation** -- injects system prompt, optional memory messages, and
6. **Seed conversation** -- injects system prompt, optional memory messages, and
formatted task instruction as initial messages.
6. **Transition task** -- `ASSIGNED` -> `IN_PROGRESS` (pass-through if already
7. **Transition task** -- `ASSIGNED` -> `IN_PROGRESS` (pass-through if already
`IN_PROGRESS`).
7. **Prepare tools and budget** -- creates `ToolInvoker` from registry and
`BudgetChecker` from `BudgetEnforcer` (task + monthly + daily limits with
pre-computed baselines and alert deduplication) or from task budget limit
8. **Prepare tools and budget** -- creates `ToolInvoker` from registry and
`BudgetChecker` from `BudgetEnforcer` (task + monthly + daily + project limits
with pre-computed baselines and alert deduplication) or from task budget limit
alone when no enforcer is configured.
8. **Resolve execution loop** -- if `auto_loop_config` is set, calls
9. **Resolve execution loop** -- if `auto_loop_config` is set, calls
`select_loop_type()` with the task's `estimated_complexity` and current
budget utilization (via `BudgetEnforcer.get_budget_utilization_pct()`).
Budget-aware downgrade: hybrid is downgraded to plan_execute when
@@ -681,7 +689,7 @@ async run(
engine's `compaction_callback`, `plan_execute_config` (for
plan-execute), and `hybrid_loop_config` (for hybrid), along with the
approval gate and stagnation detector.
9. **Delegate to loop** -- calls `ExecutionLoop.execute()` with context,
10. **Delegate to loop** -- calls `ExecutionLoop.execute()` with context,
provider, tool invoker, budget checker, and completion config. If
`timeout_seconds` is set, wraps the call in `asyncio.wait`; on expiry
the run returns with `TerminationReason.ERROR` but cost recording and
@@ -691,9 +699,10 @@ async run(
parking is needed. If so, the context is serialized via `ParkService`
and persisted when a `ParkedContextRepository` is configured; the loop
then returns a `PARKED` result.
10. **Record costs** -- records accumulated `TokenUsage` to `CostTracker` (if
available). Cost recording failures are logged but do not affect the result.
11. **Apply post-execution transitions:**
11. **Record costs** -- records accumulated `TokenUsage` to `CostTracker` (if
available), tagged with `project_id` for project-level cost aggregation.
Cost recording failures are logged but do not affect the result.
12. **Apply post-execution transitions:**
- `COMPLETED` termination: IN_PROGRESS -> IN_REVIEW (review gate).
The task parks at IN_REVIEW until resolved by one of two paths:
(a) a human approves (-> COMPLETED) or rejects (-> IN_PROGRESS
@@ -757,13 +766,13 @@ async run(
[AgentEngine ↔ TaskEngine Incremental Sync](#agentengine--taskengine-incremental-sync)).
- Transition failures are logged but do not discard the successful execution
result.
12. **Procedural memory generation** (non-critical) -- when
13. **Procedural memory generation** (non-critical) -- when
`ProceduralMemoryConfig` is enabled and the execution failed
(recovery_result exists), a separate proposer LLM call analyses the
failure and stores a `PROCEDURAL` memory entry for future retrieval.
Optionally materializes a SKILL.md file. Failures are logged but do
not affect the result (see [Memory > Procedural Memory Auto-Generation](memory.md#procedural-memory-auto-generation)).
13. **Return result** -- wraps `ExecutionResult` in `AgentRunResult` with
14. **Return result** -- wraps `ExecutionResult` in `AgentRunResult` with
engine-level metadata.
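
The project validation step added to the pipeline above can be sketched roughly as below. This is an illustration under stated assumptions, not the engine's code: the three exception names come from the documentation, while the `Project` dataclass, its field names, the plain `dict` used in place of `ProjectRepository`, and the `validate_project` function are hypothetical.

```python
from dataclasses import dataclass


class ProjectNotFoundError(Exception):
    """The task references a project that does not exist."""


class ProjectAgentNotMemberError(Exception):
    """The agent is not on the project's team."""


class ProjectBudgetExhaustedError(Exception):
    """The project's accumulated cost has reached its budget."""


@dataclass(frozen=True)
class Project:
    """Hypothetical project record; field names are assumptions."""

    id: str
    team: frozenset[str] = frozenset()
    budget: float = 0.0


def validate_project(
    projects: dict[str, Project],  # stand-in for ProjectRepository
    project_id: str,
    agent_id: str,
    spent: float,
) -> Project:
    """Run the three pre-flight checks in the documented order."""
    project = projects.get(project_id)
    if project is None:
        raise ProjectNotFoundError(project_id)
    # An empty team allows any agent.
    if project.team and agent_id not in project.team:
        raise ProjectAgentNotMemberError(agent_id)
    # A zero budget disables the project-level budget check.
    if project.budget > 0 and spent >= project.budget:
        raise ProjectBudgetExhaustedError(project_id)
    return project
```

As the step description notes, a pre-flight check like this is best-effort under concurrency: `spent` is a snapshot, so another execution may add cost between the check and the run.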

**Error handling:** `MemoryError` and `RecursionError` propagate
41 changes: 41 additions & 0 deletions src/synthorg/budget/_enforcer_helpers.py
@@ -18,6 +18,7 @@
BUDGET_DOWNGRADE_APPLIED,
BUDGET_DOWNGRADE_SKIPPED,
BUDGET_HARD_STOP_TRIGGERED,
BUDGET_PROJECT_BUDGET_EXCEEDED,
BUDGET_TASK_LIMIT_HIT,
BUDGET_TIER_PRESERVED,
)
@@ -256,6 +257,9 @@ def _build_checker_closure( # noqa: PLR0913
daily_baseline: float,
thresholds: _AlertThresholds,
agent_id: str,
project_budget: float = 0.0,
project_baseline: float = 0.0,
project_id: str | None = None,
) -> BudgetChecker:
"""Build the sync budget checker closure.

@@ -267,6 +271,10 @@ def _build_checker_closure( # noqa: PLR0913
daily_baseline: Pre-computed daily spend at task start.
thresholds: Pre-computed alert thresholds.
agent_id: Agent identifier for logging.
project_budget: Total project budget (0 = disabled).
project_baseline: Pre-computed project spend at task start.
project_id: Project identifier for logging (None when
project budget is disabled).

Returns:
Sync callable returning ``True`` when budget is exhausted.
@@ -277,6 +285,13 @@ def _check(ctx: AgentContext) -> bool:
running_cost = ctx.accumulated_cost.cost_usd
return (
_check_task_limit(running_cost, task_limit, agent_id)
or _check_project_limit(
running_cost,
project_budget,
project_baseline,
agent_id,
project_id,
)
Comment on lines +288 to +294

⚠️ Potential issue | 🟠 Major

Project budget checks miss concurrent spend.

running_cost here is the current execution's ctx.accumulated_cost.cost_usd, so each closure only sees its own in-flight spend. If two agents run in the same project concurrently, both can reuse the same project_baseline snapshot and stay under project_budget individually while the combined project total overshoots it. That makes this a best-effort per-execution stop, not a true project-wide in-flight limit.

Please compare against a shared project accumulator/live tracker instead of project_baseline + local running_cost.

Also applies to: 397-420

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/budget/_enforcer_helpers.py` around lines 287 - 293, The project
limit check in _check_project_limit (called with running_cost, project_baseline,
agent_id, project_id) uses only the per-execution running_cost +
project_baseline snapshot and thus misses concurrent in-flight spend; replace
the baseline+local check with a query to a shared in-flight project
accumulator/live tracker (e.g., get_or_create_project_inflight(project_id)) and
compare running_cost + live_inflight_cost against project_budget, performing the
check-and-reserve atomically (use a lock or atomic increment) and ensure you
decrement/release the reserved amount when the run finishes/fails; update both
call sites (the _check_project_limit invocation around
running_cost/project_baseline and the similar logic at the 397-420 region) to
use the shared tracker and atomic reserve/release pattern, referencing
ctx.accumulated_cost.cost_usd, project_id, and agent_id for locating and
updating the tracker.
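
One possible reading of the suggestion above is a shared tracker that performs the check and the reservation under a single lock. The sketch below assumes one process with multiple threads; `ProjectSpendTracker`, `try_reserve`, and `release` are hypothetical names that do not appear in the PR, and a multi-process deployment would need an external store instead.

```python
import threading


class ProjectSpendTracker:
    """Hypothetical shared tracker of in-flight spend per project."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: dict[str, float] = {}

    def try_reserve(
        self,
        project_id: str,
        amount: float,
        baseline: float,
        budget: float,
    ) -> bool:
        """Atomically check the budget and reserve `amount` if it fits."""
        with self._lock:
            inflight = self._inflight.get(project_id, 0.0)
            # budget <= 0 means the project budget is disabled.
            if budget > 0 and baseline + inflight + amount > budget:
                return False
            self._inflight[project_id] = inflight + amount
            return True

    def release(self, project_id: str, amount: float) -> None:
        """Return a reservation when a run finishes or fails."""
        with self._lock:
            inflight = self._inflight.get(project_id, 0.0)
            self._inflight[project_id] = max(0.0, inflight - amount)
```

Because the comparison and the increment happen inside the same `with self._lock` block, two concurrent executions cannot both pass the check against the same stale total. Rejecting only when the reservation would push the total past the budget is a design choice of this sketch.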

or _check_monthly_limit(
running_cost,
monthly_budget,
@@ -378,3 +393,29 @@ def _check_daily_limit(
)
return True
return False


def _check_project_limit(
    running_cost: float,
    project_budget: float,
    project_baseline: float,
    agent_id: str,
    project_id: str | None = None,
) -> bool:
    """Return True if project budget is exhausted."""
    if project_budget <= 0:
        return False
    total_project = round(
        project_baseline + running_cost,
        BUDGET_ROUNDING_PRECISION,
    )
    if total_project >= project_budget:
        logger.warning(
            BUDGET_PROJECT_BUDGET_EXCEEDED,
            agent_id=agent_id,
            project_id=project_id,
            total_project=total_project,
            project_budget=project_budget,
        )
        return True
Comment on lines +398 to +420

Copilot AI Apr 8, 2026


In the in-flight budget checker, project budget exhaustion is logged without the project identifier (only agent_id/total_project/project_budget). With multi-project concurrency this makes the event hard to attribute. Consider threading project_id into the checker closure (e.g., as an additional _build_checker_closure arg) and include it in the BUDGET_PROJECT_BUDGET_EXCEEDED log fields.

    return False
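
To make the concurrency concern raised in review concrete, the sketch below re-implements the check without logging and evaluates it for two executions that share one baseline snapshot. The value of `BUDGET_ROUNDING_PRECISION` is an assumption (the diff does not show it), and all dollar amounts are illustrative.

```python
BUDGET_ROUNDING_PRECISION = 10  # assumed value; not shown in the diff


def check_project_limit(
    running_cost: float,
    project_budget: float,
    project_baseline: float,
) -> bool:
    """Logging-free copy of the check: True when the budget is exhausted."""
    if project_budget <= 0:
        return False  # a non-positive budget disables the check
    total_project = round(
        project_baseline + running_cost,
        BUDGET_ROUNDING_PRECISION,
    )
    return total_project >= project_budget


project_budget = 10.0
project_baseline = 6.0  # snapshot taken when each task started

# Two agents in the same project each spend 3.0 in flight.
agent_a_cost = 3.0
agent_b_cost = 3.0

# Each closure sees only baseline + its own running cost: 9.0 < 10.0.
a_exhausted = check_project_limit(agent_a_cost, project_budget, project_baseline)
b_exhausted = check_project_limit(agent_b_cost, project_budget, project_baseline)

# The true project total includes both executions: 12.0 > 10.0.
combined_total = project_baseline + agent_a_cost + agent_b_cost
```

Neither execution reports exhaustion even though the combined total exceeds the budget, which matches the engine documentation's description of project budget checks as best-effort under concurrency.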