feat: work pipeline spine (#1960)#2013
Conversation
Compose intake -> projects -> decompose (solo-vs-team verdict) -> solo or team execute -> coordination metrics into one pluggable WorkPipeline that every entry adapter feeds via a typed WorkItem. Solo-vs-team is owned by a pluggable WorkRoutingPolicy (leaf-threshold default, always-team, llm-judged); leaf work runs single-agent via the worker execution service, splittable work via the coordinator. Boot-wired through the existing _install_runtime_services hook as a RuntimeServices 3-tuple sharing one AgentEngine and one AgentTaskScorer; AppState work_pipeline seam mirrors coordinator; post_setup_reinit hot-swaps it. Manifest line ENFORCED; no-ghost-wiring gate green. Deviation: did not add a redundant (WorkPipelineError, handle_domain_error) entry -- the existing DomainError catch-all dispatches all subclasses via MRO.
…log (#1960) Review-round fixes: (1) CRITICAL SEC-1 -- llm-judged routing policy now appends untrusted_content_directive((TAG_TASK_DATA,)) to the system prompt and wraps task content with the shared TAG_TASK_DATA tag, so a crafted task title/description cannot inject routing instructions. (2) register the new 'pipeline' event module in test_events expected-domain set (pre-push full-suite gate). (3) log the simulation-runtime-present-but-intake-unset early return in _build_runtime_work_pipeline for observability parity.
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI (base), Organization UI (inherited) Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThis PR implements a complete work pipeline "spine" that routes work through a unified execution path from intake through project validation to solo-agent or team coordination execution. It establishes domain contracts and models (WorkItem, WorkPipelineResult, WorkPipelineError), implements three routing policies (leaf-threshold, always-team, llm-judged) that decide solo-vs-team routing based on task characteristics, develops the DefaultWorkPipeline service orchestrating a staged execution spine with observability and error handling, wires the pipeline into AppState as a managed dependency seam, and integrates boot-time construction into RuntimeServices with shared scorer and coordinator instances. Comprehensive unit and integration test coverage validates individual components and end-to-end execution paths under the simulation harness. 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Merging this PR will not alter performance
Comparing Footnotes
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/synthorg/engine/pipeline/models.py`:
- Around line 189-191: Make is_success a derived/computed field instead of a
free Field: remove the current Field(...) declaration for is_success and add a
`@computed_field` method (using pydantic's computed_field) on the same model that
returns True only if all recorded phase outcomes are successful (e.g., iterate
the model's phases or recorded_phases sequence and return all(phase.is_success
for phase in phases)). Ensure the computed method name references is_success and
use the model's actual phase container (phases or recorded_phases) so callers
cannot construct inconsistent objects.
In `@src/synthorg/engine/pipeline/policy/llm_judged.py`:
- Around line 133-150: The _parse_verdict function currently uses naive
substring checks and misclassifies negated phrases like "not splittable"; change
it to use whole-word regex checks with negation detection: import re and replace
the "if 'splittable' in text" / "if 'leaf' in text" checks with pattern checks
such as if re.search(r'\bsplittable\b', text) and not
re.search(r'\b(?:not|no|never|n\'t)\b.*\bsplittable\b', text): return
RoutingVerdict.SPLITTABLE (and analogously for 'leaf'); ensure the checks are
case-insensitive (text is already lowered) and return None when a
negation/qualification is detected to avoid false positives.
In `@src/synthorg/engine/pipeline/policy/threshold.py`:
- Around line 36-45: The Threshold policy's __init__ currently accepts
non-positive values for threshold (symbol: self._threshold in __init__), which
breaks routing; add validation in the constructor of the class (the __init__
shown) to check the provided threshold and raise a ValueError with a clear
message if threshold <= 0, otherwise assign to self._threshold as before; keep
the existing classifier fallback (self._classifier = ...) intact and update any
unit tests to expect the ValueError for invalid inputs.
In `@src/synthorg/engine/pipeline/service.py`:
- Around line 134-137: The call to self._agent_registry.list_active() occurs
outside the _phase(...) call, so agent-registry errors and lookup latency aren't
captured by the _PHASE_DECOMPOSE telemetry; move the list_active() invocation
into the same closure passed to self._phase so that the entire lookup + decision
logic (i.e., fetching agents and invoking self._decide(task, agents)) runs
inside the _phase for _PHASE_DECOMPOSE, ensuring any exceptions and the full
timing for agent retrieval are recorded and the returned verdict remains
unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: a03db119-bee3-4bea-b505-0aaf5d9e3a9f
📒 Files selected for processing (27)
scripts/_ghost_wiring_manifest.txtsrc/synthorg/api/app.pysrc/synthorg/api/controllers/setup/agent_helpers.pysrc/synthorg/api/state.pysrc/synthorg/engine/coordination/factory.pysrc/synthorg/engine/pipeline/__init__.pysrc/synthorg/engine/pipeline/errors.pysrc/synthorg/engine/pipeline/factory.pysrc/synthorg/engine/pipeline/models.pysrc/synthorg/engine/pipeline/policy/__init__.pysrc/synthorg/engine/pipeline/policy/always_team.pysrc/synthorg/engine/pipeline/policy/llm_judged.pysrc/synthorg/engine/pipeline/policy/protocol.pysrc/synthorg/engine/pipeline/policy/threshold.pysrc/synthorg/engine/pipeline/protocol.pysrc/synthorg/engine/pipeline/service.pysrc/synthorg/observability/events/pipeline.pysrc/synthorg/settings/definitions/coordination.pysrc/synthorg/workers/runtime_builder.pytests/e2e/test_work_pipeline_spine_e2e.pytests/unit/api/test_state.pytests/unit/engine/test_pipeline_errors.pytests/unit/engine/test_pipeline_models.pytests/unit/engine/test_pipeline_policy.pytests/unit/engine/test_pipeline_service.pytests/unit/observability/test_events.pytests/unit/workers/test_runtime_builder.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Test Integration
- GitHub Check: Test Unit
- GitHub Check: Dashboard Test
- GitHub Check: Test E2E
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Onlysrc/synthorg/persistence/may import sqlite/psycopg or emit raw SQL
Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default with read_only_post_init (Cat-2); Cat-3 bootstrap secrets are pure env at boot site. YAML is company-template ingestion format only, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
Use from synthorg.observability import get_logger; variable always named logger. Never import logging or use print() in app code. Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED after persistence write
Never log error=str(exc) or interpolate {exc}; use error_type=type(exc).name + error=safe_error_description(exc). Never use exc_info=True. OTel: forbidden span.record_exception(exc); use span.set_attribute("exception.message", safe_error_description(exc)) + record_exception=False, set_status_on_exception=False. Enforced by check_logger_exception_str_exc.py
Files:
src/synthorg/observability/events/pipeline.pysrc/synthorg/engine/pipeline/policy/threshold.pysrc/synthorg/api/controllers/setup/agent_helpers.pysrc/synthorg/engine/pipeline/__init__.pysrc/synthorg/engine/pipeline/factory.pysrc/synthorg/engine/pipeline/policy/always_team.pysrc/synthorg/engine/pipeline/protocol.pysrc/synthorg/api/app.pysrc/synthorg/engine/pipeline/errors.pysrc/synthorg/engine/pipeline/policy/llm_judged.pysrc/synthorg/engine/pipeline/policy/protocol.pysrc/synthorg/settings/definitions/coordination.pysrc/synthorg/api/state.pysrc/synthorg/engine/coordination/factory.pysrc/synthorg/engine/pipeline/models.pysrc/synthorg/engine/pipeline/policy/__init__.pysrc/synthorg/workers/runtime_builder.pysrc/synthorg/engine/pipeline/service.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Numerics live insettings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py
Comments should explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Do not usefrom __future__ import annotations(Python 3.14 has PEP 649). Use PEP 758 except:except A, B:requires parens when binding
Type hints required on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines
Define errors as<Domain><Condition>Errorinheriting fromDomainError; never inherit from Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py
Pydantic v2: use frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py;@computed_fieldauto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries). Use@computed_fieldfor derived fields. Use NotBlankStr for identifiers
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py
Use immutability patterns: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Use asyncio.TaskGroup for fan-out/fan-in async patterns; helpers must catch Exception (re-raise MemoryError/RecursionError)
Clock seam: includeclock: Clock | None = Noneparameter; tests inject FakeClock. Lifecycle: services own_lifecycle_lock; timed-out stops mark unrestartable
For untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML
Never use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001. Allowed in .claude/, third-party imports, providers/presets.py, we...
Files:
src/synthorg/observability/events/pipeline.pysrc/synthorg/engine/pipeline/policy/threshold.pysrc/synthorg/api/controllers/setup/agent_helpers.pysrc/synthorg/engine/pipeline/__init__.pysrc/synthorg/engine/pipeline/factory.pysrc/synthorg/engine/pipeline/policy/always_team.pysrc/synthorg/engine/pipeline/protocol.pysrc/synthorg/api/app.pysrc/synthorg/engine/pipeline/errors.pysrc/synthorg/engine/pipeline/policy/llm_judged.pysrc/synthorg/engine/pipeline/policy/protocol.pysrc/synthorg/settings/definitions/coordination.pysrc/synthorg/api/state.pysrc/synthorg/engine/coordination/factory.pysrc/synthorg/engine/pipeline/models.pysrc/synthorg/engine/pipeline/policy/__init__.pysrc/synthorg/workers/runtime_builder.pysrc/synthorg/engine/pipeline/service.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/events/pipeline.pysrc/synthorg/engine/pipeline/policy/threshold.pysrc/synthorg/api/controllers/setup/agent_helpers.pysrc/synthorg/engine/pipeline/__init__.pysrc/synthorg/engine/pipeline/factory.pysrc/synthorg/engine/pipeline/policy/always_team.pysrc/synthorg/engine/pipeline/protocol.pysrc/synthorg/api/app.pysrc/synthorg/engine/pipeline/errors.pysrc/synthorg/engine/pipeline/policy/llm_judged.pysrc/synthorg/engine/pipeline/policy/protocol.pysrc/synthorg/settings/definitions/coordination.pysrc/synthorg/api/state.pysrc/synthorg/engine/coordination/factory.pysrc/synthorg/engine/pipeline/models.pysrc/synthorg/engine/pipeline/policy/__init__.pysrc/synthorg/workers/runtime_builder.pysrc/synthorg/engine/pipeline/service.py
src/synthorg/observability/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Telemetry: opt-in, off by default. Every event property must be in _ALLOWED_PROPERTIES. See telemetry.md
Files:
src/synthorg/observability/events/pipeline.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers:@pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back
Test doubles: use FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags. Never use bare MagicMock at typed boundary (constructor/fn arg/annotated local/typed fixture return); blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline). Import FakeClock and mock_of from tests._shared
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large)
Files:
tests/unit/observability/test_events.pytests/unit/engine/test_pipeline_errors.pytests/unit/api/test_state.pytests/unit/engine/test_pipeline_service.pytests/unit/engine/test_pipeline_models.pytests/e2e/test_work_pipeline_spine_e2e.pytests/unit/workers/test_runtime_builder.pytests/unit/engine/test_pipeline_policy.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/observability/test_events.pytests/unit/engine/test_pipeline_errors.pytests/unit/api/test_state.pytests/unit/engine/test_pipeline_service.pytests/unit/engine/test_pipeline_models.pytests/e2e/test_work_pipeline_spine_e2e.pytests/unit/workers/test_runtime_builder.pytests/unit/engine/test_pipeline_policy.py
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
API startup lifecycle: construction phase wires synchronous services; on_startup wires services needing connected persistence. Construction-phase invariants: agent_registry before auto_wire_meetings; tunnel_provider wired unconditionally. On-startup invariants: SettingsService auto-wire before WorkflowExecutionObserver registration; OntologyService wires after persistence.connect() via _wire_ontology_service
Files:
src/synthorg/api/controllers/setup/agent_helpers.pysrc/synthorg/api/app.pysrc/synthorg/api/state.py
src/synthorg/{workers,api}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Runtime services: synthorg.workers.runtime_builder.build_runtime_services selects behind ONE provider-present switch, returning RuntimeServices (AgentEngineExecutionService + coordinator or NoProviderExecutionService + None). _install_runtime_services appends FIRST after persistence/SettingsService hooks; swap_worker_execution_service / swap_coordinator / swap_provider_registry hold a lock. Empty-company rejects task creation (AgentRuntimeNotConfiguredError 4014) and /coordinate 503s
Files:
src/synthorg/api/controllers/setup/agent_helpers.pysrc/synthorg/api/app.pysrc/synthorg/api/state.pysrc/synthorg/workers/runtime_builder.py
src/synthorg/api/controllers/setup/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Setup completion: post_setup_reinit() propagates failures; settings_svc.set("api", "setup_complete", "true") only runs if reinit returns clean. Check/validate/reinit/persist serialised under COMPLETE_LOCK (src/synthorg/api/controllers/setup/agent_helpers.py) to prevent concurrent /setup/complete race conditions
Files:
src/synthorg/api/controllers/setup/agent_helpers.py
🧠 Learnings (3)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
src/synthorg/observability/events/pipeline.pysrc/synthorg/engine/pipeline/policy/threshold.pytests/unit/observability/test_events.pysrc/synthorg/api/controllers/setup/agent_helpers.pysrc/synthorg/engine/pipeline/__init__.pytests/unit/engine/test_pipeline_errors.pysrc/synthorg/engine/pipeline/factory.pysrc/synthorg/engine/pipeline/policy/always_team.pysrc/synthorg/engine/pipeline/protocol.pysrc/synthorg/api/app.pysrc/synthorg/engine/pipeline/errors.pytests/unit/api/test_state.pysrc/synthorg/engine/pipeline/policy/llm_judged.pysrc/synthorg/engine/pipeline/policy/protocol.pysrc/synthorg/settings/definitions/coordination.pysrc/synthorg/api/state.pysrc/synthorg/engine/coordination/factory.pysrc/synthorg/engine/pipeline/models.pytests/unit/engine/test_pipeline_service.pysrc/synthorg/engine/pipeline/policy/__init__.pytests/unit/engine/test_pipeline_models.pysrc/synthorg/workers/runtime_builder.pytests/e2e/test_work_pipeline_spine_e2e.pytests/unit/workers/test_runtime_builder.pysrc/synthorg/engine/pipeline/service.pytests/unit/engine/test_pipeline_policy.py
📚 Learning: 2026-05-17T11:45:11.839Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In SynthOrg (Aureliolo/synthorg) pre-alpha, apply the strict no-backward-compat policy: any setting-key rename must be fully completed in the same change/PR with all repo callers updated, and you should not keep legacy aliases or compatibility fallbacks. When reviewing, do not flag a setting-key rename as a breaking upgrade hazard if the rename is repo-wide and fully implemented within the same PR.
Applied to files:
src/synthorg/settings/definitions/coordination.py
📚 Learning: 2026-05-17T11:45:11.839Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In this repository, SynthOrg is pre-alpha and uses a strict no-backward-compat policy for setting-key renames. When reviewing code under src/synthorg/settings, do NOT flag a setting-key rename as an “upgrade-safety” issue if the rename is complete/atomic in the same PR: all callers/usages of the old key are updated simultaneously, and the PR does not keep any legacy aliases, compatibility fallbacks, or migration/rollback paths for the old key.
Applied to files:
src/synthorg/settings/definitions/coordination.py
🔇 Additional comments (23)
tests/unit/engine/test_pipeline_models.py (1)
1-156: LGTM!tests/unit/engine/test_pipeline_errors.py (1)
1-76: LGTM!tests/unit/engine/test_pipeline_policy.py (1)
1-137: LGTM!tests/unit/engine/test_pipeline_service.py (1)
1-259: LGTM!tests/unit/api/test_state.py (1)
12-12: LGTM!Also applies to: 214-267
tests/unit/observability/test_events.py (1)
297-297: LGTM!tests/e2e/test_work_pipeline_spine_e2e.py (1)
1-356: LGTM!tests/unit/workers/test_runtime_builder.py (1)
13-24: LGTM!Also applies to: 40-110, 144-145, 164-186, 356-400
src/synthorg/api/state.py (1)
66-66: LGTM!Also applies to: 309-309, 337-337, 394-394, 1298-1384
src/synthorg/api/app.py (1)
94-94: LGTM!Also applies to: 257-257, 292-293, 553-553, 1070-1074
src/synthorg/api/controllers/setup/agent_helpers.py (1)
148-154: LGTM!Also applies to: 179-180
src/synthorg/engine/coordination/factory.py (1)
176-177: LGTM!Also applies to: 220-224, 233-237, 393-394
src/synthorg/workers/runtime_builder.py (1)
34-36: LGTM!Also applies to: 60-61, 73-74, 121-137, 141-141, 348-350, 356-360, 377-381, 393-403, 405-464, 483-489, 495-495, 516-517, 537-549
src/synthorg/settings/definitions/coordination.py (1)
72-109: LGTM!src/synthorg/observability/events/pipeline.py (1)
1-14: LGTM!scripts/_ghost_wiring_manifest.txt (1)
40-40: LGTM!src/synthorg/engine/pipeline/protocol.py (1)
13-34: LGTM!src/synthorg/engine/pipeline/__init__.py (1)
10-32: LGTM!src/synthorg/engine/pipeline/errors.py (1)
16-73: LGTM!src/synthorg/engine/pipeline/policy/protocol.py (1)
17-41: LGTM!src/synthorg/engine/pipeline/policy/always_team.py (1)
26-40: LGTM!src/synthorg/engine/pipeline/policy/__init__.py (1)
19-39: LGTM!Also applies to: 42-88
src/synthorg/engine/pipeline/factory.py (1)
30-92: LGTM!
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2013 +/- ##
==========================================
+ Coverage 85.06% 85.09% +0.03%
==========================================
Files 1901 1913 +12
Lines 113035 113388 +353
Branches 9646 9673 +27
==========================================
+ Hits 96152 96491 +339
- Misses 14521 14530 +9
- Partials 2362 2367 +5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Warning Gemini encountered an error creating the summary. You can try again by commenting |
…ex verdict parse, threshold validation, decompose telemetry)
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/synthorg/engine/pipeline/policy/llm_judged.py`:
- Around line 152-156: The code currently checks _SPLITTABLE_RE then _LEAF_RE
and will pick SPLITTABLE when both appear; modify the logic in the block around
variables negated, _SPLITTABLE_RE, and _LEAF_RE so that you first detect both
matches and treat that as ambiguous: if both _SPLITTABLE_RE.search(text) and
_LEAF_RE.search(text) are true, return None (or None if negated—effectively
ambiguous regardless), otherwise proceed with the existing single-match checks
(return RoutingVerdict.SPLITTABLE or RoutingVerdict.LEAF accounting for
negated).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1dcdc729-f2d3-4192-96a8-a3910704d972
📒 Files selected for processing (6)
src/synthorg/engine/pipeline/models.pysrc/synthorg/engine/pipeline/policy/llm_judged.pysrc/synthorg/engine/pipeline/policy/threshold.pysrc/synthorg/engine/pipeline/service.pytests/unit/engine/test_pipeline_models.pytests/unit/engine/test_pipeline_policy.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Deploy Preview
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Test E2E
- GitHub Check: Test Conformance (SQLite)
- GitHub Check: Dashboard Test
- GitHub Check: Test Unit
- GitHub Check: Test Integration
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Onlysrc/synthorg/persistence/may import sqlite/psycopg or emit raw SQL
Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default with read_only_post_init (Cat-2); Cat-3 bootstrap secrets are pure env at boot site. YAML is company-template ingestion format only, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
Use from synthorg.observability import get_logger; variable always named logger. Never import logging or use print() in app code. Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED after persistence write
Never log error=str(exc) or interpolate {exc}; use error_type=type(exc).name + error=safe_error_description(exc). Never use exc_info=True. OTel: forbidden span.record_exception(exc); use span.set_attribute("exception.message", safe_error_description(exc)) + record_exception=False, set_status_on_exception=False. Enforced by check_logger_exception_str_exc.py
Files:
src/synthorg/engine/pipeline/policy/threshold.pysrc/synthorg/engine/pipeline/models.pysrc/synthorg/engine/pipeline/policy/llm_judged.pysrc/synthorg/engine/pipeline/service.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Numerics live insettings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py
Comments should explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Do not usefrom __future__ import annotations(Python 3.14 has PEP 649). Use PEP 758 except:except A, B:requires parens when binding
Type hints required on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines
Define errors as<Domain><Condition>Errorinheriting fromDomainError; never inherit from Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py
Pydantic v2: use frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py;@computed_fieldauto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries). Use@computed_fieldfor derived fields. Use NotBlankStr for identifiers
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py
Use immutability patterns: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Use asyncio.TaskGroup for fan-out/fan-in async patterns; helpers must catch Exception (re-raise MemoryError/RecursionError)
Clock seam: includeclock: Clock | None = Noneparameter; tests inject FakeClock. Lifecycle: services own_lifecycle_lock; timed-out stops mark unrestartable
For untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML
Never use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001. Allowed in .claude/, third-party imports, providers/presets.py, we...
Files:
src/synthorg/engine/pipeline/policy/threshold.pysrc/synthorg/engine/pipeline/models.pysrc/synthorg/engine/pipeline/policy/llm_judged.pysrc/synthorg/engine/pipeline/service.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/engine/pipeline/policy/threshold.pysrc/synthorg/engine/pipeline/models.pysrc/synthorg/engine/pipeline/policy/llm_judged.pysrc/synthorg/engine/pipeline/service.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers:@pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back
Test doubles: use FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags. Never use bare MagicMock at typed boundary (constructor/fn arg/annotated local/typed fixture return); blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline). Import FakeClock and mock_of from tests._shared
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add@example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large)
Files:
tests/unit/engine/test_pipeline_models.pytests/unit/engine/test_pipeline_policy.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/engine/test_pipeline_models.pytests/unit/engine/test_pipeline_policy.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
src/synthorg/engine/pipeline/policy/threshold.pysrc/synthorg/engine/pipeline/models.pytests/unit/engine/test_pipeline_models.pytests/unit/engine/test_pipeline_policy.pysrc/synthorg/engine/pipeline/policy/llm_judged.pysrc/synthorg/engine/pipeline/service.py
🔇 Additional comments (4)
src/synthorg/engine/pipeline/policy/threshold.py (1)
24-73: LGTM!src/synthorg/engine/pipeline/models.py (1)
27-200: LGTM!tests/unit/engine/test_pipeline_models.py (1)
21-176: LGTM!src/synthorg/engine/pipeline/service.py (1)
121-360: LGTM!
<!-- HIGHLIGHTS_START --> ## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - Multi-agent coordination is now active immediately on startup for smoother operation. - Governance rules are fully enforced during use, ensuring compliance at all times. - Coordination metrics update live, giving real-time insights into system activity. - Review agents are now reliably processed, preventing silent drops in tasks. - Sandbox containers can be reused for agents and tasks, speeding up execution and reducing overhead. ### What's new - Agents support online runtime with a minimal safety framework to improve stability. - Recorded LLM interactions can be deterministically replayed at the provider interface. - Distributed path validation has been enhanced for more robust data routing. - A client-simulation runtime was added for end-to-end testing of the IntakeEngine. - A new work pipeline spine architecture has been introduced to streamline task processing. ### Under the hood - Infrastructure, Python, and web dependencies have all been updated to latest versions. - Updated apko lockfiles in the CI/CD pipeline improve build consistency. <!-- HIGHLIGHTS_END --> :robot: I have created a release *beep* *boop* --- ## [0.8.6](v0.8.5...v0.8.6) (2026-05-19) ### Features * agent runtime online + minimal safety spine (runtime root) ([#2003](#2003)) ([e5eef1a](e5eef1a)), closes [#1956](#1956) * deterministic recorded-LLM cassette replay at the provider chokepoint ([#2010](#2010)) ([cabf55d](cabf55d)) * distributed path validation + hardening ([#2011](#2011)) ([a382e4a](a382e4a)), closes [#1966](#1966) * wire IntakeEngine via boot client-simulation runtime (e2e test harness) ([#2006](#2006)) ([6a9c0aa](6a9c0aa)), closes [#1961](#1961) * work pipeline spine ([#1960](#1960)) ([#2013](#2013)) ([29b64e3](29b64e3)) ### Bug Fixes * bring the multi-agent coordinator online at boot ([#2007](#2007)) ([180b38a](180b38a)), closes [#1958](#1958) * full governance enforcement online ([#1957](#1957)) ([#2005](#2005)) ([4140fc5](4140fc5)) * harden anti-ghost-wiring gate and fix silently-dropped review agents ([#2000](#2000)) ([89b57ce](89b57ce)) * make coordination metrics live ([#1959](#1959)) ([#2012](#2012)) ([c4775e2](c4775e2)) * sandbox lifecycle dispatch (per-agent / per-task container reuse) ([#2008](#2008)) ([03d2587](03d2587)), closes [#1965](#1965) ### Documentation * add GitButler concept-only concurrency research ([#1978](#1978)) ([#2009](#2009)) ([9e4f5c1](9e4f5c1)) * honest-hybrid refresh of README, site, and design specs ([#2001](#2001)) ([f485bea](f485bea)) ### CI/CD * update apko lockfiles ([#2004](#2004)) ([e2b9eee](e2b9eee)) ### Maintenance * Update Infrastructure dependencies ([#2014](#2014)) ([0b16bdf](0b16bdf)) * Update Python dependencies ([#2015](#2015)) ([a7224bb](a7224bb)) * Update Web dependencies ([#2016](#2016)) ([7a7fe76](7a7fe76)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
Composes the single coherent path from "work enters" to "agents execute it":
intake -> projects -> decompose (solo-vs-team verdict) -> solo OR team execute -> coordination metrics. This is the gateway child of EPIC #1955 and thesingle integration point every entry adapter (#1962/#1963/#1964/#1968) will
feed via a typed
WorkItem.The pieces existed (IntakeEngine, DecompositionService, MultiAgentCoordinator,
CoordinationMetricsCollector, simulation harness) but nothing composed them,
and the solo-vs-team decision had no owner. This PR adds that spine and the
owner.
What changed
New
src/synthorg/engine/pipeline/package (pluggable per the project rule):WorkItem/WorkPipelineResult/WorkPhaseResultfrozen models +WorkSource/RoutingVerdict/ExecutionPathenums.WorkRoutingPolicyprotocol + three strategies (leaf-thresholddefault,always-team,llm-judged) +build_work_routing_policyfactory, owned bythe decomposition layer. The solo-vs-team decision is internal and
automatic; never a user choice.
DefaultWorkPipeline(the spine) +build_work_pipelinefactory.WorkPipelineErrorhierarchy (intake-rejected 422 / project-not-found 404 /routing-undecidable 500 / team-path-unavailable 503).
observability/events/pipeline.pyevent constants.Boot wiring:
RuntimeServicesextended to a 3-tuple sharing ONE bootAgentEngineandONE
AgentTaskScoreracross the coordinator and the spine.AppState.work_pipelineseam mirroringcoordinator; installed by theexisting
_install_runtime_serviceshook (once-only, injection-over-autowire); hot-swapped on
post_setup_reinit.coordination.routing_policy/coordination.leaf_subtask_thresholdsettings, resolved at boot.
Leaf work runs single-agent via
worker_execution_service; splittable workruns the coordinator. Empty-company / no-intake paths return cleanly (no
silent degradation).
Decisions
Architecture decisions were taken via the decision protocol: new
engine/pipeline/package; pluggableWorkRoutingPolicyowned by thedecomposition layer; leaf bypasses the coordinator (with one shared scorer);
typed
WorkItementry contract; single PR.Validation
selection, runtime-builder 3-tuple + shared-scorer identity, AppState seam.
tests/e2e/test_work_pipeline_spine_e2e.pyexercises both the solo and theteam branch through
work_pipeline.run, asserting aCoordinationMetricsRecordlands for the team path.green.
Manifest discipline (CORE)
build_work_pipelineadded toscripts/_ghost_wiring_manifest.txtasENFORCED; theno-ghost-wiringgate passes.Security
The
llm-judgedrouting policy wraps task content with the sharedTAG_TASK_DATAtag and appendsuntrusted_content_directive(...)to itssystem prompt, so a crafted task title/description cannot inject routing
instructions (SEC-1).
Deviation from plan (justified)
Did not add a redundant
(WorkPipelineError, handle_domain_error)entry toexception_handlers.py: the existing(DomainError, handle_domain_error)catch-all already dispatches every subclass via MRO, so the line would be
dead duplication.
Closes #1960