feat: work pipeline spine (#1960) by Aureliolo · Pull Request #2013 · Aureliolo/synthorg

Aureliolo · 2026-05-19T18:43:41Z

Summary

Composes the single coherent path from "work enters" to "agents execute it":
intake -> projects -> decompose (solo-vs-team verdict) -> solo OR team execute -> coordination metrics. This is the gateway child of EPIC #1955 and the
single integration point every entry adapter (#1962/#1963/#1964/#1968) will
feed via a typed WorkItem.

The pieces existed (IntakeEngine, DecompositionService, MultiAgentCoordinator,
CoordinationMetricsCollector, simulation harness) but nothing composed them,
and the solo-vs-team decision had no owner. This PR adds that spine and the
owner.

What changed

New src/synthorg/engine/pipeline/ package (pluggable per the project rule):

WorkItem / WorkPipelineResult / WorkPhaseResult frozen models +
WorkSource / RoutingVerdict / ExecutionPath enums.
WorkRoutingPolicy protocol + three strategies (leaf-threshold default,
always-team, llm-judged) + build_work_routing_policy factory, owned by
the decomposition layer. The solo-vs-team decision is internal and
automatic; never a user choice.
DefaultWorkPipeline (the spine) + build_work_pipeline factory.
WorkPipelineError hierarchy (intake-rejected 422 / project-not-found 404 /
routing-undecidable 500 / team-path-unavailable 503).
observability/events/pipeline.py event constants.

Boot wiring:

RuntimeServices extended to a 3-tuple sharing ONE boot AgentEngine and
ONE AgentTaskScorer across the coordinator and the spine.
AppState.work_pipeline seam mirroring coordinator; installed by the
existing _install_runtime_services hook (once-only, injection-over-
autowire); hot-swapped on post_setup_reinit.
New coordination.routing_policy / coordination.leaf_subtask_threshold
settings, resolved at boot.

Leaf work runs single-agent via worker_execution_service; splittable work
runs the coordinator. Empty-company / no-intake paths return cleanly (no
silent degradation).

Decisions

Architecture decisions were taken via the decision protocol: new
engine/pipeline/ package; pluggable WorkRoutingPolicy owned by the
decomposition layer; leaf bypasses the coordinator (with one shared scorer);
typed WorkItem entry contract; single PR.

Validation

Unit: models, errors, policy (3 strategies + factory), service branch
selection, runtime-builder 3-tuple + shared-scorer identity, AppState seam.
Acceptance under the simulation harness (deterministic, zero LLM):
tests/e2e/test_work_pipeline_spine_e2e.py exercises both the solo and the
team branch through work_pipeline.run, asserting a
CoordinationMetricsRecord lands for the team path.
Full unit suite, ruff, ruff-format, mypy strict, and all convention gates
green.

Manifest discipline (CORE)

build_work_pipeline added to scripts/_ghost_wiring_manifest.txt as
ENFORCED; the no-ghost-wiring gate passes.

Security

The llm-judged routing policy wraps task content with the shared
TAG_TASK_DATA tag and appends untrusted_content_directive(...) to its
system prompt, so a crafted task title/description cannot inject routing
instructions (SEC-1).

Deviation from plan (justified)

Did not add a redundant (WorkPipelineError, handle_domain_error) entry to
exception_handlers.py: the existing (DomainError, handle_domain_error)
catch-all already dispatches every subclass via MRO, so the line would be
dead duplication.

Closes #1960

Compose intake -> projects -> decompose (solo-vs-team verdict) -> solo or team execute -> coordination metrics into one pluggable WorkPipeline that every entry adapter feeds via a typed WorkItem. Solo-vs-team is owned by a pluggable WorkRoutingPolicy (leaf-threshold default, always-team, llm-judged); leaf work runs single-agent via the worker execution service, splittable work via the coordinator. Boot-wired through the existing _install_runtime_services hook as a RuntimeServices 3-tuple sharing one AgentEngine and one AgentTaskScorer; AppState work_pipeline seam mirrors coordinator; post_setup_reinit hot-swaps it. Manifest line ENFORCED; no-ghost-wiring gate green. Deviation: did not add a redundant (WorkPipelineError, handle_domain_error) entry -- the existing DomainError catch-all dispatches all subclasses via MRO.

…log (#1960) Review-round fixes: (1) CRITICAL SEC-1 -- llm-judged routing policy now appends untrusted_content_directive((TAG_TASK_DATA,)) to the system prompt and wraps task content with the shared TAG_TASK_DATA tag, so a crafted task title/description cannot inject routing instructions. (2) register the new 'pipeline' event module in test_events expected-domain set (pre-push full-suite gate). (3) log the simulation-runtime-present-but-intake-unset early return in _build_runtime_work_pipeline for observability parity.

github-actions · 2026-05-19T18:43:55Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

coderabbitai · 2026-05-19T18:45:07Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4ab9decb-4b0e-4566-b7c1-a5681022f562

📥 Commits

Reviewing files that changed from the base of the PR and between 2724482 and 6f9d95f.

📒 Files selected for processing (2)

src/synthorg/engine/pipeline/policy/llm_judged.py
tests/unit/engine/test_pipeline_policy.py

Walkthrough

This PR implements a complete work pipeline "spine" that routes work through a unified execution path from intake through project validation to solo-agent or team coordination execution. It establishes domain contracts and models (WorkItem, WorkPipelineResult, WorkPipelineError), implements three routing policies (leaf-threshold, always-team, llm-judged) that decide solo-vs-team routing based on task characteristics, develops the DefaultWorkPipeline service orchestrating a staged execution spine with observability and error handling, wires the pipeline into AppState as a managed dependency seam, and integrates boot-time construction into RuntimeServices with shared scorer and coordinator instances. Comprehensive unit and integration test coverage validates individual components and end-to-end execution paths under the simulation harness.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 32.23% which is insufficient. The required threshold is 40.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The pull request title 'feat: work pipeline spine (`#1960`)' directly and clearly summarizes the main change: adding a work pipeline spine that composes intake, decomposition, and execution.
Description check	✅ Passed	The pull request description comprehensively explains the work pipeline spine composition, new models/errors/policies, runtime wiring changes, validation approach, and manifest discipline, all directly related to the changeset.
Linked Issues check	✅ Passed	The PR fully addresses issue `#1960`: composes a complete pipeline (intake → projects → decompose → solo/team execute → metrics), makes decomposition own the solo-vs-team decision, provides typed WorkItem entry point, supports both execution paths, integrates with existing components, and validates via unit and e2e tests under the simulation harness.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to `#1960`: new pipeline package, routing policies, domain models/errors, runtime wiring, settings, observability events, and comprehensive test coverage. Settings for coordination.routing_policy and leaf_subtask_threshold are necessary boot-time configuration for the pipeline.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codspeed-hq · 2026-05-19T18:46:14Z

Merging this PR will not alter performance

✅ 33 untouched benchmarks
⏩ 21 skipped benchmarks¹

_{Comparing feat/work-pipeline-spine (6f9d95f) with main (c4775e2)}

21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/engine/pipeline/models.py`:
- Around line 189-191: Make is_success a derived/computed field instead of a
free Field: remove the current Field(...) declaration for is_success and add a
`@computed_field` method (using pydantic's computed_field) on the same model that
returns True only if all recorded phase outcomes are successful (e.g., iterate
the model's phases or recorded_phases sequence and return all(phase.is_success
for phase in phases)). Ensure the computed method name references is_success and
use the model's actual phase container (phases or recorded_phases) so callers
cannot construct inconsistent objects.

In `@src/synthorg/engine/pipeline/policy/llm_judged.py`:
- Around line 133-150: The _parse_verdict function currently uses naive
substring checks and misclassifies negated phrases like "not splittable"; change
it to use whole-word regex checks with negation detection: import re and replace
the "if 'splittable' in text" / "if 'leaf' in text" checks with pattern checks
such as if re.search(r'\bsplittable\b', text) and not
re.search(r'\b(?:not|no|never|n\'t)\b.*\bsplittable\b', text): return
RoutingVerdict.SPLITTABLE (and analogously for 'leaf'); ensure the checks are
case-insensitive (text is already lowered) and return None when a
negation/qualification is detected to avoid false positives.

In `@src/synthorg/engine/pipeline/policy/threshold.py`:
- Around line 36-45: The Threshold policy's __init__ currently accepts
non-positive values for threshold (symbol: self._threshold in __init__), which
breaks routing; add validation in the constructor of the class (the __init__
shown) to check the provided threshold and raise a ValueError with a clear
message if threshold <= 0, otherwise assign to self._threshold as before; keep
the existing classifier fallback (self._classifier = ...) intact and update any
unit tests to expect the ValueError for invalid inputs.

In `@src/synthorg/engine/pipeline/service.py`:
- Around line 134-137: The call to self._agent_registry.list_active() occurs
outside the _phase(...) call, so agent-registry errors and lookup latency aren't
captured by the _PHASE_DECOMPOSE telemetry; move the list_active() invocation
into the same closure passed to self._phase so that the entire lookup + decision
logic (i.e., fetching agents and invoking self._decide(task, agents)) runs
inside the _phase for _PHASE_DECOMPOSE, ensuring any exceptions and the full
timing for agent retrieval are recorded and the returned verdict remains
unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: a03db119-bee3-4bea-b505-0aaf5d9e3a9f

📥 Commits

Reviewing files that changed from the base of the PR and between c4775e2 and cf95bdf.

📒 Files selected for processing (27)

scripts/_ghost_wiring_manifest.txt
src/synthorg/api/app.py
src/synthorg/api/controllers/setup/agent_helpers.py
src/synthorg/api/state.py
src/synthorg/engine/coordination/factory.py
src/synthorg/engine/pipeline/__init__.py
src/synthorg/engine/pipeline/errors.py
src/synthorg/engine/pipeline/factory.py
src/synthorg/engine/pipeline/models.py
src/synthorg/engine/pipeline/policy/__init__.py
src/synthorg/engine/pipeline/policy/always_team.py
src/synthorg/engine/pipeline/policy/llm_judged.py
src/synthorg/engine/pipeline/policy/protocol.py
src/synthorg/engine/pipeline/policy/threshold.py
src/synthorg/engine/pipeline/protocol.py
src/synthorg/engine/pipeline/service.py
src/synthorg/observability/events/pipeline.py
src/synthorg/settings/definitions/coordination.py
src/synthorg/workers/runtime_builder.py
tests/e2e/test_work_pipeline_spine_e2e.py
tests/unit/api/test_state.py
tests/unit/engine/test_pipeline_errors.py
tests/unit/engine/test_pipeline_models.py
tests/unit/engine/test_pipeline_policy.py
tests/unit/engine/test_pipeline_service.py
tests/unit/observability/test_events.py
tests/unit/workers/test_runtime_builder.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Build Web Assets (melange)
GitHub Check: Test Integration
GitHub Check: Test Unit
GitHub Check: Dashboard Test
GitHub Check: Test E2E
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (7)

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL
Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default with read_only_post_init (Cat-2); Cat-3 bootstrap secrets are pure env at boot site. YAML is company-template ingestion format only, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
Use from synthorg.observability import get_logger; variable always named logger. Never import logging or use print() in app code. Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED after persistence write
Never log error=str(exc) or interpolate {exc}; use error_type=type(exc).name + error=safe_error_description(exc). Never use exc_info=True. OTel: forbidden span.record_exception(exc); use span.set_attribute("exception.message", safe_error_description(exc)) + record_exception=False, set_status_on_exception=False. Enforced by check_logger_exception_str_exc.py

Files:

src/synthorg/observability/events/pipeline.py
src/synthorg/engine/pipeline/policy/threshold.py
src/synthorg/api/controllers/setup/agent_helpers.py
src/synthorg/engine/pipeline/__init__.py
src/synthorg/engine/pipeline/factory.py
src/synthorg/engine/pipeline/policy/always_team.py
src/synthorg/engine/pipeline/protocol.py
src/synthorg/api/app.py
src/synthorg/engine/pipeline/errors.py
src/synthorg/engine/pipeline/policy/llm_judged.py
src/synthorg/engine/pipeline/policy/protocol.py
src/synthorg/settings/definitions/coordination.py
src/synthorg/api/state.py
src/synthorg/engine/coordination/factory.py
src/synthorg/engine/pipeline/models.py
src/synthorg/engine/pipeline/policy/__init__.py
src/synthorg/workers/runtime_builder.py
src/synthorg/engine/pipeline/service.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py
Comments should explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Do not use from __future__ import annotations (Python 3.14 has PEP 649). Use PEP 758 except: except A, B: requires parens when binding
Type hints required on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines
Define errors as <Domain><Condition>Error inheriting from DomainError; never inherit from Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py
Pydantic v2: use frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries). Use @computed_field for derived fields. Use NotBlankStr for identifiers
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py
Use immutability patterns: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Use asyncio.TaskGroup for fan-out/fan-in async patterns; helpers must catch Exception (re-raise MemoryError/RecursionError)
Clock seam: include clock: Clock | None = None parameter; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable
For untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML
Never use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001. Allowed in .claude/, third-party imports, providers/presets.py, we...

Files:

src/synthorg/observability/events/pipeline.py
src/synthorg/engine/pipeline/policy/threshold.py
src/synthorg/api/controllers/setup/agent_helpers.py
src/synthorg/engine/pipeline/__init__.py
src/synthorg/engine/pipeline/factory.py
src/synthorg/engine/pipeline/policy/always_team.py
src/synthorg/engine/pipeline/protocol.py
src/synthorg/api/app.py
src/synthorg/engine/pipeline/errors.py
src/synthorg/engine/pipeline/policy/llm_judged.py
src/synthorg/engine/pipeline/policy/protocol.py
src/synthorg/settings/definitions/coordination.py
src/synthorg/api/state.py
src/synthorg/engine/coordination/factory.py
src/synthorg/engine/pipeline/models.py
src/synthorg/engine/pipeline/policy/__init__.py
src/synthorg/workers/runtime_builder.py
src/synthorg/engine/pipeline/service.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/observability/events/pipeline.py
src/synthorg/engine/pipeline/policy/threshold.py
src/synthorg/api/controllers/setup/agent_helpers.py
src/synthorg/engine/pipeline/__init__.py
src/synthorg/engine/pipeline/factory.py
src/synthorg/engine/pipeline/policy/always_team.py
src/synthorg/engine/pipeline/protocol.py
src/synthorg/api/app.py
src/synthorg/engine/pipeline/errors.py
src/synthorg/engine/pipeline/policy/llm_judged.py
src/synthorg/engine/pipeline/policy/protocol.py
src/synthorg/settings/definitions/coordination.py
src/synthorg/api/state.py
src/synthorg/engine/coordination/factory.py
src/synthorg/engine/pipeline/models.py
src/synthorg/engine/pipeline/policy/__init__.py
src/synthorg/workers/runtime_builder.py
src/synthorg/engine/pipeline/service.py

src/synthorg/observability/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Telemetry: opt-in, off by default. Every event property must be in _ALLOWED_PROPERTIES. See telemetry.md

Files:

src/synthorg/observability/events/pipeline.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back
Test doubles: use FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags. Never use bare MagicMock at typed boundary (constructor/fn arg/annotated local/typed fixture return); blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline). Import FakeClock and mock_of from tests._shared
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large)

Files:

tests/unit/observability/test_events.py
tests/unit/engine/test_pipeline_errors.py
tests/unit/api/test_state.py
tests/unit/engine/test_pipeline_service.py
tests/unit/engine/test_pipeline_models.py
tests/e2e/test_work_pipeline_spine_e2e.py
tests/unit/workers/test_runtime_builder.py
tests/unit/engine/test_pipeline_policy.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/unit/observability/test_events.py
tests/unit/engine/test_pipeline_errors.py
tests/unit/api/test_state.py
tests/unit/engine/test_pipeline_service.py
tests/unit/engine/test_pipeline_models.py
tests/e2e/test_work_pipeline_spine_e2e.py
tests/unit/workers/test_runtime_builder.py
tests/unit/engine/test_pipeline_policy.py

src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

API startup lifecycle: construction phase wires synchronous services; on_startup wires services needing connected persistence. Construction-phase invariants: agent_registry before auto_wire_meetings; tunnel_provider wired unconditionally. On-startup invariants: SettingsService auto-wire before WorkflowExecutionObserver registration; OntologyService wires after persistence.connect() via _wire_ontology_service

Files:

src/synthorg/api/controllers/setup/agent_helpers.py
src/synthorg/api/app.py
src/synthorg/api/state.py

src/synthorg/{workers,api}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Runtime services: synthorg.workers.runtime_builder.build_runtime_services selects behind ONE provider-present switch, returning RuntimeServices (AgentEngineExecutionService + coordinator or NoProviderExecutionService + None). _install_runtime_services appends FIRST after persistence/SettingsService hooks; swap_worker_execution_service / swap_coordinator / swap_provider_registry hold a lock. Empty-company rejects task creation (AgentRuntimeNotConfiguredError 4014) and /coordinate 503s

Files:

src/synthorg/api/controllers/setup/agent_helpers.py
src/synthorg/api/app.py
src/synthorg/api/state.py
src/synthorg/workers/runtime_builder.py

src/synthorg/api/controllers/setup/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Setup completion: post_setup_reinit() propagates failures; settings_svc.set("api", "setup_complete", "true") only runs if reinit returns clean. Check/validate/reinit/persist serialised under COMPLETE_LOCK (src/synthorg/api/controllers/setup/agent_helpers.py) to prevent concurrent /setup/complete race conditions

Files:

src/synthorg/api/controllers/setup/agent_helpers.py

🧠 Learnings (3)

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

src/synthorg/observability/events/pipeline.py
src/synthorg/engine/pipeline/policy/threshold.py
tests/unit/observability/test_events.py
src/synthorg/api/controllers/setup/agent_helpers.py
src/synthorg/engine/pipeline/__init__.py
tests/unit/engine/test_pipeline_errors.py
src/synthorg/engine/pipeline/factory.py
src/synthorg/engine/pipeline/policy/always_team.py
src/synthorg/engine/pipeline/protocol.py
src/synthorg/api/app.py
src/synthorg/engine/pipeline/errors.py
tests/unit/api/test_state.py
src/synthorg/engine/pipeline/policy/llm_judged.py
src/synthorg/engine/pipeline/policy/protocol.py
src/synthorg/settings/definitions/coordination.py
src/synthorg/api/state.py
src/synthorg/engine/coordination/factory.py
src/synthorg/engine/pipeline/models.py
tests/unit/engine/test_pipeline_service.py
src/synthorg/engine/pipeline/policy/__init__.py
tests/unit/engine/test_pipeline_models.py
src/synthorg/workers/runtime_builder.py
tests/e2e/test_work_pipeline_spine_e2e.py
tests/unit/workers/test_runtime_builder.py
src/synthorg/engine/pipeline/service.py
tests/unit/engine/test_pipeline_policy.py

📚 Learning: 2026-05-17T11:45:11.839Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In SynthOrg (Aureliolo/synthorg) pre-alpha, apply the strict no-backward-compat policy: any setting-key rename must be fully completed in the same change/PR with all repo callers updated, and you should not keep legacy aliases or compatibility fallbacks. When reviewing, do not flag a setting-key rename as a breaking upgrade hazard if the rename is repo-wide and fully implemented within the same PR.

Applied to files:

src/synthorg/settings/definitions/coordination.py

📚 Learning: 2026-05-17T11:45:11.839Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1952
File: src/synthorg/settings/definitions/api.py:594-638
Timestamp: 2026-05-17T11:45:11.839Z
Learning: In this repository, SynthOrg is pre-alpha and uses a strict no-backward-compat policy for setting-key renames. When reviewing code under src/synthorg/settings, do NOT flag a setting-key rename as an “upgrade-safety” issue if the rename is complete/atomic in the same PR: all callers/usages of the old key are updated simultaneously, and the PR does not keep any legacy aliases, compatibility fallbacks, or migration/rollback paths for the old key.

Applied to files:

src/synthorg/settings/definitions/coordination.py

🔇 Additional comments (23)

tests/unit/engine/test_pipeline_models.py (1)

1-156: LGTM!

tests/unit/engine/test_pipeline_errors.py (1)

1-76: LGTM!

tests/unit/engine/test_pipeline_policy.py (1)

1-137: LGTM!

tests/unit/engine/test_pipeline_service.py (1)

1-259: LGTM!

tests/unit/api/test_state.py (1)

12-12: LGTM!

Also applies to: 214-267

tests/unit/observability/test_events.py (1)

297-297: LGTM!

tests/e2e/test_work_pipeline_spine_e2e.py (1)

1-356: LGTM!

tests/unit/workers/test_runtime_builder.py (1)

13-24: LGTM!

Also applies to: 40-110, 144-145, 164-186, 356-400

src/synthorg/api/state.py (1)

66-66: LGTM!

Also applies to: 309-309, 337-337, 394-394, 1298-1384

src/synthorg/api/app.py (1)

94-94: LGTM!

Also applies to: 257-257, 292-293, 553-553, 1070-1074

src/synthorg/api/controllers/setup/agent_helpers.py (1)

148-154: LGTM!

Also applies to: 179-180

src/synthorg/engine/coordination/factory.py (1)

176-177: LGTM!

Also applies to: 220-224, 233-237, 393-394

src/synthorg/workers/runtime_builder.py (1)

34-36: LGTM!

Also applies to: 60-61, 73-74, 121-137, 141-141, 348-350, 356-360, 377-381, 393-403, 405-464, 483-489, 495-495, 516-517, 537-549

src/synthorg/settings/definitions/coordination.py (1)

72-109: LGTM!

src/synthorg/observability/events/pipeline.py (1)

1-14: LGTM!

scripts/_ghost_wiring_manifest.txt (1)

40-40: LGTM!

src/synthorg/engine/pipeline/protocol.py (1)

13-34: LGTM!

src/synthorg/engine/pipeline/__init__.py (1)

10-32: LGTM!

src/synthorg/engine/pipeline/errors.py (1)

16-73: LGTM!

src/synthorg/engine/pipeline/policy/protocol.py (1)

17-41: LGTM!

src/synthorg/engine/pipeline/policy/always_team.py (1)

26-40: LGTM!

src/synthorg/engine/pipeline/policy/__init__.py (1)

19-39: LGTM!

Also applies to: 42-88

src/synthorg/engine/pipeline/factory.py (1)

30-92: LGTM!

codecov · 2026-05-19T19:00:32Z

Codecov Report

❌ Patch coverage is 94.42897% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.09%. Comparing base (c4775e2) to head (6f9d95f).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/synthorg/engine/pipeline/policy/llm_judged.py	84.44%	4 Missing and 3 partials ⚠️
src/synthorg/engine/pipeline/service.py	94.44%	4 Missing and 1 partial ⚠️
src/synthorg/workers/runtime_builder.py	86.36%	2 Missing and 1 partial ⚠️
src/synthorg/api/state.py	92.85%	1 Missing and 1 partial ⚠️
src/synthorg/engine/coordination/factory.py	50.00%	1 Missing and 1 partial ⚠️
...rc/synthorg/api/controllers/setup/agent_helpers.py	50.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2013      +/-   ##
==========================================
+ Coverage   85.06%   85.09%   +0.03%     
==========================================
  Files        1901     1913      +12     
  Lines      113035   113388     +353     
  Branches     9646     9673      +27     
==========================================
+ Hits        96152    96491     +339     
- Misses      14521    14530       +9     
- Partials     2362     2367       +5

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

gemini-code-assist · 2026-05-19T19:01:07Z

Warning

Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.

…ex verdict parse, threshold validation, decompose telemetry)

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/engine/pipeline/policy/llm_judged.py`:
- Around line 152-156: The code currently checks _SPLITTABLE_RE then _LEAF_RE
and will pick SPLITTABLE when both appear; modify the logic in the block around
variables negated, _SPLITTABLE_RE, and _LEAF_RE so that you first detect both
matches and treat that as ambiguous: if both _SPLITTABLE_RE.search(text) and
_LEAF_RE.search(text) are true, return None (or None if negated—effectively
ambiguous regardless), otherwise proceed with the existing single-match checks
(return RoutingVerdict.SPLITTABLE or RoutingVerdict.LEAF accounting for
negated).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1dcdc729-f2d3-4192-96a8-a3910704d972

📥 Commits

Reviewing files that changed from the base of the PR and between cf95bdf and 2724482.

📒 Files selected for processing (6)

src/synthorg/engine/pipeline/models.py
src/synthorg/engine/pipeline/policy/llm_judged.py
src/synthorg/engine/pipeline/policy/threshold.py
src/synthorg/engine/pipeline/service.py
tests/unit/engine/test_pipeline_models.py
tests/unit/engine/test_pipeline_policy.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: Deploy Preview
GitHub Check: Build Backend
GitHub Check: Build Web Assets (melange)
GitHub Check: CodSpeed Python benchmarks
GitHub Check: Test E2E
GitHub Check: Test Conformance (SQLite)
GitHub Check: Dashboard Test
GitHub Check: Test Unit
GitHub Check: Test Integration
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (3)

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Only src/synthorg/persistence/ may import sqlite/psycopg or emit raw SQL
Use DB > env > code default via SettingsService/ConfigResolver (Cat-1) or env > code default with read_only_post_init (Cat-2); Cat-3 bootstrap secrets are pure env at boot site. YAML is company-template ingestion format only, not a precedence tier. No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
Use from synthorg.observability import get_logger; variable always named logger. Never import logging or use print() in app code. Event names from observability.events. constants; structured kwargs (logger.info(EVENT, key=value))
Error paths log WARNING/ERROR with context before raising; state transitions log INFO via *_STATUS_TRANSITIONED after persistence write
Never log error=str(exc) or interpolate {exc}; use error_type=type(exc).name + error=safe_error_description(exc). Never use exc_info=True. OTel: forbidden span.record_exception(exc); use span.set_attribute("exception.message", safe_error_description(exc)) + record_exception=False, set_status_on_exception=False. Enforced by check_logger_exception_str_exc.py

Files:

src/synthorg/engine/pipeline/policy/threshold.py
src/synthorg/engine/pipeline/models.py
src/synthorg/engine/pipeline/policy/llm_judged.py
src/synthorg/engine/pipeline/service.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Numerics live in settings/definitions/; allowlist 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal). Enforced by scripts/check_no_magic_numbers.py
Comments should explain WHY only; no reviewer citations, issue back-refs, or migration framing. Enforced by check_no_review_origin_in_code.py and check_no_migration_framing.py
Do not use from __future__ import annotations (Python 3.14 has PEP 649). Use PEP 758 except: except A, B: requires parens when binding
Type hints required on public functions; mypy strict. Google-style docstrings. Line length 88; functions <50 lines; files <800 lines
Define errors as <Domain><Condition>Error inheriting from DomainError; never inherit from Exception/RuntimeError/etc directly. Enforced by check_domain_error_hierarchy.py
Pydantic v2: use frozen + extra="forbid" on every frozen model project-wide (gate check_frozen_model_extra_forbid.py; @computed_field auto-exempt, per-line # lint-allow: frozen-extra-forbid -- for extra="allow"/"ignore" boundaries). Use @computed_field for derived fields. Use NotBlankStr for identifiers
Args models at every system boundary; parse_typed() for every external dict ingestion. Enforced by check_boundary_typed.py
Use immutability patterns: model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Use asyncio.TaskGroup for fan-out/fan-in async patterns; helpers must catch Exception (re-raise MemoryError/RecursionError)
Clock seam: include clock: Clock | None = None parameter; tests inject FakeClock. Lifecycle: services own _lifecycle_lock; timed-out stops mark unrestartable
For untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML
Never use real vendor names in project code/tests; use example-provider, test-provider, example-{large,medium,small}-001. Allowed in .claude/, third-party imports, providers/presets.py, we...

Files:

src/synthorg/engine/pipeline/policy/threshold.py
src/synthorg/engine/pipeline/models.py
src/synthorg/engine/pipeline/policy/llm_judged.py
src/synthorg/engine/pipeline/service.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

src/synthorg/engine/pipeline/policy/threshold.py
src/synthorg/engine/pipeline/models.py
src/synthorg/engine/pipeline/policy/llm_judged.py
src/synthorg/engine/pipeline/service.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.{unit,integration,e2e,slow}. Async auto. Timeout 30s global. Coverage 80% min
Windows: unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race). Subprocess tests override back
Test doubles: use FakeClock for Clock seam, mock_ofT for typed-boundary substitutions, SimpleNamespace for attribute-bags. Never use bare MagicMock at typed boundary (constructor/fn arg/annotated local/typed fixture return); blocked by scripts/check_mock_spec.py (zero-tolerance, no baseline). Import FakeClock and mock_of from tests._shared
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: NEVER skip/xfail; fix fundamentally. Use asyncio.Event().wait() not sleep(large)

Files:

tests/unit/engine/test_pipeline_models.py
tests/unit/engine/test_pipeline_policy.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

tests/unit/engine/test_pipeline_models.py
tests/unit/engine/test_pipeline_policy.py

🧠 Learnings (1)

📚 Learning: 2026-05-05T09:04:46.195Z

Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

src/synthorg/engine/pipeline/policy/threshold.py
src/synthorg/engine/pipeline/models.py
tests/unit/engine/test_pipeline_models.py
tests/unit/engine/test_pipeline_policy.py
src/synthorg/engine/pipeline/policy/llm_judged.py
src/synthorg/engine/pipeline/service.py

🔇 Additional comments (4)

src/synthorg/engine/pipeline/policy/threshold.py (1)

24-73: LGTM!

src/synthorg/engine/pipeline/models.py (1)

27-200: LGTM!

tests/unit/engine/test_pipeline_models.py (1)

21-176: LGTM!

src/synthorg/engine/pipeline/service.py (1)

121-360: LGTM!

…dict

## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - Multi-agent coordination is now active immediately on startup for smoother operation. - Governance rules are fully enforced during use, ensuring compliance at all times. - Coordination metrics update live, giving real-time insights into system activity. - Review agents are now reliably processed, preventing silent drops in tasks. - Sandbox containers can be reused for agents and tasks, speeding up execution and reducing overhead. ### What's new - Agents support online runtime with a minimal safety framework to improve stability. - Recorded LLM interactions can be deterministically replayed at the provider interface. - Distributed path validation has been enhanced for more robust data routing. - A client-simulation runtime was added for end-to-end testing of the IntakeEngine. - A new work pipeline spine architecture has been introduced to streamline task processing. ### Under the hood - Infrastructure, Python, and web dependencies have all been updated to latest versions. - Updated apko lockfiles in the CI/CD pipeline improve build consistency.  :robot: I have created a release *beep* *boop* --- ## [0.8.6](v0.8.5...v0.8.6) (2026-05-19) ### Features * agent runtime online + minimal safety spine (runtime root) ([#2003](#2003)) ([e5eef1a](e5eef1a)), closes [#1956](#1956) * deterministic recorded-LLM cassette replay at the provider chokepoint ([#2010](#2010)) ([cabf55d](cabf55d)) * distributed path validation + hardening ([#2011](#2011)) ([a382e4a](a382e4a)), closes [#1966](#1966) * wire IntakeEngine via boot client-simulation runtime (e2e test harness) ([#2006](#2006)) ([6a9c0aa](6a9c0aa)), closes [#1961](#1961) * work pipeline spine ([#1960](#1960)) ([#2013](#2013)) ([29b64e3](29b64e3)) ### Bug Fixes * bring the multi-agent coordinator online at boot ([#2007](#2007)) ([180b38a](180b38a)), closes [#1958](#1958) * full governance enforcement online ([#1957](#1957)) ([#2005](#2005)) ([4140fc5](4140fc5)) * harden anti-ghost-wiring gate and fix silently-dropped review agents ([#2000](#2000)) ([89b57ce](89b57ce)) * make coordination metrics live ([#1959](#1959)) ([#2012](#2012)) ([c4775e2](c4775e2)) * sandbox lifecycle dispatch (per-agent / per-task container reuse) ([#2008](#2008)) ([03d2587](03d2587)), closes [#1965](#1965) ### Documentation * add GitButler concept-only concurrency research ([#1978](#1978)) ([#2009](#2009)) ([9e4f5c1](9e4f5c1)) * honest-hybrid refresh of README, site, and design specs ([#2001](#2001)) ([f485bea](f485bea)) ### CI/CD * update apko lockfiles ([#2004](#2004)) ([e2b9eee](e2b9eee)) ### Maintenance * Update Infrastructure dependencies ([#2014](#2014)) ([0b16bdf](0b16bdf)) * Update Python dependencies ([#2015](#2015)) ([a7224bb](a7224bb)) * Update Web dependencies ([#2016](#2016)) ([7a7fe76](7a7fe76)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Aureliolo added 2 commits May 19, 2026 20:21

Aureliolo temporarily deployed to cloudflare-preview May 19, 2026 18:45 — with GitHub Actions Inactive

coderabbitai Bot requested changes May 19, 2026

View reviewed changes

Comment thread src/synthorg/engine/pipeline/models.py Outdated

Comment thread src/synthorg/engine/pipeline/policy/llm_judged.py

Comment thread src/synthorg/engine/pipeline/policy/threshold.py

Comment thread src/synthorg/engine/pipeline/service.py Outdated

Aureliolo added 2 commits May 19, 2026 21:06

fix: babysit round 1, 4 coderabbit findings (computed is_success, reg…

e7ce3d9

…ex verdict parse, threshold validation, decompose telemetry)

fix: add prop-decorator type-ignore on is_success computed_field

2724482

Aureliolo temporarily deployed to cloudflare-preview May 19, 2026 19:13 — with GitHub Actions Inactive

coderabbitai Bot requested changes May 19, 2026

View reviewed changes

Comment thread src/synthorg/engine/pipeline/policy/llm_judged.py

fix: treat both-verdict-words LLM response as ambiguous in _parse_ver…

6f9d95f

…dict

Aureliolo merged commit 29b64e3 into main May 19, 2026
44 of 46 checks passed

Aureliolo deleted the feat/work-pipeline-spine branch May 19, 2026 19:23

synthorg-repo-bot Bot mentioned this pull request May 19, 2026

chore(main): release 0.8.6 #2002

Merged

Aureliolo temporarily deployed to cloudflare-preview May 19, 2026 19:24 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: work pipeline spine (#1960)#2013

feat: work pipeline spine (#1960)#2013
Aureliolo merged 5 commits into
mainfrom
feat/work-pipeline-spine

Aureliolo commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 19, 2026 •

edited

Loading

Review failed

❌ Failed checks (1 warning)

Uh oh!

codspeed-hq Bot commented May 19, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 19, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot commented May 19, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Aureliolo commented May 19, 2026

Summary

What changed

Decisions

Validation

Manifest discipline (CORE)

Security

Deviation from plan (justified)

Uh oh!

github-actions Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependency Review

Scanned Files

Uh oh!

coderabbitai Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

❌ Failed checks (1 warning)

Uh oh!

codspeed-hq Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Footnotes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

gemini-code-assist Bot commented May 19, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented May 19, 2026 •

edited

Loading

coderabbitai Bot commented May 19, 2026 •

edited

Loading

codspeed-hq Bot commented May 19, 2026 •

edited

Loading

codecov Bot commented May 19, 2026 •

edited

Loading