2 changes: 2 additions & 0 deletions scripts/_ghost_wiring_manifest.txt
@@ -43,6 +43,8 @@ ENFORCED SeenClaimsPruner #1966 -- constructed by workers.backend_services.build
ENFORCED WorkerHeartbeatSubscriber #1966 -- constructed by workers.backend_services.build_distributed_backend_services; surfaces worker liveness in the log pipeline
ENFORCED build_work_pipeline #1960 -- called by workers.runtime_builder._build_runtime_work_pipeline behind the provider-present switch; composes the work spine (intake -> projects -> solo/team -> coordination metrics)
ENFORCED build_chief_of_staff_proposer #1968 -- called by api.app._wire_chief_of_staff_proposer behind propose_enabled + provider switch; constructs ChiefOfStaffProposer which parks approval-gated WorkItems for the conversational interface
ENFORCED CharterInterviewService #1977 -- constructed in api.app._wire_charter_engine behind interview_enabled + provider switch; runs the deep CEO interview producing a ProjectCharter
ENFORCED CharterDispatcher #1977 -- constructed in api.app._wire_charter_engine; on charter approval creates the project + approved forecast and drives the work pipeline spine
ENFORCED TaskBoardEntryAdapter #1963 -- constructed by engine.pipeline.entry.factory.build_work_entry_adapter on the TASK_BOARD arm; wired at boot by engine.pipeline.entry.boot.wire_real_task_board_entry; drives the spine for human-filed board tasks (POST /tasks)
ENFORCED ObjectiveEntryAdapter #1964 -- built at boot by engine.pipeline.entry.boot.wire_real_objective_entry; fed by POST /objectives
ENFORCED ProjectWorkspaceService #1974 -- constructed in api/app.py _install_runtime_services; per-project persistent git-backed workspace provisioning
55 changes: 29 additions & 26 deletions scripts/run_affected_tests.py
@@ -242,19 +242,6 @@ def _affected_test_dirs(changed: list[str]) -> tuple[list[str], bool]:
r"\[(?P<worker>gw\d+)\] node down: Not properly terminated",
)

# pytest-xdist's scheduler raises Python-level exceptions when it
# tries to assign work to a worker that already disappeared (KeyError
# in ``loadscope.py``) or asserts on a residual ``crashitem`` while a
# worker reports finished. Both are downstream consequences of the
# native-level worker death captured by ``_NODE_DOWN_RE`` /
# ``_WORKER_CRASH_RE`` above, not independent regressions, so the
# classifier folds them into the crash-advisory branch when paired
# with at least one observed crash signature.
_XDIST_INTERNAL_ERROR_RE = re.compile(
r"^INTERNALERROR>",
re.MULTILINE,
)

# pytest in ``-q`` mode prints ``FAILED <test_id> - <reason>`` (or just
# ``FAILED <test_id>``) at the start of a line for every failure in the
# session summary. ``\S+`` captures up to the first whitespace; valid
@@ -648,12 +635,15 @@ def _classify_isolation_outcome(
* Crashes only, no repeats -> crash advisory (treat as pass; the
gate exits 0 and prints a hint about Windows ProactorEventLoop /
cross-worktree contention).
* Worker(s) went ``node down`` AND the only summary signal is an
``INTERNALERROR>`` traceback (xdist scheduler crashed because
its workers vanished) -> crash advisory. Without the parser
branch the dead-worker chain reads as "non-zero returncode +
no parsable test signal" and falls through to fail-closed,
blocking the push on documented native-level flakiness.
* Worker(s) went ``node down`` with a non-zero returncode (and no
real failure / repeated crash above) -> crash advisory. The
controller-side loadscope crash guard in ``tests/conftest.py``
suppresses the downstream ``INTERNALERROR>`` the dead-worker
chain used to emit, and a worker killed mid-teardown can die
before pytest prints any summary, so neither signal is required
to recognise the documented Python 3.14 + Windows xdist teardown
crash. The repeated-named-crash check above still blocks a test
that crashes the worker on every run.
Comment on lines +638 to +646

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't downgrade repeated bare node down crashes to advisory.

This path exits 0 for any non-zero run whose only signal is `[gwN] node down...`, but the repeated-crash guard counts only `_parse_worker_crashes()` entries that carry test ids. If the same test tears a worker down on both `--count 2` iterations and xdist never prints `worker ... crashed while running ...`, the gate still passes as advisory. That breaks the stated contract that worker crashes on "every replay" must block.

Also applies to: 683-707

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/run_affected_tests.py` around lines 638 - 646, The repeated-crash
gate is only counting entries with test ids from _parse_worker_crashes(), so
bare "[gwN] node down" worker deaths get downgraded to advisory; update
_parse_worker_crashes() (and any consumer like the repeated-crash check
function) to also record and key plain "node down" events (e.g., by worker id or
crash signature) so they contribute to the repeated-crash detection, and then
ensure the repeated-crash guard treats repeated worker-down entries the same as
entries tied to test ids (i.e., fail the run rather than mark advisory).
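
One possible shape for that fix, offered as a minimal sketch rather than the project's implementation: count bare `node down` events and block once they repeat. The regex is the one the script already compiles for `node down` lines; `count_node_down_events`, `classify_bare_node_down`, and `repeat_threshold` are hypothetical names. The sketch counts events instead of keying on the worker id, on the assumption that a replacement worker may come up under a different id.

```python
import re

# Same pattern the script uses to spot a bare worker death.
_NODE_DOWN_RE = re.compile(
    r"\[(?P<worker>gw\d+)\] node down: Not properly terminated",
)


def count_node_down_events(stdout: str) -> int:
    """Count every bare ``node down`` line, regardless of worker id."""
    return sum(1 for _ in _NODE_DOWN_RE.finditer(stdout))


def classify_bare_node_down(
    stdout: str,
    returncode: int,
    *,
    repeat_threshold: int = 2,  # hypothetical knob, e.g. one per --count iteration
) -> str:
    """Classify a run whose only adverse signal is a worker death.

    Returns ``"regression"`` when worker deaths repeat,
    ``"crash_advisory"`` for a one-off death, and ``"not_applicable"``
    when there is no bare node-down signal or the run exited 0.
    """
    events = count_node_down_events(stdout)
    if events == 0 or returncode == 0:
        return "not_applicable"
    if events >= repeat_threshold:
        # Repeated worker deaths are treated like a repeated named crash.
        return "regression"
    return "crash_advisory"
```

The trade-off is that two unrelated one-off worker deaths in the same run would also block, so the threshold would need tuning against how often the teardown crash appears in practice.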

* No crashes, no failures, returncode 0 -> pass.
* No parsable signal but returncode non-zero -> fail closed
(regression) so degraded output never silently passes.
@@ -662,7 +652,6 @@ def _classify_isolation_outcome(
crashed_tests = tuple(test for _, test in crashes)
crashed_set = set(crashed_tests)
node_down_workers = _parse_node_down(stdout)
has_internal_error = bool(_XDIST_INTERNAL_ERROR_RE.search(stdout))
failed_tests_raw = _parse_test_failures(stdout)
real_failures = tuple(t for t in failed_tests_raw if t not in crashed_set)

@@ -691,12 +680,26 @@ def _classify_isolation_outcome(
exit_code=0,
crashed_tests=crashed_tests,
)
# ``node down`` without a paired ``crashed while running`` line means
# the worker died between tests, so the test names are not
# recoverable -- surface the worker ids in their place so the
# advisory banner still has something to print and the
# ``crash_advisory`` invariant (``crashed_tests`` non-empty) holds.
if node_down_workers and has_internal_error and returncode != 0:
# A worker that went ``node down`` is a native-level crash, not a
# test failure. The real-failure and repeated-named-crash checks
# above have already returned, so reaching here means the only
# adverse signal is the worker death itself, with a non-zero exit.
#
# We do NOT require a downstream ``INTERNALERROR>`` here: the
# controller-side loadscope crash guard in ``tests/conftest.py``
# (``_install_xdist_loadscope_crash_guard``) deliberately suppresses
# the reschedule ``KeyError`` that used to surface as
# ``INTERNALERROR>``, and a worker killed mid-teardown can die
# before pytest prints any FAILED summary -- so neither an
# INTERNALERROR nor a parseable test id is guaranteed for the
# documented Python 3.14 + Windows xdist teardown crash. Requiring
# the INTERNALERROR would (now that the guard suppresses it) fail
# closed on every such crash, blocking every push that widens the
# affected selection. The repeated-crash guard above is the safety
# net for a test that genuinely crashes the worker on every run; a
# one-off node-down is treated as advisory. The test names are
# unrecoverable from a bare node-down, so surface the worker ids.
if node_down_workers and returncode != 0:
return IsolationOutcome(
kind="crash_advisory",
exit_code=0,
113 changes: 113 additions & 0 deletions src/synthorg/api/app.py
@@ -110,6 +110,7 @@
API_BRIDGE_CONFIG_RESOLVE_FAILED,
API_SERVICE_AUTO_WIRED,
)
from synthorg.observability.events.charter import CHARTER_SUBSTRATE_UNAVAILABLE
from synthorg.observability.events.settings import SETTINGS_VALUE_RESOLVED
from synthorg.persistence.artifact_storage import (
ArtifactStorageBackend, # noqa: TC001
@@ -1774,6 +1775,118 @@ async def _wire_chief_of_staff_proposer() -> None:

startup = [*startup, _wire_chief_of_staff_proposer]

async def _wire_charter_engine() -> None:
# Deep CEO interview to project charter. Wired only when
# ``meta.charter.interview_enabled`` is set AND a provider is
# registered AND persistence is connected (the conversation +
# charter stores are durable). Otherwise the /meta/charters
# controllers honestly surface 503. Best-effort: a wiring failure
# never poisons startup. Idempotent for re-entered lifespans.
if app_state.has_charter_service:
return
if (
provider_registry is None
or persistence is None
or not app_state.has_persistence
):
return
try:
from synthorg.api.services.project_service import ( # noqa: PLC0415
ProjectService,
)
from synthorg.meta.charter.dispatch import ( # noqa: PLC0415
CharterDispatcher,
)
from synthorg.meta.charter.factory import ( # noqa: PLC0415
build_charter_interview_strategy,
)
from synthorg.meta.charter.service import ( # noqa: PLC0415
CharterInterviewService,
)
from synthorg.meta.config import ( # noqa: PLC0415
load_self_improvement_config,
)
from synthorg.persistence.charter_factory import ( # noqa: PLC0415
build_charter_repository,
)
from synthorg.persistence.conversational_factory import ( # noqa: PLC0415
build_conversational_repositories,
)

si_config = await load_self_improvement_config(
app_state.settings_service if app_state.has_settings_service else None,
)
charter_config = si_config.charter
if not charter_config.interview_enabled:
return
charter_repo = build_charter_repository(persistence)
conv_repos = build_conversational_repositories(persistence)
available = provider_registry.list_providers()
if charter_repo is None or conv_repos is None or not available:
logger.warning(
CHARTER_SUBSTRATE_UNAVAILABLE,
note="charter interview enabled but stores/provider unavailable",
)
return
provider = provider_registry.get(available[0])
strategy = build_charter_interview_strategy(
charter_config,
provider=provider,
cost_tracker=cost_tracker,
)
app_state.set_charter_service(
CharterInterviewService(
strategy=strategy,
config=charter_config,
conversation_repo=conv_repos.conversation_repo,
turn_repo=conv_repos.turn_repo,
charter_repo=charter_repo,
)
)
# The approval dispatcher additionally needs the work-pipeline
# spine, the cost-forecast store, and the live budget config.
# When any is absent the interview still works; only approve
# 503s.
forecast_repo = app_state.cost_forecast_repo
budget_config = app_state.budget_config
if (
not app_state.has_work_pipeline
or forecast_repo is None
or budget_config is None
):
logger.warning(
CHARTER_SUBSTRATE_UNAVAILABLE,
note="charter dispatcher deps absent; approve will 503",
)
return
resolved_budget = budget_config
app_state.set_charter_dispatcher(
CharterDispatcher(
charter_repo=charter_repo,
forecast_repo=forecast_repo,
project_service=ProjectService(repo=persistence.projects),
work_pipeline=app_state.work_pipeline,
conversation_repo=conv_repos.conversation_repo,
budget_currency=lambda: resolved_budget.currency,
)
)
except MemoryError, RecursionError:
raise
except Exception as exc:
# Any other failure (settings load, repo construction,
# strategy build, ...) must not poison startup; the
# controllers will keep 503ing until the operator fixes
# the underlying configuration and reboots.
logger.warning(
CHARTER_SUBSTRATE_UNAVAILABLE,
note="charter wiring raised; charter endpoints stay unavailable",
error_type=type(exc).__name__,
error=safe_error_description(exc),
)
return

startup = [*startup, _wire_charter_engine]
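
The wiring hook above combines three ideas: an idempotency guard, a best-effort `try` block, and a re-raise for errors that must never be swallowed. A stripped-down sketch of that pattern follows; `AppState`, `wire_best_effort`, and `build` are hypothetical stand-ins, not the project's API. The diff writes `except MemoryError, RecursionError:` without parentheses, which is valid only on Python 3.14 and later (PEP 758); the sketch uses the parenthesised form so it also runs on older interpreters.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("wiring_sketch")


class AppState:
    """Hypothetical stand-in for the real application state."""

    def __init__(self) -> None:
        self.charter_service: Optional[object] = None

    @property
    def has_charter_service(self) -> bool:
        return self.charter_service is not None


async def wire_best_effort(
    state: AppState,
    build: Callable[[], Awaitable[object]],
) -> None:
    """Wire an optional service without ever failing startup."""
    if state.has_charter_service:
        # Idempotent: a re-entered lifespan keeps the existing service.
        return
    try:
        state.charter_service = await build()
    except (MemoryError, RecursionError):
        # Unrecoverable conditions propagate instead of being logged away.
        raise
    except Exception as exc:
        # Anything else leaves the service unset; the real controllers
        # answer 503 in that state.
        logger.warning("charter wiring failed: %s", type(exc).__name__)
```

A caller would append such a hook to the startup list and rely on the unset service to signal unavailability, rather than on an exception at boot.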

Comment thread
coderabbitai[bot] marked this conversation as resolved.
async def _wire_toolsmith() -> None:
# Self-extending toolkit. Wired only when
# ``tool_creation_enabled`` is set AND a provider is registered
3 changes: 3 additions & 0 deletions src/synthorg/api/controllers/__init__.py
@@ -24,6 +24,7 @@
from synthorg.api.controllers.ceremony_policy import (
CeremonyPolicyController,
)
from synthorg.api.controllers.charter import CharterController
from synthorg.api.controllers.clients import ClientController
from synthorg.api.controllers.cockpit import CockpitController
from synthorg.api.controllers.collaboration import CollaborationController
@@ -128,6 +129,7 @@
MessageController,
MeetingController,
ArtifactController,
CharterController,
BudgetController,
ForecastBudgetController,
AnalyticsController,
@@ -228,6 +230,7 @@
"BudgetConfigVersionController",
"BudgetController",
"CeremonyPolicyController",
"CharterController",
"ClientController",
"CollaborationController",
"CompanyController",