66 changes: 66 additions & 0 deletions docs/reference/py314-flake-investigation-2026-05.md
@@ -0,0 +1,66 @@
# Python 3.14 CI flake investigation (2026-05-05)

## Outcome

No reproducible Python 3.14-specific flake was identified. The single CI failure
attributed to "test flakiness" on 2026-05-05 was a deterministic edge case in a
newly-added symlink-escape test, not a 3.14 issue. No further test-isolation
tightening is warranted at this time.

## Runs inspected

`gh run list --workflow ci.yml --status failure` for the 2026-05-05 window
returned three failed runs. Only one of them failed in the `Test (Python 3.14)` job:

| Run | Branch | Failed test |
| --- | --- | --- |
| 25399305527 | `chore/convention-rollout-gate` | `tests/unit/scripts/test_check_convention_gate_inventory.py::test_gate_path_symlink_escape_treated_as_missing` |
| 25361288425 | `main` | unrelated (post-merge gate run) |
| 25359737418 | `fix/test-isolation-gate-1755` | already a fix PR for cross-loop asyncio primitives |

## Root cause for run 25399305527

The failing assertion was `len(violations) == 1`, which evaluated to `len([]) == 1`.
The test creates a symlink under the test repo root pointing at a tempdir that is
a sibling of the repo root, then asserts that the convention-gate inventory check
flags the symlink as a missing gate file (because the resolved path escapes the repo).

On the Linux GitHub Actions runner that day the symlink resolution *did not*
escape: both `tmp_path_factory.mktemp("outside_repo")` and the test repo root
landed under the same `pytest-of-runner/...` parent, so the resolved real path
was still inside the repo root and the gate file was treated as present.

For a given tempdir layout the outcome is deterministic, so this is an
environment quirk rather than a flake. Subsequent runs of the same test on
identical CI infrastructure pass because the parent-tempdir layout varies
between sessions.
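
The layout dependence described above can be illustrated with a minimal sketch. It is not the gate's actual implementation: `escapes_root` is a hypothetical helper, and the directory names are made up for the example. It only shows that the same symlink target counts as "escaping" or "contained" depending on which directory the check treats as the root.

```python
import tempfile
from pathlib import Path


def escapes_root(candidate: Path, root: Path) -> bool:
    """Return True when candidate's real path lies outside root's real path.

    Hypothetical helper for illustration; the real inventory check may differ.
    """
    real_root = root.resolve()
    real_candidate = candidate.resolve()  # follows the symlink
    return not real_candidate.is_relative_to(real_root)


def demo_layouts() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as base:
        base_path = Path(base)
        repo = base_path / "repo"
        outside = base_path / "outside_repo"
        repo.mkdir()
        outside.mkdir()
        (outside / "gate.md").write_text("gate")

        # Symlink inside the repo pointing at a file in the sibling directory.
        link = repo / "gate.md"
        link.symlink_to(outside / "gate.md")

        # Root is the repo itself: the target is outside, so it escapes.
        sibling = escapes_root(link, repo)
        # Root is a shared ancestor of both directories: the same target
        # is now inside the root, so nothing is flagged.
        ancestor = escapes_root(link, base_path)
        return sibling, ancestor


sibling_escapes, ancestor_escapes = demo_layouts()
```

Under these assumptions, a test that expects exactly one violation only holds when the "outside" directory is genuinely outside whatever path the check resolves as the root.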

## Already-merged mitigations relevant to 3.14 isolation

The 3.14 + xdist surface area was tightened during the same audit window by
prior PRs:

- `fix(test): exterminate xdist-flaky tests with module-level state (#1713)`:
removed module-level state leaks that surfaced under `loadfile` distribution.
- `fix(test): pre-push isolation gate flakes from cross-loop asyncio primitives
(#1755)`: replaced asyncio primitives constructed at import time with
per-test instances.

The pyproject `addopts` already pin `-n 8 --dist=loadfile`, so a Windows + 3.14
+ ProactorEventLoop teardown leak cannot escape into other test modules.
`tests/conftest.py` already enforces a per-unit-test wall-clock budget of 8 s,
which surfaces real regressions deterministically rather than as flakes.

## Why no further work is queued

A genuine flake-hunt requires a reproducer. The 2026-05-05 symlink test runs
green on every subsequent CI invocation; rerunning it in a loop on Linux did
not reproduce the empty-violations branch. Adding pre-emptive isolation
hardening without a failing test would be speculative work that could mask
real flakes if they emerge.

If a 3.14-specific flake re-surfaces, the next investigation should:

1. capture the failing run's full junit.xml + stdout artifact;
2. attempt local reproduction with `RUN_INTEGRATION_TESTS=1 uv run python -m
pytest tests/<failing_path> --count=20`;
3. only then add a targeted isolation fixture or skip-marker.
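
Step 2 above could be scripted roughly as follows. This is a sketch, not project tooling: `build_repro_command` and `run_repro` are hypothetical names, and the `--count` option is assumed to be provided by a repeat plugin such as pytest-repeat being installed in the environment.

```python
import os
import subprocess


def build_repro_command(test_path: str, count: int = 20) -> list[str]:
    """Build the pytest invocation from step 2 for a given test path."""
    return ["uv", "run", "python", "-m", "pytest", test_path, f"--count={count}"]


def run_repro(test_path: str, count: int = 20) -> int:
    """Run the repeated test with RUN_INTEGRATION_TESTS=1 and return the exit code."""
    env = {**os.environ, "RUN_INTEGRATION_TESTS": "1"}
    result = subprocess.run(build_repro_command(test_path, count), env=env, check=False)
    return result.returncode
```

Calling `run_repro(...)` with the failing test's node ID returns a non-zero exit code if any of the repetitions fails, which is the signal that a reproducer exists.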
68 changes: 7 additions & 61 deletions scripts/mock_spec_baseline.txt
@@ -120,53 +120,6 @@ tests/integration/integrations/test_controllers.py:1087:18
tests/integration/integrations/test_controllers.py:1088:27
tests/integration/integrations/test_controllers.py:1105:18
tests/integration/integrations/test_controllers.py:1106:31
tests/integration/integrations/test_oauth_flows.py:51:11
tests/integration/integrations/test_oauth_flows.py:54:28
tests/integration/integrations/test_oauth_flows.py:84:22
tests/integration/integrations/test_oauth_flows.py:119:22
tests/integration/integrations/test_oauth_flows.py:155:21
tests/integration/integrations/test_oauth_flows.py:156:25
tests/integration/integrations/test_oauth_flows.py:157:28
tests/integration/integrations/test_oauth_flows.py:158:35
tests/integration/integrations/test_oauth_flows.py:172:18
tests/integration/integrations/test_oauth_flows.py:173:31
tests/integration/integrations/test_oauth_flows.py:180:34
tests/integration/integrations/test_oauth_flows.py:187:37
tests/integration/integrations/test_oauth_flows.py:188:25
tests/integration/integrations/test_oauth_flows.py:190:20
tests/integration/integrations/test_oauth_flows.py:191:34
tests/integration/integrations/test_oauth_flows.py:192:25
tests/integration/integrations/test_oauth_flows.py:219:21
tests/integration/integrations/test_oauth_flows.py:220:25
tests/integration/integrations/test_oauth_flows.py:221:28
tests/integration/integrations/test_oauth_flows.py:222:18
tests/integration/integrations/test_oauth_flows.py:239:21
tests/integration/integrations/test_oauth_flows.py:240:25
tests/integration/integrations/test_oauth_flows.py:241:28
tests/integration/integrations/test_oauth_flows.py:242:18
tests/integration/integrations/test_oauth_flows.py:243:31
tests/integration/integrations/test_oauth_flows.py:250:34
tests/integration/integrations/test_oauth_flows.py:286:25
tests/integration/integrations/test_oauth_flows.py:287:28
tests/integration/integrations/test_oauth_flows.py:288:35
tests/integration/integrations/test_oauth_flows.py:291:31
tests/integration/integrations/test_oauth_flows.py:292:34
tests/integration/integrations/test_oauth_flows.py:293:37
tests/integration/integrations/test_oauth_flows.py:294:25
tests/integration/integrations/test_oauth_flows.py:297:34
tests/integration/integrations/test_oauth_flows.py:337:25
tests/integration/integrations/test_oauth_flows.py:338:28
tests/integration/integrations/test_oauth_flows.py:339:35
tests/integration/integrations/test_oauth_flows.py:342:31
tests/integration/integrations/test_oauth_flows.py:349:34
tests/integration/integrations/test_oauth_flows.py:356:37
tests/integration/integrations/test_oauth_flows.py:357:25
tests/integration/integrations/test_oauth_flows.py:360:34
tests/integration/integrations/test_oauth_flows.py:361:25
tests/integration/integrations/test_oauth_flows.py:398:22
tests/integration/integrations/test_oauth_flows.py:450:22
tests/integration/integrations/test_oauth_flows.py:620:22
tests/integration/integrations/test_oauth_flows.py:753:22
tests/integration/integrations/test_rate_limiter_shared_state.py:114:18
tests/integration/mcp/test_tool_surface.py:59:11
tests/integration/mcp/test_tool_surface.py:60:22
@@ -2287,13 +2240,13 @@ tests/unit/meta/test_code_applier.py:119:16
tests/unit/meta/test_code_applier.py:210:27
tests/unit/meta/test_code_applier.py:250:29
tests/unit/meta/test_code_applier.py:373:25
tests/unit/meta/test_code_modification_strategy.py:102:15
tests/unit/meta/test_code_modification_strategy.py:103:20
tests/unit/meta/test_code_modification_strategy.py:105:24
tests/unit/meta/test_code_modification_strategy.py:249:19
tests/unit/meta/test_code_modification_strategy.py:250:28
tests/unit/meta/test_code_modification_strategy.py:392:19
tests/unit/meta/test_code_modification_strategy.py:393:28
tests/unit/meta/test_code_modification_strategy.py:103:15
tests/unit/meta/test_code_modification_strategy.py:104:20
tests/unit/meta/test_code_modification_strategy.py:106:24
tests/unit/meta/test_code_modification_strategy.py:250:19
tests/unit/meta/test_code_modification_strategy.py:251:28
tests/unit/meta/test_code_modification_strategy.py:393:19
tests/unit/meta/test_code_modification_strategy.py:394:28
tests/unit/meta/test_config_loader.py:20:14
tests/unit/meta/test_config_loader.py:23:18
tests/unit/meta/test_config_loader.py:73:14
@@ -2553,13 +2506,6 @@ tests/unit/providers/test_discovery.py:40:13
tests/unit/providers/test_discovery.py:45:24
tests/unit/providers/test_discovery.py:46:23
tests/unit/providers/test_family.py:11:13
tests/unit/providers/test_health_prober.py:96:22
tests/unit/providers/test_health_prober.py:98:30
tests/unit/providers/test_health_prober.py:100:28
tests/unit/providers/test_health_prober.py:102:30
tests/unit/providers/test_health_prober.py:103:33
tests/unit/providers/test_health_prober.py:104:32
tests/unit/providers/test_health_prober.py:255:24
tests/unit/providers/test_protocol.py:456:32
tests/unit/scripts/test_generate_comparison.py:191:13
tests/unit/scripts/test_generate_comparison.py:193:14
34 changes: 33 additions & 1 deletion src/synthorg/api/app.py
@@ -25,6 +25,7 @@
    _build_default_trust_service,
    _build_performance_tracker,
    _build_telemetry_collector,
    build_chief_of_staff_chat,
)
from synthorg.api.app_helpers import (
    _make_expire_callback,
@@ -141,7 +142,9 @@
# Update both sites together if the default ever changes; otherwise a
# bootstrap value will silently disagree with operator-editable
# overrides resolved through ``ConfigResolver``.
_DEFAULT_TIMEOUT_CHECK_INTERVAL_SECONDS = 60.0
_DEFAULT_TIMEOUT_CHECK_INTERVAL_SECONDS = (
    60.0  # lint-allow: magic-numbers -- bootstrap default mirrored by ConfigResolver
)


def _build_default_approval_timeout_scheduler(
@@ -958,6 +961,35 @@ def create_app( # noqa: C901, PLR0912, PLR0913, PLR0915
)
    app_state.set_report_service(report_service)

    async def _wire_chief_of_staff_chat() -> None:
        # Wired only when the meta config opts in via
        # ``chief_of_staff.chat_enabled`` AND a provider is registered.
        # When unwired, ``POST /meta/chat`` surfaces 503 rather than the
        # silent placeholder it returned previously.
        # Idempotent: a re-entry of lifespan startup against the same
        # ``AppState`` (e.g. ASGI restart in tests) would otherwise make
        # the one-shot ``set_chief_of_staff_chat`` raise.
        if app_state.has_chief_of_staff_chat:
            return
        if provider_registry is None:
            return
        from synthorg.meta.config import (  # noqa: PLC0415
            load_self_improvement_config,
        )

        meta_self_improvement = await load_self_improvement_config(
            app_state.settings_service if app_state.has_settings_service else None,
        )
        chat_backend = build_chief_of_staff_chat(
            meta_self_improvement.chief_of_staff,
            provider_registry=provider_registry,
            cost_tracker=cost_tracker,
        )
        if chat_backend is not None:
            app_state.set_chief_of_staff_chat(chat_backend)

    startup = [*startup, _wire_chief_of_staff_chat]

    # Bring up the notification dispatcher's HTTP-bearing sinks
    # (slack/ntfy ``httpx.AsyncClient``) lazily under their lifecycle
    # locks. Stateless sinks (console/email) implement no-op
59 changes: 59 additions & 0 deletions src/synthorg/api/app_builders.py
@@ -18,12 +18,25 @@
)
from synthorg.telemetry import TelemetryCollector, TelemetryConfig

# All four of ``CostTracker`` / ``ChiefOfStaffChat`` / ``ChiefOfStaffConfig``
# / ``ProviderRegistry`` are imported lazily under TYPE_CHECKING. Hoisting
# them to runtime imports created a circular import via the budget /
# observability chain (``cannot import name 'CostRecord' from partially
# initialized module 'synthorg.budget.cost_record'``). Under PEP 649 the
# annotations are stored as code objects and only evaluated when
# ``typing.get_type_hints()`` runs against this module -- which Litestar's
# route discovery does for handler signatures, not for the helpers below
# (private prefix or non-handler). ``ChiefOfStaffChat`` is also imported
# in-function below for the constructor call site, so the runtime
# constructor reference is independent of the annotation surface.
if TYPE_CHECKING:
    from synthorg.budget.tracker import CostTracker
    from synthorg.config.schema import RootConfig
    from synthorg.hr.performance.config import PerformanceConfig
    from synthorg.hr.performance.quality_protocol import QualityScoringStrategy
    from synthorg.hr.performance.tracker import PerformanceTracker
    from synthorg.meta.chief_of_staff.chat import ChiefOfStaffChat
    from synthorg.meta.chief_of_staff.config import ChiefOfStaffConfig
    from synthorg.providers.registry import ProviderRegistry
    from synthorg.security.trust.service import TrustService

@@ -116,6 +129,52 @@ def _resolve_llm_judge_strategy(
)


def build_chief_of_staff_chat(
    chief_of_staff_config: ChiefOfStaffConfig,
    *,
    provider_registry: ProviderRegistry,
    cost_tracker: CostTracker | None,
) -> ChiefOfStaffChat | None:
    """Resolve a ChiefOfStaffChat from the meta config + provider registry.

    Returns ``None`` -- and the ``POST /meta/chat`` endpoint then surfaces
    503 -- when:

    - ``chief_of_staff_config.chat_enabled`` is False (the documented
      opt-in default), or
    - no LLM provider is registered (degenerate test/anonymous boots).

    The provider is picked by the same convention as the LLM quality
    judge: the first registered provider, since the chat model name in
    config is provider-agnostic.
    """
    from synthorg.meta.chief_of_staff.chat import ChiefOfStaffChat  # noqa: PLC0415

    if not chief_of_staff_config.chat_enabled:
        return None

    available = provider_registry.list_providers()
    if not available:
        logger.warning(
            API_APP_STARTUP,
            note="Chief of Staff chat enabled but no providers registered",
        )
        return None

    provider = provider_registry.get(available[0])
    logger.info(
        API_APP_STARTUP,
        note="Chief of Staff chat configured",
        provider=available[0],
        chat_model=str(chief_of_staff_config.chat_model),
    )
    return ChiefOfStaffChat(
        provider=provider,
        config=chief_of_staff_config,
        cost_tracker=cost_tracker,
    )


def _build_default_trust_service() -> TrustService:
    """Build a default no-op TrustService for agent health queries."""
    from synthorg.security.trust.config import TrustConfig  # noqa: PLC0415