
feat(preflight): eliminate silent graph-expansion fallbacks (#243) #294

Merged
jinhongkuan merged 2 commits into dev from 243-preflight-eliminate-fallbacks on May 10, 2026

@silongtan

Collaborator

Summary

Closes #243 (P0). Two-piece fix on top of #173/#174's graph-expansion work.

PR #174 closed the recall ceiling but introduced two silent fallback paths in `_region_anchored_preflight`: when `ctx.code_graph` was absent or when the expander raised, the response was byte-identical to "expansion ran and matched zero", so the caller couldn't tell that recall was degraded. `RealCodeLocatorAdapter` already raised a loud `RuntimeError` at the adapter layer, but the preflight handler caught it and logged it only at DEBUG.

This PR makes both ends loud:

  • Piece A (commit 3c9730f): handler-level loud signal — sources_chained tag + WARN log + telemetry counter.
  • Piece B (commit d136637): server boot refuses to start with a broken index — fail-loud at startup so silent fallback can't accumulate hours of degraded recall in production.

The Phase 2 spec was posted on #243 for signoff before any code landed (per Kevin's #87 ruling). All four open questions were resolved with the recommended defaults.

Piece A — Loud handler-level fallback signal

`handlers/preflight.py` now distinguishes three fallback reasons (previously conflated into a single `if expander is not None:` skip):

| Code path | New `fallback_reason` | Signals fired |
|---|---|---|
| `ctx.code_graph is None` | `"absent"` | response tag + telemetry |
| `code_graph` set but has no `expand_file_paths_via_graph` | `"missing_method"` | response tag + telemetry |
| Expander raised | `"exception:<type>"` | response tag + telemetry + WARN log |

Three additive signals fire whenever any of the above occurs:

  1. Response field: `sources_chained` includes `"graph_unavailable"`. Additive (never replaces existing `"region"` / `"graph"` tags). Bare tag per signoff Q2: the granular reason flows through telemetry, not the response shape, which keeps the response stable.
  2. Log level: the exception case is bumped from `logger.debug` → `logger.warning`, with a stable `[preflight:fallback]` substring plus the exception type for grep-friendly production logs.
  3. Telemetry counter: new `preflight_telemetry.write_fallback_event(reason, session_id)`, modeled on `write_ingest_refusal_event` (#216). Emits a `graph_expansion_fallback` row to `~/.bicameral/preflight_events.jsonl`. Gated on `BICAMERAL_TELEMETRY=preflight`.
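A sketch of how the three signals could be fired together. It assumes a JSONL row with `event`, `reason`, `session_id`, and `ts` fields (the field names other than the event value are guesses) and treats telemetry as enabled only when `BICAMERAL_TELEMETRY` is exactly `preflight`. The `path` parameter and the `signal_fallback` wrapper are hypothetical additions for testability; the PR's function takes only `(reason, session_id)`.

```python
import json
import logging
import os
import time
from pathlib import Path
from typing import Optional

logger = logging.getLogger("handlers.preflight")


def _default_events_path() -> Path:
    # Location named in the PR; computed lazily so import never touches $HOME.
    return Path.home() / ".bicameral" / "preflight_events.jsonl"


def _is_valid_reason(reason: str) -> bool:
    # Controlled enum: two fixed values plus the "exception:<type>" family.
    return reason in ("absent", "missing_method") or reason.startswith("exception:")


def write_fallback_event(
    reason: str, session_id: str, path: Optional[Path] = None
) -> bool:
    """Append one fallback row; return False when telemetry is not enabled."""
    if not _is_valid_reason(reason):
        raise ValueError(f"unknown fallback reason: {reason!r}")
    if os.environ.get("BICAMERAL_TELEMETRY") != "preflight":
        return False
    path = path or _default_events_path()
    path.parent.mkdir(parents=True, exist_ok=True)
    row = {
        "event": "graph_expansion_fallback",
        "reason": reason,
        "session_id": session_id,
        "ts": time.time(),
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(row) + "\n")
    return True


def signal_fallback(
    sources_chained: list[str],
    reason: str,
    session_id: str,
    path: Optional[Path] = None,
) -> list[str]:
    """Fire all three signals and return the updated sources_chained list."""
    # 1. Additive, bare response tag; existing "region"/"graph" tags are kept.
    tags = list(sources_chained)
    if "graph_unavailable" not in tags:
        tags.append("graph_unavailable")
    # 2. WARN only for the exception case, with a stable grep-able prefix.
    if reason.startswith("exception:"):
        logger.warning(
            "[preflight:fallback] graph expansion raised %s", reason.split(":", 1)[1]
        )
    # 3. Telemetry row carries the granular reason (no-op unless opted in).
    write_fallback_event(reason, session_id, path)
    return tags
```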

Skill update (skills/bicameral-preflight/SKILL.md) renders a one-line recall-degraded note to the agent when the tag is present:

> Note: structural-neighbor lookup was unavailable this call — recall may be reduced until the symbol index is rebuilt. Decisions bound to files that import these may not have surfaced.

Piece B — Eager startup init + fail-loud

adapters/code_locator.py:

  • Singleton-by-`REPO_PATH` cache via `_INSTANCE_CACHE`. The key is normalized with `Path.resolve()` so symlinked and relative-path callers hit the same cache entry. Multi-repo correctness is preserved (per signoff Q1).
  • New reset_code_locator_cache() test-only hook, mirroring adapters.ledger.reset_ledger_singleton.
  • New async def initialize() wraps sync _ensure_initialized() in loop.run_in_executor(None, ...) so cold-init doesn't block the event loop. Idempotent on already-initialized adapters.
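A minimal sketch of the adapter lifecycle described above, assuming `REPO_PATH` is read from the environment. The empty-index check (`_count_indexed_symbols`) is a hypothetical placeholder for whatever the real adapter inspects; only the cache-key normalization, the reset hook, and the `run_in_executor` wrapper follow the PR's description.

```python
import asyncio
import os
from pathlib import Path


class RealCodeLocatorAdapter:
    """Hypothetical stand-in: only the lifecycle shape follows the PR."""

    def __init__(self, repo_path: str) -> None:
        self.repo_path = repo_path
        self._initialized = False

    def _count_indexed_symbols(self) -> int:
        # Placeholder: a real adapter would query its symbol index here.
        return 0

    def _ensure_initialized(self) -> None:
        # Blocking, synchronous init that fails loudly on an empty index.
        if self._initialized:
            return
        if self._count_indexed_symbols() == 0:
            raise RuntimeError(
                f"symbol index for {self.repo_path} is empty; "
                f"run: python -m code_locator index {self.repo_path}"
            )
        self._initialized = True

    async def initialize(self) -> None:
        # Idempotent: no executor hop once the adapter is initialized.
        if self._initialized:
            return
        loop = asyncio.get_running_loop()
        # Blocking work runs on the default thread pool; an exception raised
        # there is re-raised here when the future is awaited.
        await loop.run_in_executor(None, self._ensure_initialized)


# One adapter per resolved repo path.
_INSTANCE_CACHE: dict[str, RealCodeLocatorAdapter] = {}


def get_code_locator() -> RealCodeLocatorAdapter:
    # Assumption: REPO_PATH comes from the environment, defaulting to cwd.
    key = str(Path(os.environ.get("REPO_PATH", ".")).resolve())
    if key not in _INSTANCE_CACHE:
        _INSTANCE_CACHE[key] = RealCodeLocatorAdapter(key)
    return _INSTANCE_CACHE[key]


def reset_code_locator_cache() -> None:
    """Test-only hook: forget every cached adapter."""
    _INSTANCE_CACHE.clear()
```

Because the cache key is the resolved path, a symlinked or relative `REPO_PATH` that points at the same directory returns the same adapter, while a different repo gets its own instance.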

server.py:serve_stdio():

  • Calls await get_code_locator().initialize() between dashboard sidecar start and consent-notice block.
  • Fail-loud per signoff Q3 — explicit except RuntimeError as exc: re-raises after printing an actionable stderr message ("Run: python -m code_locator index <repo>"). Outer try/finally still runs SERVER_SHUTDOWN audit emit.

Files

| File | Δ | Role |
|---|---|---|
| `handlers/preflight.py` | +70 / −22 | Three-reason classifier, loud signals, `sources_chained` tag |
| `preflight_telemetry.py` | +39 | `write_fallback_event(reason, session_id)` |
| `skills/bicameral-preflight/SKILL.md` | +15 | `graph_unavailable` agent-facing render |
| `adapters/code_locator.py` | +50 | Singleton cache, reset hook, async `initialize()` |
| `server.py` | +28 | Eager startup hook, fail-loud on `RuntimeError` |
| `tests/test_preflight_graph_expansion.py` | +330 | 8 new tests (4 Piece A + 4 Piece B) |
| `CHANGELOG.md` | +2 | Unreleased entries (Added + Changed) |

Tests

| # | Test | Piece |
|---|---|---|
| 1 | `test_preflight_fallback_absent_code_graph_tags_graph_unavailable` | A |
| 2 | `test_preflight_fallback_expander_raises_warns_and_tags` (asserts WARN log via `caplog`) | A |
| 3 | `test_preflight_successful_expansion_does_not_tag_graph_unavailable` (regression guard) | A |
| 4 | `test_preflight_empty_file_paths_does_not_tag_graph_unavailable` (distinguishes never-attempted from attempted-and-fell-back) | A |
| 5a | `test_get_code_locator_returns_same_instance_per_repo_path` (singleton + reset across two `REPO_PATH`s) | B |
| 5b | `test_initialize_succeeds_when_index_present` (idempotent on already-initialized) | B |
| 6 | `test_initialize_fails_loudly_when_index_empty` (`RuntimeError` propagates through async wrapper) | B |
| 7 | `test_serve_stdio_refuses_boot_on_empty_index` (boot-path level: empty index aborts boot) | B |

Existing tests use containment assertions ("region" in sources_chained) not exact list equality, so the additive "graph_unavailable" tag won't break them.
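To make the compatibility argument concrete, a small illustration with hypothetical `sources_chained` values for a request where expansion fell back:

```python
# Hypothetical sources_chained values before and after this PR, for a
# request where graph expansion fell back.
before = ["region"]
after = ["region", "graph_unavailable"]

# Containment assertions (the existing test style) pass for both shapes.
for sources_chained in (before, after):
    assert "region" in sources_chained

# An exact-equality assertion would have broken on the additive tag.
assert after != ["region"]
```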

Local verification

  • ✅ ruff check + format + mypy all green on touched files
  • ✅ Singleton + reset_code_locator_cache smoke test (4 assertions: cache hit, distinct on new path, fresh after reset, second call cached again)
  • ✅ Async initialize() smoke test (re-raises stubbed RuntimeError; idempotent no-op on _initialized=True adapter)
  • bicameral.link_commit clean on both commits — 0 drift, 0 pending checks
  • ⏳ Full ledger-touching test run pending CI (4 of 8 new tests need surrealdb via the integration_env fixture)

Refs

Closes #243 (P0). Parent: #173 / PR #174. Plan signoff via issue-243 comment.

🤖 Generated with Claude Code

silongtan and others added 2 commits May 9, 2026 22:29
PR #174 closed the recall ceiling but introduced two silent fallback
paths in `_region_anchored_preflight`: when `ctx.code_graph` was
absent OR when the expander raised, the response shape was byte-
identical to "expansion ran and matched zero" — caller couldn't tell
recall was degraded.

Three additive signals now surface every fallback (per the Phase 2 spec
posted on #243; all four open questions took the recommended defaults):

  1. Response field — `sources_chained` includes `"graph_unavailable"`.
     Additive (never replaces existing `"region"` / `"graph"` tags).
     Bare tag — granular reason flows through telemetry, not the
     response shape, per signoff Q2.

  2. Log level — exception case bumped from `logger.debug` →
     `logger.warning` with stable `[preflight:fallback]` substring +
     exception type for grep-friendly production logs.

  3. Telemetry counter — new `preflight_telemetry.write_fallback_event(
     reason, session_id)` modeled on `write_ingest_refusal_event`
     (#216). Emits a `graph_expansion_fallback` row to the existing
     `~/.bicameral/preflight_events.jsonl` substrate. Reasons are a
     controlled enum: `"absent"`, `"missing_method"`,
     `"exception:<type>"`. Gated on `BICAMERAL_TELEMETRY=preflight`.

The fallback case classifier in `_region_anchored_preflight`
distinguishes three reasons (was conflated into a single `if expander
is not None:` skip in the pre-#243 code):

  - `code_graph is None`                                  → "absent"
  - `code_graph` set but no `expand_file_paths_via_graph` → "missing_method"
  - expander raised                                       → "exception:<type>"

Skill update (`skills/bicameral-preflight/SKILL.md`) renders a one-
line recall-degraded note to the agent when the tag is present:

  > Note: structural-neighbor lookup was unavailable this call —
  > recall may be reduced until the symbol index is rebuilt. Decisions
  > bound to files that import these may not have surfaced.

The skill treats `"graph_unavailable"` as advisory: it doesn't block the
preflight surface, and direct-pin matches are unaffected.

Tests
-----

4 new cases in `tests/test_preflight_graph_expansion.py`:

  - test_preflight_fallback_absent_code_graph_tags_graph_unavailable
    — ctx with code_graph=None → response carries the tag,
    telemetry counter reason="absent"
  - test_preflight_fallback_expander_raises_warns_and_tags
    — stub expander raises RuntimeError → response carries the tag,
    `caplog` captures WARN-level log with `[preflight:fallback]`
    substring, telemetry counter reason="exception:RuntimeError"
  - test_preflight_successful_expansion_does_not_tag_graph_unavailable
    — regression guard: clean expansion path must NOT carry the tag
    (no false alarms)
  - test_preflight_empty_file_paths_does_not_tag_graph_unavailable
    — empty file_paths short-circuits before expansion check; the
    "expansion was never attempted" case is distinguishable from
    "attempted-and-fell-back"
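The WARN-log check can be approximated with the standard library's `unittest.TestCase.assertLogs`, which plays the same role here as pytest's `caplog`. The handler below is a hypothetical, heavily reduced stand-in (`preflight_sources` returns only the `sources_chained` list and skips telemetry); the three test methods mirror the second, third, and fourth cases listed above.

```python
import logging
import unittest

logger = logging.getLogger("handlers.preflight")


def preflight_sources(file_paths: list[str], code_graph: object) -> list[str]:
    """Hypothetical reduced handler: returns only sources_chained."""
    sources = ["region"]
    if not file_paths:
        return sources  # expansion never attempted: no tag
    expander = getattr(code_graph, "expand_file_paths_via_graph", None)
    if expander is None:
        sources.append("graph_unavailable")  # "absent" / "missing_method"
        return sources
    try:
        expander(file_paths)
        sources.append("graph")
    except Exception as exc:
        logger.warning("[preflight:fallback] expander raised %s", type(exc).__name__)
        sources.append("graph_unavailable")
    return sources


class _RaisingGraph:
    def expand_file_paths_via_graph(self, paths: list[str]) -> list[str]:
        raise RuntimeError("index empty")


class _OkGraph:
    def expand_file_paths_via_graph(self, paths: list[str]) -> list[str]:
        return list(paths)


class FallbackTests(unittest.TestCase):
    def test_expander_raises_warns_and_tags(self) -> None:
        with self.assertLogs("handlers.preflight", level="WARNING") as cm:
            sources = preflight_sources(["a.py"], _RaisingGraph())
        self.assertIn("graph_unavailable", sources)
        self.assertIn("[preflight:fallback]", cm.output[0])

    def test_successful_expansion_does_not_tag(self) -> None:
        self.assertNotIn("graph_unavailable", preflight_sources(["a.py"], _OkGraph()))

    def test_empty_file_paths_does_not_tag(self) -> None:
        self.assertNotIn("graph_unavailable", preflight_sources([], None))
```

Run it with `python -m unittest <module>`.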

Existing tests use containment assertions (`"region" in sources_chained`)
rather than exact list equality, so the additive `"graph_unavailable"`
tag doesn't break them.

What's NOT in this PR
---------------------

Piece B (eager symbol-index initialization at server startup) is the
follow-up commit on this branch. Lands separately so the response-
shape change can ship without the adapter-lifecycle change. After
both pieces land, the telemetry counter shipped here gives ongoing
visibility into how often fallback engages in production.

Refs #243 (parent #173 / PR #174). Plan signoff via
#243 (comment).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Piece B)

Pre-fix, the code-locator adapter had two cooperating problems that
made silent fallback the default:

  1. `get_code_locator()` returned a FRESH `RealCodeLocatorAdapter`
     per call. Caching was absent.
  2. `_ensure_initialized()` was lazy — first tool call paid the
     index-build cost AND could race the index check on concurrent
     dispatch (e.g. preflight + bind landing in parallel after
     server boot).

Together: every silent fallback in the production runtime was
"hot" because the adapter was being rebuilt + rechecked on every
call. Piece A (#283 commit 3c9730f) made the fallback loud at the
response layer; Piece B closes the upstream cause.

Three changes
-------------

  adapters/code_locator.py
    - Singleton-by-REPO_PATH cache via `_INSTANCE_CACHE: dict[str,
      RealCodeLocatorAdapter]`. Path resolved through `Path.resolve()`
      so symlink + relative-path callers cache-hit consistently.
      Multi-repo correctness preserved (any test that swaps REPO_PATH
      mid-process gets a fresh adapter for the new path).
    - New `reset_code_locator_cache()` test-only hook, mirroring
      `adapters.ledger.reset_ledger_singleton`.
    - New `async def RealCodeLocatorAdapter.initialize()` — wraps
      sync `_ensure_initialized()` in `loop.run_in_executor(None, ...)`
      so the cold-init path doesn't block the event loop. Idempotent
      on already-initialized adapters.

  server.py
    - `serve_stdio()` calls `await get_code_locator().initialize()`
      between the dashboard sidecar start and the consent-notice block.
    - **Fail-loud per #243 phase-2 signoff Q3** — explicit `except
      RuntimeError as exc:` re-raises after printing an actionable
      stderr message (`"Run: python -m code_locator index <repo>"`).
      The outer try/finally still runs the `SERVER_SHUTDOWN` audit
      emit, so operators get a clean event AND a clear actionable
      error. No more silent degradation.

  tests/test_preflight_graph_expansion.py — 4 new tests
    - test_get_code_locator_returns_same_instance_per_repo_path
      (singleton + reset behavior across two REPO_PATHs)
    - test_initialize_succeeds_when_index_present
      (idempotent on already-initialized adapter)
    - test_initialize_fails_loudly_when_index_empty
      (RuntimeError from `_ensure_initialized` propagates through the
      async wrapper — doesn't get swallowed)
    - test_serve_stdio_refuses_boot_on_empty_index
      (boot-path level: with everything else stubbed healthy, an
      empty index aborts `serve_stdio()` with the expected
      RuntimeError)

Local smoke tests
-----------------

  - Singleton + reset_code_locator_cache: 4 assertions pass
    (cache hit on same path, distinct instance on new path, fresh
    after reset, second call after reset stays cached)
  - Async `initialize()`: re-raises RuntimeError on stubbed
    `_ensure_initialized` failure; idempotent no-op on
    already-initialized adapter

  - ruff check + ruff format --check + mypy all green on touched files

What's NOT in this PR
---------------------

Nothing — Piece A (commit 3c9730f) and Piece B (this commit) together
close #243's full scope. PR will open with both pieces. Telemetry
counter shipped in Piece A gives ongoing production visibility into
how often fallback engages post-merge.

Refs #243 (parent #173 / PR #174). Plan signoff via
#243 (comment).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 10, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8127c671-a0fc-450c-b9e5-4c90cd824d0e


@jinhongkuan merged commit 119cd89 into dev on May 10, 2026
8 of 9 checks passed
@silongtan deleted the 243-preflight-eliminate-fallbacks branch on May 16, 2026 02:35