feat(preflight): eliminate silent graph-expansion fallbacks (#243)#294
Merged
Conversation
PR #174 closed the recall ceiling but introduced two silent fallback paths in `_region_anchored_preflight`: when `ctx.code_graph` was absent OR when the expander raised, the response shape was byte- identical to "expansion ran and matched zero" — caller couldn't tell recall was degraded. Three additive signals now surface every fallback (per Phase 2 spec posted on #243, all four open questions defaulted to recommended): 1. Response field — `sources_chained` includes `"graph_unavailable"`. Additive (never replaces existing `"region"` / `"graph"` tags). Bare tag — granular reason flows through telemetry, not the response shape, per signoff Q2. 2. Log level — exception case bumped from `logger.debug` → `logger.warning` with stable `[preflight:fallback]` substring + exception type for grep-friendly production logs. 3. Telemetry counter — new `preflight_telemetry.write_fallback_event( reason, session_id)` modeled on `write_ingest_refusal_event` (#216). Emits a `graph_expansion_fallback` row to the existing `~/.bicameral/preflight_events.jsonl` substrate. Reasons are a controlled enum: `"absent"`, `"missing_method"`, `"exception:<type>"`. Gated on `BICAMERAL_TELEMETRY=preflight`. The fallback case classifier in `_region_anchored_preflight` distinguishes three reasons (was conflated into a single `if expander is not None:` skip in the pre-#243 code): - `code_graph is None` → "absent" - `code_graph` set but no `expand_file_paths_via_graph` → "missing_method" - expander raised → "exception:<typ>" Skill update (`skills/bicameral-preflight/SKILL.md`) renders a one- line recall-degraded note to the agent when the tag is present: > Note: structural-neighbor lookup was unavailable this call — > recall may be reduced until the symbol index is rebuilt. Decisions > bound to files that import these may not have surfaced. Treats `"graph_unavailable"` as advisory: doesn't block the preflight surface; direct-pin matches are unaffected. Tests ----- 4 new cases in `tests/test_preflight_graph_expansion.py`: - test_preflight_fallback_absent_code_graph_tags_graph_unavailable — ctx with code_graph=None → response carries the tag, telemetry counter reason="absent" - test_preflight_fallback_expander_raises_warns_and_tags — stub expander raises RuntimeError → response carries the tag, `caplog` captures WARN-level log with `[preflight:fallback]` substring, telemetry counter reason="exception:RuntimeError" - test_preflight_successful_expansion_does_not_tag_graph_unavailable — regression guard: clean expansion path must NOT carry the tag (no false alarms) - test_preflight_empty_file_paths_does_not_tag_graph_unavailable — empty file_paths short-circuits before expansion check; the "expansion was never attempted" case is distinguishable from "attempted-and-fell-back" Existing tests use containment assertions (`"region" in sources_chained`) not exact list equality, so additive `"graph_ unavailable"` doesn't break them. What's NOT in this PR --------------------- Piece B (eager symbol-index initialization at server startup) is the follow-up commit on this branch. Lands separately so the response- shape change can ship without the adapter-lifecycle change. After both pieces land, the telemetry counter shipped here gives ongoing visibility into how often fallback engages in production. Refs #243 (parent #173 / PR #174). Plan signoff via #243 (comment). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Piece B)
Pre-fix, the code-locator adapter had two cooperating problems that
made silent fallback the default:
1. `get_code_locator()` returned a FRESH `RealCodeLocatorAdapter`
per call. Caching was absent.
2. `_ensure_initialized()` was lazy — first tool call paid the
index-build cost AND could race the index check on concurrent
dispatch (e.g. preflight + bind landing in parallel after
server boot).
Together: every silent fallback in the production runtime was
"hot" because the adapter was being rebuilt + rechecked on every
call. Piece A (#283 commit 3c9730f) made the fallback loud at the
response layer; Piece B closes the upstream cause.
Three changes
-------------
adapters/code_locator.py
- Singleton-by-REPO_PATH cache via `_INSTANCE_CACHE: dict[str,
RealCodeLocatorAdapter]`. Path resolved through `Path.resolve()`
so symlink + relative-path callers cache-hit consistently.
Multi-repo correctness preserved (any test that swaps REPO_PATH
mid-process gets a fresh adapter for the new path).
- New `reset_code_locator_cache()` test-only hook, mirroring
`adapters.ledger.reset_ledger_singleton`.
- New `async def RealCodeLocatorAdapter.initialize()` — wraps
sync `_ensure_initialized()` in `loop.run_in_executor(None, ...)`
so the cold-init path doesn't block the event loop. Idempotent
on already-initialized adapters.
server.py
- `serve_stdio()` calls `await get_code_locator().initialize()`
between the dashboard sidecar start and the consent-notice block.
- **Fail-loud per #243 phase-2 signoff Q3** — explicit `except
RuntimeError as exc:` re-raises after printing an actionable
stderr message (`"Run: python -m code_locator index <repo>"`).
The outer try/finally still runs the `SERVER_SHUTDOWN` audit
emit, so operators get a clean event AND a clear actionable
error. No more silent degradation.
tests/test_preflight_graph_expansion.py — 4 new tests
- test_get_code_locator_returns_same_instance_per_repo_path
(singleton + reset behavior across two REPO_PATHs)
- test_initialize_succeeds_when_index_present
(idempotent on already-initialized adapter)
- test_initialize_fails_loudly_when_index_empty
(RuntimeError from `_ensure_initialized` propagates through the
async wrapper — doesn't get swallowed)
- test_serve_stdio_refuses_boot_on_empty_index
(boot-path level: with everything else stubbed healthy, an
empty index aborts `serve_stdio()` with the expected
RuntimeError)
Local smoke tests
-----------------
- Singleton + reset_code_locator_cache: 4 assertions pass
(cache hit on same path, distinct instance on new path, fresh
after reset, second call after reset stays cached)
- Async `initialize()`: re-raises RuntimeError on stubbed
`_ensure_initialized` failure; idempotent no-op on
already-initialized adapter
- ruff check + ruff format --check + mypy all green on touched files
What's NOT in this PR
---------------------
Nothing — Piece A (commit 3c9730f) and Piece B (this commit) together
close #243's full scope. PR will open with both pieces. Telemetry
counter shipped in Piece A gives ongoing production visibility into
how often fallback engages post-merge.
Refs #243 (parent #173 / PR #174). Plan signoff via
#243 (comment).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #243 (P0). Two-piece fix on top of #173/#174's graph-expansion work.
PR #174 closed the recall ceiling but introduced two silent fallback paths in
_region_anchored_preflight— whenctx.code_graphwas absent OR when the expander raised, the response shape was byte-identical to "expansion ran and matched zero" — caller couldn't tell recall was degraded. TheRealCodeLocatorAdapteralready raised a loudRuntimeErrorat the adapter layer, but the preflight handler swallowed it atDEBUG.This PR makes both ends loud:
3c9730f): handler-level loud signal —sources_chainedtag + WARN log + telemetry counter.d136637): server boot refuses to start with a broken index — fail-loud at startup so silent fallback can't accumulate hours of degraded recall in production.Phase 2 spec was posted on #243 for signoff before any code landed (per Kevin's #87 ruling). All four open questions defaulted to recommended.
Piece A — Loud handler-level fallback signal
handlers/preflight.pydistinguishes three fallback reasons (was conflated into a singleif expander is not None:skip):fallback_reasonctx.code_graph is None"absent"code_graphset but noexpand_file_paths_via_graph"missing_method""exception:<type>"Three additive signals when any of the above fires:
sources_chainedincludes"graph_unavailable". Additive (never replaces existing"region"/"graph"tags). Bare tag per signoff Q2 — granular reason flows through telemetry, not the response shape, keeping the response stable.logger.debug→logger.warningwith stable[preflight:fallback]substring + exception type for grep-friendly production logs.preflight_telemetry.write_fallback_event(reason, session_id)modeled onwrite_ingest_refusal_event([compliance:epic] Ingest boundary guardrails — server-side gates on the durable write surface #216). Emits agraph_expansion_fallbackrow to~/.bicameral/preflight_events.jsonl. Gated onBICAMERAL_TELEMETRY=preflight.Skill update (
skills/bicameral-preflight/SKILL.md) renders a one-line recall-degraded note to the agent when the tag is present:Piece B — Eager startup init + fail-loud
adapters/code_locator.py:REPO_PATHcache via_INSTANCE_CACHE.Path.resolve()on the key so symlink + relative-path callers cache-hit consistently. Multi-repo correctness preserved (per signoff Q1).reset_code_locator_cache()test-only hook, mirroringadapters.ledger.reset_ledger_singleton.async def initialize()wraps sync_ensure_initialized()inloop.run_in_executor(None, ...)so cold-init doesn't block the event loop. Idempotent on already-initialized adapters.server.py:serve_stdio():await get_code_locator().initialize()between dashboard sidecar start and consent-notice block.except RuntimeError as exc:re-raises after printing an actionable stderr message ("Run: python -m code_locator index <repo>"). Outer try/finally still runsSERVER_SHUTDOWNaudit emit.Files
handlers/preflight.pypreflight_telemetry.pywrite_fallback_event(reason, session_id)skills/bicameral-preflight/SKILL.mdgraph_unavailableagent-facing renderadapters/code_locator.pyasync initialize()server.pytests/test_preflight_graph_expansion.pyCHANGELOG.mdTests
test_preflight_fallback_absent_code_graph_tags_graph_unavailabletest_preflight_fallback_expander_raises_warns_and_tags(asserts WARN log viacaplog)test_preflight_successful_expansion_does_not_tag_graph_unavailable(regression guard)test_preflight_empty_file_paths_does_not_tag_graph_unavailable(distinguishes never-attempted from attempted-and-fell-back)test_get_code_locator_returns_same_instance_per_repo_path(singleton + reset across two REPO_PATHs)test_initialize_succeeds_when_index_present(idempotent on already-initialized)test_initialize_fails_loudly_when_index_empty(RuntimeError propagates through async wrapper)test_serve_stdio_refuses_boot_on_empty_index(boot-path level: empty index aborts boot)Existing tests use containment assertions (
"region" in sources_chained) not exact list equality, so the additive"graph_unavailable"tag won't break them.Local verification
reset_code_locator_cachesmoke test (4 assertions: cache hit, distinct on new path, fresh after reset, second call cached again)initialize()smoke test (re-raises stubbed RuntimeError; idempotent no-op on_initialized=Trueadapter)bicameral.link_commitclean on both commits — 0 drift, 0 pending checkssurrealdbvia theintegration_envfixture)Refs
Closes #243 (P0). Parent: #173 / PR #174. Plan signoff via issue-243 comment.
🤖 Generated with Claude Code