feat(preflight): expand region-anchored lookup via 1-hop code-graph traversal#174
Merged
Conversation
…raversal (#173) ## Why The headline product loop — *dev prompt contradicts a prior decision → bicameral surfaces it → refinement captured → PM ratifies* — silently breaks when the caller-LLM picks ``file_paths`` that don't exactly match the file the prior decision is anchored to. Observed on the e2e runs for #168 and #172: agent passes UI-layer files (``reorder.tsx``, ``commit-list.tsx``) for the "reorder feature"; the seeded decision is bound to the git-layer file (``reorder.ts``); region-anchored lookup ``WHERE file_path IN $fps`` returns zero hits; preflight reports ``fired=false, decisions=[]``; the PostToolUse hook (#154 / #168) doesn't fire its gate (``fired && decisions[]``); Step 5.6 reminder never reaches the model; no ``ingest(agent_session)``, no ``resolve_collision``. Flow 2a fails advisory but the deeper consequence is real-world contradictions go uncaptured. The server has full structural knowledge of the codebase — the ``code_locator`` package ships a ``symbols`` + ``edges`` graph with ``invokes`` / ``imports`` / ``inherits`` / ``contains`` edges, exposed via ``SymbolDB.get_ego_graph(symbol_id, hops=1)``. Until now, ``_region_anchored_preflight`` did exact-match-only lookup and ignored the graph. This PR closes the gap. ## What - ``adapters/code_locator.py::RealCodeLocatorAdapter.expand_file_paths_via_graph`` — new public method. Walks each caller-supplied file's symbols, fetches each symbol's 1-hop ego graph, collects neighbor file paths. Returns ``(expanded, added)`` so callers can mark provenance. Bounded per-symbol AND globally by ``CodeLocatorConfig.max_neighbors_per_result`` to prevent hub-file explosion. Falls back gracefully (returns input unchanged) when the symbol index is unavailable. - ``handlers/preflight.py::_region_anchored_preflight`` — calls the expander before the ``binds_to`` lookup. Decisions reached via caller-supplied paths get ``confidence=0.9``; decisions reached only via expansion get ``confidence=0.7`` so the caller can de-prioritize the latter without losing recall. Returns ``(matches, used_graph_expansion)`` so the caller can record ``"region_graph_expanded"`` in ``sources_chained`` when expansion contributed at least one hit beyond the direct-match set. - ``skills/bicameral-preflight/SKILL.md`` Step 2 — documents the new expansion behaviour and the ``confidence`` / ``sources_chained`` semantics so downstream skills can read them. - ``tests/test_preflight_graph_expansion.py`` — 8 tests: - 6 adapter unit tests (synthetic ``SymbolDB``): finds 1-hop neighbor, handles no-edge / empty / unindexed inputs, caps hub-file explosion, falls back when uninitialized. - 2 handler integration tests (real ledger, fake ``code_graph``): expansion-only surfaces the bound decision and tags ``region_graph_expanded`` in ``sources_chained``; direct pin alone does not get tagged as expanded. ## Plan / Audit / Seal Plan: trivial; risk:L1 (additive — no contract change to the MCP tool surface, no schema migration, falls back gracefully when index is absent). Implementation note in commit body above. ## Test plan - [x] ``ruff check . && ruff format --check .`` — clean. - [x] ``pytest tests/test_preflight_graph_expansion.py -v`` — 8/8. - [x] ``pytest tests/test_alpha_contract.py::test_preflight_surfaces_bound_decisions`` — direct-pin path still PASSes (no regression on the existing contract). - [x] ``pytest tests/test_preflight_hook.py tests/test_post_preflight_capture_hook.py tests/test_post_commit_sync_hook.py`` — 26/26 across the related hook suites. - [ ] CI: ``e2e assertions (auto)`` — Flow 2a expected to flip⚠️ FAIL → PASS now that preflight surfaces the bound decision under structural distance, the PostToolUse hook fires, and the agent has decision_ids to feed into ``ingest(agent_session)`` + ``resolve_collision``. Closes #173.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
6 tasks
…ly, M6 eval Per @silongtan's review on #173 (cross-ref to the now-superseded #64 "Preflight: graph expansion for transitive relevance (M6)"), align PR #174 with that issue's design specifics: - **Edge-type filter**: narrow expansion to ``imports`` edges only. ``invokes`` / ``inherits`` / ``contains`` are symbol-level edges that over-broaden the file-level expansion contract. Imports is the file-level structural-dependency edge — matches the granularity of the region-anchored decision lookup. - **sources_chained tag**: rename ``"region_graph_expanded"`` → ``"graph"`` per #64. Sits cleanly alongside the existing ``"region"`` source tag. - **Input-seed cap**: ≤10 input seeds × ``max_neighbors_per_result`` neighbors per seed (#64's worst-case envelope). Prevents per-symbol walks from compounding when the caller passes a long ``file_paths`` list. - **M6 eval flips XFAIL → live**: ``tests/eval/preflight_dataset.jsonl`` M6 row now uses two new setup fields — ``region_decisions_pinned_to`` (path-aware decision lookup) and ``graph_neighbors`` (stub ``code_graph`` topology) — added to ``_apply_setup`` in ``run_preflight_eval.py``. The asserter now tests true graph-expansion semantics (mock returns the decision only when the expanded file_path is in the lookup set). - **Catalog flip**: ``docs/preflight-failure-scenarios.md`` M6 status goes ❌ → ✅ with a pointer to #173/#174. - **New regression test**: ``test_expander_filters_to_imports_only`` asserts that a neighbor reachable only via ``invokes`` / ``inherits`` does NOT appear in the expansion, locking the imports-only posture in. Test plan: - ``ruff check . && ruff format --check .`` — clean. - ``pytest tests/test_preflight_graph_expansion.py -v`` — 9/9 (added ``test_expander_filters_to_imports_only``). - ``pytest tests/eval/run_preflight_eval.py -v`` — 6 PASS + 3 XFAIL (M7 family, separate concern). M6 specifically flipped XFAIL → PASS. Deferred to follow-up (called out in the issue comment on #173): - ``preflight`` p95 baseline + 5ms perf assertion — needs the existing baseline harness to land first. - Splitting the implementation into a dedicated ``code_locator/tools/get_file_neighbors.py`` primitive vs the current method-on-adapter shape. Less load-bearing; revisit if a second consumer (e.g. drift detection) needs the same expansion.
…ming, no file paths
Two changes that make Flow 2 a real test of the contradiction-capture
loop instead of a partially-rigged one:
1. **Drop the "X is out, Y instead" framing.** The prior prompt told
the agent the drag-drop approach was being replaced ("drag-and-drop
is out", "no more drag-drop on this surface"). That short-circuits
the contradiction-detection step the loop is supposed to validate
— Step 5.6 fires when *the agent* notices the prompt conflicts
with a surfaced decision, not when the user pre-announces the
conflict. New prompt asks for a programmatic API for reordering
commits with no replacement framing at all. The agent has to:
a. read the surfaced "Drag-to-reorder commits, drag-to-squash, ..."
decision and notice the prior plan was a drag-drop UX.
b. read its own request (programmatic API, ordered SHA list,
callable from any UI surface) and notice that's structurally
incompatible with drag-drop.
c. classify the situation as a supersede / refinement and
execute Step 5.6.
2. **No file paths in the prompt.** The prior prompt named
``app/src/lib/git/reorder.ts`` directly. With #172's discovery-
first contract, the caller LLM is supposed to use Grep/Glob/Read
to identify files from a feature description on its own. Naming
the file in the prompt cheats the discovery step — it's the agent's
job to map "reordering commits" to ``reorder.ts``. Now the prompt
describes the feature only; discovery is genuinely on the agent.
If Flow 2a flips to PASS on this prompt, the end-to-end loop works
on a clean test of the product claim: dev asks for a feature → agent
discovers files → preflight surfaces the conflicting prior decision →
agent recognizes the conflict → captures the refinement. If it stays
advisory FAIL, that's an honest signal about which link is weakest
(usually the agent's contradiction-judgment pass).
Companion to PR #174 (#173 — graph expansion); kept on the same branch
because both pieces are part of validating the same loop end-to-end.
4 tasks
…175) Even after #154 / #168 / #172 / #173, the contradiction-capture loop fails on prompts where the conflict is *semantic* rather than *lexical*. Latest PR #174 e2e: - Test prompt: "Add a programmatic API for reordering commits — takes an ordered list of commit SHAs and rewrites history to match. Wire it so any UI surface can call it..." - Server-side chain works end-to-end. Agent discovers files via Read/ Grep/Glob, calls preflight with file_paths populated, server surfaces the bound "Drag-to-reorder commits, drag-to-squash, ..." decision, PostToolUse hook fires its gate, Step 5.6 reminder reaches the model. - Agent's bicameral sequence: ['bicameral_preflight', 'bicameral_resolve_compliance']. No ingest(agent_session). No resolve_collision. The agent reads the surfaced decision, reads the user's prompt, reasons "adding an API doesn't necessarily replace a UI — they could coexist", and skips capture. That's a defensible engineering inference; LLMs aren't reliably capturing structural-mismatch refinements. Per #175's design discussion, route the judgment to the user instead of the agent. Changes: - skills/bicameral-preflight/SKILL.md — Step 5.6 rewritten. Step 5.6.1 is the new AskUserQuestion disambiguation (supersede / keep_both / unrelated); Step 5.6.2 is the mechanical capture, gated on the user picking one of the "yes" options. The new step fires whenever fired=True && len(decisions) > 0, regardless of guided mode (capture is the headline product behavior, not opt-in). - scripts/hooks/post_preflight_capture_reminder.py — reminder text retemplated. Instead of "you MUST call ingest+resolve_collision now", the reminder embeds the AskUserQuestion shape with the surfaced decision_id + description filled in, plus the answer→action branch table. Docstring rewritten to capture the new contract + the archaeology of why we abandoned the unconditional approach. - tests/e2e/run_e2e_flows.py::assert_flow_2a — pass criterion changes from "ingest+resolve_collision fired" to "AskUserQuestion was invoked with disambiguation shape after preflight surfaced decisions." The user-side response can't be driven in headless claude -p, so the testable signal is the question invocation itself. Loose shape match on options labels (supersede / keep_both / unrelated / "refinement of" / decision: ID). - tests/test_post_preflight_capture_hook.py — assertions updated to lock in the new D-path posture. Test renamed: test_reminder_is_unconditional_not_judgment_gated → test_reminder_routes_judgment_to_user_not_agent. Negative assertions catch regression to either the prior unconditional shape ("you MUST capture") or the original conditional shape ("IF you contradict ..."). - CHANGELOG.md — Changed section entries for the skill rewrite and the asserter contract change. Trust contract preserved: hook still fires only when fired=True && len(decisions) > 0. Silent on no signal. The disambiguation question runs at a moment the flow is already paused (rendering the surfaced block). Headless-mode caveat: AskUserQuestion in claude -p has no human to answer, so the e2e asserter checks invocation only — Step 5.6.2's mechanical capture only fires after a real human answer in interactive sessions. Acceptable per #175's acceptance criteria. Closes #175.
jinhongkuan
added a commit
that referenced
this pull request
May 4, 2026
feat(preflight): expand region-anchored lookup via 1-hop code-graph traversal
This was referenced May 4, 2026
This was referenced May 7, 2026
This was referenced May 10, 2026
Knapp-Kevin
pushed a commit
to Knapp-Kevin/bicameral-mcp
that referenced
this pull request
May 21, 2026
Bumps pyproject + RECOMMENDED_VERSION to 0.13.7 and resolves the stale git conflict markers that were committed into CHANGELOG.md by the previous `Merge branch 'main' into triage-from-dev` (c7d1274). v0.13.6 was bumped in pyproject on 2026-04-30 but never tagged or published to PyPI (latest published is v0.13.5; latest GitHub release is v0.13.5). v0.13.7 is the first release that ships everything merged into main since v0.13.5, including: - Preflight graph expansion + region anchored preflight (BicameralAI#173, BicameralAI#174) - Contradiction-capture flow via AskUserQuestion (BicameralAI#154, BicameralAI#175) - Preflight skill auto-fire fix on natural refactor prompts (BicameralAI#146) - SessionEnd hook re-entrancy + --auto-ingest (BicameralAI#147) - Post-preflight capture reminder hook (BicameralAI#168) - Flow1 asserter relax + flow2/2a split (BicameralAI#171) - v0 user flow e2e + demo recording carried over from dev (BicameralAI#165) - Lint-and-typecheck CI wired up; ruff format + fixes across 115 files See CHANGELOG.md for full details.
Knapp-Kevin
pushed a commit
to Knapp-Kevin/bicameral-mcp
that referenced
this pull request
May 21, 2026
…Piece A) PR BicameralAI#174 closed the recall ceiling but introduced two silent fallback paths in `_region_anchored_preflight`: when `ctx.code_graph` was absent OR when the expander raised, the response shape was byte- identical to "expansion ran and matched zero" — caller couldn't tell recall was degraded. Three additive signals now surface every fallback (per Phase 2 spec posted on BicameralAI#243, all four open questions defaulted to recommended): 1. Response field — `sources_chained` includes `"graph_unavailable"`. Additive (never replaces existing `"region"` / `"graph"` tags). Bare tag — granular reason flows through telemetry, not the response shape, per signoff Q2. 2. Log level — exception case bumped from `logger.debug` → `logger.warning` with stable `[preflight:fallback]` substring + exception type for grep-friendly production logs. 3. Telemetry counter — new `preflight_telemetry.write_fallback_event( reason, session_id)` modeled on `write_ingest_refusal_event` (BicameralAI#216). Emits a `graph_expansion_fallback` row to the existing `~/.bicameral/preflight_events.jsonl` substrate. Reasons are a controlled enum: `"absent"`, `"missing_method"`, `"exception:<type>"`. Gated on `BICAMERAL_TELEMETRY=preflight`. The fallback case classifier in `_region_anchored_preflight` distinguishes three reasons (was conflated into a single `if expander is not None:` skip in the pre-BicameralAI#243 code): - `code_graph is None` → "absent" - `code_graph` set but no `expand_file_paths_via_graph` → "missing_method" - expander raised → "exception:<typ>" Skill update (`skills/bicameral-preflight/SKILL.md`) renders a one- line recall-degraded note to the agent when the tag is present: > Note: structural-neighbor lookup was unavailable this call — > recall may be reduced until the symbol index is rebuilt. Decisions > bound to files that import these may not have surfaced. Treats `"graph_unavailable"` as advisory: doesn't block the preflight surface; direct-pin matches are unaffected. Tests ----- 4 new cases in `tests/test_preflight_graph_expansion.py`: - test_preflight_fallback_absent_code_graph_tags_graph_unavailable — ctx with code_graph=None → response carries the tag, telemetry counter reason="absent" - test_preflight_fallback_expander_raises_warns_and_tags — stub expander raises RuntimeError → response carries the tag, `caplog` captures WARN-level log with `[preflight:fallback]` substring, telemetry counter reason="exception:RuntimeError" - test_preflight_successful_expansion_does_not_tag_graph_unavailable — regression guard: clean expansion path must NOT carry the tag (no false alarms) - test_preflight_empty_file_paths_does_not_tag_graph_unavailable — empty file_paths short-circuits before expansion check; the "expansion was never attempted" case is distinguishable from "attempted-and-fell-back" Existing tests use containment assertions (`"region" in sources_chained`) not exact list equality, so additive `"graph_ unavailable"` doesn't break them. What's NOT in this PR --------------------- Piece B (eager symbol-index initialization at server startup) is the follow-up commit on this branch. Lands separately so the response- shape change can ship without the adapter-lifecycle change. After both pieces land, the telemetry counter shipped here gives ongoing visibility into how often fallback engages in production. Refs BicameralAI#243 (parent BicameralAI#173 / PR BicameralAI#174). Plan signoff via BicameralAI#243 (comment). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin
pushed a commit
to Knapp-Kevin/bicameral-mcp
that referenced
this pull request
May 21, 2026
…eralAI#243 Piece B) Pre-fix, the code-locator adapter had two cooperating problems that made silent fallback the default: 1. `get_code_locator()` returned a FRESH `RealCodeLocatorAdapter` per call. Caching was absent. 2. `_ensure_initialized()` was lazy — first tool call paid the index-build cost AND could race the index check on concurrent dispatch (e.g. preflight + bind landing in parallel after server boot). Together: every silent fallback in the production runtime was "hot" because the adapter was being rebuilt + rechecked on every call. Piece A (BicameralAI#283 commit 3c9730f) made the fallback loud at the response layer; Piece B closes the upstream cause. Three changes ------------- adapters/code_locator.py - Singleton-by-REPO_PATH cache via `_INSTANCE_CACHE: dict[str, RealCodeLocatorAdapter]`. Path resolved through `Path.resolve()` so symlink + relative-path callers cache-hit consistently. Multi-repo correctness preserved (any test that swaps REPO_PATH mid-process gets a fresh adapter for the new path). - New `reset_code_locator_cache()` test-only hook, mirroring `adapters.ledger.reset_ledger_singleton`. - New `async def RealCodeLocatorAdapter.initialize()` — wraps sync `_ensure_initialized()` in `loop.run_in_executor(None, ...)` so the cold-init path doesn't block the event loop. Idempotent on already-initialized adapters. server.py - `serve_stdio()` calls `await get_code_locator().initialize()` between the dashboard sidecar start and the consent-notice block. - **Fail-loud per BicameralAI#243 phase-2 signoff Q3** — explicit `except RuntimeError as exc:` re-raises after printing an actionable stderr message (`"Run: python -m code_locator index <repo>"`). The outer try/finally still runs the `SERVER_SHUTDOWN` audit emit, so operators get a clean event AND a clear actionable error. No more silent degradation. tests/test_preflight_graph_expansion.py — 4 new tests - test_get_code_locator_returns_same_instance_per_repo_path (singleton + reset behavior across two REPO_PATHs) - test_initialize_succeeds_when_index_present (idempotent on already-initialized adapter) - test_initialize_fails_loudly_when_index_empty (RuntimeError from `_ensure_initialized` propagates through the async wrapper — doesn't get swallowed) - test_serve_stdio_refuses_boot_on_empty_index (boot-path level: with everything else stubbed healthy, an empty index aborts `serve_stdio()` with the expected RuntimeError) Local smoke tests ----------------- - Singleton + reset_code_locator_cache: 4 assertions pass (cache hit on same path, distinct instance on new path, fresh after reset, second call after reset stays cached) - Async `initialize()`: re-raises RuntimeError on stubbed `_ensure_initialized` failure; idempotent no-op on already-initialized adapter - ruff check + ruff format --check + mypy all green on touched files What's NOT in this PR --------------------- Nothing — Piece A (commit 3c9730f) and Piece B (this commit) together close BicameralAI#243's full scope. PR will open with both pieces. Telemetry counter shipped in Piece A gives ongoing production visibility into how often fallback engages post-merge. Refs BicameralAI#243 (parent BicameralAI#173 / PR BicameralAI#174). Plan signoff via BicameralAI#243 (comment). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
_region_anchored_preflightnow does a 1-hop code-graph expansion over caller-suppliedfile_pathsbefore thebinds_tolookup, so a decision bound toapp/src/lib/git/reorder.tssurfaces when the caller passes the structurally-nearapp/src/ui/multi-commit-operation/reorder.tsx.confidence=0.9; expansion-only matches getconfidence=0.7so callers can de-prioritize the latter without losing recall.sources_chainedincludes"region_graph_expanded"when expansion contributed at least one new decision.Linked issues
Closes #173.
Refs #168 (PostToolUse hook on
bicameral_preflight) and #172 (caller-side discovery-first reminder) — this PR closes the recall ceiling those two PRs surfaced. Together they reconnect the contradiction-capture loop end-to-end.Plan / Audit / Seal
Plan: trivial; risk:L1.
The change is additive: no MCP tool contract change (caller sends the same
file_pathsshape), no schema migration, falls back gracefully (returns input unchanged) when the symbol index is unavailable. Hub-file explosion is bounded byCodeLocatorConfig.max_neighbors_per_resultper-symbol AND globally. Provenance is preserved on the response so downstream skills can decide whether to weight expanded matches differently.Test plan
ruff check .— clean (220 files).ruff format --check .— clean.pytest tests/test_preflight_graph_expansion.py -v— 8/8 PASS:SymbolDB(1-hop neighbor walk, no-edge case, empty input, unindexed file, hub-file cap, uninitialized fallback).code_graphstub (expansion adds path → decision surfaces +region_graph_expandedtag set; direct pin alone does NOT get tagged as expanded).pytest tests/test_alpha_contract.py::test_preflight_surfaces_bound_decisions— direct-pin path still PASSes; the new tuple unpacking in_region_anchored_preflightdoesn't break the existing contract.pytest tests/test_preflight_hook.py tests/test_post_preflight_capture_hook.py tests/test_post_commit_sync_hook.py— 26/26 across the surrounding hook suites.e2e assertions (auto)— Flow 2a expected to flipbicameral.preflight(topic, file_paths=[...])— UI files OK.ingest(source=agent_session)+resolve_collision.ruff + mypy,MCP Regression Suite(ubuntu + windows),Schema Persistence Smoke,TruffleHog,Preflight Failure-Mode Eval (advisory)— all expected to pass.Out of scope
bicameral.preflightrequest/response shape is byte-compatible; onlyconfidencevalues andsources_chainedset membership shift.Carry-over plan
Lands on
devfirst. Will cherry-pick totriage-from-devper the §10.5 lane convention so PR #165 picks up the fix before it merges to main.