fix(skill): preflight reminder allows discovery first, gates only writes by jinhongkuan · Pull Request #172 · BicameralAI/bicameral-mcp

jinhongkuan · 2026-05-04T00:07:30Z

Why

The UserPromptSubmit hook installed by #146 / #155 told the agent to call `bicameral.preflight` "Before invoking any file-inspection tool (Read, Grep, Bash, Glob)". That short-circuits the caller-LLM discovery the rest of the contract depends on:

`bicameral.preflight` uses `file_paths` for region-anchored `binds_to` lookup (the precision channel). Empty `file_paths` drops to fuzzy text-similarity over decision descriptions.
The user often names a feature ("the reorder feature") rather than a file (`reorder.ts`). The caller LLM has to do that mapping — it's the semantic half of "selection before generation."
But to do the mapping it needs Read / Grep / Glob, which the old reminder forbade.

Symptom on PR #168 / #165 e2e: agent fired preflight with empty `file_paths` because it had no chance to inspect the codebase first. Server returned weak / no surfaced decisions. Flow 2 asserter failed (`file_paths=[]`); Flow 2a cascaded (no surfaced decisions to capture from).

How this reconciles with #146

#146's actual failure mode was "agent does Glob/Read and then NEVER calls preflight at all." The fix needs to be "preflight is mandatory before writing", not "preflight is mandatory before reading."

The Flow 2 asserter (`assert_flow_2` in `run_e2e_flows.py`) has already been gating on the right line — see lines 763-766: "Read is deliberately allowed before/in-parallel-with preflight … Edit / Bash write-ops are the line; preflight must precede the first one." This PR aligns the hook reminder text with what the asserter already requires.

Updated contract in the reminder:

Read / Grep / Glob FIRST — caller LLM resolves "the X feature" to concrete file paths.
`bicameral.preflight(topic, file_paths)` — fed by step 1's discovery so the server gets the deterministic `binds_to` lookup it needs.
Write ops (Edit / Write / NotebookEdit / mutating Bash) — preflight must precede the first one.

What's in this PR

`scripts/hooks/preflight_reminder.py` — `REMINDER_TEXT` rewrite. Now spells out the recommended sequence (discover → preflight with populated `file_paths` → writes) and explicitly says "`file_paths=[]` defeats region-anchored retrieval." Docstring also updated to document the reconciliation with fix(skill): preflight does not auto-fire on natural refactor prompts in headless Claude Code sessions #146.
`skills/bicameral-preflight/SKILL.md` — Step 2 strengthened: "Discover first, then preflight." `file_paths` reframed as the precision channel; `omit` is now narrowly scoped to genuinely abstract queries with no file referent.
`tests/test_preflight_hook.py` — new `test_reminder_gates_writes_not_discovery` asserts the new posture (positive: "Read-only discovery FIRST", "BEFORE any write op"; negative: must NOT contain the old "before any file-inspection tool" phrasing).

What stays the same

Flow 2 asserter (`assert_flow_2`) is unchanged — it already gates writes, not reads.
UserPromptSubmit envelope shape (`hookSpecificOutput.additionalContext`) is unchanged.
Verb classifier (`should_fire_preflight` in `preflight_intent.py`) is unchanged.
The hook still fires on the same set of implementation-intent prompts.

Validation

`ruff check .` — clean.
`ruff format --check .` — clean.
`pytest tests/test_preflight_hook.py tests/test_preflight_intent.py -v` — 12/12 PASS.

Test plan

CI: `e2e assertions (auto)` — Flow 2 PASSes (preflight called with `reorder.ts` in `file_paths` after the agent grep/reads first).
CI: `ruff + mypy` passes (this PR doesn't touch the dev-side lint config).
CI: `MCP Regression Suite` passes.

Carry-over plan

Lands on `dev` first. After merge, will cherry-pick to `triage-from-dev` (PR #165) so the carry-over branch picks up the same reconciliation.

The UserPromptSubmit hook installed by #146/#155 told the agent to call bicameral.preflight "Before invoking any file-inspection tool (Read, Grep, Bash, Glob)". That short-circuited the caller-LLM discovery the rest of the contract depends on: - bicameral.preflight uses `file_paths` for region-anchored binds_to lookup (the precision channel). Empty file_paths drops to fuzzy text-similarity over decision descriptions. - The user often names a *feature* ("the reorder feature") rather than a *file* (`reorder.ts`). The caller LLM has to do that mapping — it's the semantic half of "selection before generation." - But to do the mapping it needs Read / Grep / Glob, which the old reminder forbade. Symptom on PR #168 / #165 e2e: agent fired preflight with empty file_paths because it had no chance to inspect the codebase first. Server returned weak / no surfaced decisions. Flow 2 asserter failed (file_paths=[]); Flow 2a cascaded (no surfaced decisions to capture from). Reconcile with #146 by gating on the right line: - Read / Grep / Glob FIRST (discovery — caller LLM resolves the user's request to concrete file paths). - bicameral.preflight(topic, file_paths) — fed by step 1. - Write ops (Edit / Write / NotebookEdit / mutating Bash) — preflight must precede the first one. This is the contract assert_flow_2 has *already* been gating; only the hook reminder was misaligned. Files: - scripts/hooks/preflight_reminder.py — REMINDER_TEXT rewrite + docstring documenting the reconciliation with #146 - skills/bicameral-preflight/SKILL.md — Step 2 strengthened: "Discover first, then preflight"; file_paths is the precision channel, omit only for genuinely abstract queries - tests/test_preflight_hook.py — new test_reminder_gates_writes_not_discovery asserts the new posture (positive: "Read-only discovery FIRST", "BEFORE any write op"; negative: must NOT contain the old "before any file-inspection tool" phrasing) The Flow 2 asserter is unchanged — it has always gated writes, not reads (see lines 763-766: "Read is deliberately allowed before/in- parallel-with preflight"). This PR aligns the hook reminder with what the asserter already requires.

coderabbitai · 2026-05-04T00:07:38Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f27a6147-4423-4d9d-9f86-a02bf4870b71

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/preflight-after-discovery

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…ming, no file paths Two changes that make Flow 2 a real test of the contradiction-capture loop instead of a partially-rigged one: 1. **Drop the "X is out, Y instead" framing.** The prior prompt told the agent the drag-drop approach was being replaced ("drag-and-drop is out", "no more drag-drop on this surface"). That short-circuits the contradiction-detection step the loop is supposed to validate — Step 5.6 fires when *the agent* notices the prompt conflicts with a surfaced decision, not when the user pre-announces the conflict. New prompt asks for a programmatic API for reordering commits with no replacement framing at all. The agent has to: a. read the surfaced "Drag-to-reorder commits, drag-to-squash, ..." decision and notice the prior plan was a drag-drop UX. b. read its own request (programmatic API, ordered SHA list, callable from any UI surface) and notice that's structurally incompatible with drag-drop. c. classify the situation as a supersede / refinement and execute Step 5.6. 2. **No file paths in the prompt.** The prior prompt named ``app/src/lib/git/reorder.ts`` directly. With #172's discovery- first contract, the caller LLM is supposed to use Grep/Glob/Read to identify files from a feature description on its own. Naming the file in the prompt cheats the discovery step — it's the agent's job to map "reordering commits" to ``reorder.ts``. Now the prompt describes the feature only; discovery is genuinely on the agent. If Flow 2a flips to PASS on this prompt, the end-to-end loop works on a clean test of the product claim: dev asks for a feature → agent discovers files → preflight surfaces the conflicting prior decision → agent recognizes the conflict → captures the refinement. If it stays advisory FAIL, that's an honest signal about which link is weakest (usually the agent's contradiction-judgment pass). Companion to PR #174 (#173 — graph expansion); kept on the same branch because both pieces are part of validating the same loop end-to-end.

…175) Even after #154 / #168 / #172 / #173, the contradiction-capture loop fails on prompts where the conflict is *semantic* rather than *lexical*. Latest PR #174 e2e: - Test prompt: "Add a programmatic API for reordering commits — takes an ordered list of commit SHAs and rewrites history to match. Wire it so any UI surface can call it..." - Server-side chain works end-to-end. Agent discovers files via Read/ Grep/Glob, calls preflight with file_paths populated, server surfaces the bound "Drag-to-reorder commits, drag-to-squash, ..." decision, PostToolUse hook fires its gate, Step 5.6 reminder reaches the model. - Agent's bicameral sequence: ['bicameral_preflight', 'bicameral_resolve_compliance']. No ingest(agent_session). No resolve_collision. The agent reads the surfaced decision, reads the user's prompt, reasons "adding an API doesn't necessarily replace a UI — they could coexist", and skips capture. That's a defensible engineering inference; LLMs aren't reliably capturing structural-mismatch refinements. Per #175's design discussion, route the judgment to the user instead of the agent. Changes: - skills/bicameral-preflight/SKILL.md — Step 5.6 rewritten. Step 5.6.1 is the new AskUserQuestion disambiguation (supersede / keep_both / unrelated); Step 5.6.2 is the mechanical capture, gated on the user picking one of the "yes" options. The new step fires whenever fired=True && len(decisions) > 0, regardless of guided mode (capture is the headline product behavior, not opt-in). - scripts/hooks/post_preflight_capture_reminder.py — reminder text retemplated. Instead of "you MUST call ingest+resolve_collision now", the reminder embeds the AskUserQuestion shape with the surfaced decision_id + description filled in, plus the answer→action branch table. Docstring rewritten to capture the new contract + the archaeology of why we abandoned the unconditional approach. - tests/e2e/run_e2e_flows.py::assert_flow_2a — pass criterion changes from "ingest+resolve_collision fired" to "AskUserQuestion was invoked with disambiguation shape after preflight surfaced decisions." The user-side response can't be driven in headless claude -p, so the testable signal is the question invocation itself. Loose shape match on options labels (supersede / keep_both / unrelated / "refinement of" / decision: ID). - tests/test_post_preflight_capture_hook.py — assertions updated to lock in the new D-path posture. Test renamed: test_reminder_is_unconditional_not_judgment_gated → test_reminder_routes_judgment_to_user_not_agent. Negative assertions catch regression to either the prior unconditional shape ("you MUST capture") or the original conditional shape ("IF you contradict ..."). - CHANGELOG.md — Changed section entries for the skill rewrite and the asserter contract change. Trust contract preserved: hook still fires only when fired=True && len(decisions) > 0. Silent on no signal. The disambiguation question runs at a moment the flow is already paused (rendering the surfaced block). Headless-mode caveat: AskUserQuestion in claude -p has no human to answer, so the e2e asserter checks invocation only — Step 5.6.2's mechanical capture only fires after a real human answer in interactive sessions. Acceptable per #175's acceptance criteria. Closes #175.

fix(skill): preflight reminder allows discovery first, gates only writes

…raversal (BicameralAI#173) ## Why The headline product loop — *dev prompt contradicts a prior decision → bicameral surfaces it → refinement captured → PM ratifies* — silently breaks when the caller-LLM picks ``file_paths`` that don't exactly match the file the prior decision is anchored to. Observed on the e2e runs for BicameralAI#168 and BicameralAI#172: agent passes UI-layer files (``reorder.tsx``, ``commit-list.tsx``) for the "reorder feature"; the seeded decision is bound to the git-layer file (``reorder.ts``); region-anchored lookup ``WHERE file_path IN $fps`` returns zero hits; preflight reports ``fired=false, decisions=[]``; the PostToolUse hook (BicameralAI#154 / BicameralAI#168) doesn't fire its gate (``fired && decisions[]``); Step 5.6 reminder never reaches the model; no ``ingest(agent_session)``, no ``resolve_collision``. Flow 2a fails advisory but the deeper consequence is real-world contradictions go uncaptured. The server has full structural knowledge of the codebase — the ``code_locator`` package ships a ``symbols`` + ``edges`` graph with ``invokes`` / ``imports`` / ``inherits`` / ``contains`` edges, exposed via ``SymbolDB.get_ego_graph(symbol_id, hops=1)``. Until now, ``_region_anchored_preflight`` did exact-match-only lookup and ignored the graph. This PR closes the gap. ## What - ``adapters/code_locator.py::RealCodeLocatorAdapter.expand_file_paths_via_graph`` — new public method. Walks each caller-supplied file's symbols, fetches each symbol's 1-hop ego graph, collects neighbor file paths. Returns ``(expanded, added)`` so callers can mark provenance. Bounded per-symbol AND globally by ``CodeLocatorConfig.max_neighbors_per_result`` to prevent hub-file explosion. Falls back gracefully (returns input unchanged) when the symbol index is unavailable. - ``handlers/preflight.py::_region_anchored_preflight`` — calls the expander before the ``binds_to`` lookup. Decisions reached via caller-supplied paths get ``confidence=0.9``; decisions reached only via expansion get ``confidence=0.7`` so the caller can de-prioritize the latter without losing recall. Returns ``(matches, used_graph_expansion)`` so the caller can record ``"region_graph_expanded"`` in ``sources_chained`` when expansion contributed at least one hit beyond the direct-match set. - ``skills/bicameral-preflight/SKILL.md`` Step 2 — documents the new expansion behaviour and the ``confidence`` / ``sources_chained`` semantics so downstream skills can read them. - ``tests/test_preflight_graph_expansion.py`` — 8 tests: - 6 adapter unit tests (synthetic ``SymbolDB``): finds 1-hop neighbor, handles no-edge / empty / unindexed inputs, caps hub-file explosion, falls back when uninitialized. - 2 handler integration tests (real ledger, fake ``code_graph``): expansion-only surfaces the bound decision and tags ``region_graph_expanded`` in ``sources_chained``; direct pin alone does not get tagged as expanded. ## Plan / Audit / Seal Plan: trivial; risk:L1 (additive — no contract change to the MCP tool surface, no schema migration, falls back gracefully when index is absent). Implementation note in commit body above. ## Test plan - [x] ``ruff check . && ruff format --check .`` — clean. - [x] ``pytest tests/test_preflight_graph_expansion.py -v`` — 8/8. - [x] ``pytest tests/test_alpha_contract.py::test_preflight_surfaces_bound_decisions`` — direct-pin path still PASSes (no regression on the existing contract). - [x] ``pytest tests/test_preflight_hook.py tests/test_post_preflight_capture_hook.py tests/test_post_commit_sync_hook.py`` — 26/26 across the related hook suites. - [ ] CI: ``e2e assertions (auto)`` — Flow 2a expected to flip ⚠️ FAIL → PASS now that preflight surfaces the bound decision under structural distance, the PostToolUse hook fires, and the agent has decision_ids to feed into ``ingest(agent_session)`` + ``resolve_collision``. Closes BicameralAI#173.

jinhongkuan temporarily deployed to ci-test May 4, 2026 00:07 — with GitHub Actions Inactive

jinhongkuan had a problem deploying to recording-approval May 4, 2026 00:07 — with GitHub Actions Failure

jinhongkuan temporarily deployed to production May 4, 2026 00:07 — with GitHub Actions Inactive

jinhongkuan merged commit 235a684 into dev May 4, 2026
7 of 8 checks passed

jinhongkuan deleted the fix/preflight-after-discovery branch May 4, 2026 00:23

jinhongkuan mentioned this pull request May 4, 2026

feat(preflight): expand region-anchored lookup via 1-hop code-graph traversal #173

Closed

6 tasks

This was referenced May 4, 2026

M6 preflight handler retrieval: by-design split (handler structural, skill-layer covered by #306) #58

Closed

Preflight: graph expansion for transitive relevance (M6) #64

Closed

jinhongkuan mentioned this pull request May 4, 2026

feat(preflight): expand region-anchored lookup via 1-hop code-graph traversal #174

Merged

7 tasks

jinhongkuan mentioned this pull request May 4, 2026

feat(skill): user-disambiguation question before Step 5.6 contradiction capture #175

Closed

4 tasks

jinhongkuan added a commit that referenced this pull request May 4, 2026

Merge pull request #172 from BicameralAI/fix/preflight-after-discovery

b178e13

fix(skill): preflight reminder allows discovery first, gates only writes

jinhongkuan mentioned this pull request May 5, 2026

release: v0.13.7 (triage) #182

Merged

5 tasks

silongtan mentioned this pull request May 7, 2026

feat(preflight): eliminate silent graph-expansion fallbacks (#173 follow-up) #243

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(skill): preflight reminder allows discovery first, gates only writes#172

fix(skill): preflight reminder allows discovery first, gates only writes#172
jinhongkuan merged 1 commit into
devfrom
fix/preflight-after-discovery

jinhongkuan commented May 4, 2026

Uh oh!

coderabbitai Bot commented May 4, 2026

Review skipped

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jinhongkuan commented May 4, 2026

Why

How this reconciles with #146

What's in this PR

What stays the same

Validation

Test plan

Carry-over plan

Uh oh!

coderabbitai Bot commented May 4, 2026

Review skipped

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant