fix(skill): preflight reminder allows discovery first, gates only writes#172
Merged
Conversation
The UserPromptSubmit hook installed by #146/#155 told the agent to call bicameral.preflight "Before invoking any file-inspection tool (Read, Grep, Bash, Glob)". That short-circuited the caller-LLM discovery the rest of the contract depends on: - bicameral.preflight uses `file_paths` for region-anchored binds_to lookup (the precision channel). Empty file_paths drops to fuzzy text-similarity over decision descriptions. - The user often names a *feature* ("the reorder feature") rather than a *file* (`reorder.ts`). The caller LLM has to do that mapping — it's the semantic half of "selection before generation." - But to do the mapping it needs Read / Grep / Glob, which the old reminder forbade. Symptom on PR #168 / #165 e2e: agent fired preflight with empty file_paths because it had no chance to inspect the codebase first. Server returned weak / no surfaced decisions. Flow 2 asserter failed (file_paths=[]); Flow 2a cascaded (no surfaced decisions to capture from). Reconcile with #146 by gating on the right line: - Read / Grep / Glob FIRST (discovery — caller LLM resolves the user's request to concrete file paths). - bicameral.preflight(topic, file_paths) — fed by step 1. - Write ops (Edit / Write / NotebookEdit / mutating Bash) — preflight must precede the first one. This is the contract assert_flow_2 has *already* been gating; only the hook reminder was misaligned. Files: - scripts/hooks/preflight_reminder.py — REMINDER_TEXT rewrite + docstring documenting the reconciliation with #146 - skills/bicameral-preflight/SKILL.md — Step 2 strengthened: "Discover first, then preflight"; file_paths is the precision channel, omit only for genuinely abstract queries - tests/test_preflight_hook.py — new test_reminder_gates_writes_not_discovery asserts the new posture (positive: "Read-only discovery FIRST", "BEFORE any write op"; negative: must NOT contain the old "before any file-inspection tool" phrasing) The Flow 2 asserter is unchanged — it has always gated writes, not reads (see lines 763-766: "Read is deliberately allowed before/in- parallel-with preflight"). This PR aligns the hook reminder with what the asserter already requires.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
6 tasks
This was referenced May 4, 2026
7 tasks
jinhongkuan
added a commit
that referenced
this pull request
May 4, 2026
…ming, no file paths
Two changes that make Flow 2 a real test of the contradiction-capture
loop instead of a partially-rigged one:
1. **Drop the "X is out, Y instead" framing.** The prior prompt told
the agent the drag-drop approach was being replaced ("drag-and-drop
is out", "no more drag-drop on this surface"). That short-circuits
the contradiction-detection step the loop is supposed to validate
— Step 5.6 fires when *the agent* notices the prompt conflicts
with a surfaced decision, not when the user pre-announces the
conflict. New prompt asks for a programmatic API for reordering
commits with no replacement framing at all. The agent has to:
a. read the surfaced "Drag-to-reorder commits, drag-to-squash, ..."
decision and notice the prior plan was a drag-drop UX.
b. read its own request (programmatic API, ordered SHA list,
callable from any UI surface) and notice that's structurally
incompatible with drag-drop.
c. classify the situation as a supersede / refinement and
execute Step 5.6.
2. **No file paths in the prompt.** The prior prompt named
``app/src/lib/git/reorder.ts`` directly. With #172's discovery-
first contract, the caller LLM is supposed to use Grep/Glob/Read
to identify files from a feature description on its own. Naming
the file in the prompt cheats the discovery step — it's the agent's
job to map "reordering commits" to ``reorder.ts``. Now the prompt
describes the feature only; discovery is genuinely on the agent.
If Flow 2a flips to PASS on this prompt, the end-to-end loop works
on a clean test of the product claim: dev asks for a feature → agent
discovers files → preflight surfaces the conflicting prior decision →
agent recognizes the conflict → captures the refinement. If it stays
advisory FAIL, that's an honest signal about which link is weakest
(usually the agent's contradiction-judgment pass).
Companion to PR #174 (#173 — graph expansion); kept on the same branch
because both pieces are part of validating the same loop end-to-end.
4 tasks
jinhongkuan
added a commit
that referenced
this pull request
May 4, 2026
…175) Even after #154 / #168 / #172 / #173, the contradiction-capture loop fails on prompts where the conflict is *semantic* rather than *lexical*. Latest PR #174 e2e: - Test prompt: "Add a programmatic API for reordering commits — takes an ordered list of commit SHAs and rewrites history to match. Wire it so any UI surface can call it..." - Server-side chain works end-to-end. Agent discovers files via Read/ Grep/Glob, calls preflight with file_paths populated, server surfaces the bound "Drag-to-reorder commits, drag-to-squash, ..." decision, PostToolUse hook fires its gate, Step 5.6 reminder reaches the model. - Agent's bicameral sequence: ['bicameral_preflight', 'bicameral_resolve_compliance']. No ingest(agent_session). No resolve_collision. The agent reads the surfaced decision, reads the user's prompt, reasons "adding an API doesn't necessarily replace a UI — they could coexist", and skips capture. That's a defensible engineering inference; LLMs aren't reliably capturing structural-mismatch refinements. Per #175's design discussion, route the judgment to the user instead of the agent. Changes: - skills/bicameral-preflight/SKILL.md — Step 5.6 rewritten. Step 5.6.1 is the new AskUserQuestion disambiguation (supersede / keep_both / unrelated); Step 5.6.2 is the mechanical capture, gated on the user picking one of the "yes" options. The new step fires whenever fired=True && len(decisions) > 0, regardless of guided mode (capture is the headline product behavior, not opt-in). - scripts/hooks/post_preflight_capture_reminder.py — reminder text retemplated. Instead of "you MUST call ingest+resolve_collision now", the reminder embeds the AskUserQuestion shape with the surfaced decision_id + description filled in, plus the answer→action branch table. Docstring rewritten to capture the new contract + the archaeology of why we abandoned the unconditional approach. - tests/e2e/run_e2e_flows.py::assert_flow_2a — pass criterion changes from "ingest+resolve_collision fired" to "AskUserQuestion was invoked with disambiguation shape after preflight surfaced decisions." The user-side response can't be driven in headless claude -p, so the testable signal is the question invocation itself. Loose shape match on options labels (supersede / keep_both / unrelated / "refinement of" / decision: ID). - tests/test_post_preflight_capture_hook.py — assertions updated to lock in the new D-path posture. Test renamed: test_reminder_is_unconditional_not_judgment_gated → test_reminder_routes_judgment_to_user_not_agent. Negative assertions catch regression to either the prior unconditional shape ("you MUST capture") or the original conditional shape ("IF you contradict ..."). - CHANGELOG.md — Changed section entries for the skill rewrite and the asserter contract change. Trust contract preserved: hook still fires only when fired=True && len(decisions) > 0. Silent on no signal. The disambiguation question runs at a moment the flow is already paused (rendering the surfaced block). Headless-mode caveat: AskUserQuestion in claude -p has no human to answer, so the e2e asserter checks invocation only — Step 5.6.2's mechanical capture only fires after a real human answer in interactive sessions. Acceptable per #175's acceptance criteria. Closes #175.
jinhongkuan
added a commit
that referenced
this pull request
May 4, 2026
fix(skill): preflight reminder allows discovery first, gates only writes
Knapp-Kevin
pushed a commit
to Knapp-Kevin/bicameral-mcp
that referenced
this pull request
May 21, 2026
…raversal (BicameralAI#173) ## Why The headline product loop — *dev prompt contradicts a prior decision → bicameral surfaces it → refinement captured → PM ratifies* — silently breaks when the caller-LLM picks ``file_paths`` that don't exactly match the file the prior decision is anchored to. Observed on the e2e runs for BicameralAI#168 and BicameralAI#172: agent passes UI-layer files (``reorder.tsx``, ``commit-list.tsx``) for the "reorder feature"; the seeded decision is bound to the git-layer file (``reorder.ts``); region-anchored lookup ``WHERE file_path IN $fps`` returns zero hits; preflight reports ``fired=false, decisions=[]``; the PostToolUse hook (BicameralAI#154 / BicameralAI#168) doesn't fire its gate (``fired && decisions[]``); Step 5.6 reminder never reaches the model; no ``ingest(agent_session)``, no ``resolve_collision``. Flow 2a fails advisory but the deeper consequence is real-world contradictions go uncaptured. The server has full structural knowledge of the codebase — the ``code_locator`` package ships a ``symbols`` + ``edges`` graph with ``invokes`` / ``imports`` / ``inherits`` / ``contains`` edges, exposed via ``SymbolDB.get_ego_graph(symbol_id, hops=1)``. Until now, ``_region_anchored_preflight`` did exact-match-only lookup and ignored the graph. This PR closes the gap. ## What - ``adapters/code_locator.py::RealCodeLocatorAdapter.expand_file_paths_via_graph`` — new public method. Walks each caller-supplied file's symbols, fetches each symbol's 1-hop ego graph, collects neighbor file paths. Returns ``(expanded, added)`` so callers can mark provenance. Bounded per-symbol AND globally by ``CodeLocatorConfig.max_neighbors_per_result`` to prevent hub-file explosion. Falls back gracefully (returns input unchanged) when the symbol index is unavailable. - ``handlers/preflight.py::_region_anchored_preflight`` — calls the expander before the ``binds_to`` lookup. Decisions reached via caller-supplied paths get ``confidence=0.9``; decisions reached only via expansion get ``confidence=0.7`` so the caller can de-prioritize the latter without losing recall. Returns ``(matches, used_graph_expansion)`` so the caller can record ``"region_graph_expanded"`` in ``sources_chained`` when expansion contributed at least one hit beyond the direct-match set. - ``skills/bicameral-preflight/SKILL.md`` Step 2 — documents the new expansion behaviour and the ``confidence`` / ``sources_chained`` semantics so downstream skills can read them. - ``tests/test_preflight_graph_expansion.py`` — 8 tests: - 6 adapter unit tests (synthetic ``SymbolDB``): finds 1-hop neighbor, handles no-edge / empty / unindexed inputs, caps hub-file explosion, falls back when uninitialized. - 2 handler integration tests (real ledger, fake ``code_graph``): expansion-only surfaces the bound decision and tags ``region_graph_expanded`` in ``sources_chained``; direct pin alone does not get tagged as expanded. ## Plan / Audit / Seal Plan: trivial; risk:L1 (additive — no contract change to the MCP tool surface, no schema migration, falls back gracefully when index is absent). Implementation note in commit body above. ## Test plan - [x] ``ruff check . && ruff format --check .`` — clean. - [x] ``pytest tests/test_preflight_graph_expansion.py -v`` — 8/8. - [x] ``pytest tests/test_alpha_contract.py::test_preflight_surfaces_bound_decisions`` — direct-pin path still PASSes (no regression on the existing contract). - [x] ``pytest tests/test_preflight_hook.py tests/test_post_preflight_capture_hook.py tests/test_post_commit_sync_hook.py`` — 26/26 across the related hook suites. - [ ] CI: ``e2e assertions (auto)`` — Flow 2a expected to flip⚠️ FAIL → PASS now that preflight surfaces the bound decision under structural distance, the PostToolUse hook fires, and the agent has decision_ids to feed into ``ingest(agent_session)`` + ``resolve_collision``. Closes BicameralAI#173.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The UserPromptSubmit hook installed by #146 / #155 told the agent to call `bicameral.preflight` "Before invoking any file-inspection tool (Read, Grep, Bash, Glob)". That short-circuits the caller-LLM discovery the rest of the contract depends on:
Symptom on PR #168 / #165 e2e: agent fired preflight with empty `file_paths` because it had no chance to inspect the codebase first. Server returned weak / no surfaced decisions. Flow 2 asserter failed (`file_paths=[]`); Flow 2a cascaded (no surfaced decisions to capture from).
How this reconciles with #146
#146's actual failure mode was "agent does Glob/Read and then NEVER calls preflight at all." The fix needs to be "preflight is mandatory before writing", not "preflight is mandatory before reading."
The Flow 2 asserter (`assert_flow_2` in `run_e2e_flows.py`) has already been gating on the right line — see lines 763-766: "Read is deliberately allowed before/in-parallel-with preflight … Edit / Bash write-ops are the line; preflight must precede the first one." This PR aligns the hook reminder text with what the asserter already requires.
Updated contract in the reminder:
What's in this PR
What stays the same
Validation
Test plan
Carry-over plan
Lands on `dev` first. After merge, will cherry-pick to `triage-from-dev` (PR #165) so the carry-over branch picks up the same reconciliation.