Skip to content

fix(skill): preflight reminder allows discovery first, gates only writes#172

Merged
jinhongkuan merged 1 commit into
devfrom
fix/preflight-after-discovery
May 4, 2026
Merged

fix(skill): preflight reminder allows discovery first, gates only writes#172
jinhongkuan merged 1 commit into
devfrom
fix/preflight-after-discovery

Conversation

@jinhongkuan

Copy link
Copy Markdown
Contributor

Why

The UserPromptSubmit hook installed by #146 / #155 told the agent to call `bicameral.preflight` "Before invoking any file-inspection tool (Read, Grep, Bash, Glob)". That short-circuits the caller-LLM discovery the rest of the contract depends on:

  • `bicameral.preflight` uses `file_paths` for region-anchored `binds_to` lookup (the precision channel). Empty `file_paths` drops to fuzzy text-similarity over decision descriptions.
  • The user often names a feature ("the reorder feature") rather than a file (`reorder.ts`). The caller LLM has to do that mapping — it's the semantic half of "selection before generation."
  • But to do the mapping it needs Read / Grep / Glob, which the old reminder forbade.

Symptom on PR #168 / #165 e2e: agent fired preflight with empty `file_paths` because it had no chance to inspect the codebase first. Server returned weak / no surfaced decisions. Flow 2 asserter failed (`file_paths=[]`); Flow 2a cascaded (no surfaced decisions to capture from).

How this reconciles with #146

#146's actual failure mode was "agent does Glob/Read and then NEVER calls preflight at all." The fix needs to be "preflight is mandatory before writing", not "preflight is mandatory before reading."

The Flow 2 asserter (`assert_flow_2` in `run_e2e_flows.py`) has already been gating on the right line — see lines 763-766: "Read is deliberately allowed before/in-parallel-with preflight … Edit / Bash write-ops are the line; preflight must precede the first one." This PR aligns the hook reminder text with what the asserter already requires.

Updated contract in the reminder:

  1. Read / Grep / Glob FIRST — caller LLM resolves "the X feature" to concrete file paths.
  2. `bicameral.preflight(topic, file_paths)` — fed by step 1's discovery so the server gets the deterministic `binds_to` lookup it needs.
  3. Write ops (Edit / Write / NotebookEdit / mutating Bash) — preflight must precede the first one.

What's in this PR

  • `scripts/hooks/preflight_reminder.py` — `REMINDER_TEXT` rewrite. Now spells out the recommended sequence (discover → preflight with populated `file_paths` → writes) and explicitly says "`file_paths=[]` defeats region-anchored retrieval." Docstring also updated to document the reconciliation with fix(skill): preflight does not auto-fire on natural refactor prompts in headless Claude Code sessions #146.
  • `skills/bicameral-preflight/SKILL.md` — Step 2 strengthened: "Discover first, then preflight." `file_paths` reframed as the precision channel; `omit` is now narrowly scoped to genuinely abstract queries with no file referent.
  • `tests/test_preflight_hook.py` — new `test_reminder_gates_writes_not_discovery` asserts the new posture (positive: "Read-only discovery FIRST", "BEFORE any write op"; negative: must NOT contain the old "before any file-inspection tool" phrasing).

What stays the same

  • Flow 2 asserter (`assert_flow_2`) is unchanged — it already gates writes, not reads.
  • UserPromptSubmit envelope shape (`hookSpecificOutput.additionalContext`) is unchanged.
  • Verb classifier (`should_fire_preflight` in `preflight_intent.py`) is unchanged.
  • The hook still fires on the same set of implementation-intent prompts.

Validation

  • `ruff check .` — clean.
  • `ruff format --check .` — clean.
  • `pytest tests/test_preflight_hook.py tests/test_preflight_intent.py -v` — 12/12 PASS.

Test plan

  • CI: `e2e assertions (auto)` — Flow 2 PASSes (preflight called with `reorder.ts` in `file_paths` after the agent grep/reads first).
  • CI: `ruff + mypy` passes (this PR doesn't touch the dev-side lint config).
  • CI: `MCP Regression Suite` passes.

Carry-over plan

Lands on `dev` first. After merge, will cherry-pick to `triage-from-dev` (PR #165) so the carry-over branch picks up the same reconciliation.

The UserPromptSubmit hook installed by #146/#155 told the agent to call
bicameral.preflight "Before invoking any file-inspection tool (Read,
Grep, Bash, Glob)". That short-circuited the caller-LLM discovery the
rest of the contract depends on:

  - bicameral.preflight uses `file_paths` for region-anchored binds_to
    lookup (the precision channel). Empty file_paths drops to fuzzy
    text-similarity over decision descriptions.
  - The user often names a *feature* ("the reorder feature") rather
    than a *file* (`reorder.ts`). The caller LLM has to do that
    mapping — it's the semantic half of "selection before generation."
  - But to do the mapping it needs Read / Grep / Glob, which the old
    reminder forbade.

Symptom on PR #168 / #165 e2e: agent fired preflight with empty
file_paths because it had no chance to inspect the codebase first.
Server returned weak / no surfaced decisions. Flow 2 asserter failed
(file_paths=[]); Flow 2a cascaded (no surfaced decisions to capture
from).

Reconcile with #146 by gating on the right line:

  - Read / Grep / Glob FIRST (discovery — caller LLM resolves the
    user's request to concrete file paths).
  - bicameral.preflight(topic, file_paths) — fed by step 1.
  - Write ops (Edit / Write / NotebookEdit / mutating Bash) — preflight
    must precede the first one. This is the contract assert_flow_2
    has *already* been gating; only the hook reminder was misaligned.

Files:
- scripts/hooks/preflight_reminder.py — REMINDER_TEXT rewrite + docstring
  documenting the reconciliation with #146
- skills/bicameral-preflight/SKILL.md — Step 2 strengthened: "Discover
  first, then preflight"; file_paths is the precision channel, omit
  only for genuinely abstract queries
- tests/test_preflight_hook.py — new test_reminder_gates_writes_not_discovery
  asserts the new posture (positive: "Read-only discovery FIRST", "BEFORE
  any write op"; negative: must NOT contain the old "before any
  file-inspection tool" phrasing)

The Flow 2 asserter is unchanged — it has always gated writes, not
reads (see lines 763-766: "Read is deliberately allowed before/in-
parallel-with preflight"). This PR aligns the hook reminder with what
the asserter already requires.
@coderabbitai

coderabbitai Bot commented May 4, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f27a6147-4423-4d9d-9f86-a02bf4870b71

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/preflight-after-discovery

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jinhongkuan jinhongkuan merged commit 235a684 into dev May 4, 2026
7 of 8 checks passed
@jinhongkuan jinhongkuan deleted the fix/preflight-after-discovery branch May 4, 2026 00:23
jinhongkuan added a commit that referenced this pull request May 4, 2026
…ming, no file paths

Two changes that make Flow 2 a real test of the contradiction-capture
loop instead of a partially-rigged one:

1. **Drop the "X is out, Y instead" framing.** The prior prompt told
   the agent the drag-drop approach was being replaced ("drag-and-drop
   is out", "no more drag-drop on this surface"). That short-circuits
   the contradiction-detection step the loop is supposed to validate
   — Step 5.6 fires when *the agent* notices the prompt conflicts
   with a surfaced decision, not when the user pre-announces the
   conflict. New prompt asks for a programmatic API for reordering
   commits with no replacement framing at all. The agent has to:
     a. read the surfaced "Drag-to-reorder commits, drag-to-squash, ..."
        decision and notice the prior plan was a drag-drop UX.
     b. read its own request (programmatic API, ordered SHA list,
        callable from any UI surface) and notice that's structurally
        incompatible with drag-drop.
     c. classify the situation as a supersede / refinement and
        execute Step 5.6.

2. **No file paths in the prompt.** The prior prompt named
   ``app/src/lib/git/reorder.ts`` directly. With #172's discovery-
   first contract, the caller LLM is supposed to use Grep/Glob/Read
   to identify files from a feature description on its own. Naming
   the file in the prompt cheats the discovery step — it's the agent's
   job to map "reordering commits" to ``reorder.ts``. Now the prompt
   describes the feature only; discovery is genuinely on the agent.

If Flow 2a flips to PASS on this prompt, the end-to-end loop works
on a clean test of the product claim: dev asks for a feature → agent
discovers files → preflight surfaces the conflicting prior decision →
agent recognizes the conflict → captures the refinement. If it stays
advisory FAIL, that's an honest signal about which link is weakest
(usually the agent's contradiction-judgment pass).

Companion to PR #174 (#173 — graph expansion); kept on the same branch
because both pieces are part of validating the same loop end-to-end.
jinhongkuan added a commit that referenced this pull request May 4, 2026
…175)

Even after #154 / #168 / #172 / #173, the contradiction-capture loop fails
on prompts where the conflict is *semantic* rather than *lexical*. Latest
PR #174 e2e:

- Test prompt: "Add a programmatic API for reordering commits — takes an
  ordered list of commit SHAs and rewrites history to match. Wire it so
  any UI surface can call it..."
- Server-side chain works end-to-end. Agent discovers files via Read/
  Grep/Glob, calls preflight with file_paths populated, server surfaces
  the bound "Drag-to-reorder commits, drag-to-squash, ..." decision,
  PostToolUse hook fires its gate, Step 5.6 reminder reaches the model.
- Agent's bicameral sequence: ['bicameral_preflight',
  'bicameral_resolve_compliance']. No ingest(agent_session). No
  resolve_collision. The agent reads the surfaced decision, reads the
  user's prompt, reasons "adding an API doesn't necessarily replace a
  UI — they could coexist", and skips capture.

That's a defensible engineering inference; LLMs aren't reliably capturing
structural-mismatch refinements. Per #175's design discussion, route the
judgment to the user instead of the agent.

Changes:

- skills/bicameral-preflight/SKILL.md — Step 5.6 rewritten. Step 5.6.1
  is the new AskUserQuestion disambiguation (supersede / keep_both /
  unrelated); Step 5.6.2 is the mechanical capture, gated on the user
  picking one of the "yes" options. The new step fires whenever
  fired=True && len(decisions) > 0, regardless of guided mode (capture
  is the headline product behavior, not opt-in).
- scripts/hooks/post_preflight_capture_reminder.py — reminder text
  retemplated. Instead of "you MUST call ingest+resolve_collision now",
  the reminder embeds the AskUserQuestion shape with the surfaced
  decision_id + description filled in, plus the answer→action branch
  table. Docstring rewritten to capture the new contract + the
  archaeology of why we abandoned the unconditional approach.
- tests/e2e/run_e2e_flows.py::assert_flow_2a — pass criterion changes
  from "ingest+resolve_collision fired" to "AskUserQuestion was invoked
  with disambiguation shape after preflight surfaced decisions." The
  user-side response can't be driven in headless claude -p, so the
  testable signal is the question invocation itself. Loose shape match
  on options labels (supersede / keep_both / unrelated / "refinement of"
  / decision: ID).
- tests/test_post_preflight_capture_hook.py — assertions updated to
  lock in the new D-path posture. Test renamed:
  test_reminder_is_unconditional_not_judgment_gated →
  test_reminder_routes_judgment_to_user_not_agent. Negative assertions
  catch regression to either the prior unconditional shape ("you MUST
  capture") or the original conditional shape ("IF you contradict ...").
- CHANGELOG.md — Changed section entries for the skill rewrite and the
  asserter contract change.

Trust contract preserved: hook still fires only when
fired=True && len(decisions) > 0. Silent on no signal. The disambiguation
question runs at a moment the flow is already paused (rendering the
surfaced block).

Headless-mode caveat: AskUserQuestion in claude -p has no human to
answer, so the e2e asserter checks invocation only — Step 5.6.2's
mechanical capture only fires after a real human answer in interactive
sessions. Acceptable per #175's acceptance criteria.

Closes #175.
jinhongkuan added a commit that referenced this pull request May 4, 2026
fix(skill): preflight reminder allows discovery first, gates only writes
@jinhongkuan jinhongkuan mentioned this pull request May 5, 2026
5 tasks
Knapp-Kevin pushed a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request May 21, 2026
…raversal (BicameralAI#173)

## Why

The headline product loop — *dev prompt contradicts a prior decision →
bicameral surfaces it → refinement captured → PM ratifies* — silently
breaks when the caller-LLM picks ``file_paths`` that don't exactly
match the file the prior decision is anchored to. Observed on the e2e
runs for BicameralAI#168 and BicameralAI#172: agent passes UI-layer files (``reorder.tsx``,
``commit-list.tsx``) for the "reorder feature"; the seeded decision is
bound to the git-layer file (``reorder.ts``); region-anchored lookup
``WHERE file_path IN $fps`` returns zero hits; preflight reports
``fired=false, decisions=[]``; the PostToolUse hook (BicameralAI#154 / BicameralAI#168)
doesn't fire its gate (``fired && decisions[]``); Step 5.6 reminder
never reaches the model; no ``ingest(agent_session)``, no
``resolve_collision``. Flow 2a fails advisory but the deeper
consequence is real-world contradictions go uncaptured.

The server has full structural knowledge of the codebase — the
``code_locator`` package ships a ``symbols`` + ``edges`` graph with
``invokes`` / ``imports`` / ``inherits`` / ``contains`` edges, exposed
via ``SymbolDB.get_ego_graph(symbol_id, hops=1)``. Until now,
``_region_anchored_preflight`` did exact-match-only lookup and ignored
the graph. This PR closes the gap.

## What

- ``adapters/code_locator.py::RealCodeLocatorAdapter.expand_file_paths_via_graph``
  — new public method. Walks each caller-supplied file's symbols, fetches
  each symbol's 1-hop ego graph, collects neighbor file paths. Returns
  ``(expanded, added)`` so callers can mark provenance. Bounded per-symbol
  AND globally by ``CodeLocatorConfig.max_neighbors_per_result`` to
  prevent hub-file explosion. Falls back gracefully (returns input
  unchanged) when the symbol index is unavailable.
- ``handlers/preflight.py::_region_anchored_preflight`` — calls the
  expander before the ``binds_to`` lookup. Decisions reached via
  caller-supplied paths get ``confidence=0.9``; decisions reached only
  via expansion get ``confidence=0.7`` so the caller can de-prioritize
  the latter without losing recall. Returns
  ``(matches, used_graph_expansion)`` so the caller can record
  ``"region_graph_expanded"`` in ``sources_chained`` when expansion
  contributed at least one hit beyond the direct-match set.
- ``skills/bicameral-preflight/SKILL.md`` Step 2 — documents the new
  expansion behaviour and the ``confidence`` / ``sources_chained``
  semantics so downstream skills can read them.
- ``tests/test_preflight_graph_expansion.py`` — 8 tests:
  - 6 adapter unit tests (synthetic ``SymbolDB``): finds 1-hop neighbor,
    handles no-edge / empty / unindexed inputs, caps hub-file
    explosion, falls back when uninitialized.
  - 2 handler integration tests (real ledger, fake ``code_graph``):
    expansion-only surfaces the bound decision and tags
    ``region_graph_expanded`` in ``sources_chained``; direct pin alone
    does not get tagged as expanded.

## Plan / Audit / Seal

Plan: trivial; risk:L1 (additive — no contract change to the MCP tool
surface, no schema migration, falls back gracefully when index is
absent). Implementation note in commit body above.

## Test plan

- [x] ``ruff check . && ruff format --check .`` — clean.
- [x] ``pytest tests/test_preflight_graph_expansion.py -v`` — 8/8.
- [x] ``pytest tests/test_alpha_contract.py::test_preflight_surfaces_bound_decisions`` — direct-pin path still PASSes (no regression on the existing contract).
- [x] ``pytest tests/test_preflight_hook.py tests/test_post_preflight_capture_hook.py tests/test_post_commit_sync_hook.py`` — 26/26 across the related hook suites.
- [ ] CI: ``e2e assertions (auto)`` — Flow 2a expected to flip ⚠️ FAIL → PASS now that preflight surfaces the bound decision under structural distance, the PostToolUse hook fires, and the agent has decision_ids to feed into ``ingest(agent_session)`` + ``resolve_collision``.

Closes BicameralAI#173.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant