Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 23 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,29 @@
All notable changes to bicameral-mcp are tracked here. Format loosely follows
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## [Unreleased]

### Added

- `handlers/preflight.py` — `_region_anchored_preflight` now expands caller-supplied `file_paths` by 1 hop along the code-locator graph's **import edges** before the `binds_to` lookup. Lifts the strict exact-match recall ceiling so a decision bound to `app/src/lib/git/reorder.ts` surfaces when the caller passes the structurally-near `app/src/ui/multi-commit-operation/reorder.tsx`. Decisions reached only via expansion carry `confidence=0.7` (vs `0.9` for direct pins). `sources_chained` includes `"graph"` (alongside `"region"`) when expansion contributed at least one hit. Bounded per #64: ≤10 input seeds × `max_neighbors_per_result` neighbors per seed. Closes #173 (and supersedes #64).
- `adapters/code_locator.py::RealCodeLocatorAdapter.expand_file_paths_via_graph` — public method backing the expansion. Filters to ``imports`` edges only (file-level structural dependency); ``invokes`` / ``inherits`` / ``contains`` are symbol-level edges that over-broaden the file-level expansion. Returns `(expanded, added)` so callers can mark provenance.
- `skills/bicameral-preflight/SKILL.md` Step 2 — documents the imports-only expansion + caller-side `confidence` and `sources_chained` semantics.
- `tests/eval/preflight_dataset.jsonl` — M6 row flipped from XFAIL → live. Setup updated to specify graph-neighbor topology (`graph_neighbors`) and pinned-decision targets (`region_decisions_pinned_to`); the asserter now tests true graph-expansion semantics rather than mock-returns-decision-regardless-of-input.
- `tests/eval/run_preflight_eval.py` — `_apply_setup` extended with `region_decisions_pinned_to` (path-aware decision lookup) and `graph_neighbors` (stub code_graph) so M6-style scenarios can be expressed in the dataset.

### Changed

- `skills/bicameral-preflight/SKILL.md` Step 5.6 — judgment for contradiction-capture moves from the agent to the user via `AskUserQuestion` (Step 5.6.1). The agent no longer infers whether the prompt contradicts a surfaced decision; it asks the user (`supersede` / `keep_both` / `unrelated`) and acts mechanically on the answer (Step 5.6.2 — ingest + resolve_collision). The PostToolUse hook reminder now templates the disambiguation question rather than the bare ingest+resolve_collision sequence. Closes #175.
- `tests/e2e/run_e2e_flows.py::assert_flow_2a` — pass criterion changed from "ingest+resolve_collision fired" to "`AskUserQuestion` invoked with disambiguation shape after preflight surfaced ≥1 decision." The user-side response can't be driven in headless `claude -p`, so the testable signal is the question invocation. The mechanical capture (Step 5.6.2) only fires after a human answers and is exercised in interactive Claude Code sessions, not CI.

### Changed

### Fixed

### Schema

### Security

## v0.18.0 -- event vocabulary extension: ratify + supersede (#97)

Extends the existing Phase 1 JSONL emitter with two new event types so the shipped vocabulary matches the v0 architecture description. Team-mode replay now restores ratify and supersede outcomes alongside the pre-existing ingest/bind/link_commit events.
Expand Down
110 changes: 110 additions & 0 deletions adapters/code_locator.py
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,7 @@ def _ensure_initialized(self) -> None:
)

self._db = db
self._config = config
self._validate_tool = ValidateSymbolsTool(db, config)
self._neighbors_tool = GetNeighborsTool(db, config)
self._initialized = True
Expand Down Expand Up @@ -90,6 +91,115 @@ def get_neighbors(self, symbol_id: int) -> list[dict]:
results = self._neighbors_tool.execute({"symbol_id": symbol_id})
return [r.model_dump() for r in results]

# Hard cap on the number of caller-supplied seeds we expand. Mirrors the
# contract documented in #64: ≤10 input seeds × ≤max_neighbors_per_result
# neighbors per seed, so the worst-case response is still bounded even
# when the caller passes a large file_paths list. Tightens the cost
# envelope vs the per-config-only cap. Tunable via the PR's #64 lineage
# if telemetry shows we're losing recall.
_MAX_EXPANSION_SEEDS = 10

def expand_file_paths_via_graph(
self,
file_paths: list[str],
hops: int = 1,
) -> tuple[list[str], list[str]]:
"""Expand caller-supplied file paths to include 1-hop *import* graph
neighbors.

For each input file, look up its indexed symbols, fetch each
symbol's 1-hop ego graph filtered to **import edges only**, and
collect the file paths those neighbor symbols live in. The expanded
set is the union of inputs and neighbor files.

**Why imports only** (per #64): import is a *file-level* structural
dependency edge ("module A's contract is referenced by module B"),
which matches the granularity of the region-anchored decision
lookup. ``invokes`` / ``inherits`` / ``contains`` are *symbol-level*
edges that broaden the expansion to "any file whose symbols are
used by my file's symbols," which over-fires for the recall
contract this method backs. If telemetry surfaces real-world
contradictions that imports-only misses, widen the filter then —
not preemptively.

Returns ``(expanded, added)`` where ``expanded`` is the deduped
union (preserving caller order for inputs, then appending
newly-discovered neighbor files) and ``added`` is the list of file
paths NOT in the original input — the caller uses this to mark
expanded matches with lower confidence than direct pins.

Bounds (mirrors #64's spec):
- At most ``_MAX_EXPANSION_SEEDS`` (=10) input seeds are walked.
- For each seed, at most ``max_neighbors_per_result`` symbols are
walked; for each symbol, at most ``max_neighbors_per_result``
neighbors are inspected.
- Global cap on the added set is the product so the worst-case
response is still bounded for hub seeds.
Falls back gracefully (returns input unchanged + empty added list)
on any exception or if the symbol index is unavailable.

Used by ``handlers/preflight.py::_region_anchored_preflight`` to
lift the strict ``WHERE file_path IN $fps`` recall ceiling so the
contradiction-capture loop fires even when the caller picks a
structurally-near-but-not-exact file. See issue #173 (and the
superseded #64 for the imports-only design rationale).
"""
if not file_paths or hops < 1:
return list(file_paths), []
try:
self._ensure_initialized()
except Exception:
return list(file_paths), []

per_symbol_cap = self._config.max_neighbors_per_result
# Cap total NEW paths added by expansion. With ≤10 seeds and
# ≤per_symbol_cap neighbors each, the worst case is bounded.
global_cap = max(per_symbol_cap, per_symbol_cap * self._MAX_EXPANSION_SEEDS)

# Cap the number of input seeds we expand from. Caller can still pass
# more file_paths to the underlying ledger lookup — we just don't
# blow up the graph walk.
seeds = [fp for fp in file_paths if fp][: self._MAX_EXPANSION_SEEDS]

original_set = {fp for fp in file_paths if fp}
added_paths: list[str] = []
added_set: set[str] = set()

for fp in seeds:
try:
symbols = self._db.lookup_by_file(fp) or []
except Exception:
continue
for sym in symbols[:per_symbol_cap]:
if len(added_paths) >= global_cap:
break
sym_id = sym["id"]
try:
neighbors = self._db.get_ego_graph(sym_id, hops=hops) or []
except Exception:
continue
for n in neighbors[:per_symbol_cap]:
if len(added_paths) >= global_cap:
break
if (n.get("edge_type") or "") != "imports":
continue
nfp = (n.get("file_path") or "").strip()
if not nfp or nfp in original_set or nfp in added_set:
continue
added_set.add(nfp)
added_paths.append(nfp)
if len(added_paths) >= global_cap:
break

# Preserve caller order for the input prefix; append newly-added in
# discovery order.
expanded: list[str] = []
for fp in file_paths:
if fp and fp not in expanded:
expanded.append(fp)
expanded.extend(added_paths)
return expanded, added_paths

def neighbors_for(
self,
file_path: str,
Expand Down
2 changes: 1 addition & 1 deletion docs/preflight-failure-scenarios.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ Status legend:
| **M3** | skill | Internal acronym / jargon | Decision: *"Audit log captures every admin action..."* / Topic: `SOC2 compliance trail` | ⚪ |
| **M4** | skill | Ungrounded decision (no `binds_to`) — only surfaces if skill judges its feature group relevant from history | Decision (status=ungrounded): *"Permission checks always run server-side"* / Topic: `permission middleware client check` | ⚪ |
| **M5** | handler | Region-anchored miss — caller didn't pass `file_paths` | Topic: `update auth config` / `file_paths=[]` — handler returns no region matches; only HITL/guided can fire | ⚪ acknowledged caller responsibility; HITL still global |
| **M6** | handler | Transitive — decision pinned to a dependency of `file_paths` | Decision pinned to `auth/jwt.py` / `file_paths=["auth/login_handler.py"]` (imports `jwt`) | ❌ region lookup only sees the direct file |
| **M6** | handler | Transitive — decision pinned to a dependency of `file_paths` | Decision pinned to `auth/jwt.py` / `file_paths=["auth/login_handler.py"]` (imports `jwt`) | ✅ closed by #173/#174 — `_region_anchored_preflight` expands `file_paths` by 1 hop along import edges before the `binds_to` lookup; expansion-only matches surface with `confidence=0.7` and `sources_chained` adds `"graph"` |
| **M7** | handler | Dedup-key coarseness — current key is `(topic)`; same topic with changed `file_paths`, new HITL state, or a fresh ledger revision is silenced | (a) Topic re-asked after a relevant decision lands; (b) topic kept stable while `file_paths` shifts to a different region; (c) HITL condition resolves mid-window | ❌ open — broaden cache key to `(topic, normalized_file_paths, ledger_revision)` and invalidate on HITL change |
| **M8** | meta | Skill skips `bicameral.history()` despite non-empty ledger (skill-step adherence drift) | Caller LLM jumps straight to `bicameral.preflight` and never reads history | ⛔ skill-conformance, not handler-eval scope |
| **M9** | meta | `BICAMERAL_PREFLIGHT_MUTE` set, developer forgot it's on | Env var carried over from prior debug session | ⛔ intentional kill switch |
Expand Down
67 changes: 57 additions & 10 deletions handlers/preflight.py
Original file line number Diff line number Diff line change
Expand Up @@ -229,16 +229,26 @@ def _check_dedup(ctx, topic: str) -> bool:
async def _region_anchored_preflight(
ctx,
file_paths: list[str],
) -> list[DecisionMatch]:
) -> tuple[list[DecisionMatch], bool]:
"""file_paths (caller-supplied) → decisions pinned to those regions.

The caller LLM is responsible for resolving which files a proposed change
will touch — preflight then looks up decisions pinned to those files in
the ledger. Returns DecisionMatch objects with confidence=0.9 (direct
pin, not keyword match).
the ledger. Before the lookup, run a 1-hop code-graph expansion via the
code-locator adapter (#173): caller-LLM discovery is imprecise, and a
decision bound to ``app/src/lib/git/reorder.ts`` should still surface
when the caller passes the structurally-near ``app/src/ui/multi-commit-
operation/reorder.tsx``. Expansion is deterministic, no LLM in the path,
bounded by ``code_locator/config.py::max_neighbors_per_result``.

Returns ``(matches, expanded)`` where ``expanded`` is True iff the graph
expansion produced extra paths beyond the caller-supplied set, so the
caller can record ``"graph"`` in ``sources_chained``. Direct-pin matches
carry ``confidence=0.9``; matches surfaced only via expanded paths carry
``confidence=0.7``.
"""
if not file_paths:
return []
return [], False

# Dedup + normalize while preserving caller-supplied order.
seen_paths: set[str] = set()
Expand All @@ -249,16 +259,34 @@ async def _region_anchored_preflight(
seen_paths.add(fp)
ordered.append(fp)
if not ordered:
return []
return [], False

# Graph expansion. Defensive: code_graph may be absent (mock contexts) or
# the adapter may not implement the method (older deployments). Either
# case falls back to direct file_paths only.
direct_paths: set[str] = set(ordered)
expanded_paths = list(ordered)
expanded_only_paths: set[str] = set()
code_graph = getattr(ctx, "code_graph", None)
expander = getattr(code_graph, "expand_file_paths_via_graph", None) if code_graph else None
if expander is not None:
try:
expanded_paths, added_paths = expander(ordered, hops=1)
expanded_only_paths = set(added_paths)
except Exception as exc:
logger.debug("[preflight:region] graph expansion failed: %s", exc)
expanded_paths = list(ordered)
expanded_only_paths = set()

try:
raw = await ctx.ledger.get_decisions_for_files(ordered)
raw = await ctx.ledger.get_decisions_for_files(expanded_paths)
except Exception as exc:
logger.debug("[preflight:region] ledger region lookup failed: %s", exc)
return []
return [], False

matches: list[DecisionMatch] = []
seen_ids: set[str] = set()
surfaced_via_expansion = False
for d in raw:
did = d.get("decision_id", "")
if did in seen_ids:
Expand All @@ -280,14 +308,31 @@ async def _region_anchored_preflight(
if status not in ("reflected", "drifted", "pending", "ungrounded"):
status = "ungrounded" if not regions else "pending"

# Provenance: a decision is "directly pinned" if any of its bound
# code_regions live in a caller-supplied path; otherwise it was only
# reached via 1-hop graph expansion. Caller can de-prioritize the
# latter (lower confidence) without losing recall.
bound_paths = {
(r.get("file_path") or "").strip()
for r in (d.get("code_regions") or [])
if r and (r.get("file_path") or "").strip()
}
# Single-region decisions also have a top-level ``code_region`` (used
# above); include it in the provenance check.
if region_dict and (region_dict.get("file_path") or "").strip():
bound_paths.add(region_dict["file_path"].strip())
is_direct = bool(bound_paths & direct_paths) if bound_paths else not expanded_only_paths
if not is_direct:
surfaced_via_expansion = True

_sf = d.get("signoff") or {}
matches.append(
DecisionMatch(
decision_id=d.get("decision_id", ""),
description=d.get("description", ""),
status=status,
signoff_state=(_sf.get("state") if isinstance(_sf, dict) else None),
confidence=0.9,
confidence=0.9 if is_direct else 0.7,
source_ref=d.get("source_ref", ""),
code_regions=regions,
drift_evidence="",
Expand All @@ -298,7 +343,7 @@ async def _region_anchored_preflight(
)
)

return matches
return matches, surfaced_via_expansion


async def handle_preflight(
Expand Down Expand Up @@ -380,9 +425,11 @@ async def handle_preflight(
region_matches: list[DecisionMatch] = []
if file_paths:
try:
region_matches = await _region_anchored_preflight(ctx, file_paths)
region_matches, used_graph_expansion = await _region_anchored_preflight(ctx, file_paths)
if region_matches:
sources_chained.append("region")
if used_graph_expansion:
sources_chained.append("graph")
except Exception as exc:
logger.debug("[preflight] region lookup failed: %s", exc)

Expand Down
Loading
Loading