From 8069c08297f9979fc373c7df88093fa3139c341f Mon Sep 17 00:00:00 2001 From: jinhongkuan Date: Tue, 14 Apr 2026 01:16:50 -0400 Subject: [PATCH 1/2] =?UTF-8?q?chore:=20bump=20to=20v0.4.5=20=E2=80=94=20L?= =?UTF-8?q?1=20drift=20wiring=20fix?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fixes the two wiring bugs that made `pending → reflected` a no-op for freshly-ingested decisions: 1. ingest_payload now resolves HEAD from repo_path when no commit_hash is in the payload, computes a baseline content_hash for every grounded region, and derives the intent's initial status from that hash instead of hardcoding "pending". 2. handle_link_commit runs a repo-scoped backfill sweep that walks empty-hash regions and hands them to HashDriftAnalyzer, which now self-heals a missing baseline by adopting the current git state. Legacy ledgers self-heal on the next bicameral_status / link_commit call — no forced migration. Intent status aggregation picks the loudest value across regions (drifted > reflected > pending > ungrounded) so a single drifted region always raises an alarm. New regression test: tests/test_phase1_l1_wiring.py (4 cases covering ingest→reflected, edit→drifted, phantom range, and legacy backfill). Live-verified against Accountable-App-3.0: 8/8 decisions reflected after bulk Slack ingest, vs the prior 27/27 pending. 
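For reviewers, the "loudest wins" aggregation described above can be sketched in isolation. This is a minimal standalone sketch that mirrors `_aggregate_intent_status` and `_STATUS_PRIORITY` from ledger/adapter.py in this patch; the un-prefixed names below are illustrative only, and the adapter remains the source of truth.

```python
# Standalone sketch of the "loudest wins" aggregation. It mirrors
# _aggregate_intent_status in ledger/adapter.py; the names here are
# illustrative copies, not imports from the package.
STATUS_PRIORITY = {"drifted": 3, "reflected": 2, "pending": 1, "ungrounded": 0}


def aggregate_intent_status(region_statuses: list[str]) -> str:
    """Collapse per-region statuses into a single intent status."""
    if not region_statuses:
        # An intent with no linked regions has nothing to compare against.
        return "ungrounded"
    # Unknown statuses rank below every known one (priority -1).
    return max(region_statuses, key=lambda s: STATUS_PRIORITY.get(s, -1))


print(aggregate_intent_status(["reflected", "drifted", "pending"]))  # drifted
print(aggregate_intent_status(["pending", "reflected"]))  # reflected
print(aggregate_intent_status([]))  # ungrounded
```

Note that one reflected region outranks any number of pending ones, so a partially implemented multi-region intent reports `reflected` unless some region has drifted.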
Co-Authored-By: Claude Opus 4.6 (1M context) --- CHANGELOG.md | 79 +++++++++ handlers/link_commit.py | 11 ++ ledger/adapter.py | 133 ++++++++++++++- ledger/drift.py | 13 ++ ledger/queries.py | 29 ++++ pyproject.toml | 2 +- tests/test_phase1_l1_wiring.py | 294 +++++++++++++++++++++++++++++++++ 7 files changed, 553 insertions(+), 8 deletions(-) create mode 100644 CHANGELOG.md create mode 100644 tests/test_phase1_l1_wiring.py diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 00000000..76b2f991 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,79 @@ +# Changelog + +All notable changes to bicameral-mcp are tracked here. Format loosely follows +[Keep a Changelog](https://keepachangelog.com/en/1.1.0/). + +## 0.4.5 — 2026-04-14 + +### Fixed + +- **Ingest now stamps a baseline `content_hash` at HEAD for every grounded + region.** Previously `ingest_payload` only computed a hash when the caller + explicitly passed `commit_hash` in the payload, which the MCP `bicameral_ingest` + handler never did — so every bulk transcript ingest persisted empty hashes + and decisions were permanently stuck in `pending`. Now `ingest_payload` + resolves HEAD from `repo_path` when no commit_hash is supplied, computes a + baseline hash for every region, and derives the intent's initial status from + that hash. Freshly ingested decisions are born `reflected` when their + grounded code exists at HEAD. +- **Empty-hash regions from older ledgers are now self-healed.** `handle_link_commit` + runs a repo-scoped backfill sweep before the normal drift loop, walking any + code regions with an empty `content_hash` and handing them to + `HashDriftAnalyzer`, which adopts the current git state as the baseline and + flips the owning intents to `reflected`. No forced migration, no new tooling — + just call `bicameral_status` or any other handler that triggers a + `link_commit` and legacy ledgers heal themselves. 
+- **`HashDriftAnalyzer.analyze_region` self-heals missing baselines.** When + `stored_hash == ""` and the analyzer can compute a real hash at the requested + ref, it returns `reflected` with the new hash as the baseline instead of the + old `ungrounded` verdict. Used by both the new backfill path and the regular + drift sweep. + +### Added + +- Per-region status aggregation on intents: an intent with multiple code + regions now adopts the loudest status across them (drifted > reflected > + pending > ungrounded), so a single drifted region always raises an alarm + even if other regions still reflect. +- `SurrealDBLedgerAdapter.backfill_empty_hashes(repo_path, drift_analyzer=...)` + — public method to run the backfill sweep on demand. Idempotent and scoped + by repo, so multi-repo SurrealDB instances stay isolated. +- `ledger.queries.get_regions_without_hash(client, repo="")` — helper query + used by the backfill sweep to find legacy regions. +- New test module `tests/test_phase1_l1_wiring.py` with four regression + scenarios: ingest→reflected, edit→drifted, phantom range→not reflected, + and legacy empty-hash backfill→reflected. + +### Migration + +No manual action required. Existing ledgers backfill themselves on the next +`bicameral_status`, `bicameral_link_commit`, or any other tool that drives a +`link_commit`. Users whose files aren't touched by subsequent commits can +also simply re-run their original bulk ingest — v0.4.5 stamps hashes at ingest +time, so re-ingestion produces correct status for every grounded decision +without any further work. + +--- + +## 0.4.4 — 2026-04-13 + +- Submodule bump for grounding reuse + coverage loop (Phase 3 of the + code-locator drift fix plan). + +## 0.4.3 — 2026-04-12 + +- Few-shot `bicameral-ingest` skill update (`ff2eff7`). + +## 0.4.2 — 2026-04-11 + +- Skills bundle + CLAUDE.md context files. + +## 0.4.1 — 2026-04-10 + +- Ingest pipeline hardening: input contracts, payload normalization, freshness + guards. 
+ +## 0.4.0 — 2026-04-08 + +- Event-sourced collaboration + BicameralContext request-scoped snapshot + isolation. diff --git a/handlers/link_commit.py b/handlers/link_commit.py index 168f8ec8..446be527 100644 --- a/handlers/link_commit.py +++ b/handlers/link_commit.py @@ -69,6 +69,17 @@ async def _reground_ungrounded(ctx) -> int: async def handle_link_commit(ctx, commit_hash: str = "HEAD") -> LinkCommitResponse: + # Self-heal legacy regions with empty content_hash from pre-v0.4.5 + # ingests. Scoped to ctx.repo_path so multi-repo SurrealDB instances + # stay isolated; no-op once every region in this repo has a baseline. + try: + if hasattr(ctx.ledger, "backfill_empty_hashes"): + await ctx.ledger.backfill_empty_hashes( + ctx.repo_path, drift_analyzer=ctx.drift_analyzer, + ) + except Exception as exc: + logger.warning("[link_commit] backfill failed: %s", exc) + result = await ctx.ledger.ingest_commit( commit_hash, ctx.repo_path, drift_analyzer=ctx.drift_analyzer, ) diff --git a/ledger/adapter.py b/ledger/adapter.py index 2a58ee32..b3c6f59d 100644 --- a/ledger/adapter.py +++ b/ledger/adapter.py @@ -20,6 +20,7 @@ get_all_decisions, get_decisions_for_file, get_regions_for_files, + get_regions_without_hash, get_source_cursor, get_sync_state, get_undocumented_symbols, @@ -39,7 +40,7 @@ upsert_sync_state, ) from .schema import init_schema, migrate -from .status import compute_content_hash, get_changed_files, resolve_head +from .status import compute_content_hash, derive_status, get_changed_files, resolve_head logger = logging.getLogger(__name__) @@ -51,6 +52,24 @@ def _default_db_url() -> str: return f"surrealkv://{db_path}" +# Priority is "loudest wins" — any region drifting flags the whole intent as +# drifted so users see the alarm, even if other regions still reflect. 
+_STATUS_PRIORITY = {"drifted": 3, "reflected": 2, "pending": 1, "ungrounded": 0} + + +def _aggregate_intent_status(region_statuses: list[str]) -> str: + """Collapse per-region statuses to a single intent status. + + drifted > reflected > pending > ungrounded. A multi-region intent is + drifted if any of its regions drifted; else reflected if any region + reflects; else pending if any region is waiting on code that doesn't + exist yet; else ungrounded. + """ + if not region_statuses: + return "ungrounded" + return max(region_statuses, key=lambda s: _STATUS_PRIORITY.get(s, -1)) + + class SurrealDBLedgerAdapter: """Real SurrealDB-backed ledger adapter. @@ -265,6 +284,83 @@ async def ingest_commit(self, commit_hash: str, repo_path: str, drift_analyzer=N "undocumented_symbols": list(set(undocumented_symbols)), } + async def backfill_empty_hashes( + self, + repo_path: str, + drift_analyzer=None, + ) -> dict: + """Self-heal pre-v0.4.5 regions that were persisted with an empty + content_hash. Walks every code_region for ``repo_path`` that has no + stored hash, runs the configured drift analyzer (which, for + HashDriftAnalyzer, adopts the current source as the baseline and + returns reflected), and updates the region + its linked intents. + + Idempotent and scoped: regions already carrying a hash are ignored, + and regions belonging to other repos are left alone. Safe to call + on every link_commit — once every region is stamped, the query + returns an empty set and the sweep is a no-op. + """ + await self._ensure_connected() + + if drift_analyzer is None: + from .drift import HashDriftAnalyzer + drift_analyzer = HashDriftAnalyzer() + + legacy = await get_regions_without_hash(self._client, repo=repo_path) + if not legacy: + return {"healed": 0, "failed": 0} + + healed = 0 + failed = 0 + # Use HEAD as the backfill ref — that's "what the code looks like now," + # which is the only meaningful baseline when no prior hash exists. 
+ ref = resolve_head(repo_path) or "HEAD" + + for region in legacy: + region_id = region.get("region_id", "") + file_path = region.get("file_path", "") + symbol_name = region.get("symbol_name", "") + start_line = region.get("start_line", 0) + end_line = region.get("end_line", 0) + if not region_id or not file_path or not symbol_name: + failed += 1 + continue + + drift_result = await drift_analyzer.analyze_region( + file_path=file_path, + symbol_name=symbol_name, + start_line=start_line, + end_line=end_line, + stored_hash="", + repo_path=repo_path, + ref=ref, + source_context="", + ) + + # Only persist heals that produced a real baseline. If compute + # failed (file/range missing at ref), we leave the region alone + # so a future code move can still find it. + if not drift_result.content_hash: + failed += 1 + continue + + await update_region_hash(self._client, region_id, drift_result.content_hash, ref) + new_status = drift_result.status + for intent in (region.get("intents") or []): + if intent is None: + continue + intent_id = str(intent.get("id", "")) + if intent_id: + await update_intent_status(self._client, intent_id, new_status) + healed += 1 + + if healed or failed: + logger.info( + "[backfill] repo=%s healed=%d failed=%d", + repo_path, healed, failed, + ) + return {"healed": healed, "failed": failed} + # ── Extended: ingestion of CodeLocatorPayload ───────────────────────── async def ingest_payload(self, payload: dict) -> dict: @@ -277,10 +373,15 @@ async def ingest_payload(self, payload: dict) -> dict: repo = payload.get("repo", "") commit_hash = payload.get("commit_hash", "") + # Resolve HEAD once per ingest so every region hashes against the + # same baseline ref. Without this, bulk ingests that don't carry a + # commit_hash stamped empty hashes and decisions were born pending. 
+ effective_ref = commit_hash or resolve_head(repo) or "HEAD" intents_created = 0 symbols_mapped = 0 regions_linked = 0 ungrounded = [] + region_ids: list[str] = [] for mapping in payload.get("mappings", []): span = mapping.get("span", {}) @@ -324,6 +425,9 @@ async def ingest_payload(self, payload: dict) -> dict: ungrounded.append(description) continue + # Track per-region derived status so we can aggregate up to the intent. + region_statuses: list[str] = [] + for region_data in code_regions: symbol_name = region_data.get("symbol", "") file_path = region_data.get("file_path", "") @@ -331,13 +435,15 @@ async def ingest_payload(self, payload: dict) -> dict: if not symbol_name or not file_path: continue - # Compute content hash at commit time + # Compute content hash at the effective ref. Always — no gate. + # When repo is unset or the file/range isn't in git, this + # returns None and the region stays pending via derive_status. start_line = region_data.get("start_line", 0) end_line = region_data.get("end_line", 0) content_hash = "" - if commit_hash and repo: + if repo: content_hash = compute_content_hash( - file_path, start_line, end_line, repo, ref=commit_hash + file_path, start_line, end_line, repo, ref=effective_ref ) or "" # Create / update symbol node @@ -365,6 +471,14 @@ async def ingest_payload(self, payload: dict) -> dict: if not region_id: continue regions_linked += 1 + region_ids.append(region_id) + + # Baseline == actual at ingest time, so derive_status returns + # "reflected" when the hash was computable, "ungrounded" when + # the file/range isn't in git yet. 
+ region_statuses.append( + derive_status(content_hash, content_hash if content_hash else None) + ) # intent → symbol → code_region edges provenance = {} @@ -377,9 +491,13 @@ async def ingest_payload(self, payload: dict) -> dict: ) await relate_implements(self._client, symbol_id, region_id) - # Update intent status to pending (has regions now) - if intent_id and code_regions: - await update_intent_status(self._client, intent_id, "pending") + # Aggregate region statuses up to the intent. An intent is + # drifted if any region drifted; else reflected if any reflect; + # else pending if any pending; else ungrounded (all regions + # failed to link or no region could be hashed against HEAD). + if intent_id: + aggregated = _aggregate_intent_status(region_statuses) + await update_intent_status(self._client, intent_id, aggregated) return { "ingested": True, @@ -391,6 +509,7 @@ async def ingest_payload(self, payload: dict) -> dict: "ungrounded": len(ungrounded), }, "ungrounded_intents": ungrounded, + "region_ids": region_ids, } async def get_source_cursor( diff --git a/ledger/drift.py b/ledger/drift.py index 5fa50775..6adfae07 100644 --- a/ledger/drift.py +++ b/ledger/drift.py @@ -42,6 +42,19 @@ async def analyze_region( file_path, start_line, end_line, repo_path, ref=ref ) + # Self-heal legacy regions that were persisted before v0.4.5's + # baseline-stamping fix. If we have no stored hash but the code + # exists at ref, adopt actual_hash as the baseline and report + # reflected. Without this, regions ingested pre-v0.4.5 stay + # permanently pending/ungrounded even after reindex. 
+ if not stored_hash and actual_hash is not None: + return DriftResult( + status="reflected", + content_hash=actual_hash, + confidence=1.0, + explanation="", + ) + status = derive_status(stored_hash, actual_hash) new_hash = actual_hash or stored_hash diff --git a/ledger/queries.py b/ledger/queries.py index 3328839f..3e4cdfd6 100644 --- a/ledger/queries.py +++ b/ledger/queries.py @@ -602,6 +602,35 @@ async def get_regions_for_files( return rows +async def get_regions_without_hash( + client: LedgerClient, + repo: str = "", +) -> list[dict]: + """Return regions whose content_hash has never been stamped. + + Used by the backfill sweep in backfill_empty_hashes to self-heal legacy regions + from pre-v0.4.5 ledgers where ingest skipped hash computation. Filters + in Python rather than SurrealQL to avoid v2-vs-v3 NONE/NULL syntax drift. + + When ``repo`` is provided, only regions belonging to that repo are + returned — prevents backfill noise from unrelated ledgers in the same + SurrealDB database (common during multi-fixture test runs). 
+ """ + rows = await client.query( + """ + SELECT + type::string(id) AS region_id, + file_path, symbol_name, start_line, end_line, content_hash, repo, + <-implements<-symbol<-maps_to<-intent.{id, status, description} AS intents + FROM code_region + """, + ) + filtered = [r for r in (rows or []) if not r.get("content_hash")] + if repo: + filtered = [r for r in filtered if str(r.get("repo", "")) == repo] + return filtered + + # ── Helpers ─────────────────────────────────────────────────────────────── diff --git a/pyproject.toml b/pyproject.toml index c7eda575..04acf03e 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "hatchling.build" [project] name = "bicameral-mcp" -version = "0.4.4" +version = "0.4.5" description = "Decision ledger MCP server — ingests meeting transcripts, maps decisions to code, tracks drift" readme = "README.md" requires-python = ">=3.10" diff --git a/tests/test_phase1_l1_wiring.py b/tests/test_phase1_l1_wiring.py new file mode 100644 index 00000000..86cafe82 --- /dev/null +++ b/tests/test_phase1_l1_wiring.py @@ -0,0 +1,294 @@ +"""L1 drift ladder wiring regression tests (bicameral-mcp v0.4.5). + +These lock in the two fixes that make `pending → reflected` actually work: + +1. **Baseline hash stamping at ingest** — `ingest_payload` resolves HEAD and + computes a `content_hash` for every grounded region, then derives the + intent's initial status. Before v0.4.5 an unconditional `"pending"` was + persisted, and without a baseline hash the subsequent `link_commit` + sweep could never mark the decision reflected or drifted. + +2. **Empty-hash backfill sweep** — `handle_link_commit` walks regions with + empty `content_hash` scoped to the active repo and self-heals them via + `HashDriftAnalyzer`, which adopts the current git state as the baseline. + This rescues ledgers ingested on older versions. + +The tests use a throwaway git repo under tmp_path so they don't depend on +HEAD of the bicameral repo staying stable. 
+""" + +from __future__ import annotations + +import os +import subprocess +from pathlib import Path +from textwrap import dedent + +import pytest + +from adapters.ledger import get_ledger, reset_ledger_singleton +from context import BicameralContext +from handlers.decision_status import handle_decision_status +from handlers.link_commit import handle_link_commit + + +# ── Tiny git repo fixture ───────────────────────────────────────────── + + +def _git(cwd: Path, *args: str) -> str: + result = subprocess.run( + ["git", *args], + cwd=cwd, + capture_output=True, + text=True, + check=True, + ) + return result.stdout.strip() + + +def _write(path: Path, body: str) -> None: + path.parent.mkdir(parents=True, exist_ok=True) + path.write_text(dedent(body).lstrip("\n")) + + +def _seed_repo(root: Path, initial_body: str) -> str: + """Create a git repo at ``root`` with a single python file containing + a ``calculate_discount`` function. Returns the initial commit SHA.""" + root.mkdir(parents=True, exist_ok=True) + _git(root, "init", "-q", "-b", "main") + _git(root, "config", "user.email", "test@example.com") + _git(root, "config", "user.name", "Test") + _write(root / "pricing.py", initial_body) + _git(root, "add", "pricing.py") + _git(root, "-c", "commit.gpgsign=false", "commit", "-q", "-m", "seed") + return _git(root, "rev-parse", "HEAD") + + +def _commit_edit(root: Path, new_body: str, msg: str) -> str: + _write(root / "pricing.py", new_body) + _git(root, "add", "pricing.py") + _git(root, "-c", "commit.gpgsign=false", "commit", "-q", "-m", msg) + return _git(root, "rev-parse", "HEAD") + + +@pytest.fixture(autouse=True) +def _isolated_ledger(monkeypatch, tmp_path): + """Fresh in-memory ledger + tmp git repo per test.""" + monkeypatch.setenv("USE_REAL_LEDGER", "1") + monkeypatch.setenv("SURREAL_URL", "memory://") + repo_root = tmp_path / "repo" + _seed_repo( + repo_root, + """ + def calculate_discount(order_total): + if order_total >= 100: + return order_total * 0.10 + return 
0 + """, + ) + monkeypatch.setenv("REPO_PATH", str(repo_root)) + monkeypatch.chdir(repo_root) + reset_ledger_singleton() + yield repo_root + reset_ledger_singleton() + + +def _payload_for(symbol: str, intent: str, file_path: str, start: int, end: int, repo: str) -> dict: + """Build a minimal ingest payload with pre-resolved code regions. + + We skip auto-grounding so these tests stay laser-focused on the drift + wiring — the regions are handed in directly. + """ + return { + "query": intent, + "repo": repo, + "analyzed_at": "2026-04-14T00:00:00Z", + "mappings": [ + { + "span": { + "span_id": "p1-0", + "source_type": "transcript", + "text": intent, + "source_ref": "phase1-test", + }, + "intent": intent, + "symbols": [symbol], + "code_regions": [ + { + "file_path": file_path, + "symbol": symbol, + "type": "function", + "start_line": start, + "end_line": end, + "purpose": "pricing discount rule", + } + ], + "dependency_edges": [], + } + ], + } + + +def _ctx() -> BicameralContext: + return BicameralContext.from_env() + + +# ── Regression tests ────────────────────────────────────────────────── + + +@pytest.mark.phase2 +@pytest.mark.asyncio +async def test_ingest_of_existing_symbol_is_reflected_immediately(_isolated_ledger): + """A region grounded against code that exists at HEAD must be born + reflected — not stuck at pending because no baseline hash was stamped. 
+ """ + repo_root = _isolated_ledger + ledger = get_ledger() + await ledger.connect() + + payload = _payload_for( + symbol="calculate_discount", + intent="Apply 10% discount on orders of $100 or more", + file_path="pricing.py", + start=1, + end=4, + repo=str(repo_root), + ) + result = await ledger.ingest_payload(payload) + assert result["stats"]["regions_linked"] == 1, ( + f"Expected 1 region linked, got stats={result['stats']!r}" + ) + + ctx = _ctx() + status = await handle_decision_status(ctx, filter="all") + assert status.summary.get("reflected", 0) == 1, ( + f"Expected 1 reflected intent immediately after ingest, " + f"got summary={status.summary!r}" + ) + assert status.summary.get("pending", 0) == 0, ( + f"No intent should be pending when the grounded symbol exists at HEAD, " + f"got summary={status.summary!r}" + ) + + +@pytest.mark.phase2 +@pytest.mark.asyncio +async def test_edit_to_grounded_symbol_flips_to_drifted(_isolated_ledger): + """After ingest, a real code edit on the grounded region must flip the + decision's status from reflected to drifted on the next link_commit. 
+ """ + repo_root = _isolated_ledger + ledger = get_ledger() + await ledger.connect() + + payload = _payload_for( + symbol="calculate_discount", + intent="Apply 10% discount on orders of $100 or more", + file_path="pricing.py", + start=1, + end=4, + repo=str(repo_root), + ) + await ledger.ingest_payload(payload) + + ctx = _ctx() + pre = await handle_decision_status(ctx, filter="all") + assert pre.summary.get("reflected", 0) == 1 + + # Raise the discount threshold and rate: a real semantic change, not cosmetic + _commit_edit( + repo_root, + """ + def calculate_discount(order_total): + if order_total >= 500: + return order_total * 0.25 + return 0 + """, + "tighten discount thresholds", + ) + + await handle_link_commit(ctx, "HEAD") + post = await handle_decision_status(ctx, filter="all") + assert post.summary.get("drifted", 0) == 1, ( + f"Edit to grounded symbol must flip to drifted, " + f"got summary={post.summary!r}" + ) + assert post.summary.get("reflected", 0) == 0 + + +@pytest.mark.phase2 +@pytest.mark.asyncio +async def test_phantom_range_stays_pending(_isolated_ledger): + """A region whose line range doesn't exist at HEAD must stay + ungrounded/pending — not crash, not false-flip to reflected. + """ + repo_root = _isolated_ledger + ledger = get_ledger() + await ledger.connect() + + payload = _payload_for( + symbol="phantom_symbol", + intent="This decision points at code that was never written", + file_path="does_not_exist.py", + start=1, + end=20, + repo=str(repo_root), + ) + await ledger.ingest_payload(payload) + + ctx = _ctx() + status = await handle_decision_status(ctx, filter="all") + assert status.summary.get("reflected", 0) == 0, ( + f"Phantom range must not report as reflected, got {status.summary!r}" + ) + # Either ungrounded (region linked but no hashable content) or pending + # is acceptable — the important thing is the region never masquerades + # as reflected when nothing actually exists to hash. 
+ bad = [d for d in status.decisions if d.status == "reflected"] + assert not bad, f"Unexpected reflected decisions: {[d.description for d in bad]}" + + +@pytest.mark.phase2 +@pytest.mark.asyncio +async def test_backfill_heals_legacy_empty_hash_regions(_isolated_ledger): + """Simulate a pre-v0.4.5 ledger by clearing content_hash on an ingested + region and flipping its status to pending. The next link_commit must + run the backfill sweep and flip the decision to reflected without + needing the commit to touch the file. + """ + repo_root = _isolated_ledger + ledger = get_ledger() + await ledger.connect() + + payload = _payload_for( + symbol="calculate_discount", + intent="Apply 10% discount on orders of $100 or more", + file_path="pricing.py", + start=1, + end=4, + repo=str(repo_root), + ) + await ledger.ingest_payload(payload) + + # Force the region back into the pre-v0.4.5 shape: empty content_hash, + # intent back to pending. This is the state every Accountable-style + # bulk ingest left behind. + inner = getattr(ledger, "_inner", ledger) + client = inner._client + await client.query("UPDATE code_region SET content_hash = ''") + await client.query("UPDATE intent SET status = 'pending'") + + pre = await client.query("SELECT status FROM intent") + assert any(r.get("status") == "pending" for r in pre), ( + "Precondition: the intent should be pending before backfill runs" + ) + + ctx = _ctx() + # handle_decision_status auto-calls handle_link_commit, which runs the + # backfill sweep before the normal drift loop. No code edit, no commit. 
+ status = await handle_decision_status(ctx, filter="all") + assert status.summary.get("reflected", 0) == 1, ( + f"Backfill must self-heal pre-v0.4.5 empty-hash regions, " + f"got summary={status.summary!r}" + ) + assert status.summary.get("pending", 0) == 0 From fe64e91ffc899f41216485adc313be8d167fbb65 Mon Sep 17 00:00:00 2001 From: jinhongkuan Date: Tue, 14 Apr 2026 01:24:24 -0400 Subject: [PATCH 2/2] docs(readme): restructure + drift ladder roadmap MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Hoist Collaboration Modes and Tool Composition above "How It Works" so the workflow narrative lands before the implementation detail. - Rewrite Tool Composition around a single running example (Sprint 14 checkout planning → discount rule → drift catch at PR review) showing how ingest / search / drift compose across a decision's lifecycle. - Drop CodeGenome mentions from the roadmap — superseded by the drift ladder plan and the planned CocoIndex re-grounding workstream. - Add ASCII art header with the decisions↔code chamber diagram. - Remove the Roadmap section entirely. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- README.md | 189 +++++++++++++++++++++++++++++------------------------- 1 file changed, 103 insertions(+), 86 deletions(-) diff --git a/README.md b/README.md index dcee146a..a900c5dd 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,22 @@ +``` + ██████╗ ██╗ ██████╗ █████╗ ███╗ ███╗███████╗██████╗ █████╗ ██╗ + ██╔══██╗██║██╔════╝██╔══██╗████╗ ████║██╔════╝██╔══██╗██╔══██╗██║ + ██████╔╝██║██║ ███████║██╔████╔██║█████╗ ██████╔╝███████║██║ + ██╔══██╗██║██║ ██╔══██║██║╚██╔╝██║██╔══╝ ██╔══██╗██╔══██║██║ + ██████╔╝██║╚██████╗██║ ██║██║ ╚═╝ ██║███████╗██║ ██║██║ ██║███████╗ + ╚═════╝ ╚═╝ ╚═════╝╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝ + + ┌────────────────┐ ┌────────────────┐ + │ DECISIONS │ ◀── drift ──▶ │ CODE │ + │ what was │ detection │ what actually │ + │ said │ │ shipped │ + └────────┬───────┘ └────────┬───────┘ + │ ┌──────────┐ │ + └──────────▶│ ledger │◀─────────────┘ + └──────────┘ + local · deterministic +``` + # Bicameral MCP [![PyPI version](https://img.shields.io/pypi/v/bicameral-mcp)](https://pypi.org/project/bicameral-mcp/) @@ -14,15 +33,14 @@ Bicameral MCP is a local-first [Model Context Protocol](https://spec.modelcontex ## Table of Contents - [The Problem](#the-problem) +- [Collaboration Modes](#collaboration-modes) +- [Tool Composition](#tool-composition) - [How It Works](#how-it-works) - [Architecture](#architecture) - [Quickstart](#quickstart) - [MCP Tools Reference](#mcp-tools-reference) -- [Tool Composition](#tool-composition) - [Testing](#testing) -- [Collaboration Modes](#collaboration-modes) - [Configuration](#configuration) -- [Roadmap](#roadmap) - [Contributing](#contributing) - [License](#license) @@ -44,6 +62,88 @@ Bicameral's core value is **drift detection** -- knowing that a decision made th --- +## Collaboration Modes + +Bicameral runs in one of two modes, set during `bicameral-mcp setup` or in `.bicameral/config.yaml`: + +| | Solo (default) | Team | +|---|---|---| +| **Who** | Individual 
testing or evaluation | Any mix of roles -- devs, PMs, designers | +| **Data** | Local DB only | Local DB + git-committed event files | +| **Shared via** | Nothing -- fully isolated, zero impact on teammates | Normal `git push` / `git pull` | +| **Merge conflicts** | N/A | Zero -- per-user directories, append-only files | + +**Solo mode** is ideal for trying Bicameral without affecting your team's workflow. All data stays in a gitignored local DB -- no event files, no commits, no side effects. Switch to team mode when you're ready to share. + +**Team mode** enables cross-role collaboration through git. A PM ingests a PRD and sprint transcript; when a developer pulls, `bicameral.search` surfaces those decisions as coding context and `bicameral.status` shows what still needs implementation. The PM never touches the code; the developer never sits through the meeting. The decision graph is the handoff. + +``` +.bicameral/ +├── events/ ← committed to git (shared decisions) +│ ├── pm@co.com/ ← PM's ingested PRDs and transcripts +│ └── dev@co.com/ ← developer's commit syncs +├── config.yaml ← committed (mode: solo | team) +└── local/ ← gitignored (materialized state) +``` + +**Switching modes:** Set `mode: team` or `mode: solo` in `.bicameral/config.yaml`. No data migration needed. + +--- + +## Tool Composition + +The nine tools compose into three workflows that follow the natural lifecycle of a decision — **captured in a meeting, pulled as context during coding, checked at review time.** Each workflow below uses the same running example (a checkout-flow sprint) so you can see a single decision move through the pipeline. + +### 1. Ingestion — after a meeting + +> **Scenario:** Your PM wraps a 30-minute sprint planning in `#product-planning`. The transcript contains three decisions. You paste it into Claude and say "ingest this." 
+ +```jsonc +// bicameral.ingest +{ + "source": "slack", + "title": "Sprint 14 Planning — 2026-03-12", + "decisions": [ + { "title": "Apply 10% discount on orders over $100", + "description": "Marketing confirmed at offsite. No upper bound." }, + { "title": "Cache user sessions in Redis, not local memory", + "description": "Arch review: local memory breaks horizontal scaling." }, + { "title": "Rate-limit checkout to 100 req/min per user", + "description": "Legal/compliance ask. Not yet built." } + ] +} +``` + +**Outcome.** The discount rule and the Redis session decision anchor to real symbols (`pricing/discount.py:DiscountService.calculate`, `auth/session_store.py:SessionStore.put`) and are born `reflected`. Auto-grounding can't find code for the rate-limit rule — because it hasn't been written yet — so it lands as `ungrounded`. The ledger now knows a decision exists with no corresponding code, and the next `bicameral.status` call will show exactly that. + +--- + +### 2. Pre-flight — before writing new code + +> **Scenario:** A dev picks up the ticket "add rate limiting to checkout." Before writing a single line, they ask Claude for context. + +```jsonc +// bicameral.search +{ "query": "rate limit checkout", "max_results": 5 } +``` + +**Outcome.** Before writing any code, the dev sees the prior rate-limit decision *with its compliance rationale*, learns that it's still `ungrounded` (so they're the first implementer), and discovers an adjacent `pricing/discount.py:DiscountService.calculate` region their new code will need to coexist with. No re-litigating a decided rule, no Slack archaeology, no ambushing the PM in standup tomorrow. + +--- + +### 3. Code review — before merging + +> **Scenario:** Three weeks later, a different dev opens PR #241 with a 50-line diff touching `pricing/discount.py`. Reviewer asks Claude "any drift in this file?" 
+
+```jsonc
+// bicameral.drift
+{ "file_path": "pricing/discount.py", "use_working_tree": false }
+```
+
+**Outcome.** The reviewer learns that `DiscountService.calculate:42-67` has drifted from the Sprint 14 Planning decision — threshold raised $100 → $500, rate lowered 10% → 5%. Either the change is intentional, in which case a new decision must be ingested before merge, or it's accidental and gets reverted. The conversation happens at PR time, not three sprints later in an incident post-mortem.
+
+---
+
 ## How It Works
 
 ### Status Derivation Model
@@ -406,49 +506,6 @@ No LLM provider credentials needed -- all retrieval is deterministic.
 
 ---
 
-## Tool Composition
-
-The nine tools are designed to compose into three primary workflows:
-
-### Pre-flight (before coding)
-
-```
-bicameral.search("add rate limiting to checkout")
-  --> surfaces prior constraints, related decisions, and their code regions
-  --> auto-syncs ledger to HEAD before returning results
-```
-
-Use this to check for prior art and constraints before writing new code. Prevents CONSTRAINT_LOST and REPEATED_EXPLANATION.
-
-### Code review (before merging)
-
-```
-bicameral.drift("payments/processor.py")
-  --> surfaces all decisions touching symbols in this file
-  --> flags any where the code has diverged from recorded intent
-
-bicameral.status(filter="drifted")
-  --> full drift report across the entire codebase
-```
-
-Use this in pull request review to catch unintentional drift. The `use_working_tree` parameter controls whether comparison is against disk (pre-commit) or HEAD (PR review).
-
-### Ingestion (after a meeting)
-
-```
-bicameral.ingest(payload)
-  --> extracts intents, auto-grounds to code symbols
-  --> advances source cursor for incremental sync
-
-bicameral.link_commit("HEAD")
-  --> syncs latest commit state into the ledger
-
-bicameral.status(since="2025-03-20")
-  --> shows what's reflected vs. pending since the meeting
-```
-
----
-
 ## Testing
 
 Bicameral has 42 test files organized into three phases, all using real adapters with `SURREAL_URL=memory://` (embedded, in-process SurrealDB -- no external services required).
@@ -470,34 +527,6 @@ Phase 3 tests produce JSON artifacts (`test-results/e2e/`) with full tool respon
 
 ---
 
-## Collaboration Modes
-
-Bicameral runs in one of two modes, set during `bicameral-mcp setup` or in `.bicameral/config.yaml`:
-
-| | Solo (default) | Team |
-|---|---|---|
-| **Who** | Individual testing or evaluation | Any mix of roles -- devs, PMs, designers |
-| **Data** | Local DB only | Local DB + git-committed event files |
-| **Shared via** | Nothing -- fully isolated, zero impact on teammates | Normal `git push` / `git pull` |
-| **Merge conflicts** | N/A | Zero -- per-user directories, append-only files |
-
-**Solo mode** is ideal for trying Bicameral without affecting your team's workflow. All data stays in a gitignored local DB -- no event files, no commits, no side effects. Switch to team mode when you're ready to share.
-
-**Team mode** enables cross-role collaboration through git. A PM ingests a PRD and sprint transcript; when a developer pulls, `bicameral.search` surfaces those decisions as coding context and `bicameral.status` shows what still needs implementation. The PM never touches the code; the developer never sits through the meeting. The decision graph is the handoff.
-
-```
-.bicameral/
-├── events/              ← committed to git (shared decisions)
-│   ├── pm@co.com/       ← PM's ingested PRDs and transcripts
-│   └── dev@co.com/      ← developer's commit syncs
-├── config.yaml          ← committed (mode: solo | team)
-└── local/               ← gitignored (materialized state)
-```
-
-**Switching modes:** Set `mode: team` or `mode: solo` in `.bicameral/config.yaml`. No data migration needed.
-
----
-
 ## Configuration
 
 | Variable | Default | Description |
@@ -510,18 +539,6 @@ All data is stored locally. The embedded SurrealDB instance runs in-process -- n
 
 ---
 
-## Roadmap
-
-### CodeGenome Identity Layer
-
-The current system grounds decisions via symbol names and file paths. This works well for stable codebases, but location-based anchoring breaks when code is renamed, moved, or heavily refactored.
-
-The next major evolution -- **CodeGenome** -- replaces location-based anchoring with identity-based grounding: structural signatures and behavioral profiles that persist across renames, moves, and AI-driven rewrites. This resolves what we call the **Auto-Grounding Problem**: intent anchored to identity rather than location.
-
-Where Bicameral today maps `intent --> symbol_name --> file:line`, CodeGenome will map `intent --> structural_identity --> any_location`, making the decision graph resilient to large-scale codebase reorganization.
-
----
-
 ## Contributing
 
 Contributions are welcome. To get started: