BicameralAI · jinhongkuan · Apr 26, 2026 · Apr 25, 2026 · Apr 25, 2026 · Apr 25, 2026
@@ -53,6 +53,12 @@ The handler returns a `DoctorResponse` with:
 - `ledger_summary` — `DoctorLedgerSummary` with repo-wide `total`, `drifted`, `pending`, `ungrounded`, `reflected` counts. Populated on branch scope only.
 - `action_hints` — merged from whichever sub-scan produced them. Same intensity-gated semantics as every other skill (`guided_mode` controls `blocking`).
 
+### Per-entry advisory fields (read-path only, never gate behavior)
+
+- **`DriftEntry.cosmetic_hint: bool`** (on every entry inside `file_scan.decisions` and `branch_scan.decisions`). True when the HEAD-to-working-tree diff for that region is provably whitespace-only per the strict tree-sitter classifier (`ledger/ast_diff.is_cosmetic_change`). Never affects status; the entry stays drifted and the user must still address it. Use as a render-time tag (e.g. *"cosmetic edit, please confirm"*) — do not use it to suppress drift.
+- **`pending_grounding_checks[].original_lines: [start, end]`** when `reason == "symbol_disappeared"` (visible inside `file_scan.sync_status.pending_grounding_checks` and the equivalent under branch scope). Lets the caller LLM run `git show <prev_ref>:<file_path>` over those lines to inspect the symbol's prior position before deciding what to do. Strictly informational.
+- **`sync_status.verification_instruction`** is now built per response based on which `pending_*` payloads fired. For `pending_grounding_checks` with `reason == "symbol_disappeared"`, the text is **INFORMATIONAL ONLY** and explicitly forbids calling `bicameral.bind` on the new location (it would create duplicate-binding state under the N:N `binds_to` relation). Until V2 ships atomic rebind, the doctor skill must not synthesize a bind CTA for relocation cases. For `reason == "ungrounded"`, the bind CTA is safe and remains in the instruction text — render it as guidance.
+
 ## How to render
 
 ### Scope = file

@@ -34,6 +34,8 @@ bicameral.history(
 )
 ```
 
+The response also carries an optional `sync_metrics` (`{sync_catchup_ms, barrier_held_ms}`) observability field for the catch-up time spent inside `ensure_ledger_synced`. **Skip rendering it** — these are server-side latency numbers, not user-visible signal. Log them if you're profiling, otherwise ignore.
+
 ## How to present
 
 Group decisions by `HistoryFeature`. For each group:

@@ -106,6 +106,12 @@ The handler runs `bicameral.search` internally, gates on the user's
 `guided_mode` setting, conditionally chains to `bicameral.brief`, and
 returns a `PreflightResponse` with a `fired: bool` field.
 
+The response also carries an optional `sync_metrics`
+(`{sync_catchup_ms, barrier_held_ms}`) observability field for the
+catch-up time spent in `ensure_ledger_synced`. **Skip rendering it** —
+these are server-side latency numbers, not user-visible signal. Log
+them if you're profiling, otherwise ignore.
+
 ### 2.5 Render session-start banner if present
 
 Before evaluating `response.fired`, check `response.session_start_banner`.
@@ -180,6 +186,14 @@ Categories with no ask-findings are silently skipped. If every
 finding in every category is mechanical, produce NO output (same as
 `fired=false` — silent).
 
+**Cosmetic drift rule**: if a `drifted` entry has `cosmetic_hint=true`,
+classify it as **mechanical** regardless of guided mode. The server has
+verified via AST comparison that the change is whitespace-only and
+semantically inert — the stored intent is still intact. Auto-resolve
+silently; do NOT add it to the drift ask-queue and do NOT emit a
+blocking hint. Render it with `~` prefix (not `⚠ DRIFTED:`) if you
+render it at all — see the template in Step 5.
+
 ### 5. Render the surfaced block
 
 When at least one ask-finding exists, surface the response using this
@@ -200,6 +214,11 @@ format. Lead with the `(bicameral surfaced)` attribution line.
     Source: <source_ref>
     Drift evidence: <drift_evidence verbatim>
 
+  ~ REFORMATTED: <decision description>      ← cosmetic_hint=true only
+    <file_path>:<symbol>:<lines>
+    Source: <source_ref>
+    (whitespace-only change — intent intact, no action needed)
+
 ⚠ N divergent decision pair(s) — pick a winner before continuing:
   • <symbol> (<file_path>): <summary>
 
@@ -310,8 +329,9 @@ For every finding this skill surfaces, classify first:
 
 - **mechanical** — one obvious correct answer (e.g., renamed symbol
   with identical signature; a decision whose code moved but semantics
-  are intact). Auto-apply the resolution silently. Do NOT ask the
-  user.
+  are intact; a `drifted` entry with `cosmetic_hint=true` — AST
+  comparison confirmed whitespace-only change). Auto-apply the
+  resolution silently. Do NOT ask the user.
 - **ask** — reasonable people could disagree (e.g., drifted behavior
   where the old decision may still be valid; divergent decisions where
   no clear winner exists). Emit ONE question per finding, using the

@@ -53,7 +53,7 @@ The handler returns a `ScanBranchResponse` with:
 - `base_ref` / `head_ref` — the resolved refs that were diffed
 - `sweep_scope` — `"range_diff"` (default, good), `"head_only"` (base was unreachable — fell back to HEAD-only scope, surface to user), or `"range_truncated"` (range exceeded the 200-file cap; the scan ran on the first 200 files, rest need a separate pass)
 - `range_size` — number of files the sweep covered
-- `decisions` — deduped list of `DriftEntry` across all files (each decision shows up once even if it touches multiple files)
+- `decisions` — deduped list of `DriftEntry` across all files (each decision shows up once even if it touches multiple files). Drifted entries may carry `cosmetic_hint=true` when the HEAD-to-working-tree diff for that region is provably whitespace-only per the strict tree-sitter classifier (`ledger/ast_diff.is_cosmetic_change`). The hint is **advisory metadata only** — it never gates drift surfacing or status, and the entry stays in the drifted bucket regardless. Treat it as a render-time signal: a cosmetic-hinted drift is still drift the user must address.
 - `files_changed` — the file paths that were swept
 - `drifted_count` / `pending_count` / `ungrounded_count` / `reflected_count`
 - `undocumented_symbols` — union across all files

@@ -29,6 +29,8 @@ Pre-flight check before coding — surface past decisions relevant to what you'r
 
 $ARGUMENTS — the feature, task, or area to search for prior decisions about
 
+The response also carries an optional `sync_metrics` (`{sync_catchup_ms, barrier_held_ms}`) observability field for the catch-up time spent in the implicit `link_commit` that runs before search. **Skip rendering it** — these are server-side latency numbers, not user-visible signal. Log them if you're profiling, otherwise ignore.
+
 ## Action Hint Contract (v0.4.10+)
 
 The response always includes an `action_hints` list. Two intensities,

@@ -3,6 +3,200 @@
 All notable changes to bicameral-mcp are tracked here. Format loosely follows
 [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
+## Unreleased — desync optimization V1 — measurement + read-path advisory
+
+V1 of a two-part desync-correctness initiative. V1 ships measurement
+infrastructure, a strict-whitelist cosmetic-change classifier, relocation
+context enrichment, and a canonical 13-scenario regression matrix —
+**without touching any destructive write path**. V2 (separate effort,
+design captured in `docs/v2-desync-optimization-guide.md` with nine rounds of Codex
+review) tackles the destructive-path overhaul: atomic rebind, baseline
+advancement with full CAS, schema migration v6, append-only verdict
+history.
+
+V1 introduces zero new mutating capabilities. Every change is one of:
+read-only measurement, additive contract field, pure function, test
+coverage, or a surgical bug fix to an already-shipped path. The plan,
+phase breakdown, V2 deferred items, and Codex review parking lot live in
+`docs/v2-desync-optimization-guide.md`.
+
+### Added — `tests/bench_drift.py` (A1)
+
+Drift benchmark harness. Seeds 100 decisions across 25 files via
+tree-sitter `extract_symbols` (no BM25 index build required), times
+`handle_search_decisions`, `handle_detect_drift`, `handle_link_commit`
+under a `memory://` ledger, writes
+`test-results/bench/drift_baseline.json` plus a stdout summary.
+Marked `@pytest.mark.bench` so default test runs skip it; run via
+`pytest tests/bench_drift.py -v -m bench -s`.
+
+Baseline on Apple Silicon (post-rebase, surrealdb 2.0.0):
+
+| handler            | p50 (ms) | p95 (ms) | max (ms) |
+|--------------------|---------:|---------:|---------:|
+| search_decisions   |      9.2 |     10.4 |     11.0 |
+| detect_drift       |     14.2 |     15.5 |     16.4 |
+| link_commit (warm) |      7.3 |      8.0 |      8.3 |
+
+All 50–185× under the V2 perf targets in `PLAN.md:83`
+(`search_decisions < 2s`, `detect_drift < 1s`).
+
+### Added — `handlers/sync_middleware.repo_write_barrier(ctx)` (A2-light)
+
+Per-repo `asyncio.Lock` async context manager backed by a module-level
+`dict[repo_path, asyncio.Lock]`. `handle_bind` wraps its body via a thin
+`_do_bind` inner function. Different repos run concurrently; same repo
+serializes. Lazy guard-lock construction avoids the "bound to wrong
+loop" pitfall across test event loops. Yields a mutable `BarrierTiming`
+holder whose `held_ms` is populated on exit (including on exceptions).
+
+Deliberately narrow scope: does NOT protect `resolve_compliance` or
+cross-process writers — both are V2 scope (V1 plan §5.2, §5.5).
+
+### Added — `contracts.SyncMetrics` (A3)
+
+```python
+class SyncMetrics(BaseModel):
+    sync_catchup_ms: float | None = None
+    barrier_held_ms: float | None = None
+```
+
+Attached as `sync_metrics: SyncMetrics | None = None` to
+`SearchDecisionsResponse`, `PreflightResponse`, `HistoryResponse`,
+`BindResponse`. Purely additive, non-breaking. Each handler times its
+own sync call locally so nested calls (e.g. preflight chaining to
+search_decisions) don't step on each other's metrics.
+
+### Added — `ledger/ast_diff.is_cosmetic_change(before, after, lang)` (B1)
+
+Strict-whitelist tree-sitter classifier returning `True` only when two
+snippets differ by inter-token whitespace alone. Compares a recursive
+`(node.type, child_sigs | leaf_bytes)` signature; identifier renames,
+comment edits (incl. `# type: ignore` / `# noqa` / `// @ts-ignore` /
+build tags / lint pragmas), docstring edits, trailing-comma changes,
+string-literal changes, import reorders, and any AST shape change all
+return `False`. Reuses `code_locator.indexing.symbol_extractor._get_parser`
+so the cosmetic detector and symbol indexer can never silently disagree
+on supported languages: python, javascript, typescript, java, go, rust,
+c_sharp (plus jsx → javascript and tsx → typescript via
+`LANGUAGE_FALLBACK`). Unsupported langs, parse failures, and trees with
+`has_error` all fail safe to `False`.
+
+False negatives (real cosmetic changes routed unbiased to L3 in V2) are
+cheap; false positives (semantics-affecting changes mislabeled cosmetic)
+bias future L3 prompts toward "looks fine" — exactly the failure mode
+the strict whitelist prevents.
+
+### Added — `DriftEntry.cosmetic_hint: bool = False` (B2)
+
+Populated by `handlers.detect_drift._enrich_with_cosmetic_hints` after
+the pure `raw_decisions_to_drift_entries` mapping (IO encapsulated
+outside the pure function). Read-path advisory ONLY — never mutates
+`content_hash`, never gates drift surfacing or status, never advances
+baseline. Five fail-safe paths leave the hint `False`: non-drifted
+entry, equal HEAD/working-tree bytes, unsupported file extension,
+invalid line range, exception during classifier.
+
+Source comparison: HEAD bytes (via `ledger.status.get_git_content` ref
+`"HEAD"`) vs working-tree bytes (ref `"working_tree"`), sliced to the
+region's `(start_line, end_line)`. Language resolved from file extension
+via `code_locator.indexing.symbol_extractor.EXTENSION_LANGUAGE`.
+
+### Added — `pending_grounding_checks[].original_lines` (D1)
+
+For `reason='symbol_disappeared'` entries, the payload now carries
+`original_lines: [start_line, end_line]` so the caller LLM can run
+`git show <prev_ref>:<file_path>` to inspect the symbol's prior
+position when locating its new home. Strictly informational — no
+actionable workflow. Single-line addition in `ledger/adapter.py`.
+
+### Added — `tests/test_desync_scenarios.py` (F1)
+
+Canonical regression matrix for the 13 desync scenarios from the Notion
+"Auto-Grounding Problem" catalog, routed through the real handler
+layer per the Apr 8 PR #84 lesson (tests bypassing handlers miss
+post-ingest hooks). Self-contained tmp git-repo fixture per test.
+
+**Scorecard**: 12 PASS, 1 XFAIL.
+
+| # | Scenario | V1 outcome |
+|---|---|---|
+| 1 | New decision, matching code exists | ✅ ungrounded → caller binds |
+| 2 | Code changed after grounded | ✅ pending + `pending_compliance_check` |
+| 3 | Code deleted after grounded | ✅ symbol_disappeared |
+| 4 | Symbol renamed in file | ✅ symbol_disappeared with `original_lines` |
+| 5 | Symbol moved cross-file | ✅ symbol_disappeared |
+| 6 | Code added later | ✅ caller binds explicitly |
+| 7 | Cold start, no matching code | ✅ stays ungrounded |
+| 8 | Drifted intent → atomic re-ground | ⏸ XFAIL (V2 §8 D2 — `bicameral_rebind` with old-binding CAS) |
+| 9 | Intent description supersession | ✅ re-ingest succeeds |
+| 10 | N decisions share a symbol | ✅ both surface |
+| 11 | No server-side BM25 grounding (post-v0.6.0) | ✅ stays ungrounded |
+| 12 | Line-shift edit | ✅ no spurious drift (`resolve_symbol_lines` self-heals) |
+| 13 | `[Open Question]` prefix | ✅ ingested as gap |
+
+### Changed — `handlers/link_commit._build_verification_instruction()`
+
+Splits the v0.6.4 monolithic `_VERIFICATION_INSTRUCTION` into three
+composable parts so the response text is conditional on which
+`pending_*` payloads actually fired:
+
+- `pending_compliance_checks` present → resolve_compliance CTA.
+- `pending_grounding_checks` with `reason='ungrounded'` →
+  `Grep/Read → validate_symbols / extract_symbols → bicameral.bind` CTA
+  (safe — no prior binding to retire, no duplicate-binding risk).
+- `pending_grounding_checks` with `reason='symbol_disappeared'` →
+  **explicit "INFORMATIONAL ONLY — do NOT call bicameral.bind on the
+  new location" warning** citing the duplicate-binding hazard under
+  the N:N `binds_to` relation. Atomic rebind ships in V2.
+
+Addresses Codex pass-10 #2 + pass-12 #2: the v0.6.4 monolithic CTA
+inadvertently routed relocation cases through the unsafe bind path.
+The V1 split removes that without reducing the safe CTA for ungrounded.
+
+### Fixed — `ledger/adapter.py` ungrounded grounding-check `decision_id`
+
+`pending_grounding_checks` for ungrounded decisions emitted empty
+`decision_id` because the consumer read `d.get("id", "")` from
+`get_all_decisions(filter="ungrounded")`, but that query aliases the
+field to `decision_id`. Callers had no handle to bind against.
+Surfaced by V1 F1 regression coverage; existing
+`test_pending_grounding_checks_for_ungrounded_decisions` regression
+only asserted `len > 0`, missing the empty-ID bug. Read `decision_id`
+first, fall back to `id` for forward compatibility.
+
+### Tests
+
+75 passed, 1 xfailed in 7.11s after V1. Zero regressions on the
+v0.6.3/v0.6.4/0.6.4-bump rebase, and the SDK-2.0 idempotency-catch
+issue I had originally pinned around (`surrealdb<2.0.0`) is fixed
+properly upstream by `66796ef`, so V1 ships against
+`surrealdb>=2.0.0` directly.
+
+### Deferred to V2
+
+Captured in full in `docs/v2-desync-optimization-guide.md` (design target with
+nine rounds of Codex review) and summarized in
+`docs/v2-desync-optimization-guide.md` §4–§5:
+
+- A0 — atomic SurrealQL block primitive (Python SDK doesn't support
+  `begin_transaction()` in embedded mode).
+- A2a — full sync barrier (sync-token CAS + region fingerprint at
+  commit time).
+- C0 / C0a / C1 — schema migration v5→v6 (per-binding baseline
+  ownership, tombstone fields, append-only `compliance_verdict_history`,
+  full-CAS cache key, traversal filtering).
+- C2 — `bicameral_judge_drift` + `record_compliance_verdict` with
+  five-field CAS (incl. binding-state token).
+- C3 — `pending_compliance_checks` from `detect_drift` (cache-aware).
+- B3 — `bicameral_advance_baseline` (only after fresh L3 `compliant`
+  verdict matching full CAS).
+- D2 — `bicameral_rebind` with old-binding CAS; closes scenario 8.
+- Migration of the `handlers/resolve_compliance.py` hard-delete +
+  `handlers/ingest.py` auto-chained `handle_judge_gaps` to the
+  tombstone + CAS contract — hard prerequisite before V2 destructive
+  work ships.
+
 ## 0.7.0 — 2026-04-24 — Accountable North Star: proposal state + signoff schema
 
 Every decision now has provenance: who proposed it, who ratified it, in which session.

@@ -80,12 +80,27 @@ Code search is caller-owned: Claude Code / Cursor / etc. use their native Grep/R
 - [x] Zero active mocks
 - [x] Full E2E verified
 - [x] GitHub Actions CI replaces pre-push git hook
+- [x] Performance: `search_decisions` < 2s, `detect_drift` < 1s on 100+ decisions — measured by V1 A1 (`tests/bench_drift.py`) at p95 = 10.4 ms / 15.5 ms, 55–185× under target
 
 ### Remaining
-- [ ] Performance: `search_decisions` < 2s, `detect_drift` < 1s on repo with 100+ decisions
-- [ ] LLM drift judge: wire `claude-haiku-4-5` for changed-region comparison in `detect_drift`
+- [ ] LLM drift judge: wire `claude-haiku-4-5` for changed-region comparison in `detect_drift` (V2 — `docs/desync-optimization.md` §8 C2)
 - [ ] All 4 tools demoed live in Claude Code (MCP connected)
 
+## Desync Optimization V1 — DONE (read-path advisory + measurement)
+
+Plan: `docs/desync-optimization-v1-plan.md`. V2 design target with full
+Codex-review history: `docs/desync-optimization.md`. V1 introduces zero
+new mutating capabilities.
+
+- [x] A1 — `tests/bench_drift.py` benchmark harness; `test-results/bench/drift_baseline.json` artifact
+- [x] A2-light — `handlers/sync_middleware.repo_write_barrier(ctx)` (per-repo `asyncio.Lock`); `handle_bind` wrapped
+- [x] A3 — `contracts.SyncMetrics` + `sync_metrics` field on Search/Preflight/History/Bind responses
+- [x] B1 — `ledger/ast_diff.is_cosmetic_change(before, after, lang)` (strict tree-sitter whitelist, 21 tests)
+- [x] B2 — `DriftEntry.cosmetic_hint` advisory + `_enrich_with_cosmetic_hints` helper in `handle_detect_drift`
+- [x] D1 — `original_lines` field on `symbol_disappeared` grounding checks; **`_build_verification_instruction` split** so relocation cases never get the unsafe `bicameral.bind` CTA
+- [x] F1 — `tests/test_desync_scenarios.py` canonical 13-scenario regression matrix (12 pass + 1 V2 xfail)
+- [x] Incidental fix — empty `decision_id` on ungrounded `pending_grounding_checks` (`ledger/adapter.py:475`)
+
 ---
 
 ## Mock → Real Swap Summary
-Original file line number
+Diff line change
@@ Expand Up / @@ -34,6 +34,8 @@ bicameral.history( @@
     )
     ```
+    The response also carries an optional `sync_metrics` (`{sync_catchup_ms, barrier_held_ms}`) observability field for the catch-up time spent inside `ensure_ledger_synced`. **Skip rendering it** — these are server-side latency numbers, not user-visible signal. Log them if you're profiling, otherwise ignore.
     ## How to present
     Group decisions by `HistoryFeature`. For each group:
@@ Expand Down @@