feat(ledger_locator): move state to ~/.bicameral/projects/<id>/ (#368) #408

Merged
jinhongkuan merged 8 commits into dev from feat/368-ledger-locator-r4-phase1
May 19, 2026

Conversation

@jinhongkuan

Contributor

Summary

  • Adds the ledger_locator/ module: deterministic resolver for ledger / code-graph / bm25 / watermark / transcript-queue / operator-config paths, keyed off sha256(git rev-parse --git-common-dir)[:16] so worktrees of the same clone share one state bag at ~/.bicameral/projects/<id>/.
  • Delegates every existing call site (ledger adapter, code-locator runtime + config, events materializer + transcript queue, setup_wizard) to the locator. Splits <repo>/.bicameral/config.yaml per R4: team-identity keys stay committed, per-operator keys (telemetry, channel, guided, signer/render attribution, query timeouts, rate limits, team.role) move to ~/.bicameral/projects/<id>/operator.yaml. context.py readers route per-key via _CONFIG_KEY_ROUTING.
  • Ships bicameral-mcp migrate-state (one-shot, idempotent, archive-on-collision, R4 config-yaml partition) + bicameral-mcp gc (orphan project-dir reclaim). bicameral-update skill now runs migrate-state --auto post-upgrade.
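
The project-id derivation in the first bullet can be sketched as follows. This is a minimal illustration, not the module's actual code: the names `project_id_from_common_dir` and `project_dir_for` are hypothetical, and it assumes the hash input is the absolute common-dir path encoded as UTF-8, which the description does not spell out.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def project_id_from_common_dir(common_dir: str) -> str:
    """Return the first 16 hex chars of sha256 over the common-dir path.

    Assumption: `common_dir` is the absolute path that
    `git rev-parse --git-common-dir` resolves to.
    """
    return hashlib.sha256(common_dir.encode("utf-8")).hexdigest()[:16]


def project_dir_for(common_dir: str, home: Path | None = None) -> Path:
    """Map a common-dir to <home>/.bicameral/projects/<id>/ (hypothetical helper)."""
    base = Path.home() if home is None else home
    return base / ".bicameral" / "projects" / project_id_from_common_dir(common_dir)


# Every worktree of one clone reports the same common-dir, so all of them
# resolve to the same project directory; a different clone gets its own.
home = Path("/home/dev")
main_checkout = project_dir_for("/src/app/.git", home=home)
linked_worktree = project_dir_for("/src/app/.git", home=home)
other_clone = project_dir_for("/src/app-fork/.git", home=home)
```

Keying on the common-dir rather than the working-tree path is what lets `git worktree add` checkouts share one state bag.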

Linked issues

Closes BicameralAI/bicameral-daemon#2

Linked decisions

Closes decision:ko8efq3z1zwhbof7kecq — Name "Ledger Locator"
Closes decision:c2eqcwimhe4lpaexrddw — Supported environments scope lock
Closes decision:fi1def9bci6s6fcflc2p — Branch isolation stays logical (locator carries the rationale forward)
Closes decision:rfbnlw7ghe175iu42u6b — Project identity via git common-dir hash
Closes decision:5nr66wvmapjpt58rrji8 — R4 config split (team-identity vs per-operator)
Closes decision:ew9rgegdlblexsraesss — Delete resolve_config_path(); wizard onboarding via git show HEAD:.bicameral/config.yaml
Closes decision:6c20xahdyxk3suzav4pj — Explicit VCS contract (ProjectIdResolutionError names "git only" assumption)
Closes decision:ogdfx014sqgc6fi6ky1a — Reuse _resolve_authoritative_ref() for the divergence guard
Refs decision:e3xz4c4ji4x7lm3lvq4k — Defer ephemeral environments (one-line notice in migrate-state success summary; full support deferred to v0.16.1/v0.17)

Plan / Audit / Seal

  • Plan: thoughts/shared/plans/2026-05-16-ledger-locator-and-migration.md (R4-bis)
  • Audit: R4 audit verdict PASS (R1 PASS → R3 scope expansion → R4 VETO → R4-bis incorporated all three V1/V2/V3 corrections)
  • Seal: pending squash-merge to dev

Test plan

  • pytest tests/test_ledger_locator.py tests/test_ledger_locator_origin_guard.py tests/test_ledger_locator_vcs_contract.py -q — Phase 1 (locator + origin guard + VCS contract): 19/19
  • pytest tests/test_ledger_adapter_uses_locator.py tests/test_code_locator_runtime_uses_locator.py tests/test_code_locator_config_none_safe.py -q — Phase 2A/2B delegation: 9/9
  • pytest tests/test_setup_wizard_omits_state_env_vars.py tests/test_config_split.py tests/test_setup_wizard_git_native.py tests/test_run_config_wizard.py -q — Phase 2C wizard split + git-native onboarding + two-pane editor: 16/16
  • pytest tests/test_migrate_state.py -q — Phase 3 migration CLI (12 tests incl. byte-equality, idempotency, archive-on-collision, dry-run, partial state, --auto, default archive dir, bm25/watermark/transcript queues, legacy user-global ledger, R4 config-yaml partition): 12/12
  • pytest tests/test_gc.py -q — Phase 4 orphan reclaim CLI: 5/5
  • pytest tests/test_setup_wizard*.py tests/test_v0410_guided_mode.py tests/test_signer_email_fallback.py tests/test_context_ingest_rate_limit.py tests/test_preflight_attribution_redaction.py tests/test_preflight_render_source_attribution.py -q — regression on every existing test that touches a routed key: 145/145
  • ruff check ledger_locator cli/migrate_state.py cli/gc.py setup_wizard.py context.py server.py tests/test_*.py — clean

🤖 Generated with Claude Code

jinhongkuan and others added 7 commits May 18, 2026 17:03
R4 retracts R3's `resolve_config_path()` (filesystem-topology inference
for primary-worktree convergence) and replaces it with five amendments:

1. DELETE `resolve_config_path()` from the locator. Runtime readers in
   `context.py` revert to direct `<repo>/.bicameral/config.yaml` access;
   wizard uses `git show HEAD:.bicameral/config.yaml` for onboarding
   detection. No filesystem-topology inference — concept ports cleanly
   across all 9 deployment shapes catalogued in the Topology Problem
   Notion page (worktree, submodule, bare-repo, sparse checkout,
   devcontainer, Codespaces, CI, --separate-git-dir, non-git VCS).
2. CONFIG SPLIT — team-identity keys stay at `<repo>/.bicameral/config.yaml`
   (git-committed); per-operator keys move to
   `~/.bicameral/projects/<id>/operator.yaml` (per-machine). Routing
   table `context._CONFIG_KEY_ROUTING` is the single source of truth.
3. EXPLICIT VCS CONTRACT — structured `ProjectIdResolutionError` from
   `ledger_locator/_project_id.py::common_dir_for` when `git rev-parse`
   fails, with verbatim "bicameral currently supports git only".
4. REUSE `_resolve_authoritative_ref()` for divergence guard.
5. DEFER full ephemeral-environment support; R4 adds only a one-line
   notice in `migrate-state` post-flight summary.
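
Amendment 2 can be pictured as a small lookup table plus a per-key path chooser. The sketch below is illustrative only: the table is a subset of the keys named in this PR, `config_path_for_key` is a simplified stand-in for the private helper, and sending unknown keys to the committed file is an assumption.

```python
from __future__ import annotations

from pathlib import Path

TEAM = "team"          # stays in <repo>/.bicameral/config.yaml (git-committed)
OPERATOR = "operator"  # lives in ~/.bicameral/projects/<id>/operator.yaml

# Illustrative subset; the real table is context._CONFIG_KEY_ROUTING.
CONFIG_KEY_ROUTING = {
    "mode": TEAM,
    "ingest_max_bytes": TEAM,
    "telemetry": OPERATOR,
    "channel": OPERATOR,
    "guided": OPERATOR,
}


def config_path_for_key(
    key: str, repo_config: Path, operator_config: Path | None
) -> Path:
    """Pick the file that owns `key`.

    `operator_config` is None when no project id could be resolved (for
    example a non-git tmpdir); reads then fall back to the repo file.
    Assumption: keys missing from the table are treated as team keys.
    """
    scope = CONFIG_KEY_ROUTING.get(key, TEAM)
    if scope == OPERATOR and operator_config is not None:
        return operator_config
    return repo_config


repo_cfg = Path("/src/app/.bicameral/config.yaml")
operator_cfg = Path("/home/dev/.bicameral/projects/0123456789abcdef/operator.yaml")

telemetry_file = config_path_for_key("telemetry", repo_cfg, operator_cfg)
mode_file = config_path_for_key("mode", repo_cfg, operator_cfg)
fallback_file = config_path_for_key("telemetry", repo_cfg, None)
```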

Backed by 9 ratified bicameral decisions + 14 explicitly-rejected
alternatives (see bicameral ledger 2026-05-18 session).

Audit chain:
- R4 (2026-05-18) VETO — V1 `_edit_config_interactive` wrong function
  name; V2 stale context.py reader lines + bogus 412-429 entry; V3
  missing tests for solo-mode short-circuit + `run_config_wizard`
  editor. Gate: .qor/gates/2026-05-18T2334-r4audit/audit.json (local).
- R4-bis (2026-05-18) PASS after plan-text fixes. Gate:
  .qor/gates/2026-05-18T2338-r4bis/audit.json (local).

META_LEDGER entries #51 (VETO) and #52 (PASS) recorded.

Phase 1 implementation lands in the following commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…368)

Phase 1 of the Ledger Locator plan (R4-bis PASS).

Adds:
- `resolve_operator_config_path()` returns
  `<STATE_ROOT>/<project-id>/operator.yaml` for the R4 config split
  (decision:5nr66wvmapjpt58rrji8). Per-operator keys (telemetry,
  channel, guided, signer_email_fallback, render_source_attribution,
  rate-limit knobs, query timeouts) will land here in Phase 2; the
  locator anchor lands first so consumers have a stable resolution
  function to import.
- Explicit VCS contract in `_project_id.py::common_dir_for`
  (decision:6c20xahdyxk3suzav4pj). When `git rev-parse --git-common-dir`
  fails, the raised `ProjectIdResolutionError` now names the assumption
  verbatim: "bicameral currently supports git only; non-git VCSes are
  not yet implemented." Forces future ports to jj/sapling/fossil to be
  a deliberate locator amendment rather than an accidental success on
  a misclassified VCS.
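
A rough sketch of the explicit VCS contract, under stated assumptions: the function body, the error's base class, and the message prefix are guesses for illustration. Only the function name, the exception name, the git command, and the quoted sentence come from this commit message.

```python
import subprocess
from pathlib import Path

VCS_CONTRACT = (
    "bicameral currently supports git only; non-git VCSes are not yet implemented."
)


class ProjectIdResolutionError(RuntimeError):
    """Raised when no project id can be derived for a path (base class assumed)."""


def common_dir_for(repo_path: str) -> str:
    """Return the absolute git common-dir for `repo_path`, or raise.

    Any failure to run git (not a repo, git missing, bad cwd) is reported
    with the contract sentence so the "git only" assumption is visible.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--git-common-dir"],
            cwd=repo_path,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        raise ProjectIdResolutionError(
            f"cannot resolve a project id for {repo_path!r}: {VCS_CONTRACT}"
        ) from exc
    # git may print a path relative to cwd (often ".git"); make it absolute.
    return str((Path(repo_path) / result.stdout.strip()).resolve())


try:
    common_dir_for("/nonexistent/bicameral-sketch-dir")
    error_message = None
except ProjectIdResolutionError as exc:
    error_message = str(exc)
```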

Phase 1 does NOT include:
- `resolve_config_path()` — explicitly rejected per R4
  (decision:ew9rgegdlblexsraesss; see superseded R3 design at
  decision:6z39wrjpmmg9vhm8i6t4). Config readers will read
  `<repo>/.bicameral/config.yaml` directly in Phase 2.
- Phase 2 call-site delegation (context.py per-key routing,
  setup_wizard.py config split, git-show-HEAD onboarding detection).
- Phase 3 migrate-state CLI / Phase 4 gc CLI.

Tests (14 passing):
- 8 existing in `test_ledger_locator.py` + 3 in
  `test_ledger_locator_origin_guard.py` preserved.
- 2 new operator-config tests: path under project dir + stability
  across `git worktree add` checkouts.
- 3 new VCS-contract tests in `test_ledger_locator_vcs_contract.py`:
  verifies the verbatim error message surfaces from
  `resolve_ledger_url`, `common_dir_for`, and `project_id_for`.

ruff: clean. Plan amendment + R4-bis audit gate in the preceding commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rk, transcripts) (#368)

Phase 2A of the Ledger Locator plan (R4-bis PASS). Adds four R3 locator
functions for project-scoped derived state — sibling to ledger.db and
code-graph.db under `~/.bicameral/projects/<id>/`. These enable Phase 2B
to delegate the corresponding call sites in events/, code_locator/, and
scripts/hooks/ to the locator.

Added:
- `resolve_bm25_index_path()` — derived from code-graph; sibling to it.
- `resolve_watermark_path()` — replaces `events/materializer.py`'s
  `local_dir/"watermark"`. Fixes per-worktree re-replay of peer events.
- `resolve_pending_transcripts_dir()` — replaces
  `events/transcript_queue.py:_pending_root`. Fixes per-worktree
  invisibility of SessionEnd-hook transcripts.
- `resolve_processed_transcripts_dir()` — sibling to pending.

Refactor:
- Extracted `_resolved_project_dir(repo_path)` private helper. The
  resolve-repo + assert-origin pattern was duplicated across 3 public
  resolvers; adding 4 more would push it to 7. Helper centralizes the
  pipeline (repo → project-id → origin-guard) so each public resolver
  is now a one-line return. Net diff is line-neutral for existing
  resolvers; new ones get 2 lines each instead of 5.
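
The shape of this refactor might look like the sketch below. It is simplified so it runs without git: the real helper takes `repo_path`, whereas this version takes an already-resolved common-dir and an explicit state root. The origin-guard behaviour shown (record the common-dir in `origin.txt`, refuse a mismatch) and `OriginMismatchError` are assumptions, not the module's confirmed logic.

```python
import hashlib
import tempfile
from pathlib import Path


class OriginMismatchError(RuntimeError):
    """Hypothetical error for a project dir claimed by a different origin."""


def _resolved_project_dir(common_dir: str, state_root: Path) -> Path:
    """Central pipeline: common-dir -> project id -> origin guard."""
    project_id = hashlib.sha256(common_dir.encode("utf-8")).hexdigest()[:16]
    project_dir = state_root / project_id
    project_dir.mkdir(parents=True, exist_ok=True)
    origin = project_dir / "origin.txt"
    if not origin.exists():
        origin.write_text(common_dir + "\n", encoding="utf-8")
    elif origin.read_text(encoding="utf-8").strip() != common_dir:
        raise OriginMismatchError(f"{project_dir} belongs to another origin")
    return project_dir


# With the helper in place, each public resolver is a one-line return.
def resolve_bm25_index_path(common_dir: str, state_root: Path) -> Path:
    return _resolved_project_dir(common_dir, state_root) / "bm25_index.pkl"


def resolve_watermark_path(common_dir: str, state_root: Path) -> Path:
    return _resolved_project_dir(common_dir, state_root) / "watermark"


def resolve_pending_transcripts_dir(common_dir: str, state_root: Path) -> Path:
    return _resolved_project_dir(common_dir, state_root) / "pending-transcripts"


def resolve_processed_transcripts_dir(common_dir: str, state_root: Path) -> Path:
    return _resolved_project_dir(common_dir, state_root) / "processed-transcripts"


demo_root = Path(tempfile.mkdtemp())
watermark = resolve_watermark_path("/src/app/.git", demo_root)
bm25 = resolve_bm25_index_path("/src/app/.git", demo_root)
```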

Tests (17 passing, +3 from Phase 1):
- `test_resolves_derived_state_paths_under_project_dir` — all four new
  paths share the same project dir as code-graph.db.
- `test_derived_state_paths_stable_across_worktrees` — paths are
  identical across `git worktree add` checkouts (the whole point of
  project-scoping derived state).
- `test_derived_state_paths_have_no_env_override` — unrelated env
  overrides (SURREAL_URL, CODE_LOCATOR_SQLITE_DB) do not leak into
  these paths; they always resolve to the project dir.

ruff: clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ase 2B-i) (#368)

Phase 2B (call-site delegation), part i: the three load-bearing
state-path call sites stop computing paths inline and delegate to the
Ledger Locator. Phase 2B-ii (events/materializer, transcript_queue) and
Phase 2B-iii (setup_wizard env writes) follow in subsequent commits.

Delegations:
- `ledger/adapter.py::_default_db_url` now returns
  `ledger_locator.resolve_ledger_url()`. The hard-coded
  `~/.bicameral/ledger.db` literal is gone; the SURREAL_URL env override
  flows through the locator's single resolution path. Implements
  decision:ko8efq3z1zwhbof7kecq + decision:c2eqcwimhe4lpaexrddw at the
  adapter boundary.
- `code_locator_runtime.ensure_runtime_env` calls
  `resolve_code_graph_path()` instead of computing
  `<repo>/.bicameral/code-graph.db` from the REPO_PATH env. The
  vestigial `_default_cache_root` helper is deleted. Outside a git repo
  the locator's `ProjectIdResolutionError` is caught and the env is
  left unset — the None-safe `code_locator.config.resolve_paths()` then
  handles direct-construction fallback (or raises, see below).
- `code_locator_runtime.rebuild_index` line 277: `bm25_path` now comes
  from `resolve_bm25_index_path()`. Removes the implicit "bm25 lives
  next to sqlite_db" coupling — both paths come from the locator's
  project dir independently, matching the R3 plan's "one bag of state"
  intent.
- `code_locator/config.py`: `sqlite_db` default is `None` (was the
  literal `~/.bicameral/code-graph.db`). `resolve_paths()` is None-safe
  and defers to `resolve_code_graph_path()` when set to None. Outside
  a git repo with no `CODE_LOCATOR_SQLITE_DB` override, the locator's
  `ProjectIdResolutionError` propagates verbatim ("bicameral currently
  supports git only"). Per decision:c2eqcwimhe4lpaexrddw, behavior is
  undefined in unsupported environments; naming the problem is better
  than writing to a hardcoded fallback that drifts from the canonical
  layout (and isn't Windows-friendly to begin with).
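
The None-safe default can be sketched as below. This is an assumption-laden illustration: `CodeLocatorConfig` is a hypothetical stand-in for the real config class, the `locate` callable stands in for `resolve_code_graph_path()`, and the precedence order (env override, then an explicit value, then the locator) is a guess at what the "env-var precedence" test covers.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass
class CodeLocatorConfig:
    # None means "no hard-coded default; ask the locator at resolve time".
    sqlite_db: str | None = None


def resolve_paths(
    config: CodeLocatorConfig,
    locate: Callable[[], str],
    env: Mapping[str, str] | None = None,
) -> CodeLocatorConfig:
    """Fill in sqlite_db without ever falling back to a hard-coded path."""
    env = os.environ if env is None else env
    override = env.get("CODE_LOCATOR_SQLITE_DB")
    if override:
        return CodeLocatorConfig(sqlite_db=override)
    if config.sqlite_db is not None:
        return config
    # Errors raised by the locator propagate unchanged to the caller.
    return CodeLocatorConfig(sqlite_db=locate())


def outside_git() -> str:
    raise RuntimeError("bicameral currently supports git only")


from_env = resolve_paths(
    CodeLocatorConfig(), outside_git, env={"CODE_LOCATOR_SQLITE_DB": "/tmp/fixture.db"}
)
from_locator = resolve_paths(CodeLocatorConfig(), lambda: "/state/code-graph.db", env={})

try:
    resolve_paths(CodeLocatorConfig(), outside_git, env={})
    propagated = False
except RuntimeError:
    propagated = True
```

The env override is what lets test fixtures run outside a git repo without touching the locator.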

Tests (11 new + 17 retained = 28 passing):
- `test_ledger_adapter_uses_locator.py` — 3 tests cover the adapter
  default, SURREAL_URL override, and the regression guard against the
  legacy un-project-scoped path.
- `test_code_locator_runtime_uses_locator.py` — 3 tests cover the env
  pre-population, the setdefault preservation, and the silent
  outside-git fallback.
- `test_code_locator_config_none_safe.py` — 5 tests cover load_config
  end-to-end, direct-construction None-safety, env-var precedence, the
  outside-git error propagation, and the env-override escape hatch for
  test fixtures.

Section 4 razor (advisory from R4 audit): the resolve-repo + assert-
origin pattern was duplicated in 3 public resolvers; adding 4 more
(Phase 2A) would push it to 7. Extracted `_resolved_project_dir`
helper to centralize the pipeline. Net effect is line-neutral on the
existing resolvers and removes ~12 lines of duplication.

Outside scope: the broader pytest suite has 41 pre-existing failures
unrelated to this commit (verified: `test_v0417_jargon_hygiene` fails
on the clean dev tree too). They will be addressed separately.

ruff: clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e 2B-ii) (#368)

Phase 2B (call-site delegation), part ii: events/materializer.py and
events/transcript_queue.py stop computing their state paths inline and
delegate to the Ledger Locator. The watermark and pending/processed
transcript queues now live under the locator's project dir — shared
across worktrees of one repo, ending the v0.15.x failure modes where
peer JSONL events re-replayed per-worktree and SessionEnd-hook
transcripts written from worktree A were invisible to worktree B's
drain loop.

Delegations:
- `events/materializer.EventMaterializer`: `local_dir` parameter dropped
  from the positional contract; replaced by an optional `repo_path`
  (locator-scoping) and a keyword-only `watermark_override` (tests-only
  escape hatch). Production callers pass `repo_path` or nothing;
  `watermark_override` exists so test fixtures don't have to git-init
  tmp_path-derived dirs. Internal mkdir on the watermark parent is
  preserved as belt-and-braces.
- `events/transcript_queue._pending_root` / `_processed_root`: signature
  unchanged (still take `repo_path: str`); body delegates to
  `resolve_pending_transcripts_dir` / `resolve_processed_transcripts_dir`.
  Every caller through this module transparently moves to the
  project-scoped layout — no caller changes needed for downstream code.
- `adapters/ledger.py:117` (team-mode wiring): pass `repo_path` to
  `EventMaterializer`; drop the local watermark-dir computation.
- `handlers/reset.py::_replay_events_into_ledger`: use
  `resolve_watermark_path()` for the "{}" reset write; drop the
  local-dir mkdir / inline watermark path.
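
The new constructor contract might look roughly like this. The class body, the `write_watermark` method, and the stand-in `resolve_watermark_path` are hypothetical; only the optional `repo_path`, the keyword-only `watermark_override`, and the retained parent mkdir come from the description above.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_watermark_path(repo_path: str | None = None) -> Path:
    """Stand-in for the locator; the real one derives the dir from git."""
    return Path(tempfile.gettempdir()) / "bicameral-sketch-project" / "watermark"


class EventMaterializer:
    def __init__(
        self,
        repo_path: str | None = None,
        *,
        watermark_override: Path | None = None,
    ) -> None:
        # Tests pass watermark_override; production passes repo_path or nothing.
        if watermark_override is not None:
            self.watermark_path = Path(watermark_override)
        else:
            self.watermark_path = resolve_watermark_path(repo_path)

    def write_watermark(self, payload: str) -> None:
        # Belt-and-braces: create the parent even if the locator already did.
        self.watermark_path.parent.mkdir(parents=True, exist_ok=True)
        self.watermark_path.write_text(payload, encoding="utf-8")


local_dir = Path(tempfile.mkdtemp()) / "local"
fixture = EventMaterializer(watermark_override=local_dir / "watermark")
fixture.write_watermark("{}")

try:
    EventMaterializer("some-repo", local_dir / "watermark")  # positional: rejected
    positional_allowed = True
except TypeError:
    positional_allowed = False
```

Making the override keyword-only keeps the old positional `local_dir` call shape from silently binding to the wrong parameter.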

Tests (51 passing — 28 prior + 13 transcript-queue regression):
- `tests/test_session_end_queue_writer.py`: `_make_repo` now git-inits
  the tmp_path fixture; assertions on pending/processed paths use
  `_pending_root` / `_processed_root` instead of literal `<repo>/.bicameral/...`
  paths. The writer subprocess (cwd=repo) and CLI archiver (cwd=repo)
  both transparently use the locator now.
- `tests/test_team_event_replay.py`,
  `tests/test_team_adapter_with_backend.py`,
  `tests/test_team_round_trip_local_folder.py`,
  `tests/_replay_helpers.py`: pass `watermark_override=local_dir/"watermark"`
  to `EventMaterializer` so fixtures keep their per-test watermark
  location without git-init'ing every tmp_path.

ruff: clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…, #368)

Implements the operator-facing pieces of the R4 ledger-locator amendment:

- `_build_config` no longer writes SURREAL_URL / CODE_LOCATOR_SQLITE_DB
  to .mcp.json — the locator picks both up at runtime
  (decision:5nr66wvmapjpt58rrji8).
- `context._CONFIG_KEY_ROUTING` is the single source of truth partitioning
  team-identity keys (mode, team.backend/folder/remote_root,
  ingest_max_bytes) from per-operator keys (telemetry, channel, guided,
  signer_email_fallback, render_source_attribution, team.role,
  ingest_rate_limit_*, query_timeout_*). All 10 context.py readers route
  per-key via `_config_path_for_key`; falls back to `<repo>/.bicameral/
  config.yaml` when the locator can't resolve a project id (non-git
  tmpdir → preserves v0.15.x behavior for legacy tests).
- `_write_collaboration_config` writes BOTH files atomically (operator
  first → config second, both via temp+rename). On rename failure of the
  second file, the just-renamed operator file is rolled back and both
  temps unlinked — neither destination ends up half-written. Accepts a
  test-only `operator_path` override; falls back to single-file legacy
  layout when the locator can't resolve.
- `run_setup` detects committed team/solo config via `git show
  HEAD:.bicameral/config.yaml` and auto-joins; falls through to the
  prompt flow on no-commit / non-team / parse failure
  (decision:ew9rgegdlblexsraesss). Divergence guard: when HEAD lacks
  the file but the default branch (via `_resolve_authoritative_ref`)
  has it, prompt the operator to merge-first before persisting a fresh
  setup (decision:ogdfx014sqgc6fi6ky1a).
- `run_config_wizard` reads from both files via the locator, tags each
  prompt with `[team]` or `[your machine]`, writes back via the same
  atomic two-file split as `_write_collaboration_config`.
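
The two-file write with rollback can be sketched as follows. This is one plausible reading of the bullet above, not the wizard's actual code: `write_split_config` and the injectable `replace` hook are hypothetical, and "rolled back" is interpreted as restoring the operator file's previous bytes (or removing it if it did not exist).

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def _write_temp(dest: Path, text: str) -> Path:
    """Write to a temp file beside dest so the rename stays on one filesystem."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name + ".", suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return Path(name)


def write_split_config(operator_path, operator_text, config_path, config_text,
                       replace=os.replace):
    """Operator file first, config second; undo the first if the second fails."""
    previous = operator_path.read_bytes() if operator_path.exists() else None
    temps: list[Path] = []
    try:
        operator_tmp = _write_temp(operator_path, operator_text)
        temps.append(operator_tmp)
        config_tmp = _write_temp(config_path, config_text)
        temps.append(config_tmp)
        replace(operator_tmp, operator_path)
        try:
            replace(config_tmp, config_path)
        except OSError:
            # Roll back the rename that already landed (restore is best-effort).
            if previous is None:
                operator_path.unlink()
            else:
                operator_path.write_bytes(previous)
            raise
    finally:
        for tmp in temps:
            tmp.unlink(missing_ok=True)  # no-op for temps already renamed


root = Path(tempfile.mkdtemp())
operator_file = root / "state" / "operator.yaml"
config_file = root / "repo" / "config.yaml"
write_split_config(operator_file, "telemetry: true\n", config_file, "mode: team\n")

calls = []


def fail_second_rename(src, dst):
    calls.append(dst)
    if len(calls) == 2:
        raise OSError("simulated rename failure")
    os.replace(src, dst)


try:
    write_split_config(operator_file, "telemetry: false\n", config_file,
                       "mode: solo\n", replace=fail_second_rename)
    rolled_back = False
except OSError:
    rolled_back = True
```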

Tests (16 new, 0 broken):
- test_setup_wizard_omits_state_env_vars.py — 3 tests
- test_config_split.py — 5 tests (incl. rollback-on-failure +
  routing-table-covers-every-key + reads-route-per-key)
- test_setup_wizard_git_native.py — 6 tests (team/solo auto-join,
  divergence guard, _read_committed_config unit coverage)
- test_run_config_wizard.py — 2 tests (reads from both, writes to
  routed)

125 existing tests across setup_wizard + locator + context pass.
ruff clean across the diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes out #368: the state move now actually happens, and orphaned project dirs can be reclaimed.

`cli/migrate_state.py` (Phase 3)
- Moves project-scoped state from `<repo>/.bicameral/` into the locator-
  resolved project dir at `~/.bicameral/projects/<id>/`:
    ledger.db, code-graph.db (+ shm/wal), bm25_index.pkl, watermark,
    pending-transcripts/, processed-transcripts/.
- Also picks up the v0.15.x user-global `~/.bicameral/ledger.db` on the
  first project that runs migrate-state, then leaves subsequent projects
  alone.
- R4: partitions a pre-split `<repo>/.bicameral/config.yaml` per the
  `context._CONFIG_KEY_ROUTING` table — team-identity keys stay in the
  committed file, per-operator keys move to operator.yaml under the
  project dir. Merges with any pre-existing operator.yaml (existing
  values win). Unknown keys stay in config.yaml with a warning.
- Idempotent (`Nothing to migrate.` exit 0 on second run), archives on
  byte-different destination collisions to
  `~/.bicameral/archive/<project-id>/<name>.<iso8601>.bak` (default; CLI
  override via `--archive-dir`), de-dupes on byte-identical collisions,
  cleans empty source dirs after success.
- `--dry-run` enumerates the plan without writing. `--auto` skips the
  pre-execute confirm prompt.
- R4 deferred-ephemeral notice printed in every success summary
  (decision:e3xz4c4ji4x7lm3lvq4k).
- Wired into server.py as `migrate-state` + `migrate-ledger` (alias per
  the issue verbiage).
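
The collision policy for a single file might be sketched like this. Several details are assumptions: the function name and return labels are hypothetical, the sketch archives the existing destination (the description does not say which side is archived), and the timestamp uses ISO 8601 basic format to stay filename-safe.

```python
import filecmp
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def move_with_collision_policy(src: Path, dest: Path, archive_dir: Path) -> str:
    """Move src to dest; return 'nothing', 'moved', 'deduped', or 'archived'."""
    if not src.exists():
        return "nothing"  # second run: already migrated
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        shutil.move(str(src), str(dest))
        return "moved"
    if filecmp.cmp(src, dest, shallow=False):
        src.unlink()  # byte-identical: keep one copy
        return "deduped"
    # Byte-different: keep the old destination as <name>.<timestamp>.bak.
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    shutil.move(str(dest), str(archive_dir / f"{dest.name}.{stamp}.bak"))
    shutil.move(str(src), str(dest))
    return "archived"


def _make(path: Path, data: bytes) -> Path:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


root = Path(tempfile.mkdtemp())
source = root / "repo" / ".bicameral" / "watermark"
target = root / "state" / "watermark"
archive = root / "archive"

first = move_with_collision_policy(_make(source, b"{}"), target, archive)
second = move_with_collision_policy(_make(source, b"{}"), target, archive)
third = move_with_collision_policy(_make(source, b'{"seq": 7}'), target, archive)
fourth = move_with_collision_policy(source, target, archive)
```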

`cli/gc.py` (Phase 4)
- Scans `~/.bicameral/projects/<id>/origin.txt` for each project dir,
  classifying as live / orphan / unreadable.
- Default: list. `--delete` prompts per orphan / unreadable dir;
  `--yes` skips the prompt. Empty `origin.txt` and missing `origin.txt`
  both classify as `unreadable` so the operator can choose to reclaim.
- Wired into server.py as the `gc` subparser.
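
A possible shape for the classification step, under the assumption that `origin.txt` stores the path of the originating git common-dir and that a project is an orphan when that path no longer exists. `classify_project_dir` is a hypothetical name.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def classify_project_dir(project_dir: Path) -> str:
    """Return 'live', 'orphan', or 'unreadable' for one project dir."""
    try:
        origin = (project_dir / "origin.txt").read_text(encoding="utf-8").strip()
    except (OSError, UnicodeDecodeError):
        return "unreadable"  # missing or undecodable origin.txt
    if not origin:
        return "unreadable"  # empty origin.txt
    return "live" if Path(origin).exists() else "orphan"


root = Path(tempfile.mkdtemp())


def _project(name: str, origin: str | None) -> Path:
    project_dir = root / "projects" / name
    project_dir.mkdir(parents=True)
    if origin is not None:
        (project_dir / "origin.txt").write_text(origin, encoding="utf-8")
    return project_dir


live = classify_project_dir(_project("aaaa", str(root)))
orphan = classify_project_dir(_project("bbbb", str(root / "deleted-clone" / ".git")))
empty = classify_project_dir(_project("cccc", ""))
missing = classify_project_dir(_project("dddd", None))
```

Treating both empty and missing `origin.txt` as `unreadable` leaves the reclaim decision with the operator instead of guessing.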

`skills/bicameral-update/SKILL.md`
- New Step 3.5: after `bicameral.update(action="apply", ...)`, run
  `bicameral-mcp migrate-state --auto` before reporting "update
  complete." Surface stderr verbatim and abort the flow on non-zero
  exit, with a `--dry-run` offer as the fallback.

Tests (17 new, 0 broken):
- test_migrate_state.py — 12 tests: full-layout move, idempotent re-run,
  collision archives, dry-run, missing-source no-op, partial state,
  auto-flag skips prompts, default archive dir under home,
  bm25+watermark+transcript-queue explicit coverage, legacy user-global
  ledger first-project + already-claimed, R4 config.yaml partition
- test_gc.py — 5 tests: list-only spares live, delete with per-item
  prompts, --yes skips prompts, empty origin.txt → unreadable, empty
  state root

145 tests across Phase 1–4 pass. ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jinhongkuan jinhongkuan added the flow:feature Standard feature/fix PR targeting BicameralAI/dev (the default flow) label May 19, 2026
@jinhongkuan jinhongkuan requested a deployment to recording-approval May 19, 2026 02:57 — with GitHub Actions Waiting
@coderabbitai

coderabbitai Bot commented May 19, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3acae8e9-8350-4ffe-b45a-ddd850a5a933

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

CI `ruff + mypy` failed `ruff format --check` on 10 files in the diff.
Running `ruff format` and re-staging — pure whitespace / line-wrap
reflow, no semantics changed; 25/25 tests across gc + migrate_state +
run_config_wizard + setup_wizard_git_native still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jinhongkuan jinhongkuan requested a deployment to recording-approval May 19, 2026 03:16 — with GitHub Actions Waiting
@jinhongkuan jinhongkuan merged commit 2569f30 into dev May 19, 2026
10 of 11 checks passed

Labels

flow:feature Standard feature/fix PR targeting BicameralAI/dev (the default flow)
