Skip to content

feat: event-sourced collaboration (Phase 1)#2

Merged
jinhongkuan merged 5 commits into
mainfrom
feat/event-sourced-collaboration
Apr 10, 2026
Merged

feat: event-sourced collaboration (Phase 1)#2
jinhongkuan merged 5 commits into
mainfrom
feat/event-sourced-collaboration

Conversation

@jinhongkuan

@jinhongkuan jinhongkuan commented Apr 10, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Adds cross-team decision sharing via append-only JSON event files committed to git
  • Dual-write adapter: in team mode, every ingest/link_commit writes an event file to .bicameral/events/{git-email}/ before updating the local DB
  • Watermark-based materializer: on startup, replays peer events into the local ledger — incremental, only processes new events
  • Zero merge conflicts: per-user directories + append-only files + UUID naming
  • Zero handler changes: adapter factory swap is invisible to server.py and all handlers

New files

File Purpose
events/__init__.py Package marker
events/models.py EventEnvelope Pydantic model
events/writer.py EventFileWriter — atomic JSON writes to per-user dir
events/materializer.py EventMaterializer — watermark + glob + replay
events/team_adapter.py TeamWriteAdapter — composition wrapper
tests/test_team_events.py 20 tests covering writer, materializer, adapter, config

Modified files

File Changes
adapters/ledger.py get_ledger() reads .bicameral/config.yaml, returns TeamWriteAdapter when mode: team
setup_wizard.py Solo/team mode selection, config.yaml creation, .gitignore handling

Test plan

  • 20 new unit tests passing (writer, materializer, team adapter, config detection)
  • Existing test suite unchanged (106 passing, pre-existing failures unaffected)
  • Manual: bicameral-mcp setup → select team mode → verify config.yaml + .gitignore
  • Manual: ingest in team mode → verify event file in .bicameral/events/
  • Manual: copy events from another user dir → restart → verify materialized

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added team collaboration mode for multi-user workflows alongside solo mode
    • Introduced configuration system to set and manage collaboration mode during setup
    • Event-sourced collaboration enabling team members to share and synchronize architectural decisions
  • Documentation

    • Added collaboration modes guide with comparison table and configuration instructions
    • Enhanced local development setup documentation

…terializer

Adds cross-team decision sharing via append-only JSON event files
committed to git. In team mode, every ingest/link_commit writes an
event file to .bicameral/events/{git-email}/ before updating the
local DB. On startup, a watermark-based materializer replays peer
events into the local ledger. Zero merge conflicts by design.

New: events/ package (models, writer, materializer, team_adapter)
Modified: adapters/ledger.py (config-driven adapter factory)
Modified: setup_wizard.py (solo/team mode selection)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 10, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@jinhongkuan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 3 minutes and 1 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 3 minutes and 1 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7f2771b4-de99-4c79-a66a-e56156241a88

📥 Commits

Reviewing files that changed from the base of the PR and between b678516 and 5060b7b.

📒 Files selected for processing (4)
  • events/materializer.py
  • events/team_adapter.py
  • setup_wizard.py
  • tests/test_team_events.py
📝 Walkthrough

Walkthrough

Adds a configuration-driven "team" collaboration mode: new event-sourcing components (EventEnvelope, EventFileWriter, EventMaterializer, TeamWriteAdapter), ledger factory wiring to optionally wrap the SurrealDB adapter in team mode, and setup changes to persist and respect collaboration mode.

Changes

Cohort / File(s) Summary
Event Sourcing Package
events/__init__.py, events/models.py, events/writer.py, events/materializer.py, events/team_adapter.py
New event-sourcing modules: EventEnvelope model, atomic per-author JSON EventFileWriter, watermark-driven EventMaterializer for incremental replay, and TeamWriteAdapter implementing dual-write (emit event file then delegate) with read-pass-through methods.
Ledger Adapter Factory
adapters/ledger.py
Added _read_collaboration_mode(repo_path) and updated get_ledger() to construct a base SurrealDBLedgerAdapter and, when mode == team, wrap it with TeamWriteAdapter wired with EventFileWriter and EventMaterializer using .bicameral/events/ and .bicameral/local/.
Setup Wizard & Config
setup_wizard.py
Introduced _select_collaboration_mode() and _write_collaboration_config(), made _ensure_gitignore(...) mode-aware (solo vs team) and rewrote gitignore block handling; run_setup() now persists mode and creates .bicameral/events/ in team mode.
Tests
tests/test_team_events.py
Added comprehensive tests for writer, materializer, TeamWriteAdapter, and config detection (in-memory SurrealDB): verifies file formats, watermarking/replay ordering, event dispatch, dual-write behavior, and git-email author detection.
Docs
README.md
Added "Collaboration Modes" documentation and adjusted local development/install instructions and post-install command grouping.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Factory as get_ledger()
    participant Inner as SurrealDBLedgerAdapter
    participant Team as TeamWriteAdapter
    participant Writer as EventFileWriter
    participant Materializer as EventMaterializer
    participant FS as Event Filesystem

    Factory->>Inner: instantiate SurrealDBLedgerAdapter
    Factory->>Writer: create EventFileWriter(.bicameral/events/, author)
    Factory->>Materializer: create EventMaterializer(.bicameral/events/, .bicameral/local/)
    Factory->>Team: wrap Inner with TeamWriteAdapter(Inner, Writer, Materializer)

    Team->>Inner: connect()
    Team->>Materializer: replay_new_events(inner)
    Materializer->>FS: read new event files (*.json)
    Materializer-->>Inner: call ingest_payload / ingest_commit for events

    User->>Team: ingest_payload(payload)
    Team->>Writer: write(event_type=ingest.completed, payload)
    Writer->>FS: atomic write timestamp-uuid.json
    Team->>Inner: inner.ingest_payload(payload)
    Inner-->>Team: result

    User->>Team: get_all_decisions(...)
    Team->>Inner: get_all_decisions(...)
    Inner-->>Team: decisions
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

Poem

🐰
I hop and drop an event to share,
timestamps stitched with tidy care,
teammates read what I have penned,
watermarks tell where replays end,
solo or team—our logs declare.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 24.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat: event-sourced collaboration (Phase 1)' directly summarizes the main change—adding event-sourced collaboration infrastructure for cross-team decision sharing via append-only JSON event files, which is the central theme of all file modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/event-sourced-collaboration

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@events/materializer.py`:
- Around line 57-69: The single scalar watermark (watermark from
_read_watermark) is too coarse because EventFileWriter timestamps only to the
second; change the filtering logic that builds new_events (and the analogous
logic at lines 103-105) to perform per-event bookkeeping for the current
timestamp bucket: persist (or atomically store) a set of processed event
identifiers or filenames for the latest watermark-second and use
_extract_timestamp(f.name) to group by second, allowing events with timestamp ==
watermark-second to be accepted if their filename/ID is not in the persisted
processed set; update the watermark persistence API (or add a companion
persisted processed_ids set) so after processing you append processed file IDs
for that second and advance the watermark when the set is complete, and adjust
the filter to use ">" for seconds newer than watermark-second and "== and id not
in processed_ids" for the current second.

In `@events/team_adapter.py`:
- Around line 41-56: The ingest_payload and ingest_commit methods currently
write events via self._writer.write then delegate to self._inner but never
record that the local event has been materialized; after the await of
self._inner.ingest_payload(...) and self._inner.ingest_commit(...), update the
local replay cursor/state to mark that the written event has been applied (e.g.,
call your replay cursor/state API such as
self._replay_cursor.mark_applied(event_id) or update self._local_state to
include the event), ensuring this occurs only after the inner call succeeds so
replay won’t reapply the same event after restart.
- Around line 32-37: The current materialization only runs in connect(), so add
an internal async method async def _ensure_ready(self) that is idempotent and
called from every public method (e.g., read/write methods, any public API on
this adapter) to guarantee the inner adapter and materializer are initialized;
implement _ensure_ready to await a per-instance asyncio.Lock or an initialized
flag to avoid races, call await self._inner.connect() then await
self._materializer.replay_new_events(self._inner) and log as before, and set a
_ready flag so subsequent calls are no-ops; remove reliance on callers invoking
connect() and instead call await self._ensure_ready() at the start of each
public method.

In `@events/writer.py`:
- Around line 41-45: The constructor stores author_email directly into a
filesystem path which lets values with separators, '..', or absolute paths
escape the events dir; update Writer.__init__ to validate and sanitize
author_email before using it: reject or normalize inputs containing path
separators, leading slashes, or components like '..' (e.g., ensure author_email
== Path(author_email).name or apply a safe-encoding), raise ValueError for
invalid inputs, then set self._author to the sanitized value and build
self._user_dir from events_dir / sanitized_author and mkdir as before; ensure
any external callers still get a clear exception on invalid author strings.

In `@setup_wizard.py`:
- Around line 400-407: The team-mode config currently leaves SURREAL_URL and
CODE_LOCATOR_SQLITE_DB pointing to .bicameral/ledger.db and
.bicameral/code-graph.db which will be tracked; update the config generation
called after _select_collaboration_mode/_write_collaboration_config to accept
the collab_mode flag and, when collab_mode == "team", set SURREAL_URL and
CODE_LOCATOR_SQLITE_DB to files under .bicameral/local/ (e.g.,
.bicameral/local/ledger.db and .bicameral/local/code-graph.db) or add explicit
ignores for those exact paths; modify the function that writes MCP/env config
(where SURREAL_URL and CODE_LOCATOR_SQLITE_DB are emitted) to branch on
collab_mode and ensure the moved paths match what _ensure_gitignore produces.

In `@tests/test_team_events.py`:
- Around line 16-17: The test module currently uses
os.environ.setdefault("SURREAL_URL", "memory://") which can leave an existing
SURREAL_URL and cause tests to hit a persistent DB; replace this with an
unconditional override by assigning os.environ["SURREAL_URL"] = "memory://" at
module import time so the SURREAL_URL is always set to the in-memory SurrealDB
for tests (refer to the SURREAL_URL environment variable and the
tests/test_team_events.py top-level env setup).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 06293b92-42c6-4788-8add-40ed6e13ec23

📥 Commits

Reviewing files that changed from the base of the PR and between 32f9097 and 800623d.

📒 Files selected for processing (8)
  • adapters/ledger.py
  • events/__init__.py
  • events/materializer.py
  • events/models.py
  • events/team_adapter.py
  • events/writer.py
  • setup_wizard.py
  • tests/test_team_events.py

Comment thread events/materializer.py Outdated
Comment on lines +57 to +69
watermark = self._read_watermark()

# Glob all event files across all user directories
event_files = sorted(
self._events_dir.glob("*/*.json"),
key=lambda f: f.name, # lexicographic = chronological
)

# Filter to new events
new_events = [
f for f in event_files
if self._extract_timestamp(f.name) > watermark
]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

A single timestamp watermark is too coarse for these filenames.

EventFileWriter only guarantees second-level timestamps, so once the watermark is 20260410T180000Z, any event file merged later from that same second is skipped forever by the > filter here. You need per-event bookkeeping for the current timestamp bucket (or persistent processed event IDs), not a single scalar timestamp.

Also applies to: 103-105

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@events/materializer.py` around lines 57 - 69, The single scalar watermark
(watermark from _read_watermark) is too coarse because EventFileWriter
timestamps only to the second; change the filtering logic that builds new_events
(and the analogous logic at lines 103-105) to perform per-event bookkeeping for
the current timestamp bucket: persist (or atomically store) a set of processed
event identifiers or filenames for the latest watermark-second and use
_extract_timestamp(f.name) to group by second, allowing events with timestamp ==
watermark-second to be accepted if their filename/ID is not in the persisted
processed set; update the watermark persistence API (or add a companion
persisted processed_ids set) so after processing you append processed file IDs
for that second and advance the watermark when the set is complete, and adjust
the filter to use ">" for seconds newer than watermark-second and "== and id not
in processed_ids" for the current second.

Comment thread events/team_adapter.py
Comment thread events/team_adapter.py
Comment on lines +41 to +56
async def ingest_payload(self, payload: dict) -> dict:
"""Write ingest event, then delegate to inner adapter."""
self._writer.write("ingest.completed", payload)
return await self._inner.ingest_payload(payload)

async def ingest_commit(
self, commit_hash: str, repo_path: str, drift_analyzer=None,
) -> dict:
"""Write link_commit event, then delegate to inner adapter."""
self._writer.write(
"link_commit.completed",
{"commit_hash": commit_hash, "repo_path": repo_path},
)
return await self._inner.ingest_commit(
commit_hash, repo_path, drift_analyzer=drift_analyzer,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Mark successful local writes as already applied.

These methods append the event file and then apply the same mutation to the local DB, but nothing records that the local event has already been materialized. After a restart, replay can apply that same event to the same ledger again. Once the inner write succeeds, record the written event in the local replay cursor/state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@events/team_adapter.py` around lines 41 - 56, The ingest_payload and
ingest_commit methods currently write events via self._writer.write then
delegate to self._inner but never record that the local event has been
materialized; after the await of self._inner.ingest_payload(...) and
self._inner.ingest_commit(...), update the local replay cursor/state to mark
that the written event has been applied (e.g., call your replay cursor/state API
such as self._replay_cursor.mark_applied(event_id) or update self._local_state
to include the event), ensuring this occurs only after the inner call succeeds
so replay won’t reapply the same event after restart.

Comment thread events/writer.py
Comment on lines +41 to +45
def __init__(self, events_dir: Path, author_email: str) -> None:
self._events_dir = events_dir
self._author = author_email
self._user_dir = events_dir / author_email
self._user_dir.mkdir(parents=True, exist_ok=True)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate author_email before using it as a path component.

author_email is joined directly into the filesystem path here. A value containing separators, .., or an absolute path can escape .bicameral/events/ and also breaks the materializer's */*.json scan.

Proposed fix
 class EventFileWriter:
     """Writes append-only JSON event files to a per-user directory."""
 
     def __init__(self, events_dir: Path, author_email: str) -> None:
+        if (
+            not author_email
+            or author_email in {".", ".."}
+            or any(sep in author_email for sep in ("/", "\\"))
+        ):
+            raise ValueError("author_email must be a single directory name")
         self._events_dir = events_dir
         self._author = author_email
         self._user_dir = events_dir / author_email
         self._user_dir.mkdir(parents=True, exist_ok=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@events/writer.py` around lines 41 - 45, The constructor stores author_email
directly into a filesystem path which lets values with separators, '..', or
absolute paths escape the events dir; update Writer.__init__ to validate and
sanitize author_email before using it: reject or normalize inputs containing
path separators, leading slashes, or components like '..' (e.g., ensure
author_email == Path(author_email).name or apply a safe-encoding), raise
ValueError for invalid inputs, then set self._author to the sanitized value and
build self._user_dir from events_dir / sanitized_author and mkdir as before;
ensure any external callers still get a clear exception on invalid author
strings.

Comment thread setup_wizard.py
Comment thread tests/test_team_events.py
Comment on lines +16 to +17
# Ensure in-memory SurrealDB for all tests in this module
os.environ.setdefault("SURREAL_URL", "memory://")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Force the in-memory DB here instead of using setdefault().

setdefault() leaves any preexisting SURREAL_URL untouched, so this module can unexpectedly run against a persistent ledger and leak state across test runs. Use a test-scoped env override that always sets memory://.

As per coding guidelines, "Set SURREAL_URL environment variable to configure SurrealDB connection; use memory:// for tests (no persistence)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_team_events.py` around lines 16 - 17, The test module currently
uses os.environ.setdefault("SURREAL_URL", "memory://") which can leave an
existing SURREAL_URL and cause tests to hit a persistent DB; replace this with
an unconditional override by assigning os.environ["SURREAL_URL"] = "memory://"
at module import time so the SURREAL_URL is always set to the in-memory
SurrealDB for tests (refer to the SURREAL_URL environment variable and the
tests/test_team_events.py top-level env setup).

jinhongkuan and others added 3 commits April 10, 2026 14:23
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…olation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 488-495: The fenced code block that shows the directory tree
starting with ".bicameral/" is missing a language tag, triggering markdownlint
MD040; update the opening fence from ``` to ```text (or another appropriate
language identifier) for the block that contains ".bicameral/ ├── events/ ..."
so the code fence declares a language and the linter passes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ec3ac789-5221-45c5-88a7-5e1ef72787bf

📥 Commits

Reviewing files that changed from the base of the PR and between b615cc0 and b678516.

📒 Files selected for processing (1)
  • README.md

Comment thread README.md
Comment on lines +488 to +495
```
.bicameral/
├── events/ ← committed to git (shared decisions)
│ ├── pm@co.com/ ← PM's ingested PRDs and transcripts
│ └── dev@co.com/ ← developer's commit syncs
├── config.yaml ← committed (mode: solo | team)
└── local/ ← gitignored (materialized state)
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a language tag to the fenced block (markdownlint MD040).

At Line 488, the code fence should declare a language to satisfy lint rules.

🛠️ Proposed fix
-```
+```text
 .bicameral/
 ├── events/              ← committed to git (shared decisions)
 │   ├── pm@co.com/       ← PM's ingested PRDs and transcripts
 │   └── dev@co.com/      ← developer's commit syncs
 ├── config.yaml          ← committed (mode: solo | team)
 └── local/               ← gitignored (materialized state)
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.0)</summary>

[warning] 488-488: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @README.md around lines 488 - 495, The fenced code block that shows the
directory tree starting with ".bicameral/" is missing a language tag, triggering
markdownlint MD040; update the opening fence from totext (or another
appropriate language identifier) for the block that contains ".bicameral/ ├──
events/ ..." so the code fence declares a language and the linter passes.


</details>

<!-- fingerprinting:phantom:triton:hawk:07d99fa3-2b89-4bda-9949-9f584bdc7bb1 -->

<!-- This is an auto-generated comment by CodeRabbit -->

…arseness

1. DB paths (setup_wizard): team mode now places ledger.db and
   code-graph.db under .bicameral/local/ so they stay gitignored
2. Lazy connect (team_adapter): _ensure_ready() guard on all public
   methods ensures materialization runs even without explicit connect()
3. Watermark (materializer): use >= instead of > to handle multiple
   events in the same second (safe due to upsert idempotency)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jinhongkuan jinhongkuan merged commit 78ac4fc into main Apr 10, 2026
@jinhongkuan jinhongkuan deleted the feat/event-sourced-collaboration branch April 10, 2026 19:04
silongtan added a commit that referenced this pull request Apr 11, 2026
Wire the tech spec §4.5 vocab cache pattern into the ingest pipeline.
Before running the full BM25 + fuzzy grounding pipeline, check if the
ledger already has a similar intent with high-confidence maps_to edges
and reuse its code_regions.

Key design choices:
- Uses existing intent table BM25 index, not the vestigial vocab_cache table
- Repo-isolated via code_region[WHERE repo = $repo] in graph traversal
- Ranked by maps_to confidence (not search::score which returns 0.0 in v2)
- Cached regions validated against live symbol index before acceptance
- Qualified name fallback (PaymentService.process → process)
- File-path disambiguation when multiple symbols share a name

Closes gap #2 from the code locator drift audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 11, 2026
Wire the tech spec §4.5 vocab cache pattern into the ingest pipeline.
Before running the full BM25 + fuzzy grounding pipeline, check if the
ledger already has a similar intent with high-confidence maps_to edges
and reuse its code_regions.

Key design choices:
- Uses existing intent table BM25 index, not the vestigial vocab_cache table
- Repo-isolated via code_region[WHERE repo = $repo] in graph traversal
- Ranked by maps_to confidence (not search::score which returns 0.0 in v2)
- Cached regions validated against live symbol index before acceptance
- Qualified name fallback (PaymentService.process → process)
- File-path disambiguation when multiple symbols share a name

Closes gap #2 from the code locator drift audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 14, 2026
Wire the tech spec §4.5 vocab cache pattern into the ingest pipeline.
Before running the full BM25 + fuzzy grounding pipeline, check if the
ledger already has a similar intent with high-confidence maps_to edges
and reuse its code_regions.

Key design choices:
- Uses existing intent table BM25 index, not the vestigial vocab_cache table
- Repo-isolated via code_region[WHERE repo = $repo] in graph traversal
- Ranked by maps_to confidence (not search::score which returns 0.0 in v2)
- Cached regions validated against live symbol index before acceptance
- Qualified name fallback (PaymentService.process → process)
- File-path disambiguation when multiple symbols share a name

Closes gap #2 from the code locator drift audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 18, 2026
…uzzy

Two fixes from Codex review feedback:

1. P1 #1: Filter weak BM25 scores (< 0.1) when no fuzzy matches exist.
   Prevents false groundings for under-specified queries that pass
   the FC-1 guard but have only noise-level BM25 hits.

2. P1 #2: Stage 2 now runs as enrichment after fuzzy direct lookup,
   not just as fallback. When fuzzy finds symbols from a subset of
   files, file-level retrieval fills remaining budget with additional
   files — preserving multi-file feature discovery.

49/49 tests pass. Eval metrics stable: MRR@5=0.605, Recall=20.0%.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 21, 2026
…, index hardening

1. Split Phase 1 into 1a (DONE, groundwork) and 1b (TODO, actual
   enrichment wiring). Branch currently ships zero recall improvement.

2. Added compare-and-set content_hash guard to resolve_compliance.
   Caller must send the hash of the code it read; server rejects
   verdict if hash no longer matches current content.

3. Added try/except + atomic write (temp+rename) for symbol index
   build. Failure falls back to file-level BM25 gracefully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 25, 2026
…d payload

V1 plan D1 was originally designed around a server-side
search_code(decision.description) to surface relocation candidates.
That approach is obsolete after v0.6.4 nuked search_code and shifted
all code retrieval to the caller LLM. v0.6.4 already implemented the
spirit of D1:
  - ledger/adapter.py:412-420 emits a symbol_disappeared grounding
    check on the authoritative ref when resolve_symbol_lines returns
    None for a tracked region;
  - handlers/link_commit.py:21 verification_instruction tells the
    caller to use Grep/Read + validate_symbols / extract_symbols +
    bicameral.bind;
  - tests/test_link_commit_grounding.py covers the rename-→-grounding
    flow end-to-end.

V1's actual contribution: add original_lines: [start_line, end_line]
to the symbol_disappeared payload so the caller LLM can inspect the
symbol's prior position via `git show <prev_ref>:<file_path>` to
ground its own retrieval. Single-line addition in ledger/adapter.py;
new assertion in the existing regression test.

Strictly informational. No new "call bicameral_bind" CTA — Codex
pass-10 finding #2 stands: V2 must ship bicameral_rebind with
old-binding CAS + a fresh L3 verdict on the new target before any
rebind workflow becomes safe.

Tests: 63 passed in 4.52s — zero regressions across ast_diff,
b2_cosmetic_hint, sync_middleware, bind, link_commit_grounding,
phase3 integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 25, 2026
…trix

New tests/test_desync_scenarios.py — one test per scenario from the
Notion "Auto-Grounding Problem" catalog (Notion ID
3332a51619c4813caccec86c36d9bf98), routed through the real handler
layer per the Apr 8 PR #84 lesson (tests bypassing handlers miss
post-ingest hooks). Self-contained fixture builds a tmp git repo on
``main`` plus a memory ledger per test, so scenarios are independent
of each other and of the bicameral-mcp checkout.

Scorecard: 12 PASS, 1 XFAIL — meets V1 plan acceptance gate (>=12/13).

Pass:
  1. New decision, matching code exists           → ungrounded → caller binds
  2. Code changed after grounded                   → pending + pending_compliance_check
  3. Code deleted after grounded                   → symbol_disappeared
  4. Symbol renamed in file                        → symbol_disappeared with original_lines (V1 D1)
  5. Symbol moved to different file                → symbol_disappeared
  6. Code added later, decision becomes resolvable → caller binds explicitly
  7. Cold start: no matching code in repo          → stays ungrounded
  9. Intent description supersession               → re-ingest succeeds
 10. Multiple intents share a symbol               → both surface in drift response
 11. No server-side BM25 grounding (post-v0.6.0)   → stays ungrounded — caller-LLM-driven
 12. Line-shift edit does not trigger drift        → resolve_symbol_lines re-resolves
 13. [Open Question] prefix                        → ingested as gap-class decision

XFAIL:
  8. Drifted intent → atomic re-ground             → requires V2 bicameral_rebind
                                                     with old-binding CAS + fresh
                                                     L3 verdict on new target
                                                     (design doc 8 D2; Codex pass-10 #2)

Scenario 2 explicitly asserts ``pending`` (not ``drifted``) per
post-v0.5.0 derive_status semantics: a hash diff WITHOUT a cached
``compliant`` verdict yields ``pending``. ``drifted`` requires a
caller-LLM verdict via the verdict cache (V2 territory).

Incidental bug surfaced by F1 and fixed in ledger/adapter.py:
``pending_grounding_checks`` for ungrounded decisions read
``d.get("id", "")`` from ``get_all_decisions(filter="ungrounded")``,
but that query aliases the field to ``decision_id``. Result: every
ungrounded grounding-check entry shipped with ``decision_id=""``,
leaving callers no handle to bind against. The existing
``test_pending_grounding_checks_for_ungrounded_decisions`` regression
didn't catch it because it only asserted ``len > 0`` on the list, not
non-empty IDs. Fix: read ``decision_id`` first, fall back to ``id``
for forward compatibility. Real correctness bug, not a test artifact.

Tests: 76 passed, 1 xfailed in 7.35s across test_desync_scenarios,
test_b2_cosmetic_hint, test_ast_diff, test_phase3_integration,
test_sync_middleware, test_bind, test_link_commit_grounding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 25, 2026
…don't get bind CTA

Codex pass-12 finding #2: D1 added original_lines to the
symbol_disappeared payload (richer relocation context) but left the
v0.6.4 verification_instruction text directing callers to use
bicameral.bind for ALL pending_grounding_checks. For
reason='ungrounded' that CTA is correct (no prior binding to retire,
no duplicate-binding risk). For reason='symbol_disappeared' it is
unsafe: bind on the new location leaves the old edge live under the
N:N binds_to relation, producing duplicate-binding state. Atomic
rebind that retires the stale edge in the same write ships in V2
(design doc 8 D2; Codex pass-10 finding #2).

Net effect of D1 was that we added more relocation context routed
through an unsafe CTA. This commit removes the unsafe CTA for
relocation cases without reducing the safe CTA for ungrounded cases.

Changes in handlers/link_commit.py:
  - _VERIFICATION_INSTRUCTION (single string) split into
    _VERIFICATION_INSTRUCTION_BASE +
    _GROUNDING_INSTRUCTION_UNGROUNDED +
    _GROUNDING_INSTRUCTION_RELOCATION.
  - New _build_verification_instruction(pending_compliance,
    pending_grounding) composer assembles the response text per call
    based on which reason values actually fired. Symbol-disappeared
    cases get an explicit "INFORMATIONAL ONLY — do NOT call
    bicameral.bind" warning citing the duplicate-binding hazard.
  - Response wiring at the bottom of handle_link_commit swapped from
    the constant lookup to the composer call.

Tests (tests/test_link_commit_grounding.py):
  - test_pending_grounding_checks_for_ungrounded_decisions now
    asserts "bicameral.bind" IN the verification_instruction AND
    "INFORMATIONAL ONLY" NOT in it.
  - test_pending_grounding_checks_symbol_not_found now asserts
    "INFORMATIONAL ONLY" IN the instruction AND an explicit
    bind-prohibition phrase.

Net destructive surface change for V1: arguably negative — V1 actively
*removes* a v0.6.4 unsafe CTA for relocation, while introducing zero
new mutating capabilities.

Tests: 75 passed, 1 xfailed in 7.11s (xfail is scenario 8, V2 atomic
rebind — by design).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 25, 2026
All four findings verified against current code; only the actionable
ones applied. 81 passed + 1 xfailed in 9.02s.

#1 — skills/bicameral-preflight/SKILL.md sync_metrics note
  The .claude/skills copy got the sync_metrics observability note
  back when V1 A3 shipped, but the canonical skills/ copy never
  did. Mirror the wording verbatim near step 2 so the rendering
  guidance and response-field documentation stay in sync.

#2 — handlers/detect_drift.py per-entry alignment
  The cosmetic-hint enrichment was slicing both head_full and
  wt_full using entry.lines (the baseline anchor). HEAD and the
  working tree can shift the symbol independently, so a single
  index range can't align both sides. The narrow consequence: a
  drifted entry with shifted lines could yield a misleading
  cosmetic_hint=true on bytes that aren't the bound region.

  Fix: re-resolve the symbol against each ref via
  resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD")
  and ref="working_tree" separately, slice each ref using its own
  resolved range. Resolution failure on either side → safe default
  of cosmetic_hint=False (matches the V1 contract: "False is cheap,
  True must be earned"). Empty symbol → skip (new fail-safe path).

  Test refactor: test_invalid_lines_skipped renamed to
  test_unresolvable_symbol_skipped — the old test asserted that
  lines=(0,0) was the failsafe trigger, but entry.lines is no
  longer the alignment input. New test exercises the
  resolve_symbol_lines-returns-None path via a nonexistent symbol
  name, which is the real fail-safe gate now.

#3 — V2 guide TOC anchor for §9
  GitHub auto-generates fragment IDs from heading text by
  lowercasing, replacing spaces with hyphens, and dropping
  punctuation. "## 9. Acceptance criteria for V2" maps to
  #9-acceptance-criteria-for-v2, but the TOC pointed at
  #9-acceptance-criteria (truncated). Link broken.
  Updated to the correct fragment.

#4 — V2 guide unlabeled fenced code blocks (markdownlint MD040)
  Six fenced opens used bare ``` instead of a labeled fence.
  Tagged each with ```text — the contents are commit listings,
  ASCII DAG diagrams, pseudocode protocols, and tuple notation,
  none of which fit a real language tag. The other fenced blocks
  in the guide (already tagged ```sql / ```python) are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
…d payload

V1 plan D1 was originally designed around a server-side
search_code(decision.description) to surface relocation candidates.
That approach is obsolete after v0.6.4 nuked search_code and shifted
all code retrieval to the caller LLM. v0.6.4 already implemented the
spirit of D1:
  - ledger/adapter.py:412-420 emits a symbol_disappeared grounding
    check on the authoritative ref when resolve_symbol_lines returns
    None for a tracked region;
  - handlers/link_commit.py:21 verification_instruction tells the
    caller to use Grep/Read + validate_symbols / extract_symbols +
    bicameral.bind;
  - tests/test_link_commit_grounding.py covers the rename-→-grounding
    flow end-to-end.

V1's actual contribution: add original_lines: [start_line, end_line]
to the symbol_disappeared payload so the caller LLM can inspect the
symbol's prior position via `git show <prev_ref>:<file_path>` to
ground its own retrieval. Single-line addition in ledger/adapter.py;
new assertion in the existing regression test.

Strictly informational. No new "call bicameral_bind" CTA — Codex
pass-10 finding #2 stands: V2 must ship bicameral_rebind with
old-binding CAS + a fresh L3 verdict on the new target before any
rebind workflow becomes safe.

Tests: 63 passed in 4.52s — zero regressions across ast_diff,
b2_cosmetic_hint, sync_middleware, bind, link_commit_grounding,
phase3 integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
…trix

New tests/test_desync_scenarios.py — one test per scenario from the
Notion "Auto-Grounding Problem" catalog (Notion ID
3332a51619c4813caccec86c36d9bf98), routed through the real handler
layer per the Apr 8 PR #84 lesson (tests bypassing handlers miss
post-ingest hooks). Self-contained fixture builds a tmp git repo on
``main`` plus a memory ledger per test, so scenarios are independent
of each other and of the bicameral-mcp checkout.

Scorecard: 12 PASS, 1 XFAIL — meets V1 plan acceptance gate (>=12/13).

Pass:
  1. New decision, matching code exists           → ungrounded → caller binds
  2. Code changed after grounded                   → pending + pending_compliance_check
  3. Code deleted after grounded                   → symbol_disappeared
  4. Symbol renamed in file                        → symbol_disappeared with original_lines (V1 D1)
  5. Symbol moved to different file                → symbol_disappeared
  6. Code added later, decision becomes resolvable → caller binds explicitly
  7. Cold start: no matching code in repo          → stays ungrounded
  9. Intent description supersession               → re-ingest succeeds
 10. Multiple intents share a symbol               → both surface in drift response
 11. No server-side BM25 grounding (post-v0.6.0)   → stays ungrounded — caller-LLM-driven
 12. Line-shift edit does not trigger drift        → resolve_symbol_lines re-resolves
 13. [Open Question] prefix                        → ingested as gap-class decision

XFAIL:
  8. Drifted intent → atomic re-ground             → requires V2 bicameral_rebind
                                                     with old-binding CAS + fresh
                                                     L3 verdict on new target
                                                     (design doc 8 D2; Codex pass-10 #2)

Scenario 2 explicitly asserts ``pending`` (not ``drifted``) per
post-v0.5.0 derive_status semantics: a hash diff WITHOUT a cached
``compliant`` verdict yields ``pending``. ``drifted`` requires a
caller-LLM verdict via the verdict cache (V2 territory).

Incidental bug surfaced by F1 and fixed in ledger/adapter.py:
``pending_grounding_checks`` for ungrounded decisions read
``d.get("id", "")`` from ``get_all_decisions(filter="ungrounded")``,
but that query aliases the field to ``decision_id``. Result: every
ungrounded grounding-check entry shipped with ``decision_id=""``,
leaving callers no handle to bind against. The existing
``test_pending_grounding_checks_for_ungrounded_decisions`` regression
didn't catch it because it only asserted ``len > 0`` on the list, not
non-empty IDs. Fix: read ``decision_id`` first, fall back to ``id``
for forward compatibility. Real correctness bug, not a test artifact.

Tests: 76 passed, 1 xfailed in 7.35s across test_desync_scenarios,
test_b2_cosmetic_hint, test_ast_diff, test_phase3_integration,
test_sync_middleware, test_bind, test_link_commit_grounding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
…don't get bind CTA

Codex pass-12 finding #2: D1 added original_lines to the
symbol_disappeared payload (richer relocation context) but left the
v0.6.4 verification_instruction text directing callers to use
bicameral.bind for ALL pending_grounding_checks. For
reason='ungrounded' that CTA is correct (no prior binding to retire,
no duplicate-binding risk). For reason='symbol_disappeared' it is
unsafe: bind on the new location leaves the old edge live under the
N:N binds_to relation, producing duplicate-binding state. Atomic
rebind that retires the stale edge in the same write ships in V2
(design doc 8 D2; Codex pass-10 finding #2).

Net effect of D1 was that we added more relocation context routed
through an unsafe CTA. This commit removes the unsafe CTA for
relocation cases without reducing the safe CTA for ungrounded cases.

Changes in handlers/link_commit.py:
  - _VERIFICATION_INSTRUCTION (single string) split into
    _VERIFICATION_INSTRUCTION_BASE +
    _GROUNDING_INSTRUCTION_UNGROUNDED +
    _GROUNDING_INSTRUCTION_RELOCATION.
  - New _build_verification_instruction(pending_compliance,
    pending_grounding) composer assembles the response text per call
    based on which reason values actually fired. Symbol-disappeared
    cases get an explicit "INFORMATIONAL ONLY — do NOT call
    bicameral.bind" warning citing the duplicate-binding hazard.
  - Response wiring at the bottom of handle_link_commit swapped from
    the constant lookup to the composer call.

Tests (tests/test_link_commit_grounding.py):
  - test_pending_grounding_checks_for_ungrounded_decisions now
    asserts "bicameral.bind" IN the verification_instruction AND
    "INFORMATIONAL ONLY" NOT in it.
  - test_pending_grounding_checks_symbol_not_found now asserts
    "INFORMATIONAL ONLY" IN the instruction AND an explicit
    bind-prohibition phrase.

Net destructive surface change for V1: arguably negative — V1 actively
*removes* a v0.6.4 unsafe CTA for relocation, while introducing zero
new mutating capabilities.

Tests: 75 passed, 1 xfailed in 7.11s (xfail is scenario 8, V2 atomic
rebind — by design).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
All four findings verified against current code; only the actionable
ones applied. 81 passed + 1 xfailed in 9.02s.

#1 — skills/bicameral-preflight/SKILL.md sync_metrics note
  The .claude/skills copy got the sync_metrics observability note
  back when V1 A3 shipped, but the canonical skills/ copy never
  did. Mirror the wording verbatim near step 2 so the rendering
  guidance and response-field documentation stay in sync.

#2 — handlers/detect_drift.py per-entry alignment
  The cosmetic-hint enrichment was slicing both head_full and
  wt_full using entry.lines (the baseline anchor). HEAD and the
  working tree can shift the symbol independently, so a single
  index range can't align both sides. The narrow consequence: a
  drifted entry with shifted lines could yield a misleading
  cosmetic_hint=true on bytes that aren't the bound region.

  Fix: re-resolve the symbol against each ref via
  resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD")
  and ref="working_tree" separately, slice each ref using its own
  resolved range. Resolution failure on either side → safe default
  of cosmetic_hint=False (matches the V1 contract: "False is cheap,
  True must be earned"). Empty symbol → skip (new fail-safe path).

  Test refactor: test_invalid_lines_skipped renamed to
  test_unresolvable_symbol_skipped — the old test asserted that
  lines=(0,0) was the failsafe trigger, but entry.lines is no
  longer the alignment input. New test exercises the
  resolve_symbol_lines-returns-None path via a nonexistent symbol
  name, which is the real fail-safe gate now.

#3 — V2 guide TOC anchor for §9
  GitHub auto-generates fragment IDs from heading text by
  lowercasing, replacing spaces with hyphens, and dropping
  punctuation. "## 9. Acceptance criteria for V2" maps to
  #9-acceptance-criteria-for-v2, but the TOC pointed at
  #9-acceptance-criteria (truncated). Link broken.
  Updated to the correct fragment.

#4 — V2 guide unlabeled fenced code blocks (markdownlint MD040)
  Six fenced opens used bare ``` instead of a labeled fence.
  Tagged each with ```text — the contents are commit listings,
  ASCII DAG diagrams, pseudocode protocols, and tuple notation,
  none of which fit a real language tag. The other fenced blocks
  in the guide (already tagged ```sql / ```python) are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
…d payload

V1 plan D1 was originally designed around a server-side
search_code(decision.description) to surface relocation candidates.
That approach is obsolete after v0.6.4 nuked search_code and shifted
all code retrieval to the caller LLM. v0.6.4 already implemented the
spirit of D1:
  - ledger/adapter.py:412-420 emits a symbol_disappeared grounding
    check on the authoritative ref when resolve_symbol_lines returns
    None for a tracked region;
  - handlers/link_commit.py:21 verification_instruction tells the
    caller to use Grep/Read + validate_symbols / extract_symbols +
    bicameral.bind;
  - tests/test_link_commit_grounding.py covers the rename-→-grounding
    flow end-to-end.

V1's actual contribution: add original_lines: [start_line, end_line]
to the symbol_disappeared payload so the caller LLM can inspect the
symbol's prior position via `git show <prev_ref>:<file_path>` to
ground its own retrieval. Single-line addition in ledger/adapter.py;
new assertion in the existing regression test.

Strictly informational. No new "call bicameral_bind" CTA — Codex
pass-10 finding #2 stands: V2 must ship bicameral_rebind with
old-binding CAS + a fresh L3 verdict on the new target before any
rebind workflow becomes safe.

Tests: 63 passed in 4.52s — zero regressions across ast_diff,
b2_cosmetic_hint, sync_middleware, bind, link_commit_grounding,
phase3 integration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
…trix

New tests/test_desync_scenarios.py — one test per scenario from the
Notion "Auto-Grounding Problem" catalog (Notion ID
3332a51619c4813caccec86c36d9bf98), routed through the real handler
layer per the Apr 8 PR #84 lesson (tests bypassing handlers miss
post-ingest hooks). Self-contained fixture builds a tmp git repo on
``main`` plus a memory ledger per test, so scenarios are independent
of each other and of the bicameral-mcp checkout.

Scorecard: 12 PASS, 1 XFAIL — meets V1 plan acceptance gate (>=12/13).

Pass:
  1. New decision, matching code exists           → ungrounded → caller binds
  2. Code changed after grounded                   → pending + pending_compliance_check
  3. Code deleted after grounded                   → symbol_disappeared
  4. Symbol renamed in file                        → symbol_disappeared with original_lines (V1 D1)
  5. Symbol moved to different file                → symbol_disappeared
  6. Code added later, decision becomes resolvable → caller binds explicitly
  7. Cold start: no matching code in repo          → stays ungrounded
  9. Intent description supersession               → re-ingest succeeds
 10. Multiple intents share a symbol               → both surface in drift response
 11. No server-side BM25 grounding (post-v0.6.0)   → stays ungrounded — caller-LLM-driven
 12. Line-shift edit does not trigger drift        → resolve_symbol_lines re-resolves
 13. [Open Question] prefix                        → ingested as gap-class decision

XFAIL:
  8. Drifted intent → atomic re-ground             → requires V2 bicameral_rebind
                                                     with old-binding CAS + fresh
                                                     L3 verdict on new target
                                                     (design doc 8 D2; Codex pass-10 #2)

Scenario 2 explicitly asserts ``pending`` (not ``drifted``) per
post-v0.5.0 derive_status semantics: a hash diff WITHOUT a cached
``compliant`` verdict yields ``pending``. ``drifted`` requires a
caller-LLM verdict via the verdict cache (V2 territory).

Incidental bug surfaced by F1 and fixed in ledger/adapter.py:
``pending_grounding_checks`` for ungrounded decisions read
``d.get("id", "")`` from ``get_all_decisions(filter="ungrounded")``,
but that query aliases the field to ``decision_id``. Result: every
ungrounded grounding-check entry shipped with ``decision_id=""``,
leaving callers no handle to bind against. The existing
``test_pending_grounding_checks_for_ungrounded_decisions`` regression
didn't catch it because it only asserted ``len > 0`` on the list, not
non-empty IDs. Fix: read ``decision_id`` first, fall back to ``id``
for forward compatibility. Real correctness bug, not a test artifact.

Tests: 76 passed, 1 xfailed in 7.35s across test_desync_scenarios,
test_b2_cosmetic_hint, test_ast_diff, test_phase3_integration,
test_sync_middleware, test_bind, test_link_commit_grounding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
…don't get bind CTA

Codex pass-12 finding #2: D1 added original_lines to the
symbol_disappeared payload (richer relocation context) but left the
v0.6.4 verification_instruction text directing callers to use
bicameral.bind for ALL pending_grounding_checks. For
reason='ungrounded' that CTA is correct (no prior binding to retire,
no duplicate-binding risk). For reason='symbol_disappeared' it is
unsafe: bind on the new location leaves the old edge live under the
N:N binds_to relation, producing duplicate-binding state. Atomic
rebind that retires the stale edge in the same write ships in V2
(design doc 8 D2; Codex pass-10 finding #2).

Net effect of D1 was that we added more relocation context routed
through an unsafe CTA. This commit removes the unsafe CTA for
relocation cases without reducing the safe CTA for ungrounded cases.

Changes in handlers/link_commit.py:
  - _VERIFICATION_INSTRUCTION (single string) split into
    _VERIFICATION_INSTRUCTION_BASE +
    _GROUNDING_INSTRUCTION_UNGROUNDED +
    _GROUNDING_INSTRUCTION_RELOCATION.
  - New _build_verification_instruction(pending_compliance,
    pending_grounding) composer assembles the response text per call
    based on which reason values actually fired. Symbol-disappeared
    cases get an explicit "INFORMATIONAL ONLY — do NOT call
    bicameral.bind" warning citing the duplicate-binding hazard.
  - Response wiring at the bottom of handle_link_commit swapped from
    the constant lookup to the composer call.

Tests (tests/test_link_commit_grounding.py):
  - test_pending_grounding_checks_for_ungrounded_decisions now
    asserts "bicameral.bind" IN the verification_instruction AND
    "INFORMATIONAL ONLY" NOT in it.
  - test_pending_grounding_checks_symbol_not_found now asserts
    "INFORMATIONAL ONLY" IN the instruction AND an explicit
    bind-prohibition phrase.

Net destructive surface change for V1: arguably negative — V1 actively
*removes* a v0.6.4 unsafe CTA for relocation, while introducing zero
new mutating capabilities.

Tests: 75 passed, 1 xfailed in 7.11s (xfail is scenario 8, V2 atomic
rebind — by design).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
All four findings verified against current code; only the actionable
ones applied. 81 passed + 1 xfailed in 9.02s.

#1 — skills/bicameral-preflight/SKILL.md sync_metrics note
  The .claude/skills copy got the sync_metrics observability note
  back when V1 A3 shipped, but the canonical skills/ copy never
  did. Mirror the wording verbatim near step 2 so the rendering
  guidance and response-field documentation stay in sync.

#2 — handlers/detect_drift.py per-entry alignment
  The cosmetic-hint enrichment was slicing both head_full and
  wt_full using entry.lines (the baseline anchor). HEAD and the
  working tree can shift the symbol independently, so a single
  index range can't align both sides. The narrow consequence: a
  drifted entry with shifted lines could yield a misleading
  cosmetic_hint=true on bytes that aren't the bound region.

  Fix: re-resolve the symbol against each ref via
  resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD")
  and ref="working_tree" separately, slice each ref using its own
  resolved range. Resolution failure on either side → safe default
  of cosmetic_hint=False (matches the V1 contract: "False is cheap,
  True must be earned"). Empty symbol → skip (new fail-safe path).

  Test refactor: test_invalid_lines_skipped renamed to
  test_unresolvable_symbol_skipped — the old test asserted that
  lines=(0,0) was the failsafe trigger, but entry.lines is no
  longer the alignment input. New test exercises the
  resolve_symbol_lines-returns-None path via a nonexistent symbol
  name, which is the real fail-safe gate now.

#3 — V2 guide TOC anchor for §9
  GitHub auto-generates fragment IDs from heading text by
  lowercasing, replacing spaces with hyphens, and dropping
  punctuation. "## 9. Acceptance criteria for V2" maps to
  #9-acceptance-criteria-for-v2, but the TOC pointed at
  #9-acceptance-criteria (truncated). Link broken.
  Updated to the correct fragment.

#4 — V2 guide unlabeled fenced code blocks (markdownlint MD040)
  Six fenced opens used bare ``` instead of a labeled fence.
  Tagged each with ```text — the contents are commit listings,
  ASCII DAG diagrams, pseudocode protocols, and tuple notation,
  none of which fit a real language tag. The other fenced blocks
  in the guide (already tagged ```sql / ```python) are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin added a commit that referenced this pull request Apr 28, 2026
Per Jin's "L1/L2: Claim/Identity" spec-governance proposal §4.2 +
Failure Mode #2: only decisions explicitly tagged ``decision_level =
"L2"`` enter the codegenome identity graph. L1 decisions (behavioral
commitments evaluated against evidence) are intentionally ungrounded
at the identity layer. L3 is never tracked. ``NULL`` (unclassified)
is treated as L3 by the tolerant policy — preserves backward-compat
for existing ingest payloads.

Without this guard, binding an L1 decision pollutes the identity
graph with subsystem-level matches that erode Phase-3 continuity
precision.

Changes:
  - Added ``ledger.queries.get_decision_level`` + adapter wrapper.
  - Bind handler reads decision_level before invoking the codegenome
    side-effect; skips for non-L2 levels (handler-level option A from
    the doc's open Q3).
  - 5 new tests in tests/test_codegenome_l1_exemption.py covering
    L2 (writes), L1 (skip), L3 (skip), NULL (skip), and the
    BindResponse-shape-unchanged invariant.
  - Updated tests/test_codegenome_bind_integration.py _seed_decision
    helper to default decision_level="L2" so existing tests exercise
    the codegenome write path.

Tests: 63/63 codegenome tests pass (was 49 + 5 L1 + 9 housekeeping).
No regressions on existing bind tests.

🤖 Authored via [QorLogic SDLC](https://github.com/MythologIQ-Labs-LLC/qor-logic) on [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan added a commit that referenced this pull request Apr 28, 2026
* feat: CodeGenome Phase 1+2 (#59) — QOR-process build

Implements upstream issue #59: adapter boundary + bind-time identity records.
All flags default off; zero existing-behavior change unless
BICAMERAL_CODEGENOME_ENABLED=1 and BICAMERAL_CODEGENOME_WRITE_IDENTITY_RECORDS=1.

Phase 1 — adapter skeleton:
  - codegenome/{adapter,contracts,confidence,config,__init__}.py
  - 4 ABC methods (compute_identity is the only one implemented in #59)
  - 3 issue-mandated Pydantic models
  - noisy_or, weighted_average + DEFAULT_CONFIDENCE_WEIGHTS
  - CodeGenomeConfig.from_env() reads BICAMERAL_CODEGENOME_*

Phase 2 — identity record write path:
  - SCHEMA_VERSION 10 → 11 (additive _migrate_v10_to_v11)
  - Tables: code_subject, subject_identity, subject_version
  - Edges: has_identity, has_version, about (decision→subject)
  - 5 ledger queries (upsert_code_subject, upsert_subject_identity,
    relate_has_identity, link_decision_to_subject,
    find_subject_identities_for_decision)
  - codegenome/deterministic_adapter.py (compute_identity for
    deterministic_location_v1)
  - codegenome/bind_service.py (write_codegenome_identity, side-effect only)
  - adapters/codegenome.py (get_codegenome() factory parallels
    get_ledger / get_code_locator / get_drift_analyzer)
  - context.py wires codegenome + codegenome_config into BicameralContext
  - handlers/bind.py hooks identity write after bind_decision();
    failure caught and logged — bind contract unchanged

QOR governance artifacts (this PR was built via /qor-bootstrap →
/qor-plan → /qor-audit → /qor-implement workflow):
  - docs/CONCEPT.md, ARCHITECTURE_PLAN.md (genesis hash 29dfd085)
  - docs/META_LEDGER.md (5-entry chain: GENESIS → PLAN → VETO → PASS → IMPLEMENT)
  - docs/BACKLOG.md, SHADOW_GENOME.md
  - plan-codegenome-phase-1-2.md
  - .agent/staging/ (gitignored — audit reports)

Tests: 49/49 codegenome unit + integration tests pass.
Section 4 razor: all new functions ≤ 40 lines (one mid-build
violation in write_codegenome_identity caught and refactored into
helpers per qor-implement Step 9).
Regression: 0 — full suite passes 254/335 vs baseline 250/335; the
4 delta is exactly the codegenome integration tests. The 81
pre-existing failures on upstream/main are filed as
#67-#70.

Scope vs #59: all mandated deliverables present. Two justified
deviations documented in plan-codegenome-phase-1-2.md and
docs/META_LEDGER.md Entry #5:
  - One extra edge (`about` decision→code_subject) required by
    find_subject_identities_for_decision's two-hop graph walk per
    the issue's exit criterion.
  - content_hash uses sha256-with-whitespace-normalization
    (ledger.status.hash_lines) instead of literal blake2b(body_text)
    to satisfy the exit criterion "subject_identity.content_hash
    matches code_region.content_hash at bind time".

Schema compat-map entry for v11 ships as "0.11.0" placeholder;
release-eng pins the final value at PR merge.

🤖 Authored via [QorLogic SDLC](https://github.com/MythologIQ-Labs-LLC/qor-logic) on [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(#71): address CodeRabbit review findings on PR #71

Six CodeRabbit comments resolved on the CodeGenome Phase 1+2 PR.

CRITICAL (ledger/queries.py)
  - Add `_validated_record_id` helper enforcing canonical `table:id`
    SurrealDB record-id shape. Apply at every interpolation site for
    new CodeGenome queries: `relate_has_identity` (validates
    `code_subject_id` and `subject_identity_id`),
    `link_decision_to_subject` (validates `decision_id` and
    `code_subject_id`), `find_subject_identities_for_decision`
    (validates `decision_id`). Closes the f-string injection vector
    for caller-supplied IDs reaching SurrealQL string-build.

MAJOR (ledger/queries.py)
  - `upsert_subject_identity` is now race-safe under concurrent
    writers. Previous SELECT-then-CREATE could lose the race on the
    UNIQUE(address) index; new code wraps CREATE in try/except,
    detects "already contains" from the dedup-edge helper convention,
    and re-SELECTs to return the winning row's id.

MINOR (codegenome/bind_service.py)
  - `write_codegenome_identity` now honors the `SubjectIdentity | None`
    return contract: returns `None` when `_persist_subject_and_identity`
    reports persistence did not complete (empty IDs from upserts),
    `identity` only on full success.

MINOR (codegenome/confidence.py)
  - Replace Unicode minus (U+2212) with ASCII hyphen-minus in the
    `noisy_or` docstring formula. Closes RUF002 lint warning.

MINOR (docs/ARCHITECTURE_PLAN.md)
  - Add `text` language identifier to two unlabeled fenced code blocks
    (file tree, data flow). Closes MD040 markdownlint warnings.

MINOR (docs/QOR_VS_ADHOC_COMPARISON.md)
  - Replace duplicated phrase "audit auditability" with "auditability"
    in the verdict sentence.

NIT (tests/test_codegenome_adapter.py)
  - `test_subject_identity_is_frozen` now uses
    `pytest.raises(dataclasses.FrozenInstanceError)` instead of the
    overly broad `pytest.raises(Exception)`. Asserts the specific
    frozen-dataclass behavior, not any exception.

All 49 codegenome tests still pass. Section 4 razor still clean.
The 3 ingest-flow CI failures (test_phase2_ledger.py +
test_phase3_integration.py) remain unaddressed by this commit —
diagnosing whether they are pre-existing on upstream/main is the
next step.

🤖 Authored via [QorLogic SDLC](https://github.com/MythologIQ-Labs-LLC/qor-logic) on [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: add EPHEMERAL_AUTHORITATIVE.md

Explains the ephemeral/authoritative branch-aware compliance verdict
design: ephemeral flag semantics, promotion lifecycle, branch-delta
sweep, flow_id coupling, 17-scenario matrix, and key invariants.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: rename EPHEMERAL_AUTHORITATIVE.md to kebab case

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(#59): L1 exemption guard for codegenome identity writes

Per Jin's "L1/L2: Claim/Identity" spec-governance proposal §4.2 +
Failure Mode #2: only decisions explicitly tagged ``decision_level =
"L2"`` enter the codegenome identity graph. L1 decisions (behavioral
commitments evaluated against evidence) are intentionally ungrounded
at the identity layer. L3 is never tracked. ``NULL`` (unclassified)
is treated as L3 by the tolerant policy — preserves backward-compat
for existing ingest payloads.

Without this guard, binding an L1 decision pollutes the identity
graph with subsystem-level matches that erode Phase-3 continuity
precision.

Changes:
  - Added ``ledger.queries.get_decision_level`` + adapter wrapper.
  - Bind handler reads decision_level before invoking the codegenome
    side-effect; skips for non-L2 levels (handler-level option A from
    the doc's open Q3).
  - 5 new tests in tests/test_codegenome_l1_exemption.py covering
    L2 (writes), L1 (skip), L3 (skip), NULL (skip), and the
    BindResponse-shape-unchanged invariant.
  - Updated tests/test_codegenome_bind_integration.py _seed_decision
    helper to default decision_level="L2" so existing tests exercise
    the codegenome write path.

Tests: 63/63 codegenome tests pass (was 49 + 5 L1 + 9 housekeeping).
No regressions on existing bind tests.

🤖 Authored via [QorLogic SDLC](https://github.com/MythologIQ-Labs-LLC/qor-logic) on [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(#59): spec-governance feedback — L1/L2 claim/identity model

Written response to Jin's Notion spec-governance proposal. Covers:

- Q1 (claim_evaluator schema): defer until first concrete evaluator
  exists to model. Adding the table now bakes in one shape and forces
  another schema migration we'll likely revisit.
- Q2 (NULL decision_level policy): tolerant — treat as L3 (skip
  codegenome write), but invest in classification surfacing
  (dashboard badge + bulk-classify utility) so it doesn't stay NULL
  forever.
- Q3 (where to put the L1 exemption guard): handler-level, not
  ledger-internal. Keeps ledger/ as pure CRUD, makes the rule visible
  at the call site, simplifies test mocks.

Also enumerates open follow-ups (§6) — bug fixes already filed
(#67–70, #72, #74) plus three new docs/feature items to file
separately when ready.

No code changes; pure documentation.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: jinhongkuan <kuanjh123@gmail.com>
jinhongkuan added a commit that referenced this pull request Apr 30, 2026
…iation seal

Reality matches Promise. Three changes (2 repo files + 2 deferred external
gh actions) land per Entry #24 audit blueprint 1:1; 0 new tests (acknowledged
advisory — manual verification mitigates); Section 4 razor clean.

Audit verdict: PASS, L1 (Entry #24 chain hash 1de1fac7).
Implementation: Entry #25 chain hash 51c8a45c.
Merkle seal: efd0304b2f0e0b3ca28aa4620c2b8ea2eda5ab9e2828ca852ab9f3c5adda6eb5

Architectural decision recorded: bicameral-mcp#135's auto-resolve direction
abandoned (no caller LLM in hook context, MCP sampling not viable in Claude
Code's main chat). Resolution path = dashboard tooltip → /bicameral-sync.
The tooltip surfaces the pending state; the human in their session is the
qualified judge.

Plan addition tracking (Entry #24 preconditions, final state):
  ✅ #2 — SKILL.md tooltip note (delivered in IMPL, sealed here)
  🟡 #1 — PR description manual verification step (composed in /qor-document)
  🟡 #3#135 close comment README/docs deferral (composed in /qor-document)

Surfaced for follow-up (not blocking):
  bicameral-mcp#125 scope should be widened — 7 skills under
  pilot/mcp/.claude/skills/ are absent from the canonical pilot/mcp/skills/
  location claimed by pilot/mcp/CLAUDE.md.

Spec correction queued (post-merge gh action):
  bicameral#108 Flow 1 step 3 claims IngestResponse.supersession_candidates
  exists when it does not; collision detection lives caller-side via
  bicameral-context-sentry skill, surfaces via
  bicameral.preflight.unresolved_collisions.

Capability shortfalls (carried, no regression vs Entry #23): qor/scripts/
runtime helpers absent (gate artifacts not written), tools/reliability/
validators absent (Steps 4.6–4.8 skipped), agent-teams not declared,
codex-plugin not declared (solo audit/seal), intent_lock capture skipped.

Refs #135.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan pushed a commit that referenced this pull request May 2, 2026
Cherry-picked from 1f54f1a, scope-narrowed to the surgical contribution.
The original commit was authored against an older base where the e2e
harness scaffold did not yet exist; this rebased version adds only the
new logic on top of dev's existing harness.

What this commit adds:

- `tests/e2e/_ledger_helpers.py` — pure helper
  `count_agent_session_decisions(snapshot)`, extracted so unit tests can
  import without triggering the harness's top-level env-var / CLI guards.

- `tests/e2e/run_e2e_flows.py`:
  - `_count_agent_session_decisions(snapshot)` — thin wrapper around the
    helper that hides the import inside the harness.
  - `_validate_flow4_via_ledger()` — path-X-(b) post-hoc ledger query.
    Snapshots the ledger after the harness completes and counts decisions
    with `source_type='agent_session'`. Asserter FAIL + ledger has
    agent_session → UPGRADE to PASS with explicit annotation. Ledger
    error → INCONCLUSIVE (verdict unchanged). All five behavior-matrix
    cases documented in the docstring.
  - Invocation site: called once after `_validate_flow3_via_ledger` in
    `main()`, only when `dev_session` ran.

- `tests/test_flow4_ledger_validation.py` — five unit tests against the
  helper covering: zero rows, error snapshot (None), agent_session
  presence, mixed source types, and empty decisions list.

Why this is decoupled from agent caprice: in-stream Flow 4 evidence
requires the agent to invoke `bicameral.preflight` and walk Step 3.5 to
trigger capture-corrections. Path-X-(b) validates the *product outcome*
(decisions written with the canonical source_type) rather than the
*mechanism* (which tool the agent chose). This means a SessionEnd
subprocess effect that lands in the ledger after the parent stream-json
closes still upgrades the verdict, even when the in-stream signal is
absent.

Closes research-brief recommendation P0 #2.

Note: this commit replaces the original 1f54f1a SHA on the branch via
rebase. Governance/META_LEDGER edits and the planning artifacts that
were bundled with the original have been dropped here and will land via
a separate governance PR. The auto-fire UserPromptSubmit hook (#146 fix)
that was also bundled is shipping via #155.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan pushed a commit that referenced this pull request May 2, 2026
Cherry-picked from 1f54f1a, scope-narrowed to the surgical contribution.
The original commit was authored against an older base where the e2e
harness scaffold did not yet exist; this rebased version adds only the
new logic on top of dev's existing harness.

What this commit adds:

- `tests/e2e/_ledger_helpers.py` — pure helper
  `count_agent_session_decisions(snapshot)`, extracted so unit tests can
  import without triggering the harness's top-level env-var / CLI guards.

- `tests/e2e/run_e2e_flows.py`:
  - `_count_agent_session_decisions(snapshot)` — thin wrapper around the
    helper that hides the import inside the harness.
  - `_validate_flow4_via_ledger()` — path-X-(b) post-hoc ledger query.
    Snapshots the ledger after the harness completes and counts decisions
    with `source_type='agent_session'`. Asserter FAIL + ledger has
    agent_session → UPGRADE to PASS with explicit annotation. Ledger
    error → INCONCLUSIVE (verdict unchanged). All five behavior-matrix
    cases documented in the docstring.
  - Invocation site: called once after `_validate_flow3_via_ledger` in
    `main()`, only when `dev_session` ran.

- `tests/test_flow4_ledger_validation.py` — five unit tests against the
  helper covering: zero rows, error snapshot (None), agent_session
  presence, mixed source types, and empty decisions list.

Why this is decoupled from agent caprice: in-stream Flow 4 evidence
requires the agent to invoke `bicameral.preflight` and walk Step 3.5 to
trigger capture-corrections. Path-X-(b) validates the *product outcome*
(decisions written with the canonical source_type) rather than the
*mechanism* (which tool the agent chose). This means a SessionEnd
subprocess effect that lands in the ledger after the parent stream-json
closes still upgrades the verdict, even when the in-stream signal is
absent.

Closes research-brief recommendation P0 #2.

Note: this commit replaces the original 1f54f1a SHA on the branch via
rebase. Governance/META_LEDGER edits and the planning artifacts that
were bundled with the original have been dropped here and will land via
a separate governance PR. The auto-fire UserPromptSubmit hook (#146 fix)
that was also bundled is shipping via #155.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan pushed a commit that referenced this pull request May 2, 2026
Cherry-picked from 1f54f1a, scope-narrowed to the surgical contribution.
The original commit was authored against an older base where the e2e
harness scaffold did not yet exist; this rebased version adds only the
new logic on top of dev's existing harness.

What this commit adds:

- `tests/e2e/_ledger_helpers.py` — pure helper
  `count_agent_session_decisions(snapshot)`, extracted so unit tests can
  import without triggering the harness's top-level env-var / CLI guards.

- `tests/e2e/run_e2e_flows.py`:
  - `_count_agent_session_decisions(snapshot)` — thin wrapper around the
    helper that hides the import inside the harness.
  - `_validate_flow4_via_ledger()` — path-X-(b) post-hoc ledger query.
    Snapshots the ledger after the harness completes and counts decisions
    with `source_type='agent_session'`. Asserter FAIL + ledger has
    agent_session → UPGRADE to PASS with explicit annotation. Ledger
    error → INCONCLUSIVE (verdict unchanged). All five behavior-matrix
    cases documented in the docstring.
  - Invocation site: called once after `_validate_flow3_via_ledger` in
    `main()`, only when `dev_session` ran.

- `tests/test_flow4_ledger_validation.py` — five unit tests against the
  helper covering: zero rows, error snapshot (None), agent_session
  presence, mixed source types, and empty decisions list.

Why this is decoupled from agent caprice: in-stream Flow 4 evidence
requires the agent to invoke `bicameral.preflight` and walk Step 3.5 to
trigger capture-corrections. Path-X-(b) validates the *product outcome*
(decisions written with the canonical source_type) rather than the
*mechanism* (which tool the agent chose). This means a SessionEnd
subprocess effect that lands in the ledger after the parent stream-json
closes still upgrades the verdict, even when the in-stream signal is
absent.

Closes research-brief recommendation P0 #2.

Note: this commit replaces the original 1f54f1a SHA on the branch via
rebase. Governance/META_LEDGER edits and the planning artifacts that
were bundled with the original have been dropped here and will land via
a separate governance PR. The auto-fire UserPromptSubmit hook (#146 fix)
that was also bundled is shipping via #155.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan pushed a commit that referenced this pull request May 2, 2026
Cherry-picked from 1f54f1a, scope-narrowed to the surgical contribution.
The original commit was authored against an older base where the e2e
harness scaffold did not yet exist; this rebased version adds only the
new logic on top of dev's existing harness.

What this commit adds:

- `tests/e2e/_ledger_helpers.py` — pure helper
  `count_agent_session_decisions(snapshot)`, extracted so unit tests can
  import without triggering the harness's top-level env-var / CLI guards.

- `tests/e2e/run_e2e_flows.py`:
  - `_count_agent_session_decisions(snapshot)` — thin wrapper around the
    helper that hides the import inside the harness.
  - `_validate_flow4_via_ledger()` — path-X-(b) post-hoc ledger query.
    Snapshots the ledger after the harness completes and counts decisions
    with `source_type='agent_session'`. Asserter FAIL + ledger has
    agent_session → UPGRADE to PASS with explicit annotation. Ledger
    error → INCONCLUSIVE (verdict unchanged). All five behavior-matrix
    cases documented in the docstring.
  - Invocation site: called once after `_validate_flow3_via_ledger` in
    `main()`, only when `dev_session` ran.

- `tests/test_flow4_ledger_validation.py` — five unit tests against the
  helper covering: zero rows, error snapshot (None), agent_session
  presence, mixed source types, and empty decisions list.

Why this is decoupled from agent caprice: in-stream Flow 4 evidence
requires the agent to invoke `bicameral.preflight` and walk Step 3.5 to
trigger capture-corrections. Path-X-(b) validates the *product outcome*
(decisions written with the canonical source_type) rather than the
*mechanism* (which tool the agent chose). This means a SessionEnd
subprocess effect that lands in the ledger after the parent stream-json
closes still upgrades the verdict, even when the in-stream signal is
absent.

Closes research-brief recommendation P0 #2.

Note: this commit replaces the original 1f54f1a SHA on the branch via
rebase. Governance/META_LEDGER edits and the planning artifacts that
were bundled with the original have been dropped here and will land via
a separate governance PR. The auto-fire UserPromptSubmit hook (#146 fix)
that was also bundled is shipping via #155.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin added a commit that referenced this pull request May 3, 2026
Phase 2 of plan-priority-c-team-server-slack-v0.md — adds Slack OAuth v2
flow, workspace persistence with at-rest encrypted tokens, and a
schema-validated channel allow-list config loader.

Per audit Advisory #2, OAuth routes are factored into
team_server/auth/router.py (parallel to Phase 4's events router pattern)
rather than left inline in app.py — keeps app.py at 47 lines well under
the Section 4 razor cap.

Files added:
- team_server/auth/slack_oauth.py (58 LOC): pure functions —
  build_authorize_url, exchange_code; raises SlackOAuthError on ok=false.
- team_server/auth/encryption.py: Fernet encrypt/decrypt for OAuth tokens
  at rest; key loaded from BICAMERAL_TEAM_SERVER_SECRET_KEY env var.
- team_server/auth/router.py (73 LOC): /oauth/slack/install +
  /oauth/slack/callback routes with CSRF state defense; persists
  workspace row with token encrypted before storage.
- team_server/config.py (40 LOC): pydantic-validated YAML loader for
  channel allow-list; raises ValueError with descriptive message on
  schema failure.
- team_server/app.py (modified): include auth router.

Tests (7 functionality tests, all green):
- tests/test_team_server_slack_oauth.py: authorize URL contains required
  params + scopes, exchange_code POSTs correctly, encrypt/decrypt
  round-trip preserves plaintext while ciphertext differs, callback
  rejects mismatched state (CSRF defense), callback persists workspace.
- tests/test_team_server_channel_allowlist.py: load_channel_allowlist
  parses valid YAML, raises ValueError on missing team_id.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan pushed a commit that referenced this pull request May 3, 2026
Cherry-picked from 1f54f1a, scope-narrowed to the surgical contribution.
The original commit was authored against an older base where the e2e
harness scaffold did not yet exist; this rebased version adds only the
new logic on top of dev's existing harness.

What this commit adds:

- `tests/e2e/_ledger_helpers.py` — pure helper
  `count_agent_session_decisions(snapshot)`, extracted so unit tests can
  import without triggering the harness's top-level env-var / CLI guards.

- `tests/e2e/run_e2e_flows.py`:
  - `_count_agent_session_decisions(snapshot)` — thin wrapper around the
    helper that hides the import inside the harness.
  - `_validate_flow4_via_ledger()` — path-X-(b) post-hoc ledger query.
    Snapshots the ledger after the harness completes and counts decisions
    with `source_type='agent_session'`. Asserter FAIL + ledger has
    agent_session → UPGRADE to PASS with explicit annotation. Ledger
    error → INCONCLUSIVE (verdict unchanged). All five behavior-matrix
    cases documented in the docstring.
  - Invocation site: called once after `_validate_flow3_via_ledger` in
    `main()`, only when `dev_session` ran.

- `tests/test_flow4_ledger_validation.py` — five unit tests against the
  helper covering: zero rows, error snapshot (None), agent_session
  presence, mixed source types, and empty decisions list.

Why this is decoupled from agent caprice: in-stream Flow 4 evidence
requires the agent to invoke `bicameral.preflight` and walk Step 3.5 to
trigger capture-corrections. Path-X-(b) validates the *product outcome*
(decisions written with the canonical source_type) rather than the
*mechanism* (which tool the agent chose). This means a SessionEnd
subprocess effect that lands in the ledger after the parent stream-json
closes still upgrades the verdict, even when the in-stream signal is
absent.

Closes research-brief recommendation P0 #2.

Note: this commit replaces the original 1f54f1a SHA on the branch via
rebase. Governance/META_LEDGER edits and the planning artifacts that
were bundled with the original have been dropped here and will land via
a separate governance PR. The auto-fire UserPromptSubmit hook (#146 fix)
that was also bundled is shipping via #155.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 8af60f3)
Knapp-Kevin added a commit that referenced this pull request May 6, 2026
… to research brief (#205)

Addresses Codex first-pass review notes #1, #2, #3, #7, #8, #9 from the
brief's review block. Tier C items + the subsequent Kilo / Gemini /
Codex-2nd-pass review layers are tracked as follow-ups (will be
surfaced in the PR thread for direction).

Changes:

- § 1.4 ingest pipeline: adds explicit "Risk amplification
  (durable-feedback-loop)" paragraph framing ingest as the durable
  write-surface that propagates poisoned content through preflight
  back into the agent's reasoning context. Strengthens LLM-01 + LLM-04
  P0 defensibility (Codex #2).
- § 1.8 skills surface: adds worked before/after example contrasting
  instruction-only `bicameral-report-bug` keys-only commitment vs the
  deterministic `_resolve_signer_email` gate that replaced it in #204.
  Makes the doctrine concrete for non-agent-systems readers (Codex #3).
- § 1.9 team-server: rewrites the dangling "TEAM-NN gaps in § 4"
  promise to "intentionally not enumerated; activation PR authors
  TEAM-NN IDs against actual activated topology" (Codex #8).
- § 2.6 EU AI Act: removes unilateral "limited risk" claim. Now
  describes bicameral-mcp as an AI-adjacent developer-tool component
  whose risk-tier classification properly attaches to the integrated
  system + deployment context, requiring counsel review for any
  specific tier claim (Codex #7).
- § 5 gap synthesis: adds Deployment trigger column (`all` /
  `local-OK` / `team/hosted` / `pre-team` / `hosted`) so severity is
  defensible per deployment shape. SOC2-01 reclassified as
  pre-team/hosted P0 with local-only boundary statement; GDPR-05
  reclassified as team/hosted P1 with local single-user P2; OWASP-03
  reclassified as hosted P1 with local P2 (uv/pipx provides
  install-time lock); OWASP-02 trigger narrowed to team/hosted (Codex #1).
- Appendix method notes: softens "every claim should be verifiable by
  re-reading the cited file at the cited line range" to acknowledge
  that most findings cite components rather than path:line, and
  defers a line-level evidence appendix as a follow-up improvement
  (Codex #9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin added a commit that referenced this pull request May 6, 2026
…+Gemini+Codex-2 (#205)

Authors a single Reviewer Disposition Pass table at the top of the
brief reconciling all 32 review points across four review layers
(Codex first-pass, Kilo, Gemini CLI, Codex second-pass) into one
post-review consensus before downstream P1 issue-filing — per the
explicit Codex-2 #1 directive.

Decisions: 21 applied this commit, 6 already applied in 1d82658,
3 deferred to follow-up, 2 note-only. Net new gap IDs added per
disposition: GDPR-08 (ephemeral data), GDPR-09 (consent versioning
+ revocation), LLM-11 (cross-tool config-file modification surface),
MCP-01 (host UX as external dependency), CFG-01 (config precedence
+ fail-closed model). Reclassification: LLM-06 P0/M → P1/M with
scope narrowed to future remote-skill-loading channel (per Kilo #2).

Major content additions to the brief:

- § 1.1: MCP host UX is external dependency, not security gate (new
  gap MCP-01) — host that auto-approves tool calls bypasses any
  "operator will see this" assumption.
- § 1.2: SurrealDB version pinning supply-chain callout (Kilo #11).
- § 1.7: cross-tool config-file modification surface (new gap LLM-11)
  distinct from skill-content surface — `setup_wizard` writes shell
  commands into `.claude/settings.json` that run host-side at hook fire.
- § 1.11 (new): Configuration precedence + fail-closed model — single
  uniform precedence rule across all knobs (env > config.yaml >
  hardcoded defaults), fail-closed semantics on missing/malformed/
  contradictory config (Codex-2 #5).
- § 2.4 (a): LLM02 mapping note clarifying it folds into LLM-07 +
  OWASP-04 (Kilo #13).
- § 2.4 (b): explicit `confirm=True` is agent-supplied not HITL
  (Kilo #3) — security context cannot rely on agent-filled params.
- § 2.4 (c) LLM-01 + LLM-04: extensible classifier (Gemini #2) +
  guardrail-not-classifier framing (Codex-1 #6) + control-acceptance
  template (Codex-2 #4) — quarantine, override, test fixtures,
  measurement counters.
- § 2.4 (c) LLM-03: timeouts as `.bicameral/config.yaml` knobs (Gemini #3).
- § 2.4 (c) LLM-05 + LLM-09: out-of-band operator confirmation, not
  agent-supplied confirm parameters (Kilo #3).
- § 2.4 (c) LLM-06: scope-narrowed to future remote-skill-loading; in
  current install model the wheel-trust covers it (Kilo #2).
- § 2.4 (c) LLM-11 (new): cross-tool config-file gate (signed
  hooks-manifest.json) distinct from skill manifest.
- § 2.1 (c) GDPR-01: three remediation candidates — tombstone-and-
  rebuild with signed manifest (Kilo #12), crypto-shredding (Gemini
  #1), or scope-out via PII detect-and-refuse.
- § 2.1 (c) GDPR-02: data-subject-access search must cover full
  identifier surface (description, source_ref, topic, file paths) not
  just signer email (Codex-1 #5).
- § 2.1 (c) GDPR-08 (new): ephemeral data surfaces (tempfiles, swap,
  WAL, crash dumps) (Kilo #7).
- § 2.1 (c) GDPR-09 (new): consent versioning + revocation semantics
  (Kilo #8 + Codex-2 #3).
- § 5: gap table updated with new rows + LLM-06 reclassification;
  gap counts post-disposition (5 P0 / 19 P1 / 16 P2 / 5 P3 = 45 total,
  up from 41).
- § 6.1 (new): epic grouping for deferred P1 batch (Codex-1 #10) —
  ingest boundary guardrails, per-tool authority gradation, supply-
  chain signing, telemetry & consent.
- § 6.2 (new): six-section control-acceptance template for every DG
  gap (Codex-2 #4) — positive / negative / bypass / fail-closed /
  telemetry / docs.

Filed-issue updates:
- Issue #214 (LLM-06): relabeled P0 → P1, retitled to reflect scope
  narrowing, full disposition comment added.
- Issue #212 (LLM-01) + #213 (LLM-04): disposition comments added
  capturing the guardrail framing, classifier extensibility, and
  control-acceptance template applicable to both.

Deferred for follow-up: Codex-1 #4 (controller/processor
restructure of standards table), Codex-1 #9 (full evidence appendix
beyond the methodology softening), Codex-2 #2 (full 3-column
deployment-profile matrix beyond the single-column trigger).

Brief now 706 lines (up from 606); +124 line diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin added a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request May 21, 2026
…knobs (BicameralAI#200 Phase 3)

Closes A4 of BicameralAI#200's preflight findings via two new deterministic config
gates that filter what data flows from server to agent / disk.

Deterministic gate #1 — render_source_attribution:
- New BicameralContext.render_source_attribution field (modes: full,
  redacted (default), hidden) read from .bicameral/config.yaml at
  config-load.
- New handlers.preflight._apply_attribution_policy applies the mode to
  DecisionMatch.source_ref before the response leaves the server. The
  agent receives pre-filtered output and renders verbatim.
  - full: legacy passthrough
  - redacted: replace `[A-Z][a-z]+` (names) with <NAME_REDACTED>,
    `\d{4}-\d{2}-\d{2}` (dates) with <DATE_REDACTED>; preserves
    structural shape so operator sees "<NAME_REDACTED> review ·
    <NAME_REDACTED>, <DATE_REDACTED>" instead of full attribution
  - hidden: blank source_ref entirely

Deterministic gate BicameralAI#2 — preflight_bypass_tracking:
- New BicameralContext.preflight_bypass_tracking field (modes: enabled
  (default), disabled).
- handlers.record_bypass.handle_record_bypass short-circuits BEFORE the
  preflight_telemetry.write_bypass_event call when ctx setting is
  disabled. Returns recorded=False, reason="tracking_disabled". When
  disabled, ~/.bicameral/preflight_events.jsonl gets no writes; engine's
  recency read sees no events → no escalation drop, which matches the
  user's privacy choice.

Config schema:
- context.py refactored to share a generic _read_yaml_string_field
  helper across the three Phase 2/3 config readers
  (signer_email_fallback, render_source_attribution,
  preflight_bypass_tracking). Each is a fail-soft string-with-valid-set
  read with documented privacy-positive defaults.
- setup_wizard.py writes both new fields to fresh .bicameral/config.yaml
  with their defaults.

SKILL.md updates:
- skills/bicameral-preflight/SKILL.md: telemetry transparency note
  added above HITL prompts; render_source_attribution doc inserted
  inline ("the deterministic gate is the config field, not this
  instruction"); preflight_bypass_tracking note appended to the
  bypass-semantics list.

Tests (5 new):
- tests/test_preflight_render_source_attribution.py: 3 tests for
  full/redacted/hidden modes against the pure helper.
- tests/test_preflight_bypass_tracking.py: 2 tests for enabled/disabled
  paths through the handler with monkeypatched preflight_telemetry.

10/10 Phase 2+3 tests pass; ruff/mypy/format clean.

Phase 3 of plan-200-skills-audit-hardening (PR C of three — completes
the plan). After merge, the queued workstream is the e2e security audit
across all #205 compliance standards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin pushed a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request May 21, 2026
v0.14.1 publish run BicameralAI#2 still failed at SBOM gen — the previous hotfix
added `--schema-version 1.5` to the cyclonedx-py command but
cyclonedx-py 7.x rejected the flag with exit code 2 (argparse error,
"command-line usage error"). Stderr was captured but never printed,
so the actual error message was invisible from the publish workflow log.

This commit:

1. Drops the `--schema-version` flag entirely. cyclonedx-py emits
   whichever CycloneDX 1.x revision is its default (currently 1.6).
2. Loosens the validator from `specVersion == "1.5"` to `specVersion`
   starts with "1.". CycloneDX is forward-compatible within 1.x;
   every consumer that reads 1.5 reads 1.6.
3. On `subprocess.CalledProcessError`, prints the captured stderr to
   our own stderr before re-raising. Future CLI-flag drift becomes
   diagnosable from the workflow log without re-running.
4. Module docstring updated — title is "CycloneDX SBOM emitter" (no
   1.5 anymore), policy section explains the version drift fight.

The contract reduction from "exact 1.5" to "any 1.x" is a deliberate
trade for resilience against transitive `cyclonedx-bom` upgrades. If
we ever need to pin to a specific spec version for compliance reasons,
the right move is to pin `cyclonedx-bom` itself in pyproject.toml
[release] extras and document the dependency-policy decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant