feat: event-sourced collaboration (Phase 1)#2
Conversation
…terializer
Adds cross-team decision sharing via append-only JSON event files
committed to git. In team mode, every ingest/link_commit writes an
event file to .bicameral/events/{git-email}/ before updating the
local DB. On startup, a watermark-based materializer replays peer
events into the local ledger. Zero merge conflicts by design.
New: events/ package (models, writer, materializer, team_adapter)
Modified: adapters/ledger.py (config-driven adapter factory)
Modified: setup_wizard.py (solo/team mode selection)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 3 minutes and 1 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughAdds a configuration-driven "team" collaboration mode: new event-sourcing components (EventEnvelope, EventFileWriter, EventMaterializer, TeamWriteAdapter), ledger factory wiring to optionally wrap the SurrealDB adapter in team mode, and setup changes to persist and respect collaboration mode. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Factory as get_ledger()
participant Inner as SurrealDBLedgerAdapter
participant Team as TeamWriteAdapter
participant Writer as EventFileWriter
participant Materializer as EventMaterializer
participant FS as Event Filesystem
Factory->>Inner: instantiate SurrealDBLedgerAdapter
Factory->>Writer: create EventFileWriter(.bicameral/events/, author)
Factory->>Materializer: create EventMaterializer(.bicameral/events/, .bicameral/local/)
Factory->>Team: wrap Inner with TeamWriteAdapter(Inner, Writer, Materializer)
Team->>Inner: connect()
Team->>Materializer: replay_new_events(inner)
Materializer->>FS: read new event files (*.json)
Materializer-->>Inner: call ingest_payload / ingest_commit for events
User->>Team: ingest_payload(payload)
Team->>Writer: write(event_type=ingest.completed, payload)
Writer->>FS: atomic write timestamp-uuid.json
Team->>Inner: inner.ingest_payload(payload)
Inner-->>Team: result
User->>Team: get_all_decisions(...)
Team->>Inner: get_all_decisions(...)
Inner-->>Team: decisions
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@events/materializer.py`:
- Around line 57-69: The single scalar watermark (watermark from
_read_watermark) is too coarse because EventFileWriter timestamps only to the
second; change the filtering logic that builds new_events (and the analogous
logic at lines 103-105) to perform per-event bookkeeping for the current
timestamp bucket: persist (or atomically store) a set of processed event
identifiers or filenames for the latest watermark-second and use
_extract_timestamp(f.name) to group by second, allowing events with timestamp ==
watermark-second to be accepted if their filename/ID is not in the persisted
processed set; update the watermark persistence API (or add a companion
persisted processed_ids set) so after processing you append processed file IDs
for that second and advance the watermark when the set is complete, and adjust
the filter to use ">" for seconds newer than watermark-second and "== and id not
in processed_ids" for the current second.
In `@events/team_adapter.py`:
- Around line 41-56: The ingest_payload and ingest_commit methods currently
write events via self._writer.write then delegate to self._inner but never
record that the local event has been materialized; after the await of
self._inner.ingest_payload(...) and self._inner.ingest_commit(...), update the
local replay cursor/state to mark that the written event has been applied (e.g.,
call your replay cursor/state API such as
self._replay_cursor.mark_applied(event_id) or update self._local_state to
include the event), ensuring this occurs only after the inner call succeeds so
replay won’t reapply the same event after restart.
- Around line 32-37: The current materialization only runs in connect(), so add
an internal async method async def _ensure_ready(self) that is idempotent and
called from every public method (e.g., read/write methods, any public API on
this adapter) to guarantee the inner adapter and materializer are initialized;
implement _ensure_ready to await a per-instance asyncio.Lock or an initialized
flag to avoid races, call await self._inner.connect() then await
self._materializer.replay_new_events(self._inner) and log as before, and set a
_ready flag so subsequent calls are no-ops; remove reliance on callers invoking
connect() and instead call await self._ensure_ready() at the start of each
public method.
In `@events/writer.py`:
- Around line 41-45: The constructor stores author_email directly into a
filesystem path which lets values with separators, '..', or absolute paths
escape the events dir; update Writer.__init__ to validate and sanitize
author_email before using it: reject or normalize inputs containing path
separators, leading slashes, or components like '..' (e.g., ensure author_email
== Path(author_email).name or apply a safe-encoding), raise ValueError for
invalid inputs, then set self._author to the sanitized value and build
self._user_dir from events_dir / sanitized_author and mkdir as before; ensure
any external callers still get a clear exception on invalid author strings.
In `@setup_wizard.py`:
- Around line 400-407: The team-mode config currently leaves SURREAL_URL and
CODE_LOCATOR_SQLITE_DB pointing to .bicameral/ledger.db and
.bicameral/code-graph.db which will be tracked; update the config generation
called after _select_collaboration_mode/_write_collaboration_config to accept
the collab_mode flag and, when collab_mode == "team", set SURREAL_URL and
CODE_LOCATOR_SQLITE_DB to files under .bicameral/local/ (e.g.,
.bicameral/local/ledger.db and .bicameral/local/code-graph.db) or add explicit
ignores for those exact paths; modify the function that writes MCP/env config
(where SURREAL_URL and CODE_LOCATOR_SQLITE_DB are emitted) to branch on
collab_mode and ensure the moved paths match what _ensure_gitignore produces.
In `@tests/test_team_events.py`:
- Around line 16-17: The test module currently uses
os.environ.setdefault("SURREAL_URL", "memory://") which can leave an existing
SURREAL_URL and cause tests to hit a persistent DB; replace this with an
unconditional override by assigning os.environ["SURREAL_URL"] = "memory://" at
module import time so the SURREAL_URL is always set to the in-memory SurrealDB
for tests (refer to the SURREAL_URL environment variable and the
tests/test_team_events.py top-level env setup).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 06293b92-42c6-4788-8add-40ed6e13ec23
📒 Files selected for processing (8)
adapters/ledger.pyevents/__init__.pyevents/materializer.pyevents/models.pyevents/team_adapter.pyevents/writer.pysetup_wizard.pytests/test_team_events.py
| watermark = self._read_watermark() | ||
|
|
||
| # Glob all event files across all user directories | ||
| event_files = sorted( | ||
| self._events_dir.glob("*/*.json"), | ||
| key=lambda f: f.name, # lexicographic = chronological | ||
| ) | ||
|
|
||
| # Filter to new events | ||
| new_events = [ | ||
| f for f in event_files | ||
| if self._extract_timestamp(f.name) > watermark | ||
| ] |
There was a problem hiding this comment.
A single timestamp watermark is too coarse for these filenames.
EventFileWriter only guarantees second-level timestamps, so once the watermark is 20260410T180000Z, any event file merged later from that same second is skipped forever by the > filter here. You need per-event bookkeeping for the current timestamp bucket (or persistent processed event IDs), not a single scalar timestamp.
Also applies to: 103-105
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@events/materializer.py` around lines 57 - 69, The single scalar watermark
(watermark from _read_watermark) is too coarse because EventFileWriter
timestamps only to the second; change the filtering logic that builds new_events
(and the analogous logic at lines 103-105) to perform per-event bookkeeping for
the current timestamp bucket: persist (or atomically store) a set of processed
event identifiers or filenames for the latest watermark-second and use
_extract_timestamp(f.name) to group by second, allowing events with timestamp ==
watermark-second to be accepted if their filename/ID is not in the persisted
processed set; update the watermark persistence API (or add a companion
persisted processed_ids set) so after processing you append processed file IDs
for that second and advance the watermark when the set is complete, and adjust
the filter to use ">" for seconds newer than watermark-second and "== and id not
in processed_ids" for the current second.
| async def ingest_payload(self, payload: dict) -> dict: | ||
| """Write ingest event, then delegate to inner adapter.""" | ||
| self._writer.write("ingest.completed", payload) | ||
| return await self._inner.ingest_payload(payload) | ||
|
|
||
| async def ingest_commit( | ||
| self, commit_hash: str, repo_path: str, drift_analyzer=None, | ||
| ) -> dict: | ||
| """Write link_commit event, then delegate to inner adapter.""" | ||
| self._writer.write( | ||
| "link_commit.completed", | ||
| {"commit_hash": commit_hash, "repo_path": repo_path}, | ||
| ) | ||
| return await self._inner.ingest_commit( | ||
| commit_hash, repo_path, drift_analyzer=drift_analyzer, | ||
| ) |
There was a problem hiding this comment.
Mark successful local writes as already applied.
These methods append the event file and then apply the same mutation to the local DB, but nothing records that the local event has already been materialized. After a restart, replay can apply that same event to the same ledger again. Once the inner write succeeds, record the written event in the local replay cursor/state.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@events/team_adapter.py` around lines 41 - 56, The ingest_payload and
ingest_commit methods currently write events via self._writer.write then
delegate to self._inner but never record that the local event has been
materialized; after the await of self._inner.ingest_payload(...) and
self._inner.ingest_commit(...), update the local replay cursor/state to mark
that the written event has been applied (e.g., call your replay cursor/state API
such as self._replay_cursor.mark_applied(event_id) or update self._local_state
to include the event), ensuring this occurs only after the inner call succeeds
so replay won’t reapply the same event after restart.
| def __init__(self, events_dir: Path, author_email: str) -> None: | ||
| self._events_dir = events_dir | ||
| self._author = author_email | ||
| self._user_dir = events_dir / author_email | ||
| self._user_dir.mkdir(parents=True, exist_ok=True) |
There was a problem hiding this comment.
Validate author_email before using it as a path component.
author_email is joined directly into the filesystem path here. A value containing separators, .., or an absolute path can escape .bicameral/events/ and also breaks the materializer's */*.json scan.
Proposed fix
class EventFileWriter:
"""Writes append-only JSON event files to a per-user directory."""
def __init__(self, events_dir: Path, author_email: str) -> None:
+ if (
+ not author_email
+ or author_email in {".", ".."}
+ or any(sep in author_email for sep in ("/", "\\"))
+ ):
+ raise ValueError("author_email must be a single directory name")
self._events_dir = events_dir
self._author = author_email
self._user_dir = events_dir / author_email
self._user_dir.mkdir(parents=True, exist_ok=True)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@events/writer.py` around lines 41 - 45, The constructor stores author_email
directly into a filesystem path which lets values with separators, '..', or
absolute paths escape the events dir; update Writer.__init__ to validate and
sanitize author_email before using it: reject or normalize inputs containing
path separators, leading slashes, or components like '..' (e.g., ensure
author_email == Path(author_email).name or apply a safe-encoding), raise
ValueError for invalid inputs, then set self._author to the sanitized value and
build self._user_dir from events_dir / sanitized_author and mkdir as before;
ensure any external callers still get a clear exception on invalid author
strings.
| # Ensure in-memory SurrealDB for all tests in this module | ||
| os.environ.setdefault("SURREAL_URL", "memory://") |
There was a problem hiding this comment.
Force the in-memory DB here instead of using setdefault().
setdefault() leaves any preexisting SURREAL_URL untouched, so this module can unexpectedly run against a persistent ledger and leak state across test runs. Use a test-scoped env override that always sets memory://.
As per coding guidelines, "Set SURREAL_URL environment variable to configure SurrealDB connection; use memory:// for tests (no persistence)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_team_events.py` around lines 16 - 17, The test module currently
uses os.environ.setdefault("SURREAL_URL", "memory://") which can leave an
existing SURREAL_URL and cause tests to hit a persistent DB; replace this with
an unconditional override by assigning os.environ["SURREAL_URL"] = "memory://"
at module import time so the SURREAL_URL is always set to the in-memory
SurrealDB for tests (refer to the SURREAL_URL environment variable and the
tests/test_team_events.py top-level env setup).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…olation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 488-495: The fenced code block that shows the directory tree
starting with ".bicameral/" is missing a language tag, triggering markdownlint
MD040; update the opening fence from ``` to ```text (or another appropriate
language identifier) for the block that contains ".bicameral/ ├── events/ ..."
so the code fence declares a language and the linter passes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| ``` | ||
| .bicameral/ | ||
| ├── events/ ← committed to git (shared decisions) | ||
| │ ├── pm@co.com/ ← PM's ingested PRDs and transcripts | ||
| │ └── dev@co.com/ ← developer's commit syncs | ||
| ├── config.yaml ← committed (mode: solo | team) | ||
| └── local/ ← gitignored (materialized state) | ||
| ``` |
There was a problem hiding this comment.
Add a language tag to the fenced block (markdownlint MD040).
At Line 488, the code fence should declare a language to satisfy lint rules.
🛠️ Proposed fix
-```
+```text
.bicameral/
├── events/ ← committed to git (shared decisions)
│ ├── pm@co.com/ ← PM's ingested PRDs and transcripts
│ └── dev@co.com/ ← developer's commit syncs
├── config.yaml ← committed (mode: solo | team)
└── local/ ← gitignored (materialized state)</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.0)</summary>
[warning] 488-488: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.
In @README.md around lines 488 - 495, The fenced code block that shows the
directory tree starting with ".bicameral/" is missing a language tag, triggering
markdownlint MD040; update the opening fence from totext (or another
appropriate language identifier) for the block that contains ".bicameral/ ├──
events/ ..." so the code fence declares a language and the linter passes.
</details>
<!-- fingerprinting:phantom:triton:hawk:07d99fa3-2b89-4bda-9949-9f584bdc7bb1 -->
<!-- This is an auto-generated comment by CodeRabbit -->
…arseness 1. DB paths (setup_wizard): team mode now places ledger.db and code-graph.db under .bicameral/local/ so they stay gitignored 2. Lazy connect (team_adapter): _ensure_ready() guard on all public methods ensures materialization runs even without explicit connect() 3. Watermark (materializer): use >= instead of > to handle multiple events in the same second (safe due to upsert idempotency) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire the tech spec §4.5 vocab cache pattern into the ingest pipeline. Before running the full BM25 + fuzzy grounding pipeline, check if the ledger already has a similar intent with high-confidence maps_to edges and reuse its code_regions. Key design choices: - Uses existing intent table BM25 index, not the vestigial vocab_cache table - Repo-isolated via code_region[WHERE repo = $repo] in graph traversal - Ranked by maps_to confidence (not search::score which returns 0.0 in v2) - Cached regions validated against live symbol index before acceptance - Qualified name fallback (PaymentService.process → process) - File-path disambiguation when multiple symbols share a name Closes gap #2 from the code locator drift audit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire the tech spec §4.5 vocab cache pattern into the ingest pipeline. Before running the full BM25 + fuzzy grounding pipeline, check if the ledger already has a similar intent with high-confidence maps_to edges and reuse its code_regions. Key design choices: - Uses existing intent table BM25 index, not the vestigial vocab_cache table - Repo-isolated via code_region[WHERE repo = $repo] in graph traversal - Ranked by maps_to confidence (not search::score which returns 0.0 in v2) - Cached regions validated against live symbol index before acceptance - Qualified name fallback (PaymentService.process → process) - File-path disambiguation when multiple symbols share a name Closes gap #2 from the code locator drift audit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire the tech spec §4.5 vocab cache pattern into the ingest pipeline. Before running the full BM25 + fuzzy grounding pipeline, check if the ledger already has a similar intent with high-confidence maps_to edges and reuse its code_regions. Key design choices: - Uses existing intent table BM25 index, not the vestigial vocab_cache table - Repo-isolated via code_region[WHERE repo = $repo] in graph traversal - Ranked by maps_to confidence (not search::score which returns 0.0 in v2) - Cached regions validated against live symbol index before acceptance - Qualified name fallback (PaymentService.process → process) - File-path disambiguation when multiple symbols share a name Closes gap #2 from the code locator drift audit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…uzzy Two fixes from Codex review feedback: 1. P1 #1: Filter weak BM25 scores (< 0.1) when no fuzzy matches exist. Prevents false groundings for under-specified queries that pass the FC-1 guard but have only noise-level BM25 hits. 2. P1 #2: Stage 2 now runs as enrichment after fuzzy direct lookup, not just as fallback. When fuzzy finds symbols from a subset of files, file-level retrieval fills remaining budget with additional files — preserving multi-file feature discovery. 49/49 tests pass. Eval metrics stable: MRR@5=0.605, Recall=20.0%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, index hardening 1. Split Phase 1 into 1a (DONE, groundwork) and 1b (TODO, actual enrichment wiring). Branch currently ships zero recall improvement. 2. Added compare-and-set content_hash guard to resolve_compliance. Caller must send the hash of the code it read; server rejects verdict if hash no longer matches current content. 3. Added try/except + atomic write (temp+rename) for symbol index build. Failure falls back to file-level BM25 gracefully. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d payload
V1 plan D1 was originally designed around a server-side
search_code(decision.description) to surface relocation candidates.
That approach is obsolete after v0.6.4 nuked search_code and shifted
all code retrieval to the caller LLM. v0.6.4 already implemented the
spirit of D1:
- ledger/adapter.py:412-420 emits a symbol_disappeared grounding
check on the authoritative ref when resolve_symbol_lines returns
None for a tracked region;
- handlers/link_commit.py:21 verification_instruction tells the
caller to use Grep/Read + validate_symbols / extract_symbols +
bicameral.bind;
- tests/test_link_commit_grounding.py covers the rename-→-grounding
flow end-to-end.
V1's actual contribution: add original_lines: [start_line, end_line]
to the symbol_disappeared payload so the caller LLM can inspect the
symbol's prior position via `git show <prev_ref>:<file_path>` to
ground its own retrieval. Single-line addition in ledger/adapter.py;
new assertion in the existing regression test.
Strictly informational. No new "call bicameral_bind" CTA — Codex
pass-10 finding #2 stands: V2 must ship bicameral_rebind with
old-binding CAS + a fresh L3 verdict on the new target before any
rebind workflow becomes safe.
Tests: 63 passed in 4.52s — zero regressions across ast_diff,
b2_cosmetic_hint, sync_middleware, bind, link_commit_grounding,
phase3 integration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…trix New tests/test_desync_scenarios.py — one test per scenario from the Notion "Auto-Grounding Problem" catalog (Notion ID 3332a51619c4813caccec86c36d9bf98), routed through the real handler layer per the Apr 8 PR #84 lesson (tests bypassing handlers miss post-ingest hooks). Self-contained fixture builds a tmp git repo on ``main`` plus a memory ledger per test, so scenarios are independent of each other and of the bicameral-mcp checkout. Scorecard: 12 PASS, 1 XFAIL — meets V1 plan acceptance gate (>=12/13). Pass: 1. New decision, matching code exists → ungrounded → caller binds 2. Code changed after grounded → pending + pending_compliance_check 3. Code deleted after grounded → symbol_disappeared 4. Symbol renamed in file → symbol_disappeared with original_lines (V1 D1) 5. Symbol moved to different file → symbol_disappeared 6. Code added later, decision becomes resolvable → caller binds explicitly 7. Cold start: no matching code in repo → stays ungrounded 9. Intent description supersession → re-ingest succeeds 10. Multiple intents share a symbol → both surface in drift response 11. No server-side BM25 grounding (post-v0.6.0) → stays ungrounded — caller-LLM-driven 12. Line-shift edit does not trigger drift → resolve_symbol_lines re-resolves 13. [Open Question] prefix → ingested as gap-class decision XFAIL: 8. Drifted intent → atomic re-ground → requires V2 bicameral_rebind with old-binding CAS + fresh L3 verdict on new target (design doc 8 D2; Codex pass-10 #2) Scenario 2 explicitly asserts ``pending`` (not ``drifted``) per post-v0.5.0 derive_status semantics: a hash diff WITHOUT a cached ``compliant`` verdict yields ``pending``. ``drifted`` requires a caller-LLM verdict via the verdict cache (V2 territory). Incidental bug surfaced by F1 and fixed in ledger/adapter.py: ``pending_grounding_checks`` for ungrounded decisions read ``d.get("id", "")`` from ``get_all_decisions(filter="ungrounded")``, but that query aliases the field to ``decision_id``. Result: every ungrounded grounding-check entry shipped with ``decision_id=""``, leaving callers no handle to bind against. The existing ``test_pending_grounding_checks_for_ungrounded_decisions`` regression didn't catch it because it only asserted ``len > 0`` on the list, not non-empty IDs. Fix: read ``decision_id`` first, fall back to ``id`` for forward compatibility. Real correctness bug, not a test artifact. Tests: 76 passed, 1 xfailed in 7.35s across test_desync_scenarios, test_b2_cosmetic_hint, test_ast_diff, test_phase3_integration, test_sync_middleware, test_bind, test_link_commit_grounding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…don't get bind CTA Codex pass-12 finding #2: D1 added original_lines to the symbol_disappeared payload (richer relocation context) but left the v0.6.4 verification_instruction text directing callers to use bicameral.bind for ALL pending_grounding_checks. For reason='ungrounded' that CTA is correct (no prior binding to retire, no duplicate-binding risk). For reason='symbol_disappeared' it is unsafe: bind on the new location leaves the old edge live under the N:N binds_to relation, producing duplicate-binding state. Atomic rebind that retires the stale edge in the same write ships in V2 (design doc 8 D2; Codex pass-10 finding #2). Net effect of D1 was that we added more relocation context routed through an unsafe CTA. This commit removes the unsafe CTA for relocation cases without reducing the safe CTA for ungrounded cases. Changes in handlers/link_commit.py: - _VERIFICATION_INSTRUCTION (single string) split into _VERIFICATION_INSTRUCTION_BASE + _GROUNDING_INSTRUCTION_UNGROUNDED + _GROUNDING_INSTRUCTION_RELOCATION. - New _build_verification_instruction(pending_compliance, pending_grounding) composer assembles the response text per call based on which reason values actually fired. Symbol-disappeared cases get an explicit "INFORMATIONAL ONLY — do NOT call bicameral.bind" warning citing the duplicate-binding hazard. - Response wiring at the bottom of handle_link_commit swapped from the constant lookup to the composer call. Tests (tests/test_link_commit_grounding.py): - test_pending_grounding_checks_for_ungrounded_decisions now asserts "bicameral.bind" IN the verification_instruction AND "INFORMATIONAL ONLY" NOT in it. - test_pending_grounding_checks_symbol_not_found now asserts "INFORMATIONAL ONLY" IN the instruction AND an explicit bind-prohibition phrase. Net destructive surface change for V1: arguably negative — V1 actively *removes* a v0.6.4 unsafe CTA for relocation, while introducing zero new mutating capabilities. Tests: 75 passed, 1 xfailed in 7.11s (xfail is scenario 8, V2 atomic rebind — by design). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All four findings verified against current code; only the actionable ones applied. 81 passed + 1 xfailed in 9.02s. #1 — skills/bicameral-preflight/SKILL.md sync_metrics note The .claude/skills copy got the sync_metrics observability note back when V1 A3 shipped, but the canonical skills/ copy never did. Mirror the wording verbatim near step 2 so the rendering guidance and response-field documentation stay in sync. #2 — handlers/detect_drift.py per-entry alignment The cosmetic-hint enrichment was slicing both head_full and wt_full using entry.lines (the baseline anchor). HEAD and the working tree can shift the symbol independently, so a single index range can't align both sides. The narrow consequence: a drifted entry with shifted lines could yield a misleading cosmetic_hint=true on bytes that aren't the bound region. Fix: re-resolve the symbol against each ref via resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD") and ref="working_tree" separately, slice each ref using its own resolved range. Resolution failure on either side → safe default of cosmetic_hint=False (matches the V1 contract: "False is cheap, True must be earned"). Empty symbol → skip (new fail-safe path). Test refactor: test_invalid_lines_skipped renamed to test_unresolvable_symbol_skipped — the old test asserted that lines=(0,0) was the failsafe trigger, but entry.lines is no longer the alignment input. New test exercises the resolve_symbol_lines-returns-None path via a nonexistent symbol name, which is the real fail-safe gate now. #3 — V2 guide TOC anchor for §9 GitHub auto-generates fragment IDs from heading text by lowercasing, replacing spaces with hyphens, and dropping punctuation. "## 9. Acceptance criteria for V2" maps to #9-acceptance-criteria-for-v2, but the TOC pointed at #9-acceptance-criteria (truncated). Link broken. Updated to the correct fragment. #4 — V2 guide unlabeled fenced code blocks (markdownlint MD040) Six fenced opens used bare ``` instead of a labeled fence. Tagged each with ```text — the contents are commit listings, ASCII DAG diagrams, pseudocode protocols, and tuple notation, none of which fit a real language tag. The other fenced blocks in the guide (already tagged ```sql / ```python) are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d payload
V1 plan D1 was originally designed around a server-side
search_code(decision.description) to surface relocation candidates.
That approach is obsolete after v0.6.4 nuked search_code and shifted
all code retrieval to the caller LLM. v0.6.4 already implemented the
spirit of D1:
- ledger/adapter.py:412-420 emits a symbol_disappeared grounding
check on the authoritative ref when resolve_symbol_lines returns
None for a tracked region;
- handlers/link_commit.py:21 verification_instruction tells the
caller to use Grep/Read + validate_symbols / extract_symbols +
bicameral.bind;
- tests/test_link_commit_grounding.py covers the rename-→-grounding
flow end-to-end.
V1's actual contribution: add original_lines: [start_line, end_line]
to the symbol_disappeared payload so the caller LLM can inspect the
symbol's prior position via `git show <prev_ref>:<file_path>` to
ground its own retrieval. Single-line addition in ledger/adapter.py;
new assertion in the existing regression test.
Strictly informational. No new "call bicameral_bind" CTA — Codex
pass-10 finding #2 stands: V2 must ship bicameral_rebind with
old-binding CAS + a fresh L3 verdict on the new target before any
rebind workflow becomes safe.
Tests: 63 passed in 4.52s — zero regressions across ast_diff,
b2_cosmetic_hint, sync_middleware, bind, link_commit_grounding,
phase3 integration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…trix New tests/test_desync_scenarios.py — one test per scenario from the Notion "Auto-Grounding Problem" catalog (Notion ID 3332a51619c4813caccec86c36d9bf98), routed through the real handler layer per the Apr 8 PR #84 lesson (tests bypassing handlers miss post-ingest hooks). Self-contained fixture builds a tmp git repo on ``main`` plus a memory ledger per test, so scenarios are independent of each other and of the bicameral-mcp checkout. Scorecard: 12 PASS, 1 XFAIL — meets V1 plan acceptance gate (>=12/13). Pass: 1. New decision, matching code exists → ungrounded → caller binds 2. Code changed after grounded → pending + pending_compliance_check 3. Code deleted after grounded → symbol_disappeared 4. Symbol renamed in file → symbol_disappeared with original_lines (V1 D1) 5. Symbol moved to different file → symbol_disappeared 6. Code added later, decision becomes resolvable → caller binds explicitly 7. Cold start: no matching code in repo → stays ungrounded 9. Intent description supersession → re-ingest succeeds 10. Multiple intents share a symbol → both surface in drift response 11. No server-side BM25 grounding (post-v0.6.0) → stays ungrounded — caller-LLM-driven 12. Line-shift edit does not trigger drift → resolve_symbol_lines re-resolves 13. [Open Question] prefix → ingested as gap-class decision XFAIL: 8. Drifted intent → atomic re-ground → requires V2 bicameral_rebind with old-binding CAS + fresh L3 verdict on new target (design doc 8 D2; Codex pass-10 #2) Scenario 2 explicitly asserts ``pending`` (not ``drifted``) per post-v0.5.0 derive_status semantics: a hash diff WITHOUT a cached ``compliant`` verdict yields ``pending``. ``drifted`` requires a caller-LLM verdict via the verdict cache (V2 territory). Incidental bug surfaced by F1 and fixed in ledger/adapter.py: ``pending_grounding_checks`` for ungrounded decisions read ``d.get("id", "")`` from ``get_all_decisions(filter="ungrounded")``, but that query aliases the field to ``decision_id``. Result: every ungrounded grounding-check entry shipped with ``decision_id=""``, leaving callers no handle to bind against. The existing ``test_pending_grounding_checks_for_ungrounded_decisions`` regression didn't catch it because it only asserted ``len > 0`` on the list, not non-empty IDs. Fix: read ``decision_id`` first, fall back to ``id`` for forward compatibility. Real correctness bug, not a test artifact. Tests: 76 passed, 1 xfailed in 7.35s across test_desync_scenarios, test_b2_cosmetic_hint, test_ast_diff, test_phase3_integration, test_sync_middleware, test_bind, test_link_commit_grounding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…don't get bind CTA Codex pass-12 finding #2: D1 added original_lines to the symbol_disappeared payload (richer relocation context) but left the v0.6.4 verification_instruction text directing callers to use bicameral.bind for ALL pending_grounding_checks. For reason='ungrounded' that CTA is correct (no prior binding to retire, no duplicate-binding risk). For reason='symbol_disappeared' it is unsafe: bind on the new location leaves the old edge live under the N:N binds_to relation, producing duplicate-binding state. Atomic rebind that retires the stale edge in the same write ships in V2 (design doc 8 D2; Codex pass-10 finding #2). Net effect of D1 was that we added more relocation context routed through an unsafe CTA. This commit removes the unsafe CTA for relocation cases without reducing the safe CTA for ungrounded cases. Changes in handlers/link_commit.py: - _VERIFICATION_INSTRUCTION (single string) split into _VERIFICATION_INSTRUCTION_BASE + _GROUNDING_INSTRUCTION_UNGROUNDED + _GROUNDING_INSTRUCTION_RELOCATION. - New _build_verification_instruction(pending_compliance, pending_grounding) composer assembles the response text per call based on which reason values actually fired. Symbol-disappeared cases get an explicit "INFORMATIONAL ONLY — do NOT call bicameral.bind" warning citing the duplicate-binding hazard. - Response wiring at the bottom of handle_link_commit swapped from the constant lookup to the composer call. Tests (tests/test_link_commit_grounding.py): - test_pending_grounding_checks_for_ungrounded_decisions now asserts "bicameral.bind" IN the verification_instruction AND "INFORMATIONAL ONLY" NOT in it. - test_pending_grounding_checks_symbol_not_found now asserts "INFORMATIONAL ONLY" IN the instruction AND an explicit bind-prohibition phrase. Net destructive surface change for V1: arguably negative — V1 actively *removes* a v0.6.4 unsafe CTA for relocation, while introducing zero new mutating capabilities. Tests: 75 passed, 1 xfailed in 7.11s (xfail is scenario 8, V2 atomic rebind — by design). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All four findings verified against current code; only the actionable ones applied. 81 passed + 1 xfailed in 9.02s. #1 — skills/bicameral-preflight/SKILL.md sync_metrics note The .claude/skills copy got the sync_metrics observability note back when V1 A3 shipped, but the canonical skills/ copy never did. Mirror the wording verbatim near step 2 so the rendering guidance and response-field documentation stay in sync. #2 — handlers/detect_drift.py per-entry alignment The cosmetic-hint enrichment was slicing both head_full and wt_full using entry.lines (the baseline anchor). HEAD and the working tree can shift the symbol independently, so a single index range can't align both sides. The narrow consequence: a drifted entry with shifted lines could yield a misleading cosmetic_hint=true on bytes that aren't the bound region. Fix: re-resolve the symbol against each ref via resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD") and ref="working_tree" separately, slice each ref using its own resolved range. Resolution failure on either side → safe default of cosmetic_hint=False (matches the V1 contract: "False is cheap, True must be earned"). Empty symbol → skip (new fail-safe path). Test refactor: test_invalid_lines_skipped renamed to test_unresolvable_symbol_skipped — the old test asserted that lines=(0,0) was the failsafe trigger, but entry.lines is no longer the alignment input. New test exercises the resolve_symbol_lines-returns-None path via a nonexistent symbol name, which is the real fail-safe gate now. #3 — V2 guide TOC anchor for §9 GitHub auto-generates fragment IDs from heading text by lowercasing, replacing spaces with hyphens, and dropping punctuation. "## 9. Acceptance criteria for V2" maps to #9-acceptance-criteria-for-v2, but the TOC pointed at #9-acceptance-criteria (truncated). Link broken. Updated to the correct fragment. #4 — V2 guide unlabeled fenced code blocks (markdownlint MD040) Six fenced opens used bare ``` instead of a labeled fence. Tagged each with ```text — the contents are commit listings, ASCII DAG diagrams, pseudocode protocols, and tuple notation, none of which fit a real language tag. The other fenced blocks in the guide (already tagged ```sql / ```python) are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d payload
V1 plan D1 was originally designed around a server-side
search_code(decision.description) to surface relocation candidates.
That approach is obsolete after v0.6.4 nuked search_code and shifted
all code retrieval to the caller LLM. v0.6.4 already implemented the
spirit of D1:
- ledger/adapter.py:412-420 emits a symbol_disappeared grounding
check on the authoritative ref when resolve_symbol_lines returns
None for a tracked region;
- handlers/link_commit.py:21 verification_instruction tells the
caller to use Grep/Read + validate_symbols / extract_symbols +
bicameral.bind;
- tests/test_link_commit_grounding.py covers the rename-→-grounding
flow end-to-end.
V1's actual contribution: add original_lines: [start_line, end_line]
to the symbol_disappeared payload so the caller LLM can inspect the
symbol's prior position via `git show <prev_ref>:<file_path>` to
ground its own retrieval. Single-line addition in ledger/adapter.py;
new assertion in the existing regression test.
Strictly informational. No new "call bicameral_bind" CTA — Codex
pass-10 finding #2 stands: V2 must ship bicameral_rebind with
old-binding CAS + a fresh L3 verdict on the new target before any
rebind workflow becomes safe.
Tests: 63 passed in 4.52s — zero regressions across ast_diff,
b2_cosmetic_hint, sync_middleware, bind, link_commit_grounding,
phase3 integration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…trix New tests/test_desync_scenarios.py — one test per scenario from the Notion "Auto-Grounding Problem" catalog (Notion ID 3332a51619c4813caccec86c36d9bf98), routed through the real handler layer per the Apr 8 PR #84 lesson (tests bypassing handlers miss post-ingest hooks). Self-contained fixture builds a tmp git repo on ``main`` plus a memory ledger per test, so scenarios are independent of each other and of the bicameral-mcp checkout. Scorecard: 12 PASS, 1 XFAIL — meets V1 plan acceptance gate (>=12/13). Pass: 1. New decision, matching code exists → ungrounded → caller binds 2. Code changed after grounded → pending + pending_compliance_check 3. Code deleted after grounded → symbol_disappeared 4. Symbol renamed in file → symbol_disappeared with original_lines (V1 D1) 5. Symbol moved to different file → symbol_disappeared 6. Code added later, decision becomes resolvable → caller binds explicitly 7. Cold start: no matching code in repo → stays ungrounded 9. Intent description supersession → re-ingest succeeds 10. Multiple intents share a symbol → both surface in drift response 11. No server-side BM25 grounding (post-v0.6.0) → stays ungrounded — caller-LLM-driven 12. Line-shift edit does not trigger drift → resolve_symbol_lines re-resolves 13. [Open Question] prefix → ingested as gap-class decision XFAIL: 8. Drifted intent → atomic re-ground → requires V2 bicameral_rebind with old-binding CAS + fresh L3 verdict on new target (design doc 8 D2; Codex pass-10 #2) Scenario 2 explicitly asserts ``pending`` (not ``drifted``) per post-v0.5.0 derive_status semantics: a hash diff WITHOUT a cached ``compliant`` verdict yields ``pending``. ``drifted`` requires a caller-LLM verdict via the verdict cache (V2 territory). Incidental bug surfaced by F1 and fixed in ledger/adapter.py: ``pending_grounding_checks`` for ungrounded decisions read ``d.get("id", "")`` from ``get_all_decisions(filter="ungrounded")``, but that query aliases the field to ``decision_id``. Result: every ungrounded grounding-check entry shipped with ``decision_id=""``, leaving callers no handle to bind against. The existing ``test_pending_grounding_checks_for_ungrounded_decisions`` regression didn't catch it because it only asserted ``len > 0`` on the list, not non-empty IDs. Fix: read ``decision_id`` first, fall back to ``id`` for forward compatibility. Real correctness bug, not a test artifact. Tests: 76 passed, 1 xfailed in 7.35s across test_desync_scenarios, test_b2_cosmetic_hint, test_ast_diff, test_phase3_integration, test_sync_middleware, test_bind, test_link_commit_grounding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…don't get bind CTA Codex pass-12 finding #2: D1 added original_lines to the symbol_disappeared payload (richer relocation context) but left the v0.6.4 verification_instruction text directing callers to use bicameral.bind for ALL pending_grounding_checks. For reason='ungrounded' that CTA is correct (no prior binding to retire, no duplicate-binding risk). For reason='symbol_disappeared' it is unsafe: bind on the new location leaves the old edge live under the N:N binds_to relation, producing duplicate-binding state. Atomic rebind that retires the stale edge in the same write ships in V2 (design doc 8 D2; Codex pass-10 finding #2). Net effect of D1 was that we added more relocation context routed through an unsafe CTA. This commit removes the unsafe CTA for relocation cases without reducing the safe CTA for ungrounded cases. Changes in handlers/link_commit.py: - _VERIFICATION_INSTRUCTION (single string) split into _VERIFICATION_INSTRUCTION_BASE + _GROUNDING_INSTRUCTION_UNGROUNDED + _GROUNDING_INSTRUCTION_RELOCATION. - New _build_verification_instruction(pending_compliance, pending_grounding) composer assembles the response text per call based on which reason values actually fired. Symbol-disappeared cases get an explicit "INFORMATIONAL ONLY — do NOT call bicameral.bind" warning citing the duplicate-binding hazard. - Response wiring at the bottom of handle_link_commit swapped from the constant lookup to the composer call. Tests (tests/test_link_commit_grounding.py): - test_pending_grounding_checks_for_ungrounded_decisions now asserts "bicameral.bind" IN the verification_instruction AND "INFORMATIONAL ONLY" NOT in it. - test_pending_grounding_checks_symbol_not_found now asserts "INFORMATIONAL ONLY" IN the instruction AND an explicit bind-prohibition phrase. Net destructive surface change for V1: arguably negative — V1 actively *removes* a v0.6.4 unsafe CTA for relocation, while introducing zero new mutating capabilities. Tests: 75 passed, 1 xfailed in 7.11s (xfail is scenario 8, V2 atomic rebind — by design). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All four findings verified against current code; only the actionable ones applied. 81 passed + 1 xfailed in 9.02s. #1 — skills/bicameral-preflight/SKILL.md sync_metrics note The .claude/skills copy got the sync_metrics observability note back when V1 A3 shipped, but the canonical skills/ copy never did. Mirror the wording verbatim near step 2 so the rendering guidance and response-field documentation stay in sync. #2 — handlers/detect_drift.py per-entry alignment The cosmetic-hint enrichment was slicing both head_full and wt_full using entry.lines (the baseline anchor). HEAD and the working tree can shift the symbol independently, so a single index range can't align both sides. The narrow consequence: a drifted entry with shifted lines could yield a misleading cosmetic_hint=true on bytes that aren't the bound region. Fix: re-resolve the symbol against each ref via resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD") and ref="working_tree" separately, slice each ref using its own resolved range. Resolution failure on either side → safe default of cosmetic_hint=False (matches the V1 contract: "False is cheap, True must be earned"). Empty symbol → skip (new fail-safe path). Test refactor: test_invalid_lines_skipped renamed to test_unresolvable_symbol_skipped — the old test asserted that lines=(0,0) was the failsafe trigger, but entry.lines is no longer the alignment input. New test exercises the resolve_symbol_lines-returns-None path via a nonexistent symbol name, which is the real fail-safe gate now. #3 — V2 guide TOC anchor for §9 GitHub auto-generates fragment IDs from heading text by lowercasing, replacing spaces with hyphens, and dropping punctuation. "## 9. Acceptance criteria for V2" maps to #9-acceptance-criteria-for-v2, but the TOC pointed at #9-acceptance-criteria (truncated). Link broken. Updated to the correct fragment. #4 — V2 guide unlabeled fenced code blocks (markdownlint MD040) Six fenced opens used bare ``` instead of a labeled fence. Tagged each with ```text — the contents are commit listings, ASCII DAG diagrams, pseudocode protocols, and tuple notation, none of which fit a real language tag. The other fenced blocks in the guide (already tagged ```sql / ```python) are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Jin's "L1/L2: Claim/Identity" spec-governance proposal §4.2 + Failure Mode #2: only decisions explicitly tagged ``decision_level = "L2"`` enter the codegenome identity graph. L1 decisions (behavioral commitments evaluated against evidence) are intentionally ungrounded at the identity layer. L3 is never tracked. ``NULL`` (unclassified) is treated as L3 by the tolerant policy — preserves backward-compat for existing ingest payloads. Without this guard, binding an L1 decision pollutes the identity graph with subsystem-level matches that erode Phase-3 continuity precision. Changes: - Added ``ledger.queries.get_decision_level`` + adapter wrapper. - Bind handler reads decision_level before invoking the codegenome side-effect; skips for non-L2 levels (handler-level option A from the doc's open Q3). - 5 new tests in tests/test_codegenome_l1_exemption.py covering L2 (writes), L1 (skip), L3 (skip), NULL (skip), and the BindResponse-shape-unchanged invariant. - Updated tests/test_codegenome_bind_integration.py _seed_decision helper to default decision_level="L2" so existing tests exercise the codegenome write path. Tests: 63/63 codegenome tests pass (was 49 + 5 L1 + 9 housekeeping). No regressions on existing bind tests. 🤖 Authored via [QorLogic SDLC](https://github.com/MythologIQ-Labs-LLC/qor-logic) on [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: CodeGenome Phase 1+2 (#59) — QOR-process build Implements upstream issue #59: adapter boundary + bind-time identity records. All flags default off; zero existing-behavior change unless BICAMERAL_CODEGENOME_ENABLED=1 and BICAMERAL_CODEGENOME_WRITE_IDENTITY_RECORDS=1. Phase 1 — adapter skeleton: - codegenome/{adapter,contracts,confidence,config,__init__}.py - 4 ABC methods (compute_identity is the only one implemented in #59) - 3 issue-mandated Pydantic models - noisy_or, weighted_average + DEFAULT_CONFIDENCE_WEIGHTS - CodeGenomeConfig.from_env() reads BICAMERAL_CODEGENOME_* Phase 2 — identity record write path: - SCHEMA_VERSION 10 → 11 (additive _migrate_v10_to_v11) - Tables: code_subject, subject_identity, subject_version - Edges: has_identity, has_version, about (decision→subject) - 5 ledger queries (upsert_code_subject, upsert_subject_identity, relate_has_identity, link_decision_to_subject, find_subject_identities_for_decision) - codegenome/deterministic_adapter.py (compute_identity for deterministic_location_v1) - codegenome/bind_service.py (write_codegenome_identity, side-effect only) - adapters/codegenome.py (get_codegenome() factory parallels get_ledger / get_code_locator / get_drift_analyzer) - context.py wires codegenome + codegenome_config into BicameralContext - handlers/bind.py hooks identity write after bind_decision(); failure caught and logged — bind contract unchanged QOR governance artifacts (this PR was built via /qor-bootstrap → /qor-plan → /qor-audit → /qor-implement workflow): - docs/CONCEPT.md, ARCHITECTURE_PLAN.md (genesis hash 29dfd085) - docs/META_LEDGER.md (5-entry chain: GENESIS → PLAN → VETO → PASS → IMPLEMENT) - docs/BACKLOG.md, SHADOW_GENOME.md - plan-codegenome-phase-1-2.md - .agent/staging/ (gitignored — audit reports) Tests: 49/49 codegenome unit + integration tests pass. Section 4 razor: all new functions ≤ 40 lines (one mid-build violation in write_codegenome_identity caught and refactored into helpers per qor-implement Step 9). Regression: 0 — full suite passes 254/335 vs baseline 250/335; the 4 delta is exactly the codegenome integration tests. The 81 pre-existing failures on upstream/main are filed as #67-#70. Scope vs #59: all mandated deliverables present. Two justified deviations documented in plan-codegenome-phase-1-2.md and docs/META_LEDGER.md Entry #5: - One extra edge (`about` decision→code_subject) required by find_subject_identities_for_decision's two-hop graph walk per the issue's exit criterion. - content_hash uses sha256-with-whitespace-normalization (ledger.status.hash_lines) instead of literal blake2b(body_text) to satisfy the exit criterion "subject_identity.content_hash matches code_region.content_hash at bind time". Schema compat-map entry for v11 ships as "0.11.0" placeholder; release-eng pins the final value at PR merge. 🤖 Authored via [QorLogic SDLC](https://github.com/MythologIQ-Labs-LLC/qor-logic) on [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(#71): address CodeRabbit review findings on PR #71 Six CodeRabbit comments resolved on the CodeGenome Phase 1+2 PR. CRITICAL (ledger/queries.py) - Add `_validated_record_id` helper enforcing canonical `table:id` SurrealDB record-id shape. Apply at every interpolation site for new CodeGenome queries: `relate_has_identity` (validates `code_subject_id` and `subject_identity_id`), `link_decision_to_subject` (validates `decision_id` and `code_subject_id`), `find_subject_identities_for_decision` (validates `decision_id`). Closes the f-string injection vector for caller-supplied IDs reaching SurrealQL string-build. MAJOR (ledger/queries.py) - `upsert_subject_identity` is now race-safe under concurrent writers. Previous SELECT-then-CREATE could lose the race on the UNIQUE(address) index; new code wraps CREATE in try/except, detects "already contains" from the dedup-edge helper convention, and re-SELECTs to return the winning row's id. MINOR (codegenome/bind_service.py) - `write_codegenome_identity` now honors the `SubjectIdentity | None` return contract: returns `None` when `_persist_subject_and_identity` reports persistence did not complete (empty IDs from upserts), `identity` only on full success. MINOR (codegenome/confidence.py) - Replace Unicode minus (U+2212) with ASCII hyphen-minus in the `noisy_or` docstring formula. Closes RUF002 lint warning. MINOR (docs/ARCHITECTURE_PLAN.md) - Add `text` language identifier to two unlabeled fenced code blocks (file tree, data flow). Closes MD040 markdownlint warnings. MINOR (docs/QOR_VS_ADHOC_COMPARISON.md) - Replace duplicated phrase "audit auditability" with "auditability" in the verdict sentence. NIT (tests/test_codegenome_adapter.py) - `test_subject_identity_is_frozen` now uses `pytest.raises(dataclasses.FrozenInstanceError)` instead of the overly broad `pytest.raises(Exception)`. Asserts the specific frozen-dataclass behavior, not any exception. All 49 codegenome tests still pass. Section 4 razor still clean. The 3 ingest-flow CI failures (test_phase2_ledger.py + test_phase3_integration.py) remain unaddressed by this commit — diagnosing whether they are pre-existing on upstream/main is the next step. 🤖 Authored via [QorLogic SDLC](https://github.com/MythologIQ-Labs-LLC/qor-logic) on [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add EPHEMERAL_AUTHORITATIVE.md Explains the ephemeral/authoritative branch-aware compliance verdict design: ephemeral flag semantics, promotion lifecycle, branch-delta sweep, flow_id coupling, 17-scenario matrix, and key invariants. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * chore: rename EPHEMERAL_AUTHORITATIVE.md to kebab case Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(#59): L1 exemption guard for codegenome identity writes Per Jin's "L1/L2: Claim/Identity" spec-governance proposal §4.2 + Failure Mode #2: only decisions explicitly tagged ``decision_level = "L2"`` enter the codegenome identity graph. L1 decisions (behavioral commitments evaluated against evidence) are intentionally ungrounded at the identity layer. L3 is never tracked. ``NULL`` (unclassified) is treated as L3 by the tolerant policy — preserves backward-compat for existing ingest payloads. Without this guard, binding an L1 decision pollutes the identity graph with subsystem-level matches that erode Phase-3 continuity precision. Changes: - Added ``ledger.queries.get_decision_level`` + adapter wrapper. - Bind handler reads decision_level before invoking the codegenome side-effect; skips for non-L2 levels (handler-level option A from the doc's open Q3). - 5 new tests in tests/test_codegenome_l1_exemption.py covering L2 (writes), L1 (skip), L3 (skip), NULL (skip), and the BindResponse-shape-unchanged invariant. - Updated tests/test_codegenome_bind_integration.py _seed_decision helper to default decision_level="L2" so existing tests exercise the codegenome write path. Tests: 63/63 codegenome tests pass (was 49 + 5 L1 + 9 housekeeping). No regressions on existing bind tests. 🤖 Authored via [QorLogic SDLC](https://github.com/MythologIQ-Labs-LLC/qor-logic) on [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(#59): spec-governance feedback — L1/L2 claim/identity model Written response to Jin's Notion spec-governance proposal. Covers: - Q1 (claim_evaluator schema): defer until first concrete evaluator exists to model. Adding the table now bakes in one shape and forces another schema migration we'll likely revisit. - Q2 (NULL decision_level policy): tolerant — treat as L3 (skip codegenome write), but invest in classification surfacing (dashboard badge + bulk-classify utility) so it doesn't stay NULL forever. - Q3 (where to put the L1 exemption guard): handler-level, not ledger-internal. Keeps ledger/ as pure CRUD, makes the rule visible at the call site, simplifies test mocks. Also enumerates open follow-ups (§6) — bug fixes already filed (#67–70, #72, #74) plus three new docs/feature items to file separately when ready. No code changes; pure documentation. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: jinhongkuan <kuanjh123@gmail.com>
…iation seal Reality matches Promise. Three changes (2 repo files + 2 deferred external gh actions) land per Entry #24 audit blueprint 1:1; 0 new tests (acknowledged advisory — manual verification mitigates); Section 4 razor clean. Audit verdict: PASS, L1 (Entry #24 chain hash 1de1fac7). Implementation: Entry #25 chain hash 51c8a45c. Merkle seal: efd0304b2f0e0b3ca28aa4620c2b8ea2eda5ab9e2828ca852ab9f3c5adda6eb5 Architectural decision recorded: bicameral-mcp#135's auto-resolve direction abandoned (no caller LLM in hook context, MCP sampling not viable in Claude Code's main chat). Resolution path = dashboard tooltip → /bicameral-sync. The tooltip surfaces the pending state; the human in their session is the qualified judge. Plan addition tracking (Entry #24 preconditions, final state): ✅ #2 — SKILL.md tooltip note (delivered in IMPL, sealed here) 🟡 #1 — PR description manual verification step (composed in /qor-document) 🟡 #3 — #135 close comment README/docs deferral (composed in /qor-document) Surfaced for follow-up (not blocking): bicameral-mcp#125 scope should be widened — 7 skills under pilot/mcp/.claude/skills/ are absent from the canonical pilot/mcp/skills/ location claimed by pilot/mcp/CLAUDE.md. Spec correction queued (post-merge gh action): bicameral#108 Flow 1 step 3 claims IngestResponse.supersession_candidates exists when it does not; collision detection lives caller-side via bicameral-context-sentry skill, surfaces via bicameral.preflight.unresolved_collisions. Capability shortfalls (carried, no regression vs Entry #23): qor/scripts/ runtime helpers absent (gate artifacts not written), tools/reliability/ validators absent (Steps 4.6–4.8 skipped), agent-teams not declared, codex-plugin not declared (solo audit/seal), intent_lock capture skipped. Refs #135. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cherry-picked from 1f54f1a, scope-narrowed to the surgical contribution. The original commit was authored against an older base where the e2e harness scaffold did not yet exist; this rebased version adds only the new logic on top of dev's existing harness. What this commit adds: - `tests/e2e/_ledger_helpers.py` — pure helper `count_agent_session_decisions(snapshot)`, extracted so unit tests can import without triggering the harness's top-level env-var / CLI guards. - `tests/e2e/run_e2e_flows.py`: - `_count_agent_session_decisions(snapshot)` — thin wrapper around the helper that hides the import inside the harness. - `_validate_flow4_via_ledger()` — path-X-(b) post-hoc ledger query. Snapshots the ledger after the harness completes and counts decisions with `source_type='agent_session'`. Asserter FAIL + ledger has agent_session → UPGRADE to PASS with explicit annotation. Ledger error → INCONCLUSIVE (verdict unchanged). All five behavior-matrix cases documented in the docstring. - Invocation site: called once after `_validate_flow3_via_ledger` in `main()`, only when `dev_session` ran. - `tests/test_flow4_ledger_validation.py` — five unit tests against the helper covering: zero rows, error snapshot (None), agent_session presence, mixed source types, and empty decisions list. Why this is decoupled from agent caprice: in-stream Flow 4 evidence requires the agent to invoke `bicameral.preflight` and walk Step 3.5 to trigger capture-corrections. Path-X-(b) validates the *product outcome* (decisions written with the canonical source_type) rather than the *mechanism* (which tool the agent chose). This means a SessionEnd subprocess effect that lands in the ledger after the parent stream-json closes still upgrades the verdict, even when the in-stream signal is absent. Closes research-brief recommendation P0 #2. Note: this commit replaces the original 1f54f1a SHA on the branch via rebase. Governance/META_LEDGER edits and the planning artifacts that were bundled with the original have been dropped here and will land via a separate governance PR. The auto-fire UserPromptSubmit hook (#146 fix) that was also bundled is shipping via #155. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cherry-picked from 1f54f1a, scope-narrowed to the surgical contribution. The original commit was authored against an older base where the e2e harness scaffold did not yet exist; this rebased version adds only the new logic on top of dev's existing harness. What this commit adds: - `tests/e2e/_ledger_helpers.py` — pure helper `count_agent_session_decisions(snapshot)`, extracted so unit tests can import without triggering the harness's top-level env-var / CLI guards. - `tests/e2e/run_e2e_flows.py`: - `_count_agent_session_decisions(snapshot)` — thin wrapper around the helper that hides the import inside the harness. - `_validate_flow4_via_ledger()` — path-X-(b) post-hoc ledger query. Snapshots the ledger after the harness completes and counts decisions with `source_type='agent_session'`. Asserter FAIL + ledger has agent_session → UPGRADE to PASS with explicit annotation. Ledger error → INCONCLUSIVE (verdict unchanged). All five behavior-matrix cases documented in the docstring. - Invocation site: called once after `_validate_flow3_via_ledger` in `main()`, only when `dev_session` ran. - `tests/test_flow4_ledger_validation.py` — five unit tests against the helper covering: zero rows, error snapshot (None), agent_session presence, mixed source types, and empty decisions list. Why this is decoupled from agent caprice: in-stream Flow 4 evidence requires the agent to invoke `bicameral.preflight` and walk Step 3.5 to trigger capture-corrections. Path-X-(b) validates the *product outcome* (decisions written with the canonical source_type) rather than the *mechanism* (which tool the agent chose). This means a SessionEnd subprocess effect that lands in the ledger after the parent stream-json closes still upgrades the verdict, even when the in-stream signal is absent. Closes research-brief recommendation P0 #2. Note: this commit replaces the original 1f54f1a SHA on the branch via rebase. Governance/META_LEDGER edits and the planning artifacts that were bundled with the original have been dropped here and will land via a separate governance PR. The auto-fire UserPromptSubmit hook (#146 fix) that was also bundled is shipping via #155. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cherry-picked from 1f54f1a, scope-narrowed to the surgical contribution. The original commit was authored against an older base where the e2e harness scaffold did not yet exist; this rebased version adds only the new logic on top of dev's existing harness. What this commit adds: - `tests/e2e/_ledger_helpers.py` — pure helper `count_agent_session_decisions(snapshot)`, extracted so unit tests can import without triggering the harness's top-level env-var / CLI guards. - `tests/e2e/run_e2e_flows.py`: - `_count_agent_session_decisions(snapshot)` — thin wrapper around the helper that hides the import inside the harness. - `_validate_flow4_via_ledger()` — path-X-(b) post-hoc ledger query. Snapshots the ledger after the harness completes and counts decisions with `source_type='agent_session'`. Asserter FAIL + ledger has agent_session → UPGRADE to PASS with explicit annotation. Ledger error → INCONCLUSIVE (verdict unchanged). All five behavior-matrix cases documented in the docstring. - Invocation site: called once after `_validate_flow3_via_ledger` in `main()`, only when `dev_session` ran. - `tests/test_flow4_ledger_validation.py` — five unit tests against the helper covering: zero rows, error snapshot (None), agent_session presence, mixed source types, and empty decisions list. Why this is decoupled from agent caprice: in-stream Flow 4 evidence requires the agent to invoke `bicameral.preflight` and walk Step 3.5 to trigger capture-corrections. Path-X-(b) validates the *product outcome* (decisions written with the canonical source_type) rather than the *mechanism* (which tool the agent chose). This means a SessionEnd subprocess effect that lands in the ledger after the parent stream-json closes still upgrades the verdict, even when the in-stream signal is absent. Closes research-brief recommendation P0 #2. Note: this commit replaces the original 1f54f1a SHA on the branch via rebase. Governance/META_LEDGER edits and the planning artifacts that were bundled with the original have been dropped here and will land via a separate governance PR. The auto-fire UserPromptSubmit hook (#146 fix) that was also bundled is shipping via #155. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cherry-picked from 1f54f1a, scope-narrowed to the surgical contribution. The original commit was authored against an older base where the e2e harness scaffold did not yet exist; this rebased version adds only the new logic on top of dev's existing harness. What this commit adds: - `tests/e2e/_ledger_helpers.py` — pure helper `count_agent_session_decisions(snapshot)`, extracted so unit tests can import without triggering the harness's top-level env-var / CLI guards. - `tests/e2e/run_e2e_flows.py`: - `_count_agent_session_decisions(snapshot)` — thin wrapper around the helper that hides the import inside the harness. - `_validate_flow4_via_ledger()` — path-X-(b) post-hoc ledger query. Snapshots the ledger after the harness completes and counts decisions with `source_type='agent_session'`. Asserter FAIL + ledger has agent_session → UPGRADE to PASS with explicit annotation. Ledger error → INCONCLUSIVE (verdict unchanged). All five behavior-matrix cases documented in the docstring. - Invocation site: called once after `_validate_flow3_via_ledger` in `main()`, only when `dev_session` ran. - `tests/test_flow4_ledger_validation.py` — five unit tests against the helper covering: zero rows, error snapshot (None), agent_session presence, mixed source types, and empty decisions list. Why this is decoupled from agent caprice: in-stream Flow 4 evidence requires the agent to invoke `bicameral.preflight` and walk Step 3.5 to trigger capture-corrections. Path-X-(b) validates the *product outcome* (decisions written with the canonical source_type) rather than the *mechanism* (which tool the agent chose). This means a SessionEnd subprocess effect that lands in the ledger after the parent stream-json closes still upgrades the verdict, even when the in-stream signal is absent. Closes research-brief recommendation P0 #2. Note: this commit replaces the original 1f54f1a SHA on the branch via rebase. Governance/META_LEDGER edits and the planning artifacts that were bundled with the original have been dropped here and will land via a separate governance PR. The auto-fire UserPromptSubmit hook (#146 fix) that was also bundled is shipping via #155. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of plan-priority-c-team-server-slack-v0.md — adds Slack OAuth v2 flow, workspace persistence with at-rest encrypted tokens, and a schema-validated channel allow-list config loader. Per audit Advisory #2, OAuth routes are factored into team_server/auth/router.py (parallel to Phase 4's events router pattern) rather than left inline in app.py — keeps app.py at 47 lines well under the Section 4 razor cap. Files added: - team_server/auth/slack_oauth.py (58 LOC): pure functions — build_authorize_url, exchange_code; raises SlackOAuthError on ok=false. - team_server/auth/encryption.py: Fernet encrypt/decrypt for OAuth tokens at rest; key loaded from BICAMERAL_TEAM_SERVER_SECRET_KEY env var. - team_server/auth/router.py (73 LOC): /oauth/slack/install + /oauth/slack/callback routes with CSRF state defense; persists workspace row with token encrypted before storage. - team_server/config.py (40 LOC): pydantic-validated YAML loader for channel allow-list; raises ValueError with descriptive message on schema failure. - team_server/app.py (modified): include auth router. Tests (7 functionality tests, all green): - tests/test_team_server_slack_oauth.py: authorize URL contains required params + scopes, exchange_code POSTs correctly, encrypt/decrypt round-trip preserves plaintext while ciphertext differs, callback rejects mismatched state (CSRF defense), callback persists workspace. - tests/test_team_server_channel_allowlist.py: load_channel_allowlist parses valid YAML, raises ValueError on missing team_id. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cherry-picked from 1f54f1a, scope-narrowed to the surgical contribution. The original commit was authored against an older base where the e2e harness scaffold did not yet exist; this rebased version adds only the new logic on top of dev's existing harness. What this commit adds: - `tests/e2e/_ledger_helpers.py` — pure helper `count_agent_session_decisions(snapshot)`, extracted so unit tests can import without triggering the harness's top-level env-var / CLI guards. - `tests/e2e/run_e2e_flows.py`: - `_count_agent_session_decisions(snapshot)` — thin wrapper around the helper that hides the import inside the harness. - `_validate_flow4_via_ledger()` — path-X-(b) post-hoc ledger query. Snapshots the ledger after the harness completes and counts decisions with `source_type='agent_session'`. Asserter FAIL + ledger has agent_session → UPGRADE to PASS with explicit annotation. Ledger error → INCONCLUSIVE (verdict unchanged). All five behavior-matrix cases documented in the docstring. - Invocation site: called once after `_validate_flow3_via_ledger` in `main()`, only when `dev_session` ran. - `tests/test_flow4_ledger_validation.py` — five unit tests against the helper covering: zero rows, error snapshot (None), agent_session presence, mixed source types, and empty decisions list. Why this is decoupled from agent caprice: in-stream Flow 4 evidence requires the agent to invoke `bicameral.preflight` and walk Step 3.5 to trigger capture-corrections. Path-X-(b) validates the *product outcome* (decisions written with the canonical source_type) rather than the *mechanism* (which tool the agent chose). This means a SessionEnd subprocess effect that lands in the ledger after the parent stream-json closes still upgrades the verdict, even when the in-stream signal is absent. Closes research-brief recommendation P0 #2. Note: this commit replaces the original 1f54f1a SHA on the branch via rebase. Governance/META_LEDGER edits and the planning artifacts that were bundled with the original have been dropped here and will land via a separate governance PR. The auto-fire UserPromptSubmit hook (#146 fix) that was also bundled is shipping via #155. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit 8af60f3)
… to research brief (#205) Addresses Codex first-pass review notes #1, #2, #3, #7, #8, #9 from the brief's review block. Tier C items + the subsequent Kilo / Gemini / Codex-2nd-pass review layers are tracked as follow-ups (will be surfaced in the PR thread for direction). Changes: - § 1.4 ingest pipeline: adds explicit "Risk amplification (durable-feedback-loop)" paragraph framing ingest as the durable write-surface that propagates poisoned content through preflight back into the agent's reasoning context. Strengthens LLM-01 + LLM-04 P0 defensibility (Codex #2). - § 1.8 skills surface: adds worked before/after example contrasting instruction-only `bicameral-report-bug` keys-only commitment vs the deterministic `_resolve_signer_email` gate that replaced it in #204. Makes the doctrine concrete for non-agent-systems readers (Codex #3). - § 1.9 team-server: rewrites the dangling "TEAM-NN gaps in § 4" promise to "intentionally not enumerated; activation PR authors TEAM-NN IDs against actual activated topology" (Codex #8). - § 2.6 EU AI Act: removes unilateral "limited risk" claim. Now describes bicameral-mcp as an AI-adjacent developer-tool component whose risk-tier classification properly attaches to the integrated system + deployment context, requiring counsel review for any specific tier claim (Codex #7). - § 5 gap synthesis: adds Deployment trigger column (`all` / `local-OK` / `team/hosted` / `pre-team` / `hosted`) so severity is defensible per deployment shape. SOC2-01 reclassified as pre-team/hosted P0 with local-only boundary statement; GDPR-05 reclassified as team/hosted P1 with local single-user P2; OWASP-03 reclassified as hosted P1 with local P2 (uv/pipx provides install-time lock); OWASP-02 trigger narrowed to team/hosted (Codex #1). - Appendix method notes: softens "every claim should be verifiable by re-reading the cited file at the cited line range" to acknowledge that most findings cite components rather than path:line, and defers a line-level evidence appendix as a follow-up improvement (Codex #9). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+Gemini+Codex-2 (#205) Authors a single Reviewer Disposition Pass table at the top of the brief reconciling all 32 review points across four review layers (Codex first-pass, Kilo, Gemini CLI, Codex second-pass) into one post-review consensus before downstream P1 issue-filing — per the explicit Codex-2 #1 directive. Decisions: 21 applied this commit, 6 already applied in 1d82658, 3 deferred to follow-up, 2 note-only. Net new gap IDs added per disposition: GDPR-08 (ephemeral data), GDPR-09 (consent versioning + revocation), LLM-11 (cross-tool config-file modification surface), MCP-01 (host UX as external dependency), CFG-01 (config precedence + fail-closed model). Reclassification: LLM-06 P0/M → P1/M with scope narrowed to future remote-skill-loading channel (per Kilo #2). Major content additions to the brief: - § 1.1: MCP host UX is external dependency, not security gate (new gap MCP-01) — host that auto-approves tool calls bypasses any "operator will see this" assumption. - § 1.2: SurrealDB version pinning supply-chain callout (Kilo #11). - § 1.7: cross-tool config-file modification surface (new gap LLM-11) distinct from skill-content surface — `setup_wizard` writes shell commands into `.claude/settings.json` that run host-side at hook fire. - § 1.11 (new): Configuration precedence + fail-closed model — single uniform precedence rule across all knobs (env > config.yaml > hardcoded defaults), fail-closed semantics on missing/malformed/ contradictory config (Codex-2 #5). - § 2.4 (a): LLM02 mapping note clarifying it folds into LLM-07 + OWASP-04 (Kilo #13). - § 2.4 (b): explicit `confirm=True` is agent-supplied not HITL (Kilo #3) — security context cannot rely on agent-filled params. - § 2.4 (c) LLM-01 + LLM-04: extensible classifier (Gemini #2) + guardrail-not-classifier framing (Codex-1 #6) + control-acceptance template (Codex-2 #4) — quarantine, override, test fixtures, measurement counters. - § 2.4 (c) LLM-03: timeouts as `.bicameral/config.yaml` knobs (Gemini #3). - § 2.4 (c) LLM-05 + LLM-09: out-of-band operator confirmation, not agent-supplied confirm parameters (Kilo #3). - § 2.4 (c) LLM-06: scope-narrowed to future remote-skill-loading; in current install model the wheel-trust covers it (Kilo #2). - § 2.4 (c) LLM-11 (new): cross-tool config-file gate (signed hooks-manifest.json) distinct from skill manifest. - § 2.1 (c) GDPR-01: three remediation candidates — tombstone-and- rebuild with signed manifest (Kilo #12), crypto-shredding (Gemini #1), or scope-out via PII detect-and-refuse. - § 2.1 (c) GDPR-02: data-subject-access search must cover full identifier surface (description, source_ref, topic, file paths) not just signer email (Codex-1 #5). - § 2.1 (c) GDPR-08 (new): ephemeral data surfaces (tempfiles, swap, WAL, crash dumps) (Kilo #7). - § 2.1 (c) GDPR-09 (new): consent versioning + revocation semantics (Kilo #8 + Codex-2 #3). - § 5: gap table updated with new rows + LLM-06 reclassification; gap counts post-disposition (5 P0 / 19 P1 / 16 P2 / 5 P3 = 45 total, up from 41). - § 6.1 (new): epic grouping for deferred P1 batch (Codex-1 #10) — ingest boundary guardrails, per-tool authority gradation, supply- chain signing, telemetry & consent. - § 6.2 (new): six-section control-acceptance template for every DG gap (Codex-2 #4) — positive / negative / bypass / fail-closed / telemetry / docs. Filed-issue updates: - Issue #214 (LLM-06): relabeled P0 → P1, retitled to reflect scope narrowing, full disposition comment added. - Issue #212 (LLM-01) + #213 (LLM-04): disposition comments added capturing the guardrail framing, classifier extensibility, and control-acceptance template applicable to both. Deferred for follow-up: Codex-1 #4 (controller/processor restructure of standards table), Codex-1 #9 (full evidence appendix beyond the methodology softening), Codex-2 #2 (full 3-column deployment-profile matrix beyond the single-column trigger). Brief now 706 lines (up from 606); +124 line diff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…knobs (BicameralAI#200 Phase 3) Closes A4 of BicameralAI#200's preflight findings via two new deterministic config gates that filter what data flows from server to agent / disk. Deterministic gate #1 — render_source_attribution: - New BicameralContext.render_source_attribution field (modes: full, redacted (default), hidden) read from .bicameral/config.yaml at config-load. - New handlers.preflight._apply_attribution_policy applies the mode to DecisionMatch.source_ref before the response leaves the server. The agent receives pre-filtered output and renders verbatim. - full: legacy passthrough - redacted: replace `[A-Z][a-z]+` (names) with <NAME_REDACTED>, `\d{4}-\d{2}-\d{2}` (dates) with <DATE_REDACTED>; preserves structural shape so operator sees "<NAME_REDACTED> review · <NAME_REDACTED>, <DATE_REDACTED>" instead of full attribution - hidden: blank source_ref entirely Deterministic gate BicameralAI#2 — preflight_bypass_tracking: - New BicameralContext.preflight_bypass_tracking field (modes: enabled (default), disabled). - handlers.record_bypass.handle_record_bypass short-circuits BEFORE the preflight_telemetry.write_bypass_event call when ctx setting is disabled. Returns recorded=False, reason="tracking_disabled". When disabled, ~/.bicameral/preflight_events.jsonl gets no writes; engine's recency read sees no events → no escalation drop, which matches the user's privacy choice. Config schema: - context.py refactored to share a generic _read_yaml_string_field helper across the three Phase 2/3 config readers (signer_email_fallback, render_source_attribution, preflight_bypass_tracking). Each is a fail-soft string-with-valid-set read with documented privacy-positive defaults. - setup_wizard.py writes both new fields to fresh .bicameral/config.yaml with their defaults. SKILL.md updates: - skills/bicameral-preflight/SKILL.md: telemetry transparency note added above HITL prompts; render_source_attribution doc inserted inline ("the deterministic gate is the config field, not this instruction"); preflight_bypass_tracking note appended to the bypass-semantics list. Tests (5 new): - tests/test_preflight_render_source_attribution.py: 3 tests for full/redacted/hidden modes against the pure helper. - tests/test_preflight_bypass_tracking.py: 2 tests for enabled/disabled paths through the handler with monkeypatched preflight_telemetry. 10/10 Phase 2+3 tests pass; ruff/mypy/format clean. Phase 3 of plan-200-skills-audit-hardening (PR C of three — completes the plan). After merge, the queued workstream is the e2e security audit across all #205 compliance standards. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.14.1 publish run BicameralAI#2 still failed at SBOM gen — the previous hotfix added `--schema-version 1.5` to the cyclonedx-py command but cyclonedx-py 7.x rejected the flag with exit code 2 (argparse error, "command-line usage error"). Stderr was captured but never printed, so the actual error message was invisible from the publish workflow log. This commit: 1. Drops the `--schema-version` flag entirely. cyclonedx-py emits whichever CycloneDX 1.x revision is its default (currently 1.6). 2. Loosens the validator from `specVersion == "1.5"` to `specVersion` starts with "1.". CycloneDX is forward-compatible within 1.x; every consumer that reads 1.5 reads 1.6. 3. On `subprocess.CalledProcessError`, prints the captured stderr to our own stderr before re-raising. Future CLI-flag drift becomes diagnosable from the workflow log without re-running. 4. Module docstring updated — title is "CycloneDX SBOM emitter" (no 1.5 anymore), policy section explains the version drift fight. The contract reduction from "exact 1.5" to "any 1.x" is a deliberate trade for resilience against transitive `cyclonedx-bom` upgrades. If we ever need to pin to a specific spec version for compliance reasons, the right move is to pin `cyclonedx-bom` itself in pyproject.toml [release] extras and document the dependency-policy decision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
ingest/link_commitwrites an event file to.bicameral/events/{git-email}/before updating the local DBserver.pyand all handlersNew files
events/__init__.pyevents/models.pyEventEnvelopePydantic modelevents/writer.pyEventFileWriter— atomic JSON writes to per-user direvents/materializer.pyEventMaterializer— watermark + glob + replayevents/team_adapter.pyTeamWriteAdapter— composition wrappertests/test_team_events.pyModified files
adapters/ledger.pyget_ledger()reads.bicameral/config.yaml, returnsTeamWriteAdapterwhenmode: teamsetup_wizard.py.gitignorehandlingTest plan
bicameral-mcp setup→ select team mode → verify config.yaml + .gitignore.bicameral/events/🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation