fix: ingest pipeline — input contracts, payload normalization, freshness guards#3
Conversation
…ess guards
The bicameral_ingest tool silently returned all-zero stats when Claude
sent natural transcript payloads ({decisions, action_items, open_questions})
because the handler expected {mappings}. Additionally, auto-grounding
failed when the code graph index didn't exist yet.
- Add Pydantic input contracts (IngestPayload, IngestMapping, etc.) to
contracts.py so payload shape mismatches fail fast with ValidationError
- Add _normalize_payload() that converts natural LLM format to internal
mappings format via Pydantic validation
- Add ensure_code_graph_fresh() to code_locator adapter — reusable
freshness guard that ensures BM25 index + SymbolDB match HEAD
- Wire freshness guard into _auto_ground_via_search and
_resolve_symbols_to_regions so ingest auto-grounds without requiring
a prior search_code call
- Add post-ingest link_commit(HEAD) to sync ledger and re-ground
previously ungrounded intents
- Add auto-sync to decision_status handler (was read-only, now syncs
to HEAD like search and drift already do)
- Update tool descriptions to document both accepted payload formats
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis change introduces support for two ingest payload formats (internal mappings and natural decision-based formats), adds code graph freshness validation, and enables automatic ledger synchronization before status checks and post-ingest operations. Five files are modified to implement payload normalization, new input contracts, and updated handler control flow. Changes
Sequence DiagramssequenceDiagram
participant Client
participant IngestHandler as Ingest Handler
participant PayloadNorm as Payload Normalizer
participant CodeGraph as Code Graph
participant SymbolDB as Symbol DB
participant LinkCommit as Link Commit Handler
Client->>IngestHandler: POST /ingest (payload)
IngestHandler->>PayloadNorm: _normalize_payload(payload)
Note over PayloadNorm: Validate & convert decisions→mappings
PayloadNorm->>IngestHandler: normalized mappings
IngestHandler->>CodeGraph: ensure_code_graph_fresh(repo)
Note over CodeGraph: Rebuild index if needed
CodeGraph->>IngestHandler: ✓ ready
IngestHandler->>SymbolDB: resolve symbols
Note over SymbolDB: Auto-ground via BM25
SymbolDB->>IngestHandler: resolved regions
IngestHandler->>LinkCommit: handle_link_commit("HEAD")
Note over LinkCommit: Sync ledger to HEAD
LinkCommit->>IngestHandler: ✓ linked
IngestHandler->>Client: IngestResponse
sequenceDiagram
participant Client
participant StatusHandler as Status Handler
participant LinkCommit as Link Commit Handler
participant Ledger as Ledger
Client->>StatusHandler: GET /status
StatusHandler->>LinkCommit: handle_link_commit(ref)
Note over LinkCommit: Auto-sync before read
LinkCommit->>Ledger: sync to HEAD
Ledger->>LinkCommit: ✓ synced
LinkCommit->>StatusHandler: ✓ complete
StatusHandler->>Ledger: fetch decisions
Ledger->>StatusHandler: decisions
StatusHandler->>Client: DecisionStatusResponse
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@adapters/code_locator.py`:
- Around line 24-34: The current synchronous ensure_code_graph_fresh calls
ensure_index_matches_repo inline which can trigger long rebuilds; change it so
the expensive work runs off the request thread by either making
ensure_code_graph_fresh async and awaiting asyncio.to_thread(lambda:
ensure_index_matches_repo(repo, config)) or keeping a sync entry that schedules
asyncio.get_running_loop().create_task(asyncio.to_thread(...)) and returns
immediately; update callers of ensure_code_graph_fresh to await the new async
variant where necessary (or to not block on the scheduled task) and keep
references to ensure_code_graph_fresh and ensure_index_matches_repo to locate
and modify the call site.
In `@contracts.py`:
- Around line 167-188: Add a Pydantic model-level validator to IngestPayload
that enforces the contract: either mappings is non-empty XOR at least one of
decisions / action_items / open_questions is non-empty; if neither or both are
provided raise a ValueError so malformed payloads fail fast. Implement this as a
`@root_validator` in the IngestPayload class that examines values["mappings"],
values["decisions"], values["action_items"], and values["open_questions"], and
raises with a clear message; this prevents _normalize_payload()'s silent
zero-mapping fallback path from accepting empty/mixed payloads.
In `@handlers/decision_status.py`:
- Around line 26-31: The handler currently swallows exceptions from
handle_link_commit(ref) and proceeds to build a DecisionStatusResponse which may
report a stale view; instead either re-raise the exception so the request fails
like the search/drift handlers or set an explicit degraded flag (e.g.,
sync_failed) on the DecisionStatusResponse and skip returning a normal status;
update the try/except around handle_link_commit in decision_status.py to either
remove the exception catch (allowing exceptions to propagate) or replace the
logger.warning with logic that records the sync failure into the
DecisionStatusResponse object so callers can see the degraded state.
In `@handlers/ingest.py`:
- Around line 149-152: The code refreshes the graph for variable repo via
ensure_code_graph_fresh(repo) but then calls get_code_locator() which defaults
to the REPO_PATH env var, causing mismatch when payload["repo"] differs; update
the call site to pass the same repo into get_code_locator (e.g.,
get_code_locator(repo)) and ensure_code_graph_fresh(repo) is invoked with the
same repo variable, and also update adapters/code_locator.get_code_locator to
accept an optional repo_path parameter (as suggested) so RealCodeLocatorAdapter
is instantiated with repo_path=repo_path; in short: propagate the repo variable
into get_code_locator to guarantee refresh and searches target the same
repository.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dfa78f11-1745-4c04-addd-05c5cad2b9f9
📒 Files selected for processing (5)
adapters/code_locator.pycontracts.pyhandlers/decision_status.pyhandlers/ingest.pyserver.py
| def ensure_code_graph_fresh(repo_path: str | None = None) -> None: | ||
| """Ensure the code graph index exists and matches HEAD. | ||
|
|
||
| Safe to call multiple times — only rebuilds if stale. | ||
| Called automatically by tools that depend on the code graph. | ||
| """ | ||
| repo = repo_path or os.getenv("REPO_PATH", ".") | ||
| ensure_runtime_env() | ||
| from code_locator.config import load_config | ||
| config = load_config() | ||
| ensure_index_matches_repo(repo, config) |
There was a problem hiding this comment.
Avoid running freshness rebuilds inline on the request path.
ensure_index_matches_repo() can cold-start or fully rebuild the sqlite/BM25 artifacts. Exposing that work through a synchronous helper is risky now that ingest/status flows can call it during normal tool execution: one stale repo can block the whole MCP server until indexing finishes. Please push this behind an async/off-thread wrapper, or require callers to invoke it via asyncio.to_thread(...).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@adapters/code_locator.py` around lines 24 - 34, The current synchronous
ensure_code_graph_fresh calls ensure_index_matches_repo inline which can trigger
long rebuilds; change it so the expensive work runs off the request thread by
either making ensure_code_graph_fresh async and awaiting
asyncio.to_thread(lambda: ensure_index_matches_repo(repo, config)) or keeping a
sync entry that schedules
asyncio.get_running_loop().create_task(asyncio.to_thread(...)) and returns
immediately; update callers of ensure_code_graph_fresh to await the new async
variant where necessary (or to not block on the scheduled task) and keep
references to ensure_code_graph_fresh and ensure_index_matches_repo to locate
and modify the call site.
| class IngestPayload(BaseModel): | ||
| """Ingest input — accepts EITHER mappings (internal) or decisions (natural LLM). | ||
|
|
||
| If ``mappings`` is present, it's used directly (internal pipeline format). | ||
| If ``decisions`` is present, they are normalized into mappings automatically. | ||
| """ | ||
| # Common fields | ||
| repo: str = "" | ||
| commit_hash: str = "" | ||
| query: str = "" | ||
|
|
||
| # Internal pipeline format | ||
| mappings: list[IngestMapping] = [] | ||
|
|
||
| # Natural LLM-generated format (normalized into mappings if present) | ||
| source: str = "manual" | ||
| title: str = "" | ||
| date: str = "" | ||
| participants: list[str] = [] | ||
| decisions: list[IngestDecision] = [] | ||
| action_items: list[IngestActionItem] = [] | ||
| open_questions: list[str] = [] |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Locate contracts.py and handlers/ingest.py
git ls-files | grep -E "(contracts\.py|ingest\.py)"Repository: BicameralAI/bicameral-mcp
Length of output: 100
🏁 Script executed:
# Check contracts.py to see if IngestPayload exists at lines 167-188
head -200 contracts.py | tail -50Repository: BicameralAI/bicameral-mcp
Length of output: 1421
🏁 Script executed:
# Get context around the IngestPayload class
rg "class IngestPayload" -A 25 contracts.pyRepository: BicameralAI/bicameral-mcp
Length of output: 881
🏁 Script executed:
# Check handlers/ingest.py for the model_validate and normalization logic
cat -n handlers/ingest.py | head -110Repository: BicameralAI/bicameral-mcp
Length of output: 4438
🏁 Script executed:
# Check imports in contracts.py to see current Pydantic usage
head -30 contracts.pyRepository: BicameralAI/bicameral-mcp
Length of output: 911
🏁 Script executed:
# Search for any existing validators on IngestPayload
rg "@.*validator|from pydantic" contracts.py -B 2 -A 10Repository: BicameralAI/bicameral-mcp
Length of output: 396
🏁 Script executed:
# Check pyproject.toml or requirements for Pydantic version
fd -e "pyproject.toml" -o -e "requirements.txt" -o -e "requirements*.txt" | head -5 | xargs cat 2>/dev/null | grep -i pydanticRepository: BicameralAI/bicameral-mcp
Length of output: 185
🏁 Script executed:
# Check Pydantic version in config files
find . -maxdepth 2 -type f \( -name "pyproject.toml" -o -name "requirements*.txt" -o -name "setup.py" \) -exec cat {} \; | grep -i pydanticRepository: BicameralAI/bicameral-mcp
Length of output: 105
🏁 Script executed:
# Verify that contracts.py currently has NO validators on IngestPayload
ast-grep --pattern $'class IngestPayload {
$$$
}'Repository: BicameralAI/bicameral-mcp
Length of output: 51
🏁 Script executed:
# Double-check the exact behavior: empty payload validation
python3 << 'EOF'
# Simulate what happens with empty dict and IngestPayload
# to confirm defaults allow empty input
from typing import List
from pydantic import BaseModel
class MockIngestDecision(BaseModel):
id: str = ""
title: str = ""
class MockIngestActionItem(BaseModel):
owner: str = "unassigned"
action: str = ""
class MockIngestPayload(BaseModel):
repo: str = ""
commit_hash: str = ""
query: str = ""
mappings: List[dict] = []
source: str = "manual"
title: str = ""
date: str = ""
participants: List[str] = []
decisions: List[MockIngestDecision] = []
action_items: List[MockIngestActionItem] = []
open_questions: List[str] = []
# Test 1: empty dict
try:
p1 = MockIngestPayload.model_validate({})
print("✓ Empty dict validates: OK")
print(f" mappings={p1.mappings}, decisions={p1.decisions}, action_items={p1.action_items}, open_questions={p1.open_questions}")
except Exception as e:
print(f"✗ Empty dict rejected: {e}")
# Test 2: metadata only
try:
p2 = MockIngestPayload.model_validate({"repo": "test", "commit_hash": "abc"})
print("✓ Metadata-only payload validates: OK")
except Exception as e:
print(f"✗ Metadata-only rejected: {e}")
# Test 3: mixed (mappings + decisions)
try:
p3 = MockIngestPayload.model_validate({"mappings": [{"x": 1}], "decisions": [{"id": "d1"}]})
print("✓ Mixed payload (mappings + decisions) validates: OK")
except Exception as e:
print(f"✗ Mixed payload rejected: {e}")
EOFRepository: BicameralAI/bicameral-mcp
Length of output: 250
🏁 Script executed:
# Search for any other validation logic that might catch these cases
rg "_normalize_payload|IngestPayload" handlers/ingest.py -B 3 -A 15 | head -80Repository: BicameralAI/bicameral-mcp
Length of output: 2560
Enforce the ingest shape instead of defaulting to empty payloads.
IngestPayload currently accepts {}, metadata-only payloads, and mixed mappings + natural fields because every collection defaults to []. _normalize_payload() then falls back to the silent zero-mapping path (lines 92–97), so malformed input doesn't fail fast. This violates the contract stated in the docstring ("accepts EITHER mappings or decisions"). Add a model-level validator to enforce: non-empty mappings xor at least one of decisions / action_items / open_questions.
Proposed fix
-from pydantic import BaseModel
+from pydantic import BaseModel, model_validator
...
class IngestPayload(BaseModel):
"""Ingest input — accepts EITHER mappings (internal) or decisions (natural LLM).
@@
decisions: list[IngestDecision] = []
action_items: list[IngestActionItem] = []
open_questions: list[str] = []
+
+ `@model_validator`(mode="after")
+ def validate_shape(self) -> "IngestPayload":
+ has_mappings = bool(self.mappings)
+ has_natural = bool(self.decisions or self.action_items or self.open_questions)
+ if has_mappings == has_natural:
+ raise ValueError(
+ "Provide exactly one ingest format: non-empty `mappings` or at least one "
+ "of `decisions`, `action_items`, `open_questions`."
+ )
+ return self🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@contracts.py` around lines 167 - 188, Add a Pydantic model-level validator to
IngestPayload that enforces the contract: either mappings is non-empty XOR at
least one of decisions / action_items / open_questions is non-empty; if neither
or both are provided raise a ValueError so malformed payloads fail fast.
Implement this as a `@root_validator` in the IngestPayload class that examines
values["mappings"], values["decisions"], values["action_items"], and
values["open_questions"], and raises with a clear message; this prevents
_normalize_payload()'s silent zero-mapping fallback path from accepting
empty/mixed payloads.
| # Auto-sync to HEAD so status reflects current code state | ||
| try: | ||
| from handlers.link_commit import handle_link_commit | ||
| await handle_link_commit(ref) | ||
| except Exception as exc: | ||
| logger.warning("[status] auto-sync failed: %s", exc) |
There was a problem hiding this comment.
Don't return status after a failed auto-sync.
handle_link_commit() can fail before the ledger reflects ref. Catching that and continuing here means DecisionStatusResponse may contain stale statuses while still claiming the requested ref/as-of view. I'd either let the exception propagate, like the search/drift handlers do, or surface an explicit degraded sync_failed state in the response.
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 30-30: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@handlers/decision_status.py` around lines 26 - 31, The handler currently
swallows exceptions from handle_link_commit(ref) and proceeds to build a
DecisionStatusResponse which may report a stale view; instead either re-raise
the exception so the request fails like the search/drift handlers or set an
explicit degraded flag (e.g., sync_failed) on the DecisionStatusResponse and
skip returning a normal status; update the try/except around handle_link_commit
in decision_status.py to either remove the exception catch (allowing exceptions
to propagate) or replace the logger.warning with logic that records the sync
failure into the DecisionStatusResponse object so callers can see the degraded
state.
| from adapters.code_locator import get_code_locator, ensure_code_graph_fresh | ||
| from code_locator.indexing.sqlite_store import SymbolDB | ||
| ensure_code_graph_fresh(repo) | ||
| locator = get_code_locator() |
There was a problem hiding this comment.
Use the same repo for refresh and search.
This refreshes the code graph for repo, then creates the locator via get_code_locator(), which still reads REPO_PATH. If payload["repo"] differs from the process env, ingest can rebuild one repo and run BM25/symbol validation against another, which will yield wrong or empty grounding.
🔧 Proposed fix
- from adapters.code_locator import get_code_locator, ensure_code_graph_fresh
+ from adapters.code_locator import get_code_locator, ensure_code_graph_fresh
from code_locator.indexing.sqlite_store import SymbolDB
ensure_code_graph_fresh(repo)
- locator = get_code_locator()
+ locator = get_code_locator(repo_path=repo)Companion change outside this hunk:
# adapters/code_locator.py
def get_code_locator(repo_path: str | None = None):
repo = repo_path or os.getenv("REPO_PATH", ".")
return RealCodeLocatorAdapter(repo_path=repo)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@handlers/ingest.py` around lines 149 - 152, The code refreshes the graph for
variable repo via ensure_code_graph_fresh(repo) but then calls
get_code_locator() which defaults to the REPO_PATH env var, causing mismatch
when payload["repo"] differs; update the call site to pass the same repo into
get_code_locator (e.g., get_code_locator(repo)) and
ensure_code_graph_fresh(repo) is invoked with the same repo variable, and also
update adapters/code_locator.get_code_locator to accept an optional repo_path
parameter (as suggested) so RealCodeLocatorAdapter is instantiated with
repo_path=repo_path; in short: propagate the repo variable into get_code_locator
to guarantee refresh and searches target the same repository.
…dening Refactor ground_mappings into a 3-tier coverage loop that relaxes both BM25 score threshold and fuzzy match threshold before giving up: - Tier 0 (strict): 0.5/80 — identical to previous behavior - Tier 1 (relaxed): 0.3/70 - Tier 2 (broad): 0.1/60 Key changes: - Extract _ground_single() for per-mapping grounding at a given tier - BM25 search called once per mapping and reused across tiers - _validate_with_threshold() overrides fuzzy threshold per-call - Both Stage 1 and Stage 2 use the tier's fuzzy threshold consistently - Mark vocab_cache table as deprecated in schema.py Closes gap #3 from the code locator drift audit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dening Refactor ground_mappings into a 3-tier coverage loop that relaxes both BM25 score threshold and fuzzy match threshold before giving up: - Tier 0 (strict): 0.5/80 — identical to previous behavior - Tier 1 (relaxed): 0.3/70 - Tier 2 (broad): 0.1/60 Key changes: - Extract _ground_single() for per-mapping grounding at a given tier - BM25 search called once per mapping and reused across tiers - _validate_with_threshold() overrides fuzzy threshold per-call - Both Stage 1 and Stage 2 use the tier's fuzzy threshold consistently - Mark vocab_cache table as deprecated in schema.py Closes gap #3 from the code locator drift audit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dening Refactor ground_mappings into a 3-tier coverage loop that relaxes both BM25 score threshold and fuzzy match threshold before giving up: - Tier 0 (strict): 0.5/80 — identical to previous behavior - Tier 1 (relaxed): 0.3/70 - Tier 2 (broad): 0.1/60 Key changes: - Extract _ground_single() for per-mapping grounding at a given tier - BM25 search called once per mapping and reused across tiers - _validate_with_threshold() overrides fuzzy threshold per-call - Both Stage 1 and Stage 2 use the tier's fuzzy threshold consistently - Mark vocab_cache table as deprecated in schema.py Closes gap #3 from the code locator drift audit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sory B1 — ledger/ast_diff.py is_cosmetic_change(before, after, lang) returns True only when two snippets differ by inter-token whitespace alone. Strategy: parse both with tree-sitter and compare a recursive (node.type, child_sigs | leaf_bytes) signature. Two trees with identical signatures differ only by whitespace tree-sitter doesn't represent as nodes; any structural or leaf-text difference returns False. Strict whitelist by design — variable renames, trailing commas, comment edits (incl. # type: ignore / // @ts-ignore / build tags), docstring edits, and string-literal changes all return False. False negatives (real cosmetic changes routed unbiased to L3) are cheap; false positives bias the future V2 caller-LLM verdict prompt toward "looks fine" — which is the exact failure mode Codex pass-7 finding #3 flagged. Reuses code_locator.indexing.symbol_extractor._get_parser and LANGUAGE_FALLBACK so the cosmetic detector and the symbol indexer can never silently disagree on supported languages. Supports python, javascript, typescript, java, go, rust, c_sharp (plus jsx → javascript and tsx → typescript via fallback). Unsupported langs, parse failures, and has_error trees fail safe to False. B2 — DriftEntry.cosmetic_hint Adds cosmetic_hint: bool = False on contracts.DriftEntry (additive, non-breaking). handlers/detect_drift._enrich_with_cosmetic_hints runs after the pure raw_decisions_to_drift_entries mapping and only touches drifted entries. Source of "before" / "after": HEAD bytes vs working-tree bytes via ledger.status.get_git_content, sliced to the region's (start_line, end_line). Language resolved from file extension via EXTENSION_LANGUAGE. Five fail-safe paths leave cosmetic_hint=False: non-drifted status, equal HEAD/working-tree bytes (no diff to classify), unsupported extension, invalid line range, exception during is_cosmetic_change. Codex invariants enforced by tests: * The hint NEVER mutates content_hash or any other DriftEntry field besides cosmetic_hint itself (test_content_hash_never_mutated). * The hint never gates drift surfacing or status — it is metadata consumed by the eventual V2 caller-LLM verdict prompt. Tests: 61 passed in 3.81s — 21 ast_diff, 8 b2_cosmetic_hint, 19 sync_middleware, 6 bind, 7 phase3 integration. Zero regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All four findings verified against current code; only the actionable ones applied. 81 passed + 1 xfailed in 9.02s. #1 — skills/bicameral-preflight/SKILL.md sync_metrics note The .claude/skills copy got the sync_metrics observability note back when V1 A3 shipped, but the canonical skills/ copy never did. Mirror the wording verbatim near step 2 so the rendering guidance and response-field documentation stay in sync. #2 — handlers/detect_drift.py per-entry alignment The cosmetic-hint enrichment was slicing both head_full and wt_full using entry.lines (the baseline anchor). HEAD and the working tree can shift the symbol independently, so a single index range can't align both sides. The narrow consequence: a drifted entry with shifted lines could yield a misleading cosmetic_hint=true on bytes that aren't the bound region. Fix: re-resolve the symbol against each ref via resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD") and ref="working_tree" separately, slice each ref using its own resolved range. Resolution failure on either side → safe default of cosmetic_hint=False (matches the V1 contract: "False is cheap, True must be earned"). Empty symbol → skip (new fail-safe path). Test refactor: test_invalid_lines_skipped renamed to test_unresolvable_symbol_skipped — the old test asserted that lines=(0,0) was the failsafe trigger, but entry.lines is no longer the alignment input. New test exercises the resolve_symbol_lines-returns-None path via a nonexistent symbol name, which is the real fail-safe gate now. #3 — V2 guide TOC anchor for §9 GitHub auto-generates fragment IDs from heading text by lowercasing, replacing spaces with hyphens, and dropping punctuation. "## 9. Acceptance criteria for V2" maps to #9-acceptance-criteria-for-v2, but the TOC pointed at #9-acceptance-criteria (truncated). Link broken. Updated to the correct fragment. #4 — V2 guide unlabeled fenced code blocks (markdownlint MD040) Six fenced opens used bare ``` instead of a labeled fence. Tagged each with ```text — the contents are commit listings, ASCII DAG diagrams, pseudocode protocols, and tuple notation, none of which fit a real language tag. The other fenced blocks in the guide (already tagged ```sql / ```python) are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sory B1 — ledger/ast_diff.py is_cosmetic_change(before, after, lang) returns True only when two snippets differ by inter-token whitespace alone. Strategy: parse both with tree-sitter and compare a recursive (node.type, child_sigs | leaf_bytes) signature. Two trees with identical signatures differ only by whitespace tree-sitter doesn't represent as nodes; any structural or leaf-text difference returns False. Strict whitelist by design — variable renames, trailing commas, comment edits (incl. # type: ignore / // @ts-ignore / build tags), docstring edits, and string-literal changes all return False. False negatives (real cosmetic changes routed unbiased to L3) are cheap; false positives bias the future V2 caller-LLM verdict prompt toward "looks fine" — which is the exact failure mode Codex pass-7 finding #3 flagged. Reuses code_locator.indexing.symbol_extractor._get_parser and LANGUAGE_FALLBACK so the cosmetic detector and the symbol indexer can never silently disagree on supported languages. Supports python, javascript, typescript, java, go, rust, c_sharp (plus jsx → javascript and tsx → typescript via fallback). Unsupported langs, parse failures, and has_error trees fail safe to False. B2 — DriftEntry.cosmetic_hint Adds cosmetic_hint: bool = False on contracts.DriftEntry (additive, non-breaking). handlers/detect_drift._enrich_with_cosmetic_hints runs after the pure raw_decisions_to_drift_entries mapping and only touches drifted entries. Source of "before" / "after": HEAD bytes vs working-tree bytes via ledger.status.get_git_content, sliced to the region's (start_line, end_line). Language resolved from file extension via EXTENSION_LANGUAGE. Five fail-safe paths leave cosmetic_hint=False: non-drifted status, equal HEAD/working-tree bytes (no diff to classify), unsupported extension, invalid line range, exception during is_cosmetic_change. Codex invariants enforced by tests: * The hint NEVER mutates content_hash or any other DriftEntry field besides cosmetic_hint itself (test_content_hash_never_mutated). * The hint never gates drift surfacing or status — it is metadata consumed by the eventual V2 caller-LLM verdict prompt. Tests: 61 passed in 3.81s — 21 ast_diff, 8 b2_cosmetic_hint, 19 sync_middleware, 6 bind, 7 phase3 integration. Zero regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All four findings verified against current code; only the actionable ones applied. 81 passed + 1 xfailed in 9.02s. #1 — skills/bicameral-preflight/SKILL.md sync_metrics note The .claude/skills copy got the sync_metrics observability note back when V1 A3 shipped, but the canonical skills/ copy never did. Mirror the wording verbatim near step 2 so the rendering guidance and response-field documentation stay in sync. #2 — handlers/detect_drift.py per-entry alignment The cosmetic-hint enrichment was slicing both head_full and wt_full using entry.lines (the baseline anchor). HEAD and the working tree can shift the symbol independently, so a single index range can't align both sides. The narrow consequence: a drifted entry with shifted lines could yield a misleading cosmetic_hint=true on bytes that aren't the bound region. Fix: re-resolve the symbol against each ref via resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD") and ref="working_tree" separately, slice each ref using its own resolved range. Resolution failure on either side → safe default of cosmetic_hint=False (matches the V1 contract: "False is cheap, True must be earned"). Empty symbol → skip (new fail-safe path). Test refactor: test_invalid_lines_skipped renamed to test_unresolvable_symbol_skipped — the old test asserted that lines=(0,0) was the failsafe trigger, but entry.lines is no longer the alignment input. New test exercises the resolve_symbol_lines-returns-None path via a nonexistent symbol name, which is the real fail-safe gate now. #3 — V2 guide TOC anchor for §9 GitHub auto-generates fragment IDs from heading text by lowercasing, replacing spaces with hyphens, and dropping punctuation. "## 9. Acceptance criteria for V2" maps to #9-acceptance-criteria-for-v2, but the TOC pointed at #9-acceptance-criteria (truncated). Link broken. Updated to the correct fragment. #4 — V2 guide unlabeled fenced code blocks (markdownlint MD040) Six fenced opens used bare ``` instead of a labeled fence. Tagged each with ```text — the contents are commit listings, ASCII DAG diagrams, pseudocode protocols, and tuple notation, none of which fit a real language tag. The other fenced blocks in the guide (already tagged ```sql / ```python) are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sory B1 — ledger/ast_diff.py is_cosmetic_change(before, after, lang) returns True only when two snippets differ by inter-token whitespace alone. Strategy: parse both with tree-sitter and compare a recursive (node.type, child_sigs | leaf_bytes) signature. Two trees with identical signatures differ only by whitespace tree-sitter doesn't represent as nodes; any structural or leaf-text difference returns False. Strict whitelist by design — variable renames, trailing commas, comment edits (incl. # type: ignore / // @ts-ignore / build tags), docstring edits, and string-literal changes all return False. False negatives (real cosmetic changes routed unbiased to L3) are cheap; false positives bias the future V2 caller-LLM verdict prompt toward "looks fine" — which is the exact failure mode Codex pass-7 finding #3 flagged. Reuses code_locator.indexing.symbol_extractor._get_parser and LANGUAGE_FALLBACK so the cosmetic detector and the symbol indexer can never silently disagree on supported languages. Supports python, javascript, typescript, java, go, rust, c_sharp (plus jsx → javascript and tsx → typescript via fallback). Unsupported langs, parse failures, and has_error trees fail safe to False. B2 — DriftEntry.cosmetic_hint Adds cosmetic_hint: bool = False on contracts.DriftEntry (additive, non-breaking). handlers/detect_drift._enrich_with_cosmetic_hints runs after the pure raw_decisions_to_drift_entries mapping and only touches drifted entries. Source of "before" / "after": HEAD bytes vs working-tree bytes via ledger.status.get_git_content, sliced to the region's (start_line, end_line). Language resolved from file extension via EXTENSION_LANGUAGE. Five fail-safe paths leave cosmetic_hint=False: non-drifted status, equal HEAD/working-tree bytes (no diff to classify), unsupported extension, invalid line range, exception during is_cosmetic_change. Codex invariants enforced by tests: * The hint NEVER mutates content_hash or any other DriftEntry field besides cosmetic_hint itself (test_content_hash_never_mutated). * The hint never gates drift surfacing or status — it is metadata consumed by the eventual V2 caller-LLM verdict prompt. Tests: 61 passed in 3.81s — 21 ast_diff, 8 b2_cosmetic_hint, 19 sync_middleware, 6 bind, 7 phase3 integration. Zero regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All four findings verified against current code; only the actionable ones applied. 81 passed + 1 xfailed in 9.02s. #1 — skills/bicameral-preflight/SKILL.md sync_metrics note The .claude/skills copy got the sync_metrics observability note back when V1 A3 shipped, but the canonical skills/ copy never did. Mirror the wording verbatim near step 2 so the rendering guidance and response-field documentation stay in sync. #2 — handlers/detect_drift.py per-entry alignment The cosmetic-hint enrichment was slicing both head_full and wt_full using entry.lines (the baseline anchor). HEAD and the working tree can shift the symbol independently, so a single index range can't align both sides. The narrow consequence: a drifted entry with shifted lines could yield a misleading cosmetic_hint=true on bytes that aren't the bound region. Fix: re-resolve the symbol against each ref via resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD") and ref="working_tree" separately, slice each ref using its own resolved range. Resolution failure on either side → safe default of cosmetic_hint=False (matches the V1 contract: "False is cheap, True must be earned"). Empty symbol → skip (new fail-safe path). Test refactor: test_invalid_lines_skipped renamed to test_unresolvable_symbol_skipped — the old test asserted that lines=(0,0) was the failsafe trigger, but entry.lines is no longer the alignment input. New test exercises the resolve_symbol_lines-returns-None path via a nonexistent symbol name, which is the real fail-safe gate now. #3 — V2 guide TOC anchor for §9 GitHub auto-generates fragment IDs from heading text by lowercasing, replacing spaces with hyphens, and dropping punctuation. "## 9. Acceptance criteria for V2" maps to #9-acceptance-criteria-for-v2, but the TOC pointed at #9-acceptance-criteria (truncated). Link broken. Updated to the correct fragment. #4 — V2 guide unlabeled fenced code blocks (markdownlint MD040) Six fenced opens used bare ``` instead of a labeled fence. Tagged each with ```text — the contents are commit listings, ASCII DAG diagrams, pseudocode protocols, and tuple notation, none of which fit a real language tag. The other fenced blocks in the guide (already tagged ```sql / ```python) are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…17 items) Comprehensive review-feedback round on PR BicameralAI#73 (CodeGenome Phase 3 / Issue BicameralAI#60). Eight MAJOR + seven minor CodeRabbit findings + two Devin CRITICAL bugs all addressed in this commit. ## Devin CRITICAL fixes ### `adapters/code_locator.py:134` — silent zero-neighbors bug `_resolve_symbol_id_for_span` referenced `self._validate_tool.config.sqlite_db_path`, but the dataclass attribute is `sqlite_db` (no `_path`). The `AttributeError` was silently swallowed by `neighbors_for`'s broad `except`, so the method ALWAYS returned `()` and the continuity Jaccard signal was permanently zero in production. Combined with CodeRabbit's "reuse SymbolDB handle" finding, the fix is to use `self._db` (already initialized in `_ensure_initialized`) instead of opening a new SQLite handle per call. ### `contracts.py:212` — missing skill update PR adds `continuity_resolutions: list[ContinuityResolution]` to `LinkCommitResponse` but no skill file was updated. Per CLAUDE.md's "Tool Changes Require Skill Changes" rule, this commit updates `skills/bicameral-sync/SKILL.md` with a callout explaining the new field and the auto-resolution behaviour (auto-resolved regions stripped from `pending_compliance_checks`; `needs_review` resolutions treated as advisory pendings). ## CodeRabbit MAJOR fixes (8) ### `adapters/code_locator.py:136` — combined with Devin fix above. ### `codegenome/bind_service.py:80` — partial-bind rollback The 4-step write sequence (upsert subject → upsert identity → has_identity edge → about edge) was fire-and-forget; a failure on edges 3 or 4 left orphaned `code_subject` and `subject_identity` rows. Added `_rollback_partial_bind` that deletes the freshly-written rows on edge-write failure. Equivalent to BEGIN/COMMIT semantics for the failure modes that matter, without rewriting the four upsert queries as one giant SurrealQL transaction. ### `handlers/link_commit.py:255` — DriftContext seeding `DriftContext` was hardcoded with `old_symbol_kind="unknown"` and `start_line=0, end_line=0`, permanently dropping the kind signal (20% of the continuity score) and reporting `ContinuityResolution.old_location` as `:0-0`. Now loads the bound region's actual span + identity_type via the new `ledger.queries.get_region_metadata` helper. Lookup failure falls back to the previous behaviour (response shape preserved). ### `ledger/adapter.py:365` — create-only continuity wrapper `upsert_code_region` (continuity-flow wrapper) delegated to `queries.upsert_code_region` which keys on `(file_path, symbol_name)` and silently reused IDs across same-file moves. Switched the wrapper to call the new `queries.create_code_region` (create-only) so every continuity redirect targets a distinct new region id and `update_binds_to_region` can actually redirect. ### `ledger/queries.py:1567` — link_decision_to_subject carries region_id A decision can bind multiple `code_region`s but the `about` edge stored only `decision -> code_subject`, flattening all subjects per decision. The continuity matcher couldn't disambiguate which identity belonged to a given drifted region. Added optional `region_id` parameter; threaded through `_persist_subject_and_identity` and `write_codegenome_identity`; bind handler now passes it. ### `ledger/queries.py:1602` — atomic redirect `update_binds_to_region` did DELETE-then-RELATE non-atomically. Failure on the second statement left the decision unbound. Wrapped in `BEGIN TRANSACTION ... COMMIT TRANSACTION`. Idempotency preserved via a pre-flight existence check on the new edge before attempting the transaction (SurrealDB v2 transaction rollback hides the underlying UNIQUE-collision message, so we can't catch it post-hoc). ### `plan-codegenome-phase-3.md:49` — already addressed in earlier commits. ## CodeRabbit minor fixes (7) - `CHANGELOG.md` — added v0.12.0 entry covering Phase 3 + the review hardening; v0.11.0 entry preserved. - `contracts.py:175` — replaced EN DASH (`–`) with ASCII hyphen (`-`) in `ContinuityResolution` docstring (RUF002). - `contracts.py:182` — constrained `confidence` to probability bounds via `Field(ge=0.0, le=1.0)`. Added `Field` import. - `docs/BACKLOG.md` — B2 (Issue BicameralAI#60) marked completed in PR BicameralAI#73. - `docs/SHADOW_GENOME.md:130` — fixed accidental `BicameralAI#3:` heading token by reformatting to `and BicameralAI#3:` (MD018). - `docs/SYSTEM_STATE.md:11,53` — added `text` language tag to unlabeled fenced blocks (MD040). - `plan-codegenome-phase-3.md:50` — already addressed. ## New ledger primitives (Phase 3 hardening) - `queries.create_code_region(...)` — create-only region (vs upsert) for continuity-redirect path. - `queries.get_region_metadata(region_id)` — single-query lookup of span + linked identity_type via graph traversal (`code_region <-binds_to<- decision ->about-> code_subject <-has_identity<- subject_identity.identity_type`). - Adapter wrappers for both, plus `link_decision_to_subject` extended with `region_id` kwarg. ## Verification - 105/105 codegenome + ledger regression tests pass on Windows local. - 140/141 broader regression (1 pre-existing failure `test_phase1_l1_wiring.py::test_backfill_restores_hash_but_stays_pending_without_verdict` fails identically on `dev` — Windows-env issue, not introduced by this commit). - All Python files parse cleanly. - Section 4 razor: every modified function ≤ 40 LOC; every modified file ≤ 250 LOC (or pre-existing exception, unchanged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…i/ package Audit found: - F-1 (BLOCKING): `bicameral/` Python package does not exist; the proposed `bicameral/drift_report.py` module would be unbuildable. SG-PLAN-GROUNDING-DRIFT instance BicameralAI#3 in this session. - F-2: dev tip moved 2 commits during planning (PR BicameralAI#106 + BicameralAI#107). Remediation: - Replace all 7 references: `bicameral/drift_report.py` → `cli/drift_report.py`; `bicameral.drift_report` (CLI module path) → `cli.drift_report`. - Update branch baseline note to current dev tip `c478dd5`. - The cli/ package was added by PR BicameralAI#107 (decision_level CLI for BicameralAI#77); it's the natural home for a CLI-invoked renderer. The plan's intent is preserved: pure-function renderer + GitHub Action workflow + sticky-comment poster. Re-audit pending.
… — audit PASS - META_LEDGER BicameralAI#15 on this branch: GATE TRIBUNAL entry covering v1 VETO (2f31d6f) + v2 PASS (7da919c). Chain hash b2925935 extends from BicameralAI#14 Phase 4 SEAL (0ebcf69b) on dev. Note: branches feat/44 and feat/49 each carry their own Entry BicameralAI#15 chain extension off dev's BicameralAI#14; reconciliation occurs at release time. - SHADOW_GENOME #5b: SG-PLAN-GROUNDING-DRIFT instance BicameralAI#3. Cross-references instances #1 (PR BicameralAI#93 §9) and BicameralAI#2 (feat/44 branch's Entry BicameralAI#5). Pattern is now triggered by plan author trusting mental model over filesystem `ls`. Mitigation: every plan must enumerate existing packages before proposing a new module's home. Plan PASS at 7da919c; chain to /qor-implement.
…iation seal Reality matches Promise. Three changes (2 repo files + 2 deferred external gh actions) land per Entry #24 audit blueprint 1:1; 0 new tests (acknowledged advisory — manual verification mitigates); Section 4 razor clean. Audit verdict: PASS, L1 (Entry #24 chain hash 1de1fac7). Implementation: Entry #25 chain hash 51c8a45c. Merkle seal: efd0304b2f0e0b3ca28aa4620c2b8ea2eda5ab9e2828ca852ab9f3c5adda6eb5 Architectural decision recorded: bicameral-mcp#135's auto-resolve direction abandoned (no caller LLM in hook context, MCP sampling not viable in Claude Code's main chat). Resolution path = dashboard tooltip → /bicameral-sync. The tooltip surfaces the pending state; the human in their session is the qualified judge. Plan addition tracking (Entry #24 preconditions, final state): ✅ #2 — SKILL.md tooltip note (delivered in IMPL, sealed here) 🟡 #1 — PR description manual verification step (composed in /qor-document) 🟡 #3 — #135 close comment README/docs deferral (composed in /qor-document) Surfaced for follow-up (not blocking): bicameral-mcp#125 scope should be widened — 7 skills under pilot/mcp/.claude/skills/ are absent from the canonical pilot/mcp/skills/ location claimed by pilot/mcp/CLAUDE.md. Spec correction queued (post-merge gh action): bicameral#108 Flow 1 step 3 claims IngestResponse.supersession_candidates exists when it does not; collision detection lives caller-side via bicameral-context-sentry skill, surfaces via bicameral.preflight.unresolved_collisions. Capability shortfalls (carried, no regression vs Entry #23): qor/scripts/ runtime helpers absent (gate artifacts not written), tools/reliability/ validators absent (Steps 4.6–4.8 skipped), agent-teams not declared, codex-plugin not declared (solo audit/seal), intent_lock capture skipped. Refs #135. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#147) Closes research brief recommendation P1 #3. The installed SessionEnd hook in .claude/settings.json and the source-of-truth constant in setup_wizard.py both omitted the canonical guard prescribed by skills/bicameral-capture-corrections/SKILL.md:207. Two missing pieces, now restored byte-exact: 1. BICAMERAL_SESSION_END_RUNNING env-var guard. Without it, the spawned `claude -p` subprocess fires its OWN SessionEnd hook on exit, recursing indefinitely (bounded only by Claude Code's per-session subprocess depth limit, if any, or filesystem/process exhaustion). The guard env var is inherited by the subprocess; its nested SessionEnd hook short-circuits. 2. `--auto-ingest` flag. The capture-corrections skill in batch mode reads this flag to scan the full session transcript and ingest mechanical corrections directly without surfacing prompts. Without it, the subprocess would default to interactive-mode behavior, producing prompts no one will answer (parent session is closing). Files modified: - .claude/settings.json: SessionEnd hook command replaced with canonical - setup_wizard.py:343-347: _BICAMERAL_SESSION_END_COMMAND constant updated to canonical (drives fresh installs via _install_claude_hooks) Tests: - tests/test_session_end_hook_drift.py: 3 functionality tests - parses .claude/settings.json and asserts substring presence of re-entrancy guard tokens and --auto-ingest flag - imports setup_wizard and asserts byte-exact match against the canonical SKILL.md prescription Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#147) Closes research brief recommendation P1 #3. The installed SessionEnd hook in .claude/settings.json and the source-of-truth constant in setup_wizard.py both omitted the canonical guard prescribed by skills/bicameral-capture-corrections/SKILL.md:207. Two missing pieces, now restored byte-exact: 1. BICAMERAL_SESSION_END_RUNNING env-var guard. Without it, the spawned `claude -p` subprocess fires its OWN SessionEnd hook on exit, recursing indefinitely (bounded only by Claude Code's per-session subprocess depth limit, if any, or filesystem/process exhaustion). The guard env var is inherited by the subprocess; its nested SessionEnd hook short-circuits. 2. `--auto-ingest` flag. The capture-corrections skill in batch mode reads this flag to scan the full session transcript and ingest mechanical corrections directly without surfacing prompts. Without it, the subprocess would default to interactive-mode behavior, producing prompts no one will answer (parent session is closing). Files modified: - .claude/settings.json: SessionEnd hook command replaced with canonical - setup_wizard.py:343-347: _BICAMERAL_SESSION_END_COMMAND constant updated to canonical (drives fresh installs via _install_claude_hooks) Tests: - tests/test_session_end_hook_drift.py: 3 functionality tests - parses .claude/settings.json and asserts substring presence of re-entrancy guard tokens and --auto-ingest flag - imports setup_wizard and asserts byte-exact match against the canonical SKILL.md prescription Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#147) Closes research brief recommendation P1 #3. The installed SessionEnd hook in .claude/settings.json and the source-of-truth constant in setup_wizard.py both omitted the canonical guard prescribed by skills/bicameral-capture-corrections/SKILL.md:207. Two missing pieces, now restored byte-exact: 1. BICAMERAL_SESSION_END_RUNNING env-var guard. Without it, the spawned `claude -p` subprocess fires its OWN SessionEnd hook on exit, recursing indefinitely (bounded only by Claude Code's per-session subprocess depth limit, if any, or filesystem/process exhaustion). The guard env var is inherited by the subprocess; its nested SessionEnd hook short-circuits. 2. `--auto-ingest` flag. The capture-corrections skill in batch mode reads this flag to scan the full session transcript and ingest mechanical corrections directly without surfacing prompts. Without it, the subprocess would default to interactive-mode behavior, producing prompts no one will answer (parent session is closing). Files modified: - .claude/settings.json: SessionEnd hook command replaced with canonical - setup_wizard.py:343-347: _BICAMERAL_SESSION_END_COMMAND constant updated to canonical (drives fresh installs via _install_claude_hooks) Tests: - tests/test_session_end_hook_drift.py: 3 functionality tests - parses .claude/settings.json and asserts substring presence of re-entrancy guard tokens and --auto-ingest flag - imports setup_wizard and asserts byte-exact match against the canonical SKILL.md prescription Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#147) Closes research brief recommendation P1 #3. The installed SessionEnd hook in .claude/settings.json and the source-of-truth constant in setup_wizard.py both omitted the canonical guard prescribed by skills/bicameral-capture-corrections/SKILL.md:207. Two missing pieces, now restored byte-exact: 1. BICAMERAL_SESSION_END_RUNNING env-var guard. Without it, the spawned `claude -p` subprocess fires its OWN SessionEnd hook on exit, recursing indefinitely (bounded only by Claude Code's per-session subprocess depth limit, if any, or filesystem/process exhaustion). The guard env var is inherited by the subprocess; its nested SessionEnd hook short-circuits. 2. `--auto-ingest` flag. The capture-corrections skill in batch mode reads this flag to scan the full session transcript and ingest mechanical corrections directly without surfacing prompts. Without it, the subprocess would default to interactive-mode behavior, producing prompts no one will answer (parent session is closing). Files modified: - .claude/settings.json: SessionEnd hook command replaced with canonical - setup_wizard.py:343-347: _BICAMERAL_SESSION_END_COMMAND constant updated to canonical (drives fresh installs via _install_claude_hooks) Tests: - tests/test_session_end_hook_drift.py: 3 functionality tests - parses .claude/settings.json and asserts substring presence of re-entrancy guard tokens and --auto-ingest flag - imports setup_wizard and asserts byte-exact match against the canonical SKILL.md prescription Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Priority C v0 — Phase 1 of the team-server vertical-slice for multi-dev decision continuity at organizational scale (per the Sales Enablement & Positioning Playbook + research-brief-priority-c-selective-ingest-2026-05-02.md). The team-server is a self-managing, customer-self-hosted Python service. Per CONCEPT.md anti-goals under literal-keyword parsing (SHADOW_GENOME Failure Entry #6 addendum): "no managed backend" forbids vendor SaaS and human-ops-tax architectures, NOT self-managing customer-deployable backends. Sentry self-hosted, Supabase OSS, and the embedded-SurrealDB philosophy already in repo are precedents. Scaffold delivered: - team_server/app.py (47 LOC): FastAPI app factory; lifespan migrates schema on startup, closes DB on teardown; /health endpoint - team_server/schema.py (80 LOC): v0 schema for workspace, channel_allowlist, extraction_cache, team_event tables. Idempotent ensure_schema(); migration dispatch table for future versions. FLEXIBLE TYPE object on canonical_extraction + payload (per #72 lesson + audit Advisory #3). - team_server/db.py (41 LOC): TeamServerDB factory wrapping ledger.client.LedgerClient with team-server's own ns/db pair. - deploy/team-server.docker-compose.yml + Dockerfile.team-server: single-service compose; volume for persistent data; healthcheck on /health; runs as non-root. Tests (6 functionality tests, all green): - tests/test_team_server_app.py: app starts + serves health, schema migrates from empty + idempotent, lifespan teardown closes DB, /health returns well-formed JSON. - tests/test_team_server_deploy.py: docker-compose config validates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#147) Closes research brief recommendation P1 #3. The installed SessionEnd hook in .claude/settings.json and the source-of-truth constant in setup_wizard.py both omitted the canonical guard prescribed by skills/bicameral-capture-corrections/SKILL.md:207. Two missing pieces, now restored byte-exact: 1. BICAMERAL_SESSION_END_RUNNING env-var guard. Without it, the spawned `claude -p` subprocess fires its OWN SessionEnd hook on exit, recursing indefinitely (bounded only by Claude Code's per-session subprocess depth limit, if any, or filesystem/process exhaustion). The guard env var is inherited by the subprocess; its nested SessionEnd hook short-circuits. 2. `--auto-ingest` flag. The capture-corrections skill in batch mode reads this flag to scan the full session transcript and ingest mechanical corrections directly without surfacing prompts. Without it, the subprocess would default to interactive-mode behavior, producing prompts no one will answer (parent session is closing). Files modified: - .claude/settings.json: SessionEnd hook command replaced with canonical - setup_wizard.py:343-347: _BICAMERAL_SESSION_END_COMMAND constant updated to canonical (drives fresh installs via _install_claude_hooks) Tests: - tests/test_session_end_hook_drift.py: 3 functionality tests - parses .claude/settings.json and asserts substring presence of re-entrancy guard tokens and --auto-ingest flag - imports setup_wizard and asserts byte-exact match against the canonical SKILL.md prescription Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit d76b419)
… to research brief (#205) Addresses Codex first-pass review notes #1, #2, #3, #7, #8, #9 from the brief's review block. Tier C items + the subsequent Kilo / Gemini / Codex-2nd-pass review layers are tracked as follow-ups (will be surfaced in the PR thread for direction). Changes: - § 1.4 ingest pipeline: adds explicit "Risk amplification (durable-feedback-loop)" paragraph framing ingest as the durable write-surface that propagates poisoned content through preflight back into the agent's reasoning context. Strengthens LLM-01 + LLM-04 P0 defensibility (Codex #2). - § 1.8 skills surface: adds worked before/after example contrasting instruction-only `bicameral-report-bug` keys-only commitment vs the deterministic `_resolve_signer_email` gate that replaced it in #204. Makes the doctrine concrete for non-agent-systems readers (Codex #3). - § 1.9 team-server: rewrites the dangling "TEAM-NN gaps in § 4" promise to "intentionally not enumerated; activation PR authors TEAM-NN IDs against actual activated topology" (Codex #8). - § 2.6 EU AI Act: removes unilateral "limited risk" claim. Now describes bicameral-mcp as an AI-adjacent developer-tool component whose risk-tier classification properly attaches to the integrated system + deployment context, requiring counsel review for any specific tier claim (Codex #7). - § 5 gap synthesis: adds Deployment trigger column (`all` / `local-OK` / `team/hosted` / `pre-team` / `hosted`) so severity is defensible per deployment shape. SOC2-01 reclassified as pre-team/hosted P0 with local-only boundary statement; GDPR-05 reclassified as team/hosted P1 with local single-user P2; OWASP-03 reclassified as hosted P1 with local P2 (uv/pipx provides install-time lock); OWASP-02 trigger narrowed to team/hosted (Codex #1). - Appendix method notes: softens "every claim should be verifiable by re-reading the cited file at the cited line range" to acknowledge that most findings cite components rather than path:line, and defers a line-level evidence appendix as a follow-up improvement (Codex #9). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+Gemini+Codex-2 (#205) Authors a single Reviewer Disposition Pass table at the top of the brief reconciling all 32 review points across four review layers (Codex first-pass, Kilo, Gemini CLI, Codex second-pass) into one post-review consensus before downstream P1 issue-filing — per the explicit Codex-2 #1 directive. Decisions: 21 applied this commit, 6 already applied in 1d82658, 3 deferred to follow-up, 2 note-only. Net new gap IDs added per disposition: GDPR-08 (ephemeral data), GDPR-09 (consent versioning + revocation), LLM-11 (cross-tool config-file modification surface), MCP-01 (host UX as external dependency), CFG-01 (config precedence + fail-closed model). Reclassification: LLM-06 P0/M → P1/M with scope narrowed to future remote-skill-loading channel (per Kilo #2). Major content additions to the brief: - § 1.1: MCP host UX is external dependency, not security gate (new gap MCP-01) — host that auto-approves tool calls bypasses any "operator will see this" assumption. - § 1.2: SurrealDB version pinning supply-chain callout (Kilo #11). - § 1.7: cross-tool config-file modification surface (new gap LLM-11) distinct from skill-content surface — `setup_wizard` writes shell commands into `.claude/settings.json` that run host-side at hook fire. - § 1.11 (new): Configuration precedence + fail-closed model — single uniform precedence rule across all knobs (env > config.yaml > hardcoded defaults), fail-closed semantics on missing/malformed/ contradictory config (Codex-2 #5). - § 2.4 (a): LLM02 mapping note clarifying it folds into LLM-07 + OWASP-04 (Kilo #13). - § 2.4 (b): explicit `confirm=True` is agent-supplied not HITL (Kilo #3) — security context cannot rely on agent-filled params. - § 2.4 (c) LLM-01 + LLM-04: extensible classifier (Gemini #2) + guardrail-not-classifier framing (Codex-1 #6) + control-acceptance template (Codex-2 #4) — quarantine, override, test fixtures, measurement counters. - § 2.4 (c) LLM-03: timeouts as `.bicameral/config.yaml` knobs (Gemini #3). - § 2.4 (c) LLM-05 + LLM-09: out-of-band operator confirmation, not agent-supplied confirm parameters (Kilo #3). - § 2.4 (c) LLM-06: scope-narrowed to future remote-skill-loading; in current install model the wheel-trust covers it (Kilo #2). - § 2.4 (c) LLM-11 (new): cross-tool config-file gate (signed hooks-manifest.json) distinct from skill manifest. - § 2.1 (c) GDPR-01: three remediation candidates — tombstone-and- rebuild with signed manifest (Kilo #12), crypto-shredding (Gemini #1), or scope-out via PII detect-and-refuse. - § 2.1 (c) GDPR-02: data-subject-access search must cover full identifier surface (description, source_ref, topic, file paths) not just signer email (Codex-1 #5). - § 2.1 (c) GDPR-08 (new): ephemeral data surfaces (tempfiles, swap, WAL, crash dumps) (Kilo #7). - § 2.1 (c) GDPR-09 (new): consent versioning + revocation semantics (Kilo #8 + Codex-2 #3). - § 5: gap table updated with new rows + LLM-06 reclassification; gap counts post-disposition (5 P0 / 19 P1 / 16 P2 / 5 P3 = 45 total, up from 41). - § 6.1 (new): epic grouping for deferred P1 batch (Codex-1 #10) — ingest boundary guardrails, per-tool authority gradation, supply- chain signing, telemetry & consent. - § 6.2 (new): six-section control-acceptance template for every DG gap (Codex-2 #4) — positive / negative / bypass / fail-closed / telemetry / docs. Filed-issue updates: - Issue #214 (LLM-06): relabeled P0 → P1, retitled to reflect scope narrowing, full disposition comment added. - Issue #212 (LLM-01) + #213 (LLM-04): disposition comments added capturing the guardrail framing, classifier extensibility, and control-acceptance template applicable to both. Deferred for follow-up: Codex-1 #4 (controller/processor restructure of standards table), Codex-1 #9 (full evidence appendix beyond the methodology softening), Codex-2 #2 (full 3-column deployment-profile matrix beyond the single-column trigger). Brief now 706 lines (up from 606); +124 line diff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l intent - Section 1: replace 'no HTTP transport' gap framing with 'BackendAdapter pipeline not yet wired'; add R1 answer summary - Section 1: flag #196 acceptance criteria re-scoping need - Section 2: reframe failure mode around BackendAdapter pipeline extension, not team-server absence - Section 3: 'team-server tier' → 'BackendAdapter-mediated team-mode' - A4: reframe from 'Option 3 eventually' to 'future BackendAdapter subclasses (S3, Supabase) are the extension point, not server runtimes'; preserve architectural intent for future iterations - A6: note question posted to @jinhongkuan - Section 5: non-goal BicameralAI#5 updated to reflect R1 (no server process of any kind); add forbidden interpretation BicameralAI#3 (architectural evolution through BackendAdapter subclasses); add #196 re-scoping note - Section 6: concept brief updated to BackendAdapter-mediated language - Section 8: 'team-server' → 'remote backend' in evidence/escalation - Section 9: failure classes updated to BackendAdapter-based scenarios - Section 10: readiness acknowledges open non-blocking A6 assumption - Research brief: fix Windows target path → repo-relative Co-Authored-By: kevin <kevin@bicameral-ai.com>
Summary
Ingest payload mismatch fixed: Added Pydantic input contracts (
IngestPayload,IngestMapping,IngestDecision, etc.) tocontracts.py. The_normalize_payload()function validates incoming payloads and converts Claude's natural transcript format ({decisions, action_items, open_questions}) to the internal{mappings}format. Shape mismatches now fail fast withValidationErrorinstead of silently returning zero stats.Automatic code graph freshness: Added
ensure_code_graph_fresh()toadapters/code_locator.py— a reusable guard that ensures BM25 index + SymbolDB exist and match HEAD. Wired into_auto_ground_via_searchand_resolve_symbols_to_regionsso ingest auto-grounds without requiring a priorsearch_codecall.Post-ingest sync:
handle_ingestnow callslink_commit(HEAD)after writing to the ledger, triggering re-grounding of previously ungrounded intents.Status auto-sync:
decision_statushandler now auto-syncs to HEAD before returning (matchingsearchanddriftbehavior).Test plan
pytest tests/passes (pre-existing SurrealDB test slowness may cause timeouts — not related to this change)bicameral_ingestwith natural transcript payload → verifyintents_created > 0bicameral_ingestwith internalmappingsformat → verify passthrough worksbicameral_status→ verify decisions appear after ingestsearch_code) → ingest still auto-grounds🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements