Skip to content

fix: ingest pipeline — input contracts, payload normalization, freshness guards#3

Merged
jinhongkuan merged 1 commit into
mainfrom
fix/ingest-pipeline-contracts-freshness
Apr 10, 2026
Merged

fix: ingest pipeline — input contracts, payload normalization, freshness guards#3
jinhongkuan merged 1 commit into
mainfrom
fix/ingest-pipeline-contracts-freshness

Conversation

@jinhongkuan

@jinhongkuan jinhongkuan commented Apr 10, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Ingest payload mismatch fixed: Added Pydantic input contracts (IngestPayload, IngestMapping, IngestDecision, etc.) to contracts.py. The _normalize_payload() function validates incoming payloads and converts Claude's natural transcript format ({decisions, action_items, open_questions}) to the internal {mappings} format. Shape mismatches now fail fast with ValidationError instead of silently returning zero stats.

  • Automatic code graph freshness: Added ensure_code_graph_fresh() to adapters/code_locator.py — a reusable guard that ensures BM25 index + SymbolDB exist and match HEAD. Wired into _auto_ground_via_search and _resolve_symbols_to_regions so ingest auto-grounds without requiring a prior search_code call.

  • Post-ingest sync: handle_ingest now calls link_commit(HEAD) after writing to the ledger, triggering re-grounding of previously ungrounded intents.

  • Status auto-sync: decision_status handler now auto-syncs to HEAD before returning (matching search and drift behavior).

Test plan

  • pytest tests/ passes (pre-existing SurrealDB test slowness may cause timeouts — not related to this change)
  • Call bicameral_ingest with natural transcript payload → verify intents_created > 0
  • Call bicameral_ingest with internal mappings format → verify passthrough works
  • Call bicameral_status → verify decisions appear after ingest
  • Fresh MCP session (no prior search_code) → ingest still auto-grounds

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Ingest endpoint now accepts two alternative payload formats: internal mappings or natural decision/action format.
    • Auto-grounds decisions to code via semantic search.
    • Auto-links ingestions to repository commits.
  • Improvements

    • Status endpoint now auto-syncs the ledger to HEAD before returning status.

…ess guards

The bicameral_ingest tool silently returned all-zero stats when Claude
sent natural transcript payloads ({decisions, action_items, open_questions})
because the handler expected {mappings}. Additionally, auto-grounding
failed when the code graph index didn't exist yet.

- Add Pydantic input contracts (IngestPayload, IngestMapping, etc.) to
  contracts.py so payload shape mismatches fail fast with ValidationError
- Add _normalize_payload() that converts natural LLM format to internal
  mappings format via Pydantic validation
- Add ensure_code_graph_fresh() to code_locator adapter — reusable
  freshness guard that ensures BM25 index + SymbolDB match HEAD
- Wire freshness guard into _auto_ground_via_search and
  _resolve_symbols_to_regions so ingest auto-grounds without requiring
  a prior search_code call
- Add post-ingest link_commit(HEAD) to sync ledger and re-ground
  previously ungrounded intents
- Add auto-sync to decision_status handler (was read-only, now syncs
  to HEAD like search and drift already do)
- Update tool descriptions to document both accepted payload formats

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 10, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This change introduces support for two ingest payload formats (internal mappings and natural decision-based formats), adds code graph freshness validation, and enables automatic ledger synchronization before status checks and post-ingest operations. Five files are modified to implement payload normalization, new input contracts, and updated handler control flow.

Changes

Cohort / File(s) Summary
Code Graph Readiness
adapters/code_locator.py
Added ensure_code_graph_fresh() helper function that selects repository path from argument or environment, invokes runtime setup, and validates/builds code index against current HEAD.
Ingest Input Contracts
contracts.py
Introduced six new Pydantic models: IngestSpan, IngestCodeRegion, IngestMapping, IngestDecision, IngestActionItem, and IngestPayload to define schema for two mutually alternative ingest formats (internal mappings or natural LLM decisions).
Handler Auto-Sync Integration
handlers/decision_status.py, handlers/ingest.py
Added automatic ledger synchronization logic: decision status handler calls handle_link_commit() before fetching decisions; ingest handler ensures fresh code graph before symbol resolution and performs post-ingest commit linking. Both wrap auto-sync calls in exception handling with logged warnings.
Ingest Payload Normalization
handlers/ingest.py
Implemented _normalize_payload() function that validates incoming payloads via IngestPayload.model_validate, converts decision/action-item format to internal mappings format with generated intents and span metadata, or passes through validated mappings unchanged.
Tool Descriptions
server.py
Updated MCP tool metadata: bicameral.status now documents auto-sync to HEAD behavior; bicameral.ingest describes two accepted payload formats and code graph freshness assurance.

Sequence Diagrams

sequenceDiagram
    participant Client
    participant IngestHandler as Ingest Handler
    participant PayloadNorm as Payload Normalizer
    participant CodeGraph as Code Graph
    participant SymbolDB as Symbol DB
    participant LinkCommit as Link Commit Handler
    
    Client->>IngestHandler: POST /ingest (payload)
    IngestHandler->>PayloadNorm: _normalize_payload(payload)
    Note over PayloadNorm: Validate & convert decisions→mappings
    PayloadNorm->>IngestHandler: normalized mappings
    IngestHandler->>CodeGraph: ensure_code_graph_fresh(repo)
    Note over CodeGraph: Rebuild index if needed
    CodeGraph->>IngestHandler: ✓ ready
    IngestHandler->>SymbolDB: resolve symbols
    Note over SymbolDB: Auto-ground via BM25
    SymbolDB->>IngestHandler: resolved regions
    IngestHandler->>LinkCommit: handle_link_commit("HEAD")
    Note over LinkCommit: Sync ledger to HEAD
    LinkCommit->>IngestHandler: ✓ linked
    IngestHandler->>Client: IngestResponse
Loading
sequenceDiagram
    participant Client
    participant StatusHandler as Status Handler
    participant LinkCommit as Link Commit Handler
    participant Ledger as Ledger
    
    Client->>StatusHandler: GET /status
    StatusHandler->>LinkCommit: handle_link_commit(ref)
    Note over LinkCommit: Auto-sync before read
    LinkCommit->>Ledger: sync to HEAD
    Ledger->>LinkCommit: ✓ synced
    LinkCommit->>StatusHandler: ✓ complete
    StatusHandler->>Ledger: fetch decisions
    Ledger->>StatusHandler: decisions
    StatusHandler->>Client: DecisionStatusResponse
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Fresh graphs and mappings galore,
Two formats where once was just one door,
Auto-sync keeps us in sync with the HEAD,
Decisions to code—a path clear and well-bred! 🌿✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main changes: adding input contracts, implementing payload normalization, and adding freshness guards to the ingest pipeline.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/ingest-pipeline-contracts-freshness

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@adapters/code_locator.py`:
- Around line 24-34: The current synchronous ensure_code_graph_fresh calls
ensure_index_matches_repo inline which can trigger long rebuilds; change it so
the expensive work runs off the request thread by either making
ensure_code_graph_fresh async and awaiting asyncio.to_thread(lambda:
ensure_index_matches_repo(repo, config)) or keeping a sync entry that schedules
asyncio.get_running_loop().create_task(asyncio.to_thread(...)) and returns
immediately; update callers of ensure_code_graph_fresh to await the new async
variant where necessary (or to not block on the scheduled task) and keep
references to ensure_code_graph_fresh and ensure_index_matches_repo to locate
and modify the call site.

In `@contracts.py`:
- Around line 167-188: Add a Pydantic model-level validator to IngestPayload
that enforces the contract: either mappings is non-empty XOR at least one of
decisions / action_items / open_questions is non-empty; if neither or both are
provided raise a ValueError so malformed payloads fail fast. Implement this as a
`@root_validator` in the IngestPayload class that examines values["mappings"],
values["decisions"], values["action_items"], and values["open_questions"], and
raises with a clear message; this prevents _normalize_payload()'s silent
zero-mapping fallback path from accepting empty/mixed payloads.

In `@handlers/decision_status.py`:
- Around line 26-31: The handler currently swallows exceptions from
handle_link_commit(ref) and proceeds to build a DecisionStatusResponse which may
report a stale view; instead either re-raise the exception so the request fails
like the search/drift handlers or set an explicit degraded flag (e.g.,
sync_failed) on the DecisionStatusResponse and skip returning a normal status;
update the try/except around handle_link_commit in decision_status.py to either
remove the exception catch (allowing exceptions to propagate) or replace the
logger.warning with logic that records the sync failure into the
DecisionStatusResponse object so callers can see the degraded state.

In `@handlers/ingest.py`:
- Around line 149-152: The code refreshes the graph for variable repo via
ensure_code_graph_fresh(repo) but then calls get_code_locator() which defaults
to the REPO_PATH env var, causing mismatch when payload["repo"] differs; update
the call site to pass the same repo into get_code_locator (e.g.,
get_code_locator(repo)) and ensure_code_graph_fresh(repo) is invoked with the
same repo variable, and also update adapters/code_locator.get_code_locator to
accept an optional repo_path parameter (as suggested) so RealCodeLocatorAdapter
is instantiated with repo_path=repo_path; in short: propagate the repo variable
into get_code_locator to guarantee refresh and searches target the same
repository.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dfa78f11-1745-4c04-addd-05c5cad2b9f9

📥 Commits

Reviewing files that changed from the base of the PR and between 06120fc and 558714e.

📒 Files selected for processing (5)
  • adapters/code_locator.py
  • contracts.py
  • handlers/decision_status.py
  • handlers/ingest.py
  • server.py

Comment thread adapters/code_locator.py
Comment on lines +24 to +34
def ensure_code_graph_fresh(repo_path: str | None = None) -> None:
"""Ensure the code graph index exists and matches HEAD.

Safe to call multiple times — only rebuilds if stale.
Called automatically by tools that depend on the code graph.
"""
repo = repo_path or os.getenv("REPO_PATH", ".")
ensure_runtime_env()
from code_locator.config import load_config
config = load_config()
ensure_index_matches_repo(repo, config)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid running freshness rebuilds inline on the request path.

ensure_index_matches_repo() can cold-start or fully rebuild the sqlite/BM25 artifacts. Exposing that work through a synchronous helper is risky now that ingest/status flows can call it during normal tool execution: one stale repo can block the whole MCP server until indexing finishes. Please push this behind an async/off-thread wrapper, or require callers to invoke it via asyncio.to_thread(...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@adapters/code_locator.py` around lines 24 - 34, The current synchronous
ensure_code_graph_fresh calls ensure_index_matches_repo inline which can trigger
long rebuilds; change it so the expensive work runs off the request thread by
either making ensure_code_graph_fresh async and awaiting
asyncio.to_thread(lambda: ensure_index_matches_repo(repo, config)) or keeping a
sync entry that schedules
asyncio.get_running_loop().create_task(asyncio.to_thread(...)) and returns
immediately; update callers of ensure_code_graph_fresh to await the new async
variant where necessary (or to not block on the scheduled task) and keep
references to ensure_code_graph_fresh and ensure_index_matches_repo to locate
and modify the call site.

Comment thread contracts.py
Comment on lines +167 to +188
class IngestPayload(BaseModel):
"""Ingest input — accepts EITHER mappings (internal) or decisions (natural LLM).

If ``mappings`` is present, it's used directly (internal pipeline format).
If ``decisions`` is present, they are normalized into mappings automatically.
"""
# Common fields
repo: str = ""
commit_hash: str = ""
query: str = ""

# Internal pipeline format
mappings: list[IngestMapping] = []

# Natural LLM-generated format (normalized into mappings if present)
source: str = "manual"
title: str = ""
date: str = ""
participants: list[str] = []
decisions: list[IngestDecision] = []
action_items: list[IngestActionItem] = []
open_questions: list[str] = []

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Locate contracts.py and handlers/ingest.py
git ls-files | grep -E "(contracts\.py|ingest\.py)"

Repository: BicameralAI/bicameral-mcp

Length of output: 100


🏁 Script executed:

# Check contracts.py to see if IngestPayload exists at lines 167-188
head -200 contracts.py | tail -50

Repository: BicameralAI/bicameral-mcp

Length of output: 1421


🏁 Script executed:

# Get context around the IngestPayload class
rg "class IngestPayload" -A 25 contracts.py

Repository: BicameralAI/bicameral-mcp

Length of output: 881


🏁 Script executed:

# Check handlers/ingest.py for the model_validate and normalization logic
cat -n handlers/ingest.py | head -110

Repository: BicameralAI/bicameral-mcp

Length of output: 4438


🏁 Script executed:

# Check imports in contracts.py to see current Pydantic usage
head -30 contracts.py

Repository: BicameralAI/bicameral-mcp

Length of output: 911


🏁 Script executed:

# Search for any existing validators on IngestPayload
rg "@.*validator|from pydantic" contracts.py -B 2 -A 10

Repository: BicameralAI/bicameral-mcp

Length of output: 396


🏁 Script executed:

# Check pyproject.toml or requirements for Pydantic version
fd -e "pyproject.toml" -o -e "requirements.txt" -o -e "requirements*.txt" | head -5 | xargs cat 2>/dev/null | grep -i pydantic

Repository: BicameralAI/bicameral-mcp

Length of output: 185


🏁 Script executed:

# Check Pydantic version in config files
find . -maxdepth 2 -type f \( -name "pyproject.toml" -o -name "requirements*.txt" -o -name "setup.py" \) -exec cat {} \; | grep -i pydantic

Repository: BicameralAI/bicameral-mcp

Length of output: 105


🏁 Script executed:

# Verify that contracts.py currently has NO validators on IngestPayload
ast-grep --pattern $'class IngestPayload {
  $$$
}'

Repository: BicameralAI/bicameral-mcp

Length of output: 51


🏁 Script executed:

# Double-check the exact behavior: empty payload validation
python3 << 'EOF'
# Simulate what happens with empty dict and IngestPayload
# to confirm defaults allow empty input
from typing import List
from pydantic import BaseModel

class MockIngestDecision(BaseModel):
    id: str = ""
    title: str = ""

class MockIngestActionItem(BaseModel):
    owner: str = "unassigned"
    action: str = ""

class MockIngestPayload(BaseModel):
    repo: str = ""
    commit_hash: str = ""
    query: str = ""
    mappings: List[dict] = []
    source: str = "manual"
    title: str = ""
    date: str = ""
    participants: List[str] = []
    decisions: List[MockIngestDecision] = []
    action_items: List[MockIngestActionItem] = []
    open_questions: List[str] = []

# Test 1: empty dict
try:
    p1 = MockIngestPayload.model_validate({})
    print("✓ Empty dict validates: OK")
    print(f"  mappings={p1.mappings}, decisions={p1.decisions}, action_items={p1.action_items}, open_questions={p1.open_questions}")
except Exception as e:
    print(f"✗ Empty dict rejected: {e}")

# Test 2: metadata only
try:
    p2 = MockIngestPayload.model_validate({"repo": "test", "commit_hash": "abc"})
    print("✓ Metadata-only payload validates: OK")
except Exception as e:
    print(f"✗ Metadata-only rejected: {e}")

# Test 3: mixed (mappings + decisions)
try:
    p3 = MockIngestPayload.model_validate({"mappings": [{"x": 1}], "decisions": [{"id": "d1"}]})
    print("✓ Mixed payload (mappings + decisions) validates: OK")
except Exception as e:
    print(f"✗ Mixed payload rejected: {e}")
EOF

Repository: BicameralAI/bicameral-mcp

Length of output: 250


🏁 Script executed:

# Search for any other validation logic that might catch these cases
rg "_normalize_payload|IngestPayload" handlers/ingest.py -B 3 -A 15 | head -80

Repository: BicameralAI/bicameral-mcp

Length of output: 2560


Enforce the ingest shape instead of defaulting to empty payloads.

IngestPayload currently accepts {}, metadata-only payloads, and mixed mappings + natural fields because every collection defaults to []. _normalize_payload() then falls back to the silent zero-mapping path (lines 92–97), so malformed input doesn't fail fast. This violates the contract stated in the docstring ("accepts EITHER mappings or decisions"). Add a model-level validator to enforce: non-empty mappings xor at least one of decisions / action_items / open_questions.

Proposed fix
-from pydantic import BaseModel
+from pydantic import BaseModel, model_validator
...
 class IngestPayload(BaseModel):
     """Ingest input — accepts EITHER mappings (internal) or decisions (natural LLM).
@@
     decisions: list[IngestDecision] = []
     action_items: list[IngestActionItem] = []
     open_questions: list[str] = []
+
+    `@model_validator`(mode="after")
+    def validate_shape(self) -> "IngestPayload":
+        has_mappings = bool(self.mappings)
+        has_natural = bool(self.decisions or self.action_items or self.open_questions)
+        if has_mappings == has_natural:
+            raise ValueError(
+                "Provide exactly one ingest format: non-empty `mappings` or at least one "
+                "of `decisions`, `action_items`, `open_questions`."
+            )
+        return self
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@contracts.py` around lines 167 - 188, Add a Pydantic model-level validator to
IngestPayload that enforces the contract: either mappings is non-empty XOR at
least one of decisions / action_items / open_questions is non-empty; if neither
or both are provided raise a ValueError so malformed payloads fail fast.
Implement this as a `@root_validator` in the IngestPayload class that examines
values["mappings"], values["decisions"], values["action_items"], and
values["open_questions"], and raises with a clear message; this prevents
_normalize_payload()'s silent zero-mapping fallback path from accepting
empty/mixed payloads.

Comment on lines +26 to +31
# Auto-sync to HEAD so status reflects current code state
try:
from handlers.link_commit import handle_link_commit
await handle_link_commit(ref)
except Exception as exc:
logger.warning("[status] auto-sync failed: %s", exc)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't return status after a failed auto-sync.

handle_link_commit() can fail before the ledger reflects ref. Catching that and continuing here means DecisionStatusResponse may contain stale statuses while still claiming the requested ref/as-of view. I'd either let the exception propagate, like the search/drift handlers do, or surface an explicit degraded sync_failed state in the response.

🧰 Tools
🪛 Ruff (0.15.9)

[warning] 30-30: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@handlers/decision_status.py` around lines 26 - 31, The handler currently
swallows exceptions from handle_link_commit(ref) and proceeds to build a
DecisionStatusResponse which may report a stale view; instead either re-raise
the exception so the request fails like the search/drift handlers or set an
explicit degraded flag (e.g., sync_failed) on the DecisionStatusResponse and
skip returning a normal status; update the try/except around handle_link_commit
in decision_status.py to either remove the exception catch (allowing exceptions
to propagate) or replace the logger.warning with logic that records the sync
failure into the DecisionStatusResponse object so callers can see the degraded
state.

Comment thread handlers/ingest.py
Comment on lines +149 to 152
from adapters.code_locator import get_code_locator, ensure_code_graph_fresh
from code_locator.indexing.sqlite_store import SymbolDB
ensure_code_graph_fresh(repo)
locator = get_code_locator()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use the same repo for refresh and search.

This refreshes the code graph for repo, then creates the locator via get_code_locator(), which still reads REPO_PATH. If payload["repo"] differs from the process env, ingest can rebuild one repo and run BM25/symbol validation against another, which will yield wrong or empty grounding.

🔧 Proposed fix
-        from adapters.code_locator import get_code_locator, ensure_code_graph_fresh
+        from adapters.code_locator import get_code_locator, ensure_code_graph_fresh
         from code_locator.indexing.sqlite_store import SymbolDB
         ensure_code_graph_fresh(repo)
-        locator = get_code_locator()
+        locator = get_code_locator(repo_path=repo)

Companion change outside this hunk:

# adapters/code_locator.py
def get_code_locator(repo_path: str | None = None):
    repo = repo_path or os.getenv("REPO_PATH", ".")
    return RealCodeLocatorAdapter(repo_path=repo)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@handlers/ingest.py` around lines 149 - 152, The code refreshes the graph for
variable repo via ensure_code_graph_fresh(repo) but then calls
get_code_locator() which defaults to the REPO_PATH env var, causing mismatch
when payload["repo"] differs; update the call site to pass the same repo into
get_code_locator (e.g., get_code_locator(repo)) and
ensure_code_graph_fresh(repo) is invoked with the same repo variable, and also
update adapters/code_locator.get_code_locator to accept an optional repo_path
parameter (as suggested) so RealCodeLocatorAdapter is instantiated with
repo_path=repo_path; in short: propagate the repo variable into get_code_locator
to guarantee refresh and searches target the same repository.

@jinhongkuan jinhongkuan merged commit 8114503 into main Apr 10, 2026
1 check passed
silongtan added a commit that referenced this pull request Apr 11, 2026
…dening

Refactor ground_mappings into a 3-tier coverage loop that relaxes both
BM25 score threshold and fuzzy match threshold before giving up:
- Tier 0 (strict): 0.5/80 — identical to previous behavior
- Tier 1 (relaxed): 0.3/70
- Tier 2 (broad): 0.1/60

Key changes:
- Extract _ground_single() for per-mapping grounding at a given tier
- BM25 search called once per mapping and reused across tiers
- _validate_with_threshold() overrides fuzzy threshold per-call
- Both Stage 1 and Stage 2 use the tier's fuzzy threshold consistently
- Mark vocab_cache table as deprecated in schema.py

Closes gap #3 from the code locator drift audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 11, 2026
…dening

Refactor ground_mappings into a 3-tier coverage loop that relaxes both
BM25 score threshold and fuzzy match threshold before giving up:
- Tier 0 (strict): 0.5/80 — identical to previous behavior
- Tier 1 (relaxed): 0.3/70
- Tier 2 (broad): 0.1/60

Key changes:
- Extract _ground_single() for per-mapping grounding at a given tier
- BM25 search called once per mapping and reused across tiers
- _validate_with_threshold() overrides fuzzy threshold per-call
- Both Stage 1 and Stage 2 use the tier's fuzzy threshold consistently
- Mark vocab_cache table as deprecated in schema.py

Closes gap #3 from the code locator drift audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 14, 2026
…dening

Refactor ground_mappings into a 3-tier coverage loop that relaxes both
BM25 score threshold and fuzzy match threshold before giving up:
- Tier 0 (strict): 0.5/80 — identical to previous behavior
- Tier 1 (relaxed): 0.3/70
- Tier 2 (broad): 0.1/60

Key changes:
- Extract _ground_single() for per-mapping grounding at a given tier
- BM25 search called once per mapping and reused across tiers
- _validate_with_threshold() overrides fuzzy threshold per-call
- Both Stage 1 and Stage 2 use the tier's fuzzy threshold consistently
- Mark vocab_cache table as deprecated in schema.py

Closes gap #3 from the code locator drift audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 25, 2026
…sory

B1 — ledger/ast_diff.py
  is_cosmetic_change(before, after, lang) returns True only when two
  snippets differ by inter-token whitespace alone. Strategy: parse both
  with tree-sitter and compare a recursive (node.type, child_sigs |
  leaf_bytes) signature. Two trees with identical signatures differ
  only by whitespace tree-sitter doesn't represent as nodes; any
  structural or leaf-text difference returns False.

  Strict whitelist by design — variable renames, trailing commas,
  comment edits (incl. # type: ignore / // @ts-ignore / build tags),
  docstring edits, and string-literal changes all return False. False
  negatives (real cosmetic changes routed unbiased to L3) are cheap;
  false positives bias the future V2 caller-LLM verdict prompt toward
  "looks fine" — which is the exact failure mode Codex pass-7 finding
  #3 flagged.

  Reuses code_locator.indexing.symbol_extractor._get_parser and
  LANGUAGE_FALLBACK so the cosmetic detector and the symbol indexer
  can never silently disagree on supported languages. Supports
  python, javascript, typescript, java, go, rust, c_sharp (plus
  jsx → javascript and tsx → typescript via fallback). Unsupported
  langs, parse failures, and has_error trees fail safe to False.

B2 — DriftEntry.cosmetic_hint
  Adds cosmetic_hint: bool = False on contracts.DriftEntry (additive,
  non-breaking). handlers/detect_drift._enrich_with_cosmetic_hints
  runs after the pure raw_decisions_to_drift_entries mapping and only
  touches drifted entries.

  Source of "before" / "after": HEAD bytes vs working-tree bytes via
  ledger.status.get_git_content, sliced to the region's
  (start_line, end_line). Language resolved from file extension via
  EXTENSION_LANGUAGE.

  Five fail-safe paths leave cosmetic_hint=False: non-drifted status,
  equal HEAD/working-tree bytes (no diff to classify), unsupported
  extension, invalid line range, exception during is_cosmetic_change.

  Codex invariants enforced by tests:
    * The hint NEVER mutates content_hash or any other DriftEntry
      field besides cosmetic_hint itself (test_content_hash_never_mutated).
    * The hint never gates drift surfacing or status — it is metadata
      consumed by the eventual V2 caller-LLM verdict prompt.

Tests: 61 passed in 3.81s — 21 ast_diff, 8 b2_cosmetic_hint, 19
sync_middleware, 6 bind, 7 phase3 integration. Zero regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 25, 2026
All four findings verified against current code; only the actionable
ones applied. 81 passed + 1 xfailed in 9.02s.

#1 — skills/bicameral-preflight/SKILL.md sync_metrics note
  The .claude/skills copy got the sync_metrics observability note
  back when V1 A3 shipped, but the canonical skills/ copy never
  did. Mirror the wording verbatim near step 2 so the rendering
  guidance and response-field documentation stay in sync.

#2 — handlers/detect_drift.py per-entry alignment
  The cosmetic-hint enrichment was slicing both head_full and
  wt_full using entry.lines (the baseline anchor). HEAD and the
  working tree can shift the symbol independently, so a single
  index range can't align both sides. The narrow consequence: a
  drifted entry with shifted lines could yield a misleading
  cosmetic_hint=true on bytes that aren't the bound region.

  Fix: re-resolve the symbol against each ref via
  resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD")
  and ref="working_tree" separately, slice each ref using its own
  resolved range. Resolution failure on either side → safe default
  of cosmetic_hint=False (matches the V1 contract: "False is cheap,
  True must be earned"). Empty symbol → skip (new fail-safe path).

  Test refactor: test_invalid_lines_skipped renamed to
  test_unresolvable_symbol_skipped — the old test asserted that
  lines=(0,0) was the failsafe trigger, but entry.lines is no
  longer the alignment input. New test exercises the
  resolve_symbol_lines-returns-None path via a nonexistent symbol
  name, which is the real fail-safe gate now.

#3 — V2 guide TOC anchor for §9
  GitHub auto-generates fragment IDs from heading text by
  lowercasing, replacing spaces with hyphens, and dropping
  punctuation. "## 9. Acceptance criteria for V2" maps to
  #9-acceptance-criteria-for-v2, but the TOC pointed at
  #9-acceptance-criteria (truncated). Link broken.
  Updated to the correct fragment.

#4 — V2 guide unlabeled fenced code blocks (markdownlint MD040)
  Six fenced opens used bare ``` instead of a labeled fence.
  Tagged each with ```text — the contents are commit listings,
  ASCII DAG diagrams, pseudocode protocols, and tuple notation,
  none of which fit a real language tag. The other fenced blocks
  in the guide (already tagged ```sql / ```python) are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
…sory

B1 — ledger/ast_diff.py
  is_cosmetic_change(before, after, lang) returns True only when two
  snippets differ by inter-token whitespace alone. Strategy: parse both
  with tree-sitter and compare a recursive (node.type, child_sigs |
  leaf_bytes) signature. Two trees with identical signatures differ
  only by whitespace tree-sitter doesn't represent as nodes; any
  structural or leaf-text difference returns False.

  Strict whitelist by design — variable renames, trailing commas,
  comment edits (incl. # type: ignore / // @ts-ignore / build tags),
  docstring edits, and string-literal changes all return False. False
  negatives (real cosmetic changes routed unbiased to L3) are cheap;
  false positives bias the future V2 caller-LLM verdict prompt toward
  "looks fine" — which is the exact failure mode Codex pass-7 finding
  #3 flagged.

  Reuses code_locator.indexing.symbol_extractor._get_parser and
  LANGUAGE_FALLBACK so the cosmetic detector and the symbol indexer
  can never silently disagree on supported languages. Supports
  python, javascript, typescript, java, go, rust, c_sharp (plus
  jsx → javascript and tsx → typescript via fallback). Unsupported
  langs, parse failures, and has_error trees fail safe to False.

B2 — DriftEntry.cosmetic_hint
  Adds cosmetic_hint: bool = False on contracts.DriftEntry (additive,
  non-breaking). handlers/detect_drift._enrich_with_cosmetic_hints
  runs after the pure raw_decisions_to_drift_entries mapping and only
  touches drifted entries.

  Source of "before" / "after": HEAD bytes vs working-tree bytes via
  ledger.status.get_git_content, sliced to the region's
  (start_line, end_line). Language resolved from file extension via
  EXTENSION_LANGUAGE.

  Five fail-safe paths leave cosmetic_hint=False: non-drifted status,
  equal HEAD/working-tree bytes (no diff to classify), unsupported
  extension, invalid line range, exception during is_cosmetic_change.

  Codex invariants enforced by tests:
    * The hint NEVER mutates content_hash or any other DriftEntry
      field besides cosmetic_hint itself (test_content_hash_never_mutated).
    * The hint never gates drift surfacing or status — it is metadata
      consumed by the eventual V2 caller-LLM verdict prompt.

Tests: 61 passed in 3.81s — 21 ast_diff, 8 b2_cosmetic_hint, 19
sync_middleware, 6 bind, 7 phase3 integration. Zero regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
All four findings verified against current code; only the actionable
ones applied. 81 passed + 1 xfailed in 9.02s.

#1 — skills/bicameral-preflight/SKILL.md sync_metrics note
  The .claude/skills copy got the sync_metrics observability note
  back when V1 A3 shipped, but the canonical skills/ copy never
  did. Mirror the wording verbatim near step 2 so the rendering
  guidance and response-field documentation stay in sync.

#2 — handlers/detect_drift.py per-entry alignment
  The cosmetic-hint enrichment was slicing both head_full and
  wt_full using entry.lines (the baseline anchor). HEAD and the
  working tree can shift the symbol independently, so a single
  index range can't align both sides. The narrow consequence: a
  drifted entry with shifted lines could yield a misleading
  cosmetic_hint=true on bytes that aren't the bound region.

  Fix: re-resolve the symbol against each ref via
  resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD")
  and ref="working_tree" separately, slice each ref using its own
  resolved range. Resolution failure on either side → safe default
  of cosmetic_hint=False (matches the V1 contract: "False is cheap,
  True must be earned"). Empty symbol → skip (new fail-safe path).

  Test refactor: test_invalid_lines_skipped renamed to
  test_unresolvable_symbol_skipped — the old test asserted that
  lines=(0,0) was the failsafe trigger, but entry.lines is no
  longer the alignment input. New test exercises the
  resolve_symbol_lines-returns-None path via a nonexistent symbol
  name, which is the real fail-safe gate now.

#3 — V2 guide TOC anchor for §9
  GitHub auto-generates fragment IDs from heading text by
  lowercasing, replacing spaces with hyphens, and dropping
  punctuation. "## 9. Acceptance criteria for V2" maps to
  #9-acceptance-criteria-for-v2, but the TOC pointed at
  #9-acceptance-criteria (truncated). Link broken.
  Updated to the correct fragment.

#4 — V2 guide unlabeled fenced code blocks (markdownlint MD040)
  Six fenced opens used bare ``` instead of a labeled fence.
  Tagged each with ```text — the contents are commit listings,
  ASCII DAG diagrams, pseudocode protocols, and tuple notation,
  none of which fit a real language tag. The other fenced blocks
  in the guide (already tagged ```sql / ```python) are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
…sory

B1 — ledger/ast_diff.py
  is_cosmetic_change(before, after, lang) returns True only when two
  snippets differ by inter-token whitespace alone. Strategy: parse both
  with tree-sitter and compare a recursive (node.type, child_sigs |
  leaf_bytes) signature. Two trees with identical signatures differ
  only by whitespace tree-sitter doesn't represent as nodes; any
  structural or leaf-text difference returns False.

  Strict whitelist by design — variable renames, trailing commas,
  comment edits (incl. # type: ignore / // @ts-ignore / build tags),
  docstring edits, and string-literal changes all return False. False
  negatives (real cosmetic changes routed unbiased to L3) are cheap;
  false positives bias the future V2 caller-LLM verdict prompt toward
  "looks fine" — which is the exact failure mode Codex pass-7 finding
  #3 flagged.

  Reuses code_locator.indexing.symbol_extractor._get_parser and
  LANGUAGE_FALLBACK so the cosmetic detector and the symbol indexer
  can never silently disagree on supported languages. Supports
  python, javascript, typescript, java, go, rust, c_sharp (plus
  jsx → javascript and tsx → typescript via fallback). Unsupported
  langs, parse failures, and has_error trees fail safe to False.

B2 — DriftEntry.cosmetic_hint
  Adds cosmetic_hint: bool = False on contracts.DriftEntry (additive,
  non-breaking). handlers/detect_drift._enrich_with_cosmetic_hints
  runs after the pure raw_decisions_to_drift_entries mapping and only
  touches drifted entries.

  Source of "before" / "after": HEAD bytes vs working-tree bytes via
  ledger.status.get_git_content, sliced to the region's
  (start_line, end_line). Language resolved from file extension via
  EXTENSION_LANGUAGE.

  Five fail-safe paths leave cosmetic_hint=False: non-drifted status,
  equal HEAD/working-tree bytes (no diff to classify), unsupported
  extension, invalid line range, exception during is_cosmetic_change.

  Codex invariants enforced by tests:
    * The hint NEVER mutates content_hash or any other DriftEntry
      field besides cosmetic_hint itself (test_content_hash_never_mutated).
    * The hint never gates drift surfacing or status — it is metadata
      consumed by the eventual V2 caller-LLM verdict prompt.

Tests: 61 passed in 3.81s — 21 ast_diff, 8 b2_cosmetic_hint, 19
sync_middleware, 6 bind, 7 phase3 integration. Zero regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
All four findings verified against current code; only the actionable
ones applied. 81 passed + 1 xfailed in 9.02s.

#1 — skills/bicameral-preflight/SKILL.md sync_metrics note
  The .claude/skills copy got the sync_metrics observability note
  back when V1 A3 shipped, but the canonical skills/ copy never
  did. Mirror the wording verbatim near step 2 so the rendering
  guidance and response-field documentation stay in sync.

#2 — handlers/detect_drift.py per-entry alignment
  The cosmetic-hint enrichment was slicing both head_full and
  wt_full using entry.lines (the baseline anchor). HEAD and the
  working tree can shift the symbol independently, so a single
  index range can't align both sides. The narrow consequence: a
  drifted entry with shifted lines could yield a misleading
  cosmetic_hint=true on bytes that aren't the bound region.

  Fix: re-resolve the symbol against each ref via
  resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD")
  and ref="working_tree" separately, slice each ref using its own
  resolved range. Resolution failure on either side → safe default
  of cosmetic_hint=False (matches the V1 contract: "False is cheap,
  True must be earned"). Empty symbol → skip (new fail-safe path).

  Test refactor: test_invalid_lines_skipped renamed to
  test_unresolvable_symbol_skipped — the old test asserted that
  lines=(0,0) was the failsafe trigger, but entry.lines is no
  longer the alignment input. New test exercises the
  resolve_symbol_lines-returns-None path via a nonexistent symbol
  name, which is the real fail-safe gate now.

#3 — V2 guide TOC anchor for §9
  GitHub auto-generates fragment IDs from heading text by
  lowercasing, replacing spaces with hyphens, and dropping
  punctuation. "## 9. Acceptance criteria for V2" maps to
  #9-acceptance-criteria-for-v2, but the TOC pointed at
  #9-acceptance-criteria (truncated). Link broken.
  Updated to the correct fragment.

#4 — V2 guide unlabeled fenced code blocks (markdownlint MD040)
  Six fenced opens used bare ``` instead of a labeled fence.
  Tagged each with ```text — the contents are commit listings,
  ASCII DAG diagrams, pseudocode protocols, and tuple notation,
  none of which fit a real language tag. The other fenced blocks
  in the guide (already tagged ```sql / ```python) are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin added a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request Apr 28, 2026
…17 items)

Comprehensive review-feedback round on PR BicameralAI#73 (CodeGenome Phase 3 /
Issue BicameralAI#60). Eight MAJOR + seven minor CodeRabbit findings + two Devin
CRITICAL bugs all addressed in this commit.

## Devin CRITICAL fixes

### `adapters/code_locator.py:134` — silent zero-neighbors bug

`_resolve_symbol_id_for_span` referenced `self._validate_tool.config.sqlite_db_path`,
but the dataclass attribute is `sqlite_db` (no `_path`). The
`AttributeError` was silently swallowed by `neighbors_for`'s broad
`except`, so the method ALWAYS returned `()` and the continuity
Jaccard signal was permanently zero in production. Combined with
CodeRabbit's "reuse SymbolDB handle" finding, the fix is to use
`self._db` (already initialized in `_ensure_initialized`) instead of
opening a new SQLite handle per call.

### `contracts.py:212` — missing skill update

PR adds `continuity_resolutions: list[ContinuityResolution]` to
`LinkCommitResponse` but no skill file was updated. Per CLAUDE.md's
"Tool Changes Require Skill Changes" rule, this commit updates
`skills/bicameral-sync/SKILL.md` with a callout explaining the new
field and the auto-resolution behaviour (auto-resolved regions
stripped from `pending_compliance_checks`; `needs_review` resolutions
treated as advisory pendings).

## CodeRabbit MAJOR fixes (8)

### `adapters/code_locator.py:136` — combined with Devin fix above.

### `codegenome/bind_service.py:80` — partial-bind rollback

The 4-step write sequence (upsert subject → upsert identity →
has_identity edge → about edge) was fire-and-forget; a failure on
edges 3 or 4 left orphaned `code_subject` and `subject_identity`
rows. Added `_rollback_partial_bind` that deletes the freshly-written
rows on edge-write failure. Equivalent to BEGIN/COMMIT semantics for
the failure modes that matter, without rewriting the four upsert
queries as one giant SurrealQL transaction.

### `handlers/link_commit.py:255` — DriftContext seeding

`DriftContext` was hardcoded with `old_symbol_kind="unknown"` and
`start_line=0, end_line=0`, permanently dropping the kind signal
(20% of the continuity score) and reporting
`ContinuityResolution.old_location` as `:0-0`. Now loads the bound
region's actual span + identity_type via the new
`ledger.queries.get_region_metadata` helper. Lookup failure falls
back to the previous behaviour (response shape preserved).

### `ledger/adapter.py:365` — create-only continuity wrapper

`upsert_code_region` (continuity-flow wrapper) delegated to
`queries.upsert_code_region` which keys on `(file_path, symbol_name)`
and silently reused IDs across same-file moves. Switched the wrapper
to call the new `queries.create_code_region` (create-only) so every
continuity redirect targets a distinct new region id and
`update_binds_to_region` can actually redirect.

### `ledger/queries.py:1567` — link_decision_to_subject carries region_id

A decision can bind multiple `code_region`s but the `about` edge
stored only `decision -> code_subject`, flattening all subjects per
decision. The continuity matcher couldn't disambiguate which
identity belonged to a given drifted region. Added optional
`region_id` parameter; threaded through `_persist_subject_and_identity`
and `write_codegenome_identity`; bind handler now passes it.

### `ledger/queries.py:1602` — atomic redirect

`update_binds_to_region` did DELETE-then-RELATE non-atomically.
Failure on the second statement left the decision unbound. Wrapped
in `BEGIN TRANSACTION ... COMMIT TRANSACTION`. Idempotency preserved
via a pre-flight existence check on the new edge before attempting
the transaction (SurrealDB v2 transaction rollback hides the
underlying UNIQUE-collision message, so we can't catch it
post-hoc).

### `plan-codegenome-phase-3.md:49` — already addressed in earlier commits.

## CodeRabbit minor fixes (7)

- `CHANGELOG.md` — added v0.12.0 entry covering Phase 3 + the review
  hardening; v0.11.0 entry preserved.
- `contracts.py:175` — replaced EN DASH (`–`) with ASCII hyphen (`-`)
  in `ContinuityResolution` docstring (RUF002).
- `contracts.py:182` — constrained `confidence` to probability bounds
  via `Field(ge=0.0, le=1.0)`. Added `Field` import.
- `docs/BACKLOG.md` — B2 (Issue BicameralAI#60) marked completed in PR BicameralAI#73.
- `docs/SHADOW_GENOME.md:130` — fixed accidental `BicameralAI#3:` heading token
  by reformatting to `and BicameralAI#3:` (MD018).
- `docs/SYSTEM_STATE.md:11,53` — added `text` language tag to
  unlabeled fenced blocks (MD040).
- `plan-codegenome-phase-3.md:50` — already addressed.

## New ledger primitives (Phase 3 hardening)

- `queries.create_code_region(...)` — create-only region (vs upsert)
  for continuity-redirect path.
- `queries.get_region_metadata(region_id)` — single-query lookup of
  span + linked identity_type via graph traversal
  (`code_region <-binds_to<- decision ->about-> code_subject
   <-has_identity<- subject_identity.identity_type`).
- Adapter wrappers for both, plus `link_decision_to_subject` extended
  with `region_id` kwarg.

## Verification

- 105/105 codegenome + ledger regression tests pass on Windows local.
- 140/141 broader regression (1 pre-existing failure
  `test_phase1_l1_wiring.py::test_backfill_restores_hash_but_stays_pending_without_verdict`
  fails identically on `dev` — Windows-env issue, not introduced by
  this commit).
- All Python files parse cleanly.
- Section 4 razor: every modified function ≤ 40 LOC; every modified
  file ≤ 250 LOC (or pre-existing exception, unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin added a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request Apr 29, 2026
…i/ package

Audit found:
- F-1 (BLOCKING): `bicameral/` Python package does not exist; the
  proposed `bicameral/drift_report.py` module would be unbuildable.
  SG-PLAN-GROUNDING-DRIFT instance BicameralAI#3 in this session.
- F-2: dev tip moved 2 commits during planning (PR BicameralAI#106 + BicameralAI#107).

Remediation:
- Replace all 7 references: `bicameral/drift_report.py` →
  `cli/drift_report.py`; `bicameral.drift_report` (CLI module path)
  → `cli.drift_report`.
- Update branch baseline note to current dev tip `c478dd5`.
- The cli/ package was added by PR BicameralAI#107 (decision_level CLI for
  BicameralAI#77); it's the natural home for a CLI-invoked renderer.

The plan's intent is preserved: pure-function renderer + GitHub
Action workflow + sticky-comment poster. Re-audit pending.
Knapp-Kevin added a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request Apr 29, 2026
… — audit PASS

- META_LEDGER BicameralAI#15 on this branch: GATE TRIBUNAL entry covering
  v1 VETO (2f31d6f) + v2 PASS (7da919c). Chain hash b2925935
  extends from BicameralAI#14 Phase 4 SEAL (0ebcf69b) on dev.
  Note: branches feat/44 and feat/49 each carry their own
  Entry BicameralAI#15 chain extension off dev's BicameralAI#14; reconciliation
  occurs at release time.
- SHADOW_GENOME #5b: SG-PLAN-GROUNDING-DRIFT instance BicameralAI#3.
  Cross-references instances #1 (PR BicameralAI#93 §9) and BicameralAI#2 (feat/44
  branch's Entry BicameralAI#5). Pattern is now triggered by plan author
  trusting mental model over filesystem `ls`. Mitigation:
  every plan must enumerate existing packages before proposing
  a new module's home.

Plan PASS at 7da919c; chain to /qor-implement.
jinhongkuan added a commit that referenced this pull request Apr 30, 2026
…iation seal

Reality matches Promise. Three changes (2 repo files + 2 deferred external
gh actions) land per Entry #24 audit blueprint 1:1; 0 new tests (acknowledged
advisory — manual verification mitigates); Section 4 razor clean.

Audit verdict: PASS, L1 (Entry #24 chain hash 1de1fac7).
Implementation: Entry #25 chain hash 51c8a45c.
Merkle seal: efd0304b2f0e0b3ca28aa4620c2b8ea2eda5ab9e2828ca852ab9f3c5adda6eb5

Architectural decision recorded: bicameral-mcp#135's auto-resolve direction
abandoned (no caller LLM in hook context, MCP sampling not viable in Claude
Code's main chat). Resolution path = dashboard tooltip → /bicameral-sync.
The tooltip surfaces the pending state; the human in their session is the
qualified judge.

Plan addition tracking (Entry #24 preconditions, final state):
  ✅ #2 — SKILL.md tooltip note (delivered in IMPL, sealed here)
  🟡 #1 — PR description manual verification step (composed in /qor-document)
  🟡 #3#135 close comment README/docs deferral (composed in /qor-document)

Surfaced for follow-up (not blocking):
  bicameral-mcp#125 scope should be widened — 7 skills under
  pilot/mcp/.claude/skills/ are absent from the canonical pilot/mcp/skills/
  location claimed by pilot/mcp/CLAUDE.md.

Spec correction queued (post-merge gh action):
  bicameral#108 Flow 1 step 3 claims IngestResponse.supersession_candidates
  exists when it does not; collision detection lives caller-side via
  bicameral-context-sentry skill, surfaces via
  bicameral.preflight.unresolved_collisions.

Capability shortfalls (carried, no regression vs Entry #23): qor/scripts/
runtime helpers absent (gate artifacts not written), tools/reliability/
validators absent (Steps 4.6–4.8 skipped), agent-teams not declared,
codex-plugin not declared (solo audit/seal), intent_lock capture skipped.

Refs #135.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan pushed a commit that referenced this pull request May 2, 2026
…#147)

Closes research brief recommendation P1 #3. The installed SessionEnd
hook in .claude/settings.json and the source-of-truth constant in
setup_wizard.py both omitted the canonical guard prescribed by
skills/bicameral-capture-corrections/SKILL.md:207.

Two missing pieces, now restored byte-exact:

1. BICAMERAL_SESSION_END_RUNNING env-var guard. Without it, the
   spawned `claude -p` subprocess fires its OWN SessionEnd hook on
   exit, recursing indefinitely (bounded only by Claude Code's
   per-session subprocess depth limit, if any, or filesystem/process
   exhaustion). The guard env var is inherited by the subprocess; its
   nested SessionEnd hook short-circuits.

2. `--auto-ingest` flag. The capture-corrections skill in batch mode
   reads this flag to scan the full session transcript and ingest
   mechanical corrections directly without surfacing prompts. Without
   it, the subprocess would default to interactive-mode behavior,
   producing prompts no one will answer (parent session is closing).

Files modified:
- .claude/settings.json: SessionEnd hook command replaced with canonical
- setup_wizard.py:343-347: _BICAMERAL_SESSION_END_COMMAND constant
  updated to canonical (drives fresh installs via _install_claude_hooks)

Tests:
- tests/test_session_end_hook_drift.py: 3 functionality tests
  - parses .claude/settings.json and asserts substring presence of
    re-entrancy guard tokens and --auto-ingest flag
  - imports setup_wizard and asserts byte-exact match against the
    canonical SKILL.md prescription

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan pushed a commit that referenced this pull request May 2, 2026
…#147)

Closes research brief recommendation P1 #3. The installed SessionEnd
hook in .claude/settings.json and the source-of-truth constant in
setup_wizard.py both omitted the canonical guard prescribed by
skills/bicameral-capture-corrections/SKILL.md:207.

Two missing pieces, now restored byte-exact:

1. BICAMERAL_SESSION_END_RUNNING env-var guard. Without it, the
   spawned `claude -p` subprocess fires its OWN SessionEnd hook on
   exit, recursing indefinitely (bounded only by Claude Code's
   per-session subprocess depth limit, if any, or filesystem/process
   exhaustion). The guard env var is inherited by the subprocess; its
   nested SessionEnd hook short-circuits.

2. `--auto-ingest` flag. The capture-corrections skill in batch mode
   reads this flag to scan the full session transcript and ingest
   mechanical corrections directly without surfacing prompts. Without
   it, the subprocess would default to interactive-mode behavior,
   producing prompts no one will answer (parent session is closing).

Files modified:
- .claude/settings.json: SessionEnd hook command replaced with canonical
- setup_wizard.py:343-347: _BICAMERAL_SESSION_END_COMMAND constant
  updated to canonical (drives fresh installs via _install_claude_hooks)

Tests:
- tests/test_session_end_hook_drift.py: 3 functionality tests
  - parses .claude/settings.json and asserts substring presence of
    re-entrancy guard tokens and --auto-ingest flag
  - imports setup_wizard and asserts byte-exact match against the
    canonical SKILL.md prescription

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan pushed a commit that referenced this pull request May 2, 2026
…#147)

Closes research brief recommendation P1 #3. The installed SessionEnd
hook in .claude/settings.json and the source-of-truth constant in
setup_wizard.py both omitted the canonical guard prescribed by
skills/bicameral-capture-corrections/SKILL.md:207.

Two missing pieces, now restored byte-exact:

1. BICAMERAL_SESSION_END_RUNNING env-var guard. Without it, the
   spawned `claude -p` subprocess fires its OWN SessionEnd hook on
   exit, recursing indefinitely (bounded only by Claude Code's
   per-session subprocess depth limit, if any, or filesystem/process
   exhaustion). The guard env var is inherited by the subprocess; its
   nested SessionEnd hook short-circuits.

2. `--auto-ingest` flag. The capture-corrections skill in batch mode
   reads this flag to scan the full session transcript and ingest
   mechanical corrections directly without surfacing prompts. Without
   it, the subprocess would default to interactive-mode behavior,
   producing prompts no one will answer (parent session is closing).

Files modified:
- .claude/settings.json: SessionEnd hook command replaced with canonical
- setup_wizard.py:343-347: _BICAMERAL_SESSION_END_COMMAND constant
  updated to canonical (drives fresh installs via _install_claude_hooks)

Tests:
- tests/test_session_end_hook_drift.py: 3 functionality tests
  - parses .claude/settings.json and asserts substring presence of
    re-entrancy guard tokens and --auto-ingest flag
  - imports setup_wizard and asserts byte-exact match against the
    canonical SKILL.md prescription

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan pushed a commit that referenced this pull request May 2, 2026
…#147)

Closes research brief recommendation P1 #3. The installed SessionEnd
hook in .claude/settings.json and the source-of-truth constant in
setup_wizard.py both omitted the canonical guard prescribed by
skills/bicameral-capture-corrections/SKILL.md:207.

Two missing pieces, now restored byte-exact:

1. BICAMERAL_SESSION_END_RUNNING env-var guard. Without it, the
   spawned `claude -p` subprocess fires its OWN SessionEnd hook on
   exit, recursing indefinitely (bounded only by Claude Code's
   per-session subprocess depth limit, if any, or filesystem/process
   exhaustion). The guard env var is inherited by the subprocess; its
   nested SessionEnd hook short-circuits.

2. `--auto-ingest` flag. The capture-corrections skill in batch mode
   reads this flag to scan the full session transcript and ingest
   mechanical corrections directly without surfacing prompts. Without
   it, the subprocess would default to interactive-mode behavior,
   producing prompts no one will answer (parent session is closing).

Files modified:
- .claude/settings.json: SessionEnd hook command replaced with canonical
- setup_wizard.py:343-347: _BICAMERAL_SESSION_END_COMMAND constant
  updated to canonical (drives fresh installs via _install_claude_hooks)

Tests:
- tests/test_session_end_hook_drift.py: 3 functionality tests
  - parses .claude/settings.json and asserts substring presence of
    re-entrancy guard tokens and --auto-ingest flag
  - imports setup_wizard and asserts byte-exact match against the
    canonical SKILL.md prescription

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin added a commit that referenced this pull request May 3, 2026
Priority C v0 — Phase 1 of the team-server vertical-slice for multi-dev
decision continuity at organizational scale (per the Sales Enablement &
Positioning Playbook + research-brief-priority-c-selective-ingest-2026-05-02.md).

The team-server is a self-managing, customer-self-hosted Python service.
Per CONCEPT.md anti-goals under literal-keyword parsing (SHADOW_GENOME
Failure Entry #6 addendum): "no managed backend" forbids vendor SaaS
and human-ops-tax architectures, NOT self-managing customer-deployable
backends. Sentry self-hosted, Supabase OSS, and the embedded-SurrealDB
philosophy already in repo are precedents.

Scaffold delivered:
- team_server/app.py (47 LOC): FastAPI app factory; lifespan migrates
  schema on startup, closes DB on teardown; /health endpoint
- team_server/schema.py (80 LOC): v0 schema for workspace,
  channel_allowlist, extraction_cache, team_event tables. Idempotent
  ensure_schema(); migration dispatch table for future versions.
  FLEXIBLE TYPE object on canonical_extraction + payload (per #72
  lesson + audit Advisory #3).
- team_server/db.py (41 LOC): TeamServerDB factory wrapping
  ledger.client.LedgerClient with team-server's own ns/db pair.
- deploy/team-server.docker-compose.yml + Dockerfile.team-server:
  single-service compose; volume for persistent data; healthcheck on
  /health; runs as non-root.

Tests (6 functionality tests, all green):
- tests/test_team_server_app.py: app starts + serves health, schema
  migrates from empty + idempotent, lifespan teardown closes DB,
  /health returns well-formed JSON.
- tests/test_team_server_deploy.py: docker-compose config validates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan pushed a commit that referenced this pull request May 3, 2026
…#147)

Closes research brief recommendation P1 #3. The installed SessionEnd
hook in .claude/settings.json and the source-of-truth constant in
setup_wizard.py both omitted the canonical guard prescribed by
skills/bicameral-capture-corrections/SKILL.md:207.

Two missing pieces, now restored byte-exact:

1. BICAMERAL_SESSION_END_RUNNING env-var guard. Without it, the
   spawned `claude -p` subprocess fires its OWN SessionEnd hook on
   exit, recursing indefinitely (bounded only by Claude Code's
   per-session subprocess depth limit, if any, or filesystem/process
   exhaustion). The guard env var is inherited by the subprocess; its
   nested SessionEnd hook short-circuits.

2. `--auto-ingest` flag. The capture-corrections skill in batch mode
   reads this flag to scan the full session transcript and ingest
   mechanical corrections directly without surfacing prompts. Without
   it, the subprocess would default to interactive-mode behavior,
   producing prompts no one will answer (parent session is closing).

Files modified:
- .claude/settings.json: SessionEnd hook command replaced with canonical
- setup_wizard.py:343-347: _BICAMERAL_SESSION_END_COMMAND constant
  updated to canonical (drives fresh installs via _install_claude_hooks)

Tests:
- tests/test_session_end_hook_drift.py: 3 functionality tests
  - parses .claude/settings.json and asserts substring presence of
    re-entrancy guard tokens and --auto-ingest flag
  - imports setup_wizard and asserts byte-exact match against the
    canonical SKILL.md prescription

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit d76b419)
Knapp-Kevin added a commit that referenced this pull request May 6, 2026
… to research brief (#205)

Addresses Codex first-pass review notes #1, #2, #3, #7, #8, #9 from the
brief's review block. Tier C items + the subsequent Kilo / Gemini /
Codex-2nd-pass review layers are tracked as follow-ups (will be
surfaced in the PR thread for direction).

Changes:

- § 1.4 ingest pipeline: adds explicit "Risk amplification
  (durable-feedback-loop)" paragraph framing ingest as the durable
  write-surface that propagates poisoned content through preflight
  back into the agent's reasoning context. Strengthens LLM-01 + LLM-04
  P0 defensibility (Codex #2).
- § 1.8 skills surface: adds worked before/after example contrasting
  instruction-only `bicameral-report-bug` keys-only commitment vs the
  deterministic `_resolve_signer_email` gate that replaced it in #204.
  Makes the doctrine concrete for non-agent-systems readers (Codex #3).
- § 1.9 team-server: rewrites the dangling "TEAM-NN gaps in § 4"
  promise to "intentionally not enumerated; activation PR authors
  TEAM-NN IDs against actual activated topology" (Codex #8).
- § 2.6 EU AI Act: removes unilateral "limited risk" claim. Now
  describes bicameral-mcp as an AI-adjacent developer-tool component
  whose risk-tier classification properly attaches to the integrated
  system + deployment context, requiring counsel review for any
  specific tier claim (Codex #7).
- § 5 gap synthesis: adds Deployment trigger column (`all` /
  `local-OK` / `team/hosted` / `pre-team` / `hosted`) so severity is
  defensible per deployment shape. SOC2-01 reclassified as
  pre-team/hosted P0 with local-only boundary statement; GDPR-05
  reclassified as team/hosted P1 with local single-user P2; OWASP-03
  reclassified as hosted P1 with local P2 (uv/pipx provides
  install-time lock); OWASP-02 trigger narrowed to team/hosted (Codex #1).
- Appendix method notes: softens "every claim should be verifiable by
  re-reading the cited file at the cited line range" to acknowledge
  that most findings cite components rather than path:line, and
  defers a line-level evidence appendix as a follow-up improvement
  (Codex #9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin added a commit that referenced this pull request May 6, 2026
…+Gemini+Codex-2 (#205)

Authors a single Reviewer Disposition Pass table at the top of the
brief reconciling all 32 review points across four review layers
(Codex first-pass, Kilo, Gemini CLI, Codex second-pass) into one
post-review consensus before downstream P1 issue-filing — per the
explicit Codex-2 #1 directive.

Decisions: 21 applied this commit, 6 already applied in 1d82658,
3 deferred to follow-up, 2 note-only. Net new gap IDs added per
disposition: GDPR-08 (ephemeral data), GDPR-09 (consent versioning
+ revocation), LLM-11 (cross-tool config-file modification surface),
MCP-01 (host UX as external dependency), CFG-01 (config precedence
+ fail-closed model). Reclassification: LLM-06 P0/M → P1/M with
scope narrowed to future remote-skill-loading channel (per Kilo #2).

Major content additions to the brief:

- § 1.1: MCP host UX is external dependency, not security gate (new
  gap MCP-01) — host that auto-approves tool calls bypasses any
  "operator will see this" assumption.
- § 1.2: SurrealDB version pinning supply-chain callout (Kilo #11).
- § 1.7: cross-tool config-file modification surface (new gap LLM-11)
  distinct from skill-content surface — `setup_wizard` writes shell
  commands into `.claude/settings.json` that run host-side at hook fire.
- § 1.11 (new): Configuration precedence + fail-closed model — single
  uniform precedence rule across all knobs (env > config.yaml >
  hardcoded defaults), fail-closed semantics on missing/malformed/
  contradictory config (Codex-2 #5).
- § 2.4 (a): LLM02 mapping note clarifying it folds into LLM-07 +
  OWASP-04 (Kilo #13).
- § 2.4 (b): explicit `confirm=True` is agent-supplied not HITL
  (Kilo #3) — security context cannot rely on agent-filled params.
- § 2.4 (c) LLM-01 + LLM-04: extensible classifier (Gemini #2) +
  guardrail-not-classifier framing (Codex-1 #6) + control-acceptance
  template (Codex-2 #4) — quarantine, override, test fixtures,
  measurement counters.
- § 2.4 (c) LLM-03: timeouts as `.bicameral/config.yaml` knobs (Gemini #3).
- § 2.4 (c) LLM-05 + LLM-09: out-of-band operator confirmation, not
  agent-supplied confirm parameters (Kilo #3).
- § 2.4 (c) LLM-06: scope-narrowed to future remote-skill-loading; in
  current install model the wheel-trust covers it (Kilo #2).
- § 2.4 (c) LLM-11 (new): cross-tool config-file gate (signed
  hooks-manifest.json) distinct from skill manifest.
- § 2.1 (c) GDPR-01: three remediation candidates — tombstone-and-
  rebuild with signed manifest (Kilo #12), crypto-shredding (Gemini
  #1), or scope-out via PII detect-and-refuse.
- § 2.1 (c) GDPR-02: data-subject-access search must cover full
  identifier surface (description, source_ref, topic, file paths) not
  just signer email (Codex-1 #5).
- § 2.1 (c) GDPR-08 (new): ephemeral data surfaces (tempfiles, swap,
  WAL, crash dumps) (Kilo #7).
- § 2.1 (c) GDPR-09 (new): consent versioning + revocation semantics
  (Kilo #8 + Codex-2 #3).
- § 5: gap table updated with new rows + LLM-06 reclassification;
  gap counts post-disposition (5 P0 / 19 P1 / 16 P2 / 5 P3 = 45 total,
  up from 41).
- § 6.1 (new): epic grouping for deferred P1 batch (Codex-1 #10) —
  ingest boundary guardrails, per-tool authority gradation, supply-
  chain signing, telemetry & consent.
- § 6.2 (new): six-section control-acceptance template for every DG
  gap (Codex-2 #4) — positive / negative / bypass / fail-closed /
  telemetry / docs.

Filed-issue updates:
- Issue #214 (LLM-06): relabeled P0 → P1, retitled to reflect scope
  narrowing, full disposition comment added.
- Issue #212 (LLM-01) + #213 (LLM-04): disposition comments added
  capturing the guardrail framing, classifier extensibility, and
  control-acceptance template applicable to both.

Deferred for follow-up: Codex-1 #4 (controller/processor
restructure of standards table), Codex-1 #9 (full evidence appendix
beyond the methodology softening), Codex-2 #2 (full 3-column
deployment-profile matrix beyond the single-column trigger).

Brief now 706 lines (up from 606); +124 line diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin pushed a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request May 21, 2026
…l intent

- Section 1: replace 'no HTTP transport' gap framing with
  'BackendAdapter pipeline not yet wired'; add R1 answer summary
- Section 1: flag #196 acceptance criteria re-scoping need
- Section 2: reframe failure mode around BackendAdapter pipeline
  extension, not team-server absence
- Section 3: 'team-server tier' → 'BackendAdapter-mediated team-mode'
- A4: reframe from 'Option 3 eventually' to 'future BackendAdapter
  subclasses (S3, Supabase) are the extension point, not server
  runtimes'; preserve architectural intent for future iterations
- A6: note question posted to @jinhongkuan
- Section 5: non-goal BicameralAI#5 updated to reflect R1 (no server process of
  any kind); add forbidden interpretation BicameralAI#3 (architectural evolution
  through BackendAdapter subclasses); add #196 re-scoping note
- Section 6: concept brief updated to BackendAdapter-mediated language
- Section 8: 'team-server' → 'remote backend' in evidence/escalation
- Section 9: failure classes updated to BackendAdapter-based scenarios
- Section 10: readiness acknowledges open non-blocking A6 assumption
- Research brief: fix Windows target path → repo-relative

Co-Authored-By: kevin <kevin@bicameral-ai.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant