refactor: port interfaces + source_span for drift pipeline#1
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 12 minutes and 4 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughAdds drift-analysis support: new ports and DriftResult, a HashDriftAnalyzer implementation, ledger schema/queries for source spans, ledger adapter integration to accept a drift analyzer, and factory exports; handlers now pass a drift analyzer into ledger ingestion. (48 words) Changes
Sequence DiagramsequenceDiagram
participant Handler as Handler (link_commit)
participant Adapter as SurrealDBLedgerAdapter
participant Analyzer as HashDriftAnalyzer
participant DB as SurrealDB
Handler->>Adapter: ingest_commit(commit_hash, repo_path, drift_analyzer)
Adapter->>Analyzer: analyze_region(file_path, symbol_name, start_line, end_line, stored_hash, repo_path, ref)
Analyzer->>Analyzer: resolve_symbol_lines()/compute_content_hash()/derive_status()
Analyzer-->>Adapter: DriftResult(status, content_hash, confidence)
Adapter->>DB: upsert_source_span(text, source_type, source_ref)
DB-->>Adapter: span_id
Adapter->>DB: relate_yields(span_id, intent_id)
Adapter->>DB: update region (content_hash, start_line, end_line)
Adapter-->>Handler: ingest result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ledger/adapter.py`:
- Around line 178-187: ingest_commit is calling drift_analyzer.analyze_region
without passing source_context/source_span, so any SemanticDriftAnalyzer will
receive empty text; update the call site in ingest_commit to extract the
region's source_span.text (or the region.intent/description if text missing)
from the output of get_regions_for_files (the same structure that stores
source_span) and pass it as the new source_context/source_span (use the
parameter name expected by analyze_region) when invoking
drift_analyzer.analyze_region; ensure you reference the region object returned
by get_regions_for_files, read region.source_span.text (falling back to
intent/description), and include that value in the analyze_region call so
semantic comparison can run.
In `@ledger/queries.py`:
- Around line 470-473: The fallback CREATE for source_span in the code path
after the UPSERT (where rows is empty) omits provenance fields and loses
metadata; update the fallback query in ledger/queries.py (the CREATE source_span
statement executed when rows == []) to include the speakers and meeting_date
fields and pass their values in the params dict (same keys used in the UPSERT
branch, e.g., "speakers" and "meeting_date" or whatever variable names are
declared in the surrounding function) so both branches preserve the same
metadata.
In `@ledger/schema.py`:
- Around line 64-71: The schema lacks a unique index for the deduplication key
used by upsert_source_span(), causing possible duplicates; add a UNIQUE index on
source_span that covers source_type, source_ref and text (in addition to or
replacing idx_span_ref) so the DB enforces uniqueness across those three fields
and aligns with the UPSERT WHERE clause in upsert_source_span().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 56757df4-7f65-496d-936e-8d931ddc6c8f
📒 Files selected for processing (8)
adapters/code_locator.pyadapters/ledger.pyhandlers/link_commit.pyledger/adapter.pyledger/drift.pyledger/queries.pyledger/schema.pyports.py
Decouple handlers from concrete code analysis and drift detection backends via Protocol types, enabling independent development of Phase 1 (collaboration), drift intelligence (L2 AST + L3 LLM), and future CodeGenome integration. Changes: - ports.py: DriftAnalyzerPort, CodeIntelligencePort, DriftResult - ledger/drift.py: HashDriftAnalyzer (L1 hash-only implementation) - ledger/schema.py: source_span table + yields edge (provenance) - ledger/queries.py: upsert_source_span, relate_yields - ledger/adapter.py: ingest_commit() delegates to DriftAnalyzerPort, ingest_payload() creates source_span records - adapters/: get_drift_analyzer() + get_code_intelligence() factories - handlers/link_commit.py: wires drift_analyzer through factory Drift Layer Status: L1 (hash): ✅ Implemented (HashDriftAnalyzer) L2 (AST): ⬜ Pending — collaborator WIP L3 (semantic): ⬜ Pending — needs L2 + source_span traversal Zero behavior change. All CI tests pass (phase1: 6/6, phase2: 13/13). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lightweight version-gated migration for the SurrealDB ledger: - schema_meta table tracks current schema version - migrate() runs on connect, after init_schema() - v0→v1: stamps version for source_span + yields addition - Future breaking migrations clear affected tables + log warning Version 0.3.0 reflects: port interfaces, source_span entity, schema migration system. SCHEMA_VERSION is separate from package version (incremented only on breaking schema changes). Once Phase 1 (event-sourced spec log) ships, breaking migrations become lossless: drop local DB → re-materialize from git events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f8a6251 to
6a55c04
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
ledger/adapter.py (1)
179-188:⚠️ Potential issue | 🟠 MajorPlumb
source_contextinto the analyzer call.The new port was expanded specifically for L3 semantic checks, but this call still relies on the default
"". A future semantic analyzer swapped in viaget_drift_analyzer()will never see thesource_span.textthis PR stores. Please extendget_regions_for_files()to return the related source text (or at least the intent description as fallback) and pass it through here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ledger/adapter.py` around lines 179 - 188, The call to drift_analyzer.analyze_region currently omits the source context so L3 semantic analyzers won't receive source_span.text; update get_regions_for_files() (the producer) to include the source text (or fall back to the intent description) in each region tuple/object and then pass that value as the source_context (or similar named param) into drift_analyzer.analyze_region in ledger.adapter (where drift_result is created), ensuring analyze_region/source_context and any calls via get_drift_analyzer() consistently accept and forward the source text to downstream analyzers.ledger/schema.py (1)
79-86:⚠️ Potential issue | 🟠 MajorAdd a DB-enforced unique key for
source_spandedupe.
upsert_source_span()deduplicates on(source_type, source_ref, text), but the schema only indexes(source_type, source_ref). If two ingests race past the lookup path, identical excerpts can still be persisted twice. Please add a unique index that matches the full dedupe key.Suggested schema change
"DEFINE FIELD meeting_date ON source_span TYPE string DEFAULT ''", "DEFINE FIELD created_at ON source_span TYPE datetime DEFAULT time::now()", "DEFINE INDEX idx_span_ref ON source_span FIELDS source_type, source_ref", + "DEFINE INDEX idx_span_dedupe ON source_span FIELDS source_type, source_ref, text UNIQUE",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ledger/schema.py` around lines 79 - 86, The schema currently only indexes (source_type, source_ref) so concurrent upserts can insert duplicate excerpts; add a DB-enforced unique index on the full dedupe key to match upsert_source_span()’s logic by defining a unique index such as "DEFINE INDEX idx_span_unique ON source_span FIELDS source_type, source_ref, text UNIQUE" (leave existing idx_span_ref if desired) so identical (source_type, source_ref, text) rows cannot be persisted concurrently.
🧹 Nitpick comments (1)
ledger/adapter.py (1)
195-203: Avoid resolving symbol lines twice per region.
HashDriftAnalyzer.analyze_region()already doesresolve_symbol_lines(), then this block does the same work again just to persist the updated bounds. On larger commits this doubles the tree-sitter/git lookup cost on the hot path. Consider returning the resolved span inDriftResult, or moving all line-resolution responsibility into the adapter.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ledger/adapter.py` around lines 195 - 203, The adapter is re-calling resolve_symbol_lines() for each region even though HashDriftAnalyzer.analyze_region() already resolves spans; change the data flow so the analyzer returns the resolved span (e.g., add resolved_span or resolved_start/resolved_end fields to DriftResult from HashDriftAnalyzer.analyze_region()) and then in the adapter (where you currently call resolve_symbol_lines() and run the UPDATE via self._client.query) use the resolved values from the returned DriftResult instead of re-invoking resolve_symbol_lines(); alternatively, centralize line-resolution in the adapter by removing resolve_symbol_lines() from analyze_region and having analyze_region return only raw drift info—update the code paths that consume DriftResult and the DB UPDATE logic in the adapter (the region_id/start_line/end_line update block) accordingly so no duplicate tree-sitter/git lookups occur.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ledger/schema.py`:
- Around line 120-122: Make the yields relation idempotent so replayed ingest
doesn't create duplicate source_span→intent edges: either add a uniqueness
constraint to the yields relation (unique on its endpoints) in the schema for
the yields table, or change the write path to dedupe before creating edges —
update ledger/schema.py to enforce uniqueness on yields endpoints (the yields
relation) OR modify ledger/queries.py where RELATE is used (and the
relate_yields() call in ledger/adapter.py) to perform a conditional/merge-style
create (check existence first or use a RELATE/MERGE-unique variant) so duplicate
edges are not inserted.
---
Duplicate comments:
In `@ledger/adapter.py`:
- Around line 179-188: The call to drift_analyzer.analyze_region currently omits
the source context so L3 semantic analyzers won't receive source_span.text;
update get_regions_for_files() (the producer) to include the source text (or
fall back to the intent description) in each region tuple/object and then pass
that value as the source_context (or similar named param) into
drift_analyzer.analyze_region in ledger.adapter (where drift_result is created),
ensuring analyze_region/source_context and any calls via get_drift_analyzer()
consistently accept and forward the source text to downstream analyzers.
In `@ledger/schema.py`:
- Around line 79-86: The schema currently only indexes (source_type, source_ref)
so concurrent upserts can insert duplicate excerpts; add a DB-enforced unique
index on the full dedupe key to match upsert_source_span()’s logic by defining a
unique index such as "DEFINE INDEX idx_span_unique ON source_span FIELDS
source_type, source_ref, text UNIQUE" (leave existing idx_span_ref if desired)
so identical (source_type, source_ref, text) rows cannot be persisted
concurrently.
---
Nitpick comments:
In `@ledger/adapter.py`:
- Around line 195-203: The adapter is re-calling resolve_symbol_lines() for each
region even though HashDriftAnalyzer.analyze_region() already resolves spans;
change the data flow so the analyzer returns the resolved span (e.g., add
resolved_span or resolved_start/resolved_end fields to DriftResult from
HashDriftAnalyzer.analyze_region()) and then in the adapter (where you currently
call resolve_symbol_lines() and run the UPDATE via self._client.query) use the
resolved values from the returned DriftResult instead of re-invoking
resolve_symbol_lines(); alternatively, centralize line-resolution in the adapter
by removing resolve_symbol_lines() from analyze_region and having analyze_region
return only raw drift info—update the code paths that consume DriftResult and
the DB UPDATE logic in the adapter (the region_id/start_line/end_line update
block) accordingly so no duplicate tree-sitter/git lookups occur.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 12affbaa-f92c-4307-8724-4f7da291b13c
📒 Files selected for processing (10)
RECOMMENDED_VERSIONadapters/code_locator.pyadapters/ledger.pyhandlers/link_commit.pyledger/adapter.pyledger/drift.pyledger/queries.pyledger/schema.pyports.pypyproject.toml
✅ Files skipped from review due to trivial changes (5)
- RECOMMENDED_VERSION
- pyproject.toml
- adapters/ledger.py
- ledger/queries.py
- ports.py
🚧 Files skipped from review as they are similar to previous changes (2)
- adapters/code_locator.py
- handlers/link_commit.py
…, fallback metadata 1. Plumb source_context into analyze_region() — collect intent descriptions from linked intents and pass through to drift analyzer. L1 ignores it; L3 (semantic) will use it for compliance checking. 2. Add UNIQUE index on source_span (source_type, source_ref, text) to enforce dedup at the DB level, matching the UPSERT WHERE clause. 3. Include speakers + meeting_date in fallback CREATE path of upsert_source_span() so metadata isn't silently lost. 4. Enrich get_regions_for_files() to return intent.description alongside id and status. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r pipeline Scenario #1: route ingest through handle_ingest() (which runs _auto_ground_via_search) instead of calling ledger.ingest_payload() directly. Removes the xfail. Scenario BicameralAI#6: after committing new code, call handle_link_commit() which triggers _reground_ungrounded to re-ground previously ungrounded intents. Removes the xfail. Sprint scorecard: 77% → 92% (12/13 pass). Only BicameralAI#9 (intent supersession) remains — carried to next sprint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Changes since v0.2.15: - Close XFAIL #1/BicameralAI#6: auto-grounding and re-grounding now wired through handler pipeline (92% desync score) - eval_code_locator.py: --multi-repo, --min-mrr, --max-repo-variance flags with proper exit codes (regression gate) - Gibberish test descriptions to resist auto-grounding in CI - README upscaled to professional OSS standard (credit: MythologIQ) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…uzzy Two fixes from Codex review feedback: 1. P1 #1: Filter weak BM25 scores (< 0.1) when no fuzzy matches exist. Prevents false groundings for under-specified queries that pass the FC-1 guard but have only noise-level BM25 hits. 2. P1 #2: Stage 2 now runs as enrichment after fuzzy direct lookup, not just as fallback. When fuzzy finds symbols from a subset of files, file-level retrieval fills remaining budget with additional files — preserving multi-file feature discovery. 49/49 tests pass. Eval metrics stable: MRR@5=0.605, Recall=20.0%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All four findings verified against current code; only the actionable ones applied. 81 passed + 1 xfailed in 9.02s. #1 — skills/bicameral-preflight/SKILL.md sync_metrics note The .claude/skills copy got the sync_metrics observability note back when V1 A3 shipped, but the canonical skills/ copy never did. Mirror the wording verbatim near step 2 so the rendering guidance and response-field documentation stay in sync. #2 — handlers/detect_drift.py per-entry alignment The cosmetic-hint enrichment was slicing both head_full and wt_full using entry.lines (the baseline anchor). HEAD and the working tree can shift the symbol independently, so a single index range can't align both sides. The narrow consequence: a drifted entry with shifted lines could yield a misleading cosmetic_hint=true on bytes that aren't the bound region. Fix: re-resolve the symbol against each ref via resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD") and ref="working_tree" separately, slice each ref using its own resolved range. Resolution failure on either side → safe default of cosmetic_hint=False (matches the V1 contract: "False is cheap, True must be earned"). Empty symbol → skip (new fail-safe path). Test refactor: test_invalid_lines_skipped renamed to test_unresolvable_symbol_skipped — the old test asserted that lines=(0,0) was the failsafe trigger, but entry.lines is no longer the alignment input. New test exercises the resolve_symbol_lines-returns-None path via a nonexistent symbol name, which is the real fail-safe gate now. #3 — V2 guide TOC anchor for §9 GitHub auto-generates fragment IDs from heading text by lowercasing, replacing spaces with hyphens, and dropping punctuation. "## 9. Acceptance criteria for V2" maps to #9-acceptance-criteria-for-v2, but the TOC pointed at #9-acceptance-criteria (truncated). Link broken. Updated to the correct fragment. #4 — V2 guide unlabeled fenced code blocks (markdownlint MD040) Six fenced opens used bare ``` instead of a labeled fence. Tagged each with ```text — the contents are commit listings, ASCII DAG diagrams, pseudocode protocols, and tuple notation, none of which fit a real language tag. The other fenced blocks in the guide (already tagged ```sql / ```python) are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All four findings verified against current code; only the actionable ones applied. 81 passed + 1 xfailed in 9.02s. #1 — skills/bicameral-preflight/SKILL.md sync_metrics note The .claude/skills copy got the sync_metrics observability note back when V1 A3 shipped, but the canonical skills/ copy never did. Mirror the wording verbatim near step 2 so the rendering guidance and response-field documentation stay in sync. #2 — handlers/detect_drift.py per-entry alignment The cosmetic-hint enrichment was slicing both head_full and wt_full using entry.lines (the baseline anchor). HEAD and the working tree can shift the symbol independently, so a single index range can't align both sides. The narrow consequence: a drifted entry with shifted lines could yield a misleading cosmetic_hint=true on bytes that aren't the bound region. Fix: re-resolve the symbol against each ref via resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD") and ref="working_tree" separately, slice each ref using its own resolved range. Resolution failure on either side → safe default of cosmetic_hint=False (matches the V1 contract: "False is cheap, True must be earned"). Empty symbol → skip (new fail-safe path). Test refactor: test_invalid_lines_skipped renamed to test_unresolvable_symbol_skipped — the old test asserted that lines=(0,0) was the failsafe trigger, but entry.lines is no longer the alignment input. New test exercises the resolve_symbol_lines-returns-None path via a nonexistent symbol name, which is the real fail-safe gate now. #3 — V2 guide TOC anchor for §9 GitHub auto-generates fragment IDs from heading text by lowercasing, replacing spaces with hyphens, and dropping punctuation. "## 9. Acceptance criteria for V2" maps to #9-acceptance-criteria-for-v2, but the TOC pointed at #9-acceptance-criteria (truncated). Link broken. Updated to the correct fragment. #4 — V2 guide unlabeled fenced code blocks (markdownlint MD040) Six fenced opens used bare ``` instead of a labeled fence. Tagged each with ```text — the contents are commit listings, ASCII DAG diagrams, pseudocode protocols, and tuple notation, none of which fit a real language tag. The other fenced blocks in the guide (already tagged ```sql / ```python) are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All four findings verified against current code; only the actionable ones applied. 81 passed + 1 xfailed in 9.02s. #1 — skills/bicameral-preflight/SKILL.md sync_metrics note The .claude/skills copy got the sync_metrics observability note back when V1 A3 shipped, but the canonical skills/ copy never did. Mirror the wording verbatim near step 2 so the rendering guidance and response-field documentation stay in sync. #2 — handlers/detect_drift.py per-entry alignment The cosmetic-hint enrichment was slicing both head_full and wt_full using entry.lines (the baseline anchor). HEAD and the working tree can shift the symbol independently, so a single index range can't align both sides. The narrow consequence: a drifted entry with shifted lines could yield a misleading cosmetic_hint=true on bytes that aren't the bound region. Fix: re-resolve the symbol against each ref via resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD") and ref="working_tree" separately, slice each ref using its own resolved range. Resolution failure on either side → safe default of cosmetic_hint=False (matches the V1 contract: "False is cheap, True must be earned"). Empty symbol → skip (new fail-safe path). Test refactor: test_invalid_lines_skipped renamed to test_unresolvable_symbol_skipped — the old test asserted that lines=(0,0) was the failsafe trigger, but entry.lines is no longer the alignment input. New test exercises the resolve_symbol_lines-returns-None path via a nonexistent symbol name, which is the real fail-safe gate now. #3 — V2 guide TOC anchor for §9 GitHub auto-generates fragment IDs from heading text by lowercasing, replacing spaces with hyphens, and dropping punctuation. "## 9. Acceptance criteria for V2" maps to #9-acceptance-criteria-for-v2, but the TOC pointed at #9-acceptance-criteria (truncated). Link broken. Updated to the correct fragment. #4 — V2 guide unlabeled fenced code blocks (markdownlint MD040) Six fenced opens used bare ``` instead of a labeled fence. Tagged each with ```text — the contents are commit listings, ASCII DAG diagrams, pseudocode protocols, and tuple notation, none of which fit a real language tag. The other fenced blocks in the guide (already tagged ```sql / ```python) are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iation seal Reality matches Promise. Three changes (2 repo files + 2 deferred external gh actions) land per Entry #24 audit blueprint 1:1; 0 new tests (acknowledged advisory — manual verification mitigates); Section 4 razor clean. Audit verdict: PASS, L1 (Entry #24 chain hash 1de1fac7). Implementation: Entry #25 chain hash 51c8a45c. Merkle seal: efd0304b2f0e0b3ca28aa4620c2b8ea2eda5ab9e2828ca852ab9f3c5adda6eb5 Architectural decision recorded: bicameral-mcp#135's auto-resolve direction abandoned (no caller LLM in hook context, MCP sampling not viable in Claude Code's main chat). Resolution path = dashboard tooltip → /bicameral-sync. The tooltip surfaces the pending state; the human in their session is the qualified judge. Plan addition tracking (Entry #24 preconditions, final state): ✅ #2 — SKILL.md tooltip note (delivered in IMPL, sealed here) 🟡 #1 — PR description manual verification step (composed in /qor-document) 🟡 #3 — #135 close comment README/docs deferral (composed in /qor-document) Surfaced for follow-up (not blocking): bicameral-mcp#125 scope should be widened — 7 skills under pilot/mcp/.claude/skills/ are absent from the canonical pilot/mcp/skills/ location claimed by pilot/mcp/CLAUDE.md. Spec correction queued (post-merge gh action): bicameral#108 Flow 1 step 3 claims IngestResponse.supersession_candidates exists when it does not; collision detection lives caller-side via bicameral-context-sentry skill, surfaces via bicameral.preflight.unresolved_collisions. Capability shortfalls (carried, no regression vs Entry #23): qor/scripts/ runtime helpers absent (gate artifacts not written), tools/reliability/ validators absent (Steps 4.6–4.8 skipped), agent-teams not declared, codex-plugin not declared (solo audit/seal), intent_lock capture skipped. Refs #135. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves four conflicts: - .gitignore: keeps both qor:seed block and .claude/worktrees ignore. - docs/META_LEDGER.md: takes dev's chain (#1-#26) as canonical. This branch's parallel #7-#14 entries (sealed against the obsolete Entry #6 base before dev added #7-#26) are not folded in here — they need to be re-authored against dev's #26 seal via /qor-meta-log-decision in a follow-up. Their session content is preserved in commit messages (f4de501, 3f856af) and in docs/SYSTEM_STATE.md. - docs/SYSTEM_STATE.md: keeps both the v0 process cleanup / preflight hook session blocks from this branch and dev's #124 implementation block. - skills/bicameral-preflight/SKILL.md: keeps both the new "Hook reinforcement" subsection (this branch) and the Telemetry section (dev). The earlier removal of Telemetry in this branch's f4de501 was authored against a stale base; restoring it here.
…rompts (#146) Closes #146 — Flow 2 in tests/e2e/run_e2e_flows.py fails because bicameral.preflight does not auto-fire in headless `claude -p` even when the user prompt explicitly contradicts a prior decision. The existing SKILL.md auto-fire description has plateaued; the agent's default tool-selection priority puts Bash/Glob ahead of preflight. Solution: deterministic UserPromptSubmit hook that detects code-implementation intent via shared verb list and injects an authoritative <system-reminder> elevating preflight above file-inspection tools. Architecture (Hickey razor): - Verb list lives once in scripts/hooks/preflight_intent.py as data (frozenset). Future UI configurability is a one-edit change. - should_fire_preflight(): pure function, 11 lines, depth 2, no network, no LLM, sub-millisecond regex scan. - preflight_reminder.py: 9-line UserPromptSubmit hook entry point; fail-permissive (exit 0 + empty response on errors); never blocks the user. - v0 verb-list duplication between SKILL.md description (frontmatter) and the Python module is documented honestly in the SKILL.md addendum per audit Advisory #1, not papered over with a false SSOT claim. Tests: 11 functionality tests (TDD-light invariant — every test invokes the unit and asserts on output, no presence-only patterns): - 6 classifier tests covering all 30 verbs, 3 skip patterns, indirect intent, data shape, the literal Flow 2 contradiction prompt - 5 hook subprocess tests covering match/no-match/malformed-stdin/ idempotent invocations + Flow 2 fixture Authoritative integration test: tests/e2e/run_e2e_flows.py::test_flow_2 on dev branch (preflight tool_use.id must precede first non-bicameral discovery tool in the stream-json transcript). QorLogic SDLC artifacts: plan-preflight-autofire-hook.md, META_LEDGER Entries #11-#14 (PLAN, GATE PASS, IMPLEMENT, SUBSTANTIATE seal). Merkle seal: 33007d2a72fe3db237935216e063327750896d595faa15001757761e43a8e83c Risk grade: L2 (blast radius: every user prompt; individual-action risk: small + bounded + reversible) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of plan-priority-c-team-server-slack-v0.md — adds the polling worker, the canonical-extraction cache that closes the multi-dev extraction-divergence gap, and the peer-author event writer. Multi-dev convergence mechanism: any dev's session that touches a Slack message produces the SAME canonical extraction across the team because get_or_compute is keyed on (source_type, source_ref, content_hash) and the cache row is shared via the team-server's append-only event log. This is what closes Playbook Pillar #1 (Decision Continuity) at multi-dev/multi-agent scale that the v1 brief incorrectly framed as "build curation only" — see SHADOW_GENOME Failure Entry #6. The interim canonical extraction uses Anthropic Claude with the model_version='interim-claude-v1' tombstone so a future Phase 5 (CocoIndex #136 integration) can identify and rebuild interim entries when memoized deterministic transforms become available. Files added: - team_server/extraction/canonical_cache.py (45 LOC): get_or_compute() returns cached extraction OR invokes compute_fn + persists. - team_server/extraction/llm_extractor.py: interim Anthropic-backed extractor; production-default until CocoIndex lands. - team_server/sync/peer_writer.py (42 LOC): write_team_event() — appends to team_event with author_email='team-server@<team_id>.bicameral' identity (single-bot per workspace; multi-instance HA is v1). - team_server/workers/slack_worker.py (100 LOC): poll_once() — conversations_history per allowlisted channel, content_hash dedup, cache-keyed extraction, peer event write per new message. Tests (6 functionality tests, all green): - tests/test_team_server_canonical_cache.py: cache hit returns existing without compute_fn, cache miss persists + subsequent call returns cached, content_hash variation produces new rows. - tests/test_team_server_slack_worker.py: worker polls only allowlisted channels, writes one team_event per message with peer-author identity, dedups via message ts (idempotent re-poll). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rompts (#146) Closes #146 — Flow 2 in tests/e2e/run_e2e_flows.py fails because bicameral.preflight does not auto-fire in headless `claude -p` even when the user prompt explicitly contradicts a prior decision. The existing SKILL.md auto-fire description has plateaued; the agent's default tool-selection priority puts Bash/Glob ahead of preflight. Solution: deterministic UserPromptSubmit hook that detects code-implementation intent via shared verb list and injects an authoritative <system-reminder> elevating preflight above file-inspection tools. Architecture (Hickey razor): - Verb list lives once in scripts/hooks/preflight_intent.py as data (frozenset). Future UI configurability is a one-edit change. - should_fire_preflight(): pure function, 11 lines, depth 2, no network, no LLM, sub-millisecond regex scan. - preflight_reminder.py: 9-line UserPromptSubmit hook entry point; fail-permissive (exit 0 + empty response on errors); never blocks the user. - v0 verb-list duplication between SKILL.md description (frontmatter) and the Python module is documented honestly in the SKILL.md addendum per audit Advisory #1, not papered over with a false SSOT claim. Tests: 11 functionality tests (TDD-light invariant — every test invokes the unit and asserts on output, no presence-only patterns): - 6 classifier tests covering all 30 verbs, 3 skip patterns, indirect intent, data shape, the literal Flow 2 contradiction prompt - 5 hook subprocess tests covering match/no-match/malformed-stdin/ idempotent invocations + Flow 2 fixture Authoritative integration test: tests/e2e/run_e2e_flows.py::test_flow_2 on dev branch (preflight tool_use.id must precede first non-bicameral discovery tool in the stream-json transcript). QorLogic SDLC artifacts: plan-preflight-autofire-hook.md, META_LEDGER Entries #11-#14 (PLAN, GATE PASS, IMPLEMENT, SUBSTANTIATE seal). Merkle seal: 33007d2a72fe3db237935216e063327750896d595faa15001757761e43a8e83c Risk grade: L2 (blast radius: every user prompt; individual-action risk: small + bounded + reversible) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit ca02b68)
… to research brief (#205) Addresses Codex first-pass review notes #1, #2, #3, #7, #8, #9 from the brief's review block. Tier C items + the subsequent Kilo / Gemini / Codex-2nd-pass review layers are tracked as follow-ups (will be surfaced in the PR thread for direction). Changes: - § 1.4 ingest pipeline: adds explicit "Risk amplification (durable-feedback-loop)" paragraph framing ingest as the durable write-surface that propagates poisoned content through preflight back into the agent's reasoning context. Strengthens LLM-01 + LLM-04 P0 defensibility (Codex #2). - § 1.8 skills surface: adds worked before/after example contrasting instruction-only `bicameral-report-bug` keys-only commitment vs the deterministic `_resolve_signer_email` gate that replaced it in #204. Makes the doctrine concrete for non-agent-systems readers (Codex #3). - § 1.9 team-server: rewrites the dangling "TEAM-NN gaps in § 4" promise to "intentionally not enumerated; activation PR authors TEAM-NN IDs against actual activated topology" (Codex #8). - § 2.6 EU AI Act: removes unilateral "limited risk" claim. Now describes bicameral-mcp as an AI-adjacent developer-tool component whose risk-tier classification properly attaches to the integrated system + deployment context, requiring counsel review for any specific tier claim (Codex #7). - § 5 gap synthesis: adds Deployment trigger column (`all` / `local-OK` / `team/hosted` / `pre-team` / `hosted`) so severity is defensible per deployment shape. SOC2-01 reclassified as pre-team/hosted P0 with local-only boundary statement; GDPR-05 reclassified as team/hosted P1 with local single-user P2; OWASP-03 reclassified as hosted P1 with local P2 (uv/pipx provides install-time lock); OWASP-02 trigger narrowed to team/hosted (Codex #1). - Appendix method notes: softens "every claim should be verifiable by re-reading the cited file at the cited line range" to acknowledge that most findings cite components rather than path:line, and defers a line-level evidence appendix as a follow-up improvement (Codex #9). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+Gemini+Codex-2 (#205) Authors a single Reviewer Disposition Pass table at the top of the brief reconciling all 32 review points across four review layers (Codex first-pass, Kilo, Gemini CLI, Codex second-pass) into one post-review consensus before downstream P1 issue-filing — per the explicit Codex-2 #1 directive. Decisions: 21 applied this commit, 6 already applied in 1d82658, 3 deferred to follow-up, 2 note-only. Net new gap IDs added per disposition: GDPR-08 (ephemeral data), GDPR-09 (consent versioning + revocation), LLM-11 (cross-tool config-file modification surface), MCP-01 (host UX as external dependency), CFG-01 (config precedence + fail-closed model). Reclassification: LLM-06 P0/M → P1/M with scope narrowed to future remote-skill-loading channel (per Kilo #2). Major content additions to the brief: - § 1.1: MCP host UX is external dependency, not security gate (new gap MCP-01) — host that auto-approves tool calls bypasses any "operator will see this" assumption. - § 1.2: SurrealDB version pinning supply-chain callout (Kilo #11). - § 1.7: cross-tool config-file modification surface (new gap LLM-11) distinct from skill-content surface — `setup_wizard` writes shell commands into `.claude/settings.json` that run host-side at hook fire. - § 1.11 (new): Configuration precedence + fail-closed model — single uniform precedence rule across all knobs (env > config.yaml > hardcoded defaults), fail-closed semantics on missing/malformed/ contradictory config (Codex-2 #5). - § 2.4 (a): LLM02 mapping note clarifying it folds into LLM-07 + OWASP-04 (Kilo #13). - § 2.4 (b): explicit `confirm=True` is agent-supplied not HITL (Kilo #3) — security context cannot rely on agent-filled params. - § 2.4 (c) LLM-01 + LLM-04: extensible classifier (Gemini #2) + guardrail-not-classifier framing (Codex-1 #6) + control-acceptance template (Codex-2 #4) — quarantine, override, test fixtures, measurement counters. - § 2.4 (c) LLM-03: timeouts as `.bicameral/config.yaml` knobs (Gemini #3). - § 2.4 (c) LLM-05 + LLM-09: out-of-band operator confirmation, not agent-supplied confirm parameters (Kilo #3). - § 2.4 (c) LLM-06: scope-narrowed to future remote-skill-loading; in current install model the wheel-trust covers it (Kilo #2). - § 2.4 (c) LLM-11 (new): cross-tool config-file gate (signed hooks-manifest.json) distinct from skill manifest. - § 2.1 (c) GDPR-01: three remediation candidates — tombstone-and- rebuild with signed manifest (Kilo #12), crypto-shredding (Gemini #1), or scope-out via PII detect-and-refuse. - § 2.1 (c) GDPR-02: data-subject-access search must cover full identifier surface (description, source_ref, topic, file paths) not just signer email (Codex-1 #5). - § 2.1 (c) GDPR-08 (new): ephemeral data surfaces (tempfiles, swap, WAL, crash dumps) (Kilo #7). - § 2.1 (c) GDPR-09 (new): consent versioning + revocation semantics (Kilo #8 + Codex-2 #3). - § 5: gap table updated with new rows + LLM-06 reclassification; gap counts post-disposition (5 P0 / 19 P1 / 16 P2 / 5 P3 = 45 total, up from 41). - § 6.1 (new): epic grouping for deferred P1 batch (Codex-1 #10) — ingest boundary guardrails, per-tool authority gradation, supply- chain signing, telemetry & consent. - § 6.2 (new): six-section control-acceptance template for every DG gap (Codex-2 #4) — positive / negative / bypass / fail-closed / telemetry / docs. Filed-issue updates: - Issue #214 (LLM-06): relabeled P0 → P1, retitled to reflect scope narrowing, full disposition comment added. - Issue #212 (LLM-01) + #213 (LLM-04): disposition comments added capturing the guardrail framing, classifier extensibility, and control-acceptance template applicable to both. Deferred for follow-up: Codex-1 #4 (controller/processor restructure of standards table), Codex-1 #9 (full evidence appendix beyond the methodology softening), Codex-2 #2 (full 3-column deployment-profile matrix beyond the single-column trigger). Brief now 706 lines (up from 606); +124 line diff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-intent slash-commands Revives the proven fix from PR #524 (closed only because the v0 e2e CI failed on stale-auth substrate — not the diff; that e2e workflow is now shelved to dispatch-only via #556). Rebased onto current main (all 6 files applied cleanly — nobody touched these since #524 diverged) and the bug re-reproduced on main before fixing: `should_fire_preflight("/qor-plan https://…/issues/1")` returned False. Root cause (Hypothesis 1, confirmed): the UserPromptSubmit classifier in scripts/hooks/preflight_intent.py matched only free-text IMPLEMENTATION_VERBS and had no slash-command awareness. `plan` is not a verb, so `/qor-plan <url>` never fired preflight — planning ran blind to the ledger. Sibling commands worked by coincidence (`/qor-implement`→implement, `/qor-refactor`→refactor). Hypothesis 3 (skill-chain ordering) refuted: UserPromptSubmit fires on raw text before slash-command resolution; the hook ran — its regex was the gate that returned False. Fix: - New IMPL_INTENT_SLASH_COMMANDS frozenset (qor-plan, qor-implement, qor-refactor, qor-debug, qor-remediate, qor-organize, qor-auto-dev-1, qor-auto-dev) — these short-circuit to fire regardless of argument (URL / text / empty). - New layered classify_prompt() → ClassifyResult(fire, prompt_surface_form, slash_command); preflight_reminder.py records prompt_surface_form to telemetry so a future trigger-surface regression is observable. should_fire_preflight() preserved (delegates) for backward compat. - SKILL.md description (Tier-2 caller-LLM gate) updated in lockstep with the Tier-1 hook set, and correctly keeps read-only commands (/qor-status, /qor-help, /qor-audit, /qor-validate) on the SKIP list. Regression guard: test_preflight_intent.py + test_preflight_hook.py now run in test-mcp-regression.yml (they were dormant — in no CI workflow). flow-6 e2e prompt added for reference (the e2e harness is dispatch-only per #556; the unit tests are the durable gate). Closes #402. Supersedes the closed #524. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
ports.py):DriftAnalyzerPortandCodeIntelligencePortProtocol types decouple handlers from concrete backends. Enables independent development of Phase 1 (collaboration), drift intelligence (AST + LLM), and CodeGenome integration.source_spantable: Separates raw source text (meeting excerpt, PRD paragraph) from extracted decisions (intent). Theyieldsedge linkssource_span → intentfor extraction provenance. This is the input for L3 semantic compliance checking.HashDriftAnalyzer: L1 hash-only implementation ofDriftAnalyzerPort. Wraps existingstatus.pyfunctions — zero behavior change.get_drift_analyzer()andget_code_intelligence()inadapters/— swap backend by changing one factory return.Drift Layer Status
SHA-256(old) == SHA-256(new)?HashDriftAnalyzer)source_span.text+ code → compliant?Files changed
ports.py— Protocol types +DriftResultdataclassledger/drift.py—HashDriftAnalyzer(L1)ledger/schema.py—source_spantable +yieldsedgeledger/queries.py—upsert_source_span,relate_yieldsledger/adapter.py—ingest_commit()delegates toDriftAnalyzerPort,ingest_payload()createssource_spanrecordsadapters/ledger.py—get_drift_analyzer()factoryadapters/code_locator.py—get_code_intelligencealiashandlers/link_commit.py— wiresdrift_analyzerthrough factoryTest plan
test_phase2_vc.pyfixture errors confirmed on main (not a regression)🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Infrastructure
Chore