Skip to content

refactor: port interfaces + source_span for drift pipeline#1

Merged
jinhongkuan merged 3 commits into
mainfrom
port-interfaces-source-span
Apr 10, 2026
Merged

refactor: port interfaces + source_span for drift pipeline#1
jinhongkuan merged 3 commits into
mainfrom
port-interfaces-source-span

Conversation

@jinhongkuan

@jinhongkuan jinhongkuan commented Apr 10, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Port interfaces (ports.py): DriftAnalyzerPort and CodeIntelligencePort Protocol types decouple handlers from concrete backends. Enables independent development of Phase 1 (collaboration), drift intelligence (AST + LLM), and CodeGenome integration.
  • source_span table: Separates raw source text (meeting excerpt, PRD paragraph) from extracted decisions (intent). The yields edge links source_span → intent for extraction provenance. This is the input for L3 semantic compliance checking.
  • HashDriftAnalyzer: L1 hash-only implementation of DriftAnalyzerPort. Wraps existing status.py functions — zero behavior change.
  • Factory wiring: get_drift_analyzer() and get_code_intelligence() in adapters/ — swap backend by changing one factory return.

Drift Layer Status

Layer What Status
L1: Hash SHA-256(old) == SHA-256(new)? ✅ Implemented (HashDriftAnalyzer)
L2: AST tree-sitter structural diff — cosmetic or real? ⬜ Pending (collaborator)
L3: Semantic LLM reads source_span.text + code → compliant? ⬜ Pending (needs L2 + source_span traversal)

Files changed

  • ports.py — Protocol types + DriftResult dataclass
  • ledger/drift.pyHashDriftAnalyzer (L1)
  • ledger/schema.pysource_span table + yields edge
  • ledger/queries.pyupsert_source_span, relate_yields
  • ledger/adapter.pyingest_commit() delegates to DriftAnalyzerPort, ingest_payload() creates source_span records
  • adapters/ledger.pyget_drift_analyzer() factory
  • adapters/code_locator.pyget_code_intelligence alias
  • handlers/link_commit.py — wires drift_analyzer through factory

Test plan

  • Phase 1 tests: 6/6 passed
  • Phase 2 ledger tests: 13/13 passed
  • Zero behavior change — same hash-only drift runs, just through protocol interface
  • Pre-existing test_phase2_vc.py fixture errors confirmed on main (not a regression)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added drift analysis to detect and record content changes and update region status.
    • Linked source excerpts to intents to preserve provenance for analysis results.
  • Infrastructure

    • Introduced formal interfaces for drift and code-intelligence integrations.
    • Database schema versioning and migration support to safely apply schema updates.
  • Chore

    • Package version bumped to 0.3.0.

@coderabbitai

coderabbitai Bot commented Apr 10, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@jinhongkuan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 12 minutes and 4 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 12 minutes and 4 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1d56fd26-6914-4b14-96cf-0dcdc64ce8e5

📥 Commits

Reviewing files that changed from the base of the PR and between 6a55c04 and 127be32.

📒 Files selected for processing (3)
  • ledger/adapter.py
  • ledger/queries.py
  • ledger/schema.py
📝 Walkthrough

Walkthrough

Adds drift-analysis support: new ports and DriftResult, a HashDriftAnalyzer implementation, ledger schema/queries for source spans, ledger adapter integration to accept a drift analyzer, and factory exports; handlers now pass a drift analyzer into ledger ingestion. (48 words)

Changes

Cohort / File(s) Summary
Public Ports & Models
ports.py
Added DriftResult dataclass and runtime-checkable DriftAnalyzerPort and CodeIntelligencePort protocols.
Drift Analyzer Implementation
ledger/drift.py
Added HashDriftAnalyzer with async analyze_region(...) that resolves symbol lines, computes content hash, and derives drift status.
Ledger Adapter & Persistence
ledger/adapter.py, ledger/queries.py, ledger/schema.py
Extended SurrealDBLedgerAdapter.ingest_commit(..., drift_analyzer=None) to delegate drift checks, added upsert_source_span and relate_yields queries, introduced source_span table, yields edge, schema versioning, and migration entrypoint.
Factory Functions / Adapters
adapters/code_locator.py, adapters/ledger.py
Added get_code_intelligence alias to get_code_locator; added get_drift_analyzer() factory returning HashDriftAnalyzer().
Handler Integration
handlers/link_commit.py
handle_link_commit() now creates/uses drift_analyzer and passes it into ledger.ingest_commit(...).
Version Bump
pyproject.toml, RECOMMENDED_VERSION
Package/project version updated from 0.2.160.3.0.

Sequence Diagram

sequenceDiagram
    participant Handler as Handler (link_commit)
    participant Adapter as SurrealDBLedgerAdapter
    participant Analyzer as HashDriftAnalyzer
    participant DB as SurrealDB

    Handler->>Adapter: ingest_commit(commit_hash, repo_path, drift_analyzer)
    Adapter->>Analyzer: analyze_region(file_path, symbol_name, start_line, end_line, stored_hash, repo_path, ref)
    Analyzer->>Analyzer: resolve_symbol_lines()/compute_content_hash()/derive_status()
    Analyzer-->>Adapter: DriftResult(status, content_hash, confidence)
    Adapter->>DB: upsert_source_span(text, source_type, source_ref)
    DB-->>Adapter: span_id
    Adapter->>DB: relate_yields(span_id, intent_id)
    Adapter->>DB: update region (content_hash, start_line, end_line)
    Adapter-->>Handler: ingest result
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I dug a tunnel through the code tonight,
Sniffed hashes, found where functions shifted slight,
I stitched source spans where meanings play,
Ledger kept trace of every hop and stray,
Hooray — the drift is caught, the map feels right!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'refactor: port interfaces + source_span for drift pipeline' accurately and specifically summarizes the two main changes: introducing port interfaces and adding source_span support for the drift pipeline.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch port-interfaces-source-span

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ledger/adapter.py`:
- Around line 178-187: ingest_commit is calling drift_analyzer.analyze_region
without passing source_context/source_span, so any SemanticDriftAnalyzer will
receive empty text; update the call site in ingest_commit to extract the
region's source_span.text (or the region.intent/description if text missing)
from the output of get_regions_for_files (the same structure that stores
source_span) and pass it as the new source_context/source_span (use the
parameter name expected by analyze_region) when invoking
drift_analyzer.analyze_region; ensure you reference the region object returned
by get_regions_for_files, read region.source_span.text (falling back to
intent/description), and include that value in the analyze_region call so
semantic comparison can run.

In `@ledger/queries.py`:
- Around line 470-473: The fallback CREATE for source_span in the code path
after the UPSERT (where rows is empty) omits provenance fields and loses
metadata; update the fallback query in ledger/queries.py (the CREATE source_span
statement executed when rows == []) to include the speakers and meeting_date
fields and pass their values in the params dict (same keys used in the UPSERT
branch, e.g., "speakers" and "meeting_date" or whatever variable names are
declared in the surrounding function) so both branches preserve the same
metadata.

In `@ledger/schema.py`:
- Around line 64-71: The schema lacks a unique index for the deduplication key
used by upsert_source_span(), causing possible duplicates; add a UNIQUE index on
source_span that covers source_type, source_ref and text (in addition to or
replacing idx_span_ref) so the DB enforces uniqueness across those three fields
and aligns with the UPSERT WHERE clause in upsert_source_span().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 56757df4-7f65-496d-936e-8d931ddc6c8f

📥 Commits

Reviewing files that changed from the base of the PR and between 904f3f0 and f8a6251.

📒 Files selected for processing (8)
  • adapters/code_locator.py
  • adapters/ledger.py
  • handlers/link_commit.py
  • ledger/adapter.py
  • ledger/drift.py
  • ledger/queries.py
  • ledger/schema.py
  • ports.py

Comment thread ledger/adapter.py
Comment thread ledger/queries.py
Comment thread ledger/schema.py Outdated
jinhongkuan and others added 2 commits April 10, 2026 13:09
Decouple handlers from concrete code analysis and drift detection
backends via Protocol types, enabling independent development of
Phase 1 (collaboration), drift intelligence (L2 AST + L3 LLM),
and future CodeGenome integration.

Changes:
- ports.py: DriftAnalyzerPort, CodeIntelligencePort, DriftResult
- ledger/drift.py: HashDriftAnalyzer (L1 hash-only implementation)
- ledger/schema.py: source_span table + yields edge (provenance)
- ledger/queries.py: upsert_source_span, relate_yields
- ledger/adapter.py: ingest_commit() delegates to DriftAnalyzerPort,
  ingest_payload() creates source_span records
- adapters/: get_drift_analyzer() + get_code_intelligence() factories
- handlers/link_commit.py: wires drift_analyzer through factory

Drift Layer Status:
  L1 (hash):     ✅ Implemented (HashDriftAnalyzer)
  L2 (AST):      ⬜ Pending — collaborator WIP
  L3 (semantic): ⬜ Pending — needs L2 + source_span traversal

Zero behavior change. All CI tests pass (phase1: 6/6, phase2: 13/13).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lightweight version-gated migration for the SurrealDB ledger:
- schema_meta table tracks current schema version
- migrate() runs on connect, after init_schema()
- v0→v1: stamps version for source_span + yields addition
- Future breaking migrations clear affected tables + log warning

Version 0.3.0 reflects: port interfaces, source_span entity,
schema migration system. SCHEMA_VERSION is separate from package
version (incremented only on breaking schema changes).

Once Phase 1 (event-sourced spec log) ships, breaking migrations
become lossless: drop local DB → re-materialize from git events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jinhongkuan jinhongkuan force-pushed the port-interfaces-source-span branch from f8a6251 to 6a55c04 Compare April 10, 2026 17:10

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
ledger/adapter.py (1)

179-188: ⚠️ Potential issue | 🟠 Major

Plumb source_context into the analyzer call.

The new port was expanded specifically for L3 semantic checks, but this call still relies on the default "". A future semantic analyzer swapped in via get_drift_analyzer() will never see the source_span.text this PR stores. Please extend get_regions_for_files() to return the related source text (or at least the intent description as fallback) and pass it through here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ledger/adapter.py` around lines 179 - 188, The call to
drift_analyzer.analyze_region currently omits the source context so L3 semantic
analyzers won't receive source_span.text; update get_regions_for_files() (the
producer) to include the source text (or fall back to the intent description) in
each region tuple/object and then pass that value as the source_context (or
similar named param) into drift_analyzer.analyze_region in ledger.adapter (where
drift_result is created), ensuring analyze_region/source_context and any calls
via get_drift_analyzer() consistently accept and forward the source text to
downstream analyzers.
ledger/schema.py (1)

79-86: ⚠️ Potential issue | 🟠 Major

Add a DB-enforced unique key for source_span dedupe.

upsert_source_span() deduplicates on (source_type, source_ref, text), but the schema only indexes (source_type, source_ref). If two ingests race past the lookup path, identical excerpts can still be persisted twice. Please add a unique index that matches the full dedupe key.

Suggested schema change
     "DEFINE FIELD meeting_date   ON source_span TYPE string DEFAULT ''",
     "DEFINE FIELD created_at     ON source_span TYPE datetime DEFAULT time::now()",
     "DEFINE INDEX idx_span_ref   ON source_span FIELDS source_type, source_ref",
+    "DEFINE INDEX idx_span_dedupe ON source_span FIELDS source_type, source_ref, text UNIQUE",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ledger/schema.py` around lines 79 - 86, The schema currently only indexes
(source_type, source_ref) so concurrent upserts can insert duplicate excerpts;
add a DB-enforced unique index on the full dedupe key to match
upsert_source_span()’s logic by defining a unique index such as "DEFINE INDEX
idx_span_unique ON source_span FIELDS source_type, source_ref, text UNIQUE"
(leave existing idx_span_ref if desired) so identical (source_type, source_ref,
text) rows cannot be persisted concurrently.
🧹 Nitpick comments (1)
ledger/adapter.py (1)

195-203: Avoid resolving symbol lines twice per region.

HashDriftAnalyzer.analyze_region() already does resolve_symbol_lines(), then this block does the same work again just to persist the updated bounds. On larger commits this doubles the tree-sitter/git lookup cost on the hot path. Consider returning the resolved span in DriftResult, or moving all line-resolution responsibility into the adapter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ledger/adapter.py` around lines 195 - 203, The adapter is re-calling
resolve_symbol_lines() for each region even though
HashDriftAnalyzer.analyze_region() already resolves spans; change the data flow
so the analyzer returns the resolved span (e.g., add resolved_span or
resolved_start/resolved_end fields to DriftResult from
HashDriftAnalyzer.analyze_region()) and then in the adapter (where you currently
call resolve_symbol_lines() and run the UPDATE via self._client.query) use the
resolved values from the returned DriftResult instead of re-invoking
resolve_symbol_lines(); alternatively, centralize line-resolution in the adapter
by removing resolve_symbol_lines() from analyze_region and having analyze_region
return only raw drift info—update the code paths that consume DriftResult and
the DB UPDATE logic in the adapter (the region_id/start_line/end_line update
block) accordingly so no duplicate tree-sitter/git lookups occur.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ledger/schema.py`:
- Around line 120-122: Make the yields relation idempotent so replayed ingest
doesn't create duplicate source_span→intent edges: either add a uniqueness
constraint to the yields relation (unique on its endpoints) in the schema for
the yields table, or change the write path to dedupe before creating edges —
update ledger/schema.py to enforce uniqueness on yields endpoints (the yields
relation) OR modify ledger/queries.py where RELATE is used (and the
relate_yields() call in ledger/adapter.py) to perform a conditional/merge-style
create (check existence first or use a RELATE/MERGE-unique variant) so duplicate
edges are not inserted.

---

Duplicate comments:
In `@ledger/adapter.py`:
- Around line 179-188: The call to drift_analyzer.analyze_region currently omits
the source context so L3 semantic analyzers won't receive source_span.text;
update get_regions_for_files() (the producer) to include the source text (or
fall back to the intent description) in each region tuple/object and then pass
that value as the source_context (or similar named param) into
drift_analyzer.analyze_region in ledger.adapter (where drift_result is created),
ensuring analyze_region/source_context and any calls via get_drift_analyzer()
consistently accept and forward the source text to downstream analyzers.

In `@ledger/schema.py`:
- Around line 79-86: The schema currently only indexes (source_type, source_ref)
so concurrent upserts can insert duplicate excerpts; add a DB-enforced unique
index on the full dedupe key to match upsert_source_span()’s logic by defining a
unique index such as "DEFINE INDEX idx_span_unique ON source_span FIELDS
source_type, source_ref, text UNIQUE" (leave existing idx_span_ref if desired)
so identical (source_type, source_ref, text) rows cannot be persisted
concurrently.

---

Nitpick comments:
In `@ledger/adapter.py`:
- Around line 195-203: The adapter is re-calling resolve_symbol_lines() for each
region even though HashDriftAnalyzer.analyze_region() already resolves spans;
change the data flow so the analyzer returns the resolved span (e.g., add
resolved_span or resolved_start/resolved_end fields to DriftResult from
HashDriftAnalyzer.analyze_region()) and then in the adapter (where you currently
call resolve_symbol_lines() and run the UPDATE via self._client.query) use the
resolved values from the returned DriftResult instead of re-invoking
resolve_symbol_lines(); alternatively, centralize line-resolution in the adapter
by removing resolve_symbol_lines() from analyze_region and having analyze_region
return only raw drift info—update the code paths that consume DriftResult and
the DB UPDATE logic in the adapter (the region_id/start_line/end_line update
block) accordingly so no duplicate tree-sitter/git lookups occur.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 12affbaa-f92c-4307-8724-4f7da291b13c

📥 Commits

Reviewing files that changed from the base of the PR and between f8a6251 and 6a55c04.

📒 Files selected for processing (10)
  • RECOMMENDED_VERSION
  • adapters/code_locator.py
  • adapters/ledger.py
  • handlers/link_commit.py
  • ledger/adapter.py
  • ledger/drift.py
  • ledger/queries.py
  • ledger/schema.py
  • ports.py
  • pyproject.toml
✅ Files skipped from review due to trivial changes (5)
  • RECOMMENDED_VERSION
  • pyproject.toml
  • adapters/ledger.py
  • ledger/queries.py
  • ports.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • adapters/code_locator.py
  • handlers/link_commit.py

Comment thread ledger/schema.py
…, fallback metadata

1. Plumb source_context into analyze_region() — collect intent descriptions
   from linked intents and pass through to drift analyzer. L1 ignores it;
   L3 (semantic) will use it for compliance checking.
2. Add UNIQUE index on source_span (source_type, source_ref, text) to
   enforce dedup at the DB level, matching the UPSERT WHERE clause.
3. Include speakers + meeting_date in fallback CREATE path of
   upsert_source_span() so metadata isn't silently lost.
4. Enrich get_regions_for_files() to return intent.description alongside
   id and status.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jinhongkuan jinhongkuan merged commit 32f9097 into main Apr 10, 2026
1 check passed
Knapp-Kevin referenced this pull request in Knapp-Kevin/bicameral-mcp Apr 11, 2026
…r pipeline

Scenario #1: route ingest through handle_ingest() (which runs
_auto_ground_via_search) instead of calling ledger.ingest_payload()
directly. Removes the xfail.

Scenario BicameralAI#6: after committing new code, call handle_link_commit()
which triggers _reground_ungrounded to re-ground previously ungrounded
intents. Removes the xfail.

Sprint scorecard: 77% → 92% (12/13 pass). Only BicameralAI#9 (intent
supersession) remains — carried to next sprint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Knapp-Kevin referenced this pull request in Knapp-Kevin/bicameral-mcp Apr 11, 2026
Changes since v0.2.15:
- Close XFAIL #1/BicameralAI#6: auto-grounding and re-grounding now wired
  through handler pipeline (92% desync score)
- eval_code_locator.py: --multi-repo, --min-mrr, --max-repo-variance
  flags with proper exit codes (regression gate)
- Gibberish test descriptions to resist auto-grounding in CI
- README upscaled to professional OSS standard (credit: MythologIQ)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 18, 2026
…uzzy

Two fixes from Codex review feedback:

1. P1 #1: Filter weak BM25 scores (< 0.1) when no fuzzy matches exist.
   Prevents false groundings for under-specified queries that pass
   the FC-1 guard but have only noise-level BM25 hits.

2. P1 #2: Stage 2 now runs as enrichment after fuzzy direct lookup,
   not just as fallback. When fuzzy finds symbols from a subset of
   files, file-level retrieval fills remaining budget with additional
   files — preserving multi-file feature discovery.

49/49 tests pass. Eval metrics stable: MRR@5=0.605, Recall=20.0%.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 25, 2026
All four findings verified against current code; only the actionable
ones applied. 81 passed + 1 xfailed in 9.02s.

#1 — skills/bicameral-preflight/SKILL.md sync_metrics note
  The .claude/skills copy got the sync_metrics observability note
  back when V1 A3 shipped, but the canonical skills/ copy never
  did. Mirror the wording verbatim near step 2 so the rendering
  guidance and response-field documentation stay in sync.

#2 — handlers/detect_drift.py per-entry alignment
  The cosmetic-hint enrichment was slicing both head_full and
  wt_full using entry.lines (the baseline anchor). HEAD and the
  working tree can shift the symbol independently, so a single
  index range can't align both sides. The narrow consequence: a
  drifted entry with shifted lines could yield a misleading
  cosmetic_hint=true on bytes that aren't the bound region.

  Fix: re-resolve the symbol against each ref via
  resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD")
  and ref="working_tree" separately, slice each ref using its own
  resolved range. Resolution failure on either side → safe default
  of cosmetic_hint=False (matches the V1 contract: "False is cheap,
  True must be earned"). Empty symbol → skip (new fail-safe path).

  Test refactor: test_invalid_lines_skipped renamed to
  test_unresolvable_symbol_skipped — the old test asserted that
  lines=(0,0) was the failsafe trigger, but entry.lines is no
  longer the alignment input. New test exercises the
  resolve_symbol_lines-returns-None path via a nonexistent symbol
  name, which is the real fail-safe gate now.

#3 — V2 guide TOC anchor for §9
  GitHub auto-generates fragment IDs from heading text by
  lowercasing, replacing spaces with hyphens, and dropping
  punctuation. "## 9. Acceptance criteria for V2" maps to
  #9-acceptance-criteria-for-v2, but the TOC pointed at
  #9-acceptance-criteria (truncated). Link broken.
  Updated to the correct fragment.

#4 — V2 guide unlabeled fenced code blocks (markdownlint MD040)
  Six fenced opens used bare ``` instead of a labeled fence.
  Tagged each with ```text — the contents are commit listings,
  ASCII DAG diagrams, pseudocode protocols, and tuple notation,
  none of which fit a real language tag. The other fenced blocks
  in the guide (already tagged ```sql / ```python) are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
All four findings verified against current code; only the actionable
ones applied. 81 passed + 1 xfailed in 9.02s.

#1 — skills/bicameral-preflight/SKILL.md sync_metrics note
  The .claude/skills copy got the sync_metrics observability note
  back when V1 A3 shipped, but the canonical skills/ copy never
  did. Mirror the wording verbatim near step 2 so the rendering
  guidance and response-field documentation stay in sync.

#2 — handlers/detect_drift.py per-entry alignment
  The cosmetic-hint enrichment was slicing both head_full and
  wt_full using entry.lines (the baseline anchor). HEAD and the
  working tree can shift the symbol independently, so a single
  index range can't align both sides. The narrow consequence: a
  drifted entry with shifted lines could yield a misleading
  cosmetic_hint=true on bytes that aren't the bound region.

  Fix: re-resolve the symbol against each ref via
  resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD")
  and ref="working_tree" separately, slice each ref using its own
  resolved range. Resolution failure on either side → safe default
  of cosmetic_hint=False (matches the V1 contract: "False is cheap,
  True must be earned"). Empty symbol → skip (new fail-safe path).

  Test refactor: test_invalid_lines_skipped renamed to
  test_unresolvable_symbol_skipped — the old test asserted that
  lines=(0,0) was the failsafe trigger, but entry.lines is no
  longer the alignment input. New test exercises the
  resolve_symbol_lines-returns-None path via a nonexistent symbol
  name, which is the real fail-safe gate now.

#3 — V2 guide TOC anchor for §9
  GitHub auto-generates fragment IDs from heading text by
  lowercasing, replacing spaces with hyphens, and dropping
  punctuation. "## 9. Acceptance criteria for V2" maps to
  #9-acceptance-criteria-for-v2, but the TOC pointed at
  #9-acceptance-criteria (truncated). Link broken.
  Updated to the correct fragment.

#4 — V2 guide unlabeled fenced code blocks (markdownlint MD040)
  Six fenced opens used bare ``` instead of a labeled fence.
  Tagged each with ```text — the contents are commit listings,
  ASCII DAG diagrams, pseudocode protocols, and tuple notation,
  none of which fit a real language tag. The other fenced blocks
  in the guide (already tagged ```sql / ```python) are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
silongtan added a commit that referenced this pull request Apr 26, 2026
All four findings verified against current code; only the actionable
ones applied. 81 passed + 1 xfailed in 9.02s.

#1 — skills/bicameral-preflight/SKILL.md sync_metrics note
  The .claude/skills copy got the sync_metrics observability note
  back when V1 A3 shipped, but the canonical skills/ copy never
  did. Mirror the wording verbatim near step 2 so the rendering
  guidance and response-field documentation stay in sync.

#2 — handlers/detect_drift.py per-entry alignment
  The cosmetic-hint enrichment was slicing both head_full and
  wt_full using entry.lines (the baseline anchor). HEAD and the
  working tree can shift the symbol independently, so a single
  index range can't align both sides. The narrow consequence: a
  drifted entry with shifted lines could yield a misleading
  cosmetic_hint=true on bytes that aren't the bound region.

  Fix: re-resolve the symbol against each ref via
  resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD")
  and ref="working_tree" separately, slice each ref using its own
  resolved range. Resolution failure on either side → safe default
  of cosmetic_hint=False (matches the V1 contract: "False is cheap,
  True must be earned"). Empty symbol → skip (new fail-safe path).

  Test refactor: test_invalid_lines_skipped renamed to
  test_unresolvable_symbol_skipped — the old test asserted that
  lines=(0,0) was the failsafe trigger, but entry.lines is no
  longer the alignment input. New test exercises the
  resolve_symbol_lines-returns-None path via a nonexistent symbol
  name, which is the real fail-safe gate now.

#3 — V2 guide TOC anchor for §9
  GitHub auto-generates fragment IDs from heading text by
  lowercasing, replacing spaces with hyphens, and dropping
  punctuation. "## 9. Acceptance criteria for V2" maps to
  #9-acceptance-criteria-for-v2, but the TOC pointed at
  #9-acceptance-criteria (truncated). Link broken.
  Updated to the correct fragment.

#4 — V2 guide unlabeled fenced code blocks (markdownlint MD040)
  Six fenced opens used bare ``` instead of a labeled fence.
  Tagged each with ```text — the contents are commit listings,
  ASCII DAG diagrams, pseudocode protocols, and tuple notation,
  none of which fit a real language tag. The other fenced blocks
  in the guide (already tagged ```sql / ```python) are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan added a commit that referenced this pull request Apr 30, 2026
…iation seal

Reality matches Promise. Three changes (2 repo files + 2 deferred external
gh actions) land per Entry #24 audit blueprint 1:1; 0 new tests (acknowledged
advisory — manual verification mitigates); Section 4 razor clean.

Audit verdict: PASS, L1 (Entry #24 chain hash 1de1fac7).
Implementation: Entry #25 chain hash 51c8a45c.
Merkle seal: efd0304b2f0e0b3ca28aa4620c2b8ea2eda5ab9e2828ca852ab9f3c5adda6eb5

Architectural decision recorded: bicameral-mcp#135's auto-resolve direction
abandoned (no caller LLM in hook context, MCP sampling not viable in Claude
Code's main chat). Resolution path = dashboard tooltip → /bicameral-sync.
The tooltip surfaces the pending state; the human in their session is the
qualified judge.

Plan addition tracking (Entry #24 preconditions, final state):
  ✅ #2 — SKILL.md tooltip note (delivered in IMPL, sealed here)
  🟡 #1 — PR description manual verification step (composed in /qor-document)
  🟡 #3#135 close comment README/docs deferral (composed in /qor-document)

Surfaced for follow-up (not blocking):
  bicameral-mcp#125 scope should be widened — 7 skills under
  pilot/mcp/.claude/skills/ are absent from the canonical pilot/mcp/skills/
  location claimed by pilot/mcp/CLAUDE.md.

Spec correction queued (post-merge gh action):
  bicameral#108 Flow 1 step 3 claims IngestResponse.supersession_candidates
  exists when it does not; collision detection lives caller-side via
  bicameral-context-sentry skill, surfaces via
  bicameral.preflight.unresolved_collisions.

Capability shortfalls (carried, no regression vs Entry #23): qor/scripts/
runtime helpers absent (gate artifacts not written), tools/reliability/
validators absent (Steps 4.6–4.8 skipped), agent-teams not declared,
codex-plugin not declared (solo audit/seal), intent_lock capture skipped.

Refs #135.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan added a commit that referenced this pull request May 2, 2026
Resolves four conflicts:
- .gitignore: keeps both qor:seed block and .claude/worktrees ignore.
- docs/META_LEDGER.md: takes dev's chain (#1-#26) as canonical. This
  branch's parallel #7-#14 entries (sealed against the obsolete Entry #6
  base before dev added #7-#26) are not folded in here — they need to be
  re-authored against dev's #26 seal via /qor-meta-log-decision in a
  follow-up. Their session content is preserved in commit messages
  (f4de501, 3f856af) and in docs/SYSTEM_STATE.md.
- docs/SYSTEM_STATE.md: keeps both the v0 process cleanup / preflight
  hook session blocks from this branch and dev's #124 implementation
  block.
- skills/bicameral-preflight/SKILL.md: keeps both the new "Hook
  reinforcement" subsection (this branch) and the Telemetry section
  (dev). The earlier removal of Telemetry in this branch's f4de501 was
  authored against a stale base; restoring it here.
jinhongkuan pushed a commit that referenced this pull request May 2, 2026
…rompts (#146)

Closes #146 — Flow 2 in tests/e2e/run_e2e_flows.py fails because
bicameral.preflight does not auto-fire in headless `claude -p` even
when the user prompt explicitly contradicts a prior decision. The
existing SKILL.md auto-fire description has plateaued; the agent's
default tool-selection priority puts Bash/Glob ahead of preflight.

Solution: deterministic UserPromptSubmit hook that detects
code-implementation intent via shared verb list and injects an
authoritative <system-reminder> elevating preflight above
file-inspection tools.

Architecture (Hickey razor):
- Verb list lives once in scripts/hooks/preflight_intent.py as data
  (frozenset). Future UI configurability is a one-edit change.
- should_fire_preflight(): pure function, 11 lines, depth 2, no
  network, no LLM, sub-millisecond regex scan.
- preflight_reminder.py: 9-line UserPromptSubmit hook entry point;
  fail-permissive (exit 0 + empty response on errors); never blocks
  the user.
- v0 verb-list duplication between SKILL.md description (frontmatter)
  and the Python module is documented honestly in the SKILL.md
  addendum per audit Advisory #1, not papered over with a false SSOT
  claim.

Tests: 11 functionality tests (TDD-light invariant — every test
invokes the unit and asserts on output, no presence-only patterns):
- 6 classifier tests covering all 30 verbs, 3 skip patterns, indirect
  intent, data shape, the literal Flow 2 contradiction prompt
- 5 hook subprocess tests covering match/no-match/malformed-stdin/
  idempotent invocations + Flow 2 fixture

Authoritative integration test: tests/e2e/run_e2e_flows.py::test_flow_2
on dev branch (preflight tool_use.id must precede first non-bicameral
discovery tool in the stream-json transcript).

QorLogic SDLC artifacts: plan-preflight-autofire-hook.md, META_LEDGER
Entries #11-#14 (PLAN, GATE PASS, IMPLEMENT, SUBSTANTIATE seal).
Merkle seal: 33007d2a72fe3db237935216e063327750896d595faa15001757761e43a8e83c

Risk grade: L2 (blast radius: every user prompt; individual-action
risk: small + bounded + reversible)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin added a commit that referenced this pull request May 3, 2026
Phase 3 of plan-priority-c-team-server-slack-v0.md — adds the polling
worker, the canonical-extraction cache that closes the multi-dev
extraction-divergence gap, and the peer-author event writer.

Multi-dev convergence mechanism: any dev's session that touches a Slack
message produces the SAME canonical extraction across the team because
get_or_compute is keyed on (source_type, source_ref, content_hash) and
the cache row is shared via the team-server's append-only event log.
This is what closes Playbook Pillar #1 (Decision Continuity) at
multi-dev/multi-agent scale that the v1 brief incorrectly framed as
"build curation only" — see SHADOW_GENOME Failure Entry #6.

The interim canonical extraction uses Anthropic Claude with the
model_version='interim-claude-v1' tombstone so a future Phase 5 (CocoIndex
#136 integration) can identify and rebuild interim entries when memoized
deterministic transforms become available.

Files added:
- team_server/extraction/canonical_cache.py (45 LOC): get_or_compute()
  returns cached extraction OR invokes compute_fn + persists.
- team_server/extraction/llm_extractor.py: interim Anthropic-backed
  extractor; production-default until CocoIndex lands.
- team_server/sync/peer_writer.py (42 LOC): write_team_event() — appends
  to team_event with author_email='team-server@<team_id>.bicameral'
  identity (single-bot per workspace; multi-instance HA is v1).
- team_server/workers/slack_worker.py (100 LOC): poll_once() —
  conversations_history per allowlisted channel, content_hash dedup,
  cache-keyed extraction, peer event write per new message.

Tests (6 functionality tests, all green):
- tests/test_team_server_canonical_cache.py: cache hit returns existing
  without compute_fn, cache miss persists + subsequent call returns
  cached, content_hash variation produces new rows.
- tests/test_team_server_slack_worker.py: worker polls only allowlisted
  channels, writes one team_event per message with peer-author identity,
  dedups via message ts (idempotent re-poll).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan pushed a commit that referenced this pull request May 3, 2026
…rompts (#146)

Closes #146 — Flow 2 in tests/e2e/run_e2e_flows.py fails because
bicameral.preflight does not auto-fire in headless `claude -p` even
when the user prompt explicitly contradicts a prior decision. The
existing SKILL.md auto-fire description has plateaued; the agent's
default tool-selection priority puts Bash/Glob ahead of preflight.

Solution: deterministic UserPromptSubmit hook that detects
code-implementation intent via shared verb list and injects an
authoritative <system-reminder> elevating preflight above
file-inspection tools.

Architecture (Hickey razor):
- Verb list lives once in scripts/hooks/preflight_intent.py as data
  (frozenset). Future UI configurability is a one-edit change.
- should_fire_preflight(): pure function, 11 lines, depth 2, no
  network, no LLM, sub-millisecond regex scan.
- preflight_reminder.py: 9-line UserPromptSubmit hook entry point;
  fail-permissive (exit 0 + empty response on errors); never blocks
  the user.
- v0 verb-list duplication between SKILL.md description (frontmatter)
  and the Python module is documented honestly in the SKILL.md
  addendum per audit Advisory #1, not papered over with a false SSOT
  claim.

Tests: 11 functionality tests (TDD-light invariant — every test
invokes the unit and asserts on output, no presence-only patterns):
- 6 classifier tests covering all 30 verbs, 3 skip patterns, indirect
  intent, data shape, the literal Flow 2 contradiction prompt
- 5 hook subprocess tests covering match/no-match/malformed-stdin/
  idempotent invocations + Flow 2 fixture

Authoritative integration test: tests/e2e/run_e2e_flows.py::test_flow_2
on dev branch (preflight tool_use.id must precede first non-bicameral
discovery tool in the stream-json transcript).

QorLogic SDLC artifacts: plan-preflight-autofire-hook.md, META_LEDGER
Entries #11-#14 (PLAN, GATE PASS, IMPLEMENT, SUBSTANTIATE seal).
Merkle seal: 33007d2a72fe3db237935216e063327750896d595faa15001757761e43a8e83c

Risk grade: L2 (blast radius: every user prompt; individual-action
risk: small + bounded + reversible)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit ca02b68)
Knapp-Kevin added a commit that referenced this pull request May 6, 2026
… to research brief (#205)

Addresses Codex first-pass review notes #1, #2, #3, #7, #8, #9 from the
brief's review block. Tier C items + the subsequent Kilo / Gemini /
Codex-2nd-pass review layers are tracked as follow-ups (will be
surfaced in the PR thread for direction).

Changes:

- § 1.4 ingest pipeline: adds explicit "Risk amplification
  (durable-feedback-loop)" paragraph framing ingest as the durable
  write-surface that propagates poisoned content through preflight
  back into the agent's reasoning context. Strengthens LLM-01 + LLM-04
  P0 defensibility (Codex #2).
- § 1.8 skills surface: adds worked before/after example contrasting
  instruction-only `bicameral-report-bug` keys-only commitment vs the
  deterministic `_resolve_signer_email` gate that replaced it in #204.
  Makes the doctrine concrete for non-agent-systems readers (Codex #3).
- § 1.9 team-server: rewrites the dangling "TEAM-NN gaps in § 4"
  promise to "intentionally not enumerated; activation PR authors
  TEAM-NN IDs against actual activated topology" (Codex #8).
- § 2.6 EU AI Act: removes unilateral "limited risk" claim. Now
  describes bicameral-mcp as an AI-adjacent developer-tool component
  whose risk-tier classification properly attaches to the integrated
  system + deployment context, requiring counsel review for any
  specific tier claim (Codex #7).
- § 5 gap synthesis: adds Deployment trigger column (`all` /
  `local-OK` / `team/hosted` / `pre-team` / `hosted`) so severity is
  defensible per deployment shape. SOC2-01 reclassified as
  pre-team/hosted P0 with local-only boundary statement; GDPR-05
  reclassified as team/hosted P1 with local single-user P2; OWASP-03
  reclassified as hosted P1 with local P2 (uv/pipx provides
  install-time lock); OWASP-02 trigger narrowed to team/hosted (Codex #1).
- Appendix method notes: softens "every claim should be verifiable by
  re-reading the cited file at the cited line range" to acknowledge
  that most findings cite components rather than path:line, and
  defers a line-level evidence appendix as a follow-up improvement
  (Codex #9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin added a commit that referenced this pull request May 6, 2026
…+Gemini+Codex-2 (#205)

Authors a single Reviewer Disposition Pass table at the top of the
brief reconciling all 32 review points across four review layers
(Codex first-pass, Kilo, Gemini CLI, Codex second-pass) into one
post-review consensus before downstream P1 issue-filing — per the
explicit Codex-2 #1 directive.

Decisions: 21 applied this commit, 6 already applied in 1d82658,
3 deferred to follow-up, 2 note-only. Net new gap IDs added per
disposition: GDPR-08 (ephemeral data), GDPR-09 (consent versioning
+ revocation), LLM-11 (cross-tool config-file modification surface),
MCP-01 (host UX as external dependency), CFG-01 (config precedence
+ fail-closed model). Reclassification: LLM-06 P0/M → P1/M with
scope narrowed to future remote-skill-loading channel (per Kilo #2).

Major content additions to the brief:

- § 1.1: MCP host UX is external dependency, not security gate (new
  gap MCP-01) — host that auto-approves tool calls bypasses any
  "operator will see this" assumption.
- § 1.2: SurrealDB version pinning supply-chain callout (Kilo #11).
- § 1.7: cross-tool config-file modification surface (new gap LLM-11)
  distinct from skill-content surface — `setup_wizard` writes shell
  commands into `.claude/settings.json` that run host-side at hook fire.
- § 1.11 (new): Configuration precedence + fail-closed model — single
  uniform precedence rule across all knobs (env > config.yaml >
  hardcoded defaults), fail-closed semantics on missing/malformed/
  contradictory config (Codex-2 #5).
- § 2.4 (a): LLM02 mapping note clarifying it folds into LLM-07 +
  OWASP-04 (Kilo #13).
- § 2.4 (b): explicit `confirm=True` is agent-supplied not HITL
  (Kilo #3) — security context cannot rely on agent-filled params.
- § 2.4 (c) LLM-01 + LLM-04: extensible classifier (Gemini #2) +
  guardrail-not-classifier framing (Codex-1 #6) + control-acceptance
  template (Codex-2 #4) — quarantine, override, test fixtures,
  measurement counters.
- § 2.4 (c) LLM-03: timeouts as `.bicameral/config.yaml` knobs (Gemini #3).
- § 2.4 (c) LLM-05 + LLM-09: out-of-band operator confirmation, not
  agent-supplied confirm parameters (Kilo #3).
- § 2.4 (c) LLM-06: scope-narrowed to future remote-skill-loading; in
  current install model the wheel-trust covers it (Kilo #2).
- § 2.4 (c) LLM-11 (new): cross-tool config-file gate (signed
  hooks-manifest.json) distinct from skill manifest.
- § 2.1 (c) GDPR-01: three remediation candidates — tombstone-and-
  rebuild with signed manifest (Kilo #12), crypto-shredding (Gemini
  #1), or scope-out via PII detect-and-refuse.
- § 2.1 (c) GDPR-02: data-subject-access search must cover full
  identifier surface (description, source_ref, topic, file paths) not
  just signer email (Codex-1 #5).
- § 2.1 (c) GDPR-08 (new): ephemeral data surfaces (tempfiles, swap,
  WAL, crash dumps) (Kilo #7).
- § 2.1 (c) GDPR-09 (new): consent versioning + revocation semantics
  (Kilo #8 + Codex-2 #3).
- § 5: gap table updated with new rows + LLM-06 reclassification;
  gap counts post-disposition (5 P0 / 19 P1 / 16 P2 / 5 P3 = 45 total,
  up from 41).
- § 6.1 (new): epic grouping for deferred P1 batch (Codex-1 #10) —
  ingest boundary guardrails, per-tool authority gradation, supply-
  chain signing, telemetry & consent.
- § 6.2 (new): six-section control-acceptance template for every DG
  gap (Codex-2 #4) — positive / negative / bypass / fail-closed /
  telemetry / docs.

Filed-issue updates:
- Issue #214 (LLM-06): relabeled P0 → P1, retitled to reflect scope
  narrowing, full disposition comment added.
- Issue #212 (LLM-01) + #213 (LLM-04): disposition comments added
  capturing the guardrail framing, classifier extensibility, and
  control-acceptance template applicable to both.

Deferred for follow-up: Codex-1 #4 (controller/processor
restructure of standards table), Codex-1 #9 (full evidence appendix
beyond the methodology softening), Codex-2 #2 (full 3-column
deployment-profile matrix beyond the single-column trigger).

Brief now 706 lines (up from 606); +124 line diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan pushed a commit that referenced this pull request Jun 9, 2026
…-intent slash-commands

Revives the proven fix from PR #524 (closed only because the v0 e2e CI failed on
stale-auth substrate — not the diff; that e2e workflow is now shelved to
dispatch-only via #556). Rebased onto current main (all 6 files applied cleanly —
nobody touched these since #524 diverged) and the bug re-reproduced on main before
fixing: `should_fire_preflight("/qor-plan https://…/issues/1")` returned False.

Root cause (Hypothesis 1, confirmed): the UserPromptSubmit classifier in
scripts/hooks/preflight_intent.py matched only free-text IMPLEMENTATION_VERBS and
had no slash-command awareness. `plan` is not a verb, so `/qor-plan <url>` never
fired preflight — planning ran blind to the ledger. Sibling commands worked by
coincidence (`/qor-implement`→implement, `/qor-refactor`→refactor). Hypothesis 3
(skill-chain ordering) refuted: UserPromptSubmit fires on raw text before
slash-command resolution; the hook ran — its regex was the gate that returned False.

Fix:
- New IMPL_INTENT_SLASH_COMMANDS frozenset (qor-plan, qor-implement, qor-refactor,
  qor-debug, qor-remediate, qor-organize, qor-auto-dev-1, qor-auto-dev) — these
  short-circuit to fire regardless of argument (URL / text / empty).
- New layered classify_prompt() → ClassifyResult(fire, prompt_surface_form,
  slash_command); preflight_reminder.py records prompt_surface_form to telemetry
  so a future trigger-surface regression is observable. should_fire_preflight()
  preserved (delegates) for backward compat.
- SKILL.md description (Tier-2 caller-LLM gate) updated in lockstep with the
  Tier-1 hook set, and correctly keeps read-only commands (/qor-status, /qor-help,
  /qor-audit, /qor-validate) on the SKIP list.

Regression guard: test_preflight_intent.py + test_preflight_hook.py now run in
test-mcp-regression.yml (they were dormant — in no CI workflow). flow-6 e2e prompt
added for reference (the e2e harness is dispatch-only per #556; the unit tests are
the durable gate).

Closes #402. Supersedes the closed #524.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant