chore: bump to v0.4.4 — grounding reuse + coverage loop by jinhongkuan · Pull Request #9 · BicameralAI/bicameral-mcp

jinhongkuan · 2026-04-14T03:05:49Z

Bump version 0.4.3 → 0.4.4 to release the grounding-pipeline improvements that landed in PR #5 + PR #8 since v0.4.3.

What's in v0.4.4 vs v0.4.3

- PR #5, "decision grounding reuse + coverage loop" (silong/code-locator-fix-drift): decision grounding reuse + 3-tier coverage loop. New `cache_hits` field on `IngestStats`, `grounding_tier` stamped on `maps_to` edge provenance, 26 new unit tests. Fully deterministic, no LLM added to the grounding path.
- PR #8, "test: tolerate v0.4.3 SKILL.md structure in step1 excerpt test": small test fix tolerating the v0.4.3 SKILL.md structure changes.

M1 adversarial regression (local)

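| Version | P | R | F1 | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- |
| v0.4.3 | 0.81 | 0.87 | 0.84 | 13 | 3 | 2 |
| v0.4.4 | 0.81 | 0.87 | 0.84 | 13 | 3 | 2 |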

Bit-for-bit identical. Expected — grounding-pipeline changes don't affect M1 extraction P/R/F1 (which measures extraction quality against the Opus fixture). Offline suite: 71/71 pass.
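For readers who want to check the table, the reported P/R/F1 values follow from the TP/FP/FN counts using the standard definitions. The snippet below is a minimal sketch of that arithmetic; the function name `prf1` is illustrative and not taken from the repository.

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw confusion counts."""
    precision = tp / (tp + fp)   # share of predicted items that were correct
    recall = tp / (tp + fn)      # share of expected items that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Counts from the M1 adversarial regression table (same for both versions).
p, r, f1 = prf1(tp=13, fp=3, fn=2)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=0.81 R=0.87 F1=0.84
```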

Summary by CodeRabbit

Chores
- Version bumped to 0.4.4

Bumps version 0.4.3 → 0.4.4 to release the grounding-pipeline improvements that landed since v0.4.3.

What's in v0.4.4 (vs v0.4.3):

- **PR #5** (silong/code-locator-fix-drift): decision grounding reuse + 3-tier coverage loop
  - Before BM25, handle_ingest checks the ledger for similar previously-grounded intents via search_grounded_intents() and reuses their code_regions after live-symbol validation
  - ground_mappings retries with progressively relaxed thresholds (strict 0.5/80 → relaxed 0.3/70 → broad 0.1/60) before giving up
  - New cache_hits field on IngestStats; grounding_tier stamped on maps_to edge provenance for observability
  - 26 new unit tests (10 vocab cache + 16 coverage loop)
  - Fully deterministic, no LLM added to the grounding path
- **PR #8**: small test fix: case-insensitive INCLUDE/EXCLUDE check + bumped excerpt size ceiling for the v0.4.3 few-shot SKILL.md structure.

M1 adversarial regression (local, before/after the v0.4.4 changes):

- v0.4.3: P=0.81 R=0.87 F1=0.84 (TP=13 FP=3 FN=2)
- v0.4.4: P=0.81 R=0.87 F1=0.84 (TP=13 FP=3 FN=2)

Identical: extraction quality unchanged.

The grounding-pipeline changes are conceptually independent of M1 extraction quality. Cache hits and tier-relaxation only affect grounded_pct, which was already 100% on the adversarial corpus.

Offline test suite: 71/71 pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
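The tiered retry described for ground_mappings can be pictured with a small sketch. This is not the project's implementation: the commit message gives only the threshold pairs (0.5/80, 0.3/70, 0.1/60), so the field names `min_score` and `min_fuzzy`, the candidate tuple shape, and the function `ground_with_tiers` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Tier:
    name: str
    min_score: float  # assumed meaning of the first number in each pair
    min_fuzzy: int    # assumed meaning of the second number in each pair


# Threshold pairs as quoted in the commit message, strictest first.
TIERS = (
    Tier("strict", 0.5, 80),
    Tier("relaxed", 0.3, 70),
    Tier("broad", 0.1, 60),
)

# Hypothetical candidate shape: (symbol, score, fuzzy_match_percent).
Candidate = tuple[str, float, int]


def ground_with_tiers(
    candidates: Sequence[Candidate], tiers: Sequence[Tier] = TIERS
) -> tuple[list[str], Optional[str]]:
    """Try each tier in order; return the hits and the tier that produced them."""
    for tier in tiers:
        hits = [
            symbol
            for symbol, score, fuzzy in candidates
            if score >= tier.min_score and fuzzy >= tier.min_fuzzy
        ]
        if hits:
            return hits, tier.name  # the tier name plays the role of grounding_tier
    return [], None  # no tier matched: give up rather than force a binding


print(ground_with_tiers([("parse_config", 0.35, 72)]))
```

Because the tiers are tried in a fixed order and no randomness is involved, the same candidates always yield the same tier, which matches the "fully deterministic" claim in spirit.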

coderabbitai · 2026-04-14T03:06:03Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0146d4ab-4b89-4f44-9e2f-34dd35ef6bcd

📥 Commits

Reviewing files that changed from the base of the PR and between 0914e6d and 278b755.

📒 Files selected for processing (2)

RECOMMENDED_VERSION
pyproject.toml

📝 Walkthrough

Version constants were incremented from 0.4.3 to 0.4.4 across two project files to reflect a new release version.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Version Bump: `RECOMMENDED_VERSION` (constant file), `pyproject.toml` | Updated version identifiers from `0.4.3` to `0.4.4` in project metadata and version constant definitions. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

🐰 From point-four-three to point-four-four,
A tiny hop, but worth it more!
The versions bump, the tags align,
Release time magic, pure and fine! ✨


All four findings verified against current code; only the actionable ones applied. 81 passed + 1 xfailed in 9.02s.

#1: skills/bicameral-preflight/SKILL.md sync_metrics note

The .claude/skills copy got the sync_metrics observability note back when V1 A3 shipped, but the canonical skills/ copy never did. Mirror the wording verbatim near step 2 so the rendering guidance and response-field documentation stay in sync.

#2: handlers/detect_drift.py per-entry alignment

The cosmetic-hint enrichment was slicing both head_full and wt_full using entry.lines (the baseline anchor). HEAD and the working tree can shift the symbol independently, so a single index range can't align both sides. The narrow consequence: a drifted entry with shifted lines could yield a misleading cosmetic_hint=true on bytes that aren't the bound region.

Fix: re-resolve the symbol against each ref via resolve_symbol_lines(file_path, entry.symbol, repo, ref="HEAD") and ref="working_tree" separately, and slice each ref using its own resolved range. Resolution failure on either side → safe default of cosmetic_hint=False (matches the V1 contract: "False is cheap, True must be earned"). Empty symbol → skip (new fail-safe path).

Test refactor: test_invalid_lines_skipped renamed to test_unresolvable_symbol_skipped. The old test asserted that lines=(0,0) was the failsafe trigger, but entry.lines is no longer the alignment input. The new test exercises the resolve_symbol_lines-returns-None path via a nonexistent symbol name, which is the real fail-safe gate now.

#3: V2 guide TOC anchor for §9

GitHub auto-generates fragment IDs from heading text by lowercasing, replacing spaces with hyphens, and dropping punctuation. "## 9. Acceptance criteria for V2" maps to #9-acceptance-criteria-for-v2, but the TOC pointed at #9-acceptance-criteria (truncated). Link broken. Updated to the correct fragment.

#4: V2 guide unlabeled fenced code blocks (markdownlint MD040)

Six fenced opens used bare ``` instead of a labeled fence. Tagged each with ```text, since the contents are commit listings, ASCII DAG diagrams, pseudocode protocols, and tuple notation, none of which fit a real language tag. The other fenced blocks in the guide (already tagged ```sql / ```python) are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
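The fragment rule described in finding #3 can be reproduced with a few lines of Python. This is a minimal sketch of the rule as the commit message states it (lowercase, drop punctuation, spaces to hyphens), not GitHub's actual implementation, which handles further cases such as duplicate headings; the function name `heading_to_fragment` is illustrative.

```python
import re


def heading_to_fragment(heading: str) -> str:
    """Approximate the heading-to-fragment rule described in finding #3."""
    text = heading.lstrip("#").strip().lower()  # drop the ATX markers, lowercase
    text = re.sub(r"[^\w\- ]", "", text)        # drop punctuation, keep hyphens and spaces
    return "#" + text.replace(" ", "-")          # spaces become hyphens


print(heading_to_fragment("## 9. Acceptance criteria for V2"))
# #9-acceptance-criteria-for-v2
```

Running the heading through the rule shows why the truncated `#9-acceptance-criteria` link could not resolve: the trailing "for V2" is part of the generated fragment.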

… to research brief (#205)

Addresses Codex first-pass review notes #1, #2, #3, #7, #8, #9 from the brief's review block. Tier C items + the subsequent Kilo / Gemini / Codex-2nd-pass review layers are tracked as follow-ups (will be surfaced in the PR thread for direction).

Changes:

- § 1.4 ingest pipeline: adds explicit "Risk amplification (durable-feedback-loop)" paragraph framing ingest as the durable write-surface that propagates poisoned content through preflight back into the agent's reasoning context. Strengthens LLM-01 + LLM-04 P0 defensibility (Codex #2).
- § 1.8 skills surface: adds worked before/after example contrasting instruction-only `bicameral-report-bug` keys-only commitment vs the deterministic `_resolve_signer_email` gate that replaced it in #204. Makes the doctrine concrete for non-agent-systems readers (Codex #3).
- § 1.9 team-server: rewrites the dangling "TEAM-NN gaps in § 4" promise to "intentionally not enumerated; activation PR authors TEAM-NN IDs against actual activated topology" (Codex #8).
- § 2.6 EU AI Act: removes unilateral "limited risk" claim. Now describes bicameral-mcp as an AI-adjacent developer-tool component whose risk-tier classification properly attaches to the integrated system + deployment context, requiring counsel review for any specific tier claim (Codex #7).
- § 5 gap synthesis: adds Deployment trigger column (`all` / `local-OK` / `team/hosted` / `pre-team` / `hosted`) so severity is defensible per deployment shape. SOC2-01 reclassified as pre-team/hosted P0 with local-only boundary statement; GDPR-05 reclassified as team/hosted P1 with local single-user P2; OWASP-03 reclassified as hosted P1 with local P2 (uv/pipx provides install-time lock); OWASP-02 trigger narrowed to team/hosted (Codex #1).
- Appendix method notes: softens "every claim should be verifiable by re-reading the cited file at the cited line range" to acknowledge that most findings cite components rather than path:line, and defers a line-level evidence appendix as a follow-up improvement (Codex #9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…+Gemini+Codex-2 (#205)

Authors a single Reviewer Disposition Pass table at the top of the brief reconciling all 32 review points across four review layers (Codex first-pass, Kilo, Gemini CLI, Codex second-pass) into one post-review consensus before downstream P1 issue-filing, per the explicit Codex-2 #1 directive.

Decisions: 21 applied this commit, 6 already applied in 1d82658, 3 deferred to follow-up, 2 note-only.

Net new gap IDs added per disposition: GDPR-08 (ephemeral data), GDPR-09 (consent versioning + revocation), LLM-11 (cross-tool config-file modification surface), MCP-01 (host UX as external dependency), CFG-01 (config precedence + fail-closed model).

Reclassification: LLM-06 P0/M → P1/M with scope narrowed to future remote-skill-loading channel (per Kilo #2).

Major content additions to the brief:

- § 1.1: MCP host UX is external dependency, not security gate (new gap MCP-01): a host that auto-approves tool calls bypasses any "operator will see this" assumption.
- § 1.2: SurrealDB version pinning supply-chain callout (Kilo #11).
- § 1.7: cross-tool config-file modification surface (new gap LLM-11) distinct from skill-content surface: `setup_wizard` writes shell commands into `.claude/settings.json` that run host-side at hook fire.
- § 1.11 (new): Configuration precedence + fail-closed model: single uniform precedence rule across all knobs (env > config.yaml > hardcoded defaults), fail-closed semantics on missing/malformed/contradictory config (Codex-2 #5).
- § 2.4 (a): LLM02 mapping note clarifying it folds into LLM-07 + OWASP-04 (Kilo #13).
- § 2.4 (b): explicit `confirm=True` is agent-supplied not HITL (Kilo #3); security context cannot rely on agent-filled params.
- § 2.4 (c) LLM-01 + LLM-04: extensible classifier (Gemini #2) + guardrail-not-classifier framing (Codex-1 #6) + control-acceptance template (Codex-2 #4): quarantine, override, test fixtures, measurement counters.
- § 2.4 (c) LLM-03: timeouts as `.bicameral/config.yaml` knobs (Gemini #3).
- § 2.4 (c) LLM-05 + LLM-09: out-of-band operator confirmation, not agent-supplied confirm parameters (Kilo #3).
- § 2.4 (c) LLM-06: scope-narrowed to future remote-skill-loading; in current install model the wheel-trust covers it (Kilo #2).
- § 2.4 (c) LLM-11 (new): cross-tool config-file gate (signed hooks-manifest.json) distinct from skill manifest.
- § 2.1 (c) GDPR-01: three remediation candidates: tombstone-and-rebuild with signed manifest (Kilo #12), crypto-shredding (Gemini #1), or scope-out via PII detect-and-refuse.
- § 2.1 (c) GDPR-02: data-subject-access search must cover full identifier surface (description, source_ref, topic, file paths) not just signer email (Codex-1 #5).
- § 2.1 (c) GDPR-08 (new): ephemeral data surfaces (tempfiles, swap, WAL, crash dumps) (Kilo #7).
- § 2.1 (c) GDPR-09 (new): consent versioning + revocation semantics (Kilo #8 + Codex-2 #3).
- § 5: gap table updated with new rows + LLM-06 reclassification; gap counts post-disposition (5 P0 / 19 P1 / 16 P2 / 5 P3 = 45 total, up from 41).
- § 6.1 (new): epic grouping for deferred P1 batch (Codex-1 #10): ingest boundary guardrails, per-tool authority gradation, supply-chain signing, telemetry & consent.
- § 6.2 (new): six-section control-acceptance template for every DG gap (Codex-2 #4): positive / negative / bypass / fail-closed / telemetry / docs.

Filed-issue updates:

- Issue #214 (LLM-06): relabeled P0 → P1, retitled to reflect scope narrowing, full disposition comment added.
- Issue #212 (LLM-01) + #213 (LLM-04): disposition comments added capturing the guardrail framing, classifier extensibility, and control-acceptance template applicable to both.

Deferred for follow-up: Codex-1 #4 (controller/processor restructure of standards table), Codex-1 #9 (full evidence appendix beyond the methodology softening), Codex-2 #2 (full 3-column deployment-profile matrix beyond the single-column trigger).

Brief now 706 lines (up from 606); +124 line diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
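The precedence rule named in § 1.11 (env > config.yaml > hardcoded defaults) lends itself to a short illustration. The sketch below is an assumption-laden example, not code from the repository: the knob name `ingest_timeout_s`, the `BICAMERAL_` prefix applied to it, the `ConfigError` type, and the choice to take the parsed config.yaml as a plain mapping are all hypothetical. It treats an absent value as "fall through to the next layer" and a malformed value as a hard error; the brief's exact semantics for missing or contradictory config are not given here.

```python
from typing import Any, Mapping

# Hypothetical knob and default, for illustration only.
DEFAULTS: dict[str, int] = {"ingest_timeout_s": 30}


class ConfigError(ValueError):
    """Raised instead of silently falling back when a value is unusable."""


def resolve_knob(
    name: str,
    env: Mapping[str, str],
    file_cfg: Mapping[str, Any],
    defaults: Mapping[str, int] = DEFAULTS,
    env_prefix: str = "BICAMERAL_",
) -> int:
    """Resolve one integer knob with precedence env > config file > defaults."""
    if name not in defaults:
        raise ConfigError(f"unknown knob: {name}")
    env_key = env_prefix + name.upper()
    if env_key in env:
        raw: Any = env[env_key]        # highest precedence
    elif name in file_cfg:
        raw = file_cfg[name]           # parsed config.yaml contents
    else:
        return defaults[name]          # hardcoded default
    try:
        value = int(raw)
    except (TypeError, ValueError) as exc:
        # Fail closed: a malformed value is an error, not a fallback to a lower layer.
        raise ConfigError(f"malformed value for {name}: {raw!r}") from exc
    if value <= 0:
        raise ConfigError(f"non-positive value for {name}: {value}")
    return value


print(resolve_knob("ingest_timeout_s", {"BICAMERAL_INGEST_TIMEOUT_S": "5"}, {"ingest_timeout_s": 10}))
```

The design point is that a bad environment value does not quietly yield the file or default value, since that would let a typo weaken a control without anyone noticing.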

…lization

**Phase B-1 of N. Does NOT close GDPR-01.** This cycle ships the load-bearing schema-level deterministic gate (#205 doctrine, gate_kind: schema) that segregates verbatim transcript text into the operator-erasable PiiArchive from Phase A. Speakers/source_ref pseudonymization (Phase B-2), cross-author replay sanitizer (Phase B-3), and erase-subject CLI + legacy backfill (Phase C) remain.

Plan: plan-221-phase-b-1-ingest-cutover.md (qor-judge PASS at L2 round 3; two prior VETOes captured F-B1-{1,2,3} + F-B2-{1,2,3} as Shadow Genome Entries BicameralAI#8 and BicameralAI#9 with new heuristics BicameralAI#7-9).

What ships:

- Schema v21→v22 migration: relaxes input_span.text ASSERT to "$value != '' OR $this.archive_key != ''". DB-engine-enforced; refactor-resistant. Legacy UNIQUE-on-(source_type, source_ref, text) index preserved for backward-compat.
- ledger/queries.py::_resolve_span_text(archive, row): sync helper, single point of truth for input_span.text reads. Returns archive content when archive_key is set, "[ERASED]" sentinel post-erasure, legacy row.text as fallback.
- _ERASED_SENTINEL constant hoisted (load-bearing in helper return AND real_spans filter exclusion).
- 7 read sites refactored to route through helper:
  - 4 graph projections in queries.py (get_all_decisions, search_by_bm25, get_decisions_for_file, get_decisions_for_files)
  - handlers/history.py:217 enriched-fetch site
  - handlers/remove_source.py audit-telemetry consumer of get_input_span_row (post-erasure audit captures sentinel, not stale plaintext)
- upsert_input_span gains archive_key parameter: when set, writes with text='' and dedup keyed on archive_key; legacy text-only path preserved.
- SurrealDBLedgerAdapter.ingest_payload writes verbatim to archive before input_span CREATE; falls back to inline-text on archive write failure.
- PiiArchive plumbed onto adapter via adapters/ledger.py::get_ledger(); path from BICAMERAL_PII_ARCHIVE_PATH env or ~/.bicameral/pii-archive.db default.
- governance-gates.yaml entry: gate_kind: schema pointing at input_span.text ASSERT (strongest deterministic-gate variant).

Tests (24 new sociable, all passing):

- 6 schema migration tests (deterministic-gate ASSERT, legacy-shape acceptance, archive-key-only acceptance, both-empty rejection)
- 8 _resolve_span_text unit tests (sentinel constant, archive path, legacy fallback, erasure → sentinel, broken-archive grace, both-set archive-wins, idempotency)
- 4 load-bearing erasure propagation tests (audit-required):
  - test_resolve_returns_erased_sentinel_after_archive_erase
  - test_get_all_decisions_filters_erased_sentinel_from_source_excerpt
  - test_legacy_row_with_no_archive_key_still_renders_normally
  - test_ingest_writes_text_to_archive_and_empty_to_input_span
- Regression: 15 test_phase2_ledger + 19 Phase A tests still pass.

Honest non-closure framing per audit:

- Phase B-1 segregates input_span.text only. decision.speakers and decision.source_ref remain raw PII surfaces (Phase B-2).
- Cross-author replay sanitizer (events/materializer.py): Phase B-3.
- erase-subject CLI + legacy-row backfill: Phase C.
- Schema-level UNIQUE-on-archive_key NOT added (legacy rows have archive_key='' which would violate UNIQUE on empty values). Python-side dedup via get_input_span_id is the gate; partial UNIQUE index lands post-backfill.

ruff format + ruff check clean. Roadmap doc updated; Shadow Genome Entry BicameralAI#9 captures the round-2 VETO lessons (cross-section signature consistency + sentinel downstream-consumer audit as heuristics BicameralAI#8-9).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
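The read-path behavior described for `_resolve_span_text` (archive content when archive_key is set, "[ERASED]" after erasure, legacy row.text otherwise) can be sketched as follows. This is an illustrative stand-in, not the helper from ledger/queries.py: the in-memory `DictArchive`, the dict-shaped row, modeling erasure as a missing key, and the `satisfies_text_assert` mirror of the schema ASSERT are all assumptions, and the "broken-archive grace" path mentioned in the tests is not modeled.

```python
from typing import Mapping, Optional

_ERASED_SENTINEL = "[ERASED]"  # sentinel string quoted in the commit message


class DictArchive:
    """Hypothetical in-memory stand-in for the PII archive."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def put(self, key: str, text: str) -> None:
        self._rows[key] = text

    def erase(self, key: str) -> None:
        self._rows.pop(key, None)  # erasure modeled as key removal (assumption)

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)


def satisfies_text_assert(row: Mapping[str, str]) -> bool:
    """Python mirror of the quoted ASSERT: text or archive_key must be non-empty."""
    return row.get("text", "") != "" or row.get("archive_key", "") != ""


def resolve_span_text(archive: DictArchive, row: Mapping[str, str]) -> str:
    """Single read path for span text: the archive wins whenever archive_key is set."""
    archive_key = row.get("archive_key", "")
    if archive_key:
        text = archive.get(archive_key)
        return text if text is not None else _ERASED_SENTINEL
    return row.get("text", "")  # legacy inline-text row


archive = DictArchive()
archive.put("k1", "verbatim transcript line")
row = {"text": "", "archive_key": "k1"}
print(resolve_span_text(archive, row))  # verbatim transcript line
archive.erase("k1")
print(resolve_span_text(archive, row))  # [ERASED]
```

Routing every read through one function is what lets an erase in the archive show up at all read sites at once, which is the property the erasure propagation tests appear to guard.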

jinhongkuan merged commit aae8809 into main Apr 14, 2026
1 check was pending

jinhongkuan mentioned this pull request Apr 21, 2026

[P0] Grounding abstention: prevent degenerate BM25 queries from forcing incorrect symbol bindings #38

Closed

Knapp-Kevin mentioned this pull request Apr 28, 2026

feat: CodeGenome Phase 3 (#60) — continuity evaluation in link_commit #73

Merged

9 tasks

Knapp-Kevin mentioned this pull request May 15, 2026

feat(pii-archive): #221 Phase B-1 — ingest cutover + read-path centralization (NOT closure) #356

Merged

3 tasks

jinhongkuan merged 1 commit into main from chore/bump-v0.4.4
