v0.4.9 — tester mode + search status fix by jinhongkuan · Pull Request #15 · BicameralAI/bicameral-mcp

jinhongkuan · 2026-04-15T02:15:20Z

Summary

Phase 2 of the v0.4.8 "Second Moment" bucket. Adds an opt-in tester mode (BICAMERAL_TESTER_MODE=1) that makes bicameral.search and bicameral.brief emit blocking action_hints the agent must address before any write operation. For onboarding, demos, and skill evaluation flows.

Four hint kinds:

review_drift (search + brief) — drifted decisions in scope
ground_decision (search) — ungrounded matches
resolve_divergence (brief) — contradictory decisions on same symbol
answer_open_questions (brief) — open-question gaps

Regular mode (tester_mode=False, default) is byte-identical to v0.4.8 except for the new empty action_hints=[] field.

Bug fix (load-bearing): handle_search_decisions was reading status from raw_regions[0] but code_region rows don't carry a status field — it's an intent property. Every match had been silently reported as pending regardless of real state, masking drifted decisions from callers. Now reads intent-level status from the search_by_bm25 row.
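A minimal sketch of the failure mode, assuming dict-shaped rows and a `"pending"` fallback (the PR does not show the actual lookup, so both function names and row fields here are hypothetical):

```python
from typing import List


def status_from_region(match_row: dict, raw_regions: List[dict]) -> str:
    # Pre-fix behaviour (sketch): code_region rows carry no status field,
    # so the fallback value is returned for every match.
    if raw_regions:
        return raw_regions[0].get("status", "pending")
    return "pending"


def status_from_intent(match_row: dict) -> str:
    # Post-fix behaviour (sketch): status is an intent property, so read it
    # from the search row itself.
    return match_row.get("status", "pending")


# Hypothetical rows: one search hit whose intent has drifted.
row = {"intent_id": "intent:1", "status": "drifted"}
regions = [{"file_path": "ledger.py", "symbol": "post_entry"}]

before = status_from_region(row, regions)  # "pending": drift is masked
after = status_from_intent(row)            # "drifted": drift is visible
```

Under these assumptions, the pre-fix lookup never sees a drifted match, which is why a drift-based hint generator would have nothing to fire on.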

Surfaced during an end-to-end drift demo on Accountable — see the full walkthrough in the parent repo: thoughts/shared/plans/2026-04-14-accountable-drift-demo.md.

Test plan

24 new cases in tests/test_v049_tester_mode.py covering every hint kind (fires / doesn't fire), both generators, backward compat, tester_mode env parse
Full v0.4.9 regression: 146 passed in 12s
Manual: verified on Accountable ledger — tester_mode=OFF returns empty hints; tester_mode=ON fires review_drift with refs to the edited file, brief fires same hint via its own generator
Skill contracts updated: bicameral-search, bicameral-brief, new top-level bicameral-tester

🤖 Generated with Claude Code

Phase 2 of the v0.4.8 "Second Moment" bucket. Adds an opt-in tester mode (BICAMERAL_TESTER_MODE=1) that makes bicameral.search and bicameral.brief emit blocking action_hints the agent must address before any write operation. For onboarding, demos, and skill evaluation flows where you want bicameral to push signal at the agent instead of waiting for the agent to ask.

Four hint kinds:

- review_drift (search + brief): drifted decisions in scope
- ground_decision (search): ungrounded matches
- resolve_divergence (brief): contradictory decisions on same symbol
- answer_open_questions (brief): open-question gaps

Each hint has blocking=True and a human-readable message. Enforcement lives in the skill contract (bicameral-tester SKILL.md). Regular mode (tester_mode=False, default) is byte-identical to v0.4.8 except for the new empty action_hints=[] field.

BUG FIX (load-bearing for Phase 2): handle_search_decisions was reading `status` from raw_regions[0], but code_region rows don't carry a status field; it's an intent property. Every match had been silently reported as `pending` regardless of real state, masking drifted decisions from callers. Now reads intent-level status from the search_by_bm25 row. Without this fix the review_drift hint generator couldn't fire because no match ever looked drifted to it.

Surfaced during an end-to-end drift demo on Accountable. See thoughts/shared/plans/2026-04-14-accountable-drift-demo.md for the full walkthrough (steps to reproduce, behavior diff, gotchas).

Tests: 24 new cases in tests/test_v049_tester_mode.py covering every hint kind, both generators, backward compat, and tester_mode env parse. Full v0.4.9 regression: 146 passed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
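Because enforcement lives in the skill contract rather than in the server, a caller that wants a programmatic check would have to add one itself. The following is a hypothetical client-side guard, assuming the response is a dict with an `action_hints` list whose entries have `kind` and `blocking` keys; none of these names come from the PR.

```python
from typing import List, Set


class BlockedByHints(RuntimeError):
    """Raised when a write is attempted with unaddressed blocking hints."""


def blocking_hints(response: dict) -> List[dict]:
    # Regular mode returns action_hints=[], so this is empty there.
    return [h for h in response.get("action_hints", []) if h.get("blocking")]


def guard_write(response: dict, addressed: Set[str]) -> None:
    # Refuse to proceed while any blocking hint kind has not been addressed.
    pending = [h["kind"] for h in blocking_hints(response)
               if h["kind"] not in addressed]
    if pending:
        raise BlockedByHints("address before writing: " + ", ".join(pending))
```

A caller would run `guard_write(search_response, addressed=...)` before any write tool call and treat the exception as a signal to resolve the listed hint kinds first.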

coderabbitai · 2026-04-15T02:15:28Z

Warning

Rate limit exceeded

@jinhongkuan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 39 minutes and 39 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 39 minutes and 39 seconds.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 637ba6f1-1214-41b7-a478-500100d3ef31

📥 Commits

Reviewing files that changed from the base of the PR and between 71b0c8a and 897fec0.

📒 Files selected for processing (11)

CHANGELOG.md
context.py
contracts.py
handlers/action_hints.py
handlers/brief.py
handlers/search_decisions.py
pyproject.toml
skills/bicameral-brief/SKILL.md
skills/bicameral-search/SKILL.md
skills/bicameral-tester/SKILL.md
tests/test_v049_tester_mode.py

…ameralAI#5 — audit PASS

- META_LEDGER BicameralAI#15: GATE TRIBUNAL entry covering both audit iterations (v1 VETO at b15c9ef, v2 PASS at d846a4a). Chain hash 536dd15f extends from BicameralAI#14 Phase 4 SEAL.
- SHADOW_GENOME BicameralAI#5: SG-PLAN-GROUNDING-DRIFT instance BicameralAI#2 catalogued. Cross-references PR BicameralAI#93 §9 as instance #1 (same root cause: CLAUDE.md asserts pilot/mcp/skills/ canonicality but dev HEAD has no pilot/ directory). Followup: docs:claude-md-cleanup workstream to fix CLAUDE.md itself.

Plan PASS at d846a4a; chain to /qor-implement.

Phase 1 (M3 benchmark judge-corpus extension):

- tests/test_m3_benchmark_judge_corpus.py (4 tests, 83 LOC)
- tests/fixtures/m3_benchmark/cases.py: expected_judge field added to all 10 uncertain cases (pure data, ground-truth labels for the operator QC pass)

Phase 2 (bicameral-sync skill rubric + training doc):

- tests/test_skill_uncertain_protocol.py (4 tests, 96 LOC)
- skills/bicameral-sync/SKILL.md §2.bis: Uncertain-band sub-protocol section. Axis 1 (compliance) FIRST, Axis 2 (cosmetic-vs-semantic) SECOND, signals advisory, evidence_refs echoed back. Maps to existing typed contracts (no new fields).
- docs/training/cosmetic-vs-semantic.md (198 LOC): concept doc with worked example from py_12_constant_value_tuned. Pairs with the rubric.
- docs/training/README.md: index with cosmetic-vs-semantic active row. Soft-depends on PR BicameralAI#93's docs/training/ scaffolding; this branch creates a minimal version that PR BicameralAI#93 will reconcile on merge.

Validation:

- Phase 1 + Phase 2 new tests: 8/8 green.
- M3 + drift_classifier regression: 32/32 green.
- Total: 40/40 green in the targeted sweep.

Razor:

- All test files ≤ 96 LOC (cap 250).
- All test functions ≤ 25 LOC (cap 40).
- cases.py 431 LOC under tests/ ruff exclusion.
- No production code changes; no schema changes; no new contracts; no new tools; no new dependencies.

CHANGELOG: [Unreleased] entry added under Added.

Plan: plan-codegenome-llm-drift-judge.md (audit PASS at META_LEDGER BicameralAI#15, chain hash 536dd15f...). Next: /qor-substantiate.

Substantiation seal for plan-codegenome-llm-drift-judge.md (Issue BicameralAI#44, audit PASS at META_LEDGER BicameralAI#15, chain hash 536dd15f...).

Verification gates (10 of 12 passed; 2 advisory skipped per capability shortfalls):

- Reality vs Promise: ✓ all 4 new + 3 modified files exist
- Test audit: 48/48 (8 new + 40 regression on M3 + drift_classifier + drift_service)
- Razor final: all files within caps (test ≤96 LOC, no new production functions, cases.py under tests/ exclusion)
- Skill file integrity: SKILL.md §2.bis structure verified
- SYSTEM_STATE.md synced
- Merkle seal computed: 567170e0f1dc008cd5663201d8b1582dbabb5904
- Step 4.6 reliability sweep: skipped (qor/reliability/ absent)
- Step 7.5 version bump: skipped (per user direction; v0.14.0 release PR is Jin's call)

Plan deviation documented:

- docs/training/README.md created (not modified) on this branch because PR BicameralAI#93 scaffolding hasn't merged to dev. Minimal mirror; merges will reconcile.

Operator QC pass (D6 BicameralAI#5) recorded as pending qualitative gate, not a CI blocker.

Chain: 16 entries; integrity VALID. Next: /qor-document.

… — audit PASS

- META_LEDGER BicameralAI#15 on this branch: GATE TRIBUNAL entry covering v1 VETO (2f31d6f) + v2 PASS (7da919c). Chain hash b2925935 extends from BicameralAI#14 Phase 4 SEAL (0ebcf69b) on dev. Note: branches feat/44 and feat/49 each carry their own Entry BicameralAI#15 chain extension off dev's BicameralAI#14; reconciliation occurs at release time.
- SHADOW_GENOME #5b: SG-PLAN-GROUNDING-DRIFT instance BicameralAI#3. Cross-references instances #1 (PR BicameralAI#93 §9) and BicameralAI#2 (feat/44 branch's Entry BicameralAI#5). Pattern is now triggered by plan author trusting mental model over filesystem `ls`. Mitigation: every plan must enumerate existing packages before proposing a new module's home.

Plan PASS at 7da919c; chain to /qor-implement.

…derer + poster

Phase 1 (pure-function renderer):

- cli/drift_report.py (242 LOC): render_drift_report(response, pr_number, head_sha, base_ref) → Markdown sticky body. HTML marker on line 1 for stateless sticky-comment lookup.
- tests/test_drift_report_renderer.py (211 LOC, 8 tests) covering HTML marker, status grouping, zero-row omission, clean state, skip state, list truncation (top 10 + "and N more"), pipe escaping in rendered fields, and idempotence.

Phase 2 (GitHub Action workflow + sticky-comment poster):

- .github/scripts/post_drift_comment.py (180 LOC): stdlib-only (urllib) GitHub API client. POST new comment if no marker found, PATCH the existing one if found. Pagination-aware comment finder.
- .github/workflows/drift-report.yml (~70 LOC): advisory (continue-on-error: true) workflow on pull_request: [main, dev], paths-filtered to source files + manifest. permissions: pull-requests: write + contents: read (minimum). pull_request, not pull_request_target, so it is fork-safe.
- cli/drift_report.py main() CLI entry: Path C graceful skip when no bicameral/decisions.yaml manifest in repo.
- tests/test_drift_report_workflow_helpers.py (67 LOC, 4 tests): comment-finder covers no-match, match, duplicate-oldest-wins, empty-list paths.

Phase 3 (integration smoke):

- tests/test_drift_report_integration.py (65 LOC, 4 tests) exercises clean.json, drifted.json, truncate.json fixtures through the renderer. Verifies sticky body shape end-to-end.
- tests/fixtures/drift_report/{clean,drifted,truncate}.json: hand-crafted LinkCommitResponse fixtures.

Validation:

- 16/16 new tests pass; 32/32 regression on drift_classifier + M3 benchmark; 48/48 total in targeted sweep.
- ruff check + format: all clean.
- mypy: no issues found in cli/drift_report.py.
- Razor: cli/drift_report.py 242 LOC (≤250); all entry funcs ≤30 LOC; all helpers ≤25 LOC; nesting ≤3; zero nested ternaries.
- Maintainer-locked design Q1=Path C honored: graceful skip when no manifest. Manifest spec deferred to follow-up issue.

CHANGELOG: [Unreleased] entry under Added.

Plan: plan-49-sticky-drift-pr-comment.md (audit PASS at META_LEDGER BicameralAI#15, chain hash b2925935). Implementation commit chains to seal in /qor-substantiate.
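The sticky-comment mechanics named in this commit message (marker on line 1, oldest duplicate wins, POST versus PATCH, top-10 truncation, pipe escaping) can be sketched as below. This is an illustration only, not the code in cli/drift_report.py or post_drift_comment.py: the marker text and every function name are hypothetical, and comments are assumed to be dicts with `body` and `created_at` keys in a uniform ISO 8601 format.

```python
from typing import List, Optional

# Hypothetical marker text; the real marker is not given in the PR.
MARKER = "<!-- bicameral-drift-report -->"


def escape_pipes(text: str) -> str:
    # Keep "|" from breaking a Markdown table cell.
    return text.replace("|", "\\|")


def truncate_items(items: List[str], limit: int = 10) -> List[str]:
    # Show the first `limit` items, then a single "and N more" line.
    shown = items[:limit]
    extra = len(items) - len(shown)
    if extra > 0:
        shown.append(f"and {extra} more")
    return shown


def find_sticky_comment(comments: List[dict]) -> Optional[dict]:
    # A comment is "ours" if its body starts with the marker line.
    ours = [c for c in comments if c.get("body", "").startswith(MARKER)]
    if not ours:
        return None
    # Duplicate-oldest-wins: same-format ISO timestamps sort lexicographically.
    return min(ours, key=lambda c: c["created_at"])


def plan_request(comments: List[dict]) -> str:
    # PATCH the existing sticky comment if present, otherwise POST a new one.
    return "PATCH" if find_sticky_comment(comments) else "POST"
```

Keeping the lookup stateless (marker in the body rather than a stored comment id) means each workflow run can rediscover the comment from the PR alone.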

Substantiation seal for plan-49-sticky-drift-pr-comment.md (Issue BicameralAI#49, audit PASS at META_LEDGER BicameralAI#15, chain hash b2925935).

Verification gates (10 of 12 passed; 2 advisory skipped per capability shortfalls):

- Reality vs Promise: ✓ all 9 new + 1 modified file exist
- Test audit: 48/48 (16 new + 32 regression on drift_classifier + M3 benchmark)
- Razor final: cli/drift_report.py 242 LOC (≤250); helpers ≤25; nesting ≤3; zero nested ternaries
- Skill file integrity: N/A (no MCP tool changes)
- SYSTEM_STATE.md synced
- Merkle seal computed: 751647b3c58a893c18221db557226af854947f33
- Step 4.6 reliability sweep: skipped (qor/reliability/ absent)
- Step 7.5 version bump: skipped (per maintainer direction; release PR is Jin's call)

Plan deviation documented:

- Integration test count grew 3 → 4 (added truncate test on truncate.json fixture). Plan-augmenting; same infrastructure.

Chain: 16 entries on this branch; integrity VALID. Next: /qor-document.

jinhongkuan merged commit a553506 into main Apr 15, 2026
1 check was pending

jinhongkuan deleted the chore/bump-v0.4.9 branch April 15, 2026 02:15

Knapp-Kevin mentioned this pull request Apr 29, 2026

feat(#44): LLM drift judge — uncertain-band sub-protocol #103

Merged

6 tasks
