v0.4.9 — tester mode + search status fix #15
Phase 2 of the v0.4.8 "Second Moment" bucket. Adds an opt-in tester
mode (BICAMERAL_TESTER_MODE=1) that makes bicameral.search and
bicameral.brief emit blocking action_hints the agent must address
before any write operation. Intended for onboarding, demos, and skill
evaluation flows where bicameral should push signal at the agent
instead of waiting for the agent to ask.
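As a rough illustration of the opt-in switch, the sketch below parses the environment variable into a boolean. Only `BICAMERAL_TESTER_MODE=1` is confirmed by this PR; the function name `tester_mode_enabled` and the extra truthy spellings are assumptions made for the example.

```python
import os

# Assumption: only "1" is confirmed by the PR text; the other spellings
# are a common convention added here for illustration.
_TRUTHY = {"1", "true", "yes", "on"}


def tester_mode_enabled(env=None):
    """Return True when BICAMERAL_TESTER_MODE is set to a truthy value."""
    env = os.environ if env is None else env
    raw = env.get("BICAMERAL_TESTER_MODE", "")
    return raw.strip().lower() in _TRUTHY
```

Passing the environment as a mapping keeps the parse testable without mutating `os.environ`.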
Four hint kinds:
- review_drift (search + brief) — drifted decisions in scope
- ground_decision (search) — ungrounded matches
- resolve_divergence (brief) — contradictory decisions on
same symbol
- answer_open_questions (brief) — open-question gaps
Each hint has blocking=True and a human-readable message. Enforcement
lives in the skill contract (bicameral-tester SKILL.md). Regular mode
(tester_mode=False, default) is byte-identical to v0.4.8 except for
the new empty action_hints=[] field.
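The hint contract described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the `ActionHint` class, the `search_action_hints` function, the match dictionary shape, and the `"ungrounded"` status value are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ActionHint:
    """Hypothetical hint shape: a kind, a message, and blocking=True."""
    kind: str
    message: str
    blocking: bool = True
    refs: list = field(default_factory=list)


def search_action_hints(matches, tester_mode=False):
    """Build search-side hints; regular mode always yields an empty list."""
    if not tester_mode:
        return []
    hints = []
    # Assumption: each match is a dict carrying "id" and intent-level "status".
    drifted = [m["id"] for m in matches if m.get("status") == "drifted"]
    ungrounded = [m["id"] for m in matches if m.get("status") == "ungrounded"]
    if drifted:
        hints.append(ActionHint(
            kind="review_drift",
            message=f"{len(drifted)} drifted decision(s) in scope; review before writing.",
            refs=drifted,
        ))
    if ungrounded:
        hints.append(ActionHint(
            kind="ground_decision",
            message=f"{len(ungrounded)} ungrounded match(es); ground before writing.",
            refs=ungrounded,
        ))
    return hints
```

Returning `[]` in regular mode mirrors the backward-compatibility claim: callers see a new field, but it is always empty unless tester mode is on.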
BUG FIX (load-bearing for Phase 2): handle_search_decisions was
reading `status` from raw_regions[0] but code_region rows don't carry
a status field — it's an intent property. Every match had been
silently reported as `pending` regardless of real state, masking
drifted decisions from callers. Now reads intent-level status from
the search_by_bm25 row. Without this fix the review_drift hint
generator couldn't fire because no match ever looked drifted to it.
Surfaced during an end-to-end drift demo on Accountable — see
thoughts/shared/plans/2026-04-14-accountable-drift-demo.md for the
full walkthrough (steps to reproduce, behavior diff, gotchas).
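The status bug can be pictured with a small before/after sketch. The row layout below (`status` on the search row, `code_regions` without a status key) and both function names are assumptions for illustration; the real `handle_search_decisions` and `search_by_bm25` shapes are not shown in this PR text.

```python
def match_status_buggy(row):
    """Old behavior (sketch): look for status on the first code region."""
    regions = row.get("code_regions", [])
    if regions:
        # Region rows carry no status, so this always falls back to "pending".
        return regions[0].get("status", "pending")
    return "pending"


def match_status_fixed(row):
    """New behavior (sketch): read the intent-level status from the search row."""
    return row.get("status", "pending")


# Hypothetical search row: the intent is drifted, the region has no status.
row = {
    "intent_id": "intent-1",
    "status": "drifted",
    "code_regions": [{"file": "billing.py", "symbol": "charge"}],
}
```

With this row, the buggy reader reports `pending` while the fixed reader reports `drifted`, which is the masking effect the commit message describes.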
Tests: 24 new cases in tests/test_v049_tester_mode.py covering every
hint kind, both generators, backward compat, and tester_mode env
parse. Full v0.4.9 regression: 146 passed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ameralAI#5 — audit PASS
- META_LEDGER BicameralAI#15: GATE TRIBUNAL entry covering both audit iterations (v1 VETO at b15c9ef, v2 PASS at d846a4a). Chain hash 536dd15f extends from BicameralAI#14 Phase 4 SEAL.
- SHADOW_GENOME BicameralAI#5: SG-PLAN-GROUNDING-DRIFT instance BicameralAI#2 catalogued. Cross-references PR BicameralAI#93 §9 as instance #1 (same root cause: CLAUDE.md asserts pilot/mcp/skills/ canonicality but dev HEAD has no pilot/ directory). Followup: docs:claude-md-cleanup workstream to fix CLAUDE.md itself.
Plan PASS at d846a4a; chain to /qor-implement.
Phase 1 — M3 benchmark judge-corpus extension:
- tests/test_m3_benchmark_judge_corpus.py (4 tests, 83 LOC)
- tests/fixtures/m3_benchmark/cases.py — expected_judge field added to all 10 uncertain cases (pure data, ground-truth labels for the operator QC pass)
Phase 2 — bicameral-sync skill rubric + training doc:
- tests/test_skill_uncertain_protocol.py (4 tests, 96 LOC)
- skills/bicameral-sync/SKILL.md §2.bis — Uncertain-band sub-protocol section: Axis 1 (compliance) FIRST, Axis 2 (cosmetic-vs-semantic) SECOND, signals advisory, evidence_refs echoed back. Maps to existing typed contracts (no new fields).
- docs/training/cosmetic-vs-semantic.md (198 LOC) — concept doc with worked example from py_12_constant_value_tuned. Pairs with the rubric.
- docs/training/README.md — index with cosmetic-vs-semantic active row. Soft-depends on PR BicameralAI#93's docs/training/ scaffolding; this branch creates a minimal version that PR BicameralAI#93 will reconcile on merge.
Validation:
- Phase 1 + Phase 2 new tests: 8/8 green.
- M3 + drift_classifier regression: 32/32 green.
- Total: 40/40 green in the targeted sweep.
Razor:
- All test files ≤ 96 LOC (cap 250).
- All test functions ≤ 25 LOC (cap 40).
- cases.py 431 LOC under tests/ ruff exclusion.
- No production code changes; no schema changes; no new contracts; no new tools; no new dependencies.
CHANGELOG: [Unreleased] entry added under Added.
Plan: plan-codegenome-llm-drift-judge.md (audit PASS at META_LEDGER BicameralAI#15, chain hash 536dd15f...).
Next: /qor-substantiate.
Substantiation seal for plan-codegenome-llm-drift-judge.md (Issue BicameralAI#44, audit PASS at META_LEDGER BicameralAI#15, chain hash 536dd15f...).
Verification gates (10 of 12 passed; 2 advisory skipped per capability shortfalls):
- Reality vs Promise: ✓ all 4 new + 3 modified files exist
- Test audit: 48/48 (8 new + 40 regression on M3 + drift_classifier + drift_service)
- Razor final: all files within caps (test ≤96 LOC, no new production functions, cases.py under tests/ exclusion)
- Skill file integrity: SKILL.md §2.bis structure verified
- SYSTEM_STATE.md synced
- Merkle seal computed: 567170e0f1dc008cd5663201d8b1582dbabb5904
- Step 4.6 reliability sweep: skipped (qor/reliability/ absent)
- Step 7.5 version bump: skipped (per user direction; v0.14.0 release PR is Jin's call)
Plan deviation documented:
- docs/training/README.md created (not modified) on this branch because PR BicameralAI#93 scaffolding hasn't merged to dev. Minimal mirror; merges will reconcile.
Operator QC pass (D6 BicameralAI#5) recorded as pending qualitative gate, not a CI blocker.
Chain: 16 entries; integrity VALID.
Next: /qor-document.
… — audit PASS
- META_LEDGER BicameralAI#15 on this branch: GATE TRIBUNAL entry covering v1 VETO (2f31d6f) + v2 PASS (7da919c). Chain hash b2925935 extends from BicameralAI#14 Phase 4 SEAL (0ebcf69b) on dev. Note: branches feat/44 and feat/49 each carry their own Entry BicameralAI#15 chain extension off dev's BicameralAI#14; reconciliation occurs at release time.
- SHADOW_GENOME #5b: SG-PLAN-GROUNDING-DRIFT instance BicameralAI#3. Cross-references instances #1 (PR BicameralAI#93 §9) and BicameralAI#2 (feat/44 branch's Entry BicameralAI#5). Pattern is now triggered by plan author trusting mental model over filesystem `ls`. Mitigation: every plan must enumerate existing packages before proposing a new module's home.
Plan PASS at 7da919c; chain to /qor-implement.
…derer + poster
Phase 1 — Pure-function renderer:
- cli/drift_report.py (242 LOC) — render_drift_report(response,
pr_number, head_sha, base_ref) → Markdown sticky body. HTML
marker on line 1 for stateless sticky-comment lookup.
- tests/test_drift_report_renderer.py (211 LOC, 8 tests) covering
HTML marker, status grouping, zero-row omission, clean state,
skip state, list truncation (top 10 + "and N more"), pipe
escaping in rendered fields, and idempotence.
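The renderer behaviors listed above (marker on line 1, zero-row omission, truncation, pipe escaping, idempotence) can be sketched as a pure function. This is an illustration under stated assumptions, not the contents of cli/drift_report.py: the marker text, the `render_sketch` name, the row keys, and the table columns are all hypothetical.

```python
MARKER = "<!-- bicameral-drift-report -->"  # hypothetical marker text
MAX_ROWS = 10  # "top 10" per the commit message


def escape_cell(text):
    """Escape pipes and flatten newlines so a value cannot break a table row."""
    return text.replace("|", "\\|").replace("\n", " ")


def render_rows(title, rows):
    """Render one status group; an empty group renders nothing."""
    if not rows:
        return []
    lines = [f"### {title} ({len(rows)})", "", "| Decision | File |", "| --- | --- |"]
    for row in rows[:MAX_ROWS]:
        lines.append(f"| {escape_cell(row['decision'])} | {escape_cell(row['file'])} |")
    extra = len(rows) - MAX_ROWS
    if extra > 0:
        lines += ["", f"and {extra} more"]
    return lines


def render_sketch(groups):
    """Pure function: the same input always yields the same Markdown body."""
    lines = [MARKER]  # marker sits on line 1 for sticky-comment lookup
    for status in sorted(groups):
        section = render_rows(status, groups[status])
        if section:
            lines.append("")
            lines.extend(section)
    if len(lines) == 1:
        lines += ["", "No drift detected."]
    return "\n".join(lines)
```

Sorting the status keys and avoiding timestamps or other ambient state is what makes the output idempotent, so re-running the workflow on an unchanged PR produces an identical comment body.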
Phase 2 — GitHub Action workflow + sticky-comment poster:
- .github/scripts/post_drift_comment.py (180 LOC) — stdlib-only
(urllib) GitHub API client. POST new comment if no marker found,
PATCH the existing one if found. Pagination-aware comment finder.
- .github/workflows/drift-report.yml (~70 LOC) — advisory
(continue-on-error: true) workflow on pull_request: [main, dev],
paths-filtered to source files + manifest. permissions:
pull-requests: write + contents: read (minimum). pull_request
not pull_request_target — fork-safe.
- cli/drift_report.py main() CLI entry: Path C graceful skip when
no bicameral/decisions.yaml manifest in repo.
- tests/test_drift_report_workflow_helpers.py (67 LOC, 4 tests):
comment-finder covers no-match, match, duplicate-oldest-wins,
empty-list paths.
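The POST-or-PATCH decision in Phase 2 can be sketched without any network calls. The function names and the comment dictionary shape below are assumptions; the two URL paths follow the public GitHub REST API for issue comments, and "oldest" is decided here by `created_at`, which is one plausible reading of duplicate-oldest-wins.

```python
def find_sticky_comment(comments, marker):
    """Return the oldest comment whose first line is the marker, else None."""
    matches = [
        c for c in comments
        if (c.get("body") or "").split("\n", 1)[0].strip() == marker
    ]
    if not matches:
        return None
    # Assumption: created_at values are same-format ISO 8601 UTC strings,
    # so plain string comparison orders them chronologically.
    return min(matches, key=lambda c: c["created_at"])


def plan_comment_request(comments, marker, repo, pr_number):
    """Decide between creating a new sticky comment and updating the old one."""
    existing = find_sticky_comment(comments, marker)
    if existing is None:
        return ("POST", f"/repos/{repo}/issues/{pr_number}/comments")
    return ("PATCH", f"/repos/{repo}/issues/comments/{existing['id']}")
```

Because the lookup depends only on the marker in the comment body, the poster needs no stored comment ID between workflow runs.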
Phase 3 — Integration smoke:
- tests/test_drift_report_integration.py (65 LOC, 4 tests)
exercises clean.json, drifted.json, truncate.json fixtures
through the renderer. Verifies sticky body shape end-to-end.
- tests/fixtures/drift_report/{clean,drifted,truncate}.json —
hand-crafted LinkCommitResponse fixtures.
Validation:
- 16/16 new tests pass; 32/32 regression on drift_classifier +
M3 benchmark; 48/48 total in targeted sweep.
- ruff check + format: all clean.
- mypy: no issues found in cli/drift_report.py.
- Razor: cli/drift_report.py 242 LOC (≤250); all entry funcs
≤30 LOC; all helpers ≤25 LOC; nesting ≤3; zero nested ternaries.
- Maintainer-locked design Q1=Path C honored: graceful skip when
no manifest. Manifest spec deferred to follow-up issue.
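The Path C graceful skip amounts to a manifest existence check before any rendering. A minimal sketch, assuming a hypothetical helper name and skip message; only the bicameral/decisions.yaml path comes from the commit message.

```python
from pathlib import Path

MANIFEST = Path("bicameral") / "decisions.yaml"  # path named in the commit message


def skip_reason(repo_root):
    """Return a skip message when the manifest is absent, else None."""
    if (Path(repo_root) / MANIFEST).is_file():
        return None
    # Hypothetical wording; the real skip-state text is not shown in this PR.
    return f"Drift report skipped: no {MANIFEST.as_posix()} manifest in this repo."
```

Returning a message instead of raising keeps the workflow advisory: a repo without a manifest gets a skip state rather than a failed check.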
CHANGELOG: [Unreleased] entry under Added.
Plan: plan-49-sticky-drift-pr-comment.md (audit PASS at META_LEDGER
BicameralAI#15, chain hash b2925935). Implementation commit chains to seal in
/qor-substantiate.
Substantiation seal for plan-49-sticky-drift-pr-comment.md (Issue BicameralAI#49, audit PASS at META_LEDGER BicameralAI#15, chain hash b2925935).
Verification gates (10 of 12 passed; 2 advisory skipped per capability shortfalls):
- Reality vs Promise: ✓ all 9 new + 1 modified file exist
- Test audit: 48/48 (16 new + 32 regression on drift_classifier + M3 benchmark)
- Razor final: cli/drift_report.py 242 LOC (≤250); helpers ≤25; nesting ≤3; zero nested ternaries
- Skill file integrity: N/A (no MCP tool changes)
- SYSTEM_STATE.md synced
- Merkle seal computed: 751647b3c58a893c18221db557226af854947f33
- Step 4.6 reliability sweep: skipped (qor/reliability/ absent)
- Step 7.5 version bump: skipped (per maintainer direction; release PR is Jin's call)
Plan deviation documented:
- Integration test count grew 3 → 4 (added truncate test on truncate.json fixture). Plan-augmenting; same infrastructure.
Chain: 16 entries on this branch; integrity VALID.
Next: /qor-document.
Summary
Phase 2 of the v0.4.8 "Second Moment" bucket. Adds an opt-in tester mode (`BICAMERAL_TESTER_MODE=1`) that makes `bicameral.search` and `bicameral.brief` emit blocking `action_hints` the agent must address before any write operation. For onboarding, demos, and skill evaluation flows.
Four hint kinds:
- `review_drift` (search + brief) — drifted decisions in scope
- `ground_decision` (search) — ungrounded matches
- `resolve_divergence` (brief) — contradictory decisions on same symbol
- `answer_open_questions` (brief) — open-question gaps
Regular mode (`tester_mode=False`, default) is byte-identical to v0.4.8 except for the new empty `action_hints=[]` field.
Bug fix (load-bearing): `handle_search_decisions` was reading `status` from `raw_regions[0]` but `code_region` rows don't carry a status field — it's an intent property. Every match had been silently reported as `pending` regardless of real state, masking drifted decisions from callers. Now reads intent-level status from the `search_by_bm25` row.
Surfaced during an end-to-end drift demo on Accountable — see the full walkthrough in the parent repo: `thoughts/shared/plans/2026-04-14-accountable-drift-demo.md`.
Test plan
- `tests/test_v049_tester_mode.py` covering every hint kind (fires / doesn't fire), both generators, backward compat, tester_mode env parse
- `tester_mode=OFF` returns empty hints; `tester_mode=ON` fires `review_drift` with refs to the edited file, `brief` fires same hint via its own generator
- `bicameral-search`, `bicameral-brief`, new top-level `bicameral-tester`
🤖 Generated with Claude Code
tests/test_v049_tester_mode.pycovering every hint kind (fires / doesn't fire), both generators, backward compat, tester_mode env parsetester_mode=OFFreturns empty hints;tester_mode=ONfiresreview_driftwith refs to the edited file,brieffires same hint via its own generatorbicameral-search,bicameral-brief, new top-levelbicameral-tester🤖 Generated with Claude Code