v0.4.9 — tester mode + search status fix #15

Merged: jinhongkuan merged 1 commit into main from chore/bump-v0.4.9 on Apr 15, 2026

Conversation

@jinhongkuan

Summary

Phase 2 of the v0.4.8 "Second Moment" bucket. Adds an opt-in tester mode (BICAMERAL_TESTER_MODE=1) that makes bicameral.search and bicameral.brief emit blocking action_hints that the agent must address before any write operation. Intended for onboarding, demos, and skill evaluation flows, where bicameral should push signal at the agent instead of waiting for the agent to ask.

Four hint kinds:

  • review_drift (search + brief) — drifted decisions in scope
  • ground_decision (search) — ungrounded matches
  • resolve_divergence (brief) — contradictory decisions on same symbol
  • answer_open_questions (brief) — open-question gaps

Regular mode (tester_mode=False, default) is byte-identical to v0.4.8 except for the new empty action_hints=[] field.
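A minimal sketch of how the tester-mode gate and a search-side hint generator could fit together. Only the env var name, the hint kinds, and the blocking flag come from this PR; the `ActionHint` field layout, the helper names, the `"ungrounded"` status value, and the match-row shape are assumptions made for illustration, not the definitions in contracts.py or handlers/action_hints.py.

```python
import os
from dataclasses import dataclass, field


@dataclass
class ActionHint:
    # Hypothetical shape: the PR only states that hints carry a kind,
    # blocking=True, and a human-readable message.
    kind: str
    message: str
    blocking: bool = True
    refs: list = field(default_factory=list)


def tester_mode_enabled(env=None):
    """Return True only when BICAMERAL_TESTER_MODE is exactly "1".

    Assumption: the real parser may accept other truthy spellings.
    """
    env = os.environ if env is None else env
    return env.get("BICAMERAL_TESTER_MODE", "").strip() == "1"


def search_hints(matches, tester_mode):
    """Build search-side hints; regular mode always yields an empty list."""
    if not tester_mode:
        return []
    hints = []
    drifted = [m for m in matches if m.get("status") == "drifted"]
    if drifted:
        hints.append(ActionHint(
            kind="review_drift",
            message=f"{len(drifted)} drifted decision(s) in scope; review before writing",
            refs=[m["file"] for m in drifted if "file" in m],
        ))
    # "ungrounded" as a status value is an assumption for this sketch.
    ungrounded = [m for m in matches if m.get("status") == "ungrounded"]
    if ungrounded:
        hints.append(ActionHint(
            kind="ground_decision",
            message=f"{len(ungrounded)} ungrounded match(es); ground them before writing",
        ))
    return hints
```

With tester mode off the function returns `[]`, which matches the stated backward-compat behavior of an empty action_hints field.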

Bug fix (load-bearing): handle_search_decisions was reading status from raw_regions[0] but code_region rows don't carry a status field — it's an intent property. Every match had been silently reported as pending regardless of real state, masking drifted decisions from callers. Now reads intent-level status from the search_by_bm25 row.
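The failure mode can be illustrated with a small sketch. The row layout below (an intent-level `status` next to a `raw_regions` list of code_region dicts) and both function names are assumptions; the actual handler code is not shown in this PR description.

```python
def status_before_fix(row):
    """Old behavior: look for status on the first code_region row.

    code_region rows carry no status key, so the default always wins.
    """
    regions = row.get("raw_regions") or [{}]
    return regions[0].get("status", "pending")


def status_after_fix(row):
    """New behavior: read the intent-level status from the search row itself."""
    return row.get("status", "pending")


# Hypothetical search_by_bm25 row for a decision whose code has drifted.
row = {
    "intent_id": "intent-1",
    "status": "drifted",
    "raw_regions": [{"file": "ledger.py", "symbol": "post_entry"}],
}

print(status_before_fix(row))  # the drift is masked
print(status_after_fix(row))   # the drift is visible to callers
```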

Surfaced during an end-to-end drift demo on Accountable — see the full walkthrough in the parent repo: thoughts/shared/plans/2026-04-14-accountable-drift-demo.md.

Test plan

  • 24 new cases in tests/test_v049_tester_mode.py covering every hint kind (fires / doesn't fire), both generators, backward compat, tester_mode env parse
  • Full v0.4.9 regression: 146 passed in 12s
  • Manual: verified on Accountable ledger — tester_mode=OFF returns empty hints; tester_mode=ON fires review_drift with refs to the edited file, brief fires same hint via its own generator
  • Skill contracts updated: bicameral-search, bicameral-brief, new top-level bicameral-tester

🤖 Generated with Claude Code

Phase 2 of the v0.4.8 "Second Moment" bucket. Adds an opt-in tester
mode (BICAMERAL_TESTER_MODE=1) that makes bicameral.search and
bicameral.brief emit blocking action_hints the agent must address
before any write operation. For onboarding, demos, and skill
evaluation flows where you want bicameral to push signal at the
agent instead of waiting for the agent to ask.

Four hint kinds:
  - review_drift        (search + brief) — drifted decisions in scope
  - ground_decision     (search)         — ungrounded matches
  - resolve_divergence  (brief)          — contradictory decisions on
                                            same symbol
  - answer_open_questions (brief)        — open-question gaps

Each hint has blocking=True and a human-readable message. Enforcement
lives in the skill contract (bicameral-tester SKILL.md). Regular mode
(tester_mode=False, default) is byte-identical to v0.4.8 except for
the new empty action_hints=[] field.

BUG FIX (load-bearing for Phase 2): handle_search_decisions was
reading `status` from raw_regions[0] but code_region rows don't carry
a status field — it's an intent property. Every match had been
silently reported as `pending` regardless of real state, masking
drifted decisions from callers. Now reads intent-level status from
the search_by_bm25 row. Without this fix the review_drift hint
generator couldn't fire because no match ever looked drifted to it.

Surfaced during an end-to-end drift demo on Accountable — see
thoughts/shared/plans/2026-04-14-accountable-drift-demo.md for the
full walkthrough (steps to reproduce, behavior diff, gotchas).

Tests: 24 new cases in tests/test_v049_tester_mode.py covering every
hint kind, both generators, backward compat, and tester_mode env
parse. Full v0.4.9 regression: 146 passed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jinhongkuan merged commit a553506 into main on Apr 15, 2026
1 check was pending
@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Warning

Rate limit exceeded

@jinhongkuan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 39 minutes and 39 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 39 minutes and 39 seconds.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 637ba6f1-1214-41b7-a478-500100d3ef31

📥 Commits

Reviewing files that changed from the base of the PR and between 71b0c8a and 897fec0.

📒 Files selected for processing (11)
  • CHANGELOG.md
  • context.py
  • contracts.py
  • handlers/action_hints.py
  • handlers/brief.py
  • handlers/search_decisions.py
  • pyproject.toml
  • skills/bicameral-brief/SKILL.md
  • skills/bicameral-search/SKILL.md
  • skills/bicameral-tester/SKILL.md
  • tests/test_v049_tester_mode.py

@jinhongkuan deleted the chore/bump-v0.4.9 branch on April 15, 2026 at 02:15
Knapp-Kevin added a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request Apr 29, 2026
…ameralAI#5 — audit PASS

- META_LEDGER BicameralAI#15: GATE TRIBUNAL entry covering both audit
  iterations (v1 VETO at b15c9ef, v2 PASS at d846a4a). Chain
  hash 536dd15f extends from BicameralAI#14 Phase 4 SEAL.
- SHADOW_GENOME BicameralAI#5: SG-PLAN-GROUNDING-DRIFT instance BicameralAI#2 catalogued.
  Cross-references PR BicameralAI#93 §9 as instance #1 (same root cause:
  CLAUDE.md asserts pilot/mcp/skills/ canonicality but dev HEAD
  has no pilot/ directory). Followup: docs:claude-md-cleanup
  workstream to fix CLAUDE.md itself.

Plan PASS at d846a4a; chain to /qor-implement.
Knapp-Kevin added a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request Apr 29, 2026
Phase 1 — M3 benchmark judge-corpus extension:
- tests/test_m3_benchmark_judge_corpus.py (4 tests, 83 LOC)
- tests/fixtures/m3_benchmark/cases.py — expected_judge field added
  to all 10 uncertain cases (pure data, ground-truth labels for
  the operator QC pass)

Phase 2 — bicameral-sync skill rubric + training doc:
- tests/test_skill_uncertain_protocol.py (4 tests, 96 LOC)
- skills/bicameral-sync/SKILL.md §2.bis — Uncertain-band
  sub-protocol section: Axis 1 (compliance) FIRST, Axis 2
  (cosmetic-vs-semantic) SECOND, signals advisory, evidence_refs
  echoed back. Maps to existing typed contracts (no new fields).
- docs/training/cosmetic-vs-semantic.md (198 LOC) — concept doc
  with worked example from py_12_constant_value_tuned. Pairs
  with the rubric.
- docs/training/README.md — index with cosmetic-vs-semantic
  active row. Soft-depends on PR BicameralAI#93's docs/training/ scaffolding;
  this branch creates a minimal version that PR BicameralAI#93 will reconcile
  on merge.

Validation:
- Phase 1 + Phase 2 new tests: 8/8 green.
- M3 + drift_classifier regression: 32/32 green.
- Total: 40/40 green in the targeted sweep.

Razor:
- All test files ≤ 96 LOC (cap 250).
- All test functions ≤ 25 LOC (cap 40).
- cases.py 431 LOC under tests/ ruff exclusion.
- No production code changes; no schema changes; no new contracts;
  no new tools; no new dependencies.

CHANGELOG: [Unreleased] entry added under Added.

Plan: plan-codegenome-llm-drift-judge.md (audit PASS at META_LEDGER
BicameralAI#15, chain hash 536dd15f...). Next: /qor-substantiate.
Knapp-Kevin added a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request Apr 29, 2026
Substantiation seal for plan-codegenome-llm-drift-judge.md (Issue
BicameralAI#44, audit PASS at META_LEDGER BicameralAI#15, chain hash 536dd15f...).

Verification gates (10 of 12 passed; 2 advisory skipped per
capability shortfalls):
- Reality vs Promise: ✓ all 4 new + 3 modified files exist
- Test audit: 48/48 (8 new + 40 regression on M3 + drift_classifier
  + drift_service)
- Razor final: all files within caps (test ≤96 LOC, no new
  production functions, cases.py under tests/ exclusion)
- Skill file integrity: SKILL.md §2.bis structure verified
- SYSTEM_STATE.md synced
- Merkle seal computed: 567170e0f1dc008cd5663201d8b1582dbabb5904
- Step 4.6 reliability sweep: skipped (qor/reliability/ absent)
- Step 7.5 version bump: skipped (per user direction; v0.14.0
  release PR is Jin's call)

Plan deviation documented:
- docs/training/README.md created (not modified) on this branch
  because PR BicameralAI#93 scaffolding hasn't merged to dev. Minimal mirror;
  merges will reconcile.

Operator QC pass (D6 BicameralAI#5) recorded as pending qualitative gate, not
a CI blocker.

Chain: 16 entries; integrity VALID. Next: /qor-document.
Knapp-Kevin added a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request Apr 29, 2026
… — audit PASS

- META_LEDGER BicameralAI#15 on this branch: GATE TRIBUNAL entry covering
  v1 VETO (2f31d6f) + v2 PASS (7da919c). Chain hash b2925935
  extends from BicameralAI#14 Phase 4 SEAL (0ebcf69b) on dev.
  Note: branches feat/44 and feat/49 each carry their own
  Entry BicameralAI#15 chain extension off dev's BicameralAI#14; reconciliation
  occurs at release time.
- SHADOW_GENOME #5b: SG-PLAN-GROUNDING-DRIFT instance BicameralAI#3.
  Cross-references instances #1 (PR BicameralAI#93 §9) and BicameralAI#2 (feat/44
  branch's Entry BicameralAI#5). Pattern is now triggered by plan author
  trusting mental model over filesystem `ls`. Mitigation:
  every plan must enumerate existing packages before proposing
  a new module's home.

Plan PASS at 7da919c; chain to /qor-implement.
Knapp-Kevin added a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request Apr 29, 2026
…derer + poster

Phase 1 — Pure-function renderer:
- cli/drift_report.py (242 LOC) — render_drift_report(response,
  pr_number, head_sha, base_ref) → Markdown sticky body. HTML
  marker on line 1 for stateless sticky-comment lookup.
- tests/test_drift_report_renderer.py (211 LOC, 8 tests) covering
  HTML marker, status grouping, zero-row omission, clean state,
  skip state, list truncation (top 10 + "and N more"), pipe
  escaping in rendered fields, and idempotence.
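The truncation and pipe-escaping behaviors listed above can be sketched as follows. The function names, the single-column row layout, and the exact wording of the overflow line are assumptions; cli/drift_report.py itself is not shown here.

```python
def escape_cell(text):
    """Escape pipes so a field cannot break a Markdown table row."""
    return text.replace("|", "\\|")


def render_rows(items, limit=10):
    """Render at most `limit` rows, then one summary line for the remainder."""
    lines = [f"| {escape_cell(item)} |" for item in items[:limit]]
    extra = len(items) - limit
    if extra > 0:
        lines.append(f"and {extra} more")
    return lines
```

Because the output depends only on the inputs, rendering the same list twice gives the same lines, which is the property an idempotence test would check.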

Phase 2 — GitHub Action workflow + sticky-comment poster:
- .github/scripts/post_drift_comment.py (180 LOC) — stdlib-only
  (urllib) GitHub API client. POST new comment if no marker found,
  PATCH the existing one if found. Pagination-aware comment finder.
- .github/workflows/drift-report.yml (~70 LOC) — advisory
  (continue-on-error: true) workflow on pull_request: [main, dev],
  paths-filtered to source files + manifest. permissions:
  pull-requests: write + contents: read (minimum). pull_request
  not pull_request_target — fork-safe.
- cli/drift_report.py main() CLI entry: Path C graceful skip when
  no bicameral/decisions.yaml manifest in repo.
- tests/test_drift_report_workflow_helpers.py (67 LOC, 4 tests):
  comment-finder covers no-match, match, duplicate-oldest-wins,
  empty-list paths.
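A sketch of the create-or-update decision described above, kept as pure functions so it runs without network access. The marker string and function names are hypothetical; the `id`, `body`, and `created_at` keys follow the shape of GitHub issue-comment objects, and `created_at` is assumed to be an ISO 8601 string so that string comparison orders comments by age.

```python
# Hypothetical marker; the real one lives in cli/drift_report.py.
MARKER = "<!-- bicameral-drift-report -->"


def find_sticky_comment(comments, marker=MARKER):
    """Return the oldest comment whose body starts with the marker, or None."""
    matches = [c for c in comments if (c.get("body") or "").startswith(marker)]
    if not matches:
        return None
    # Duplicate-oldest-wins: ISO 8601 timestamps sort chronologically as strings.
    return min(matches, key=lambda c: c["created_at"])


def plan_request(comments, marker=MARKER):
    """Decide between creating a new comment and updating the existing one."""
    existing = find_sticky_comment(comments, marker)
    if existing is None:
        return ("POST", None)
    return ("PATCH", existing["id"])
```

Separating the decision from the HTTP call is one way to cover the no-match, match, duplicate, and empty-list paths in unit tests without mocking urllib.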

Phase 3 — Integration smoke:
- tests/test_drift_report_integration.py (65 LOC, 4 tests)
  exercises clean.json, drifted.json, truncate.json fixtures
  through the renderer. Verifies sticky body shape end-to-end.
- tests/fixtures/drift_report/{clean,drifted,truncate}.json —
  hand-crafted LinkCommitResponse fixtures.

Validation:
- 16/16 new tests pass; 32/32 regression on drift_classifier +
  M3 benchmark; 48/48 total in targeted sweep.
- ruff check + format: all clean.
- mypy: no issues found in cli/drift_report.py.
- Razor: cli/drift_report.py 242 LOC (≤250); all entry funcs
  ≤30 LOC; all helpers ≤25 LOC; nesting ≤3; zero nested ternaries.
- Maintainer-locked design Q1=Path C honored: graceful skip when
  no manifest. Manifest spec deferred to follow-up issue.

CHANGELOG: [Unreleased] entry under Added.

Plan: plan-49-sticky-drift-pr-comment.md (audit PASS at META_LEDGER
BicameralAI#15, chain hash b2925935). Implementation commit chains to seal in
/qor-substantiate.
Knapp-Kevin added a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request Apr 29, 2026
Substantiation seal for plan-49-sticky-drift-pr-comment.md (Issue BicameralAI#49,
audit PASS at META_LEDGER BicameralAI#15, chain hash b2925935).

Verification gates (10 of 12 passed; 2 advisory skipped per
capability shortfalls):
- Reality vs Promise: ✓ all 9 new + 1 modified file exist
- Test audit: 48/48 (16 new + 32 regression on drift_classifier
  + M3 benchmark)
- Razor final: cli/drift_report.py 242 LOC (≤250); helpers ≤25;
  nesting ≤3; zero nested ternaries
- Skill file integrity: N/A (no MCP tool changes)
- SYSTEM_STATE.md synced
- Merkle seal computed: 751647b3c58a893c18221db557226af854947f33
- Step 4.6 reliability sweep: skipped (qor/reliability/ absent)
- Step 7.5 version bump: skipped (per maintainer direction;
  release PR is Jin's call)

Plan deviation documented:
- Integration test count grew 3 → 4 (added truncate test on
  truncate.json fixture). Plan-augmenting; same infrastructure.

Chain: 16 entries on this branch; integrity VALID. Next:
/qor-document.