Flow 4 path-X-(b) ledger validation + SessionEnd hook drift fix (#147) by Knapp-Kevin · Pull Request #152 · BicameralAI/bicameral-mcp

Knapp-Kevin · 2026-05-02T04:25:06Z

Summary

Two clean commits on top of dev, scoped to research-brief recommendations P0 #2 and P1 #3:

3baeedb fix(hooks): SessionEnd hook drift — re-entrancy guard + --auto-ingest (#147): restores the BICAMERAL_SESSION_END_RUNNING re-entrancy guard and the --auto-ingest flag in both .claude/settings.json and setup_wizard._BICAMERAL_SESSION_END_COMMAND. Both were missing relative to the canonical prescription in skills/bicameral-capture-corrections/SKILL.md:207. Without the guard, the spawned subprocess fires its own SessionEnd hook on exit and recurses indefinitely.
31a40ca test(e2e): add Flow 4 path-X-(b) ledger validation (#147): adds _validate_flow4_via_ledger() to the e2e harness, invoked once after _validate_flow3_via_ledger in main() when dev_session ran. It snapshots the ledger after the harness completes and counts decisions with source_type='agent_session'. If the asserter reports FAIL but the ledger contains agent_session decisions, the verdict is upgraded to PASS with an explicit annotation. This decouples the check from agent caprice: it validates the product outcome rather than the mechanism.
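To make the re-entrancy guard concrete, the hook's decision can be modeled in Python. This is a sketch under stated assumptions, not the canonical SKILL.md:207 command: it assumes the hook does nothing when BICAMERAL_SESSION_END_RUNNING is non-empty or when the repo has no `.bicameral/` directory, and `should_fire_session_end` / `child_env` are hypothetical names.

```python
from typing import Mapping

# Env var named in this PR; the literal hook command from SKILL.md:207 is not reproduced here.
GUARD_VAR = "BICAMERAL_SESSION_END_RUNNING"


def should_fire_session_end(env: Mapping[str, str], has_bicameral_dir: bool) -> bool:
    """Hypothetical model of the SessionEnd hook's two short-circuits.

    A non-empty guard variable means we are already inside a spawned
    capture subprocess; a missing .bicameral/ directory means the repo
    was never bootstrapped, so there is nothing to capture into.
    """
    if env.get(GUARD_VAR):
        return False  # nested SessionEnd: the parent hook already set the guard
    return has_bicameral_dir


def child_env(parent_env: Mapping[str, str]) -> dict[str, str]:
    """Environment handed to the spawned `claude -p ... --auto-ingest` process."""
    env = dict(parent_env)
    env[GUARD_VAR] = "1"  # inherited, so the child's own SessionEnd hook short-circuits
    return env
```

If `child_env` did not set the variable, the child's own SessionEnd hook would evaluate to True again on exit, which is the unbounded recursion the commit fixes.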

Closes

Closes #147 (feat(skill): session-end auto-capture of uningested decisions — research + observable validation). Both acceptance criteria are satisfied: an automated signal now backs "the system captures the decisions you forgot to ingest" without naming a tool in the prompt.

Rebase note (2026-05-02)

This branch was previously stacked on top of #151 (which has been closed in favor of #155). The branch has been rebased onto dev and the bundled commits dropped:

3f856af chore(governance): v0 process cleanup: separate governance PR (META_LEDGER conflicts with parallel cleanup on dev).
f4de501 fix(skill): preflight auto-fire #146: shipping via #155 (fix(skill): preflight auto-fire on natural refactor prompts, replaces #151) instead.
1864196 docs(governance): SHADOW_GENOME H1-H4 entry + #147 plan/research/seal: separate governance PR (META_LEDGER + planning artifacts).

The original 1f54f1a was also re-applied surgically: it had been authored against an older base where tests/e2e/run_e2e_flows.py did not yet exist, so the original commit appeared to "create" 1267 lines of harness scaffold that dev now owns. The rebased version (31a40ca) ports only the genuinely new logic — the _validate_flow4_via_ledger function, its helper, the unit test file, and the call site — on top of dev's existing harness.

PR is now ready for rebase merge (linear history, no merge commits, no conflicts).

Test plan

Local validation:

pytest tests/test_flow4_ledger_validation.py tests/test_session_end_hook_drift.py — 10/10 PASS in 0.19s
python -m json.tool .claude/settings.json — valid
ruff format --check . clean (207 files)
ruff check . clean

Authoritative integration test (runs in CI on dev):

tests/e2e/run_e2e_flows.py: the Flow 4 ledger-validation upgrade path executes end-to-end. Once #155 (fix(skill): preflight auto-fire on natural refactor prompts, replaces #151) lands first, the path-A in-stream signal becomes more reliable; this PR's path-X-(b) is the orthogonal post-hoc check that catches the SessionEnd subprocess effect regardless.

Out of scope (deferred to separate PRs)

v0 process cleanup governance commit (3f856af from old branch).
Auto-fire UserPromptSubmit hook: see #155 (fix(skill): preflight auto-fire on natural refactor prompts, replaces #151).
SHADOW_GENOME H1-H4 hypotheses entry and the #147 META_LEDGER seal: governance PR.

Related

#146 fix(skill): preflight does not auto-fire on natural refactor prompts in headless Claude Code sessions: closed by #155 (auto-fire fix).
#154 [P0] Preflight skill does not instruct agent to capture refinements when user prompt contradicts surfaced decisions: P0 skill-layer gap. Preflight surfaces decisions but doesn't instruct the agent to capture refinements when the user prompt contradicts a surfaced decision. This PR's path-X-(b) ledger validation is the orthogonal way to capture the SessionEnd subprocess effect when the in-stream agentic-layer signal is absent.

🤖 Generated with Claude Code

coderabbitai · 2026-05-02T04:25:13Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.


⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 648f190a-50f8-4387-9119-1053d3fb6659


devin-ai-integration

Devin Review found 1 potential issue.

⚠️ 1 issue in files not directly in the diff

⚠️ CLAUDE.md Auto-Tick Rule violation: TODO.md item not ticked after stale skill deletion (`TODO.md:169`)

The CLAUDE.md Auto-Tick Rule mandates: "After completing any implementation work in this directory: 1. Open TODO.md — tick every item that is now done under Engineering Progress." This PR deletes all stale .claude/skills/bicameral-*/SKILL.md duplicates, which completes TODO.md line 169: `- [ ] Delete stale .claude/skills/bicameral-*/SKILL.md duplicates that have canonical counterparts`. The canonical skills at skills/bicameral-*/ are verified present and untouched, but the TODO item was not ticked.

View 6 additional findings in Devin Review.

…ow 4

Flow 4's gap was incorrectly described as "the same skill-layer gap as Flow 2a" pointing at #154. That's wrong: #154 covers the contradiction-with-prior-decision case (preflight surfaces decision X, the user's prompt contradicts X → the skill should ingest a refinement + resolve_collision). Flow 4 is the *emerging-constraint* case: the user states a new load-bearing constraint mid-session via correction markers ("wait", "shouldn't"), and capture-corrections handles it without any collision-detection logic at all.

Updated Flow 4's advisory to reflect this:
- Removed the #154 reference (collision detection isn't in scope for Flow 4)
- Pointed at #147 / PR #152 instead, which fixes the path-X-(b) substrate (.bicameral/ bootstrap + --mcp-config passthrough) that the harness needs for Flow 4's SessionEnd subprocess to actually fire
- Noted the advisory becomes obsolete once #152 lands on dev

Flow 2a's #154 references in `assert_flow_2a`'s docstring and FlowSpec advisory are unchanged; those correctly describe the contradiction-with-prior-decision case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…#147)

Closes research brief recommendation P1 #3. The installed SessionEnd hook in .claude/settings.json and the source-of-truth constant in setup_wizard.py both omitted the canonical guard prescribed by skills/bicameral-capture-corrections/SKILL.md:207. Two missing pieces, now restored byte-exact:

1. BICAMERAL_SESSION_END_RUNNING env-var guard. Without it, the spawned `claude -p` subprocess fires its OWN SessionEnd hook on exit, recursing indefinitely (bounded only by Claude Code's per-session subprocess depth limit, if any, or by filesystem/process exhaustion). The guard env var is inherited by the subprocess, so its nested SessionEnd hook short-circuits.
2. `--auto-ingest` flag. In batch mode, the capture-corrections skill reads this flag to scan the full session transcript and ingest mechanical corrections directly, without surfacing prompts. Without it, the subprocess would default to interactive-mode behavior and produce prompts no one will answer (the parent session is closing).

Files modified:
- .claude/settings.json: SessionEnd hook command replaced with the canonical one
- setup_wizard.py:343-347: _BICAMERAL_SESSION_END_COMMAND constant updated to canonical (drives fresh installs via _install_claude_hooks)

Tests:
- tests/test_session_end_hook_drift.py: 3 functionality tests
  - parses .claude/settings.json and asserts substring presence of the re-entrancy guard tokens and the --auto-ingest flag
  - imports setup_wizard and asserts a byte-exact match against the canonical SKILL.md prescription

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
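The substring-presence check on .claude/settings.json might look roughly like this. A minimal sketch, assuming Claude Code's hooks layout in settings.json (`hooks.SessionEnd[].hooks[].command`); `session_end_commands` and `missing_tokens` are hypothetical names and are not the contents of tests/test_session_end_hook_drift.py.

```python
import json

# Tokens the drift test is described as checking for (substring presence only).
REQUIRED_TOKENS = ("BICAMERAL_SESSION_END_RUNNING", "--auto-ingest")


def session_end_commands(settings: dict) -> list[str]:
    """Collect every command string registered under hooks.SessionEnd.

    Assumed layout:
    {"hooks": {"SessionEnd": [{"hooks": [{"type": "command", "command": "..."}]}]}}
    """
    commands = []
    for group in settings.get("hooks", {}).get("SessionEnd", []):
        for hook in group.get("hooks", []):
            if hook.get("type") == "command":
                commands.append(hook.get("command", ""))
    return commands


def missing_tokens(settings_json: str) -> list[str]:
    """Return the required tokens that appear in no SessionEnd command."""
    commands = session_end_commands(json.loads(settings_json))
    return [tok for tok in REQUIRED_TOKENS if not any(tok in cmd for cmd in commands)]
```

In a test this would be called as `missing_tokens(Path(".claude/settings.json").read_text())` and compared against `[]`. A substring check tolerates harmless formatting differences in the installed file, while the separate byte-exact test pins the setup_wizard constant to SKILL.md.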

Cherry-picked from 1f54f1a, scope-narrowed to the surgical contribution. The original commit was authored against an older base where the e2e harness scaffold did not yet exist; this rebased version adds only the new logic on top of dev's existing harness.

What this commit adds:
- `tests/e2e/_ledger_helpers.py`: pure helper `count_agent_session_decisions(snapshot)`, extracted so unit tests can import it without triggering the harness's top-level env-var / CLI guards.
- `tests/e2e/run_e2e_flows.py`:
  - `_count_agent_session_decisions(snapshot)`: thin wrapper around the helper that hides the import inside the harness.
  - `_validate_flow4_via_ledger()`: path-X-(b) post-hoc ledger query. Snapshots the ledger after the harness completes and counts decisions with `source_type='agent_session'`. Asserter FAIL + ledger has agent_session → UPGRADE to PASS with explicit annotation. Ledger error → INCONCLUSIVE (verdict unchanged). All five behavior-matrix cases are documented in the docstring.
  - Invocation site: called once after `_validate_flow3_via_ledger` in `main()`, only when `dev_session` ran.
- `tests/test_flow4_ledger_validation.py`: five unit tests against the helper covering zero rows, error snapshot (None), agent_session presence, mixed source types, and an empty decisions list.

Why this is decoupled from agent caprice: in-stream Flow 4 evidence requires the agent to invoke `bicameral.preflight` and walk Step 3.5 to trigger capture-corrections. Path-X-(b) validates the *product outcome* (decisions written with the canonical source_type) rather than the *mechanism* (which tool the agent chose). As a result, a SessionEnd subprocess effect that lands in the ledger after the parent stream-json closes still upgrades the verdict, even when the in-stream signal is absent.

Closes research-brief recommendation P0 #2.

Note: this commit replaces the original 1f54f1a SHA on the branch via rebase. Governance/META_LEDGER edits and the planning artifacts that were bundled with the original have been dropped here and will land via a separate governance PR. The auto-fire UserPromptSubmit hook (#146 fix) that was also bundled is shipping via #155.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
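The counting helper and the verdict upgrade can be sketched as follows. The snapshot shape (`{"decisions": [{"source_type": ...}]}`), the use of None for a failed ledger query, and the `reconcile_flow4` function are assumptions for illustration; the real `count_agent_session_decisions` and `_validate_flow4_via_ledger` may differ, and the five-case matrix in the harness docstring is not reproduced here.

```python
from typing import Optional


def count_agent_session_decisions(snapshot: Optional[dict]) -> Optional[int]:
    """Count decisions whose source_type is 'agent_session'.

    Hypothetical snapshot shape: {"decisions": [{"source_type": ...}, ...]}.
    Returns None when the snapshot itself is None (ledger query failed).
    """
    if snapshot is None:
        return None
    decisions = snapshot.get("decisions") or []  # tolerate a missing or empty list
    return sum(1 for d in decisions if d.get("source_type") == "agent_session")


def reconcile_flow4(asserter_passed: bool, agent_session_count: Optional[int]) -> tuple[str, str]:
    """Combine the in-stream asserter verdict with the post-hoc ledger count."""
    if agent_session_count is None:
        # Ledger error: mark INCONCLUSIVE and leave the asserter's verdict unchanged.
        return ("PASS" if asserter_passed else "FAIL"), "ledger INCONCLUSIVE"
    if asserter_passed:
        return "PASS", ""
    if agent_session_count > 0:
        # Asserter FAIL, but the product outcome is in the ledger: upgrade with annotation.
        return "PASS", f"upgraded via ledger: {agent_session_count} agent_session decision(s)"
    return "FAIL", "no agent_session decisions in ledger"
```

Keeping the count free of harness imports is what makes it unit-testable in isolation, which is the reason the commit gives for extracting `tests/e2e/_ledger_helpers.py`.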

…bprocess (#147)

Without this, Flow 4's path-X-(b) ledger validation has nothing to observe in CI: the SessionEnd hook short-circuits on `[ -d .bicameral ]` because /tmp/desktop-clone has no .bicameral/ subdirectory, so the spawned `claude -p '/bicameral:capture-corrections --auto-ingest'` subprocess never runs.

Two changes to the harness, both reusing setup_wizard helpers (no drift between the harness's path and an end-user install):

1. `_bootstrap_bicameral_dir()`: wipes and recreates .bicameral/ inside DESKTOP_REPO_PATH at run start, calling `setup_wizard._write_collaboration_config(mode='solo', ...)` to write a minimal config.yaml. Wired into main() right after the existing ledger + repo resets.
2. `_materialize_settings_with_hook()` now builds the SessionEnd hook command via `setup_wizard._build_session_end_command(mcp_config_path=MCP_CONFIG_PATH)` instead of the bare canonical constant. The parameterized form appends `--mcp-config <materialized.json> --strict-mcp-config` after the prompt, so the spawned subprocess writes its `source=agent_session` decisions into the harness's test ledger (test-results/e2e/ledger.db), the same ledger `_validate_flow4_via_ledger` queries, instead of the user's default ~/.bicameral/ledger.db.

Production end-user installs are unchanged: `_install_claude_hooks` still writes the no-args canonical command (verified by the existing test_setup_wizard_renders_canonical_session_end_hook).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
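A hedged sketch of what `setup_wizard._build_session_end_command` is described as doing: with no path it returns the canonical command unchanged, and with a path it appends the two MCP flags after the prompt. `build_session_end_command` below is a hypothetical stand-in; it assumes the canonical command ends with the `claude -p '<prompt>'` invocation, and quoting the path with `shlex.quote` is an assumption made here so that paths containing spaces survive the shell.

```python
import shlex
from typing import Optional


def build_session_end_command(canonical: str, mcp_config_path: Optional[str] = None) -> str:
    """Hypothetical analogue of the parameterized SessionEnd command builder."""
    if mcp_config_path is None:
        # End-user installs: the no-args canonical command, byte-for-byte.
        return canonical
    # Harness runs: point the spawned subprocess at the materialized MCP config
    # and ignore other MCP configurations, so decisions land in the test ledger.
    return f"{canonical} --mcp-config {shlex.quote(mcp_config_path)} --strict-mcp-config"
```

Returning the canonical string untouched when no path is given is what keeps the end-user install path identical to the SKILL.md prescription, so the existing byte-exact test continues to apply.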

Two corrections to Flow 4's advisory text:

1. Drop the "#154" reference. #154 is Flow 2a-specific: it covers the contradiction-with-prior-decision case, where the agent must call resolve_collision after ingesting a refinement. Flow 4 is the emerging-constraint case (correction markers "wait", "shouldn't"), which capture-corrections handles without any collision-detection logic. These are two distinct gaps; mixing them is misleading.
2. Add a #156 reference. The path-X-(b) substrate fixes in this PR are correct (re-entrancy guard, --auto-ingest flag drift, harness .bicameral/ bootstrap, --mcp-config passthrough), but they don't make path-X-(b) actually fire end-to-end. Two stacked problems sit above the substrate:
   - The canonical SessionEnd hook command can't pass the parent transcript_path to the spawned subprocess (transcript-passing bug).
   - Even if that were fixed, --auto-ingest produces unresolved/contradictory state in the ledger by skipping collision detection and confirmation.
   Both are tracked as P1 in #156 (design pivot to next-session surfacing via a .bicameral/pending-transcripts/ queue).

Tests/CI behavior: Flow 4's advisory FAIL still doesn't block CI, per the existing advisory gate. The advisory text now accurately reflects why Flow 4 can't pass with this PR's fixes alone, and what would unblock it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jinhongkuan self-requested a review May 2, 2026 05:47

devin-ai-integration Bot reviewed May 2, 2026


jinhongkuan force-pushed the claude/issue-147-flow4-ledger branch from 1864196 to 31a40ca May 2, 2026 07:59

jinhongkuan temporarily deployed to ci-test May 2, 2026 07:59 — with GitHub Actions Inactive

jinhongkuan had a problem deploying to recording-approval May 2, 2026 07:59 — with GitHub Actions Failure

jinhongkuan temporarily deployed to production May 2, 2026 07:59 — with GitHub Actions Inactive

jinhongkuan force-pushed the claude/issue-147-flow4-ledger branch from 31a40ca to c6b68a9 May 2, 2026 09:17

Knapp-Kevin and others added 3 commits May 2, 2026 02:20

jinhongkuan force-pushed the claude/issue-147-flow4-ledger branch from c6b68a9 to db7be94 May 2, 2026 09:22

jinhongkuan temporarily deployed to ci-test May 2, 2026 09:22 — with GitHub Actions Inactive

jinhongkuan temporarily deployed to production May 2, 2026 09:22 — with GitHub Actions Inactive

jinhongkuan temporarily deployed to ci-test May 2, 2026 09:22 — with GitHub Actions Inactive

jinhongkuan had a problem deploying to recording-approval May 2, 2026 09:22 — with GitHub Actions Failure

jinhongkuan mentioned this pull request May 2, 2026

[P1] SessionEnd capture-corrections hook is silently broken — design pivot to next-session surfacing #156

Closed

jinhongkuan temporarily deployed to production May 2, 2026 10:19 — with GitHub Actions Inactive

jinhongkuan had a problem deploying to recording-approval May 2, 2026 10:19 — with GitHub Actions Failure

jinhongkuan temporarily deployed to ci-test May 2, 2026 10:19 — with GitHub Actions Inactive

jinhongkuan merged commit cd9b7d2 into dev May 2, 2026
6 of 7 checks passed

jinhongkuan merged 4 commits into dev from claude/issue-147-flow4-ledger

Knapp-Kevin commented May 2, 2026 (edited by jinhongkuan)
