Flow 4 path-X-(b) ledger validation + SessionEnd hook drift fix (#147)#152
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Devin Review found 1 potential issue.
⚠️ 1 issue in files not directly in the diff
⚠️ CLAUDE.md Auto-Tick Rule violation: TODO.md item not ticked after stale skill deletion (TODO.md:169)
The CLAUDE.md Auto-Tick Rule mandates: "After completing any implementation work in this directory: 1. Open TODO.md — tick every item that is now done under Engineering Progress." This PR deletes all .claude/skills/bicameral-*/SKILL.md stale duplicates, which completes TODO.md line 169: - [ ] Delete stale .claude/skills/bicameral-*/SKILL.md duplicates that have canonical counterparts. The canonical skills at skills/bicameral-*/ are verified present and untouched. The TODO item was not ticked.
View 6 additional findings in Devin Review.
1864196 to
31a40ca
Compare
31a40ca to
c6b68a9
Compare
…ow 4 Flow 4's gap was incorrectly described as "the same skill-layer gap as Flow 2a" pointing at #154. That's wrong: #154 covers the contradiction-with-prior-decision case (preflight surfaces decision X, user's prompt contradicts X → skill should ingest refinement + resolve_collision). Flow 4 is the *emerging-constraint* case — the user states a new load-bearing constraint mid-session via correction markers ("wait", "shouldn't"), and capture-corrections handles it without any collision-detection logic at all. Updated Flow 4's advisory to reflect this: - Removed the #154 reference (collision-detection isn't in scope for Flow 4) - Pointed at #147 / PR #152 instead, which fixes the path-X-(b) substrate (.bicameral/ bootstrap + --mcp-config passthrough) that the harness needs for Flow 4's SessionEnd subprocess to actually fire - Noted the advisory becomes obsolete once #152 lands on dev Flow 2a's #154 references in `assert_flow_2a`'s docstring and FlowSpec advisory are unchanged — those correctly describe the contradiction- with-prior-decision case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ow 4 Flow 4's gap was incorrectly described as "the same skill-layer gap as Flow 2a" pointing at #154. That's wrong: #154 covers the contradiction-with-prior-decision case (preflight surfaces decision X, user's prompt contradicts X → skill should ingest refinement + resolve_collision). Flow 4 is the *emerging-constraint* case — the user states a new load-bearing constraint mid-session via correction markers ("wait", "shouldn't"), and capture-corrections handles it without any collision-detection logic at all. Updated Flow 4's advisory to reflect this: - Removed the #154 reference (collision-detection isn't in scope for Flow 4) - Pointed at #147 / PR #152 instead, which fixes the path-X-(b) substrate (.bicameral/ bootstrap + --mcp-config passthrough) that the harness needs for Flow 4's SessionEnd subprocess to actually fire - Noted the advisory becomes obsolete once #152 lands on dev Flow 2a's #154 references in `assert_flow_2a`'s docstring and FlowSpec advisory are unchanged — those correctly describe the contradiction- with-prior-decision case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#147) Closes research brief recommendation P1 #3. The installed SessionEnd hook in .claude/settings.json and the source-of-truth constant in setup_wizard.py both omitted the canonical guard prescribed by skills/bicameral-capture-corrections/SKILL.md:207. Two missing pieces, now restored byte-exact: 1. BICAMERAL_SESSION_END_RUNNING env-var guard. Without it, the spawned `claude -p` subprocess fires its OWN SessionEnd hook on exit, recursing indefinitely (bounded only by Claude Code's per-session subprocess depth limit, if any, or filesystem/process exhaustion). The guard env var is inherited by the subprocess; its nested SessionEnd hook short-circuits. 2. `--auto-ingest` flag. The capture-corrections skill in batch mode reads this flag to scan the full session transcript and ingest mechanical corrections directly without surfacing prompts. Without it, the subprocess would default to interactive-mode behavior, producing prompts no one will answer (parent session is closing). Files modified: - .claude/settings.json: SessionEnd hook command replaced with canonical - setup_wizard.py:343-347: _BICAMERAL_SESSION_END_COMMAND constant updated to canonical (drives fresh installs via _install_claude_hooks) Tests: - tests/test_session_end_hook_drift.py: 3 functionality tests - parses .claude/settings.json and asserts substring presence of re-entrancy guard tokens and --auto-ingest flag - imports setup_wizard and asserts byte-exact match against the canonical SKILL.md prescription Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cherry-picked from 1f54f1a, scope-narrowed to the surgical contribution. The original commit was authored against an older base where the e2e harness scaffold did not yet exist; this rebased version adds only the new logic on top of dev's existing harness. What this commit adds: - `tests/e2e/_ledger_helpers.py` — pure helper `count_agent_session_decisions(snapshot)`, extracted so unit tests can import without triggering the harness's top-level env-var / CLI guards. - `tests/e2e/run_e2e_flows.py`: - `_count_agent_session_decisions(snapshot)` — thin wrapper around the helper that hides the import inside the harness. - `_validate_flow4_via_ledger()` — path-X-(b) post-hoc ledger query. Snapshots the ledger after the harness completes and counts decisions with `source_type='agent_session'`. Asserter FAIL + ledger has agent_session → UPGRADE to PASS with explicit annotation. Ledger error → INCONCLUSIVE (verdict unchanged). All five behavior-matrix cases documented in the docstring. - Invocation site: called once after `_validate_flow3_via_ledger` in `main()`, only when `dev_session` ran. - `tests/test_flow4_ledger_validation.py` — five unit tests against the helper covering: zero rows, error snapshot (None), agent_session presence, mixed source types, and empty decisions list. Why this is decoupled from agent caprice: in-stream Flow 4 evidence requires the agent to invoke `bicameral.preflight` and walk Step 3.5 to trigger capture-corrections. Path-X-(b) validates the *product outcome* (decisions written with the canonical source_type) rather than the *mechanism* (which tool the agent chose). This means a SessionEnd subprocess effect that lands in the ledger after the parent stream-json closes still upgrades the verdict, even when the in-stream signal is absent. Closes research-brief recommendation P0 #2. Note: this commit replaces the original 1f54f1a SHA on the branch via rebase. Governance/META_LEDGER edits and the planning artifacts that were bundled with the original have been dropped here and will land via a separate governance PR. The auto-fire UserPromptSubmit hook (#146 fix) that was also bundled is shipping via #155. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…bprocess (#147) Without this, Flow 4's path-X-(b) ledger validation has nothing to observe in CI: the SessionEnd hook short-circuits on `[ -d .bicameral ]` because /tmp/desktop-clone has no .bicameral/ subdirectory, so the spawned `claude -p '/bicameral:capture-corrections --auto-ingest'` subprocess never runs. Two changes to the harness, both reusing setup_wizard helpers (no drift between the harness's path and an end-user install): 1. `_bootstrap_bicameral_dir()` — wipes + recreates .bicameral/ inside DESKTOP_REPO_PATH at run start, calling `setup_wizard._write_collaboration_config(mode='solo', ...)` to write a minimal config.yaml. Wired into main() right after the existing ledger + repo resets. 2. `_materialize_settings_with_hook()` now builds the SessionEnd hook command via `setup_wizard._build_session_end_command(mcp_config_path =MCP_CONFIG_PATH)` instead of the bare canonical constant. The parameterized form appends `--mcp-config <materialized.json> --strict-mcp-config` after the prompt, so the spawned subprocess writes its `source=agent_session` decisions into the harness's test ledger (test-results/e2e/ledger.db) — the same ledger `_validate_flow4_via_ledger` queries — instead of the user's default ~/.bicameral/ledger.db. Production end-user installs are unchanged: `_install_claude_hooks` still writes the no-args canonical command (verified by existing test_setup_wizard_renders_canonical_session_end_hook). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c6b68a9 to
db7be94
Compare
Two corrections to Flow 4's advisory text: 1. Drop the "#154" reference. #154 is Flow 2a-specific — it covers the contradiction-with-prior-decision case where the agent must call resolve_collision after ingesting a refinement. Flow 4 is the emerging-constraint case (correction markers "wait", "shouldn't") — capture-corrections handles it without any collision-detection logic. Two distinct gaps; mixing them is misleading. 2. Add #156 reference. The path-X-(b) substrate fixes in this PR are correct (re-entrancy guard, --auto-ingest flag drift, harness .bicameral/ bootstrap, --mcp-config passthrough), but they don't make path-X-(b) actually fire end-to-end. Two stacked problems above the substrate: - Canonical SessionEnd hook command can't pass parent transcript_path to the spawned subprocess (transcript-passing bug) - Even if fixed, --auto-ingest produces unresolved/contradictory state in the ledger by skipping collision detection and confirmation Both tracked as P1 in #156 (design pivot to next-session surfacing via .bicameral/pending-transcripts/ queue). Tests/CI behavior: Flow 4's advisory FAIL still doesn't block CI per the existing advisory gate. The advisory text now accurately reflects why Flow 4 can't pass with this PR's fixes alone, and what would unblock it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Two clean commits on top of
dev, scoped to research-brief recommendations P0 #2 and P1 #3:3baeedb—fix(hooks): SessionEnd hook drift — re-entrancy guard + --auto-ingest (#147)— restores theBICAMERAL_SESSION_END_RUNNINGre-entrancy guard +--auto-ingestflag in.claude/settings.jsonandsetup_wizard._BICAMERAL_SESSION_END_COMMAND. Both were missing relative to the canonicalskills/bicameral-capture-corrections/SKILL.md:207prescription. Without the guard, the spawned subprocess's own SessionEnd hook recurses indefinitely.31a40ca—test(e2e): add Flow 4 path-X-(b) ledger validation (#147)— adds_validate_flow4_via_ledger()to the e2e harness, invoked once after_validate_flow3_via_ledgerinmain()whendev_sessionran. Snapshots the ledger after the harness completes and counts decisions withsource_type='agent_session'. Asserter FAIL + ledger has agent_session → UPGRADE to PASS with explicit annotation. Decoupled from agent caprice; validates product outcome rather than mechanism.Closes
Rebase note (2026-05-02)
This branch was previously stacked on top of #151 (which has been closed in favor of #155). The branch has been rebased onto
devand the bundled commits dropped:3f856af chore(governance): v0 process cleanup— separate governance PR (META_LEDGER conflicts with parallel cleanup on dev).f4de501 fix(skill): preflight auto-fire #146— shipping via fix(skill): preflight auto-fire on natural refactor prompts (replaces #151) #155 instead.1864196 docs(governance): SHADOW_GENOME H1-H4 entry + #147 plan/research/seal— separate governance PR (META_LEDGER + planning artifacts).The original
1f54f1awas also re-applied surgically: it had been authored against an older base wheretests/e2e/run_e2e_flows.pydid not yet exist, so the original commit appeared to "create" 1267 lines of harness scaffold that dev now owns. The rebased version (31a40ca) ports only the genuinely new logic — the_validate_flow4_via_ledgerfunction, its helper, the unit test file, and the call site — on top of dev's existing harness.PR is now ready for rebase merge (linear history, no merge commits, no conflicts).
Test plan
Local validation:
pytest tests/test_flow4_ledger_validation.py tests/test_session_end_hook_drift.py— 10/10 PASS in 0.19spython -m json.tool .claude/settings.json— validruff format --check .clean (207 files)ruff check .cleanAuthoritative integration test (runs in CI on
dev):tests/e2e/run_e2e_flows.py— Flow 4 ledger-validation upgrade-path executes end-to-end. Once fix(skill): preflight auto-fire on natural refactor prompts (replaces #151) #155 lands first, the path-A in-stream signal becomes more reliable; this PR's path-X-(b) is the orthogonal post-hoc check that catches the SessionEnd subprocess effect regardless.Out of scope (deferred to separate PRs)
3f856affrom old branch).UserPromptSubmithook — see fix(skill): preflight auto-fire on natural refactor prompts (replaces #151) #155.Related
🤖 Generated with Claude Code