Skip to content

Flow 4 path-X-(b) ledger validation + SessionEnd hook drift fix (#147)#152

Merged
jinhongkuan merged 4 commits into
devfrom
claude/issue-147-flow4-ledger
May 2, 2026
Merged

Flow 4 path-X-(b) ledger validation + SessionEnd hook drift fix (#147)#152
jinhongkuan merged 4 commits into
devfrom
claude/issue-147-flow4-ledger

Conversation

@Knapp-Kevin

@Knapp-Kevin Knapp-Kevin commented May 2, 2026

Copy link
Copy Markdown
Collaborator

Summary

Two clean commits on top of dev, scoped to research-brief recommendations P0 #2 and P1 #3:

  • 3baeedbfix(hooks): SessionEnd hook drift — re-entrancy guard + --auto-ingest (#147) — restores the BICAMERAL_SESSION_END_RUNNING re-entrancy guard + --auto-ingest flag in .claude/settings.json and setup_wizard._BICAMERAL_SESSION_END_COMMAND. Both were missing relative to the canonical skills/bicameral-capture-corrections/SKILL.md:207 prescription. Without the guard, the spawned subprocess's own SessionEnd hook recurses indefinitely.

  • 31a40catest(e2e): add Flow 4 path-X-(b) ledger validation (#147) — adds _validate_flow4_via_ledger() to the e2e harness, invoked once after _validate_flow3_via_ledger in main() when dev_session ran. Snapshots the ledger after the harness completes and counts decisions with source_type='agent_session'. Asserter FAIL + ledger has agent_session → UPGRADE to PASS with explicit annotation. Decoupled from agent caprice; validates product outcome rather than mechanism.

Closes

Rebase note (2026-05-02)

This branch was previously stacked on top of #151 (which has been closed in favor of #155). The branch has been rebased onto dev and the bundled commits dropped:

  • 3f856af chore(governance): v0 process cleanup — separate governance PR (META_LEDGER conflicts with parallel cleanup on dev).
  • f4de501 fix(skill): preflight auto-fire #146 — shipping via fix(skill): preflight auto-fire on natural refactor prompts (replaces #151) #155 instead.
  • 1864196 docs(governance): SHADOW_GENOME H1-H4 entry + #147 plan/research/seal — separate governance PR (META_LEDGER + planning artifacts).

The original 1f54f1a was also re-applied surgically: it had been authored against an older base where tests/e2e/run_e2e_flows.py did not yet exist, so the original commit appeared to "create" 1267 lines of harness scaffold that dev now owns. The rebased version (31a40ca) ports only the genuinely new logic — the _validate_flow4_via_ledger function, its helper, the unit test file, and the call site — on top of dev's existing harness.

PR is now ready for rebase merge (linear history, no merge commits, no conflicts).

Test plan

Local validation:

  • pytest tests/test_flow4_ledger_validation.py tests/test_session_end_hook_drift.py — 10/10 PASS in 0.19s
  • python -m json.tool .claude/settings.json — valid
  • ruff format --check . clean (207 files)
  • ruff check . clean

Authoritative integration test (runs in CI on dev):

Out of scope (deferred to separate PRs)

Related

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented May 2, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 648f190a-50f8-4387-9119-1053d3fb6659

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/issue-147-flow4-ledger

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jinhongkuan jinhongkuan self-requested a review May 2, 2026 05:47

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

⚠️ 1 issue in files not directly in the diff

⚠️ CLAUDE.md Auto-Tick Rule violation: TODO.md item not ticked after stale skill deletion (TODO.md:169)

The CLAUDE.md Auto-Tick Rule mandates: "After completing any implementation work in this directory: 1. Open TODO.md — tick every item that is now done under Engineering Progress." This PR deletes all .claude/skills/bicameral-*/SKILL.md stale duplicates, which completes TODO.md line 169: - [ ] Delete stale .claude/skills/bicameral-*/SKILL.md duplicates that have canonical counterparts. The canonical skills at skills/bicameral-*/ are verified present and untouched. The TODO item was not ticked.

View 6 additional findings in Devin Review.

Open in Devin Review

@jinhongkuan jinhongkuan force-pushed the claude/issue-147-flow4-ledger branch from 1864196 to 31a40ca Compare May 2, 2026 07:59
@jinhongkuan jinhongkuan had a problem deploying to recording-approval May 2, 2026 07:59 — with GitHub Actions Failure
@jinhongkuan jinhongkuan force-pushed the claude/issue-147-flow4-ledger branch from 31a40ca to c6b68a9 Compare May 2, 2026 09:17
jinhongkuan added a commit that referenced this pull request May 2, 2026
…ow 4

Flow 4's gap was incorrectly described as "the same skill-layer gap as
Flow 2a" pointing at #154. That's wrong: #154 covers the
contradiction-with-prior-decision case (preflight surfaces decision X,
user's prompt contradicts X → skill should ingest refinement +
resolve_collision). Flow 4 is the *emerging-constraint* case — the user
states a new load-bearing constraint mid-session via correction markers
("wait", "shouldn't"), and capture-corrections handles it without any
collision-detection logic at all.

Updated Flow 4's advisory to reflect this:
- Removed the #154 reference (collision-detection isn't in scope for
  Flow 4)
- Pointed at #147 / PR #152 instead, which fixes the path-X-(b)
  substrate (.bicameral/ bootstrap + --mcp-config passthrough) that
  the harness needs for Flow 4's SessionEnd subprocess to actually fire
- Noted the advisory becomes obsolete once #152 lands on dev

Flow 2a's #154 references in `assert_flow_2a`'s docstring and FlowSpec
advisory are unchanged — those correctly describe the contradiction-
with-prior-decision case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jinhongkuan added a commit that referenced this pull request May 2, 2026
…ow 4

Flow 4's gap was incorrectly described as "the same skill-layer gap as
Flow 2a" pointing at #154. That's wrong: #154 covers the
contradiction-with-prior-decision case (preflight surfaces decision X,
user's prompt contradicts X → skill should ingest refinement +
resolve_collision). Flow 4 is the *emerging-constraint* case — the user
states a new load-bearing constraint mid-session via correction markers
("wait", "shouldn't"), and capture-corrections handles it without any
collision-detection logic at all.

Updated Flow 4's advisory to reflect this:
- Removed the #154 reference (collision-detection isn't in scope for
  Flow 4)
- Pointed at #147 / PR #152 instead, which fixes the path-X-(b)
  substrate (.bicameral/ bootstrap + --mcp-config passthrough) that
  the harness needs for Flow 4's SessionEnd subprocess to actually fire
- Noted the advisory becomes obsolete once #152 lands on dev

Flow 2a's #154 references in `assert_flow_2a`'s docstring and FlowSpec
advisory are unchanged — those correctly describe the contradiction-
with-prior-decision case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin and others added 3 commits May 2, 2026 02:20
…#147)

Closes research brief recommendation P1 #3. The installed SessionEnd
hook in .claude/settings.json and the source-of-truth constant in
setup_wizard.py both omitted the canonical guard prescribed by
skills/bicameral-capture-corrections/SKILL.md:207.

Two missing pieces, now restored byte-exact:

1. BICAMERAL_SESSION_END_RUNNING env-var guard. Without it, the
   spawned `claude -p` subprocess fires its OWN SessionEnd hook on
   exit, recursing indefinitely (bounded only by Claude Code's
   per-session subprocess depth limit, if any, or filesystem/process
   exhaustion). The guard env var is inherited by the subprocess; its
   nested SessionEnd hook short-circuits.

2. `--auto-ingest` flag. The capture-corrections skill in batch mode
   reads this flag to scan the full session transcript and ingest
   mechanical corrections directly without surfacing prompts. Without
   it, the subprocess would default to interactive-mode behavior,
   producing prompts no one will answer (parent session is closing).

Files modified:
- .claude/settings.json: SessionEnd hook command replaced with canonical
- setup_wizard.py:343-347: _BICAMERAL_SESSION_END_COMMAND constant
  updated to canonical (drives fresh installs via _install_claude_hooks)

Tests:
- tests/test_session_end_hook_drift.py: 3 functionality tests
  - parses .claude/settings.json and asserts substring presence of
    re-entrancy guard tokens and --auto-ingest flag
  - imports setup_wizard and asserts byte-exact match against the
    canonical SKILL.md prescription

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cherry-picked from 1f54f1a, scope-narrowed to the surgical contribution.
The original commit was authored against an older base where the e2e
harness scaffold did not yet exist; this rebased version adds only the
new logic on top of dev's existing harness.

What this commit adds:

- `tests/e2e/_ledger_helpers.py` — pure helper
  `count_agent_session_decisions(snapshot)`, extracted so unit tests can
  import without triggering the harness's top-level env-var / CLI guards.

- `tests/e2e/run_e2e_flows.py`:
  - `_count_agent_session_decisions(snapshot)` — thin wrapper around the
    helper that hides the import inside the harness.
  - `_validate_flow4_via_ledger()` — path-X-(b) post-hoc ledger query.
    Snapshots the ledger after the harness completes and counts decisions
    with `source_type='agent_session'`. Asserter FAIL + ledger has
    agent_session → UPGRADE to PASS with explicit annotation. Ledger
    error → INCONCLUSIVE (verdict unchanged). All five behavior-matrix
    cases documented in the docstring.
  - Invocation site: called once after `_validate_flow3_via_ledger` in
    `main()`, only when `dev_session` ran.

- `tests/test_flow4_ledger_validation.py` — five unit tests against the
  helper covering: zero rows, error snapshot (None), agent_session
  presence, mixed source types, and empty decisions list.

Why this is decoupled from agent caprice: in-stream Flow 4 evidence
requires the agent to invoke `bicameral.preflight` and walk Step 3.5 to
trigger capture-corrections. Path-X-(b) validates the *product outcome*
(decisions written with the canonical source_type) rather than the
*mechanism* (which tool the agent chose). This means a SessionEnd
subprocess effect that lands in the ledger after the parent stream-json
closes still upgrades the verdict, even when the in-stream signal is
absent.

Closes research-brief recommendation P0 #2.

Note: this commit replaces the original 1f54f1a SHA on the branch via
rebase. Governance/META_LEDGER edits and the planning artifacts that
were bundled with the original have been dropped here and will land via
a separate governance PR. The auto-fire UserPromptSubmit hook (#146 fix)
that was also bundled is shipping via #155.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…bprocess (#147)

Without this, Flow 4's path-X-(b) ledger validation has nothing to
observe in CI: the SessionEnd hook short-circuits on `[ -d .bicameral ]`
because /tmp/desktop-clone has no .bicameral/ subdirectory, so the
spawned `claude -p '/bicameral:capture-corrections --auto-ingest'`
subprocess never runs.

Two changes to the harness, both reusing setup_wizard helpers (no drift
between the harness's path and an end-user install):

1. `_bootstrap_bicameral_dir()` — wipes + recreates .bicameral/ inside
   DESKTOP_REPO_PATH at run start, calling
   `setup_wizard._write_collaboration_config(mode='solo', ...)` to write
   a minimal config.yaml. Wired into main() right after the existing
   ledger + repo resets.

2. `_materialize_settings_with_hook()` now builds the SessionEnd hook
   command via `setup_wizard._build_session_end_command(mcp_config_path
   =MCP_CONFIG_PATH)` instead of the bare canonical constant. The
   parameterized form appends `--mcp-config <materialized.json>
   --strict-mcp-config` after the prompt, so the spawned subprocess
   writes its `source=agent_session` decisions into the harness's test
   ledger (test-results/e2e/ledger.db) — the same ledger
   `_validate_flow4_via_ledger` queries — instead of the user's default
   ~/.bicameral/ledger.db.

Production end-user installs are unchanged: `_install_claude_hooks`
still writes the no-args canonical command (verified by existing
test_setup_wizard_renders_canonical_session_end_hook).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two corrections to Flow 4's advisory text:

1. Drop the "#154" reference. #154 is Flow 2a-specific — it covers the
   contradiction-with-prior-decision case where the agent must call
   resolve_collision after ingesting a refinement. Flow 4 is the
   emerging-constraint case (correction markers "wait", "shouldn't") —
   capture-corrections handles it without any collision-detection logic.
   Two distinct gaps; mixing them is misleading.

2. Add #156 reference. The path-X-(b) substrate fixes in this PR are
   correct (re-entrancy guard, --auto-ingest flag drift, harness
   .bicameral/ bootstrap, --mcp-config passthrough), but they don't
   make path-X-(b) actually fire end-to-end. Two stacked problems above
   the substrate:
   - Canonical SessionEnd hook command can't pass parent transcript_path
     to the spawned subprocess (transcript-passing bug)
   - Even if fixed, --auto-ingest produces unresolved/contradictory
     state in the ledger by skipping collision detection and confirmation

   Both tracked as P1 in #156 (design pivot to next-session surfacing
   via .bicameral/pending-transcripts/ queue).

Tests/CI behavior: Flow 4's advisory FAIL still doesn't block CI per
the existing advisory gate. The advisory text now accurately reflects
why Flow 4 can't pass with this PR's fixes alone, and what would
unblock it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jinhongkuan jinhongkuan had a problem deploying to recording-approval May 2, 2026 10:19 — with GitHub Actions Failure
@jinhongkuan jinhongkuan merged commit cd9b7d2 into dev May 2, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants