
test(e2e): Flow 1 asserter feature-area relax + Bash post-commit hook envelope#171

Merged
jinhongkuan merged 2 commits into dev from fix/flow1-asserter-relax-feature-area
May 3, 2026

@jinhongkuan jinhongkuan commented May 3, 2026

Contributor

Bundles two independent fixes that together stabilize the e2e job on `dev`:

  1. Flow 1 asserter relaxed — exact-filename gate (`reorder.ts` AND `cherry-pick.ts` both bound) replaced with a feature-area gate (any file in the relevant area counts as a legitimate anchor). Fixes the LLM-nondeterminism flake where the agent picks UI-layer `commit-list.tsx` instead of git-layer `reorder.ts` for the bundled drag-to-reorder/squash/amend/branch-from decision.

  2. Bash post-commit PostToolUse hook now uses hookSpecificOutput envelope — the inline `python3 -c` one-liner in `setup_wizard.py` printed "bicameral: new commit detected …" to plain stdout, which Claude Code 2.x silently drops to the debug log (per https://code.claude.com/docs/en/hooks). Symptom: the agent commits in Flow 3 but never follows through with `link_commit` because the reminder never reached the model. Fix: move the inline command to a proper script file emitting `{hookSpecificOutput: {hookEventName: "PostToolUse", additionalContext: "..."}}`.

Originally opened as PRs #171 + #169; combined here per request.

Why bundle

Both changes are needed for Flow 3's ledger assertion to reliably PASS:

  • Asserter relax stops Flow 1 from cascading into Flow 3 when the agent picks a non-canonical anchor.
  • Bash hook envelope makes the agent reliably call `link_commit` after Flow 3's git commit so compliance_check rows actually get written.

Either one in isolation could still red-light the e2e job; together they remove both flake sources from the post-Flow-2a gauntlet.

What's in this PR

Flow 1 asserter (`tests/e2e/run_e2e_flows.py`)

Each seeded decision must have ≥1 bound path matching one of an acceptable substring set:

  • cherry-pick area: `cherry-pick.ts`, `cherry-pick.tsx`
  • commit-history area: `/git/reorder.ts`, `/git/squash.ts`, `/git/commit.ts`, `/history/commit-list.tsx`, `/history/commit-list-item.tsx`, `/multi-commit-operation/{reorder,squash}.tsx`, `/dispatcher/dispatcher.ts`, `/models/{multi-commit-operation,retry-actions}.ts`, `/stores/app-store.ts`
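
The feature-area gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in `tests/e2e/run_e2e_flows.py`: only the substring sets come from this PR, while the function names, the `bindings` shape, and the failure-message wording are assumptions made for the example.

```python
# Hypothetical sketch of a feature-area gate. Only the substring sets below
# are taken from the PR description; every name here is illustrative.
FEATURE_AREAS = {
    "cherry-pick area": (
        "cherry-pick.ts",
        "cherry-pick.tsx",
    ),
    "commit-history area": (
        "/git/reorder.ts",
        "/git/squash.ts",
        "/git/commit.ts",
        "/history/commit-list.tsx",
        "/history/commit-list-item.tsx",
        "/multi-commit-operation/reorder.tsx",
        "/multi-commit-operation/squash.tsx",
        "/dispatcher/dispatcher.ts",
        "/models/multi-commit-operation.ts",
        "/models/retry-actions.ts",
        "/stores/app-store.ts",
    ),
}


def matches_area(bound_paths, substrings):
    """True if at least one bound path contains one acceptable substring."""
    return any(sub in path for path in bound_paths for sub in substrings)


def check_feature_areas(bindings):
    """Return one failure detail per area that has no matching bound path.

    `bindings` (assumed shape) maps an area name to the list of file paths
    the agent bound for that seeded decision.
    """
    failures = []
    for area, substrings in FEATURE_AREAS.items():
        paths = bindings.get(area, [])
        if not matches_area(paths, substrings):
            failures.append(f"{area}: no bound path in acceptable set (got {paths})")
    return failures
```

Under this sketch, a binding to a UI-layer `commit-list.tsx` path satisfies the commit-history area just as a `reorder.ts` path does, while an unrelated file produces a failure detail that names the area.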

Asserter unit tests (`tests/test_e2e_asserters.py`)

8 cases — canonical `reorder.ts`, UI-layer `commit-list.tsx` (previously flaky), dispatcher / squash anchors, `cherry-pick.tsx`, negative cases (unrelated file → fail with clear `commit-history area` / `cherry-pick area` detail), missing ratify.

Post-commit hook (`scripts/hooks/post_commit_sync_reminder.py`)

Reads `{tool_name, tool_input}` from stdin; if `tool_name == "Bash"` and the command contains `git commit`, `git merge `, `git pull`, or `git rebase --continue`, emits the envelope. The reminder text preserves the canonical "bicameral: new commit detected" prefix verbatim so the `bicameral-sync` skill's existing trigger keeps matching.
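
A minimal sketch of that behavior, for illustration only. It is not the contents of `scripts/hooks/post_commit_sync_reminder.py`: the function names, the `command` key inside `tool_input`, and everything in the reminder text after the canonical prefix are assumptions.

```python
import json
import sys

# Substrings listed in the PR description ("git merge " keeps its trailing space).
GIT_WRITE_OPS = ("git commit", "git merge ", "git pull", "git rebase --continue")

# Only the "bicameral: new commit detected" prefix is confirmed by the PR;
# the remainder of this string is placeholder wording.
REMINDER = "bicameral: new commit detected; run /bicameral:sync"


def build_envelope(raw_stdin):
    """Return the envelope as a JSON string, or None when no reminder applies."""
    try:
        payload = json.loads(raw_stdin)
    except json.JSONDecodeError:
        return None  # malformed stdin: stay silent
    if not isinstance(payload, dict) or payload.get("tool_name") != "Bash":
        return None
    tool_input = payload.get("tool_input")
    if not isinstance(tool_input, dict):
        return None  # missing or non-dict tool_input
    command = tool_input.get("command")  # key name is an assumption
    if not isinstance(command, str):
        return None
    if not any(op in command for op in GIT_WRITE_OPS):
        return None  # read-only git or non-git command
    return json.dumps(
        {
            "hookSpecificOutput": {
                "hookEventName": "PostToolUse",
                "additionalContext": REMINDER,
            }
        }
    )


def main():
    """Entry point a script or console script could call."""
    envelope = build_envelope(sys.stdin.read())
    if envelope is not None:
        print(envelope)
    return 0
```

Keeping the decision in a pure function that takes the raw stdin string makes the cases listed for the hook unit tests easy to cover without spawning a subprocess.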

Hook unit tests (`tests/test_post_commit_sync_hook.py`)

11 cases — each git write-op verb, read-only git, non-Bash tool, non-git Bash, malformed stdin, missing/non-dict `tool_input`, idempotent.

Wiring

  • `pyproject.toml` — `bicameral-mcp-post-commit-sync-reminder` console script.
  • `setup_wizard.py` — `_BICAMERAL_POST_COMMIT_COMMAND` now references the console script. Existing `_install_claude_hooks` merge logic unchanged.
  • `.claude/settings.json` — dogfood entry invokes the source script via `python3`.

Validation

  • `ruff check .` — clean (217 files).
  • `ruff format --check .` — clean (217 files).
  • `pytest tests/test_e2e_asserters.py tests/test_post_commit_sync_hook.py tests/test_preflight_hook.py` — 24/24 PASS.
  • Setup-wizard byte-stability across repeated `_install_claude_hooks` calls.
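
The byte-stability check in the last bullet could look something like the sketch below. It assumes the Claude Code settings layout of `hooks` → `PostToolUse` → matcher groups with `command` entries. `install_hook` and `serialize` are hypothetical stand-ins, not the real `_install_claude_hooks`, whose merge logic this PR leaves unchanged and does not show.

```python
import json


def install_hook(settings, command):
    """Add a PostToolUse/Bash command hook unless an identical one exists.

    Hypothetical stand-in for an idempotent merge; mutates and returns
    `settings`.
    """
    groups = settings.setdefault("hooks", {}).setdefault("PostToolUse", [])
    for group in groups:
        if group.get("matcher") == "Bash":
            break
    else:
        # No Bash matcher group yet: create one.
        group = {"matcher": "Bash", "hooks": []}
        groups.append(group)
    entry = {"type": "command", "command": command}
    hooks = group.setdefault("hooks", [])
    if entry not in hooks:
        hooks.append(entry)
    return settings


def serialize(settings):
    """Deterministic serialization so repeated installs compare byte for byte."""
    return json.dumps(settings, indent=2, sort_keys=True) + "\n"
```

Calling `install_hook` twice with the same command and comparing the two `serialize` outputs is one way to assert that a second run is a no-op.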

Test plan

  • CI: `e2e assertions (auto)` — Flow 1 PASSes regardless of which feature-area file the agent picked; Flow 3 ledger assertion (compliance_check rows written) reliably PASSes.
  • CI: `ruff + mypy` passes.
  • CI: `MCP Regression Suite` (ubuntu + windows) passes.

Companion PR

#168 — adds the resolve_collision PostToolUse hook for Flow 2a (#154), bumps the 300s flow timeout, and uses the same envelope shape this PR establishes for the Bash hook.

…ilename

Flow 1 asserter previously required exact paths "cherry-pick.ts" AND
"reorder.ts" to be among the bound files. The "Improved commit history"
seed decision bundles four ops (drag-to-reorder, drag-to-squash, amend
last commit, branch from previous commit) — any file backing those is
a legitimate anchor. CI flake observed: agent picks UI-layer
commit-list.tsx for the bundled decision; asserter fails despite the
functional outcome (every feature has a code anchor for drift
detection) being satisfied.

Replace the exact-filename gate with a feature-area gate. Each seeded
decision must have at least one bound path matching one of an
acceptable substring set:

  cherry-pick area:
    cherry-pick.ts, cherry-pick.tsx

  commit-history area:
    /git/reorder.ts, /git/squash.ts, /git/commit.ts,
    /history/commit-list.tsx, /history/commit-list-item.tsx,
    /multi-commit-operation/{reorder,squash}.tsx,
    /dispatcher/dispatcher.ts,
    /models/{multi-commit-operation,retry-actions}.ts,
    /stores/app-store.ts

Adds 8 unit tests in tests/test_e2e_asserters.py covering both the
canonical reorder.ts case and the previously-flaky UI-layer choices,
plus negative cases (unbound feature area, missing ratify).

The asserter still rejects bindings that have no relationship to either
feature — it just stops dictating which specific file IS the obvious
anchor. Functional intent ("each feature is grounded in code so drift
detection can fire") is preserved.
@coderabbitai

coderabbitai Bot commented May 3, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 53d0c196-6322-4566-a47e-87b2dfc4e485


…lope

The PostToolUse/Bash hook installed by setup_wizard prints
"bicameral: new commit detected — run /bicameral:sync ..." after every
git write-op. The bicameral-sync skill watches for that exact prefix
as one of its trigger signals.

Per Claude Code 2.x hook docs (https://code.claude.com/docs/en/hooks),
plain stdout from PostToolUse hooks is silently dropped to the debug
log — only UserPromptSubmit / UserPromptExpansion / SessionStart treat
raw stdout as agent-visible context. Symptom in the e2e harness: the
agent commits in Flow 3 but never follows through to call link_commit
or /bicameral:sync because the reminder never reaches the model. Flow
3's ledger assertion then fails: "no compliance_check rows written
(0→0) and no verdicts written. Either the bound decisions never had
their sync triggered (no bicameral call after HEAD moves) ..."

Fix: move the inline `python3 -c` one-liner to a proper script file
that emits the structured envelope:

  {"hookSpecificOutput": {"hookEventName": "PostToolUse",
                           "additionalContext": "<reminder text>"}}

The reminder text preserves the canonical "bicameral: new commit
detected" prefix verbatim so the bicameral-sync skill's trigger keeps
matching.

Files:
- scripts/hooks/post_commit_sync_reminder.py — new hook script
- tests/test_post_commit_sync_hook.py — 11 cases (commit, merge, pull,
  rebase --continue, read-only-git, non-Bash tool, non-git Bash,
  malformed stdin, missing/non-dict tool_input, idempotent)
- pyproject.toml — bicameral-mcp-post-commit-sync-reminder console script
- setup_wizard.py — _BICAMERAL_POST_COMMIT_COMMAND now refs the console
  script; existing _install_claude_hooks merge logic unchanged
- .claude/settings.json — dogfood entry invokes the source script via
  python3 (mirrors existing UserPromptSubmit dogfood line)

The e2e harness's _BICAMERAL_POST_COMMIT_COMMAND import is unchanged
because the constant's value is what changed, not its name.

Companion to #168 which applies the same envelope fix to the new
PostToolUse hook on bicameral_preflight (#154 / Flow 2a).
@jinhongkuan jinhongkuan had a problem deploying to recording-approval May 3, 2026 23:05 — with GitHub Actions Failure
@jinhongkuan jinhongkuan changed the title from "test(e2e): relax Flow 1 binding asserter to feature-area, not exact filename" to "test(e2e): Flow 1 asserter feature-area relax + Bash post-commit hook envelope" May 3, 2026
@jinhongkuan jinhongkuan merged commit a090aa5 into dev May 3, 2026
8 of 9 checks passed
@jinhongkuan jinhongkuan deleted the fix/flow1-asserter-relax-feature-area branch May 3, 2026 23:17
jinhongkuan added a commit that referenced this pull request May 3, 2026
…ture-area

test(e2e): Flow 1 asserter feature-area relax + Bash post-commit hook envelope
@jinhongkuan jinhongkuan mentioned this pull request May 5, 2026
5 tasks
Knapp-Kevin pushed a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request May 21, 2026
Bumps pyproject + RECOMMENDED_VERSION to 0.13.7 and resolves the stale
git conflict markers that were committed into CHANGELOG.md by the previous
`Merge branch 'main' into triage-from-dev` (c7d1274).

v0.13.6 was bumped in pyproject on 2026-04-30 but never tagged or
published to PyPI (latest published is v0.13.5; latest GitHub release is
v0.13.5). v0.13.7 is the first release that ships everything merged into
main since v0.13.5, including:

- Preflight graph expansion + region anchored preflight (BicameralAI#173, BicameralAI#174)
- Contradiction-capture flow via AskUserQuestion (BicameralAI#154, BicameralAI#175)
- Preflight skill auto-fire fix on natural refactor prompts (BicameralAI#146)
- SessionEnd hook re-entrancy + --auto-ingest (BicameralAI#147)
- Post-preflight capture reminder hook (BicameralAI#168)
- Flow1 asserter relax + flow2/2a split (BicameralAI#171)
- v0 user flow e2e + demo recording carried over from dev (BicameralAI#165)
- Lint-and-typecheck CI wired up; ruff format + fixes across 115 files

See CHANGELOG.md for full details.