test(e2e): Flow 1 asserter feature-area relax + Bash post-commit hook envelope#171
Merged
…ilename
Flow 1 asserter previously required exact paths "cherry-pick.ts" AND
"reorder.ts" to be among the bound files. The "Improved commit history"
seed decision bundles four ops (drag-to-reorder, drag-to-squash, amend
last commit, branch from previous commit) — any file backing those is
a legitimate anchor. CI flake observed: agent picks UI-layer
commit-list.tsx for the bundled decision; asserter fails despite the
functional outcome (every feature has a code anchor for drift
detection) being satisfied.

Replace the exact-filename gate with a feature-area gate. Each seeded
decision must have at least one bound path matching one of an
acceptable substring set:

  cherry-pick area:
    cherry-pick.ts, cherry-pick.tsx
  commit-history area:
    /git/reorder.ts, /git/squash.ts, /git/commit.ts,
    /history/commit-list.tsx, /history/commit-list-item.tsx,
    /multi-commit-operation/{reorder,squash}.tsx,
    /dispatcher/dispatcher.ts,
    /models/{multi-commit-operation,retry-actions}.ts,
    /stores/app-store.ts

Adds 8 unit tests in tests/test_e2e_asserters.py covering both the
canonical reorder.ts case and the previously-flaky UI-layer choices,
plus negative cases (unbound feature area, missing ratify).

The asserter still rejects bindings that have no relationship to either
feature — it just stops dictating which specific file IS the obvious
anchor. Functional intent ("each feature is grounded in code so drift
detection can fire") is preserved.
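
The feature-area gate described in this commit message could look roughly like the sketch below. It is not the code from `tests/e2e/run_e2e_flows.py`: the names `FEATURE_AREAS` and `area_is_anchored` are hypothetical, the brace groups from the list are expanded by hand, and the example paths in the usage note are made up for illustration.

```python
from typing import Dict, Iterable, Tuple

# Acceptable substrings per feature area, taken from the commit message.
# The dictionary name and layout are assumptions made for this sketch.
FEATURE_AREAS: Dict[str, Tuple[str, ...]] = {
    "cherry-pick area": ("cherry-pick.ts", "cherry-pick.tsx"),
    "commit-history area": (
        "/git/reorder.ts",
        "/git/squash.ts",
        "/git/commit.ts",
        "/history/commit-list.tsx",
        "/history/commit-list-item.tsx",
        "/multi-commit-operation/reorder.tsx",
        "/multi-commit-operation/squash.tsx",
        "/dispatcher/dispatcher.ts",
        "/models/multi-commit-operation.ts",
        "/models/retry-actions.ts",
        "/stores/app-store.ts",
    ),
}


def area_is_anchored(area: str, bound_paths: Iterable[str]) -> bool:
    """Return True if at least one bound path contains an accepted substring."""
    needles = FEATURE_AREAS[area]
    return any(needle in path for path in bound_paths for needle in needles)
```

With this shape, a decision bound to a hypothetical `app/src/ui/history/commit-list.tsx` passes the commit-history gate, while one bound only to `README.md` still fails, which matches the stated intent of rejecting unrelated bindings without dictating a single anchor file.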
…lope
The PostToolUse/Bash hook installed by setup_wizard prints
"bicameral: new commit detected — run /bicameral:sync ..." after every
git write-op. The bicameral-sync skill watches for that exact prefix as
one of its trigger signals.

Per Claude Code 2.x hook docs (https://code.claude.com/docs/en/hooks),
plain stdout from PostToolUse hooks is silently dropped to the debug
log; only UserPromptSubmit / UserPromptExpansion / SessionStart treat
raw stdout as agent-visible context.

Symptom in the e2e harness: the agent commits in Flow 3 but never
follows through to call link_commit or /bicameral:sync because the
reminder never reaches the model. Flow 3's ledger assertion then fails:

  "no compliance_check rows written (0→0) and no verdicts written.
  Either the bound decisions never had their sync triggered (no
  bicameral call after HEAD moves) ..."

Fix: move the inline `python3 -c` one-liner to a proper script file
that emits the structured envelope:

  {"hookSpecificOutput": {"hookEventName": "PostToolUse",
                          "additionalContext": "<reminder text>"}}

The reminder text preserves the canonical "bicameral: new commit
detected" prefix verbatim so the bicameral-sync skill's trigger keeps
matching.

Files:
- scripts/hooks/post_commit_sync_reminder.py: new hook script
- tests/test_post_commit_sync_hook.py: 11 cases (commit, merge, pull,
  rebase --continue, read-only-git, non-Bash tool, non-git Bash,
  malformed stdin, missing/non-dict tool_input, idempotent)
- pyproject.toml: bicameral-mcp-post-commit-sync-reminder console
  script
- setup_wizard.py: _BICAMERAL_POST_COMMIT_COMMAND now refs the console
  script; existing _install_claude_hooks merge logic unchanged
- .claude/settings.json: dogfood entry invokes the source script via
  python3 (mirrors existing UserPromptSubmit dogfood line)

The e2e harness's _BICAMERAL_POST_COMMIT_COMMAND import is unchanged
because the constant's value is what changed, not its name.
Companion to #168 which applies the same envelope fix to the new PostToolUse hook on bicameral_preflight (#154 / Flow 2a).
jinhongkuan added a commit that referenced this pull request on May 3, 2026
…ture-area test(e2e): Flow 1 asserter feature-area relax + Bash post-commit hook envelope
Knapp-Kevin pushed a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request on May 21, 2026
Bumps pyproject + RECOMMENDED_VERSION to 0.13.7 and resolves the stale git conflict markers that were committed into CHANGELOG.md by the previous `Merge branch 'main' into triage-from-dev` (c7d1274).

v0.13.6 was bumped in pyproject on 2026-04-30 but never tagged or published to PyPI (latest published is v0.13.5; latest GitHub release is v0.13.5). v0.13.7 is the first release that ships everything merged into main since v0.13.5, including:

- Preflight graph expansion + region anchored preflight (BicameralAI#173, BicameralAI#174)
- Contradiction-capture flow via AskUserQuestion (BicameralAI#154, BicameralAI#175)
- Preflight skill auto-fire fix on natural refactor prompts (BicameralAI#146)
- SessionEnd hook re-entrancy + --auto-ingest (BicameralAI#147)
- Post-preflight capture reminder hook (BicameralAI#168)
- Flow1 asserter relax + flow2/2a split (BicameralAI#171)
- v0 user flow e2e + demo recording carried over from dev (BicameralAI#165)
- Lint-and-typecheck CI wired up; ruff format + fixes across 115 files

See CHANGELOG.md for full details.
Bundles two independent fixes that together stabilize the e2e job on `dev`:
1. Flow 1 asserter relaxed: the exact-filename gate (`reorder.ts` AND `cherry-pick.ts` both bound) is replaced with a feature-area gate (any file in the relevant area counts as a legitimate anchor). Fixes the LLM-nondeterminism flake where the agent picks UI-layer `commit-list.tsx` instead of git-layer `reorder.ts` for the bundled drag-to-reorder/squash/amend/branch-from decision.
2. Bash post-commit PostToolUse hook now uses the hookSpecificOutput envelope: the inline `python3 -c` one-liner in `setup_wizard.py` printed "bicameral: new commit detected …" to plain stdout, which Claude Code 2.x silently drops to the debug log (per https://code.claude.com/docs/en/hooks). Symptom: the agent commits in Flow 3 but never follows through with `link_commit` because the reminder never reached the model. Fix: move the inline command to a proper script file emitting `{hookSpecificOutput: {hookEventName: "PostToolUse", additionalContext: "..."}}`.
Originally opened as PRs #171 + #169; combined here per request.
Why bundle
Both changes are needed for Flow 3's ledger assertion to reliably PASS:
Either one in isolation could still red-light the e2e job; together they remove both flake sources from the post-Flow-2a gauntlet.
What's in this PR
Flow 1 asserter (`tests/e2e/run_e2e_flows.py`)
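Each seeded decision must have ≥1 bound path matching one of an acceptable substring set:
- cherry-pick area: `cherry-pick.ts`, `cherry-pick.tsx`
- commit-history area: `/git/reorder.ts`, `/git/squash.ts`, `/git/commit.ts`, `/history/commit-list.tsx`, `/history/commit-list-item.tsx`, `/multi-commit-operation/{reorder,squash}.tsx`, `/dispatcher/dispatcher.ts`, `/models/{multi-commit-operation,retry-actions}.ts`, `/stores/app-store.ts`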
Asserter unit tests (`tests/test_e2e_asserters.py`)
8 cases — canonical `reorder.ts`, UI-layer `commit-list.tsx` (previously flaky), dispatcher / squash anchors, `cherry-pick.tsx`, negative cases (unrelated file → fail with clear `commit-history area` / `cherry-pick area` detail), missing ratify.
Post-commit hook (`scripts/hooks/post_commit_sync_reminder.py`)
Reads `{tool_name, tool_input}` from stdin; if `tool_name == Bash` and the command contains `git commit` / `git merge ` / `git pull` / `git rebase --continue`, emits the envelope. Reminder text preserves the canonical "bicameral: new commit detected" prefix verbatim so the `bicameral-sync` skill's existing trigger keeps matching.
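
As a rough illustration of the behavior described above, a hook along these lines would read the payload, check for a git write-op, and emit the envelope. This is a sketch, not the contents of `scripts/hooks/post_commit_sync_reminder.py`: the function names, the `command` key inside `tool_input`, and the text after the canonical prefix are assumptions.

```python
import json
import sys
from typing import Optional

# Substrings treated as git write-ops, as listed in the description above.
WRITE_OPS = ("git commit", "git merge ", "git pull", "git rebase --continue")

# Only the prefix comes from the PR; the rest of the text is a placeholder.
REMINDER = "bicameral: new commit detected (placeholder reminder text)"


def build_envelope(raw_stdin: str) -> Optional[dict]:
    """Return the hookSpecificOutput envelope, or None when nothing applies."""
    try:
        payload = json.loads(raw_stdin)
    except json.JSONDecodeError:
        return None  # malformed stdin: stay silent
    if not isinstance(payload, dict) or payload.get("tool_name") != "Bash":
        return None
    tool_input = payload.get("tool_input")
    if not isinstance(tool_input, dict):
        return None  # missing or non-dict tool_input
    # Assumption: the shell command is stored under the "command" key.
    command = tool_input.get("command")
    if not isinstance(command, str):
        return None
    if not any(op in command for op in WRITE_OPS):
        return None  # read-only git or non-git command
    return {
        "hookSpecificOutput": {
            "hookEventName": "PostToolUse",
            "additionalContext": REMINDER,
        }
    }


def main() -> int:
    """Entry point sketch: print the envelope as JSON when one applies."""
    envelope = build_envelope(sys.stdin.read())
    if envelope is not None:
        print(json.dumps(envelope))
    return 0
```

Keeping the decision in a pure function such as `build_envelope` makes the listed cases (malformed stdin, non-Bash tool, read-only git) easy to exercise without spawning a subprocess; a console script would simply call `main()`.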
Hook unit tests (`tests/test_post_commit_sync_hook.py`)
11 cases — each git write-op verb, read-only git, non-Bash tool, non-git Bash, malformed stdin, missing/non-dict `tool_input`, idempotent.
Wiring
Validation
Test plan
Companion PR
#168 — adds the resolve_collision PostToolUse hook for Flow 2a (#154), bumps the 300s flow timeout, and uses the same envelope shape this PR establishes for the Bash hook.