test(e2e): rewrite demo flow prompts in realistic per-role voice #164
Replaces tool-aware prompts (referencing 'ledger', 'ratify', 'code home', specific line numbers) with how each role would actually type:

- Flow 1 (PM, post-roadmap): drops file paths and line ranges; lets the ingest skill's caller-LLM derive bindings from feature names. Tests the binding heuristic as part of the e2e flow.
- Flow 2 (PM, UX pivot): drops the explicit reorder.ts path; the agent derives the target file from the prior decision binding.
- Flow 3 (dev, commit-sync): conversational dev voice; retains the deterministic comment text and commit message the harness asserts on.
- Flow 4 (dev, mid-refactor): Slack-style thinking out loud, a natural in-flight realization that should fire capture-corrections.
- Flow 5 (PM, Friday review): drops the 'ledger', 'ratify', 'proposed', and 'code-compliance status' jargon; the agent maps intent to the right tools.

Risk note: assert_flow_1 requires bind_targets to include both cherry-pick.ts and reorder.ts. With the new prompt, the ingest skill must derive these from feature names. If it fails, the right fix is in the skill or the binding heuristic; don't add file paths back to the prompt. Flow 2 has a scaffolding fallback (run_e2e_flows.py, line 1222) that names reorder.ts directly as a safety net.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Replaces tool-aware demo prompts with how each role (PM / dev) would actually type. The original prompts leaked product jargon ("ledger", "ratify", "code home", "compliance status") and engineer-shaped scaffolding (specific line ranges, exact file paths in PM voice). The new versions sound like real Slack/chat input.
- `flow-1-ingest.md`
- `flow-2-preflight.md`
- `flow-3-commit-sync.md`
- `flow-4-session-end.md`
- `flow-5-history.md`

Why this matters
The demo's whole pitch is: bicameral works on prompts a real human types. If the demo prompts read like a test harness, the demo undermines its own claim. Each flow now matches how a PM or dev would ping Claude in the wild.
Risk
`assert_flow_1` requires `bind_targets` to include both `cherry-pick.ts` and `reorder.ts`. The new Flow 1 prompt drops file paths entirely, so the ingest skill's caller-LLM has to derive them from feature names. This is intentional: it's an e2e test of the binding heuristic, which is supposed to be the product's value prop. If it fails, fix the skill or the heuristic; don't add file paths back to the prompt.

`assert_flow_2` has a scaffolding fallback (`run_e2e_flows.py:1222`) that names `reorder.ts` directly when auto-fire fails, so Flow 2 has a safety net.

Test plan
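The Flow 1 requirement described in the Risk section reduces to a set-membership check on the bound file names. The sketch below is a hypothetical illustration, not the harness code in `run_e2e_flows.py`: `REQUIRED_FLOW_1_TARGETS`, `missing_flow_1_targets`, and `check_flow_1` are made-up names, and it assumes `bind_targets` is a list of repo-relative path strings that can be compared by file name alone.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical constant: the two files the PR says assert_flow_1 expects
# the ingest step to bind. This mirrors the stated requirement only; it is
# not copied from tests/e2e/run_e2e_flows.py.
REQUIRED_FLOW_1_TARGETS = {"cherry-pick.ts", "reorder.ts"}


def missing_flow_1_targets(bind_targets: list[str]) -> set[str]:
    """Return the required file names that no bind target points at.

    Targets are compared by file name only, so a path such as
    "lib/git/reorder.ts" (illustrative) counts as a match for "reorder.ts".
    """
    bound_names = {PurePosixPath(target).name for target in bind_targets}
    return REQUIRED_FLOW_1_TARGETS - bound_names


def check_flow_1(bind_targets: list[str]) -> None:
    """Raise AssertionError if either required file was not bound."""
    missing = missing_flow_1_targets(bind_targets)
    # Per the PR, a failure here points at the ingest skill or the binding
    # heuristic; the Flow 1 prompt should not regain explicit file paths.
    assert not missing, f"Flow 1 did not bind: {sorted(missing)}"
```

For example, `check_flow_1(["lib/git/cherry-pick.ts", "lib/git/reorder.ts"])` passes (the paths are illustrative), while a run that binds only `reorder.ts` fails with a message naming `cherry-pick.ts`. Matching on the file name keeps the check independent of how the caller-LLM spells the directory, at the cost of accepting any same-named file elsewhere in the repo.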
Run `tests/e2e/run_e2e_flows.py` against the pinned `desktop/desktop` commit. All five flows should still PASS.

Out of scope
- `composite-demo.md` is unchanged. It deliberately uses tool-aware language ("call `bicameral.ingest`") because it's the recording script, not a flow prompt. Worth a follow-up if we want full consistency.
- … `run_e2e_flows.py`). If Flow 1's binding test now exposes a real gap in the ingest skill, that's a separate PR.

🤖 Generated with Claude Code