fix(skill): #402 — preflight auto-fires on /qor-plan + sibling slash-commands#524
Knapp-Kevin wants to merge 2 commits into
…commands

Root cause: the Tier-1 hook classifier at scripts/hooks/preflight_intent.py looked only at free-text English verbs. IMPLEMENTATION_VERBS did not contain "plan", and the regex had no slash-command awareness. So a prompt like `/qor-plan https://github.com/.../issues/1` returned False from should_fire_preflight() and the UserPromptSubmit hook injected nothing: preflight never got its turn, the agent went straight to planning, and the ledger's prior decisions stayed invisible. Sibling commands worked by accident (`/qor-implement` matched the `implement` verb, etc.); `/qor-plan`, `/qor-debug`, `/qor-auto-dev-1` did not.

Confirms Hypothesis 1 from the issue body. Hypothesis 3 (skill-chain ordering) is wrong: UserPromptSubmit fires on the raw user text BEFORE slash-command resolution, so the hook was getting called; the regex inside it was the gate that failed.

Changes
- preflight_intent.py: new IMPL_INTENT_SLASH_COMMANDS frozenset (qor-plan, qor-implement, qor-refactor, qor-debug, qor-remediate, qor-organize, qor-auto-dev-1, qor-auto-dev); new classify_prompt() returning a (fire, prompt_surface_form, slash_command) NamedTuple; should_fire_preflight() preserved as a compat wrapper.
- preflight_reminder.py: hook uses classify_prompt() and appends a `preflight.trigger_evaluated` row to ~/.bicameral/preflight_trigger_evaluated.jsonl carrying prompt_surface_form (slash_command_with_url / _with_text / _bare / free_text / empty). BICAMERAL_TELEMETRY=0 suppresses the log without affecting the gate decision.
- SKILL.md: description frontmatter + body updated so the Tier-2 caller-LLM gate sees the same slash-command surface as the deterministic Tier-1 hook (per existing "must be edited together" contract).
- tests/test_preflight_intent.py: 13 new tests including the literal #402 repro, all IMPL_INTENT slash-commands, read-only commands, the unknown-slash-command fallthrough, and backward-compat parity with should_fire_preflight().
- tests/test_preflight_hook.py: 3 sociable subprocess tests (real hook + real classifier + real JSONL write, no mocks, per CLAUDE.md sociable-testing rule).
- tests/e2e/prompts/flow-6-slash-command-preflight.md: prompt fixture for the failing case, future-compatible with run_e2e_flows.py.

Deferred
- PostHog uplink for the trigger_evaluated stream. The hook is a fast subprocess; synchronous network I/O blocks the user prompt, and daemon-thread fire-and-forget is unreliable on fast exit. The local JSONL captures the exact field set the dashboard would query; a follow-up should drain it from the long-lived MCP server via the existing relay path in telemetry.py.
- Wiring flow-6 into run_e2e_flows.py as a FlowSpec entry (needs DESKTOP_REPO_PATH + claude CLI). The sociable hook test already exercises the deterministic gate; agentic-layer e2e is a follow-up.

Closes #402

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
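The layered classification described in this commit message could be sketched roughly as follows. This is a minimal illustration, not the code in preflight_intent.py: the command names and the surface-form labels come from the commit message, while the verb list, the three regexes, and the choice to run the verb check over the whole prompt for unknown commands are assumptions made for the sketch.

```python
import re
from typing import NamedTuple, Optional

# Command names are taken from the commit message.
IMPL_INTENT_SLASH_COMMANDS = frozenset({
    "qor-plan", "qor-implement", "qor-refactor", "qor-debug",
    "qor-remediate", "qor-organize", "qor-auto-dev-1", "qor-auto-dev",
})

# ASSUMPTION: the real IMPLEMENTATION_VERBS is not shown in the PR.
IMPLEMENTATION_VERBS = ("implement", "refactor", "build", "fix")

# ASSUMPTION: hypothetical patterns, written only for this sketch.
_VERB_RE = re.compile(r"\b(" + "|".join(IMPLEMENTATION_VERBS) + r")\b", re.IGNORECASE)
_SLASH_RE = re.compile(r"^/([A-Za-z0-9][A-Za-z0-9_-]*)(?:\s+(.*))?$", re.DOTALL)
_URL_RE = re.compile(r"https?://\S+")


class ClassifyResult(NamedTuple):
    fire: bool
    prompt_surface_form: str
    slash_command: Optional[str]


def classify_prompt(prompt: str) -> ClassifyResult:
    """Classify raw prompt text as seen before slash-command resolution."""
    text = prompt.strip()
    if not text:
        return ClassifyResult(False, "empty", None)

    match = _SLASH_RE.match(text)
    if match is None:
        # Free text: only the verb check applies.
        return ClassifyResult(bool(_VERB_RE.search(text)), "free_text", None)

    command = match.group(1).lower()
    argument = (match.group(2) or "").strip()
    if not argument:
        surface = "slash_command_bare"
    elif _URL_RE.search(argument):
        surface = "slash_command_with_url"
    else:
        surface = "slash_command_with_text"

    if command in IMPL_INTENT_SLASH_COMMANDS:
        # Known implementation-intent command: fire regardless of argument shape.
        return ClassifyResult(True, surface, command)

    # Any other slash command falls through to the verb check on the full text.
    return ClassifyResult(bool(_VERB_RE.search(text)), surface, command)


def should_fire_preflight(prompt: str) -> bool:
    """Backward-compatible wrapper that keeps the old boolean contract."""
    return classify_prompt(prompt).fire
```

With this sketch, the #402 repro prompt classifies as `slash_command_with_url` and fires, while a bare `/qor-status` does not, because neither the command set nor the assumed verb list matches it.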
e2e CI failure: diagnosed as substrate, not regression
The …
Evidence
Why this can't be from this PR
The entire change set is a Tier-1 deterministic hook ( …
Likely root causes (CI-side)
Status
Closing during PR hygiene because this branch is stale/conflicting even though the underlying work remains important. Preflight auto-fire for /qor-plan and sibling slash commands is high-value if those commands are part of the active workflow: it prevents plans from being written blind to the ledger. Please re-cut from current dev against issue #402, keep deterministic hook/skill coverage, and defer optional analytics/e2e wiring unless needed for the core behavior.
…-intent slash-commands

Revives the proven fix from PR #524 (closed only because the v0 e2e CI failed on stale-auth substrate, not on the diff; that e2e workflow is now shelved to dispatch-only via #556). Rebased onto current main (all 6 files applied cleanly, since nobody had touched them after #524 diverged), and the bug was re-reproduced on main before fixing: `should_fire_preflight("/qor-plan https://…/issues/1")` returned False.

Root cause (Hypothesis 1, confirmed): the UserPromptSubmit classifier in scripts/hooks/preflight_intent.py matched only free-text IMPLEMENTATION_VERBS and had no slash-command awareness. `plan` is not in that verb list, so `/qor-plan <url>` never fired preflight and planning ran blind to the ledger. Sibling commands worked by coincidence (`/qor-implement`→implement, `/qor-refactor`→refactor). Hypothesis 3 (skill-chain ordering) refuted: UserPromptSubmit fires on raw text before slash-command resolution; the hook ran, and its regex was the gate that returned False.

Fix:
- New IMPL_INTENT_SLASH_COMMANDS frozenset (qor-plan, qor-implement, qor-refactor, qor-debug, qor-remediate, qor-organize, qor-auto-dev-1, qor-auto-dev). These short-circuit to fire regardless of argument (URL / text / empty).
- New layered classify_prompt() → ClassifyResult(fire, prompt_surface_form, slash_command); preflight_reminder.py records prompt_surface_form to telemetry so a future trigger-surface regression is observable. should_fire_preflight() preserved (delegates) for backward compat.
- SKILL.md description (Tier-2 caller-LLM gate) updated in lockstep with the Tier-1 hook set, and correctly keeps read-only commands (/qor-status, /qor-help, /qor-audit, /qor-validate) on the SKIP list.

Regression guard: test_preflight_intent.py + test_preflight_hook.py now run in test-mcp-regression.yml (they were dormant, included in no CI workflow). flow-6 e2e prompt added for reference (the e2e harness is dispatch-only per #556; the unit tests are the durable gate).

Closes #402. Supersedes the closed #524.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
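To illustrate how recording prompt_surface_form could make a trigger-surface regression observable, here is a small sketch that tallies the fire rate per surface form from JSONL rows. It is not part of the PR: the function name is hypothetical, and the `fire` key is an assumed field name, since only `prompt_surface_form` is named in the commit message.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def fire_rate_by_surface(lines: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of rows with a truthy `fire`, per prompt_surface_form.

    ASSUMPTION: each row is a JSON object with `prompt_surface_form` and `fire`.
    Blank lines and rows that are not JSON objects are skipped.
    """
    total: Counter = Counter()
    fired: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(row, dict):
            continue
        form = row.get("prompt_surface_form", "unknown")
        total[form] += 1
        if row.get("fire"):
            fired[form] += 1
    return {form: fired[form] / total[form] for form in total}
```

A rate near zero for `slash_command_with_url` while `free_text` still fires would be the kind of signal the #402 bug would have produced, had this field been logged at the time.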
Summary
Closes #402.

bicameral-preflight's Tier-1 (UserPromptSubmit hook) classifier was silently skipping slash-command prompts whose implementation-intent verb was encoded in the command name rather than the prompt body. The literal failing invocation from the issue, `/qor-plan https://github.com/BicameralAI/bicameral-daemon/issues/1`, now fires the gate.

Root cause (Hypothesis 1 confirmed)
`scripts/hooks/preflight_intent.py:IMPLEMENTATION_VERBS` did not contain `"plan"`, and the regex had no slash-command awareness. `should_fire_preflight()` returned `False` for any `/qor-plan ...` prompt; the hook injected no system-reminder; the agent ran planning blind to the ledger.

Sibling commands worked by coincidence (`/qor-implement` → `implement` verb match, `/qor-refactor` → `refactor` verb match). `/qor-plan`, `/qor-debug`, `/qor-auto-dev-1` did not.

Hypothesis 3 (skill-chain ordering) is wrong: `UserPromptSubmit` fires on raw user text before slash-command resolution. The hook was being called; the regex inside it was the gate that returned `False`.

What changed
- `IMPL_INTENT_SLASH_COMMANDS` frozenset in `preflight_intent.py`: `qor-plan`, `qor-implement`, `qor-refactor`, `qor-debug`, `qor-remediate`, `qor-organize`, `qor-auto-dev-1`, `qor-auto-dev`. Slash-commands in this set short-circuit to `fire=True` regardless of argument shape (URL, text, or empty).
- `classify_prompt()` API returning a `(fire, prompt_surface_form, slash_command)` NamedTuple. `should_fire_preflight()` preserved as a backward-compat wrapper; existing tests + callers untouched.
- `prompt_surface_form` field added per #402 ("fix(skill): bicameral-preflight does not auto-fire on /qor-plan with a GitHub issue URL") acceptance: `slash_command_with_url` / `slash_command_with_text` / `slash_command_bare` / `free_text` / `empty`. Hook appends a `preflight.trigger_evaluated` row to `~/.bicameral/preflight_trigger_evaluated.jsonl` carrying this field. `BICAMERAL_TELEMETRY=0` suppresses the log without affecting the gate.
- `skills/bicameral-preflight/SKILL.md` description frontmatter + body updated so the Tier-2 caller-LLM gate sees the same slash-command surface as the deterministic Tier-1 hook (per the existing "must be edited together" contract).

Regression coverage
Per #402 acceptance:
- `tests/test_preflight_intent.py`: 13 new tests covering the literal `/qor-plan <issue-url>` repro, every command in `IMPL_INTENT_SLASH_COMMANDS`, read-only commands that must NOT fire (`/qor-status`, `/qor-help`, `/qor-audit`, `/qor-validate`), the unknown-slash-command fallthrough to verb-check, lowercasing, and parity between `classify_prompt()` and `should_fire_preflight()`.
- `tests/test_preflight_hook.py`: 3 sociable subprocess tests (real hook + real classifier + real JSONL write, no mocks, per CLAUDE.md sociable-testing rule). The `test_hook_fires_on_qor_plan_with_issue_url` test runs the exact failing prompt from the issue and would have failed on `dev` HEAD before this PR.
- `tests/e2e/prompts/flow-6-slash-command-preflight.md`: prompt fixture for the failing case, ready to wire into `run_e2e_flows.py` once the agentic-layer slot is available.

Local gates run before push:
- `pytest tests/test_preflight_intent.py tests/test_preflight_hook.py` → 28 passed (15 new)
- `ruff check` + `ruff format --check` (touched files) → clean

Acceptance check
- … `/qor-plan <plain English>` and `/qor-implement <prompt>`
- … `prompt_surface_form`: partial. Local JSONL emission wired; PostHog uplink deferred (see below).

Backward compatibility
- `skills/bicameral-preflight/SKILL.md` description + body updated in the same commit (per CLAUDE.md "Tool Changes Require Skill Changes").

Deferred (NOT in this PR)
- PostHog uplink for `preflight.trigger_evaluated`. The hook is a fast subprocess; synchronous network I/O blocks the user prompt for up to 3 s, and a daemon-thread fire-and-forget is unreliable on fast exit. The local JSONL captures the exact field set the dashboard would query; a follow-up should drain it from the long-lived MCP server via the existing relay path in `telemetry.py`. Recommend filing as a new issue tagged `observability`, linked to #402.
- Wiring `flow-6-slash-command-preflight.md` into `run_e2e_flows.py` as an agentic-layer `FlowSpec` (needs `DESKTOP_REPO_PATH` + `claude` CLI). The sociable hook test already covers the deterministic gate.

Out of scope (per issue body)
- … `bicameral.preflight` returns: purely about the auto-fire trigger surface.
- … `/qor-plan` itself: the slash-command author should not have to wire preflight in by hand.
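For the `preflight.trigger_evaluated` logging described under "What changed", a minimal sketch of an append-only JSONL writer might look like this. The file path, the event name, and the `BICAMERAL_TELEMETRY=0` switch come from the PR description; the function name, the `event`, `ts`, `fire`, and `slash_command` keys, and the swallow-on-error behaviour are assumptions, not the actual code in preflight_reminder.py.

```python
import json
import os
import time
from pathlib import Path
from typing import Optional

# Path as stated in the PR description.
LOG_PATH = Path.home() / ".bicameral" / "preflight_trigger_evaluated.jsonl"


def log_trigger_evaluated(
    fire: bool,
    prompt_surface_form: str,
    slash_command: Optional[str],
    log_path: Path = LOG_PATH,
) -> bool:
    """Append one telemetry row; return True only if a row was written.

    The caller makes the gate decision before calling this, so suppressing
    or failing the log never changes whether preflight fires.
    """
    if os.environ.get("BICAMERAL_TELEMETRY") == "0":
        return False

    # ASSUMPTION: key names other than prompt_surface_form are illustrative.
    row = {
        "event": "preflight.trigger_evaluated",
        "ts": time.time(),
        "fire": fire,
        "prompt_surface_form": prompt_surface_form,
        "slash_command": slash_command,
    }
    try:
        log_path.parent.mkdir(parents=True, exist_ok=True)
        with log_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(row) + "\n")
    except OSError:
        # A hook that cannot write telemetry should still let the prompt through.
        return False
    return True
```

Keeping the write local and synchronous matches the reasoning in the Deferred section: a short file append does not add network latency to the user prompt, and a longer-lived process can drain the file later.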