fix(skill): #402 — preflight auto-fires on /qor-plan + sibling slash-commands by Knapp-Kevin · Pull Request #524 · BicameralAI/bicameral-mcp

Knapp-Kevin · 2026-05-26T21:22:47Z

Summary

Closes #402.

bicameral-preflight's Tier-1 (UserPromptSubmit hook) classifier was silently skipping slash-command prompts whose implementation-intent verb was encoded in the command name rather than the prompt body. The literal failing invocation from the issue — /qor-plan https://github.com/BicameralAI/bicameral-daemon/issues/1 — now fires the gate.

Root cause (Hypothesis 1 confirmed)

scripts/hooks/preflight_intent.py:IMPLEMENTATION_VERBS did not contain "plan", and the regex had no slash-command awareness. should_fire_preflight() returned False for any /qor-plan ... prompt; the hook injected no system-reminder; the agent ran planning blind to the ledger.

Sibling commands worked by coincidence (/qor-implement → implement verb match, /qor-refactor → refactor verb match). /qor-plan, /qor-debug, /qor-auto-dev-1 did not.

Hypothesis 3 (skill-chain ordering) is wrong: UserPromptSubmit fires on raw user text before slash-command resolution. The hook was being called — the regex inside it was the gate that returned False.

What changed

New IMPL_INTENT_SLASH_COMMANDS frozenset in preflight_intent.py: qor-plan, qor-implement, qor-refactor, qor-debug, qor-remediate, qor-organize, qor-auto-dev-1, qor-auto-dev. Slash-commands in this set short-circuit to fire=True regardless of argument shape (URL, text, or empty).
New classify_prompt() API returning a (fire, prompt_surface_form, slash_command) NamedTuple. should_fire_preflight() preserved as a backward-compat wrapper — existing tests + callers untouched.
prompt_surface_form field added per fix(skill): bicameral-preflight does not auto-fire on /qor-plan with a GitHub issue URL #402 acceptance: slash_command_with_url / slash_command_with_text / slash_command_bare / free_text / empty. Hook appends a preflight.trigger_evaluated row to ~/.bicameral/preflight_trigger_evaluated.jsonl carrying this field. BICAMERAL_TELEMETRY=0 suppresses the log without affecting the gate.
skills/bicameral-preflight/SKILL.md description frontmatter + body updated so the Tier-2 caller-LLM gate sees the same slash-command surface as the deterministic Tier-1 hook (per the existing "must be edited together" contract).

Regression coverage

Per #402 acceptance:

tests/test_preflight_intent.py — 13 new tests covering the literal /qor-plan <issue-url> repro, every command in IMPL_INTENT_SLASH_COMMANDS, read-only commands that must NOT fire (/qor-status, /qor-help, /qor-audit, /qor-validate), the unknown-slash-command fallthrough to verb-check, lowercasing, and parity between classify_prompt() and should_fire_preflight().
tests/test_preflight_hook.py — 3 sociable subprocess tests (real hook + real classifier + real JSONL write, no mocks, per CLAUDE.md sociable-testing rule). The test_hook_fires_on_qor_plan_with_issue_url test runs the exact failing prompt from the issue and would have failed on dev HEAD before this PR.
tests/e2e/prompts/flow-6-slash-command-preflight.md — prompt fixture for the failing case, ready to wire into run_e2e_flows.py once the agentic-layer slot is available.

Local gates run before push:

pytest tests/test_preflight_intent.py tests/test_preflight_hook.py → 28 passed (15 new)
ruff check + ruff format --check (touched files) → clean

Acceptance check

Reproduction harness — fails on dev HEAD, passes after (the sociable hook test uses the literal failing prompt as a real subprocess)
Root cause documented in fix PR (this section)
Regression coverage for /qor-plan <plain English> and /qor-implement <prompt>
Telemetry carries prompt_surface_form — partial: local JSONL emission wired; PostHog uplink deferred (see below)

Backward compatibility

Data migration: N/A — no ledger schema or business-logic change.
Wire format: N/A — no MCP tool changes; classifier is hook-side.
Skill contract: skills/bicameral-preflight/SKILL.md description + body updated in the same commit (per CLAUDE.md "Tool Changes Require Skill Changes").

Deferred (NOT in this PR)

PostHog uplink for preflight.trigger_evaluated. The hook is a fast subprocess; synchronous network I/O blocks the user prompt up to 3 s and a daemon-thread fire-and-forget is unreliable on fast exit. The local JSONL captures the exact field set the dashboard would query — a follow-up should drain it from the long-lived MCP server via the existing relay path in telemetry.py. Recommend filing as a new issue tagged observability, linked to fix(skill): bicameral-preflight does not auto-fire on /qor-plan with a GitHub issue URL #402.
Wiring flow-6-slash-command-preflight.md into run_e2e_flows.py as an agentic-layer FlowSpec (needs DESKTOP_REPO_PATH + claude CLI). The sociable hook test already covers the deterministic gate.

Out of scope (per issue body)

fix(preflight): fires on unrelated/un-ingested code — verb regex too broad, no ledger-awareness #343 (preflight over-fires) — opposite failure mode, orthogonal regex work.
Changing what bicameral.preflight returns — purely about the auto-fire trigger surface.
Changing /qor-plan itself — the slash-command author should not have to wire preflight in by hand.

…commands Root cause: the Tier-1 hook classifier at scripts/hooks/preflight_intent.py looked only at free-text English verbs. IMPLEMENTATION_VERBS did not contain "plan", and the regex had no slash-command awareness. So a prompt like `/qor-plan https://github.com/.../issues/1` returned False from should_fire_preflight() and the UserPromptSubmit hook injected nothing — preflight never got its turn, the agent went straight to planning, and the ledger's prior decisions stayed invisible. Sibling commands worked by accident (`/qor-implement` matched the `implement` verb, etc.); `/qor-plan`, `/qor-debug`, `/qor-auto-dev-1` did not. Confirms Hypothesis 1 from the issue body. Hypothesis 3 (skill-chain ordering) is wrong: UserPromptSubmit fires on the raw user text BEFORE slash-command resolution, so the hook was getting called — the regex inside it was the gate that failed. Changes - preflight_intent.py: new IMPL_INTENT_SLASH_COMMANDS frozenset (qor-plan, qor-implement, qor-refactor, qor-debug, qor-remediate, qor-organize, qor-auto-dev-1, qor-auto-dev); new classify_prompt() returning (fire, prompt_surface_form, slash_command) NamedTuple; should_fire_preflight() preserved as a compat wrapper. - preflight_reminder.py: hook uses classify_prompt() and appends a `preflight.trigger_evaluated` row to ~/.bicameral/preflight_trigger_evaluated.jsonl carrying prompt_surface_form (slash_command_with_url / _with_text / _bare / free_text / empty). BICAMERAL_TELEMETRY=0 suppresses the log without affecting the gate decision. - SKILL.md: description frontmatter + body updated so the Tier-2 caller-LLM gate sees the same slash-command surface as the deterministic Tier-1 hook (per existing "must be edited together" contract). - tests/test_preflight_intent.py: 13 new tests including the literal #402 repro, all IMPL_INTENT slash-commands, read-only commands, the unknown-slash-command fallthrough, and backward-compat parity with should_fire_preflight(). - tests/test_preflight_hook.py: 3 sociable subprocess tests — real hook + real classifier + real JSONL write, no mocks (per CLAUDE.md sociable-testing rule). - tests/e2e/prompts/flow-6-slash-command-preflight.md: prompt fixture for the failing case, future-compatible with run_e2e_flows.py. Deferred - PostHog uplink for the trigger_evaluated stream. The hook is a fast subprocess; synchronous network I/O blocks the user prompt and daemon-thread fire-and-forget is unreliable on fast exit. The local JSONL captures the exact field set the dashboard would query — a follow-up should drain it from the long-lived MCP server via the existing relay path in telemetry.py. - Wiring flow-6 into run_e2e_flows.py as a FlowSpec entry (needs DESKTOP_REPO_PATH + claude CLI). The sociable hook test already exercises the deterministic gate; agentic-layer e2e is a follow-up. Closes #402 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-26T21:22:57Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ad01731a-8d21-4385-b16f-28786da23741

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/402-preflight-slash-command-classifier

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Knapp-Kevin · 2026-05-27T01:14:05Z

e2e CI failure — diagnosed as substrate, not regression

The v0 user flow e2e workflow's e2e assertions (auto) job failed on this PR, but the failure mode is infra-level — not anything in this PR's change set.

Evidence

Signal	Observation
Every flow's claude subprocess	`claude CLI exit=1`, `stderr (last 500 chars): (empty)`
Flow 5 (pure MCP-layer `bicameral.history` — never goes near preflight)	Fails identically: `exit=1`, 0 tool calls
Per-flow runtime	~4 s each vs the 900 s budget — fast-fail at the subprocess boundary, before any MCP traffic
Last passing auto-assertion on this workflow	May 25 02:47Z (`feat/357-sociable-gate-surrealql` @ `c1041f3`, run 26380551118)
Today's other failed run	Dependabot sigstore PR (run 26447359314) failed the SAME workflow today for a different reason: `CLAUDE_CODE_OAUTH_TOKEN: EMPTY`
Claude Code CLI version in the runner	`2.1.150 (Claude Code)` — fresh install per run

Why this can't be from this PR

The entire change set is a Tier-1 deterministic hook (scripts/hooks/preflight_intent.py) that runs inside a Claude Code session, after claude -p successfully starts. Here claude -p is dying before opening the session — so my code is unreachable in this failure mode. Flow 5 doubles down: it never touches the preflight hook (it just calls bicameral.history) and fails the same way.

Likely root causes (CI-side)

@anthropic-ai/claude-code 2.1.150 regression in non-interactive (-p) mode that fails silently
CLAUDE_CODE_OAUTH_TOKEN rotated/expired between May 25 and now (token present in env, length 108, but claude may not surface auth errors to stderr)
One of the 8 commits that landed on dev between c1041f3 and 24f773a — most-suspect is aafdd42 (dogfood hooks source-relative), but that wouldn't cause silent claude exits

Status

All other gates green (ruff+mypy, MCP regression ubuntu+windows, Perf gate, Python floor, Schema persistence, TruffleHog, CodeRabbit, Preflight Failure-Mode Eval).
This PR's deterministic hook test suite (sociable subprocess tests, real classifier, real JSONL write) is the local regression coverage for fix(skill): bicameral-preflight does not auto-fire on /qor-plan with a GitHub issue URL #402 and passes 28/28.
Leaving as draft while substrate is restored. No code changes proposed here.

jinhongkuan · 2026-06-07T04:34:48Z

Closing during PR hygiene because this branch is stale/conflicting even though the underlying work remains important. Preflight auto-fire for /qor-plan and sibling slash commands is high-value if those commands are part of the active workflow: it prevents plans from being written blind to the ledger. Please re-cut from current dev against issue #402, keep deterministic hook/skill coverage, and defer optional analytics/e2e wiring unless needed for the core behavior.

…-intent slash-commands Revives the proven fix from PR #524 (closed only because the v0 e2e CI failed on stale-auth substrate — not the diff; that e2e workflow is now shelved to dispatch-only via #556). Rebased onto current main (all 6 files applied cleanly — nobody touched these since #524 diverged) and the bug re-reproduced on main before fixing: `should_fire_preflight("/qor-plan https://…/issues/1")` returned False. Root cause (Hypothesis 1, confirmed): the UserPromptSubmit classifier in scripts/hooks/preflight_intent.py matched only free-text IMPLEMENTATION_VERBS and had no slash-command awareness. `plan` is not a verb, so `/qor-plan <url>` never fired preflight — planning ran blind to the ledger. Sibling commands worked by coincidence (`/qor-implement`→implement, `/qor-refactor`→refactor). Hypothesis 3 (skill-chain ordering) refuted: UserPromptSubmit fires on raw text before slash-command resolution; the hook ran — its regex was the gate that returned False. Fix: - New IMPL_INTENT_SLASH_COMMANDS frozenset (qor-plan, qor-implement, qor-refactor, qor-debug, qor-remediate, qor-organize, qor-auto-dev-1, qor-auto-dev) — these short-circuit to fire regardless of argument (URL / text / empty). - New layered classify_prompt() → ClassifyResult(fire, prompt_surface_form, slash_command); preflight_reminder.py records prompt_surface_form to telemetry so a future trigger-surface regression is observable. should_fire_preflight() preserved (delegates) for backward compat. - SKILL.md description (Tier-2 caller-LLM gate) updated in lockstep with the Tier-1 hook set, and correctly keeps read-only commands (/qor-status, /qor-help, /qor-audit, /qor-validate) on the SKIP list. Regression guard: test_preflight_intent.py + test_preflight_hook.py now run in test-mcp-regression.yml (they were dormant — in no CI workflow). flow-6 e2e prompt added for reference (the e2e harness is dispatch-only per #556; the unit tests are the durable gate). Closes #402. Supersedes the closed #524. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Knapp-Kevin temporarily deployed to ci-test May 26, 2026 21:22 — with GitHub Actions Inactive

Knapp-Kevin requested a deployment to recording-approval May 26, 2026 21:22 — with GitHub Actions Waiting

Knapp-Kevin had a problem deploying to production May 26, 2026 21:22 — with GitHub Actions Failure

Knapp-Kevin marked this pull request as ready for review May 27, 2026 06:07

This was referenced May 27, 2026

fix(ci): #528 — e2e assertions auth via ANTHROPIC_API_KEY #529

Merged

v0 e2e harness: oauth_org_not_allowed (403) on every flow — production env CLAUDE_CODE_OAUTH_TOKEN disabled #528

Closed

Merge branch 'dev' into fix/402-preflight-slash-command-classifier

17b09e8

Knapp-Kevin temporarily deployed to ci-test June 1, 2026 20:01 — with GitHub Actions Inactive

Knapp-Kevin requested a deployment to recording-approval June 1, 2026 20:01 — with GitHub Actions Waiting

Knapp-Kevin temporarily deployed to ci-test June 1, 2026 20:01 — with GitHub Actions Inactive

Knapp-Kevin requested a review from jinhongkuan June 1, 2026 20:22

Knapp-Kevin mentioned this pull request Jun 2, 2026

feat(ingest): #404 Phase 1 — thread parent_decision_id through natural-format ingest #534

Closed

jinhongkuan closed this Jun 7, 2026

Knapp-Kevin mentioned this pull request Jun 8, 2026

fix(skill): #402 — bicameral-preflight auto-fires on /qor-plan + impl-intent slash-commands #569

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(skill): #402 — preflight auto-fires on /qor-plan + sibling slash-commands#524

fix(skill): #402 — preflight auto-fires on /qor-plan + sibling slash-commands#524
Knapp-Kevin wants to merge 2 commits into
devfrom
fix/402-preflight-slash-command-classifier

Knapp-Kevin commented May 26, 2026

Uh oh!

coderabbitai Bot commented May 26, 2026 •

edited

Loading

Review skipped

Uh oh!

Knapp-Kevin commented May 27, 2026

Uh oh!

jinhongkuan commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Knapp-Kevin commented May 26, 2026

Summary

Root cause (Hypothesis 1 confirmed)

What changed

Regression coverage

Acceptance check

Backward compatibility

Deferred (NOT in this PR)

Out of scope (per issue body)

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

Knapp-Kevin commented May 27, 2026

e2e CI failure — diagnosed as substrate, not regression

Evidence

Why this can't be from this PR

Likely root causes (CI-side)

Status

Uh oh!

jinhongkuan commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented May 26, 2026 •

edited

Loading