Skip to content

fix(infra): bundle skills/ in wheel, surface install failures, rename /bicameral:* → /bicameral-*#178

Merged
Knapp-Kevin merged 3 commits into
devfrom
177-installer-skills-bundle
May 4, 2026
Merged

fix(infra): bundle skills/ in wheel, surface install failures, rename /bicameral:* → /bicameral-*#178
Knapp-Kevin merged 3 commits into
devfrom
177-installer-skills-bundle

Conversation

@Knapp-Kevin

@Knapp-Kevin Knapp-Kevin commented May 4, 2026

Copy link
Copy Markdown
Collaborator

Summary

End-to-end fix for "skills are not invocable after install" — the cause was three layered bugs:

  • Wheel didn't ship skills/ despite the artifacts directive (verified empty pre-fix). Claude Code skills require folder copies that the installer reads from Path(__file__).parent / "skills", so the wheel must contain that path.
  • _install_skills failed silently when source was missing; the summary suppressed the zero count.
  • All /bicameral:* slash commands were unresolvable. Claude Code resolves slash commands by skill folder name, and the folders use hyphens (bicameral-ingest, etc.). The colon form would only work for plugin-namespaced skills — and bicameral-mcp does not ship a Claude Code plugin. Renaming everywhere is the right fix; switching to a plugin distribution was considered and rejected as out-of-scope (multi-agent strategy, plugin maturity, separate engine layer).
  • _detect_runner had a broken python -m bicameral_mcp fallback that produced non-functional MCP configs (no such package; entry is server:cli_main).
  • Stale Note: using ... -m bicameral_mcp ... advisory printed regardless of selected runner.

Linked issues

Closes #177

Plan / Audit / Seal

  • Plan: plan-installer-skills-remediation.md (originally authored against main; adapted on dev after discovering significant divergence and verifying via wheel build that dev had the same packaging bug)
  • Audit: PASS (binary verdict; one VETO cycle resolved on F1 specification-drift around the runner-note text — fixed inline before re-audit)
  • Seal: advisory PASS — all 24 installer/hook/SessionEnd tests pass; setup_wizard and server import cleanly; manual smoke (_BICAMERAL_SESSION_END_COMMAND ends with '/bicameral-capture-corrections --auto-ingest')
  • Risk: L2 (touches packaging contract, installer behavior, hook commands, and ~18 files of cross-references)

Test plan

  • pytest tests/test_setup_wizard.py -v — 6 passed (incl. new test_session_end_command_uses_hyphen_slash_command regression guard)
  • pytest tests/test_installer_packaging.py -v — 2 passed (real python -m build + zipfile inspection; skill members enumerated dynamically from source)
  • pytest tests/test_post_commit_sync_hook.py -v — 11 passed (hook reminder text now references /bicameral-sync)
  • pytest tests/test_session_end_hook_drift.py -v — 5 passed (canonical SessionEnd command + drift checks)
  • python -c "import setup_wizard, server" — clean imports, RunnerNotFoundError exported
  • Manual verify (reviewer): build the wheel, pip install --force-reinstall it into a fresh venv, run bicameral-mcp setup <some-target-repo>. Expect: all 12 skill folders populated under <repo>/.claude/skills/, summary advertises /bicameral-<name> (hyphen) commands, no -m bicameral_mcp text anywhere.
  • Manual verify (reviewer): with no bicameral-mcp script on PATH, run setup. Expect: RunnerNotFoundError with install hint, not silent broken config.

Scope summary

Functional code (must change for the fix to work end-to-end):

  • pyproject.toml[tool.hatch.build.targets.wheel.force-include] with "skills" = "skills"
  • setup_wizard.py — RunnerNotFoundError; clean _detect_runner; WARNING in _install_skills; drop if num_skills: gate; surface RunnerNotFoundError in run_setup; delete stale runner-note; rename advertised slash commands; rename hook command string
  • .claude/settings.json — SessionEnd hook command
  • scripts/hooks/post_commit_sync_reminder.py — PostToolUse reminder text
  • server.py — 11 tool-description "Slash alias:" lines
  • skills/*/SKILL.md — 6 skill files with cross-references / self-references / hook examples

User-facing docs:

  • README.md — slash-commands table (5 rows + skills column reference)
  • docs/DEV_CYCLE.md — one example
  • assets/git-for-specs-deck.html — presentation chip

Tests (lockstep + regression guard):

  • tests/test_post_commit_sync_hook.py, tests/test_session_end_hook_drift.py, tests/e2e/_harness_setup.py, tests/e2e/run_e2e_flows.py — references in test text/asserts
  • tests/test_setup_wizard.py, tests/test_installer_packaging.py — new (TDD)

Intentionally NOT changed:

  • CHANGELOG.md — historical entries are archival
  • .claude/skills/* — repo-local dupes regenerated by bicameral-mcp setup .
  • pyproject.toml artifacts = ["skills/**/*.md", ...] — left alongside force-include (removing would be tangential cleanup)

Notes for reviewer

  • Pre-existing setup_wizard.py size (~970 LOC) is over the Razor 250-LOC file limit; not addressed here. Suggest a follow-up refactor issue.
  • Plugin distribution (so /bicameral:ingest could work natively) was considered and explicitly rejected — the multi-agent surface (Cursor + Codex don't have plugin systems) makes a Python-packaged engine + per-agent installer the right shape. The plugin route would be a several-week migration with no net win.

The installer silently left .claude/skills/ empty because:
- pyproject.toml did not force-include skills/, so Hatchling dropped the
  directory from the wheel (no __init__.py; the existing `artifacts`
  directive was insufficient on its own — verified empty pre-fix).
- setup_wizard._install_skills returned 0 silently when source was missing.
- run_setup suppressed the zero count via `if num_skills:`, hiding the
  failure from the "Done!" summary.

Compounding bugs surfaced during audit:
- _detect_runner kept a `python -m bicameral_mcp` fallback that produced
  a non-functional MCP config (no such package; entry is server:cli_main).
- A stale `Note: using ... -m bicameral_mcp ...` advisory printed
  regardless of selected runner.

Changes:
- pyproject.toml: add [tool.hatch.build.targets.wheel.force-include]
  with `"skills" = "skills"`; add `build>=1.0.0` to test extras.
- setup_wizard.py: add RunnerNotFoundError; drop broken module fallback
  from _detect_runner; surface RunnerNotFoundError at top of run_setup
  before any filesystem mutation; replace silent return in _install_skills
  with a loud WARNING; remove `if num_skills:` gate so the count is
  always reported; delete stale runner-note block.
- tests/test_installer_packaging.py: real `python -m build` + zipfile
  inspection. Skill members enumerated dynamically from skills/ source
  so coverage stays correct as skills are added or removed.
- tests/test_setup_wizard.py: behavioral coverage of warning, runner
  selection branches, and a regression guard against re-introducing a
  non-script `_detect_runner` return.

Closes #177

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Knapp-Kevin Knapp-Kevin added flow:feature Standard feature/fix PR targeting BicameralAI/dev (the default flow) P1 High: ship this milestone; user-impacting bug or committed feature infra Infrastructure / build / CI / repo-admin work fix Bug fix or correctness repair labels May 4, 2026
@coderabbitai

coderabbitai Bot commented May 4, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 203cb931-5c97-437c-bd02-4ec2d6c1c40c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 177-installer-skills-bundle

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Claude Code resolves slash commands by skill folder name. Folders are
named bicameral-<verb> (hyphen), so the colon-form /bicameral:<verb>
never resolved — it would only work for skills shipped via a Claude
Code plugin namespace, which bicameral-mcp does not provide.

Functional impact (what was broken before):
- SessionEnd hook spawned `claude -p '/bicameral:capture-corrections'`
  which Claude could not invoke. The hook never ran the intended skill.
- PostToolUse reminder text told the agent to "run /bicameral:sync"
  pointing at a non-resolvable command.
- run_setup advertised /bicameral:* names that users would type and
  see "command not found".
- README and DEV_CYCLE documented the broken form.
- 11 server.py tool descriptions referenced colon-form aliases.

Renames (functional):
- setup_wizard.py: SessionEnd hook command, slash-command summary,
  inline comment.
- .claude/settings.json: deployed SessionEnd hook command.
- scripts/hooks/post_commit_sync_reminder.py: PostToolUse reminder
  text and module docstring.
- server.py: 11 tool-description "Slash alias:" lines.
- 6 skills/*/SKILL.md cross-references and self-references.
- assets/git-for-specs-deck.html presentation chip.
- README.md slash-commands table (5 rows + skills column).
- docs/DEV_CYCLE.md example.

Renames (test lockstep):
- tests/test_post_commit_sync_hook.py assertion.
- tests/test_session_end_hook_drift.py canonical command + docstring.
- tests/e2e/_harness_setup.py + run_e2e_flows.py docstring refs.

New regression guard:
- tests/test_setup_wizard.py::test_session_end_command_uses_hyphen_slash_command
  fails if the colon form is ever re-introduced into _BICAMERAL_SESSION_END_COMMAND.

CHANGELOG.md historical entries left intact (archival).
.claude/skills/* dupes left intact (regenerated by `bicameral-mcp setup`).

Refs #177
@Knapp-Kevin Knapp-Kevin had a problem deploying to recording-approval May 4, 2026 17:11 — with GitHub Actions Failure
@Knapp-Kevin Knapp-Kevin changed the title fix(infra): bundle skills/ in wheel and surface install failures loudly fix(infra): bundle skills/ in wheel, surface install failures, rename /bicameral:* → /bicameral-* May 4, 2026
Pure formatting; no behavior change. CI `ruff format --check` failed on
PR #178 because the two new test files weren't yet formatted to repo
ruff conventions. `pytest tests/test_setup_wizard.py tests/test_installer_packaging.py`
still 8/8 green post-format.

Refs #177
@Knapp-Kevin Knapp-Kevin had a problem deploying to recording-approval May 4, 2026 17:28 — with GitHub Actions Failure
@Knapp-Kevin Knapp-Kevin merged commit b688884 into dev May 4, 2026
8 of 9 checks passed
@Knapp-Kevin Knapp-Kevin deleted the 177-installer-skills-bundle branch May 4, 2026 17:46
Knapp-Kevin added a commit that referenced this pull request May 4, 2026
The v0 architecture reference previously lived only in a Notion page
(mirrored locally at .failsafe/governance/v0 Architecture *.md,
gitignored). It had drifted from dev HEAD across every load-bearing
dimension and was not editable via PR.

This doc:
- Is derived from code; cites ledger/schema.py, server.py,
  events/team_adapter.py, contracts.py as source-of-truth pointers
  for every numeric claim
- Delineates v0 core (6 nodes, 5 edges) from operational
  infrastructure (symbol/vocab_cache/ledger_sync/graph_proposal/
  schema_meta) and v1 CodeGenome scaffolding (subject_identity/
  subject_version, has_identity/has_version/identity_supersedes/
  locates/depends_on) already present on dev
- Documents the 23-tool MCP surface (lifecycle / telemetry /
  code-locator) and explicitly notes there is no internal/external
  split at the tool boundary
- Documents the JSONL-in-git event substrate and explains why it
  delivers the Notion page's stated team-sync goal better than the
  graph-node design described there
- Names the known compliance_checked emission gap (#178 open-issues
  survey) so it can be tracked separately
- Lists the 6 v0 invariants with test-as-contract semantics and an
  honest fat callout
- Includes a Reconciliation notes table at the end mapping each
  drifted Notion claim to current reality

Closes #179

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin added a commit that referenced this pull request May 5, 2026
Closes the e2e flake observed on PR #181 (run 25345393569): Flow 5 (PM
history+ratify, MCP-layer) and Flow 3 (commit→sync, agentic) failed
because a stale ~/.claude/projects/-tmp-desktop-clone/memory/MEMORY.md
from a prior run let the agent answer Flow 5's prompt directly from
disk instead of invoking bicameral.history. Flow 3's failure cascaded
because its ledger snapshot relies on Flow 5's bicameral call to drain
the post-commit JSONL queue via EventMaterializer.replay_new_events
(documented at run_e2e_flows.py:342-349).

Diagnosis evidence (from /qor-debug Phase 1 four-layer analysis):
- PASS run #178 Flow 5.ndjson L5: 'File does not exist' on MEMORY.md →
  agent runs ToolSearch + bicameral_history. Flow 5 PASS.
- FAIL run #181 Flow 5.ndjson L5: full MEMORY.md content shown → agent
  reads memory + git-log + answers from disk. Zero bicameral calls.
- The team-server materializer hunk at events/materializer.py:88-106
  is correctly gated (is_team_server_payload requires both source_type
  AND extraction keys; standard MCP payloads have neither). Dormant in
  the e2e flow. NOT the regressor.

Fix:
- New helper clean_claude_memory_for_repo(repo_path) in
  tests/e2e/_harness_setup.py that purges ~/.claude/projects/<key>/memory/
  where <key> is the absolute repo path with / and \ replaced by -.
- Wired into tests/e2e/run_e2e_flows.py::main as _clean_claude_memory(),
  called alongside the existing _clean_ledger / _reset_desktop_repo /
  _bootstrap_bicameral_dir cleanup chain.
- Regression-locking unit tests in tests/test_e2e_harness_memory_purge.py
  cover: purges-when-present, no-op-when-absent, scoped-to-target-project
  (does NOT touch other repos' memory on the same runner).

3/3 tests pass; ruff + ruff format + mypy all clean.

Refs PR #181, root-cause via /qor-debug session
Knapp-Kevin added a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request May 21, 2026
Closes BicameralAI#187

BREAKING: removes IngestResponse.judgment_payload and
IngestResponse.judgment_payloads. Replaces them with a single
IngestResponse.brief: BriefEnvelope | None field that carries gap-judge
findings inline. Caller no longer has to remember to render two
separate sections; the dual-render contract was the most-skipped
step in bicameral-ingest per the issue body.

Phase 1 — schema + handler:
- contracts.py: new BriefEnvelope class (mirrors PreflightResponse's
  brief surface; includes rubric: GapRubric | None for the migrated
  rubric reference). Removes judgment_payload + judgment_payloads from
  IngestResponse.
- handlers/ingest.py: populates brief from judgment_payloads (auto-chain
  output remains the local var; the legacy public fields are gone).
  brief stays None when no gap-judge findings — silent-on-no-signal
  matches the PreflightResponse contract.
- handlers/gap_judge.py: docstring text update to reference the new
  IngestResponse.brief.gaps location (was IngestResponse.judgment_payload).

Phase 1 — cascade migration of existing tests:
- tests/test_v0416_gap_judge.py: 5 assertions on response.judgment_payload
  migrated to response.brief / response.brief.rubric / response.created_decisions.
  Test names renamed to reflect the new field. Module docstring updated.
- 4 new behavioral tests in tests/test_ingest_brief_unification.py:
  brief populated when gaps judged, rubric carried through, brief is
  None on no signal, judgment_payload[s] not in model_fields or
  model_dump output.

Phase 2 — skill render contract:
- skills/bicameral-ingest/SKILL.md Step 6: rewrites gap-judge rubric
  trigger to read response.brief.gaps[] and response.brief.rubric
  instead of response.judgment_payload.
- skills/bicameral-judge-gaps/SKILL.md: trigger description + auto-chain
  reference updated to brief.gaps. Migration note added.

Phase 3 — stale colon mop-up (orthogonal):
- skills/bicameral-report-bug/SKILL.md: 4 line edits, /bicameral:report-bug
  → /bicameral-report-bug. The single residual occurrence of the
  legacy colon-namespace form across skills/, post-BicameralAI#178's sweep.

Verification:
- 16/16 tests pass (4 new + 12 v0416 regression incl. 3 migrated)
- mypy . clean on changed files (contracts.py, handlers/ingest.py,
  handlers/gap_judge.py)
- ruff check . + ruff format --check . clean
- grep -rn "judgment_payload" --include="*.py" . — only intentional
  hits remain: contracts.py docstrings (migration markers),
  handlers/ingest.py (local var name for the gap-judge auto-chain
  loop), and server.py:1119 (standalone bicameral.judge_gaps tool's
  response shape — non-cascading collision, intentionally untouched
  per plan-187 §Phase 1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fix Bug fix or correctness repair flow:feature Standard feature/fix PR targeting BicameralAI/dev (the default flow) infra Infrastructure / build / CI / repo-admin work P1 High: ship this milestone; user-impacting bug or committed feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant