Skip to content

triage: dev → main (README restructure + #272 Fix 3 + SECURITY.md)#282

Merged
jinhongkuan merged 14 commits into
mainfrom
triage-from-dev
May 9, 2026
Merged

triage: dev → main (README restructure + #272 Fix 3 + SECURITY.md)#282
jinhongkuan merged 14 commits into
mainfrom
triage-from-dev

Conversation

@jinhongkuan

@jinhongkuan jinhongkuan commented May 8, 2026

Copy link
Copy Markdown
Contributor

Summary

Triages 10 commits from dev onto main — every commit from e661bd8 onwards (the README restructure base) up through the latest Flow 5 prompt disambiguation. Cherry-picked rather than merge-tracked because main has independent hotfix history (#267-#269 SBOM hotfixes) that's already in dev via prior triage; this avoids re-merging the merge commits.

Commits in order

  1. dc380a7 docs(README): restructure for getting-started clarity + add hero image
  2. 5d1dca7 chore(repo): untrack plan-*.md and add to .gitignore
  3. 279e9ee docs(security): add SECURITY.md — enables GitHub Security tab
  4. b1e9879 docs(README): trim to test-out path; add star CTA + logo
  5. d786914 fix(ci): SHA-pin test-summary action + rename eval_decision_relevance attr (Dev CI baseline regressions blocking #258 merge: test-summary action + M1 eval AttributeError + e2e Flow 1 #272)
  6. 6cab533 fix(skill): ingest advises on ratify instead of auto-prompting (Dev CI baseline regressions blocking #258 merge: test-summary action + M1 eval AttributeError + e2e Flow 1 #272 Fix 3)
  7. 46d8fad style: ruff format assert_flow_1 ratify_note line
  8. d1364a1 fix(e2e): re-anchor flow-1 prompt with 'decision ledger' to trigger ingest
  9. 3975447 style(e2e): refresh stale Flow 5 no-ratify message (Dev CI baseline regressions blocking #258 merge: test-summary action + M1 eval AttributeError + e2e Flow 1 #272 Fix 3)
  10. c2cf52a fix(e2e): disambiguate flow-5 prompt — 'decisions we've tracked in our ledger'

Highlights

  • README: hero image, restructured Quickstart, star CTA, logo
  • SECURITY.md: enables the GitHub Security tab
  • Dev CI baseline regressions blocking #258 merge: test-summary action + M1 eval AttributeError + e2e Flow 1 #272 Fix 3: ingest skill now advises on ratification rather than auto-firing an AskUserQuestion gate (saves Flow 1's tool-call budget; ratify moves to Flow 5)
  • CI hardening: SHA-pinned phoenix-actions/test-reporting-summary
  • e2e: Flow 1 + Flow 5 prompt anchors tightened so the agent reaches bicameral.ingest / bicameral.history before exhausting budget on auto-memory + working-tree reads

What this does NOT include

  • Older dev-only work that was never triaged (sim_* scripts, release/sbom_emit.py refactors) — out of scope; left on dev

Test plan

  • CI: MCP Regression Suite green on triage-from-dev
  • CI: ruff + mypy green
  • CI: e2e Flow 1 + Flow 5 green (the new prompt anchors are the load-bearing change here)
  • CI: TruffleHog clean

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Supply-chain protections: signed artifacts and SBOM verification surfaced in docs
  • Documentation

    • Reworked README with clearer privacy/compliance posture and telemetry opt-out
    • Added SECURITY.md detailing telemetry, vulnerability reporting, and release verification
  • Changes

    • Ingest/ratify flow is advisory by default (ratify optional; direct sign-off still supported)
    • E2E assertions updated to tolerate advisory-only ratification
    • CI workflow stability improved by pinning a test-summary action
  • Chores

    • Ignore local plan files via .gitignore entries

jinhongkuan and others added 10 commits May 7, 2026 21:29
Visible changes when a user lands on the README:

1. Hero image (`assets/bicameral-hero.png`) at the very top — visual
   without/with comparison from the landing-page asset bundle.
2. Quickstart immediately after the one-line value prop (was buried
   below "The Problem" and "How It Feels"). User goes from "what is
   this" to "type these three lines" without scrolling.
3. Compliance section trimmed from a 12-line policy-file enumeration
   near the top to a 5-line "we take privacy seriously" paragraph at
   the bottom. The full posture stays linkable via docs/policies/.
4. pipx dropped from the install path. The two paths are now uv
   (recommended) and plain pip (fallback). uv was already preferred
   per #199's 3-path resolve order; pipx was middle ground that
   doesn't pull its weight in a top-level README.

Section order:
  Hero → title → 1-liner →
  Quickstart → How It Feels → The Problem → Core Concepts →
  What setup installs → Slash Commands → MCP Tools Reference →
  Configuration → Local Development → Telemetry →
  Contributing → Privacy & Compliance → License

312 lines (was 376).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
27 in-progress planning artifacts (`plan-114-*.md`, `plan-codegenome-*.md`,
`plan-A-*.md`, etc.) were tracked in the repo. They're working memory
between author and reviewers during a feature; once the feature merges,
the PR description + CHANGELOG carry the durable record. Keeping these
in the public-ish repo:

- adds 27 markdown files to the wheel's source-distribution surface for
  no end-user benefit
- couples planning vocabulary to release artifacts (e.g. plan-codegenome
  files describe v1 work that #246 reverted; the plan stays useful as
  reference but doesn't belong on `main`)
- creates churn pressure to mark plans as "done" or "superseded" instead
  of just letting them rest as the author's working notes

This commit:
1. Adds `plan-*.md` pattern to `.gitignore` with a one-paragraph comment
   on the policy.
2. `git rm --cached` on all 27 currently-tracked `plan-*.md` files —
   they remain on local disk for the author's reference, just no longer
   tracked.

After merge, anyone with a checkout will keep their local `plan-*.md`
files; new plans drafted in-tree will be untracked by default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a top-level SECURITY.md so GitHub auto-creates a "Security" tab
in the repo nav bar — the closest GitHub-native surface to a "button
that pulls up our SBOM/privacy statement."

Contents:

- **Privacy posture** — explicit "all data on your laptop" + telemetry
  opt-out + pointer to docs/policies/ for the full posture
- **Software supply chain** — table of signed artifacts on each
  release (CycloneDX SBOM, Rekor attestation, hooks-manifest sigs,
  skills-manifest sigs, release-tag-commit sig); pointer to the
  release-evidence verification procedure; mention of GitHub's
  auto-SPDX SBOM under Insights → Dependency graph
- **Supported versions** — only latest minor, ~30-day backport window
  for critical fixes
- **Reporting a vulnerability** — GitHub Security Advisories preferred;
  jin@bicameral-ai.com fallback with `[security]` subject prefix; 3-day
  ack target, 30-day patch target for critical
- **Scope** — what's in (server, skills, hooks, release pipeline) and
  what's out (third-party deps, host vulns, local-attack scenarios
  already covered by host-trust model)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sections removed (PM/dev evaluating Bicameral wants the test-out path,
not the why/how-internals):
- "The Problem" (long context narrative)
- "Core Concepts" (two-axis model, link_commit, collab modes)
- "Removing Bicameral" (post-test concern)
- "Configuration" env-var table
- "Local Development"
- "Contributing"
- "Telemetry" subsection (folded into Privacy & Compliance one-liner)

What stays — the path from "land on the page" to "type three commands
and see something work":
- Hero image
- Star-on-GitHub CTA (animated SVG, adapted from cocoindex's design
  with metadata strings updated; visual mechanism is generic)
- Logo (small, inline-right of title)
- Title + badges + 1-liner
- Quickstart (uv | pip | Windows)
- How It Feels (preflight render + dashboard screenshot)
- Slash Commands
- What `setup` writes (trust signal — what hits disk)
- MCP Tools Reference (collapsed by default)
- Privacy & Compliance (concise)
- License

312 → 152 lines.

Visuals from landing-page borrowed: assets/logo.png (was
landing-page/logo.png), assets/bicameral-hero.png from
landing-page/output/imagegen/. Star CTA SVG saved as
assets/star-on-github.svg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… attr (#272)

Closes 2 of 3 dev-baseline regressions tracked in #272 (the third — Flow 1
e2e expectation post-#263 auto-bind — is deferred for product-level
discussion since it touches the ingest skill choreography).

## test-summary/action SHA-pin

The `test-summary/action@v2` mutable tag was repointed on 2026-05-07
22:09 UTC from a `dist`-targeted release (with bundled `index.js`) to
the new v2.5 release that targets `main` (no bundled output). Every PR
opened after that point fails the Test Summary step with
`File not found: index.js`, marking MCP Regression Suite jobs red on
both ubuntu and windows.

Pin to v2.4 commit `31493c76ec9e7aa675f1585d3ed6f1da69269a86` (the
last `dist`-targeted release with the bundled artifact). Aligns with
OWASP-03 / `docs/policies/install-trust-model.md` discipline (do not
trust mutable action tags for security-load-bearing CI).

## tests/eval_decision_relevance.py:166 attribute rename

`IngestResponse` schema field is `pending_grounding_decisions` per
`contracts.py:571` and `handlers/ingest.py:695`. The eval script was
still reading the legacy internal name `ungrounded_decisions` directly
off the response, raising `AttributeError` and failing the M1
adversarial corpus eval step (technically `continue-on-error: true`,
but worth fixing independently since the eval result was lost on every
run).

Keeps the JSON-output key `ungrounded_decisions` unchanged so downstream
consumers (M1 trend reports, etc.) see no change.

Both fixes verified locally: yaml.safe_load on the workflow + IngestResponse
field introspection confirms the schema.

Refs #272.
…Fix 3)

Closes the third regression deferred from PR #273. Post-#263 (sync
auto-bind step 1.5), Flow 1's e2e exhausted budget on ~41 non-bicameral
Read/Grep/validate_symbols calls before reaching the ingest skill's
auto-fired ratify AskUserQuestion gate, so the agent never invoked
ratify.

Per #108 Flow 1 spec discussion: this is both a regression AND a
canonical flow change. Auto-bind stays (deterministic, useful), but
ratify drops out of the auto-prompt path and becomes advisory text.
Ratification belongs to Flow 5 (PM Friday review) and to direct user
requests like "sign these off" / "ratify all".

Skill changes (skills/bicameral-ingest/SKILL.md Step 7):

- Replace AskUserQuestion ratify gate with a one-block advisory:
  "○ N decisions captured as proposals — drift tracking activates
   after ratification. Run `bicameral.ratify` when ready, or revisit
   them in your next history review (Flow 5)."
- Add explicit "Direct user request shortcut": if the prompt asked
  for sign-off in the same turn, ratify directly without the round-trip.
- Update the example at the bottom to match.

Test changes:

- tests/e2e/prompts/flow-1-ingest.md: drop "sign them off on our end"
  so the prompt exercises the new advisory path. Direct sign-off
  requests are covered by Flow 5 (history + ratify).
- tests/e2e/run_e2e_flows.py:assert_flow_1: drop the ratify
  requirement; accept [ingest, bind?] as the canonical Flow 1
  signature. Update Flow 5's stale comment about Flow 1 pre-ratifying.
- tests/test_e2e_asserters.py: invert test_flow1_fails_without_ratify
  → test_flow1_passes_without_ratify (advisory-only is now the
  expected behavior).

Doesn't touch sync skill #263 — that auto-bind change is preserved
intentionally; this PR fixes the test/spec contract that conflicted
with it.

Refs #272 #108 #263.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ngest

The previous fix dropped "sign them off on our end" to defeat the ratify
auto-call; in doing so, it also removed the bicameral disambiguator. The
remaining "please log these on our end" landed inside the auto-memory
trigger surface (the runner's ~/.claude/projects/.../memory/ directory),
so the agent wrote four memory files instead of invoking bicameral.ingest
— Flow 1 went 0/0 on bicameral calls and the cascading flows (3, 5)
then had nothing to assert against.

Replace "log these on our end" with "log these to our decision ledger"
— same intent, but "decision ledger" is an unambiguous bicameral signal
the auto-memory skill does not match. Still no "sign them off" / "ratify"
phrasing, so the new advisory-only Step 7 contract is preserved.

Refs #272 #108.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The no-ratify success branch in assert_flow_5 still claimed "Flow 1
ratified its 3 seeds" — but per #272 Fix 3, Flow 1 now leaves seeds
as `proposed` (advisory-only ingest). The docstring was updated in
2252c82; this catches the trailing f-string that was missed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…r ledger'

Same auto-memory / working-tree ambiguity that bit Flow 1 (#272 Fix 3
follow-up): Flow 5's prompt 'all the things we're tracking' was readable
as either bicameral decisions OR files modified in the cwd's working
tree. With Flow 1 no longer auto-ratifying (advisory-only Step 7), the
agent's mental model on the new ledger state shifted: it pulled the
auto-memory index and the working-tree diff first, then never reached
bicameral.history.

Tighten the prompt:
- 'all the things we're tracking' → 'the decisions we've tracked in our
  ledger' (unambiguous bicameral signal — auto-memory and the working
  tree are not 'our ledger').
- 'still on the to-do pile that hasn't moved' → 'still on the proposed
  pile that hasn't been ratified' (uses the ledger's `signoff.state`
  vocabulary; aligns with the new advisory-only ingest contract where
  Flow 1 leaves seeds as `proposed` for Flow 5 to walk).

The asserter (history mandatory; ratify conditional on proposals) is
unchanged. Now the asserter's load-bearing 'if proposals exist' branch
is the live path: Flow 1 leaves 3 proposals → Flow 5 ratifies → drift
tracking activates, exactly as the #108 spec describes.

Refs #272 #108.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 8, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Rate limit exceeded

@jinhongkuan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 46 minutes and 43 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bdf5e54d-e6d6-4c22-8a16-071a7224445f

📥 Commits

Reviewing files that changed from the base of the PR and between c20954e and cdc1f19.

📒 Files selected for processing (2)
  • tests/e2e/prompts/flow-3-commit-sync.md
  • tests/e2e/run_e2e_flows.py
📝 Walkthrough

Walkthrough

This PR consolidates project planning artifacts, pivots the SessionEnd capture flow to queue pending transcripts, changes ingest ratification from mandatory to advisory-only, reorganizes README and security documentation, and updates e2e test assertions to match the new ingest behavior. The core narrative is SessionEnd hook refactoring coupled with ingest flow simplification and significant documentation hygiene work.

Changes

SessionEnd Queue Pivot + Ingest Flow Refactor + Plan Consolidation

Layer / File(s) Summary
Plan File Cleanup
plan-124-post-commit-hook-fix.md, plan-156b-preflight-queue-drain.md, plan-187-gap-brief-unification.md
Removes obsolete/completed plan files (124, 156b) and archives 187 for future phases.
Transcript Queue Data Model
events/transcript_queue.py
New module implementing pending/processed transcript queue with three operations: write-pending (fail-soft), list-pending-FIFO (by mtime), and archive-processed (idempotent overwrite).
SessionEnd Queue Writer Hook
scripts/hooks/session_end_queue_writer.py
New fail-soft SessionEnd hook script that parses JSON stdin, validates inputs (session_id, transcript_path, cwd), and writes to pending queue while always exiting zero on error.
Transcript Archive CLI Helper
scripts/hooks/transcript_archive.py
New CLI to safely archive pending transcripts by basename, validate format to prevent traversal, and return distinct exit codes for error vs missing file.
SessionEnd Command Wiring
setup_wizard.py, .claude/settings.json
Updated to invoke session_end_queue_writer.py instead of legacy claude -p ... --auto-ingest command; preserves BICAMERAL_SESSION_END_RUNNING guard.
Capture Corrections Skill Queue Draining
skills/bicameral-capture-corrections/SKILL.md
Added Step 0 to drain pending-transcripts FIFO via transcript_archive.py; updated narrative to reflect queue-in-SessionEnd, drain-in-next-session flow.
SessionEnd Queue Tests & Drift
tests/test_session_end_queue_writer.py, test/drift assertions
Expanded coverage: queue writing success, fail-soft no-ops, FIFO ordering by mtime, archive idempotency; updated drift tests to validate new queue-writer command and remove claude -p/--auto-ingest assertions.
Ingest Skill Ratification Flow
skills/bicameral-ingest/SKILL.md
Changed from mandatory AskUserQuestion ratification step to advisory-only step (v0.14.2+); preserves direct-request shortcut to bicameral.ratify when user explicitly requests sign-off.
E2E Prompt Wording
tests/e2e/prompts/flow-1-ingest.md, tests/e2e/prompts/flow-5-history.md
Updated Flow 1 and Flow 5 prompts to use "decision ledger" and "proposed pile" terminology, aligning with advisory-only ratification model.
E2E Flow Assertions
tests/e2e/run_e2e_flows.py, tests/eval_decision_relevance.py, tests/test_e2e_asserters.py
Updated Flow 1 asserter to tolerate missing bicameral_ratify (advisory-only), Flow 3 commit detection improved via regex, Flow 5 assertions clarified; eval harness updated to use result.pending_grounding_decisions.
README Restructure
README.md
Streamlined hero section, removed Problem/How-It-Feels blocks, reorganized MCP Tools Reference into "Decision Ledger" and "Code Locator" categories, added "What setup writes to your repo" section, consolidated "Privacy & Compliance" section.
Security & Privacy Documentation
SECURITY.md
New comprehensive documentation of local-only operation, telemetry scope (tool name, version, duration, error flag, counts) with BICAMERAL_TELEMETRY=0 opt-out, supply-chain protections (cosign/SBOM/Rekor), version maintenance policy, vulnerability reporting channels.
GitHub Actions Workflow
.github/workflows/test-mcp-regression.yml
SHA-pinned test-summary/action from @v2 to v2.4 commit with explanatory comments about repointing/incident context.
Gitignore Plan Exclusion
.gitignore
Added plan-*.md patterns to exclude internal planning artifacts from version control.
Plan Document References
plan-*.md (114, 156, 190, 197, 199–200, 205, 216, 218, 227, 252, 48, A–F, codegenome phases)
Documentation-only specifications for future planned features, architectures, and multi-phase implementations (no executable code changes in this diff layer).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • Knapp-Kevin

Poem

🐰 A rabbit's note on queues and plans:
Hooks now tuck transcripts in a patient queue,
Ratify is gentle — advisory, not a cue,
Docs trimmed and pinned, tests tuned to the new,
A tidy hop forward — congrats from this crew! 🥕

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch triage-from-dev

@jinhongkuan jinhongkuan enabled auto-merge (rebase) May 8, 2026 04:40

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@README.md`:
- Line 111: Update the summary text that currently reads "<summary><strong>13
tools across two categories</strong></summary>" to reflect the actual count of
tools; change it to "<summary><strong>15 tools across two
categories</strong></summary>" (or optionally "<summary><strong>15 tools across
two categories (13 Decision Ledger + 2 Code Locator)</strong></summary>") so the
summary matches the listed totals.

In `@skills/bicameral-ingest/SKILL.md`:
- Around line 823-827: The fenced block starting with triple backticks around
the decision-note example is missing a language tag (causing MD040); update the
opening fence in that block (the ``` that begins the example containing "○ N
decisions captured as proposals ...") to include a language identifier such as
text (i.e., change ``` to ```text) so the markdown linter recognizes the code
fence.

In `@tests/eval_decision_relevance.py`:
- Line 166: The JSON key "ungrounded_decisions" is semantically stale and should
match the response attribute name; update the dict construction where the output
is built to replace the "ungrounded_decisions" key with
"pending_grounding_decisions" so it serializes
result.pending_grounding_decisions (i.e., use the attribute
result.pending_grounding_decisions as the value under the key
"pending_grounding_decisions") to keep naming consistent for downstream
consumers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eee1f34f-01fa-4bc0-9860-2d00577dc0cf

📥 Commits

Reviewing files that changed from the base of the PR and between 18e1414 and c2cf52a.

⛔ Files ignored due to path filters (3)
  • assets/bicameral-hero.png is excluded by !**/*.png
  • assets/logo.png is excluded by !**/*.png
  • assets/star-on-github.svg is excluded by !**/*.svg
📒 Files selected for processing (37)
  • .github/workflows/test-mcp-regression.yml
  • .gitignore
  • README.md
  • SECURITY.md
  • plan-114-grounding-lint.md
  • plan-124-post-commit-hook-fix.md
  • plan-156-sessionend-queue-pivot.md
  • plan-156b-preflight-queue-drain.md
  • plan-187-gap-brief-unification.md
  • plan-190-compliance-event.md
  • plan-197-flow3-prompt-clarity.md
  • plan-199-windows-install-and-uv.md
  • plan-200-config-yaml-redaction.md
  • plan-200-skills-audit-hardening.md
  • plan-205-compliance-research-brief.md
  • plan-216-ingest-size-and-rate-limit.md
  • plan-218-cosign-hooks-manifest-and-sbom.md
  • plan-227-structured-audit-log-emission.md
  • plan-252-layer-2-schema-revision-sentinel.md
  • plan-252-layer-3-diagnose-cli.md
  • plan-48-pre-push-drift-hook.md
  • plan-A-ingest-followups-230-232-233.md
  • plan-B-preflight-attribution-regex-209.md
  • plan-C-soc2-03-signed-tags-and-release-evidence.md
  • plan-D-compliance-stance-docs-220-225-226.md
  • plan-E-owasp-03-and-05-install-update-trust-model.md
  • plan-F-llm-06-skills-manifest-signing.md
  • plan-codegenome-llm-drift-judge.md
  • plan-codegenome-phase-1-2.md
  • plan-codegenome-phase-3.md
  • plan-codegenome-phase-4.md
  • skills/bicameral-ingest/SKILL.md
  • tests/e2e/prompts/flow-1-ingest.md
  • tests/e2e/prompts/flow-5-history.md
  • tests/e2e/run_e2e_flows.py
  • tests/eval_decision_relevance.py
  • tests/test_e2e_asserters.py
💤 Files with no reviewable changes (27)
  • plan-227-structured-audit-log-emission.md
  • plan-197-flow3-prompt-clarity.md
  • plan-B-preflight-attribution-regex-209.md
  • plan-codegenome-phase-3.md
  • plan-114-grounding-lint.md
  • plan-156b-preflight-queue-drain.md
  • plan-E-owasp-03-and-05-install-update-trust-model.md
  • plan-codegenome-phase-4.md
  • plan-187-gap-brief-unification.md
  • plan-200-config-yaml-redaction.md
  • plan-codegenome-llm-drift-judge.md
  • plan-218-cosign-hooks-manifest-and-sbom.md
  • plan-124-post-commit-hook-fix.md
  • plan-C-soc2-03-signed-tags-and-release-evidence.md
  • plan-216-ingest-size-and-rate-limit.md
  • plan-205-compliance-research-brief.md
  • plan-48-pre-push-drift-hook.md
  • plan-A-ingest-followups-230-232-233.md
  • plan-252-layer-2-schema-revision-sentinel.md
  • plan-199-windows-install-and-uv.md
  • plan-200-skills-audit-hardening.md
  • plan-codegenome-phase-1-2.md
  • plan-F-llm-06-skills-manifest-signing.md
  • plan-D-compliance-stance-docs-220-225-226.md
  • plan-156-sessionend-queue-pivot.md
  • plan-252-layer-3-diagnose-cli.md
  • plan-190-compliance-event.md

Comment thread README.md

<details>
<summary><strong>13 tools across three categories</strong></summary>
<summary><strong>13 tools across two categories</strong></summary>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Tool count in summary appears inconsistent with the tables

Line 111 says “13 tools across two categories,” but the section currently lists 15 total (13 Decision Ledger + 2 Code Locator). Please update the summary line to avoid reader confusion.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` at line 111, Update the summary text that currently reads
"<summary><strong>13 tools across two categories</strong></summary>" to reflect
the actual count of tools; change it to "<summary><strong>15 tools across two
categories</strong></summary>" (or optionally "<summary><strong>15 tools across
two categories (13 Decision Ledger + 2 Code Locator)</strong></summary>") so the
summary matches the listed totals.

Comment on lines 823 to 827
```
AskUserQuestion({
question: "Captured N decisions as proposals — ratify now to start drift tracking:",
header: "Ratify",
multiSelect: false,
options: [
{ label: "Ratify all N (recommended)",
description: "Drift tracking starts immediately on all N decisions." },
{ label: "Pick which to ratify",
description: "Use the Other field to specify decision numbers (e.g. '1 3 5')." },
{ label: "Skip — review later",
description: "All stay as proposals. They'll surface as stale after inactivity." }
]
})
○ N decisions captured as proposals — drift tracking activates after ratification.
Run `bicameral.ratify` when ready, or revisit them in your next history review
(Flow 5: "show me what's tracked" → ratify what's ready).
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced example block

The code fence opened at Line 823 is missing a language identifier, which triggers MD040.

Suggested patch
-```
+```text
 ○ N decisions captured as proposals — drift tracking activates after ratification.
   Run `bicameral.ratify` when ready, or revisit them in your next history review
   (Flow 5: "show me what's tracked" → ratify what's ready).
</details>

<!-- suggestion_start -->

<details>
<summary>📝 Committable suggestion</summary>

> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 823-823: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/bicameral-ingest/SKILL.md` around lines 823 - 827, The fenced block
starting with triple backticks around the decision-note example is missing a
language tag (causing MD040); update the opening fence in that block (the ```
that begins the example containing "○ N decisions captured as proposals ...") to
include a language identifier such as text (i.e., change ``` to ```text) so the
markdown linter recognizes the code fence.

"grounded_pct": round(stats.grounded_pct, 4),
"grounding_deferred": stats.grounding_deferred,
"ungrounded_decisions": list(result.ungrounded_decisions),
"ungrounded_decisions": list(result.pending_grounding_decisions),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the return type of handle_ingest exposes pending_grounding_decisions
# and check whether ungrounded_decisions still exists on that type.
rg -n "pending_grounding_decisions\|ungrounded_decisions" --type py -C 3

Repository: BicameralAI/bicameral-mcp

Length of output: 51


🏁 Script executed:

git ls-files --cached | head -20

Repository: BicameralAI/bicameral-mcp

Length of output: 907


🏁 Script executed:

find . -name "*.py" -type f | grep -E "(eval_decision|handle_ingest)" | head -20

Repository: BicameralAI/bicameral-mcp

Length of output: 104


🏁 Script executed:

cd tests && ls -la *.py | head -20

Repository: BicameralAI/bicameral-mcp

Length of output: 1568


🏁 Script executed:

cat -n tests/eval_decision_relevance.py | sed -n '140,175p'

Repository: BicameralAI/bicameral-mcp

Length of output: 1709


🏁 Script executed:

rg -n "def handle_ingest|class.*Result|pending_grounding_decisions|ungrounded_decisions" -A 2 tests/eval_decision_relevance.py

Repository: BicameralAI/bicameral-mcp

Length of output: 249


🏁 Script executed:

rg -n "handle_ingest" --type py

Repository: BicameralAI/bicameral-mcp

Length of output: 16489


🏁 Script executed:

head -500 handlers/ingest.py | tail -100

Repository: BicameralAI/bicameral-mcp

Length of output: 4313


🏁 Script executed:

grep -n "async def handle_ingest" handlers/ingest.py

Repository: BicameralAI/bicameral-mcp

Length of output: 98


🏁 Script executed:

sed -n '456,550p' handlers/ingest.py

Repository: BicameralAI/bicameral-mcp

Length of output: 4884


🏁 Script executed:

grep -n "class IngestResponse\|IngestResponse =" handlers/ingest.py | head -5

Repository: BicameralAI/bicameral-mcp

Length of output: 51


🏁 Script executed:

rg -n "class IngestResponse" --type py

Repository: BicameralAI/bicameral-mcp

Length of output: 119


🏁 Script executed:

sed -n '564,650p' contracts.py

Repository: BicameralAI/bicameral-mcp

Length of output: 2920


🏁 Script executed:

rg "ungrounded_decisions" --type py handlers/ contracts.py

Repository: BicameralAI/bicameral-mcp

Length of output: 170


🏁 Script executed:

# Check git history or comments about this field change
rg -B 3 -A 3 "pending_grounding_decisions" handlers/ingest.py | head -40

Repository: BicameralAI/bicameral-mcp

Length of output: 347


Rename stale JSON key to match attribute name.

The JSON output key "ungrounded_decisions" is out of sync with the response field pending_grounding_decisions. The attribute exists on the response object (no runtime risk), but the key name mismatch creates semantic confusion—downstream consumers will receive data about decisions pending grounding (deferred, not yet grounded) under a key implying definitive ungroundedness.

Suggested fix
-        "ungrounded_decisions": list(result.pending_grounding_decisions),
+        "pending_grounding_decisions": list(result.pending_grounding_decisions),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"ungrounded_decisions": list(result.pending_grounding_decisions),
"pending_grounding_decisions": list(result.pending_grounding_decisions),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/eval_decision_relevance.py` at line 166, The JSON key
"ungrounded_decisions" is semantically stale and should match the response
attribute name; update the dict construction where the output is built to
replace the "ungrounded_decisions" key with "pending_grounding_decisions" so it
serializes result.pending_grounding_decisions (i.e., use the attribute
result.pending_grounding_decisions as the value under the key
"pending_grounding_decisions") to keep naming consistent for downstream
consumers.

…er-as-verdict

Per BicameralAI/bicameral#108, Flow 3's `git commit` is the *trigger* for
the V1 lifecycle, not the assertion. The previous design conflated the two:
the per-flow asserter FAILed on missing commit, and a buggy substring check
(`"git commit" in cmd`) silently rejected the agent's `git -C <path> commit`
form that `claude` produces when cwd doesn't match the prompt's bare path.
That masked the real signal — the post-hoc ledger validator's "no
compliance_check rows" message — and reported a regex bug as a sync-chain
failure.

Changes:
- New `_command_runs_git_commit()` helper using a regex that handles
  `-C <path>`, `--git-dir=<path>`, &&-chained sequences, and rejects
  `commit-tree` plumbing.
- `assert_flow_3` now always returns True; body text describes the
  precondition (met / unmet) for the report. Per-flow no longer gates.
- `_validate_flow3_via_ledger` becomes the single source of verdict truth:
    no commit                  → SKIP   (advisory, non-blocking)
    commit + verdict written   → PASS   (full V1 lifecycle)
    commit + cc rows only      → PASS   (degraded, advisory: #135 gap)
    commit + neither           → FAIL   (no advisory — real sync-chain bug)
- `commit_note` no longer derives from `flow3.verdict`, breaking the
  precondition→assertion leak.

Replayed against PR #282's failing CI artifact: the agent's commit is now
correctly detected, and the validator surfaces the real upstream bug
(cc_delta == 0 despite a successful commit + 3 subsequent bicameral calls).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The old prompt asked the agent to add a comment above the CherryPickResult
enum (line 30 of cherry-pick.ts), which sits OUTSIDE the cherry-pick
decision's bound code region (the cherryPick function body, lines 142-183).
Verified against PR #282's CI ledger artifact:

  code_region row for app/src/lib/git/cherry-pick.ts:
    symbol:        cherryPick
    start_line:    142
    end_line:      183
    pinned_commit: 192ef5730e84c9337a1e07a04e56cba18a0b2766  ← Flow 3's commit

link_commit DID fire post-Flow-3 (the pinned_commit set on the region proves
ensure_ledger_synced → link_commit ran), but the bound region's content
hash didn't change because the edit was 100+ lines above. With no drift
detected, link_commit correctly wrote ZERO compliance_check rows — and the
harness's success signal (cc_delta > 0) never fired.

The new prompt edits inside the cherryPick function body (line 146,
above the empty-set early-return), which is well inside the bound region.
This produces the drift the V1 lifecycle is designed to detect, and the
existing validator's `cc_delta > 0` branch will fire correctly (PASS-degraded
with #135 advisory, since headless can't run resolve_compliance).

Drive-by: use the full path `app/src/lib/git/cherry-pick.ts` in both the
edit and commit commands so the agent doesn't have to defensively rewrite
to `git -C <repo> commit` (the failing CI's agent did exactly this when
faced with the bare relative path against cwd=/tmp/desktop-clone).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/e2e/run_e2e_flows.py (1)

1004-1026: 💤 Low value

Regex correctly detects git commit across documented patterns; two edge cases identified for future-proofing.

The regex reliably handles the three patterns documented in the code — bare git commit, git -C <path> commit, git --git-dir=<path> commit — and chained variants like git status && git commit, while the trailing (?=\s|$) lookahead correctly excludes plumbing such as git commit-tree. Two edge cases fall outside the current pattern:

  • (?=\s|$) does not match commands where commit is followed directly by shell punctuation (git commit;, git commit) in a subshell, git commit&& without a space). A lookahead like (?![\w-]) would accept these while still rejecting commit-tree/committer.
  • The flag-arg clause [^\s-]\S* consumes only unquoted tokens, so quoted paths with spaces (git -C "/tmp/has space" commit) will not match.

These gaps are unlikely to occur with the Claude Code CLI patterns currently tested. No changes needed for the current scope.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/run_e2e_flows.py` around lines 1004 - 1026, The _GIT_COMMIT_RE
currently misses two edge cases: commands where "commit" is followed immediately
by shell punctuation (e.g. "git commit;", "git commit)") and flag arguments that
are quoted tokens with spaces (e.g. git -C "/tmp/has space" commit); update the
_GIT_COMMIT_RE pattern used by _command_runs_git_commit to (1) use a lookaround
that forbids word/ hyphen chars after "commit" (e.g. (?![\w-]) or equivalent)
instead of (?=\s|$) so punctuation is accepted, and (2) broaden the flag-arg
clause to accept quoted tokens (e.g. allow "(\"[^\"]+\"|'[^']+'|[^\s-]\S*)" as
the optional flag argument) so quoted paths are matched; keep the rest of the
pattern logic and ensure _command_runs_git_commit still returns
bool(_GIT_COMMIT_RE.search(cmd or "")).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/e2e/run_e2e_flows.py`:
- Around line 357-361: The multi-line list comprehension for commit_calls should
be collapsed into a single-line comprehension to satisfy ruff formatting: locate
the comprehension that constructs commit_calls (uses bash_calls and calls
_command_runs_git_commit) and rewrite it so the conditional is on the same line
as the for clause (e.g., commit_calls = [c for c in bash_calls if
_command_runs_git_commit((c.get("input") or {}).get("command", ""))]); apply the
same single-line change to the identical comprehension later in the file (the
one around lines 1049–1053) to fix both ruff failures.

---

Nitpick comments:
In `@tests/e2e/run_e2e_flows.py`:
- Around line 1004-1026: The _GIT_COMMIT_RE currently misses two edge cases:
commands where "commit" is followed immediately by shell punctuation (e.g. "git
commit;", "git commit)") and flag arguments that are quoted tokens with spaces
(e.g. git -C "/tmp/has space" commit); update the _GIT_COMMIT_RE pattern used by
_command_runs_git_commit to (1) use a lookaround that forbids word/ hyphen chars
after "commit" (e.g. (?![\w-]) or equivalent) instead of (?=\s|$) so punctuation
is accepted, and (2) broaden the flag-arg clause to accept quoted tokens (e.g.
allow "(\"[^\"]+\"|'[^']+'|[^\s-]\S*)" as the optional flag argument) so quoted
paths are matched; keep the rest of the pattern logic and ensure
_command_runs_git_commit still returns bool(_GIT_COMMIT_RE.search(cmd or "")).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ada8ff59-745f-468d-8013-07e895ff7e3f

📥 Commits

Reviewing files that changed from the base of the PR and between c2cf52a and c20954e.

📒 Files selected for processing (1)
  • tests/e2e/run_e2e_flows.py

Comment thread tests/e2e/run_e2e_flows.py
Drive-by formatter fix on c20954e — ruff prefers single-line list
comprehensions when they fit within line-length.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jinhongkuan jinhongkuan had a problem deploying to recording-approval May 9, 2026 02:30 — with GitHub Actions Failure
@jinhongkuan jinhongkuan disabled auto-merge May 9, 2026 02:56
@jinhongkuan jinhongkuan merged commit eb8ade8 into main May 9, 2026
6 of 7 checks passed
Knapp-Kevin pushed a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request May 21, 2026
Patch on v0.14.1 — README + SECURITY surface, ingest skill softening,
Flow 3 e2e hardening. See CHANGELOG.md for the full per-section detail.

Pre-release checklist (per docs/RELEASE_EVIDENCE_PROCEDURE.md):
- All PRs in window (BicameralAI#282) have approving CI; review approval pending
- No force-pushes to main since v0.14.1
- CHANGELOG draft ready (this commit)

Tag + GitHub Release to be cut by the maintainer after this PR merges
(triggers .github/workflows/publish.yml on `release:` event).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants