triage: dev → main · v0.14.3 (#277 team-mode + #280 grounding precision) #290
Visible changes when a user lands on the README:
1. Hero image (`assets/bicameral-hero.png`) at the very top: a visual without/with comparison from the landing-page asset bundle.
2. Quickstart immediately after the one-line value prop (was buried below "The Problem" and "How It Feels"). User goes from "what is this" to "type these three lines" without scrolling.
3. Compliance section trimmed from a 12-line policy-file enumeration near the top to a 5-line "we take privacy seriously" paragraph at the bottom. The full posture stays linkable via docs/policies/.
4. pipx dropped from the install path. The two paths are now uv (recommended) and plain pip (fallback). uv was already preferred per #199's 3-path resolve order; pipx was middle ground that doesn't pull its weight in a top-level README.
Section order: Hero → title → 1-liner → Quickstart → How It Feels → The Problem → Core Concepts → What setup installs → Slash Commands → MCP Tools Reference → Configuration → Local Development → Telemetry → Contributing → Privacy & Compliance → License
312 lines (was 376).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
27 in-progress planning artifacts (`plan-114-*.md`, `plan-codegenome-*.md`, `plan-A-*.md`, etc.) were tracked in the repo. They're working memory between author and reviewers during a feature; once the feature merges, the PR description + CHANGELOG carry the durable record.
Keeping these in the public-ish repo:
- adds 27 markdown files to the wheel's source-distribution surface for no end-user benefit
- couples planning vocabulary to release artifacts (e.g. plan-codegenome files describe v1 work that #246 reverted; the plan stays useful as reference but doesn't belong on `main`)
- creates churn pressure to mark plans as "done" or "superseded" instead of just letting them rest as the author's working notes
This commit:
1. Adds `plan-*.md` pattern to `.gitignore` with a one-paragraph comment on the policy.
2. `git rm --cached` on all 27 currently-tracked `plan-*.md` files; they remain on local disk for the author's reference, just no longer tracked.
After merge, anyone with a checkout will keep their local `plan-*.md` files; new plans drafted in-tree will be untracked by default.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a top-level SECURITY.md so GitHub auto-creates a "Security" tab in the repo nav bar, the closest GitHub-native surface to a "button that pulls up our SBOM/privacy statement."
Contents:
- **Privacy posture**: explicit "all data on your laptop" + telemetry opt-out + pointer to docs/policies/ for the full posture
- **Software supply chain**: table of signed artifacts on each release (CycloneDX SBOM, Rekor attestation, hooks-manifest sigs, skills-manifest sigs, release-tag-commit sig); pointer to the release-evidence verification procedure; mention of GitHub's auto-SPDX SBOM under Insights → Dependency graph
- **Supported versions**: only latest minor, ~30-day backport window for critical fixes
- **Reporting a vulnerability**: GitHub Security Advisories preferred; jin@bicameral-ai.com fallback with `[security]` subject prefix; 3-day ack target, 30-day patch target for critical
- **Scope**: what's in (server, skills, hooks, release pipeline) and what's out (third-party deps, host vulns, local-attack scenarios already covered by host-trust model)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sections removed (PM/dev evaluating Bicameral wants the test-out path, not the why/how-internals):
- "The Problem" (long context narrative)
- "Core Concepts" (two-axis model, link_commit, collab modes)
- "Removing Bicameral" (post-test concern)
- "Configuration" env-var table
- "Local Development"
- "Contributing"
- "Telemetry" subsection (folded into Privacy & Compliance one-liner)
What stays (the path from "land on the page" to "type three commands and see something work"):
- Hero image
- Star-on-GitHub CTA (animated SVG, adapted from cocoindex's design with metadata strings updated; visual mechanism is generic)
- Logo (small, inline-right of title)
- Title + badges + 1-liner
- Quickstart (uv | pip | Windows)
- How It Feels (preflight render + dashboard screenshot)
- Slash Commands
- What `setup` writes (trust signal: what hits disk)
- MCP Tools Reference (collapsed by default)
- Privacy & Compliance (concise)
- License
312 → 152 lines.
Visuals borrowed from landing-page: assets/logo.png (was landing-page/logo.png), assets/bicameral-hero.png from landing-page/output/imagegen/. Star CTA SVG saved as assets/star-on-github.svg.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
docs(README): restructure for getting-started clarity + add hero image
… attr (#272)
Closes 2 of 3 dev-baseline regressions tracked in #272 (the third, the Flow 1 e2e expectation post-#263 auto-bind, is deferred for product-level discussion since it touches the ingest skill choreography).
## test-summary/action SHA-pin
The `test-summary/action@v2` mutable tag was repointed on 2026-05-07 22:09 UTC from a `dist`-targeted release (with bundled `index.js`) to the new v2.5 release that targets `main` (no bundled output). Every PR opened after that point fails the Test Summary step with `File not found: index.js`, marking MCP Regression Suite jobs red on both ubuntu and windows.
Pin to v2.4 commit `31493c76ec9e7aa675f1585d3ed6f1da69269a86` (the last `dist`-targeted release with the bundled artifact). Aligns with OWASP-03 / `docs/policies/install-trust-model.md` discipline (do not trust mutable action tags for security-load-bearing CI).
## tests/eval_decision_relevance.py:166 attribute rename
`IngestResponse` schema field is `pending_grounding_decisions` per `contracts.py:571` and `handlers/ingest.py:695`. The eval script was still reading the legacy internal name `ungrounded_decisions` directly off the response, raising `AttributeError` and failing the M1 adversarial corpus eval step (technically `continue-on-error: true`, but worth fixing independently since the eval result was lost on every run).
Keeps the JSON-output key `ungrounded_decisions` unchanged so downstream consumers (M1 trend reports, etc.) see no change.
Both fixes verified locally: yaml.safe_load on the workflow + IngestResponse field introspection confirms the schema.
Refs #272.
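A minimal sketch of the attribute-rename fix, assuming a simplified response type. `IngestResponseSketch` and `eval_row` are hypothetical names for illustration; only the field name `pending_grounding_decisions` and the JSON key `ungrounded_decisions` come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IngestResponseSketch:
    """Hypothetical stand-in for IngestResponse; only the field name is from the source."""

    pending_grounding_decisions: list[str] = field(default_factory=list)


def eval_row(response: IngestResponseSketch) -> dict[str, list[str]]:
    """Read the current schema field, but emit the legacy JSON key.

    Downstream consumers keep seeing "ungrounded_decisions" in the report,
    while the attribute access matches the response schema.
    """
    return {"ungrounded_decisions": list(response.pending_grounding_decisions)}
```

Reading `response.ungrounded_decisions` on this type would raise `AttributeError`, which is the failure the commit describes.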
…in-and-eval-attribute fix(ci): SHA-pin test-summary action + rename eval_decision_relevance attr (#272)
…Fix 3)
Closes the third regression deferred from PR #273. Post-#263 (sync auto-bind step 1.5), Flow 1's e2e exhausted budget on ~41 non-bicameral Read/Grep/validate_symbols calls before reaching the ingest skill's auto-fired ratify AskUserQuestion gate, so the agent never invoked ratify.
Per #108 Flow 1 spec discussion: this is both a regression AND a canonical flow change. Auto-bind stays (deterministic, useful), but ratify drops out of the auto-prompt path and becomes advisory text. Ratification belongs to Flow 5 (PM Friday review) and to direct user requests like "sign these off" / "ratify all".
Skill changes (skills/bicameral-ingest/SKILL.md Step 7):
- Replace AskUserQuestion ratify gate with a one-block advisory: "○ N decisions captured as proposals — drift tracking activates after ratification. Run `bicameral.ratify` when ready, or revisit them in your next history review (Flow 5)."
- Add explicit "Direct user request shortcut": if the prompt asked for sign-off in the same turn, ratify directly without the round-trip.
- Update the example at the bottom to match.
Test changes:
- tests/e2e/prompts/flow-1-ingest.md: drop "sign them off on our end" so the prompt exercises the new advisory path. Direct sign-off requests are covered by Flow 5 (history + ratify).
- tests/e2e/run_e2e_flows.py:assert_flow_1: drop the ratify requirement; accept [ingest, bind?] as the canonical Flow 1 signature. Update Flow 5's stale comment about Flow 1 pre-ratifying.
- tests/test_e2e_asserters.py: invert test_flow1_fails_without_ratify → test_flow1_passes_without_ratify (advisory-only is now the expected behavior).
Doesn't touch sync skill #263; that auto-bind change is preserved intentionally, and this PR fixes the test/spec contract that conflicted with it.
Refs #272 #108 #263.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ngest The previous fix dropped "sign them off on our end" to defeat the ratify auto-call; in doing so, it also removed the bicameral disambiguator. The remaining "please log these on our end" landed inside the auto-memory trigger surface (the runner's ~/.claude/projects/.../memory/ directory), so the agent wrote four memory files instead of invoking bicameral.ingest — Flow 1 went 0/0 on bicameral calls and the cascading flows (3, 5) then had nothing to assert against. Replace "log these on our end" with "log these to our decision ledger" — same intent, but "decision ledger" is an unambiguous bicameral signal the auto-memory skill does not match. Still no "sign them off" / "ratify" phrasing, so the new advisory-only Step 7 contract is preserved. Refs #272 #108. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The no-ratify success branch in assert_flow_5 still claimed "Flow 1 ratified its 3 seeds" — but per #272 Fix 3, Flow 1 now leaves seeds as `proposed` (advisory-only ingest). The docstring was updated in 2252c82; this catches the trailing f-string that was missed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(skill): ingest advises on ratify instead of auto-prompting (#272 Fix 3)
… M2 grounding precision)
The silent-corruption surface for M2 grounding precision was one branch
in handlers/bind.py: when a caller supplied start_line/end_line alongside
symbol_name, the handler verified only that the file existed at the SHA
and accepted any symbol_name — letting agents write binds_to edges to
plausible-looking but wrong symbols whenever they hallucinated a real
file with a fake symbol. Branch A (no lines) already ran tree-sitter and
rejected on miss; Branch B was the asymmetric escape hatch.
Branch B now also calls resolve_symbol_lines and rejects two cases:
1. symbol_name doesn't resolve at all
→ "symbol '<name>' not found in <file> at <sha> — caller-supplied
line range cannot bypass symbol verification (#280)"
2. symbol resolves but caller-supplied span doesn't overlap the
resolved span
→ "symbol '<name>' resolves at lines <a>-<b> but caller supplied
<x>-<y> — span mismatch (#280)"
Overlap (not exact equality) is the matching rule via the new
_spans_overlap helper, so legitimate sub-region binds (e.g. pinning a
specific clause inside a larger function body) stay accepted; only
hallucinated ranges with no shared lines are rejected.
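The overlap rule above can be sketched as follows. This is a minimal illustration assuming inclusive 1-based line ranges; `spans_overlap` and `check_caller_span` are hypothetical names, not the handler's actual `_spans_overlap` implementation, and the message only mirrors the shape quoted above.

```python
from __future__ import annotations


def spans_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Return True when two inclusive line ranges share at least one line."""
    return a_start <= b_end and b_start <= a_end


def check_caller_span(resolved: tuple[int, int], supplied: tuple[int, int]) -> str | None:
    """Return a rejection message, or None when the bind may proceed.

    `resolved` is the span found for the symbol; `supplied` is the
    caller's start_line/end_line. Overlap, not equality, is the rule.
    """
    if spans_overlap(*resolved, *supplied):
        return None
    return (
        f"symbol resolves at lines {resolved[0]}-{resolved[1]} "
        f"but caller supplied {supplied[0]}-{supplied[1]}: span mismatch"
    )
```

Under this rule a sub-region bind such as lines 20-25 inside a function at 10-40 is accepted, while a range that shares no line with the resolved span is rejected.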
Skill catalog reorganization (per CLAUDE.md skill-mandate):
- New skills/bicameral-bind/SKILL.md extracts the bind contract out
of skills/bicameral-ingest/SKILL.md §2 and tightens advisory rules
to mandatory: Read at least one candidate file end-to-end, confirm
symbol via validate_symbols, abort on weak evidence. Documents the
handler-side rejection contract for agent visibility.
- skills/bicameral-ingest/SKILL.md §2 reduced from ~38 inline lines
to a 16-line pointer at the new bind skill — keeps ingest focused
on extraction + filtering and matches the rest of the catalog
(one tool ↔ one skill).
Stale ground_mappings() refs cleaned up:
- code_locator/tools/validate_symbols.py: dropped self._db field +
L40-41 retention comment (referenced a v0.6.0-deleted path; field
had zero readers).
- tests/eval_decision_relevance.py:73 docstring updated to describe
post-v0.6.0 caller-LLM grounding pipeline.
Tests:
- 3 existing tests (test_bind_success_with_explicit_lines,
test_bind_idempotent, test_bind_status_transition) gain a
resolve_symbol_lines mock since Branch B now exercises it.
- 2 new tests (test_bind_branch_b_rejects_nonexistent_symbol,
test_bind_branch_b_rejects_span_mismatch) cover the rejection paths.
- _spans_overlap helper smoke-tested locally across 8 boundary cases.
PR-1 of 3. Synthetic-recall eval (PR-2) and m2_grounding_* telemetry +
dashboard (PR-3) follow per plan-280-grounding-precision-fix.md.
Refs #280.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(bind): reject caller-supplied lines that hallucinate symbols (#280 PR-1)
…#280 PR-2)
Synthetic-fixture benchmark that drives the bicameral-bind skill end-to-end against 23 cases across three failure modes: same-name-different-module, similar-intent-different-symbol, and cross-language.
Measures three axes deliberately split for diagnosis:
- precision = correct / (correct + wrong_symbol + wrong_file)
- recall = correct / total_rows
- abort_rate = aborted / total_rows
The split matters: high-precision-low-recall = agent over-cautious; low-precision-high-recall = hallucinations the #280 PR-1 handler would now reject (handler_rejected outcome would surface as precision drag).
Files
-----
tests/fixtures/grounding_recall/dataset.py (230 LOC)
23 GroundingCase rows: 5 case-A (process_order × 3 modules, cancel_order × 2 modules), 10 case-B (rate-limit/throttle/retry/auth/metrics intent disambiguation), 8 case-C (Python ↔ TS pairs). GENERATOR_VERSION constant invalidates the cache when bumped. Import-time _validate_dataset() fails loud on duplicate ids, invalid case_type, distractor === intended, etc.
tests/fixtures/grounding_recall/repo/ (15 files / ~625 LOC)
Hand-crafted fixture repo with intended + distractor symbols. Each function/class body is short but real enough that the agent can actually distinguish behavior from keyword overlap (e.g. checkout/orders.py:process_order = customer flow w/ retry cap; admin/orders.py:process_order = manual replay of finance-flagged orders; billing/refunds.py:process_order = bulk-refund pipeline).
tests/eval/_bind_judge.py (466 LOC)
Headless caller-LLM driver, modeled on tests/eval/_skill_judge.py. Multi-turn tool-use loop with 3 tools exposed: read_file, validate_symbols, submit_binding. Cap at 8 turns. Cache at tests/eval/fixtures/bind_judge/ keyed on SHA(model | bind_skill | repo | decision). Cache hits keep CI cost ~$0 unless dataset, fixture repo, or skill change.
tests/eval_grounding_recall.py (256 LOC)
Argparse runner, modeled on tests/eval_decision_relevance.py. Loads dataset, drives _bind_judge per case, classifies outcome (correct / wrong_symbol / wrong_file / aborted), aggregates, emits JSON report, optional gate enforcement (--gate-mode warn|hard).
.github/workflows/test-mcp-regression.yml (+19 LOC)
New "M2 grounding-recall eval (warn-only)" step. Ubuntu-only, continue-on-error: true, mirrors the M1 step shape. ANTHROPIC_API_KEY from secrets, model env var, output to test-results/m2-grounding-recall.json.
CHANGELOG.md (+2 lines)
Default gates per #280 acceptance: recall ≥ 0.80, precision ≥ 0.85, abort_rate ≤ 0.30. Ship warn-only first to record a post-PR-1 baseline, then ratchet to --gate-mode hard once the signal is stable. Same path the M1 eval has been on.
Out of scope for PR-2 (per plan-280-grounding-precision-fix.md):
- PR-3 ships PostHog m2_grounding_* events + dashboard panel
- Friction capture (≥ 5 design-partner cases) is not engineering scope
Local verification
- dataset.py imports clean (23 cases, _validate_dataset() passes)
- _bind_judge symbol indexer resolves all 11 spot-checked intended symbols including Class.method form
- eval_grounding_recall.py CLI runs offline with --skip-missing-fixtures (0 cases, gate breaches reported, exit 0 in warn mode as designed)
Refs #280.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
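The three metric definitions and default gates can be sketched like this. It is a hedged illustration, not the runner's code: `score`, `gate_breaches`, and `GATES` are hypothetical names, and the zero-denominator fallback to 0.0 is an assumption.

```python
from __future__ import annotations

from collections import Counter

# Default gates quoted in the commit message (#280 acceptance).
GATES = {"recall_min": 0.80, "precision_min": 0.85, "abort_rate_max": 0.30}


def score(outcomes: list[str]) -> dict[str, float]:
    """Aggregate per-case outcomes into precision / recall / abort_rate.

    Each outcome is one of: correct, wrong_symbol, wrong_file, aborted.
    Aborts count against recall but not against precision.
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    correct = counts["correct"]
    attempted = correct + counts["wrong_symbol"] + counts["wrong_file"]
    return {
        "precision": correct / attempted if attempted else 0.0,
        "recall": correct / total if total else 0.0,
        "abort_rate": counts["aborted"] / total if total else 0.0,
    }


def gate_breaches(metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics that miss their gate (empty means all pass)."""
    breaches = []
    if metrics["recall"] < GATES["recall_min"]:
        breaches.append("recall")
    if metrics["precision"] < GATES["precision_min"]:
        breaches.append("precision")
    if metrics["abort_rate"] > GATES["abort_rate_max"]:
        breaches.append("abort_rate")
    return breaches
```

With an invented run of 18 correct, 2 wrong_symbol, and 3 aborted cases, precision is 18/20 while recall is 18/23, so only the recall gate is breached. That is the over-cautious pattern the split is meant to expose.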
Five lint-side findings on the initial PR-2 commit, none of them
runtime — fixing in place rather than amending the prior commit:
- tests/eval/_bind_judge.py B007: add `# noqa: B007` to the
`for turn in range(...)` loop. The loop variable IS used after
the loop for telemetry (judgment_payload["turns"] = turn);
suppression is more honest than renaming to `_turn` and losing
the post-loop reference.
- tests/eval/_bind_judge.py mypy: type-annotate `chosen_model: str`
and tighten the `os.getenv` fallback chain so mypy can resolve
`str | None` → `str`. Construct BindJudgment field-by-field
instead of `**judgment_payload` so the dataclass field types
are enforced (3× errors in the cached + write paths).
- tests/eval_grounding_recall.py I001 + E402: per-line
`# noqa: E402, I001` on the two local imports that must follow
the sys.path inserts. Same shape `eval_decision_relevance.py`
uses for its single post-path import.
- tests/eval_grounding_recall.py F541: drop the f-prefix on
`print(f" ✓ all gates pass")` (no placeholders).
- tests/fixtures/grounding_recall/repo/src/checkout/orders.py B007:
rename `for attempt in range(3):` → `for _attempt in range(3):`
(loop body doesn't reference the counter).
Plus `ruff format` reflowed 4 files (line wrapping, parens, exponent
spacing) — no semantic changes.
Local verification: ruff check + ruff format --check + mypy all
green on the PR-2 surface (15 fixture files + 2 eval files).
Refs #280.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(eval): M2 grounding-recall harness for caller-LLM bind precision (#280 PR-2)
…PR-3)
Three PostHog events now emit from the bind / ratification surfaces, plus a local mirror + dashboard panel that reads from it. Closes the last engineering piece of #280 (PR-3 of 3).
Events
------
m2_grounding_attempt
Fires per `handle_bind` per binding. Carries:
- decision_source (controlled enum: transcript/spec/chat/manual/document)
- diagnostic.success: bool, bound a region cleanly
- diagnostic.handler_rejected: bool, true when #280 PR-1's reject path fired (caller hallucinated a wrong/non-existent symbol on a real file). The split between {success=False, handler_rejected=True} and {success=False, handler_rejected=False} tells operators whether the failure was the failsafe doing its job vs a ledger / IO bug.
m2_grounding_ratified_correct (verdict == "compliant")
m2_grounding_ratified_incorrect (verdict ∈ {"drifted", "not_relevant"})
Fire per accepted verdict in `handle_resolve_compliance`. Carry:
- decision_source (same controlled enum)
- diagnostic.confidence: int (low=0, medium=1, high=2)
Privacy
-------
The relay contract from telemetry.py:14-37 is non-negotiable: numeric/bool diagnostics only, no decision_id / file_path / symbol_name. The new m2_grounding_log.py owns the split:
- JSONL local mirror at ~/.bicameral/m2_grounding.jsonl (10 MB rotation, 3 backups) carries decision_id for the dashboard panel's drill-down. Always written, regardless of relay consent.
- PostHog relay sees only decision_source + numeric diagnostics; decision_id never crosses that boundary. A unit test (test_decision_id_never_relayed_to_posthog) pins this invariant.
Files
-----
m2_grounding_log.py (new, 241 LOC)
Owner of the M2 event contract. record_attempt(), record_ratification(), read_recent_events(). Lazy-imports server + telemetry to break the handlers→server circular dependency at server-boot time. Test hook via BICAMERAL_M2_LOG_PATH env override (matches preflight_telemetry pattern).
handlers/bind.py (+73)
_emit_m2_attempt() helper at module scope. Wired to all five terminal paths in the per-binding loop where a decision_id is valid: Branch A symbol-not-found, Branch B file-not-found, the two #280 PR-1 reject paths, the bind_decision exception path, and the success path. API-misuse paths (empty/unknown decision_id) skip emission to keep the metric meaningful.
handlers/resolve_compliance.py (+40)
_emit_m2_ratification() helper, called per accepted verdict. Wraps record_ratification() in try/except so a telemetry failure never breaks the verdict write.
ledger/queries.py (+19)
New get_decision_source(): single-field SELECT, returns the decision's source_type (controlled enum from the ingest contract).
ledger/adapter.py (+10)
Adapter delegation method.
dashboard/server.py (+59)
New GET /m2_grounding endpoint that aggregates the local mirror into rolling-7d per-source counts (attempts / rejects / ratified ✓ / ratified ✕) and computes precision. Read-only, no ledger I/O.
assets/dashboard.html (+60)
New "M2 grounding precision" panel below the main ledger view. Color-codes precision per source: green ≥ 85%, amber ≥ 70%, red below. Refreshes every 30s.
CHANGELOG.md (+2)
Unreleased entry covering all three events + the local mirror contract.
Tests
-----
tests/test_m2_grounding_log.py (9 tests, all green)
Pure unit tests, no ledger dep. Cover JSONL row shape, verdict classification, time-window filtering, and the privacy invariant (decision_id never reaches the relay).
tests/test_bind_m2_telemetry.py (4 tests + 3 skip-on-no-surrealdb)
Helper-level: emit forwards args correctly, skips on empty decision_id, swallows telemetry failures fire-and-forget. Resolve-compliance verdict classification covered behind `pytest.importorskip("surrealdb")` since the handler module imports ledger.queries at top level; runs in CI, skipped locally.
Local verification
------------------
- 12 passed, 3 skipped on tests/test_m2_grounding_log.py + tests/test_bind_m2_telemetry.py
- ruff check + ruff format --check + mypy all green on touched files (m2_grounding_log.py, handlers/bind.py, handlers/resolve_compliance.py, ledger/queries.py, ledger/adapter.py, dashboard/server.py, both new test files)
What's NOT in this PR
---------------------
Per plan-280-grounding-precision-fix.md:
- Friction capture (≥ 5 design-partner cases): design-partner work, not engineering scope.
- PR-2 gate-flip (warn → hard): separate small follow-up after PR-3 lands and we have a baseline reading. Aligns with Jin's "deliberate not drift" framing.
- attempt_to_ratify_seconds field: deferred. Would need a `created_at` field on the binds_to edge (schema currently has only `confidence` + `provenance`); not worth a schema bump in this PR.
Closes #280.
Refs #280.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
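The local-mirror versus relay split described under Privacy can be sketched as below. This is an assumption-laden illustration rather than the contents of m2_grounding_log.py: `classify_verdict`, `build_rows`, and `append_local` are hypothetical helpers, and the row layout is invented. Only the event names, the verdict sets, and the confidence mapping come from the commit message.

```python
from __future__ import annotations

import json
from pathlib import Path

# Confidence mapping quoted in the commit message.
CONFIDENCE = {"low": 0, "medium": 1, "high": 2}


def classify_verdict(verdict: str) -> str | None:
    """Map a compliance verdict to an event name, or None if it emits nothing."""
    if verdict == "compliant":
        return "m2_grounding_ratified_correct"
    if verdict in {"drifted", "not_relevant"}:
        return "m2_grounding_ratified_incorrect"
    return None


def build_rows(
    decision_id: str, decision_source: str, verdict: str, confidence: str
) -> tuple[dict, dict] | None:
    """Return (local_row, relay_payload); only the local row carries decision_id."""
    event = classify_verdict(verdict)
    if event is None:
        return None
    relay = {
        "event": event,
        "decision_source": decision_source,
        "diagnostic": {"confidence": CONFIDENCE[confidence]},
    }
    local = {**relay, "decision_id": decision_id}  # identifier stays on disk only
    return local, relay


def append_local(path: Path, row: dict) -> None:
    """Append one JSON object per line to the local mirror (rotation not shown)."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(row) + "\n")
```

Building the relay payload first and deriving the local row from it keeps the identifier out of the relay by construction, which is the property the pinned unit test checks.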
…CI instead (per Jin)
Jin clarified the operator-dashboard scope: it's for users. M2 grounding
precision is an engineering quality metric, not user-facing. Reverting
the dashboard pieces; adding GitHub Actions step-summary surfacing
which is where engineers actually look for these numbers.
Reverted from PR-3's initial shape
----------------------------------
assets/dashboard.html
- Drop the <section id="m2-panel"> block + the renderM2 / loadM2 /
setInterval JS. Dashboard returns to pre-#280 user view.
dashboard/server.py
- Drop the GET /m2_grounding route + _serve_m2_grounding handler.
m2_grounding_log.py
- Drop read_recent_events() (only consumer was _serve_m2_grounding;
now dead code per Jin's "avoid bloat unless product-justified").
- Drop now-unused `time` import.
tests/test_m2_grounding_log.py
- Drop test_read_recent_events_respects_window (function gone) and
now-unused `os` import.
Added (the new piece)
---------------------
tests/eval_grounding_recall_summary.py (new)
Renders the PR-2 eval JSON (test-results/m2-grounding-recall.json)
as a markdown block — precision / recall / abort-rate scoreboard,
outcome breakdown, per-case-type recall table, gate-breach line,
expandable miss-list capped at 25 rows. Fail-quiet: missing/malformed
JSON degrades to a one-line note rather than failing CI.
.github/workflows/test-mcp-regression.yml (+10)
New "M2 metrics summary" step after the M2 eval. Pipes the
renderer's stdout to $GITHUB_STEP_SUMMARY so the metrics show on
the GitHub Actions run page without needing the artifact download.
always() guard so the summary appears even when the eval step
above warns. continue-on-error keeps it advisory.
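A fail-quiet renderer of the kind described above might look like this minimal sketch. The report keys (`precision`, `recall`, `abort_rate` at the top level) and the names `render_summary`, `METRICS`, and `FALLBACK` are assumptions for illustration; the actual JSON layout of the eval report is not shown in this thread.

```python
from __future__ import annotations

import json
from pathlib import Path

METRICS = ("precision", "recall", "abort_rate")
FALLBACK = "_M2 grounding-recall report missing or malformed; nothing to summarize._"


def render_summary(report_path: Path) -> str:
    """Render the eval report as a markdown table, degrading to a one-line note.

    Any read, parse, or shape problem returns FALLBACK instead of raising,
    so the summary step cannot fail the CI run.
    """
    try:
        report = json.loads(report_path.read_text(encoding="utf-8"))
        values = {name: float(report[name]) for name in METRICS}
    except (OSError, ValueError, KeyError, TypeError):
        return FALLBACK
    lines = ["| metric | value |", "| --- | --- |"]
    lines += [f"| {name} | {values[name]:.2f} |" for name in METRICS]
    return "\n".join(lines)


if __name__ == "__main__":
    # The workflow step would redirect this stdout into $GITHUB_STEP_SUMMARY.
    print(render_summary(Path("test-results/m2-grounding-recall.json")))
```

Catching `ValueError` covers malformed JSON because `json.JSONDecodeError` is a subclass of it.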
Kept from PR-3's initial shape
------------------------------
- The three PostHog events from handle_bind / handle_resolve_compliance.
- The privacy-preserving local mirror at ~/.bicameral/m2_grounding.jsonl
(operator support + diagnose CLI surface; never relayed).
- The m2_grounding_log.py module's record_attempt / record_ratification
public API.
- All telemetry tests (privacy invariant pin still holds).
Net Δ on PR-3: -119 LOC dashboard pieces, +210 LOC summary renderer
+ workflow step. Tests: 11 passed, 3 skipped (resolve_compliance
import-or-skip). Ruff + ruff format + mypy all green.
Refs #280.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous revert left an extra blank line where the `<section id="m2-panel">` block lived. Removes it so assets/dashboard.html is byte-identical to origin/dev — confirming Jin's "don't change the user dashboard" intent verbatim. Refs #280. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(telemetry): M2 grounding-precision events + dashboard panel (#280 PR-3)
…nds (#277)
Closes #277. Implements v0 Productization §2: shifts team mode entirely off git as the inter-machine replication substrate, onto a pluggable backend with two ship-day implementations (LocalFolder, GoogleDrive). Pull-only sync; no daemons, no webhooks, no Bicameral server in the loop.
What changes for users
- Setup wizard team-mode branch now offers Create vs Join vs LocalFolder. Create: provisions a Drive folder under the operator's Google account, prints the literal share-text-to-teammates message. Join: paste folder ID/URL, OAuth, verify access (404 / read-only both block), confirm the resolved signer (default-No) before persisting. LocalFolder: single prompt for the path.
- Drive integration uses Bicameral's bundled OAuth client (the same pattern gh / gcloud / cursor use). Scope: drive.file only, so Bicameral's CLI can only see files it creates inside the team folder. Token cache at ~/.bicameral/google-drive-token.json mode 0600.
- Colored security disclosure renders before the browser opens, walking the operator through what flows where, what we do and don't see, and the trust dependency. Mirrored on bicameral-ai.com/privacy (BicameralAI/bicameral PR #111).
Architecture
- events/backends/__init__.py: BackendAdapter ABC + get_backend factory.
- events/backends/local_folder.py: sha256-idempotent LocalFolderAdapter.
- events/backends/google_drive.py: Drive Files API adapter; bundled client_id + client_secret (RFC 8252 native-app pattern, no env override per Option A); FolderNotFoundError / ReadOnlyAccessError surface for Join verify_access; create_folder helper for Create branch.
- events/team_adapter.py: TeamWriteAdapter accepts backend=, marks _dirty on every write, exposes flush_to_backend().
- adapters/ledger.py: _read_collaboration_mode refactored to _read_team_config(repo_path) -> dict; constructs backend and injects into TeamWriteAdapter.
- handlers/sync_middleware.py: ensure_team_synced (30 s TTL pull) + flush_team_writes (post-handler push); errors swallowed at DEBUG.
- server.py: wires both into the dispatch site (pull at top, flush in finally).
- setup_wizard.py: Create/Join/LocalFolder dispatch + colored security disclosure + identity-confirmation prompt at Join time.
Testing
- 53 new tests, 1 platform-skip (Windows-only path):
  - LocalFolderAdapter: 6 tests (push idempotency, pull peer-files-only, list_peers, lock serialization)
  - TeamWriteAdapter ↔ backend: 3 tests (connect-pulls-then-replays, write-marks-dirty-then-flush-pushes, no-backend-noop)
  - Two-author round-trip: 2 tests
  - Sync middleware: 5 tests (TTL cache, no-backend-noop, error swallowing)
  - GoogleDriveAdapter: 11 tests (push idempotency on md5, pull own-file-skip + max-modifiedTime token, lock create-then-delete + cleanup on exception, verify_access 404 / read-only / can-edit, create_folder, placeholder-detection auto-skip when bundled client is published)
  - Setup wizard Create/Join: 11 tests including identity decline, OAuth-disclosure decline, folder-id URL extraction, unwritable-path rejection
- All adjacent regression tests still pass (test_team_event_replay, test_event_writer).
- Lint clean across events/ adapters/ handlers/sync_middleware.py setup_wizard.py + new test files.
Security model (also documented at docs/team-mode-setup.md and on bicameral-ai.com/privacy)
- Decision data flows your-CLI ↔ Google directly. Bicameral the company does NOT receive copies. No Bicameral server in the loop.
- drive.file scope limits the CLI on the user's machine to files it creates in the team folder. The rest of the user's Drive is invisible to the CLI; Google enforces this server-side.
- As OAuth app publisher, Bicameral receives aggregate API request counts and per-user OAuth consent records (which Google accounts authenticated, when). Not contents.
- Trust dependency: same as any OAuth tool (gh, gcloud, Notion, Slack desktop). The open-source CLI behaves as advertised, mitigated by source visibility.
OAuth verification submission text + GCP setup checklist: docs/google-oauth-verification-submission.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
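The pull-at-top, flush-in-finally dispatch with a 30 s TTL can be sketched as follows. This is a synchronous, simplified illustration under stated assumptions: `SyncMiddlewareSketch`, `Backend`, and `dispatch` are hypothetical names, the real handlers may be async, and recording the pull time even when the pull fails is a guess rather than documented behavior.

```python
from __future__ import annotations

import logging
import time
from typing import Any, Callable, Protocol

logger = logging.getLogger(__name__)
PULL_TTL_SECONDS = 30.0  # TTL quoted in the commit message


class Backend(Protocol):
    """Hypothetical minimal backend surface for this sketch."""

    def pull(self) -> None: ...
    def push(self) -> None: ...


class SyncMiddlewareSketch:
    def __init__(self, backend: Backend | None, clock: Callable[[], float] = time.monotonic):
        self._backend = backend
        self._clock = clock  # injectable so tests can control time
        self._last_pull: float | None = None

    def ensure_synced(self) -> None:
        """Pull from the backend at most once per TTL window; never raise."""
        if self._backend is None:
            return  # solo mode: nothing to sync
        now = self._clock()
        if self._last_pull is not None and now - self._last_pull < PULL_TTL_SECONDS:
            return
        try:
            self._backend.pull()
        except Exception:
            logger.debug("team pull failed", exc_info=True)
        self._last_pull = now  # assumption: failures also start a new TTL window

    def flush(self) -> None:
        """Push local writes after the handler; errors are logged at DEBUG only."""
        if self._backend is None:
            return
        try:
            self._backend.push()
        except Exception:
            logger.debug("team flush failed", exc_info=True)


def dispatch(mw: SyncMiddlewareSketch, handler: Callable[[], Any]) -> Any:
    """Pull before the handler runs, flush afterwards even if it raises."""
    mw.ensure_synced()
    try:
        return handler()
    finally:
        mw.flush()
```

Swallowing backend errors means a Drive outage degrades sync rather than breaking the tool call itself, which matches the "errors swallowed at DEBUG" note above.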
Pure formatting — `ruff format` against the 10 files touched in #277. No semantic changes. CI's `ruff format --check .` now passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…endAdapter Mypy was failing on `events/backends/__init__.py:62,66` — the factory's return type is `BackendAdapter | None`, but the two concrete adapters were structurally compatible without declaring inheritance. Added explicit `BackendAdapter` base. Both classes already implemented all four abstract methods (push_events, pull_events, lock, list_peers) — runtime check (issubclass + concrete instantiation) passes. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(team-mode): remote event-log adapter — Drive + LocalFolder backends (#277)
Triages 25 dev commits onto main (already on dev as of merge time):
• #289: team-mode remote event-log adapter (#277)
• #285, #284, #283: M2 grounding telemetry, eval harness, precision fix (#280)
• #275: README/SECURITY surface
• plus assorted fixes flowing through dev
Resolved conflicts in CHANGELOG.md (kept dev's [Unreleased] block, inserted v0.14.2's release entry from main below it, then renamed [Unreleased] → v0.14.3) and README.md (kept dev's Solo-vs-Team mode section + extended setup-writes table from #289; main was missing both because PR #289 hadn't backflowed yet).
pyproject.toml: 0.14.2 → 0.14.3
RECOMMENDED_VERSION: 0.14.1 → 0.14.3
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tor subclass shape
Mypy was failing on triage PR #290:
events/backends/local_folder.py:72: error: Return type "AsyncIterator[str]" of "list_peers" incompatible with return type "Coroutine[Any, Any, AsyncIterator[str]]" in supertype "BackendAdapter"
Both concrete adapters implement list_peers as async generators (`async def ... yield`), which return AsyncIterator[str] directly. The ABC's `async def` declaration typed it as Coroutine[..., AsyncIterator[str]], a different shape. Per mypy docs (more_types.html#asynchronous-iterators), async-iterator methods should be declared `def -> AsyncIterator[T]` in the supertype.
Concrete implementations unchanged; tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
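The typing shape described in this fix can be illustrated with a small self-contained sketch. `BackendAdapterSketch` and `InMemoryAdapter` are hypothetical stand-ins reduced to the single method under discussion; the real ABC declares more methods.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator


class BackendAdapterSketch(ABC):
    """Simplified stand-in for the ABC, showing only list_peers."""

    @abstractmethod
    def list_peers(self) -> AsyncIterator[str]:
        """Plain `def`: calling it hands back the async iterator directly.

        Declaring this as `async def` without a `yield` would instead type
        the call as a coroutine that resolves to an iterator.
        """


class InMemoryAdapter(BackendAdapterSketch):
    def __init__(self, peers: list[str]) -> None:
        self._peers = peers

    async def list_peers(self) -> AsyncIterator[str]:
        # `async def` plus `yield` makes this an async generator, so calling
        # it returns an AsyncIterator[str], matching the supertype.
        for peer in self._peers:
            yield peer


async def collect(adapter: BackendAdapterSketch) -> list[str]:
    """Consume the iterator with `async for`; no `await` on the call itself."""
    return [peer async for peer in adapter.list_peers()]
```

A caller would run `asyncio.run(collect(InMemoryAdapter(["alice", "bob"])))`; the absence of `await` before `adapter.list_peers()` is what the supertype signature has to reflect.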
Summary
Triages all 25 commits from `dev` onto `main` — the gap since the v0.14.2 release. Bumps version v0.14.2 → v0.14.3.
Headline shipping
Versions
Conflict resolution notes
Auto-resolved (no manual intervention needed)
`.github/workflows/test-mcp-regression.yml`, `pyproject.toml` (then manually bumped), `skills/bicameral-ingest/SKILL.md`, `tests/e2e/run_e2e_flows.py`, `tests/eval_decision_relevance.py`.
Test plan
What this does NOT include
🤖 Generated with Claude Code