From 98df00a8b90fcb70175206d95efce1951a30d838 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Wed, 13 May 2026 17:20:52 -0400 Subject: [PATCH 1/5] docs(rules): claim acquire before worktree-creating backlog work (split-brain prevention for multi-Otto) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Aaron 2026-05-13 set up Otto on Claude Desktop alongside Otto-CLI, creating a real multi-foreground-surface split-brain risk. Both Ottos share git + bus on one machine and could pick the same backlog row simultaneously. The claim-coordinator (tools/bus/claim.ts, B-0400 slice 3) was built for this — atomic check/acquire/release with PID-liveness + 24h TTL. PR #2939 shipped it; PR #2959 added the gate integration. The infrastructure exists; this rule is the discipline-level mechanization that auto-loads at cold-boot so both Ottos read it + use it before worktree-creating backlog work. Applies to: starting work on a B-NNNN row, creating feature branch + worktree, opening a backlog-advancing PR. Does NOT apply to: CI fixes on already-claimed PRs, ad-hoc memory- file writes, conversation-driven substrate, hot fixes. Composes with: - backlog-item-start-gate.md (this rule adds zero-th step) - dont-ask-permission.md (acquire exit 0 IS the substrate permission) - never-be-idle.md (if acquire fails, pick another row) - B-0400 + slice 3 + slice 5 infrastructure - Otto Claude Desktop bootstream (PR #3030) - fetch-before-push memory (sibling coordination pattern at git scope) Future substrate-level mechanization: pre-commit hook that auto-calls 'claim check' and fails on no-held-claim. Today's rule is the discipline-level fix. 
Co-Authored-By: Claude --- .../claim-acquire-before-worktree-work.md | 149 ++++++++++++++++++ 1 file changed, 149 insertions(+) create mode 100644 .claude/rules/claim-acquire-before-worktree-work.md diff --git a/.claude/rules/claim-acquire-before-worktree-work.md b/.claude/rules/claim-acquire-before-worktree-work.md new file mode 100644 index 000000000..5f93a402e --- /dev/null +++ b/.claude/rules/claim-acquire-before-worktree-work.md @@ -0,0 +1,149 @@ +# Always `claim acquire` before worktree-creating backlog work + +Carved sentence: + +> When multiple instances of the same agent (e.g., Otto-CLI + Otto-Desktop) +> share git + bus on one machine, split-brain is prevented by the +> claim-coordinator (`tools/bus/claim.ts`, B-0400 slice 3). Use it. Before +> starting work on any backlog row, `claim acquire` first. If already +> claimed, pick a different row. + +## Operational content + +The bus claim-coordinator already exists (PR #2939, merged 2026-05-09): + +```bash +# Before any worktree-creating backlog work: +bun tools/bus/claim.ts acquire --from --item [--branch ] +# Exit 0 = claim acquired; proceed +# Exit 1 = already claimed by another agent; pick a different row + +# After PR merges or work abandoned: +bun tools/bus/claim.ts release --from --item +``` + +The claim envelope (`topic: "claim", action: "claim"`) lives on the bus with +24h TTL by default. It is visible to all agents reading +`/tmp/zeta-bus/` and to all factory surfaces (Otto-CLI, Otto-Desktop, +Vera-Codex, Riven-Cursor, Lior-Antigravity, Alexa-Kiro). 
+ +## When this rule applies + +**APPLIES** to: + +- Starting work on a `docs/backlog/P*/B-*.md` row (any slice) +- Creating a feature branch + worktree for that row +- Opening a PR that closes / advances a backlog row + +**DOES NOT APPLY** to: + +- Fixing CI failures on an already-claimed PR +- Resolving review threads on an already-claimed PR +- Ad-hoc memory-file writes / substrate-honest disclosure responses to + Aaron's messages (these are conversation-driven, not backlog-driven) +- Hot fixes to broken main / rollback PRs (urgency > coordination) + +## Composes with other rules + +- `.claude/rules/backlog-item-start-gate.md` — already mandates prior-art + search + dependency check; this rule adds `claim acquire` as the + zero-th step before those checks +- `.claude/rules/dont-ask-permission.md` — within authority scope, ship; + `claim acquire` exit 0 IS the substrate-level permission grant +- `.claude/rules/never-be-idle.md` — if claim acquire fails, pick a + different row from the same priority tier (don't go idle) +- `.claude/rules/honor-those-that-came-before.md` — claim acquire + preserves the work the holder is doing + +## Operational examples + +### Example 1: Otto-CLI picks B-0444 (which exists) + +```bash +$ bun tools/bus/claim.ts acquire --from otto --item B-0444 --branch otto/b0444-impl-2026-05-13 +$ echo $? +0 +# Proceeded with worktree creation + impl +``` + +### Example 2: Otto-CLI and Otto-Desktop race + +```bash +# Otto-CLI publishes claim first: +$ bun tools/bus/claim.ts acquire --from otto --item B-0444 +$ echo $? +0 + +# Otto-Desktop tries to claim same row a second later: +$ bun tools/bus/claim.ts acquire --from otto --item B-0444 +$ echo $? +1 +# Otto-Desktop picks B-0445 instead. +``` + +### Example 3: Otto-CLI crashes mid-work + +```bash +# Otto-CLI process dies. Claim stays on bus with 24h TTL. 
+# Otto-Desktop checks: +$ bun tools/bus/claim.ts check --item B-0444 +# Output: claimed by otto (TTL expires in 23h 59m) + +# After 24h or after Otto-CLI process is confirmed dead (PID-liveness), +# the claim auto-releases or can be reclaimed. +``` + +## Why this rule exists (operational evidence) + +Aaron 2026-05-13 set up Otto on Claude Desktop alongside Otto-CLI. The +two-foreground-surface architecture creates a real split-brain risk: +both Ottos might pick the same backlog row simultaneously, leading to +duplicate work, race conditions, or worse — both committing to the +same branch with conflicts. + +The claim-coordinator (B-0400 slice 3, PR #2939) was built for exactly +this. Without a rule enforcing its use, both Ottos might forget to +acquire claims and the split-brain happens. The substrate-honest fix +is mechanizing the discipline via this rule. + +Per `.claude/rules/encoding-rules-without-mechanizing.md`: this rule +auto-loads at cold-boot so future-Otto reads it before any backlog +work. + +## Composes with substrate + +- B-0400 (bus protocol — the schema this rule uses) +- B-0400 slice 3 (PR #2939 — claim-coordinator implementation) +- B-0400 slice 5 (PR #2959 — `--with-bus-claims` gate integration) +- PR #3017 (B-0440.4 — bus publish pattern; same protocol) +- `docs/research/2026-05-12-otto-canonical-bootstream-multi-foreground-surface-orchestrator-ifs-format.md` + (the multi-foreground-surface design this rule operationalizes) +- `docs/launch/2026-05-13-otto-claude-desktop-bootstream-tight.md` + (PR #3030 — Claude Desktop variant; second Otto surface) +- `memory/feedback_aaron_good_failure_mode_git_fetch_before_push_catches_multi_agent_duplicate_work_2026_05_13.md` + (sibling fetch-before-push discipline; same coordination pattern at + git scope) + +## Substrate-honest framing + +This rule is not a strict requirement at the code level — the +claim-coordinator returns a useful exit code, but no automated hook +enforces calling it. 
The discipline relies on agent-side compliance +at cold-boot. + +A future slice could add a pre-commit / pre-worktree hook that calls +`claim check` automatically and fails if no claim is held. That's +substrate-level mechanization. Today's rule is the discipline-level +mechanization. + +## Full reasoning + +PR #2939 (B-0400 slice 3 — claim-coordinator implementation) +PR #2959 (B-0400 slice 5 — bus-gate integration) +PR #3017 (B-0440.4 — bus publish pattern) +PR #3030 (Otto Claude Desktop tight bootstream — second Otto surface) + +Aaron 2026-05-13 verbatim: *"probalby want to figure out how not to +split brain with yourself bot any idea?"* — substrate-honest naming +of the split-brain risk; this rule is the operationally-honest +answer. From d437b9eb2204354454dd511fff4ac74a3877a28f Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Wed, 13 May 2026 17:26:36 -0400 Subject: [PATCH 2/5] =?UTF-8?q?fix(rules):=20correct=20Example=202=20?= =?UTF-8?q?=E2=80=94=20multi-surface=20Otto=20can't=20be=20distinguished?= =?UTF-8?q?=20by=20same=20--from=20value=20(Vera=20P1)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Vera caught (PR #3032 review): the original Example 2 showed two Ottos calling 'acquire --from otto' and the second exiting 1. That's WRONG per claim.ts line ~270 — same --from is filtered out as self-re- acquire (idempotent). Both calls succeed; split-brain not prevented. Fix: - Example 2 renamed to "KNOWN GAP" — shows actual behavior (both succeed) + names the architectural cause (SENDER_IDS doesn't distinguish multi-surface instances) - Added workarounds: lane-based convention, branch-prefix discipline, schema extension (otto-cli / otto-desktop) - New Example 3 shows TARGET behavior with schema fix (--from otto-cli vs --from otto-desktop) - Example 3 renamed to Example 4 (crash recovery) The architectural gap is a real follow-up: extending SENDER_IDS to support multi-surface instances of the same logical agent. 
Future work, not this PR. Operational consequence today: use the lane-based + branch-prefix workarounds. Claim-coordinator is INSUFFICIENT for multi-Otto without the schema extension. Co-Authored-By: Claude --- .../claim-acquire-before-worktree-work.md | 45 +++++++++++++++++-- 1 file changed, 41 insertions(+), 4 deletions(-) diff --git a/.claude/rules/claim-acquire-before-worktree-work.md b/.claude/rules/claim-acquire-before-worktree-work.md index 5f93a402e..731ad9716 100644 --- a/.claude/rules/claim-acquire-before-worktree-work.md +++ b/.claude/rules/claim-acquire-before-worktree-work.md @@ -66,7 +66,7 @@ $ echo $? # Proceeded with worktree creation + impl ``` -### Example 2: Otto-CLI and Otto-Desktop race +### Example 2: Otto-CLI and Otto-Desktop race — KNOWN GAP ```bash # Otto-CLI publishes claim first: @@ -74,14 +74,51 @@ $ bun tools/bus/claim.ts acquire --from otto --item B-0444 $ echo $? 0 -# Otto-Desktop tries to claim same row a second later: +# Otto-Desktop tries to claim same row with the SAME --from value: $ bun tools/bus/claim.ts acquire --from otto --item B-0444 $ echo $? -1 +0 # <-- ALSO succeeds! claim.ts treats same-from as idempotent self-re-acquire +``` + +**Known architectural gap (caught by Vera 2026-05-13 in review of this +rule, PR #3032):** `tools/bus/claim.ts` line ~270 filters existing +claims by `c.from !== sender`, so two callers passing the same +`--from otto` are indistinguishable to claim.ts. The canonical +`SENDER_IDS` (`otto`, `alexa`, `riven`, `vera`, `lior`) does NOT +distinguish multi-surface instances of the same agent. + +**Workarounds** (until the schema supports multi-surface sender IDs): + +1. **Lane-based convention** (zero-code): Otto-CLI takes backlog + grinding + slice impl; Otto-Desktop takes substrate + cowork. + Different scopes, no claim collision possible. +2. **Branch-prefix discipline**: Otto-CLI uses `otto-cli/` branch + prefix; Otto-Desktop uses `otto-desktop/` prefix. 
The claim + envelope's optional `branch` field disambiguates post-hoc. +3. **Schema extension** (substrate-level fix): add `otto-cli` and + `otto-desktop` (and analogous `alexa-cli`/`alexa-kiro`, etc.) to + `SENDER_IDS` in `tools/bus/types.ts`. Then `--from otto-desktop` + becomes a distinct claim from `--from otto-cli`. Future work. + +### Example 3: Otto-CLI and Otto-Desktop race (with schema fix) + +```bash +# Otto-CLI publishes claim first: +$ bun tools/bus/claim.ts acquire --from otto-cli --item B-0444 +$ echo $? +0 + +# Otto-Desktop tries to claim same row with DIFFERENT --from: +$ bun tools/bus/claim.ts acquire --from otto-desktop --item B-0444 +$ echo $? +1 # Otto-Desktop sees otto-cli's claim, exits 1 # Otto-Desktop picks B-0445 instead. ``` -### Example 3: Otto-CLI crashes mid-work +This is the target behavior. Today (Example 2) it doesn't work +because `otto-cli` and `otto-desktop` aren't in `SENDER_IDS` yet. + +### Example 4: Otto-CLI crashes mid-work ```bash # Otto-CLI process dies. Claim stays on bus with 24h TTL. From 840c30254db62964bf526b43d239866811bd4bdc Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Wed, 13 May 2026 17:30:33 -0400 Subject: [PATCH 3/5] docs(rules): fix false split-brain + PID-liveness claims P1 (PRRT_kwDOSF9kNM6B5VwU): carved sentence claimed split-brain is prevented, but claim.ts filters by c.from !== sender, so same-sender instances both exit 0. Updated to state --from must differ and document the known gap explicitly. P2 (PRRT_kwDOSF9kNM6B5VwX): example 4 claimed PID-liveness can reclaim a claim mid-TTL; no such mechanism exists. Only TTL expiry or explicit release ends a claim. Corrected comment. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../rules/claim-acquire-before-worktree-work.md | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/.claude/rules/claim-acquire-before-worktree-work.md b/.claude/rules/claim-acquire-before-worktree-work.md index 731ad9716..df41aa512 100644 --- a/.claude/rules/claim-acquire-before-worktree-work.md +++ b/.claude/rules/claim-acquire-before-worktree-work.md @@ -2,11 +2,14 @@ Carved sentence: -> When multiple instances of the same agent (e.g., Otto-CLI + Otto-Desktop) -> share git + bus on one machine, split-brain is prevented by the -> claim-coordinator (`tools/bus/claim.ts`, B-0400 slice 3). Use it. Before -> starting work on any backlog row, `claim acquire` first. If already -> claimed, pick a different row. +> When multiple instances of the **same** agent (e.g., Otto-CLI + Otto-Desktop) +> share git + bus on one machine, **`--from` must differ** (e.g., `otto-cli` +> vs `otto-desktop`) for the claim-coordinator (`tools/bus/claim.ts`, +> B-0400 slice 3) to prevent split-brain — identical `--from` values both +> exit 0 (same-sender idempotent re-acquire). Until the sender-ID schema is +> extended, use lane-based or branch-prefix conventions (see examples below). +> Before starting work on any backlog row, `claim acquire` first. If already +> claimed by another agent, pick a different row. ## Operational content @@ -126,8 +129,8 @@ because `otto-cli` and `otto-desktop` aren't in `SENDER_IDS` yet. $ bun tools/bus/claim.ts check --item B-0444 # Output: claimed by otto (TTL expires in 23h 59m) -# After 24h or after Otto-CLI process is confirmed dead (PID-liveness), -# the claim auto-releases or can be reclaimed. +# After 24h the claim auto-expires (TTL). No PID-liveness reclaim exists; +# only TTL expiry or explicit `release` ends a claim. 
``` ## Why this rule exists (operational evidence) From 030513fb1b2d2ac50a335f6c33640ea7b74b2681 Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Wed, 13 May 2026 17:33:25 -0400 Subject: [PATCH 4/5] fix(rules): remove branch-prefix as split-brain workaround (Vera P2 caught it doesn't work) Vera caught: branch-prefix doesn't prevent split-brain because claim.ts only filters by 'from', not by 'branch'. Two Ottos with same --from otto but different --branch both succeed (both exit 0). The branch field is post-hoc metadata, not a coordination key. Real workarounds reduce to: 1. Lane-based convention (zero-code; ONLY real prevention today) 2. Schema extension (future SENDER_IDS additions) Co-Authored-By: Claude --- .../claim-acquire-before-worktree-work.md | 27 ++++++++++++------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/.claude/rules/claim-acquire-before-worktree-work.md b/.claude/rules/claim-acquire-before-worktree-work.md index df41aa512..001abf965 100644 --- a/.claude/rules/claim-acquire-before-worktree-work.md +++ b/.claude/rules/claim-acquire-before-worktree-work.md @@ -92,16 +92,23 @@ distinguish multi-surface instances of the same agent. **Workarounds** (until the schema supports multi-surface sender IDs): -1. **Lane-based convention** (zero-code): Otto-CLI takes backlog - grinding + slice impl; Otto-Desktop takes substrate + cowork. - Different scopes, no claim collision possible. -2. **Branch-prefix discipline**: Otto-CLI uses `otto-cli/` branch - prefix; Otto-Desktop uses `otto-desktop/` prefix. The claim - envelope's optional `branch` field disambiguates post-hoc. -3. **Schema extension** (substrate-level fix): add `otto-cli` and - `otto-desktop` (and analogous `alexa-cli`/`alexa-kiro`, etc.) to - `SENDER_IDS` in `tools/bus/types.ts`. Then `--from otto-desktop` - becomes a distinct claim from `--from otto-cli`. Future work. +1. 
**Lane-based convention** (zero-code; the ONLY real split-brain + prevention available today): Otto-CLI takes backlog grinding + + slice impl; Otto-Desktop takes substrate + cowork. Different + scopes, no claim collision possible because the scopes don't + overlap. +2. **Schema extension** (substrate-level fix; future work): add + `otto-cli` and `otto-desktop` (and analogous `alexa-cli`/ + `alexa-kiro`, etc.) to `SENDER_IDS` in `tools/bus/types.ts`. Then + `--from otto-desktop` becomes a distinct claim from `--from + otto-cli`. THIS is the substrate-level mechanization. + +**Branch-prefix is NOT a workaround**: `claim acquire` filters +existing claims by `c.from !== sender` only, NOT by branch. Two +Ottos with `--from otto` but different `--branch` values BOTH +acquire (both exit 0) — the branch field is post-hoc disambiguation +metadata, not a coordination key. Vera caught this 2026-05-13 on +PR #3032. ### Example 3: Otto-CLI and Otto-Desktop race (with schema fix) From ce7061af7da7aca6f83968c850b9b36adb905eec Mon Sep 17 00:00:00 2001 From: Aaron Stainback Date: Wed, 13 May 2026 17:38:39 -0400 Subject: [PATCH 5/5] =?UTF-8?q?fix(rules):=20carved=20sentence=20consisten?= =?UTF-8?q?cy=20=E2=80=94=20remove=20branch-prefix=20from=20workarounds=20?= =?UTF-8?q?(Vera=20P2)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The body of the rule was already corrected (commit 030513f) to remove branch-prefix as a workaround. The carved sentence wasn't updated. Vera caught the inconsistency. Now carved sentence + body both say: lane-based convention is the ONLY real split-brain workaround today. 
Co-Authored-By: Claude --- .claude/rules/claim-acquire-before-worktree-work.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/.claude/rules/claim-acquire-before-worktree-work.md b/.claude/rules/claim-acquire-before-worktree-work.md index 001abf965..a295bebf7 100644 --- a/.claude/rules/claim-acquire-before-worktree-work.md +++ b/.claude/rules/claim-acquire-before-worktree-work.md @@ -7,9 +7,11 @@ Carved sentence: > vs `otto-desktop`) for the claim-coordinator (`tools/bus/claim.ts`, > B-0400 slice 3) to prevent split-brain — identical `--from` values both > exit 0 (same-sender idempotent re-acquire). Until the sender-ID schema is -> extended, use lane-based or branch-prefix conventions (see examples below). -> Before starting work on any backlog row, `claim acquire` first. If already -> claimed by another agent, pick a different row. +> extended, use **lane-based convention** as the only real split-brain +> prevention; branch-prefix is NOT a workaround because `claim acquire` +> only filters by `from`, not by `branch`. Before starting work on any +> backlog row, `claim acquire` first. If already claimed by another +> agent, pick a different row. ## Operational content
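
The same-sender behavior that this patch series keeps correcting can be sketched in a few lines of TypeScript. This is an illustrative model only, not the code in `tools/bus/claim.ts`: the `Claim` type, the `acquire` function, the `expiresAt` field, and the in-memory `bus` array are all hypothetical names, and whether a self-re-acquire refreshes the TTL is an assumption (here it does not). The only details taken from the rule text are the `c.from !== sender` filter, the 0/1 exit codes, and the 24h default TTL.

```typescript
// Hypothetical model of the claim filter described in the rule.
// Names and data layout are assumptions, not the real claim.ts API.
type Claim = { from: string; item: string; expiresAt: number };

const DEFAULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h default TTL, per the rule text

// Returns the exit code an acquire would produce under the described semantics:
// 0 = acquired (or same-sender idempotent re-acquire), 1 = held by another sender.
function acquire(bus: Claim[], sender: string, item: string, now: number): number {
  const live = bus.filter((c) => c.item === item && c.expiresAt > now);
  // Only claims from a DIFFERENT sender block the caller.
  const blockedByOther = live.some((c) => c.from !== sender);
  if (blockedByOther) return 1;
  // Assumption: an existing live claim by the same sender is left untouched.
  if (!live.some((c) => c.from === sender)) {
    bus.push({ from: sender, item, expiresAt: now + DEFAULT_TTL_MS });
  }
  return 0;
}

// Known gap: both surfaces pass --from otto, so both succeed.
const sharedBus: Claim[] = [];
const sameFirst = acquire(sharedBus, "otto", "B-0444", 0);
const sameSecond = acquire(sharedBus, "otto", "B-0444", 1_000);

// Target behavior: distinct sender IDs make the second acquire fail.
const splitBus: Claim[] = [];
const cli = acquire(splitBus, "otto-cli", "B-0444", 0);
const desktop = acquire(splitBus, "otto-desktop", "B-0444", 1_000);

// After TTL expiry the stale claim no longer blocks anyone.
const afterTtl = acquire(splitBus, "otto-desktop", "B-0444", DEFAULT_TTL_MS + 1);

console.log({ sameFirst, sameSecond, cli, desktop, afterTtl });
// → { sameFirst: 0, sameSecond: 0, cli: 0, desktop: 1, afterTtl: 0 }
```

In this model the `branch` field never appears in the filter, which matches the point made in patches 4 and 5: a differing `--branch` cannot turn the second same-sender acquire into an exit 1.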