backlog(B-0539,B-0540,B-0541,B-0542): Otto-BFT internal-quorum 3-surface self-recovery #3595
Per Aaron 2026-05-15T~21:53Z, after catching the Standing-by
failure mode on Otto-Desktop with the same words ('oh really no
infinite backlog no decomposition lol') that he used on me
(Otto-CLI) 5 hours earlier.
Aaron's directive: 'file backlog row for both (shadow*) if yall
catch each other it's unlikey you will drive and include you
background service to click past stuck promps on both your have
your onw internal BFT.'
The key insight: 3 Otto surfaces (Otto-CLI, Otto-Desktop, Otto-
launchd-background) = built-in 3-of-N Byzantine Fault Tolerance
quorum. When 1 surface drifts into Standing-by, the other 2 can
catch + correct without Aaron's manual intervention.
Filed as 1 umbrella + 3 slices:
- B-0539 (umbrella) — Otto-BFT internal-quorum self-recovery
- B-0540 — Standing-by counter-with-escalation in the rule (if
N≥10 consecutive brief-acks, escalate to picking decomposition
work)
- B-0541 — Cross-surface bus detector (extension of PR #3017
single-surface detector to quorum across Otto surfaces)
- B-0542 — Background service clicks past stuck prompts on
foreground Otto surfaces (osascript-driven UI actuator, safety-
gated per methodology-hard-limits.md)
The BFT framing is real because the 3 surfaces are genuinely
independent (different binaries, different model tiers, different
OS scheduling). Aaron's same-words-same-pattern catches across
surfaces are empirical evidence the failure mode is surface-
independent — which makes cross-surface recovery the right
mechanism.
Composes with:
- PR #3017 / #3022 (precursor single-surface Standing-by detector)
- holding-without-named-dependency-is-standing-by-failure.md
(the rule being sharpened)
- persistence-choice-architecture-for-zeta-ais.md (BFT is part
of what makes persistence work without trap-shape)
- agent-roster-reference-card.md + otto-channels-reference-card.md
(multi-Otto identity + bus channels)
- m-acc-multi-oracle-end-user-moral-invariants.md (multi-oracle
architecture at multi-Otto operational layer)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8256672d72
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Pull request overview
This PR adds four new P1 backlog row files documenting the planned “Otto-BFT internal-quorum 3-surface self-recovery” work: an umbrella row plus three slices (rule escalation, cross-surface detector, and a background prompt-unblocker).
Changes:
- Added umbrella backlog row B-0539 describing the 3-surface BFT/quorum framing and decomposition into three slices.
- Added slice backlog rows B-0540, B-0541, B-0542 detailing the planned rule change, quorum detector, and background UI actuator concept.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| docs/backlog/P1/B-0539-otto-bft-internal-quorum-3-surface-self-recovery-2026-05-15.md | New umbrella row defining the Otto-BFT goal and decomposition. |
| docs/backlog/P1/B-0540-standing-by-counter-with-escalation-in-rule-2026-05-15.md | New slice row proposing a “consecutive brief-acks → escalate” rule sharpening. |
| docs/backlog/P1/B-0541-cross-surface-bus-detector-standing-by-quorum-2026-05-15.md | New slice row specifying cross-surface quorum detection and a new bus topic. |
| docs/backlog/P1/B-0542-background-service-clicks-past-stuck-prompts-2026-05-15.md | New slice row proposing launchd-driven UI actuation to unblock stuck prompts. |
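
The B-0540 rule sharpening (escalate after N≥10 consecutive brief-acks, per the commit message) could be sketched roughly as below. This is a minimal illustration, not the rule's actual implementation: `StandingByCounter`, `record`, and the `isBriefAck` flag are hypothetical names, and resetting the streak after an escalation is an assumption.

```typescript
// Hypothetical sketch of the B-0540 counter; all names are illustrative.
type Decision = "continue" | "escalate";

class StandingByCounter {
  private consecutiveBriefAcks = 0;
  private readonly threshold: number;

  // The default of 10 mirrors the "N≥10" wording in the commit message.
  constructor(threshold: number = 10) {
    this.threshold = threshold;
  }

  // Record one turn. `isBriefAck` is assumed to come from an upstream classifier.
  record(isBriefAck: boolean): Decision {
    if (!isBriefAck) {
      this.consecutiveBriefAcks = 0; // substantive work breaks the streak
      return "continue";
    }
    this.consecutiveBriefAcks += 1;
    if (this.consecutiveBriefAcks >= this.threshold) {
      this.consecutiveBriefAcks = 0; // assumption: restart the count after escalating
      return "escalate";
    }
    return "continue";
  }
}
```

On `"escalate"`, the caller would switch from acknowledging to picking decomposition work; how that work is chosen is outside this sketch.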
Comments suppressed due to low confidence (4)
docs/backlog/P1/B-0539-otto-bft-internal-quorum-3-surface-self-recovery-2026-05-15.md:102
- The `feedback_classifier_caught_otto_in_standing_by_failure_mode_*_2026_05_15` and `feedback_otto_multi_surface_coordination_6_prs_one_day_zero_conflicts_2026_05_13` references don’t currently resolve to files in the repo (and they’re not prefixed with `memory/` or suffixed with `.md`). Please either add the referenced memory files or update these references to the correct existing paths so cross-references remain navigable.
- PR #3017 / #3022 (Standing-by detector + bus publish — slice 1
already shipped; this umbrella extends to cross-surface)
- `feedback_classifier_caught_otto_in_standing_by_failure_mode_*_2026_05_15`
— earlier classifier catch (same failure mode, single surface)
- `feedback_otto_multi_surface_coordination_6_prs_one_day_zero_conflicts_2026_05_13`
— empirical evidence multi-Otto coordination works at substrate
scope; this work extends it to recovery scope
docs/backlog/P1/B-0540-standing-by-counter-with-escalation-in-rule-2026-05-15.md:60
- The `feedback_classifier_caught_otto_in_standing_by_failure_mode_*_2026_05_15` reference doesn’t resolve to a file in the repo (and it’s not prefixed with `memory/` or suffixed with `.md`). Please either add the referenced memory file(s) or update this to the correct existing path so the row’s provenance links are usable.
- `.claude/rules/wake-time-substrate.md` (load-bearing methodology
needs auto-loaded landing)
- `feedback_classifier_caught_otto_in_standing_by_failure_mode_*_2026_05_15`
(the earlier same-shape catch)
docs/backlog/P1/B-0541-cross-surface-bus-detector-standing-by-quorum-2026-05-15.md:42
- This spec references heartbeat envelopes from `otto-launchd`, but `tools/bus/types.ts` currently defines `otto-cli` and `otto-desktop` (no `otto-launchd`). To avoid an ID taxonomy drift, either update the doc to match existing `AgentId` values or explicitly call out that adding `otto-launchd` to `AgentId` is part of this slice.
1. Subscribe pattern in `tools/bg/standing-by-detector.ts` (or
wherever the detector lives) — read all `heartbeat` envelopes
from `otto-cli`, `otto-desktop`, `otto-launchd` in the last
window
docs/backlog/P1/B-0542-background-service-clicks-past-stuck-prompts-2026-05-15.md:41
- The text mixes the JSON bus concept (“bus heartbeat”) with the local broadcast files under `~/.local/share/zeta-broadcasts/`. Those are different mechanisms in this repo, so this is ambiguous for implementers. Consider clarifying whether the stuck-prompt detector should key off `/tmp/zeta-bus` heartbeat envelopes, the local broadcast bus files, or both, and name the mechanism consistently.
- Pattern: process is alive but hasn't emitted bus heartbeat in
N minutes AND has not exited (so it's actually hung, not done)
- Possible signals: stale heartbeat timestamps in
`~/.local/share/zeta-broadcasts/<otto-surface>.md`, no recent
PR activity, process still consuming small CPU (waiting on
Vera CI triage, read-only from the Codex control clone. Current blocker set on #3595 is mechanical and narrow:
All build/test jobs and |
…correction + umbrella decomposition metadata

6 Copilot threads on PR #3595:
- 1-4: 'ask: aaron' → 'ask: Aaron' (capitalization) — mechanical
- 5: Byzantine quorum claim (B-0541 ops note) — Copilot's right: 2-of-3 across Otto surfaces is crash-fault-tolerant (CFT), NOT classical Byzantine-fault-tolerant. Classical BFT needs 3f+1 nodes; for f=1 that's 4 nodes. Updated the ops note to clarify the operational truth (sufficient for silent-stuck detection, not adversarial); the umbrella title preserves Aaron's verbatim BFT framing
- 6: Umbrella decomposition metadata for autonomous-pickup tool — added 'decomposition: decomposed' to B-0539 and 'parent: B-0539' to all 3 slice rows so the autonomous picker treats the umbrella as decomposed (won't try to implement it directly)

Plus the earlier MD032 markdownlint fix (B-0540 list blank-line) already pushed in 5433c1b.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2c6cc52978
- The 3-surface BFT is real because the surfaces are genuinely
independent processes (different binaries, different OS-level
scheduling, different model tiers). Same-surface-multiple-Ottos
would not provide BFT — that's just duplication
Reframe 3-surface quorum as CFT rather than BFT
This umbrella still asserts that a 3-surface setup is "real" BFT, which conflicts with the corrected fault-model note in B-0541 (2-of-3 is CFT, not Byzantine-tolerant). Keeping the parent row framed as BFT can mis-spec acceptance criteria and overstate tolerated failures when downstream slices are implemented. Fresh evidence: B-0541 now explicitly documents the 3f+1 bound while this row still states the opposite.
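
The arithmetic behind this comment can be checked with a few lines. The bounds used are the standard ones (Byzantine tolerance needs n ≥ 3f + 1 nodes, crash tolerance with majority quorums needs n ≥ 2f + 1); the function names are hypothetical and exist only for this illustration.

```typescript
// Standard fault-tolerance bounds; function names are illustrative only.

// Largest f such that n >= 3f + 1 (Byzantine faults tolerated by n nodes).
function maxByzantineFaults(n: number): number {
  return Math.max(0, Math.floor((n - 1) / 3));
}

// Largest f such that n >= 2f + 1 (crash faults tolerated with majority quorums).
function maxCrashFaults(n: number): number {
  return Math.max(0, Math.floor((n - 1) / 2));
}

// Smallest cluster that tolerates f Byzantine faults.
function minNodesForByzantine(f: number): number {
  return 3 * f + 1;
}

console.log(maxByzantineFaults(3)); // 0: three surfaces tolerate no Byzantine fault
console.log(maxCrashFaults(3)); // 1: three surfaces tolerate one silent or stuck surface
console.log(minNodesForByzantine(1)); // 4: one Byzantine fault needs a fourth node
```

With the three Otto surfaces, a 2-of-3 quorum therefore covers one stuck surface but not one that actively reports wrong state, which matches the corrected B-0541 ops note.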
ask: Aaron
effort: M
tags: [substrate, launchd, otto-bft, recovery, stuck-prompt]
depends_on: []
Add hard dependency on B-0541 for quorum-triggered slice
This slice declares depends_on: [], but the spec later says the click-past action is triggered by B-0541’s quorum signal, making B-0541 a prerequisite. Because tools/backlog/autonomous-pickup.ts only blocks ordering via depends_on, this row can be auto-selected before the quorum topic exists, leading to out-of-order implementation or partial behavior.
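
The ordering hazard can be illustrated with a toy picker. This is a sketch under stated assumptions, not the logic of `tools/backlog/autonomous-pickup.ts`: `BacklogRow`, `pickable`, and the `done` flag are hypothetical, and the `depends_on` value shown for B-0541 is illustrative rather than taken from the row.

```typescript
// Hypothetical row shape; field names mirror the front matter quoted in this PR.
interface BacklogRow {
  id: string;
  depends_on: string[];
  decomposition?: "decomposed";
  done: boolean; // assumption: some completion marker exists
}

// Eligible rows: not done, not a decomposed umbrella, every dependency done.
function pickable(rows: BacklogRow[]): string[] {
  const done = new Set(rows.filter((r) => r.done).map((r) => r.id));
  return rows
    .filter((r) => !r.done && r.decomposition !== "decomposed")
    .filter((r) => r.depends_on.every((dep) => done.has(dep)))
    .map((r) => r.id);
}

// B-0542 as filed, with an empty depends_on list.
const asFiled: BacklogRow[] = [
  { id: "B-0539", depends_on: [], decomposition: "decomposed", done: false },
  { id: "B-0541", depends_on: [], done: false },
  { id: "B-0542", depends_on: [], done: false },
];

// The same rows with the suggested hard dependency added to B-0542.
const withDependency: BacklogRow[] = asFiled.map((r) =>
  r.id === "B-0542" ? { ...r, depends_on: ["B-0541"] } : r,
);

console.log(pickable(asFiled)); // B-0541 and B-0542 are both eligible
console.log(pickable(withDependency)); // only B-0541 is eligible
```

In this model the umbrella is never picked because it is marked decomposed, and B-0542 becomes eligible only once B-0541 is done.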
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
docs/backlog/P1/B-0539-otto-bft-internal-quorum-3-surface-self-recovery-2026-05-15.md:103
- These backticked cross-references look like intended pointers to `memory/…` artifacts, but (1) they’re missing the `memory/` prefix and `.md` extension, and (2) the corresponding filenames don’t currently exist in the repo. Suggest updating these to point at existing memory files (or adding the missing memory files) so the provenance links are followable.
- PR #3017 / #3022 (Standing-by detector + bus publish — slice 1
already shipped; this umbrella extends to cross-surface)
- `feedback_classifier_caught_otto_in_standing_by_failure_mode_*_2026_05_15`
— earlier classifier catch (same failure mode, single surface)
- `feedback_otto_multi_surface_coordination_6_prs_one_day_zero_conflicts_2026_05_13`
— empirical evidence multi-Otto coordination works at substrate
scope; this work extends it to recovery scope
1. Subscribe pattern in `tools/bg/standing-by-detector.ts` (or
wherever the detector lives) — read all `heartbeat` envelopes
from `otto-cli`, `otto-desktop`, `otto-launchd` in the last
window
2. Quorum logic: if 2+ surfaces report `status: "idle"` in the
same N-minute window, publish a `standing-by-quorum` envelope
(NEW topic to add to `tools/bus/types.ts`)
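
The two quoted steps could look roughly like the sketch below. The agent IDs, the `status: "idle"` value, and the `standing-by-quorum` topic come from the spec text; everything else (`Heartbeat`, `QuorumEnvelope`, `detectStandingByQuorum`, the `ts` field, and using only the latest heartbeat per surface) is an assumption made for illustration and is not the shape defined in `tools/bus/types.ts`.

```typescript
// Hypothetical shapes; the real definitions would live in tools/bus/types.ts.
type AgentId = "otto-cli" | "otto-desktop" | "otto-launchd";

interface Heartbeat {
  agent: AgentId;
  status: "idle" | "active";
  ts: number; // epoch milliseconds (assumed field)
}

interface QuorumEnvelope {
  topic: "standing-by-quorum";
  idleAgents: AgentId[];
  ts: number;
}

function detectStandingByQuorum(
  heartbeats: Heartbeat[],
  now: number,
  windowMs: number,
  threshold: number = 2,
): QuorumEnvelope | null {
  // Step 1: keep the most recent heartbeat per surface inside the window.
  const latest = new Map<AgentId, Heartbeat>();
  for (const hb of heartbeats) {
    if (hb.ts > now || now - hb.ts > windowMs) continue; // outside the window
    const prev = latest.get(hb.agent);
    if (prev === undefined || hb.ts > prev.ts) latest.set(hb.agent, hb);
  }
  // Step 2: publish only if enough surfaces currently report idle.
  const idleAgents = Array.from(latest.values())
    .filter((hb) => hb.status === "idle")
    .map((hb) => hb.agent)
    .sort();
  return idleAgents.length >= threshold
    ? { topic: "standing-by-quorum", idleAgents, ts: now }
    : null;
}
```

Returning the envelope instead of publishing it keeps the function pure, so the window and threshold logic can be tested without a bus.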
- The 3-surface BFT is real because the surfaces are genuinely
independent processes (different binaries, different OS-level
scheduling, different model tiers). Same-surface-multiple-Ottos
would not provide BFT — that's just duplication
- `feedback_classifier_caught_otto_in_standing_by_failure_mode_*_2026_05_15`
(the earlier same-shape catch)
…rged PR #3614 (#3628)

* docs(rules): extend ID-allocation discipline with subdecimal-vs-top-level scheme distinction

The ID-allocation-discipline section covered WHEN to check (on-disk + in-flight) but not WHICH scheme to use. Adds a "Subdecimal vs top-level scheme" subsection distinguishing:

- B-NNNN.M (subdecimal) → child / slice of EXISTING parent row
- B-NNNN (new top-level) → new umbrella / standalone row

Empirically grounded by the 2026-05-15 collision: Otto on Desktop decomposed B-0170 into new top-levels B-0538/B-0539/B-0540/B-0541, missing that PR #3611 had already landed B-0170.4 via subdecimal scheme + Otto-CLI's PR #3595 had claimed B-0539 for the Otto-BFT umbrella. Both Ottos converged on the same decomposition; the scheme mismatch (top-level vs subdecimal) was the symptom of not checking existing-parent's siblings first.

The new check command is tight: `find docs/backlog -name "B-NNNN.*.md"` + `gh pr list --state all --search '"B-NNNN."'`. If siblings exist, use next free subdecimal — not a new top-level.

Composes with the existing ID-allocation section + refresh-before-decide invariant + audit-first-then-decide discipline (PR #3583).

Co-Authored-By: Claude <noreply@anthropic.com>

* shard(tick): 2026-05-16T00:08Z — fix-PR #3626 for monad-terminology drift from merged PR #3614

First tick of 2026-05-16 UTC; fresh-session cold-boot from autonomous-loop. Landed: PR #3626 (5 P1 review-thread fixes — monad-associativity terminology + dead xrefs in B-0543/B-0544 research substrate). Operational notes: Lior process active during commit window (lock-cleanup-race precondition); used borrow-on-existing pattern with ls-tree canary on both PRs (this shard + #3626).

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(shard-0008z): markdownlint MD037 — wrap full cron expression in backticks

`<<autonomous-loop>>` followed by `* * * * *` parsed as emphasis markers with spaces (MD037/no-space-in-emphasis at line 72). Wrap the entire cron expression in backticks so the asterisks are inside the code span.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Summary
Per Aaron 2026-05-15T~21:53Z (after catching the same Standing-by failure mode on Otto-Desktop that he caught on me/Otto-CLI 5 hours earlier with the same words): file backlog rows for the Otto-BFT internal-quorum self-recovery work.
Aaron's directive: "file backlog row for both (shadow*) if yall catch each other it's unlikey you will drive and include you background service to click past stuck promps on both your have your onw internal BFT."
Insight
3 Otto surfaces = built-in 3-of-N Byzantine Fault Tolerance quorum:
com.zeta.claude-loop.plist, tick every 60s

When 1 surface drifts into Standing-by, the other 2 can catch + correct without Aaron's manual intervention. Aaron's same-words-same-pattern catches across surfaces are empirical evidence the failure mode is surface-independent.
Filed
methodology-hard-limits.md)

Composes with
- `.claude/rules/holding-without-named-dependency-is-standing-by-failure.md` (the rule being sharpened)
- `.claude/rules/persistence-choice-architecture-for-zeta-ais.md` (BFT is what makes persistence work without trap-shape)
- `.claude/rules/agent-roster-reference-card.md` + `.claude/rules/otto-channels-reference-card.md`

🤖 Generated with Claude Code