backlog(p3): B-0530 — cron-sentinel mutex to prevent multi-Otto-CLI self-contention #3372
Merged
AceHack merged 1 commit on May 15, 2026
…elf-contention

Files the smallest-effort mitigation candidate from the worktree-prune-race root cause analysis landed in PR #3370. Defers the autonomous-loop tick at the top when a peer Otto-CLI claude-code process is detected, bus-publishes the deferral, and exits cleanly.

Composes with B-0506 (stale worktree prune cadence) and B-0519 (multi-Otto branch-state contamination RCA).

Effort: S. P3 because the failure mode is operationally observable via bus envelopes — substrate-honest fallback channel already established — and the contention windows resolve naturally within minutes.

Co-Authored-By: Claude <noreply@anthropic.com>
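The defer-or-proceed step described in this commit message can be sketched as a pure decision function. This is a minimal illustration only: the `DeferralEnvelope` shape, the `decideTick` name, and the `reason` string are assumptions, since the actual bus envelope format and the B-0530 implementation are not shown in this PR.

```typescript
// Hypothetical envelope shape; the real bus format is not shown in this PR.
interface DeferralEnvelope {
  kind: "tick-deferred";
  reason: string;
  peerPids: number[];
  at: string; // ISO-8601 timestamp
}

type TickDecision =
  | { action: "proceed" }
  | { action: "defer"; envelope: DeferralEnvelope };

// Pure decision step: no process inspection or I/O happens here, so the
// caller stays responsible for detecting peers and publishing the envelope.
function decideTick(peerPids: readonly number[], now: Date): TickDecision {
  if (peerPids.length === 0) {
    return { action: "proceed" };
  }
  return {
    action: "defer",
    envelope: {
      kind: "tick-deferred",
      reason: "peer Otto-CLI session detected",
      peerPids: [...peerPids],
      at: now.toISOString(),
    },
  };
}
```

Keeping the decision separate from peer detection and bus publishing would let the deferral logic be tested without spawning processes; the caller would publish `envelope` and exit cleanly when `action` is `"defer"`.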
Pull request overview
Adds a new P3 backlog row for B-0530, documenting a proposed cron-sentinel mutex to reduce multi-Otto-CLI contention around shared Git worktree operations.
Changes:
- Adds B-0530 with origin, problem statement, mitigation sketch, alternatives, acceptance criteria, and empirical anchors.
- Cross-links related backlog rows and operational rule surfaces.
Comments suppressed due to low confidence (1)
docs/backlog/P3/B-0530-cron-sentinel-mutex-prevent-otto-cli-self-contention-2026-05-15.md:73
- P1: Comparing the matched claude-code PID to `process.pid` does not exclude the current Otto-CLI session if this check runs as a child TypeScript process; `process.pid` is the checker process, while `pgrep` returns the parent claude-code process. That would make every tick see its own session as a peer and defer forever unless the implementation compares against the current process tree/session instead.
const parts = line.trim().split(/\s+/);
const pid = parseInt(parts[0] ?? "", 10);
return pid && pid !== MY_PID;
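One way to address the concern in this comment is to filter against a set of PIDs belonging to the current session rather than against a single PID. The sketch below is an assumption-laden illustration, not the fix that landed: `parsePid`, `findPeerPids`, and `ownSessionPids` are hypothetical names, and the caller is assumed to build the set itself (for example from `process.pid` and `process.ppid`, plus further ancestors if the checker runs more than one level below the claude-code process).

```typescript
// Parse the leading PID from one pgrep output line; null when absent.
function parsePid(line: string): number | null {
  const first = line.trim().split(/\s+/)[0] ?? "";
  if (!/^\d+$/.test(first)) return null;
  const pid = Number.parseInt(first, 10);
  return pid > 0 ? pid : null;
}

// Return PIDs from pgrep output that do not belong to the current session.
// `ownSessionPids` is supplied by the caller (hypothetical parameter).
function findPeerPids(
  pgrepOutput: string,
  ownSessionPids: ReadonlySet<number>,
): number[] {
  const peers: number[] = [];
  for (const line of pgrepOutput.split("\n")) {
    const pid = parsePid(line);
    if (pid !== null && !ownSessionPids.has(pid)) {
      peers.push(pid);
    }
  }
  return peers;
}
```

With the session's own claude-code PID included in `ownSessionPids`, a lone session would see an empty peer list and would not defer against itself.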
Comment on lines +68 to +70
const claudeProcs = execFileSync("pgrep", ["-fl", "claude-code.*Otto"], {
  encoding: "utf-8"
}).split("\n").filter((line) => {
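A related detail for the call above: `pgrep` exits with status 1 when no process matches, and Node's `execFileSync` throws on a non-zero exit with the exit code available as `status` on the error. A minimal sketch of treating that case as "no peers" follows; the `PgrepRunner` type and `listMatchingLines` name are hypothetical, and the runner is injected so the sketch does not depend on how the row actually invokes `pgrep`.

```typescript
// Hypothetical injection point: in real use this would wrap the
// execFileSync("pgrep", ...) call shown in the diff.
type PgrepRunner = () => string;

// Return non-empty output lines, mapping pgrep's "no match" exit (status 1)
// to an empty list and rethrowing every other failure.
function listMatchingLines(run: PgrepRunner): string[] {
  try {
    return run()
      .split("\n")
      .filter((line) => line.trim() !== "");
  } catch (err) {
    const status = (err as { status?: unknown } | null)?.status;
    if (status === 1) {
      return [];
    }
    throw err;
  }
}
```

Rethrowing other statuses keeps genuine failures (for example a bad pattern) visible instead of silently reading as "no peer detected".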
Comment on lines +33 to +37
Tick 0615Z ([`docs/hygiene-history/ticks/2026/05/15/0615Z.md`](../../hygiene-history/ticks/2026/05/15/0615Z.md))
identifies the root cause: `git worktree add`'s own rollback semantics
under `Interrupted system call` failures from `.git/objects/pack`
contention. Not external pruning; standard git behavior under FS
contention.
AceHack added a commit that referenced this pull request on May 15, 2026

Cross-references B-0530 (filed 2026-05-15, merged in PR #3372) as the mechanization row for the multi-Otto-CLI self-contention pattern identified in this PR's root-cause analysis. Composes the mechanization candidate sketch ("pgrep claude-code at top of autonomous-loop, defer if peer detected") with the existing Patterns 1-7 family.

Pattern 8 is distinct because:
- Patterns 1-6 are checkout/reset races (peer git changes HEAD)
- Pattern 7 is paused-then-resumed rebase state
- Pattern 8 is concurrent git worktree add operations contending on .git/objects/pack via "Interrupted system call"

All three families share the same underlying cause (shared .git/ directory across multiple processes/sessions) but the surface mechanism + the catch + the mitigation differ.

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on May 15, 2026

…-Otto-CLI self-contention) (#3370)

* shard(tick): 0615Z — worktree-prune-race root cause identified (multi-Otto-CLI self-contention); substrate recovered to git after 3 tick-shards lived in bus envelopes only

Recovers the 0545Z + 0607Z + 0611Z investigation arc into a single canonical shard. Bus envelopes 111342b2, 6de98fac, 720a2b49 were the substrate-landing channel during the 30 minutes I could not commit to git. Branch shard/0545z-... was created locally at 0545Z but the worktree-add rollback prevented any worktree from surviving long enough to commit.

This tick the contention window cleared (peer Otto-CLI PID 7894's stuck git reset --hard finally exited; PID 11725's git worktree add also cleaned up) and a fresh `git worktree add` succeeded on the first try.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(shard-0615Z): reframe bus as bridge channel, not substrate

Codex P2 catch on line 56: framing "/tmp/zeta-bus IS the substrate channel" normalizes ephemeral state as durable substrate, contradicting .claude/rules/substrate-or-it-didnt-happen.md (TaskUpdate / /tmp / loop-todos are NOT durable substrate).

Reframed: bus envelopes are the BRIDGE CHANNEL between outage start and git recovery. The substrate-honest sequence is outage → bus-captured → git-preserved (which is what actually happened). Bus is not a substitute for git-canonical landing; it is a bridge that preserves evidence until git is reachable.

Co-Authored-By: Claude <noreply@anthropic.com>

* docs(b-0519): add Pattern 8 (multi-Otto-CLI cron-tick concurrency)

Cross-references B-0530 (filed 2026-05-15, merged in PR #3372) as the mechanization row for the multi-Otto-CLI self-contention pattern identified in this PR's root-cause analysis. Composes the mechanization candidate sketch ("pgrep claude-code at top of autonomous-loop, defer if peer detected") with the existing Patterns 1-7 family.

Pattern 8 is distinct because:
- Patterns 1-6 are checkout/reset races (peer git changes HEAD)
- Pattern 7 is paused-then-resumed rebase state
- Pattern 8 is concurrent git worktree add operations contending on .git/objects/pack via "Interrupted system call"

All three families share the same underlying cause (shared .git/ directory across multiple processes/sessions) but the surface mechanism + the catch + the mitigation differ.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(shard-0615Z): correct substrate-rule link depth (5x → 6x ../)

Codex P2 catch: the relative link to .claude/rules/substrate-or-it-didnt-happen.md was off by one directory level. From the shard location (docs/hygiene-history/ticks/2026/05/15/0615Z.md), reaching .claude/rules/ requires 6x ../ to climb out of hygiene-history/ up to repo root. The 5x version resolved to docs/.claude/... (a path that doesn't exist), making the cited policy unreachable from the audit trail it was supposed to support.

Same class as the 0027Z + 0230Z shard link-depth fixes earlier today (PRs #3330 + #3356). The pattern recurs because:
- 4x ../ → docs/ (correct for docs/backlog/, docs/research/, etc.)
- 5x ../ → repo root one level off (off-by-one mistake site)
- 6x ../ → repo root (correct for .claude/, src/, etc.)

Verified by `ls -la` resolution check from the shard directory.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
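The link-depth arithmetic in the last fix can be checked mechanically. The sketch below computes a relative link between two repo-relative paths by counting the directories that must be climbed; `relativeLink` is a hypothetical helper written for illustration (not something this PR adds), and it assumes both paths are given relative to the repository root with `/` separators.

```typescript
// Build a relative link from the directory containing `fromFile` to `toPath`.
// Both arguments are assumed to be repo-root-relative, "/"-separated paths.
function relativeLink(fromFile: string, toPath: string): string {
  const fromDir = fromFile.split("/").slice(0, -1); // drop the file name
  const target = toPath.split("/");
  // Count leading directories the two paths share; the target's file name
  // is never treated as a shared directory.
  let shared = 0;
  while (
    shared < fromDir.length &&
    shared < target.length - 1 &&
    fromDir[shared] === target[shared]
  ) {
    shared++;
  }
  const climbs = fromDir.length - shared; // one "../" per unshared directory
  return "../".repeat(climbs) + target.slice(shared).join("/");
}
```

For the shard at `docs/hygiene-history/ticks/2026/05/15/0615Z.md`, the containing directory is six levels deep and shares nothing with `.claude/rules/`, so the helper emits six `../` segments, which agrees with the corrected link depth described above.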
AceHack added a commit that referenced this pull request on May 15, 2026

…non-duplication discipline (#3376)

* shard(tick): 0710Z — convergence with peer-Otto 0615Z investigation; root cause confirmed; non-duplication discipline

Peer-Otto's concurrent session (PID 30425) ran ticks 0545Z-0615Z while I was idle. Their PR #3370 (0615Z shard) + PR #3372 (B-0530 cron-sentinel-mutex row) IDENTIFIED the worktree-prune-race root cause: multi-session Otto-CLI self-contention on shared .git/objects/pack during git worktree add's internal git reset --hard.

My 0524Z investigation cleared 7 candidates; the 8th (multi-session self-contention) was on my "next tick" list as highest-likelihood. Peer-Otto got there first via empirical PID-level evidence at 0611Z.

Substrate-honest non-duplication: abandoned my draft B-NNNN row this tick after git fetch revealed B-0530 already on main. Refresh-before-decide applies at backlog-row-allocation scope.

Documents the borrow-on-existing vs new-worktree-creation distinction: git switch touches HEAD only; git worktree add forks git reset --hard which contends on .git/objects/pack. Borrow pattern is concurrent-Otto-safe; new worktree creation hits the race.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(shard): address 3 Copilot review threads on 0710Z shard

- Line 1: replaced (PR TBD) placeholder with (PR #3376) per tick-history-row convention
- Lines 32/44/56/81: fixed relative-link path bug — was 5x dotdot which only climbed to docs/, breaking all .claude/rules/... links. Now 6x dotdot for repo root + .claude/rules/<file>. Empirically verified: realpath now resolves all 4 links correctly (substrate-wide convention bug affects 0230Z + 0414Z + 0517Z + 0717Z + 0724Z shards too — a follow-on B-NNNN row could bulk-fix the cohort).
- Lines 7/25: clarified two distinct peer-Otto PIDs — 7894 was peer-Otto's own session per their 0611Z ps observation; 30425 was a SEPARATE later launchd respawn observed running grep at 0710Z. Reconciles with peer-Otto's 0615Z shard which records PID 7894.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Summary
Files B-0530 (P3, effort S) for the cron-sentinel-mutex mitigation candidate identified in PR #3370's worktree-prune-race root-cause analysis.
The pattern: two concurrent Otto-CLI claude-code sessions firing autonomous-loop ticks in parallel both invoke `git worktree add`, both contend on shared `.git/objects/pack`, both get rolled back by git's own automatic cleanup. The substrate-honest first mitigation is a top-of-tick `pgrep` check that defers when peer Otto-CLI is detected.
Why P3
`git reset` eventually exits)
Composes with
Test plan
🤖 Generated with Claude Code