backlog(p3): B-0530 — cron-sentinel mutex to prevent multi-Otto-CLI self-contention#3372

Merged
AceHack merged 1 commit into main from backlog/b0530-cron-sentinel-mutex-otto-cli-2026-05-15 on May 15, 2026

Conversation

@AceHack AceHack commented May 15, 2026

Summary

Files B-0530 (P3, effort S) for the cron-sentinel-mutex mitigation candidate identified in PR #3370's worktree-prune-race root-cause analysis.

The pattern: two concurrent Otto-CLI claude-code sessions fire autonomous-loop ticks in parallel; both invoke git worktree add, both contend on the shared .git/objects/pack, and both get rolled back by git's own automatic cleanup. The substrate-honest first mitigation is a top-of-tick pgrep check that defers the tick when a peer Otto-CLI is detected.
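The decision step of that check can be sketched as a pure function over pgrep's text output. This is a minimal illustration, not the B-0530 implementation: the helper names (`parsePgrepPids`, `findPeers`, `shouldDefer`), the `selfPids` parameter, and the sample process lines are all hypothetical, and the only assumption about `pgrep -fl` output is that each line starts with a PID.

```typescript
// Hypothetical helpers; none of these names come from the B-0530 row.

/** Extract the leading PID from each line of `pgrep -fl` style output. */
function parsePgrepPids(output: string): number[] {
  const pids: number[] = [];
  for (const line of output.split("\n")) {
    const first = line.trim().split(/\s+/)[0] ?? "";
    const pid = Number.parseInt(first, 10);
    // Blank or malformed lines parse to NaN and are skipped.
    if (Number.isInteger(pid) && pid > 0) pids.push(pid);
  }
  return pids;
}

/** Matched PIDs that do not belong to the current session. */
function findPeers(output: string, selfPids: ReadonlySet<number>): number[] {
  return parsePgrepPids(output).filter((pid) => !selfPids.has(pid));
}

/** Top-of-tick decision: defer when at least one peer is present. */
function shouldDefer(output: string, selfPids: ReadonlySet<number>): boolean {
  return findPeers(output, selfPids).length > 0;
}

// Made-up listing: two matching sessions, one of which (4101) is our own.
const sample = "4101 claude-code Otto\n4202 claude-code Otto\n";
console.log(findPeers(sample, new Set([4101])));         // only 4202 remains
console.log(shouldDefer(sample, new Set([4101, 4202]))); // false: no peers left
console.log(shouldDefer("", new Set([4101])));           // false: nothing matched
```

Keeping the parsing separate from the process call makes the decision testable without spawning anything. A caller that obtains the text through `execFileSync` would also need to handle pgrep's exit status 1 (no process matched), which `execFileSync` surfaces as a thrown error rather than an empty string.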

Why P3

  • Failure mode is operationally observable via bus envelopes (substrate-honest fallback channel already established)
  • Contention windows resolve naturally within minutes (peer's stuck git reset eventually exits)
  • Not blocking ongoing work; just causing extra round-trips for affected ticks

Composes with

  • B-0506 (stale worktree prune cadence)
  • B-0519 (multi-Otto branch-state contamination RCA)

Test plan

  • Markdownlint clean locally
  • CI green
  • Auto-merge fires

🤖 Generated with Claude Code

…elf-contention

Files the smallest-effort mitigation candidate from the
worktree-prune-race root cause analysis landed in PR #3370. Defers
the autonomous-loop tick at the top when a peer Otto-CLI claude-code
process is detected, bus-publishes the deferral, and exits cleanly.

Composes with B-0506 (stale worktree prune cadence) and B-0519
(multi-Otto branch-state contamination RCA). Effort: S. P3 because
the failure mode is operationally observable via bus envelopes —
substrate-honest fallback channel already established — and the
contention windows resolve naturally within minutes.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 15, 2026 06:24
@AceHack AceHack enabled auto-merge (squash) May 15, 2026 06:25
@AceHack AceHack merged commit bc1f46c into main May 15, 2026
23 of 24 checks passed
@AceHack AceHack deleted the backlog/b0530-cron-sentinel-mutex-otto-cli-2026-05-15 branch May 15, 2026 06:26

Copilot AI left a comment

Pull request overview

Adds a new P3 backlog row for B-0530, documenting a proposed cron-sentinel mutex to reduce multi-Otto-CLI contention around shared Git worktree operations.

Changes:

  • Adds B-0530 with origin, problem statement, mitigation sketch, alternatives, acceptance criteria, and empirical anchors.
  • Cross-links related backlog rows and operational rule surfaces.
Comments suppressed due to low confidence (1)

docs/backlog/P3/B-0530-cron-sentinel-mutex-prevent-otto-cli-self-contention-2026-05-15.md:73

  • P1: Comparing the matched claude-code PID to process.pid does not exclude the current Otto-CLI session if this check runs as a child TypeScript process; process.pid is the checker process, while pgrep returns the parent claude-code process. That would make every tick see its own session as a peer and defer forever unless the implementation compares against the current process tree/session instead.
  const parts = line.trim().split(/\s+/);
  const pid = parseInt(parts[0] ?? "", 10);
  return pid && pid !== MY_PID;
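
One way to act on this review comment is to exclude the checker's whole ancestor chain rather than only its own PID. The sketch below is illustrative and makes several assumptions: the parent table is taken from text shaped like `ps -A -o pid=,ppid=` output (one "pid ppid" pair per line), and the names `parseParentMap` and `selfAndAncestors`, along with every PID shown, are hypothetical.

```typescript
// Hypothetical helpers; not part of the reviewed B-0530 sketch.

/** Parse "pid ppid" lines into a child -> parent map. */
function parseParentMap(psOutput: string): Map<number, number> {
  const parentOf = new Map<number, number>();
  for (const line of psOutput.split("\n")) {
    const parts = line.trim().split(/\s+/);
    const pid = Number.parseInt(parts[0] ?? "", 10);
    const ppid = Number.parseInt(parts[1] ?? "", 10);
    if (Number.isInteger(pid) && Number.isInteger(ppid)) parentOf.set(pid, ppid);
  }
  return parentOf;
}

/** The starting PID plus every ancestor reachable through the map. */
function selfAndAncestors(start: number, parentOf: Map<number, number>): Set<number> {
  const seen = new Set<number>();
  let current: number | undefined = start;
  // Stop at PID 0, at an unknown parent, or if the table contains a cycle.
  while (current !== undefined && current > 0 && !seen.has(current)) {
    seen.add(current);
    current = parentOf.get(current);
  }
  return seen;
}

// Made-up process table: checker 4150 is a child of session 4101;
// 4202 is an unrelated session.
const psSample = "1 0\n500 1\n4101 500\n4150 4101\n4202 1\n";
const mine = selfAndAncestors(4150, parseParentMap(psSample));
const matched = [4101, 4202]; // PIDs a pgrep match might return
const peers = matched.filter((pid) => !mine.has(pid));
console.log(peers); // only 4202: the parent session 4101 is excluded
```

With this exclusion set, a tick whose checker runs as a child process no longer counts its own parent session as a peer, which is the defer-forever case the comment describes.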

Comment on lines +68 to +70
const claudeProcs = execFileSync("pgrep", ["-fl", "claude-code.*Otto"], {
encoding: "utf-8"
}).split("\n").filter((line) => {
Comment on lines +33 to +37
Tick 0615Z ([`docs/hygiene-history/ticks/2026/05/15/0615Z.md`](../../hygiene-history/ticks/2026/05/15/0615Z.md))
identifies the root cause: `git worktree add`'s own rollback semantics
under `Interrupted system call` failures from `.git/objects/pack`
contention. Not external pruning; standard git behavior under FS
contention.
AceHack added a commit that referenced this pull request May 15, 2026
Cross-references B-0530 (filed 2026-05-15, merged in PR #3372) as
the mechanization row for the multi-Otto-CLI self-contention pattern
identified in this PR's root-cause analysis. Composes the
mechanization candidate sketch ("pgrep claude-code at top of
autonomous-loop, defer if peer detected") with the existing
Patterns 1-7 family.

Pattern 8 is distinct because:
- Patterns 1-6 are checkout/reset races (peer git changes HEAD)
- Pattern 7 is paused-then-resumed rebase state
- Pattern 8 is concurrent git worktree add operations contending
  on .git/objects/pack via "Interrupted system call"

All three families share the same underlying cause (shared .git/
directory across multiple processes/sessions) but the surface
mechanism + the catch + the mitigation differ.

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 15, 2026
…-Otto-CLI self-contention) (#3370)

* shard(tick): 0615Z — worktree-prune-race root cause identified (multi-Otto-CLI self-contention); substrate recovered to git after 3 tick-shards lived in bus envelopes only

Recovers the 0545Z + 0607Z + 0611Z investigation arc into a single
canonical shard. Bus envelopes 111342b2, 6de98fac, 720a2b49 were the
substrate-landing channel during the 30 minutes I could not commit
to git. Branch shard/0545z-... was created locally at 0545Z but the
worktree-add rollback prevented any worktree from surviving long
enough to commit.

This tick the contention window cleared (peer Otto-CLI PID 7894's
stuck git reset --hard finally exited; PID 11725's git worktree add
also cleaned up) and a fresh `git worktree add` succeeded on the
first try.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(shard-0615Z): reframe bus as bridge channel, not substrate

Codex P2 catch on line 56: framing "/tmp/zeta-bus IS the substrate
channel" normalizes ephemeral state as durable substrate, contradicting
.claude/rules/substrate-or-it-didnt-happen.md (TaskUpdate / /tmp /
loop-todos are NOT durable substrate).

Reframed: bus envelopes are the BRIDGE CHANNEL between outage start
and git recovery. The substrate-honest sequence is outage → bus-
captured → git-preserved (which is what actually happened). Bus is
not a substitute for git-canonical landing; it is a bridge that
preserves evidence until git is reachable.

Co-Authored-By: Claude <noreply@anthropic.com>

* docs(b-0519): add Pattern 8 (multi-Otto-CLI cron-tick concurrency)

Cross-references B-0530 (filed 2026-05-15, merged in PR #3372) as
the mechanization row for the multi-Otto-CLI self-contention pattern
identified in this PR's root-cause analysis. Composes the
mechanization candidate sketch ("pgrep claude-code at top of
autonomous-loop, defer if peer detected") with the existing
Patterns 1-7 family.

Pattern 8 is distinct because:
- Patterns 1-6 are checkout/reset races (peer git changes HEAD)
- Pattern 7 is paused-then-resumed rebase state
- Pattern 8 is concurrent git worktree add operations contending
  on .git/objects/pack via "Interrupted system call"

All three families share the same underlying cause (shared .git/
directory across multiple processes/sessions) but the surface
mechanism + the catch + the mitigation differ.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(shard-0615Z): correct substrate-rule link depth (5x → 6x ../)

Codex P2 catch: the relative link to .claude/rules/substrate-or-it-
didnt-happen.md was off by one directory level. From the shard
location (docs/hygiene-history/ticks/2026/05/15/0615Z.md), reaching
.claude/rules/ requires 6x ../ to climb out of hygiene-history/ up
to repo root. The 5x version resolved to docs/.claude/... (a path
that doesn't exist), making the cited policy unreachable from the
audit trail it was supposed to support.

Same class as the 0027Z + 0230Z shard link-depth fixes earlier today
(PRs #3330 + #3356). The pattern recurs because:
  - 4x ../ → docs/ (correct for docs/backlog/, docs/research/, etc.)
  - 5x ../ → repo root one level off (off-by-one mistake site)
  - 6x ../ → repo root (correct for .claude/, src/, etc.)

Verified by `ls -la` resolution check from the shard directory.

Co-Authored-By: Claude <noreply@anthropic.com>
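
The link-depth arithmetic in this commit message can be checked mechanically instead of by counting by hand. The following is a small sketch under the assumption that paths are repo-relative and use forward slashes; `depthToRoot` and `linkToRootPath` are hypothetical names, not existing tooling in the repository.

```typescript
// Hypothetical helpers for checking relative-link depth.

/** Number of `../` segments needed to climb from a file to the repo root. */
function depthToRoot(repoRelativeFile: string): number {
  // Each directory above the file costs one `../`; the file name itself does not.
  const segments = repoRelativeFile.split("/").filter((s) => s.length > 0);
  return Math.max(segments.length - 1, 0);
}

/** Build a relative link from a file to a target given relative to the repo root. */
function linkToRootPath(fromFile: string, targetFromRoot: string): string {
  return "../".repeat(depthToRoot(fromFile)) + targetFromRoot;
}

const shard = "docs/hygiene-history/ticks/2026/05/15/0615Z.md";
console.log(depthToRoot(shard)); // 6
console.log(linkToRootPath(shard, ".claude/rules/substrate-or-it-didnt-happen.md"));
// ../../../../../../.claude/rules/substrate-or-it-didnt-happen.md
```

For the shard path cited above, the function returns 6, matching the corrected link; one fewer `../` stops at docs/, which is the docs/.claude/... path the message reports as nonexistent.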

---------

Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 15, 2026
…non-duplication discipline (#3376)

* shard(tick): 0710Z — convergence with peer-Otto 0615Z investigation; root cause confirmed; non-duplication discipline

Peer-Otto's concurrent session (PID 30425) ran ticks 0545Z-0615Z while I was idle.
Their PR #3370 (0615Z shard) + PR #3372 (B-0530 cron-sentinel-mutex row) IDENTIFIED
the worktree-prune-race root cause: multi-session Otto-CLI self-contention on
shared .git/objects/pack during git worktree add's internal git reset --hard.

My 0524Z investigation cleared 7 candidates; the 8th (multi-session self-contention)
was on my "next tick" list as highest-likelihood. Peer-Otto got there first via
empirical PID-level evidence at 0611Z.

Substrate-honest non-duplication: abandoned my draft B-NNNN row this tick after
git fetch revealed B-0530 already on main. Refresh-before-decide applies at
backlog-row-allocation scope.

Documents the borrow-on-existing vs new-worktree-creation distinction:
git switch touches HEAD only; git worktree add forks git reset --hard which
contends on .git/objects/pack. Borrow pattern is concurrent-Otto-safe; new
worktree creation hits the race.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(shard): address 3 Copilot review threads on 0710Z shard

- Line 1: replaced (PR TBD) placeholder with (PR #3376) per tick-history-row convention
- Lines 32/44/56/81: fixed relative-link path bug — was 5x dotdot which only
  climbed to docs/, breaking all .claude/rules/... links. Now 6x dotdot for repo
  root + .claude/rules/<file>. Empirically verified: realpath now resolves all
  4 links correctly (substrate-wide convention bug affects 0230Z + 0414Z + 0517Z
  + 0717Z + 0724Z shards too — a follow-on B-NNNN row could bulk-fix the cohort).
- Lines 7/25: clarified two distinct peer-Otto PIDs — 7894 was peer-Otto's own
  session per their 0611Z ps observation; 30425 was a SEPARATE later launchd
  respawn observed running grep at 0710Z. Reconciles with peer-Otto's 0615Z
  shard which records PID 7894.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>