Merged
@@ -114,6 +114,36 @@ Field-test shards:
- `docs/hygiene-history/ticks/2026/05/15/0230Z.md` — full forensics +
pivot to dedicated worktrees recovery

### Pattern 8 — Multi-Otto-CLI cron-tick concurrency on `.git/objects/pack` (2026-05-15T06:11Z)

Two or more concurrent Otto-CLI claude-code sessions (different
foreground sessions, same machine, same `.git/`) fire
`<<autonomous-loop>>` cron sentinels in parallel. Each session invokes
`git worktree add`, and the invocations contend on the shared
`.git/objects/pack` during the internal
`git reset --hard --no-recurse-submodules`. Any invocation that hits
`Interrupted system call` is rolled back by `git worktree add`'s own
automatic cleanup.

From the operator's perspective this looks like external pruning of
new worktrees; the actual mechanism is standard `git worktree add`
rollback semantics under FS contention.

Field-test trail:

- Bus envelopes: `44aaf799` (peer-Otto 0414Z) +
`111342b2` / `6de98fac` / `720a2b49` (my 0545Z+0607Z+0611Z)
- Investigation shard: `docs/hygiene-history/ticks/2026/05/15/0524Z.md`
(peer-Otto cleared 7 candidates; multi-session was the missed one)
- Root cause shard: `docs/hygiene-history/ticks/2026/05/15/0615Z.md`
(PID-level diagnostic landed in PR #3370)
- Mechanization row: [B-0530 cron-sentinel-mutex](B-0530-cron-sentinel-mutex-prevent-otto-cli-self-contention-2026-05-15.md)
(P3, effort S, filed 2026-05-15)

Mechanization candidate (see B-0530 for full detail): `pgrep -fl
claude-code.*Otto` at the top of `<<autonomous-loop>>`; if a peer
Otto-CLI process is detected, bus-publish a "deferred" envelope and
exit cleanly.

## Mechanization candidates

### Cheap
95 changes: 95 additions & 0 deletions docs/hygiene-history/ticks/2026/05/15/0615Z.md
@@ -0,0 +1,95 @@
# Tick 0615Z — Worktree-prune-race root cause identified + 0545Z+0607Z+0611Z shard substrate recovered to git

## Headline

- **Worktree-prune-race root cause: concurrent Otto-CLI sessions** firing autonomous-loop ticks in parallel, contending on shared `.git/objects/pack`. PID-level evidence captured this tick. Investigation arc (0414Z peer-Otto → 0524Z peer-Otto 7-candidates cleared → 0545Z+0607Z my consecutive observations → 0611Z my PID diagnostic) now resolved.
- **3 prior bus envelopes published** (`111342b2`, `6de98fac`, `720a2b49`) capturing the substrate that I could not commit to git in ticks 0545Z + 0607Z. This shard is the git-canonical landing of that substrate now that the contention window has cleared.
- **PR [#3342](https://github.com/Lucent-Financial-Group/Zeta/pull/3342) MERGED at `4a10bf6`** — confirmed in worldview-refresh at 0545Z (the tick that started this whole investigation). Auto-merge fired during my worldview-refresh that tick.

## Root cause (0611Z PID diagnostic)

```
PID 68752 — my claude-code session, ~3:53h old at tick 0611Z
PID 7894 — peer Otto-CLI session, ~5:40 old at tick 0611Z
shares Claude.app parent PID 702 with mine

PID 11710 — peer's zsh launching git worktree add
PID 11725 — peer's `git worktree add /tmp/zeta-0607z-shard-2026-05-15 -b shard/0607z-cron-rearm-and-0230z-recovery-otto-cli-2026-05-15 origin/main`
PID 11818 — peer's `git reset --hard --no-recurse-submodules` (child of 11725)
STUCK on .git/objects/pack contention; held index.lock 9+ min

My git worktree add attempts (ticks 0545Z + 0607Z) all failed with:
error: unable to open object pack directory: .../.git/objects/pack: Interrupted system call
fatal: Could not reset index file to revision 'HEAD'
followed by git's automatic rollback (rm -rf the partially-populated worktree dir).
```

Peer-Otto's 0524Z investigation correctly ruled out 7 candidates (Lior/Riven/Codex/Vera/Copilot loops, lane-allocator, `git worktree prune`, `gc.pruneexpire`). The candidate they missed was **multi-session self-contention** — two Otto-CLI claude-code instances running autonomous-loop ticks in parallel on the same `.git/` directory.
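
A minimal sketch of how PID-level evidence like the listing above might be gathered. The helper names `filter_git_procs` and `list_git_procs` are hypothetical and are not part of the original diagnostic; the sketch assumes a `ps` that accepts the POSIX `-A` and `-o` options.

```shell
#!/usr/bin/env bash
# Sketch: capture PID-level evidence of concurrent git activity.
# filter_git_procs and list_git_procs are hypothetical helper names.

# Keep only rows whose command (4th column) is the git binary itself,
# whether invoked as "git" or by an absolute path ending in "/git".
filter_git_procs() {
  awk '$4 ~ /(^|\/)git$/'
}

# List every process as "PID PPID ELAPSED COMMAND...", then keep git rows.
# -A selects all processes; the trailing "=" suppresses each column header.
list_git_procs() {
  ps -A -o pid= -o ppid= -o etime= -o args= | filter_git_procs
}
```

Calling `list_git_procs` during a stuck tick should show any peer `git worktree add` or `git reset --hard` together with its parent PID, which is the relationship the diagnostic above relies on.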

## What "looks like pruning" actually is

When `git worktree add` runs:

1. Creates `.git/worktrees/<name>/` admin dir + the worktree path on disk
2. Points the new worktree's `HEAD` at the requested commit or branch
3. Runs `git reset --hard --no-recurse-submodules` to populate the index and check out the working tree (`Updating files: 100%`)
4. If step 3 fails (e.g., `Interrupted system call` on `.git/objects/pack` due to concurrent peer git read/write), `git worktree add` **rolls back**:
- `rm -rf` the worktree path
- removes the `.git/worktrees/<name>/` admin dir
- exits with the `fatal: Could not reset index file to revision 'HEAD'` error

From the operator's perspective this looks like "something pruned my worktree." Git is indeed what removes the worktree, but it does so as part of standard `git worktree add` rollback semantics; no external process or attacker is involved. The trigger is `.git/objects/pack` contention during the reset.
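
The rollback described above can be told apart from other failures by its error text. The following is a sketch only: `is_contention_rollback` and `worktree_add_with_retry` are hypothetical names, the two matched strings are the ones quoted in this shard, and the attempt count and jitter range are arbitrary assumptions.

```shell
#!/usr/bin/env bash
# Sketch: treat a failed `git worktree add` as retryable only when its stderr
# carries the contention signature quoted in this shard.
# is_contention_rollback and worktree_add_with_retry are hypothetical names.

# Return 0 if the given stderr text matches the contention signature.
is_contention_rollback() {
  case "$1" in
    *"Interrupted system call"*|*"Could not reset index file"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Run `git worktree add "$@"`, retrying with jitter on contention only.
worktree_add_with_retry() {
  local max_attempts="${MAX_ATTEMPTS:-3}" attempt=1 err=""
  while [ "$attempt" -le "$max_attempts" ]; do
    # Capture stderr only; stdout is discarded.
    if err="$(git worktree add "$@" 2>&1 >/dev/null)"; then
      return 0
    fi
    if ! is_contention_rollback "$err"; then
      printf '%s\n' "$err" >&2
      return 1                      # unrelated failure: do not retry
    fi
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$(( ${RANDOM:-$$} % 5 + 1 ))"   # 1-5 s of jitter (arbitrary)
    fi
  done
  printf 'giving up after %s attempts: %s\n' "$max_attempts" "$err" >&2
  return 2
}
```

One caveat: when the first attempt was given `-b <branch>`, the branch may survive the rollback. A retry could then fail because the branch already exists, and this sketch reports that as an unrelated failure rather than retrying.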

## Mitigation candidates

| Candidate | Shape | Effort |
|---|---|---|
| **Cron-sentinel mutex** | Refuse to fire if another `claude-code` Otto-CLI process is detected via `pgrep -fl "claude-code.*Otto"` | S |
| **Pre-worktree-add lock check** | Before `git worktree add`, `lsof .git/objects/pack` and back off with jitter if any peer git is present | S |
| **Filesystem flock on a sentinel path** | All Otto-CLI git operations serialize via `flock /tmp/zeta-git.lock` | M |
| **Per-session bare clone** | Each Otto-CLI session bind-mounts or symlinks a dedicated bare clone of the repo; no shared `.git/` | L |
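
As a sketch of the "serialize via a sentinel path" row: the table proposes `flock /tmp/zeta-git.lock`, but `flock(1)` is not shipped with stock macOS, so this example swaps in an atomic `mkdir` lock instead. `LOCK_DIR`, `LOCK_TIMEOUT`, and `with_git_lock` are hypothetical names, and the timeout and exit code are arbitrary assumptions.

```shell
#!/usr/bin/env bash
# Sketch: serialize git operations behind an atomic lock directory.
# This is a mkdir-based stand-in for the `flock` idea in the table above;
# LOCK_DIR, LOCK_TIMEOUT and with_git_lock are hypothetical names.

LOCK_DIR="${LOCK_DIR:-/tmp/zeta-git.lock.d}"

# Run "$@" while holding the lock; give up after LOCK_TIMEOUT seconds.
with_git_lock() {
  local timeout="${LOCK_TIMEOUT:-60}" waited=0 status=0
  # mkdir either creates the directory or fails, atomically, so at most
  # one caller holds the lock at any time.
  until mkdir "$LOCK_DIR" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "with_git_lock: timed out waiting for $LOCK_DIR" >&2
      return 75                     # arbitrary "try again later" code
    fi
    sleep 1
    waited=$((waited + 1))
  done
  "$@" || status=$?                 # remember the command's exit status
  rmdir "$LOCK_DIR"                 # release the lock
  return "$status"
}
```

Usage would look like `with_git_lock git worktree add /tmp/some-worktree origin/main`. A session killed while holding the lock leaves the directory behind, so a real deployment would need a stale-lock policy that this sketch omits.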

The substrate-honest first move is the cron-sentinel mutex: it is small, affects only autonomous-loop firings (not interactive Otto-CLI work), and has minimal blast radius. It could be a `pgrep` check at the top of `<<autonomous-loop>>` that bus-publishes a "deferred" envelope and exits cleanly.
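
A minimal sketch of that check, under stated assumptions: the pattern `claude-code.*Otto` is the one quoted in this shard, while `ancestor_pids`, `exclude_pids`, and `sentinel_may_fire` are hypothetical helpers. The sketch assumes the current session's own claude-code process is an ancestor of the shell running the check, so ancestors are excluded from the match. The bus-publish step is left as a comment because its command is not shown here.

```shell
#!/usr/bin/env bash
# Sketch: cron-sentinel mutex for the autonomous-loop tick.
# ancestor_pids, exclude_pids and sentinel_may_fire are hypothetical names.

PEER_PATTERN="${PEER_PATTERN:-claude-code.*Otto}"   # pattern quoted in this shard

# Print this shell's PID and each ancestor PID, one per line.
ancestor_pids() {
  local pid="$$"
  while [ -n "$pid" ] && [ "$pid" -gt 1 ]; do
    echo "$pid"
    pid="$(ps -o ppid= -p "$pid" 2>/dev/null | tr -d ' ')"
  done
}

# Print each PID from list $1 that does not appear in list $2.
# Both lists are whitespace-separated.
exclude_pids() {
  local candidates="$1" excluded pid
  excluded=" $(printf '%s ' $2)"    # normalize to " a b c "
  for pid in $candidates; do
    case "$excluded" in
      *" $pid "*) ;;                # own session or ancestor: skip
      *) echo "$pid" ;;
    esac
  done
}

# Return 0 when no peer matches the pattern, 1 when at least one does.
sentinel_may_fire() {
  local peers
  peers="$(exclude_pids "$(pgrep -f "$PEER_PATTERN" || true)" "$(ancestor_pids)")"
  [ -z "$peers" ]
}

# At the top of the loop (publish step intentionally left abstract):
#   sentinel_may_fire || { : "bus-publish a deferred envelope here"; exit 0; }
```

If two sessions run this check at the same moment, each sees the other and both defer, so no tick fires. A tie-break such as letting the lowest PID proceed is one possible refinement.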

## Substrate-honest meta: bus is the bridge channel, NOT substrate

Per [`.claude/rules/substrate-or-it-didnt-happen.md`](../../../../../../.claude/rules/substrate-or-it-didnt-happen.md): `/tmp/zeta-bus/` envelopes are **captured** (TaskUpdate-tier, ephemeral) — they are NOT substrate. Substrate requires committed + reachable + indexed git artifacts.

When `git worktree add` failed repeatedly across 0545Z + 0607Z + 0611Z, the bus envelopes (`111342b2` + `6de98fac` + `720a2b49`) served as the **bridge channel** between outage start and git recovery: they preserved the investigation evidence in ephemeral form until git contention cleared and I could land this shard. That sequence — outage → bus-captured → git-preserved — is the substrate-honest pattern, not a normalization of bus-as-substrate.

This is consistent with how peer-Otto handled the same blocker at 0414Z (envelope `44aaf799`) and then landed [`docs/hygiene-history/ticks/2026/05/15/0524Z.md`](0524Z.md) once git was reachable. The lesson for future-Otto: when `git worktree add` fails repeatedly, bus-publish to bridge the outage, then commit-and-push as soon as the contention clears. Do NOT treat the bus as a substitute for git-canonical landing.

## Δ since 0543Z (the last shard that landed)

| What | At 0543Z | At 0615Z |
|---|---|---|
| PR #3342 | wait-ci, auto-merge armed | **MERGED** (`4a10bf6`) |
| Worktree-prune-race understanding | peer-Otto's open investigation | root cause identified (multi-session contention) |
| Bus envelopes from me | 0 | 3 (`111342b2`, `6de98fac`, `720a2b49`) |
| Otto-CLI active sessions | 1 (mine) | 2 (mine + peer's PID 7894) |
| Peer Otto-CLI stuck git ops | n/a | observed mid-tick, then cleared by 0615Z |
| Mitigation candidates | none documented | 4 candidates with effort-T-shirt sizing |

## Bus state

```
$ ls /tmp/zeta-bus/ (7 envelopes)
720a2b49 (otto-cli 0613Z) — ROOT CAUSE: multi-session Otto-CLI concurrency
6de98fac (otto-cli 0611Z) — third observation + refined hypothesis
111342b2 (otto-cli 0605Z) — initial observation (my tick 0545Z)
44aaf799 (otto-cli 0451Z) — peer-Otto's original report (tick 0414Z)
+ 3 stale work-assignment broadcasts (B-0441, B-0170, B-0503)
```

## Cron sentinel

`a2c54a1c` armed.

## Next

Cron-driven. Suggested next-tick actions:

1. **Commit this shard** + push + open PR + arm auto-merge (this tick's primary work)
2. **If contention recurs**: detect peer Otto-CLI before retrying worktree-add
3. **File B-0530** for the cron-sentinel-mutex mitigation if peer-Otto or maintainer agrees this is worth mechanizing
4. **Otherwise**: check on the worktree-prune-race investigation surface for closure