@@ -99,6 +99,62 @@ if [ "$status_lines" -gt 5 ] || [ "$tree_size" -lt 50 ]; then
fi
```

## Stale-index.lock-as-precursor guard (NEW — empirical 2026-05-21T06:03Z)

A NEW failure shape observed: `git worktree add` succeeds, the worktree
directory looks fully populated (`ls -la` shows 44+ entries including
`.claude/`, `.codex/`, etc.), `git ls-tree HEAD` returns the expected
count (e.g. 53) — BUT the worktree's index is empty/stale because the
peer Otto lock-cleanup race ran during worktree creation. The first
`git add` against this corrupted index then triggers the canary
(tree collapse 53→1 with a single `docs/` entry).

**Precursor signal**: `.git/worktrees/<name>/index.lock` is present at
worktree-add completion (rather than only appearing transiently during
git operations).

**Diagnostic shape** (the lock that fired the canary on 2026-05-21T06:13Z):

- **Size 0 bytes** (`stat -f "%z" <lock>` on BSD/macOS, or `stat -c %s <lock>` on GNU/Linux, reports `0`)
- **Age past the 15s natural-clear window** (5min37s old when caught)

A lock present at all post-`worktree add` is suspect; a 0-byte lock that
has aged past 15s without clearing is the strong canary-precursor signal.

**Operational guard** (before first `git add` in a fresh worktree):

```bash
WT_GIT=$(git -C <worktree-path> rev-parse --git-dir)
LOCK="$WT_GIT/index.lock"
if [ -f "$LOCK" ]; then
  AGE=$(( $(date +%s) - $(stat -f %m "$LOCK") ))
  SIZE=$(stat -f %z "$LOCK")
  if [ "$AGE" -gt 15 ]; then
    echo "STALE LOCK: ${AGE}s old, ${SIZE} bytes - canary precursor"
    rm "$LOCK"
    # Re-materialize index from tree to recover from possible peer corruption:
    git -C <worktree-path> restore --staged --worktree --source=HEAD -- .
  fi
fi
```

> **Review comment (P1, on the two `stat` lines): Replace BSD-only stat flags in stale-lock guard**
>
> The new guard script is not portable to GNU/Linux, which means the precursor check can fail exactly in the environments where Codex agents run. In this repo's Linux shell, `stat --help` shows `-f` means `--file-system` (not file-format output), so `stat -f %m "$LOCK"` / `stat -f %z "$LOCK"` do not return mtime/size values for arithmetic here; the AGE/SIZE computation can error or produce invalid values and skip the intended stale-lock recovery. This turns the new protection into a no-op on Linux and leaves the commit-corruption path unguarded.
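The portability concern raised in review could be handled with a small wrapper around `stat`. The following is a minimal sketch, not the guard this PR ships: the helper names `file_mtime`, `file_size`, `lock_is_stale`, and `guard_fresh_worktree` are hypothetical, and it assumes that trying GNU `stat -c` first and falling back to BSD `stat -f` is acceptable. It also uses `git rev-parse --absolute-git-dir` so the lock path does not depend on the caller's working directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not from the rule file).

# Print a file's modification time in epoch seconds.
# GNU stat understands -c; BSD/macOS stat rejects it, so fall back to -f.
file_mtime() {
  stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"
}

# Print a file's size in bytes, with the same GNU-then-BSD fallback.
file_size() {
  stat -c %s "$1" 2>/dev/null || stat -f %z "$1"
}

# Succeed when the lock exists and is older than max_age seconds (default 15).
lock_is_stale() {
  local lock=$1 max_age=${2:-15} mtime now
  [ -f "$lock" ] || return 1
  mtime=$(file_mtime "$lock") || return 1
  now=$(date +%s)
  [ $(( now - mtime )) -gt "$max_age" ]
}

# Run before the first `git add` in a fresh worktree.
guard_fresh_worktree() {
  local wt=$1 lock
  lock="$(git -C "$wt" rev-parse --absolute-git-dir)/index.lock" || return 1
  if lock_is_stale "$lock" 15; then
    echo "STALE LOCK: $(file_size "$lock") bytes - canary precursor" >&2
    rm -f "$lock"
    # Re-materialize index and working tree from HEAD, as in the guard above.
    git -C "$wt" restore --staged --worktree --source=HEAD -- .
  fi
}
```

Calling `guard_fresh_worktree /path/to/worktree` (placeholder path) performs the same remove-and-restore steps as the original guard; only the `stat` invocation and the git-dir lookup differ.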

The `git restore --staged --worktree --source=HEAD -- .` recovery is the
key step — it re-materializes both index and working tree from the HEAD
tree, replacing whatever the peer cleanup race emptied. Without this,
`git add <new-file>` against an empty index produces a commit whose
parent-diff is "delete everything + add this one file" — exactly the
canary signature.

**Why the post-creation `ls-tree HEAD` check (previous section) is not
sufficient**: `git ls-tree HEAD` reads the TREE object, not the INDEX.
The tree-from-HEAD remains correct (53 entries on 2026-05-21) while the
index is empty. The post-creation FRESHNESS check above runs
`git status --short` which compares working tree to index — but if BOTH
the index AND working tree are stale-but-matching (the peer cleanup
emptied both before populating from HEAD completed), `git status` shows
clean too. The 0-byte stale `index.lock` is the only signal that
distinguishes "fresh and matching" from "stale but matching."
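The tree-versus-index distinction in the paragraph above can be observed directly by counting entries in each. This is an illustrative sketch only: the helper names are hypothetical, and the source does not report running this comparison, so whether it would have flagged the 0608Z case is not established. It compares `git ls-tree -r HEAD` (reads the commit's tree object) with `git ls-files` (reads the index).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; each takes the worktree path as its first argument.

# Number of files recorded in HEAD's tree (recursive, read from the object store).
head_tree_count() {
  git -C "$1" ls-tree -r --name-only HEAD | wc -l | tr -d ' '
}

# Number of entries recorded in the index.
index_count() {
  git -C "$1" ls-files | wc -l | tr -d ' '
}

# Succeed when the index holds as many entries as HEAD's tree.
index_matches_head() {
  [ "$(head_tree_count "$1")" = "$(index_count "$1")" ]
}
```

An emptied index leaves `head_tree_count` unchanged while `index_count` drops to 0, which is the asymmetry the paragraph describes.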

## Post-commit guard (TRADITIONAL)

```bash
@@ -145,6 +201,38 @@ outcomes correlate with Lior between cleanup-cycles (1338Z), Lior absent
real-but-not-universal failure mode; the post-worktree-creation guard
remains the load-bearing check that distinguishes the cases.

## Empirical anchor (2026-05-21T06:13Z — stale-index.lock precursor)

7th data point. A cold-boot Otto-CLI tick attempted worktree creation
while peer activity was present (`git worktree list` showed 314+ entries,
including multiple Lior and Codex worktrees).

- `/private/tmp/zeta-otto-cli-0603z-shard` (06:08Z 2026-05-21) — **clean
at worktree-add time**: `ls -la` showed 44 entries; `git ls-tree HEAD`
returned 53; `git status --short` returned empty
- **BUT** `.git/worktrees/zeta-otto-cli-0603z-shard/index.lock` existed,
0 bytes, 5min37s old
- First `git add docs/.../0603Z.md` proceeded after the stale lock was
removed (rm); commit then **corrupted** (`git ls-tree HEAD | wc -l` = 1)
- Recovered via `git reset --hard HEAD~1` → 5918 files restored; tree
back to 53; re-write shard; clean re-commit (HEAD=53, HEAD~1=53, +1 file)
- Shard landed via [PR #4511](https://github.com/Lucent-Financial-Group/Zeta/pull/4511)
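The detection and recovery steps in the list above can be sketched as a check on top-level tree size. This is a hedged illustration, not the post-commit guard defined elsewhere in this file: `top_level_count` and `tree_shrank` are hypothetical names, and treating any drop in top-level entries as suspicious is an assumption (a commit that intentionally deletes a top-level entry would also trigger it).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; $1 is the worktree path.

# Count top-level entries in a commit's tree (e.g. 53 for a healthy HEAD, 1 when collapsed).
top_level_count() {
  git -C "$1" ls-tree "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Succeed when HEAD's top-level tree has fewer entries than its parent's.
tree_shrank() {
  local wt=$1 head parent
  head=$(top_level_count "$wt" HEAD)
  parent=$(top_level_count "$wt" 'HEAD~1')
  [ "$head" -lt "$parent" ]
}

# Possible usage after a commit (recovery mirrors the bullet above):
#   if tree_shrank "$wt"; then
#     git -C "$wt" reset --hard HEAD~1
#   fi
```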

**The new signal**: `ls-tree HEAD = 53` and `status --short = 0` BOTH
passed the post-worktree-creation guard from the previous section — yet
the commit still corrupted. The previous guards (process-list, freshness
check, post-commit guard) caught the FAILURE; the stale-`index.lock`-as-
precursor guard would have caught the SETUP-FOR-FAILURE before the first
`git add` ran, avoiding the recovery roundtrip entirely.

Empirical totals across all 7 anchors:

- 3 clean (1338Z + 1631Z 2026-05-15; 1413Z 2026-05-20)
- 4 corrupted (1345Z + 1521Z + 1547Z 2026-05-15; 0608Z 2026-05-21)
- New diagnostic surface (stale-`index.lock` precursor) added by the
4th corrupted case to distinguish "guards pass + commit corrupts" from
"guards pass + commit clean"

## Composes with

- `.claude/rules/claim-acquire-before-worktree-work.md` — worktree