diff --git a/.claude/rules/codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md b/.claude/rules/codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md
index ed8e2bb1f6..e0fbc04cb7 100644
--- a/.claude/rules/codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md
+++ b/.claude/rules/codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md
@@ -99,6 +99,62 @@ if [ "$status_lines" -gt 5 ] || [ "$tree_size" -lt 50 ]; then
 fi
 ```
 
+## Stale-index.lock-as-precursor guard (NEW — empirical 2026-05-21T06:03Z)
+
+A NEW failure shape observed: `git worktree add` succeeds, the worktree
+directory looks fully populated (`ls -la` shows 44+ entries including
+`.claude/`, `.codex/`, etc.), `git ls-tree HEAD` returns the expected
+count (e.g. 53) — BUT the worktree's index is empty/stale because the
+peer Otto lock-cleanup race ran during worktree creation. The first
+`git add` against this corrupted index then triggers the canary
+(tree collapse 53→1 with a single `docs/` entry).
+
+**Precursor signal**: `.git/worktrees//index.lock` is present at
+worktree-add completion (rather than only appearing transiently during
+git operations).
+
+**Diagnostic shape** (the lock that fired the canary on 2026-05-21T06:13Z):
+
+- **Size 0 bytes** (`stat -f "%z" ` reports `0`)
+- **Age past the 15s natural-clear window** (5min37s old when caught)
+
+A lock present at all post-`worktree add` is suspect; a 0-byte lock that
+has aged past 15s without clearing is the strong canary-precursor signal.
+
+**Operational guard** (before first `git add` in a fresh worktree):
+
+```bash
+WT_GIT=$(git -C "$WT" rev-parse --git-dir)  # $WT is a placeholder: set it to the fresh worktree's path
+LOCK="$WT_GIT/index.lock"
+if [ -f "$LOCK" ]; then
+  AGE=$(( $(date +%s) - $(stat -f %m "$LOCK") ))  # lock age in seconds (BSD/macOS stat syntax)
+  SIZE=$(stat -f %z "$LOCK")  # lock size in bytes
+  if [ "$AGE" -gt 15 ]; then
+    echo "STALE LOCK: ${AGE}s old, ${SIZE} bytes — canary precursor"
+    rm "$LOCK"
+    # Re-materialize index from tree to recover from possible peer corruption:
+    git -C "$WT" restore --staged --worktree --source=HEAD -- .
+  fi
+fi
+```
+
+The `git restore --staged --worktree --source=HEAD -- .` recovery is the
+key step — it re-materializes both index and working tree from the HEAD
+tree, replacing whatever the peer cleanup race emptied. Without this,
+`git add ` against an empty index produces a commit whose
+parent-diff is "delete everything + add this one file" — exactly the
+canary signature.
+
+**Why the post-creation `ls-tree HEAD` check (previous section) is not
+sufficient**: `git ls-tree HEAD` reads the TREE object, not the INDEX.
+The tree-from-HEAD remains correct (53 entries on 2026-05-21) while the
+index is empty. The post-creation FRESHNESS check above runs
+`git status --short` which compares working tree to index — but if BOTH
+the index AND working tree are stale-but-matching (the peer cleanup
+emptied both before populating from HEAD completed), `git status` shows
+clean too. The 0-byte stale `index.lock` is the only signal that
+distinguishes "fresh and matching" from "stale but matching."
+
 ## Post-commit guard (TRADITIONAL)
 
 ```bash
@@ -145,6 +201,38 @@ outcomes correlate with Lior between cleanup-cycles (1338Z), Lior absent
 real-but-not-universal failure mode; the post-worktree-creation guard
 remains the load-bearing check that distinguishes the cases.
 
+## Empirical anchor (2026-05-21T06:13Z — stale-index.lock precursor)
+
+7th data point. Cold-boot Otto-CLI tick attempted worktree creation
+while peer activity was present (`git worktree list` showed 314+ entries
+including multiple Lior + Codex worktrees).
+
+- `/private/tmp/zeta-otto-cli-0603z-shard` (06:08Z 2026-05-21) — **clean
+  at worktree-add time**: `ls -la` showed 44 entries; `git ls-tree HEAD`
+  returned 53; `git status --short` returned empty
+- **BUT** `.git/worktrees/zeta-otto-cli-0603z-shard/index.lock` existed,
+  0 bytes, 5min37s old
+- First `git add docs/.../0603Z.md` proceeded after the stale lock was
+  removed (rm); commit then **corrupted** (`git ls-tree HEAD | wc -l` = 1)
+- Recovered via `git reset --hard HEAD~1` → 5918 files restored; tree
+  back to 53; re-write shard; clean re-commit (HEAD=53, HEAD~1=53, +1 file)
+- Shard landed via [PR #4511](https://github.com/Lucent-Financial-Group/Zeta/pull/4511)
+
+**The new signal**: `ls-tree HEAD = 53` and `status --short = 0` BOTH
+passed the post-worktree-creation guard from the previous section — yet
+the commit still corrupted. The previous guards (process-list, freshness
+check, post-commit guard) caught the FAILURE; the stale-`index.lock`-as-precursor
+guard would have caught the SETUP-FOR-FAILURE before the first
+`git add` ran, avoiding the recovery roundtrip entirely.
+
+Empirical totals across all 7 anchors:
+
+- 3 clean (1338Z + 1631Z 2026-05-15; 1413Z 2026-05-20)
+- 4 corrupted (1345Z + 1521Z + 1547Z 2026-05-15; 0608Z 2026-05-21)
+- New diagnostic surface (stale-`index.lock` precursor) added by the
+  4th corrupted case to distinguish "guards pass + commit corrupts" from
+  "guards pass + commit clean"
+
 ## Composes with
 
 - `.claude/rules/claim-acquire-before-worktree-work.md` — worktree