From 3c67d61980bb6eb76cf3f00faa27c5a878627188 Mon Sep 17 00:00:00 2001
From: Aaron Stainback
Date: Thu, 21 May 2026 09:31:17 -0400
Subject: [PATCH 1/3] rules(saturation-ceiling): land Sub-case 3b (B-0530 at push-time) + read-tree HEAD index recovery
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Folds in the two refinements captured in the 2026-05-21 memo (PR #4535)
to the saturation-ceiling discipline directly, so future Otto cold-boots
inherit them via auto-load instead of via memory-file pointer.

Two edits:

1. New Sub-case 3b — pack-dir contention at git-push time. Same B-0530
   root cause as sub-case 3 (worktree-add time), but the symptom appears
   on push (Interrupted system call on .git/objects/pack). Distinct from
   B-0615 (silent-push-failure with exit 0). Mitigation: REST git-data
   API bypass per PR #4145. Empirical anchor: PR #4535 shipped via the
   bypass after git push timed out at exit 124.

2. In-place index recovery — git read-tree HEAD rebuilds a truncated
   index file (post stale-lock-removal under peer contention) without
   requiring worktree abandonment. Extends sub-case 5 (peer-side
   destructive) recovery toolkit. Empirical anchor: PR #4532 shipped
   after read-tree HEAD recovered an index truncated by stale-lock-
   removal race; previously only recovery option was abandonment.

Both edits are minimal-additive: they extend existing sub-case structure
rather than reorganizing it. Section header still accurate ("4 failure
sub-cases of borrow-on-existing") because 3b is a sibling-variant of 3,
not a new numbered case.

Authored + pushed via REST git-data API bypass because git push was
still hitting the very Sub-case 3b being documented.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../claim-acquire-before-worktree-work.md     | 85 +++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/.claude/rules/claim-acquire-before-worktree-work.md b/.claude/rules/claim-acquire-before-worktree-work.md
index a192be3937..7602050d10 100644
--- a/.claude/rules/claim-acquire-before-worktree-work.md
+++ b/.claude/rules/claim-acquire-before-worktree-work.md
@@ -230,6 +230,91 @@ activity. No `--lock` flag prevents this; see [B-0530](../../docs/backlog/P3/B-0
 mutex (not yet shipped). Until then, fall through to existing-sidetick
 borrow — which hits sub-case 4.
 
+### Sub-case 3b — pack-dir contention causes `git push` to fail at push time
+
+Same B-0530 root cause class as sub-case 3, but manifesting at `git push`
+time on an already-created worktree that previously passed the canary.
+Distinguished from B-0615 (silent-push-failure) by being non-silent.
+
+**Symptom**: `git push` returns non-zero exit with errors like:
+
+```
+error: unable to open loose object : Interrupted system call
+error: unable to open object pack directory: .../.git/objects/pack: Interrupted system call
+fatal: bad object
+fatal: the remote end hung up unexpectedly
+error: failed to push some refs to '...'
+```
+
+Network + auth are fine; bottleneck is local pack-dir reads under peer-agent
+contention. Distinguish from **B-0615** (push exits ZERO but remote ref
+never updates — silent; mitigation: REST git-data API bypass per
+[PR #4145](https://github.com/Lucent-Financial-Group/Zeta/pull/4145)).
+Both belong to the same FS-contention root cause class but require
+different mitigations because the exit codes differ.
+
+**Mitigation (working today)**: the **B-0615 REST git-data API bypass**
+(`POST .../git/blobs` → `POST .../git/trees` → `POST .../git/commits` →
+`POST/PATCH .../git/refs`) works for sub-case 3b as well as B-0615.
+Empirical anchor: [PR #4535](https://github.com/Lucent-Financial-Group/Zeta/pull/4535)
+(2026-05-21) — the memo about this very failure mode was blocked from
+landing by `git push` exit 124 timeouts, then shipped successfully via
+the REST bypass.
+
+**Cost**: ~5-6 REST calls total per commit (well within Normal-tier
+GraphQL budget). No `.git/objects/pack` reads happen locally because
+GitHub does the object packing server-side from the blob you uploaded.
+
+**Composes with the rate-limit operational tiers** documented in
+[`refresh-world-model-poll-pr-gate.md`](refresh-world-model-poll-pr-gate.md):
+when the saturation makes `git push` exit non-zero or hang, the REST
+bypass IS the tier-skipping move that lets substantive substrate land
+without waiting for contention to clear.
+
+### In-place index recovery — `git read-tree HEAD`
+
+Refinement to sub-case 5 (peer-side destructive git operation), where the
+specific symptom is a **truncated index file** after stale-lock removal:
+
+```
+fatal: .git/worktrees//index: index file smaller than expected
+```
+
+A preceding `git status` may show massive D (deleted) entries against
+files you have not touched — a misleading symptom of the corrupted index,
+NOT actual working-tree deletion. Do NOT abandon the worktree on this
+symptom alone; first verify the working tree itself via `ls` (files
+should still be on disk).
+
+**Recovery**:
+
+```bash
+git -C read-tree HEAD
+```
+
+This rebuilds the worktree's index from the HEAD commit, replacing the
+truncated index in-place. Working-tree files are NOT modified (they were
+not part of the corruption — only the index was). After rebuild:
+
+1. `git status` returns clean (empty)
+2. Stage your intended file via `git add ` (the file is still on
+   disk; the read-tree wiped any stale staged state but did not touch
+   the working tree)
+3. `git commit` normally
+4. Verify commit canary (parent tree size = commit tree size) before
+   pushing per [`codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md`](codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md)
+
+**When NOT to use**: if the working tree itself is corrupted (files
+missing on disk), `read-tree` will silently stage the wrong state.
+Pre-check disk state via `ls` before invoking. This recovery applies
+ONLY to truncated-INDEX states, NOT truncated-working-tree states.
+
+**Empirical anchor**: [PR #4532](https://github.com/Lucent-Financial-Group/Zeta/pull/4532)
+(2026-05-21) — the 1212Z tick shard was successfully shipped after
+`read-tree HEAD` recovered an index truncated by stale-lock-removal
+race; previously the saturation-ceiling rule's only recovery option
+was worktree abandonment.
+
 ### Sub-case 4 — pruned-sidetick race
 
 The empirically-validated sidetick `/private/tmp/zeta-otto-cli-0027z-sidetick`

From 510da94dc982b43c7a43943d38f35e5382841516 Mon Sep 17 00:00:00 2001
From: Aaron Stainback
Date: Thu, 21 May 2026 09:35:43 -0400
Subject: [PATCH 2/3] =?UTF-8?q?fix(rule):=20correct=20read-tree=20HEAD=20p?=
 =?UTF-8?q?ostcondition=20=E2=80=94=20status=20is=20NOT=20clean=20after=20?=
 =?UTF-8?q?rebuild?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

P2 thread finding (chatgpt-codex-connector): read-tree HEAD rewrites the
index but does NOT touch the working tree. The original wording said
"git status returns clean (empty)" after rebuild — false in the general
case + actually false in the empirical case the rule documents (the
shard file was untracked at the time).

Correction: the recovery indicator is the DISAPPEARANCE of "index file
smaller than expected" — not a clean status. Genuine working-tree-vs-
HEAD diff still reflects in status. Misreading read-tree as "should
produce clean status" is the most common way the recovery gets
misdiagnosed as failed when it actually succeeded.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../rules/claim-acquire-before-worktree-work.md   | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/.claude/rules/claim-acquire-before-worktree-work.md b/.claude/rules/claim-acquire-before-worktree-work.md
index 7602050d10..33ce8131a3 100644
--- a/.claude/rules/claim-acquire-before-worktree-work.md
+++ b/.claude/rules/claim-acquire-before-worktree-work.md
@@ -296,10 +296,17 @@ This rebuilds the worktree's index from the HEAD commit, replacing the
 truncated index in-place. Working-tree files are NOT modified (they were
 not part of the corruption — only the index was). After rebuild:
 
-1. `git status` returns clean (empty)
-2. Stage your intended file via `git add ` (the file is still on
-   disk; the read-tree wiped any stale staged state but did not touch
-   the working tree)
+1. `git status` now reflects the genuine working-tree-vs-HEAD diff —
+   not "empty," because `read-tree` only rewrote the index, not the
+   working tree. Any intended local edits / untracked files you had
+   before the corruption STILL show as modified / untracked. The
+   `index file smaller than expected` error is gone; that is the
+   indicator the recovery worked. (Misreading `read-tree` as "should
+   produce a clean status" is the most common way the recovery gets
+   misdiagnosed as failed when it actually succeeded.)
+2. Stage your intended file via `git add ` — the file is still
+   on disk; `read-tree` wiped any stale staged state but did not touch
+   the working tree
 3. `git commit` normally
 4. Verify commit canary (parent tree size = commit tree size) before
    pushing per [`codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md`](codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md)

From ae15cbe997af8aab3837cfb9c31c26c33528b408 Mon Sep 17 00:00:00 2001
From: Aaron Stainback
Date: Thu, 21 May 2026 09:37:53 -0400
Subject: [PATCH 3/3] =?UTF-8?q?fix(rule):=203=20review=20findings=20?=
 =?UTF-8?q?=E2=80=94=20sub-case=20count=20+=20exit-124=20origin=20+=20REST?=
 =?UTF-8?q?/core=20budget?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

P1 findings from copilot-pull-request-reviewer:

1. Count of sub-cases was stale ("4 failure sub-cases of
   borrow-on-existing" + "All 4 sub-cases empirically validated") — now
   5 with the addition of 3b. Updated section header to "5 failure
   sub-cases" + footnote naming 3b as the fifth empirical sub-case with
   a working mitigation.

2. exit 124 is from the GNU timeout wrapper (command killed by timeout
   status), NOT a native git push exit code. Clarified in Sub-case 3b
   empirical anchor that the contention was hanging git push
   indefinitely until the timeout wrapper killed it.

3. REST API calls consume the REST/core budget (5000/hr per token), NOT
   the GraphQL budget. Original text referenced "Normal-tier GraphQL
   budget" which conflated independent budget pools. Updated cost
   section to reference REST/core explicitly + clarified relationship
   to the GraphQL tier classification in
   refresh-world-model-poll-pr-gate.md (which is GraphQL-scoped, does
   not translate directly to REST/core).

4. Outdated thread (already-addressed in prior fix commit 510da94d on
   read-tree HEAD postcondition) resolved no-op.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../claim-acquire-before-worktree-work.md     | 28 +++++++++++++------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/.claude/rules/claim-acquire-before-worktree-work.md b/.claude/rules/claim-acquire-before-worktree-work.md
index 33ce8131a3..3c3c1b04ab 100644
--- a/.claude/rules/claim-acquire-before-worktree-work.md
+++ b/.claude/rules/claim-acquire-before-worktree-work.md
@@ -153,7 +153,7 @@ Composes with [B-0530](../../docs/backlog/P3/B-0530-cron-sentinel-mutex-prevent-
 when it ships); until that ships, the borrow pattern is the operational
 workaround.
 
-## Saturation-ceiling — 4 failure sub-cases of borrow-on-existing
+## Saturation-ceiling — 5 failure sub-cases of borrow-on-existing
 
 Empirical anchor [PR #3808](https://github.com/Lucent-Financial-Group/Zeta/pull/3808)
 (closed-without-merge; shard for `0715Z` was the PR's payload, hence never
@@ -163,8 +163,10 @@ fresh-cold-boot Otto-CLI, and peer-agent global-lock-cleanup loop), with
 peer Otto cycling worktree HEAD every ~3-5 min for 9 transitions in 35
 min, a fresh-cold-boot session attempting to ship a shard hit FOUR
 distinct failure sub-cases of the borrow-on-existing pattern across 4
-commit attempts. All 4 sub-cases empirically validated; only 2 have
-working mitigations today.
+commit attempts. All 4 of those sub-cases empirically validated; only 2
+have working mitigations today. **A fifth sub-case (3b — pack-dir
+contention at push time) was added in 2026-05-21 (PR [#4536](https://github.com/Lucent-Financial-Group/Zeta/pull/4536)) with a working mitigation
+(REST git-data API bypass).**
 
 ### Sub-case 1 — existing-branch-name collision → peer-WIP commit inheritance via recovery path
 
@@ -258,11 +260,21 @@ different mitigations because the exit codes differ.
 `POST/PATCH .../git/refs`) works for sub-case 3b as well as B-0615.
 Empirical anchor: [PR #4535](https://github.com/Lucent-Financial-Group/Zeta/pull/4535)
 (2026-05-21) — the memo about this very failure mode was blocked from
-landing by `git push` exit 124 timeouts, then shipped successfully via
-the REST bypass.
-
-**Cost**: ~5-6 REST calls total per commit (well within Normal-tier
-GraphQL budget). No `.git/objects/pack` reads happen locally because
+landing by repeated `timeout`-wrapped `git push` runs surfacing exit 124
+(GNU `timeout`'s "command killed by timeout" status — NOT a native
+`git push` exit code; the contention was hanging the push indefinitely
+until the wrapper killed it). The same commits then shipped successfully
+via the REST bypass.
+
+**Cost**: ~5-6 REST calls total per commit, consuming the **REST/core
+budget** (5000/hr per token; check via `gh api rate_limit --jq
+'.resources.core'`). REST/core is independent of the GraphQL budget
+discussed in [`refresh-world-model-poll-pr-gate.md`](refresh-world-model-poll-pr-gate.md);
+the tier classification in that rule (Normal / Cost-aware / Extreme /
+Pure-git) is GraphQL-budget-scoped and does NOT translate directly to
+REST/core. Empirically: even at GraphQL Extreme cost-aware tier (200–1000
+remaining), REST/core typically has thousands remaining and the bypass
+is affordable. No `.git/objects/pack` reads happen locally because
 GitHub does the object packing server-side from the blob you uploaded.
 
 **Composes with the rate-limit operational tiers** documented in