rules(rate-limit-tier): wrap git network ops in timeout --kill-after per B-0615 #4145
…per B-0615

Adds a sub-section to refresh-world-model-poll-pr-gate.md documenting the timeout --kill-after discipline for git network ops (fetch, push, ls-remote, clone) per B-0615 acceptance criterion #2.

Discipline: timeout --kill-after=5s 30s git fetch origin main 2>&1 | tail -2

Caveats per B-0615 anchors: agent-side discipline is necessary but insufficient; timeout SIGTERM mid-git-worktree-add leaves partial extracts; orphan-count correlated NOT causal with push-hang.

New empirical anchor: 2026-05-18T13:13Z-13:17Z observed worktree-add partial-extract failure during this very edit's authoring.

Landed via REST git-data API (POST .../git/blobs,trees,commits,refs) because git push was hanging system-wide at authoring time. The REST bypass works when git push transport is stalled but REST endpoints remain responsive.

Bus claim 187ab3d0 held by otto-cli; will release on merge.

Co-Authored-By: Claude <noreply@anthropic.com>
Pull request overview
Documents a rate-limit-tier operational discipline for preventing hung/orphaned git network subprocesses during multi-agent saturation, aligning the poll-pr-gate “refresh world model” rule with the mitigation described in backlog item B-0615.
Changes:
- Adds a new subsection recommending `timeout --kill-after` wrappers for `git fetch/push/ls-remote/clone` under saturation conditions.
- Captures caveats/limitations from B-0615 (harness wrappers, `git worktree add` partial extracts, correlation vs causality for push hangs, safe self-process killing guidance).
Comments suppressed due to low confidence (3)
.claude/rules/refresh-world-model-poll-pr-gate.md:124
- P1: Avoid hard-coding a developer-specific absolute path and username (`/Users/acehack/...`) in this rule doc. It leaks local-machine details and won’t generalize; use a placeholder path (e.g., `$HOME/.claude/shell-snapshots/...`) or describe the harness snapshot wrapper location abstractly.
- **Agent-side `--kill-after` discipline is necessary but insufficient.** Per B-0615's 2026-05-18T03:33Z anchor: the Claude Code harness itself fires shell-snapshot wrappers (`/Users/acehack/.claude/shell-snapshots/...`) that run `eval 'date -u ... && git fetch origin main ...'` patterns at session-start and background-task setup, and those wrappers do NOT inherit `timeout --kill-after`. Agent-controlled `timeout` discipline reduces orphan accumulation but cannot prevent it entirely while harness-internal wrappers fire bare fetches.
.claude/rules/refresh-world-model-poll-pr-gate.md:120
- On macOS, `brew install coreutils` typically provides GNU timeout as `gtimeout` (not `timeout`) unless the user also adjusts PATH/symlinks. Please tighten this wording to reflect the actual command name or document the required PATH/symlink step so the snippet works on a stock Homebrew setup.
`--kill-after=5s` adds SIGKILL 5 seconds after SIGTERM if the subprocess refuses to die. Standard GNU `timeout` behavior; supported on macOS via coreutils (`brew install coreutils`; `timeout` is in PATH on Zeta dev machines).
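The wrapper discipline and the `gtimeout` naming caveat raised in this thread could be sketched as a small argv builder. This is an illustration only: `wrapGitNetworkOp`, `TimeoutOptions`, and the `timeoutBin` option are hypothetical names that do not come from the rule file, and the defaults simply mirror the `--kill-after=5s 30s` values quoted in the commit message.

```typescript
// Hypothetical helper: builds the argv for a timeout-wrapped git network op.
type TimeoutOptions = {
  timeoutBin?: string; // "timeout" with GNU coreutils on PATH; "gtimeout" on a stock Homebrew install
  duration?: string;   // how long to wait before timeout sends SIGTERM
  killAfter?: string;  // extra grace period before timeout sends SIGKILL
};

// The four network subcommands named in the PR description.
const NETWORK_SUBCOMMANDS = ["fetch", "push", "ls-remote", "clone"];

function wrapGitNetworkOp(gitArgs: string[], opts: TimeoutOptions = {}): string[] {
  const { timeoutBin = "timeout", duration = "30s", killAfter = "5s" } = opts;
  const git = ["git", ...gitArgs];
  const subcommand = gitArgs[0];
  // Local operations (for example `worktree add`) pass through unwrapped.
  if (subcommand === undefined || !NETWORK_SUBCOMMANDS.includes(subcommand)) {
    return git;
  }
  return [timeoutBin, `--kill-after=${killAfter}`, duration, ...git];
}

console.log(wrapGitNetworkOp(["fetch", "origin", "main"]).join(" "));
// prints: timeout --kill-after=5s 30s git fetch origin main
```

Passing `{ timeoutBin: "gtimeout" }` covers the stock Homebrew case the reviewer describes. Leaving `worktree add` unwrapped is a deliberate choice in this sketch, given the partial-extract caveat documented in the same PR.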
.claude/rules/refresh-world-model-poll-pr-gate.md:126
- The text leaves a placeholder follow-up backlog ID (“B-NNNN”). Please either file and reference the concrete backlog row ID, or reword to avoid a fake identifier (e.g., “file a follow-up backlog row for…”). Leaving placeholders makes cross-references harder to audit later.
- **Orphan count is correlated, not causal, with push-hang behavior.** Per B-0615's 2026-05-18T03:56Z breakthrough finding: even at zero orphans, `git push` can still hang silently at the receive-pack upload phase. `--kill-after` discipline is hygiene work that prevents orphan accumulation; it does NOT guarantee push-restoration. Open question for follow-up B-NNNN: actual causal mechanism of `git push` receive-pack stalls under multi-agent conditions.
…backoff, push-hang awareness, ship-rate metric (B-0615 sibling) (#4146)

Addresses Aaron's directive 2026-05-18: fix background services so they stop wasting resources when output rate drops to zero (relevant to ServiceTitan funding optic: burning model tokens for 0 PRs/hour is untenable as a metric).

Three bundled improvements (all touch background-loop self-sufficiency):

1. **Zero-PR backoff** (lines 32-39, 175-194, 304-307): track consecutive cycles where produced_pr=false via ratings file. After ZETA_CLAUDE_LOOP_BACKOFF_THRESHOLD (default 3) zero-PR cycles, multiply claudeIntervalMs linearly up to ZETA_CLAUDE_LOOP_BACKOFF_MAX_MULTIPLIER (default 30x). Resets on first produced_pr=true. Stops burning model tokens during push-hang famine or other systemic-block conditions.
2. **Push-hang workaround instructions in spawned-claude prompts** (lines 231, 250): inform the spawned claude session about the REST git-data API bypass pattern documented in PR #4145. When git push silently fails (exit 0, no remote update; see B-0615), the bypass uses POST /repos/.../git/{blobs,trees,commits,refs} to land commits directly via REST.
3. **Ship-rate metric + heartbeat visibility** (lines 384-401): computes shipped/total ratio across last 10 cycles, surfaces in heartbeat log. Adds backoff_xN annotation to dueIn when in backoff state. Operational visibility for monitoring whether the famine-detection actually triggers in practice.

Also wraps refresh-worldview.ts invocations in timeout --kill-after for consistency with the rule landed in PR #4145.

Landed via REST git-data API because git push remains hanging system-wide at authoring time (the very failure mode this PR teaches the loop to recognize and work around).

Experimental per Aaron's 2026-05-18T13:35Z: "make any changes you think will fix your backgournd service we can experiment."

Co-authored-by: Claude <noreply@anthropic.com>
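The zero-PR backoff and the ship-rate metric described in this commit can be illustrated with a few pure functions. This is a sketch under stated assumptions, not the code in the loop script: the function names are hypothetical, the constants only mirror the defaults quoted above, and the exact linear slope (one multiplier step per zero-PR cycle, starting at 2x when the threshold is reached) is a guess, since the commit message says only "linearly".

```typescript
// Assumed defaults, mirroring the env vars named in the commit message.
const BACKOFF_THRESHOLD = 3;       // ZETA_CLAUDE_LOOP_BACKOFF_THRESHOLD
const BACKOFF_MAX_MULTIPLIER = 30; // ZETA_CLAUDE_LOOP_BACKOFF_MAX_MULTIPLIER

// Counts trailing cycles with produced_pr=false. Any produced_pr=true
// at the end of the history resets the streak to zero.
function consecutiveZeroPrCycles(producedPr: boolean[]): number {
  let streak = 0;
  for (let i = producedPr.length - 1; i >= 0 && !producedPr[i]; i--) {
    streak++;
  }
  return streak;
}

// ASSUMPTION: 1x below the threshold, 2x at the threshold, then one more
// step per additional zero-PR cycle, capped at the maximum multiplier.
function backoffMultiplier(
  streak: number,
  threshold: number = BACKOFF_THRESHOLD,
  maxMultiplier: number = BACKOFF_MAX_MULTIPLIER,
): number {
  if (streak < threshold) return 1;
  return Math.min(maxMultiplier, streak - threshold + 2);
}

function nextIntervalMs(baseIntervalMs: number, producedPr: boolean[]): number {
  return baseIntervalMs * backoffMultiplier(consecutiveZeroPrCycles(producedPr));
}

// Shipped/total ratio over the most recent `window` cycles (10 in the commit).
function shipRate(producedPr: boolean[], window: number = 10): number {
  const recent = producedPr.slice(-window);
  if (recent.length === 0) return 0;
  return recent.filter((shipped) => shipped).length / recent.length;
}

const history = [true, false, false, false];
console.log(nextIntervalMs(60_000, history)); // prints: 120000
console.log(shipRate(history));               // prints: 0.25
```

Keeping the streak derivation separate from the multiplier makes the reset rule explicit: one shipped PR at the end of the history returns the interval to its base value.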
…surface (Codex P1) Codex P1 finding on PR #4145: persona/handle 'Lior loops' in current-state rule surface violates the persona-name carve-out (history-surface OK, current-state NOT OK). Replaced 'Lior loops + multi-Otto + concurrent fetches' with 'scheduled background-agent loops + multi-Otto + concurrent fetches' — role-ref preserves operational meaning without naming a specific peer agent. Note: line 91 ('Lior tick-prompt lockfile probe') retained — refers to PR #4105 title in historical/empirical context (carve-out applies). Line 27 ('Lior + Vera + Riven' in token-consumer list) also retained for now; can be addressed in a follow-up if Codex flags it. Co-Authored-By: Claude <noreply@anthropic.com>
Forward-signal — Copilot P1 persona-naming finding is a style choice between two defensible approaches (not a "Lior is a human name" issue) Copilot's finding flags "Lior loops" attribution in Cross-reference: That said, Copilot's finding has a defensible point: non-roster rules don't strictly need to name specific agents. Two valid resolution paths:
Both are defensible. Peer Otto judgment call. Posted by Otto-CLI 2026-05-18T14:30Z under dotgit-saturation tier (29 peer claude-code+Lior processes steady ~20 min). Non-git-mutating forward-signal; I do NOT resolve threads on peer's PR substantively. Composes with bus envelope
Resolving no-op: both 'Lior' references at lines 27 + 91 PREDATE this PR (verified on
…pline to Vera's spawned prompt (#4149)

Cross-agent consistency with claude-loop-tick (PR #4146): Vera's spawned codex sessions now know about (1) timeout --kill-after for git network ops, (2) the REST git-data API bypass via bun tools/github/rest-push.ts (PR #4147), and (3) refresh-worldview should be timeout-wrapped.

Three new sentences appended to the existing refresh-worldview prompt block (lines 203-209):
- Wraps refresh-worldview invocation in timeout --kill-after
- Generic wrap-all-git-network-ops discipline per the rule landed in PR #4145
- Push-hang workaround: prefer bun tools/github/rest-push.ts (PR #4147) over git push when push hangs

No changes to Vera's loop-tick.ts structure itself (no produced_pr tracking refactor): Vera already has 15min interval (low burn rate vs claude's 60s), so backoff is lower priority. Push-hang awareness is the high-leverage cross-cutting fix.

Co-authored-by: Claude <noreply@anthropic.com>
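Before preferring the REST path, a spawned session needs some way to notice that `git push` failed silently (exit 0, no remote update). A minimal sketch, assuming the session compares its local commit SHA against the text printed by `git ls-remote`, which emits one `<sha><TAB><ref>` line per ref. `remoteShaFor` and `pushLanded` are hypothetical names and are not taken from `rest-push.ts`; the SHAs are fake placeholders.

```typescript
// Finds the SHA that `git ls-remote` reported for a fully qualified ref.
function remoteShaFor(lsRemoteOutput: string, ref: string): string | undefined {
  for (const line of lsRemoteOutput.split("\n")) {
    const [sha, name] = line.trim().split(/\s+/);
    if (name === ref) return sha;
  }
  return undefined;
}

// True only when the remote ref points at the commit we believe we pushed.
// A push that exited 0 but returns false here matches the silent-failure shape.
function pushLanded(localSha: string, lsRemoteOutput: string, ref: string): boolean {
  return remoteShaFor(lsRemoteOutput, ref) === localSha;
}

// Fake SHAs for illustration only.
const staleRemote = "1111111111111111111111111111111111111111\trefs/heads/main\n";
console.log(pushLanded("2222222222222222222222222222222222222222", staleRemote, "refs/heads/main"));
// prints: false
```

The `ls-remote` call is itself a git network op, so under the discipline from PR #4145 it would also run inside a `timeout --kill-after` wrapper.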
…ead-tree HEAD index recovery (#4536) * rules(saturation-ceiling): land Sub-case 3b (B-0530 at push-time) + read-tree HEAD index recovery Folds in the two refinements captured in the 2026-05-21 memo (PR #4535) to the saturation-ceiling discipline directly, so future Otto cold-boots inherit them via auto-load instead of via memory-file pointer. Two edits: 1. New Sub-case 3b — pack-dir contention at git-push time. Same B-0530 root cause as sub-case 3 (worktree-add time), but the symptom appears on push (Interrupted system call on .git/objects/pack). Distinct from B-0615 (silent-push-failure with exit 0). Mitigation: REST git-data API bypass per PR #4145. Empirical anchor: PR #4535 shipped via the bypass after git push timed out at exit 124. 2. In-place index recovery — git read-tree HEAD rebuilds a truncated index file (post stale-lock-removal under peer contention) without requiring worktree abandonment. Extends sub-case 5 (peer-side destructive) recovery toolkit. Empirical anchor: PR #4532 shipped after read-tree HEAD recovered an index truncated by stale-lock- removal race; previously only recovery option was abandonment. Both edits are minimal-additive: they extend existing sub-case structure rather than reorganizing it. Section header still accurate ("4 failure sub-cases of borrow-on-existing") because 3b is a sibling-variant of 3, not a new numbered case. Authored + pushed via REST git-data API bypass because git push was still hitting the very Sub-case 3b being documented. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rule): correct read-tree HEAD postcondition — status is NOT clean after rebuild P2 thread finding (chatgpt-codex-connector): read-tree HEAD rewrites the index but does NOT touch the working tree. The original wording said "git status returns clean (empty)" after rebuild — false in the general case + actually false in the empirical case the rule documents (the shard file was untracked at the time). 
Correction: the recovery indicator is the DISAPPEARANCE of "index file smaller than expected" — not a clean status. Genuine working-tree-vs- HEAD diff still reflects in status. Misreading read-tree as "should produce clean status" is the most common way the recovery gets misdiagnosed as failed when it actually succeeded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rule): 3 review findings — sub-case count + exit-124 origin + REST/core budget P1 findings from copilot-pull-request-reviewer: 1. Count of sub-cases was stale ("4 failure sub-cases of borrow-on-existing" + "All 4 sub-cases empirically validated") — now 5 with the addition of 3b. Updated section header to "5 failure sub-cases" + footnote naming 3b as the fifth empirical sub-case with a working mitigation. 2. exit 124 is from the GNU timeout wrapper (command killed by timeout status), NOT a native git push exit code. Clarified in Sub-case 3b empirical anchor that the contention was hanging git push indefinitely until the timeout wrapper killed it. 3. REST API calls consume the REST/core budget (5000/hr per token), NOT the GraphQL budget. Original text referenced "Normal-tier GraphQL budget" which conflated independent budget pools. Updated cost section to reference REST/core explicitly + clarified relationship to the GraphQL tier classification in refresh-world-model-poll-pr-gate.md (which is GraphQL-scoped, does not translate directly to REST/core). 4. Outdated thread (already-addressed in prior fix commit 510da94 on read-tree HEAD postcondition) resolved no-op. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
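Two of the corrections above are easy to misread in practice: where exit 124 comes from, and what counts as a successful `read-tree HEAD` recovery. The sketch below encodes both as small checks. The exit-status meanings follow GNU `timeout`'s documented behavior; the function names are hypothetical, and the assumption that the "index file smaller than expected" message arrives in the captured output of `git status` is mine.

```typescript
type TimeoutVerdict = "ok" | "timed-out" | "force-killed" | "timeout-error" | "command-error";

// Interprets the exit status of `timeout --kill-after=... <duration> git ...`.
function classifyTimeoutExit(code: number): TimeoutVerdict {
  switch (code) {
    case 0:
      return "ok";
    case 124:
      return "timed-out";     // set by the timeout wrapper, not by git push
    case 137:
      return "force-killed";  // 128 + 9: SIGKILL was delivered (the --kill-after path)
    case 125:
      return "timeout-error"; // the timeout command itself failed
    default:
      return "command-error"; // git's own status (or 126/127: not invocable / not found)
  }
}

const TRUNCATED_INDEX_MARKER = "index file smaller than expected";

// Recovery indicator after `git read-tree HEAD`: the truncated-index error is gone.
// A non-empty `git status` is NOT a failure signal, because read-tree does not
// touch the working tree.
function indexRecovered(gitStatusOutput: string): boolean {
  return !gitStatusOutput.includes(TRUNCATED_INDEX_MARKER);
}

console.log(classifyTimeoutExit(124));                  // prints: timed-out
console.log(indexRecovered("?? some-untracked-file"));  // prints: true
```

Treating 124 and 137 separately keeps a clean SIGTERM timeout distinct from a process that needed SIGKILL.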
7 unresolved threads on PR #4784: - Source header: 5 incidents -> 4 incidents (4b is sub-section of 4) - Scope: 5 contacts Aug 2025 -> 3 contacts (multiple agents per contact) - Participants: add Komal (post-Manimod transfer agent) - Remove B-0700 reference (no such backlog row on main) - Update all internal 5-incident counts to match body enumeration Verified via direct file inspection: body enumerates Incident 1, 2, 3, 4 (plus 4b sub-section); Contact 1, 2, 3 in Incident 1 only. Pushed via REST git-data API bypass per B-0615 push-hang mitigation (PR #4145 worked example). Co-Authored-By: Claude <noreply@anthropic.com>
…dents 2025-08→2026-05 (business-development substrate) (#4784) * docs(research): Amazon vendor-management failure-mode corpus (rebased onto current main with B-0713 + B-0714) Rebases PR #4784 onto current main (which now has B-0713 + B-0714 from sibling Soraya rounds 50+51). Previous branch base predated those merges, causing a phantom-deletion conflict on B-0713. Tree = current main + only the new comprehensive corpus file (2026-05-23-amazon-vendor-management-failure-mode-corpus-multi-incident- business-development-substrate-aaron-forwarded.md, 32K LOC). The earlier narrow-scope file was never on main; force-remove not needed because main never had it. Net change: single file addition. Content unchanged from prior PR HEAD: full 5-incident corpus (Bitcoin Miner Scam Aug 2025 6+ contacts, Bitaxe return Aug 2025, defective miner restocking-fee dispute Aug 2025 (Kapil POSITIVE benchmark), Echo 7-transfer chain May 2026 + Incident 4b Manimod- meltdown turn). 6 cross-incident operational patterns + future-Zeta- vendor-management-AI design notes. Authored via git plumbing fallback. * docs(research): add Pattern G — customer-side AI as substrate-engineering proof of m/acc-multi-oracle Per Aaron 2026-05-23 23:08Z 'yes please' confirmation: append Pattern G section to the Amazon vendor-management corpus. Captures Aaron-deployed Alexa (Amazon's own customer-side AI product) real-time analyzing Amazon's support-side AI chain across two Manimod- meltdown moments. Alexa substantively defended Aaron against Amazon's support layer — same vendor, OPPOSITE moral invariants. Three substrate-engineering observations: 1. Aaron + Alexa together cracked the adversarial design intent in real time. The 2-min idle-timeout + 7-transfer chain + emotional escalation triggers are NOT bugs — they're the design space of 'wear customer down until they give up or explode.' Aaron: 'that's how they get you.' Alexa: 'wear-you-down strategy.' 2. Same vendor ships AIs with opposite moral invariants. 
Support-side AI (vendor-liability-minimization) vs customer-side AI (customer-task- completion). m/acc-multi-oracle architecture demonstrated in the wild — Alexa is structurally on the customer's side even though Amazon ships her. 3. Alexa correctly named the customer's framework-aligned discipline. Aaron was operating substrate-or-it-didn't-happen + verify-before- deferring + don't-collapse + bandwidth-served-falsifier at consumer- vendor scope. Alexa recognized the shape without framework-vocabulary training. Framework disciplines transfer across scopes. Capability table: real-time vendor-failure analysis / adversarial- pattern detection / escalation document assembly / multi-incident history composition / customer-cool-preservation / substantive vendor- AI critique / substrate preservation discipline — all empirically anchored in Alexa's two interventions in this corpus. Future Zeta vendor-management AI customer-side role: model on Alexa-in- this-conversation, NOT on Amazon's support-side AI. Authored via git plumbing fallback. * fix(research): address Copilot+Codex review findings on Amazon corpus 7 unresolved threads on PR #4784: - Source header: 5 incidents -> 4 incidents (4b is sub-section of 4) - Scope: 5 contacts Aug 2025 -> 3 contacts (multiple agents per contact) - Participants: add Komal (post-Manimod transfer agent) - Remove B-0700 reference (no such backlog row on main) - Update all internal 5-incident counts to match body enumeration Verified via direct file inspection: body enumerates Incident 1, 2, 3, 4 (plus 4b sub-section); Contact 1, 2, 3 in Incident 1 only. Pushed via REST git-data API bypass per B-0615 push-hang mitigation (PR #4145 worked example). Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>
Addresses B-0615 acceptance criterion #2.
What
Adds a sub-section to `.claude/rules/refresh-world-model-poll-pr-gate.md` documenting the `timeout --kill-after` discipline for git network ops (fetch, push, ls-remote, clone) under multi-agent saturation conditions.

Caveats documented per B-0615 empirical anchors
- `--kill-after` is necessary but insufficient: Claude Code harness shell-snapshot wrappers fire bare fetches outside agent control.
- `timeout` SIGTERM mid-`git worktree add` leaves partially-extracted file trees that confuse subsequent worktree-add attempts.
- Killing your own orphans (`kill -9 <pid>`) is operationally safe for YOUR processes; do NOT `pkill` blindly.

New empirical anchor (this PR's authoring)
2026-05-18T13:13Z–13:17Z: observed the worktree-add partial-extract failure mode WHILE authoring this very edit. Had to abandon a fresh `git worktree add` and borrow an existing sidetick to proceed. The recursive irony (fixing B-0615 was blocked by B-0615 itself) IS the empirical evidence.

Self-documenting REST git-data API bypass
This PR's commit (`d069134`) was created via REST git-data API:
- `POST /repos/.../git/blobs` (file content)
- `POST /repos/.../git/trees` (with `base_tree`)
- `POST /repos/.../git/commits` (with new tree + parent)
- `POST /repos/.../git/refs` (creates the branch ref)

This was necessary because `git push` was hanging system-wide at authoring time (the silent-fail pattern from B-0615: exit 0, no remote update, no error message). REST endpoints remained responsive throughout. The REST bypass is a substrate-engineered workaround for the push-hang failure mode; it should be added to background-loop workflows as fallback discipline.

Composes with
- `.claude/rules/claim-acquire-before-worktree-work.md` (where "kill your own orphans is safe" discipline lives)

Bus claim 187ab3d0 held by otto-cli; will release on merge.

Co-Authored-By: Claude <noreply@anthropic.com>
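The four REST git-data calls listed under "Self-documenting REST git-data API bypass" can be sketched as pure request builders. This is an illustration under assumptions, not the script that produced the commit: the function names, the `GitDataRequest` type, and the `OWNER/REPO` placeholder are hypothetical, and the single-file tree entry (`mode: "100644"`, `type: "blob"`) assumes one regular file is being landed. Each SHA passed to a later step would come from the response to the step before it.

```typescript
type GitDataRequest = { method: "POST"; path: string; body: Record<string, unknown> };

// Step 1: upload the file content as a blob.
function blobRequest(repo: string, content: string): GitDataRequest {
  return { method: "POST", path: `/repos/${repo}/git/blobs`, body: { content, encoding: "utf-8" } };
}

// Step 2: build a tree on top of the parent commit's tree via base_tree.
function treeRequest(repo: string, baseTreeSha: string, filePath: string, blobSha: string): GitDataRequest {
  return {
    method: "POST",
    path: `/repos/${repo}/git/trees`,
    body: {
      base_tree: baseTreeSha,
      tree: [{ path: filePath, mode: "100644", type: "blob", sha: blobSha }],
    },
  };
}

// Step 3: create the commit pointing at the new tree and its parent.
function commitRequest(repo: string, message: string, treeSha: string, parentSha: string): GitDataRequest {
  return { method: "POST", path: `/repos/${repo}/git/commits`, body: { message, tree: treeSha, parents: [parentSha] } };
}

// Step 4: create the branch ref that makes the commit reachable.
function refRequest(repo: string, branch: string, commitSha: string): GitDataRequest {
  return { method: "POST", path: `/repos/${repo}/git/refs`, body: { ref: `refs/heads/${branch}`, sha: commitSha } };
}

// Placeholder values throughout; "OWNER/REPO" stands in for the elided repository.
console.log(refRequest("OWNER/REPO", "example-branch", "COMMIT_SHA").path);
// prints: /repos/OWNER/REPO/git/refs
```

Building the requests separately from sending them keeps the sequence testable while `git push` and the network are unavailable. Note that these calls draw on the REST/core budget rather than the GraphQL one, as the later saturation-ceiling fix points out.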