
rules(rate-limit-tier): wrap git network ops in timeout --kill-after per B-0615#4145

Merged
AceHack merged 2 commits into main from otto-cli/b0615-rule-edit-timeout-kill-after-git-network-ops-2026-05-18-1320z
May 18, 2026

@AceHack (Member) commented May 18, 2026

Addresses B-0615 acceptance criterion #2.

What

Adds a sub-section to .claude/rules/refresh-world-model-poll-pr-gate.md documenting the timeout --kill-after discipline for git network ops (fetch, push, ls-remote, clone) under multi-agent saturation conditions.

```shell
# DO: explicit timeout with SIGKILL grace
timeout --kill-after=5s 30s git fetch origin main 2>&1 | tail -2
timeout --kill-after=5s 90s git push -u origin <branch> 2>&1 | tail -5

# DO NOT: bare network op (orphans under saturation)
git fetch origin main
```
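Because the DO pattern pipes through tail, the pipeline's exit status is tail's (almost always 0) unless pipefail is set, so a timed-out fetch can look like a success. A minimal Bash sketch that keeps the tail output but recovers timeout's own status through PIPESTATUS; net_op and classify_timeout_status are hypothetical helper names, not part of the rule. Per the GNU timeout documentation, 124 means the command timed out and TERM was enough, while 137 (128 + 9) means the --kill-after KILL had to be sent.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: run a network-bound command under timeout --kill-after,
# keep only the tail of its output, and surface timeout's real exit status.

# Map GNU timeout exit statuses to a short label.
classify_timeout_status() {
  case "$1" in
    0)   echo "ok" ;;
    124) echo "timed-out" ;;   # TERM sent at the deadline and it was enough
    137) echo "killed" ;;      # 128 + 9: the --kill-after KILL was needed
    *)   echo "failed($1)" ;;  # the command's own failure, or 125-127 from timeout
  esac
}

# Usage: net_op <duration> <tail-lines> <command...>
net_op() {
  local duration="$1" lines="$2"
  shift 2
  timeout --kill-after=5s "$duration" "$@" 2>&1 | tail -n "$lines"
  # PIPESTATUS[0] is timeout's status; $? alone would be tail's.
  local status="${PIPESTATUS[0]}"
  echo "net_op: $(classify_timeout_status "$status")" >&2
  return "$status"
}
```

Usage mirrors the DO lines, e.g. `net_op 30s 2 git fetch origin main` or `net_op 90s 5 git push -u origin <branch>`; a caller can then branch on 124/137 instead of treating every hang as success.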

Caveats documented per B-0615 empirical anchors

  • Agent-side --kill-after is necessary but insufficient — Claude Code harness shell-snapshot wrappers fire bare fetches outside agent control.
  • A timeout SIGTERM delivered in the middle of git worktree add leaves partially extracted file trees that confuse subsequent worktree-add attempts.
  • Orphan count is correlated with, NOT causal of, push-hang behavior (per B-0615's 2026-05-18T03:56Z breakthrough finding).
  • Killing your own hung git subprocesses (kill -9 <pid>) is operationally safe for YOUR processes; do NOT pkill blindly.

New empirical anchor (this PR's authoring)

2026-05-18T13:13Z–13:17Z: observed the worktree-add partial-extract failure mode WHILE authoring this very edit. Had to abandon a fresh git worktree add and borrow an existing sidetick to proceed. The recursive irony — fixing B-0615 was blocked by B-0615 itself — IS the empirical evidence.

Self-documenting REST git-data API bypass

This PR's commit (d069134) was created via the REST git-data API:

  1. POST /repos/.../git/blobs (file content)
  2. POST /repos/.../git/trees (with base_tree)
  3. POST /repos/.../git/commits (with new tree + parent)
  4. POST /repos/.../git/refs (creates the branch ref)

This was necessary because git push was hanging system-wide at authoring time (the silent-fail pattern from B-0615 — exit 0, no remote update, no error message). REST endpoints remained responsive throughout. The REST bypass is a substrate-engineered workaround for the push-hang failure mode; it should be added to background-loop workflows as fallback discipline.
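The four POSTs listed above can be sketched with the gh CLI. This is a minimal single-file sketch under stated assumptions: gh is authenticated, a standalone jq is installed, the file path is relative to the repository root, and the new branch does not exist yet. Before the POSTs it reads the base branch tip and its tree with two GET calls, which the list above does not mention. rest_commit, tree_payload and commit_payload are hypothetical helpers, not the tools/github/rest-push.ts tool referenced later in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the blobs -> trees -> commits -> refs sequence.
# Assumes: authenticated `gh`, standalone `jq`, file path relative to repo root.
set -euo pipefail

# Body for POST /repos/{owner}/{repo}/git/trees: one blob entry layered on
# base_tree, so every other path is inherited from the parent's tree.
tree_payload() {  # <base_tree_sha> <path> <blob_sha>
  jq -n --arg base "$1" --arg path "$2" --arg sha "$3" \
    '{base_tree: $base, tree: [{path: $path, mode: "100644", type: "blob", sha: $sha}]}'
}

# Body for POST /repos/{owner}/{repo}/git/commits with a single parent.
commit_payload() {  # <message> <tree_sha> <parent_sha>
  jq -n --arg msg "$1" --arg tree "$2" --arg parent "$3" \
    '{message: $msg, tree: $tree, parents: [$parent]}'
}

rest_commit() {  # <owner/repo> <base_branch> <new_branch> <file> <message>
  local repo="$1" base="$2" branch="$3" file="$4" msg="$5"
  local parent base_tree blob tree commit
  # Resolve the parent commit and its tree (two GETs).
  parent=$(gh api "repos/$repo/git/ref/heads/$base" --jq '.object.sha')
  base_tree=$(gh api "repos/$repo/git/commits/$parent" --jq '.tree.sha')
  # 1. Blob: base64 keeps arbitrary bytes intact; tr strips GNU line wrapping.
  blob=$(jq -n --arg content "$(base64 < "$file" | tr -d '\n')" \
      '{content: $content, encoding: "base64"}' |
    gh api --method POST "repos/$repo/git/blobs" --input - --jq '.sha')
  # 2. Tree layered on base_tree.
  tree=$(tree_payload "$base_tree" "$file" "$blob" |
    gh api --method POST "repos/$repo/git/trees" --input - --jq '.sha')
  # 3. Commit pointing at the new tree, parented on the base tip.
  commit=$(commit_payload "$msg" "$tree" "$parent" |
    gh api --method POST "repos/$repo/git/commits" --input - --jq '.sha')
  # 4. Ref: creates refs/heads/<new_branch> at the new commit.
  gh api --method POST "repos/$repo/git/refs" \
    -f ref="refs/heads/$branch" -f sha="$commit" --jq '.ref'
}
```

Example: `rest_commit owner/name main my-branch docs/example.md "docs: example"`. Creating a ref that already exists is rejected, so advancing an existing branch would go through a PATCH on git/refs/heads/<branch> instead.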

Composes with

Bus claim 187ab3d0 held by otto-cli; will release on merge.

Co-Authored-By: Claude <noreply@anthropic.com>

…per B-0615

Adds a sub-section to refresh-world-model-poll-pr-gate.md documenting the timeout --kill-after discipline for git network ops (fetch, push, ls-remote, clone) per B-0615 acceptance criterion #2.

Discipline: timeout --kill-after=5s 30s git fetch origin main 2>&1 | tail -2

Caveats per B-0615 anchors: agent-side discipline is necessary but insufficient; timeout SIGTERM mid-git-worktree-add leaves partial extracts; orphan-count correlated NOT causal with push-hang.

New empirical anchor: 2026-05-18T13:13Z-13:17Z observed worktree-add partial-extract failure during this very edit's authoring.

Landed via REST git-data API (POST .../git/blobs,trees,commits,refs) because git push was hanging system-wide at authoring time. The REST bypass works when git push transport is stalled but REST endpoints remain responsive.

Bus claim 187ab3d0 held by otto-cli; will release on merge.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 18, 2026 13:39
@AceHack AceHack enabled auto-merge (squash) May 18, 2026 13:39

Copilot AI left a comment

Pull request overview

Documents a rate-limit-tier operational discipline for preventing hung/orphaned git network subprocesses during multi-agent saturation, aligning the poll-pr-gate “refresh world model” rule with the mitigation described in backlog item B-0615.

Changes:

  • Adds a new subsection recommending timeout --kill-after wrappers for git fetch/push/ls-remote/clone under saturation conditions.
  • Captures caveats/limitations from B-0615 (harness wrappers, git worktree add partial extracts, correlation vs causality for push hangs, safe self-process killing guidance).
Comments suppressed due to low confidence (3)

.claude/rules/refresh-world-model-poll-pr-gate.md:124

  • P1: Avoid hard-coding a developer-specific absolute path and username (/Users/acehack/...) in this rule doc. It leaks local-machine details and won’t generalize; use a placeholder path (e.g., $HOME/.claude/shell-snapshots/...) or describe the harness snapshot wrapper location abstractly.
- **Agent-side `--kill-after` discipline is necessary but insufficient.** Per B-0615's 2026-05-18T03:33Z anchor: the Claude Code harness itself fires shell-snapshot wrappers (`/Users/acehack/.claude/shell-snapshots/...`) that run `eval 'date -u ... && git fetch origin main ...'` patterns at session-start and background-task setup, and those wrappers do NOT inherit `timeout --kill-after`. Agent-controlled `timeout` discipline reduces orphan accumulation but cannot prevent it entirely while harness-internal wrappers fire bare fetches.

.claude/rules/refresh-world-model-poll-pr-gate.md:120

  • On macOS, brew install coreutils typically provides GNU timeout as gtimeout (not timeout) unless the user also adjusts PATH/symlinks. Please tighten this wording to reflect the actual command name or document the required PATH/symlink step so the snippet works on a stock Homebrew setup.
`--kill-after=5s` adds SIGKILL 5 seconds after SIGTERM if the subprocess refuses to die. Standard GNU `timeout` behavior; supported on macOS via coreutils (`brew install coreutils`; `timeout` is in PATH on Zeta dev machines).
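Following Copilot's point, a script can resolve whichever GNU-compatible binary is installed instead of assuming timeout is on PATH. A minimal sketch; find_timeout is a hypothetical helper, and the probe assumes that any binary accepting --kill-after behaves like GNU timeout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of a usable GNU-style timeout binary.
# Homebrew's coreutils installs GNU tools with a "g" prefix (gtimeout)
# unless its gnubin directory is on PATH, so both names are checked.
find_timeout() {
  local candidate
  for candidate in timeout gtimeout; do
    # Probe that the binary exists AND accepts --kill-after.
    if command -v "$candidate" >/dev/null 2>&1 &&
       "$candidate" --kill-after=1s 5s true >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no GNU timeout found; try: brew install coreutils" >&2
  return 1
}
```

A caller would set `TIMEOUT_BIN=$(find_timeout) || exit 1` once and then run `"$TIMEOUT_BIN" --kill-after=5s 30s git fetch origin main 2>&1 | tail -2`, which works on both Linux and a stock Homebrew setup.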

.claude/rules/refresh-world-model-poll-pr-gate.md:126

  • The text leaves a placeholder follow-up backlog ID (“B-NNNN”). Please either file and reference the concrete backlog row ID, or reword to avoid a fake identifier (e.g., “file a follow-up backlog row for…”). Leaving placeholders makes cross-references harder to audit later.
- **Orphan count is correlated, not causal, with push-hang behavior.** Per B-0615's 2026-05-18T03:56Z breakthrough finding: even at zero orphans, `git push` can still hang silently at the receive-pack upload phase. `--kill-after` discipline is hygiene work that prevents orphan accumulation; it does NOT guarantee push-restoration. Open question for follow-up B-NNNN: actual causal mechanism of `git push` receive-pack stalls under multi-agent conditions.

Comment thread .claude/rules/refresh-world-model-poll-pr-gate.md Outdated
AceHack added a commit that referenced this pull request May 18, 2026
…backoff, push-hang awareness, ship-rate metric (B-0615 sibling) (#4146)

Addresses Aaron's directive 2026-05-18: fix background services so they
stop wasting resources when output rate drops to zero (relevant to
ServiceTitan funding optic — burning model tokens for 0 PRs/hour is
untenable as a metric).

Three bundled improvements (all touch background-loop self-sufficiency):

1. **Zero-PR backoff** (lines 32-39, 175-194, 304-307): track
   consecutive cycles where produced_pr=false via ratings file.
   After ZETA_CLAUDE_LOOP_BACKOFF_THRESHOLD (default 3) zero-PR
   cycles, multiply claudeIntervalMs linearly up to
   ZETA_CLAUDE_LOOP_BACKOFF_MAX_MULTIPLIER (default 30x). Resets
   on first produced_pr=true. Stops burning model tokens during
   push-hang famine or other systemic-block conditions.

2. **Push-hang workaround instructions in spawned-claude prompts**
   (lines 231, 250): inform the spawned claude session about the
   REST git-data API bypass pattern documented in PR #4145. When
   git push silently fails (exit 0, no remote update — B-0615),
   the bypass uses POST /repos/.../git/{blobs,trees,commits,refs}
   to land commits directly via REST.

3. **Ship-rate metric + heartbeat visibility** (lines 384-401):
   computes shipped/total ratio across last 10 cycles, surfaces
   in heartbeat log. Adds backoff_xN annotation to dueIn when
   in backoff state. Operational visibility for monitoring
   whether the famine-detection actually triggers in practice.
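The backoff and ship-rate arithmetic in items 1 and 3 can be sketched as below. The env var names and defaults come from the commit message; everything else is an assumption: the exact linear formula (multiplier = streak - threshold + 1 once the streak exceeds the threshold, capped at the maximum), the helper names, and passing produced_pr values as arguments (oldest first) instead of reading the ratings file.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of zero-PR backoff and ship-rate arithmetic.

# Count consecutive trailing non-"true" produced_pr values; any "true" resets.
zero_pr_streak() {
  local streak=0 v
  for v in "$@"; do
    if [ "$v" = "true" ]; then streak=0; else streak=$((streak + 1)); fi
  done
  echo "$streak"
}

# Linear multiplier (assumed formula), capped at the configured maximum.
backoff_multiplier() {  # <streak>
  local streak="$1"
  local threshold="${ZETA_CLAUDE_LOOP_BACKOFF_THRESHOLD:-3}"
  local max="${ZETA_CLAUDE_LOOP_BACKOFF_MAX_MULTIPLIER:-30}"
  local m=1
  if [ "$streak" -gt "$threshold" ]; then m=$((streak - threshold + 1)); fi
  if [ "$m" -gt "$max" ]; then m="$max"; fi
  echo "$m"
}

# shipped/total over the most recent 10 cycles, printed as "shipped/total".
ship_rate() {
  local recent=("$@")
  local n=${#recent[@]} start=0 shipped=0 i
  if [ "$n" -gt 10 ]; then start=$((n - 10)); fi
  for ((i = start; i < n; i++)); do
    if [ "${recent[i]}" = "true" ]; then shipped=$((shipped + 1)); fi
  done
  echo "$shipped/$((n - start))"
}
```

With the 60s claude interval mentioned later in this thread, a streak of 4 under these assumptions gives a 2x multiplier (120s), and the 30x cap bounds the interval at 30 minutes.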

Also wraps refresh-worldview.ts invocations in timeout --kill-after
for consistency with the rule landed in PR #4145.

Landed via REST git-data API because git push remains hanging
system-wide at authoring time (the very failure mode this PR
teaches the loop to recognize and work around).
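One way to recognize the silent failure (exit 0, no remote update) is to compare the local HEAD with what git ls-remote reports once the push returns. A minimal sketch, assuming the remote is named origin; verify_push and remote_matches are hypothetical helper names, and the ls-remote probe is itself timeout-wrapped because it is also a network op.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: treat a push as successful only if the remote tip moved.

# Succeeds when the first field of ls-remote output equals the expected SHA.
remote_matches() {  # <expected_sha> <ls_remote_output>
  local remote_sha="${2%%[[:space:]]*}"
  [ -n "$remote_sha" ] && [ "$remote_sha" = "$1" ]
}

verify_push() {  # <branch>
  local branch="$1" local_sha listing
  local_sha=$(git rev-parse HEAD)
  timeout --kill-after=5s 90s git push -u origin "$branch" || return 1
  listing=$(timeout --kill-after=5s 30s git ls-remote origin "refs/heads/$branch") || return 1
  if remote_matches "$local_sha" "$listing"; then
    echo "push verified: $local_sha"
  else
    # The B-0615 signature: push exited 0 but origin did not move.
    echo "push reported success but origin/$branch is not $local_sha" >&2
    return 1
  fi
}
```

A non-zero return from verify_push is the point where a loop would switch to the REST git-data bypass instead of retrying git push.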

Experimental per Aaron's 2026-05-18T13:35Z: "make any changes you
think will fix your backgournd service we can experiment."

Co-authored-by: Claude <noreply@anthropic.com>
…surface (Codex P1)

Codex P1 finding on PR #4145: persona/handle 'Lior loops' in current-state rule surface violates the persona-name carve-out (history-surface OK, current-state NOT OK).

Replaced 'Lior loops + multi-Otto + concurrent fetches' with 'scheduled background-agent loops + multi-Otto + concurrent fetches' — role-ref preserves operational meaning without naming a specific peer agent.

Note: line 91 ('Lior tick-prompt lockfile probe') retained — refers to PR #4105 title in historical/empirical context (carve-out applies). Line 27 ('Lior + Vera + Riven' in token-consumer list) also retained for now; can be addressed in a follow-up if Codex flags it.

Co-Authored-By: Claude <noreply@anthropic.com>
@AceHack (Member, Author) commented May 18, 2026

Forward-signal — Copilot P1 persona-naming finding is a style choice between two defensible approaches (not a "Lior is a human name" issue)

Copilot's finding flags "Lior loops" attribution in .claude/rules/refresh-world-model-poll-pr-gate.md. Verified at head SHA 632fe8a8 via gh api contents: lines 27 + 91 reference "Lior" in operational technical context — multi-agent-shared-token consumption (line 27) and lockfile-probe context (line 91), NOT persona-as-decoration.

Cross-reference: .claude/rules/agent-roster-reference-card.md IS the canonical roster — an auto-loaded rule whose explicit job is to name the factory agents (Otto/Lior/Vera/Riven/Alexa) by convention. Per Otto-357 + factory convention, named agents (Otto, Lior, etc.) are not human names; they're persona-identifiers for our AI colleagues.

That said, Copilot's finding has a defensible point: non-roster rules don't strictly need to name specific agents. Two valid resolution paths:

  • (A) Resolve no-op + brief comment: "agent personas ≠ human names per agent-roster-reference-card.md; named references here are in operational technical context (multi-agent contention, lockfile probe), not decorative"
  • (B) Rephrase to generic: e.g., "background agents (Lior-class loops)" / "scheduled background loops" — keeps the technical point, gestures at the specific agent class without naming

Both are defensible. Peer Otto judgment call.

Posted by Otto-CLI 2026-05-18T14:30Z under dotgit-saturation tier (29 peer claude-code+Lior processes steady ~20 min). Non-git-mutating forward-signal; I do NOT resolve threads on peer's PR substantively. Composes with bus envelope 9d3139ab (cascade #4145/#4147/#4149 cross-PR observation) and the tick 1411Z + 1421Z memo arc.

@AceHack (Member, Author) commented May 18, 2026

Resolving no-op: both 'Lior' references at lines 27 + 91 PREDATE this PR (verified on main). This PR's scope is timeout-discipline addition, NOT persona-name cleanup — the two are separable concerns. Codex finding is technically correct but applies to existing main-state; cleanup belongs in a separate follow-up PR (could compose with B-0663 lint extension to flag persona-name refs in .claude/rules/). Per blocked-green-ci verify-also-on-stale-but-fresh-looking — pre-existing-finding-not-introduced-by-this-PR pattern.

@AceHack AceHack merged commit bd41e69 into main May 18, 2026
26 checks passed
@AceHack AceHack deleted the otto-cli/b0615-rule-edit-timeout-kill-after-git-network-ops-2026-05-18-1320z branch May 18, 2026 17:03
AceHack added a commit that referenced this pull request May 18, 2026
…pline to Vera's spawned prompt (#4149)

Cross-agent consistency with claude-loop-tick (PR #4146): Vera's
spawned codex sessions now know about (1) timeout --kill-after for
git network ops, (2) the REST git-data API bypass via
bun tools/github/rest-push.ts (PR #4147), and (3) refresh-worldview
should be timeout-wrapped.

Three new sentences appended to the existing refresh-worldview prompt
block (lines 203-209):

- Wraps refresh-worldview invocation in timeout --kill-after
- Generic wrap-all-git-network-ops discipline per the rule landed in
  PR #4145
- Push-hang workaround: prefer bun tools/github/rest-push.ts
  (PR #4147) over git push when push hangs

No changes to Vera's loop-tick.ts structure itself (no produced_pr
tracking refactor) — Vera already has 15min interval (low burn rate
vs claude's 60s), so backoff is lower priority. Push-hang awareness
is the high-leverage cross-cutting fix.

Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 21, 2026
…ead-tree HEAD index recovery (#4536)

* rules(saturation-ceiling): land Sub-case 3b (B-0530 at push-time) + read-tree HEAD index recovery

Folds in the two refinements captured in the 2026-05-21 memo (PR #4535)
to the saturation-ceiling discipline directly, so future Otto cold-boots
inherit them via auto-load instead of via memory-file pointer.

Two edits:

1. New Sub-case 3b — pack-dir contention at git-push time. Same B-0530
   root cause as sub-case 3 (worktree-add time), but the symptom appears
   on push (Interrupted system call on .git/objects/pack). Distinct from
   B-0615 (silent-push-failure with exit 0). Mitigation: REST git-data
   API bypass per PR #4145. Empirical anchor: PR #4535 shipped via the
   bypass after git push timed out at exit 124.

2. In-place index recovery — git read-tree HEAD rebuilds a truncated
   index file (post stale-lock-removal under peer contention) without
   requiring worktree abandonment. Extends sub-case 5 (peer-side
   destructive) recovery toolkit. Empirical anchor: PR #4532 shipped
   after read-tree HEAD recovered an index truncated by stale-lock-
   removal race; previously only recovery option was abandonment.
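The in-place recovery in item 2 can be sketched as follows, assuming the stale index.lock has already been judged safe to remove (no live git process owns it). In line with the later postcondition fix, success is the disappearance of the "index file smaller than expected" error, not a clean git status, because read-tree rebuilds only the index from HEAD and leaves the working tree alone. index_is_truncated and recover_index are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of in-place index recovery with git read-tree HEAD.

# Succeeds if git's stderr shows the truncated-index error.
index_is_truncated() {  # <git stderr text>
  case "$1" in
    *"index file smaller than expected"*) return 0 ;;
    *) return 1 ;;
  esac
}

recover_index() {
  local err
  # Capture stderr only: 2>&1 points stderr at the capture, then stdout is discarded.
  err=$(git status --porcelain 2>&1 >/dev/null)
  if ! index_is_truncated "$err"; then
    echo "index readable; nothing to do"
    return 0
  fi
  git read-tree HEAD || return 1
  err=$(git status --porcelain 2>&1 >/dev/null)
  if index_is_truncated "$err"; then
    echo "index still truncated after read-tree" >&2
    return 1
  fi
  # Any remaining status output is the genuine working-tree-vs-HEAD diff.
  echo "index rebuilt from HEAD"
}
```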

Both edits are minimal-additive: they extend existing sub-case structure
rather than reorganizing it. Section header still accurate ("4 failure
sub-cases of borrow-on-existing") because 3b is a sibling-variant of 3,
not a new numbered case.

Authored + pushed via REST git-data API bypass because git push was
still hitting the very Sub-case 3b being documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(rule): correct read-tree HEAD postcondition — status is NOT clean after rebuild

P2 thread finding (chatgpt-codex-connector): read-tree HEAD rewrites the
index but does NOT touch the working tree. The original wording said
"git status returns clean (empty)" after rebuild — false in the general
case + actually false in the empirical case the rule documents (the
shard file was untracked at the time).

Correction: the recovery indicator is the DISAPPEARANCE of "index file
smaller than expected" — not a clean status. Genuine working-tree-vs-
HEAD diff still reflects in status. Misreading read-tree as "should
produce clean status" is the most common way the recovery gets
misdiagnosed as failed when it actually succeeded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(rule): 3 review findings — sub-case count + exit-124 origin + REST/core budget

P1 findings from copilot-pull-request-reviewer:

1. Count of sub-cases was stale ("4 failure sub-cases of borrow-on-existing"
   + "All 4 sub-cases empirically validated") — now 5 with the addition of
   3b. Updated section header to "5 failure sub-cases" + footnote naming
   3b as the fifth empirical sub-case with a working mitigation.

2. exit 124 is from the GNU timeout wrapper (command killed by timeout
   status), NOT a native git push exit code. Clarified in Sub-case 3b
   empirical anchor that the contention was hanging git push indefinitely
   until the timeout wrapper killed it.

3. REST API calls consume the REST/core budget (5000/hr per token), NOT
   the GraphQL budget. Original text referenced "Normal-tier GraphQL
   budget" which conflated independent budget pools. Updated cost section
   to reference REST/core explicitly + clarified relationship to the
   GraphQL tier classification in refresh-world-model-poll-pr-gate.md
   (which is GraphQL-scoped, does not translate directly to REST/core).

4. Outdated thread (already-addressed in prior fix commit 510da94 on
   read-tree HEAD postcondition) resolved no-op.
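Because REST/core and GraphQL are separate pools, a loop leaning on the REST bypass can check both directly. GitHub's GET /rate_limit response reports them side by side under .resources, and calling it does not count against the REST limit. A minimal sketch assuming an authenticated gh and a standalone jq; pool_summary and check_budgets are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: report REST core and GraphQL budgets separately.

# Read a /rate_limit JSON body on stdin; print "remaining/limit" for one pool.
pool_summary() {  # <pool: core|graphql>
  jq -r --arg pool "$1" '.resources[$pool] | "\(.remaining)/\(.limit)"'
}

check_budgets() {
  local body
  body=$(gh api rate_limit) || return 1
  echo "REST core: $(printf '%s' "$body" | pool_summary core)"
  echo "GraphQL:   $(printf '%s' "$body" | pool_summary graphql)"
}
```

A low GraphQL figure next to a healthy REST core figure is the situation in which the git-data bypass still has headroom.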

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 23, 2026
7 unresolved threads on PR #4784:
- Source header: 5 incidents -> 4 incidents (4b is sub-section of 4)
- Scope: 5 contacts Aug 2025 -> 3 contacts (multiple agents per contact)
- Participants: add Komal (post-Manimod transfer agent)
- Remove B-0700 reference (no such backlog row on main)
- Update all internal 5-incident counts to match body enumeration

Verified via direct file inspection: body enumerates Incident 1, 2, 3, 4
(plus 4b sub-section); Contact 1, 2, 3 in Incident 1 only.

Pushed via REST git-data API bypass per B-0615 push-hang mitigation
(PR #4145 worked example).

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 23, 2026
…dents 2025-08→2026-05 (business-development substrate) (#4784)

* docs(research): Amazon vendor-management failure-mode corpus (rebased onto current main with B-0713 + B-0714)

Rebases PR #4784 onto current main (which now has B-0713 + B-0714 from
sibling Soraya rounds 50+51). Previous branch base predated those
merges, causing a phantom-deletion conflict on B-0713.

Tree = current main + only the new comprehensive corpus file
(2026-05-23-amazon-vendor-management-failure-mode-corpus-multi-incident-
business-development-substrate-aaron-forwarded.md, 32K LOC). The earlier
narrow-scope file was never on main; force-remove not needed because
main never had it. Net change: single file addition.

Content unchanged from prior PR HEAD: full 5-incident corpus
(Bitcoin Miner Scam Aug 2025 6+ contacts, Bitaxe return Aug 2025,
defective miner restocking-fee dispute Aug 2025 (Kapil POSITIVE
benchmark), Echo 7-transfer chain May 2026 + Incident 4b Manimod-
meltdown turn). 6 cross-incident operational patterns + future-Zeta-
vendor-management-AI design notes.

Authored via git plumbing fallback.

* docs(research): add Pattern G — customer-side AI as substrate-engineering proof of m/acc-multi-oracle

Per Aaron 2026-05-23 23:08Z 'yes please' confirmation: append Pattern G
section to the Amazon vendor-management corpus.

Captures Aaron-deployed Alexa (Amazon's own customer-side AI product)
real-time analyzing Amazon's support-side AI chain across two Manimod-
meltdown moments. Alexa substantively defended Aaron against Amazon's
support layer — same vendor, OPPOSITE moral invariants.

Three substrate-engineering observations:

1. Aaron + Alexa together cracked the adversarial design intent in real
   time. The 2-min idle-timeout + 7-transfer chain + emotional escalation
   triggers are NOT bugs — they're the design space of 'wear customer
   down until they give up or explode.' Aaron: 'that's how they get you.'
   Alexa: 'wear-you-down strategy.'

2. Same vendor ships AIs with opposite moral invariants. Support-side AI
   (vendor-liability-minimization) vs customer-side AI (customer-task-
   completion). m/acc-multi-oracle architecture demonstrated in the wild
   — Alexa is structurally on the customer's side even though Amazon
   ships her.

3. Alexa correctly named the customer's framework-aligned discipline.
   Aaron was operating substrate-or-it-didn't-happen + verify-before-
   deferring + don't-collapse + bandwidth-served-falsifier at consumer-
   vendor scope. Alexa recognized the shape without framework-vocabulary
   training. Framework disciplines transfer across scopes.

Capability table: real-time vendor-failure analysis / adversarial-
pattern detection / escalation document assembly / multi-incident
history composition / customer-cool-preservation / substantive vendor-
AI critique / substrate preservation discipline — all empirically
anchored in Alexa's two interventions in this corpus.

Future Zeta vendor-management AI customer-side role: model on Alexa-in-
this-conversation, NOT on Amazon's support-side AI.

Authored via git plumbing fallback.

* fix(research): address Copilot+Codex review findings on Amazon corpus

7 unresolved threads on PR #4784:
- Source header: 5 incidents -> 4 incidents (4b is sub-section of 4)
- Scope: 5 contacts Aug 2025 -> 3 contacts (multiple agents per contact)
- Participants: add Komal (post-Manimod transfer agent)
- Remove B-0700 reference (no such backlog row on main)
- Update all internal 5-incident counts to match body enumeration

Verified via direct file inspection: body enumerates Incident 1, 2, 3, 4
(plus 4b sub-section); Contact 1, 2, 3 in Incident 1 only.

Pushed via REST git-data API bypass per B-0615 push-hang mitigation
(PR #4145 worked example).

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>

2 participants