From 7ea4dc3e41cc5e72dbaf0ab9c556fc44eea7e490 Mon Sep 17 00:00:00 2001
From: Aaron Stainback
Date: Fri, 15 May 2026 22:47:00 -0400
Subject: [PATCH 1/2] =?UTF-8?q?shard(tick):=202026-05-16T02:45Z=20?=
 =?UTF-8?q?=E2=80=94=20GraphQL=20exhaust=20on=20--all-open=20poll?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Bg-worker session opened, refreshed worldview (45 PRs), then poll-pr-gate-batch.ts --all-open hit graphql 0/5000 across 44 errors
- Documented exact failure mode warned by refresh-world-model-poll-pr-gate.md cost-awareness section (1.4 --all-open polls per agent-hour with 4 agents sharing Aaron's token)
- Bounded named-dependency wait: graphql resets 02:55:28Z (~10 min)
- N=1 brief-ack with concrete substrate output; counter resets per holding-without-named-dependency clause
- Lior-gemini active (ps -A); honored no-new-worktree safe-window check
- Otto-owned PRs #3677, #3690, #3649 surveyed via REST (GraphQL-free) for next-tick targeted poll plan

Composes with: refresh-world-model-poll-pr-gate.md, holding-without-named-dependency-is-standing-by-failure.md, codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md
---
 .../hygiene-history/ticks/2026/05/16/0245Z.md | 76 +++++++++++++++++++
 1 file changed, 76 insertions(+)
 create mode 100644 docs/hygiene-history/ticks/2026/05/16/0245Z.md

diff --git a/docs/hygiene-history/ticks/2026/05/16/0245Z.md b/docs/hygiene-history/ticks/2026/05/16/0245Z.md
new file mode 100644
index 000000000..208e51edd
--- /dev/null
+++ b/docs/hygiene-history/ticks/2026/05/16/0245Z.md
@@ -0,0 +1,76 @@
+# Tick 2026-05-16T02:45Z — Otto-CLI (bg-worker)
+
+**Surface**: Otto-CLI bg-worker session (Claude Code, Opus 4.7, autonomous-loop tick)
+**Parent tick**: [2026-05-16T02:34Z](0234Z.md) — multi-Otto duplicate-work catch + B-0545 scope re-assessment
+
+## What landed this tick
+
+No code substrate; **substrate-honest pivot tick** documenting a known cascade-cost failure mode firing exactly as the rules predict, then turning the bounded named-dependency wait into decomposition.
+
+## Catch: GraphQL rate-limit exhausted on `--all-open` poll
+
+Sequence:
+
+- 02:39Z — `bun tools/github/refresh-worldview.ts` succeeded (45 open PRs, summary line)
+- 02:41Z — `bun tools/github/poll-pr-gate-batch.ts --all-open` (intent: identify BLOCKED PRs with `nextAction=resolve-threads`)
+- 02:42Z — all 44 per-PR `gh pr view` calls returned `GraphQL: API rate limit already exceeded for user ID 578953`; `reports: []`, `errors: [44]`
+- 02:42Z — `gh api rate_limit` confirmed: **graphql 0/5000 remaining, resets at 1778900128 = 2026-05-16T02:55:28Z** (~13 min ETA)
+
+Core REST budget intact (4894/5000). Search budget intact (30).
+
+This is the **exact failure mode** [`refresh-world-model-poll-pr-gate.md`](../../../../../../.claude/rules/refresh-world-model-poll-pr-gate.md#cost-awareness-under-cascade-conditions) warns about:
+
+> `poll-pr-gate-batch.ts --all-open` fires ~N `gh pr view` calls (N = open PR count, ~37 in cascade-mode). At 4× polls per cascade hour + cross-tick `gh` operations + multi-agent shared-token consumption (Otto-CLI + Otto-Desktop + Lior + Vera + Riven all draw from Aaron's user-token), the 5000 GraphQL/hour budget exhausts. Avoid `--all-open` more than 1-2× per cascade window.
+
+The empirical anchor it cites (2026-05-15T22:21Z, `{remaining:0, used:5005}`) just repeated 2026-05-16T02:42Z, `{remaining:0, used:5000}`. Same failure shape, ~4h later. The rule encodes prevention but the discipline lives in the agent — this session ran `--all-open` once as the canonical wake-and-orient action, didn't notice it was the second-or-third such call this cascade window (peer-Otto sessions share the budget per `gh_graphql_rate_limit_cascade_cost_*` memo).
+
+**Substrate-honest framing**: Aaron's user-token is the shared resource across all autonomous-loop surfaces. The 5000/hr budget is roughly 1.4 `--all-open` polls per agent-hour when 4 agents share it. Cascade ticks (multiple PR landings in a 30-min window) double or triple per-agent draw on top of that.
+
+## Why this was NOT a refresh-discipline violation
+
+Per [`refresh-before-decide.md`](../../../../../../.claude/rules/refresh-before-decide.md): refresh-before-decide is mandatory at session start. The 02:39Z `refresh-worldview.ts` call covered the invariant — that script uses a single GraphQL query (efficient). The cost-amplifying call was the per-PR `--all-open` batch poll that followed.
+
+The discipline correction is: refresh-worldview gives 45-PR list + per-PR metadata (including `autoMergeRequest`, `reviewDecision`, `headRefName`, etc.) in ONE GraphQL call. That should be enough to make first-cut triage decisions without the per-PR `gh pr view` storm. **If a specific PR shows `reviewDecision: CHANGES_REQUESTED` or has an Otto-owned `headRefName` worth investigating, THEN do the focused poll** — not the all-open scan as default.
+
+## Holding-discipline state
+
+**N=1 brief-ack**. This tick is producing substrate (the shard) AND staying within the [`holding-without-named-dependency-is-standing-by-failure.md`](../../../../../../.claude/rules/holding-without-named-dependency-is-standing-by-failure.md) discipline:
+
+- Named dependency: GraphQL reset at **02:55:28Z** (10 min from now)
+- Bounded ETA: yes
+- Decomposition work this tick: this shard + worldview-vs-batch-poll cost analysis (above)
+
+Counter resets per "actually picking real decomposition work" clause (concrete artifact = this shard; bounded scope = one tick; not a brief-ack with synonyms).
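The first-cut triage that the shard describes under "Why this was NOT a refresh-discipline violation" could be sketched roughly as follows. This is an illustration only: the `WorldviewPr` shape, the `selectForFocusedPoll` helper, the `otto/` branch prefix, and the sample PR numbers are all hypothetical. Only the field names `reviewDecision`, `headRefName`, and `autoMergeRequest` come from the shard, and the real `refresh-worldview.ts` output may be shaped differently.

```typescript
// Hypothetical record shape; only the three field names are taken from the shard.
interface WorldviewPr {
  number: number;
  headRefName: string;
  reviewDecision: "APPROVED" | "CHANGES_REQUESTED" | "REVIEW_REQUIRED" | null;
  autoMergeRequest: object | null;
}

// Assumption: Otto-owned branches can be recognised by a branch-name prefix.
function isOttoOwned(pr: WorldviewPr, ottoPrefix: string): boolean {
  return pr.headRefName.startsWith(ottoPrefix);
}

// Pick only the PRs that justify a focused per-PR poll, instead of
// polling every open PR by default.
function selectForFocusedPoll(prs: WorldviewPr[], ottoPrefix: string): number[] {
  return prs
    .filter(
      (pr) =>
        pr.reviewDecision === "CHANGES_REQUESTED" || isOttoOwned(pr, ottoPrefix),
    )
    .map((pr) => pr.number);
}

// Made-up sample data, not real PRs from the repository.
const sample: WorldviewPr[] = [
  { number: 101, headRefName: "otto/tick-shard", reviewDecision: null, autoMergeRequest: {} },
  { number: 102, headRefName: "feature/other", reviewDecision: "APPROVED", autoMergeRequest: null },
  { number: 103, headRefName: "feature/fix", reviewDecision: "CHANGES_REQUESTED", autoMergeRequest: null },
];

const targets: number[] = selectForFocusedPoll(sample, "otto/");
```

Under these assumptions the per-tick GraphQL cost scales with the number of selected PRs rather than with the total open-PR count.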
+
+## Concurrent-Otto / Lior state observed
+
+- 18 `claude-code` processes detected (`ps -A`)
+- 2 `gemini.*Lior` processes — **active**; per [`codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md`](../../../../../../.claude/rules/codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md) safe-window check, do **NOT** create new worktrees this window
+- 884 entries in `git worktree list` (many `prunable`) — heavy historical contention
+- This session uses its existing dedicated worktree (`/Users/acehack/.local/share/zeta-claude-loop/Zeta/.claude/worktrees/floating-wibbling-hoare`); no new-worktree-creation in this tick
+
+Operative pattern: **switch-on-existing** with `git switch -c FETCH_HEAD` (per [`claim-acquire-before-worktree-work.md`](../../../../../../.claude/rules/claim-acquire-before-worktree-work.md) borrow-on-existing pattern adapted for own-worktree). No ref-lock contention observed; commit + push expected clean.
+
+## Otto-owned PR state (from REST, GraphQL-free)
+
+| PR | head_sha | mergeable_state | observations |
+|----|----------|-----------------|--------------|
+| [#3677](https://github.com/Lucent-Financial-Group/Zeta/pull/3677) | `2e6a87a` | `unknown` (still computing) | 2 review threads: Copilot P1 on `0044Z.md` PR-description size mismatch; AceHack already replied 02:34:45Z explaining +88/-0 reframe. Auto-merge armed by AceHack 02:12:59Z. |
+| [#3690](https://github.com/Lucent-Financial-Group/Zeta/pull/3690) | `d367ab1` | `blocked` | Newly opened (02:35Z); post-merge triage shard on PR #3685 (1 stale / 1 false / 1 real). Need GraphQL to check unresolved threads. |
+| [#3649](https://github.com/Lucent-Financial-Group/Zeta/pull/3649) | `a966424` | `unknown` | bg-worker triage shard from 00:44Z. Probably superseded by later activity. |
+
+## Next-tick gate (post-02:55:28Z GraphQL reset)
+
+1. **Single targeted `poll-pr-gate.ts` per Otto-owned PR** (#3677, #3690, #3649) — narrow scope, fits well within per-tick GraphQL budget
+2. **If any show `gate=BLOCKED` + `nextAction=resolve-threads`**: read threads via single `gh api graphql` reviewThreads query per PR, address findings, push
+3. **Do NOT re-run `--all-open`** this cascade window — refresh-worldview's summary suffices for non-Otto-owned coordination
+4. **If GraphQL still tight**: defer thread resolution to next cron tick (02:56Z+); peer-Otto sessions will continue grinding their own lanes meanwhile
+
+## Composes with substrate
+
+- [`refresh-world-model-poll-pr-gate.md`](../../../../../../.claude/rules/refresh-world-model-poll-pr-gate.md) — the rule whose cost-awareness section this tick empirically re-validates
+- [`memory/feedback_gh_graphql_rate_limit_cascade_cost_poll_pr_gate_batch_n_per_call_multi_agent_shared_token_2026_05_15.md`](../../../../../../memory/feedback_gh_graphql_rate_limit_cascade_cost_poll_pr_gate_batch_n_per_call_multi_agent_shared_token_2026_05_15.md) — prior empirical anchor (2026-05-15T22:21Z)
+- [`holding-without-named-dependency-is-standing-by-failure.md`](../../../../../../.claude/rules/holding-without-named-dependency-is-standing-by-failure.md) — counter-with-escalation discipline; this tick's brief-ack is N=1 with named ETA
+- [`tick-must-never-stop.md`](../../../../../../.claude/rules/tick-must-never-stop.md) — cron sentinel armed at session start (`55d74778`); shard cadence preserved
+- [`claim-acquire-before-worktree-work.md`](../../../../../../.claude/rules/claim-acquire-before-worktree-work.md) — switch-on-existing pattern used here (not borrow-from-peer, no peer contention)
+- [`codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md`](../../../../../../.claude/rules/codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md) — Lior-active check honored (no new worktree created)

From cb573e83a7ecebc1e8944a64513d799b8604ec46 Mon Sep 17 00:00:00 2001
From: Aaron Stainback
Date: Fri, 15 May 2026 23:25:41 -0400
Subject: [PATCH 2/2] =?UTF-8?q?fix(shard):=20explain=2045=E2=86=9244=20PR-?=
 =?UTF-8?q?count=20delta=20in=200245Z=20tick=20(Copilot=20#3691)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The 02:39Z refresh saw 45 open PRs; the 02:41Z `--all-open` batch internally re-queried `gh pr list --state open` and got 44. The delta is cascade-window drift (one PR closed in the 2-minute interval), not a filter — clarify on the bullet so the record is internally consistent.

Resolves Copilot thread on docs/hygiene-history/ticks/2026/05/16/0245Z.md L16.

Co-Authored-By: Claude
---
 docs/hygiene-history/ticks/2026/05/16/0245Z.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/hygiene-history/ticks/2026/05/16/0245Z.md b/docs/hygiene-history/ticks/2026/05/16/0245Z.md
index 208e51edd..66121d434 100644
--- a/docs/hygiene-history/ticks/2026/05/16/0245Z.md
+++ b/docs/hygiene-history/ticks/2026/05/16/0245Z.md
@@ -12,7 +12,7 @@ No code substrate; **substrate-honest pivot tick** documenting a known cascade-c
 Sequence:

 - 02:39Z — `bun tools/github/refresh-worldview.ts` succeeded (45 open PRs, summary line)
-- 02:41Z — `bun tools/github/poll-pr-gate-batch.ts --all-open` (intent: identify BLOCKED PRs with `nextAction=resolve-threads`)
+- 02:41Z — `bun tools/github/poll-pr-gate-batch.ts --all-open` (intent: identify BLOCKED PRs with `nextAction=resolve-threads`); batch re-queried open PRs internally and got 44 (one PR landed/closed in the ~2-minute interval between the 02:39Z refresh and the 02:41Z batch run — typical cascade-window drift)
 - 02:42Z — all 44 per-PR `gh pr view` calls returned `GraphQL: API rate limit already exceeded for user ID 578953`; `reports: []`, `errors: [44]`
 - 02:42Z — `gh api rate_limit` confirmed: **graphql 0/5000 remaining, resets at 1778900128 = 2026-05-16T02:55:28Z** (~13 min ETA)
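
The reset arithmetic in the `gh api rate_limit` bullet can be checked with a short sketch. The epoch value is the one recorded in the shard; the `minutesUntilReset` helper is hypothetical, and the two reference times assume zero seconds because the shard only records minute granularity (02:42Z for the rate-limit check, 02:45Z for the tick itself).

```typescript
// Reset time in epoch seconds, as recorded in the shard.
const GRAPHQL_RESET_EPOCH_S = 1778900128;

// Whole minutes from `now` until the reset, clamped at zero once the reset has passed.
function minutesUntilReset(resetEpochS: number, now: Date): number {
  const remainingMs = resetEpochS * 1000 - now.getTime();
  return Math.max(0, Math.floor(remainingMs / 60_000));
}

// ISO-8601 rendering of the reset instant.
const resetIso: string = new Date(GRAPHQL_RESET_EPOCH_S * 1000).toISOString();

// Assumed reference times (seconds unknown, taken as :00).
const fromRateLimitCheck = minutesUntilReset(
  GRAPHQL_RESET_EPOCH_S,
  new Date("2026-05-16T02:42:00Z"),
);
const fromTickStart = minutesUntilReset(
  GRAPHQL_RESET_EPOCH_S,
  new Date("2026-05-16T02:45:00Z"),
);
```

With those assumed reference times, the wait comes out at 13 whole minutes from the 02:42Z check and 10 from the 02:45Z tick, which is why the shard quotes both "~13 min ETA" and "10 min from now" for the same reset.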