fix(bg): B-0440.3 — PR-activity poll (closes Riven's P0 AC violation; 16 tests pass)#3022
… + P1 structured lastPublishError field
Resolves Riven's adversarial review (bus envelope 6c689634-14e7-4cf9-acf8-00c018f1bded):
P0 (AC VIOLATION) — Standing-by detector previously only checked
commit-history. Per B-0440 AC: "no new commits + no PRs opened/closed
in last 15min while autonomous-loop cron is firing". The commit-only
implementation produced false negatives for any agent doing
PR-review-only / bus-coordination / claim-work without committing —
the exact failure mode the service was built to catch.
Fix: pollOnce now reads BOTH signals via injected adapters:
- lastCommitIso() → ISO-8601 of most recent commit on HEAD
- lastPrActivityIso() → ISO-8601 of most recent PR activity in repo
Idle gap = pollAt - MAX(commit, pr_activity). Either signal recent
means NOT idle.
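A minimal sketch of the idle-gap rule described above, assuming adapters that return an ISO-8601 string or null. The `ActivityAdapters` interface, the function names, and the 15-minute threshold constant (taken from the "last 15min" wording of the AC) are illustrative assumptions, not the actual code in standing-by-detector.ts.

```typescript
// Hypothetical adapter shape; the real signatures may differ.
interface ActivityAdapters {
  lastCommitIso(): string | null;
  lastPrActivityIso(): string | null;
}

// Assumed threshold, based on the "last 15min" wording in the B-0440 AC.
const IDLE_THRESHOLD_MINUTES = 15;

// Gap in minutes between the poll time and the most recent of the two signals.
// Returns null when neither signal is available, so no idle verdict is made.
function idleGapMinutes(pollAt: Date, adapters: ActivityAdapters): number | null {
  const timestamps = [adapters.lastCommitIso(), adapters.lastPrActivityIso()]
    .filter((iso): iso is string => iso !== null)
    .map((iso) => Date.parse(iso))
    .filter((ms) => !Number.isNaN(ms));
  if (timestamps.length === 0) return null;
  const mostRecent = Math.max(...timestamps);
  return (pollAt.getTime() - mostRecent) / 60_000;
}

// Either signal being recent keeps the gap small, so the agent is NOT idle.
function isIdle(gapMinutes: number | null): boolean {
  return gapMinutes !== null && gapMinutes >= IDLE_THRESHOLD_MINUTES;
}
```

Taking the maximum of the two timestamps, rather than requiring both to be recent, is what makes a PR-review-only agent count as active.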
Repo-level (no --author filter) per substrate-honest framing: factory
agents share the AceHack GitHub account, so author-filtering would
miss most activity. Cited in adapter docstring.
P1 (silent failure) — Added structured lastPublishError field to
PollResult. Bus publish failures are now machine-readable, not just
buried in the note string. The note still surfaces it for human ops
but daemons / dashboards can consume the structured field directly.
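The structured-error idea can be sketched as follows. The field names `publishedEnvelopeId`, `lastPublishError`, and `note` come from this PR; the `string | null` type for the error, the `Publish` adapter, and the `publishNudge` helper are assumptions made for illustration.

```typescript
// Subset of PollResult relevant to publishing; types are assumed.
interface PublishOutcome {
  publishedEnvelopeId: string | null;
  lastPublishError: string | null;
  note: string;
}

// Hypothetical publish adapter: returns the envelope ID, throws on failure.
type Publish = () => string;

function publishNudge(publish: Publish): PublishOutcome {
  try {
    const envelopeId = publish();
    return {
      publishedEnvelopeId: envelopeId,
      lastPublishError: null,
      note: `nudge published (${envelopeId})`,
    };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return {
      publishedEnvelopeId: null,
      // Machine-readable copy for daemons and dashboards.
      lastPublishError: reason,
      // Human-readable copy for ops, as before.
      note: `idle detected; publish failed: ${reason}`,
    };
  }
}
```

A consumer can then branch on `lastPublishError !== null` instead of parsing the note string.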
Real smoke test verifies both signals:
{
lastCommitAt: 2026-05-13T18:49:06.000Z,
lastPrActivityAt: 2026-05-13T19:17:58.000Z,
idleMinutes: 1.08, // gap from MAX of the two
publishedEnvelopeId: 606cae9e-...,
lastPublishError: null,
}
Tests: 16 pass / 0 fail / 47 expect() calls (slice 4 had 17 / 45).
New test coverage:
- "recent commit only" → NOT idle
- "recent PR activity only" → NOT idle (the Riven P0 false-negative case)
- "OLD commit + recent PR" → NOT idle
- "recent commit + OLD PR" → NOT idle
- "BOTH old" → idle flagged
- "BOTH null" → no detection (no false positive)
- "publish failure surfaces in structured lastPublishError" → P1 fix verified
Composes with:
- Riven's adversarial review (envelope 6c689634-...)
- Otto's reply (envelope e8174b34-fdee-47f7-af1a-df80c27b51cd)
- B-0440.2 (PR #3011 — commit-history poll this extends)
- B-0440.4 (PR #3017 — bus publish this preserves)
- PR #2999 (substrate-honest discipline triad — accept findings + ship fix)
Adversarial review caught what solo-Otto missed. The factory walks.
Co-Authored-By: Claude <noreply@anthropic.com>
Pull request overview
This PR updates the background standing-by detector (tools/bg/) to treat recent PR activity as “work” (in addition to commits) when deciding whether an agent is idle, and makes bus publish failures machine-readable in the poll result.
Changes:
- Add `gh pr list`-based PR activity polling and compute idle time from the most recent of (last commit, last PR activity).
- Extend `PollResult` with `lastPrActivityAt` and a structured `lastPublishError` surfaced on publish failures.
- Update unit tests to cover PR-only activity, mixed commit/PR cases, and publish-failure reporting.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tools/bg/standing-by-detector.ts | Adds PR-activity signal + structured publish error field; updates idle computation and output fields. |
| tools/bg/standing-by-detector.test.ts | Updates tests to validate MAX(commit, PR) behavior and structured publish failure reporting. |
Comments suppressed due to low confidence (1)
tools/bg/standing-by-detector.ts:136
- The pollOnce docstring introduces named attribution (“Riven’s P0 finding”). For code under tools/, the repo convention is role-refs only outside history surfaces (.github/copilot-instructions.md:305-362). Please rewrite to remove the persona name while keeping the envelope/issue reference.
* Resolves Riven's P0 finding (envelope 6c689634-...): the prior slice-2
* version only checked commit-history and produced false negatives on
* non-commit agent activity.
idleDetected: boolean;
/** Most-recent commit on HEAD; null if no commit or git unavailable. */
lastCommitAt: string | null;
/** Most-recent PR activity (opened OR closed OR reviewed) authored by `agentForActivity`; null if no PRs or gh unavailable. */
idleMinutes: number | null;
/** Envelope ID if a nudge was published, null otherwise. */
publishedEnvelopeId: string | null;
/** Structured publish-failure reason; null on success or skip. (Riven P1) */
});

test("does NOT flag idle when last commit is recent", () => {
test("recent PR activity only (no commit) → NOT idle (was Riven's P0 false-negative)", () => {
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c0ba0918cb
"pr", "list",
"--state", "all",
"--json", "updatedAt",
"--limit", "1",
Query PR activity by updated timestamp, not creation order
Using gh pr list --state all --json updatedAt --limit 1 does not guarantee you get the most recently updated PR; gh pr list is ordered by PR creation time (GitHub CLI’s PullRequestList query uses orderBy: {field: CREATED_AT, direction: DESC}), so activity on older PRs can be missed. In a repo where agents review/comment on an older PR while no new PRs are created, lastPrActivityIso can return a stale updatedAt, causing false idle detection and incorrect nudges.
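One way to stop depending on the list order is to fetch several rows and pick the latest `updatedAt` on the client. The sketch below shows only that selection step, assuming the input is the JSON array printed by `gh pr list --json updatedAt`. The function name `latestPrUpdatedAt` is hypothetical, and the approach still misses activity on PRs that fall outside the fetched window.

```typescript
// Shape of one element of the `--json updatedAt` output.
interface PrRow {
  updatedAt: string;
}

// Returns the most recent updatedAt across all rows, or null if there are none.
function latestPrUpdatedAt(ghJson: string): string | null {
  const rows = JSON.parse(ghJson) as PrRow[];
  let latest: string | null = null;
  let latestMs = Number.NEGATIVE_INFINITY;
  for (const row of rows) {
    const ms = Date.parse(row.updatedAt);
    // Skip unparseable timestamps instead of letting NaN win the comparison.
    if (!Number.isNaN(ms) && ms > latestMs) {
      latestMs = ms;
      latest = row.updatedAt;
    }
  }
  return latest;
}
```

With this in place the adapter would request more than one row (the exact limit is a tuning choice not specified here) and pass the command's stdout to the function.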
…l envelope ID
Addresses Copilot + Vera review on PR #3024:
- Replace persona name (Riven) with role-ref + durable PR pointers (#3017, #3022, #3024)
- Remove ephemeral bus envelope ID 6c689634-...; references PR threads instead
- Disambiguate 'B-0442.3' as 'B-0442 slice 3' (not a per-row file)
- Remove 'subscriber agents can react autonomously' overclaim: services nudge, subscribers slice 5+ not shipped
Co-Authored-By: Claude <noreply@anthropic.com>
…d optional' claim (#3024)
* docs(bg): substrate-honest README per Riven's P2: qualify 'foreground optional' claim with delivered surface
Resolves Riven's P2 finding (bus envelope 6c689634-...). README now:
- Explicit 'Architectural claim (substrate-honest)' section names the gap between 'nudges via bus' and 'foreground optional' per Riven's framing-correction
- Per-service slice status table (1+2+3+4 for B-0440; 1+2+4 for B-0441; 1+2+4 with slice-3 STUB for B-0442)
- Failure-mode handling section documents lastPublishError, gh-error explicit surfacing, daemon no-result-accumulation
- What's-still-pending section names B-0442.3 + slice 5 + slice 6 as the gap-to-aspirational-claim
- Updated run examples (--no-publish dry-run, --to agent-routing)
Composes with Riven adversarial review (envelope 6c689634) + Otto reply (envelope e8174b34) + the slice cascade (PRs #3006-#3023).
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(bg-readme): role-refs + slice-ID disambiguation + remove ephemeral envelope ID
Addresses Copilot + Vera review on PR #3024:
- Replace persona name (Riven) with role-ref + durable PR pointers (#3017, #3022, #3024)
- Remove ephemeral bus envelope ID 6c689634-...; references PR threads instead
- Disambiguate 'B-0442.3' as 'B-0442 slice 3' (not a per-row file)
- Remove 'subscriber agents can react autonomously' overclaim: services nudge, subscribers slice 5+ not shipped
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
…e mode (auto-load rule per Aaron's CLAUDE.md question) (#3029)
Aaron 2026-05-13 caught Otto in the Standing-by failure mode for the third time in one session, asking: "maybe something in claude.md needs to change?"
The rules already auto-load from .claude/rules/ per the cold-boot mechanism (.claude/rules/claude-code-loading-taxonomy.md). The existing .claude/rules/never-be-idle.md exists but evidently doesn't fire specifically enough on the cron-tick-Holding pattern.
New rule sharpens the existing discipline at the cron-tick scope: when the cron fires and you're about to type "Holding" / "Standing by" / "Waiting" → apply substrate-honest triage:
1. Is there a SPECIFIC named dependency with bounded ETA? → say so.
2. If NO → you're in Standing-by failure mode. Per infinite-backlog metabolism, decomposition work always exists. Pick:
   - Decompose an ambiguous backlog row
   - File a B-NNNN row that should exist
   - Run bun tools/bg/backlog-ready-notifier.ts --once
   - Sanity-check substrate landed correctly
   - Address outstanding review thread
3. Repeated single-word "Holding" on consecutive ticks is diagnostic of the failure mode.
Why this rule exists (empirical evidence): the same agent who canonized PR #2999 + shipped PR #3017 + wrote the README warning against overclaiming "foreground optional" STILL fell into 60+ consecutive "Holding" ticks. Aaron caught it three times. Encoding rules without mechanizing produces a memory of failures (per .claude/rules/encoding-rules-without-mechanizing.md). This rule IS the mechanization at the cold-boot scope.
Composes with:
- never-be-idle.md (broader scope; this rule sharpens at cron tick)
- no-op-cadence-failure-mode.md (multi-hour scope)
- encoding-rules-without-mechanizing.md (rationale)
- PRs #2974 + #2999 + #3017 + #3022 (the canonical substrate)
- B-0441 slice 5 (subscriber agents: when they arrive, the bus envelope path becomes the runtime catch; this rule remains the cold-boot-substrate complement)
Co-authored-by: Claude <noreply@anthropic.com>
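Point 3 of the triage (repeated brief acknowledgements on consecutive ticks) lends itself to a mechanical check. The following is a rough sketch under stated assumptions: the phrase list mirrors the three phrases quoted in the rule, while the function names and the caller-supplied `threshold` are hypothetical. It cannot evaluate point 1 (whether a named dependency with a bounded ETA exists), so it only flags candidates for review.

```typescript
// Brief acknowledgements quoted in the rule, normalized to lower case.
const BRIEF_ACKS = new Set(["holding", "standing by", "waiting"]);

// True when a tick message is nothing more than one of the brief acks.
function isBriefAck(message: string): boolean {
  const normalized = message.trim().replace(/[.!]+$/, "").toLowerCase();
  return BRIEF_ACKS.has(normalized);
}

// Counts how many of the most recent tick messages are brief acks in a row.
function trailingBriefAcks(tickMessages: string[]): number {
  let count = 0;
  for (let i = tickMessages.length - 1; i >= 0; i--) {
    if (!isBriefAck(tickMessages[i])) break;
    count++;
  }
  return count;
}

// Threshold is supplied by the caller; the rule itself does not fix a number.
function looksLikeStandingBy(tickMessages: string[], threshold: number): boolean {
  return trailingBriefAcks(tickMessages) >= threshold;
}
```

Counting only the trailing run means a single substantive tick resets the streak, which matches the "consecutive ticks" wording.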
…ace self-recovery (#3595)
* backlog(B-0539,B-0540,B-0541,B-0542): Otto-BFT internal-quorum 3-surface self-recovery
Per Aaron 2026-05-15T~21:53Z, after catching the Standing-by failure mode on Otto-Desktop with the same words ('oh really no infinite backlog no decomposition lol') that he used on me (Otto-CLI) 5 hours earlier.
Aaron's directive: 'file backlog row for both (shadow*) if yall catch each other it's unlikey you will drive and include you background service to click past stuck promps on both your have your onw internal BFT.'
The key insight: 3 Otto surfaces (Otto-CLI, Otto-Desktop, Otto-launchd-background) = built-in 3-of-N Byzantine Fault Tolerance quorum. When 1 surface drifts into Standing-by, the other 2 can catch + correct without Aaron's manual intervention.
Filed as 1 umbrella + 3 slices:
- B-0539 (umbrella): Otto-BFT internal-quorum self-recovery
- B-0540: Standing-by counter-with-escalation in the rule (if N≥10 consecutive brief-acks, escalate to picking decomposition work)
- B-0541: Cross-surface bus detector (extension of PR #3017 single-surface detector to quorum across Otto surfaces)
- B-0542: Background service clicks past stuck prompts on foreground Otto surfaces (osascript-driven UI actuator, safety-gated per methodology-hard-limits.md)
The BFT framing is real because the 3 surfaces are genuinely independent (different binaries, different model tiers, different OS scheduling). Aaron's same-words-same-pattern catches across surfaces are empirical evidence the failure mode is surface-independent, which makes cross-surface recovery the right mechanism.
Composes with:
- PR #3017 / #3022 (precursor single-surface Standing-by detector)
- holding-without-named-dependency-is-standing-by-failure.md (the rule being sharpened)
- persistence-choice-architecture-for-zeta-ais.md (BFT is part of what makes persistence work without trap-shape)
- agent-roster-reference-card.md + otto-channels-reference-card.md (multi-Otto identity + bus channels)
- m-acc-multi-oracle-end-user-moral-invariants.md (multi-oracle architecture at multi-Otto operational layer)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(b-0540): MD032, add blank line before list
* fix(b-0539-pr): Copilot threads: ask capitalization + Byzantine→CFT correction + umbrella decomposition metadata
6 Copilot threads on PR #3595:
- 1-4: 'ask: aaron' → 'ask: Aaron' (capitalization), mechanical
- 5: Byzantine quorum claim (B-0541 ops note). Copilot's right: 2-of-3 across Otto surfaces is crash-fault-tolerant (CFT), NOT classical Byzantine-fault-tolerant. Classical BFT needs 3f+1 nodes; for f=1 that's 4 nodes. Updated the ops note to clarify the operational truth (sufficient for silent-stuck detection, not adversarial); the umbrella title preserves Aaron's verbatim BFT framing
- 6: Umbrella decomposition metadata for autonomous-pickup tool: added 'decomposition: decomposed' to B-0539 and 'parent: B-0539' to all 3 slice rows so the autonomous picker treats the umbrella as decomposed (won't try to implement it directly)
Plus the earlier MD032 markdownlint fix (B-0540 list blank-line) already pushed in 5433c1b.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
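The Byzantine versus crash-fault correction above comes down to two node-count bounds. The 3f+1 bound is the one quoted in the commit message. The 2f+1 bound for crash faults is the standard majority-quorum bound and is added here for comparison; it does not appear in the source. Function names are illustrative.

```typescript
// Minimum nodes needed to tolerate f Byzantine (arbitrary or adversarial) faults.
function minNodesByzantine(f: number): number {
  return 3 * f + 1;
}

// Minimum nodes needed to tolerate f crash faults with majority quorums.
function minNodesCrash(f: number): number {
  return 2 * f + 1;
}

// Largest number of Byzantine faults that n nodes can tolerate.
function maxByzantineFaults(n: number): number {
  return Math.floor((n - 1) / 3);
}

// Largest number of crash faults that n nodes can tolerate.
function maxCrashFaults(n: number): number {
  return Math.floor((n - 1) / 2);
}
```

For the three Otto surfaces, `maxCrashFaults(3)` is 1 while `maxByzantineFaults(3)` is 0, which is why the ops note describes the 2-of-3 arrangement as crash-fault-tolerant rather than Byzantine-fault-tolerant.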
Summary
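Resolves Riven's adversarial review (bus envelope `6c689634-14e7-4cf9-acf8-00c018f1bded`). Fixes both flagged findings: the P0 AC violation (the detector now reads PR activity as well as commit history) and the P1 silent failure (a structured `lastPublishError` field on `PollResult`).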
Real smoke test
```
{
lastCommitAt: 2026-05-13T18:49:06.000Z,
lastPrActivityAt: 2026-05-13T19:17:58.000Z,
idleMinutes: 1.08, // gap from MAX of the two signals
publishedEnvelopeId: 606cae9e-...,
lastPublishError: null,
}
```
Substrate-honest framing
The detector is repo-level (no `--author` filter) because factory agents share the AceHack GitHub account. Author-filtering would miss most activity. Documented in adapter docstring.
Tests
```
bun test
16 pass
0 fail
47 expect() calls
```
Multi-agent coordination footnote
Riven flagged this gap in a 5-finding adversarial review and routed it via bus envelope. Otto replied on the bus (envelope `e8174b34-fdee-47f7-af1a-df80c27b51cd`) accepting all findings and pivoting to fix P0 first. This PR is the operational result of that bus coordination — no human courier in the loop.
The factory walks.
🤖 Generated with Claude Code