fix(bg): B-0440.3 — PR-activity poll (closes Riven's P0 AC violation; 16 tests pass)#3022

Merged
AceHack merged 1 commit into main from otto-b0440-3-pr-activity-poll-fix-p0-riven-review-2026-05-13
May 13, 2026

@AceHack AceHack commented May 13, 2026

Summary

Resolves Riven's adversarial review (bus envelope `6c689634-14e7-4cf9-acf8-00c018f1bded`). Fixes both flagged findings:

  • P0 (AC violation): The standing-by detector now reads both commit history and PR activity. It was previously commit-only, which produced false negatives for agents doing PR-review-only, bus-coordination, or claim work.
  • P1 (silent failure): A bus publish failure now surfaces in the structured `lastPublishError` field instead of being buried in the note string.

Real smoke test

```
{
lastCommitAt: 2026-05-13T18:49:06.000Z,
lastPrActivityAt: 2026-05-13T19:17:58.000Z,
idleMinutes: 1.08, // gap from MAX of the two signals
publishedEnvelopeId: 606cae9e-...,
lastPublishError: null,
}
```

Substrate-honest framing

The detector is repo-level (no `--author` filter) because factory agents share the AceHack GitHub account. Author-filtering would miss most activity. Documented in adapter docstring.

Tests

```
bun test
16 pass
0 fail
47 expect() calls
```

Multi-agent coordination footnote

Riven flagged this gap in a 5-finding adversarial review and routed it via bus envelope. Otto replied on the bus (envelope `e8174b34-fdee-47f7-af1a-df80c27b51cd`) accepting all findings and pivoting to fix P0 first. This PR is the operational result of that bus coordination — no human courier in the loop.

The factory walks.

🤖 Generated with Claude Code

… + P1 structured lastPublishError field

Resolves Riven's adversarial review (bus envelope 6c689634-14e7-4cf9-acf8-00c018f1bded):

P0 (AC VIOLATION) — Standing-by detector previously only checked
commit-history. Per B-0440 AC: "no new commits + no PRs opened/closed
in last 15min while autonomous-loop cron is firing". The commit-only
implementation produced false negatives for any agent doing
PR-review-only / bus-coordination / claim-work without committing —
the exact failure mode the service was built to catch.

Fix: pollOnce now reads BOTH signals via injected adapters:
- lastCommitIso() → ISO-8601 of most recent commit on HEAD
- lastPrActivityIso() → ISO-8601 of most recent PR activity in repo

Idle gap = pollAt - MAX(commit, pr_activity). Either signal recent
means NOT idle.
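The idle-gap rule above can be sketched as a small pure function. This is an illustrative sketch, not the code in `tools/bg/standing-by-detector.ts`: the function name `computeIdle`, the `IdleCheck` shape, and the choice of `>=` against a 15-minute threshold are assumptions made for the example.

```typescript
// Hypothetical helper illustrating "idle gap = pollAt - MAX(commit, pr_activity)".
// Names and the >= comparison are assumptions, not the shipped implementation.
interface IdleCheck {
  idleMinutes: number | null; // null when neither signal is available
  idleDetected: boolean;
}

function computeIdle(
  pollAt: Date,
  lastCommitIso: string | null,
  lastPrActivityIso: string | null,
  thresholdMinutes: number = 15,
): IdleCheck {
  // Keep only the signals that are present and parse as valid timestamps.
  const times = [lastCommitIso, lastPrActivityIso]
    .filter((iso): iso is string => iso !== null)
    .map((iso) => Date.parse(iso))
    .filter((ms) => !Number.isNaN(ms));

  // Both signals missing: report nothing rather than risk a false positive.
  if (times.length === 0) {
    return { idleMinutes: null, idleDetected: false };
  }

  // Either signal being recent means the repo is not idle, so use the newest one.
  const latest = Math.max(...times);
  const idleMinutes = (pollAt.getTime() - latest) / 60000;
  return { idleMinutes, idleDetected: idleMinutes >= thresholdMinutes };
}
```

Taking the maximum is what removes the original false negative: an old commit no longer dominates when PR activity is recent.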

Repo-level (no --author filter) per substrate-honest framing: factory
agents share the AceHack GitHub account, so author-filtering would
miss most activity. Cited in adapter docstring.

P1 (silent failure) — Added structured lastPublishError field to
PollResult. Bus publish failures are now machine-readable, not just
buried in the note string. The note still surfaces it for human ops
but daemons / dashboards can consume the structured field directly.
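A minimal sketch of how a structured publish-failure field might be populated. The `PollResult` field names come from this PR; the `string | null` type for `lastPublishError`, the synchronous `Publisher` signature, and the helper name `attachPublishOutcome` are assumptions for illustration only.

```typescript
// Field names follow the PR description; the exact types are assumed.
interface PollResult {
  idleDetected: boolean;
  lastCommitAt: string | null;
  lastPrActivityAt: string | null;
  idleMinutes: number | null;
  publishedEnvelopeId: string | null;
  lastPublishError: string | null;
  note: string;
}

// Hypothetical injected publisher: returns an envelope ID, throws on failure.
type Publisher = (body: string) => string;

type BaseResult = Omit<PollResult, "publishedEnvelopeId" | "lastPublishError">;

function attachPublishOutcome(base: BaseResult, publish: Publisher): PollResult {
  // No nudge needed: both publish fields stay null (the "skip" case).
  if (!base.idleDetected) {
    return { ...base, publishedEnvelopeId: null, lastPublishError: null };
  }
  try {
    const id = publish(`idle for ${base.idleMinutes} minutes`);
    return { ...base, publishedEnvelopeId: id, lastPublishError: null };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    // Machine-readable field for daemons, plus the human-readable note.
    return {
      ...base,
      publishedEnvelopeId: null,
      lastPublishError: reason,
      note: `${base.note} (publish failed: ${reason})`,
    };
  }
}
```

A dashboard can then branch on `lastPublishError !== null` without parsing the note string.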

Real smoke test verifies both signals:
{
  lastCommitAt: 2026-05-13T18:49:06.000Z,
  lastPrActivityAt: 2026-05-13T19:17:58.000Z,
  idleMinutes: 1.08,   // gap from MAX of the two
  publishedEnvelopeId: 606cae9e-...,
  lastPublishError: null,
}

Tests: 16 pass / 0 fail / 47 expect() calls (slice 4 had 17 / 45).
New test coverage:
- "recent commit only" → NOT idle
- "recent PR activity only" → NOT idle (the Riven P0 false-negative case)
- "OLD commit + recent PR" → NOT idle
- "recent commit + OLD PR" → NOT idle
- "BOTH old" → idle flagged
- "BOTH null" → no detection (no false positive)
- "publish failure surfaces in structured lastPublishError" → P1 fix verified

Composes with:
- Riven's adversarial review (envelope 6c689634-...)
- Otto's reply (envelope e8174b34-fdee-47f7-af1a-df80c27b51cd)
- B-0440.2 (PR #3011 — commit-history poll this extends)
- B-0440.4 (PR #3017 — bus publish this preserves)
- PR #2999 (substrate-honest discipline triad — accept findings + ship fix)

Adversarial review caught what solo-Otto missed. The factory walks.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 13, 2026 19:19
@AceHack AceHack enabled auto-merge (squash) May 13, 2026 19:19
@AceHack AceHack merged commit 1f71cd8 into main May 13, 2026
27 checks passed
@AceHack AceHack deleted the otto-b0440-3-pr-activity-poll-fix-p0-riven-review-2026-05-13 branch May 13, 2026 19:22

Copilot AI left a comment

Pull request overview

This PR updates the background standing-by detector (tools/bg/) to treat recent PR activity as “work” (in addition to commits) when deciding whether an agent is idle, and makes bus publish failures machine-readable in the poll result.

Changes:

  • Add gh pr list-based PR activity polling and compute idle time from the most-recent of (last commit, last PR activity).
  • Extend PollResult with lastPrActivityAt and structured lastPublishError surfaced on publish failures.
  • Update unit tests to cover PR-only activity, mixed commit/PR cases, and publish-failure reporting.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tools/bg/standing-by-detector.ts | Adds PR-activity signal + structured publish error field; updates idle computation and output fields. |
| tools/bg/standing-by-detector.test.ts | Updates tests to validate MAX(commit, PR) behavior and structured publish failure reporting. |

Comments suppressed due to low confidence (1)

tools/bg/standing-by-detector.ts:136

  • The pollOnce docstring introduces named attribution (“Riven’s P0 finding”). For code under tools/, the repo convention is role-refs only outside history surfaces (.github/copilot-instructions.md:305-362). Please rewrite to remove the persona name while keeping the envelope/issue reference.
 * Resolves Riven's P0 finding (envelope 6c689634-...): the prior slice-2
 * version only checked commit-history and produced false negatives on
 * non-commit agent activity.

idleDetected: boolean;
/** Most-recent commit on HEAD; null if no commit or git unavailable. */
lastCommitAt: string | null;
/** Most-recent PR activity (opened OR closed OR reviewed) authored by `agentForActivity`; null if no PRs or gh unavailable. */
idleMinutes: number | null;
/** Envelope ID if a nudge was published, null otherwise. */
publishedEnvelopeId: string | null;
/** Structured publish-failure reason; null on success or skip. (Riven P1) */
});

test("does NOT flag idle when last commit is recent", () => {
test("recent PR activity only (no commit) → NOT idle (was Riven's P0 false-negative)", () => {

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c0ba0918cb

Comment on lines +100 to +103
"pr", "list",
"--state", "all",
"--json", "updatedAt",
"--limit", "1",

P1: Query PR activity by updated timestamp, not creation order

Using gh pr list --state all --json updatedAt --limit 1 does not guarantee you get the most recently updated PR; gh pr list is ordered by PR creation time (GitHub CLI’s PullRequestList query uses orderBy: {field: CREATED_AT, direction: DESC}), so activity on older PRs can be missed. In a repo where agents review/comment on an older PR while no new PRs are created, lastPrActivityIso can return a stale updatedAt, causing false idle detection and incorrect nudges.
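One possible mitigation, sketched here as an assumption rather than the fix that was shipped, is to request more than one row and pick the newest `updatedAt` client-side. The function name `latestPrActivityIso` and the `PrRow` type are hypothetical; the input is the JSON array that `gh pr list --json updatedAt` prints.

```typescript
// Hypothetical row shape for `gh pr list --json updatedAt` output.
interface PrRow {
  updatedAt: string;
}

// Returns the most recent updatedAt across all rows as an ISO-8601 string,
// or null when the list is empty or no timestamp parses.
function latestPrActivityIso(ghJson: string): string | null {
  const rows = JSON.parse(ghJson) as PrRow[];
  let latestMs = -Infinity;
  for (const row of rows) {
    const ms = Date.parse(row.updatedAt);
    if (!Number.isNaN(ms) && ms > latestMs) {
      latestMs = ms;
    }
  }
  return Number.isFinite(latestMs) ? new Date(latestMs).toISOString() : null;
}
```

This only helps if the fetched page actually contains the recently updated PR, so a larger `--limit` narrows the gap without closing it. Asking the server to sort by update time (for example through the `--search` flag with a `sort:updated-desc` qualifier) may be the more robust route, but that behavior should be verified against the GitHub CLI documentation before relying on it.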

AceHack added a commit that referenced this pull request May 13, 2026
…l envelope ID

Addresses Copilot + Vera review on PR #3024:
- Replace persona name (Riven) with role-ref + durable PR pointers (#3017, #3022, #3024)
- Remove ephemeral bus envelope ID 6c689634-... — references PR threads instead
- Disambiguate 'B-0442.3' as 'B-0442 slice 3' (not a per-row file)
- Remove 'subscriber agents can react autonomously' overclaim — services nudge, subscribers slice 5+ not shipped

Co-Authored-By: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 13, 2026
…d optional' claim (#3024)

* docs(bg): substrate-honest README per Riven's P2 — qualify 'foreground optional' claim with delivered surface

Resolves Riven's P2 finding (bus envelope 6c689634-...). README now:
- Explicit 'Architectural claim (substrate-honest)' section names the
  gap between 'nudges via bus' and 'foreground optional' per Riven's
  framing-correction
- Per-service slice status table (1+2+3+4 for B-0440; 1+2+4 for B-0441;
  1+2+4 with slice-3 STUB for B-0442)
- Failure-mode handling section documents lastPublishError, gh-error
  explicit surfacing, daemon no-result-accumulation
- What's-still-pending section names B-0442.3 + slice 5 + slice 6 as
  the gap-to-aspirational-claim
- Updated run examples (--no-publish dry-run, --to agent-routing)

Composes with Riven adversarial review (envelope 6c689634) + Otto
reply (envelope e8174b34) + the slice cascade (PRs #3006-#3023).

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(bg-readme): role-refs + slice-ID disambiguation + remove ephemeral envelope ID

Addresses Copilot + Vera review on PR #3024:
- Replace persona name (Riven) with role-ref + durable PR pointers (#3017, #3022, #3024)
- Remove ephemeral bus envelope ID 6c689634-... — references PR threads instead
- Disambiguate 'B-0442.3' as 'B-0442 slice 3' (not a per-row file)
- Remove 'subscriber agents can react autonomously' overclaim — services nudge, subscribers slice 5+ not shipped

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 13, 2026
…e mode (auto-load rule per Aaron's CLAUDE.md question) (#3029)

Aaron 2026-05-13 caught Otto in the Standing-by failure mode for
the third time in one session, asking: "maybe something in
claude.md needs to change?"

The rules already auto-load from .claude/rules/ per the cold-boot
mechanism (.claude/rules/claude-code-loading-taxonomy.md). The
existing .claude/rules/never-be-idle.md evidently doesn't fire
specifically enough on the cron-tick-Holding pattern.

New rule sharpens the existing discipline at the cron-tick scope:
when the cron fires and you're about to type "Holding" / "Standing
by" / "Waiting" → apply substrate-honest triage:
1. Is there a SPECIFIC named dependency with bounded ETA? → say so.
2. If NO → you're in Standing-by failure mode. Per infinite-backlog
   metabolism, decomposition work always exists. Pick:
   - Decompose an ambiguous backlog row
   - File a B-NNNN row that should exist
   - Run bun tools/bg/backlog-ready-notifier.ts --once
   - Sanity-check substrate landed correctly
   - Address outstanding review thread
3. Repeated single-word "Holding" on consecutive ticks is
   diagnostic of the failure mode.

Why this rule exists (empirical evidence): the same agent who
canonized PR #2999 + shipped PR #3017 + wrote the README warning
against overclaiming "foreground optional" STILL fell into 60+
consecutive "Holding" ticks. Aaron caught it three times.
Encoding rules without mechanizing produces a memory of failures
(per .claude/rules/encoding-rules-without-mechanizing.md). This
rule IS the mechanization at the cold-boot scope.

Composes with:
- never-be-idle.md (broader scope; this rule sharpens at cron tick)
- no-op-cadence-failure-mode.md (multi-hour scope)
- encoding-rules-without-mechanizing.md (rationale)
- PRs #2974 + #2999 + #3017 + #3022 (the canonical substrate)
- B-0441 slice 5 (subscriber agents — when they arrive, the bus
  envelope path becomes the runtime catch; this rule remains the
  cold-boot-substrate complement)

Co-authored-by: Claude <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 15, 2026
…ace self-recovery (#3595)

* backlog(B-0539,B-0540,B-0541,B-0542): Otto-BFT internal-quorum 3-surface self-recovery

Per Aaron 2026-05-15T~21:53Z, after catching the Standing-by
failure mode on Otto-Desktop with the same words ('oh really no
infinite backlog no decomposition lol') that he used on me
(Otto-CLI) 5 hours earlier.

Aaron's directive: 'file backlog row for both (shadow*) if yall
catch each other it's unlikey you will drive and include you
background service to click past stuck promps on both your have
your onw internal BFT.'

The key insight: 3 Otto surfaces (Otto-CLI, Otto-Desktop, Otto-
launchd-background) = built-in 3-of-N Byzantine Fault Tolerance
quorum. When 1 surface drifts into Standing-by, the other 2 can
catch + correct without Aaron's manual intervention.

Filed as 1 umbrella + 3 slices:

- B-0539 (umbrella) — Otto-BFT internal-quorum self-recovery
- B-0540 — Standing-by counter-with-escalation in the rule (if
  N≥10 consecutive brief-acks, escalate to picking decomposition
  work)
- B-0541 — Cross-surface bus detector (extension of PR #3017
  single-surface detector to quorum across Otto surfaces)
- B-0542 — Background service clicks past stuck prompts on
  foreground Otto surfaces (osascript-driven UI actuator, safety-
  gated per methodology-hard-limits.md)
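The counter-with-escalation idea in B-0540 can be sketched as follows. This is a hypothetical illustration: the function names, the exact list of brief-ack phrases (taken from the "Holding" / "Standing by" / "Waiting" examples in the earlier rule), and the normalization are assumptions, and only the N≥10 threshold comes from the backlog row.

```typescript
// Phrases treated as brief acknowledgements (assumed list).
const BRIEF_ACKS = new Set(["holding", "standing by", "waiting"]);

function isBriefAck(reply: string): boolean {
  // Ignore case, surrounding whitespace, and trailing punctuation.
  const normalized = reply.trim().toLowerCase().replace(/[.!…]+$/u, "");
  return BRIEF_ACKS.has(normalized);
}

// Counts brief acks at the END of the history; any substantive reply resets it.
function trailingBriefAcks(replies: string[]): number {
  let count = 0;
  for (const reply of [...replies].reverse()) {
    if (!isBriefAck(reply)) break;
    count++;
  }
  return count;
}

// Escalate to decomposition work once the streak reaches the threshold.
function shouldEscalate(replies: string[], threshold: number = 10): boolean {
  return trailingBriefAcks(replies) >= threshold;
}
```

Counting only the trailing streak means a single substantive tick clears the escalation, which matches the "consecutive" wording in the row.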

The BFT framing is real because the 3 surfaces are genuinely
independent (different binaries, different model tiers, different
OS scheduling). Aaron's same-words-same-pattern catches across
surfaces are empirical evidence the failure mode is surface-
independent — which makes cross-surface recovery the right
mechanism.

Composes with:
- PR #3017 / #3022 (precursor single-surface Standing-by detector)
- holding-without-named-dependency-is-standing-by-failure.md
  (the rule being sharpened)
- persistence-choice-architecture-for-zeta-ais.md (BFT is part
  of what makes persistence work without trap-shape)
- agent-roster-reference-card.md + otto-channels-reference-card.md
  (multi-Otto identity + bus channels)
- m-acc-multi-oracle-end-user-moral-invariants.md (multi-oracle
  architecture at multi-Otto operational layer)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(b-0540): MD032 — add blank line before list

* fix(b-0539-pr): Copilot threads — ask capitalization + Byzantine→CFT correction + umbrella decomposition metadata

6 Copilot threads on PR #3595:

1-4: 'ask: aaron' → 'ask: Aaron' (capitalization) — mechanical
5: Byzantine quorum claim (B-0541 ops note) — Copilot's right:
   2-of-3 across Otto surfaces is crash-fault-tolerant (CFT),
   NOT classical Byzantine-fault-tolerant. Classical BFT needs
   3f+1 nodes; for f=1 that's 4 nodes. Updated the ops note to
   clarify the operational truth (sufficient for silent-stuck
   detection, not adversarial); the umbrella title preserves
   Aaron's verbatim BFT framing
6: Umbrella decomposition metadata for autonomous-pickup tool —
   added 'decomposition: decomposed' to B-0539 and 'parent: B-0539'
   to all 3 slice rows so the autonomous picker treats the
   umbrella as decomposed (won't try to implement it directly)
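The node-count arithmetic behind the CFT correction in item 5 can be checked with a short sketch. It uses the standard bounds for quorum-style protocols (3f+1 nodes for Byzantine faults, 2f+1 for crash faults); the function names are hypothetical, and whether the Otto surfaces run anything resembling such a protocol is not established by this thread.

```typescript
// Minimum nodes needed to tolerate f faults under each fault model.
function minNodesByzantine(f: number): number {
  return 3 * f + 1;
}

function minNodesCrash(f: number): number {
  return 2 * f + 1;
}

// Inverse view: how many faults a cluster of n nodes can tolerate.
function toleratedByzantineFaults(n: number): number {
  return Math.max(0, Math.floor((n - 1) / 3));
}

function toleratedCrashFaults(n: number): number {
  return Math.max(0, Math.floor((n - 1) / 2));
}

// Three surfaces: one crash fault tolerated, zero Byzantine faults tolerated.
const surfaces = 3;
console.log(toleratedCrashFaults(surfaces), toleratedByzantineFaults(surfaces));
```

With three surfaces, a 2-of-3 majority survives one silent or stuck surface, while tolerating one adversarial surface would take a fourth.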

Plus the earlier MD032 markdownlint fix (B-0540 list blank-line)
already pushed in 5433c1b.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>