diff --git a/docs/backlog/P0/B-0109-dependency-status-tracking-surface-2026-04-30.md b/docs/backlog/P0/B-0109-dependency-status-tracking-surface-2026-04-30.md new file mode 100644 index 000000000..96e82d688 --- /dev/null +++ b/docs/backlog/P0/B-0109-dependency-status-tracking-surface-2026-04-30.md @@ -0,0 +1,331 @@ +--- +id: B-0109 +priority: P0 +status: open +title: Dependency status tracking surface — outages and issues affecting us (Aaron 2026-04-30, urgent) +tier: design + implementation +effort: M +ask: Aaron 2026-04-30 (autonomous-loop channel input — verbatim "we need somewhere that list the status of our dependinces and issues that could affect us" + 6 source URLs + urgency clarification "github can erase stuff from master when we use the merge queue sometimes") +created: 2026-04-30 +last_updated: 2026-04-30 +composes_with: [B-0086, B-0096] +tags: [dependency-status, outages, github-incidents, supply-chain, observability, factory-resilience, urgent] +--- + +# Dependency status tracking surface — outages and issues affecting us + +> **First-class factory surface** (Aaron 2026-04-30): +> *"looking at github status should be first class for us we +> live on git and github for now until we get a 2nd host in +> the future."* This is not a "design something later" row — +> it's the visibility layer the factory's hot path runs +> through. Until a second git host exists, GitHub status IS +> factory status, and the surface should reflect that +> operational reality. + +Aaron sent input on 2026-04-30 via the autonomous-loop maintainer +channel asking the factory to land a surface that lists the +status of the dependencies the factory relies on and any issues +in those dependencies that could affect us. 
The framing came +with 6 source URLs covering a GitHub-availability incident class +(merge queue bug + general availability degradation), a +follow-up urgency clarification (*"github can erase stuff from +master when we use the merge queue sometimes"*), and a +first-class-priority elevation (*"looking at github status +should be first class for us we live on git and github for +now until we get a 2nd host in the future"*). The first-class +framing composes with the existing 3-tier multi-remote design +work (Amara packet 3 2026-04-29, task #341): the +status-tracking surface IS the tier-0 visibility layer that +the multi-remote design assumes is in place. + +This is a **P0** row (escalated from P1 on Aaron's urgency +clarification) because: + +1. **Live evidence found while filing this row.** The GitHub + status API at write-time shows an ACTIVE incident: + *"Incomplete pull request results in repositories"* — + started 2026-04-30T03:49:37Z, status `investigating`, + component `Pull Requests` flagged `degraded_performance`, + ongoing for 7+ hours. The factory has PR #911 in flight + during this incident; our 0-unresolved-threads count and + our auto-merge readiness signals could both be based on + incomplete API results. **Auto-merge on PR #911 was + disabled at 2026-04-30T~11:14Z while filing this row, as + the conservative response to live degradation evidence.** +2. **The dead-air polling loop earlier this session may have + been exacerbated by this incident.** It ran ~2.5 hours + (08:19Z merge → 10:50Z catch); the incident has been + active since 03:49Z. Without a dependency-status surface, + future-Otto can't disambiguate own-discipline-failure + from external-dependency-degradation when both compose. + Both happened simultaneously this session. +3. **The factory's hot path runs through GitHub.** Any GitHub + degradation IS a factory degradation, with no current + visibility surface naming it as such. 
+ +## Repo merge-mechanism verification (2026-04-30) + +Aaron's urgency clarification mentioned "the merge queue." +Verified at write-time via `gh api`: + +- **`merge_queue: None`** in `branches/main/protection` — we + do NOT use GitHub's merge-queue feature. The Trunk.io + post about merge-queue-builds-on-wrong-commit is a + different bug class than what currently affects us + directly. +- **Auto-merge with squash** (`auto_merge: true`, + `squash: true`, `merge_commit: false`, `rebase: false`) + is what we use. +- **`allow_update_branch: true`** is the relevant safety + here — auto-merge auto-rebases stale branches before + merging, reducing (but not eliminating) the + stale-base-merge risk. +- **Required status checks**: lint (semgrep, shellcheck, + actionlint, markdownlint) + build-and-test (macos-26, + ubuntu-24.04, ubuntu-24.04-arm). 7 required contexts. + +So: the *specific* merge-queue bug Aaron worried about +doesn't apply to our setup directly. The *broader* concern +(GitHub backend bugs producing wrong-state results, of +which the live PR-degradation incident is a current +example) absolutely does. The status-tracking surface is +P0 against the broader concern, not the specific +merge-queue concern. 
+ +## Aaron's verbatim input (channel preservation per Otto-363) + +> we need somewhere that list the status of our dependinces +> and issues that could affect us +> https://github.com/orgs/community/discussions/193645 +> https://www.githubstatus.com/ +> https://news.ycombinator.com/item?id=47881672 +> https://github.blog/news-insights/company-news/an-update-on-github-availability/ +> https://www.youtube.com/watch?v=b13m-iuu4XU&t=288s +> https://trunk.io/blog/what-happens-if-a-merge-queue-builds-on-the-wrong-commit +> this can affect us + +The "this can affect us" closing is Aaron-as-second-person +framing the relevance: not abstract dependency-management, but +*specifically* the merge-queue / GitHub-availability class of +issue that hits the factory's PR-driven workflow directly. + +## Why this matters + +- **The factory's hot path runs through GitHub.** Auto-merge, + the every-minute autonomous-loop cron, the Scorecard rolling + window, CodeQL analyses, the AceHack→LFG forward-sync flow, + Copilot/Codex PR reviews — all are GitHub-mediated. A GitHub + outage is a factory outage; a GitHub merge-queue bug is a + potential commit-corruption surface (per the Trunk.io post, + merge queues can build on wrong commits under specific + edge-case conditions). +- **Silent dependency degradation is the worst kind.** When + GitHub Actions runners are slow but functional, a polling + loop watching CI looks indistinguishable from a real wait. + Without a surface naming "GitHub Actions runner queue is + currently degraded," future-Otto can't disambiguate + honest-wait from external-incident. 
+- **Quantum-resistant crypto and supply-chain discipline both + assume we know what's running.** The + `feedback_all_cryptography_quantum_resistant_even_one_gap_is_attack_vector_2026_04_23.md` + rule and the absorb-and-contribute community-dependency + discipline both presume the factory knows its dependency + surface — but currently we only know what's in + `Directory.Packages.props` / `package.json` / + `tools/setup/*.sh`, not what's *currently failing or + flagged* in those dependencies. +- **The 6 source URLs Aaron sent are a worked example of the + class.** Each describes either a current GitHub incident, + the GitHub availability surface, an HN discussion of the + fallout, or a Trunk.io technical post on merge-queue + edge cases. The factory needs a place where a future tick + sees "GitHub merge-queue had a bug yesterday — check if + your auto-merge fired on the right commit." + +## Scope (design + implementation row) + +This row produces, in order: + +1. **Design pass** — what shape does this surface take? + Candidate shapes (each has tradeoffs): + - **Static markdown file** at + `docs/dependency-status.md`: cheap, version-controlled, + manually-updated. Good for "known-watched dependencies" + list; bad for "current incident state." + - **Cron-driven scraper** that polls + `https://www.githubstatus.com/api/v2/summary.json` + (and equivalent for other dependencies) and writes a + `docs/dependency-status/current.json`. Self-updating, + surface-able to agents and humans. Adds a workflow + and a script. + - **Issue-tracker integration** — open a tracking issue + in the LFG repo per dependency we monitor; status + updates flow to the issue. Discoverable via + `gh issue list` filters. Adds GitHub-issue dependency. + - **Hybrid** — static markdown for the watched-list + + cron scraper for current state + per-incident issues + for active investigation. Most coverage; most surface + to maintain. +2. 
**Watched-dependencies enumeration** — what do we depend
+   on operationally? Initial set: GitHub (Actions, Copilot
+   review, Codex review, hosting, merge-queue, Scorecard);
+   Anthropic (Claude Code harness, model availability);
+   OpenAI (Codex, ChatGPT for Aaron's substrate channel);
+   Google (Gemini for substrate channel); npm registry; bun
+   registry; mise; rustup; .NET runtime; PostgreSQL (if
+   used); Linear (if used). Cross-reference with
+   `tools/setup/*.sh` install paths.
+3. **Status-source enumeration** — where does each
+   dependency publish status? GitHub:
+   `https://www.githubstatus.com/api/v2/`. Anthropic:
+   `https://status.anthropic.com/`. OpenAI:
+   `https://status.openai.com/`. Google: per-product
+   pages. The status-source-list itself is data the
+   surface must capture.
+4. **Implementation** — start with the chosen shape; if
+   the static markdown turns out to be enough, stay there,
+   and expand only if it is not.
+
+## Adjacent merge-risk classes in scope
+
+Aaron's named concern was the merge-queue-builds-on-wrong-commit
+class (which we don't trigger directly because we don't use
+merge-queue). The broader class — *GitHub backend producing
+wrong-state results that auto-merge can fire against* — IS in
+scope. Specific failure modes the surface should help future-Otto
+notice:
+
+- **Auto-merge against stale base.** Our auto-merge with the
+  `allow_update_branch: true` setting auto-rebases stale
+  branches before merging — but if the rebase itself fires
+  during a degraded-API window, the result might not be what
+  the diff preview showed.
+- **`allow_update_branch` auto-rebase producing unexpected
+  merge content.** When auto-merge updates a stale branch,
+  the resulting tree is whatever the rebase produces. If the
+  rebase happens during incomplete-API-state, the branch state
+  observed by reviewers can differ from the state actually
+  merged.
+- **Force-push race with auto-merge firing.** If a force-push + and auto-merge fire near-simultaneously, the merged commit + may be whichever the GitHub backend resolved first — not + necessarily the head observed during review. +- **Incomplete API results during merge decision.** This round's + active incident ("Incomplete pull request results in + repositories") is exactly this class. A 0-unresolved-threads + count from a degraded API can satisfy auto-merge's + required_conversation_resolution gate while threads exist + unseen. + +The status-tracking surface flags these conditions; it does not +mitigate them. Mitigation rules (e.g., "when GitHub Pull Requests +component is degraded, do not arm auto-merge") belong in +follow-up rows. + +## Sharpening points (Claude.ai 2026-04-30 review) + +Three operational details to settle in the design pass: + +1. **Polling cadence cost-vs-freshness tradeoff.** Polling every + minute would be noisy and might hit GitHub's rate limits; + polling every hour might miss short incidents that fall + wholly within the gap. Reasonable shape: poll on + freshness-pass triggers (before mutating actions like merge, + force-push, auto-merge arming), poll opportunistically when + ticks are otherwise idle, treat any non-operational status as + a freshness gap that propagates to dependent decisions. +2. **Distinguish factory-relevant components from unrelated + incidents.** A GitHub Pages outage doesn't affect the + factory's PR pipeline; a Pull Requests degradation does. + Without that distinction, every minor unrelated incident + becomes noise and the surface trains future-Otto to ignore + it. Initial factory-relevant component allowlist for GitHub: + Pull Requests, Actions, API Requests, Webhooks. Other + dependencies (Anthropic, OpenAI, Google) get their own + allowlist when their status sources are wired in. +3. 
**Historical record for retrospective correlation.** Log
+   incidents to a durable file (e.g.,
+   `docs/dependency-status/incident-log.jsonl`) so future-Otto
+   can correlate "session-time anomalies" against
+   "session-time incidents." Without this, the diagnostic
+   question DeepSeek's framing introduced ("if I do nothing,
+   will the signal change on its own?") can't be answered
+   retrospectively — the substrate gains nothing from past
+   incidents.
+
+## Out of scope for this row
+
+- Building a full incident-management system. The factory
+  needs *visibility*, not PagerDuty.
+- Real-time alerting / paging / on-call rotation. If
+  dependencies fail, the factory pauses, files an incident
+  note, and waits for restoration. No auto-paging.
+- Per-dependency mitigation plans. Those go in separate
+  rows when concrete (e.g., "if GitHub merge-queue is
+  flagged, switch from auto-merge to manual-merge for the
+  duration").
+- Replacing or vendoring degraded dependencies preemptively.
+  Vendoring discussions belong in B-0086 (TS+Bun migration)
+  for the dependencies that ARE in-scope for vendoring.
+
+## When this is "done"
+
+Done = a surface exists that any future-Otto (or human
+contributor) can query in under 30 seconds to answer:
+
+1. *What does the factory depend on?* (watched list)
+2. *Are any of those dependencies currently flagged or
+   degraded?* (current state)
+3. *Is there a known issue affecting our merge / CI /
+   review pipeline right now?* (active incidents)
+
+The surface must be discoverable from CLAUDE.md and AGENTS.md
+(at minimum a pointer line) so cold-start sessions find it.
+
+## Composes with
+
+- **B-0086** (TS+Bun migration) — dependency reduction is
+  itself a dependency-status mitigation strategy. The fewer
+  external runtimes, the smaller the status-tracking
+  surface.
+- **B-0096** (Forbidden Pattern Quarantine) — a category of
+  issue worth tracking is "patterns we have used that
+  external sources later flagged."
Composes naturally if + both surfaces share a vocabulary. +- `memory/feedback_all_cryptography_quantum_resistant_even_one_gap_is_attack_vector_2026_04_23.md` + — quantum-resistant crypto policy presumes we know the + current state of our crypto primitives. Same shape: + presume-known-state requires a state-knowing-surface. +- `memory/feedback_absorb_and_contribute_community_dependency_discipline_2026_04_22.md` + — the absorb-and-contribute discipline presumes we know + what we depend on; this row makes the dependency list + legible. +- `memory/feedback_amara_poll_gate_not_ending_holding_is_not_status_2026_04_30.md` + (landing in PR #911 alongside this row) — the + poll-the-gate rule says "watch the gate, not the + ending." Knowing whether the gate (CI, merge queue, + reviewer presence) itself is dependency-degraded is part + of the gate-state. A degraded GitHub Actions queue makes + "in-progress" mean something different than usual. +- `docs/AUTONOMOUS-LOOP.md` — autonomous-loop runs on + GitHub-mediated state. Loop-tick-history rows could + cross-reference the dependency-status surface when + external incidents shape the tick. 
+ +## Source links (verbatim from Aaron's channel, 2026-04-30) + +- [GitHub Community discussion 193645](https://github.com/orgs/community/discussions/193645) +- [GitHub Status page](https://www.githubstatus.com/) +- [Hacker News discussion 47881672](https://news.ycombinator.com/item?id=47881672) +- [GitHub Blog — An update on GitHub availability](https://github.blog/news-insights/company-news/an-update-on-github-availability/) +- [YouTube video b13m-iuu4XU (segment at 4:48)](https://www.youtube.com/watch?v=b13m-iuu4XU&t=288s) +- [Trunk.io — What happens if a merge queue builds on the wrong commit](https://trunk.io/blog/what-happens-if-a-merge-queue-builds-on-the-wrong-commit) + +The Trunk.io post on merge-queue-builds-on-wrong-commit is +the most operationally-load-bearing of the six — it +describes a class of bug that, if present in our path, +would silently produce wrong commits while our auto-merge +plus CI gates report green. The "wrong commit" failure +mode is exactly the silent-failure shape the factory has +rules against. Worth a careful read on first absorb pass. diff --git a/memory/MEMORY.md b/memory/MEMORY.md index 65dcafa68..8ef3010be 100644 --- a/memory/MEMORY.md +++ b/memory/MEMORY.md @@ -2,6 +2,7 @@ **📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** +- [**GitHub status — first-class dependency reference (Aaron 2026-04-30)**](reference_github_status_first_class_aaron_2026_04_30.md) — Aaron 2026-04-30: GitHub is our only host; status URL is first-class repo-and-loop substrate. Pins canonical URLs (status page + summary.json API), names factory-relevant component allowlist (Pull Requests / Actions / API Requests / Webhooks / Git Operations / Issues), defines freshness-check rule on three triggers: cadence (every 10-15 min when in-flight, less when idle — *"every loop tick might be excessive but on some cadence"*), on-suspicion (anomaly investigation asks "is GitHub degraded?" 
before "is my logic wrong?"), and pre-mutation (strictest gate). Aaron 2026-04-30 reinforcement *"all our assumptions are based on them being healthy today which is not always true as we can see todya"*. Origin: live "Incomplete pull request results" GitHub PR-degradation incident discovered while filing B-0109 (PR #912). - [**Kernel-pipe vs JS-space stream ordering — TS+Bun port pattern (Otto, 2026-04-30)**](feedback_kernel_pipe_vs_js_space_stream_ordering_ts_bun_port_pattern_2026_04_30.md) — TS+Bun port discipline: when porting bash `$(... 2>&1)` to `spawnSync`, merge stdout+stderr via shell-side `bash -c " 2>&1"` (preserves chronological ordering at the kernel pipe boundary), NOT `result.stdout + result.stderr` concat in JS-space (loses ordering when child interleaves writes). Origin: PR #901 slice-18 Copilot P1 round 2. Composes with `classifySpawnFailure` 4-case helper + Otto-363 substrate-or-it-didn't-happen. - [**DST + code coverage are universal best practices for every Zeta language (Aaron 2026-04-30)**](feedback_dst_and_coverage_universal_every_language_aaron_2026_04_30.md) — Generalises Otto-272 / Otto-281 / Otto-273 to all languages. SQLSharp is the named TS+Bun reference. Pin seeds, fake clocks, no test retries; tests cover public API surface, CI surfaces coverage, reductions fail. Per-language tooling lives in the runtime layer (`docs/best-practices/`). - [**Host mutation receipt — ruleset 15256879 code_quality rule removed (Aaron-authorized 2026-04-29)**](feedback_host_mutation_receipt_2026_04_29_ruleset_15256879_code_quality_removed.md) — Receipt for a live host (GitHub) mutation made before executable-host-settings tooling exists. PUT /repos/Lucent-Financial-Group/Zeta/rulesets/15256879 removed `code_quality severity=all` rule (host-side / non-git-declared CodeQL owner injecting `event=dynamic` "Code Quality" runs that bypassed the source-presence gate from PR #857). 
Made the git-visible advanced workflow `.github/workflows/codeql.yml` the sole CodeQL owner; resolved multi-master conflict that blocked PR #849. Aaron auth: *"if the org-recommended are legacy we can remove, declarative is better."* Per Amara *"Clickops used to restore declarative ownership must become a receipt, or it becomes the next drift"* — this receipt makes the live mutation visible to future executable-host-settings reconciler. NOT precedent for casual ruleset mutations; hook denial during episode was healthy; future apply path is host-reconciler-mediated with WorkClaim + policy + receipt; do NOT broaden `gh api ... rulesets/PUT` permission. Composes with executable-host-settings design packet, Otto-363, task #342 (completed) + #343. diff --git a/memory/reference_github_status_first_class_aaron_2026_04_30.md b/memory/reference_github_status_first_class_aaron_2026_04_30.md new file mode 100644 index 000000000..c8eec7e92 --- /dev/null +++ b/memory/reference_github_status_first_class_aaron_2026_04_30.md @@ -0,0 +1,217 @@ +--- +name: GitHub status — first-class dependency reference (Aaron 2026-04-30) +description: GitHub is our only host right now; the GitHub status API is first-class repo-and-loop substrate. Canonical URL pinned + freshness-check rule named. +type: reference +--- + +# GitHub status — first-class dependency reference + +Aaron 2026-04-30 (autonomous-loop channel input, verbatim): + +> looking at github status should be first class for us we live +> on git and github for now until we get a 2nd host in the future +> +> that github status page should be first class remembered +> somwehre in our repo and loop since github is our only host +> right now + +The factory's hot path runs through GitHub: auto-merge, the +every-minute autonomous-loop cron, the Scorecard rolling +window, CodeQL analyses, the AceHack→LFG forward-sync flow, +Copilot/Codex PR reviews. Until a second git host exists, +**GitHub status IS factory status**. 
+
+## Canonical URLs
+
+- **Status page (human)**: <https://www.githubstatus.com/>
+- **Status API (machine)**:
+  <https://www.githubstatus.com/api/v2/summary.json>
+
+The summary.json endpoint is the freshness-check surface. It
+returns:
+
+- `status.indicator` — one of `none`, `minor`, `major`,
+  `critical`, `maintenance`
+- `status.description` — human-readable status (e.g.,
+  "All systems operational" / "Partially Degraded Service")
+- `incidents[]` — currently active incidents with name,
+  status (`investigating`/`identified`/`monitoring`/`resolved`),
+  impact, and incident_updates
+- `components[]` — per-component status; non-`operational`
+  components are the affected systems
+
+## Factory-relevant component allowlist
+
+These components, when degraded, directly affect the factory's
+hot path. Other components (Pages, Codespaces, etc.) are not in
+scope for the factory's gate-state decisions. This list is
+authoritative for the freshness-check rule below — any change
+to it requires Architect / human sign-off.
+
+- **Pull Requests** — affects PR queries, thread state,
+  auto-merge readiness. Degradation here can produce
+  incomplete API results that fool the auto-merge pre-flight
+  check.
+- **Actions** — affects CI runs. Degradation here can cause
+  spurious "in-progress forever" or missing-check-rollup
+  states.
+- **API Requests** — the entire `gh` CLI runs through this.
+  Degradation here makes every gate-state poll unreliable.
+- **Webhooks** — affects branch-protection check delivery.
+  Degradation can mean a check that "completed" never gets
+  reported back to the PR.
+- **Git Operations** — the substrate layer. Degradation here
+  affects push, fetch, force-push semantics.
+- **Issues** — used for the dependency-status incident log
+  pattern (per B-0109's design).
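As a rough illustration of how the allowlist above could be applied to a `summary.json` payload, here is a minimal Python sketch. The names `fetch_summary`, `degraded_factory_components`, and `FACTORY_RELEVANT` are hypothetical and are not existing factory code. The sketch assumes each `components[]` entry carries `name` and `status` fields, as described above, and that component names match the allowlist strings exactly.

```python
import json
import urllib.request

# Canonical machine-readable status endpoint named in this reference.
SUMMARY_URL = "https://www.githubstatus.com/api/v2/summary.json"

# Factory-relevant component allowlist, copied from the list above.
FACTORY_RELEVANT = frozenset({
    "Pull Requests",
    "Actions",
    "API Requests",
    "Webhooks",
    "Git Operations",
    "Issues",
})


def fetch_summary(url=SUMMARY_URL, timeout=10.0):
    """Fetch and decode the status summary with a single HTTP GET."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def degraded_factory_components(summary):
    """Map each allowlisted component that is not `operational` to its status."""
    return {
        comp["name"]: comp.get("status")
        for comp in summary.get("components", [])
        if comp.get("name") in FACTORY_RELEVANT
        and comp.get("status") != "operational"
    }
```

Calling `degraded_factory_components(fetch_summary())` would return an empty dict when every allowlisted component is operational. Components outside the allowlist (Pages, Codespaces) are ignored on purpose, so an unrelated outage does not register here.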
+
+## Freshness-check rule (loop + investigation integration)
+
+Aaron 2026-04-30 framed the rule across two short messages
+(verbatim):
+
+> polling github status should just be a regular part of our
+> loops and investigations because all our assumptions are
+> based on them being healthy today which is not always true
+> as we can see todya
+
+> every loop tick might be excessive but on some cadence or if
+> you suspect issues because assumptions are not working out
+
+The two messages compose: regular part of loops + investigations,
+**but** every-tick is excessive — cadence + on-suspicion +
+pre-mutation are the right shape. The freshness-check runs at
+three points:
+
+1. **On a cadence (not every tick).** Default cadence: every
+   10–15 minutes when there is in-flight loop work, less
+   frequently when idle. This composes with the tiered
+   cadence already in the poll-the-gate rule (0–10 min
+   tier, 10–30 min tier, 30+ min tier). The freshness-check
+   piggybacks on the existing tiered cadence rather than
+   adding its own — when the loop is in an active wait at
+   the 0–10 tier, freshness-check fires on a longer
+   sub-cadence (every 3–5 ticks) to avoid hitting the
+   GitHub status endpoint every minute. Idle loops can
+   freshness-check less frequently (every 30+ min). The
+   exact cadence is tunable; the principle is "regular
+   enough to catch a degradation within an hour, sparse
+   enough not to be every-tick excessive."
+2. **On suspicion** — whenever a loop tick is investigating an
+   anomaly (slow CI, stuck PR, missing check, unexpected
+   `BLOCKED` state, vanished review threads, force-push race
+   conditions, "assumptions are not working out"), the
+   freshness-check runs first and GitHub degradation is the
+   first candidate cause considered.
+   *"Is GitHub currently degraded?"* is asked before *"is
+   my logic wrong?"* — same shape as the
+   speculation-vs-evidence discipline applied at the
+   dependency layer
+   (`feedback_speculation_leads_investigation_not_defines_root_cause_aaron_2026_04_28.md`).
+ This is the load-bearing trigger: any time a poll result + surprises Otto, freshness-check is non-optional. +3. **Before any mutating action** (merge, force-push, + auto-merge arming, branch deletion, commit-then-push). This + is the strictest gate; non-green status defers the action. + Pre-mutation freshness-check IS every-mutation, since + each mutation is its own freshness boundary. + +The cadence trigger handles routine surveillance; the +on-suspicion trigger handles anomaly investigation; the +pre-mutation trigger handles strict gate enforcement. Each +serves a different purpose; together they cover the failure +modes without per-tick excess. + +The freshness-check produces a one-of-three classification: + +1. **`status.indicator == "none"` AND no incident affects + factory-relevant components** → green; proceed with the + mutating action per normal pre-flight rules. +2. **Any incident affects a factory-relevant component** → + the action is **deferred** until the dependency clears. + The conservative response taken at incident-discovery time + (e.g., disabling auto-merge on PR #911 at 11:14Z on + 2026-04-30) is the floor; loop must not re-arm or re-issue + mutating actions until the freshness-check returns to + class 1. +3. **`status.indicator` non-none but no incident in + factory-relevant components** → proceed with caution; + record the unrelated incident in the lane-state report so + future-Otto can correlate retrospectively. + +The freshness-check itself is non-mutating (a single HTTP GET). +Its cost is negligible vs. the cost of acting on incomplete +state. **Skipping it where mandated is not an option** — the +cadence-driven check (per the cadence rule above), the +on-suspicion check, and the pre-mutation check are all +required; only the every-tick interpretation is over-specified +per Aaron's calibration. 
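The one-of-three classification above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not factory code: the `Freshness` enum, `classify`, and `FACTORY_RELEVANT` are hypothetical names, and "an incident affects a factory-relevant component" is approximated by any allowlisted entry in `components[]` reporting a status other than `operational`. A real implementation might inspect `incidents[]` directly instead.

```python
from enum import Enum

# Allowlist copied from the "Factory-relevant component allowlist" section.
FACTORY_RELEVANT = frozenset({
    "Pull Requests",
    "Actions",
    "API Requests",
    "Webhooks",
    "Git Operations",
    "Issues",
})


class Freshness(Enum):
    GREEN = "green"      # class 1: proceed per normal pre-flight rules
    DEFER = "defer"      # class 2: defer mutating actions until clear
    CAUTION = "caution"  # class 3: proceed, but record the unrelated incident


def classify(summary):
    """Classify a decoded summary.json payload into one of three classes.

    Assumption: a non-operational allowlisted component stands in for
    "an incident affects a factory-relevant component".
    """
    affected = [
        comp.get("name")
        for comp in summary.get("components", [])
        if comp.get("name") in FACTORY_RELEVANT
        and comp.get("status") != "operational"
    ]
    if affected:
        return Freshness.DEFER
    if summary.get("status", {}).get("indicator") == "none":
        return Freshness.GREEN
    # Non-none (or missing) indicator with no relevant component affected.
    return Freshness.CAUTION
```

The relevance check runs before the indicator check so that a degraded allowlisted component always defers, even if the top-level indicator has not caught up yet. A missing indicator falls into the caution class rather than green, which errs on the conservative side.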
+
+## The assumption this rule makes testable
+
+Aaron's framing names what was previously implicit: *"all our
+assumptions are based on them being healthy today which is not
+always true as we can see todya."* Every gate-state poll, every
+auto-merge decision, every CI summary, every thread query
+silently presumes GitHub is healthy. When the assumption is
+true, the freshness-check is redundant; when the assumption is
+false, the gate-state results lie. The freshness-check makes the
+underlying assumption explicit and testable at each mandated
+check point (cadence, on-suspicion, pre-mutation); it
+costs almost nothing to verify and converts a silent failure
+mode into a visible one.
+
+## Re-arm condition during incident recovery
+
+When auto-merge has been deliberately disabled because of a
+live incident, re-arming requires **two consecutive consistent
+freshness checks** (per the "Auto-merge re-arm during
+dependency-incident recovery" section in the poll-the-gate
+rule landing in PR #911 alongside this reference at
+`memory/feedback_amara_poll_gate_not_ending_holding_is_not_status_2026_04_30.md`;
+the cross-reference resolves once both PRs merge). One
+freshness check during recovery jitter is not enough —
+recovery may produce intermittent operational readings before
+the underlying state actually clears.
+
+## Composes with
+
+- `docs/backlog/P0/B-0109-dependency-status-tracking-surface-2026-04-30.md`
+  — the broader dependency-status tracking surface this
+  reference is the first concrete piece of. Other dependencies
+  (Anthropic, OpenAI, Google) get their own reference entries
+  when their status sources are wired in; the design pass for
+  the full surface lives in B-0109.
+- `memory/feedback_amara_poll_gate_not_ending_holding_is_not_status_2026_04_30.md`
+  (landing in PR #911 alongside this reference) — the
+  poll-the-gate rule that names the gate-state shape and
+  the auto-merge pre-flight check. The freshness-check rule
+  here is the dependency-layer version of poll-the-gate.
+- `docs/AUTONOMOUS-LOOP.md` — the loop discipline. The + freshness-check above adds a dependency-aware pre-flight to + the existing tick checklist; binding integration into the + loop spec is a follow-up edit. +- The visibility-constraint rule (Aaron 2026-04-28; canonical + home is `memory/CURRENT-aaron.md`'s "Visibility constraint" + section, since the dedicated `feedback_*.md` file referenced + by some MEMORY.md index entries was never landed in-repo). + A degraded GitHub also degrades the visibility surface Aaron + uses to verify factory state; freshness-check failures are + a visibility gap that should be flagged in the lane-state + report. + +## Origin + +Aaron 2026-04-30 sent two short, separated inputs that +together established this rule: + +1. *"looking at github status should be first class for us we + live on git and github for now until we get a 2nd host in + the future"* — first-class framing. +2. *"that github status page should be first class remembered + somwehre in our repo and loop since github is our only host + right now"* — explicit ask to land it as durable + substrate (in-repo memory + loop integration). + +The catalyst was the live GitHub Pull Requests degradation +incident discovered earlier in the same session, ongoing since +2026-04-30T03:49Z, which directly exposed the gap this rule +fills.
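The re-arm condition described in this reference (two consecutive consistent freshness checks before auto-merge is re-armed) can be sketched as a small streak counter. The `RearmGate` class and its `observe` method are hypothetical names for illustration only, and the sketch assumes "consistent" means consecutive green readings with no non-green reading in between.

```python
class RearmGate:
    """Tracks consecutive green freshness checks during incident recovery."""

    def __init__(self, required=2):
        # Number of back-to-back green checks needed before re-arming.
        self.required = required
        self.streak = 0

    def observe(self, is_green):
        """Record one freshness-check result; return True once re-arm is allowed."""
        # Any non-green reading resets the streak, so recovery jitter
        # (green, not green, green) never satisfies the gate.
        self.streak = self.streak + 1 if is_green else 0
        return self.streak >= self.required
```

With the default of two, a single green reading during recovery returns `False`, and only a second green reading immediately after it returns `True`.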