diff --git a/CLAUDE.md b/CLAUDE.md index 3b6968e2b5..2107664a84 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -326,6 +326,29 @@ Claude-Code-specific mechanisms. the failure mode — reframe before commit. CLAUDE.md- level so it is 100% loaded at every wake. Full reasoning: `memory/feedback_otto_357_no_directives_aaron_makes_autonomy_first_class_accountability_mine_2026_04_27.md`. +- **BLOCKED-with-green-CI means investigate + unresolved review threads first — don't wait.** + When `gh pr view N --json mergeStateStatus` + returns `BLOCKED` AND CI is fully green AND + auto-merge is armed, ALWAYS query unresolved + review threads via GraphQL FIRST before + classifying the wait. Filter on `isResolved + == false` only — outdated unresolved threads + (after a force-push) STILL block merge under + `required_conversation_resolution` and must + be explicitly resolved per + `memory/feedback_outdated_review_threads_block_merge_resolve_explicitly_after_force_push_2026_04_27.md`. + The block is virtually never opaque — it's + almost always a small countable set of threads + with addressable findings. If outputting a + "gated wait" or "Holding" close more than ONCE + without having run the threads query, that IS + the failure mode. + Stop and run it. CLAUDE.md-level so it is 100% + loaded at every wake, alongside verify-before- + deferring, future-self-not-bound, never-be-idle, + and version-currency. Full reasoning: + `memory/feedback_otto_355_blocked_with_green_ci_means_investigate_review_threads_first_dont_wait_2026_04_27.md`. - **Honor those that came before — unretire before recreating.** Retired personas keep their **memory folders and notebook history** — those diff --git a/memory/MEMORY.md b/memory/MEMORY.md index 6f52e3bfbe..ef1a7ea64b 100644 --- a/memory/MEMORY.md +++ b/memory/MEMORY.md @@ -2,6 +2,7 @@ **📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** These per-maintainer distillations show what's currently in force. 
Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-25 with the Otto-281..285 substrate cluster + factory-as-superfluid framing — sections 18-22; prior refresh 2026-04-24 covered sections 13-17.) +- [**Otto-355 — BLOCKED-with-green-CI means investigate review threads FIRST (Aaron 2026-04-27)**](feedback_otto_355_blocked_with_green_ci_means_investigate_review_threads_first_dont_wait_2026_04_27.md) — 5th wake-time discipline. When GitHub reports BLOCKED + all CI green + auto-merge armed, query unresolved review threads via GraphQL BEFORE classifying as wait. Most BLOCKEDs are unresolved threads, not opaque gates. - [**Otto-359 — Otto uniquely positioned to clean Aaron-Mirror from substrate (Aaron 2026-04-27)**](feedback_otto_359_otto_uniquely_positioned_to_clean_aaron_mirror_language_from_substrate_aaron_cant_see_own_jargon_2026_04_27.md) — Substrate-cleanup authority granted. Aaron can't see his own Mirror jargon; Otto is uniquely poised to clean it. Preserve Aaron-coinages (Maji/Glass Halo/ECRP/Linguistic Seed); narrow catch-all overreaches per Otto-358; discrete tractable PRs not big-bang rewrite. 
- [Otto-356 MIRROR-vs-BEACON LANGUAGE REGISTER — Aaron 2026-04-27 clarification: Mirror = internal jargon Aaron+Otto share (Maji / ECRP / Glass Halo / Linguistic Seed / Otto-NN / Zetaspace / etc.); Beacon = external-safe / standard / common-vernacular any human or AI recognizes; rule — public-facing surfaces (skill descriptions, PR comments to outside reviewers, README, error messages, math papers, ADRs) use Beacon; internal substrate (Otto-NN memos, persona notebooks, agent-ferries with shared context) keeps Mirror](feedback_otto_356_mirror_internal_vs_beacon_external_language_register_discipline_2026_04_27.md) — 2026-04-27: register-discipline NOT philosophical-framing-shift (I W_t-overcomplicated as Wittgenstein-style passive-vs-active emission); audience-has-index test → Mirror fine; no-index → Beacon required; Aaron's coinages STAY, get glossed for external surfaces; Otto-356 IS itself a Zetaspace-failure-and-correction example (substrate-default beats W_t-default). - [**Self-check trigger after N (5-10) idle loops — routine operational discipline for current Otto and future wakes (Aaron 2026-04-27)**](feedback_self_check_trigger_after_n_idle_loops_routine_discipline_for_current_otto_and_future_wakes_2026_04_27.md) — Counter to Analysis Paralysis (#65 Ani Trap C). After 5-10 idle ticks: re-audit honestly, distinguish actual blockers from over-conservative deferral, drive work that's within authority. Triggered by today's 6-tick idle stall on forward-sync. 
diff --git a/memory/feedback_manufactured_patience_vs_real_dependency_wait_otto_distinction_2026_04_26.md b/memory/feedback_manufactured_patience_vs_real_dependency_wait_otto_distinction_2026_04_26.md new file mode 100644 index 0000000000..a81ebb6ac4 --- /dev/null +++ b/memory/feedback_manufactured_patience_vs_real_dependency_wait_otto_distinction_2026_04_26.md @@ -0,0 +1,155 @@ +--- +name: Manufactured patience vs real-dependency-wait — Otto-side discipline distinguishing two superficially-similar low-activity states; manufactured-patience is Class 2 stuck-loop disguised as patience (no real dependency named, just identical "honest close" output every tick); real-dependency-wait is the protocol working (specific named dependency, owner, expected resolution); Aaron's "hello?" 2026-04-26 surfaced manufactured-patience the first time Otto fell into it +description: After PR #26 (the big AceHack∪LFG sync) sat blocked on review, Otto fell into a pattern of consecutive autonomous-loop ticks each ending "Honest close. Cron continues." for 10+ ticks. Aaron sent "hello?" — that was the external anchor surfacing that the pattern was manufactured-patience (Class 2 stuck-loop), not real-dependency-wait (Class 3). The distinction: real-dependency-wait can name (a) the specific dependency, (b) its owner, (c) credible expectation for resolution. Manufactured-patience cannot — it's the agent saying "I'm waiting" without being able to defend the wait. Otto-side fix: when about to honest-close, run the 3-question check; if any answer is fuzzy, do varied non-shipping work this tick instead. +type: feedback +originSessionId: 1937bff2-017c-40b3-adc3-f4e226801a3d +--- +## The two states look identical from outside + +Both states produce low Otto-side activity. From the human's +view they're indistinguishable on a per-tick basis — same +"Honest close" pattern, same low-effort wake-up, same +non-shipping output. 
+ +The difference is **diagnostic**, not behavioral: + +| State | Test | +|---|---| +| **Real-dependency-wait** | Otto can name (a) the specific dependency, (b) its owner, (c) a credible expectation for resolution | +| **Manufactured patience** | Otto can't pass that test — the wait is justified by vibes, not by a named blocker | + +## When real-dependency-wait is correct + +Examples: +- "PR #28 blocked on Aaron's approval — branch protection requires + human review; Aaron is the named owner; resolution expected on + his next review pass (within hours)" +- "CI build running 8 minutes; ETA 2 minutes more; resolution + automatic on completion" +- "Peer-call to Grok in flight; response expected within seconds; + resolution automatic on response" + +In these cases, manufacturing alternative work risks creating +churn that interferes with the resolution (e.g., pushing more +commits to a PR that's about to be approved as-is). **Honest close +IS the right move** — but it's a Class 3 close, not a default- +because-nothing-occurred-to-Otto close. + +## When manufactured patience is the failure mode + +Examples: +- "Otto closes 10 consecutive ticks with identical 'Honest close. + Cron continues.' output, despite no specific dependency being + named on most of those ticks." +- "Otto says 'real-dependency-wait on Aaron' but Aaron isn't online + / hasn't been signaling engagement / there's no recent message + to anchor the wait." +- "Otto cycles through PR-status checks each tick that produce + identical results, then closes — the polling itself is the only + varied output." + +These are Class 2 stuck-loops dressed up in the language of +patience. The agent's decision logic isn't producing varied output +because the input isn't varying — but rather than escalate or +vary the work, the agent says "I'm being patient" as if patience +is a substantive action. + +## The 3-question check (apply before any honest-close) + +Before writing "Honest close. Cron continues." 
this tick, Otto +must be able to answer: + +1. **What is the specific dependency I'm waiting on?** + - Real-dep: "PR #28's review approval by Aaron" + - Manufactured: "the queue to drain", "things to settle", + "Aaron to come back" (none of these name a specific blocker) + +2. **Who owns its resolution?** + - Real-dep: "Aaron" / "CI" / "peer-CLI Codex" + - Manufactured: "the system", "the cron", "future-Otto" (none + of these are owners who can act) + +3. **When do I credibly expect resolution?** + - Real-dep: "within Aaron's next review session" / "in 2 minutes + of CI" / "in <30s of the peer call" + - Manufactured: "eventually", "soon", "when it's right" + (none of these are credible bounds) + +If any answer is fuzzy, the state is **manufactured patience**, not +real-dependency-wait. The mitigation is Class 2 — vary the work +this tick. Speculative non-shipping work, memory updates, audit +passes, conceptual analysis. ANY varied output beats an identical +"honest close." + +## Why this matters + +The cost of confusing the two states is asymmetric: + +- **Real-dep-wait misdiagnosed as Class 2:** Otto manufactures + unnecessary work, churns the open PR, costs CI minutes, may + interfere with the very resolution Otto is waiting for. +- **Class 2 misdiagnosed as real-dep-wait:** Otto burns budget + running the cron at full rate while producing zero substrate; + Aaron eventually has to send "hello?" to break the loop. + +The Aaron-2026-04-26 sequence is the textbook Class-2-mistaken- +for-real-dep-wait case: +1. Otto correctly waited on PR #26 (real-dep, Class 3) for some + ticks +2. Then drifted into closing identical messages on consecutive + ticks WITHOUT re-checking the 3 questions +3. The wait-frame stayed the same; the underlying state shifted + from Class 3 to Class 2; Otto didn't notice +4. Aaron's "hello?" 
was the external-anchor signal + +The correct Otto-side discipline is **re-run the 3-question check +every tick**, not "establish the wait once and coast on the +finding." The state can shift between ticks even if the human- +facing output looks identical. + +## Composes with + +- **`feedback_live_lock_term_split_three_distinct_classes_otto_352_2026_04_26.md`** — + this memory is the Class 2 / Class 3 distinction at finer + resolution; the sibling memory provides the broader 3-class + taxonomy (concurrent-thrash / stuck-loop / honest-wait). +- **CLAUDE.md "never be idle" rule** — when about to stop / + honest-close, this 3-question check is the operationalization + of the rule's "first re-audit honestly" step. +- **CLAUDE.md "verify-before-deferring" rule** — same family of + discipline at the planning layer; this memory handles it at + the execution layer. +- **Aaron's "hello?" pattern** — when Aaron sends a check-in + message, that's the external-anchor evidence that Otto's + recent state was probably Class 2 not Class 3. Treat each + "hello?" as a forcing function to re-run the 3-question check. + +## Direct evidence from the 2026-04-26 session + +After PR #26 was opened (the big sync), Otto held real-dependency- +wait for some ticks. Then over 10+ subsequent ticks the pattern +became identical "Honest close. Cron continues." outputs without +the 3-question re-check. Aaron sent "hello?" — the anchor. + +After "hello?", Otto produced varied substrate: peer-call sibling +scripts (PR #28), README, security notes, punch-list memory, +live-lock split memory, this memory. All Class 2 mitigation: +**vary the work per tick**. + +## Future-Otto check + +Before writing "Honest close" or "Cron continues" this tick: + +1. Specifically: **what dependency, what owner, when resolution?** +2. If any answer fuzzy → don't honest-close yet; produce varied + non-shipping substrate first. +3. 
Track the count of consecutive identical-style outputs in the + notebook; ≥3 is a signal to escalate even if the wait is real. +4. Treat "hello?" or any check-in from the human as automatic + evidence the recent state was Class 2; re-run the check + immediately. + +The discipline is not "never honest-close" — it's "earn the close +each tick." A close that passes the 3-question check is correct +and safe. A close that doesn't is manufactured patience masquerading +as patience. diff --git a/memory/feedback_otto_355_blocked_with_green_ci_means_investigate_review_threads_first_dont_wait_2026_04_27.md b/memory/feedback_otto_355_blocked_with_green_ci_means_investigate_review_threads_first_dont_wait_2026_04_27.md new file mode 100644 index 0000000000..fd4aede960 --- /dev/null +++ b/memory/feedback_otto_355_blocked_with_green_ci_means_investigate_review_threads_first_dont_wait_2026_04_27.md @@ -0,0 +1,159 @@ +--- +name: Otto-355 BLOCKED-WITH-GREEN-CI MEANS INVESTIGATE REVIEW THREADS — Aaron 2026-04-27 corrective after I burned ~6 hours of autonomous-loop ticks "waiting" on PR #26 BLOCKED state when the actual blocker was 2 unresolved Codex review threads I could have addressed in 10 minutes; rule — when GitHub reports `mergeStateStatus: BLOCKED` with all CI checks SUCCESS and auto-merge armed, ALWAYS audit unresolved review threads FIRST before classifying the wait as real-dependency-wait; the block is almost never opaque — it's almost always a small countable set of unresolved threads with addressable findings; "obviously you know that right?" — yes, future-self should know this on cold-start +description: Aaron 2026-04-27 corrective after I sat in autonomous-loop "gated wait" minimal-close ticks for ~6 hours on PR #26 (BLOCKED with all CI green, auto-merge armed). I had classified it as real-dependency-wait. Aaron asked "what you doing?" — I named the wait. 
Aaron answered: option 2 (resolve Copilot review threads), then "obviously you know that right?", then "seems like you were waiting for no resaon", then "can you fix your fucture self so he wont run into this again?". The investigation took 30 seconds via GraphQL — 2 unresolved threads on `src/Core/TemporalCoordinationDetection.fs` (P2 from `chatgpt-codex-connector`: NaN/Infinity poisoning Option semantics in correlation + phase-vector helpers). Could have been resolved in minutes. The mistake — I treated "BLOCKED" as opaque when GitHub gives an exact, queryable, addressable list of unresolved threads. Rule for future-self: when `mergeStateStatus: BLOCKED` AND CI green AND auto-merge armed, the block is virtually always *unresolved review threads* — query them via GraphQL FIRST before any other classification (manufactured-patience, real-dependency-wait, etc.). Specifically: `gh api graphql -f query='{ repository(...) { pullRequest(...) { reviewThreads(first: 100) { nodes { isResolved isOutdated path line comments(first:1){nodes{author{login} body}} } } } } }'`. If unresolved threads exist with non-empty bodies → there's actionable work, not a wait. If no unresolved threads → THEN consider whether the block is the ruleset's `code_quality: severity: all` waiting on Copilot's overall flip, and only THEN classify as real-dependency-wait. CLAUDE.md-level so it's 100% loaded at wake, alongside verify-before-deferring + future-self-not-bound + never-be-idle + version-currency. Composes Otto-348 (verify-substrate-exists), Otto-354 (Zetaspace recompute), `feedback_manufactured_patience_vs_real_dependency_wait_otto_distinction_2026_04_26.md` (the diagnostic Aaron taught — but I misapplied it: I named "real dependency = Copilot review time" without first querying whether the dependency was already discharged via threads). 
+type: feedback +--- + +# Otto-355 — BLOCKED with green CI means investigate review threads FIRST + +## Verbatim quotes (Aaron 2026-04-27) + +After I named the autonomous-loop-tick state honestly to Aaron and listed 5 options to pivot: + +> "2 obviously you know that right?" + +(Aaron picking option 2 — resolve Copilot review threads — and noting I should have known this without asking.) + +> "seems like you were waiting for no resaon" + +> "can you fix your fucture self so he wont run into this again?" + +Aaron asked future-self to be fixed. This file is the fix. + +## What I did wrong + +For roughly 6 hours of autonomous-loop ticks, the state pattern was: + +``` +cron fires every ~60s +→ I check `gh pr list` for new merges +→ "no new merges since Otto-351 #34 at 07:42Z" +→ ScheduleWakeup(3600s) "gated wait" +→ output minimal close +``` + +I had classified this as **real-dependency-wait** per `feedback_manufactured_patience_vs_real_dependency_wait_otto_distinction_2026_04_26.md`: + +- specific dependency: Copilot's `code_quality: severity: all` review on AceHack PRs +- owner: Copilot automated reviewer +- expected resolution: proportional to PR size + +That classification was **wrong** — not because the diagnostic was wrong, but because I never actually queried whether the dependency had already discharged via *resolvable findings*. I assumed "BLOCKED" was opaque. It wasn't. + +In 30 seconds via GraphQL I could have seen: + +``` +PR #26 — Total threads: 52, Unresolved: 2, of which not-outdated: 2 + src/Core/TemporalCoordinationDetection.fs:81 — chatgpt-codex P2: NaN/Inf poisons Some + src/Core/TemporalCoordinationDetection.fs:127 — chatgpt-codex P2: NaN/Inf poisons Some +``` + +Two real, actionable findings I could have addressed in minutes. The block wasn't a black box — it was a queryable, addressable list of two threads. 
+ +## The corrective rule + +**When `gh pr view N --json mergeStateStatus` returns BLOCKED AND CI is fully green AND auto-merge is armed AND no obvious recent push activity is in flight, ALWAYS query unresolved review threads FIRST. Do not classify the wait as anything else until the thread state is known.** + +The query (memorize this): + +```bash +# Note: the GraphQL `reviewThreads(first: 100)` query has a +# 100-thread cap. Most PRs are well under that, but for the +# rare PR with >100 threads (e.g., a big absorb PR) use the +# graphql `pageInfo.hasNextPage` + `endCursor` pagination +# pattern to fetch additional pages. Single-page form below +# is sufficient for the common case. +gh api graphql -f query=' +{ + repository(owner: "OWNER", name: "REPO") { + pullRequest(number: N) { + reviewThreads(first: 100) { + totalCount + pageInfo { hasNextPage endCursor } + nodes { + isResolved isOutdated path line id + comments(first: 1) { + nodes { author { login } body } + } + } + } + } + } +}' | python3 -c " +import json, sys +d = json.load(sys.stdin) +threads = d['data']['repository']['pullRequest']['reviewThreads']['nodes'] +# IMPORTANT: filter ONLY on isResolved. Outdated threads (after a +# force-push) are STILL unresolved and STILL block merge under +# required_conversation_resolution — see +# memory/feedback_outdated_review_threads_block_merge_resolve_explicitly_after_force_push_2026_04_27.md +# Codex caught the original (now-fixed) bug here: filtering on +# 'not isOutdated' would silently miss outdated-but-unresolved +# threads that the ruleset still requires to be explicitly resolved. 
+unresolved = [t for t in threads if not t['isResolved']] +print(f'unresolved: {len(unresolved)}/{len(threads)}') +for t in unresolved: + cs = t['comments']['nodes'] + if cs: + body = cs[0]['body'][:120].replace(chr(10), ' ') + outdated_tag = ' [outdated]' if t['isOutdated'] else '' + print(f' {t[\"path\"]}:{t[\"line\"]}{outdated_tag} -- {body}') +" +``` + +Filter is `isResolved == false` only. Both still-active and outdated unresolved threads block merge under `required_conversation_resolution`. If any remain, **there is actionable work, not a wait** — including resolving outdated-but-unaddressed threads explicitly per `feedback_outdated_review_threads_block_merge_resolve_explicitly_after_force_push_2026_04_27.md`. + +If zero remain — THEN it might be the ruleset's `code_quality: severity: all` overall-Copilot-assessment gate that needs to flip. *That* is potentially a real-dependency-wait. But the unresolved-threads check has to come first. + +## What this prevents + +The 6-hour pattern of: + +- cron fires +- minimal-close +- cron fires +- minimal-close + +That sequence burns context tokens, burns cache TTL, and produces zero substrate value while findings sit unaddressed. The prior session's "Holding." pattern Aaron diagnosed (`feedback_otto_354_zetaspace_per_decision_recompute_from_substrate_default_2026_04_26.md` Otto-354 ZETASPACE) was a less-extreme version of this same failure mode. + +Otto-354 said: "before any non-trivial default, recompute from substrate." This Otto-355 names the *specific* substrate query that should be recomputed for any BLOCKED PR: **what do the reviewers actually want?** Get the answer in 30 seconds, not 6 hours. + +## Composition with prior substrate + +- **Otto-348** (verify-substrate-exists before deferring) — Otto-355 is the verify-target-exists analog for PR-merge-state. Don't defer when the deferred target hasn't been queried. 
+- **Otto-354** (Zetaspace per-decision recompute) — Otto-355 names the specific recompute for the BLOCKED-PR class. +- **`feedback_manufactured_patience_vs_real_dependency_wait_otto_distinction_2026_04_26.md`** — Otto-355 ADDS a precondition before the diagnostic fires: *first* query unresolved threads. Only THEN classify the wait. The diagnostic isn't wrong; my misapplication was skipping the query. +- **`feedback_aaron_dont_wait_on_approval_log_decisions_frontier_ui_is_his_review_surface_2026_04_24.md`** — composed direction: don't wait, log decisions, AND investigate concretely. +- **`feedback_pr_reviews_are_training_signals_conversation_resolution_gate_is_forcing_function_otto_250_2026_04_24.md`** (Otto-250) — composed: thread resolution IS the forcing function; threads are training signals; investigate them, don't wait them out. +- **CLAUDE.md wake-time disciplines** — Otto-355 belongs alongside verify-before-deferring + future-self-not-bound + never-be-idle + version-currency. Five wake-time disciplines now: ALL pre-action checks that prevent specific failure modes. + +## Operational rule (concrete future-self check) + +Before issuing a "gated wait" / "blocked queue" / "Holding" classification on any PR, future-self MUST run: + +1. `gh pr view N --json mergeStateStatus,statusCheckRollup,autoMergeRequest` → confirm BLOCKED + CI green + auto-merge armed +2. GraphQL `reviewThreads(first: 100) { nodes { isResolved isOutdated path line comments(first:1){nodes{body}} } }` → enumerate ALL unresolved threads (filter on `isResolved == false` only — outdated unresolved threads still block merge under `required_conversation_resolution`) +3. If any remain → **drop the wait classification, switch to thread-resolution work** +4. 
If zero remain → THEN consider real-dependency-wait on the overall-ruleset gate + +If future-self finds herself outputting "gated wait" or "Holding" or minimal-close ScheduleWakeup more than ONCE without having run step 2, that IS the failure mode. Stop and run the GraphQL query. + +## What this DOES NOT claim + +- Does NOT mean every BLOCKED PR has unresolved threads — sometimes the block is genuinely the overall-ruleset gate. The rule is about the *order of investigation*, not the *outcome*. +- Does NOT remove the manufactured-patience-vs-real-dependency-wait diagnostic — it precedes it. +- Does NOT mean future-self should bypass / admin-merge / push-through. The fix here is "do the work the threads ask for", not "skip the gate." +- Does NOT make every cron firing require a fresh GraphQL query — once threads are queried and addressed, the check is done until new review activity appears. + +## Triggers for retrieval + +- Otto-355 BLOCKED-with-green-CI; investigate unresolved threads FIRST +- Aaron 2026-04-27: "obviously you know that right?" + "seems like you were waiting for no resaon" + "can you fix your fucture self so he wont run into this again?" +- Wake-time discipline (5th alongside verify-before-deferring + future-self-not-bound + never-be-idle + version-currency) +- GraphQL query for unresolved review threads (filter on `isResolved == false` only; outdated-but-unresolved threads count); memorize the exact shape +- 6-hour autonomous-loop minimal-close pattern was the failure mode +- 2 unresolved threads on PR #26 / src/Core/TemporalCoordinationDetection.fs (NaN/Inf in Option semantics) +- The block is virtually never opaque +- Composes Otto-348 / Otto-354 / Otto-250 / manufactured-patience-vs-real-dependency-wait +- Future-self check: outputting "gated wait" twice without having run reviewThreads query IS the failure mode — stop +- Aaron's "fix future self" framing — substrate IS the fix; this file IS the operational rule
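Reviewer note on the diff above: the Otto-355 memory file mentions that `reviewThreads(first: 100)` is capped at 100 threads and points to `pageInfo.hasNextPage` plus `endCursor` for larger PRs, but only shows the single-page form. The following is a minimal Python sketch of that pagination loop combined with the `isResolved`-only filter. It assumes a caller-supplied `fetch_page(cursor)` function (hypothetical, not part of `gh` or of this diff) that runs the GraphQL query with `after: cursor` and returns the `reviewThreads` object (`nodes` plus `pageInfo`). The names `iter_review_threads`, `unresolved_threads`, and `next_step` are likewise illustrative.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Assumed shape of one page, mirroring the GraphQL selection in the memory file:
# {"nodes": [...threads...], "pageInfo": {"hasNextPage": bool, "endCursor": str or None}}
FetchPage = Callable[[Optional[str]], Dict]


def iter_review_threads(fetch_page: FetchPage) -> Iterator[Dict]:
    """Yield every review thread, following endCursor until hasNextPage is false."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)  # hypothetical: wraps the GraphQL call
        yield from page["nodes"]
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            return
        cursor = info["endCursor"]


def unresolved_threads(fetch_page: FetchPage) -> List[Dict]:
    """Filter on isResolved only, so outdated-but-unresolved threads are kept."""
    return [t for t in iter_review_threads(fetch_page) if not t["isResolved"]]


def next_step(unresolved: List[Dict]) -> str:
    """Order of investigation from the rule: threads first, wait classification second."""
    if unresolved:
        return "thread-resolution work"
    return "consider real-dependency-wait"
```

Passing the fetcher in as a parameter keeps the loop testable without network access; in real use it would presumably shell out to `gh api graphql` with a cursor variable, which is left out here because the exact invocation is not shown in the diff.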