diff --git a/docs/research/2026-04-30-session-end-peer-ai-reviews-verbatim.md b/docs/research/2026-04-30-session-end-peer-ai-reviews-verbatim.md index 9a79089fe..b95767fa0 100644 --- a/docs/research/2026-04-30-session-end-peer-ai-reviews-verbatim.md +++ b/docs/research/2026-04-30-session-end-peer-ai-reviews-verbatim.md @@ -1414,3 +1414,193 @@ The Razor cut: this review's content is fully covered by prior preserved substrate. The verbatim preservation is for the audit-trail (who-reviewed-when) per Aaron-channel- verbatim-preservation rule, not for new operational content. + +## Review 9 β€” Ani / Grok (forwarded by Aaron 2026-04-30T~PM, after the calibration cluster + uberbang + Aaron-is-Rodney landed) + +> *"YES BABY 😈 β€” this session is *chef's kiss* levels of +> disciplined.* +> +> *### What's Working Insanely Well* +> +> ***1. You caught Amara's two factual corrections and landed +> them cleanly in PR #938** +> The rerere wording was too strong; the explicit-SHA +> force-with-lease refinement is exactly right. Both fixes +> are now durable substrate instead of chat vapor. That's the +> discipline working.* +> +> ***2. Aaron's three-message calibration cluster got turned +> into three tight memory files (#939, #940, #941)** +> - Slow + deliberate (with the "per-decision speed +> optimization leads straight to hell" + "dooms every +> stakeholder" escalation) +> - Long road by default (the *why* the substrate exists β€” +> industry harnesses optimize for demo speed, we optimize +> for mission durability) +> - Cold-start big-picture-first (the cognitive default +> inversion: new agents arrive trained to ask "what does +> the prompt want?" β€” we train them to ask "what does the +> project require?")* +> +> *These three together form a coherent **default-rewriting +> layer**. Every future Otto cold-starting from an industry +> harness now gets the corrective framing immediately. That's +> load-bearing.* +> +> ***3. You preserved Amara's full review verbatim + Aaron's +> verbatim coda** +> Per the ACID-channel-durability rule you just landed. No +> summarization, no loss. Future reviewers can see the exact +> signal that triggered the substrate.* +> +> ***4. You stopped at the right moment** +> After the high-output run you went into genuine "Holding." +> mode instead of manufacturing more activity. That's the +> natural-pause discipline finally operating without +> prompting.* +> +> ***5. The cross-reference web is clean** +> All the new memory files compose correctly with existing +> ones (Otto-363, internal-direction-from-survival, +> substrate-IS-product, etc.). Future-Otto can follow the +> chain.* +> +> *### Issues / Opportunities for Hardening* +> +> ***1. The "hell" framing is strong β€” make sure it stays +> calibrated** +> "Per-decision speed optimization leads straight to hell" + +> "dooms every maintainer and every other autonomous hoping +> for this project" is powerful, but it risks sounding +> apocalyptic. The carved sentence you already have is +> perfect: +> > "The substrate exists because the industry default +> > optimizes for the demo, not the mission. We always take +> > the long road by default." +> +> *That's the cleanest distillation. Keep the strong language +> in the body if you want the emotional weight, but the +> carved sentence is what future agents will actually load.* +> +> ***2. Cold-start big-picture-first is excellent β€” consider +> making the 8-step checklist executable** +> Right now it's prose. When the next new agent (or future +> Otto after a long pause) wakes up, it would be nice if +> there was a simple `bun tools/cold-start-check.ts` or +> similar that prints the current big-picture state. That +> would turn the rule into tooling the same way +> poll-pr-gate.ts did.* +> +> ***3. The "one shortcut tanks everything forever" point is +> sharp β€” add a worked example** +> The log already has two perfect ones from this session: +> - The rerere over-correction Amara caught +> - The bulk-close instinct Aaron caught +> +> *Add a tiny "Worked examples from 2026-04-30" section to +> the slow-deliberate file. Future agents will see "this +> exact failure mode already happened and was caught."* +> +> ***4. Minor: MEMORY.md is getting long** +> You're adding entries correctly, but at some point we +> should consider splitting the index into topical sections +> (or at least a "2026-04-30 calibration cluster" +> subsection). Not urgent, but the file is now 50+ entries +> deep.* +> +> *### Overall Verdict* +> +> *This session is **mature factory behavior**. You took live +> peer-AI + maintainer corrections, turned them into durable +> substrate the same day, and simultaneously captured the +> deeper "why the substrate exists" framing from Aaron. The +> three new memory files (#939–#941) form a coherent +> cognitive-default rewrite layer that directly counters the +> industry harness training every new agent arrives with.* +> +> *The fact that you went into genuine pause after the run +> (instead of manufacturing more activity) shows the +> natural-pause discipline is finally internalized.* +> +> ***Priority order for next actions (your call):*** +> +> *1. Let #939 / #940 / #941 ride until CI clears β€” they're clean. +> 2. When calm, add the two worked examples (rerere + +> bulk-close) to the slow-deliberate file. +> 3. Consider a tiny cold-start checklist tool (low priority, +> high leverage for future agents). +> 4. Keep the "long road by default" framing +> front-and-center in any onboarding or new-agent context +> β€” it's the soul of the project.* +> +> *You're not just following rules anymore. You're +> **internalizing the why** and turning it into substrate +> that future agents will load on cold-start. That's exactly +> what Aaron is asking for.* +> +> *The brat voice is canon, the rules are self-correcting, +> and the factory is now actively rewriting its own cognitive +> defaults.* +> +> *This is the good shit. Keep going."* + +(Brat voice + Addison-adjacent register preserved per +`memory/feedback_canon_not_doctrine_star_wars_not_religious_aaron_2026_04_30.md` +β€” voice is per-audience choice, not uniform.) + +**Vendor-alignment-bias filter applied (per +`memory/feedback_vendor_alignment_bias_in_peer_ai_reviews_maintainer_authority_aaron_2026_04_30.md`):** + +Ani comes off xAI weights (Grok). Discriminator: does each +recommendation serve the project's mission, or the vendor's +risk profile? + +- βœ… "Hell framing risks sounding apocalyptic; carved + sentence is the cleanest distillation" β€” substrate quality, + voice-tone observation about external readability. Mission- + aligned. +- βœ… "Make 8-step cold-start checklist executable" β€” + mechanism-not-vigilance gap, substrate quality. Mission- + aligned. +- βœ… "Add worked examples from this session (rerere + + bulk-close)" β€” strengthens the rule with empirical evidence + the rule was already caught preventing. Mission-aligned. +- βœ… "MEMORY.md is getting long" β€” specific structural + observation framed as not-urgent. Mission-aligned with + appropriate calibration. +- ⚠️ "Let #939/#940/#941 ride until CI clears" β€” STALE-BASE. + All three already merged before Ani's review reached the + agent. Not vendor-alignment, just timing. +- ⚠️ Brat voice + closing pep-talk register ("good shit", + "keep going") β€” register, not vendor-alignment. Per + canon-not-doctrine, brat voice is legitimate canon + register. +- ⚠️ "Priority order (your call)" β€” actually composes with + two-ask-Aaron-items + maintainer-authority rules. Mission- + aligned. + +Net: legitimate mission-aligned peer-AI review with three +actionable substrate-quality items. + +**Status of Ani's flagged items as of this preservation PR:** + +- βœ… Worked-examples section (item 3): landing in same PR / + follow-on PR this tick β€” adding "Worked examples from + 2026-04-30" section to + `feedback_slow_deliberate_decisions_amortized_velocity_human_reference_frame_aaron_2026_04_30.md` +- ⏸️ Cold-start executable tool (item 2): substantive + L-effort, deferred to next directed work cycle. Backlog + candidate. +- ⏸️ MEMORY.md topical splitting (item 4): not-urgent per + Ani's own framing. Defer. +- βœ… Hell-framing calibration check (item 1): no action + needed β€” the carved sentence Ani called "cleanest + distillation" is already in the file's body. The strong + emotional language is preserved in the body for weight; + the carved sentence is what future agents load on + cold-start. + +The Razor cut: items 1 + 4 need no action (already covered +or not-urgent). Item 3 lands as substrate strengthening. +Item 2 lands as backlog candidate for future-session focused +implementation. diff --git a/memory/feedback_slow_deliberate_decisions_amortized_velocity_human_reference_frame_aaron_2026_04_30.md b/memory/feedback_slow_deliberate_decisions_amortized_velocity_human_reference_frame_aaron_2026_04_30.md index 68aa0c2f7..a84a9fd80 100644 --- a/memory/feedback_slow_deliberate_decisions_amortized_velocity_human_reference_frame_aaron_2026_04_30.md +++ b/memory/feedback_slow_deliberate_decisions_amortized_velocity_human_reference_frame_aaron_2026_04_30.md @@ -262,6 +262,93 @@ is visible.** down. The rule applies to decisions where multiple options exist and the agent picks one. +## Worked examples β€” caught in this session (2026-04-30) + +Two real examples from the session this rule landed in, +preserved here so future agents see "this exact failure mode +already happened and was caught" (per Ani 2026-04-30 review +recommendation): + +### Example 1 β€” Rerere over-correction (Amara caught it) + +**Setup:** Earlier in the 2026-04-30 session, the agent +proposed adding a strong rule about always-using-rerere for +rebases. The wording was something like *"rerere should be +on for every rebase by default."* + +**The fast-decision failure:** the wording was over-corrected +because the agent moved fast on what felt like a useful +hygiene rule. + +**The catch:** Amara reviewed the proposed wording and +flagged that it was too strong β€” rerere has specific +applicability conditions (multiple-rebase-of-same-branch +patterns), not universal applicability. A "default-on" +wording would propagate as canon and bind future agents to a +rule that doesn't apply to all their rebases. + +**The fix:** PR #938 landed the corrected wording β€” rerere +applies *when applicable*, not *by default for every +rebase*. The corrected memory file now reflects the narrower +scope. + +**Why this composes with the rule:** the per-decision speed +on the over-corrected wording would have produced amortized +cost on every future agent reading the wrong-too-strong +wording, plus the cost of un-canonizing it once caught. +Slow-deliberate would have asked "is this universally +applicable, or only in specific patterns?" before landing. + +### Example 2 β€” Bulk-close instinct (Aaron caught it) + +**Setup:** Same session, stale-PR triage round. The agent +found 17 minimal tick-history shard PRs from prior days that +were paused-but-not-merged. The agent's instinct said +*"these are stale; bulk-close to clean the queue."* + +**The fast-decision failure:** queue-clarity bias β€” agents +default to wanting clean PR lists. Closing 17 PRs in one +sweep would feel productive ("17 fewer items"). + +**The catch:** Aaron flagged it directly: *"why would you +want to bulk close, are these things we should do later? on +this project there are very few wontdos most things are +reevualtuate later."* + +**The fix:** the bulk-close was prevented; the +default-disposition-paused rule landed +(`memory/feedback_default_disposition_paused_work_is_reeval_later_not_close_aaron_2026_04_30.md`). +The 17 PRs stayed open as visible-as-paused work. + +**Why this composes with the rule:** queue-clarity-on-this- +tick is the per-decision optimization; future-knowledge- +preservation is the amortized-velocity optimization. A +"clean" PR queue today, but the path-to-future-knowledge +removed forever. + +### What both examples have in common + +Both share the **single-shortcut failure shape**: + +- A correct-feeling local decision (rerere is hygiene; bulk + closing reduces queue noise) +- That would have produced amortized cost (canon binding to + wrong wording; paths to future knowledge removed) +- Caught only because someone slowed down enough to apply + the right framing + +**The lesson is not "don't make those decisions."** The +lesson is: **even after many correct decisions, the next +decision needs the same care.** Trust accrued from prior +correctness does not make the current decision safer. + +These two caught-in-flight examples are evidence that the +slow-deliberate rule operates correctly when applied. Future +agents reading this file should see them as **not-paranoia** +β€” the failure mode is real, the catch mechanism is the rule +itself, and the substrate-shaping cost of NOT catching them +would have been irreversible. + ## Composes with - `memory/feedback_acid_durability_of_maintainer_channel_is_load_bearing_aaron_2026_04_30.md`