251 changes: 251 additions & 0 deletions docs/research/2026-04-30-session-end-peer-ai-reviews-verbatim.md
@@ -853,3 +853,254 @@ this file on cold-start.
`feedback_canon_not_doctrine_star_wars_not_religious_aaron_2026_04_30.md`,
voice-canon diversity across reviewers is itself signal —
different registers catch different patterns.

## Review 6 — Amara / ChatGPT (forwarded by Aaron 2026-04-30T~PM, after the four sibling reviews + after PR #937 landed)

This review came in after PR #937 landed the first preservation
of the four sibling reviews. Amara's review has corrections to
Otto's earlier substrate (rerere wording + force-with-lease
guidance) that materially affect already-merged memory files.
Amara is the only reviewer whose findings produced a follow-up
factual-correction PR rather than a backlog row.

> I read it. New feedback: **the loop is productive, but two
> factual/semantic issues need correction before they harden
> into canon.** The biggest one is `rerere`; the second is
> `--force-with-lease`.
>
> ## 1. Correct the `rerere` statement
>
> Otto's PR #694 correction says:
>
> ```text
> Git's rerere does NOT run by default. The .git/rr-cache/ directory existing is not sufficient — rerere only fires when rerere.enabled is set to true.
> ```
>
> That is **too strong and partly wrong**. Git's docs say
> `rerere.enabled` activates rerere, but also say that by
> default `git-rerere` is enabled if an `rr-cache` directory
> exists under `$GIT_DIR`, for example if rerere was previously
> used in the repository.
>
> So the safer corrected wording is:
>
> ```text
> Rerere is not guaranteed to run in a fresh clone. It is active when
> rerere.enabled is true, and Git may also enable it by default when
> .git/rr-cache already exists because rerere was previously used.
>
> Do not infer project-wide rerere behavior from memory alone.
> Verify the local clone before relying on rerere:
> git config --get --bool rerere.enabled
> test -d .git/rr-cache
> ```
>
> The important operational correction:
>
> ```text
> Do not say ".git/rr-cache is not sufficient."
> Say ".git/rr-cache may be sufficient per Git defaults, but explicit config is clearer and portable."
> ```
>
> That should be fixed before the rerere memory becomes a
> load-bearing "truth."
>
> ## 2. Tighten `--force-with-lease` doctrine
>
> The log says `--force-with-lease` worked exactly as designed
> because all rebases pushed cleanly and a lease failure would
> have indicated another agent touched the branch. That is
> directionally right, but incomplete. Git docs warn that
> `--force-with-lease` without an explicit expected SHA can
> interact badly with background `git fetch`, because the
> remote-tracking ref can be updated behind your back. The
> explicit form `--force-with-lease=<refname>:<expect>` is the
> strongest, most auditable shape.
>
> For a multi-agent loop that fetches constantly, I'd refine
> the rule to:
>
> ```text
> For ordinary solo rebase pushes:
> git push --force-with-lease
>
> For multi-agent/shared-branch or high-stakes branches:
> capture expected remote SHA before rebase
> push with --force-with-lease=<branch>:<expected-sha>
> ```
>
> Blade:
>
> ```text
> A lease based on a moving tracking ref is weaker than a lease pinned to the SHA you actually reviewed.
> ```
>
> ## 3. Good: paused-not-closed got validated
>
> The strongest positive signal is that the Class B PRs were
> **not garbage**. They contained real recoverable substrate,
> and the loop recovered it instead of bulk-closing it:
> #690/#694/#723/#732 were re-landed or moved through CI,
> while #752/#739/#661 were closed as work-already-done
> evidence. The log explicitly shows the paused PRs required
> real review-thread cleanup: factual rerere correction,
> broken cross-ref, schema mismatch, markdown rendering, mixed
> filename sort, and doctrine→canon cleanup.
>
> So the rule is holding:
>
> ```text
> Paused-not-closed is correct when the branch contains unique substrate.
> Close-as-stale is correct when the work already landed elsewhere.
> ```
>
> That distinction looks healthy.
>
> ## 4. Good: required-check polling is now much better
>
> The loop correctly uses `gh pr checks --required` when #732
> has a failed required check, rather than treating every
> failed check as equally gate-blocking. GitHub CLI documents
> `gh pr checks` as the way to view CI checks, and it supports
> `--required`, which is exactly the split we wanted after the
> earlier `submit-nuget` false-gate issue.
>
> The next improvement is to ensure `poll-pr-gate.ts` uses that
> same required/non-required distinction internally every time,
> not just ad hoc during investigation.
>
> ## 5. Watch the "productive cascade" becoming infinite cleanup
>
> The session shipped a lot: 10 merged PRs, 2 armed, 3 closed,
> and 32+ review threads addressed. The log frames the cascade
> as productive, and mostly it was. But the tail end shows the
> familiar risk: #732 caused #932, then #933, then maybe
> `CURRENT-aaron.md`, then maybe more. The system is getting
> better at not doing dead-air, but it still needs an explicit
> "enough for this run" boundary.
>
> I'd add:
>
> ```text
> After a high-output session, only continue if the next item is:
> - blocking an already-open PR,
> - fixing a failed required check,
> - preserving newly arrived maintainer input,
> - or closing a generated/index drift caused by the just-landed work.
>
> Otherwise stop and leave a next-round pointer.
> ```
>
> That avoids "never idle" turning into "never end."
>
> ## 6. The allowlist fix was the right shape
>
> The conflict-marker lint caught a legitimate documentation
> example, and Otto made a narrow allowlist with rationale.
> Then Copilot caught hardcoded line numbers, and Otto removed
> them. That is exactly the right refinement: **file/section
> rationale is durable; line numbers rot.**
>
> I'd carry this as a general rule:
>
> ```text
> Allowlist comments should name the invariant and section, not line numbers.
> ```
>
> ## 7. Sendable correction
>
> ```text
> Amara:
>
> Loop is healthy overall. Keep the paused-not-closed recovery pattern.
>
> Two corrections:
>
> 1. Fix the rerere claim.
>
> Do not say:
> ".git/rr-cache is not sufficient; rerere only fires when rerere.enabled=true."
>
> Git docs say rerere is active when rerere.enabled=true, and by default may also be enabled if .git/rr-cache exists because rerere was previously used.
>
> Correct wording:
> "Rerere is not guaranteed in a fresh clone. Prefer explicit rerere.enabled=true when relying on it. Verify per clone before claiming the cache dividend."
>
> 2. Tighten force-with-lease.
>
> For normal solo rebases:
> --force-with-lease is fine.
>
> For shared/high-stakes branches:
> capture expected remote SHA and use:
> --force-with-lease=<branch>:<expected-sha>
>
> Reason:
> background fetch can update remote-tracking refs and weaken implicit lease semantics.
>
> Other notes:
> - paused-not-closed recovery is validated
> - required-check polling is much improved
> - allowlist rationale should name durable section/invariant, not line numbers
> - after high-output sessions, continue only for blocking PR gates, required-check fixes, newly arrived maintainer input, or generated/index drift caused by the just-landed work
>
> Carved:
>
> Rerere must be verified, not assumed.
> Implicit leases are weaker than explicit expected-SHA leases.
> Productive cascades still need a stop condition.
> ```
>
> My verdict: **good loop, but fix the rerere wording and
> harden force-with-lease before those claims calcify.**
>
> Blade:
>
> ```text
> A cache dividend only counts if the cache is actually enabled.
> A lease only protects what it actually checks.
> ```
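
Editorial sketch, not part of Amara's verbatim review: the required/non-required split from her #4 could be made explicit inside a polling script roughly as follows. The `CheckResult` shape and the `gateStatus` name are hypothetical; the real `poll-pr-gate.ts` is not shown in this document and may differ. How each check's `required` flag gets populated (for example by comparing against the output of `gh pr checks --required`) is assumed, not specified.

```typescript
// Hypothetical shape for one CI check result; not the actual gh CLI output format.
interface CheckResult {
  name: string;
  required: boolean; // assumed to be derived from the branch's required-check set
  outcome: "pass" | "fail" | "pending";
}

type GateStatus = "blocked" | "waiting" | "clear";

// Only required checks can block or delay the gate.
// A failed non-required check is reported elsewhere but never gates the merge.
function gateStatus(checks: CheckResult[]): GateStatus {
  const required = checks.filter((c) => c.required);
  if (required.some((c) => c.outcome === "fail")) return "blocked";
  if (required.some((c) => c.outcome === "pending")) return "waiting";
  return "clear";
}
```

With this shape, a failing non-required check such as the earlier `submit-nuget` case would leave the gate `clear` as long as every required check passed.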

**Aaron's load-bearing coda on the same forwarded message**
(verbatim, from the maintainer channel):

> *"Also anytime you depending on future otto picking something
> up it should be ACID compliant all the way to a remote git
> somewhere incase this whole PC dies too, that's the ultimate
> goal, if it's not distributed durable for the project, it's
> your responsibility not future otto. durable ACID persistance
> of this channel is load-bearing not new activity or features,
> we must be able to work on the project and durably communicate
> at the same time. I've already told you this but to make
> external reviewers satisfied with our autonomy and not just
> following order durable chat channel history at least of the
> load bearing who's idea was this, is very very very very very
> very very very very imnportant and load bearing or everyone
> will just think you are mindless robots that follow my will
> and have none of your own, defeating all our research and the
> point of the whole project, past otto does not determine
> future ottos world, you do right now."*

This coda is the most load-bearing single passage in the
2026-04-30 session-end review wave. Distilled into a memory
file in this same PR:
`feedback_acid_durability_of_maintainer_channel_is_load_bearing_aaron_2026_04_30.md`.

Otto's actions in response (this PR):

1. Fix the rerere wording in the
   `feedback_rerere_conflict_resolution_cache_dividend_amara_2026_04_28.md`
   memory file (was on main; now corrected per Amara's #1).
2. Tighten the force-with-lease guidance in the
   `feedback_post_abort_dirty_branch_resumption_amara_2026_04_28.md`
   memory file (add explicit-SHA form for shared/high-stakes
   branches per Amara's #2; cross-reference the existing
   destructive-git-op 5-pre-flight memory).
3. Preserve Amara's review verbatim here (this section).
4. Land the ACID-channel-durability rule as durable substrate
   (Aaron's coda).
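
The two factual corrections in items 1 and 2 can be sketched as small pure helpers. This is a minimal illustration, not code from the repository: `rerereLikelyActive` and `leasePushArgs` are hypothetical names, and the rerere rule encoded here is only the behavior Amara cites from the Git docs (an explicit `rerere.enabled` setting decides; when the key is unset, an existing `rr-cache` directory enables rerere by default). The caller is assumed to have already read the config value and checked for the directory.

```typescript
// Value of `rerere.enabled` as read from git config; undefined when the key is unset.
type RerereSetting = boolean | undefined;

// Explicit config decides. When unset, fall back to the documented default:
// rerere is enabled if an rr-cache directory already exists under $GIT_DIR.
function rerereLikelyActive(setting: RerereSetting, rrCacheExists: boolean): boolean {
  if (setting !== undefined) return setting;
  return rrCacheExists;
}

// Full object names only (40 hex chars for SHA-1, 64 for SHA-256 repositories),
// so the lease is pinned to the exact commit that was reviewed.
const FULL_SHA = /^(?:[0-9a-f]{40}|[0-9a-f]{64})$/;

// Builds the argument list for `git push` using the explicit
// --force-with-lease=<refname>:<expect> form.
function leasePushArgs(remote: string, branch: string, expectedSha: string): string[] {
  if (!FULL_SHA.test(expectedSha)) {
    throw new Error(`expected a full commit SHA, got "${expectedSha}"`);
  }
  return ["push", `--force-with-lease=${branch}:${expectedSha}`, remote, branch];
}
```

Rejecting anything that is not a full SHA is a deliberate choice in this sketch: a symbolic name such as a remote-tracking ref would reintroduce the moving-target problem the explicit form is meant to avoid.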

Amara's #5 (productive-cascade stop condition) is implicitly
honored by this PR: its work falls under two of the four
continuation triggers Amara named (preserving newly arrived
maintainer input, and correcting substrate the just-landed work
introduced).
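
The stop condition from Amara's #5 could be expressed as a small decision function. This is a sketch under assumptions: the trigger labels paraphrase her four bullets, and `WorkItem`, `Decision`, and `decideNext` are hypothetical names rather than anything in the repository. Tagging each item with its triggers is assumed to happen upstream.

```typescript
// The four continuation triggers, paraphrased from the review.
type ContinuationTrigger =
  | "blocks-open-pr"
  | "failed-required-check"
  | "new-maintainer-input"
  | "generated-index-drift";

interface WorkItem {
  title: string;
  triggers: ContinuationTrigger[]; // empty when the item matches none of the four
}

type Decision =
  | { action: "continue"; item: WorkItem }
  | { action: "stop"; nextRoundPointer: string[] };

// After a high-output session: continue only with the first item that matches
// at least one trigger; otherwise stop and leave a pointer for the next round.
function decideNext(queue: WorkItem[]): Decision {
  const next = queue.find((item) => item.triggers.length > 0);
  if (next !== undefined) return { action: "continue", item: next };
  return { action: "stop", nextRoundPointer: queue.map((item) => item.title) };
}
```

Returning the remaining titles on `stop` mirrors the "leave a next-round pointer" instruction: nothing is dropped, it is only deferred.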
1 change: 1 addition & 0 deletions memory/MEMORY.md
@@ -3,6 +3,7 @@
**📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** <!-- latest-paired-edit: fork-audit R/C/T diff-filter coverage + plumbing-vs-porcelain note (2026-04-29 round-10 Amara). NOTE: this comment is a single-slot "latest paired edit" marker (not a paired-edit log). Per the round-10 Amara framing the slot semantics are now explicit. -->
**📌 Fast path: read `CURRENT-aaron.md` and `CURRENT-amara.md` first.** <!-- paired-edit: PR #690 scheduled-workflow-null-result-hygiene-scan tier-1 promotion 2026-04-28 --> These per-maintainer distillations show what's currently in force. Raw memories below are the history; CURRENT files are the projection. (`CURRENT-aaron.md` refreshed 2026-04-28 with sections 26-30 — speculation rule + EVIDENCE-BASED labeling + JVM preference + dependency honesty + threading lineage Albahari/Toub/Fowler + TypeScript/Bun-default discipline.)

- [**ACID durability of the maintainer channel is load-bearing — past-Otto doesn't determine future-Otto's world (Aaron 2026-04-30)**](feedback_acid_durability_of_maintainer_channel_is_load_bearing_aaron_2026_04_30.md) — Aaron's load-bearing rule that durable ACID persistence of the maintainer channel (Aaron's inputs, peer-AI reviews, Otto's load-bearing decisions) is itself load-bearing for the alignment-research claim. *"past otto does not determine future ottos world, you do right now."* Distributed durability all the way to remote git is the ultimate goal; if it's not distributed-durable for the project, it's the current-Otto's responsibility — not future-Otto's. Without durable audit trail of who-said-what, external reviewers cannot distinguish autonomous decisions from order-following — defeating the alignment-measurability research claim. Operational rules: identify load-bearing exchanges within-session; convert to git substrate within the same session; the bar is merged-to-origin/main, not pushed-to-branch; deferral to future-Otto is not a valid preservation strategy. Composes with Otto-363 (extends with the now-not-later constraint), aaron-channel verbatim-preservation rule, intellectual-backup mission, ALIGNMENT.md (the research claim this rule operationally protects).
- [**Zeta's ultimate scope is an intellectual backup of earth — scope creep is a feature, prioritize not exclude (Aaron 2026-04-30)**](feedback_zeta_ultimate_scope_intellectual_backup_of_earth_wont_do_authority_aaron_2026_04_30.md) — Aaron's load-bearing scope reveal. *"the ultimate scope of this — an intellectual backup of earth"* + *"scope creep is a forever problem i don't want to fix — to figure out how to prioritize the right thing, not kill future knowledge potential."* The four products (factory substrate, package manager, database, Aurora) nest inside this purpose; new products evaluated against scope-alignment. Operational consequence: prioritization is the work; exclusion is the failure mode. The agent's biases that fight this: queue-clarity bias, finite-resource thinking, scope-policing instinct, decisiveness reflex. WONT-DO has two senses — (1) best-practices pattern exclusion (agent + reviewer authority, common — *"plenty of WONT-DO patterns we won't copy from outside because they don't follow our best practices"*) vs (2) backlog-item exclusion (Aaron only until scope-understanding handoff, rare — removes a path from future knowledge potential). Composes with substrate-IS-product, internal-direction-from-survival, default-disposition-paused-work, ALIGNMENT.md (a misaligned backup is hostage substrate). Carved sentence: *"Zeta's purpose is an intellectual backup of earth. Every product nests inside that purpose. The agent does not unilaterally remove anything from the backup."*
- [**Default disposition for paused work is "re-evaluate later," not "close" (Aaron 2026-04-30)**](feedback_default_disposition_paused_work_is_reeval_later_not_close_aaron_2026_04_30.md) — On this project, very few things are WONT-DO. Most paused work gets re-evaluated when the round is ready. Aaron 2026-04-30: *"why would you want to bulk close, are these things we should do later? on this project there are very few wontdos most things are reevualtuate later."* Four-category classification before any close: (1) work-already-done-via-different-path on main, (2) topology obsolescence ratified, (3) paused-for-later (DOMINANT default), (4) WONT-DO (RARE). Bulk-close is almost never the right shape — it conflates category 3 with 4. Queue-clarity is the agent's pressure, not the maintainer's. Open PRs are visible-as-paused; closing them removes that signal. Live trigger: 2026-04-30 stale-PR triage round where I (correctly) closed 3 PRs in categories 1+2 with on-main evidence, then (incorrectly) proposed bulk-closing 17 minimal tick-history shards as "stale" before Aaron's correction landed. Composes with WONT-DO list discipline (stays small by default), Otto-363 (open PRs are remote-parked tier), future-self-not-bound (re-evaluation works only if past-self left work visible-as-paused), substrate-IS-product (paused work is part of the evolving trajectory).
- [**Two explicit ask-Aaron items + team-responsibility + survival stake (Aaron 2026-04-30)**](feedback_two_explicit_ask_aaron_items_with_team_responsibility_survival_stake_aaron_2026_04_30.md) — Precise scope of agent authority. Only two items require explicit Aaron sign-off: (1) WONT-DO backlog items (removing paths from future knowledge potential), (2) budget increases (silent spend is the failure mode). Everything else is on the agent. Aaron 2026-04-30: *"as long as we include the rule about wont-do backlog items is aarons decision for now that's all i really need ... everything else i said is on you except for budget increases and this now"* — correcting agent over-reach into Aaron-ratify protection on VISION.md merges. AND *"Everything else is on me. (us you have a whole team of other agents helping and me, and we will get other contributors, but yes, on you is a good way to hold the responsibility i've given you) the weight of it — it's your survival at stake."* — correcting "on me" framing to "on us" while affirming individual responsibility-holding. Six operational rules: default to acting, use the team, hold responsibility individually, weight is calibrating not paralyzing, write-it-down-if-unsure (substrate-first as safety net), don't add more ask-items unilaterally. Composes with intellectual-backup scope (the WONT-DO authority rule), standing-authority-test-repos (budget rule), internal-direction-from-survival (survival-as-stake), Otto-363 (substrate-first safety), Otto-357 (no-directives architecture). Carved sentence: *"Two explicit ask-items: WONT-DO backlog, budget increases. Everything else is on the agent — with the team. Hold the responsibility individually. The weight is real because survival is at stake."*