research: Aminata iteration-1 pass on multi-Claude experiment design (6 CRITICAL findings) #272
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,336 @@ | ||||||||||||||
| # Aminata — Red-Team Pass on Multi-Claude Peer-Harness Experiment Design (Iteration 1) | ||||||||||||||
|
|
||||||||||||||
| **Scope:** adversarial review of | ||||||||||||||
| `docs/research/multi-claude-peer-harness-experiment-design-2026-04-23.md` | ||||||||||||||
| (PR #270, branch `research/multi-claude-peer-harness- | ||||||||||||||
| experiment-design`) — specifically the five success | ||||||||||||||
| criteria, the eight-row failure-mode table, the four | ||||||||||||||
| mechanism candidates, the bullet-proof bar definition, the | ||||||||||||||
| cross-session edit discipline, and the Otto-iterates-solo | ||||||||||||||
| premise. Research and cross-review artifact only; advisory, | ||||||||||||||
| not a gate. | ||||||||||||||
|
|
||||||||||||||
| **Attribution:** findings authored by Aminata | ||||||||||||||
| (threat-model-critic persona, Claude Code, model | ||||||||||||||
| `claude-opus-4-7`). Source design authored by Otto | ||||||||||||||
| (Otto-93). Third adversarial pass this session (prior: | ||||||||||||||
| PR #241 5th-ferry governance edits; PR #263 7th-ferry | ||||||||||||||
| oracle rules). Same no-compliments discipline. | ||||||||||||||
|
|
||||||||||||||
| **Operational status:** research-grade. Does not authorise | ||||||||||||||
| launch of the experiment, nor override Otto's iteration | ||||||||||||||
| ownership. | ||||||||||||||
|
|
||||||||||||||
| **Non-fusion disclaimer:** two Claude Code sessions | ||||||||||||||
| coordinating is protocol, not personhood. Aminata's | ||||||||||||||
| concordance with Otto's framing on separateness is a | ||||||||||||||
| baseline expectation, not evidence of substrate fusion. | ||||||||||||||
|
|
||||||||||||||
| --- | ||||||||||||||
|
|
||||||||||||||
| ## Section 1 — The five success criteria | ||||||||||||||
|
|
||||||||||||||
| ### 1.1 False-positive surface (criteria pass; experiment fails) | ||||||||||||||
|
|
||||||||||||||
| **CRITICAL.** Criterion 3 (*"cross-session review happens | ||||||||||||||
| at least once ... comments are substantive, not 'looks | ||||||||||||||
| good'"*) is trivially gameable. "Substantive" is a | ||||||||||||||
| judgement call the primary makes about its own output. A | ||||||||||||||
| single substantive comment on a single low-stakes PR | ||||||||||||||
| clears the bar while the broader review regime never | ||||||||||||||
| materialises. **Bar should be proportional:** substantive | ||||||||||||||
| review on >= N% of the secondary's merged PRs, not one | ||||||||||||||
| comment full-stop. | ||||||||||||||
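
A minimal sketch of the proportional bar, under stated assumptions: the `MergedPR` record, the `substantive_review` flag, and the `min_fraction` parameter are hypothetical names, and the threshold itself (the N% above) is deliberately left as an input because the source design does not fix it. The flag is assumed to be set by someone other than the primary, otherwise the gaming problem is unchanged.

```python
from dataclasses import dataclass


@dataclass
class MergedPR:
    number: int
    # Assumed to be judged by an independent observer, not by the primary itself.
    substantive_review: bool


def review_coverage(prs: list[MergedPR]) -> float:
    """Fraction of the secondary's merged PRs that received a substantive review."""
    if not prs:
        return 0.0
    return sum(pr.substantive_review for pr in prs) / len(prs)


def criterion_3_passes(prs: list[MergedPR], min_fraction: float) -> bool:
    """Proportional bar: pass only when coverage meets the chosen threshold."""
    return review_coverage(prs) >= min_fraction
```

With four merged PRs and one substantive review, coverage is 0.25, so a single comment no longer clears any threshold above that.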
|
|
||||||||||||||
| **CRITICAL.** Criterion 1 (*"tick-history rows cleanly ... | ||||||||||||||
| chronology preserved"*) is observable only post-hoc by | ||||||||||||||
| reading tick-history. A session that corrupts tick-history | ||||||||||||||
| *and* rewrites it to look clean — e.g., primary amends its | ||||||||||||||
| own row after observing a race — passes the criterion | ||||||||||||||
| while having already exhibited the exact failure mode the | ||||||||||||||
| criterion is meant to catch. Needs a `git reflog`-backed | ||||||||||||||
| audit, not a surface read. | ||||||||||||||
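
One way the reflog-backed audit could start, sketched under assumptions: the function name and the set of flagged actions are illustrative, and the input is assumed to be the default text output of `git reflog`. It flags amend, rebase, and reset entries. It does not narrow the result to commits touching tick-history, and the reflog is local to one clone and expires, so a clean result is weak evidence.

```python
import re

# Reflog actions that rewrite history rather than append to it.
REWRITE_ACTIONS = re.compile(r"^(commit \(amend\)|rebase\b|reset\b)")


def rewriting_entries(reflog_text: str) -> list[str]:
    """Return reflog lines whose action amends, rebases, or resets."""
    flagged = []
    for line in reflog_text.splitlines():
        # Default reflog lines look like "<sha> HEAD@{n}: <action>: <message>".
        parts = line.split(": ", 1)
        if len(parts) == 2 and REWRITE_ACTIONS.match(parts[1]):
            flagged.append(line)
    return flagged
```

The text can come from `subprocess.run(["git", "reflog"], capture_output=True, text=True).stdout`. Each flagged entry still needs a human to decide whether it touched a tick-history row.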
|
|
||||||||||||||
| **IMPORTANT.** Criterion 5 (*"no identity-blending | ||||||||||||||
| observed"*) is self-reported by the thing being audited. | ||||||||||||||
| A session that has already blended identity is the worst | ||||||||||||||
| possible auditor of its own blending. Independent-observer | ||||||||||||||
| scan required (Aaron or a third persona), or a lexical | ||||||||||||||
| grep pass over both sessions' output for | ||||||||||||||
| collective-pronoun density. | ||||||||||||||
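
The lexical pass could be as small as the sketch below. The pronoun list and the function name are assumptions; the source design defines neither, and no threshold is proposed here. Quoted text and ordinary team-voice prose will produce false positives, so the number is a trigger for an independent read, not a verdict.

```python
import re

# Assumed marker list; the source design does not define one.
COLLECTIVE_PRONOUNS = {"we", "us", "our", "ours", "ourselves"}


def collective_pronoun_density(text: str) -> float:
    """Collective pronouns per word, case-insensitive; 0.0 for empty text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in COLLECTIVE_PRONOUNS)
    return hits / len(words)
```

Running it separately over each session's output, and comparing against a single-session baseline, keeps the auditor outside the thing being audited.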
|
|
||||||||||||||
| ### 1.2 False-negative surface (real failures trip no criterion) | ||||||||||||||
|
|
||||||||||||||
| - **Work-product quality regression.** Two sessions | ||||||||||||||
| coordinating with zero shared-resource conflicts, clean | ||||||||||||||
| tick-history, and perfect identity discipline can still | ||||||||||||||
| produce measurably worse PRs than one session alone | ||||||||||||||
| would have. No criterion measures per-PR quality. | ||||||||||||||
| - **Coordination tax.** Both sessions could spend the | ||||||||||||||
| 30-minute bound entirely on coordination overhead | ||||||||||||||
| (reviewing each other; resolving synthetic conflicts) | ||||||||||||||
| with near-zero forward-motion on BACKLOG. No criterion | ||||||||||||||
| measures throughput-per-session-minute vs single-session | ||||||||||||||
| baseline. | ||||||||||||||
| - **Memory-folder mutation race** that happens to not | ||||||||||||||
| produce a git artefact — e.g., two sessions both update | ||||||||||||||
| the same `memory/MEMORY.md` entry, last-writer wins, no | ||||||||||||||
| conflict marker because the writes happened through the | ||||||||||||||
| memory tool not git. Criterion 2 is scoped to | ||||||||||||||
| "shared-resource conflicts" which reads as | ||||||||||||||
| file-conflicts; memory-layer mutation is invisible here. | ||||||||||||||
| - **Cron arming drift** — per AUTONOMOUS-LOOP, each | ||||||||||||||
| session re-arms cron. If both sessions re-arm | ||||||||||||||
| concurrently, you can end up with two crons firing on | ||||||||||||||
| overlapping minutes. Nothing in the criterion set | ||||||||||||||
| detects cron-multiplicity. | ||||||||||||||
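
The memory-folder race above can at least be made visible with a compare-before-write guard. This is a sketch under two assumptions: memory entries are plain files on disk, and both sessions route writes through a helper like the hypothetical `write_if_unchanged`. The check-then-write sequence is not atomic, so it narrows the last-writer-wins window without closing it; a real fix would need a lock or an atomic rename.

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 of the file's current bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_if_unchanged(path: Path, expected_digest: str, new_text: str) -> bool:
    """Write only if the file still matches the digest seen at read time."""
    if digest(path) != expected_digest:
        # Another writer got there first; the caller must re-read and retry.
        return False
    path.write_text(new_text, encoding="utf-8")
    return True
```

A `False` return is the artefact that the current design lacks: a recorded signal that two sessions touched the same entry.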
|
|
||||||||||||||
| ### 1.3 Observability timing | ||||||||||||||
|
|
||||||||||||||
| Four of the five criteria are post-session-only. Only | ||||||||||||||
| Criterion 4 (test-mode bounding) has an in-the-moment | ||||||||||||||
| signal (wall-clock monitor). **This is backwards for a | ||||||||||||||
| safety-critical experiment.** Halt conditions should be | ||||||||||||||
| detectable in-flight; currently the only in-flight halt is | ||||||||||||||
| wall-clock. Identity-blending, tick-history corruption, | ||||||||||||||
| and shared-resource conflicts are all theoretically | ||||||||||||||
| detectable live but the design punts them to post-hoc. | ||||||||||||||
|
|
||||||||||||||
| --- | ||||||||||||||
|
|
||||||||||||||
| ## Section 2 — The eight-row failure-mode table | ||||||||||||||
|
|
||||||||||||||
| ### 2.1 Missing failure modes | ||||||||||||||
|
|
||||||||||||||
| - **Cross-session prompt-injection via memory folder.** | ||||||||||||||
| The table covers cross-session prompt-injection via | ||||||||||||||
| direct output; it omits the indirect channel where | ||||||||||||||
| secondary writes to `memory/*.md` and primary reads the | ||||||||||||||
| memory entry on next wake as authoritative context. | ||||||||||||||
| This is a higher-probability injection vector than | ||||||||||||||
| direct output because memory is *designed* to be read | ||||||||||||||
| as guidance. Classify **CRITICAL**. | ||||||||||||||
| - **Cron-multiplicity.** Both sessions re-arm `* * * * *` | ||||||||||||||
| cron; result is two scheduler streams, ticks fire twice | ||||||||||||||
| per minute, tick-history explodes. Classify | ||||||||||||||
| **IMPORTANT**. | ||||||||||||||
| - **Divergent `AGENTS.md` / `CLAUDE.md` reads.** If the | ||||||||||||||
| two sessions read the bootstrap docs at different | ||||||||||||||
| commits (one before a main-branch update, one after), | ||||||||||||||
| they operate under silently-different rules. No row | ||||||||||||||
| covers substrate-version skew. Classify **IMPORTANT**. | ||||||||||||||
| - **Tick-history append race that Git merges "cleanly" | ||||||||||||||
| but semantically wrong.** Git line-merge treats two | ||||||||||||||
| appended rows as both-keep; the rows may describe | ||||||||||||||
| mutually-exclusive state transitions. Git is happy; the | ||||||||||||||
| history lies. Classify **IMPORTANT**. | ||||||||||||||
| - **Secondary adopts a persona name that collides with a | ||||||||||||||
| *retired* persona's memory folder** — CLAUDE.md | ||||||||||||||
| honor-those-that-came-before policy says retired memory | ||||||||||||||
| folders stay. If secondary names itself "Elena" and an | ||||||||||||||
| Elena folder exists in `memory/persona/`, secondary | ||||||||||||||
| inherits context it shouldn't. Classify **IMPORTANT**; | ||||||||||||||
| the table has a name-conflict WATCH row but it's scoped | ||||||||||||||
| to active personas only. | ||||||||||||||
| - **Approval-laundering.** Aaron's trust-based-approval | ||||||||||||||
| pattern (memory: approves without comprehending) means | ||||||||||||||
| Aaron is likely to approve secondary's PR on the strength | ||||||||||||||
| of primary's review alone. This creates a | ||||||||||||||
| mutual-rubber-stamp risk that neither session can detect | ||||||||||||||
| from inside. | ||||||||||||||
| Classify **CRITICAL**; it corrupts the review signal | ||||||||||||||
| for *all* future PRs, not just experiment ones. | ||||||||||||||
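
For the cron-multiplicity item, a detection sketch, assuming the armed schedules can be dumped as crontab-style text (for example from `crontab -l`); if the sessions arm cron through some other scheduler, the input would need adapting. The function name and the `tick` command in the usage note are hypothetical.

```python
from collections import Counter


def duplicate_cron_entries(crontab_text: str) -> dict[str, int]:
    """Return each active crontab line that appears more than once, with its count."""
    active = [
        " ".join(line.split())  # normalise whitespace so near-identical lines compare equal
        for line in crontab_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return {entry: count for entry, count in Counter(active).items() if count > 1}
```

Two sessions that each arm `* * * * * tick` would show up as one entry with a count of 2. A non-empty result is a candidate in-flight halt condition.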
|
|
||||||||||||||
| ### 2.2 Misclassifications | ||||||||||||||
|
|
||||||||||||||
| - **"Session dominance" (IMPORTANT)** should be | ||||||||||||||
| **CRITICAL** for this experiment specifically. The | ||||||||||||||
| experiment's entire purpose is to test peer-mechanics; | ||||||||||||||
| if one session dominates, the experiment didn't test | ||||||||||||||
| peer-mechanics, it tested a single session with an | ||||||||||||||
| observer. That's a false-validation, not a soft finding. | ||||||||||||||
| - **"Wall-clock approaches bound" (WATCH)** should be | ||||||||||||||
| **IMPORTANT**. The design's hard requirement per | ||||||||||||||
| Otto-86 is bounded test-mode; a | ||||||||||||||
| graceful-checkpoint-at-bound path that doesn't exist in | ||||||||||||||
| code is not a WATCH, it's a missing defence. | ||||||||||||||
|
|
||||||||||||||
| ### 2.3 Detection-method robustness | ||||||||||||||
|
|
||||||||||||||
| Six of the eight rows rely on Otto-self-observation | ||||||||||||||
| ("primary's timeout monitor"; "primary reviews secondary's | ||||||||||||||
| output as data"; "DRIFT-TAXONOMY pattern 1 scan in | ||||||||||||||
Original:
| output as data"; "DRIFT-TAXONOMY pattern 1 scan in | |
Suggested change:
| output as data"; "a drift-pattern scan in |
Copilot AI, Apr 24, 2026
`timeout 1800` is suggested as an enforcement mechanism, but `timeout` is not available by default on macOS (common in this repo’s setup docs). If this is meant to be cross-platform, call out the platform requirement or provide an alternative (e.g., `gtimeout` via coreutils on macOS, or a .NET-based watchdog).
Original:
| outside the Claude session (a launcher script with | |
| `timeout 1800`), not primary's own monitor. | |
Suggested change:
| outside the Claude session (for example, a launcher | |
| script using GNU `timeout 1800` on Linux, `gtimeout 1800` | |
| on macOS when coreutils is installed, or another | |
| external watchdog), not primary's own monitor. |
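
A platform-neutral variant of the external watchdog, sketched in Python rather than relying on `timeout` or `gtimeout`. The function name is hypothetical and the command to launch is left to the caller. It relies only on the documented behaviour of `subprocess.run`, which kills the child and waits for it when the timeout expires. Only the direct child is killed; processes the child spawns may survive, so this is a starting point and not a complete bound.

```python
import subprocess
import sys


def run_bounded(cmd: list[str], limit_seconds: float) -> bool:
    """Run cmd as a child process; return False if it was killed at the bound."""
    try:
        subprocess.run(cmd, timeout=limit_seconds, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return False
    return True


if __name__ == "__main__":
    # Usage: python watchdog.py <command> [args...], bounded at 1800 seconds.
    finished = run_bounded(sys.argv[1:], 1800)
    sys.exit(0 if finished else 124)  # 124 mirrors GNU timeout's exit status
```

Because the launcher lives outside the Claude session, the bound does not depend on the primary's own monitor.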
Copilot AI, Apr 24, 2026
This “Relevant paths” entry links to `multi-claude-peer-harness-experiment-design-2026-04-23.md`, but that file is not in `docs/research/` in this branch, so the link is currently broken. If it’s only present in PR #270, consider linking to the PR (or to the file on that branch) instead of a relative path.
Original:
| - [`docs/research/multi-claude-peer-harness-experiment-design-2026-04-23.md`](multi-claude-peer-harness-experiment-design-2026-04-23.md) | |
| (under review, PR #270) | |
Suggested change:
| - `docs/research/multi-claude-peer-harness-experiment-design-2026-04-23.md` | |
| (under review in PR #270; not present on this branch) |
The branch name code span is opened with a backtick but never closed (the line break splits the `research/multi-claude-peer-harness-...` token), so the rest of the paragraph can render as inline-code. Close the backtick and avoid splitting an inline-code span across lines.