
research: Aminata iteration-1 pass on multi-Claude experiment design (6 CRITICAL findings)#272

Merged
AceHack merged 1 commit into main from
research/aminata-pass-on-multi-claude-experiment-design-iteration-1
Apr 24, 2026


@AceHack commented Apr 24, 2026

Summary

Third Aminata pass this session. Reviews the multi-Claude peer-harness experiment design (PR #270) before Otto begins iterating.

6 CRITICAL findings

  1. Criterion 3 (cross-session review) trivially gameable.
  2. Criterion 1 (clean tick-history) self-reportable.
  3. Cross-session prompt-injection via memory folder missing.
  4. Approval-laundering missing.
  5. Bullet-proof bar theatrical (gameable by 2 trivial synthetic-rows iterations).
  6. Otto-solo cannot surface cross-session-review failure modes.

Key recommendation shifts

  • Iteration 1 should be paired-worktree, NOT synthetic rows + subagent simulation. Subagent dispatch has a structural problem: a subagent is not a loop agent.
  • "Review yes, edit no" invariant should be restated with merge-conflict/dependency-ordering carve-out.
  • Bullet-proof bar should require >=1 highest-available-fidelity iteration.
  • Otto-solo un-testable modes should be acknowledged explicitly.
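The revised bullet-proof bar can be stated as a check over recorded iterations. This is a minimal sketch under stated assumptions: the `Fidelity` ladder, its ordering, the `SEPARATE_MACHINE` rung, and the `Iteration` record are all hypothetical and do not come from the design doc. It only encodes the rule above: the bar is met once at least one iteration at the highest available fidelity passes, and lower-fidelity passes never satisfy it alone.

```python
from dataclasses import dataclass
from enum import IntEnum


class Fidelity(IntEnum):
    # Hypothetical ordering: a higher value means closer to a real peer run.
    SYNTHETIC_ROWS = 1
    SUBAGENT_SIMULATION = 2
    PAIRED_WORKTREE = 3
    SEPARATE_MACHINE = 4


@dataclass(frozen=True)
class Iteration:
    fidelity: Fidelity
    passed: bool


def meets_bullet_proof_bar(iterations: list[Iteration],
                           highest_available: Fidelity) -> bool:
    """True only if at least one iteration at the highest available
    fidelity passed. Lower-fidelity passes never satisfy the bar on
    their own; they count for design refinement only."""
    return any(it.passed and it.fidelity >= highest_available
               for it in iterations)
```

Under this sketch, two passing synthetic-rows iterations return `False` when paired-worktree is available, which closes the gaming path described in the findings.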

Authority

Advisory research-grade; not a gate; does not override Otto-93 iterate-to-bullet-proof framing — refines what bullet-proof requires.

Test plan

  • 12th aurora/research doc to self-apply §33 archive-header
  • Section-by-section coverage of 5 criteria + 8 failure modes + 4 mechanisms
  • Cross-cutting findings (bullet-proof bar, review-edit invariant, Otto-solo premise)
  • Classifications: 6 CRITICAL / 7 IMPORTANT / 1 WATCH / 0 DISMISS
  • Specific fix recommendations per finding

🤖 Generated with Claude Code

…ment design (advisory)

Third Aminata pass this session (after PR #241 5th-ferry
governance edits + PR #263 7th-ferry oracle rules). Reviews
the multi-Claude peer-harness experiment design (PR #270)
before iteration begins.

Six CRITICAL findings:

1. Criterion 3 (cross-session review) trivially gameable —
   one substantive comment clears the bar.
2. Criterion 1 (clean tick-history) self-reportable — a
   session that corrupts + rewrites tick-history passes.
3. Cross-session prompt-injection via memory folder missing
   from failure-mode table — higher-probability than direct
   output channel.
4. Approval-laundering missing — trust-based-approval +
   secondary's PR creates mutual-rubber-stamp risk.
5. Bullet-proof bar theatrical — gameable by running two
   trivial iterations on synthetic rows. Fix: require >=1
   iteration at highest-available-fidelity.
6. Otto-solo cannot surface cross-session-review failure
   modes — peer-mechanics require an actual peer. These
   must either be deferred to Aaron's Windows-PC run OR
   marked as stage-c-or-d concerns.

IMPORTANT findings (multiple): session-dominance should be
CRITICAL; wall-clock-approaches-bound should be IMPORTANT;
cron-multiplicity missing; substrate-version skew missing;
tick-history semantic-merge-race missing; retired-persona
name-collision missing; forced-integration edge cases not
handled in "review-yes-edit-no" invariant.

Key recommendation shifts for Otto's iteration plan:

- **Iteration 1 should be paired-worktree, NOT synthetic
  rows + subagent simulation.** Subagent dispatch has a
  structural problem: a subagent is not a loop agent; it
  has no cron, no persistent memory, no tick-closing
  behaviour.
- "Review yes, edit no" invariant should be restated with
  carve-out for merge-conflict resolution + dependency-
  ordering with explicit PR-comment acknowledgement.
- Bullet-proof bar should require highest-available-
  fidelity at least once; lower-fidelity iterations count
  for design-refinement only.
- Otto-solo un-testable failure modes should be
  acknowledged explicitly (naming collision; divergent
  AGENTS.md interpretation; genuine review disagreement).
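One way to make the restated "review yes, edit no" invariant unambiguous is to write it as a predicate over cross-session actions. This is a minimal sketch; the `CrossSessionAction` shape and the carve-out labels are hypothetical paraphrases of the recommendation, not identifiers from the design doc.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels paraphrasing the two recommended carve-outs.
CARVE_OUTS = frozenset({"merge-conflict-resolution", "dependency-ordering"})


@dataclass(frozen=True)
class CrossSessionAction:
    kind: str                     # "review" or "edit"
    reason: Optional[str] = None  # why an edit touched the peer's PR
    pr_comment_ack: bool = False  # was the edit acknowledged in a PR comment?


def permitted(action: CrossSessionAction) -> bool:
    """'Review yes, edit no', restated with explicit carve-outs."""
    if action.kind == "review":
        return True
    if action.kind == "edit":
        # An edit to a peer session's PR is allowed only for a named
        # carve-out AND only with an explicit PR-comment acknowledgement.
        return action.reason in CARVE_OUTS and action.pr_comment_ack
    return False  # unknown action kinds are denied by default
```

Requiring both conditions keeps the carve-outs narrow: a merge-conflict fix with no acknowledging comment is still a violation.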

Archive-header format self-applied — 12th aurora/research
doc in a row.

Lands within-standing-authority per Otto-82/90/93
calibration — research-grade advisory; not a gate; does
not override Otto-93 "iterate to bullet-proof" framing
(refines what bullet-proof requires).

Otto-94 tick iteration-1 deliverable on the peer-harness
experiment design. Iteration 2 should integrate these
findings + run paired-worktree iteration at higher
fidelity.
Copilot AI review requested due to automatic review settings April 24, 2026 03:27
@AceHack AceHack enabled auto-merge (squash) April 24, 2026 03:27
@chatgpt-codex-connector
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@AceHack AceHack merged commit 9ca247e into main Apr 24, 2026
12 checks passed
@AceHack AceHack deleted the research/aminata-pass-on-multi-claude-experiment-design-iteration-1 branch April 24, 2026 03:28

Copilot AI left a comment


Pull request overview

Adds a research-grade red-team review document capturing Aminata’s iteration-1 findings on the multi-Claude peer-harness experiment design (PR #270), focusing on success-criteria gameability and missing failure modes.

Changes:

  • Adds a new research doc enumerating critical/important/watch findings and recommendations.
  • Includes cross-cutting analysis on “bullet-proof” criteria and the “review yes, edit no” invariant.
  • Provides a “Relevant paths” section intended to link to related design and prior research docs.

Comment on lines +5 to +6
(PR #270, branch `research/multi-claude-peer-harness-
experiment-design`) — specifically the five success

Copilot AI Apr 24, 2026


The branch name code span is opened with a backtick but never closed (the line break splits the research/multi-claude-peer-harness-... token), so the rest of the paragraph can render as inline-code. Close the backtick and avoid splitting an inline-code span across lines.

Suggested change
(PR #270, branch `research/multi-claude-peer-harness-
experiment-design`) — specifically the five success
(PR #270, branch `research/multi-claude-peer-harness-experiment-design`) — specifically the five success
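Whatever a given renderer does with a code span that crosses a line break, the reviewer's suggested practice (never split an inline-code span across hard-wrapped lines) can be checked with a simple heuristic. A minimal sketch, assuming only single-backtick spans and triple-backtick fences; `unbalanced_backtick_lines` is a hypothetical helper, not an existing repo tool:

```python
FENCE = "`" * 3  # a fenced-code delimiter, built indirectly to keep this block well-formed


def unbalanced_backtick_lines(text: str) -> list[int]:
    """Return 1-based line numbers whose inline-code backticks are unbalanced.

    Heuristic: outside fenced blocks, a line with an odd number of backticks
    likely opens (or closes) an inline-code span that a hard wrap split.
    """
    flagged = []
    in_fence = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # fence delimiters are not inline code
            continue
        if not in_fence and line.count("`") % 2 == 1:
            flagged.append(lineno)
    return flagged
```

On the two original lines under review, both would be flagged; the single-line suggested change would not.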

Comment on lines +13 to +18
**Attribution:** findings authored by Aminata (threat-
model-critic persona, Claude Code, model
`claude-opus-4-7`). Source design authored by Otto
(Otto-93). Third adversarial pass this session (prior:
PR #241 5th-ferry governance edits; PR #263 7th-ferry
oracle rules). Same no-compliments discipline.

Copilot AI Apr 24, 2026


This doc includes direct contributor names (e.g., Aminata/Otto/Aaron). Repo convention forbids name attribution in docs/skill bodies; use role references instead (see docs/AGENT-BEST-PRACTICES.md:284-290). Also consider rephrasing “trust-based-approval… approves without comprehending” to a role-based, neutral description to avoid sounding personal.


Six of the eight rows rely on Otto-self-observation
("primary's timeout monitor"; "primary reviews secondary's
output as data"; "DRIFT-TAXONOMY pattern 1 scan in

Copilot AI Apr 24, 2026


The text references “DRIFT-TAXONOMY pattern 1”, but docs/DRIFT-TAXONOMY.md is not present in the repository (the docs tree references it, but the file is missing). Either add the missing doc, or adjust this reference to point at an existing precursor (e.g., a docs/research/*drift-taxonomy* doc) so readers can resolve the cited pattern.

Suggested change
output as data"; "DRIFT-TAXONOMY pattern 1 scan in
output as data"; "a drift-pattern scan in

Comment on lines +165 to +166
outside the Claude session (a launcher script with
`timeout 1800`), not primary's own monitor.

Copilot AI Apr 24, 2026


`timeout 1800` is suggested as an enforcement mechanism, but `timeout` is not available by default on macOS, a platform this repo's setup docs commonly cover. If this is meant to be cross-platform, call out the platform requirement or provide an alternative (e.g., `gtimeout` via coreutils on macOS, or a .NET-based watchdog).

Suggested change
outside the Claude session (a launcher script with
`timeout 1800`), not primary's own monitor.
outside the Claude session (for example, a launcher
script using GNU `timeout 1800` on Linux, `gtimeout 1800`
on macOS when coreutils is installed, or another
external watchdog), not primary's own monitor.
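The point of the suggestion is that the deadline lives outside the session, so the session cannot extend or skip it. A minimal cross-platform sketch of such an external watchdog, assuming a Python launcher is acceptable (the repo may prefer the .NET option), relies on the standard library's `subprocess.run(..., timeout=...)`, which kills the child and raises `subprocess.TimeoutExpired` once the deadline passes. The name `run_with_watchdog` and the 124 exit code (borrowed from GNU `timeout`'s convention) are illustrative choices.

```python
import subprocess


def run_with_watchdog(cmd: list[str], seconds: float) -> int:
    """Run cmd as a child process and kill it if it exceeds `seconds`.

    Returns the child's exit code, or 124 on timeout (mirroring the
    exit status GNU timeout uses when the command times out).
    """
    try:
        completed = subprocess.run(cmd, timeout=seconds)
    except subprocess.TimeoutExpired:
        # By the time this is raised, subprocess.run has already
        # killed the child and waited for it to exit.
        return 124
    return completed.returncode
```

A launcher would call `run_with_watchdog(session_cmd, 1800)`, where `session_cmd` is whatever command starts the session (left unspecified here). `subprocess.run` kills only the direct child; if the session spawns its own subprocesses, a process-group kill would also be needed.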

Comment on lines +327 to +328
- [`docs/research/multi-claude-peer-harness-experiment-design-2026-04-23.md`](multi-claude-peer-harness-experiment-design-2026-04-23.md)
(under review, PR #270)

Copilot AI Apr 24, 2026


This “Relevant paths” entry links to multi-claude-peer-harness-experiment-design-2026-04-23.md, but that file is not in docs/research/ in this branch, so the link is currently broken. If it’s only present in PR #270, consider linking to the PR (or to the file on that branch) instead of a relative path.

Suggested change
- [`docs/research/multi-claude-peer-harness-experiment-design-2026-04-23.md`](multi-claude-peer-harness-experiment-design-2026-04-23.md)
(under review, PR #270)
- `docs/research/multi-claude-peer-harness-experiment-design-2026-04-23.md`
(under review in PR #270; not present on this branch)
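Broken relative links of this kind, along with references to missing files such as the absent `docs/DRIFT-TAXONOMY.md`, can be caught mechanically before merge. A minimal sketch, assuming only inline `[text](target)` links need checking (reference-style links and link titles are ignored); `broken_relative_links` is a hypothetical helper, not an existing repo tool:

```python
import re
from pathlib import Path

# Inline Markdown links: [text](target). Deliberately simple.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")  # https:, mailto:, ...


def broken_relative_links(doc: Path) -> list[str]:
    """Return relative link targets in `doc` that do not resolve to an
    existing path, resolving each target against the doc's own directory."""
    broken = []
    for target in LINK.findall(doc.read_text(encoding="utf-8")):
        if SCHEME.match(target) or target.startswith("#"):
            continue  # skip absolute URLs and in-page anchors
        path_part = target.split("#", 1)[0]  # drop any #fragment
        if not (doc.parent / path_part).exists():
            broken.append(target)
    return broken
```

Run against the review doc on this branch, the sketch would report the design-doc target, since that file exists only on the PR #270 branch.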

AceHack added a commit that referenced this pull request Apr 24, 2026
…a 8th ferry scheduled Otto-95

Split-attention tick: PR #272 Aminata third-pass adversarial
review of multi-Claude experiment design (6 CRITICAL + 7
IMPORTANT + 1 WATCH findings) + mid-tick Amara 8th-ferry
scheduling memory per CC-002 (8 consecutive ferries held).

Key observations:

1. Aminata's 3rd pass surfaced more CRITICAL findings per
   unit design than prior passes — adversarial review value
   compounds as design maturity increases.
2. Otto-93 design was wrong about iteration-1 mechanism
   choice; Aminata caught it before iteration wasted cycles.
3. Otto-solo-cannot-surface-peer-review-failures is
   architecturally load-bearing — bullet-proof redefinition
   required.
4. CC-002 held for 8 consecutive ferries; pattern reflexive.

Stacked on #271 (Otto-93 history).