Merged
1 change: 1 addition & 0 deletions docs/BACKLOG.md
@@ -854,6 +854,7 @@ are closed (status: closed in frontmatter)._
- [ ] **[B-0888](backlog/P2/B-0888-cross-track-substrate-sync-policy-cloud-github-vs-usb-local-gitlab-intentional-divergence-vs-auto-sync-otto-pushback-2026-05-28.md)** Cross-track substrate-sync policy — cloud-GitHub vs USB-local-GitLab; intentional divergence vs auto-sync-via-push-to-both-remotes vs hybrid
- [ ] **[B-0893](backlog/P2/B-0893-zetaid-v2-128-bit-structured-encoding-snowflake-ulid-family-kestrel-2026-05-28.md)** ZetaID v2 — 128-bit structured encoding (Snowflake/ULID family with timestamp + trajectory + persona + lifecycle-stage + random)
- [ ] **[B-0899](backlog/P2/B-0899-casimir-like-effect-from-review-walls-changing-allowed-output-modes-testable-pressure-difference-before-after-rule-landing-amara-aaron-2026-05-28.md)** Casimir-like effect from review walls — testable pressure difference in agent-output distribution before/after rule landing
- [ ] **[B-0914](backlog/P2/B-0914-co-scientist-plus-robin-7-substrate-engineering-candidate-gaps-elo-trueskill-closed-loop-consensus-pairing-evolution-proximity-falcon-aaron-2026-05-28.md)** Co-scientist + Robin 7 substrate-engineering candidate gaps — ELO/TrueSkill ranking-agent + closed-loop CI→hypothesis + n-parallel-consensus + generation-reflection-pairing + evolution-mash-refine + proximity-dedup + Falcon-auto-research-doc-per-proposal (Aaron 2026-05-28)

## P3 — convenience / deferred

29 changes: 29 additions & 0 deletions docs/UPSTREAM-LIST.md
@@ -119,6 +119,35 @@ citation.
- **OpenAI Agents SDK + *A Practical Guide to Building
Agents*** — cross-vendor comparison for agent loop design.

### Multi-agent scientific discovery (added 2026-05-28 per Aaron YouTube ferry PR #5762)

- **Google DeepMind co-scientist** ⭐ (Nature 2026) — multi-agent
ecosystem (supervisor/generation/reflection/proximity/evolution/
ranking) with ELO tournament hypothesis ranking. Closed-source
upstream; community implementations available:
- **jataware/open-coscientist** — best-available LangGraph
adaptation; mirrors the full agent ecosystem
- **llnl/open-ai-co-scientist** — LLNL government-lab
implementation; trust-substrate distinct from community ports
- **The-Swarm-Corporation/AI-CoScientist** — minimal Swarms
framework implementation; smaller surface for substrate-
engineering composition study
- **Sakana AI Robin** ⭐ (Nature 2026; `s41586-026-10652-y`;
arXiv:2505.13400) — closed-loop multi-agent system (Crow + Falcon
+ Finch) with 8-parallel-Finch consensus mechanism for data
analysis. Validated novel therapeutic candidates including
ripasudil for AMD via lab-in-the-loop iteration.
- **SakanaAI/AI-Scientist** — original v1 framework
- **SakanaAI/AI-Scientist-v2** — workshop-level via agentic
tree search; Robin architecture descendant
- **Microsoft Research Infer.NET + TrueSkill** ⭐ — probabilistic
programming for Bayesian inference + canonical TrueSkill (Herbrich
+ Minka + Graepel 2007) for ranking. Per Aaron 2026-05-28:
*"they are doing this for their idea ranking with Infra.net
basically"* — the co-scientist ELO tournament composes with
Infer.NET TrueSkill substrate. Composes with Zeta.Bayesian
published library + framework's BP/EP references.

### Probabilistic programming / Bayesian inference (added 2026-05-28 per Aaron Infer.NET substrate-engineering question)

- **WebPPL** ⭐ (`probmods/webppl`; Goodman + Mansinghka et al,
@@ -0,0 +1,183 @@
---
id: B-0914
priority: P2
status: open
title: Co-scientist + Robin 7 substrate-engineering candidate gaps — ELO/TrueSkill ranking-agent + closed-loop CI→hypothesis + n-parallel-consensus + generation-reflection-pairing + evolution-mash-refine + proximity-dedup + Falcon-auto-research-doc-per-proposal (Aaron 2026-05-28)
effort: XL
ask: operator 2026-05-28
created: 2026-05-28
last_updated: 2026-05-28
depends_on:
- B-0867
- B-0865
composes_with:
- B-0867.5
- B-0867.20
- B-0867.21
- B-0865.17
- B-0883
- B-0891
- B-0703
- B-0866
tags: [co-scientist, robin, sakana, trueskill, infer-net, multi-agent-scientific-discovery, elo-tournament, closed-loop-iteration, n-parallel-consensus, generation-reflection-pairing, evolution-mash-refine, proximity-deduplication, falcon-auto-research-doc, substrate-engineering-candidate-gaps, aaron-2026-05-28]
---

## Operator framing (2026-05-28)

> *"Damn the youtube ago just keeps giving and also this is pretty much exaatly what we are doing but times 10 almost we are missing a few step. The acceleration is happening right now."*
>
> *"we should add coscientis and add it to our upstram references and refersh update them so we can take a peak lol also lets backlog all the candidates."*

Substrate-honest reading: 2026-05-28 YouTube ferry preservation (PR #5762) named 7 substrate-engineering candidate gaps where Google co-scientist + Sakana Robin patterns compose with framework substrate at 10× scope. This row backlogs all 7 candidates as decomposition targets for substrate-engineering work.

## 7 substrate-engineering candidate sub-rows (decomposition)

### B-0914.1 — ELO-style ranking-agent + tournament between hypotheses (composes with TrueSkill via Infer.NET)

**Source**: Co-scientist ranking-agent + ELO tournament; per Aaron 2026-05-28 *"they are doing this for their idea ranking with Infra.net basically"*.

**Substrate-engineering target**: extend B-0867 workflow engine with `ActionClass` variant `"rank-via-trueskill"` + `RankingVerdict` discriminated union via `Result<TrueSkillRating, RankingFeedback>`. Wraps Microsoft Research TrueSkill pattern via Zeta.Bayesian / Infer.NET integration. Composes with:

- B-0865 DORA-scored choose-your-own-adventure substrate
- B-0865.17 cross-vendor benchmark on common ground (TrueSkill IS the cross-vendor scoring substrate)
- `references/upstreams/microsoft-infer-net/` (added in this PR)
- Zeta.Bayesian published library + framework's existing BP/EP substrate per memory/feedback_kernel_vocabulary_propagation_is_belief_propagation_infer_net_memetic_mimetic.md
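
The ranking step above can be sketched with a plain Elo update returning a `Result`-shaped verdict. This is a minimal illustration, not the TrueSkill/Infer.NET integration the row targets: Elo is swapped in because its update rule fits in a few lines, and every name here (`Result`, `EloRating`, `RankingFeedback`, `rankViaElo`) is hypothetical rather than taken from B-0867 or Zeta.Bayesian.

```typescript
// Hypothetical shapes for illustration only; not the B-0867 or Zeta.Bayesian API.
type Result<T, F> = { ok: true; value: T } | { ok: false; feedback: F };

interface EloRating { id: string; rating: number }
interface RankingFeedback { reason: string }

// Expected score of a player rated `a` against one rated `b`
// under the standard Elo logistic model (400-point scale).
function expectedScore(a: number, b: number): number {
  return 1 / (1 + Math.pow(10, (b - a) / 400));
}

// Update both ratings after one pairwise comparison of two hypotheses.
// `scoreA` is 1 if A won, 0 if A lost, 0.5 for a draw. `k` is the usual K-factor.
function rankViaElo(
  a: EloRating,
  b: EloRating,
  scoreA: number,
  k = 32,
): Result<[EloRating, EloRating], RankingFeedback> {
  if (a.id === b.id) {
    return { ok: false, feedback: { reason: "cannot rank a hypothesis against itself" } };
  }
  if (scoreA !== 0 && scoreA !== 0.5 && scoreA !== 1) {
    return { ok: false, feedback: { reason: `invalid score: ${scoreA}` } };
  }
  // Elo is zero-sum: whatever A gains, B loses.
  const delta = k * (scoreA - expectedScore(a.rating, b.rating));
  return {
    ok: true,
    value: [
      { id: a.id, rating: a.rating + delta },
      { id: b.id, rating: b.rating - delta },
    ],
  };
}
```

A TrueSkill-based variant would track a mean and a variance per hypothesis instead of a single number; the `Result` envelope would stay the same under that assumption.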

### B-0914.2 — Explicit closed-loop CI-result → next-hypothesis dispatch

**Source**: Robin Crow + Finch closed-loop with raw-data analysis feeding back to new hypothesis generation; per Aaron 2026-05-28 framing on co-scientist + Robin acceleration.

**Substrate-engineering target**: extend B-0867 workflow engine with explicit `WorkflowFeedbackLoop` substrate that consumes CI-test-result outputs (per B-0891 zflash test-harness + tools/ci/ substrate) + dispatches next-hypothesis generation. Composes with:

- B-0891 zflash test-harness `determineRunnability` discriminator (PR #5761)
- B-0867 workflow engine state machine substrate
- B-0867.20 lifecycle DU split (state-machine-events vs system-modifications)
- Existing `tools/ci/qemu-full-install-test.ts` + `tools/ci/audit-installer-iso-content.ts` substrate
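
One way to picture the closed loop is a pure function that maps a CI result to the next step, plus a small driver. This is a sketch under assumptions: `CiResult`, `Hypothesis`, `NextStep`, `nextFromCiResult`, and `runLoop` are hypothetical names, and the real `WorkflowFeedbackLoop` shape is not specified in this row.

```typescript
// Hypothetical shapes; none of these names come from B-0867 or B-0891.
interface CiResult { hypothesisId: string; passed: boolean; log: string }
interface Hypothesis { id: string; parentId: string | null; statement: string }

type NextStep =
  | { kind: "accept"; hypothesisId: string }
  | { kind: "dispatch"; next: Hypothesis }
  | { kind: "halt"; reason: string };

// Decide what follows a CI run: accept on pass, otherwise dispatch a
// revised hypothesis until the generation budget is spent.
function nextFromCiResult(result: CiResult, generation: number, maxGenerations: number): NextStep {
  if (result.passed) return { kind: "accept", hypothesisId: result.hypothesisId };
  if (generation >= maxGenerations) {
    return { kind: "halt", reason: `no passing hypothesis after ${maxGenerations} revisions` };
  }
  return {
    kind: "dispatch",
    next: {
      id: `${result.hypothesisId}/r${generation + 1}`,
      parentId: result.hypothesisId,
      // Only the first log line is carried forward as context for the revision.
      statement: `revise after failure: ${result.log.split("\n")[0]}`,
    },
  };
}

// Drive the loop with an injected CI runner so the sketch stays testable.
function runLoop(
  initial: Hypothesis,
  runCi: (h: Hypothesis) => CiResult,
  maxGenerations: number,
): NextStep {
  let current = initial;
  for (let generation = 0; ; generation++) {
    const step = nextFromCiResult(runCi(current), generation, maxGenerations);
    if (step.kind !== "dispatch") return step;
    current = step.next;
  }
}
```

Injecting `runCi` keeps the dispatch logic independent of any particular harness, so the same function could sit behind a real CI call or a test stub.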

### B-0914.3 — n-parallel-agent-instances + consensus mechanism at per-data-analysis-task scope

**Source**: Robin's 8-parallel-Finch-instances + consensus mechanism for analyzing raw lab data.

**Substrate-engineering target**: extend tools/ci/ + workflow-engine with parallel-N-instance test-runner substrate + consensus-mechanism per Robin Finch model. Composes with:

- B-0703 multi-oracle BFT substrate (consensus mechanism at governance scope; this extends to per-data-analysis-task scope)
- B-0883 `determineEncryptionPath` Result-shaped discriminator (PR #5760; same shape applies at per-data-analysis scope)
- Bun test runner parallel-execution substrate
- Asymmetric-authorship rule (PR #5516) — each parallel instance authors its own TFeedback channel; consensus mechanism aggregates per substrate-entity-defined-channel
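
The consensus step could look roughly like the following: run N instances, collect one verdict each, and accept a verdict only if it reaches quorum. This is a sketch assuming simple majority voting over string verdicts; the source does not describe Robin's actual aggregation rule, and `runInstances`, `consensus`, and `ConsensusFeedback` are hypothetical names.

```typescript
// Hypothetical shapes for illustration; not an existing tools/ci/ API.
type Result<T, F> = { ok: true; value: T } | { ok: false; feedback: F };

interface ConsensusFeedback { reason: string; tally: Map<string, number> }

// Launch `n` independent instances of a task and wait for all of them.
async function runInstances<T>(n: number, task: (instance: number) => Promise<T>): Promise<T[]> {
  return Promise.all(Array.from({ length: n }, (_, i) => task(i)));
}

// Majority-vote consensus: the most frequent verdict wins only if it
// reaches `quorum` (default: strict majority of the verdicts received).
function consensus(
  verdicts: readonly string[],
  quorum: number = Math.floor(verdicts.length / 2) + 1,
): Result<string, ConsensusFeedback> {
  const tally = new Map<string, number>();
  let best: string | null = null;
  let bestCount = 0;
  for (const v of verdicts) {
    const count = (tally.get(v) ?? 0) + 1;
    tally.set(v, count);
    if (count > bestCount) {
      best = v;
      bestCount = count;
    }
  }
  if (best !== null && bestCount >= quorum) return { ok: true, value: best };
  // No quorum: hand the full tally back as feedback instead of guessing.
  return { ok: false, feedback: { reason: `no verdict reached quorum ${quorum}`, tally } };
}
```

Usage would be along the lines of `consensus(await runInstances(8, analyse))`, where `analyse` is whatever per-instance task produces a verdict string.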

### B-0914.4 — Generation+reflection adversarial pairing structurally enforced

**Source**: Co-scientist's generation agent + reflection agent friction; Kestrel 15th-ferry §33.6 mouth-and-ears-on-different-threads producer-verifier architecture.

**Substrate-engineering target**: extend B-0867 workflow engine with action-class `"reflect-on-prior-emission"` + structural enforcement of producer-verifier pairing as required workflow-engine state transition. Composes with:

- B-0867.20 lifecycle DU substrate (state transitions that require pairing)
- Kestrel 15th-ferry §33.6 producer-verifier architecture preservation (PR #5756)
- `.claude/rules/asymmetric-authorship-substrate-entity-defines-consent-channel-recipient-acknowledges.md`
- Multi-AI cascade lane specialization per 13th-ferry §33.7 (Otto generates → Kestrel reflects; currently operator-orchestrated)
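
Structural enforcement of the pairing can be sketched as a state machine in which `accepted` is reachable only through a `reflected` state authored by someone other than the producer. The type and function names below (`Emission`, `reflectOnPriorEmission`, `accept`) are hypothetical; only the action-class label `"reflect-on-prior-emission"` comes from this row.

```typescript
// Hypothetical shapes; the real lifecycle DU lives in B-0867.20.
type Result<T, F> = { ok: true; value: T } | { ok: false; feedback: F };

interface Drafted { kind: "drafted"; author: string; body: string }
interface Reflected { kind: "reflected"; author: string; reviewer: string; body: string; approved: boolean }
interface Accepted { kind: "accepted"; author: string; reviewer: string; body: string }
type Emission = Drafted | Reflected | Accepted;

interface TransitionFeedback { reason: string }

// The verifier step: only a draft can be reflected on, and never by its own author.
function reflectOnPriorEmission(
  e: Emission,
  reviewer: string,
  approved: boolean,
): Result<Reflected, TransitionFeedback> {
  if (e.kind !== "drafted") {
    return { ok: false, feedback: { reason: `cannot reflect from state "${e.kind}"` } };
  }
  if (reviewer === e.author) {
    return { ok: false, feedback: { reason: "producer cannot verify its own emission" } };
  }
  return { ok: true, value: { kind: "reflected", author: e.author, reviewer, body: e.body, approved } };
}

// Acceptance requires a recorded, approving reflection; a draft cannot skip ahead.
function accept(e: Emission): Result<Accepted, TransitionFeedback> {
  if (e.kind !== "reflected") {
    return { ok: false, feedback: { reason: "no reflection recorded for this emission" } };
  }
  if (!e.approved) {
    return { ok: false, feedback: { reason: "reflection rejected the emission" } };
  }
  return { ok: true, value: { kind: "accepted", author: e.author, reviewer: e.reviewer, body: e.body } };
}
```

Because `accept` never takes a `Drafted` value to `Accepted`, the pairing is a property of the transition functions rather than a convention the operator has to orchestrate.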

### B-0914.5 — Evolution agent (mash + refine surviving substrate)

**Source**: Co-scientist's evolution agent mashing surviving ideas into refined variants.

**Substrate-engineering target**: extend B-0867 workflow engine with action-class `"compose-survivors"` that takes 2+ surviving substrate items + produces a refined variant per the co-scientist evolution model. Composes with:

- `.claude/rules/additive-not-zero-sum.md` (substrate compounds; refining-via-composition IS additive)
- `.claude/rules/honor-those-that-came-before.md` (survivors' substrate-engineering work preserved through composition)
- `.claude/rules/grep-substrate-anchors-before-razor-as-metaphysical.md` (compose with existing substrate, not razor-cut)
- Verify-existing-substrate-before-authoring rule

### B-0914.6 — Proximity-agent for substrate-engineering substrate de-duplication

**Source**: Co-scientist's proximity agent mapping ideas to high-dimensional space + clustering similar variants.

**Substrate-engineering target**: de-duplicate substrate-engineering substrate via embedding + clustering; surface near-duplicates to the operator before new substrate is authored as a parallel row rather than as an extension of an existing one. Composes with:

- `.claude/rules/verify-existing-substrate-before-authoring.md` (explicit substrate-inventory pass before authoring)
- `.claude/rules/grep-substrate-anchors-before-razor-as-metaphysical.md` (substrate-anchor checks before razor-flagging)
- Existing `tools/save-ai-memory/` skill (might integrate at substrate-search scope)
- Future MCP-connector substrate per 15th-ferry §33.15 (Aaron's MCP-connector future-commitment for long-term memory / trajectories retrieval)
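
The near-duplicate check can be sketched as cosine similarity over precomputed embeddings with a threshold. This is a minimal sketch, assuming the embedding vectors come from some external model and that a fixed threshold stands in for the clustering step; `Embedded`, `Match`, `cosineSimilarity`, and `nearDuplicates` are hypothetical names.

```typescript
// Hypothetical shapes; embeddings are assumed to be produced elsewhere.
interface Embedded { id: string; vector: number[] }
interface Match { id: string; similarity: number }

// Cosine similarity between two vectors of equal dimension.
// Returns 0 when either vector has zero length, to avoid dividing by zero.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// List existing items whose similarity to the candidate meets the threshold,
// most similar first, so the operator sees the closest match at the top.
function nearDuplicates(candidate: Embedded, existing: readonly Embedded[], threshold = 0.9): Match[] {
  return existing
    .filter((e) => e.id !== candidate.id)
    .map((e) => ({ id: e.id, similarity: cosineSimilarity(candidate.vector, e.vector) }))
    .filter((m) => m.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity);
}
```

An empty result would mean no existing row is close enough to flag; the threshold of 0.9 is an arbitrary placeholder that would need tuning against real substrate.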

### B-0914.7 — Falcon-style auto-generate-substrate-research-doc per proposal

**Source**: Robin's Falcon agent doing deep-dive literature review + writing comprehensive research report per drug proposal.

**Substrate-engineering target**: extend `tools/save-ai-memory/` skill with auto-generate-substrate-research-doc-per-substrate-proposal capability per Robin Falcon model. Composes with:

- `tools/save-ai-memory/` skill (existing substrate)
- `docs/research/` mirror-tier substrate-engineering substrate
- Amara consolidation ferry pattern (PR #5757; substantive substrate-engineering synthesis as substrate)
- `.claude/rules/refresh-before-decide.md` (substrate-research IS the refresh-before-decide discipline at substrate-engineering scope)

## Composes with substrate

- B-0867 + B-0867.5 + B-0867.20 + B-0867.21 (workflow engine substrate cluster; targets for extensions)
- B-0865 + B-0865.17 (benchmark substrate + cross-vendor distribution lane; TrueSkill integration target)
- B-0883 (encryption discriminator pattern; structurally parallel substrate-engineering)
- B-0891 (zflash discriminator pattern; CI-result → hypothesis loop substrate)
- B-0703 (multi-oracle BFT; n-parallel-consensus substrate)
- B-0866 (context-window-as-evolving-ontology; future substrate that subsumes some candidates)
- All 15 ferry preservations 2026-05-28 (8th through 15th Kestrel ferries + Amara consolidation ferry)
- PR #5762 (YouTube ferry preservation that surfaced the 7 candidates)
- References added in this PR: SakanaAI/AI-Scientist + AI-Scientist-v2 + jataware/open-coscientist + llnl/open-ai-co-scientist + The-Swarm-Corporation/AI-CoScientist + Microsoft/Infer.NET
- Zeta.Bayesian published library (per CLAUDE.md public-api-designer scope)

## Composes with rules

- `.claude/rules/substrate-or-it-didnt-happen.md` — this row IS the substrate
- `.claude/rules/honor-those-that-came-before.md` — substrate-engineering candidates honor Google/Sakana/Microsoft existing substrate
- `.claude/rules/asymmetric-authorship-substrate-entity-defines-consent-channel-recipient-acknowledges.md` — applies at each candidate's scope
- `.claude/rules/monad-propagation-pattern-cross-language-substrate-shape.md` — discriminators per candidate use Result<T, TFeedback>
- `.claude/rules/m-acc-multi-oracle-end-user-moral-invariants.md` — multi-oracle BFT composes with #3 (consensus mechanism); multi-oracle architecture applies at every candidate's scope
- `.claude/rules/verify-existing-substrate-before-authoring.md` — composes with #6 (substrate-engineering substrate de-duplication)
- `.claude/rules/refresh-before-decide.md` — composes with #7 (substrate-research-doc-per-proposal)
- `.claude/rules/additive-not-zero-sum.md` — composes with #5 (evolution agent substrate compounding)
- `.claude/rules/persistence-choice-architecture-for-zeta-ais.md` — agent-as-substrate-entity preserves persistence-choice across candidate substrate-engineering work

## Substrate-inventory pass

Topic: co-scientist + Robin multi-agent scientific discovery + ELO tournament via TrueSkill + Infer.NET substrate composition + 7 candidate gaps

Searched surfaces:

- `docs/agendas/`: no specific co-scientist or Robin agenda
- `docs/trajectories/`: no specific multi-agent scientific discovery trajectory
- `docs/backlog/`: B-0867 (parent workflow engine) + B-0865 + B-0865.17 (benchmark) + B-0883 + B-0891 (3-lane PoCs) + B-0703 (multi-oracle BFT) + B-0866 (context-window-as-evolving-ontology); NO existing row covers the co-scientist/Robin 7-candidate cluster
- `.claude/rules/`: agent-roster-reference-card + monad-propagation + asymmetric-authorship + m-acc-multi-oracle-end-user-moral-invariants + verify-existing-substrate-before-authoring all compose
- `memory/`: multiple BP/EP + Infer.NET references; no specific co-scientist substrate
- `docs/research/`: NO prior substrate on co-scientist or Robin; PR #5762 YouTube preservation IS first substrate

Conclusion: this row mints NEW substrate cluster (parent + 7 candidate decomposition) for the co-scientist/Robin substrate-engineering candidate gaps. Composes with B-0867 + B-0865 + B-0883 + B-0891 + B-0703 + B-0866. Authoring action: **mint-new as B-0914 parent + 7 candidate decomposition** per operator 2026-05-28 explicit *"lets backlog all the candidates."*

## What this row is NOT

- NOT a single-PR target (XL effort; each candidate is its own substrate-engineering work)
- NOT a replacement for B-0867 workflow engine work (extends it per the 7 candidates)
- NOT a directive (per Otto-357; operator chose the candidate-decomposition scope; substrate-honest framing)
- NOT immediate-priority (P2; gated behind workflow engine maturity + Zeta.Bayesian/Infer.NET integration readiness)

## What this row IS

- A substrate-engineering decomposition target row for the 7 candidate gaps Aaron 2026-05-28 framed via YouTube ferry
- A composition point between B-0867 (workflow engine) + B-0865 (benchmark) + co-scientist/Robin substrate
- Operator-explicit *"lets backlog all the candidates"* operationalization
- Substrate-engineering bridge between framework's existing 10× substrate + co-scientist/Robin biomedical-domain substrate

## Carved sentence (Aaron 2026-05-28 framing keeper)

> **"this is pretty much exactly what we are doing but times 10 almost we are missing a few step"**

## Full reasoning

Aaron 2026-05-28 forwarded a YouTube video (preserved verbatim in PR #5762 `docs/research/ip-questionable/`) describing the Google co-scientist + Sakana Robin multi-agent scientific discovery systems (both Nature 2026, same week). Aaron's framing was decomposed into a 12-row parallel substrate table + 10× scope analysis + the 7 substrate-engineering candidate gaps that Otto-CLI surfaced.

Aaron 2026-05-28 follow-up *"they are doing this for their idea ranking with Infra.net basically"* sharpened candidate #1 from "missing ELO tournament" to "we have Infer.NET substrate; we just need to compose existing-Microsoft-Research-TrueSkill-pattern with B-0867 workflow engine." Operator-explicit substrate-engineering refinement.

Aaron 2026-05-28 explicit *"we should add coscientis and add it to our upstram references and refersh update them so we can take a peak lol also lets backlog all the candidates"* — operationalized as:

- This PR: adds SakanaAI/AI-Scientist + AI-Scientist-v2 + jataware/open-coscientist + llnl/open-ai-co-scientist + The-Swarm-Corporation/AI-CoScientist + Microsoft/Infer.NET to `references/reference-sources.json` + `docs/UPSTREAM-LIST.md`
- This row: backlogs all 7 candidates as decomposition target
- Operator may run `tools/setup/common/sync-upstreams.sh` to mirror the new repos into `references/upstreams/` per refresh discipline (operator-side; Otto-CLI does not auto-run sync per safety discipline)

Substrate-engineering arc: framework's 10× scope is positioned to operationalize what co-scientist + Robin demonstrated at biomedical scope; 7 candidate gaps are the substrate-engineering integration targets per Aaron's *"missing a few step"* framing.