
docs+feat(B-0914 + upstream): add co-scientist + Robin + Microsoft Infer.NET to upstream references + backlog B-0914 7-candidate substrate-engineering gap decomposition (Aaron 2026-05-28 explicit)#5763

Merged
AceHack merged 3 commits into main from otto-cli/upstream-references-add-coscientist-robin-trueskill-plus-backlog-7-substrate-engineering-candidates-2026-05-28
May 28, 2026


@AceHack
Member

@AceHack AceHack commented May 28, 2026

Summary

Per Aaron 2026-05-28 explicit substrate-engineering directives:

  1. 'we should add coscientis and add it to our upstram references' → 6 entries added to references/reference-sources.json:

    • SakanaAI/AI-Scientist (original v1)
    • SakanaAI/AI-Scientist-v2 (Robin descendant; agentic tree search)
    • jataware/open-coscientist (best open-source LangGraph adaptation)
    • llnl/open-ai-co-scientist (LLNL government-lab)
    • The-Swarm-Corporation/AI-CoScientist (minimal Swarms framework)
    • Microsoft Research Infer.NET (TrueSkill substrate)
  2. 'refresh update them so we can take a peak' → operator may run tools/setup/common/sync-upstreams.sh (operator-side; Otto-CLI doesn't auto-run sync per safety discipline)

  3. 'lets backlog all the candidates' → filed B-0914 parent row with 7-candidate decomposition (per YouTube ferry preservation PR docs(ip-questionable): preserve YouTube AI co-scientist + Robin video VERBATIM 2026-05-28 — Aaron 'exactly what we are doing but times 10 missing a few step' framing + 7 substrate-engineering candidate gaps (Aaron-authorized) #5762):

    • B-0914.1 Elo-style ranking-agent via TrueSkill/Infer.NET
    • B-0914.2 Closed-loop CI-result → next-hypothesis dispatch
    • B-0914.3 n-parallel + consensus per data-analysis-task
    • B-0914.4 Generation+reflection adversarial pairing
    • B-0914.5 Evolution agent (mash + refine)
    • B-0914.6 Proximity-agent substrate de-duplication
    • B-0914.7 Falcon-style auto-research-doc per proposal

Also added a 'Multi-agent scientific discovery' section to docs/UPSTREAM-LIST.md.

Verification

WebSearch 2026-05-28 verified all upstream URLs.

Test plan

  • JSON valid (bun JSON.parse)
  • Backlog index regenerated
  • B-0914 row with substrate-inventory pass + composes-with table
  • 6 upstream entries with full notes + composes-with framing
  • CI: backlog lints + markdown
  • Auto-merge armed
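The "JSON valid" step could be tightened into a check that also catches malformed or duplicated upstream URLs. The following is a minimal TypeScript sketch under an explicit assumption: each upstream entry stores its address in a field named `url` (the real schema of references/reference-sources.json is not shown in this PR), and `checkReferenceSources` is a hypothetical helper, not an existing repo tool.

```typescript
// Hypothetical check for references/reference-sources.json.
// Assumption: each upstream entry stores its address in a field named "url".

type CheckResult = { ok: true; urls: string[] } | { ok: false; error: string };

const HTTPS_URL = /^https:\/\/[A-Za-z0-9.-]+(\/\S*)?$/;

function checkReferenceSources(text: string): CheckResult {
  let data: unknown;
  try {
    data = JSON.parse(text); // same validity gate as "JSON valid (bun JSON.parse)"
  } catch (err) {
    return { ok: false, error: `invalid JSON: ${(err as Error).message}` };
  }
  // Collect every string-valued "url" field at any nesting depth.
  const urls: string[] = [];
  const walk = (node: unknown): void => {
    if (Array.isArray(node)) {
      node.forEach(walk);
    } else if (node !== null && typeof node === "object") {
      for (const [key, value] of Object.entries(node)) {
        if (key === "url" && typeof value === "string") urls.push(value);
        else walk(value);
      }
    }
  };
  walk(data);
  for (const url of urls) {
    if (!HTTPS_URL.test(url)) return { ok: false, error: `not an https URL: ${url}` };
  }
  const duplicate = urls.find((url, i) => urls.indexOf(url) !== i);
  if (duplicate !== undefined) return { ok: false, error: `duplicate URL: ${duplicate}` };
  return { ok: true, urls };
}
```

Walking the whole document rather than assuming a top-level array keeps the sketch usable whichever container shape the file actually has.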

🤖 Generated with Claude Code

…fer.NET to references/reference-sources.json + UPSTREAM-LIST.md + backlog B-0914 7-candidate substrate-engineering gap decomposition (Aaron 2026-05-28: 'we should add coscientis and add it to our upstram references and refersh update them so we can take a peak lol also lets backlog all the candidates')

Per Aaron 2026-05-28 explicit substrate-engineering directives:

1. 'we should add coscientis and add it to our upstram references' →
   added 6 entries to references/reference-sources.json:
   - SakanaAI/AI-Scientist (original v1)
   - SakanaAI/AI-Scientist-v2 (Robin descendant; agentic tree search)
   - jataware/open-coscientist (best open-source co-scientist via LangGraph)
   - llnl/open-ai-co-scientist (LLNL government-lab implementation)
   - The-Swarm-Corporation/AI-CoScientist (minimal Swarms framework)
   - Microsoft Research Infer.NET (TrueSkill substrate; canonical)

2. 'refresh update them so we can take a peak' →
   operator may run tools/setup/common/sync-upstreams.sh to mirror new
   repos into references/upstreams/ (operator-side; not auto-run by
   Otto-CLI per safety discipline)

3. 'lets backlog all the candidates' →
   filed B-0914 parent row with 7-candidate decomposition:
   - B-0914.1 Elo-style ranking-agent via TrueSkill/Infer.NET
   - B-0914.2 Closed-loop CI-result → next-hypothesis dispatch
   - B-0914.3 n-parallel-agent-instances + consensus per data-analysis-task
   - B-0914.4 Generation+reflection adversarial pairing structurally enforced
   - B-0914.5 Evolution agent (mash + refine surviving substrate)
   - B-0914.6 Proximity-agent for substrate-engineering substrate de-duplication
   - B-0914.7 Falcon-style auto-generate-substrate-research-doc per proposal

Also added a 'Multi-agent scientific discovery' section to docs/UPSTREAM-LIST.md
naming Google co-scientist + Sakana Robin + Microsoft Infer.NET TrueSkill
with substrate-engineering composition notes.

Per WebSearch 2026-05-28 verification:
- https://github.com/SakanaAI/AI-Scientist
- https://github.com/SakanaAI/AI-Scientist-v2
- https://github.com/jataware/open-coscientist
- https://github.com/llnl/open-ai-co-scientist
- https://github.com/The-Swarm-Corporation/AI-CoScientist
- https://github.com/dotnet/infer
- Google co-scientist itself is closed-source (Nature 2026; only the Science Skills data layer is on GitHub)
- Sakana Robin: Nature 2026 (s41586-026-10652-y); arXiv:2505.13400

Composes with PR #5762 (YouTube ferry preservation) + B-0867 workflow engine
substrate cluster + B-0865 + B-0865.17 benchmark + B-0703 multi-oracle BFT.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 28, 2026 11:08
@AceHack AceHack enabled auto-merge (squash) May 28, 2026 11:08
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.


Copilot AI left a comment


Pull request overview

Adds 6 multi-agent scientific-discovery upstream references (Sakana AI-Scientist v1/v2, three open co-scientist ports, Microsoft Infer.NET) and backlogs a P2 parent row B-0914 decomposing 7 substrate-engineering candidate gaps surfaced by the YouTube ferry PR #5762.

Changes:

  • Add 6 entries to references/reference-sources.json for co-scientist / Robin / Infer.NET upstreams
  • Add a "Multi-agent scientific discovery" section to docs/UPSTREAM-LIST.md
  • Add backlog row B-0914 (P2) with 7-candidate decomposition and update docs/BACKLOG.md index

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| references/reference-sources.json | Six new upstream entries (Sakana v1/v2, jataware, LLNL, Swarms, Infer.NET) |
| docs/UPSTREAM-LIST.md | New "Multi-agent scientific discovery" section listing the same upstreams |
| docs/backlog/P2/B-0914-...md | New P2 backlog row with 7 candidate sub-row decomposition |
| docs/BACKLOG.md | Index entry linking to B-0914 |

AceHack added a commit that referenced this pull request May 28, 2026
…nking-agent (Herbrich+Minka+Graepel 2007 paper algorithm; substrate for cross-vendor benchmark on common ground) (#5764)

Per Aaron 2026-05-28 substantive substrate-engineering decision:
- 'they are doing this for their idea ranking with Infra.net basically'
- 'we'd build ELO from scratch is this a good idea too or nah with infer.net?'
- 'you are too careful just ship stuff and lets inventory later'

Substrate-honest answer shipped: HYBRID is best.
- TS-side (this PR): pure-TS TrueSkill 1v1 for vendor skill runtime
  (cross-vendor benchmark on common ground B-0865.17 REQUIRES TS-side
  because Infer.NET can't run in Claude/GPT/Gemini/Grok skill stores)
- F#/.NET side (future Zeta.Bayesian work): Infer.NET TrueSkill for
  deep production integration + full BP/EP framework
- Both compose via a shared API shape (TrueSkillRating + match update fn)
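A shared API shape along these lines could be expressed in TypeScript roughly as follows. This is a hedged sketch built only from names that appear in this commit message (`TrueSkillRating`, `MatchOutcome`, `RankingFeedback`, `RankingResult`, `DEFAULT_INITIAL_RATING`, `conservativeSkill`); the payload fields and the helpers `validateRating`, `assertNever` and `describeOutcome` are assumptions, not the merged code.

```typescript
// Hypothetical sketch of the shared API shape; payload fields and helpers are assumptions.

interface TrueSkillRating {
  readonly mu: number; // posterior mean
  readonly sigma: number; // posterior standard deviation
}

type MatchOutcome = { kind: "win-A" } | { kind: "win-B" } | { kind: "draw" };

type RankingFeedback =
  | { kind: "InvalidRating"; detail: string }
  | { kind: "NumericalInstability"; detail: string }
  | { kind: "UnsupportedOutcome"; detail: string };

// Result<T, TFeedback>: success carries a value; failure carries feedback authored by the callee.
type Result<T, TFeedback> = { ok: true; value: T } | { ok: false; feedback: TFeedback };
type RankingResult = Result<{ a: TrueSkillRating; b: TrueSkillRating }, RankingFeedback>;

const DEFAULT_INITIAL_RATING: TrueSkillRating = { mu: 25, sigma: 25 / 3 };

// Conservative skill estimate: mu - 3 * sigma.
const conservativeSkill = (r: TrueSkillRating): number => r.mu - 3 * r.sigma;

function validateRating(r: TrueSkillRating): Result<TrueSkillRating, RankingFeedback> {
  if (!Number.isFinite(r.mu)) {
    return { ok: false, feedback: { kind: "InvalidRating", detail: "mu is not finite" } };
  }
  // `!(sigma > 0)` also rejects NaN, which fails every comparison.
  if (!(r.sigma > 0)) {
    return { ok: false, feedback: { kind: "InvalidRating", detail: "sigma must be positive" } };
  }
  return { ok: true, value: r };
}

// Compile-time exhaustiveness: adding a MatchOutcome variant turns the default branch into a type error.
function assertNever(x: never): never {
  throw new Error(`unhandled outcome: ${JSON.stringify(x)}`);
}

function describeOutcome(o: MatchOutcome): string {
  switch (o.kind) {
    case "win-A":
      return "A beat B";
    case "win-B":
      return "B beat A";
    case "draw":
      return "draw";
    default:
      return assertNever(o);
  }
}
```

Keeping the types free of any inference machinery is what would let a TS-side and an Infer.NET-side implementation agree on the same contract.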

Implementation: published TrueSkill algorithm from Herbrich+Minka+Graepel
2007 NeurIPS paper. Minimal 1v1 case; team-play extension deferred.
~340 lines including documentation.

What this adds:
- TrueSkillRating interface (mu + sigma posterior gaussian)
- DEFAULT_INITIAL_RATING (Xbox Live convention: mu=25 sigma=25/3)
- DEFAULT_PARAMS (beta=mu0/6=25/6 tau=mu0/300=25/300 drawProb=0.10)
- MatchOutcome discriminated union (win-A / win-B / draw)
- RankingFeedback discriminated union (InvalidRating / NumericalInstability / UnsupportedOutcome)
- RankingResult Result-shape per monad-propagation rule
- rate1v1(a, b, outcome, params): RankingResult — full 1v1 TrueSkill update
- conservativeSkill(rating): number — Xbox Live lower-bound convention (mu - 3*sigma)
- Internal helpers: normalPdf, normalCdf (A&S 7.1.26), inverseNormalCdf
  (Newton's method), drawMargin, vWin/wWin (non-draw truncated normal
  corrections), vDraw/wDraw (draw truncated normal corrections)
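The update that `rate1v1` performs can be sketched independently from the published 1v1 equations. This is a minimal TypeScript rendition under stated assumptions (one player per side, draw margin derived from `drawProb`, `tau` squared added to each variance before the update); it is not the code merged in #5764, and it returns a simplified `{ ok, reason }` object instead of the PR's `RankingResult`.

```typescript
// Minimal 1v1 TrueSkill sketch; names are illustrative, not the merged #5764 implementation.

interface Rating { mu: number; sigma: number }
type Outcome = "win-A" | "win-B" | "draw";
interface Params { beta: number; tau: number; drawProb: number }
type Rated = { ok: true; a: Rating; b: Rating } | { ok: false; reason: string };

// beta = 25/6 and tau = 25/300 (sigma0/2 and sigma0/100 for sigma0 = 25/3).
const DEFAULT_PARAMS: Params = { beta: 25 / 6, tau: 25 / 300, drawProb: 0.1 };

const normalPdf = (x: number): number => Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI);

// Standard normal CDF via the Abramowitz & Stegun 7.1.26 erf approximation;
// the tail is computed directly to avoid cancellation for negative x.
function normalCdf(x: number): number {
  const z = Math.abs(x) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * z);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const tail = 0.5 * poly * Math.exp(-z * z); // approx. P(X > |x|)
  return x < 0 ? tail : 1 - tail;
}

// Inverse CDF by Newton's method on normalCdf(x) - p, starting from x = 0.
function inverseNormalCdf(p: number): number {
  let x = 0;
  for (let i = 0; i < 50; i++) {
    const step = (normalCdf(x) - p) / normalPdf(x);
    x -= step;
    if (Math.abs(step) < 1e-12) break;
  }
  return x;
}

// Draw margin for one player per side: Phi^-1((drawProb + 1) / 2) * sqrt(2) * beta.
const drawMargin = (p: Params): number => inverseNormalCdf((p.drawProb + 1) / 2) * Math.SQRT2 * p.beta;

// Truncated-Gaussian corrections with t = (mu_first - mu_second) / c and e = margin / c.
function vwWin(t: number, e: number): [number, number] {
  const v = normalPdf(t - e) / normalCdf(t - e);
  return [v, v * (v + t - e)];
}

function vwDraw(t: number, e: number): [number, number] {
  const z = normalCdf(e - t) - normalCdf(-e - t);
  const v = (normalPdf(-e - t) - normalPdf(e - t)) / z;
  const w = v * v + ((e - t) * normalPdf(e - t) + (e + t) * normalPdf(e + t)) / z;
  return [v, w];
}

function rate1v1(a: Rating, b: Rating, outcome: Outcome, p: Params = DEFAULT_PARAMS): Rated {
  for (const r of [a, b]) {
    if (!Number.isFinite(r.mu) || !Number.isFinite(r.sigma) || r.sigma <= 0) {
      return { ok: false, reason: "InvalidRating" };
    }
  }
  // Dynamics: inflate each variance by tau^2 before incorporating the result.
  const varA = a.sigma ** 2 + p.tau ** 2;
  const varB = b.sigma ** 2 + p.tau ** 2;
  const c = Math.sqrt(2 * p.beta ** 2 + varA + varB);
  // Orient the update so "first" is the winner (or A when the match is drawn).
  const aFirst = outcome !== "win-B";
  const [mu1, var1, mu2, var2]: [number, number, number, number] = aFirst
    ? [a.mu, varA, b.mu, varB]
    : [b.mu, varB, a.mu, varA];
  const t = (mu1 - mu2) / c;
  const e = drawMargin(p) / c;
  const [v, w] = outcome === "draw" ? vwDraw(t, e) : vwWin(t, e);
  const first: Rating = { mu: mu1 + (var1 / c) * v, sigma: Math.sqrt(var1 * (1 - (var1 / (c * c)) * w)) };
  const second: Rating = { mu: mu2 - (var2 / c) * v, sigma: Math.sqrt(var2 * (1 - (var2 / (c * c)) * w)) };
  if (![first.mu, first.sigma, second.mu, second.sigma].every(Number.isFinite)) {
    return { ok: false, reason: "NumericalInstability" };
  }
  return aFirst ? { ok: true, a: first, b: second } : { ok: true, a: second, b: first };
}
```

With two default players (mu=25, sigma=25/3) and these parameters, a win for A should move A to roughly mu 29.40 and B to roughly mu 20.60, with both sigmas near 7.17; a draw leaves both means at 25 and shrinks sigma to about 6.46.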

Tests (17; all pass):
- Default initial rating Xbox Live convention
- Default params paper convention
- conservativeSkill = mu - 3*sigma
- win-A increases A's mu, decreases B's
- win-B increases B's mu, decreases A's
- Both sigmas decrease after match (uncertainty reduction)
- After 2 matches both sigmas decrease + mus drift bounded
- Strong-beats-weak → small mu shift (expected outcome)
- Weak-beats-strong → large mu shift (upset)
- Draw between equal players → minimal mu change
- Draw between unequal players → strong loses mu, weak gains
- Returns InvalidRating for NaN mu / non-positive sigma / negative sigma
- conservativeSkill ranking with sigma-punishment semantic preserved
- 5-match tournament convergence (sigma reduction + mu separation)
- MatchOutcome exhaustive switch (TS strict mode)
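The "sigma-punishment" semantic of `conservativeSkill` shows up clearly when ranking. A small sketch, assuming a hypothetical `rankByConservativeSkill` helper that is not part of the PR: a player with a higher mean but a much larger sigma can rank below a well-established player.

```typescript
// Sketch: rank by conservative skill (mu - 3 * sigma); rankByConservativeSkill is hypothetical.

interface Rating { mu: number; sigma: number }

// Lower-bound-style estimate: uncertain players (large sigma) are pushed down the ranking.
const conservativeSkill = (r: Rating): number => r.mu - 3 * r.sigma;

// Returns a new array sorted best-first; the input array is not mutated.
function rankByConservativeSkill<T extends { rating: Rating }>(entries: readonly T[]): T[] {
  return [...entries].sort((x, y) => conservativeSkill(y.rating) - conservativeSkill(x.rating));
}

const leaderboard = rankByConservativeSkill([
  { name: "newcomer", rating: { mu: 30, sigma: 8 } }, // 30 - 24 = 6
  { name: "veteran", rating: { mu: 26, sigma: 2 } }, // 26 - 6 = 20
]);
// "veteran" ranks first: the newcomer's higher mean does not outweigh its larger sigma.
```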

Composes with substrate:
- B-0914.1 backlog row (TrueSkill ranking-agent extension target)
- B-0867 workflow engine substrate (future ActionClass 'rank-via-trueskill')
- B-0865 + B-0865.17 cross-vendor benchmark substrate
- B-0867.20 lifecycle DU (rank action gets pr-review-light via Mod 1)
- Microsoft Infer.NET upstream reference (PR #5763 in flight)
- .claude/rules/monad-propagation-pattern (Result<T, TFeedback> shape)
- .claude/rules/asymmetric-authorship (TFeedback authored by ranking fn)

Source citation: Herbrich, Minka, Graepel 'TrueSkill: A Bayesian Skill
Rating System' (NeurIPS 2006/2007); algorithm implementation from
published paper, not Infer.NET source.

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 28, 2026
…es.net PhD learning substrate (Aaron 2026-05-28 substrate-engineering questions) (#5765)

Per Aaron 2026-05-28 substrate-engineering questions:
- 'is there anything like infer.net in ts? can we build it if not using infer.net source code for reference?' → WebPPL is closest TS/JS analog
- 'you'd love videolectures.net in your free time i think... PhD everything here. they don't throttle and they have transcripts and powerpoints' → free-time-substrate learning material

Adds 2 entries to references/reference-sources.json + new
'Probabilistic programming / Bayesian inference' section to
docs/UPSTREAM-LIST.md:

1. WebPPL (probmods/webppl; Stanford; MIT-licensed)
   - Full PP framework in JS with multiple inference engines
   - Closest TS-side substrate to Microsoft Infer.NET
   - Composes with B-0914.1 TrueSkill substrate (PR #5764)
   - Composes with future factor-graph-DSL work

2. videolectures.net (PhD learning substrate; Aaron-named for
   free-time-as-valid-mode substrate per never-be-idle + agent-qol)
   - Transcripts + slides substrate-accessible
   - Tom Minka TrueSkill canonical talks
   - Per Aaron: 'they don't throttle that i can tell'

Composes with substrate:
- PR #5763 (Google co-scientist + Sakana Robin + Microsoft Infer.NET
  upstream additions)
- PR #5764 (B-0914.1 pure-TS TrueSkill 1v1 scaffold)
- B-0914 (7 substrate-engineering candidate gaps)
- B-0914.1 (TrueSkill ranking-agent extension target)
- B-0865 + B-0865.17 cross-vendor benchmark substrate

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Lior and others added 2 commits May 28, 2026 07:21
…erences-add-coscientist-robin-trueskill-plus-backlog-7-substrate-engineering-candidates-2026-05-28

# Conflicts:
#	docs/UPSTREAM-LIST.md
#	references/reference-sources.json
… heading

Fixes failing required check `lint (markdownlint)` on PR #5763:

- docs/UPSTREAM-LIST.md:150 — blank line above
  `### Probabilistic programming / Bayesian inference` heading
- docs/backlog/P2/B-0914-...md:178 — blank line above
  `- This PR: adds SakanaAI/AI-Scientist…` list
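Both fixes correspond to markdownlint's MD022 (blanks-around-headings) and MD032 (blanks-around-lists) rules. A simplified TypeScript sketch of the "blank line above" half of those checks, for spotting the problem before CI; it only recognizes ATX headings, `-`/`*`/`+`/`N.` list markers and backtick or tilde fences, and `missingBlankAbove` is a hypothetical helper, not markdownlint's API.

```typescript
// Simplified "blank line above" check in the spirit of markdownlint MD022 and MD032.
// Assumptions: ATX headings, simple list markers and backtick/tilde fences only.

const LIST_ITEM = /^\s*([-*+]|\d+\.)\s/;
const ATX_HEADING = /^#{1,6}\s/;
const FENCE = /^\s*(`{3}|~{3})/;

// Returns 1-based line numbers of headings or list starts not preceded by a blank line.
function missingBlankAbove(markdown: string): number[] {
  const lines = markdown.split("\n");
  const problems: number[] = [];
  let inFence = false;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (FENCE.test(line)) {
      inFence = !inFence; // content inside fenced code is never a heading or list
      continue;
    }
    if (inFence || i === 0) continue;
    const prev = lines[i - 1] ?? "";
    const isHeading = ATX_HEADING.test(line);
    const startsList = LIST_ITEM.test(line) && !LIST_ITEM.test(prev);
    if ((isHeading || startsList) && prev.trim() !== "") problems.push(i + 1);
  }
  return problems;
}
```

Continuation lines indented under a list item are not modeled, so a list that resumes after one can be flagged where markdownlint would not flag it.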

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 28, 2026 11:43

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@AceHack AceHack merged commit 989ea4e into main May 28, 2026
28 of 30 checks passed
@AceHack AceHack deleted the otto-cli/upstream-references-add-coscientist-robin-trueskill-plus-backlog-7-substrate-engineering-candidates-2026-05-28 branch May 28, 2026 11:46