feat(B-0914.1): pure-TS TrueSkill 1v1 scaffold for workflow engine ranking-agent (hybrid TS+.NET; cross-vendor benchmark substrate) by AceHack · Pull Request #5764 · Lucent-Financial-Group/Zeta

AceHack · 2026-05-28T11:11:36Z

Summary

Pure-TS TrueSkill 1v1 implementation (algorithm from the Herbrich, Minka & Graepel TrueSkill paper, NeurIPS 2006/2007) for the workflow engine ranking-agent substrate.

Per Aaron 2026-05-28: hybrid substrate-engineering path. The TS side serves the vendor skill runtime (the cross-vendor benchmark on common ground, B-0865.17, REQUIRES TS because Infer.NET can't run in vendor skill stores); the .NET side uses Infer.NET via Zeta.Bayesian for deep production integration. Both compose via a shared API shape (TrueSkillRating + match update function).

17 tests pass / 0 fail.

What this adds

TrueSkillRating (mu + sigma of the posterior Gaussian)
MatchOutcome + RankingFeedback + RankingResult discriminated unions
rate1v1(a, b, outcome): RankingResult — full TrueSkill 1v1 update
conservativeSkill(rating): Xbox Live leaderboard lower bound (mu - 3*sigma)
Default initial rating + params per Xbox Live convention
Internal helpers: normal PDF/CDF (A&S 7.1.26), inverse-normal-CDF (Newton's method), draw margin, truncated-normal correction functions
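The union types listed above can be sketched as follows. This is a minimal illustration, not the PR's exports: `ok`, `feedback`, `kind`, and `reason` appear in the reviewed hunk, but the `a`/`b` success payload and the helper names `validateRating`, `checkInputs`, `describeOutcome`, and `assertNever` are hypothetical.

```typescript
// Hypothetical sketch of the result/feedback shapes; not the PR's exports.

interface TrueSkillRating {
  readonly mu: number; // posterior mean
  readonly sigma: number; // posterior standard deviation
}

type MatchOutcome =
  | { readonly kind: "win-A" }
  | { readonly kind: "win-B" }
  | { readonly kind: "draw" };

type RankingFeedback =
  | { readonly kind: "InvalidRating"; readonly reason: string }
  | { readonly kind: "NumericalInstability"; readonly reason: string }
  | { readonly kind: "UnsupportedOutcome"; readonly reason: string };

// Success payload field names (a, b) are assumptions.
type RankingResult =
  | { readonly ok: true; readonly a: TrueSkillRating; readonly b: TrueSkillRating }
  | { readonly ok: false; readonly feedback: RankingFeedback };

// Non-finite mu, and non-finite or non-positive sigma, are rejected.
function validateRating(r: TrueSkillRating): RankingFeedback | null {
  if (!Number.isFinite(r.mu)) {
    return { kind: "InvalidRating", reason: `mu=${r.mu}` };
  }
  if (!Number.isFinite(r.sigma) || r.sigma <= 0) {
    return { kind: "InvalidRating", reason: `sigma=${r.sigma}` };
  }
  return null;
}

// Validation failures surface as ok:false results instead of exceptions.
function checkInputs(a: TrueSkillRating, b: TrueSkillRating): RankingResult | null {
  const feedback = validateRating(a) ?? validateRating(b);
  return feedback ? { ok: false, feedback } : null;
}

function assertNever(x: never): never {
  throw new Error(`unhandled variant: ${JSON.stringify(x)}`);
}

// Adding a MatchOutcome variant without a case here fails to compile.
function describeOutcome(o: MatchOutcome): string {
  switch (o.kind) {
    case "win-A":
      return "A beat B";
    case "win-B":
      return "B beat A";
    case "draw":
      return "draw";
    default:
      return assertNever(o);
  }
}
```

The `assertNever` default is what makes the "exhaustive switch" test meaningful at compile time rather than only at runtime.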

Composes with substrate

B-0914.1 backlog row (TrueSkill ranking-agent extension target)
B-0867 workflow engine (future ActionClass 'rank-via-trueskill')
B-0865 + B-0865.17 cross-vendor benchmark on common ground
B-0867.20 lifecycle DU (rank action gets pr-review-light per Mod 1)
Microsoft Infer.NET upstream reference (PR #5763 in flight: "docs+feat(B-0914 + upstream): add co-scientist + Robin + Microsoft Infer.NET to upstream references + backlog B-0914 7-candidate substrate-engineering gap decomposition (Aaron 2026-05-28 explicit)")
PR #5762 YouTube ferry preservation ("docs(ip-questionable): preserve YouTube AI co-scientist + Robin video VERBATIM 2026-05-28 — Aaron 'exactly what we are doing but times 10 missing a few step' framing + 7 substrate-engineering candidate gaps (Aaron-authorized)") (gap deps: #1 of 7)
monad-propagation + asymmetric-authorship rules

Test plan

17 tests pass; default initial rating + params match Xbox Live
All 3 MatchOutcome variants exercised
Strong-vs-weak skill update semantics correct (small for expected, large for upset)
Draw between equal players → minimal change
5-match tournament convergence
Input validation (InvalidRating for NaN / non-positive sigma)
Exhaustive switch on MatchOutcome union
CI: lint(tsc tools)
Auto-merge armed

🤖 Generated with Claude Code

…nking-agent (Herbrich+Minka+Graepel 2007 paper algorithm; substrate for cross-vendor benchmark on common ground)

Per Aaron 2026-05-28 substantive substrate-engineering decision:
- 'they are doing this for their idea ranking with Infra.net basically'
- 'we'd build ELO from scratch is this a good idea too or nah with infer.net?'
- 'you are too careful just ship stuff and lets inventory later'

Substrate-honest answer shipped: HYBRID is best.
- TS-side (this PR): pure-TS TrueSkill 1v1 for vendor skill runtime (cross-vendor benchmark on common ground B-0865.17 REQUIRES TS-side because Infer.NET can't run in Claude/GPT/Gemini/Grok skill stores)
- F#/.NET side (future Zeta.Bayesian work): Infer.NET TrueSkill for deep production integration + full BP/EP framework
- Both compose via shared API shape (TrueSkillRating + match update fn)

Implementation: published TrueSkill algorithm from Herbrich+Minka+Graepel 2007 NeurIPS paper. Minimal 1v1 case; team-play extension deferred. ~340 lines including documentation.

What this adds:
- TrueSkillRating interface (mu + sigma posterior Gaussian)
- DEFAULT_INITIAL_RATING (Xbox Live convention: mu=25, sigma=25/3)
- DEFAULT_PARAMS (beta=mu/6, tau=mu/300, drawProb=0.10, with mu the initial 25)
- MatchOutcome discriminated union (win-A / win-B / draw)
- RankingFeedback discriminated union (InvalidRating / NumericalInstability / UnsupportedOutcome)
- RankingResult Result-shape per monad-propagation rule
- rate1v1(a, b, outcome, params): RankingResult (full 1v1 TrueSkill update)
- conservativeSkill(rating): number (Xbox Live lower-bound convention, mu - 3*sigma)
- Internal helpers: normalPdf, normalCdf (A&S 7.1.26), inverseNormalCdf (Newton's method), drawMargin, vWin/wWin (non-draw truncated-normal corrections), vDraw/wDraw (draw truncated-normal corrections)

Tests (17; all pass):
- Default initial rating Xbox Live convention
- Default params paper convention
- conservativeSkill = mu - 3*sigma
- win-A increases A's mu, decreases B's
- win-B increases B's mu, decreases A's
- Both sigmas decrease after match (uncertainty reduction)
- After 2 matches both sigmas decrease + mus drift bounded
- Strong-beats-weak → small mu shift (expected outcome)
- Weak-beats-strong → large mu shift (upset)
- Draw between equal players → minimal mu change
- Draw between unequal players → strong loses mu, weak gains
- Returns InvalidRating for NaN mu / non-positive sigma / negative sigma
- conservativeSkill ranking with sigma-punishment semantic preserved
- 5-match tournament convergence (sigma reduction + mu separation)
- MatchOutcome exhaustive switch (TS strict mode)

Composes with substrate:
- B-0914.1 backlog row (TrueSkill ranking-agent extension target)
- B-0867 workflow engine substrate (future ActionClass 'rank-via-trueskill')
- B-0865 + B-0865.17 cross-vendor benchmark substrate
- B-0867.20 lifecycle DU (rank action gets pr-review-light via Mod 1)
- Microsoft Infer.NET upstream reference (PR #5763 in flight)
- .claude/rules/monad-propagation-pattern (Result<T, TFeedback> shape)
- .claude/rules/asymmetric-authorship (TFeedback authored by ranking fn)

Source citation: Herbrich, Minka, Graepel 'TrueSkill: A Bayesian Skill Rating System' (NeurIPS 2006/2007); algorithm implementation from published paper, not Infer.NET source.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
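A minimal sketch of the defaults and the conservative-skill convention described above, assuming a `TrueSkillParams` shape and a hypothetical `rankByConservativeSkill` helper (neither name is confirmed by the PR). It illustrates the sigma-punishment semantic: ranking by mu - 3*sigma favours a well-measured rating over a higher but less certain one.

```typescript
// Hypothetical sketch of defaults + conservative ranking; not the PR's code.

interface TrueSkillRating {
  readonly mu: number;
  readonly sigma: number;
}

interface TrueSkillParams {
  readonly beta: number; // performance noise
  readonly tau: number; // dynamics: per-match sigma inflation
  readonly drawProbability: number; // prior probability of a draw
}

const DEFAULT_INITIAL_RATING: TrueSkillRating = { mu: 25, sigma: 25 / 3 };

const DEFAULT_PARAMS: TrueSkillParams = {
  beta: DEFAULT_INITIAL_RATING.mu / 6, // 25/6 ≈ 4.167
  tau: DEFAULT_INITIAL_RATING.mu / 300, // 25/300 ≈ 0.083
  drawProbability: 0.1,
};

// Lower bound the player very likely exceeds; high sigma is penalised.
function conservativeSkill(r: TrueSkillRating): number {
  return r.mu - 3 * r.sigma;
}

// Returns a new array sorted best-first by conservative skill.
function rankByConservativeSkill<T extends { readonly rating: TrueSkillRating }>(
  items: readonly T[],
): T[] {
  return [...items].sort(
    (x, y) => conservativeSkill(y.rating) - conservativeSkill(x.rating),
  );
}
```

For example, mu=30, sigma=6 scores 12 while mu=25, sigma=2 scores 19, so the second ranks first; a fresh DEFAULT_INITIAL_RATING scores 0.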

chatgpt-codex-connector · 2026-05-28T11:11:41Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copilot

Pull request overview

Adds a pure TypeScript TrueSkill 1v1 rating update module intended as the workflow-engine ranking-agent substrate, with Bun tests covering core invariants and several scenario-based behaviors.

Changes:

Introduces tools/workflow-engine/trueskill.ts implementing 1v1 TrueSkill updates (rate1v1) plus supporting math helpers and defaults.
Adds tools/workflow-engine/trueskill.test.ts with Bun tests validating defaults, outcome behaviors (win/draw), and basic convergence expectations.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
tools/workflow-engine/trueskill.ts	New TrueSkill 1v1 implementation + result/feedback types and math helpers.
tools/workflow-engine/trueskill.test.ts	New Bun test suite exercising rating update invariants and scenarios.

+function inverseNormalCdf(p: number): number {
+  // Initial guess via rational approximation (Beasley-Springer-Moro)
+  if (p <= 0 || p >= 1) {
+    throw new Error(`inverseNormalCdf domain: ${p}`);
+  }
+  let x = 0; // initial guess
+  // Newton's method on F(x) = cdf(x) - p (F'(x) = pdf(x))
+  for (let i = 0; i < 30; i++) {
+    const f = normalCdf(x) - p;
+    const fp = normalPdf(x);
+    if (Math.abs(fp) < 1e-30) break;
+    const dx = f / fp;
+    x = x - dx;
+    if (Math.abs(dx) < 1e-10) break;
+  }
+  return x;
+}
+
+function drawMargin(drawProbability: number, beta: number): number {
+  return Math.sqrt(2) * beta * inverseNormalCdf((1 + drawProbability) / 2);
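As a worked check on the `drawMargin` line above: the 1v1 draw margin is $\varepsilon = \sqrt{2}\,\beta\,\Phi^{-1}\big(\tfrac{1+p_{\text{draw}}}{2}\big)$. With the defaults from the description ($\beta = 25/6$, $p_{\text{draw}} = 0.10$), our rounded arithmetic gives:

$$
\varepsilon = \sqrt{2}\cdot\frac{25}{6}\cdot\Phi^{-1}(0.55) \approx 5.893 \times 0.1257 \approx 0.740
$$

Note that the hunk's comment mentions a Beasley-Springer-Moro initial guess while the code starts Newton's method at $x = 0$; for $p = 0.55$ that start is already close to the root $\approx 0.1257$.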


+ * Uses iterative inverse-normal-CDF via Newton's method (~5-10 iterations).
+ */


+  let v: number;
+  let w: number;
+  let signA: number; // direction of mu update for player A
+  let signB: number;
+
+  switch (outcome.kind) {
+    case "win-A": {
+      const t = (a.mu - b.mu) / c;
+      v = vWin(t, epsilon);
+      w = wWin(t, epsilon);
+      signA = +1;
+      signB = -1;
+      break;
+    }
+    case "win-B": {
+      const t = (b.mu - a.mu) / c;
+      v = vWin(t, epsilon);
+      w = wWin(t, epsilon);
+      signA = -1;
+      signB = +1;
+      break;
+    }
+    case "draw": {
+      // Draw uses symmetric truncated-normal correction
+      const t = (a.mu - b.mu) / c;
+      v = vDraw(t, epsilon);
+      w = wDraw(t, epsilon);
+      // For draws, the mu shifts toward the opponent's mu
+      signA = +1;
+      signB = -1;
+      break;
+    }
+  }
+
+  if (!Number.isFinite(v) || !Number.isFinite(w)) {
+    return {
+      ok: false,
+      feedback: {
+        kind: "NumericalInstability",
+        reason: `v=${v} w=${w}`,
+      },
+    };
+  }
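To show how the `v`, `w`, and sign values chosen in this switch feed the posterior, here is a self-contained sketch of the standard 1v1 update from the TrueSkill paper: with $c^2 = 2\beta^2 + \sigma_A^2 + \sigma_B^2$ (each sigma first inflated by $\tau^2$), each player gets mu += sign * (sigma^2 / c) * v and sigma^2 *= 1 - (sigma^2 / c^2) * w. It is not the PR's implementation; the function names, the `Outcome` string type, and the convention that `epsilon` is the draw margin divided by c are assumptions. In a draw, v is negative when A is stronger, so A's mu drops and B's rises, matching the "strong loses mu, weak gains" test.

```typescript
// Hypothetical sketch of the published TrueSkill 1v1 update; not the PR's code.

interface Rating {
  readonly mu: number;
  readonly sigma: number;
}

type Outcome = "win-A" | "win-B" | "draw";

function normalPdf(x: number): number {
  return Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI);
}

// Abramowitz & Stegun 7.1.26 erf approximation (absolute error < 1.5e-7).
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
      t * (-1.453152027 + t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

function normalCdf(x: number): number {
  return 0.5 * (1 + erf(x / Math.SQRT2));
}

// Non-draw corrections: performance difference truncated below at eps.
function vWin(t: number, eps: number): number {
  return normalPdf(t - eps) / normalCdf(t - eps);
}
function wWin(t: number, eps: number): number {
  const v = vWin(t, eps);
  return v * (v + t - eps);
}

// Draw corrections: performance difference truncated to [-eps, eps].
function vDraw(t: number, eps: number): number {
  const denom = normalCdf(eps - t) - normalCdf(-eps - t);
  return (normalPdf(-eps - t) - normalPdf(eps - t)) / denom;
}
function wDraw(t: number, eps: number): number {
  const denom = normalCdf(eps - t) - normalCdf(-eps - t);
  const v = vDraw(t, eps);
  const tail = (eps - t) * normalPdf(eps - t) + (eps + t) * normalPdf(eps + t);
  return v * v + tail / denom;
}

// drawMarginAbs is the unscaled margin; it is divided by c here.
function update1v1(
  a: Rating, b: Rating, outcome: Outcome,
  beta: number, tau: number, drawMarginAbs: number,
): [Rating, Rating] {
  const sa2 = a.sigma ** 2 + tau ** 2; // dynamics noise added before the match
  const sb2 = b.sigma ** 2 + tau ** 2;
  const c = Math.sqrt(2 * beta * beta + sa2 + sb2);
  const eps = drawMarginAbs / c;
  const t = outcome === "win-B" ? (b.mu - a.mu) / c : (a.mu - b.mu) / c;
  const isDraw = outcome === "draw";
  const v = isDraw ? vDraw(t, eps) : vWin(t, eps);
  const w = isDraw ? wDraw(t, eps) : wWin(t, eps);
  // Winner (or player A in a draw) moves by +v; the other player by -v.
  const signA = outcome === "win-B" ? -1 : 1;
  const signB = -signA;
  const next = (mu: number, s2: number, sign: number): Rating => ({
    mu: mu + sign * (s2 / c) * v,
    sigma: Math.sqrt(s2 * (1 - (s2 / (c * c)) * w)),
  });
  return [next(a.mu, sa2, signA), next(b.mu, sb2, signB)];
}
```

With equal default ratings and a win for A, this moves A from mu 25 to roughly 29.4 and shrinks both sigmas below 25/3. The A&S CDF loses relative accuracy deep in the tails, which is one reason a `NumericalInstability` guard like the one in the hunk is worth keeping.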


+ *
+ * B-0914.1 — pure-TS TrueSkill 1v1 scaffold for workflow engine
+ * ranking-agent (per Aaron 2026-05-28: 'they are doing this for their
+ * idea ranking with Infra.net basically' + 'just ship stuff' calibration).


…es.net PhD learning substrate (Aaron 2026-05-28 substrate-engineering questions) (#5765)

Per Aaron 2026-05-28 substrate-engineering questions:
- 'is there anything like infer.net in ts? can we build it if not using infer.net source code for reference?' → WebPPL is closest TS/JS analog
- 'you'd love videolectures.net in your free time i think... PhD everything here. they don't throttle and they have transcripts and powerpoints' → free-time-substrate learning material

Adds 2 entries to references/reference-sources.json + new 'Probabilistic programming / Bayesian inference' section to docs/UPSTREAM-LIST.md:
1. WebPPL (probmods/webppl; Stanford; MIT-licensed)
   - Full PP framework in JS with multiple inference engines
   - Closest TS-side substrate to Microsoft Infer.NET
   - Composes with B-0914.1 TrueSkill substrate (PR #5764)
   - Composes with future factor-graph-DSL work
2. videolectures.net (PhD learning substrate; Aaron-named for free-time-as-valid-mode substrate per never-be-idle + agent-qol)
   - Transcripts + slides substrate-accessible
   - Tom Minka TrueSkill canonical talks
   - Per Aaron: 'they don't throttle that i can tell'

Composes with substrate:
- PR #5763 (Google co-scientist + Sakana Robin + Microsoft Infer.NET upstream additions)
- PR #5764 (B-0914.1 pure-TS TrueSkill 1v1 scaffold)
- B-0914 (7 substrate-engineering candidate gaps)
- B-0914.1 (TrueSkill ranking-agent extension target)
- B-0865 + B-0865.17 cross-vendor benchmark substrate

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

…ors); closes tournament loop with TrueSkill (PR #5764); 12 tests pass (#5767)

Per Aaron 2026-05-28 'S M L all please in that order lol': this is S (small/tight scope) in the substrate-engineering ship-sequence.

Evolution agent pattern from Google co-scientist (Nature 2026): takes top-N TrueSkill-ranked survivors + mashes them into refined variants. Pure function over typed survivors; tight ~200-line implementation; 3 composition strategies; full Result-shape per monad-propagation + asymmetric-authorship rules.

Closes the tournament loop with TrueSkill (PR #5764):
1. Generate hypotheses (LLM call; out of scope)
2. Rank via TrueSkill (B-0914.1; shipped)
3. Take top-N survivors
4. Mash + refine (this PR; B-0914.5)
5. Loop back to step 2 with refined variants

What this adds:
- Survivor<T> interface (generic over substrate type)
- EvolutionStrategy union (simple-merge | cross-pollinate | mutate)
- EvolutionFeedback discriminated union
- EvolutionResult<T> Result-shape
- RefinedVariant<T> with derivedFrom + composesWith for provenance
- evolveSurvivors<T>(context): EvolutionResult<T> (main function)
- evolveTopN<T>(survivors, n, strategy, options): EvolutionResult<T> (convenience that slices top-N before evolving)

Strategies:
- simple-merge: top survivor as base + fill gaps from next
- cross-pollinate: interleave attributes between top 2 (by sorted-key parity)
- mutate: apply caller-supplied transformer to top survivor

Provenance via derivedFrom (survivor ids) + composesWith (cumulative attribution per honor-those-that-came-before).

Tests (12; all pass):
- simple-merge: top wins on overlap, fills gaps from next
- cross-pollinate: alternates attributes by sorted-key parity
- mutate: applies caller transformer
- mutate without mutator → MergeConflict
- empty survivor set → EmptySurvivorSet
- simple-merge with 1 survivor → InsufficientSurvivors
- cross-pollinate with 1 survivor → InsufficientSurvivors
- derivedFrom + composesWith preserve provenance
- evolveTopN slices correctly
- evolveTopN with N=1 mutate
- variant id includes prefix + strategy + survivor ids
- EvolutionStrategy exhaustive switch (TS strict mode)

Composes with substrate:
- B-0914.5 backlog row (evolution agent extension target)
- B-0914.1 PR #5764 (TrueSkill substrate; ranking input)
- B-0867 workflow engine (future ActionClass 'evolve-via-mash-refine')
- .claude/rules/additive-not-zero-sum.md
- .claude/rules/honor-those-that-came-before.md
- .claude/rules/monad-propagation-pattern + asymmetric-authorship

Next per S/M/L sequence: M (medium) = generation-reflection adversarial pairing structurally enforced (B-0914.4); L (large) = closed-loop CI-result → next-hypothesis dispatch (B-0914.2).

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
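The three strategies and the failure variants named in these tests can be sketched over plain attribute records. This is a hypothetical illustration, not the PR's `evolveSurvivors`: the `Attrs` type, the `evolve` dispatcher, and the rule that cross-pollinate gives even-indexed sorted keys to the top survivor (falling back to the other survivor when a key is missing) are assumptions.

```typescript
// Hypothetical sketch of the three evolution strategies; not the PR's code.

type Attrs = Readonly<Record<string, unknown>>;
type Strategy = "simple-merge" | "cross-pollinate" | "mutate";

type EvolveResult =
  | { readonly ok: true; readonly variant: Record<string, unknown> }
  | {
      readonly ok: false;
      readonly feedback: {
        readonly kind: "EmptySurvivorSet" | "InsufficientSurvivors" | "MergeConflict";
      };
    };

const has = (o: Attrs, k: string): boolean => Object.prototype.hasOwnProperty.call(o, k);

// simple-merge: the top survivor wins on overlapping keys; gaps come from next.
function simpleMerge(top: Attrs, next: Attrs): Record<string, unknown> {
  return { ...next, ...top };
}

// cross-pollinate: even positions in the sorted key union take the top
// survivor's value, odd positions the runner-up's (assumed fallback: use
// whichever survivor actually has the key).
function crossPollinate(top: Attrs, next: Attrs): Record<string, unknown> {
  const keys = Array.from(new Set([...Object.keys(top), ...Object.keys(next)])).sort();
  const out: Record<string, unknown> = {};
  keys.forEach((k, i) => {
    const [first, second] = i % 2 === 0 ? [top, next] : [next, top];
    out[k] = has(first, k) ? first[k] : second[k];
  });
  return out;
}

// Survivors are assumed to be pre-sorted best-first (e.g. by conservativeSkill).
function evolve(
  survivors: readonly Attrs[],
  strategy: Strategy,
  mutator?: (top: Attrs) => Record<string, unknown>,
): EvolveResult {
  if (survivors.length === 0) return { ok: false, feedback: { kind: "EmptySurvivorSet" } };
  switch (strategy) {
    case "simple-merge":
    case "cross-pollinate": {
      if (survivors.length < 2) return { ok: false, feedback: { kind: "InsufficientSurvivors" } };
      const merge = strategy === "simple-merge" ? simpleMerge : crossPollinate;
      return { ok: true, variant: merge(survivors[0], survivors[1]) };
    }
    case "mutate":
      if (!mutator) return { ok: false, feedback: { kind: "MergeConflict" } };
      return { ok: true, variant: mutator(survivors[0]) };
  }
}
```

Keeping each strategy a pure function over records makes the provenance fields (derivedFrom, composesWith) easy to attach afterwards without touching the merge logic.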

…orchestrator (composes TrueSkill + evolution + pairing via injectable callbacks); S/M/L sequence COMPLETE (#5769)

* feat(B-0914.2): L — closed-loop CI-result → next-hypothesis dispatch orchestrator (composes TrueSkill + evolution + pairing via injectable callbacks); 16 tests pass

Per Aaron 2026-05-28 'S M L all please in that order lol': L (large scope) in the substrate-engineering ship-sequence. Wire-up that turns the tournament-loop substrate into a live closed-loop iteration system.

Design: pure loop-orchestration substrate with INJECTABLE callbacks for substrate-specific operations (ranking / evolution / verification + CI-dispatch). Caller provides functions; orchestrator handles loop structure + propagation discipline. Separation-of-concerns means orchestrator does NOT tightly couple to specific TrueSkill / evolution / pairing module implementations; it composes with ANY substrate that implements the callback contracts.

What this adds:
- Hypothesis<T> generic substrate item with cycleIndex + derivedFrom ancestry
- CiVerdict discriminated union (passed | failed | needs-revision | infrastructure-error)
- LoopFeedback + LoopResult<T> Result-shape per monad-propagation
- LoopCallbacks<T> interface (dispatchCi + rankSurvivors + evolveSurvivors)
- LoopConfig (maxCycles + topNToEvolve + minPropagatable; DEFAULT_LOOP_CONFIG)
- runCycle<T>(hypotheses, callbacks, cycleIndex, config?) (single cycle)
- runLoop<T>(initial, callbacks, config?, shouldContinue?) (full iteration with LoopTermination shape: cycle count + reason + final state)

Cycle steps:
1. Dispatch each hypothesis to CI (caller-injected)
2. Collect verdicts
3. Filter to propagatable (passed + needs-revision-with-suggestions)
4. Rank via TrueSkill (caller-injected per B-0914.1 PR #5764)
5. Evolve top-N (caller-injected per B-0914.5 PR #5767)
6. Return refined variants for next cycle

Termination conditions:
- max-cycles: bounded iteration reached
- insufficient-propagatable: too many failures; can't continue
- predicate-stopped: caller-supplied predicate returned false
- error: CI/ranking/evolution exception

Tests (16; all pass):
- Empty hypotheses → EmptyHypothesisSet
- Passing CI → propagation through ranking + evolution
- Failed verdicts excluded from propagation
- needs-revision with suggestions included; without excluded
- Below minPropagatable → MaxCyclesReached
- CI exception → CiDispatchFailure
- Ranking exception → RankingFailure
- Evolution exception → EvolutionFailure
- infrastructure-error excluded (doesn't reflect hypothesis quality)
- runLoop iterates until max-cycles
- runLoop predicate-stopped early termination
- runLoop insufficient-propagatable
- runLoop error termination
- LoopFeedback exhaustive switch
- CiVerdict exhaustive switch
- Integration: full closed-loop with realistic callback wiring

Composes with substrate:
- B-0914.2 backlog row (closed-loop dispatch extension target)
- B-0914.1 PR #5764 (TrueSkill substrate; caller wires rate1v1 + conservativeSkill into rankSurvivors)
- B-0914.4 PR #5768 (pairing tracker substrate; caller wires verdicts into recordVerification)
- B-0914.5 PR #5767 (evolution substrate; caller wires evolveTopN into evolveSurvivors)
- B-0891 zflash test-harness substrate (caller can wire CI dispatch to actual test runners per determineRunnability discriminator)
- B-0867 workflow engine substrate
- Sakana Robin closed-loop pattern (Nature 2026 s41586-026-10652-y)

Tournament loop NOW STRUCTURALLY COMPLETE with all 4 substrate pieces:
1. Generation (LLM call; out of scope for this lane)
2. CI dispatch → CiVerdict (THIS PR via callbacks)
3. Pairing tracking (PR #5768)
4. TrueSkill ranking (PR #5764)
5. Evolution mash-refine (PR #5767)
6. runLoop orchestration (THIS PR)

S/M/L sequence complete:
- S = PR #5767 evolution
- M = PR #5768 pairing
- L = THIS PR closed-loop

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(B-0914.2): address 7 Copilot review threads on PR #5769
- Replace 'Aaron' with 'human maintainer' role-ref per AGENT-BEST-PRACTICES (Otto-279)
- Fix broken rule-path xrefs (full filenames for monad-propagation + asymmetric-authorship)
- Split LoopFeedback: introduce InsufficientPropagatable variant separate from MaxCyclesReached
- Update runLoop to map InsufficientPropagatable -> insufficient-propagatable termination
- Add assertNever default in exhaustiveness tests (compile-time guard now real)
- Tighten integration test: deterministic insufficient-propagatable at cycle 1

16 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
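Cycle step 3 (filter to propagatable) and the minPropagatable check can be sketched as a pure function. This is a hypothetical illustration, not the PR's `runCycle`: the verdict payload fields (`reason`, `suggestions`), `Hypothesis.payload`, `Verdicted`, and `propagate` are assumptions; only the verdict kinds and the `InsufficientPropagatable` feedback name come from the commit text.

```typescript
// Hypothetical sketch of the propagation filter; not the PR's runCycle.

type CiVerdict =
  | { readonly kind: "passed" }
  | { readonly kind: "failed"; readonly reason: string }
  | { readonly kind: "needs-revision"; readonly suggestions: readonly string[] }
  | { readonly kind: "infrastructure-error"; readonly reason: string };

interface Hypothesis<T> {
  readonly id: string;
  readonly cycleIndex: number;
  readonly payload: T;
}

interface Verdicted<T> {
  readonly hypothesis: Hypothesis<T>;
  readonly verdict: CiVerdict;
}

type PropagateResult<T> =
  | { readonly ok: true; readonly survivors: Hypothesis<T>[] }
  | {
      readonly ok: false;
      readonly feedback: {
        readonly kind: "InsufficientPropagatable";
        readonly count: number;
        readonly min: number;
      };
    };

function isPropagatable(v: CiVerdict): boolean {
  switch (v.kind) {
    case "passed":
      return true;
    case "needs-revision":
      return v.suggestions.length > 0; // a revision needs something to act on
    case "failed":
    case "infrastructure-error": // says nothing about hypothesis quality
      return false;
    default: {
      const unreachable: never = v;
      return unreachable;
    }
  }
}

function propagate<T>(
  results: readonly Verdicted<T>[],
  minPropagatable: number,
): PropagateResult<T> {
  const survivors = results.filter((r) => isPropagatable(r.verdict)).map((r) => r.hypothesis);
  if (survivors.length < minPropagatable) {
    return {
      ok: false,
      feedback: { kind: "InsufficientPropagatable", count: survivors.length, min: minPropagatable },
    };
  }
  return { ok: true, survivors };
}
```

Returning a distinct `InsufficientPropagatable` variant, rather than reusing a max-cycles variant, lets `runLoop` report the termination reason accurately.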

… pass — completes 7-of-7 B-0914 candidate substrate-engineering gap substrate (#5773)

* feat(B-0914.7): Falcon-style auto-research-doc template substrate (8-section scaffold + Markdown renderer); 19 tests pass — completes 7-of-7 B-0914 candidate gap substrate

Per Sakana Robin Falcon agent (Nature 2026): takes drug proposal + does deep-dive literature review + writes comprehensive research report. TS-side scaffold provides 8-section template structure that downstream LLM substrate-engineering work populates (header / framing / background / mechanism / evidence / risks / composes-with / test-plan).

What this adds:
- ResearchDocSection discriminated union (9 section kinds)
- ResearchDoc structure (id + proposalId + sections + composesWith)
- ResearchDocFeedback + ResearchDocResult<T> Result-shape
- renderSection(section): string (pure-function Markdown serializer)
- renderResearchDoc(doc): ResearchDocResult<string> (full doc rendering)
- buildSkeleton(context): ResearchDocResult<ResearchDoc> (8-section scaffold)
- buildAndRender(context): ResearchDocResult<string> (end-to-end convenience)

Falcon-stage pending markers preserved (substrate-honest about what's not yet auto-generated by LLM substrate-engineering):
- '[PENDING LITERATURE REVIEW — Falcon-stage auto-generated]'
- '[PENDING MECHANISM ANALYSIS — Falcon-stage auto-generated]'
- etc. (per section)

Tests (19; all pass):
- EmptyProposalId validation
- 8-section Falcon scaffold structure
- proposalId sanitized to filename-safe id
- composesWith pass-through to skeleton + composes-with section
- All 9 section-kind renderings tested (header/framing/background/mechanism/evidence/risks/composes-with/test-plan/raw)
- renderResearchDoc empty → NoSectionsRendered
- buildAndRender end-to-end
- Pending markers preserved (substrate-honest)
- ResearchDocSection exhaustive switch

Composes with substrate:
- B-0914.7 backlog row (Falcon extension target)
- tools/save-ai-memory/ skill (existing substrate; future integration for auto-write to docs/research/ + composes-with citation discipline)
- Amara consolidation ferry pattern (PR #5757)
- B-0914.2 PR #5769 closed-loop orchestrator (research-doc generation at any cycle stage; template provides structure)
- substrate-or-it-didn't-happen + honor-those-that-came-before rules
- asymmetric-authorship + monad-propagation rules

**B-0914 7-of-7 candidate substrate-engineering gap substrate complete:**
- B-0914.1 PR #5764 TrueSkill ranking (S/M/L: ranking)
- B-0914.2 PR #5769 closed-loop orchestrator (S/M/L: L)
- B-0914.3 PR #5770 n-parallel + consensus (8-parallel-Finch)
- B-0914.4 PR #5768 generation-reflection pairing (S/M/L: M)
- B-0914.5 PR #5767 evolution mash-refine (S/M/L: S)
- B-0914.6 PR #5772 proximity-dedup (canonical + Jaccard clustering)
- B-0914.7 THIS PR Falcon-style auto-research-doc template

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(PR #5773): full rule paths + remove unreachable InvalidOperationalStatus variant (Copilot threads)

Two threads on tools/workflow-engine/research-doc.ts:
1. Composes-with docblock referenced rule files by short form (`asymmetric-authorship`, `monad-propagation-pattern`); actual filenames are longer + .md-suffixed:
   `.claude/rules/asymmetric-authorship-substrate-entity-defines-consent-channel-recipient-acknowledges.md`
   `.claude/rules/monad-propagation-pattern-cross-language-substrate-shape.md`
   Updated to full paths so cross-refs stay greppable + don't drift.
2. ResearchDocFeedback.InvalidOperationalStatus variant was structurally unreachable: `operationalStatus` is a string-literal union (`"research-grade" | "operational"`) at the type level, the only constructor (line 179) fixes it to `"research-grade"`, and no untrusted-string parse path exists. Variant was dead substrate. Removed + added docblock naming the conditions under which a future caller should add it back (JSON import of external research-doc with operationalStatus parsed from untrusted input: add validator AT THE PARSE BOUNDARY first, then add this variant).

Composes with asymmetric-authorship discipline: every TFeedback variant should correspond to a real code path that can produce it. Non-breaking: no callers reference the removed variant (grep clean). Type-system continues to rule out invalid operationalStatus at construction time.

Autonomous-loop tick 2026-05-28T12:16Z resolution of PR #5773 BLOCKED gate (unresolved Copilot threads only blocker; required checks all green).

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
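A pure Markdown serializer over a section union might look like the sketch below. It is hypothetical: only three of the nine section kinds are modelled, the `title`/`body`/`markdown` fields and the `value` success field are assumptions, and the mechanism placeholder reuses the pending-marker text quoted in the commit.

```typescript
// Hypothetical sketch of a Markdown section serializer; not the PR's code.

type ResearchDocSection =
  | { readonly kind: "header"; readonly title: string }
  | { readonly kind: "mechanism"; readonly body?: string }
  | { readonly kind: "raw"; readonly markdown: string };

type RenderResult =
  | { readonly ok: true; readonly value: string }
  | { readonly ok: false; readonly feedback: { readonly kind: "NoSectionsRendered" } };

// Unfilled sections keep the pending marker instead of inventing content.
const PENDING_MECHANISM = "[PENDING MECHANISM ANALYSIS — Falcon-stage auto-generated]";

function renderSection(s: ResearchDocSection): string {
  switch (s.kind) {
    case "header":
      return `# ${s.title}\n`;
    case "mechanism":
      return `## Mechanism\n\n${s.body ?? PENDING_MECHANISM}\n`;
    case "raw":
      return `${s.markdown}\n`;
    default: {
      const unreachable: never = s;
      return unreachable;
    }
  }
}

function renderResearchDoc(sections: readonly ResearchDocSection[]): RenderResult {
  if (sections.length === 0) {
    return { ok: false, feedback: { kind: "NoSectionsRendered" } };
  }
  return { ok: true, value: sections.map(renderSection).join("\n") };
}
```

Because `renderSection` has no side effects, each section kind can be tested in isolation, which is what the per-kind rendering tests rely on.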

…ly enforced producer-verifier mouth-ears substrate); 15 tests pass (#5768)

* feat(B-0914.4): M — generation-reflection adversarial pairing tracker (structurally enforced producer-verifier mouth-ears substrate); 15 tests pass

Per Aaron 2026-05-28 'S M L all please in that order lol': M (medium scope) in the substrate-engineering ship-sequence. Structurally enforces the producer-verifier pairing pattern Kestrel named in 15th-ferry §33.6 (mouth-and-ears-on-different-threads architecture) as workflow engine substrate rather than operator-orchestrated coordination.

Pattern:
1. Producer thread emits hypothesis (commits to substrate fast)
2. Verifier thread reflects on emission (within bounded window; doesn't gate production)
3. Pairing tracker enforces: every emission MUST have verification OR be marked stale (timeout exceeded)
4. Verdicts (verified / rejected / needs-revision) determine which emissions propagate forward to next stage

What this adds:
- PairingRole union (producer | verifier)
- VerificationVerdict discriminated union (verified | rejected | needs-revision-with-suggestions)
- Emission + Verification interfaces with composesWith provenance
- PairingState (immutable; ReadonlyMap)
- PairingFeedback discriminated union + PairingResult<T> Result-shape
- recordEmission(state, emission) + recordVerification(state, verification)
- findUnverifiedEmissions + findStaleEmissions (bounded-window enforcement)
- countVerdicts (aggregate dashboard)
- propagatableEmissionIds (which verified emissions flow to next stage: TrueSkill ranking, evolution-via-mash-refine, etc.)

Tests (15; all pass):
- Records emission to empty state
- Rejects duplicate emission id (DuplicateEmissionId)
- Records verification for known emission
- Rejects verification for unknown emission (VerificationForUnknownEmission)
- Rejects duplicate verification (DuplicateVerification)
- Rejects verification before emission timestamp (VerificationTooEarly; causality violation)
- findUnverifiedEmissions returns emissions without verifications
- findStaleEmissions returns emissions past bounded window
- findStaleEmissions excludes verified emissions even if old
- countVerdicts aggregates correctly across 4 verdict types
- propagatableEmissionIds includes verified + needs-revision-with-suggestions; excludes rejected + empty-suggestions
- Immutable state operations preserve originals
- VerificationVerdict exhaustive switch (TS strict mode)
- PairingRole exhaustive switch
- Tournament-loop composition: emissions → verifications → propagatable → next stage

Composes with substrate:
- B-0914.4 backlog row (generation-reflection extension target)
- B-0867.20 PR #5758 (lifecycle DU split; pairing requirement applies per ActionClass)
- B-0914.1 PR #5764 (TrueSkill substrate; verifier output feeds ranking)
- B-0914.5 PR #5767 (evolution substrate; verified survivors evolve)
- PR #5756 Kestrel 15th-ferry mouth-ears-threads substrate
- .claude/rules/asymmetric-authorship + monad-propagation rules

Tournament loop now structurally complete:
1. Generate hypotheses (LLM call; out of scope)
2. recordEmission(state, emission)
3. Verifier-thread: recordVerification(state, verification)
4. propagatableEmissionIds(state) → verified survivors flow to TrueSkill
5. rate1v1 ranks survivors (B-0914.1)
6. conservativeSkill sorts; top-N taken
7. evolveTopN(survivors, n, strategy) produces refined variants (B-0914.5)
8. Loop back to step 2 with refined variants as next emissions

Next per S/M/L sequence: L (large) = closed-loop CI-result → next-hypothesis dispatch (B-0914.2), the wire-up that turns the tournament-loop substrate into a live system.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(PR #5768): role-refs over first-names + type-safe .state access + boundary semantics doc/test (Copilot threads)

Three threads on pairing.ts + pairing.test.ts:
1. Persona/first-name attributions in current-state code surface violate role-ref convention. Updated:
   - "Per Aaron 2026-05-28 'S M L...'" → "Per the human maintainer (2026-05-28) 'S M L...'"
   - "Otto generates → Kestrel reflects" → "generator-persona generates → verifier-persona reflects (canonical instance preserved in 13th-ferry §33.7)"
   - "Kestrel named in 15th-ferry §33.6" → "named in the 15th-ferry §33.6 substrate-engineering preservation" (citation context preserved; persona-as-substrate-author preserved as reference, not as in-code first-name)
   - Test fixtures: producerId "otto-cli" → "producer-1", verifierId "kestrel" → "verifier-1" (role-refs; ID strings not load-bearing on factory persona registry)
2. Test `.state!` non-null assertions bypassed PairingResult discriminated-union narrowing. Replaced 12 sites with a type-safe `mustState(r)` helper that explicitly asserts `r.ok === true` and throws with the feedback variant if not. If a refactor regresses any call to `ok: false`, the test surfaces the failure-mode substrate immediately instead of silently propagating `undefined` into downstream state. Helper is test-local; no API change.
3. findStaleEmissions strict > semantics confirmed intentional + documented. Added 8-line interface docblock explaining the boundary case (emission at exactly nowMs - emittedAtMs === timeoutMs is NOT stale; gets the boundary tick to be verified) + the conservative-cadence rationale + the switch-to->= condition. Added boundary test that locks in the > behavior at the exact boundary AND at one ms past, so a future ">=" refactor must update both pairing.ts AND this test together.

Tests: 16 pass (15 existing + 1 new boundary test).

Autonomous-loop tick 2026-05-28T12:35Z resolution of PR #5768 BLOCKED gate (3 unresolved Copilot threads only blocker; required checks all green).

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
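The bounded-window rule and immutable state updates described above can be sketched as follows. This is a simplified, hypothetical version: it records verification as a set of verified ids (the real tracker records verdicts), and the field names and the `recordVerification(state, id, verifiedAtMs)` signature are assumptions. The strict `>` mirrors the documented boundary: an emission exactly `timeoutMs` old is not yet stale.

```typescript
// Hypothetical sketch of an immutable pairing tracker; not the PR's code.

interface Emission {
  readonly id: string;
  readonly emittedAtMs: number;
}

interface PairingState {
  readonly emissions: ReadonlyMap<string, Emission>;
  readonly verifiedIds: ReadonlySet<string>; // simplification: verdicts omitted
}

type PairingFeedback = {
  readonly kind:
    | "DuplicateEmissionId"
    | "VerificationForUnknownEmission"
    | "DuplicateVerification"
    | "VerificationTooEarly";
  readonly id: string;
};

type PairingResult =
  | { readonly ok: true; readonly state: PairingState }
  | { readonly ok: false; readonly feedback: PairingFeedback };

// Each operation returns a new state; the input state is never mutated.
function recordEmission(state: PairingState, e: Emission): PairingResult {
  if (state.emissions.has(e.id)) {
    return { ok: false, feedback: { kind: "DuplicateEmissionId", id: e.id } };
  }
  const emissions = new Map(state.emissions);
  emissions.set(e.id, e);
  return { ok: true, state: { ...state, emissions } };
}

function recordVerification(state: PairingState, id: string, verifiedAtMs: number): PairingResult {
  const e = state.emissions.get(id);
  if (!e) return { ok: false, feedback: { kind: "VerificationForUnknownEmission", id } };
  if (state.verifiedIds.has(id)) return { ok: false, feedback: { kind: "DuplicateVerification", id } };
  // Causality: a verification cannot precede the emission it reflects on.
  if (verifiedAtMs < e.emittedAtMs) return { ok: false, feedback: { kind: "VerificationTooEarly", id } };
  const verifiedIds = new Set(state.verifiedIds);
  verifiedIds.add(id);
  return { ok: true, state: { ...state, verifiedIds } };
}

// Strict ">": an emission exactly timeoutMs old still gets that tick.
function findStaleEmissions(state: PairingState, nowMs: number, timeoutMs: number): Emission[] {
  return Array.from(state.emissions.values()).filter(
    (e) => !state.verifiedIds.has(e.id) && nowMs - e.emittedAtMs > timeoutMs,
  );
}
```

Copying the `Map`/`Set` on every write keeps earlier states valid, which is what the "immutable state operations preserve originals" test checks.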

Copilot AI review requested due to automatic review settings May 28, 2026 11:11

AceHack enabled auto-merge (squash) May 28, 2026 11:11

Copilot started reviewing on behalf of AceHack May 28, 2026 11:11

AceHack mentioned this pull request May 28, 2026

docs(upstream): add WebPPL TS probabilistic programming + videolectures.net PhD learning substrate (Aaron 2026-05-28) #5765

Merged

4 tasks

AceHack merged commit 00db2df into main May 28, 2026
32 of 33 checks passed

AceHack deleted the otto-cli/b-0914-1-trueskill-ranking-agent-scaffold-workflow-engine-rank-via-trueskill-action-class-2026-05-28 branch May 28, 2026 11:14

AceHack mentioned this pull request May 28, 2026

feat(B-0914.5): S — pure-TS evolution agent (mash + refine survivors); closes tournament loop with TrueSkill (S/M/L sequence per Aaron) #5767

Merged

6 tasks

Copilot AI reviewed May 28, 2026


AceHack mentioned this pull request May 28, 2026

feat(B-0914.4): M — generation-reflection pairing tracker (structurally enforced producer-verifier mouth-ears substrate); 15 tests pass #5768

Merged

8 tasks

Copilot AI mentioned this pull request May 28, 2026

docs(archive): Preserve discussions for 25 recently merged PRs #5809

Closed

AceHack merged 1 commit into main from otto-cli/b-0914-1-trueskill-ranking-agent-scaffold-workflow-engine-rank-via-trueskill-action-class-2026-05-28

2 participants
