…ors); closes tournament loop with TrueSkill (PR #5764); 12 tests pass

Per Aaron 2026-05-28 'S M L all please in that order lol' - this is S (small/tight scope) in the substrate-engineering ship-sequence.

Evolution agent pattern from Google co-scientist (Nature 2026): takes top-N TrueSkill-ranked survivors + mashes them into refined variants. Pure function over typed survivors; tight ~200-line implementation; 3 composition strategies; full Result-shape per monad-propagation + asymmetric-authorship rules.

Closes the tournament loop with TrueSkill (PR #5764):
1. Generate hypotheses (LLM call; out of scope)
2. Rank via TrueSkill (B-0914.1 - shipped)
3. Take top-N survivors
4. Mash + refine (this PR - B-0914.5)
5. Loop back to step 2 with refined variants

What this adds:
- Survivor<T> interface (generic over substrate type)
- EvolutionStrategy union (simple-merge | cross-pollinate | mutate)
- EvolutionFeedback discriminated union
- EvolutionResult<T> Result-shape
- RefinedVariant<T> with derivedFrom + composesWith for provenance
- evolveSurvivors<T>(context): EvolutionResult<T> - main function
- evolveTopN<T>(survivors, n, strategy, options): EvolutionResult<T> - convenience that slices top-N before evolving

Strategies:
- simple-merge: top survivor as base + fill gaps from next
- cross-pollinate: interleave attributes between top 2 (by sorted-key parity)
- mutate: apply caller-supplied transformer to top survivor

Provenance via derivedFrom (survivor ids) + composesWith (cumulative attribution per honor-those-that-came-before).

Tests (12; all pass):
- simple-merge: top wins on overlap, fills gaps from next
- cross-pollinate: alternates attributes by sorted-key parity
- mutate: applies caller transformer
- mutate without mutator → MergeConflict
- empty survivor → EmptySurvivorSet
- simple-merge with 1 survivor → InsufficientSurvivors
- cross-pollinate with 1 survivor → InsufficientSurvivors
- derivedFrom + composesWith preserve provenance
- evolveTopN slices correctly
- evolveTopN with N=1 mutate
- variant id includes prefix + strategy + survivor ids
- EvolutionStrategy exhaustive switch (TS strict mode)

Composes with substrate:
- B-0914.5 backlog row (evolution agent extension target)
- B-0914.1 PR #5764 (TrueSkill substrate; ranking input)
- B-0867 workflow engine (future ActionClass 'evolve-via-mash-refine')
- .claude/rules/additive-not-zero-sum.md
- .claude/rules/honor-those-that-came-before.md
- .claude/rules/monad-propagation-pattern + asymmetric-authorship

Next per S/M/L sequence: M (medium) = generation-reflection adversarial pairing structurally enforced (B-0914.4); L (large) = closed-loop CI-result → next-hypothesis dispatch (B-0914.2).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
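The three strategies listed in this commit message can be sketched roughly as below. This is an illustration under stated assumptions, not the code from this PR: `Survivor`, `Evolved` and `evolve` are hypothetical simplified names, attributes are flattened to string maps, and the parity rule (even positions in sorted key order prefer the top survivor, odd positions prefer the second) is one plausible reading of "sorted-key parity".

```typescript
// Hypothetical shapes for illustration; the PR's Survivor<T> and
// EvolutionResult<T> are generic and may differ in detail.
type Attrs = Record<string, string>;

interface Survivor {
  id: string;
  skill: number; // ranking signal, e.g. a TrueSkill conservative estimate
  attributes: Attrs;
}

type Strategy = "simple-merge" | "cross-pollinate" | "mutate";

type Evolved =
  | { ok: true; attributes: Attrs; derivedFrom: string[] }
  | { ok: false; feedback: "EmptySurvivorSet" | "InsufficientSurvivors" | "MergeConflict" };

function evolve(
  survivors: readonly Survivor[],
  strategy: Strategy,
  mutator?: (attrs: Attrs) => Attrs,
): Evolved {
  // Highest skill first; copy so the caller's array is not reordered.
  const [top, next] = [...survivors].sort((a, b) => b.skill - a.skill);
  if (!top) return { ok: false, feedback: "EmptySurvivorSet" };

  switch (strategy) {
    case "simple-merge": {
      if (!next) return { ok: false, feedback: "InsufficientSurvivors" };
      // The later spread wins, so the top survivor prevails on overlapping keys.
      const attributes = { ...next.attributes, ...top.attributes };
      return { ok: true, attributes, derivedFrom: [top.id, next.id] };
    }
    case "cross-pollinate": {
      if (!next) return { ok: false, feedback: "InsufficientSurvivors" };
      const keys = Object.keys({ ...next.attributes, ...top.attributes }).sort();
      const attributes: Attrs = {};
      keys.forEach((key, index) => {
        // Assumed parity rule: even positions prefer the top survivor, odd the second.
        const preferred = index % 2 === 0 ? top : next;
        const fallback = index % 2 === 0 ? next : top;
        const value = key in preferred.attributes ? preferred.attributes[key] : fallback.attributes[key];
        if (value !== undefined) attributes[key] = value;
      });
      return { ok: true, attributes, derivedFrom: [top.id, next.id] };
    }
    case "mutate": {
      // Mirrors the listed test: mutate without a mutator yields MergeConflict.
      if (!mutator) return { ok: false, feedback: "MergeConflict" };
      return { ok: true, attributes: mutator({ ...top.attributes }), derivedFrom: [top.id] };
    }
  }
}
```

Returning a tagged result instead of throwing keeps every failure mode visible in the type, which is the property the exhaustive-switch test in the list above relies on.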
This was referenced May 28, 2026
AceHack added a commit that referenced this pull request May 28, 2026
…orchestrator (composes TrueSkill + evolution + pairing via injectable callbacks); S/M/L sequence COMPLETE (#5769)

* feat(B-0914.2): L - closed-loop CI-result → next-hypothesis dispatch orchestrator (composes TrueSkill + evolution + pairing via injectable callbacks); 16 tests pass

Per Aaron 2026-05-28 'S M L all please in that order lol' - L (large scope) in the substrate-engineering ship-sequence. Wire-up that turns the tournament-loop substrate into a live closed-loop iteration system.

Design: pure loop-orchestration substrate with INJECTABLE callbacks for substrate-specific operations (ranking / evolution / verification + CI-dispatch). Caller provides functions; orchestrator handles loop structure + propagation discipline. Separation-of-concerns means orchestrator does NOT tightly couple to specific TrueSkill / evolution / pairing module implementations - it composes with ANY substrate that implements the callback contracts.

What this adds:
- Hypothesis<T> generic substrate item with cycleIndex + derivedFrom ancestry
- CiVerdict discriminated union (passed | failed | needs-revision | infrastructure-error)
- LoopFeedback + LoopResult<T> Result-shape per monad-propagation
- LoopCallbacks<T> interface (dispatchCi + rankSurvivors + evolveSurvivors)
- LoopConfig (maxCycles + topNToEvolve + minPropagatable; DEFAULT_LOOP_CONFIG)
- runCycle<T>(hypotheses, callbacks, cycleIndex, config?) - single cycle
- runLoop<T>(initial, callbacks, config?, shouldContinue?) - full iteration with LoopTermination shape (cycle count + reason + final state)

Cycle steps:
1. Dispatch each hypothesis to CI (caller-injected)
2. Collect verdicts
3. Filter to propagatable (passed + needs-revision-with-suggestions)
4. Rank via TrueSkill (caller-injected per B-0914.1 PR #5764)
5. Evolve top-N (caller-injected per B-0914.5 PR #5767)
6. Return refined variants for next cycle

Termination conditions:
- max-cycles: bounded iteration reached
- insufficient-propagatable: too many failures; can't continue
- predicate-stopped: caller-supplied predicate returned false
- error: CI/ranking/evolution exception

Tests (16; all pass):
- Empty hypotheses → EmptyHypothesisSet
- Passing CI → propagation through ranking + evolution
- Failed verdicts excluded from propagation
- needs-revision with suggestions included; without excluded
- Below minPropagatable → MaxCyclesReached
- CI exception → CiDispatchFailure
- Ranking exception → RankingFailure
- Evolution exception → EvolutionFailure
- infrastructure-error excluded (doesn't reflect hypothesis quality)
- runLoop iterates until max-cycles
- runLoop predicate-stopped early termination
- runLoop insufficient-propagatable
- runLoop error termination
- LoopFeedback exhaustive switch
- CiVerdict exhaustive switch
- Integration: full closed-loop with realistic callback wiring

Composes with substrate:
- B-0914.2 backlog row (closed-loop dispatch extension target)
- B-0914.1 PR #5764 (TrueSkill substrate; caller wires rate1v1 + conservativeSkill into rankSurvivors)
- B-0914.4 PR #5768 (pairing tracker substrate; caller wires verdicts into recordVerification)
- B-0914.5 PR #5767 (evolution substrate; caller wires evolveTopN into evolveSurvivors)
- B-0891 zflash test-harness substrate (caller can wire CI dispatch to actual test runners per determineRunnability discriminator)
- B-0867 workflow engine substrate
- Sakana Robin closed-loop pattern (Nature 2026 s41586-026-10652-y)

Tournament loop NOW STRUCTURALLY COMPLETE with all 4 substrate pieces:
1. Generation (LLM call; out of scope for this lane)
2. CI dispatch → CiVerdict (THIS PR via callbacks)
3. Pairing tracking (PR #5768)
4. TrueSkill ranking (PR #5764)
5. Evolution mash-refine (PR #5767)
6. runLoop orchestration (THIS PR)

S/M/L sequence complete:
- S = PR #5767 evolution
- M = PR #5768 pairing
- L = THIS PR closed-loop

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(B-0914.2): address 7 Copilot review threads on PR #5769

- Replace 'Aaron' with 'human maintainer' role-ref per AGENT-BEST-PRACTICES (Otto-279)
- Fix broken rule-path xrefs (full filenames for monad-propagation + asymmetric-authorship)
- Split LoopFeedback: introduce InsufficientPropagatable variant separate from MaxCyclesReached
- Update runLoop to map InsufficientPropagatable -> insufficient-propagatable termination
- Add assertNever default in exhaustiveness tests (compile-time guard now real)
- Tighten integration test: deterministic insufficient-propagatable at cycle 1

16 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
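The cycle steps in the commit above (dispatch, filter to propagatable, rank, evolve top-N) might look roughly like this with injected callbacks. It is a simplified, synchronous sketch under assumptions: the verdict payload fields, the `CycleResult` name, and the default config values are hypothetical, and the real `runCycle` may well be asynchronous. Feedback names follow the post-fix split that introduced `InsufficientPropagatable`.

```typescript
// Hypothetical shapes; field names beyond those named in the commit are assumptions.
type CiVerdict =
  | { kind: "passed" }
  | { kind: "failed" }
  | { kind: "needs-revision"; suggestions: string[] }
  | { kind: "infrastructure-error" };

interface Hypothesis<T> {
  id: string;
  payload: T;
  cycleIndex: number;
  derivedFrom: string[];
}

interface LoopCallbacks<T> {
  dispatchCi(h: Hypothesis<T>): CiVerdict;
  rankSurvivors(hs: Hypothesis<T>[]): Hypothesis<T>[]; // best first
  evolveSurvivors(top: Hypothesis<T>[], nextCycle: number): Hypothesis<T>[];
}

interface LoopConfig {
  topNToEvolve: number;
  minPropagatable: number;
}

type CycleFeedback =
  | "EmptyHypothesisSet"
  | "InsufficientPropagatable"
  | "CiDispatchFailure"
  | "RankingFailure"
  | "EvolutionFailure";

type CycleResult<T> = { ok: true; next: Hypothesis<T>[] } | { ok: false; feedback: CycleFeedback };

// passed and needs-revision-with-suggestions propagate; everything else is dropped.
function isPropagatable(verdict: CiVerdict): boolean {
  switch (verdict.kind) {
    case "passed":
      return true;
    case "needs-revision":
      return verdict.suggestions.length > 0;
    case "failed":
    case "infrastructure-error":
      return false;
  }
}

function runCycle<T>(
  hypotheses: Hypothesis<T>[],
  callbacks: LoopCallbacks<T>,
  cycleIndex: number,
  config: LoopConfig = { topNToEvolve: 2, minPropagatable: 1 }, // assumed defaults
): CycleResult<T> {
  if (hypotheses.length === 0) return { ok: false, feedback: "EmptyHypothesisSet" };

  let propagatable: Hypothesis<T>[];
  try {
    propagatable = hypotheses.filter((h) => isPropagatable(callbacks.dispatchCi(h)));
  } catch {
    return { ok: false, feedback: "CiDispatchFailure" };
  }
  if (propagatable.length < config.minPropagatable) {
    return { ok: false, feedback: "InsufficientPropagatable" };
  }

  let ranked: Hypothesis<T>[];
  try {
    ranked = callbacks.rankSurvivors(propagatable);
  } catch {
    return { ok: false, feedback: "RankingFailure" };
  }

  try {
    const next = callbacks.evolveSurvivors(ranked.slice(0, config.topNToEvolve), cycleIndex + 1);
    return { ok: true, next };
  } catch {
    return { ok: false, feedback: "EvolutionFailure" };
  }
}
```

Because the orchestrator only sees the three callbacks, a caller could swap in any ranking or evolution module that satisfies the same contracts, which is the decoupling the Design paragraph describes.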
AceHack added a commit that referenced this pull request May 28, 2026
…plication (canonical-form + Jaccard clustering); 19 tests pass (#5772)

* feat(B-0914.6): proximity-agent substrate-engineering substrate de-duplication (canonical-form + Jaccard-similarity clustering); 19 tests pass

Per Google co-scientist proximity agent (Nature 2026): maps ideas into high-dimensional space + groups similar variants to prevent wasting compute on substantively-identical proposals. Generalized to TS-side substrate with two de-dup mechanisms.

What this adds:
- ProximityFeedback discriminated union + ProximityResult<T> Result-shape
- Cluster<T> with representative + members + canonicalForm
- clusterByCanonical<T>(corpus, canonicalFn) - deterministic dedup
- jaccardSimilarity(tokensA, tokensB) - Jaccard coefficient
- defaultTokenize(text) - lowercase + stop-word filter
- clusterBySimilarity<T>(context) - greedy clustering by Jaccard threshold
- uniqueRepresentatives<T>(result) - drop duplicates convenience

Tests (19; all pass):
- clusterByCanonical groups same-canonical items
- first-seen is representative (pre-sort by score for top-ranked rep)
- empty corpus → EmptyCorpus
- all unique → N clusters of size 1
- jaccardSimilarity edge cases (identical / disjoint / partial / empty)
- defaultTokenize lowercase + stop-word filter
- clusterBySimilarity threshold catches near-duplicates
- High threshold keeps all distinct; low threshold clusters aggressively
- Invalid threshold → InvalidThreshold
- uniqueRepresentatives extracts rep-only list
- Compose with evolution substrate: pre-sort by score → rep is best
- ProximityFeedback exhaustive switch

Composes with substrate:
- B-0914.6 backlog row
- B-0914.5 PR #5767 evolution (de-dup Survivor list before mash)
- B-0914.2 PR #5769 closed-loop (de-dup pre-CI-dispatch saves cycles)
- verify-existing-substrate-before-authoring rule (proximity IS substrate-inventory at runtime scope)
- grep-substrate-anchors-before-razor-as-metaphysical rule (substrate-anchor check at runtime scope)
- additive-not-zero-sum + monad-propagation + asymmetric-authorship

Real semantic embeddings (TF-IDF / sentence-BERT) deferred; current PoC handles structural dedup case (substrate-engineering work often produces variants that differ only in serialization order, key casing, attribute ordering - canonical-form normalization catches these without embeddings).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(PR #5772): clarify B-0914 subtask reference + document Cluster.canonicalForm semantic divergence (Copilot threads)

Two threads from Copilot on tools/workflow-engine/proximity.ts:

1. Docblock cross-reference "B-0914.6 backlog row" was misleading - the seven .N subtasks (.1-.7) are sections within the parent B-0914 row file, NOT separate B-0914.N row files. Reworded to "B-0914 subtask .6" with explicit parent-row pointer + cross-reference clarification for subtasks .5 and .2 as well.

2. Cluster.canonicalForm field semantically divergent between clusterByCanonical (real canonical-form string from CanonicalFn<T>) and clusterBySimilarity (synthesized "[similarity:<threshold>]:<tokens>" label). Added interface docblock that documents the divergence explicitly + names the discriminator (`[similarity:` prefix) callers can use + notes future-substrate rename path.

Non-breaking: same field name + same type + same behavior; only docblock expanded. Composes with asymmetric-authorship + monad-propagation rules unchanged.

Autonomous-loop tick 2026-05-28T12:08Z resolution of PR #5772 BLOCKED gate (unresolved Copilot threads only blocker; required checks all green).

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
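As a rough illustration of the Jaccard-based mechanism described in this commit, the sketch below tokenizes text, computes the Jaccard coefficient |A ∩ B| / |A ∪ B|, and clusters greedily against each cluster's first-seen representative. Several details are assumptions rather than facts about `proximity.ts`: the stop-word list, treating two empty token sets as similarity 1, comparing only against representatives, and the positional signature of `clusterBySimilarity` (the PR takes a single context object).

```typescript
const STOP_WORDS = ["a", "an", "and", "of", "the", "to"]; // illustrative list only

// Lowercase, split on non-alphanumerics, drop empties and stop words.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && STOP_WORDS.indexOf(t) === -1);
}

function unique(tokens: string[]): string[] {
  return tokens.filter((t, i) => tokens.indexOf(t) === i);
}

// Jaccard coefficient over token sets: |A ∩ B| / |A ∪ B|.
function jaccardSimilarity(tokensA: string[], tokensB: string[]): number {
  const a = unique(tokensA);
  const b = unique(tokensB);
  if (a.length === 0 && b.length === 0) return 1; // assumed convention for two empty sets
  const intersection = a.filter((t) => b.indexOf(t) !== -1).length;
  return intersection / (a.length + b.length - intersection);
}

interface Cluster<T> {
  representative: T; // first-seen member
  members: T[];
}

type ProximityResult<T> =
  | { ok: true; clusters: Cluster<T>[] }
  | { ok: false; feedback: "EmptyCorpus" | "InvalidThreshold" };

function clusterBySimilarity<T>(
  corpus: T[],
  textOf: (item: T) => string,
  threshold: number,
): ProximityResult<T> {
  // Written as a negated range check so NaN is rejected too.
  if (!(threshold >= 0 && threshold <= 1)) return { ok: false, feedback: "InvalidThreshold" };
  if (corpus.length === 0) return { ok: false, feedback: "EmptyCorpus" };

  const clusters: Cluster<T>[] = [];
  for (const item of corpus) {
    const tokens = tokenize(textOf(item));
    // Greedy: join the first cluster whose representative is similar enough.
    const home = clusters.filter(
      (c) => jaccardSimilarity(tokens, tokenize(textOf(c.representative))) >= threshold,
    )[0];
    if (home) home.members.push(item);
    else clusters.push({ representative: item, members: [item] });
  }
  return { ok: true, clusters };
}
```

Since the first-seen item becomes the representative, pre-sorting the corpus by score (as the test list notes) makes the best-ranked variant the one that survives de-duplication.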
AceHack added a commit that referenced this pull request May 28, 2026
… pass - completes 7-of-7 B-0914 candidate substrate-engineering gap substrate (#5773)

* feat(B-0914.7): Falcon-style auto-research-doc template substrate (8-section scaffold + Markdown renderer); 19 tests pass - completes 7-of-7 B-0914 candidate gap substrate

Per Sakana Robin Falcon agent (Nature 2026): takes drug proposal + does deep-dive literature review + writes comprehensive research report. TS-side scaffold provides 8-section template structure that downstream LLM substrate-engineering work populates (header / framing / background / mechanism / evidence / risks / composes-with / test-plan).

What this adds:
- ResearchDocSection discriminated union (9 section kinds)
- ResearchDoc structure (id + proposalId + sections + composesWith)
- ResearchDocFeedback + ResearchDocResult<T> Result-shape
- renderSection(section): string - pure-function Markdown serializer
- renderResearchDoc(doc): ResearchDocResult<string> - full doc rendering
- buildSkeleton(context): ResearchDocResult<ResearchDoc> - 8-section scaffold
- buildAndRender(context): ResearchDocResult<string> - end-to-end convenience

Falcon-stage pending markers preserved (substrate-honest about what's not yet auto-generated by LLM substrate-engineering):
- '[PENDING LITERATURE REVIEW — Falcon-stage auto-generated]'
- '[PENDING MECHANISM ANALYSIS — Falcon-stage auto-generated]'
- etc. (per section)

Tests (19; all pass):
- EmptyProposalId validation
- 8-section Falcon scaffold structure
- proposalId sanitized to filename-safe id
- composesWith pass-through to skeleton + composes-with section
- All 9 section-kind renderings tested (header/framing/background/mechanism/evidence/risks/composes-with/test-plan/raw)
- renderResearchDoc empty → NoSectionsRendered
- buildAndRender end-to-end
- Pending markers preserved (substrate-honest)
- ResearchDocSection exhaustive switch

Composes with substrate:
- B-0914.7 backlog row (Falcon extension target)
- tools/save-ai-memory/ skill (existing substrate; future integration for auto-write to docs/research/ + composes-with citation discipline)
- Amara consolidation ferry pattern (PR #5757)
- B-0914.2 PR #5769 closed-loop orchestrator (research-doc generation at any cycle stage; template provides structure)
- substrate-or-it-didn't-happen + honor-those-that-came-before rules
- asymmetric-authorship + monad-propagation rules

**B-0914 7-of-7 candidate substrate-engineering gap substrate complete:**
- B-0914.1 PR #5764 TrueSkill ranking (S/M/L: ranking)
- B-0914.2 PR #5769 closed-loop orchestrator (S/M/L: L)
- B-0914.3 PR #5770 n-parallel + consensus (8-parallel-Finch)
- B-0914.4 PR #5768 generation-reflection pairing (S/M/L: M)
- B-0914.5 PR #5767 evolution mash-refine (S/M/L: S)
- B-0914.6 PR #5772 proximity-dedup (canonical + Jaccard clustering)
- B-0914.7 THIS PR Falcon-style auto-research-doc template

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(PR #5773): full rule paths + remove unreachable InvalidOperationalStatus variant (Copilot threads)

Two threads on tools/workflow-engine/research-doc.ts:

1. Composes-with docblock referenced rule files by short form (`asymmetric-authorship`, `monad-propagation-pattern`) - actual filenames are longer + .md-suffixed:
   `.claude/rules/asymmetric-authorship-substrate-entity-defines-consent-channel-recipient-acknowledges.md`
   `.claude/rules/monad-propagation-pattern-cross-language-substrate-shape.md`
   Updated to full paths so cross-refs stay greppable + don't drift.

2. ResearchDocFeedback.InvalidOperationalStatus variant was structurally unreachable: `operationalStatus` is a string-literal union (`"research-grade" | "operational"`) at the type level, the only constructor (line 179) fixes it to `"research-grade"`, and no untrusted-string parse path exists. Variant was dead substrate. Removed + added docblock naming the conditions under which a future caller should add it back (JSON import of external research-doc with operationalStatus parsed from untrusted input - add validator AT THE PARSE BOUNDARY first, then add this variant).

Composes with asymmetric-authorship discipline: every TFeedback variant should correspond to a real code path that can produce it.

Non-breaking: no callers reference the removed variant (grep clean). Type-system continues to rule out invalid operationalStatus at construction time.

Autonomous-loop tick 2026-05-28T12:16Z resolution of PR #5773 BLOCKED gate (unresolved Copilot threads only blocker; required checks all green).

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
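A minimal sketch of the 8-section scaffold and Markdown renderer described in the feat commit above, under stated assumptions. The section payload fields, the heading layout, the id sanitizer, and the simplified `[PENDING <KIND>]` placeholder are all hypothetical; the real pending markers are longer and per-section, and `buildSkeleton` takes a context object rather than positional arguments.

```typescript
// Hypothetical, simplified shapes; the PR's ResearchDocSection has 9 kinds
// with their own fields, which are not reproduced here.
type BodyKind = "framing" | "background" | "mechanism" | "evidence" | "risks" | "test-plan";

type Section =
  | { kind: "header"; title: string }
  | { kind: "composes-with"; items: string[] }
  | { kind: BodyKind; body: string }
  | { kind: "raw"; markdown: string };

interface ResearchDoc {
  id: string;
  proposalId: string;
  sections: Section[];
}

type DocResult<T> =
  | { ok: true; value: T }
  | { ok: false; feedback: "EmptyProposalId" | "NoSectionsRendered" };

function renderSection(section: Section): string {
  switch (section.kind) {
    case "header":
      return `# ${section.title}`;
    case "composes-with":
      return ["## composes-with", ...section.items.map((item) => `- ${item}`)].join("\n");
    case "raw":
      return section.markdown;
    default:
      // Remaining kinds all carry a plain body.
      return `## ${section.kind}\n\n${section.body}`;
  }
}

function renderResearchDoc(doc: ResearchDoc): DocResult<string> {
  if (doc.sections.length === 0) return { ok: false, feedback: "NoSectionsRendered" };
  return { ok: true, value: doc.sections.map(renderSection).join("\n\n") };
}

// Simplified placeholder; NOT the exact marker text used in the PR.
const pending = (kind: string): string => `[PENDING ${kind.toUpperCase()}]`;

function buildSkeleton(proposalId: string, composesWith: string[] = []): DocResult<ResearchDoc> {
  const trimmed = proposalId.trim();
  if (trimmed === "") return { ok: false, feedback: "EmptyProposalId" };

  // Assumed sanitizer: lowercase, collapse non-alphanumerics to "-", trim dashes.
  const id = trimmed.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  const early: BodyKind[] = ["framing", "background", "mechanism", "evidence", "risks"];

  const sections: Section[] = [
    { kind: "header", title: trimmed },
    ...early.map((kind): Section => ({ kind, body: pending(kind) })),
    { kind: "composes-with", items: composesWith },
    { kind: "test-plan", body: pending("test-plan") },
  ];
  return { ok: true, value: { id, proposalId: trimmed, sections } };
}
```

Leaving explicit placeholders in every unpopulated section keeps the rendered document honest about which parts a later stage still has to write.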
AceHack added a commit that referenced this pull request May 28, 2026
…ly enforced producer-verifier mouth-ears substrate); 15 tests pass (#5768)

* feat(B-0914.4): M - generation-reflection adversarial pairing tracker (structurally enforced producer-verifier mouth-ears substrate); 15 tests pass

Per Aaron 2026-05-28 'S M L all please in that order lol' - M (medium scope) in the substrate-engineering ship-sequence.

Structurally enforces the producer-verifier pairing pattern Kestrel named in 15th-ferry §33.6 (mouth-and-ears-on-different-threads architecture) as workflow engine substrate rather than operator-orchestrated coordination.

Pattern:
1. Producer thread emits hypothesis (commits to substrate fast)
2. Verifier thread reflects on emission (within bounded window; doesn't gate production)
3. Pairing tracker enforces: every emission MUST have verification OR be marked stale (timeout exceeded)
4. Verdicts (verified / rejected / needs-revision) determine which emissions propagate forward to next stage

What this adds:
- PairingRole union (producer | verifier)
- VerificationVerdict discriminated union (verified | rejected | needs-revision-with-suggestions)
- Emission + Verification interfaces with composesWith provenance
- PairingState (immutable; ReadonlyMap)
- PairingFeedback discriminated union + PairingResult<T> Result-shape
- recordEmission(state, emission) + recordVerification(state, verification)
- findUnverifiedEmissions + findStaleEmissions (bounded-window enforcement)
- countVerdicts (aggregate dashboard)
- propagatableEmissionIds (which verified emissions flow to next stage - TrueSkill ranking, evolution-via-mash-refine, etc.)

Tests (15; all pass):
- Records emission to empty state
- Rejects duplicate emission id (DuplicateEmissionId)
- Records verification for known emission
- Rejects verification for unknown emission (VerificationForUnknownEmission)
- Rejects duplicate verification (DuplicateVerification)
- Rejects verification before emission timestamp (VerificationTooEarly; causality violation)
- findUnverifiedEmissions returns emissions without verifications
- findStaleEmissions returns emissions past bounded window
- findStaleEmissions excludes verified emissions even if old
- countVerdicts aggregates correctly across 4 verdict types
- propagatableEmissionIds includes verified + needs-revision-with-suggestions; excludes rejected + empty-suggestions
- Immutable state operations preserve originals
- VerificationVerdict exhaustive switch (TS strict mode)
- PairingRole exhaustive switch
- Tournament-loop composition: emissions → verifications → propagatable → next stage

Composes with substrate:
- B-0914.4 backlog row (generation-reflection extension target)
- B-0867.20 PR #5758 (lifecycle DU split; pairing requirement applies per ActionClass)
- B-0914.1 PR #5764 (TrueSkill substrate; verifier output feeds ranking)
- B-0914.5 PR #5767 (evolution substrate; verified survivors evolve)
- PR #5756 Kestrel 15th-ferry mouth-ears-threads substrate
- .claude/rules/asymmetric-authorship + monad-propagation rules

Tournament loop now structurally complete:
1. Generate hypotheses (LLM call; out of scope)
2. recordEmission(state, emission)
3. Verifier-thread: recordVerification(state, verification)
4. propagatableEmissionIds(state) → verified survivors flow to TrueSkill
5. rate1v1 ranks survivors (B-0914.1)
6. conservativeSkill sorts; top-N taken
7. evolveTopN(survivors, n, strategy) produces refined variants (B-0914.5)
8. Loop back to step 2 with refined variants as next emissions

Next per S/M/L sequence: L (large) = closed-loop CI-result → next-hypothesis dispatch (B-0914.2) - the wire-up that turns the tournament-loop substrate into a live system.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(PR #5768): role-refs over first-names + type-safe .state access + boundary semantics doc/test (Copilot threads)

Three threads on pairing.ts + pairing.test.ts:

1. Persona/first-name attributions in current-state code surface violate role-ref convention. Updated:
   - "Per Aaron 2026-05-28 'S M L...'" → "Per the human maintainer (2026-05-28) 'S M L...'"
   - "Otto generates → Kestrel reflects" → "generator-persona generates → verifier-persona reflects (canonical instance preserved in 13th-ferry §33.7)"
   - "Kestrel named in 15th-ferry §33.6" → "named in the 15th-ferry §33.6 substrate-engineering preservation" (citation context preserved; persona-as-substrate-author preserved as reference, not as in-code first-name)
   - Test fixtures: producerId "otto-cli" → "producer-1", verifierId "kestrel" → "verifier-1" (role-refs; ID strings not load-bearing on factory persona registry)

2. Test `.state!` non-null assertions bypassed PairingResult discriminated-union narrowing. Replaced 12 sites with a type-safe `mustState(r)` helper that explicitly asserts `r.ok === true` and throws with the feedback variant if not. If a refactor regresses any call to `ok: false`, the test surfaces the failure-mode substrate immediately instead of silently propagating `undefined` into downstream state. Helper is test-local; no API change.

3. findStaleEmissions strict > semantics confirmed intentional + documented. Added 8-line interface docblock explaining the boundary case (emission at exactly nowMs - emittedAtMs === timeoutMs is NOT stale; gets the boundary tick to be verified) + the conservative-cadence rationale + the switch-to->= condition. Added boundary test that locks in the > behavior at the exact boundary AND at one ms past, so a future ">=" refactor must update both pairing.ts AND this test together.

Tests: 16 pass (15 existing + 1 new boundary test).

Autonomous-loop tick 2026-05-28T12:35Z resolution of PR #5768 BLOCKED gate (3 unresolved Copilot threads only blocker; required checks all green).

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
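The two review fixes above (the `mustState` helper and the strict `>` staleness boundary) can be illustrated with a small sketch. It is not the code from `pairing.ts`: the state here uses readonly arrays instead of the `ReadonlyMap` the commit mentions, the field names `emittedAtMs` and `verifiedAtMs` are assumed from the commit text, and verdicts are omitted for brevity.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface Emission {
  id: string;
  emittedAtMs: number;
}

interface Verification {
  emissionId: string;
  verifiedAtMs: number;
}

interface PairingState {
  readonly emissions: readonly Emission[];
  readonly verifications: readonly Verification[];
}

type PairingFeedback =
  | "DuplicateEmissionId"
  | "VerificationForUnknownEmission"
  | "DuplicateVerification"
  | "VerificationTooEarly";

type PairingResult = { ok: true; state: PairingState } | { ok: false; feedback: PairingFeedback };

const EMPTY_STATE: PairingState = { emissions: [], verifications: [] };

// Each operation returns a new state; the input state is never mutated.
function recordEmission(state: PairingState, emission: Emission): PairingResult {
  if (state.emissions.some((e) => e.id === emission.id)) {
    return { ok: false, feedback: "DuplicateEmissionId" };
  }
  return { ok: true, state: { ...state, emissions: [...state.emissions, emission] } };
}

function recordVerification(state: PairingState, verification: Verification): PairingResult {
  const emission = state.emissions.filter((e) => e.id === verification.emissionId)[0];
  if (!emission) return { ok: false, feedback: "VerificationForUnknownEmission" };
  if (state.verifications.some((v) => v.emissionId === verification.emissionId)) {
    return { ok: false, feedback: "DuplicateVerification" };
  }
  // Causality check: a verification cannot precede its emission.
  if (verification.verifiedAtMs < emission.emittedAtMs) {
    return { ok: false, feedback: "VerificationTooEarly" };
  }
  return { ok: true, state: { ...state, verifications: [...state.verifications, verification] } };
}

// Strict ">": an emission exactly timeoutMs old is NOT yet stale.
function findStaleEmissions(state: PairingState, nowMs: number, timeoutMs: number): Emission[] {
  return state.emissions.filter(
    (e) =>
      !state.verifications.some((v) => v.emissionId === e.id) &&
      nowMs - e.emittedAtMs > timeoutMs,
  );
}

// Test-side helper: narrows the result type or fails loudly with the feedback variant.
function mustState(result: PairingResult): PairingState {
  if (!result.ok) throw new Error(`expected ok result, got ${result.feedback}`);
  return result.state;
}
```

With an emission at 1000 ms and a 500 ms timeout, `findStaleEmissions` reports nothing at `nowMs = 1500` and reports the emission at `nowMs = 1501`, which is the pair of cases the boundary test locks in.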
Summary
S in Aaron's 'S M L all please in that order lol' sequence. Pure-TS evolution agent (mash + refine survivors) closing the tournament loop with TrueSkill substrate (PR #5764).
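Closes the tournament loop:
1. Generate hypotheses (LLM call; out of scope)
2. Rank via TrueSkill (B-0914.1, shipped)
3. Take top-N survivors
4. Mash + refine (this PR, B-0914.5)
5. Loop back to step 2 with refined variants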
12 tests pass / 0 fail.
What this adds
- Survivor<T> interface (generic; TrueSkill conservativeSkill as ranking signal)
- EvolutionStrategy union (simple-merge | cross-pollinate | mutate)
- EvolutionFeedback + EvolutionResult<T> Result-shape per monad-propagation
- RefinedVariant<T> with derivedFrom + composesWith for provenance
- evolveSurvivors<T>(context) + evolveTopN<T> convenience
Composes with substrate
Next per S/M/L sequence
Test plan
🤖 Generated with Claude Code