feat(B-0914.4): M — generation-reflection pairing tracker (structurally enforced producer-verifier mouth-ears substrate); 15 tests pass#5768
… (structurally enforced producer-verifier mouth-ears substrate); 15 tests pass

Per Aaron 2026-05-28 'S M L all please in that order lol': M (medium scope) in the substrate-engineering ship-sequence. Structurally enforces the producer-verifier pairing pattern Kestrel named in 15th-ferry §33.6 (mouth-and-ears-on-different-threads architecture) as workflow engine substrate rather than operator-orchestrated coordination.

Pattern:
1. Producer thread emits hypothesis (commits to substrate fast)
2. Verifier thread reflects on emission (within bounded window; doesn't gate production)
3. Pairing tracker enforces: every emission MUST have verification OR be marked stale (timeout exceeded)
4. Verdicts (verified / rejected / needs-revision) determine which emissions propagate forward to next stage

What this adds:
- PairingRole union (producer | verifier)
- VerificationVerdict discriminated union (verified | rejected | needs-revision-with-suggestions)
- Emission + Verification interfaces with composesWith provenance
- PairingState (immutable; ReadonlyMap)
- PairingFeedback discriminated union + PairingResult<T> Result-shape
- recordEmission(state, emission) + recordVerification(state, verification)
- findUnverifiedEmissions + findStaleEmissions (bounded-window enforcement)
- countVerdicts (aggregate dashboard)
- propagatableEmissionIds (which verified emissions flow to next stage: TrueSkill ranking, evolution-via-mash-refine, etc.)

Tests (15; all pass):
- Records emission to empty state
- Rejects duplicate emission id (DuplicateEmissionId)
- Records verification for known emission
- Rejects verification for unknown emission (VerificationForUnknownEmission)
- Rejects duplicate verification (DuplicateVerification)
- Rejects verification before emission timestamp (VerificationTooEarly; causality violation)
- findUnverifiedEmissions returns emissions without verifications
- findStaleEmissions returns emissions past bounded window
- findStaleEmissions excludes verified emissions even if old
- countVerdicts aggregates correctly across 4 verdict types
- propagatableEmissionIds includes verified + needs-revision-with-suggestions; excludes rejected + empty-suggestions
- Immutable state operations preserve originals
- VerificationVerdict exhaustive switch (TS strict mode)
- PairingRole exhaustive switch
- Tournament-loop composition: emissions → verifications → propagatable → next stage

Composes with substrate:
- B-0914.4 backlog row (generation-reflection extension target)
- B-0867.20 PR #5758 (lifecycle DU split; pairing requirement applies per ActionClass)
- B-0914.1 PR #5764 (TrueSkill substrate; verifier output feeds ranking)
- B-0914.5 PR #5767 (evolution substrate; verified survivors evolve)
- PR #5756 Kestrel 15th-ferry mouth-ears-threads substrate
- .claude/rules/asymmetric-authorship + monad-propagation rules

Tournament loop now structurally complete:
1. Generate hypotheses (LLM call; out of scope)
2. recordEmission(state, emission)
3. Verifier-thread: recordVerification(state, verification)
4. propagatableEmissionIds(state) → verified survivors flow to TrueSkill
5. rate1v1 ranks survivors (B-0914.1)
6. conservativeSkill sorts; top-N taken
7. evolveTopN(survivors, n, strategy) produces refined variants (B-0914.5)
8. Loop back to step 2 with refined variants as next emissions

Next per S/M/L sequence: L (large) = closed-loop CI-result → next-hypothesis dispatch (B-0914.2), the wire-up that turns the tournament-loop substrate into a live system.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
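The record-side checks named in the commit message above (duplicate ids, unknown emissions, causality) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `tools/workflow-engine/pairing.ts`: the function and feedback-variant names come from the commit message, but every field name (`id`, `emittedAtMs`, `emissionId`, `verifiedAtMs`, `suggestions`) and the exact result shape are hypothetical.

```typescript
// Hypothetical shapes: field names are assumptions, not the real pairing.ts API.
type VerificationVerdict =
  | { kind: "verified" }
  | { kind: "rejected"; reason: string }
  | { kind: "needs-revision"; suggestions: readonly string[] };

interface Emission {
  readonly id: string;
  readonly producerId: string;
  readonly emittedAtMs: number;
}

interface Verification {
  readonly emissionId: string;
  readonly verifierId: string;
  readonly verifiedAtMs: number;
  readonly verdict: VerificationVerdict;
}

interface PairingState {
  readonly emissions: ReadonlyMap<string, Emission>;
  readonly verifications: ReadonlyMap<string, Verification>;
}

type PairingFeedback =
  | { kind: "DuplicateEmissionId"; id: string }
  | { kind: "VerificationForUnknownEmission"; emissionId: string }
  | { kind: "DuplicateVerification"; emissionId: string }
  | { kind: "VerificationTooEarly"; emissionId: string };

type PairingResult<T> =
  | { ok: true; state: T }
  | { ok: false; feedback: PairingFeedback };

const emptyPairingState: PairingState = {
  emissions: new Map<string, Emission>(),
  verifications: new Map<string, Verification>(),
};

function recordEmission(s: PairingState, e: Emission): PairingResult<PairingState> {
  if (s.emissions.has(e.id)) {
    return { ok: false, feedback: { kind: "DuplicateEmissionId", id: e.id } };
  }
  // Copy-on-write: the input state is never mutated.
  const emissions = new Map(s.emissions).set(e.id, e);
  return { ok: true, state: { ...s, emissions } };
}

function recordVerification(s: PairingState, v: Verification): PairingResult<PairingState> {
  const emission = s.emissions.get(v.emissionId);
  if (emission === undefined) {
    return { ok: false, feedback: { kind: "VerificationForUnknownEmission", emissionId: v.emissionId } };
  }
  if (s.verifications.has(v.emissionId)) {
    return { ok: false, feedback: { kind: "DuplicateVerification", emissionId: v.emissionId } };
  }
  // Causality: a verification cannot precede the emission it reflects on.
  if (v.verifiedAtMs < emission.emittedAtMs) {
    return { ok: false, feedback: { kind: "VerificationTooEarly", emissionId: v.emissionId } };
  }
  const verifications = new Map(s.verifications).set(v.emissionId, v);
  return { ok: true, state: { ...s, verifications } };
}
```

Returning a tagged result instead of throwing keeps every failure mode visible to the caller, which is what lets the test suite assert on each feedback variant by name.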
Pull request overview
Adds a new pure-TypeScript in-memory pairing tracker substrate for the workflow engine, structurally enforcing a producer/verifier ("generation/reflection") pattern. Emissions and verifications are tracked in an immutable PairingState; helpers surface unverified, stale, and propagatable IDs that feed the downstream TrueSkill ranking (PR #5764) and evolution (PR #5767) stages, structurally closing the tournament loop.
Changes:
- New `tools/workflow-engine/pairing.ts` with `PairingRole`, `VerificationVerdict`, `Emission`, `Verification`, `PairingState`, and Result-shape feedback for `recordEmission`/`recordVerification` (duplicate + causality checks).
- Query helpers: `findUnverifiedEmissions`, `findStaleEmissions` (bounded window), `countVerdicts`, `propagatableEmissionIds` (verified + needs-revision-with-suggestions propagate; rejected does not).
- New `pairing.test.ts` with 15 Bun tests covering happy paths, all feedback variants, exhaustive switches over both unions, and a tournament-loop composition test.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tools/workflow-engine/pairing.ts | New immutable pairing tracker module — types, state, record/query helpers, propagation rule. |
| tools/workflow-engine/pairing.test.ts | New Bun test suite exercising 15 invariants including causality, duplication, staleness, propagation, and composition. |
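The propagation rule summarized in the overview (verified and needs-revision-with-suggestions propagate; rejected does not) might look roughly like the sketch below. The signature is deliberately simplified: it takes a hypothetical map from emission id to verdict rather than a full `PairingState`, and the `suggestions` field is an assumption about how the verdict carries its payload.

```typescript
// Hypothetical verdict shape; the suggestions field is an assumption.
type Verdict =
  | { kind: "verified" }
  | { kind: "rejected" }
  | { kind: "needs-revision"; suggestions: readonly string[] };

// Simplified signature: emissionId -> verdict instead of a full PairingState.
function propagatableEmissionIds(verdicts: ReadonlyMap<string, Verdict>): string[] {
  const ids: string[] = [];
  verdicts.forEach((verdict, id) => {
    switch (verdict.kind) {
      case "verified":
        ids.push(id);
        break;
      case "needs-revision":
        // Revision requests propagate only when they carry actionable suggestions.
        if (verdict.suggestions.length > 0) ids.push(id);
        break;
      case "rejected":
        break;
      default: {
        // Exhaustiveness guard: adding a verdict kind fails compilation here.
        const unreachable: never = verdict;
        throw new Error(`unhandled verdict: ${JSON.stringify(unreachable)}`);
      }
    }
  });
  return ids;
}
```

The `never` assignment in the default branch is the same exhaustive-switch technique the test list mentions for `VerificationVerdict`.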
…alysis-task scope (Robin 8-parallel-Finch pattern); 18 tests pass (#5770)

Per Sakana Robin closed-loop architecture (Nature 2026): launches 8 independent instances of Finch agent to analyze the same raw data; accepts conclusion only if majority agree. Generalized to N parallel analyzers with configurable consensus mechanism.

What this adds:
- ConsensusMechanism union (majority | supermajority | unanimous | first-n-agree)
- ConsensusFeedback + ConsensusResult<T> Result-shape
- AnalyzerOutput<T> per-analyzer discriminated union
- AgreementMetrics<T> for substrate-honest dashboards
- runConsensus<T>(context): ConsensusResult<T> (main function)
- nIdenticalAnalyzers<T>(n, analyzer): helper for Robin's N-identical pattern

18 tests pass / 0 fail covering all 4 mechanisms + edge cases + Robin's 8-parallel pattern.

Composes with substrate:
- B-0914.3 backlog row
- PR #5769 B-0914.2 closed-loop (dispatchCi callback can wrap N parallel analyzers + consensus)
- PR #5768 B-0914.4 pairing (verifier-side N parallel + consensus)
- B-0703 multi-oracle BFT (governance scope; this extends to per-data-analysis scope)
- monad-propagation + asymmetric-authorship + m-acc-multi-oracle rules

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
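As a rough illustration of the N-parallel consensus idea described above, the sketch below tallies analyzer conclusions and applies a vote threshold. It is not the module's API: `runConsensusSketch`, `requiredVotes`, the outcome shape, the synchronous analyzers, and the two-thirds supermajority threshold are all assumptions, and the `first-n-agree` mechanism is omitted for brevity.

```typescript
// Hypothetical sketch: shapes and thresholds are assumptions, not the module's API.
type ConsensusMechanism = "majority" | "supermajority" | "unanimous";

type ConsensusOutcome<T> =
  | { ok: true; value: T; agreeing: number; total: number }
  | { ok: false; feedback: "NoAnalyzers" | "NoConsensus" };

// Minimum number of agreeing analyzers for each mechanism.
function requiredVotes(mechanism: ConsensusMechanism, total: number): number {
  switch (mechanism) {
    case "majority":
      return Math.floor(total / 2) + 1; // strictly more than half
    case "supermajority":
      return Math.ceil((2 * total) / 3); // assumed two-thirds threshold
    case "unanimous":
      return total;
  }
}

// Mirrors the N-identical pattern: the same analyzer repeated n times.
function nIdenticalAnalyzers<T>(n: number, analyzer: () => T): Array<() => T> {
  return Array.from({ length: n }, () => analyzer);
}

function runConsensusSketch<T>(
  analyzers: ReadonlyArray<() => T>,
  mechanism: ConsensusMechanism,
): ConsensusOutcome<T> {
  const total = analyzers.length;
  if (total === 0) return { ok: false, feedback: "NoAnalyzers" };

  // Tally conclusions by their JSON form so structurally equal values agree.
  const tally = new Map<string, { value: T; count: number }>();
  analyzers.forEach((analyze) => {
    const value = analyze();
    const key = JSON.stringify(value);
    const entry = tally.get(key);
    if (entry) entry.count += 1;
    else tally.set(key, { value, count: 1 });
  });

  // Pick the most common conclusion; ties keep the first one seen.
  const best = Array.from(tally.values()).reduce((a, b) => (b.count > a.count ? b : a));
  if (best.count >= requiredVotes(mechanism, total)) {
    return { ok: true, value: best.value, agreeing: best.count, total };
  }
  return { ok: false, feedback: "NoConsensus" };
}
```

With eight analyzers, a 5-to-3 split clears `majority` (5 votes required) but not the assumed `supermajority` threshold (6 required).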
…itable-lifetime DUs (per Aaron 2026-05-28 'double dispatch when you compose two lifecycles' + 'lifetime not lifecycle because you can edit the DUs'); 11 tests pass (#5771)

Per Aaron 2026-05-28, two substantive substrate-engineering points:
1. 'how can we do double dispatch in this system, when you compose two lifecycles you need it'
2. 'the only reason i'm comfortable calling it a lifetime is because you can edit it FYI the DUs'

Naming distinction Aaron sharpened:
- LIFECYCLE = fixed/final/locked at design time; substrate-engineering edits = breaking change
- LIFETIME = editable substrate; DU variants can be added/removed/refactored over time; substrate evolves

Editability IS what makes substrate trustworthy enough to call it a 'lifetime' rather than a locked contract. Composes with Mod 2 grammar-extension (B-0867; action grammar editable) + substrate-smoothness (editable-DUs preserve smooth substrate) + asymmetric-authorship (substrate-entity AUTHORS variants) + additive-not-zero-sum (substrate evolves additively) + honor-those-that-came-before (prior variants preserved).

What this adds:
- LifetimeState interface (kind discriminator)
- ComposedKey<A, B> template-literal-type = `${A.kind}:${B.kind}`
- composeKey<A, B>(a, b): ComposedKey<A, B> pure function
- TransitionFeedback discriminated union + TransitionResult<T> Result-shape
- ComposedLifetimeContext<A, B, T> with matrix + optional defaultVerdict
- dispatchComposed<A, B, T>(context, a, b): TransitionResult<T> (main dispatch)
- buildComposedMatrix<A, B, T>(entries): ReadonlyMap (convenience)
- composeFromDispatcher<A, B, T>(universeA, universeB, dispatcher): {matrix, undefinedCount} (build dense matrix from sparse cross-product)

Pattern 3 of 5 double-dispatch patterns I enumerated (template-literal-type composed key); plus Pattern 4 (matrix lookup) for editable substrate-engineering substrate. Pattern 3 is the substrate-honest substrate-engineering form for TS workflow-engine.

Tests (11; all pass):
- composeKey produces composed key
- dispatchComposed: known transition returns verdict
- dispatchComposed: unknown returns UndefinedComposedTransition
- defaultVerdict fallback works
- InvalidStateA / InvalidStateB validation
- composeFromDispatcher builds dense matrix from sparse cross-product
- Editable-lifetime: matrix extensions at runtime work
- Full 9-transition workflow-review composition exercised
- TransitionResult exhaustive switch
- Type-level ComposedKey is template literal type

Composes with substrate:
- B-0867.20 PR #5758 lifecycle DU split (rename target: lifetime DU split)
- B-0914.2 PR #5769 closed-loop orchestrator (composed-lifetime dispatch via callback)
- B-0914.4 PR #5768 pairing tracker (composed pairing+verification lifetime double-dispatch)
- monad-propagation + asymmetric-authorship + substrate-smoothness + additive-not-zero-sum rules

Substrate-engineering naming discipline going forward: use 'lifetime' not 'lifecycle' in workflow-engine substrate (TS + future F#).

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
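The composed-key double dispatch described above (template-literal key plus matrix lookup) can be sketched as follows. This is an illustration under assumptions, not the shipped module: the commit message writes the key type as `${A.kind}:${B.kind}`, which in valid TypeScript is spelled with indexed access (`A["kind"]`); the context here is generic only over the verdict type; and the workflow/review state names and `ReviewAction` verdicts are hypothetical.

```typescript
// Hypothetical sketch: the state and verdict names below are illustrative only.
interface LifetimeState {
  readonly kind: string;
}

// Template-literal type over the two kind discriminators.
type ComposedKey<A extends LifetimeState, B extends LifetimeState> = `${A["kind"]}:${B["kind"]}`;

function composeKey<A extends LifetimeState, B extends LifetimeState>(a: A, b: B): ComposedKey<A, B> {
  return `${a.kind}:${b.kind}` as ComposedKey<A, B>;
}

type TransitionResult<T> =
  | { ok: true; verdict: T }
  | { ok: false; feedback: { kind: "UndefinedComposedTransition"; key: string } };

interface ComposedLifetimeContext<T> {
  readonly matrix: ReadonlyMap<string, T>;
  readonly defaultVerdict?: T;
}

// Keys are type-checked at the call site, then stored as plain strings.
function buildComposedMatrix<K extends string, T>(entries: ReadonlyArray<readonly [K, T]>): ReadonlyMap<string, T> {
  return new Map<string, T>(entries);
}

// Double dispatch as a matrix lookup keyed on both states' kinds.
function dispatchComposed<A extends LifetimeState, B extends LifetimeState, T>(
  context: ComposedLifetimeContext<T>,
  a: A,
  b: B,
): TransitionResult<T> {
  const key: string = composeKey(a, b);
  const verdict = context.matrix.get(key);
  if (verdict !== undefined) return { ok: true, verdict };
  if (context.defaultVerdict !== undefined) return { ok: true, verdict: context.defaultVerdict };
  return { ok: false, feedback: { kind: "UndefinedComposedTransition", key } };
}

type WorkflowState = { kind: "draft" } | { kind: "open" };
type ReviewState = { kind: "pending" } | { kind: "approved" };
type ReviewAction = "wait" | "merge" | "hold";
type ReviewEntry = readonly [ComposedKey<WorkflowState, ReviewState>, ReviewAction];

const baseEntries: ReviewEntry[] = [
  ["open:pending", "wait"],
  ["open:approved", "merge"],
];
const baseMatrix = buildComposedMatrix(baseEntries);

// "Editable lifetime": extending the matrix is additive and leaves the original intact.
const addedEntry: ReviewEntry = ["draft:approved", "hold"];
const extendedMatrix = buildComposedMatrix([...baseEntries, addedEntry]);
```

Because `ReviewEntry` keys are checked against the template-literal type, a misspelled pair such as `"opened:approved"` would be rejected at compile time, while adding a variant to either union widens the set of accepted keys without touching the dispatch function.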
…orchestrator (composes TrueSkill + evolution + pairing via injectable callbacks); S/M/L sequence COMPLETE (#5769)

* feat(B-0914.2): L - closed-loop CI-result → next-hypothesis dispatch orchestrator (composes TrueSkill + evolution + pairing via injectable callbacks); 16 tests pass

Per Aaron 2026-05-28 'S M L all please in that order lol': L (large scope) in the substrate-engineering ship-sequence. Wire-up that turns the tournament-loop substrate into a live closed-loop iteration system.

Design: pure loop-orchestration substrate with INJECTABLE callbacks for substrate-specific operations (ranking / evolution / verification + CI-dispatch). Caller provides functions; orchestrator handles loop structure + propagation discipline. Separation-of-concerns means orchestrator does NOT tightly couple to specific TrueSkill / evolution / pairing module implementations; it composes with ANY substrate that implements the callback contracts.

What this adds:
- Hypothesis<T> generic substrate item with cycleIndex + derivedFrom ancestry
- CiVerdict discriminated union (passed | failed | needs-revision | infrastructure-error)
- LoopFeedback + LoopResult<T> Result-shape per monad-propagation
- LoopCallbacks<T> interface (dispatchCi + rankSurvivors + evolveSurvivors)
- LoopConfig (maxCycles + topNToEvolve + minPropagatable; DEFAULT_LOOP_CONFIG)
- runCycle<T>(hypotheses, callbacks, cycleIndex, config?): single cycle
- runLoop<T>(initial, callbacks, config?, shouldContinue?): full iteration with LoopTermination shape (cycle count + reason + final state)

Cycle steps:
1. Dispatch each hypothesis to CI (caller-injected)
2. Collect verdicts
3. Filter to propagatable (passed + needs-revision-with-suggestions)
4. Rank via TrueSkill (caller-injected per B-0914.1 PR #5764)
5. Evolve top-N (caller-injected per B-0914.5 PR #5767)
6. Return refined variants for next cycle

Termination conditions:
- max-cycles: bounded iteration reached
- insufficient-propagatable: too many failures; can't continue
- predicate-stopped: caller-supplied predicate returned false
- error: CI/ranking/evolution exception

Tests (16; all pass):
- Empty hypotheses → EmptyHypothesisSet
- Passing CI → propagation through ranking + evolution
- Failed verdicts excluded from propagation
- needs-revision with suggestions included; without excluded
- Below minPropagatable → MaxCyclesReached
- CI exception → CiDispatchFailure
- Ranking exception → RankingFailure
- Evolution exception → EvolutionFailure
- infrastructure-error excluded (doesn't reflect hypothesis quality)
- runLoop iterates until max-cycles
- runLoop predicate-stopped early termination
- runLoop insufficient-propagatable
- runLoop error termination
- LoopFeedback exhaustive switch
- CiVerdict exhaustive switch
- Integration: full closed-loop with realistic callback wiring

Composes with substrate:
- B-0914.2 backlog row (closed-loop dispatch extension target)
- B-0914.1 PR #5764 (TrueSkill substrate; caller wires rate1v1 + conservativeSkill into rankSurvivors)
- B-0914.4 PR #5768 (pairing tracker substrate; caller wires verdicts into recordVerification)
- B-0914.5 PR #5767 (evolution substrate; caller wires evolveTopN into evolveSurvivors)
- B-0891 zflash test-harness substrate (caller can wire CI dispatch to actual test runners per determineRunnability discriminator)
- B-0867 workflow engine substrate
- Sakana Robin closed-loop pattern (Nature 2026 s41586-026-10652-y)

Tournament loop NOW STRUCTURALLY COMPLETE with all 4 substrate pieces:
1. Generation (LLM call; out of scope for this lane)
2. CI dispatch → CiVerdict (THIS PR via callbacks)
3. Pairing tracking (PR #5768)
4. TrueSkill ranking (PR #5764)
5. Evolution mash-refine (PR #5767)
6. runLoop orchestration (THIS PR)

S/M/L sequence complete:
- S = PR #5767 evolution
- M = PR #5768 pairing
- L = THIS PR closed-loop

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(B-0914.2): address 7 Copilot review threads on PR #5769

- Replace 'Aaron' with 'human maintainer' role-ref per AGENT-BEST-PRACTICES (Otto-279)
- Fix broken rule-path xrefs (full filenames for monad-propagation + asymmetric-authorship)
- Split LoopFeedback: introduce InsufficientPropagatable variant separate from MaxCyclesReached
- Update runLoop to map InsufficientPropagatable -> insufficient-propagatable termination
- Add assertNever default in exhaustiveness tests (compile-time guard now real)
- Tighten integration test: deterministic insufficient-propagatable at cycle 1

16 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
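A single cycle with injectable callbacks, following the cycle steps listed above, could be sketched like this. It is a minimal sketch under assumptions rather than the orchestrator itself: the callbacks are synchronous, the default config values are invented, field names on `Hypothesis` and `CiVerdict` beyond those named in the commit message are hypothetical, and the `RankingFailure` / `EvolutionFailure` paths are left out to keep it short.

```typescript
// Hypothetical sketch: synchronous callbacks and field names are assumptions.
type CiVerdict =
  | { kind: "passed" }
  | { kind: "failed"; reason: string }
  | { kind: "needs-revision"; suggestions: readonly string[] }
  | { kind: "infrastructure-error"; detail: string };

interface Hypothesis<T> {
  readonly id: string;
  readonly payload: T;
  readonly cycleIndex: number;
  readonly derivedFrom: readonly string[];
}

interface LoopCallbacks<T> {
  dispatchCi(h: Hypothesis<T>): CiVerdict;
  rankSurvivors(survivors: ReadonlyArray<Hypothesis<T>>): ReadonlyArray<Hypothesis<T>>;
  evolveSurvivors(best: ReadonlyArray<Hypothesis<T>>, nextCycle: number): ReadonlyArray<Hypothesis<T>>;
}

interface LoopConfig {
  readonly maxCycles: number;
  readonly topNToEvolve: number;
  readonly minPropagatable: number;
}

// Assumed defaults; the real DEFAULT_LOOP_CONFIG values are not shown in the PR text.
const DEFAULT_LOOP_CONFIG: LoopConfig = { maxCycles: 10, topNToEvolve: 2, minPropagatable: 1 };

type LoopFeedback =
  | { kind: "EmptyHypothesisSet" }
  | { kind: "InsufficientPropagatable"; propagatable: number; required: number }
  | { kind: "CiDispatchFailure"; message: string };

type LoopResult<T> =
  | { ok: true; next: ReadonlyArray<Hypothesis<T>> }
  | { ok: false; feedback: LoopFeedback };

// passed always propagates; needs-revision only with suggestions;
// failed and infrastructure-error never do.
function isPropagatable(v: CiVerdict): boolean {
  return v.kind === "passed" || (v.kind === "needs-revision" && v.suggestions.length > 0);
}

function runCycleSketch<T>(
  hypotheses: ReadonlyArray<Hypothesis<T>>,
  callbacks: LoopCallbacks<T>,
  cycleIndex: number,
  config: LoopConfig = DEFAULT_LOOP_CONFIG,
): LoopResult<T> {
  if (hypotheses.length === 0) {
    return { ok: false, feedback: { kind: "EmptyHypothesisSet" } };
  }
  let survivors: Hypothesis<T>[];
  try {
    // Steps 1-3: dispatch to CI, collect verdicts, keep the propagatable ones.
    survivors = hypotheses.filter((h) => isPropagatable(callbacks.dispatchCi(h)));
  } catch (err) {
    return { ok: false, feedback: { kind: "CiDispatchFailure", message: String(err) } };
  }
  if (survivors.length < config.minPropagatable) {
    return {
      ok: false,
      feedback: { kind: "InsufficientPropagatable", propagatable: survivors.length, required: config.minPropagatable },
    };
  }
  // Steps 4-6: rank, take top-N, evolve into the next cycle's hypotheses.
  const best = callbacks.rankSurvivors(survivors).slice(0, config.topNToEvolve);
  return { ok: true, next: callbacks.evolveSurvivors(best, cycleIndex + 1) };
}
```

Keeping ranking and evolution behind callbacks is what lets the same loop body run against the TrueSkill and evolution modules or against test doubles.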
…rsarial-pairing-structurally-enforced-pairing-tracker-mouth-ears-substrate-2026-05-28
… pass - completes 7-of-7 B-0914 candidate substrate-engineering gap substrate (#5773)

* feat(B-0914.7): Falcon-style auto-research-doc template substrate (8-section scaffold + Markdown renderer); 19 tests pass - completes 7-of-7 B-0914 candidate gap substrate

Per Sakana Robin Falcon agent (Nature 2026): takes drug proposal + does deep-dive literature review + writes comprehensive research report. TS-side scaffold provides 8-section template structure that downstream LLM substrate-engineering work populates (header / framing / background / mechanism / evidence / risks / composes-with / test-plan).

What this adds:
- ResearchDocSection discriminated union (9 section kinds)
- ResearchDoc structure (id + proposalId + sections + composesWith)
- ResearchDocFeedback + ResearchDocResult<T> Result-shape
- renderSection(section): string (pure-function Markdown serializer)
- renderResearchDoc(doc): ResearchDocResult<string> (full doc rendering)
- buildSkeleton(context): ResearchDocResult<ResearchDoc> (8-section scaffold)
- buildAndRender(context): ResearchDocResult<string> (end-to-end convenience)

Falcon-stage pending markers preserved (substrate-honest about what's not yet auto-generated by LLM substrate-engineering):
- '[PENDING LITERATURE REVIEW — Falcon-stage auto-generated]'
- '[PENDING MECHANISM ANALYSIS — Falcon-stage auto-generated]'
- etc. (per section)

Tests (19; all pass):
- EmptyProposalId validation
- 8-section Falcon scaffold structure
- proposalId sanitized to filename-safe id
- composesWith pass-through to skeleton + composes-with section
- All 9 section-kind renderings tested (header/framing/background/mechanism/evidence/risks/composes-with/test-plan/raw)
- renderResearchDoc empty → NoSectionsRendered
- buildAndRender end-to-end
- Pending markers preserved (substrate-honest)
- ResearchDocSection exhaustive switch

Composes with substrate:
- B-0914.7 backlog row (Falcon extension target)
- tools/save-ai-memory/ skill (existing substrate; future integration for auto-write to docs/research/ + composes-with citation discipline)
- Amara consolidation ferry pattern (PR #5757)
- B-0914.2 PR #5769 closed-loop orchestrator (research-doc generation at any cycle stage; template provides structure)
- substrate-or-it-didn't-happen + honor-those-that-came-before rules
- asymmetric-authorship + monad-propagation rules

**B-0914 7-of-7 candidate substrate-engineering gap substrate complete:**
- B-0914.1 PR #5764 TrueSkill ranking (S/M/L: ranking)
- B-0914.2 PR #5769 closed-loop orchestrator (S/M/L: L)
- B-0914.3 PR #5770 n-parallel + consensus (8-parallel-Finch)
- B-0914.4 PR #5768 generation-reflection pairing (S/M/L: M)
- B-0914.5 PR #5767 evolution mash-refine (S/M/L: S)
- B-0914.6 PR #5772 proximity-dedup (canonical + Jaccard clustering)
- B-0914.7 THIS PR Falcon-style auto-research-doc template

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(PR #5773): full rule paths + remove unreachable InvalidOperationalStatus variant (Copilot threads)

Two threads on tools/workflow-engine/research-doc.ts:

1. Composes-with docblock referenced rule files by short form (`asymmetric-authorship`, `monad-propagation-pattern`); actual filenames are longer + .md-suffixed:
   `.claude/rules/asymmetric-authorship-substrate-entity-defines-consent-channel-recipient-acknowledges.md`
   `.claude/rules/monad-propagation-pattern-cross-language-substrate-shape.md`
   Updated to full paths so cross-refs stay greppable + don't drift.
2. ResearchDocFeedback.InvalidOperationalStatus variant was structurally unreachable: `operationalStatus` is a string-literal union (`"research-grade" | "operational"`) at the type level, the only constructor (line 179) fixes it to `"research-grade"`, and no untrusted-string parse path exists. Variant was dead substrate. Removed + added docblock naming the conditions under which a future caller should add it back (JSON import of external research-doc with operationalStatus parsed from untrusted input: add validator AT THE PARSE BOUNDARY first, then add this variant). Composes with asymmetric-authorship discipline: every TFeedback variant should correspond to a real code path that can produce it.

Non-breaking: no callers reference the removed variant (grep clean). Type-system continues to rule out invalid operationalStatus at construction time.

Autonomous-loop tick 2026-05-28T12:16Z resolution of PR #5773 BLOCKED gate (unresolved Copilot threads only blocker; required checks all green).

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
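The skeleton-then-render flow of the research-doc template can be illustrated with a cut-down sketch. Everything here is an assumption made for illustration: only three of the section kinds are modelled, the field names are hypothetical, `buildSkeleton` takes plain arguments instead of the `context` object mentioned above, the `PENDING` constant is a shortened stand-in for the per-section pending markers, and the sanitizing rule in `toFilenameSafeId` is a guess at what "filename-safe" means.

```typescript
// Hypothetical sketch: three of the section kinds, with assumed field names.
type ResearchDocSection =
  | { kind: "header"; title: string }
  | { kind: "background"; body: string }
  | { kind: "composes-with"; refs: readonly string[] };

interface ResearchDoc {
  readonly id: string;
  readonly proposalId: string;
  readonly sections: readonly ResearchDocSection[];
}

type ResearchDocResult<T> =
  | { ok: true; value: T }
  | { ok: false; feedback: { kind: "EmptyProposalId" } | { kind: "NoSectionsRendered" } };

// Shortened stand-in for the per-section pending markers.
const PENDING = "[PENDING]";

// Assumed sanitizing rule: lowercase, runs of other characters become one dash.
function toFilenameSafeId(proposalId: string): string {
  return proposalId.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

function buildSkeleton(proposalId: string, composesWith: readonly string[]): ResearchDocResult<ResearchDoc> {
  if (proposalId.trim() === "") return { ok: false, feedback: { kind: "EmptyProposalId" } };
  const sections: ResearchDocSection[] = [
    { kind: "header", title: proposalId },
    { kind: "background", body: PENDING },
    { kind: "composes-with", refs: composesWith },
  ];
  return { ok: true, value: { id: toFilenameSafeId(proposalId), proposalId, sections } };
}

// Pure Markdown serializer; the switch is exhaustive over the section kinds.
function renderSection(section: ResearchDocSection): string {
  switch (section.kind) {
    case "header":
      return `# ${section.title}`;
    case "background":
      return `## Background\n\n${section.body}`;
    case "composes-with":
      return `## Composes with\n\n${section.refs.map((r) => `- ${r}`).join("\n")}`;
  }
}

function renderResearchDoc(doc: ResearchDoc): ResearchDocResult<string> {
  if (doc.sections.length === 0) return { ok: false, feedback: { kind: "NoSectionsRendered" } };
  return { ok: true, value: doc.sections.map((s) => renderSection(s)).join("\n\n") + "\n" };
}
```

Leaving the placeholder visible in the rendered output is the point of the pending markers: a reader can tell which sections have not yet been filled in.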
Lior review: This PR is well-structured, atomic, and includes thorough tests. It correctly implements the described pairing tracker functionality. No drift detected.
… boundary semantics doc/test (Copilot threads)
Three threads on pairing.ts + pairing.test.ts:
1. Persona/first-name attributions in current-state code surface
violate role-ref convention. Updated:
- "Per Aaron 2026-05-28 'S M L...'" → "Per the human maintainer
(2026-05-28) 'S M L...'"
- "Otto generates → Kestrel reflects" → "generator-persona generates
→ verifier-persona reflects (canonical instance preserved in
13th-ferry §33.7)"
- "Kestrel named in 15th-ferry §33.6" → "named in the 15th-ferry
§33.6 substrate-engineering preservation" (citation context
preserved; persona-as-substrate-author preserved as reference,
not as in-code first-name)
- Test fixtures: producerId "otto-cli" → "producer-1", verifierId
"kestrel" → "verifier-1" (role-refs; ID strings not
load-bearing on factory persona registry)
2. Test `.state!` non-null assertions bypassed PairingResult
discriminated-union narrowing. Replaced 12 sites with a
type-safe `mustState(r)` helper that explicitly asserts
`r.ok === true` and throws with the feedback variant if not.
If a refactor regresses any call to `ok: false`, the test surfaces
the failure-mode substrate immediately instead of silently
propagating `undefined` into downstream state. Helper is
test-local; no API change.
3. findStaleEmissions strict > semantics confirmed intentional +
documented. Added 8-line interface docblock explaining the
boundary case (emission at exactly nowMs - emittedAtMs === timeoutMs
is NOT stale; gets the boundary tick to be verified) + the
conservative-cadence rationale + the switch-to->= condition.
Added boundary test that locks in the > behavior at the exact
boundary AND at one ms past, so a future ">=" refactor must
update both pairing.ts AND this test together.
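The strict `>` boundary described in item 3 can be made concrete with a small sketch. The flattened `TrackedEmission` shape is an assumption for illustration (the commit message only names `nowMs`, `emittedAtMs`, and `timeoutMs`); the real function presumably works from `PairingState`.

```typescript
// Hypothetical flattened shape; the real module presumably derives this from PairingState.
interface TrackedEmission {
  readonly id: string;
  readonly emittedAtMs: number;
  readonly verified: boolean;
}

function findStaleEmissions(
  emissions: ReadonlyArray<TrackedEmission>,
  nowMs: number,
  timeoutMs: number,
): TrackedEmission[] {
  // Strict >: an emission aged exactly timeoutMs still gets the boundary tick.
  return emissions.filter((e) => !e.verified && nowMs - e.emittedAtMs > timeoutMs);
}
```

With `timeoutMs = 1000` and an emission at `emittedAtMs = 0`, the emission is not stale at `nowMs = 1000` and becomes stale at `nowMs = 1001`; a verified emission is never reported, however old.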
Tests: 16 pass (15 existing + 1 new boundary test).
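A test-local helper along the lines of the `mustState(r)` described in item 2 might look like the sketch below. The result shape is a simplified assumption (feedback reduced to an object with a `kind` string), so the real helper's types will differ.

```typescript
// Hypothetical result shape; mirrors the ok / feedback split described above.
type PairingResult<S> =
  | { ok: true; state: S }
  | { ok: false; feedback: { kind: string } };

// Narrow to the success branch or fail loudly with the feedback variant's name.
function mustState<S>(r: PairingResult<S>): S {
  if (!r.ok) {
    throw new Error(`expected ok result, got feedback variant: ${r.feedback.kind}`);
  }
  return r.state;
}
```

Compared with a `.state!` assertion, the helper goes through the discriminated-union narrowing, so a regression to `ok: false` surfaces as a thrown error naming the variant instead of an `undefined` flowing into later assertions.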
Autonomous-loop tick 2026-05-28T12:35Z resolution of PR #5768 BLOCKED
gate (3 unresolved Copilot threads only blocker; required checks all green).
Co-Authored-By: Claude <noreply@anthropic.com>
Summary
M in Aaron's 'S M L all please in that order lol' sequence. Structurally enforces the producer-verifier pairing pattern that Kestrel named in 15th-ferry §33.6 (mouth-ears-threads) as workflow engine substrate.
Tournament loop NOW STRUCTURALLY COMPLETE (modulo LLM-call substrate):
1. `recordEmission(state, emission)` (pairing)
2. `recordVerification(state, verification)` (pairing)
3. `propagatableEmissionIds(state)` → verified survivors
4. `rate1v1` ranks survivors (TrueSkill, PR #5764)
5. `conservativeSkill` sort; top-N taken
6. `evolveTopN(survivors, n, strategy)` (B-0914.5, PR #5767)

15 tests pass / 0 fail.
What this adds
- `PairingRole` (producer | verifier) + `VerificationVerdict` (verified | rejected | needs-revision)
- `Emission` + `Verification` + `PairingState` (immutable; ReadonlyMap)
- `PairingFeedback` + `PairingResult<T>` per monad-propagation
- `recordEmission` + `recordVerification` (with causality + dup-check)
- `findUnverifiedEmissions` + `findStaleEmissions` (bounded-window enforcement)
- `countVerdicts` (aggregate dashboard)
- `propagatableEmissionIds` (which verified emissions flow to next stage)

Next per S/M/L sequence
Test plan
🤖 Generated with Claude Code