feat(B-0914.6): proximity-agent substrate-engineering substrate de-duplication (canonical-form + Jaccard clustering); 19 tests pass by AceHack · Pull Request #5772 · Lucent-Financial-Group/Zeta

AceHack · 2026-05-28T11:26:37Z

Summary

Google co-scientist proximity agent pattern generalized to TS-side substrate. Two de-dup mechanisms: canonical-form normalization (deterministic) + Jaccard-similarity clustering (lightweight; no embedding model).
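
As an illustration of the deterministic mechanism, here is a minimal sketch of canonical-form clustering. It is not the code from `proximity.ts`: the `Cluster<T>` and `CanonicalFn<T>` shapes are inferred from the names in this PR, `sortedKeyJson` is a hypothetical canonicalizer, and the Result wrapper (`EmptyCorpus` and similar) is omitted for brevity.

```typescript
// Assumed shapes; the real types in proximity.ts may differ.
interface Cluster<T> {
  representative: T;
  members: T[];
  canonicalForm: string;
}
type CanonicalFn<T> = (item: T) => string;

// Hypothetical canonicalizer: serializes with object keys sorted at every depth,
// so values that differ only in key order produce the same string.
function sortedKeyJson(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) {
    return "[" + value.map(sortedKeyJson).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const keys = Object.keys(record).sort();
    const parts = keys.map((k) => JSON.stringify(k) + ":" + sortedKeyJson(record[k]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Groups items whose canonical forms are equal. The first item seen for a form
// becomes the representative, so pre-sorting the corpus controls which one wins.
function clusterByCanonicalSketch<T>(corpus: readonly T[], canonicalFn: CanonicalFn<T>): Cluster<T>[] {
  const byForm = new Map<string, Cluster<T>>();
  for (const item of corpus) {
    const form = canonicalFn(item);
    const existing = byForm.get(form);
    if (existing) {
      existing.members.push(item);
    } else {
      byForm.set(form, { representative: item, members: [item], canonicalForm: form });
    }
  }
  return Array.from(byForm.values());
}

// The first two items differ only in key order and land in one cluster.
const canonicalClusters = clusterByCanonicalSketch(
  [{ id: "a", kind: "x" }, { kind: "x", id: "a" }, { id: "b", kind: "x" }],
  sortedKeyJson,
);
```

Because a `Map` iterates in insertion order, the clusters come back in first-seen order, which keeps the output deterministic for a given input order.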

19 tests pass / 0 fail.

Composes with

- B-0914.5 PR #5767 "feat(B-0914.5): S — pure-TS evolution agent (mash + refine survivors); closes tournament loop with TrueSkill (S/M/L sequence per Aaron)": evolution (de-dup Survivor list before mash)
- B-0914.2 PR #5769 "feat(B-0914.2): L — closed-loop CI-result → next-hypothesis dispatch orchestrator (composes TrueSkill + evolution + pairing via injectable callbacks); S/M/L sequence COMPLETE": closed-loop (de-dup pre-CI-dispatch)
- verify-existing-substrate-before-authoring (proximity IS substrate-inventory at runtime scope)
- additive-not-zero-sum + monad-propagation + asymmetric-authorship

🤖 Generated with Claude Code

…plication (canonical-form + Jaccard-similarity clustering); 19 tests pass

Per Google co-scientist proximity agent (Nature 2026): maps ideas into high-dimensional space + groups similar variants to prevent wasting compute on substantively-identical proposals. Generalized to TS-side substrate with two de-dup mechanisms.

What this adds:
- ProximityFeedback discriminated union + ProximityResult<T> Result-shape
- Cluster<T> with representative + members + canonicalForm
- clusterByCanonical<T>(corpus, canonicalFn): deterministic dedup
- jaccardSimilarity(tokensA, tokensB): Jaccard coefficient
- defaultTokenize(text): lowercase + stop-word filter
- clusterBySimilarity<T>(context): greedy clustering by Jaccard threshold
- uniqueRepresentatives<T>(result): drop-duplicates convenience

Tests (19; all pass):
- clusterByCanonical groups same-canonical items
- first-seen is representative (pre-sort by score for top-ranked rep)
- empty corpus → EmptyCorpus
- all unique → N clusters of size 1
- jaccardSimilarity edge cases (identical / disjoint / partial / empty)
- defaultTokenize lowercase + stop-word filter
- clusterBySimilarity threshold catches near-duplicates
- High threshold keeps all distinct; low threshold clusters aggressively
- Invalid threshold → InvalidThreshold
- uniqueRepresentatives extracts rep-only list
- Compose with evolution substrate: pre-sort by score → rep is best
- ProximityFeedback exhaustive switch

Composes with substrate:
- B-0914.6 backlog row
- B-0914.5 PR #5767 evolution (de-dup Survivor list before mash)
- B-0914.2 PR #5769 closed-loop (de-dup pre-CI-dispatch saves cycles)
- verify-existing-substrate-before-authoring rule (proximity IS substrate-inventory at runtime scope)
- grep-substrate-anchors-before-razor-as-metaphysical rule (substrate-anchor check at runtime scope)
- additive-not-zero-sum + monad-propagation + asymmetric-authorship

Real semantic embeddings (TF-IDF / sentence-BERT) deferred; current PoC handles the structural dedup case (substrate-engineering work often produces variants that differ only in serialization order, key casing, or attribute ordering; canonical-form normalization catches these without embeddings).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
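
The similarity path described above (tokenize, compute a Jaccard coefficient, cluster greedily against a threshold) can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `tokenize`, `jaccard`, and `greedyCluster` are hypothetical stand-ins for `defaultTokenize`, `jaccardSimilarity`, and `clusterBySimilarity`; the stop-word list is invented; the sketch throws on a bad threshold where the PR returns an `InvalidThreshold` feedback value; and treating two empty token sets as identical is a convention picked here, not one confirmed by the source.

```typescript
// Assumed stop-word list, for illustration only.
const STOP_WORDS = new Set(["a", "an", "the", "of", "to", "and", "in", "is"]);

// Lowercases, splits on non-alphanumeric runs, and drops stop words.
function tokenize(text: string): Set<string> {
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0 && !STOP_WORDS.has(w));
  return new Set(words);
}

// Jaccard coefficient |A ∩ B| / |A ∪ B|, in [0, 1].
// Assumption: two empty sets count as identical (1).
function jaccard(a: ReadonlySet<string>, b: ReadonlySet<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  a.forEach((token) => {
    if (b.has(token)) intersection += 1;
  });
  return intersection / (a.size + b.size - intersection);
}

// Greedy single pass: each text joins the first cluster whose representative
// (the cluster's first member) is at least `threshold` similar; otherwise it
// starts a new cluster.
function greedyCluster(texts: readonly string[], threshold: number): string[][] {
  if (!(threshold >= 0 && threshold <= 1)) {
    throw new RangeError("threshold must be within [0, 1]");
  }
  const clusters: { repTokens: Set<string>; members: string[] }[] = [];
  for (const text of texts) {
    const tokens = tokenize(text);
    const home = clusters.find((c) => jaccard(c.repTokens, tokens) >= threshold);
    if (home) {
      home.members.push(text);
    } else {
      clusters.push({ repTokens: tokens, members: [text] });
    }
  }
  return clusters.map((c) => c.members);
}

const proposals = [
  "Cache the parser output",
  "cache parser output in memory",
  "Rewrite the scheduler",
];
const looseClusters = greedyCluster(proposals, 0.6); // first two proposals merge
const strictClusters = greedyCluster(proposals, 0.9); // all three stay separate
```

The first two proposals share three of four distinct tokens (Jaccard 0.75), so they merge at a 0.6 threshold and stay apart at 0.9. Greedy assignment is order-dependent: a text is compared only against each cluster's representative, so different input orders can yield different clusters.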

chatgpt-codex-connector · 2026-05-28T11:26:41Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copilot

Pull request overview

Adds a TypeScript proximity de-duplication substrate for workflow-engine experiments, supporting deterministic canonical-form clustering and lightweight Jaccard/token similarity clustering for near-duplicate hypotheses before ranking/evolution/CI dispatch.

Changes:

- Adds proximity.ts with Result-shaped clustering APIs, tokenization, Jaccard similarity, and representative extraction.
- Adds proximity.test.ts with 19 Bun tests covering canonical clustering, similarity clustering, tokenizer behavior, errors, and evolution-substrate composition.
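
The evolution-substrate composition relies on a "first seen is the representative" rule: sorting by score before de-duplicating makes the best-scoring variant the one that survives. A minimal sketch of that idea follows, assuming a hypothetical `Scored` item shape and a caller-supplied key function (neither is taken from the PR's code):

```typescript
// Hypothetical item shape for illustration.
interface Scored {
  text: string;
  score: number;
}

// Sorts a copy by descending score, then keeps the first item seen per key,
// so each surviving representative is the highest-scoring one in its group.
function bestRepresentatives(items: readonly Scored[], keyFn: (item: Scored) => string): Scored[] {
  const sorted = items.slice().sort((a, b) => b.score - a.score);
  const seen = new Set<string>();
  const reps: Scored[] = [];
  for (const item of sorted) {
    const key = keyFn(item);
    if (!seen.has(key)) {
      seen.add(key);
      reps.push(item);
    }
  }
  return reps;
}

const survivors = bestRepresentatives(
  [
    { text: "Foo", score: 1 },
    { text: "foo", score: 5 },
    { text: "bar", score: 3 },
  ],
  (item) => item.text.toLowerCase(),
);
```

Here "Foo" and "foo" share a key, and the copy scored 5 is kept because it sorts first.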

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `tools/workflow-engine/proximity.ts` | Implements proximity de-duplication primitives and public API types. |
| `tools/workflow-engine/proximity.test.ts` | Adds invariant and behavior coverage for the new proximity substrate. |

…engineering-substrate-deduplication-canonical-form-normalization-2026-05-28

…nonicalForm semantic divergence (Copilot threads)

Two threads from Copilot on tools/workflow-engine/proximity.ts:

1. Docblock cross-reference "B-0914.6 backlog row" was misleading: the seven .N subtasks (.1-.7) are sections within the parent B-0914 row file, NOT separate B-0914.N row files. Reworded to "B-0914 subtask .6" with explicit parent-row pointer + cross-reference clarification for subtasks .5 and .2 as well.
2. Cluster.canonicalForm field semantically divergent between clusterByCanonical (real canonical-form string from CanonicalFn<T>) and clusterBySimilarity (synthesized "[similarity:<threshold>]:<tokens>" label). Added interface docblock that documents the divergence explicitly + names the discriminator (`[similarity:` prefix) callers can use + notes future-substrate rename path.

Non-breaking: same field name + same type + same behavior; only docblock expanded. Composes with asymmetric-authorship + monad-propagation rules unchanged.

Autonomous-loop tick 2026-05-28T12:08Z resolution of PR #5772 BLOCKED gate (unresolved Copilot threads only blocker; required checks all green).

Co-Authored-By: Claude <noreply@anthropic.com>
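
A caller that needs to tell the two `canonicalForm` flavours apart could use the `[similarity:` prefix roughly as below. This is a hedged sketch: `describeCanonicalForm` and the `FormKind` type are hypothetical helpers, and the parsing assumes the label layout `[similarity:<threshold>]:<tokens>` quoted in the commit message.

```typescript
// Hypothetical result type for illustration.
type FormKind =
  | { kind: "similarity"; threshold: number; tokens: string }
  | { kind: "canonical"; form: string };

// Matches the synthesized label "[similarity:<threshold>]:<tokens>".
const SIMILARITY_LABEL = /^\[similarity:([^\]]+)\]:([\s\S]*)$/;

// Classifies a canonicalForm string by its prefix. Anything that does not match
// the similarity label is treated as a real canonical form.
function describeCanonicalForm(canonicalForm: string): FormKind {
  const match = SIMILARITY_LABEL.exec(canonicalForm);
  if (match === null) {
    return { kind: "canonical", form: canonicalForm };
  }
  const [, rawThreshold = "", tokens = ""] = match;
  return { kind: "similarity", threshold: Number(rawThreshold), tokens };
}

const fromSimilarity = describeCanonicalForm("[similarity:0.8]:cache parser");
const fromCanonical = describeCanonicalForm('{"id":"a"}');
```

A prefix check is only a convention: a genuine canonical form that happens to start with `[similarity:` would be misclassified, which is one reason a dedicated field, as the noted rename path suggests, would be more robust.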

… pass — completes 7-of-7 B-0914 candidate substrate-engineering gap substrate (#5773)

* feat(B-0914.7): Falcon-style auto-research-doc template substrate (8-section scaffold + Markdown renderer); 19 tests pass — completes 7-of-7 B-0914 candidate gap substrate

Per Sakana Robin Falcon agent (Nature 2026): takes drug proposal + does deep-dive literature review + writes comprehensive research report. TS-side scaffold provides 8-section template structure that downstream LLM substrate-engineering work populates (header / framing / background / mechanism / evidence / risks / composes-with / test-plan).

What this adds:
- ResearchDocSection discriminated union (9 section kinds)
- ResearchDoc structure (id + proposalId + sections + composesWith)
- ResearchDocFeedback + ResearchDocResult<T> Result-shape
- renderSection(section): string, a pure-function Markdown serializer
- renderResearchDoc(doc): ResearchDocResult<string>, full doc rendering
- buildSkeleton(context): ResearchDocResult<ResearchDoc>, the 8-section scaffold
- buildAndRender(context): ResearchDocResult<string>, end-to-end convenience

Falcon-stage pending markers preserved (substrate-honest about what's not yet auto-generated by LLM substrate-engineering):
- '[PENDING LITERATURE REVIEW — Falcon-stage auto-generated]'
- '[PENDING MECHANISM ANALYSIS — Falcon-stage auto-generated]'
- etc. (per section)

Tests (19; all pass):
- EmptyProposalId validation
- 8-section Falcon scaffold structure
- proposalId sanitized to filename-safe id
- composesWith pass-through to skeleton + composes-with section
- All 9 section-kind renderings tested (header/framing/background/mechanism/evidence/risks/composes-with/test-plan/raw)
- renderResearchDoc empty → NoSectionsRendered
- buildAndRender end-to-end
- Pending markers preserved (substrate-honest)
- ResearchDocSection exhaustive switch

Composes with substrate:
- B-0914.7 backlog row (Falcon extension target)
- tools/save-ai-memory/ skill (existing substrate; future integration for auto-write to docs/research/ + composes-with citation discipline)
- Amara consolidation ferry pattern (PR #5757)
- B-0914.2 PR #5769 closed-loop orchestrator (research-doc generation at any cycle stage; template provides structure)
- substrate-or-it-didn't-happen + honor-those-that-came-before rules
- asymmetric-authorship + monad-propagation rules

**B-0914 7-of-7 candidate substrate-engineering gap substrate complete:**
- B-0914.1 PR #5764 TrueSkill ranking (S/M/L: ranking)
- B-0914.2 PR #5769 closed-loop orchestrator (S/M/L: L)
- B-0914.3 PR #5770 n-parallel + consensus (8-parallel-Finch)
- B-0914.4 PR #5768 generation-reflection pairing (S/M/L: M)
- B-0914.5 PR #5767 evolution mash-refine (S/M/L: S)
- B-0914.6 PR #5772 proximity-dedup (canonical + Jaccard clustering)
- B-0914.7 THIS PR Falcon-style auto-research-doc template

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(PR #5773): full rule paths + remove unreachable InvalidOperationalStatus variant (Copilot threads)

Two threads on tools/workflow-engine/research-doc.ts:

1. Composes-with docblock referenced rule files by short form (`asymmetric-authorship`, `monad-propagation-pattern`); the actual filenames are longer + .md-suffixed:
   `.claude/rules/asymmetric-authorship-substrate-entity-defines-consent-channel-recipient-acknowledges.md`
   `.claude/rules/monad-propagation-pattern-cross-language-substrate-shape.md`
   Updated to full paths so cross-refs stay greppable + don't drift.
2. ResearchDocFeedback.InvalidOperationalStatus variant was structurally unreachable: `operationalStatus` is a string-literal union (`"research-grade" | "operational"`) at the type level, the only constructor (line 179) fixes it to `"research-grade"`, and no untrusted-string parse path exists. Variant was dead substrate. Removed + added docblock naming the conditions under which a future caller should add it back (JSON import of external research-doc with operationalStatus parsed from untrusted input: add validator AT THE PARSE BOUNDARY first, then add this variant).

Composes with asymmetric-authorship discipline: every TFeedback variant should correspond to a real code path that can produce it.

Non-breaking: no callers reference the removed variant (grep clean). Type-system continues to rule out invalid operationalStatus at construction time.

Autonomous-loop tick 2026-05-28T12:16Z resolution of PR #5773 BLOCKED gate (unresolved Copilot threads only blocker; required checks all green).

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
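
The "validator at the parse boundary" condition for reinstating `InvalidOperationalStatus` could look roughly like the sketch below. Only the `"research-grade" | "operational"` union and the variant name come from the commit message; `parseOperationalStatus`, the `StatusParseResult` shape, and the `received` field are assumptions made for illustration.

```typescript
type OperationalStatus = "research-grade" | "operational";

// Hypothetical Result shape; the real ResearchDocResult<T> may differ.
type StatusParseResult =
  | { ok: true; value: OperationalStatus }
  | { ok: false; feedback: { kind: "InvalidOperationalStatus"; received: unknown } };

// Validates an untrusted value before it enters typed code. This is the only
// place where InvalidOperationalStatus can be produced, so the variant stays
// tied to a real code path.
function parseOperationalStatus(raw: unknown): StatusParseResult {
  if (raw === "research-grade" || raw === "operational") {
    return { ok: true, value: raw };
  }
  return { ok: false, feedback: { kind: "InvalidOperationalStatus", received: raw } };
}

// Simulated external import: the payload is untrusted until validated.
const imported: unknown = JSON.parse('{"operationalStatus":"draft"}');
const rawStatus = (imported as { operationalStatus?: unknown }).operationalStatus;
const statusResult = parseOperationalStatus(rawStatus);
```

Inside typed code the union already rules out bad values at compile time, so the feedback variant earns its place only once such an untrusted input path exists.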

Copilot AI review requested due to automatic review settings May 28, 2026 11:26

AceHack enabled auto-merge (squash) May 28, 2026 11:26

Copilot started reviewing on behalf of AceHack May 28, 2026 11:26

AceHack mentioned this pull request May 28, 2026

feat(B-0914.7): Falcon auto-research-doc template substrate; 19 tests pass — completes 7-of-7 B-0914 candidate substrate-engineering gap substrate #5773

Merged

6 tasks

Copilot AI reviewed May 28, 2026


Comment thread tools/workflow-engine/proximity.ts Outdated

Comment thread tools/workflow-engine/proximity.ts Outdated

AceHack and others added 2 commits May 28, 2026 07:56

Merge branch 'main' into otto-cli/b-0914-6-proximity-agent-substrate-…

14e10ad

…engineering-substrate-deduplication-canonical-form-normalization-2026-05-28

Copilot AI review requested due to automatic review settings May 28, 2026 12:13

Copilot started reviewing on behalf of AceHack May 28, 2026 12:13

AceHack merged commit dc66cef into main May 28, 2026
31 of 33 checks passed

AceHack deleted the otto-cli/b-0914-6-proximity-agent-substrate-engineering-substrate-deduplication-canonical-form-normalization-2026-05-28 branch May 28, 2026 12:16

AceHack mentioned this pull request May 28, 2026

docs(archive): Preserve PR #5772 #5785

Closed

AceHack review requested due to automatic review settings May 28, 2026 12:34

This was referenced May 28, 2026

docs(archive): Preserve PR #5768 #5791

Closed

docs(archive): Preserve 20 recently merged PRs #5824

Closed

AceHack merged 3 commits into main from otto-cli/b-0914-6-proximity-agent-substrate-engineering-substrate-deduplication-canonical-form-normalization-2026-05-28
