feat(memory-v2): cross-encoder rerank as additive boost #29555

Merged
siddseethepalli merged 1 commit into `main` from `do/memory-v2-rerank-boost`
May 5, 2026

@siddseethepalli (Contributor) commented May 5, 2026

Summary

  • Adds an opt-in (`memory.v2.rerank.enabled: false` by default) cross-encoder rerank that runs locally via a new `rerank-worker.mjs` generated alongside the existing embed worker. Reuses `EmbeddingRuntimeManager` for the runtime download.
  • `simBatch(text, candidates, config, { useRerank: true })` sorts candidates by fused score (descending), takes the top-K (default 50), reranks them via `LocalRerankBackend`, min-max normalises the rerank scores per batch, and applies `boosted = clamp01(fused + alpha · r_norm)` (default `alpha = 0.3`). Candidates outside the top-K keep their fused score. `computeOwnActivation` opts in for the user and assistant channels; NOW stays on pure fused.
  • Activation formula unchanged (3 terms, no new coefficient). All paths fail-open: any worker error returns an empty rerank Map and `simBatch` falls back to pure fused.
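The boost step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Candidate`, `RerankFn`, `minMaxNormalise`, and `boostedScores` are hypothetical names, the rerank backend is modelled as an injected async function, and the handling of a flat batch (all raw scores equal → no boost) is an assumption.

```typescript
// Hypothetical sketch: names and signatures are illustrative, not the real
// simBatch / LocalRerankBackend API from this PR.

export interface Candidate {
  id: string;
  fused: number; // dense+sparse fused similarity, assumed to lie in [0, 1]
}

// Stand-in for the rerank backend: resolves to raw cross-encoder scores keyed by id.
export type RerankFn = (
  query: string,
  head: Candidate[],
) => Promise<Map<string, number>>;

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Min-max normalise one batch of raw scores into [0, 1].
export function minMaxNormalise(raw: Map<string, number>): Map<string, number> {
  const out = new Map<string, number>();
  if (raw.size === 0) return out;
  const values = Array.from(raw.values());
  const lo = Math.min(...values);
  const span = Math.max(...values) - lo;
  raw.forEach((v, id) => {
    // Assumption: a flat batch (all scores equal) maps to 0, i.e. no boost.
    out.set(id, span > 0 ? (v - lo) / span : 0);
  });
  return out;
}

export async function boostedScores(
  query: string,
  candidates: Candidate[],
  rerank: RerankFn,
  topK = 50,
  alpha = 0.3,
): Promise<Map<string, number>> {
  const sorted = [...candidates].sort((a, b) => b.fused - a.fused);
  const head = sorted.slice(0, topK);

  // Start every candidate at pure fused; the tail beyond top-K is never touched.
  const scores = new Map<string, number>();
  for (const c of sorted) scores.set(c.id, c.fused);

  let raw: Map<string, number>;
  try {
    raw = await rerank(query, head);
  } catch {
    raw = new Map(); // fail-open: an empty map means "no boost"
  }

  const norm = minMaxNormalise(raw);
  for (const c of head) {
    const r = norm.get(c.id);
    if (r !== undefined) scores.set(c.id, clamp01(c.fused + alpha * r));
  }
  return scores;
}
```

Because the boost is added on top of the fused score rather than replacing it, an empty or failed rerank leaves every score exactly as fused, which is what makes the fail-open path cheap; the clamp keeps boosted scores in the same [0, 1] range the activation formula already consumes.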

Original prompt

implement /Users/sidd/.claude/plans/memory-v2-cross-encoder-rerank.md



Adds an opt-in (`memory.v2.rerank.enabled: false` by default) cross-encoder
rerank step that runs locally via the existing embedding-runtime worker
infrastructure. When enabled, simBatch wraps the dense+sparse fused score
with `boosted = clamp01(fused + alpha · normalized_rerank)` for the top-K
candidates of the user and assistant similarity channels — NOW keeps pure
fused since structured context is outside the cross-encoder's training
distribution.

Default model `Xenova/bge-reranker-base` (278M params, MIT, ONNX); the
long-term target is `BAAI/bge-reranker-v2-m3` once a public ONNX export
ships.
@siddseethepalli siddseethepalli merged commit b7a1ba4 into main May 5, 2026
@siddseethepalli siddseethepalli deleted the do/memory-v2-rerank-boost branch May 5, 2026 01:07

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a619c20b5b


Comment thread assistant/src/memory/embedding-runtime-manager.ts
Comment thread assistant/src/memory/v2/reranker.ts
@devin-ai-integration (bot) left a comment

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.


Comment thread assistant/src/memory/embedding-runtime-manager.ts
Comment thread assistant/src/memory/embedding-runtime-manager.ts
siddseethepalli added a commit that referenced this pull request May 9, 2026
…p tokenizer flag (#30091)

Address review feedback on #29555:

- Reranker cache key now hashes model + dtype alongside query and
  candidate slugs. Previously, switching memory.v2.rerank.model or
  rerank.dtype could return stale scores from the prior model for up
  to the 2m TTL window, since cache hits bypass getOrCreateRerankBackend.
- Drop return_tensors: 'pt' from the generated rerank-worker.mjs
  tokenizer call. 'pt' is the Python transformers PyTorch flag; the JS
  port (@huggingface/transformers) returns its own Tensor type and
  ignores this option today, but a future strict validation could turn
  it into a silent fail-open via the reranker's catch-and-return-empty
  path.
- Bump RUNTIME_VERSION worker suffix to v3 so existing installs
  regenerate the rerank worker on next daemon start.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
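The cache-key fix in #30091 amounts to folding the model identity into the hashed key. A minimal sketch, assuming SHA-256 over length-prefixed fields; `RerankCacheKeyParts` and `rerankCacheKey` are hypothetical names, and the real key layout in `reranker.ts` may differ.

```typescript
import { Buffer } from "node:buffer";
import { createHash } from "node:crypto";

// Hypothetical field names; the commit only says the key now hashes
// model + dtype alongside the query and candidate slugs.
export interface RerankCacheKeyParts {
  model: string; // e.g. the memory.v2.rerank.model setting
  dtype: string; // e.g. the rerank.dtype setting
  query: string;
  candidateSlugs: string[];
}

export function rerankCacheKey(parts: RerankCacheKeyParts): string {
  const hash = createHash("sha256");
  const fields = [parts.model, parts.dtype, parts.query, ...parts.candidateSlugs];
  for (const field of fields) {
    // Length-prefix each field (in UTF-8 bytes) so that, e.g., slugs
    // ["ab", "c"] and ["a", "bc"] cannot produce the same byte stream.
    hash.update(`${Buffer.byteLength(field, "utf8")}:`);
    hash.update(field, "utf8");
  }
  return hash.digest("hex");
}
```

With model and dtype inside the hash, switching either setting produces a different key, so entries scored by the previous model are never hit again and simply age out of the TTL window instead of being returned as stale scores.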
@siddseethepalli (Contributor, Author)

Addressed in follow-up PR #30091:

  • Reranker cache key now includes model + dtype (Codex P2)
  • Dropped no-op return_tensors: 'pt' from generated rerank-worker.mjs (Devin analysis)

The P1 RUNTIME_VERSION concern (rerank worker required by isReady() without version bump) was already addressed in #29631 via the _workers-v2 suffix; #30091 bumps to _workers-v3 to regenerate the worker again.
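One plausible reading of why the worker suffix has to be bumped is sketched below. This is a hypothetical model, not `EmbeddingRuntimeManager`'s actual logic: `InstallState`, `ensureWorkers`, and the file name `embed-worker.mjs` are illustrative, and the sketch assumes worker scripts are regenerated only when the recorded version string differs from the expected one.

```typescript
// Hypothetical model of the install state; the real EmbeddingRuntimeManager
// layout and RUNTIME_VERSION format are not shown in this PR.
export const WORKERS_SUFFIX = "_workers-v3";

export interface InstallState {
  installedVersion: string | null; // version recorded at last generation
  workers: Set<string>; // worker scripts currently on disk
}

export function runtimeVersion(base: string): string {
  return `${base}${WORKERS_SUFFIX}`;
}

// Assumption: worker scripts are rewritten only when the recorded version
// differs from the expected one. Returns true if regeneration happened.
export function ensureWorkers(
  state: InstallState,
  base: string,
  requiredWorkers: string[],
): boolean {
  const expected = runtimeVersion(base);
  if (state.installedVersion === expected) return false;
  state.workers = new Set(requiredWorkers);
  state.installedVersion = expected;
  return true;
}

// Ready only if the version matches and every required worker is present.
export function isReady(
  state: InstallState,
  base: string,
  requiredWorkers: string[],
): boolean {
  return (
    state.installedVersion === runtimeVersion(base) &&
    requiredWorkers.every((w) => state.workers.has(w))
  );
}
```

Under this model, adding `rerank-worker.mjs` to the required list without changing the suffix would leave an existing install failing `isReady()` while `ensureWorkers` still treats it as current and never writes the new worker; bumping the suffix makes every older install look stale, so the next daemon start regenerates all workers.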
