feat(memory-v2): cross-encoder rerank as additive boost by siddseethepalli · Pull Request #29555 · vellum-ai/vellum-assistant

siddseethepalli · 2026-05-05T01:07:10Z

Summary

Adds an opt-in (`memory.v2.rerank.enabled: false` by default) cross-encoder rerank that runs locally via a new `rerank-worker.mjs` generated alongside the existing embed worker. Reuses `EmbeddingRuntimeManager` for the runtime download.
`simBatch(text, candidates, config, { useRerank: true })` sorts by fused desc, takes the top-K (default 50), reranks via `LocalRerankBackend`, per-batch min-max normalises, and applies `boosted = clamp01(fused + alpha · r_norm)` (default `alpha = 0.3`). Tail untouched. `computeOwnActivation` opts in for user + assistant channels; NOW stays on pure fused.
Activation formula unchanged (3 terms, no new coefficient). All paths fail-open: any worker error returns an empty rerank Map and `simBatch` falls back to pure fused.
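As a rough illustration of the boost described above, the following TypeScript sketch applies `clamp01(fused + alpha · r_norm)` to the top-K candidates and leaves the tail on pure fused scores. It is not the code from this PR: `Candidate`, `applyRerankBoost`, and the choice to give no boost when all rerank scores in a batch are equal are assumptions made for the example. The rerank scores are passed in as a Map so that an empty Map models the fail-open path.

```typescript
// Hypothetical types and names; only the formula and defaults come from the PR text.
type Candidate = { slug: string; fused: number };

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function applyRerankBoost(
  candidates: Candidate[],
  rerankScores: Map<string, number>, // empty Map = worker failed, fall back to fused
  topK: number = 50,
  alpha: number = 0.3,
): Map<string, number> {
  // Start from pure fused scores so the tail and every failure path keep them.
  const scores = new Map<string, number>();
  for (const c of candidates) scores.set(c.slug, c.fused);
  if (rerankScores.size === 0) return scores;

  // Sort by fused score, descending, and keep only the head.
  const head = [...candidates].sort((a, b) => b.fused - a.fused).slice(0, topK);

  // Per-batch min-max normalisation over the head's rerank scores.
  const raw = head
    .map((c) => rerankScores.get(c.slug))
    .filter((r): r is number => r !== undefined);
  if (raw.length === 0) return scores;
  const min = Math.min(...raw);
  const range = Math.max(...raw) - min;

  for (const c of head) {
    const r = rerankScores.get(c.slug);
    if (r === undefined) continue;
    // Assumption: a degenerate batch (all scores equal) gets no boost.
    const rNorm = range > 0 ? (r - min) / range : 0;
    scores.set(c.slug, clamp01(c.fused + alpha * rNorm));
  }
  return scores;
}
```

Because the boost is additive and clamped, a candidate can only move up relative to its fused score, and a failed or empty rerank leaves the ordering exactly as the fused scores produced it.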

Original prompt

implement /Users/sidd/.claude/plans/memory-v2-cross-encoder-rerank.md

Adds an opt-in (`memory.v2.rerank.enabled: false` by default) cross-encoder rerank step that runs locally via the existing embedding-runtime worker infrastructure. When enabled, simBatch wraps the dense+sparse fused score with `boosted = clamp01(fused + alpha · normalized_rerank)` for the top-K candidates of the user and assistant similarity channels — NOW keeps pure fused since structured context is outside the cross-encoder's training distribution. Default model `Xenova/bge-reranker-base` (278M, MIT, ONNX); long-term target is `BAAI/bge-reranker-v2-m3` once a public ONNX export ships.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a619c20b5b

devin-ai-integration

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.

…p tokenizer flag (#30091)

Address review feedback on #29555:

- Reranker cache key now hashes model + dtype alongside query and candidate slugs. Previously, switching memory.v2.rerank.model or rerank.dtype could return stale scores from the prior model for up to the 2m TTL window, since cache hits bypass getOrCreateRerankBackend.
- Drop return_tensors: 'pt' from the generated rerank-worker.mjs tokenizer call. 'pt' is the Python transformers PyTorch flag; the JS port (@huggingface/transformers) returns its own Tensor type and ignores this option today, but a future strict validation could turn it into a silent fail-open via the reranker's catch-and-return-empty path.
- Bump RUNTIME_VERSION worker suffix to v3 so existing installs regenerate the rerank worker on next daemon start.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>

siddseethepalli · 2026-05-09T04:31:36Z

Addressed in follow-up PR #30091:

- Reranker cache key now includes model + dtype (Codex P2)
- Dropped no-op return_tensors: 'pt' from generated rerank-worker.mjs (Devin analysis)

The P1 RUNTIME_VERSION concern (rerank worker required by isReady() without version bump) was already addressed in #29631 via the _workers-v2 suffix; #30091 bumps to _workers-v3 to regenerate the worker again.
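To make the stale-score issue concrete, here is a minimal TypeScript sketch of a TTL cache whose key covers model and dtype as well as the query and candidate slugs. Everything in it is hypothetical (`rerankCacheKey`, `TtlCache`, the injected `now` argument), and where #30091 hashes these fields, the sketch simply uses the serialized tuple as the key. Candidate order is kept as given, which is also an assumption.

```typescript
// Hypothetical names throughout; this is not the cache from reranker.ts.
type RerankKeyParts = {
  model: string;
  dtype: string;
  query: string;
  slugs: string[];
};

// Including model + dtype means a config change can never hit an old entry.
function rerankCacheKey(p: RerankKeyParts): string {
  return JSON.stringify([p.model, p.dtype, p.query, p.slugs]);
}

class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // `now` is passed in (milliseconds) so expiry is easy to test.
  get(key: string, now: number): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (now >= hit.expiresAt) {
      this.entries.delete(key); // expired: drop and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V, now: number): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

const TWO_MINUTES_MS = 2 * 60 * 1000; // the 2m TTL window mentioned in #30091
```

With a key built only from the query and slugs, the two model configurations in a lookup like `rerankCacheKey({ ...parts, model: "Xenova/bge-reranker-base" })` and `rerankCacheKey({ ...parts, model: "BAAI/bge-reranker-v2-m3" })` would collide; with model and dtype in the key they map to separate entries.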

siddseethepalli merged commit b7a1ba4 into main May 5, 2026

siddseethepalli deleted the do/memory-v2-rerank-boost branch May 5, 2026 01:07

chatgpt-codex-connector Bot reviewed May 5, 2026

Comment thread assistant/src/memory/embedding-runtime-manager.ts

Comment thread assistant/src/memory/v2/reranker.ts

devin-ai-integration Bot reviewed May 5, 2026

Comment thread assistant/src/memory/embedding-runtime-manager.ts

Comment thread assistant/src/memory/embedding-runtime-manager.ts

siddseethepalli mentioned this pull request May 9, 2026

fix(memory-v2): include rerank model/dtype in cache key #30091

Merged

2 tasks
