
feat: add search encoder backend#3492

Merged
Samoed merged 18 commits into main from search_barckend
Nov 28, 2025

Conversation


@Samoed Samoed commented Oct 25, 2025

Close #3406

I've implemented a SearchEncoder protocol, and the backend can now be either FAISS or direct search (as before).

class IndexEncoderSearchProtocol(Protocol):
    """Protocol for search backends used in encoder-based retrieval."""

    def add_document(
        self,
        embeddings: Array,
        idxs: list[str],
    ) -> None:
        """Add documents to the search backend.

        Args:
            embeddings: Embeddings of the documents to add.
            idxs: IDs of the documents to add.
        """

    def search(
        self,
        embeddings: Array,
        top_k: int,
        similarity_fn: Callable[[Array, Array], Array],
        top_ranked: TopRankedDocumentsType | None = None,
        query_idx_to_id: dict[int, str] | None = None,
    ) -> tuple[list[list[float]], list[list[int]]]:
        ...
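For illustration, a minimal in-memory backend satisfying this protocol could look like the numpy sketch below. The class name, the ignored keyword arguments, and the `similarity_fn` calling convention are my assumptions based on the protocol above, not the PR's actual implementation:

```python
import numpy as np


class NumpyEncoderSearchBackend:
    """Minimal in-memory backend satisfying the protocol above (sketch)."""

    def __init__(self):
        self.embeddings = None  # (n_docs, dim) array once documents are added
        self.idxs = []          # document IDs, parallel to embedding rows

    def add_document(self, embeddings, idxs):
        # Append new document embeddings and remember their string IDs.
        self.embeddings = (
            embeddings
            if self.embeddings is None
            else np.vstack([self.embeddings, embeddings])
        )
        self.idxs.extend(idxs)

    def search(self, embeddings, top_k, similarity_fn,
               top_ranked=None, query_idx_to_id=None):
        # Score every document against every query, then keep top_k per query.
        scores = similarity_fn(embeddings, self.embeddings)  # (n_queries, n_docs)
        order = np.argsort(-scores, axis=1)[:, :top_k]
        top_scores = [s[o].tolist() for s, o in zip(scores, order)]
        top_indices = [o.tolist() for o in order]
        return top_scores, top_indices


backend = NumpyEncoderSearchBackend()
backend.add_document(np.eye(3, dtype=np.float32), ["d0", "d1", "d2"])
scores, indices = backend.search(
    np.eye(3, dtype=np.float32)[1:2],    # query identical to document "d1"
    top_k=2,
    similarity_fn=lambda q, d: q @ d.T,  # plain dot-product similarity
)
```

Here `top_ranked` and `query_idx_to_id` are accepted but ignored; the real backends presumably use them for the reranking path.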

I've kept the "batched" approach for retrieval, to use less memory during evaluation. The backend can be changed like this:

import mteb
from mteb.models.search_encoder_index import (
    DefaultEncoderSearchBackend,
    FaissEncoderSearchBackend,
)
from mteb.models import SearchEncoderWrapper


model = mteb.get_model("baseline/random-encoder-baseline")

python_backend = SearchEncoderWrapper(
    model, index_backend=DefaultEncoderSearchBackend()
)
faiss_backend = SearchEncoderWrapper(
    model, index_backend=FaissEncoderSearchBackend(model)
)

I've tested on SciFact using potion-base-2M and got a 2 s evaluation with the default search and 3 s with FAISS.

Script to test
import mteb
from mteb.cache import ResultCache
from mteb.models import SearchEncoderWrapper
from mteb.models.search_encoder_index import (
    DefaultEncoderSearchBackend,
    FaissEncoderSearchBackend,
)

model = mteb.get_model("minishlab/potion-base-2M")

python_backend = SearchEncoderWrapper(
    model, index_backend=DefaultEncoderSearchBackend()
)
faiss_backend = SearchEncoderWrapper(
    model, index_backend=FaissEncoderSearchBackend(model)
)

task = mteb.get_task("SciFact")

python_cache = ResultCache("python_backend_cache")
faiss_cache = ResultCache("faiss_backend_cache")

# warmup
mteb.evaluate(
    model,
    task,
    cache=None,
)

mteb.evaluate(
    python_backend,
    task,
    cache=python_cache,
)

mteb.evaluate(
    faiss_backend,
    task,
    cache=faiss_cache,
)

@Samoed Samoed changed the title "add search backend" → "feat: add search encoder backend" Oct 25, 2025
@orionw orionw left a comment

Looks great overall!

I think it doesn't take advantage of faiss's built-in scoring functionality, but by not doing that we keep more control (so that can be ignored). If we wanted to use faiss, we could do something like:

import numpy as np

# Batch reconstruct candidate embeddings from the index
candidate_embs = np.vstack([
    self.index.reconstruct(idx) for idx in candidate_indices
])

# Create a temporary index over just the candidates to let FAISS handle scoring
d = candidate_embs.shape[1]
temp_index = self.index_type(d)
temp_index.add(candidate_embs)

# Search returns scores and indices in one call
scores, local_indices = temp_index.search(
    query_emb.reshape(1, -1).astype(np.float32),
    min(top_k, len(candidate_indices)),
)

But I think it just does dot product. So it looks great as is; just mentioning this in case it's helpful.


Samoed commented Oct 27, 2025

Yes, I think that's better. I've added support for cosine and dot-product similarity, and the scores are nearly identical (equal to within 1e-6).
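The near-identical scores make sense: cosine similarity equals the dot product of L2-normalized vectors, so an inner-product index over normalized embeddings reproduces cosine scores up to float32 rounding. A standalone numpy illustration (not the PR's code):

```python
import numpy as np

rng = np.random.default_rng(0)
queries = rng.standard_normal((4, 8)).astype(np.float32)
docs = rng.standard_normal((16, 8)).astype(np.float32)

# Cosine similarity computed directly on the raw vectors.
q_norms = np.linalg.norm(queries, axis=1, keepdims=True)
d_norms = np.linalg.norm(docs, axis=1, keepdims=True)
cos = (queries @ docs.T) / (q_norms * d_norms.T)

# Dot product over L2-normalized vectors (what an inner-product index scores).
dot = (queries / q_norms) @ (docs / d_norms).T

# The two agree up to float32 rounding.
assert np.allclose(cos, dot, atol=1e-6)
```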

@KennethEnevoldsen KennethEnevoldsen left a comment


Looks good, though I would probably restructure it a bit.

I would probably separate out the implementations from the protocol.

We also need to add documentation on these backends, as well as some discussion of the trade-offs between them.


Samoed commented Oct 27, 2025

> We also need to add documentation

Yes, I wanted to add it after your review of the PR.


Samoed commented Oct 28, 2025

I've run this script and both evaluation methods took about the same time, so I'm a bit unsure what to list as FAISS's advantages, other than dumping the index, and we're clearing that after evaluation anyway.

| task | Stream | FAISS |
|---|---|---|
| SWEbenchVerifiedRR | 536 | 541 |
| ClimateFEVERHardNegatives | 9 | 12 |
import logging

import mteb
from mteb.cache import ResultCache
from mteb.models import SearchEncoderWrapper
from mteb.models.search_encoder_index import StreamingSearchIndex, FaissSearchIndex

logging.basicConfig(level=logging.INFO)

model = SearchEncoderWrapper(mteb.get_model("minishlab/potion-base-2M"))
tasks = mteb.get_tasks(
    tasks=[
        "ClimateFEVERHardNegatives",
        "SWEbenchVerifiedRR",
    ],
)

cache = ResultCache("stream")

mteb.evaluate(
    model,
    tasks,
    cache=cache,
)

### FAISS
index_backend = FaissSearchIndex(model)
model = SearchEncoderWrapper(
    mteb.get_model("minishlab/potion-base-2M"),
    index_backend=index_backend
)
cache = ResultCache("FAISS")

mteb.evaluate(
    model,
    tasks,
    cache=cache,
)


orionw commented Oct 28, 2025

I think faiss is not ideal for smaller reranking cases (~100-1000 docs to search over). We should see dramatic gains for retrieval, though, with a large enough corpus. For ClimateFEVERHardNegatives it could just be initialization differences. Maybe try MS MARCO for retrieval?

I asked Claude what it thinks we should do for reranking, and it suggested retrieving the vectors from faiss but doing the reranking itself in standard numpy. We could do this, but if faiss is roughly the same speed, we might as well keep what we have for that.

If large scale retrieval is much faster I think that's the main benefit
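The numpy-reranking idea can be sketched as follows: gather only the candidate vectors for a query and score them directly, with no temporary index involved. This is a hypothetical helper for illustration, not code from the PR:

```python
import numpy as np


def rerank_with_numpy(query_emb, doc_embs, candidate_indices, top_k):
    """Re-score only a query's candidate docs with plain numpy (sketch)."""
    cand = doc_embs[candidate_indices]   # gather candidate vectors
    scores = cand @ query_emb            # dot-product scores, shape (n_candidates,)
    order = np.argsort(-scores)[: min(top_k, len(candidate_indices))]
    # Map local positions back to the original document indices.
    return scores[order].tolist(), [candidate_indices[i] for i in order]


doc_embs = np.eye(4, dtype=np.float32)
# Rerank candidates 1..3 for a query identical to document 2.
scores, doc_ids = rerank_with_numpy(doc_embs[2], doc_embs, [1, 2, 3], top_k=2)
```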


Samoed commented Oct 29, 2025

I tried running it on MSMARCO, and both backends showed similar times on sub-batches. If we remove the search over each sub-corpus batch, FAISS would probably show a speedup, but I’m not sure how to do that while still supporting the "streaming" backend.
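For context, the streaming constraint means the corpus is scored batch by batch, with only a running top-k kept per query, so no backend ever holds the full index at once. A numpy sketch of that merge (hypothetical, not the PR's implementation):

```python
import numpy as np


def streaming_topk(query_embs, corpus_batches, top_k):
    """Score corpus batches one at a time, merging a running top-k per query."""
    best_scores, best_ids, offset = None, None, 0
    for batch in corpus_batches:
        scores = query_embs @ batch.T  # (n_queries, batch_size)
        ids = np.broadcast_to(
            np.arange(offset, offset + batch.shape[0]), scores.shape
        )
        if best_scores is not None:
            # Merge this batch with the running top-k from earlier batches.
            scores = np.hstack([best_scores, scores])
            ids = np.hstack([best_ids, ids])
        keep = np.argsort(-scores, axis=1)[:, :top_k]
        best_scores = np.take_along_axis(scores, keep, axis=1)
        best_ids = np.take_along_axis(ids, keep, axis=1)
        offset += batch.shape[0]
    return best_scores, best_ids


corpus = np.eye(6, dtype=np.float32)
# One query identical to document 4, corpus streamed in two batches of 3.
best_scores, best_ids = streaming_topk(corpus[4:5], [corpus[:3], corpus[3:]], top_k=2)
```

A per-batch FAISS index inside the loop would be rebuilt for every batch, which is presumably why the batched setting erases FAISS's advantage.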

@KennethEnevoldsen KennethEnevoldsen left a comment


I think we can improve the docs a bit, but codewise I think we are there

Contributor

shouldn't we also add documentation on the search backend?

Maybe we should also add something here so people can discover what has been added:

[Screenshot, 2025-11-25: docs navigation sidebar]

A kind of user-friendly changelog

@Samoed Samoed Nov 25, 2025

It will be shown in the advanced usage section.

Contributor

Yeah, but people won't know what has happened since 2.0.0.

I would probably change "New in v2.0" to:

- What is new
  - v2.3
  - v2.2
  - v2.1
  - v2.0

Member Author

I think this is more about the changelog, #3401

Contributor

Fair, but we still need the API docs.

@KennethEnevoldsen KennethEnevoldsen left a comment

We are still missing the API docs

Reviewed line in the Makefile:

build-docs: build-docs-overview
Contributor

oO does this work?

Member Author

Yes, everything after `:` is a prerequisite, and prerequisites run before the target's own recipe:

targets: prerequisites
	command

@KennethEnevoldsen

I think this is good to merge

@Samoed Samoed merged commit 4ed7ef4 into main Nov 28, 2025
10 checks passed
@Samoed Samoed deleted the search_barckend branch November 28, 2025 15:29


Development

Successfully merging this pull request may close these issues.

Add similarity search backend to Retrieval tasks

3 participants