PoC - Vector rescoring in kNN #116350

Closed
carlosdelest wants to merge 2 commits into elastic:main from carlosdelest:feature/knn-vector-rescore

carlosdelest (Member) commented Nov 6, 2024

This PoC adds a new parameter to the kNN query (rescore_vector_oversample) that is used to:

  • Multiply the number of candidates (num_candidates) in the kNN query by the oversample factor
  • Perform an approximate search over the extended candidate set
  • Perform an exact search over the returned results and keep the top k

This is done by overriding the approximateSearch() method in ESKnnFloatVectorQuery. It could be pushed down to the Lucene query if needed.
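The three steps can be illustrated outside Lucene with a minimal, hypothetical sketch. It is not the PR's code: the class and method names (OversampledKnnSketch, search, quantizedScore) are made up, similarity is a plain dot product, brute-force ranking over quantized scores stands in for the HNSW approximate search, and "quantization" is simulated by rounding each vector component to one decimal place.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: oversample, approximate search, then exact rescoring. */
final class OversampledKnnSketch {

    /** Full-precision dot product, used for the exact rescoring step. */
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Stand-in for a quantized scorer: rounds each component to one decimal place. */
    static float quantizedScore(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += (Math.round(a[i] * 10f) / 10f) * (Math.round(b[i] * 10f) / 10f);
        }
        return sum;
    }

    /** Returns the indexes into docs of the top k documents after exact rescoring. */
    static List<Integer> search(float[][] docs, float[] query, int k, int numCandidates, float oversample) {
        // Step 1: multiply the number of candidates by the oversample factor.
        int expanded = Math.min(docs.length, (int) Math.ceil(numCandidates * oversample));

        // Step 2: "approximate" search; brute force over quantized scores stands in for HNSW.
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble((Integer i) -> quantizedScore(docs[i], query)).reversed());
        List<Integer> candidates = new ArrayList<>(ids.subList(0, expanded));

        // Step 3: rescore the candidates with the raw float vectors and keep the top k.
        candidates.sort(Comparator.comparingDouble((Integer i) -> dot(docs[i], query)).reversed());
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }
}
```

For example, with docs {0.44, 0.44}, {0.47, 0.37}, {0.1, 0.1}, query {1, 1}, k = 1 and num_candidates = 1, an oversample of 1.0 returns doc 1 (the quantized favourite), while 2.0 lets the rescoring step recover doc 0, which has the higher exact score.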

Usage:

GET msmarco-v2-bbq/_search
{
    "query": {
        "knn": {
            "field": "emb",
            "query_vector": [...],
            "k": 10,
            "num_candidates": 100,
            "rescore_vector_oversample": 10.0
        }
    }
}

carlosdelest changed the title from "PoC - Reranking using kNN" to "PoC - Vector rescoring in kNN" on Nov 6, 2024
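// Reuse the query timeout when the collector manager is time-limited
if (knnCollectorManager instanceof TimeLimitingKnnCollectorManager timeLimitingKnnCollectorManager) {
    queryTimeout = timeLimitingKnnCollectorManager.getQueryTimeout();
}
// Exact search over the documents in bitSetIterator
return exactSearch(context, bitSetIterator, queryTimeout);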
Member

I don't think this will work: exactSearch would use the quantized scorers, and it would also run per segment, which would be bad.

I think we need to override rewrite to return another query that can be further rewritten, but that rescores the previously scored documents using the raw floating-point vectors.

We only want to rescore once for the entire shard. Doing it for each segment would be very expensive.
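The idea of rescoring once per shard rather than once per segment can be sketched as follows. This is a hypothetical illustration, not the design that replaced this PR: each segment contributes its approximate hits, the shard merges them and keeps the best num_candidates, and only that merged list is rescored with the raw float vectors. The Hit record, the rescoreShard method, and the rawVectors map (standing in for reading the original float vectors) are all assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: rescore once for the whole shard, not once per segment. */
final class ShardRescoreSketch {

    /** A hit with a shard-wide doc id and a score (approximate or rescored). */
    record Hit(int docId, float score) {}

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Merges per-segment approximate hits into one shard-level list, keeps the
     * best numCandidates, and rescores only those with the raw float vectors.
     */
    static List<Hit> rescoreShard(List<List<Hit>> perSegmentHits, Map<Integer, float[]> rawVectors,
                                  float[] query, int numCandidates, int k) {
        // Merge the approximate hits from every segment.
        List<Hit> merged = new ArrayList<>();
        for (List<Hit> segmentHits : perSegmentHits) {
            merged.addAll(segmentHits);
        }
        merged.sort(Comparator.comparingDouble(Hit::score).reversed());
        List<Hit> top = merged.subList(0, Math.min(numCandidates, merged.size()));

        // Single rescoring pass for the shard, using the raw float vectors.
        List<Hit> rescored = new ArrayList<>();
        for (Hit hit : top) {
            rescored.add(new Hit(hit.docId(), dot(rawVectors.get(hit.docId()), query)));
        }
        rescored.sort(Comparator.comparingDouble(Hit::score).reversed());
        return new ArrayList<>(rescored.subList(0, Math.min(k, rescored.size())));
    }
}
```

A document that falls outside the merged candidate list is never rescored, even if its raw score would be the best; oversampling the candidate count is what narrows that gap.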

Member Author

I see, thanks for explaining, Ben. I'll give it another shot 🔫

public static final ParseField NUM_CANDS_FIELD = new ParseField("num_candidates");
public static final ParseField QUERY_VECTOR_FIELD = new ParseField("query_vector");
public static final ParseField VECTOR_SIMILARITY_FIELD = new ParseField("similarity");
public static final ParseField RESCORE_VECTOR_OVERSAMPLE = new ParseField("rescore_vector_oversample");
Member

I do think a new parameter that takes an object is best, as we will likely have separate options to provide (rescore field, rescore kind, etc.).

Member Author

Agreed - this was just me not wanting to deal with the parser yet 😁

carlosdelest (Member Author) commented:

Closing this in favour of #116663

3 participants