Add recall and NDCG operations in msmarco-v2-vector #610
Merged
jimczi merged 27 commits into elastic:master on Jun 10, 2024
This change adds an operation called `knn-recall` that computes the following metrics:
* Recall
* NDCG
* Average number of nodes visited during search
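As a reference for the first two metrics, here is a minimal Python sketch of how recall@k and NDCG@k are commonly computed for a single query. It is not the track's `knn-recall` implementation: the function names are hypothetical, NDCG here uses linear gains with a log2 rank discount (some definitions use 2^rel - 1 instead), and documents without a judgment count as non-relevant.

```python
import math
from typing import Dict, Sequence


def recall_at_k(retrieved: Sequence[str], true_top: Sequence[str], k: int) -> float:
    """Share of the first k ground-truth ids found among the first k retrieved ids."""
    truth = set(true_top[:k])
    if not truth:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in truth)
    return hits / len(truth)


def ndcg_at_k(retrieved: Sequence[str], qrels: Dict[str, int], k: int) -> float:
    """NDCG@k with graded labels; documents missing from qrels have gain 0."""

    def dcg(gains: Sequence[int]) -> float:
        # Log2 discount: the document at 0-based rank i is divided by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    actual = dcg([qrels.get(doc_id, 0) for doc_id in retrieved[:k]])
    # Ideal ranking: all judged grades sorted from best to worst.
    ideal = dcg(sorted(qrels.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```

For example, `recall_at_k(["a", "b", "c", "d"], ["a", "c", "e", "f"], 4)` finds 2 of the 4 true neighbors and returns 0.5. Per-query values like these would typically be averaged over all queries to produce a single figure.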
Given the size of the corpus, the true top-N neighbors used for the recall computation were approximated offline for each query with the following search:
```
{
    "knn": {
        "field": "emb",
        "query_vector": query['emb'],
        "k": 10000,
        "num_candidates": 10000
    },
    "rescore": {
        "window_size": 10000,
        "query": {
            "query_weight": 0,
            "rescore_query": {
                "script_score": {
                    "query": {
                        "match_all": {}
                    },
                    "script": {
                        "source": "double value = dotProduct(params.query_vector, 'emb'); return sigmoid(1, Math.E, -value);",
                        "params": {
                            "query_vector": query['emb']
                        }
                    }
                }
            }
        }
    }
}
```
This means that the computed recall is measured against the best approximate-neighbor results the system can produce, rather than against the exact top N.
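The rescore script's `sigmoid(1, Math.E, -value)` can be unpacked using the formula Elasticsearch documents for the `sigmoid(value, k, a)` script function, value^a / (k^a + value^a). With value = 1, k = e and a = -dot, it reduces to the logistic function 1 / (1 + e^(-dot)). The sketch below (plain Python; the helper names are hypothetical) checks that equivalence. Because the logistic function is strictly increasing, the rescored order matches the raw dot-product order. A plausible reason for the wrapper is that `script_score` does not accept negative scores, while dot products can be negative.

```python
import math


def painless_sigmoid(value: float, k: float, a: float) -> float:
    # Formula documented for the script_score `sigmoid(value, k, a)` function.
    return value**a / (k**a + value**a)


def rescore_score(dot_product: float) -> float:
    # Python equivalent of the script: sigmoid(1, Math.E, -value)
    return painless_sigmoid(1.0, math.e, -dot_product)


def logistic(x: float) -> float:
    # Standard logistic function 1 / (1 + e^-x).
    return 1.0 / (1.0 + math.exp(-x))
```

A dot product of 0 maps to 0.5, negative dot products map into (0, 0.5) and positive ones into (0.5, 1), so every score stays positive while the ranking is unchanged.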
For the relevance metrics, the `qrels.tsv` file contains annotations for all the queries listed in `queries.json`. This file is generated from the original training data available at [ir_datasets/msmarco_passage_v2](https://ir-datasets.com/msmarco-passage-v2.html#msmarco-passage-v2/train).
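The layout of `qrels.tsv` is not spelled out in this PR. The sketch below assumes the common TREC qrels layout (query id, iteration, document id, relevance grade, tab-separated) and loads it into a per-query map of judged documents, which is the shape an NDCG computation needs. Both the column order and the `load_qrels` helper are assumptions; adjust them to the actual file.

```python
from typing import Dict, Iterable


def load_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse TREC-style qrels rows: query_id, iteration, doc_id, relevance (assumed layout)."""
    qrels: Dict[str, Dict[str, int]] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 4:
            continue  # skip blank or malformed rows
        query_id, _iteration, doc_id, relevance = parts[:4]
        qrels.setdefault(query_id, {})[doc_id] = int(relevance)
    return qrels
```

With a real file, `with open("qrels.tsv") as f: qrels = load_qrels(f)` works the same way, since a file object iterates line by line. A header row, if present, would need to be skipped first.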
1stvamp reviewed May 17, 2024
afoucret reviewed May 23, 2024
```
"dynamic": false,
"_source": {
  "enabled": false
  "mode": "synthetic"
```
Co-authored-by: Wes Mason <wes@1stvamp.org>
…s into jim/msmarco-v2-vector
afoucret approved these changes Jun 6, 2024
afoucret (Contributor) left a comment:
A few comments, but nothing that should prevent you from merging the PR.
```python
for query in dataset.queries_iter():
    emb = await retrieve_embed_for_query(co, query[1])
    resp = await es.search(
        index="msmarco-v2", query=get_brute_force_query(emb), size=1000, _source=["_none_"], fields=["docid"]
    )
```
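The review excerpt above calls `get_brute_force_query(emb)`, whose body is not shown. As a hypothetical reconstruction, an exact (brute-force) query can be expressed as a `script_score` over `match_all`, which scores every document in the index; this sketch reuses the dot-product script from the description. The helper body is an assumption, not the PR's code.

```python
from typing import Any, Dict, List


def get_brute_force_query(emb: List[float]) -> Dict[str, Any]:
    """Hypothetical: score every document by its dot product with `emb`.

    The logistic wrapper keeps scores non-negative, as script_score requires,
    without changing the ranking.
    """
    return {
        "script_score": {
            "query": {"match_all": {}},  # visit every document: exact, not approximate
            "script": {
                "source": "double value = dotProduct(params.query_vector, 'emb'); "
                "return sigmoid(1, Math.E, -value);",
                "params": {"query_vector": emb},
            },
        }
    }
```

Unlike the `knn` section, which explores an HNSW graph, this scores the whole corpus, which is why it is only practical as an offline ground-truth step.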
Member: @jimczi, should this be backported to 8.15? I tried to backport #708 because this change adds about 50 minutes to the IT tests, but it seems this PR was never backported to 8.15, so the changes exist only in master. (By default, Rally chooses the 8.15 branch when benchmarking an 8.x version of 8.15 or later; serverless always runs from master.)
Contributor (Author): Sorry for the delay here, @gareth-ellis.
gareth-ellis pushed a commit to gareth-ellis/rally-tracks that referenced this pull request on Dec 6, 2024
(cherry picked from commit b6f3535)
gareth-ellis added a commit that referenced this pull request on Dec 16, 2024
* Add recall and NDCG operations in msmarco-v2-vector (#610), cherry picked from commit b6f3535
* Exclude msmarco from IT tests (#708)

Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
The new `queries-recall.json` file contains all the queries (76 in total) from the test set, along with their embeddings and the top 1000 ids computed by brute force over the entire corpus. For the relevance metrics, the `qrels.tsv` file contains annotations for all the queries listed in `queries-recall.json`. This file is generated from the original training data available at [ir_datasets/msmarco_passage_v2](https://ir-datasets.com/msmarco-passage-v2.html#msmarco-passage-v2/train).
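With the brute-force top 1000 ids stored per query, recall@k for any k up to 1000 can be computed by taking the first k stored ids as ground truth. The sketch below assumes a hypothetical layout of one JSON object per line with `query_id`, `emb` and `ids` fields; the real file's structure and field names may differ, and both helpers are illustrative only.

```python
import json
from statistics import mean
from typing import Dict, Iterable, List


def load_recall_queries(lines: Iterable[str]) -> List[dict]:
    """Parse one JSON object per non-empty line (hypothetical file layout)."""
    return [json.loads(line) for line in lines if line.strip()]


def mean_recall_at_k(records: List[dict], results: Dict[str, List[str]], k: int) -> float:
    """Average recall@k; ground truth is the first k of each record's brute-force `ids`."""
    per_query = []
    for rec in records:
        truth = set(rec["ids"][:k])
        retrieved = results.get(rec["query_id"], [])[:k]  # ids returned by the kNN search
        per_query.append(len(truth.intersection(retrieved)) / len(truth) if truth else 0.0)
    return mean(per_query) if per_query else 0.0
```

Queries missing from `results` score 0, so an incomplete benchmark run lowers the average rather than being silently ignored.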