Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@martin-gaievski Could you please review this PR for technical accuracy?
Please add the following example of a scoring script query so our doc will have a sample for all three ways of filtering:
- [Boolean filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn) search and then applies the filter to the results. Because it uses post-filtering, a query with a Boolean filter may return significantly fewer than `k` results for a restrictive filter.
The OpenSearch k-NN plugin version 2.2 introduced support for the Lucene engine in order to process k-NN searches. The Lucene engine provides a search that is based on the HNSW algorithm in order to represent a multi-layered graph. The OpenSearch k-NN plugin version 2.4 can incorporate filters for searches based on Lucene 9.4.
- [Lucene k-NN filter](#using-a-lucene-k-nn-filter): This approach ensures that `k` results are returned because filtering is applied during the k-NN search as opposed to after the k-NN search, like in post-filtering. The drawback is that you can only use this method with the Hierarchical Navigable Small World (HNSW) algorithm implemented by a Lucene search engine.
Please mention that Lucene filtering is available from 2.4.
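To make the difference between the two bullets concrete, here is a minimal sketch, assuming a brute-force exact search stands in for the ANN/HNSW search. The function names and the toy `docs` data are hypothetical and are not k-NN plugin APIs; the sketch only shows why post-filtering can return fewer than `k` results while filtering during the search does not (as long as at least `k` documents match the filter).

```python
import math


def knn(query, docs, k):
    """Return the k docs closest to query by Euclidean distance (exact, brute force)."""
    return sorted(docs, key=lambda d: math.dist(query, d["vector"]))[:k]


def post_filter_search(query, docs, k, keep):
    # Post-filtering: run the k-NN search first, then drop hits that fail the filter.
    return [d for d in knn(query, docs, k) if keep(d)]


def filtered_search(query, docs, k, keep):
    # Filtering as part of the search: only documents that match are candidates.
    return knn(query, [d for d in docs if keep(d)], k)


def has_parking(doc):
    return doc["parking"]


# Hypothetical toy data: the two nearest documents do not match the filter.
docs = [
    {"id": 1, "vector": [1.0, 1.0], "parking": False},
    {"id": 2, "vector": [1.5, 1.0], "parking": False},
    {"id": 3, "vector": [2.0, 2.0], "parking": True},
    {"id": 4, "vector": [6.0, 6.0], "parking": True},
    {"id": 5, "vector": [9.0, 9.0], "parking": True},
]
query = [1.0, 1.0]

post_ids = [d["id"] for d in post_filter_search(query, docs, 3, has_parking)]
pre_ids = [d["id"] for d in filtered_search(query, docs, 3, has_parking)]
print(post_ids)  # [3]
print(pre_ids)   # [3, 4, 5]
```

With `k = 3`, the post-filtered search keeps only one of its three nearest neighbors, while the search restricted to matching documents still returns three.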
## Filtered search optimization
Lucene also provides the capability to operate its `KnnVectorQuery` across a subset of documents. To learn more about this capability, see the [Apache Lucene Documentation](https://issues.apache.org/jira/browse/LUCENE-10382).
Overall, Lucene k-NN filters are more efficient compared to other filtering methods, both in terms of performance and relevancy of search results.
This statement isn't correct for all use cases, and later we provide a table where each of the three filtering types is paired with a dataset and filter combination.
Overall, Lucene k-NN filters are more efficient compared to other filtering methods, both in terms of performance and relevancy of search results.
To learn more about all available k-NN search approaches, including approximate k-NN, exact k-NN with script score, and pre-filtering with painless extensions, see [k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/index/).
Depending on your dataset and use case, you might be more interested in maximizing recall or minimizing latency. The following table provides guidance on various k-NN search configurations and the filtering methods to use in order to optimize for better recall or lower latency. The first three columns of the table provide several example k-NN search configurations. A search configuration consists of:
I think we need to say "higher recall" if it is followed by "lower latency".
Number of Vectors | Filter Restrictive Percentage | k | Recall | Latency
-- | -- | -- | -- | --
Number of documents in an index | Percentage of documents the filter returns | k | Filtering method to use for best recall | Filtering method to use for lowest latency
Same as above: I'm not sure we need to say "best recall"; "higher recall" is probably more accurate.
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@martin-gaievski I have added the scoring script example (changed the query_value to use a double array because integers raised an exception) and addressed the comments. Please re-review when you get a chance and approve the PR when everything looks good. Thanks!
martin-gaievski left a comment
Looks good to me, thank you
Number of Vectors | Filter Restrictive Percentage | k | Recall | Latency
-- | -- | -- | -- | --
Number of documents in an index | Percentage of documents the filter returns | k | Filtering method to use for higher recall | Filtering method to use for lower latency
Jeff has got everyone going on this program: add pipes before and after all table elements (in case we ever switch to a host that can only interpret a markdown table with this format). Not sure if you think that's worthwhile.
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
natebower left a comment
@kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!
#### Example request
The following request returns hotels that provide parking. This request illustrates multiple alternative mechanisms to obtain the parking filter criteria. It uses a regular expression for the value `true`, a term query for the key-value pair `"parking":"true"`, a wildcard for the characters that spell "true", and the `must_not` clause to eliminate hotels with "parking" set to `false`:
The following request illustrates alternative mechanisms to search for hotels with parking using a `term`, `wildcard`, and `regexp` query clauses in the `should` clause. Additionally, it uses the `must_not` clause to eliminate hotels with `parking` set to `false`:
The following request illustrates alternative mechanisms to search for hotels with parking using a `term`, `wildcard`, and `regexp` query clauses in the `should` clause. Additionally, it uses the `must_not` clause to eliminate hotels with `parking` set to `false`:
The following request illustrates how to search for hotels with parking using `term`, `wildcard`, and `regexp` query clauses in the `should` clause. Additionally, it uses the `must_not` clause to eliminate hotels with `parking` set to `false`:
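For readers of this thread who do not have the request in front of them, the following is a rough sketch of the Boolean clause shape that sentence describes, built as a Python dictionary. The `parking` field and the clause types come from the sentence above; the helper name `build_parking_filter`, the wildcard and regexp patterns, and the `minimum_should_match` setting are assumptions chosen for illustration and may differ from the request in the PR. The surrounding k-NN query that would carry this filter is intentionally left out.

```python
import json


def build_parking_filter():
    """Build a bool clause that matches parking == "true" in three alternative ways.

    Hypothetical helper: the patterns below are illustrative assumptions.
    """
    return {
        "bool": {
            "should": [
                # Exact match on the key-value pair.
                {"term": {"parking": "true"}},
                # Wildcard pattern that "true" satisfies.
                {"wildcard": {"parking": {"value": "t*e"}}},
                # Regular expression that "true" satisfies.
                {"regexp": {"parking": "[a-zA-Z]rue"}},
            ],
            # Exclude hotels explicitly marked as having no parking.
            "must_not": [{"term": {"parking": "false"}}],
            # Require at least one of the should clauses to match.
            "minimum_should_match": 1,
        }
    }


parking_filter = build_parking_filter()
print(json.dumps(parking_filter, indent=2))
```

Any one of the three `should` clauses is enough to match a hotel with parking, which is why the sketch sets `minimum_should_match` to 1.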
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Refactor k-NN filter search Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Implemented tech review comments Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Implemented doc review feedback Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Apply suggestions from code review Co-authored-by: Nathan Bower <nbower@amazon.com> * One more editorial review comment Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> --------- Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> Co-authored-by: Nathan Bower <nbower@amazon.com> (cherry picked from commit ade705e) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Refactor k-NN filter search * Implemented tech review comments * Implemented doc review feedback * Apply suggestions from code review * One more editorial review comment --------- (cherry picked from commit ade705e) Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Nathan Bower <nbower@amazon.com>
* Refactor k-NN filter search Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Implemented tech review comments Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Implemented doc review feedback Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Apply suggestions from code review Co-authored-by: Nathan Bower <nbower@amazon.com> * One more editorial review comment Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> --------- Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> Co-authored-by: Nathan Bower <nbower@amazon.com>
Fixes #3221