
Refactor k-NN filter search #3613

Merged
kolchfa-aws merged 5 commits into main from knn-fix on Apr 5, 2023

@kolchfa-aws
Collaborator

@kolchfa-aws kolchfa-aws commented Mar 29, 2023

Fixes #3221

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@kolchfa-aws kolchfa-aws requested a review from a team as a code owner March 29, 2023 17:27
@kolchfa-aws kolchfa-aws self-assigned this Mar 29, 2023
@kolchfa-aws
Collaborator Author

@martin-gaievski Could you please review this PR for technical accuracy?

@martin-gaievski
Member

Please add the following example of a script score query so that our doc has a sample for all three ways of filtering:

```json
{
    "size": 3,
    "query": {
        "script_score": {
            "query": {
                "bool": {
                    "filter": {
                        "bool": {
                            "must": [
                                {
                                    "range": {
                                        "rating": {
                                            "gte": 8,
                                            "lte": 10
                                        }
                                    }
                                },
                                {
                                    "term": {
                                        "parking": "true"
                                    }
                                }
                            ]
                        }
                    }
                }
            },
            "script": {
                "source": "knn_score",
                "lang": "knn",
                "params": {
                    "field": "location",
                    "query_value": [
                        5.0,
                        4.0
                    ],
                    "space_type": "l2"
                }
            }
        }
    }
}
```
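The script score query above runs an exact k-NN search: the `bool` filter first narrows the candidates (rating between 8 and 10, parking set to true), and then `knn_score` compares every remaining document's `location` against `query_value` using `l2` distance. The following Python sketch models only that order of operations on in-memory data. The `Hotel` type, the sample records, and the `exact_knn_with_prefilter` helper are hypothetical illustrations, not part of the k-NN plugin, and the sketch ranks by raw L2 distance instead of reproducing the plugin's score formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Hotel:
    # Hypothetical document shape mirroring the fields used in the query.
    name: str
    location: tuple[float, float]
    rating: int
    parking: bool


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn_with_prefilter(docs, query_vector, size):
    """Filter first, then score every surviving document exactly."""
    # Step 1: apply the bool filter (8 <= rating <= 10 and parking == true).
    candidates = [d for d in docs if 8 <= d.rating <= 10 and d.parking]
    # Step 2: brute-force distance computation over the filtered subset only.
    candidates.sort(key=lambda d: l2_distance(d.location, query_vector))
    # Step 3: return up to `size` of the closest matches.
    return candidates[:size]


hotels = [
    Hotel("a", (5.2, 4.4), 9, True),
    Hotel("b", (5.0, 3.9), 7, True),   # excluded: rating below 8
    Hotel("c", (6.4, 3.4), 10, True),
    Hotel("d", (3.3, 4.5), 8, False),  # excluded: no parking
    Hotel("e", (9.0, 9.0), 9, True),
]
top = exact_knn_with_prefilter(hotels, (5.0, 4.0), size=3)
print([h.name for h in top])  # ['a', 'c', 'e']
```

Because every filtered document is scored, this approach returns `size` results whenever at least that many documents pass the filter; the cost is a brute-force pass over the filtered subset.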

- [Boolean filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn) search and then applies the filter to the results. Because it uses post-filtering, a query with a restrictive Boolean filter may return significantly fewer than `k` results.

The OpenSearch k-NN plugin version 2.2 introduced support for the Lucene engine to process k-NN searches. The Lucene engine provides a search based on the HNSW algorithm, which organizes vectors in a multi-layered graph. Starting with version 2.4, the OpenSearch k-NN plugin can incorporate filters into searches based on Lucene 9.4.
- [Lucene k-NN filter](#using-a-lucene-k-nn-filter): This approach ensures that `k` results are returned (as long as at least `k` documents match the filter) because the filter is applied during the k-NN search rather than after it, as in post-filtering. The drawback is that you can only use this method with the Hierarchical Navigable Small World (HNSW) algorithm implemented by the Lucene search engine.
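To make the difference between the Boolean filter and Lucene k-NN filter approaches concrete, the following Python sketch contrasts them on a toy dataset. It is a simplified model under stated assumptions: `approximate_top_k` is a hypothetical stand-in that returns the exact top `k` (a real HNSW search is approximate), and the data and helper names are invented for illustration. Post-filtering retrieves `k` neighbors and then drops those that fail the filter; filtering during the search only considers documents that pass the filter.

```python
import math

# Hypothetical toy index: (doc_id, vector, parking)
DOCS = [
    ("h1", (5.0, 4.1), False),
    ("h2", (5.1, 4.0), False),
    ("h3", (4.9, 3.9), True),
    ("h4", (7.0, 7.0), True),
    ("h5", (8.0, 1.0), True),
]


def approximate_top_k(candidates, query, k):
    """Stand-in for an ANN search; exact here to keep the sketch simple."""
    return sorted(candidates, key=lambda d: math.dist(d[1], query))[:k]


def post_filter_search(docs, query, k, predicate):
    """Boolean filter approach: search first, then filter the k hits."""
    hits = approximate_top_k(docs, query, k)
    return [d for d in hits if predicate(d)]


def filter_during_search(docs, query, k, predicate):
    """Lucene-style approach (simplified): search only matching documents."""
    eligible = [d for d in docs if predicate(d)]
    return approximate_top_k(eligible, query, k)


def has_parking(doc):
    return doc[2]


query = (5.0, 4.0)
print([d[0] for d in post_filter_search(DOCS, query, 3, has_parking)])    # ['h3']
print([d[0] for d in filter_during_search(DOCS, query, 3, has_parking)])  # ['h3', 'h4', 'h5']
```

With `k = 3`, the three nearest neighbors include two documents without parking, so post-filtering keeps only one result, while filtering during the search still returns three.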
Member


Please mention that Lucene filtering is available starting in version 2.4.

## Filtered search optimization

Lucene also provides the capability to run its `KnnVectorQuery` over a subset of documents. To learn more about this capability, see the Apache Lucene issue [LUCENE-10382](https://issues.apache.org/jira/browse/LUCENE-10382).
Overall, Lucene k-NN filters are more efficient compared to other filtering methods, both in terms of performance and relevancy of search results.
Member


This statement isn't correct for all use cases. Later, we provide a table that pairs each of the three filtering types with a dataset and filter combination.

To learn more about all available k-NN search approaches, including approximate k-NN, exact k-NN with script score, and pre-filtering with Painless extensions, see [k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/index/).
Depending on your dataset and use case, you might be more interested in maximizing recall or minimizing latency. The following table provides guidance on various k-NN search configurations and the filtering methods to use in order to optimize for better recall or lower latency. The first three columns of the table provide several example k-NN search configurations. A search configuration consists of:
Member


I think we need to state "higher recall" if it is followed by "lower latency".


Number of Vectors | Filter Restrictive Percentage | k | Recall | Latency
-- | -- | -- | -- | --
Number of documents in an index | Percentage of documents the filter returns | k | Filtering method to use for best recall | Filtering method to use for lowest latency
Member


Same as above: I'm not sure we need to say "best recall"; "higher recall" is probably more accurate.

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@kolchfa-aws
Collaborator Author

@martin-gaievski I have added the scoring script example (changed the query_value to use a double array because integers raised an exception) and addressed the comments. Please re-review when you get a chance and approve the PR when everything looks good. Thanks!

Member

@martin-gaievski martin-gaievski left a comment


Looks good to me, thank you

Contributor

@cwillum cwillum left a comment


Looks great.


Number of Vectors | Filter Restrictive Percentage | k | Recall | Latency
-- | -- | -- | -- | --
Number of documents in an index | Percentage of documents the filter returns | k | Filtering method to use for higher recall | Filtering method to use for lower latency
Contributor


Jeff has got everyone going on this program: add pipes before and after all table elements (in case we ever switch to a host that can only interpret a Markdown table with this format). Not sure if you think that's worthwhile.

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Contributor

@natebower natebower left a comment


@kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!

#### Example request

The following request returns hotels that provide parking. This request illustrates multiple alternative mechanisms to obtain the parking filter criteria. It uses a regular expression for the value `true`, a term query for the key-value pair `"parking":"true"`, a wildcard for the characters that spell "true", and the `must_not` clause to eliminate hotels with "parking" set to `false`:
The following request illustrates alternative mechanisms to search for hotels with parking using a `term`, `wildcard`, and `regexp` query clauses in the `should` clause. Additionally, it uses the `must_not` clause to eliminate hotels with `parking` set to `false`:
Contributor


Suggested change
The following request illustrates alternative mechanisms to search for hotels with parking using a `term`, `wildcard`, and `regexp` query clauses in the `should` clause. Additionally, it uses the `must_not` clause to eliminate hotels with `parking` set to `false`:
The following request illustrates how to search for hotels with parking using `term`, `wildcard`, and `regexp` query clauses in the `should` clause. Additionally, it uses the `must_not` clause to eliminate hotels with `parking` set to `false`:

kolchfa-aws and others added 2 commits April 5, 2023 10:16
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@kolchfa-aws kolchfa-aws merged commit ade705e into main Apr 5, 2023
@kolchfa-aws kolchfa-aws added the backport 2.4, backport 2.5, and backport 2.6 labels Apr 5, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 5, 2023
* Refactor k-NN filter search

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Implemented tech review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Implemented doc review feedback

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>

* One more editorial review comment

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
(cherry picked from commit ade705e)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 5, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 5, 2023
kolchfa-aws pushed a commit that referenced this pull request Apr 5, 2023
* Refactor k-NN filter search

* Implemented tech review comments

* Implemented doc review feedback

* Apply suggestions from code review

* One more editorial review comment

---------

(cherry picked from commit ade705e)

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
kolchfa-aws pushed a commit that referenced this pull request Apr 5, 2023
kolchfa-aws pushed a commit that referenced this pull request Apr 5, 2023
vagimeli pushed a commit that referenced this pull request Apr 25, 2023
vagimeli added a commit that referenced this pull request Apr 25, 2023
vagimeli pushed a commit that referenced this pull request May 4, 2023
harshavamsi pushed a commit to harshavamsi/documentation-website that referenced this pull request Oct 31, 2023
@Naarcha-AWS Naarcha-AWS deleted the knn-fix branch March 28, 2024 23:20

Labels

backport 2.4, backport 2.5, backport 2.6

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[DOC] Feedback on Seach with k-NN filters page

5 participants