community[minor]: Opensearch hybridsearch implementation #25375
Hi @karthikbharadhwajKB, thank you for the contribution! We need the following changes to get this code merged:
…ub.com/karthikbharadhwajKB/langchain into opensearch_hybridsearch_implementation
Hey @eyurtsev,
Here you can see the working Hybrid Search feature and some experiments, where I tried to reproduce the exact keyword search and approximate_search results by tweaking keyword_weight and vector_weight.
…ub.com/karthikbharadhwajKB/langchain into opensearch_hybridsearch_implementation
Hey buddy @eyurtsev
Please have a look. If everything seems fine, then merge it! 🚀
…ctions: Updated docstring
… performing Hybrid Search) and raising helpful msg to user
…ub.com/karthikbharadhwajKB/langchain into opensearch_hybridsearch_implementation
Hey @eyurtsev, I have made the changes that you requested. Please check and let me know.
Hey @eyurtsev, I have made the changes that you requested. Please merge if everything looks fine 🚀
community: add hybrid search in opensearch
Langchain OpenSearch Hybrid Search Implementation
Implementation of Hybrid Search:
I have taken LangChain's OpenSearch integration to the next level by adding hybrid search capabilities. Building on the existing OpenSearchVectorSearch class, I have implemented Hybrid Search functionality (which combines the best of both keyword and semantic search). This new functionality allows users to harness the power of OpenSearch's advanced hybrid search features without leaving the familiar LangChain ecosystem. By blending traditional text matching with vector-based similarity, the enhanced class delivers more accurate and contextually relevant results. It's designed to seamlessly fit into existing LangChain workflows, making it easy for developers to upgrade their search capabilities.
In implementing the hybrid search for OpenSearch within the LangChain framework, I also incorporated filtering capabilities. It's important to note that according to the OpenSearch hybrid search documentation, only post-filtering is supported for hybrid queries. This means that the filtering is applied after the hybrid search results are obtained, rather than during the initial search process.
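As a rough illustration of the post-filtering described above, the sketch below builds a hybrid query body in which the filter sits in a top-level post_filter clause, so it is applied after the hybrid results are scored. This is not the PR's code: the helper name build_hybrid_query, the field names text and vector_field, and the metadata.source filter are assumptions made for the example.

```python
from typing import Any, Dict, List, Optional


def build_hybrid_query(
    query_text: str,
    query_vector: List[float],
    k: int = 4,
    text_field: str = "text",            # assumed name of the text field
    vector_field: str = "vector_field",  # assumed name of the embedding field
    post_filter: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a hybrid query body (keyword + k-NN)."""
    body: Dict[str, Any] = {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    # Sub-query 1: lexical match on the text field.
                    {"match": {text_field: {"query": query_text}}},
                    # Sub-query 2: approximate k-NN on the embedding field.
                    {"knn": {vector_field: {"vector": query_vector, "k": k}}},
                ]
            }
        },
    }
    if post_filter is not None:
        # The filter lives outside the hybrid clause, so it only trims
        # hits after both sub-queries have been scored and combined.
        body["post_filter"] = post_filter
    return body


# Example filter on a hypothetical metadata field.
query = build_hybrid_query(
    "opensearch hybrid search",
    [0.1, 0.2, 0.3],
    k=2,
    post_filter={"term": {"metadata.source": "docs"}},
)
```

Because the filter runs after scoring, fewer than k hits can come back when many of the top results are filtered out.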
Note: For the implementation of hybrid search, I strictly followed the official OpenSearch Hybrid search documentation and I took inspiration from https://github.com/AndreasThinks/langchain/tree/feature/opensearch_hybrid_search
Thanks Mate!
Experiments
I conducted a few experiments to verify that the hybrid search implementation is accurate and capable of reproducing the results of both plain keyword search and vector search.
Experiment - 1
Hybrid Search
keyword_weight = 1.0, vector_weight = 0.0
I conducted an experiment to verify the accuracy of my hybrid search implementation by comparing it to a plain keyword search. For this test, I set the keyword_weight to 1 and the vector_weight to 0 in the hybrid search, effectively giving full weightage to the keyword component. The results from this hybrid search configuration matched those of a plain keyword search, confirming that my implementation can accurately reproduce keyword-only search results when needed. It's important to note that while the results were the same, the scores differed between the two methods. This difference is expected because the plain keyword search in OpenSearch uses the BM25 algorithm for scoring, whereas the hybrid search still performs both keyword and vector searches before normalizing the scores, even when the vector component is given zero weight. This experiment validates that my hybrid search solution correctly handles the keyword search component and properly applies the weighting system, demonstrating its accuracy and flexibility in emulating different search scenarios.
Experiment - 2
Hybrid Search
keyword_weight = 0.0, vector_weight = 1.0
For experiment-2, I took the inverse approach to further validate my hybrid search implementation. I set the keyword_weight to 0 and the vector_weight to 1, effectively giving full weightage to the vector search component (KNN search). I then compared these results with a pure vector search. The outcome was consistent with my expectations: the results from the hybrid search with these settings exactly matched those from a standalone vector search. This confirms that my implementation accurately reproduces vector search results when configured to do so. As with the first experiment, I observed that while the results were identical, the scores differed between the two methods. This difference in scoring is expected and can be attributed to the normalization process in hybrid search, which still considers both components even when one is given zero weight. This experiment further validates the accuracy and flexibility of my hybrid search solution, demonstrating its ability to effectively emulate pure vector search when needed while maintaining the underlying hybrid search structure.
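Experiments 1 and 2 can be illustrated with a toy model of the scoring. The sketch below assumes min-max normalization followed by a weighted arithmetic mean, which are options OpenSearch's normalization processor offers; the description above does not state which techniques were actually used, and the scores are made-up numbers rather than results from the notebook.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1]; identical scores all map to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(
    keyword: Dict[str, float],
    vector: Dict[str, float],
    keyword_weight: float,
    vector_weight: float,
) -> Dict[str, float]:
    """Weighted arithmetic mean of normalized scores (toy model)."""
    kw, vec = min_max(keyword), min_max(vector)
    total = keyword_weight + vector_weight
    return {
        doc: (keyword_weight * kw.get(doc, 0.0)
              + vector_weight * vec.get(doc, 0.0)) / total
        for doc in set(kw) | set(vec)
    }


def ranking(scores: Dict[str, float]) -> List[str]:
    """Document ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up raw scores for three documents.
bm25 = {"a": 7.2, "b": 3.1, "c": 5.4}    # keyword (BM25-style) scores
knn = {"a": 0.61, "b": 0.93, "c": 0.70}  # vector similarity scores

keyword_only = combine(bm25, knn, keyword_weight=1.0, vector_weight=0.0)
vector_only = combine(bm25, knn, keyword_weight=0.0, vector_weight=1.0)
```

In this toy model the order of documents under each extreme weighting matches the corresponding standalone search, while the scores themselves change because they have been normalized.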
Experiment - 3
Hybrid Search - balanced
keyword_weight = 0.5, vector_weight = 0.5
For experiment-3, I adopted a balanced approach to further evaluate the effectiveness of my hybrid search implementation. In this test, I set both the keyword_weight and vector_weight to 0.5, giving equal importance to keyword-based and vector-based search components. This configuration aims to leverage the strengths of both search methods simultaneously. By setting both weights to 0.5, I intended to create a scenario where the hybrid search would consider lexical matches and semantic similarity equally. This balanced approach is often ideal for many real-world applications, as it can capture both exact keyword matches and contextually relevant results that might not contain the exact search terms.
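A small worked example may help show what equal weights do. It assumes min-max normalization and a weighted arithmetic mean (an assumption, not something the description confirms), and the scores are invented for illustration: document b is second-best on both signals, while a and c each win one signal and lose the other.

```python
from typing import Dict


def hybrid_scores(
    keyword: Dict[str, float],
    vector: Dict[str, float],
    keyword_weight: float = 0.5,
    vector_weight: float = 0.5,
) -> Dict[str, float]:
    """Toy hybrid score: min-max normalize each signal, then take a weighted mean."""

    def normalize(scores: Dict[str, float]) -> Dict[str, float]:
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        return {d: (s - lo) / span if span else 1.0 for d, s in scores.items()}

    kw, vec = normalize(keyword), normalize(vector)
    total = keyword_weight + vector_weight
    return {
        d: (keyword_weight * kw.get(d, 0.0) + vector_weight * vec.get(d, 0.0)) / total
        for d in set(kw) | set(vec)
    }


# Invented scores: "a" is the best lexical match, "c" the best semantic match.
keyword_scores = {"a": 9.0, "b": 6.0, "c": 1.0}
vector_scores = {"a": 0.2, "b": 0.8, "c": 0.9}

balanced = hybrid_scores(keyword_scores, vector_scores)
top_doc = max(balanced, key=balanced.get)
```

Here b ends up on top even though it leads neither list, which is the kind of behaviour a balanced weighting is meant to produce.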
Kindly verify the notebook for the experiments conducted!
Notebook: https://github.com/karthikbharadhwajKB/Langchain_OpenSearch_Hybrid_search/blob/main/Opensearch_Hybridsearch.ipynb
Instructions to follow for Performing Hybrid Search:
Step-1: Instantiating OpenSearchVectorSearch Class:
Parameters:
Step-2: Configure Search Pipeline:
To enable hybrid search functionality, you first need to configure a search pipeline.
Implementation Details:
This method configures a search pipeline in OpenSearch that:
Parameters:
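For orientation, a search pipeline for hybrid search is typically a request body with a normalization-processor, sent to PUT /_search/pipeline/<pipeline_name>. The sketch below only builds that body; it is not the method added in this PR. The helper name build_search_pipeline_body and the default techniques (min_max, arithmetic_mean) are assumptions chosen for the example.

```python
from typing import Any, Dict


def build_search_pipeline_body(
    keyword_weight: float,
    vector_weight: float,
    normalization: str = "min_max",       # assumed default for this sketch
    combination: str = "arithmetic_mean", # assumed default for this sketch
) -> Dict[str, Any]:
    """Hypothetical helper: body for PUT /_search/pipeline/<pipeline_name>."""
    # Assumption: the weights are expected to sum to 1.0.
    if abs(keyword_weight + vector_weight - 1.0) > 1e-9:
        raise ValueError("keyword_weight and vector_weight should sum to 1.0")
    return {
        "description": "Post processor for hybrid search",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": normalization},
                    "combination": {
                        "technique": combination,
                        # Weights follow the order of the sub-queries in the
                        # hybrid query (here: keyword first, then vector).
                        "parameters": {"weights": [keyword_weight, vector_weight]},
                    },
                }
            }
        ],
    }


pipeline_body = build_search_pipeline_body(keyword_weight=0.3, vector_weight=0.7)
```

The order of the weights matters: they are matched positionally to the sub-queries, so a hybrid query that lists the vector sub-query first would need the weights swapped.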
Step-3: Performing Hybrid Search:
After creating the search pipeline, you can perform a hybrid search using the similarity_search() method (or any other method supported by langchain). This method combines both keyword-based and semantic similarity searches on your OpenSearch index, leveraging the strengths of both traditional information retrieval and vector embedding techniques.
Parameters:
hybrid_search to use both keyword and vector search capabilities.
Twitter handle: @iamkarthik98