Skip to content

Commit

Permalink
community[minor]: Added filter search for LanceDB (#22461)
Browse files Browse the repository at this point in the history
- [ ] **community**: "vectorstore: added filtering support for LanceDB
vector store"

- [ ] **This PR adds filtering capabilities to LanceDB**:
- **Description:** In LanceDB filtering can be applied when searching
for data into the vectorstore. It is using the SQL language as mentioned
in the LanceDB documentation.
    - **Issue:** #18235 
    - **Dependencies:** No

- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
  • Loading branch information
brnaba-aws authored and hinthornw committed Jun 20, 2024
1 parent a3ba5c0 commit dd8fdfa
Showing 1 changed file with 29 additions and 1 deletion.
30 changes: 29 additions & 1 deletion libs/community/langchain_community/vectorstores/lancedb.py
Original file line number Diff line number Diff line change
Expand Up @@ -113,7 +113,7 @@ def add_texts(
Args:
texts: Iterable of strings to add to the vectorstore.
metadatas: Optional list of metadatas associated with the texts.
ids: Optional list of ids to associate w ith the texts.
ids: Optional list of ids to associate with the texts.
Returns:
List of ids of the added texts.
Expand Down Expand Up @@ -218,14 +218,42 @@ def similarity_search(
Args:
query: String to query the vectorstore with.
k: Number of documents to return.
filter (Optional[Dict]): Optional filter arguments
sql_filter(Optional[string]): SQL filter to apply to the query.
prefilter(Optional[bool]): Whether to apply the filter prior
to the vector search.
Raises:
ValueError: If the specified table is not found in the database.
Returns:
List of documents most similar to the query.
Examples:
.. code-block:: python
# Retrieve documents with filtering based on a metadata file_type
vector_store.as_retriever(search_kwargs={"k": 4, "filter":{
'sql_filter':"file_type='notice'",
'prefilter': True
}
})
# Retrieve documents with filtering on a specific file name
vector_store.as_retriever(search_kwargs={"k": 4, "filter":{
'sql_filter':"source='my-file.txt'",
'prefilter': True
}
})
"""
embedding = self._embedding.embed_query(query) # type: ignore
tbl = self.get_table(name)
filters = kwargs.pop("filter", {})
sql_filter = filters.pop("sql_filter", None)
prefilter = filters.pop("prefilter", False)
docs = (
tbl.search(embedding, vector_column_name=self._vector_key)
.where(sql_filter, prefilter=prefilter)
.limit(k)
.to_arrow()
)
Expand Down

0 comments on commit dd8fdfa

Please sign in to comment.