Community: Adding bulk_size as a settable param for OpenSearchVectorSearch #28325

manukychen · 2024-11-24T07:18:37Z

Description:
When using langchain.retrievers.parent_document_retriever.py with OpenSearchVectorSearch as the vectorstore, I found that the bulk_size param I passed to the OpenSearchVectorSearch class had no effect in ParentDocumentRetriever.add_documents(). It was overwritten by the default of 500 hard-coded in the OpenSearchVectorSearch methods (e.g., add_texts(), add_embeddings()...).

So I made this PR to fix this, thanks!
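The behavior described above can be illustrated with a minimal sketch. It uses hypothetical stand-in names (`BuggyStoreSketch`, `retriever_add_documents`) rather than the real LangChain classes, and it assumes the constructor accepts bulk_size while the method keeps its own default of 500:

```python
from typing import Any, List


class BuggyStoreSketch:
    """Hypothetical stand-in for a vector store; not the real OpenSearchVectorSearch."""

    def __init__(self, **kwargs: Any) -> None:
        # The caller's setting is accepted at construction time...
        self.bulk_size = kwargs.get("bulk_size", 500)

    def add_texts(self, texts: List[str], bulk_size: int = 500) -> int:
        # ...but the method-level default wins whenever the caller
        # does not forward bulk_size explicitly.
        return bulk_size  # returned only so the effective value is visible


def retriever_add_documents(store: BuggyStoreSketch, docs: List[str]) -> int:
    # Hypothetical retriever helper: it has no bulk_size argument to pass on.
    return store.add_texts(docs)


store = BuggyStoreSketch(bulk_size=2000)
effective = retriever_add_documents(store, ["a", "b"])
print(effective)  # 500, not the 2000 set on the store
```

Because the retriever-style caller never passes bulk_size, the value configured on the store is silently ignored.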

efriis · 2024-12-09T21:16:41Z

libs/community/langchain_community/vectorstores/opensearch_vector_search.py

@@ -618,7 +615,6 @@ def add_embeddings(
        text_embeddings: Iterable[Tuple[str, List[float]]],
        metadatas: Optional[List[dict]] = None,
        ids: Optional[List[str]] = None,
-        bulk_size: int = 500,


breaking change - can we keep this, and use the passed-in value as an override? it can still default to the self.bulk_size behavior if it's None

efriis

more breaking changes to the interface

could you revert these to something like bulk_size: Optional[int] = None, and replace each usage of bulk_size with bulk_size if bulk_size is not None else self.bulk_size?
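A minimal sketch of the pattern suggested in this review, assuming a hypothetical `OverridableStoreSketch` class in place of the real OpenSearchVectorSearch (the actual merged code may differ in detail):

```python
from typing import Iterable, List, Optional, Tuple


class OverridableStoreSketch:
    """Hypothetical stand-in; only the bulk_size handling is modeled."""

    def __init__(self, bulk_size: int = 500) -> None:
        # Instance-level default, set once at construction time.
        self.bulk_size = bulk_size

    def add_embeddings(
        self,
        text_embeddings: Iterable[Tuple[str, List[float]]],
        bulk_size: Optional[int] = None,
    ) -> int:
        # Keep the keyword in the signature so existing callers still work,
        # but fall back to the instance value when it is not supplied.
        bulk_size = bulk_size if bulk_size is not None else self.bulk_size
        return bulk_size  # returned only so the effective value is visible


store = OverridableStoreSketch(bulk_size=2000)
print(store.add_embeddings([]))                # 2000 (instance default)
print(store.add_embeddings([], bulk_size=50))  # 50 (per-call override)
```

Using `is not None` rather than a truthiness check keeps an explicit per-call value from being confused with "not provided". The parameter stays in the method signature, so callers that already pass bulk_size are unaffected.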

efriis · 2024-12-09T21:16:57Z

libs/community/langchain_community/vectorstores/opensearch_vector_search.py

@@ -596,7 +594,6 @@ async def aadd_texts(
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        ids: Optional[List[str]] = None,
-        bulk_size: int = 500,


efriis · 2024-12-09T21:17:04Z

libs/community/langchain_community/vectorstores/opensearch_vector_search.py

@@ -1085,7 +1081,6 @@ def from_texts(
        texts: List[str],
        embedding: Embeddings,
        metadatas: Optional[List[dict]] = None,
-        bulk_size: int = 500,


efriis · 2024-12-09T21:17:09Z

libs/community/langchain_community/vectorstores/opensearch_vector_search.py

@@ -1150,7 +1145,6 @@ async def afrom_texts(
        texts: List[str],
        embedding: Embeddings,
        metadatas: Optional[List[dict]] = None,
-        bulk_size: int = 500,


manukychen · 2024-12-10T15:07:43Z

Hi @efriis, thanks for reviewing my request, really appreciate it. :)
I've followed your suggestions and modified the code, assuming I understood them correctly.
If you have any further questions or suggestions, please let me know. Thanks again!

let OpenSearchVectorSearch class function can accept bulk_size param …

51843d4

…correctly

dosubot bot added the size:S, community, and Ɑ: vector store labels Nov 24, 2024

fix classmethod self.bulk_size bug

9fbccec

efriis reviewed Dec 9, 2024

efriis self-assigned this Dec 9, 2024

let usage of bulk size can be passed-in as an override

11383f4

efriis added 2 commits December 11, 2024 17:39

Merge branch 'master' into opensearch_bulk_size

6c3bd4a

x

0ee5af5

efriis approved these changes Dec 12, 2024

dosubot bot added the lgtm label Dec 12, 2024

efriis enabled auto-merge (squash) December 12, 2024 01:43

efriis merged commit ba9b95c into langchain-ai:master Dec 12, 2024
19 checks passed
