Weaviate should allow the flexibility for the user to mention what vectorizer module that they want to use #95

Open
pashva opened this issue Feb 10, 2024 · 11 comments
Labels
enhancement New feature or request

Comments

@pashva

pashva commented Feb 10, 2024

I was using the langchain weaviate integration to manage my weaviate storage. The main problem is that I wanted to use weaviate's local text2vec-transformers module, but langchain offers no way to pass this setting, so I could not make sure that particular documents are embedded with a particular vectorizer.

Weaviate lets users set a vectorizer key when creating a class, so each class can use local vectorization or any other vectorizer of their choice.
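As a rough illustration of the per-class setting described above, here is a minimal sketch of a class definition that carries an explicit vectorizer key. The helper name `build_class_definition`, the class names, and the single `text` property are assumptions made for this example; only the `class`, `vectorizer`, and `properties` keys follow Weaviate's class-definition format.

```python
from typing import Any


def build_class_definition(name: str, vectorizer: str = "none") -> dict[str, Any]:
    """Return a Weaviate-style class definition with an explicit vectorizer.

    `name` and the single `text` property are illustrative assumptions.
    """
    return {
        "class": name,
        # Module that should vectorize objects of this class,
        # or "none" when vectors are supplied by the client.
        "vectorizer": vectorizer,
        "properties": [
            {"name": "text", "dataType": ["text"]},
        ],
    }


# One class vectorized by a local module, one with client-supplied vectors.
docs_class = build_class_definition("Article", "text2vec-transformers")
manual_class = build_class_definition("Note")
print(docs_class["vectorizer"], manual_class["vectorizer"])
```

A dictionary like this is what would be handed to the Weaviate client when the class is created; the exact client call depends on the client version in use.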

Currently this is not implemented in langchain: the from_documents and from_texts function calls only create a default schema with a single data property.

Solution:
Allow an optional user-defined vectorizer field.

I have implemented this. Should I create a PR?
langchain-ai/langchain#16795
That PR was closed, and I was asked to check out this repository instead.

@pashva pashva added the enhancement New feature or request label Feb 10, 2024
@hsm207
Collaborator

hsm207 commented Feb 10, 2024

hey @pashva,

Thanks again for your interest in contributing! I would like to learn more about your use case, but first, to answer your question about whether we should port over your PR:

Looks like we already have an example of what happens when we want to allow customisation to the default schema that langchain creates, as discussed in #94. I think a solution where users create their desired schema themselves, and then tell langchain the schema name, is much cleaner than extending the init method with more params.
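The "create the schema yourself, then give langchain its name" approach could look roughly like the sketch below. It assumes a Weaviate Python client v4 object exposing `collections.exists` and `collections.create_from_dict`; the helper name `ensure_collection` is hypothetical, and the final `WeaviateVectorStore` call is shown only as a comment because it needs a running Weaviate instance.

```python
from typing import Any


def ensure_collection(client: Any, class_def: dict[str, Any]) -> str:
    """Create the collection described by `class_def` if it does not exist yet.

    `client` is assumed to behave like a Weaviate v4 client
    (`client.collections.exists` / `client.collections.create_from_dict`).
    Returns the collection name so it can be passed on to langchain.
    """
    name = class_def["class"]
    if not client.collections.exists(name):
        client.collections.create_from_dict(class_def)
    return name


# Hypothetical usage against a live instance (not executed here):
#   index_name = ensure_collection(client, {
#       "class": "Article",
#       "vectorizer": "text2vec-transformers",
#       "properties": [{"name": "text", "dataType": ["text"]}],
#   })
#   store = WeaviateVectorStore(client=client, index_name=index_name, text_key="text")
```

Keeping schema creation outside the vector store means the constructor does not need extra parameters for every schema option.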

What do you think?

As for your use case, I understand that you want a local embeddings model, so weaviate's text2vec-transformers module is a great choice. However, since you're using langchain, why not use their HuggingFaceEmbeddings class with sentence_transformers?

@StreetLamb

Hi @hsm207, I agree that defining the schema with the Weaviate client and integrating it with Langchain is a better approach. For my use case, I plan to use Weaviate as a retrieval tool for my agents, which is why I prefer langchain-weaviate over stand-alone Weaviate. Additionally, I want to offer my users the flexibility to choose their vectoriser, such as Langchain's OpenAIEmbeddings() or a local embedding model. Currently, this level of customisation is not supported in langchain-weaviate.

@hsm207
Collaborator

hsm207 commented May 8, 2024

@StreetLamb thanks for clarifying your use case.

> I plan to use Weaviate as a retrieval tool for my agents

I'm not very familiar with other parts of langchain. Do you mean you're going to create a custom tool so that a langchain agent can use weaviate to do retrieval?

@StreetLamb

StreetLamb commented May 9, 2024

@hsm207 Yes, but it should already be possible to create a Weaviate retriever and use the langchain API to create the retriever tool, instead of creating a custom tool. Just to share: I tried langchain-chroma in the past, and being able to customise the collection with their client and then have langchain-chroma reference the collection by name was helpful.

@hsm207
Collaborator

hsm207 commented May 9, 2024

@StreetLamb
I'm looping in @efriis for input on what changes are needed in langchain-weaviate in order to have the langchain agent + chroma feature you described.

@efriis
Member

efriis commented May 10, 2024

Howdy! You should be able to define an embedding model (which I think is what you're calling a vectoriser), and make a weaviate retriever tool with

weaviate_vectorstore = WeaviateVectorStore(embedding=OpenAIEmbeddings())
create_retriever_tool(weaviate_vectorstore.as_retriever(), ...)

If that's not the case, feel free to reopen, as that would probably be a bug.

@efriis efriis closed this as completed May 10, 2024
@StreetLamb

Hi @efriis, sorry, there might have been some confusion. The challenge I am facing is that I cannot specify that Weaviate modules should do the vectorisation:

weaviate_vectorstore = WeaviateVectorStore(embedding=OpenAIEmbeddings())

Using OpenAI's embedding model when DEFAULT_VECTORIZER_MODULE: 'multi2vec-clip' is set in my docker-compose.yml will cause a conflict, since the default schema created by langchain assumes that no weaviate module is being used for vectorisation. See #177.
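To make the conflict concrete, here is a small sketch of how the vectorizer for a class is resolved, assuming Weaviate falls back to the server-wide DEFAULT_VECTORIZER_MODULE whenever a class definition does not name a vectorizer. The helper `effective_vectorizer` and the class name `LangChain_docs` are hypothetical and only model that fallback; they are not part of langchain-weaviate or the Weaviate client.

```python
from typing import Any, Optional


def effective_vectorizer(class_def: dict[str, Any], default_module: Optional[str]) -> str:
    """Return the vectorizer that would apply to a class.

    Assumed precedence: class-level "vectorizer", then the server
    default (DEFAULT_VECTORIZER_MODULE), then "none".
    """
    return class_def.get("vectorizer") or default_module or "none"


# A schema with no "vectorizer" key inherits the server default.
implicit = {"class": "LangChain_docs", "properties": [{"name": "text", "dataType": ["text"]}]}
# Setting "none" explicitly keeps vectorization on the client side.
explicit = {**implicit, "vectorizer": "none"}

print(effective_vectorizer(implicit, "multi2vec-clip"))
print(effective_vectorizer(explicit, "multi2vec-clip"))
```

Under this assumption, a schema that omits the key ends up vectorized by the server default even though the embeddings are meant to come from the client, which is the mismatch described above.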

@efriis
Member

efriis commented May 11, 2024

Got it. @hsm207 I tend to agree that langchain support even with that setting is relevant in order to make it usable with other components (e.g. as a retrieval tool for an agent), and I'll defer to you to determine what's best for the weaviate integration package!

@efriis efriis reopened this May 11, 2024
@pashva
Author

pashva commented May 11, 2024

I have an implementation ready that I use myself; if needed, I can contribute it to this repository @hsm207

@hsm207
Collaborator

hsm207 commented May 11, 2024

@pashva sure, that contribution would be great.

@pashva
Author

pashva commented May 12, 2024

@hsm207 I have created a PR for this; hopefully it serves our purpose @StreetLamb

PR: #179
