Weaviate should allow the flexibility for the user to mention what vectorizer module that they want to use #95
hey @pashva, Thanks again for your interest in contributing! I would like to learn more about your use case, but first, to answer your question about whether we should port over your PR: we already have an example of what happens when we want to allow customisation of the default schema that langchain creates, as discussed in #94. I think a solution where users create their desired schema themselves, and then tell langchain the schema name, is much cleaner than extending the init method with more params. What do you think?

As for your use case, I understand that you want to have a local embeddings model, so weaviate's text2vec-transformers module is a great choice. However, since you're using langchain, why not use their HuggingFaceEmbeddings class with sentence_transformers?
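The schema-first approach described above can be sketched as follows. This is illustrative only: the class-definition dict follows the weaviate-client v3 schema format, and the commented langchain wiring (client construction, `WeaviateVectorStore` arguments) is an assumption about the integration's API, not a confirmed implementation.

```python
# Sketch: define the collection schema yourself with the Weaviate client,
# then point langchain at it by name instead of letting langchain create
# a default schema. The dict below uses the weaviate-client v3 schema
# format; adapt it to your client version.

class_definition = {
    "class": "MyDocuments",                 # collection/class name
    "vectorizer": "text2vec-transformers",  # server-side local embeddings module
    "properties": [
        {"name": "text", "dataType": ["text"]},
    ],
}

# With a running Weaviate instance, the wiring would look roughly like this
# (hypothetical, hedged -- check the client and integration docs):
#
#   import weaviate
#   client = weaviate.Client("http://localhost:8080")
#   client.schema.create_class(class_definition)
#
#   from langchain_weaviate import WeaviateVectorStore
#   store = WeaviateVectorStore(
#       client=client, index_name="MyDocuments", text_key="text"
#   )

print(class_definition["vectorizer"])
```

Because the schema is created outside langchain, any Weaviate vectorizer module can be set on it without the integration needing extra init parameters.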
Hi @hsm207, I agree that defining the schema with the Weaviate client and integrating it with Langchain is a better approach. For my use case, I plan to use Weaviate as a retrieval tool for my agents, which is why I prefer langchain-weaviate over stand-alone Weaviate. Additionally, I want to offer my users the flexibility to choose their vectoriser, such as using Langchain's
@StreetLamb thanks for clarifying your use case.
I'm not very familiar with other parts of langchain. Do you mean you're going to create a custom tool so that a langchain agent can use weaviate to do retrieval?
@hsm207 Yes, but it should already be possible to create a Weaviate retriever and use the langchain API to create the retriever tool instead of creating a custom tool. Just to share, I tried langchain-chroma in the past, and the ability to customise the collection using their client and have langchain-chroma reference the collection by name was helpful.
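The "retriever as agent tool" pattern mentioned above can be sketched with a stand-in retriever, so the shape of the wiring is visible without a running Weaviate instance. In real code you would likely call `vectorstore.as_retriever()` and langchain's `create_retriever_tool`; those calls are shown only as a hedged comment, and everything executable here is a toy.

```python
# Toy sketch of wrapping a retriever as an agent tool. The real langchain
# wiring would be roughly (hypothetical, check the langchain docs):
#
#   from langchain.tools.retriever import create_retriever_tool
#   retriever = vectorstore.as_retriever()
#   tool = create_retriever_tool(
#       retriever, "weaviate_search", "Search the Weaviate collection."
#   )
#
# Below, a fake retriever stands in for the vector store.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RetrieverTool:
    """Minimal stand-in for a langchain retriever tool."""
    name: str
    description: str
    func: Callable[[str], List[str]]

    def run(self, query: str) -> List[str]:
        return self.func(query)

def fake_retrieve(query: str) -> List[str]:
    """Pretend retrieval: return docs sharing a word with the query."""
    docs = [
        "weaviate supports text2vec-transformers",
        "chroma stores embeddings",
    ]
    words = query.lower().split()
    return [d for d in docs if any(w in d for w in words)]

tool = RetrieverTool(
    name="weaviate_search",
    description="Search the Weaviate collection for relevant documents.",
    func=fake_retrieve,
)
print(tool.run("weaviate"))  # -> ['weaviate supports text2vec-transformers']
```

An agent framework would then invoke the tool by name, passing the user's query through to the retriever.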
@StreetLamb
Howdy! You should be able to define an embedding model (which I think is what you're calling a vectoriser), and make a weaviate retriever tool with
If that's not the case, feel free to reopen, as that would probably be a bug.
Hi @efriis, sorry, there might have been some confusion. The challenge I am facing is that I cannot specify the use of Weaviate modules to do the vectorisation:
Using OpenAI's embedding model when |
Got it. @hsm207 I tend to agree that langchain support even with that setting is relevant in order to make it usable with other components (e.g. as a retrieval tool for an agent), and I'll defer to you to determine what's best for the weaviate integration package! |
I have an implementation ready that I use myself; if needed, I can contribute it to this repository. @hsm207
@pashva sure, that contribution would be great. |
@hsm207 I have created a PR for this; hopefully it solves our purpose. @StreetLamb PR: #179
I was using the langchain Weaviate module to manage my Weaviate storage. The main problem was that I wanted to use Weaviate's local text2vec-transformers module, but in langchain there was no way to pass this argument to ensure that particular documents are embedded with particular vectorizers.
Weaviate allows users to specify a key-value pair for the vectorizer when creating a class, so that users can use local vectorization, or vectorization of their choice, for each class.
Currently this is not implemented in langchain; only a default schema with a single data property gets created when using the from_documents or from_texts function calls.
Solution:
Allow an optional user-defined vectorizer field
I have implemented this, should I create a PR?
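The proposed optional vectorizer field could look roughly like the sketch below. The function name, signature, and default are all hypothetical illustrations of the idea, not the merged API: the caller picks the vectorizer module for the class that langchain would otherwise create with a fixed default.

```python
# Hypothetical sketch of an optional "vectorizer" parameter for the schema
# langchain creates in from_documents / from_texts. Names and signature are
# illustrative only. The dict uses the weaviate-client v3 schema format.

def build_class_schema(index_name: str, text_key: str,
                       vectorizer: str = "none") -> dict:
    """Build a Weaviate class definition, letting the caller pick the
    vectorizer module (e.g. "text2vec-transformers") per class."""
    return {
        "class": index_name,
        "vectorizer": vectorizer,
        "properties": [{"name": text_key, "dataType": ["text"]}],
    }

schema = build_class_schema("Articles", "text",
                            vectorizer="text2vec-transformers")
print(schema["vectorizer"])  # text2vec-transformers
```

With this shape, omitting the argument preserves today's behaviour, while passing a module name opts a class into server-side vectorization.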
langchain-ai/langchain#16795
The original issue was closed there, and I was asked to check out this repository.