Weaviate should allow the flexibility for the user to mention what vectorizer module that they want to use #95
hey @pashva, Thanks again for your interest in contributing! I would like to learn more about your use case, but first, to answer your question about whether we should port over your PR: we already have an example of what happens when we want to allow customisation of the default schema that langchain creates, as discussed in #94. I think a solution where users create their desired schema themselves, and then tell langchain the schema name, is much cleaner than extending the init method with more params. What do you think?

As for your use case, I understand that you want to have a local embeddings model, so weaviate's text2vec-transformers module is a great choice. However, since you're using langchain, why not use their HuggingFaceEmbeddings class with sentence_transformers?
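The schema-first approach described above can be sketched as follows. This is illustrative only: the class-definition dict follows the weaviate-client v3 schema format, and the commented langchain wiring (client construction, `WeaviateVectorStore` arguments) is an assumption about the integration's API, not a confirmed implementation.

```python
# Sketch: define the collection schema yourself with the Weaviate client,
# then point langchain at it by name instead of letting langchain create
# a default schema. The dict below uses the weaviate-client v3 schema
# format; adapt it to your client version.

class_definition = {
    "class": "MyDocuments",                 # collection/class name
    "vectorizer": "text2vec-transformers",  # server-side local embeddings module
    "properties": [
        {"name": "text", "dataType": ["text"]},
    ],
}

# With a running Weaviate instance, the wiring would look roughly like this
# (hypothetical, hedged -- check the client and integration docs):
#
#   import weaviate
#   client = weaviate.Client("http://localhost:8080")
#   client.schema.create_class(class_definition)
#
#   from langchain_weaviate import WeaviateVectorStore
#   store = WeaviateVectorStore(
#       client=client, index_name="MyDocuments", text_key="text"
#   )

print(class_definition["vectorizer"])
```

Because the schema is created outside langchain, any Weaviate vectorizer module can be set on it without the integration needing extra init parameters.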
Hi @hsm207, I agree that defining the schema with the Weaviate client and integrating it with Langchain is a better approach. For my use case, I plan to use Weaviate as a retrieval tool for my agents, which is why I prefer langchain-weaviate over stand-alone Weaviate. Additionally, I want to offer my users the flexibility to choose their vectoriser, such as using Langchain's
@StreetLamb thanks for clarifying your use case.
I'm not very familiar with other parts of langchain. Do you mean you're going to create a custom tool so that a langchain agent can use weaviate to do retrieval?
@hsm207 Yes, but it should already be possible to create a Weaviate retriever and use the langchain API to create the retriever tool instead of creating a custom tool. Just to share, I tried langchain-chroma in the past, and the ability to customise the collection using their client and have langchain-chroma reference the collection by name was helpful.
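The "retriever as agent tool" pattern mentioned above can be sketched with a stand-in retriever, so the shape of the wiring is visible without a running Weaviate instance. In real code you would likely call `vectorstore.as_retriever()` and langchain's `create_retriever_tool`; those calls are shown only as a hedged comment, and everything executable here is a toy.

```python
# Toy sketch of wrapping a retriever as an agent tool. The real langchain
# wiring would be roughly (hypothetical, check the langchain docs):
#
#   from langchain.tools.retriever import create_retriever_tool
#   retriever = vectorstore.as_retriever()
#   tool = create_retriever_tool(
#       retriever, "weaviate_search", "Search the Weaviate collection."
#   )
#
# Below, a fake retriever stands in for the vector store.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RetrieverTool:
    """Minimal stand-in for a langchain retriever tool."""
    name: str
    description: str
    func: Callable[[str], List[str]]

    def run(self, query: str) -> List[str]:
        return self.func(query)

def fake_retrieve(query: str) -> List[str]:
    """Pretend retrieval: return docs sharing a word with the query."""
    docs = [
        "weaviate supports text2vec-transformers",
        "chroma stores embeddings",
    ]
    words = query.lower().split()
    return [d for d in docs if any(w in d for w in words)]

tool = RetrieverTool(
    name="weaviate_search",
    description="Search the Weaviate collection for relevant documents.",
    func=fake_retrieve,
)
print(tool.run("weaviate"))  # -> ['weaviate supports text2vec-transformers']
```

An agent framework would then invoke the tool by name, passing the user's query through to the retriever.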
@StreetLamb
Howdy! You should be able to define an embedding model (which I think is what you're calling a vectoriser), and make a weaviate retriever tool with
If that's not the case, feel free to reopen, as that would probably be a bug.
Hi @efriis, sorry, there might have been some confusion. The challenge I am facing is that I cannot specify the use of Weaviate modules to do the vectorisation:
Using OpenAI's embedding model when |
Got it. @hsm207 I tend to agree that langchain support even with that setting is relevant in order to make it usable with other components (e.g. as a retrieval tool for an agent), and I'll defer to you to determine what's best for the weaviate integration package! |
I have an implementation ready that I use myself; if needed, I can contribute it to this repository. @hsm207
@pashva sure, that contribution would be great. |
@hsm207 I have created a PR for this; hopefully it solves our purpose. @StreetLamb PR: #179
I was using the langchain Weaviate module to manage my Weaviate storage. The main problem was that I wanted to use Weaviate's local text2vec-transformers module, but in langchain there was no way to pass this argument to ensure that particular documents are embedded with particular vectorizers.
Weaviate allows users to specify a key-value pair for the vectorizer when creating a class, so that users can use local vectorization, or vectorization of their choice, for each class.
Currently this is not implemented in langchain; only a default schema with a single data property gets created when using the from_documents or from_texts function calls.
Solution:
Allow an optional user-defined vectorizer field
I have implemented this, should I create a PR?
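The proposed optional vectorizer field could look roughly like the sketch below. The function name, signature, and default are all hypothetical illustrations of the idea, not the merged API: the caller picks the vectorizer module for the class that langchain would otherwise create with a fixed default.

```python
# Hypothetical sketch of an optional "vectorizer" parameter for the schema
# langchain creates in from_documents / from_texts. Names and signature are
# illustrative only. The dict uses the weaviate-client v3 schema format.

def build_class_schema(index_name: str, text_key: str,
                       vectorizer: str = "none") -> dict:
    """Build a Weaviate class definition, letting the caller pick the
    vectorizer module (e.g. "text2vec-transformers") per class."""
    return {
        "class": index_name,
        "vectorizer": vectorizer,
        "properties": [{"name": text_key, "dataType": ["text"]}],
    }

schema = build_class_schema("Articles", "text",
                            vectorizer="text2vec-transformers")
print(schema["vectorizer"])  # text2vec-transformers
```

With this shape, omitting the argument preserves today's behaviour, while passing a module name opts a class into server-side vectorization.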
langchain-ai/langchain#16795
The original issue was closed there, and I was asked to check out this repository.