[Bug]: ChromaDB OpenAI embedding function is not working #2979

Closed
StamKavid opened this issue Jun 20, 2024 · 4 comments

Comments

@StamKavid

Describe the bug

I have the below code:

openai_ef = OpenAIEmbeddingFunction(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_base=os.getenv("AZURE_OPENAI_ENDPOINT"),
    model_name="text-embedding-ada-002",
)

When I pass it to RetrieveUserProxyAgent as "embedding_function": openai_ef, I still get the log message below:

autogen.agentchat.contrib.vectordb.chromadb - INFO - No content embedding is provided. Will use the VectorDB's embedding function to generate the content embedding.

Also, the link shown in the logs, https://docs.trychroma.com/embeddings, no longer exists (404).

Steps to reproduce

No response

Model Used

text-embedding-ada-002

Expected Behavior

To pass the embedding function into ChromaDB and override the default embedding function.

Screenshots and logs

Additional Information

No response

@StamKavid StamKavid added the bug label Jun 20, 2024
@nithinkodithala

Verify compatibility: ensure that RetrieveUserProxyAgent accepts the embedding function in the way you are providing it. There may be specific requirements for how the embedding function is passed.
Also try this method:
chromadb_client = ChromaDB(embedding_function=openai_ef)

@StamKavid
Author

@nithinkodithala Thank you very much for the prompt response.

I tried the following, but it still doesn't seem to solve this:

"client": Chroma(client=chromadb.PersistentClient(path="/tmp/chromadb/test"), collection_name="groupchat-collection", embedding_function=embeddings),

@StamKavid
Author

I can confirm that it is working with the below settings:

"client": chromadb.PersistentClient().get_or_create_collection(name = 'autogen_agent', embedding_function=openai_ef)

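One way to confirm that a custom embedding function, rather than Chroma's default, is actually being invoked is to wrap it in a small counter before passing it to get_or_create_collection. The sketch below is only an illustration: CountingEmbeddingFunction and fake_embedder are hypothetical names, not part of chromadb or autogen, and it assumes Chroma calls the embedding function as ef(input=documents). Some Chroma versions validate embedding functions more strictly, so in practice the wrapper may need to subclass chromadb.EmbeddingFunction.

```python
class CountingEmbeddingFunction:
    """Hypothetical wrapper that records how often the wrapped embedder is called."""

    def __init__(self, inner):
        self.inner = inner
        self.calls = 0
        self.documents_seen = 0

    def __call__(self, input):
        # Assumption: the caller passes the documents under the keyword `input`.
        self.calls += 1
        self.documents_seen += len(input)
        return self.inner(input)


def fake_embedder(input):
    """Stand-in for openai_ef: returns one tiny vector per document."""
    return [[float(len(doc)), 0.0] for doc in input]


counted = CountingEmbeddingFunction(fake_embedder)
vectors = counted(input=["hello", "hi"])
```

In a real session, openai_ef would take the place of fake_embedder. If counted.calls is still 0 after documents have been added, that would suggest a different embedding function is in use.
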
@Tanv-1

Tanv-1 commented Jan 4, 2025

I tried this:
import chromadb.utils.embedding_functions as embedding_functions
import chromadb

import autogen
from autogen import AssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="XXXXXXXXXXXXXXXXXXX",
    api_base="XXXXXXXXXXXXXXX",
    api_type="azure",
    api_version="2023-12-01-preview",
    model_name="embeddings-large",
)

assistant = AssistantAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config={
        "timeout": 600,
        "cache_seed": 42,
        "config_list": config_list,
    },
)
ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
    retrieve_config={
        "task": "qa",
        "docs_path": [
            <directory_path containing 1000 md files>
        ],
        "model": config_list[0]["model"],
        "client": chromadb.PersistentClient(
            path="C:/Users/Datasemantics/Documents/DS/RAG1/venvPython12/chroma_db_autogen"
        ).get_or_create_collection(name="danube_phase3", embedding_function=openai_ef),
        "collection_name": "danube_phase3",
        "embedding_model": "text-embedding-3-large",
        "embedding_function": openai_ef,
        "overwrite": True,  # set to True if you want to overwrite an existing collection
        "get_or_create": False,  # set to False if you don't want to reuse an existing collection
    },
    code_execution_config=False,  # set to False if you don't want to execute the code
)

qa_problem = "How many agents are there?"
chat_result = ragproxyagent.initiate_chat(assistant, message=ragproxyagent.message_generator, problem=qa_problem)

ERROR:
Trying to create collection.
2025-01-04 12:43:18,147 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 4424 chunks.
2025-01-04 12:43:18,202 - autogen.agentchat.contrib.vectordb.chromadb - INFO - No content embedding is provided. Will use the VectorDB's embedding function to generate the content embedding

RateLimitError Traceback (most recent call last)
File c:\Users\Datasemantics\Documents\DS\RAG1.autogen\lib\site-packages\chromadb\api\models\CollectionCommon.py:90, in validation_context.<locals>.decorator.<locals>.wrapper(self, *args, **kwargs)
89 try:
---> 90 return func(self, *args, **kwargs)
91 except Exception as e:

File c:\Users\Datasemantics\Documents\DS\RAG1.autogen\lib\site-packages\chromadb\api\models\CollectionCommon.py:406, in CollectionCommon._validate_and_prepare_upsert_request(self, ids, embeddings, metadatas, documents, images, uris)
403 validate_record_set_for_embedding(
404 record_set=upsert_records, embeddable_fields={"documents", "images"}
405 )
--> 406 upsert_embeddings = self._embed_record_set(record_set=upsert_records)
407 else:

File c:\Users\Datasemantics\Documents\DS\RAG1.autogen\lib\site-packages\chromadb\api\models\CollectionCommon.py:526, in CollectionCommon._embed_record_set(self, record_set, embeddable_fields)
525 else:
--> 526 return self._embed(input=record_set[field]) # type: ignore[literal-required]
527 raise ValueError(
528 "Record does not contain any non-None fields that can be embedded."
529 f"Embeddable Fields: {embeddable_fields}"
530 f"Record Fields: {record_set}"
531 )

File c:\Users\Datasemantics\Documents\DS\RAG1.autogen\lib\site-packages\chromadb\api\models\CollectionCommon.py:539, in CollectionCommon._embed(self, input)
535 raise ValueError(
...
91 except Exception as e:
92 msg = f"{str(e)} in {name}."
---> 93 raise type(e)(msg).with_traceback(e.__traceback__)

TypeError: __init__() missing 2 required keyword-only arguments: 'response' and 'body'
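
The final TypeError appears to hide the real failure. The traceback starts as a RateLimitError, and line 93 of CollectionCommon.py rebuilds the exception with type(e)(msg). If that exception class requires extra keyword-only arguments (here response and body), the rebuild itself fails. Below is a minimal sketch of that mechanism. It uses a hypothetical FakeRateLimitError in place of the real OpenAI exception and a hypothetical call_with_context helper in place of Chroma's wrapper; neither name exists in chromadb or openai.

```python
class FakeRateLimitError(Exception):
    """Hypothetical stand-in for an API error that needs extra keyword-only arguments."""

    def __init__(self, message, *, response, body):
        super().__init__(message)
        self.response = response
        self.body = body


def call_with_context(func, name):
    """Re-raise any exception with extra context, mirroring line 93 of the traceback."""
    try:
        return func()
    except Exception as e:
        msg = f"{str(e)} in {name}."
        # Rebuilding the exception from its type alone fails when __init__
        # needs more than a message; the resulting TypeError hides the original.
        raise type(e)(msg).with_traceback(e.__traceback__)


def embed():
    """Simulates an embedding call that hits a rate limit."""
    raise FakeRateLimitError("rate limit exceeded", response=None, body=None)


def describe_failure():
    """Return the visible exception type and the one it masks."""
    try:
        call_with_context(embed, "upsert")
    except TypeError as masked:
        # __context__ holds the exception that was being handled when the
        # TypeError was raised.
        return type(masked).__name__, type(masked.__context__).__name__
```

Here describe_failure() returns ("TypeError", "FakeRateLimitError"), so the original error is still reachable through __context__. If the same holds in the real run, the underlying problem would be rate limiting while embedding the 4424 chunks, not a misconfigured embedding function.
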
