
[Bug]: Using the dall-e 3 notebook example code, the request data is too large. #1087

Closed
deerleo opened this issue Dec 28, 2023 · 7 comments


deerleo commented Dec 28, 2023

Describe the bug

I tried to integrate the DALL-E 3 example code (https://github.com/microsoft/autogen/blob/main/notebook/agentchat_dalle_and_gpt4v.ipynb) into my group chat bot,
but after the DALL-E agent replied with image data in base64 format, the next request failed with a "request data too large" error.
I think this is because AutoGen pushes the base64 image reply onto the message list, so the next request carries a very large request body.

How can I exclude some messages, or truncate them, in the group chat?

(Screenshot: autogen-dalle-error)

Steps to reproduce

No response

Expected Behavior

No response

Screenshots and logs

No response

Additional Information

No response

@deerleo deerleo added the bug label Dec 28, 2023
rickyloynd-microsoft (Contributor) commented:

@kevin666aa

yiranwu0 (Collaborator) commented:

Hello @deerleo, when you are creating the DALL-E agent, are you using your own customized agents? A quick solution is to process the image inside the customized registered reply function: take the image out and return a message like "Placeholder for Image" instead.
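A minimal sketch of that approach, assuming AutoGen's `register_reply` mechanism; `generate_dalle_image` and `save_image` are hypothetical helpers standing in for the notebook's actual DALL-E call:

```python
from autogen import ConversableAgent

def generate_dalle_image(prompt: str) -> str:
    """Hypothetical helper: call DALL-E and return the image as a base64 string."""
    ...

def save_image(img_b64: str) -> str:
    """Hypothetical helper: persist the image out-of-band and return its path."""
    ...

def dalle_reply(recipient, messages=None, sender=None, config=None):
    # Generate the image from the latest prompt, store it outside the chat,
    # and put only a short placeholder into the conversation history so the
    # next request body stays small.
    prompt = messages[-1]["content"]
    img_b64 = generate_dalle_image(prompt)
    path = save_image(img_b64)
    return True, f"Placeholder for Image (saved to {path})"

dalle_agent = ConversableAgent(name="dalle", llm_config=False)
# Run this custom reply before the default reply functions.
dalle_agent.register_reply(trigger=[ConversableAgent, None], reply_func=dalle_reply, position=0)
```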

sonichi (Contributor) commented Dec 31, 2023

@BeibinLi

BeibinLi (Collaborator) commented Jan 1, 2024

@deerleo Thanks for the feedback. I tried the notebook again and it works fine.

Can you provide a simple example to reproduce the error? My guess is that the Chat Manager is not an LMM agent (e.g., it uses an LLM with no vision features), so it cannot understand the base64 image format and reads the image as a very long string.

I will try to redesign how LMM agents handle images in the future. In the meantime, if you could provide some failing examples, that would be great!

BeibinLi (Collaborator) commented Jan 3, 2024

@deerleo Can you check: #1124

Also, can you make the chat_manager a MultimodalAgent instead of a conversable agent?
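A rough sketch of that suggestion, assuming the contrib `MultimodalConversableAgent` class and a placeholder vision-capable config (as the next comment notes, this may not map directly onto group chat orchestration):

```python
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent

# Placeholder vision-capable config; the model name and key here are assumptions.
vision_config = {"config_list": [{"model": "gpt-4-vision-preview", "api_key": "sk-..."}]}

# Build the manager-side agent from a multimodal class instead of ConversableAgent,
# so base64 image content in the history is treated as an image, not a huge string.
chat_manager = MultimodalConversableAgent(
    name="chat_manager",
    llm_config=vision_config,
)
```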

afourney (Member) commented Jan 3, 2024

@BeibinLi I don't think there is a MultimodalGroupChat?

You can add groupchat manager capabilities to any agent via registration, but I suspect the problem lies with the GroupChat class that handles the orchestration.

BeibinLi (Collaborator) commented Jan 4, 2024

@afourney Got it. You are correct. I just added an issue (#1142) regarding MultimodalGroupChat. I will create a MultimodalGroupChat under the multimodal features.
