
[Bug]: Using the dall-e 3 notebook example code, the request data is too large. #1087

Closed
deerleo opened this issue Dec 28, 2023 · 7 comments


deerleo commented Dec 28, 2023

Describe the bug

I tried to integrate the DALL-E 3 example code (https://github.com/microsoft/autogen/blob/main/notebook/agentchat_dalle_and_gpt4v.ipynb) into my group chat bot,
but after the DALL-E agent replied with image data in base64 format, the next request failed with a "request data too large" error.
I think this is because AutoGen pushes the base64 image reply onto the message list, so the next request carries a very large request body.

How can I exclude some messages, or truncate them, in the group chat?

(Screenshot: autogen-dalle-error)

Steps to reproduce

No response

Expected Behavior

No response

Screenshots and logs

No response

Additional Information

No response

@deerleo deerleo added the bug label Dec 28, 2023
rickyloynd-microsoft (Contributor) commented:

@kevin666aa

yiranwu0 (Collaborator) commented:

Hello @deerleo, when you are creating the DALL-E agent, are you using your own customized agents? A quick solution is to process the image inside the customized registered reply function: take the image out and return a message like "Placeholder for Image" instead.
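A minimal sketch of that approach, assuming AutoGen's `register_reply` mechanism; `generate_dalle_image` and `save_image` are hypothetical helpers standing in for the notebook's actual DALL-E call:

```python
from autogen import ConversableAgent

def generate_dalle_image(prompt: str) -> str:
    """Hypothetical helper: call DALL-E and return the image as a base64 string."""
    ...

def save_image(img_b64: str) -> str:
    """Hypothetical helper: persist the image out-of-band and return its path."""
    ...

def dalle_reply(recipient, messages=None, sender=None, config=None):
    # Generate the image from the latest prompt, store it outside the chat,
    # and put only a short placeholder into the conversation history so the
    # next request body stays small.
    prompt = messages[-1]["content"]
    img_b64 = generate_dalle_image(prompt)
    path = save_image(img_b64)
    return True, f"Placeholder for Image (saved to {path})"

dalle_agent = ConversableAgent(name="dalle", llm_config=False)
# Run this custom reply before the default reply functions.
dalle_agent.register_reply(trigger=[ConversableAgent, None], reply_func=dalle_reply, position=0)
```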

sonichi (Contributor) commented Dec 31, 2023

@BeibinLi

BeibinLi (Collaborator) commented Jan 1, 2024

@deerleo Thanks for the feedback. I tried the notebook again and it works fine.

Can you provide a simple example to reproduce the error? My guess is that the Chat Manager is not an LMM agent (e.g., it uses an LLM with no vision features), so it cannot understand the base64 image format and reads the image as a very long string.

I will try to redesign how LMM agents handle images in the future. In the meantime, if you could provide some failing examples, that would be great!

BeibinLi (Collaborator) commented Jan 3, 2024

@deerleo Can you check: #1124

Also, can you make the chat_manager a MultimodalAgent instead of a conversable agent?
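A rough sketch of that suggestion, assuming the contrib `MultimodalConversableAgent` class and a placeholder vision-capable config (as the next comment notes, this may not map directly onto group chat orchestration):

```python
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent

# Placeholder vision-capable config; the model name and key here are assumptions.
vision_config = {"config_list": [{"model": "gpt-4-vision-preview", "api_key": "sk-..."}]}

# Build the manager-side agent from a multimodal class instead of ConversableAgent,
# so base64 image content in the history is treated as an image, not a huge string.
chat_manager = MultimodalConversableAgent(
    name="chat_manager",
    llm_config=vision_config,
)
```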

afourney (Member) commented Jan 3, 2024

@BeibinLi I don't think there is a MultimodalGroupChat?

You can add groupchat manager capabilities to any agent via registration, but I suspect the problem lies with the GroupChat class that handles the orchestration.

BeibinLi (Collaborator) commented Jan 4, 2024

@afourney Got it. You are correct. I just added an issue (#1142) regarding MultimodalGroupChat. I will create a MultimodalGroupChat under the multimodal features.
