Agentchat with Multimodal Model not working #1059

ViperVille007 · 2023-12-26T08:08:04Z

The multimodal agent doesn't seem to be working. I replicated the example notebook, Agent Chat with Multimodal Models.

This is the response I get:

image-explainer (to user_proxy): Sorry, I can't help with identifying or making assumptions about images.

user_proxy (to image-explainer): I'm sorry for the confusion, but as a text-based AI, I'm unable to view or interpret images directly. If you need assistance with identifying a dog breed from an image, you would typically use image recognition software or a service that utilizes artificial intelligence to analyze the picture.

How can I solve this?


rickyloynd-microsoft · 2023-12-26T15:11:06Z

@BeibinLi

BeibinLi · 2023-12-30T00:39:31Z

@ViperVille007 Thanks for raising this issue!

Here are some of my findings:

1. I tried again but could not reproduce your error.
   a. Can you run the standard version of GPT-4V (as in the dog example in the first few sections)?
   b. Here is another issue related to your error, which might be caused by the `oai_config_list`: [Issue]: Unable to run notebook: Agent Chat with Multimodal Models - GPT-4V #965
2. I noticed another issue with the GroupChat manager in this example, caused by recent updates to group chat.
   a. The group chat manager no longer asks both image-explainers at the same time.
   b. The example needs to be updated.
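Regarding the `oai_config_list` hint: a reply such as "as a text-based AI, I'm unable to view or interpret images" suggests the request may have reached a text-only model. The following is a minimal sketch of a pre-flight check, assuming the config list is a list of dicts with a `model` key (as in an `OAI_CONFIG_LIST` file). `VISION_MODELS` and `filter_vision_configs` are hypothetical names for illustration, not AutoGen APIs, and the model name in the set is only an example.

```python
import json

# Hypothetical set of vision-capable model names; adjust to the
# models or deployments your endpoint actually serves.
VISION_MODELS = {"gpt-4-vision-preview"}


def filter_vision_configs(config_list):
    """Return only the entries whose "model" is in VISION_MODELS.

    Illustrative helper, not part of AutoGen.
    """
    return [cfg for cfg in config_list if cfg.get("model") in VISION_MODELS]


# Example OAI_CONFIG_LIST-style content with placeholder keys.
raw = """[
  {"model": "gpt-4", "api_key": "sk-placeholder"},
  {"model": "gpt-4-vision-preview", "api_key": "sk-placeholder"}
]"""

config_list = json.loads(raw)
vision_configs = filter_vision_configs(config_list)

if not vision_configs:
    # An empty result means the image agent has no vision model to call.
    raise ValueError("No vision-capable model found in the config list")

print([cfg["model"] for cfg in vision_configs])
```

If the filtered list is empty, the image agent is likely configured with a text-only model. In AutoGen itself, this kind of filtering is usually expressed through the `filter_dict` argument of `config_list_from_json`; the sketch only mirrors that idea in plain Python.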

BTW, here is my current result:

User_proxy (to chat_manager):

Describe the image below:
                        <img https://th.bing.com/th/id/R.422068ce8af4e15b0634fe2540adea7a?rik=y4OcXBE%2fqutDOw&pid=ImgRaw&r=0>.

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
image-explainer-2 (to chat_manager):

The image shows a close-up of a cute, curly-haired apricot-colored puppy. The puppy is wearing a blue collar with a colorful bow tie and a heart-shaped tag. In the background, there are two black objects that appear to be shoes and a white door or wall. The focus is on the puppy, giving the photo a warm and friendly feel.

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
User_proxy (to chat_manager):



--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
image-explainer-2 (to chat_manager):

It appears that you haven't entered any text. If you have any questions or need assistance with something, feel free to ask!

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
User_proxy (to chat_manager):



--------------------------------------------------------------------------------
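The prompt in this transcript embeds the image as an `<img URL>` tag inside plain text. For the picture to reach a vision model, the tag has to become an image part of the request; if it is passed through as plain text, the model only sees a URL string. The following is a minimal sketch, assuming the OpenAI chat-completions content format (a list of `{"type": "text"}` and `{"type": "image_url"}` parts) and a simple regex for the tag. `split_img_tags` is a hypothetical helper, not AutoGen's own parser, which may also handle local files or base64 data.

```python
import re

# Matches "<img URL>" where the URL contains no whitespace or ">".
IMG_TAG = re.compile(r"<img\s+([^\s>]+)\s*>")


def split_img_tags(message):
    """Split text containing <img URL> tags into OpenAI-style content parts."""
    parts = []
    last = 0
    for match in IMG_TAG.finditer(message):
        text = message[last:match.start()]
        if text.strip():
            parts.append({"type": "text", "text": text})
        parts.append({"type": "image_url", "image_url": {"url": match.group(1)}})
        last = match.end()
    tail = message[last:]
    if tail.strip():
        parts.append({"type": "text", "text": tail})
    return parts


msg = "Describe the image below:\n<img https://example.com/puppy.jpg>."
for part in split_img_tags(msg):
    print(part)
```

The example URL is a placeholder; the real prompt above uses a Bing image URL, which also matches the pattern because it contains no spaces or `>`.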

thinkall · 2024-06-18T09:04:07Z

Closing this issue due to inactivity. If you have further questions, please open a new issue or join the discussion in AutoGen Discord server: https://discord.com/invite/Yb5gwGVkE5

BeibinLi self-assigned this Dec 28, 2023

whiskyboy pushed a commit to whiskyboy/autogen that referenced this issue Apr 17, 2024

Support more azure openai api_type (microsoft#1059)

a13539c

thinkall closed this as completed Jun 18, 2024
