Agentchat with Multimodal Model not working #1059

Closed
ViperVille007 opened this issue Dec 26, 2023 · 3 comments

@ViperVille007

The multimodal agent doesn't seem to be working.
I replicated the example notebook, Agent Chat with Multimodal Models.

This is what I am getting as a response:

image-explainer (to user_proxy): Sorry, I can't help with identifying or making assumptions about images.

user_proxy (to image-explainer): I'm sorry for the confusion, but as a text-based AI, I'm unable to view or interpret images directly. If you need assistance with identifying a dog breed from an image, you would typically use image recognition software or a service that utilizes artificial intelligence to analyze the picture.

How can I solve this?

@rickyloynd-microsoft
Contributor

@BeibinLi

BeibinLi self-assigned this Dec 28, 2023
@BeibinLi
Collaborator

BeibinLi commented Dec 30, 2023

@ViperVille007 Thanks for raising this issue!

Here are some of my findings:

  1. I tried again but could not reproduce your error.
    a. Can you run the standard GPT-4V version (the dog example in the first few sections of the notebook)?
    b. Here is another issue related to your error, which might be caused by the oai_config_list: [Issue]: Unable to run notebook: Agent Chat with Multimodal Models - GPT-4V #965
  2. I noticed another issue with the GroupChat manager in this example, caused by the recent updates to the group chat.
    a. The group chat manager no longer asks both image-explainers at the same time.
    b. The example needs to be updated.
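
For point 1b, here is a minimal sketch of one way to check whether the config list contains a vision-capable model before building the agents. It assumes the entries are dicts with a "model" key, as in a typical OAI_CONFIG_LIST file. The helper name `filter_vision_configs`, the model name in `VISION_MODELS`, and the sample entries are assumptions for illustration, not part of the AutoGen API.

```python
import json

# Assumption: model names that accept image input. Adjust for your account or deployment.
VISION_MODELS = {"gpt-4-vision-preview"}


def filter_vision_configs(config_list, vision_models=VISION_MODELS):
    """Return only the config entries whose "model" is in vision_models."""
    return [cfg for cfg in config_list if cfg.get("model") in vision_models]


# Hypothetical config contents; in practice this would be loaded from your config file.
raw = """[
    {"model": "gpt-4", "api_key": "sk-placeholder"},
    {"model": "gpt-4-vision-preview", "api_key": "sk-placeholder"}
]"""
config_list = json.loads(raw)

vision_configs = filter_vision_configs(config_list)
if not vision_configs:
    # A text-only model tends to answer that it cannot view images.
    print("No vision-capable model found in the config list.")
else:
    print("Vision-capable models:", [cfg["model"] for cfg in vision_configs])
```

If the filtered list is empty, the multimodal agent would be running on a text-only model, which could explain replies like the ones in the report.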

BTW, here is my current result:

User_proxy (to chat_manager):

Describe the image below:
                        <img https://th.bing.com/th/id/R.422068ce8af4e15b0634fe2540adea7a?rik=y4OcXBE%2fqutDOw&pid=ImgRaw&r=0>.

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
image-explainer-2 (to chat_manager):

The image shows a close-up of a cute, curly-haired apricot-colored puppy. The puppy is wearing a blue collar with a colorful bow tie and a heart-shaped tag. In the background, there are two black objects that appear to be shoes and a white door or wall. The focus is on the puppy, giving the photo a warm and friendly feel.

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
User_proxy (to chat_manager):



--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
image-explainer-2 (to chat_manager):

It appears that you haven't entered any text. If you have any questions or need assistance with something, feel free to ask!

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
User_proxy (to chat_manager):



--------------------------------------------------------------------------------

whiskyboy pushed a commit to whiskyboy/autogen that referenced this issue Apr 17, 2024
@thinkall
Collaborator

Closing this issue due to inactivity. If you have further questions, please open a new issue or join the discussion in the AutoGen Discord server: https://discord.com/invite/Yb5gwGVkE5
