
Explain Image functionality #148

Open
leventmolla opened this issue Jan 24, 2024 Discussed in #147 · 2 comments

Comments

@leventmolla

Discussed in #147

Originally posted by leventmolla January 25, 2024
OpenAI has a new model, gpt-4-1106-vision-preview, which can explain a collection of images. I think it could be done with the general chat completions endpoint, but the documentation is not very clear about the message structure. There should be an initial text prompt describing the task, followed by messages that contain the images. I tried this by passing base64-encoded images, but got errors (for some reason the number of tokens requested is enormous and the query fails). I then tried passing the URLs of the image files, which failed as well. So I am at a loss about how to use this functionality.

@kalafus
Contributor

kalafus commented Feb 7, 2024

Correct, this is done with the chat completions endpoint. The documentation indicates that the user role can send three types of message content:

  • plain strings, as we're used to
  • ChatCompletionContentPartTextParam (not clear to me whether this functionally differs from a plain string; I haven't experimented)
{
    "type": "text",
    "text": "<text>"
}
  • ChatCompletionContentPartImageParam (haven't experimented)
{
    "type": "image_url",
    "image_url": {
        "url": "<http(s) URL, or base64-encoded image data as a data: URI>",
        "detail": "auto" | "low" | "high"
    }
}
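To make the message structure concrete, here is a minimal sketch (in Python, mirroring the openai-python request shape) of a user message that combines one text part with several image parts. The function name and the example URLs are placeholders, not part of any API:

```python
def build_vision_messages(prompt: str, image_urls: list[str]) -> list[dict]:
    """Build a single user message whose content is a list of parts:
    one text part followed by one image_url part per image."""
    content: list[dict] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({
            "type": "image_url",
            "image_url": {"url": url, "detail": "auto"},
        })
    return [{"role": "user", "content": content}]

messages = build_vision_messages(
    "Describe what these images have in common.",
    ["https://example.com/a.jpg", "https://example.com/b.jpg"],
)
```

The resulting `messages` list is what you would pass to the chat completions call; note that the text and the images travel together in the content array of a single user message, not as separate follow-up messages.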

#169 adds the ChatCompletionContentPartImageParam type (as it's called in the Python API). I hadn't included base64 support, but can do so now.

@kalafus
Contributor

kalafus commented Feb 7, 2024

Looking at the Python code, OpenAI counts on you to encode the image data as a base64 string and feed it to the image_url.url parameter; it looks like I had coded it that way because I was following the Python API precisely.

class ImageURL(TypedDict, total=False):
    url: Required[str]
    """Either a URL of the image or the base64 encoded image data."""

    detail: Literal["auto", "low", "high"]
    """Specifies the detail level of the image.

    Learn more in the
    [Vision guide](https://platform.openai.com/docs/guides/vision/low-or-high-fidelity-image-understanding).
    """


class ChatCompletionContentPartImageParam(TypedDict, total=False):
    image_url: Required[ImageURL]

    type: Required[Literal["image_url"]]
    """The type of the content part."""

see src/openai/types/chat/chat_completion_content_part_image_param.py
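For the base64 route, a common stumbling block (and a plausible cause of the huge-token-count errors above) is passing the bare base64 string instead of a data: URI. A minimal sketch of producing a value suitable for the image_url.url field, assuming JPEG input; the helper name is hypothetical:

```python
import base64

def image_to_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data: URI for image_url.url.

    The "data:<mime>;base64," prefix is required; a bare base64 string
    is not accepted as a URL.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

uri = image_to_data_uri(b"\xff\xd8\xff\xe0fake-jpeg-bytes")
# uri starts with "data:image/jpeg;base64,"
```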
