
Explain Image functionality #148

Open
leventmolla opened this issue Jan 24, 2024 Discussed in #147 · 2 comments

Comments

@leventmolla

Discussed in #147

Originally posted by leventmolla January 25, 2024
OpenAI has a new model, gpt-4-1106-vision-preview, which can explain a collection of images. I think it could be done with the general chat completions endpoint, but the documentation is not very clear about the message structure. There should be an initial text prompt describing the task, followed by messages that contain the images. I tried this by passing base64-encoded images, but got errors (for some reason the number of tokens requested is enormous and the query fails). I then tried passing the URLs of the image files, which failed as well. So I am at a loss about how to use this functionality.

@kalafus
Contributor

kalafus commented Feb 7, 2024

Correct, this is done with the chat completions endpoint. The documentation indicates that the user role can send three types of message content:

  • plain strings, as we're used to
  • ChatCompletionContentPartTextParam (not clear to me whether this functionally differs from a plain string; I haven't experimented)
{
    "type": "text",
    "text": "<text>"
}
  • ChatCompletionContentPartImageParam (haven't experimented)
{
    "type": "image_url",
    "image_url": {
        "url": "<http(s) URL, or base64-encoded image data as a data: URI>",
        "detail": "auto" | "low" | "high"
    }
}
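To make the message structure concrete, here is a minimal sketch (in Python, mirroring the openai-python request shape) of a user message that combines one text part with several image parts. The function name and the example URLs are placeholders, not part of any API:

```python
def build_vision_messages(prompt: str, image_urls: list[str]) -> list[dict]:
    """Build a single user message whose content is a list of parts:
    one text part followed by one image_url part per image."""
    content: list[dict] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({
            "type": "image_url",
            "image_url": {"url": url, "detail": "auto"},
        })
    return [{"role": "user", "content": content}]

messages = build_vision_messages(
    "Describe what these images have in common.",
    ["https://example.com/a.jpg", "https://example.com/b.jpg"],
)
```

The resulting `messages` list is what you would pass to the chat completions call; note that the text and the images travel together in the content array of a single user message, not as separate follow-up messages.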

#169 adds the ChatCompletionContentPartImageParam type (as it's called in the Python API). I hadn't included base64 support, but can do so now.

@kalafus
Contributor

kalafus commented Feb 7, 2024

Looking at the Python code, OpenAI counts on you to encode the image data as a base64 string and feed it to the image_url.url parameter; it looks like I had coded it that way because I was following the Python API precisely.

class ImageURL(TypedDict, total=False):
    url: Required[str]
    """Either a URL of the image or the base64 encoded image data."""

    detail: Literal["auto", "low", "high"]
    """Specifies the detail level of the image.

    Learn more in the
    [Vision guide](https://platform.openai.com/docs/guides/vision/low-or-high-fidelity-image-understanding).
    """


class ChatCompletionContentPartImageParam(TypedDict, total=False):
    image_url: Required[ImageURL]

    type: Required[Literal["image_url"]]
    """The type of the content part."""

see src/openai/types/chat/chat_completion_content_part_image_param.py
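For the base64 route, a common stumbling block (and a plausible cause of the huge-token-count errors above) is passing the bare base64 string instead of a data: URI. A minimal sketch of producing a value suitable for the image_url.url field, assuming JPEG input; the helper name is hypothetical:

```python
import base64

def image_to_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data: URI for image_url.url.

    The "data:<mime>;base64," prefix is required; a bare base64 string
    is not accepted as a URL.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

uri = image_to_data_uri(b"\xff\xd8\xff\xe0fake-jpeg-bytes")
# uri starts with "data:image/jpeg;base64,"
```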
