
OpenAI-incompatible image handling in server multimodal #4771

@gelim

Description

Hello, while testing Llava-13B with the server implementation I got a 500 error because the content field of a message was a list of dicts rather than a plain string.

  • What works (a text-only string, unrelated to Llava):
$ curl -H "Content-Type: application/json" -X POST -s $SERVER/v1/chat/completions -d '{"messages": [{"role": "user", "content": "hello"}]}'

{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Hi there! How can I help you today?","role":"assistant"}}],[...]

  • What fails with a 500 error, [json.exception.type_error.302] type must be string, but is array:
$ curl -H "Content-Type: application/json" -X POST -s $SERVER/v1/chat/completions -d '{"messages": [{"role": "user", "content": [{"type":"text","text":"hello"}]}]}'
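The error message suggests the server only accepts content as a string, while the OpenAI chat format also allows an array of typed parts. A minimal sketch of accepting both shapes follows. It assumes a preprocessing step (for example in a proxy or in the request handler) that joins the "text" parts and collects the "image_url" parts; the helper name flatten_content is hypothetical and not part of the llama.cpp server.

```python
from typing import List, Tuple, Union


# Hypothetical helper (not a llama.cpp API): accept either OpenAI "content"
# shape, a plain string or a list of typed parts, and return the joined text
# plus the list of image URLs found in "image_url" parts.
def flatten_content(content: Union[str, List[dict]]) -> Tuple[str, List[str]]:
    if isinstance(content, str):
        # Plain string form: already what the server expects.
        return content, []
    texts: List[str] = []
    images: List[str] = []
    for part in content:
        kind = part.get("type")
        if kind == "text":
            texts.append(part.get("text", ""))
        elif kind == "image_url":
            # In the payload from this issue, image_url is an object
            # {"url": ..., "detail": ...}; only the URL is kept here.
            images.append(part["image_url"]["url"])
        else:
            raise ValueError(f"unsupported content part type: {kind!r}")
    return "\n".join(texts), images
```

With this, the failing request's content [{"type":"text","text":"hello"}] flattens to ("hello", []), which is the same text the working string-only request sends.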

This matters because an OpenAI REST-aware frontend sends text together with a picture inside the content key, as an array of typed parts, like this:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "describe the picture"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/webp;base64,AAAAAA==",
            "detail": "auto"
          }
        }
      ]
    }
  ],
  "model": "llava-13b",
  "frequency_penalty": 0,
  "max_tokens": 4000,
  "presence_penalty": 0,
  "temperature": 0.1,
  "top_p": 1,
  "user": "foobar"
}
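In this payload the picture travels inline as a base64 data: URL inside the image_url part. A minimal sketch of turning such a URL back into a MIME type and raw bytes is shown below, assuming only base64 data: URLs need to be handled (remote http(s) URLs would have to be fetched separately). The helper name decode_data_url is hypothetical.

```python
import base64
import binascii
from typing import Tuple


# Hypothetical helper: split a "data:<mime>;base64,<payload>" URL
# (RFC 2397 syntax) into its MIME type and the decoded bytes.
def decode_data_url(url: str) -> Tuple[str, bytes]:
    if not url.startswith("data:"):
        raise ValueError("not a data: URL; remote URLs would need fetching")
    header, sep, payload = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("malformed data: URL, missing ','")
    params = header.split(";")
    if "base64" not in params[1:]:
        raise ValueError("only base64-encoded data: URLs are handled here")
    # RFC 2397 falls back to text/plain when the media type is omitted.
    mime = params[0] or "text/plain"
    try:
        data = base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 payload: {exc}") from exc
    return mime, data
```

For the placeholder in the example, decode_data_url("data:image/webp;base64,AAAAAA==") returns ("image/webp", b"\x00\x00\x00\x00"); those four zero bytes are not a real WebP image, so a real frontend would send the full encoded file.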
