
[Usage]: Phi-4-multimodal-instruct: Model Does Not Support Transcriptions API Error #24570

@BugsBuggy

Description

Your current environment:

System Info
OS : Ubuntu 22.04.5 LTS (x86_64)
GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
CMake version : version 4.1.0
Libc version : glibc-2.35

PyTorch Info
PyTorch version : 2.7.1+cu128
Is debug build : False
CUDA used to build PyTorch : 12.8

Python Environment
Python version : 3.12.11 [GCC 11.4.0] (64-bit runtime)

When attempting to use the OpenAI transcription client with a deployed vLLM instance (version v0.10.1.1) serving the microsoft/Phi-4-multimodal-instruct model, I encounter an error indicating that the model does not support the Transcriptions API.

Steps Taken:

  • Deployed vLLM with the specified model (microsoft/Phi-4-multimodal-instruct) using Docker image v0.10.1.1.
  • Started the vLLM server with the necessary arguments including enabling LORA and setting appropriate parameters for model length, data types, etc.
  • Utilized the OpenAI transcription client example from the documentation but modified the openai_api_base URL to point to our locally hosted vLLM service.
  • Attempted both synchronous and asynchronous transcription methods through the client; a minimal sketch of these calls follows the error response below.

The request fails with the following response:

{
  "error": {
    "message": "The model does not support Transcriptions API",
    "type": "BadRequestError",
    "param": null,
    "code": 400
  }
}
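
For reference, the client calls looked roughly like the sketch below. It is a minimal reproduction, not the exact script: the audio file name (audio.wav) and the http://localhost:8080/v1 base URL are placeholders standing in for the real input file and the in-cluster service address.

import asyncio

from openai import AsyncOpenAI, OpenAI

openai_api_key = "EMPTY"                      # vLLM does not validate the key by default
openai_api_base = "http://localhost:8080/v1"  # placeholder for the hosted vLLM service
model = "microsoft/Phi-4-multimodal-instruct"

# Synchronous attempt -- returns the HTTP 400 error shown above.
client = OpenAI(api_key=openai_api_key, base_url=openai_api_base)
try:
    with open("audio.wav", "rb") as f:
        result = client.audio.transcriptions.create(model=model, file=f)
    print(result.text)
except Exception as exc:
    print(f"Synchronous transcription failed: {exc}")

# Asynchronous attempt through the async client.
async def transcribe_async() -> None:
    async_client = AsyncOpenAI(api_key=openai_api_key, base_url=openai_api_base)
    try:
        with open("audio.wav", "rb") as f:
            result = await async_client.audio.transcriptions.create(model=model, file=f)
        print(result.text)
    except Exception as exc:
        print(f"Asynchronous transcription failed: {exc}")

asyncio.run(transcribe_async())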

Deployment Method: Kubernetes Deployment YAML specifying the vLLM container and its configurations.
Starting Command: The vLLM server was started inside a Kubernetes pod with the following command (see: https://huggingface.co/microsoft/Phi-4-multimodal-instruct#vllm-inference):

python3 -m vllm.entrypoints.openai.api_server \
  --model microsoft/Phi-4-multimodal-instruct \
  --dtype auto \
  --port 8080 \
  --trust-remote-code \
  --max-model-len 131072 \
  --enable-lora \
  --max-lora-rank 320 \
  --lora-extra-vocab-size 256 \
  --max-loras 1 \
  --lora-modules speech=/data/hub/models--microsoft--Phi-4-multimodal-instruct/snapshots/<snapshot-id>/speech-lora/
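
As a quick sanity check, the served models can be listed through the same OpenAI-compatible endpoint (a sketch, assuming the pod is reachable on localhost:8080); the base model and the speech LoRA module registered via --lora-modules should both show up.

from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8080/v1")

# /v1/models lists everything the server is serving, including LoRA modules
# registered at startup, so "speech" is expected to appear alongside the base model.
for served_model in client.models.list().data:
    print(served_model.id)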

Can someone please help me understand why my model doesn't support the Transcriptions API and provide guidance on how to enable or configure it properly?

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Labels: usage (How to use vllm)
Assignees: none