
[Usage]: Phi-4-multimodal-instruct: Model Does Not Support Transcriptions API Error #24570

@BugsBuggy

Description

Your current environment:

System Info
OS : Ubuntu 22.04.5 LTS (x86_64)
GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
CMake version : version 4.1.0
Libc version : glibc-2.35

PyTorch Info
PyTorch version : 2.7.1+cu128
Is debug build : False
CUDA used to build PyTorch : 12.8

Python Environment
Python version : 3.12.11 [GCC 11.4.0] (64-bit runtime)

When attempting to use the OpenAI transcription client with a deployed vLLM instance (version v0.10.1.1) serving the microsoft/Phi-4-multimodal-instruct model, I encounter an error indicating that the model does not support the Transcriptions API.

Steps Taken:

  • Deployed vLLM with the specified model (microsoft/Phi-4-multimodal-instruct) using Docker image v0.10.1.1.
  • Started the vLLM server with the necessary arguments including enabling LORA and setting appropriate parameters for model length, data types, etc.
  • Utilized the OpenAI transcription client example from the documentation but modified the openai_api_base URL to point to our locally hosted vLLM service.
  • Attempted both synchronous and asynchronous transcription methods through the client; a minimal sketch of these calls follows the error response below.

The request fails with the following response:

{
  "error": {
    "message": "The model does not support Transcriptions API",
    "type": "BadRequestError",
    "param": null,
    "code": 400
  }
}
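
For reference, the client calls looked roughly like the sketch below. It is a minimal reproduction, not the exact script: the audio file name (audio.wav) and the http://localhost:8080/v1 base URL are placeholders standing in for the real input file and the in-cluster service address.

import asyncio

from openai import AsyncOpenAI, OpenAI

openai_api_key = "EMPTY"                      # vLLM does not validate the key by default
openai_api_base = "http://localhost:8080/v1"  # placeholder for the hosted vLLM service
model = "microsoft/Phi-4-multimodal-instruct"

# Synchronous attempt -- returns the HTTP 400 error shown above.
client = OpenAI(api_key=openai_api_key, base_url=openai_api_base)
try:
    with open("audio.wav", "rb") as f:
        result = client.audio.transcriptions.create(model=model, file=f)
    print(result.text)
except Exception as exc:
    print(f"Synchronous transcription failed: {exc}")

# Asynchronous attempt through the async client.
async def transcribe_async() -> None:
    async_client = AsyncOpenAI(api_key=openai_api_key, base_url=openai_api_base)
    try:
        with open("audio.wav", "rb") as f:
            result = await async_client.audio.transcriptions.create(model=model, file=f)
        print(result.text)
    except Exception as exc:
        print(f"Asynchronous transcription failed: {exc}")

asyncio.run(transcribe_async())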

Deployment Method: Kubernetes Deployment YAML specifying the vLLM container and its configurations.
Starting Command: The vLLM server was started inside a Kubernetes pod with the following command (see: https://huggingface.co/microsoft/Phi-4-multimodal-instruct#vllm-inference):

python3 -m vllm.entrypoints.openai.api_server \
  --model microsoft/Phi-4-multimodal-instruct \
  --dtype auto \
  --port 8080 \
  --trust-remote-code \
  --max-model-len 131072 \
  --enable-lora \
  --max-lora-rank 320 \
  --lora-extra-vocab-size 256 \
  --max-loras 1 \
  --lora-modules speech=/data/hub/models--microsoft--Phi-4-multimodal-instruct/snapshots/<snapshot-id>/speech-lora/
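
As a quick sanity check, the served models can be listed through the same OpenAI-compatible endpoint (a sketch, assuming the pod is reachable on localhost:8080); the base model and the speech LoRA module registered via --lora-modules should both show up.

from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8080/v1")

# /v1/models lists everything the server is serving, including LoRA modules
# registered at startup, so "speech" is expected to appear alongside the base model.
for served_model in client.models.list().data:
    print(served_model.id)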

Can someone please help me understand why my model doesn't support the Transcriptions API and provide guidance on how to enable or configure it properly?

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Labels: usage (How to use vllm)
Assignees: none