Description
Your current environment:
System Info
OS : Ubuntu 22.04.5 LTS (x86_64)
GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
CMake version : version 4.1.0
Libc version : glibc-2.35
PyTorch Info
PyTorch version : 2.7.1+cu128
Is debug build : False
CUDA used to build PyTorch : 12.8
Python Environment
Python version : 3.12.11 [GCC 11.4.0] (64-bit runtime)
When attempting to use the OpenAI transcription client with a deployed vLLM instance (version v0.10.1.1) serving the microsoft/Phi-4-multimodal-instruct model, I encounter an error indicating that the model does not support the Transcriptions API.
Steps Taken:
- Deployed vLLM with the specified model (microsoft/Phi-4-multimodal-instruct) using Docker image v0.10.1.1.
- Started the vLLM server with the necessary arguments, including enabling LoRA and setting the max model length, data type, and related parameters (full command below).
- Utilized the OpenAI transcription client example from the documentation, changing only the openai_api_base URL to point at our locally hosted vLLM service.
- Attempted both synchronous and asynchronous transcription calls through the client (a trimmed sketch of both is shown below).
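For reference, this is roughly what the client code looks like. It is a minimal sketch: the base URL, API key, and audio file name are placeholders for our actual values, and it assumes the standard OpenAI-compatible /v1 route exposed by vLLM.

```python
import asyncio

from openai import AsyncOpenAI, OpenAI

# Placeholders for our in-cluster service; the real deployment points at the
# Kubernetes service address instead of localhost.
OPENAI_API_BASE = "http://localhost:8080/v1"
OPENAI_API_KEY = "EMPTY"
AUDIO_PATH = "sample.wav"  # placeholder audio file
MODEL = "microsoft/Phi-4-multimodal-instruct"  # matches --model on the server


def transcribe_sync() -> str:
    # Synchronous call via the standard OpenAI client.
    client = OpenAI(api_key=OPENAI_API_KEY, base_url=OPENAI_API_BASE)
    with open(AUDIO_PATH, "rb") as f:
        transcription = client.audio.transcriptions.create(model=MODEL, file=f)
    return transcription.text


async def transcribe_async() -> str:
    # Asynchronous variant via AsyncOpenAI.
    client = AsyncOpenAI(api_key=OPENAI_API_KEY, base_url=OPENAI_API_BASE)
    with open(AUDIO_PATH, "rb") as f:
        transcription = await client.audio.transcriptions.create(model=MODEL, file=f)
    return transcription.text


if __name__ == "__main__":
    print(transcribe_sync())
    print(asyncio.run(transcribe_async()))
```

Both the synchronous and asynchronous calls fail with the same 400 response: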
```json
{
  "error": {
    "message": "The model does not support Transcriptions API",
    "type": "BadRequestError",
    "param": null,
    "code": 400
  }
}
```
Deployment Method: Kubernetes Deployment YAML specifying the vLLM container and its configurations.
Starting Command: The vLLM server is started inside a Kubernetes pod with the following command (based on https://huggingface.co/microsoft/Phi-4-multimodal-instruct#vllm-inference):
```bash
python3 -m vllm.entrypoints.openai.api_server \
  --model microsoft/Phi-4-multimodal-instruct \
  --dtype auto \
  --port 8080 \
  --trust-remote-code \
  --max-model-len 131072 \
  --enable-lora \
  --max-lora-rank 320 \
  --lora-extra-vocab-size 256 \
  --max-loras 1 \
  --lora-modules speech=/data/hub/models--microsoft--Phi-4-multimodal-instruct/snapshots/<snapshot-id>/speech-lora/
```
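For completeness, here is a sketch of how I can check what the server reports as served models (using the same placeholder base URL as above and the standard OpenAI-compatible /v1/models route; as far as I understand, LoRA modules registered via --lora-modules also show up there):

```python
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8080/v1")

# List the models exposed by the OpenAI-compatible server, which should
# include the base model and the "speech" LoRA module.
for model in client.models.list():
    print(model.id)
```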
Can someone please help me understand why my model doesn't support the Transcriptions API and provide guidance on how to enable or configure it properly?
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.