Commit 0854248

Remove audio optional dependency for mistral-common (vllm-project#28722)
Signed-off-by: Julien Denize <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
1 parent a17e36f commit 0854248

4 files changed: +6 -2 lines changed

docs/contributing/model/transcription.md

Lines changed: 1 addition & 1 deletion
@@ -249,7 +249,7 @@ No extra registration is required beyond having your model class available via t
 ## Examples in-tree
 
 - Whisper encoder–decoder (audio-only): [vllm/model_executor/models/whisper.py](../../../vllm/model_executor/models/whisper.py)
-- Voxtral decoder-only (audio embeddings + LLM): [vllm/model_executor/models/voxtral.py](../../../vllm/model_executor/models/voxtral.py)
+- Voxtral decoder-only (audio embeddings + LLM): [vllm/model_executor/models/voxtral.py](../../../vllm/model_executor/models/voxtral.py). Make sure to have installed `mistral-common[audio]`.
 - Gemma3n decoder-only with fixed instruction prompt: [vllm/model_executor/models/gemma3n_mm.py](../../../vllm/model_executor/models/gemma3n_mm.py)
 
 ## Test with the API

docs/models/supported_models.md

Lines changed: 3 additions & 0 deletions
@@ -785,6 +785,9 @@ Speech2Text models trained specifically for Automatic Speech Recognition.
 | `Gemma3nForConditionalGeneration` | Gemma3n | `google/gemma-3n-E2B-it`, `google/gemma-3n-E4B-it`, etc. | | |
 | `GraniteSpeechForConditionalGeneration` | Granite Speech | `ibm-granite/granite-speech-3.3-2b`, `ibm-granite/granite-speech-3.3-8b`, etc. | ✅︎ | ✅︎ |
 
+!!! note
+    `VoxtralForConditionalGeneration` requires `mistral-common[audio]` to be installed.
+
 ### Pooling Models
 
 See [this page](./pooling_models.md) for more information on how to use pooling models.

examples/offline_inference/audio_language.py

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ class ModelRequestData(NamedTuple):
 
 
 # Voxtral
+# Make sure to install mistral-common[audio].
 def run_voxtral(question: str, audio_count: int) -> ModelRequestData:
     from mistral_common.audio import Audio
     from mistral_common.protocol.instruct.chunk import (

requirements/common.txt

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ partial-json-parser # used for parsing partial JSON outputs
 pyzmq >= 25.0.0
 msgspec
 gguf >= 0.13.0
-mistral_common[image,audio] >= 1.8.5
+mistral_common[image] >= 1.8.5
 opencv-python-headless >= 4.11.0 # required for video IO
 pyyaml
 six>=1.16.0; python_version > '3.11' # transitive dependency of pandas that needs to be the latest version for python 3.12
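
Because this commit drops the `audio` extra from `requirements/common.txt`, anyone running Voxtral has to install `mistral-common[audio]` separately. The following is a minimal sketch of a pre-flight check a user might run before loading the model. The helper `missing_modules` and the module list are hypothetical and not part of vLLM or this commit; in particular, the assumption that the audio extra provides `soundfile` and `soxr` should be verified against mistral-common's packaging metadata.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Assumption: the audio extra pulls in these top-level modules.
# Check mistral-common's own metadata for the authoritative list.
AUDIO_MODULES = ("soundfile", "soxr")

if __name__ == "__main__":
    missing = missing_modules(AUDIO_MODULES)
    if missing:
        print(f"Missing {missing}; try: pip install 'mistral-common[audio]'")
    else:
        print("Audio dependencies look importable.")
```

The check only uses `importlib.util.find_spec`, so it does not import the audio libraries themselves and is cheap to run at startup.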
