Merged
2 changes: 1 addition & 1 deletion docs/contributing/model/transcription.md
@@ -249,7 +249,7 @@ No extra registration is required beyond having your model class available via t
## Examples in-tree

- Whisper encoder–decoder (audio-only): [vllm/model_executor/models/whisper.py](../../../vllm/model_executor/models/whisper.py)
-- Voxtral decoder-only (audio embeddings + LLM): [vllm/model_executor/models/voxtral.py](../../../vllm/model_executor/models/voxtral.py)
+- Voxtral decoder-only (audio embeddings + LLM): [vllm/model_executor/models/voxtral.py](../../../vllm/model_executor/models/voxtral.py). Make sure to have installed `mistral-common[audio]`.
- Gemma3n decoder-only with fixed instruction prompt: [vllm/model_executor/models/gemma3n_mm.py](../../../vllm/model_executor/models/gemma3n_mm.py)

## Test with the API
3 changes: 3 additions & 0 deletions docs/models/supported_models.md
@@ -785,6 +785,9 @@ Speech2Text models trained specifically for Automatic Speech Recognition.
| `Gemma3nForConditionalGeneration` | Gemma3n | `google/gemma-3n-E2B-it`, `google/gemma-3n-E4B-it`, etc. | | |
| `GraniteSpeechForConditionalGeneration` | Granite Speech | `ibm-granite/granite-speech-3.3-2b`, `ibm-granite/granite-speech-3.3-8b`, etc. | ✅︎ | ✅︎ |

+!!! note
+    `VoxtralForConditionalGeneration` requires `mistral-common[audio]` to be installed.
+
### Pooling Models

See [this page](./pooling_models.md) for more information on how to use pooling models.
1 change: 1 addition & 0 deletions examples/offline_inference/audio_language.py
@@ -43,6 +43,7 @@ class ModelRequestData(NamedTuple):


# Voxtral
+# Make sure to install mistral-common[audio].
def run_voxtral(question: str, audio_count: int) -> ModelRequestData:
    from mistral_common.audio import Audio
    from mistral_common.protocol.instruct.chunk import (
2 changes: 1 addition & 1 deletion requirements/common.txt
@@ -31,7 +31,7 @@ partial-json-parser # used for parsing partial JSON outputs
pyzmq >= 25.0.0
msgspec
gguf >= 0.13.0
-mistral_common[image,audio] >= 1.8.5
+mistral_common[image] >= 1.8.5

P1: Dropping audio extra removes mandatory test dependency

Removing `mistral_common[audio]` from the default requirements means packages like `librosa` and `soundfile` are no longer installed when setting up the repo via `requirements/common.txt`. Several audio tests import `librosa` unconditionally (e.g. `tests/multimodal/test_audio.py`, `tests/models/multimodal/generation/test_phi4_multimodal.py`, `tests/entrypoints/openai/test_transcription_validation.py`), so the test suite now raises `ModuleNotFoundError` before those modules can mark themselves skipped. Unless the audio tests are explicitly split into an optional suite, either the dependency should stay in the base requirements or the tests must guard their imports. Otherwise, CI and developers using the default setup will be unable to run the audio-related tests or demos.
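
For illustration only, here is a minimal sketch of what "guarding the imports" could look like. It is not taken from this PR or from the vLLM test suite: the helper `missing_modules` and the list `AUDIO_TEST_MODULES` are hypothetical names, and the module list is assumed from the packages named in the comment above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that are not importable."""
    # find_spec returns None when a top-level module cannot be located,
    # without actually importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list: the packages the review comment says come with the audio extra.
AUDIO_TEST_MODULES = ["librosa", "soundfile"]

missing = missing_modules(AUDIO_TEST_MODULES)
if missing:
    # A test module could skip itself here instead of failing at import time.
    print(f"audio dependencies missing, audio tests would be skipped: {missing}")
else:
    print("audio dependencies available")
```

In a pytest module, calling `pytest.importorskip("librosa")` before the first use of `librosa` achieves a similar effect by skipping the module when the import fails.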

Contributor Author


The test dependencies still include the `audio` optional dependency.

opencv-python-headless >= 4.11.0 # required for video IO
pyyaml
six>=1.16.0; python_version > '3.11' # transitive dependency of pandas that needs to be the latest version for python 3.12