Merged
3 changes: 0 additions & 3 deletions docs/models/supported_models.md
@@ -767,9 +767,6 @@ Some models are supported only via the [Transformers modeling backend](#transfor
The official `openbmb/MiniCPM-V-2` doesn't work yet, so we need to use a fork (`HwwwH/MiniCPM-V-2`) for now.
For more details, please see: <https://github.com/vllm-project/vllm/pull/4087#issuecomment-2250397630>

-!!! note
-    For Qwen2.5-Omni and Qwen3-Omni, reading audio from video pre-processing (`--mm-processor-kwargs '{"use_audio_in_video": true}'`) is currently work in progress and not yet supported.
-
#### Transcription

Speech2Text models trained specifically for Automatic Speech Recognition.
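The note deleted from this file pointed at the `--mm-processor-kwargs` flag. As a hedged sketch of how that flag string is assembled (the `mm_processor_flag` helper is hypothetical; only the flag name and its JSON payload come from the deleted note):

```python
import json

# Hypothetical helper (not part of vLLM): builds the CLI fragment from the
# deleted note that enables reading audio from video pre-processing.
def mm_processor_flag(use_audio_in_video: bool) -> str:
    kwargs = {"use_audio_in_video": use_audio_in_video}
    return f"--mm-processor-kwargs '{json.dumps(kwargs)}'"

print(mm_processor_flag(True))
# → --mm-processor-kwargs '{"use_audio_in_video": true}'
```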
1 change: 0 additions & 1 deletion examples/offline_inference/qwen2_5_omni/README.md
@@ -10,7 +10,6 @@ python examples/offline_inference/qwen2_5_omni/only_thinker.py \
-q mixed_modalities

# Read vision and audio inputs from a single video file
-# NOTE: V1 engine does not support interleaved modalities yet.
python examples/offline_inference/qwen2_5_omni/only_thinker.py \
-q use_audio_in_video

2 changes: 0 additions & 2 deletions vllm/model_executor/models/qwen2_5_omni_thinker.py
@@ -1128,8 +1128,6 @@ def embed_multimodal(self, **kwargs: object) -> MultiModalEmbeddings:
multimodal_embeddings += tuple(audio_embeddings)
return multimodal_embeddings

-# TODO (ywang96): support overlapping modality embeddings so that
-# `use_audio_in_video` will work on V1.
def embed_input_ids(
self,
input_ids: torch.Tensor,
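The TODO removed from `embed_input_ids` concerned merging overlapping modality embeddings into the token sequence, which is what `use_audio_in_video` needs on V1. A minimal sketch of the placeholder-scatter step such a method has to perform; the single-placeholder-id simplification and all names and shapes here are illustrative, not vLLM's actual implementation:

```python
import torch

# Illustrative sketch (not vLLM code): rows of `inputs_embeds` whose token
# id is a multimodal placeholder are overwritten, in order of appearance,
# with the corresponding precomputed multimodal embeddings.
def merge_multimodal(
    inputs_embeds: torch.Tensor,  # [seq_len, hidden]
    input_ids: torch.Tensor,      # [seq_len]
    mm_embeds: torch.Tensor,      # [num_placeholders, hidden]
    placeholder_id: int,
) -> torch.Tensor:
    mask = input_ids == placeholder_id
    out = inputs_embeds.clone()
    out[mask] = mm_embeds.to(out.dtype)
    return out
```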
2 changes: 0 additions & 2 deletions vllm/model_executor/models/qwen3_omni_moe_thinker.py
@@ -1371,8 +1371,6 @@ def embed_input_ids(
return inputs_embeds

deepstack_input_embeds = None
-# TODO (ywang96): support overlapping modalitiy embeddings so that
-# `use_audio_in_video` will work on V1.
# split the feat dim to obtain multi-scale visual feature
has_vision_embeddings = [
embeddings.shape[-1] != self.config.text_config.hidden_size
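The comment kept in this hunk, `# split the feat dim to obtain multi-scale visual feature`, refers to deepstack-style vision features arriving with several scales concatenated along the hidden dimension. A hedged sketch of that split, with illustrative names and sizes (not vLLM's exact code):

```python
import torch

# Illustrative sketch: a feature tensor of width k * hidden_size holds the
# main visual feature plus (k - 1) multi-scale "deepstack" features, all
# concatenated on the last dim; split it back into per-scale chunks.
def split_deepstack(features: torch.Tensor, hidden_size: int):
    assert features.shape[-1] % hidden_size == 0
    chunks = torch.split(features, hidden_size, dim=-1)
    main, multiscale = chunks[0], chunks[1:]
    return main, multiscale
```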