[Voxtral TTS] Fix Voxtral TTS input with text and ref_audio#2750

Merged
ywang96 merged 3 commits into
vllm-project:mainfrom
y123456y78:fix-voxtral-tts-mm-input
Apr 13, 2026

Conversation

@y123456y78
Contributor

@y123456y78 y123456y78 commented Apr 13, 2026

Purpose

  • Passing ref_audio (mm_data) together with text input fails because VoxtralTTSMultiModalProcessor does not work with the HF _apply_hf_processor_mm_only path directly: the class does not inherit from the Transformers ProcessorMixin, since it uses the Mistral tokenizer for preprocessing.
  • Prefix voice cloning still works because it sends text + a voice id (no mm_input).
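
For readers unfamiliar with the failure mode, here is a minimal, self-contained sketch of the idea behind the fix: override the mm-only hook so it goes through the tokenizer-backed text + mm path with a dummy text, then keep only the multimodal outputs. It does not import vLLM. The classes `BaseProcessor` and `TokenizerBackedProcessor`, the helper `_apply_text_and_mm`, the `DUMMY_TEXT` value, and the simplified method signature are all hypothetical stand-ins, not the actual code in this PR.

```python
from typing import Any, Mapping


class BaseProcessor:
    """Hypothetical stand-in for a generic multimodal processor base class."""

    def _apply_hf_processor_mm_only(self, mm_data: Mapping[str, Any]) -> dict[str, Any]:
        # The generic path assumes an HF ProcessorMixin-style processor,
        # which a tokenizer-backed processor does not provide.
        raise TypeError("processor is not a Transformers ProcessorMixin")


class TokenizerBackedProcessor(BaseProcessor):
    """Hypothetical stand-in for a processor that preprocesses via its own tokenizer."""

    DUMMY_TEXT = "dummy"  # assumption: any placeholder text works here

    def _apply_text_and_mm(self, text: str, mm_data: Mapping[str, Any]) -> dict[str, Any]:
        # Toy preprocessing: "tokenize" the text and summarize each audio clip.
        return {
            "input_ids": [ord(ch) for ch in text],
            "audio_features": [len(clip) for clip in mm_data.get("audio", [])],
        }

    def _apply_hf_processor_mm_only(self, mm_data: Mapping[str, Any]) -> dict[str, Any]:
        # Route mm-only requests through the text + mm path with dummy text,
        # then drop the text outputs so only multimodal features remain.
        outputs = self._apply_text_and_mm(self.DUMMY_TEXT, mm_data)
        outputs.pop("input_ids")
        return outputs


processor = TokenizerBackedProcessor()
mm_outputs = processor._apply_hf_processor_mm_only({"audio": [[0.0, 0.1, 0.2]]})
```

In this toy version the base class raises immediately, while the override returns only the audio-derived entries, which mirrors the text + ref_audio case described above.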

Test Plan

pytest -s -v   tests/model_executor/stage_input_processors/test_voxtral_tts_async_chunk.py   \
tests/model_executor/models/voxtral_tts/test_cuda_graph_acoustic_transformer.py   \
tests/model_executor/models/voxtral_tts/test_audio_tokenizer_parsing.py   \
tests/e2e/online_serving/test_voxtral_tts.py \
tests/model_executor/models/voxtral_tts/test_text_preprocess.py  \
tests/e2e/offline_inference/test_voxtral_tts.py

Test Result

image

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
@y123456y78 y123456y78 changed the title [Voxtral TTS] Fix Voxtral TTS mm + text input [Voxtral TTS] Fix Voxtral TTS input with text and mm data Apr 13, 2026
@y123456y78 y123456y78 changed the title [Voxtral TTS] Fix Voxtral TTS input with text and mm data [Voxtral TTS] Fix Voxtral TTS input with text and ref_audio Apr 13, 2026
@y123456y78 y123456y78 marked this pull request as ready for review April 13, 2026 20:40
@hsliuustc0106
Collaborator

PR #2750 - [Voxtral TTS] Fix input with text and ref_audio

OVERALL: NO BLOCKERS
VERDICT: COMMENT

Correctness: PASS, Reliability: PASS, Breaking: PASS, Tests: PASS, Docs: PASS, Security: PASS

Summary: Bugfix that overrides _apply_hf_processor_mm_only to use dummy text for mm_input. 23 lines. Gates pass, tests pass. No blockers.

@ywang96 ywang96 enabled auto-merge (squash) April 13, 2026 22:04
@ywang96 ywang96 disabled auto-merge April 13, 2026 22:04
@ywang96 ywang96 enabled auto-merge (squash) April 13, 2026 22:04
@linyueqian linyueqian added the ready label to trigger buildkite CI label Apr 13, 2026
@ywang96 ywang96 merged commit dd13891 into vllm-project:main Apr 13, 2026
7 of 8 checks passed
Celeste-jq pushed a commit to IsleOfDawnlight/vllm-omni-voxcpm that referenced this pull request Apr 14, 2026
alex-jw-brooks pushed a commit to alex-jw-brooks/vllm-omni that referenced this pull request Apr 14, 2026
@codeHackeR321

Hi @y123456y78, can you please tell me which open-source Voxtral TTS model you are using? I could not find voice cloning support on the official HF model page. Community discussions say the voice cloning weights have not been released: https://huggingface.co/mistralai/Voxtral-4B-TTS-2603/discussions/17

lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026