[Voxtral TTS] Support voxtral tts cfg_alpha sampling params via temperature #2243

Closed
y123456y78 wants to merge 20 commits into vllm-project:main from y123456y78:chenyo/voxtral-tts-sampling-params

@y123456y78 (Contributor) commented Mar 26, 2026

Purpose

  • Support the Voxtral TTS cfg_alpha sampling parameter, passed through temperature
  • Add a corresponding unit test
  • Clean up the CUDA graph code and its tests
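The first bullet repurposes each request's temperature as cfg_alpha, the classifier-free guidance (CFG) strength used during flow matching. A minimal sketch of such a blend follows, assuming the common CFG form v = v_uncond + cfg_alpha * (v_cond - v_uncond), under which cfg_alpha = 0 gives the purely unconditional velocity and cfg_alpha = 1 the purely conditional one. The helper name cfg_velocity and the use of plain float lists in place of tensors are illustrative assumptions, not the PR's actual code.

```python
from typing import Sequence


def cfg_velocity(
    v_cond: Sequence[float],
    v_uncond: Sequence[float],
    cfg_alpha: float,
) -> list[float]:
    """Blend conditional and unconditional velocities with CFG.

    Hypothetical helper (not the PR's code). Assumes the formulation
    v = v_uncond + cfg_alpha * (v_cond - v_uncond).
    """
    if len(v_cond) != len(v_uncond):
        raise ValueError("velocity vectors must have the same length")
    # Element-wise: start from the unconditional velocity and move
    # cfg_alpha of the way toward (or past) the conditional one.
    return [u + cfg_alpha * (c - u) for c, u in zip(v_cond, v_uncond)]


# With temperature repurposed as cfg_alpha, a request's temperature
# directly sets the guidance strength for that request.
temperature = 1.5
print(cfg_velocity([1.0, 2.0], [0.0, 0.0], cfg_alpha=temperature))  # [1.5, 3.0]
```

Values above 1 extrapolate beyond the conditional velocity, which is how CFG strengthens conditioning under this assumed form.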

Testing Plan

pytest -s -v \
  tests/model_executor/stage_input_processors/test_voxtral_tts_async_chunk.py \
  tests/model_executor/models/voxtral_tts/test_cuda_graph_acoustic_transformer.py \
  tests/model_executor/models/voxtral_tts/test_audio_tokenizer_parsing.py \
  tests/e2e/online_serving/test_voxtral_tts.py \
  tests/model_executor/models/voxtral_tts/test_text_preprocess.py \
  tests/e2e/offline_inference/test_voxtral_tts.py

Result

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
@y123456y78 force-pushed the chenyo/voxtral-tts-sampling-params branch from ea989bd to 03f3d9b on March 26, 2026 22:57
@y123456y78 changed the title from "[Voxtral TTS] Support voxtral tts cfg_alpha sampling params" to "[Voxtral TTS] Support voxtral tts cfg_alpha sampling params via temperature" on Mar 27, 2026
@lishunyang12 (Collaborator) left a comment

Left a couple of comments.

sampling_metadata = kwargs.get("sampling_metadata")
if sampling_metadata is None or sampling_metadata.temperature is None:
    raise ValueError(
        "VoxtralTTS requires a non-zero 'temperature' sampling parameter (used as cfg_alpha for flow-matching)."
    )

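The error message asks for a non-zero temperature, but the check only rejects a missing one, so temperature=0 is silently accepted. That would zero out the conditional velocity entirely (pure unconditional generation). This should probably validate temperature > 0 here, or at least temperature != 0.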
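A minimal sketch of the stricter validation the reviewer suggests, written over a plain sequence of per-request floats so it runs standalone. In vLLM, sampling_metadata.temperature is likely a tensor, in which case the equivalent test would be something like bool((temperature > 0).all()). The function name require_positive_cfg_alpha is hypothetical.

```python
from typing import Optional, Sequence


def require_positive_cfg_alpha(
    temperatures: Optional[Sequence[float]],
) -> list[float]:
    """Validate per-request temperatures before using them as cfg_alpha.

    Hypothetical helper: rejects a missing value and any temperature that
    is not strictly positive, so temperature=0 can no longer silently
    select pure unconditional generation.
    """
    if temperatures is None:
        raise ValueError(
            "VoxtralTTS requires a 'temperature' sampling parameter "
            "(used as cfg_alpha for flow-matching)."
        )
    # `not t > 0` is also True for NaN, so NaN values are rejected too.
    bad = [i for i, t in enumerate(temperatures) if not t > 0]
    if bad:
        raise ValueError(
            "VoxtralTTS requires temperature > 0 (used as cfg_alpha); "
            f"got non-positive values at batch indices {bad}."
        )
    return [float(t) for t in temperatures]
```

Requiring > 0 rather than only != 0 also rejects negative values, which under a CFG blend of the form v_uncond + alpha * (v_cond - v_uncond) would push the output away from the conditional velocity.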

final_output_type: text
default_sampling_params:
  temperature: 0.0
  # NOTE: VoxtralTTS repurposes 'temperature' as the CFG alpha

Nit: it might be worth adding a user-facing note somewhere (CLI help, docs) that temperature controls CFG strength for voxtral-tts. Otherwise, people will set temperature=0.7 expecting normal sampling behavior and get confused.

 padded_size = self._get_padded_size(actual_size)
 if padded_size is None or padded_size not in self.graphs:
-    return self.model.compute_mm_logits(hidden_states)
+    return self.model.compute_mm_logits(hidden_states, cfg_alpha=cfg_alpha)

The 1D -> 2D reshape (unsqueeze(1)) happens inside decode_one_frame, but in the graph path static_cfg_alpha is already (size, 1). This means the eager fallback via compute_mm_logits will unsqueeze, but the graph path skips it. This works today, but the shape contract is fragile; a comment documenting the expected shape at this interface would help.
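One way to make the shape contract explicit is to normalize cfg_alpha to a (batch, 1) column at a single entry point that both the eager fallback and the CUDA-graph path call, so neither depends on where an unsqueeze(1) happens. The sketch below uses nested Python lists in place of tensors to stay self-contained; the name as_column is hypothetical and not part of the PR.

```python
from typing import Sequence, Union

# A 1D sequence (eager path) or an already 2D (batch, 1) sequence (graph path).
Alpha = Union[Sequence[float], Sequence[Sequence[float]]]


def as_column(cfg_alpha: Alpha, batch_size: int) -> list[list[float]]:
    """Normalize cfg_alpha to shape (batch_size, 1).

    Hypothetical helper: accepts either shape and rejects anything else
    instead of silently broadcasting it.
    """
    if len(cfg_alpha) != batch_size:
        raise ValueError(f"expected {batch_size} rows, got {len(cfg_alpha)}")
    column = []
    for row in cfg_alpha:
        if isinstance(row, (int, float)):
            column.append([float(row)])  # 1D input: add the trailing dim
        elif len(row) == 1:
            column.append([float(row[0])])  # already (batch, 1)
        else:
            raise ValueError(f"expected rows of width 1, got {len(row)}")
    return column
```

With torch tensors, calling cfg_alpha.reshape(-1, 1) at that same entry point maps both (N,) and (N, 1) inputs to (N, 1), though reshape alone would not reject a mismatched width the way the explicit checks above do.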

@y123456y78 marked this pull request as ready for review on April 20, 2026 19:50
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@y123456y78 marked this pull request as draft on April 20, 2026 19:50
@linyueqian added this to the v0.20.0 milestone on Apr 22, 2026
@linyueqian added the ready label (label to trigger buildkite CI) on Apr 22, 2026
@lishunyang12 (Collaborator)

Resolve conflicts.

@y123456y78 closed this on Apr 23, 2026

Labels

ready (label to trigger buildkite CI)

3 participants