[Feature][Voxtral TTS] Support per-model extra sampling params & add cfg_alpha for Voxtral TTS #2338

Merged
linyueqian merged 20 commits into vllm-project:main from y123456y78:chenyo/voxtral-tts-extra-sampling-params
Apr 22, 2026

@y123456y78
Contributor

@y123456y78 y123456y78 commented Mar 30, 2026

Purpose

  • Add a new arg, has_sampling_extra_args, to OmniModelConfig and OmniEngineArgs; it is set to True when default_sampling_params in the stage config YAML contains extra_args
  • Add logic in GpuModelRunner._build_model_kwargs_extra to collect extra sampling params when self.model_config.has_sampling_extra_args is True
  • Update serving_speech.py to handle extra params in the request
  • Add cfg_alpha to the Voxtral TTS model and example
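
The first bullet could look roughly like the sketch below. It is only an illustration, assuming the stage config has already been parsed into a plain dict. The helper name `stage_has_sampling_extra_args`, the dict layout, and the sample values are hypothetical and are not taken from the PR diff.

```python
from typing import Any


def stage_has_sampling_extra_args(stage_config: dict[str, Any]) -> bool:
    """Return True when default_sampling_params carries a non-empty extra_args."""
    defaults = stage_config.get("default_sampling_params") or {}
    return bool(defaults.get("extra_args"))


# Hypothetical parsed stage configs; keys and values are placeholders.
voxtral_stage = {
    "default_sampling_params": {"temperature": 0.8, "extra_args": {"cfg_alpha": 1.2}}
}
plain_stage = {"default_sampling_params": {"temperature": 0.8}}

print(stage_has_sampling_extra_args(voxtral_stage))  # True
print(stage_has_sampling_extra_args(plain_stage))    # False
```

Computing the flag once at config time means the per-step code path only needs to check a boolean.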

Test Plan

pytest -s -v \
  tests/model_executor/stage_input_processors/test_voxtral_tts_async_chunk.py \
  tests/model_executor/models/voxtral_tts/test_cuda_graph_acoustic_transformer.py \
  tests/model_executor/models/voxtral_tts/test_audio_tokenizer_parsing.py \
  tests/e2e/online_serving/test_voxtral_tts.py \
  tests/model_executor/models/voxtral_tts/test_text_preprocess.py \
  tests/e2e/offline_inference/test_voxtral_tts.py

Test Result

image

@linyueqian
Collaborator

@Sy0307 @JuanPZuluaga @hsliuustc0106 does this make sense to you compared to #2243? I think this is the right approach, as future models can use it as well. Also cc @lishunyang12

Collaborator

@lishunyang12 lishunyang12 left a comment

left a couple comments


def make_omni_output(
    self, model_outputs: torch.Tensor | OmniOutput | tuple, logits_index: int | None = None, **kwargs
) -> OmniOutput:
Collaborator

_extract_cfg_alpha is defined but never called anywhere in this PR. Is this meant to be used inside forward() or make_omni_output()? Either wire it up or leave it out until the follow-up that actually needs it — dead code just rots.

Contributor Author

Fixed. Thank you!

Comment thread vllm_omni/worker/gpu_model_runner.py Outdated
    req = self.requests.get(req_id)
    sp = req.sampling_params if req else None
    extra_args_list.append(
        sp.extra_args if sp and sp.extra_args else {}
Collaborator

This runs for every model on every step, not just voxtral. Iterating over all request IDs and doing dict lookups to collect extra_args adds overhead that most models will never use. Could you guard this behind a check (e.g. a model capability flag or at least if any request has extra_args) so we're not paying the cost unconditionally?

Contributor Author

Nice catch! Added a new arg, has_sampling_extra_args. Thank you.
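
The guard discussed in this thread might be sketched as follows. This is a simplified stand-alone illustration, not the code in gpu_model_runner.py: `FakeSamplingParams`, `FakeRequest`, and `collect_extra_args` are hypothetical stand-ins, and only the `extra_args` attribute and the `has_sampling_extra_args` flag come from the discussion above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeSamplingParams:
    # Stand-in for per-request sampling params; only extra_args is modeled.
    extra_args: Optional[dict[str, Any]] = None


@dataclass
class FakeRequest:
    sampling_params: Optional[FakeSamplingParams] = None


def collect_extra_args(
    has_sampling_extra_args: bool,
    req_ids: list[str],
    requests: dict[str, FakeRequest],
) -> Optional[list[dict[str, Any]]]:
    """Gather per-request extra_args only when the model opted in."""
    if not has_sampling_extra_args:
        # Models that never use extra_args skip the per-step loop entirely.
        return None
    collected: list[dict[str, Any]] = []
    for req_id in req_ids:
        # Direct indexing: req_ids come from the batch, so each id is expected to exist.
        sp = requests[req_id].sampling_params
        collected.append(sp.extra_args if sp and sp.extra_args else {})
    return collected


requests = {
    "a": FakeRequest(FakeSamplingParams({"cfg_alpha": 1.5})),
    "b": FakeRequest(FakeSamplingParams()),
}
print(collect_extra_args(False, ["a", "b"], requests))  # None
print(collect_extra_args(True, ["a", "b"], requests))   # [{'cfg_alpha': 1.5}, {}]
```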

Comment thread vllm_omni/worker/gpu_model_runner.py Outdated
# Gather extra_args from per-request SamplingParams so models can
# access custom parameters (e.g. cfg_alpha for VoxtralTTS).
extra_args_list: list[dict] = []
for req_id in self.input_batch.req_ids:
Collaborator

Nit: existing code at L215/L1027 uses self.requests[req_id] directly since req_id comes from input_batch.req_ids and should always be present. The defensive .get() is fine but inconsistent with the rest of the file.

Contributor Author

Fixed. Thank you!

@linyueqian
Collaborator

lmk if it is ready for review. thanks!

@y123456y78 y123456y78 force-pushed the chenyo/voxtral-tts-extra-sampling-params branch from b754f44 to 3a4ce03 Compare April 13, 2026 15:14
@y123456y78 y123456y78 changed the title Add extra sampling params [Feature] Support per-model extra sampling params Apr 15, 2026
@y123456y78 y123456y78 changed the title [Feature] Support per-model extra sampling params [Feature] Support per-model extra sampling params & add cfg_alpha for Voxtral TTS Apr 15, 2026
@y123456y78 y123456y78 changed the title [Feature] Support per-model extra sampling params & add cfg_alpha for Voxtral TTS [Feature][Voxtral TTS] Support per-model extra sampling params & add cfg_alpha for Voxtral TTS Apr 15, 2026
@y123456y78 y123456y78 marked this pull request as ready for review April 15, 2026 23:24
@y123456y78 y123456y78 requested a review from lishunyang12 April 15, 2026 23:25
@linyueqian
Collaborator

fix dco and pre-commit pls

@y123456y78 y123456y78 force-pushed the chenyo/voxtral-tts-extra-sampling-params branch from 7989d0a to 791bf7a Compare April 16, 2026 00:11
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
@y123456y78 y123456y78 force-pushed the chenyo/voxtral-tts-extra-sampling-params branch from 791bf7a to a134106 Compare April 16, 2026 00:12
@y123456y78
Contributor Author

commit

Fixed. Thank you!

@y123456y78
Contributor Author

y123456y78 commented Apr 16, 2026

Hi @linyueqian @lishunyang12, I finished the end-to-end changes and the PR is ready for review. I tried to follow the diffusion models, where model code can access request.sampling_params.extra_args directly. But since AR model code doesn't have access to the request, we need to unpack extra_args into the extra_kwargs dict that is passed to the model code.

If you think the design needs a large refactor to be done properly, we can switch back to #2243 to limit the change to the scope of Voxtral TTS for now.
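
On the model side, the unpacking described above could be consumed roughly like this. The sketch assumes the runner passes a list of per-request dicts under a keyword named `sampling_extra_args`; that keyword, the `ToyARModel` class, and the default value are hypothetical and chosen only for illustration.

```python
from typing import Any

DEFAULT_CFG_ALPHA = 1.0  # placeholder default, not a value from the PR


class ToyARModel:
    """Toy stand-in for an AR model that cannot see request objects."""

    def forward(self, token_ids: list[int], **kwargs: Any) -> list[float]:
        # Fall back to one empty dict per item when nothing was passed in.
        per_request = kwargs.get("sampling_extra_args") or [{} for _ in token_ids]
        # Resolve cfg_alpha per request, using the default when it is absent.
        return [float(args.get("cfg_alpha", DEFAULT_CFG_ALPHA)) for args in per_request]


model = ToyARModel()
print(model.forward([11, 22], sampling_extra_args=[{"cfg_alpha": 1.5}, {}]))  # [1.5, 1.0]
print(model.forward([11, 22]))                                                # [1.0, 1.0]
```

Passing the values through keyword arguments keeps the model code independent of the request type, which is the constraint the comment above describes for AR models.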

@linyueqian linyueqian added the ready label to trigger buildkite CI label Apr 21, 2026
@linyueqian linyueqian added this to the v0.20.0 milestone Apr 22, 2026
Resolves conflicts from upstream 3f504b4 (Pipeline + Deploy schema migration)
and upstream e076378 (sleep mode):

- vllm_omni/model_executor/stage_configs/voxtral_tts.yaml: deleted upstream
  as part of migration. Ported PR vllm-project#2338's cfg_alpha/extra_args addition to
  the new location vllm_omni/deploy/voxtral_tts.yaml (stage 0
  default_sampling_params).
- vllm_omni/config/model.py: kept both new OmniModelConfig fields,
  enable_sleep_mode (from upstream) and has_sampling_extra_args (from PR).

No functional change beyond conflict resolution.

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Collaborator

@linyueqian linyueqian left a comment

lgtm

@linyueqian linyueqian enabled auto-merge (squash) April 22, 2026 03:17
@linyueqian linyueqian merged commit 8e1bbc9 into vllm-project:main Apr 22, 2026
7 of 8 checks passed
qinganrice pushed a commit to qinganrice/vllm-omni that referenced this pull request Apr 23, 2026
…cfg_alpha for Voxtral TTS (vllm-project#2338)

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Co-authored-by: Yueqian Lin <linyueqian@outlook.com>
gnomefin added a commit to gnomefin/vllm-omni that referenced this pull request Apr 25, 2026
Per vllm-project#3118 review feedback, follow the convention established by
vllm-project#2338 (Voxtral TTS / cfg_alpha): model-specific knobs live inside the
shared `extra_params: dict[str, Any]` rather than as top-level fields
on `OpenAICreateSpeechRequest`. Keeps the protocol surface clean as
more TTS backends add per-model parameters.

Wire shape change:

* Drop `cfg_value: float | None` field from `protocol/audio.py`. Users
  now send `{"extra_params": {"cfg_value": 2.5}}` instead of
  `{"cfg_value": 2.5}`.
* `_build_voxcpm2_prompt` reads `request.extra_params["cfg_value"]`
  and validates the 0.1-10.0 range manually (pydantic auto-validation
  is gone with the field). Non-numeric values raise ValueError with a
  clear message.
* Update tests: 7 cfg-related cases rewritten to use `extra_params`,
  plus 4 new parametrized non-numeric rejection cases. 29/29 pass.
* Update `docs/serving/speech_api.md`: VoxCPM2-specific Parameters
  subsection now describes `cfg_value` as a key inside `extra_params`,
  curl examples updated, mode table fields show `extra_params.cfg_value`.

No talker-side changes; `_RequestState.cfg_value` and `_run_cfm` still
operate on a float, only the protocol surface moved.

Signed-off-by: gnomefin <alfian@uselevers.com>
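
The manual validation described in this commit message might look like the sketch below. It assumes `extra_params` is a plain dict or None and treats a missing or None `cfg_value` as "not set"; the function name `read_cfg_value` and the error wording are hypothetical, while the 0.1-10.0 range and the ValueError on non-numeric input come from the message above.

```python
from typing import Any, Optional

CFG_VALUE_MIN = 0.1
CFG_VALUE_MAX = 10.0


def read_cfg_value(extra_params: Optional[dict[str, Any]]) -> Optional[float]:
    """Read and validate cfg_value from extra_params; None means not provided."""
    raw = (extra_params or {}).get("cfg_value")
    if raw is None:
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(raw, bool) or not isinstance(raw, (int, float)):
        raise ValueError(
            f"extra_params.cfg_value must be a number, got {type(raw).__name__}"
        )
    value = float(raw)
    # NaN fails both comparisons, so it is rejected here as well.
    if not (CFG_VALUE_MIN <= value <= CFG_VALUE_MAX):
        raise ValueError(
            f"extra_params.cfg_value must be in [{CFG_VALUE_MIN}, {CFG_VALUE_MAX}], got {value}"
        )
    return value


print(read_cfg_value({"cfg_value": 2.5}))  # 2.5
print(read_cfg_value({}))                  # None
```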
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
…cfg_alpha for Voxtral TTS (vllm-project#2338)

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Co-authored-by: Yueqian Lin <linyueqian@outlook.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
…cfg_alpha for Voxtral TTS (vllm-project#2338)

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Co-authored-by: Yueqian Lin <linyueqian@outlook.com>