[Feature][Voxtral TTS] Support per-model extra sampling params & add cfg_alpha for Voxtral TTS by y123456y78 · Pull Request #2338 · vllm-project/vllm-omni

y123456y78 · 2026-03-30T14:10:40Z

Purpose

Add a new arg, has_sampling_extra_args, to OmniModelConfig and OmniEngineArgs; it is set to True when the stage config YAML's default_sampling_params contains extra_args
Add logic to GpuModelRunner._build_model_kwargs_extra that collects the extra sampling params when self.model_config.has_sampling_extra_args is True
Update serving_speech.py to handle extra params in the request
Add cfg_alpha to the Voxtral TTS model and example
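
A minimal sketch of how the flag from the first item could be derived, assuming the stage config has already been parsed into a dict. The helper name `stage_has_sampling_extra_args`, the dict layout, and the placeholder values are assumptions for illustration, not code from this PR.

```python
from typing import Any


def stage_has_sampling_extra_args(stage_config: dict[str, Any]) -> bool:
    """Return True when default_sampling_params carries a non-empty extra_args.

    Hypothetical helper: the real config loading in vllm-omni may differ.
    """
    sampling = stage_config.get("default_sampling_params") or {}
    return bool(sampling.get("extra_args"))


# Assumed shape of a parsed stage entry; 1.2 is only a placeholder value.
voxtral_stage = {
    "default_sampling_params": {"temperature": 0.0, "extra_args": {"cfg_alpha": 1.2}}
}
plain_stage = {"default_sampling_params": {"temperature": 0.7}}

print(stage_has_sampling_extra_args(voxtral_stage))  # True
print(stage_has_sampling_extra_args(plain_stage))    # False
```

Computing the flag once at config time means the runner only needs a boolean check per step.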

Test Plan

pytest -s -v \
  tests/model_executor/stage_input_processors/test_voxtral_tts_async_chunk.py \
  tests/model_executor/models/voxtral_tts/test_cuda_graph_acoustic_transformer.py \
  tests/model_executor/models/voxtral_tts/test_audio_tokenizer_parsing.py \
  tests/e2e/online_serving/test_voxtral_tts.py \
  tests/model_executor/models/voxtral_tts/test_text_preprocess.py \
  tests/e2e/offline_inference/test_voxtral_tts.py

Test Result

linyueqian · 2026-03-30T15:44:11Z

@Sy0307 @JuanPZuluaga @hsliuustc0106 does this make sense to you compared to #2243? I think this is the right approach, since future models can use it as well. Also cc @lishunyang12

lishunyang12

left a couple comments

lishunyang12 · 2026-04-02T15:40:23Z

+
    def make_omni_output(
        self, model_outputs: torch.Tensor | OmniOutput | tuple, logits_index: int | None = None, **kwargs
    ) -> OmniOutput:


_extract_cfg_alpha is defined but never called anywhere in this PR. Is this meant to be used inside forward() or make_omni_output()? Either wire it up or leave it out until the follow-up that actually needs it — dead code just rots.

Fixed. Thank you!

lishunyang12 · 2026-04-02T15:40:23Z

+            req = self.requests.get(req_id)
+            sp = req.sampling_params if req else None
+            extra_args_list.append(
+                sp.extra_args if sp and sp.extra_args else {}


This runs for every model on every step, not just voxtral. Iterating over all request IDs and doing dict lookups to collect extra_args adds overhead that most models will never use. Could you guard this behind a check (e.g. a model capability flag or at least if any request has extra_args) so we're not paying the cost unconditionally?

Nice catch! Added a new arg, has_sampling_extra_args. Thank you.
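
The guard discussed in this thread can be sketched as follows. `SamplingParams`, `Request`, and `collect_extra_args` are simplified stand-ins written for illustration; they are not the classes or the method used in the runner.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SamplingParams:
    # Stand-in for the real SamplingParams; only the field used here.
    extra_args: Optional[dict[str, Any]] = None


@dataclass
class Request:
    sampling_params: SamplingParams = field(default_factory=SamplingParams)


def collect_extra_args(
    has_sampling_extra_args: bool,
    req_ids: list[str],
    requests: dict[str, Request],
) -> Optional[list[dict[str, Any]]]:
    """Gather per-request extra_args only when the model opted in."""
    if not has_sampling_extra_args:
        # Models that never declared extra_args skip the per-step loop.
        return None
    return [requests[req_id].sampling_params.extra_args or {} for req_id in req_ids]


requests = {"a": Request(SamplingParams({"cfg_alpha": 1.2})), "b": Request()}
print(collect_extra_args(True, ["a", "b"], requests))   # [{'cfg_alpha': 1.2}, {}]
print(collect_extra_args(False, ["a", "b"], requests))  # None
```

With the flag off, the per-step cost is a single boolean check rather than a loop over all request IDs.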

lishunyang12 · 2026-04-02T15:40:23Z

+        # Gather extra_args from per-request SamplingParams so models can
+        # access custom parameters (e.g. cfg_alpha for VoxtralTTS).
+        extra_args_list: list[dict] = []
+        for req_id in self.input_batch.req_ids:


Nit: existing code at L215/L1027 uses self.requests[req_id] directly since req_id comes from input_batch.req_ids and should always be present. The defensive .get() is fine but inconsistent with the rest of the file.

Fixed. Thank you!

linyueqian · 2026-04-07T22:15:12Z

lmk if it is ready for review. thanks!

chatgpt-codex-connector · 2026-04-15T23:24:39Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

linyueqian · 2026-04-15T23:26:07Z

fix dco and pre-commit pls

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

y123456y78 · 2026-04-16T00:42:07Z

commit

Fixed. Thank you!

y123456y78 · 2026-04-16T16:00:49Z

Hi @linyueqian @lishunyang12, I finished the end-to-end changes and the PR is ready for review. I tried to follow the diffusion models, where model code can access request.sampling_params.extra_args directly. But since AR model code doesn't have access to the request, we need to unpack extra_args into the extra_kwargs dict that is passed to the model code.

If you think the design needs a large refactor to be done properly, we can switch back to #2243 to limit the change to the scope of Voxtral TTS for now.
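
As a rough illustration of the unpacking described above, the sketch below passes per-request extra args to model code as a keyword argument. The kwarg name `sampling_extra_args`, the function names, and the default of 1.0 are all assumptions made for this example and are not taken from the PR.

```python
from typing import Any

DEFAULT_CFG_ALPHA = 1.0  # placeholder default, not taken from the PR


def build_model_kwargs(extra_args_list: list[dict[str, Any]]) -> dict[str, Any]:
    """Runner side: wrap the per-request dicts so they travel as a model kwarg."""
    return {"sampling_extra_args": extra_args_list}


def read_cfg_alpha(num_reqs: int, **kwargs: Any) -> list[float]:
    """Model side: pull one cfg_alpha per request, falling back to the default."""
    per_request = kwargs.get("sampling_extra_args") or [{}] * num_reqs
    return [float(args.get("cfg_alpha", DEFAULT_CFG_ALPHA)) for args in per_request]


model_kwargs = build_model_kwargs([{"cfg_alpha": 1.5}, {}])
print(read_cfg_alpha(2, **model_kwargs))  # [1.5, 1.0]
print(read_cfg_alpha(2))                  # [1.0, 1.0]
```

The model never touches the request object; it only sees plain dicts, which keeps the AR model interface unchanged for models that ignore the kwarg.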

Resolves conflicts from upstream 3f504b4 (Pipeline + Deploy schema migration) and upstream e076378 (sleep mode):

- vllm_omni/model_executor/stage_configs/voxtral_tts.yaml: deleted upstream as part of migration. Ported PR vllm-project#2338's cfg_alpha/extra_args addition to the new location vllm_omni/deploy/voxtral_tts.yaml (stage 0 default_sampling_params).
- vllm_omni/config/model.py: kept both new OmniModelConfig fields, enable_sleep_mode (from upstream) and has_sampling_extra_args (from PR).

No functional change beyond conflict resolution.

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

linyueqian

lgtm

…cfg_alpha for Voxtral TTS (vllm-project#2338) Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai> Signed-off-by: Yueqian Lin <linyueqian@outlook.com> Co-authored-by: Yueqian Lin <linyueqian@outlook.com>

Per vllm-project#3118 review feedback, follow the convention established by vllm-project#2338 (Voxtral TTS / cfg_alpha): model-specific knobs live inside the shared `extra_params: dict[str, Any]` rather than as top-level fields on `OpenAICreateSpeechRequest`. Keeps the protocol surface clean as more TTS backends add per-model parameters.

Wire shape change:

* Drop `cfg_value: float | None` field from `protocol/audio.py`. Users now send `{"extra_params": {"cfg_value": 2.5}}` instead of `{"cfg_value": 2.5}`.
* `_build_voxcpm2_prompt` reads `request.extra_params["cfg_value"]` and validates the 0.1-10.0 range manually (pydantic auto-validation is gone with the field). Non-numeric values raise ValueError with a clear message.
* Update tests: 7 cfg-related cases rewritten to use `extra_params`, plus 4 new parametrized non-numeric rejection cases. 29/29 pass.
* Update `docs/serving/speech_api.md`: VoxCPM2-specific Parameters subsection now describes `cfg_value` as a key inside `extra_params`, curl examples updated, mode table fields show `extra_params.cfg_value`.

No talker-side changes; `_RequestState.cfg_value` and `_run_cfm` still operate on a float, only the protocol surface moved.

Signed-off-by: gnomefin <alfian@uselevers.com>
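
The manual validation mentioned in that commit message might look roughly like the sketch below. The function name `read_cfg_value`, the error wording, and the decision to reject booleans are assumptions for illustration; only the `extra_params` key, the 0.1-10.0 range, and the ValueError on non-numeric input come from the message above.

```python
from typing import Any, Optional

CFG_VALUE_MIN = 0.1
CFG_VALUE_MAX = 10.0


def read_cfg_value(extra_params: Optional[dict[str, Any]]) -> Optional[float]:
    """Return a validated cfg_value from extra_params, or None when absent."""
    raw = (extra_params or {}).get("cfg_value")
    if raw is None:
        return None
    # bool is a subclass of int; rejecting it is a choice made for this sketch.
    if isinstance(raw, bool) or not isinstance(raw, (int, float)):
        raise ValueError(
            f"extra_params.cfg_value must be a number, got {type(raw).__name__}"
        )
    value = float(raw)
    # NaN fails this comparison as well, so it is rejected here too.
    if not (CFG_VALUE_MIN <= value <= CFG_VALUE_MAX):
        raise ValueError(
            f"extra_params.cfg_value must be between {CFG_VALUE_MIN} and "
            f"{CFG_VALUE_MAX}, got {value}"
        )
    return value


payload = {"input": "Hello", "extra_params": {"cfg_value": 2.5}}
print(read_cfg_value(payload.get("extra_params")))  # 2.5
```

Because the field no longer exists on the request model, a check of this kind has to run in the prompt-building path instead of relying on pydantic.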

lishunyang12 reviewed Apr 2, 2026

y123456y78 force-pushed the chenyo/voxtral-tts-extra-sampling-params branch from b754f44 to 3a4ce03 Compare April 13, 2026 15:14

y123456y78 changed the title ~~Add extra sampling params~~ [Feature] Support per-model extra sampling params Apr 15, 2026

y123456y78 changed the title ~~[Feature] Support per-model extra sampling params~~ [Feature] Support per-model extra sampling params & add cfg_alpha for Voxtral TTS Apr 15, 2026

y123456y78 changed the title ~~[Feature] Support per-model extra sampling params & add cfg_alpha for Voxtral TTS~~ [Feature][Voxtral TTS] Support per-model extra sampling params & add cfg_alpha for Voxtral TTS Apr 15, 2026

y123456y78 marked this pull request as ready for review April 15, 2026 23:24

y123456y78 requested a review from hsliuustc0106 as a code owner April 15, 2026 23:24

y123456y78 requested a review from lishunyang12 April 15, 2026 23:25

y123456y78 force-pushed the chenyo/voxtral-tts-extra-sampling-params branch from 7989d0a to 791bf7a Compare April 16, 2026 00:11

y123456y78 added 17 commits April 16, 2026 00:12

Add extra sampling params

85dc796

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Remove redundant code

29a52e1

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Fix

a64b28b

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Clean up comment

b8265ed

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Revise access request by id

dfedfd7

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Use cfg_alpha in Voxtral TTS

78f0601

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Add cfg_alpha in test

0df9bff

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Make cfg_alpha required at test

e695ebd

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Add cfg_alpha in end2end.py

12a9a1c

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Add cfg_alpha in voxtral tts gradio demo

8e54a56

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Update voxtral tts gradio demo cfg_alpha range

2575f3c

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Remove default logic

b11c3c7

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Add back default

9de7cf4

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Use general extra args

1f2c700

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Update name

39eb95e

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Fix typo

2c33983

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Align extra args in speech endpoint with video

a134106

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

y123456y78 force-pushed the chenyo/voxtral-tts-extra-sampling-params branch from 791bf7a to a134106 Compare April 16, 2026 00:12

y123456y78 added 2 commits April 16, 2026 00:18

Fix format

a715486

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

Remove debug log

d879ec5

Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>

linyueqian added the ready (label to trigger buildkite CI) label Apr 21, 2026

linyueqian added this to the v0.20.0 milestone Apr 22, 2026

linyueqian approved these changes Apr 22, 2026

linyueqian enabled auto-merge (squash) April 22, 2026 03:17

linyueqian merged commit 8e1bbc9 into vllm-project:main Apr 22, 2026
7 of 8 checks passed

gnomefin mentioned this pull request Apr 25, 2026

[Doc][Frontend][Model][VoxCPM2] Support instructions and per-request cfg_value #3118

Merged
