[Feature][Voxtral TTS] Support per-model extra sampling params & add cfg_alpha for Voxtral TTS#2338
Conversation
|
@Sy0307 @JuanPZuluaga @hsliuustc0106 does this makes sense to you compared to #2243? I think this it the right approach as future model can use this as well. Also cc @lishunyang12 |
lishunyang12
left a comment
There was a problem hiding this comment.
left a couple comments
|
|
||
| def make_omni_output( | ||
| self, model_outputs: torch.Tensor | OmniOutput | tuple, logits_index: int | None = None, **kwargs | ||
| ) -> OmniOutput: |
There was a problem hiding this comment.
_extract_cfg_alpha is defined but never called anywhere in this PR. Is this meant to be used inside forward() or make_omni_output()? Either wire it up or leave it out until the follow-up that actually needs it — dead code just rots.
| req = self.requests.get(req_id) | ||
| sp = req.sampling_params if req else None | ||
| extra_args_list.append( | ||
| sp.extra_args if sp and sp.extra_args else {} |
There was a problem hiding this comment.
This runs for every model on every step, not just voxtral. Iterating over all request IDs and doing dict lookups to collect extra_args adds overhead that most models will never use. Could you guard this behind a check (e.g. a model capability flag or at least if any request has extra_args) so we're not paying the cost unconditionally?
There was a problem hiding this comment.
Nice catch! Add a new args has_sampling_extra_args. Thank you.
| # Gather extra_args from per-request SamplingParams so models can | ||
| # access custom parameters (e.g. cfg_alpha for VoxtralTTS). | ||
| extra_args_list: list[dict] = [] | ||
| for req_id in self.input_batch.req_ids: |
There was a problem hiding this comment.
Nit: existing code at L215/L1027 uses self.requests[req_id] directly since req_id comes from input_batch.req_ids and should always be present. The defensive .get() is fine but inconsistent with the rest of the file.
|
lmk if it is ready for review. thanks! |
b754f44 to
3a4ce03
Compare
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
fix dco and pre-commit pls |
7989d0a to
791bf7a
Compare
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
791bf7a to
a134106
Compare
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai>
Fixed. Thank you! |
|
Hi @linyueqian @lishunyang12, I finished the end-to-end changes and the PR is ready for review. I tried to follow the diffusion model that model code can access If you think the design need a large refactor to be done properly. We can switch back to use #2243 to limit the change in the scope of voxtral tts for now. |
Resolves conflicts from upstream 3f504b4 (Pipeline + Deploy schema migration) and upstream e076378 (sleep mode): - vllm_omni/model_executor/stage_configs/voxtral_tts.yaml: deleted upstream as part of migration. Ported PR vllm-project#2338's cfg_alpha/extra_args addition to the new location vllm_omni/deploy/voxtral_tts.yaml (stage 0 default_sampling_params). - vllm_omni/config/model.py: kept both new OmniModelConfig fields, enable_sleep_mode (from upstream) and has_sampling_extra_args (from PR). No functional change beyond conflict resolution. Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
…cfg_alpha for Voxtral TTS (vllm-project#2338) Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai> Signed-off-by: Yueqian Lin <linyueqian@outlook.com> Co-authored-by: Yueqian Lin <linyueqian@outlook.com>
Per vllm-project#3118 review feedback, follow the convention established by vllm-project#2338 (Voxtral TTS / cfg_alpha): model-specific knobs live inside the shared `extra_params: dict[str, Any]` rather than as top-level fields on `OpenAICreateSpeechRequest`. Keeps the protocol surface clean as more TTS backends add per-model parameters. Wire shape change: * Drop `cfg_value: float | None` field from `protocol/audio.py`. Users now send `{"extra_params": {"cfg_value": 2.5}}` instead of `{"cfg_value": 2.5}`. * `_build_voxcpm2_prompt` reads `request.extra_params["cfg_value"]` and validates the 0.1-10.0 range manually (pydantic auto-validation is gone with the field). Non-numeric values raise ValueError with a clear message. * Update tests: 7 cfg-related cases rewritten to use `extra_params`, plus 4 new parametrized non-numeric rejection cases. 29/29 pass. * Update `docs/serving/speech_api.md`: VoxCPM2-specific Parameters subsection now describes `cfg_value` as a key inside `extra_params`, curl examples updated, mode table fields show `extra_params.cfg_value`. No talker-side changes; `_RequestState.cfg_value` and `_run_cfm` still operate on a float, only the protocol surface moved. Signed-off-by: gnomefin <alfian@uselevers.com>
Per vllm-project#3118 review feedback, follow the convention established by vllm-project#2338 (Voxtral TTS / cfg_alpha): model-specific knobs live inside the shared `extra_params: dict[str, Any]` rather than as top-level fields on `OpenAICreateSpeechRequest`. Keeps the protocol surface clean as more TTS backends add per-model parameters. Wire shape change: * Drop `cfg_value: float | None` field from `protocol/audio.py`. Users now send `{"extra_params": {"cfg_value": 2.5}}` instead of `{"cfg_value": 2.5}`. * `_build_voxcpm2_prompt` reads `request.extra_params["cfg_value"]` and validates the 0.1-10.0 range manually (pydantic auto-validation is gone with the field). Non-numeric values raise ValueError with a clear message. * Update tests: 7 cfg-related cases rewritten to use `extra_params`, plus 4 new parametrized non-numeric rejection cases. 29/29 pass. * Update `docs/serving/speech_api.md`: VoxCPM2-specific Parameters subsection now describes `cfg_value` as a key inside `extra_params`, curl examples updated, mode table fields show `extra_params.cfg_value`. No talker-side changes; `_RequestState.cfg_value` and `_run_cfm` still operate on a float, only the protocol surface moved. Signed-off-by: gnomefin <alfian@uselevers.com>
…cfg_alpha for Voxtral TTS (vllm-project#2338) Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai> Signed-off-by: Yueqian Lin <linyueqian@outlook.com> Co-authored-by: Yueqian Lin <linyueqian@outlook.com>
…cfg_alpha for Voxtral TTS (vllm-project#2338) Signed-off-by: Chen-Yo Sun <chenyo.sun@mistral.ai> Signed-off-by: Yueqian Lin <linyueqian@outlook.com> Co-authored-by: Yueqian Lin <linyueqian@outlook.com>
Purpose
has_sampling_extra_argsinOmniModelConfigandOmniEngineArgs, set True when stage config yaml'sdefault_sampling_paramscontainsextra_argGpuModelRunner._build_model_kwargs_extrato collect extra sampling param whenself.model_config.has_sampling_extra_argisTrueserving_speech.pyto handle extra params in requestTest Plan
Test Result