[Bugfix] Fix CosyVoice3 online serving via /v1/audio/speech by divyanshsinghvi · Pull Request #2121 · vllm-project/vllm-omni

divyanshsinghvi · 2026-03-24T07:24:56Z

CosyVoice3 model_stage values (talker/code2wav) were not recognized by the online speech serving path, so requests fell through to the generic text-only prompt builder and crashed with a CUDA index out-of-bounds error.

- Namespace the CosyVoice3 stage types as cosyvoice3_talker/cosyvoice3_code2wav to avoid collisions with other models that use the same generic names
- Register cosyvoice3_talker in _TTS_MODEL_STAGES so the serving layer recognizes the model as TTS
- Add a cosyvoice3 branch in _prepare_speech_generation that builds the correct multimodal prompt (audio data + prompt_text), matching the offline inference format
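The routing problem the first two changes address can be sketched as follows. This is a minimal illustration, not the vllm-omni code: the dict shape of `_TTS_MODEL_STAGES`, and the helpers `detect_tts_model_type` and `choose_prompt_builder`, are hypothetical. It shows how generic stage names miss the registry and fall through to the text-only builder, while the namespaced names are routed to a CosyVoice3-specific builder.

```python
from typing import Iterable, Optional

# Hypothetical registry mapping a model_stage value to a TTS model type.
# Only the CosyVoice3 entry is shown; the real table may hold others
# and may not be a dict at all.
_TTS_MODEL_STAGES = {
    "cosyvoice3_talker": "cosyvoice3",
}


def detect_tts_model_type(model_stages: Iterable[str]) -> Optional[str]:
    """Return the TTS model type of the first registered stage, or None."""
    for stage in model_stages:
        if stage in _TTS_MODEL_STAGES:
            return _TTS_MODEL_STAGES[stage]
    return None


def choose_prompt_builder(model_stages: Iterable[str]) -> str:
    """Route CosyVoice3 to its own builder; anything unrecognized is text-only."""
    if detect_tts_model_type(model_stages) == "cosyvoice3":
        return "cosyvoice3_multimodal"  # audio data + prompt_text
    return "text_only"  # generic fallback that led to the crash
```

Under this sketch, `["talker", "code2wav"]` resolves to `"text_only"` (the pre-fix behavior), and `["cosyvoice3_talker", "cosyvoice3_code2wav"]` resolves to the multimodal builder. Namespacing the keys means registering CosyVoice3 cannot accidentally reclassify another model that also names a stage `talker`.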

Closes #2043

Purpose

Fix CosyVoice3 online serving via the /v1/audio/speech endpoint, which was crashing with a CUDA index out-of-bounds error because the model was not recognized as a TTS model.

Test Plan

Verify offline inference still works with verify_e2e_cosyvoice.py
Verify online serving via /v1/audio/speech with the CosyVoice3 model (requires ref_audio and ref_text)
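For the online check, a client request might be built as below. This is a sketch under assumptions: the `ref_audio` and `ref_text` field names come from this PR, `model` and `input` follow the OpenAI-style speech API, but sending the reference audio as a base64 data URL is an assumption, and `build_speech_request` is a hypothetical helper. It only constructs the request, so nothing is sent until you pass it to `urllib.request.urlopen` against a running server.

```python
import base64
import json
import urllib.request


def build_speech_request(base_url: str, model: str, text: str,
                         ref_audio_wav: bytes, ref_text: str) -> urllib.request.Request:
    """Build (without sending) a POST to /v1/audio/speech for CosyVoice3."""
    payload = {
        "model": model,
        "input": text,
        # Assumption: reference audio is accepted as a base64 data URL.
        "ref_audio": "data:audio/wav;base64,"
        + base64.b64encode(ref_audio_wav).decode("ascii"),
        # Transcript of the reference audio (required for CosyVoice3).
        "ref_text": ref_text,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Usage: `with urllib.request.urlopen(req) as resp: audio = resp.read()` once a server is up. Omitting either reference field should be rejected by the validation this PR adds.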

Test Result

Pending


…ject#2043)

CosyVoice3 model_stage values (talker/code2wav) were not recognized by the online speech serving path, causing requests to fall through to the generic text-only prompt builder and crash with CUDA index out-of-bounds.

- Namespace CosyVoice3 stage types to cosyvoice3_talker/cosyvoice3_code2wav to avoid collision with other models using the same generic names
- Register cosyvoice3_talker in _TTS_MODEL_STAGES so the model is recognized as TTS in the serving layer
- Add cosyvoice3 branch in _prepare_speech_generation to build the correct multimodal prompt (audio data + prompt_text) matching the offline inference format

Closes vllm-project#2043

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

Gaohan123

Thanks for your contribution! Please post test results and add unit tests to cover the changed functions.

linyueqian · 2026-03-31T03:53:09Z

@divyanshsinghvi are there any updates? Thanks!

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 056ac58036


chatgpt-codex-connector · 2026-03-31T04:06:59Z

+                        "sample_rate": sr,
+                    },
+                }
+                tts_params = {}


Respect max_new_tokens for CosyVoice3 requests

The new CosyVoice3 path drops all per-request generation controls by setting tts_params = {} and never mapping request.max_new_tokens into sampling params, so /v1/audio/speech callers who set max_new_tokens for latency/cost control will have that limit silently ignored. This is observable whenever max_new_tokens is provided with a CosyVoice3 model and can lead to much longer-than-requested decoding runs.
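One way to address this comment is to forward the request's limit instead of returning an empty dict. The sketch assumes the serving layer passes a dict of overrides into vLLM's `SamplingParams`, whose generation cap is `max_tokens`; `SpeechRequest` and `build_tts_params` are hypothetical stand-ins, and the change that was eventually merged may handle this differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechRequest:
    """Hypothetical subset of the /v1/audio/speech request fields."""
    input: str
    max_new_tokens: Optional[int] = None


def build_tts_params(request: SpeechRequest) -> dict:
    """Forward max_new_tokens rather than dropping it with tts_params = {}."""
    params = {}
    if request.max_new_tokens is not None:
        if request.max_new_tokens <= 0:
            raise ValueError("max_new_tokens must be a positive integer")
        # vLLM caps the number of generated tokens via SamplingParams.max_tokens.
        params["max_tokens"] = request.max_new_tokens
    return params
```

When the caller sets no limit, the dict stays empty and the model's defaults apply, so existing behavior is unchanged for those requests.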


Cover the changes in this PR:
- model_stage rename (cosyvoice3_talker/cosyvoice3_code2wav)
- TTS model type detection for cosyvoice3
- Validation: ref_audio, ref_text, and input text required
- Prompt building with audio data and processor kwargs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>
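The validation and prompt-building behavior these tests target can be sketched as a single function. `prompt`, `multi_modal_data`, and `mm_processor_kwargs` are keys vLLM accepts in a text prompt dict. The exact nesting vllm-omni uses for CosyVoice3, including where `sample_rate` goes (it appears in the diff excerpt in the Codex review), is an assumption, and `build_cosyvoice3_prompt` is a hypothetical name.

```python
from typing import Optional, Sequence


def build_cosyvoice3_prompt(text: str, ref_audio: Optional[Sequence[float]],
                            sr: int, ref_text: Optional[str]) -> dict:
    """Validate a CosyVoice3 request, then build a vLLM-style prompt dict."""
    # All three inputs are required; reject early instead of crashing on GPU.
    if not text or not text.strip():
        raise ValueError("input text must not be empty")
    if ref_audio is None:
        raise ValueError("CosyVoice3 requires ref_audio")
    if not ref_text:
        raise ValueError("CosyVoice3 requires ref_text")
    return {
        "prompt": text,
        # Assumption: audio is passed as an (array, sample_rate) tuple.
        "multi_modal_data": {"audio": (ref_audio, sr)},
        # Assumption: the reference transcript travels as a processor kwarg.
        "mm_processor_kwargs": {"prompt_text": ref_text, "sample_rate": sr},
    }
```

Keeping validation in front of prompt construction means a malformed request fails with a clear error at the API layer rather than reaching the model.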

linyueqian · 2026-03-31T04:21:33Z

Added unit tests (8 tests, all passing) covering:

- model_stage rename consistency (cosyvoice3_talker/cosyvoice3_code2wav in _TTS_MODEL_STAGES)
- TTS model type detection for cosyvoice3
- Input validation (ref_audio, ref_text, empty input)
- Prompt building structure

Could you also add e2e benchmark results (TTFP, RTF, latency) from a working run? That would help get this merged.
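The thread does not define these metrics. A minimal sketch, assuming the common definitions, computes them from per-request timestamps: TTFP is the delay until the first audio packet arrives, latency is total request time, and RTF is latency divided by the duration of the generated audio. `RunTiming` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    """Wall-clock timestamps (seconds) and output size for one request."""
    request_start: float
    first_packet: float   # first audio chunk received
    request_end: float    # last audio chunk received
    num_samples: int      # total output audio samples
    sample_rate: int      # output sample rate in Hz


def summarize(t: RunTiming) -> dict:
    """Compute TTFP, end-to-end latency, and real-time factor (RTF)."""
    audio_seconds = t.num_samples / t.sample_rate
    latency = t.request_end - t.request_start
    return {
        "ttfp_s": t.first_packet - t.request_start,
        "latency_s": latency,
        "audio_s": audio_seconds,
        # RTF < 1 means audio is produced faster than real time.
        "rtf": latency / audio_seconds,
    }
```

With illustrative inputs (not measurements), a request that finishes in 2.0 s and yields 96,000 samples at 24 kHz (4.0 s of audio) has an RTF of 0.5.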

divyanshsinghvi · 2026-03-31T05:53:29Z


Will do over the next few days.

linyueqian · 2026-04-04T03:07:45Z

Hi @divyanshsinghvi, since #2431 has been merged, I will close this PR for now.

Gaohan123 reviewed Mar 25, 2026


linyueqian marked this pull request as ready for review March 31, 2026 03:59

linyueqian requested a review from hsliuustc0106 as a code owner March 31, 2026 03:59

chatgpt-codex-connector Bot reviewed Mar 31, 2026


linyueqian mentioned this pull request Apr 1, 2026

[CosyVoice3] Add online serving support, fix stage config, and add CI tests #2431

Merged


linyueqian closed this Apr 4, 2026

divyanshsinghvi wants to merge 2 commits into vllm-project:main from divyanshsinghvi:fix/cosyvoice3-online-serving
