
[Bugfix] Fix CosyVoice3 online serving via /v1/audio/speech #2121

Closed
divyanshsinghvi wants to merge 2 commits into vllm-project:main from divyanshsinghvi:fix/cosyvoice3-online-serving

Conversation

@divyanshsinghvi
Contributor

@divyanshsinghvi divyanshsinghvi commented Mar 24, 2026

CosyVoice3 model_stage values (talker/code2wav) were not recognized by the online speech serving path, so requests fell through to the generic text-only prompt builder and crashed with a CUDA index out-of-bounds error.

  • Namespace CosyVoice3 stage types to cosyvoice3_talker/cosyvoice3_code2wav to avoid collision with other models using the same generic names
  • Register cosyvoice3_talker in _TTS_MODEL_STAGES so the model is recognized as TTS in the serving layer
  • Add cosyvoice3 branch in _prepare_speech_generation to build the correct multimodal prompt (audio data + prompt_text) matching the offline inference format
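
The three changes above can be pictured with a minimal sketch. It is not the code from this PR: the shape of `_TTS_MODEL_STAGES` (assumed here to be a set), the helper names `is_tts_model` and `build_cosyvoice3_prompt`, and the prompt dictionary keys are all assumptions chosen for illustration.

```python
# Hypothetical sketch: names mirror the PR description, not the actual source.

# Assumption: the serving layer keeps a set of stage names that mark a model as TTS.
_TTS_MODEL_STAGES = {"cosyvoice3_talker"}


def is_tts_model(stage_names):
    """Return True if any configured stage is registered as a TTS stage."""
    return any(name in _TTS_MODEL_STAGES for name in stage_names)


def build_cosyvoice3_prompt(text, ref_audio, sample_rate, ref_text):
    """Build a multimodal prompt: target text, reference audio, and its transcript.

    The key names below are illustrative assumptions, not a confirmed schema.
    """
    return {
        "prompt": text,
        "multi_modal_data": {"audio": (ref_audio, sample_rate)},
        "mm_processor_kwargs": {
            "prompt_text": ref_text,
            "sample_rate": sample_rate,
        },
    }
```

With the generic names `talker`/`code2wav`, `is_tts_model` in this sketch would return False, which corresponds to the fall-through to the text-only prompt builder described above.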

Closes #2043

Purpose

Fix CosyVoice3 online serving via the /v1/audio/speech endpoint, which crashed with a CUDA index out-of-bounds error because the model was not recognized as TTS.

Test Plan

  • Verify offline inference still works with verify_e2e_cosyvoice.py
  • Verify online serving via /v1/audio/speech with CosyVoice3 model (requires ref_audio and ref_text)
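
For the online check, a request could be assembled roughly as follows. This is a sketch under assumptions: the server address, the model name, and the encoding of `ref_audio` are placeholders, and the `model`/`input` fields follow the OpenAI-style speech API; only `ref_audio` and `ref_text` are named in this PR.

```python
import json
import urllib.request


def build_speech_request(base_url, model, text, ref_audio, ref_text):
    """Build (but do not send) a POST request for the /v1/audio/speech endpoint."""
    payload = {
        "model": model,          # assumption: model identifier as served
        "input": text,           # text to synthesize
        "ref_audio": ref_audio,  # reference audio; exact encoding is an assumption
        "ref_text": ref_text,    # transcript of the reference audio
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (requires a running server; address and names are placeholders):
#   req = build_speech_request("http://localhost:8000", "<model>", "Hello",
#                              "<ref audio>", "<ref transcript>")
#   with urllib.request.urlopen(req) as resp:
#       audio_bytes = resp.read()
```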

Test Result

Pending



…ject#2043)

CosyVoice3 model_stage values (talker/code2wav) were not recognized by
the online speech serving path, causing requests to fall through to the
generic text-only prompt builder and crash with CUDA index out-of-bounds.

- Namespace CosyVoice3 stage types to cosyvoice3_talker/cosyvoice3_code2wav
  to avoid collision with other models using the same generic names
- Register cosyvoice3_talker in _TTS_MODEL_STAGES so the model is
  recognized as TTS in the serving layer
- Add cosyvoice3 branch in _prepare_speech_generation to build the
  correct multimodal prompt (audio data + prompt_text) matching the
  offline inference format

Closes vllm-project#2043

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Collaborator

@Gaohan123 Gaohan123 left a comment


Thanks for your contribution! Please post test results and add unit tests that cover the changed functions.

@linyueqian
Collaborator

@divyanshsinghvi are there any updates? Thanks!

@linyueqian linyueqian marked this pull request as ready for review March 31, 2026 03:59

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 056ac58036


"sample_rate": sr,
},
}
tts_params = {}


P2: Respect max_new_tokens for CosyVoice3 requests

The new CosyVoice3 path drops all per-request generation controls by setting tts_params = {} and never mapping request.max_new_tokens into sampling params, so /v1/audio/speech callers who set max_new_tokens for latency/cost control will have that limit silently ignored. This is observable whenever max_new_tokens is provided with a CosyVoice3 model and can lead to much longer-than-requested decoding runs.
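
One way to address this suggestion is to populate the params from the request instead of leaving them empty. The sketch below is illustrative only: `build_tts_params` is a hypothetical helper, and how the serving layer turns `tts_params` into sampling params is an assumption (the `max_tokens` key is borrowed from vLLM's `SamplingParams` field of that name).

```python
from typing import Optional


def build_tts_params(max_new_tokens: Optional[int] = None) -> dict:
    """Carry a per-request token limit into the params dict when one is given."""
    tts_params = {}
    if max_new_tokens is not None:
        if max_new_tokens <= 0:
            raise ValueError("max_new_tokens must be a positive integer")
        # Assumption: downstream code reads this key as the decoding limit.
        tts_params["max_tokens"] = max_new_tokens
    return tts_params
```

Leaving the dict empty when no limit is supplied keeps the default behavior unchanged for callers who do not set `max_new_tokens`.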


Cover the changes in this PR:
- model_stage rename (cosyvoice3_talker/cosyvoice3_code2wav)
- TTS model type detection for cosyvoice3
- Validation: ref_audio, ref_text, and input text required
- Prompt building with audio data and processor kwargs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
@linyueqian
Collaborator

Added unit tests (8 tests, all passing) covering:

  • model_stage rename consistency (cosyvoice3_talker/cosyvoice3_code2wav in _TTS_MODEL_STAGES)
  • TTS model type detection for cosyvoice3
  • Input validation (ref_audio, ref_text, empty input)
  • Prompt building structure
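
As a rough picture of what the input-validation tests exercise, a validator might look like the sketch below. The function name, the error strings, and the return-a-message convention are assumptions, not the code added in this PR.

```python
from typing import Optional


def validate_cosyvoice3_request(text, ref_audio, ref_text) -> Optional[str]:
    """Return an error message for an invalid request, or None if it is valid."""
    if not text or not text.strip():
        return "input text must not be empty"
    if not ref_audio:
        return "ref_audio is required for CosyVoice3"
    if not ref_text or not ref_text.strip():
        return "ref_text is required for CosyVoice3"
    return None


def test_validation():
    # Each required field is checked independently.
    assert validate_cosyvoice3_request("", "audio", "text") is not None
    assert validate_cosyvoice3_request("hi", None, "text") is not None
    assert validate_cosyvoice3_request("hi", "audio", "  ") is not None
    assert validate_cosyvoice3_request("hi", "audio", "text") is None
```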

Could you also add e2e benchmark results (TTFP, RTF, latency) from a working run? That would help get this merged.
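
For reference, the requested numbers could be derived from per-request timestamps as sketched below. This reads TTFP as time to first audio packet and RTF as the real-time factor (synthesis time divided by the duration of the generated audio); both readings, along with the `SpeechTiming` type, are assumptions rather than this project's benchmark definitions.

```python
from dataclasses import dataclass


@dataclass
class SpeechTiming:
    request_start: float  # seconds, when the request was sent
    first_packet: float   # seconds, when the first audio chunk arrived
    request_end: float    # seconds, when the last audio chunk arrived
    audio_seconds: float  # duration of the generated audio


def summarize(t: SpeechTiming) -> dict:
    """Compute TTFP, end-to-end latency, and real-time factor for one request."""
    latency = t.request_end - t.request_start
    return {
        "ttfp_s": t.first_packet - t.request_start,
        "latency_s": latency,
        "rtf": latency / t.audio_seconds,  # below 1.0 means faster than real time
    }
```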

@divyanshsinghvi
Contributor Author

Will do over the next few days.

@linyueqian
Collaborator

Hi @divyanshsinghvi, since #2431 has been merged, I will close this PR for now.

@linyueqian linyueqian closed this Apr 4, 2026
Development

Successfully merging this pull request may close these issues.

[Bug]: cannot run Cosyvoice3 offline with ValueError: This model does not support generation

3 participants