[CosyVoice3] Fix vLLM 0.19.0 compatibility issues by linyueqian · Pull Request #2486 · vllm-project/vllm-omni

linyueqian · 2026-04-04T18:59:52Z

Summary

vLLM 0.19.0 rebase (#2475) broke CosyVoice3 in several ways. This PR fixes all identified issues:

1. **Stage config resolution**: `resolve_model_config_path` fails for models whose config.json is empty (no model_type). Added a fallback that matches the model name against the registered stage config filenames.
2. **EOS token not set**: CosyVoice3Config stores eos_token_id=6562 in a nested `llm` dict, but the top-level PretrainedConfig field was None. vLLM reads the top-level field, so generation never stopped (~76s of audio for a short sentence). Fixed by setting the top-level field in __init__.
3. **SamplingParams.max_tokens default changed to 16**: vLLM 0.19.0 defaults to max_tokens=16 (it was higher before), which truncated the audio to 0.6s. Added default_sampling_params with max_tokens=2048 and stop_token_ids=[6562] to both stages.
4. **Embedding OOB crash in code2wav**: The speech EOS token (6562) lies outside the flow model's embedding table (6561 entries, so valid indices are 0 to 6560), causing a CUDA vectorized_gather_kernel assertion failure. Clamped token IDs to the valid range.
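The stage config fallback in the first item can be pictured with a minimal sketch. This is not the code in `resolve_model_config_path`; the helper name `match_stage_config`, the `.yaml` extension, the normalization rule, and the "longest stem wins" tie-break are all assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import Optional


def _normalize(name: str) -> str:
    """Lowercase the name and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_stage_config(model: str, config_dir: Path) -> Optional[Path]:
    """Hypothetical fallback for models whose config.json has no model_type.

    Picks the stage config whose filename stem occurs in the last component
    of the model name. The longest matching stem wins, so a more specific
    config beats a shorter prefix. Returns None when nothing matches.
    """
    model_key = _normalize(Path(model).name)
    candidates = [
        path
        for path in sorted(config_dir.glob("*.yaml"))  # extension is assumed
        if _normalize(path.stem) and _normalize(path.stem) in model_key
    ]
    return max(candidates, key=lambda p: len(_normalize(p.stem)), default=None)
```

With a directory holding `cosyvoice.yaml` and `cosyvoice3.yaml`, a made-up model name such as `some-org/CosyVoice3-demo` would resolve to `cosyvoice3.yaml`, and an unrelated name would fall through to `None` so the caller can raise its usual error.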

Test plan

- Verified non-streaming ZH and EN speech generation produces correct-length audio (~6s)
- Tested with vLLM 0.19.0 + CosyVoice3 locally on H100
- CI pre-merge tests (now running advanced_model level per [CI] Fix missing queue for Voxtral-TTS E2E test step #2484)

vLLM 0.19.0 changed several defaults that broke CosyVoice3:

1. **Stage config resolution**: `resolve_model_config_path` fails for models with empty config.json (no model_type). Add fallback that matches model name against registered stage config filenames.
2. **EOS token not set**: CosyVoice3Config stores eos_token_id=6562 in a nested `llm` dict but the top-level PretrainedConfig field was None. vLLM reads the top-level field, so generation never stopped. Set it via kwargs.setdefault in __init__.
3. **SamplingParams.max_tokens default changed to 16**: vLLM 0.19.0 defaults max_tokens=16 (was higher before). Add default_sampling_params with max_tokens=2048 and stop_token_ids=[6562] to both stages.
4. **Embedding OOB crash in code2wav**: Speech EOS token (6562) exceeds the flow model's embedding table size (6561). Clamp token IDs to valid range before embedding lookup.

Signed-off-by: linyueqian <linyueqian@outlook.com>
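Items 2 and 4 both revolve around token 6562, and a framework-free sketch may make them easier to follow. `ToyConfig` is a stand-in and not the real CosyVoice3Config or PretrainedConfig, and `clamp_token_ids` is a hypothetical helper working on plain lists; only the numbers 6562 and 6561 and the use of `kwargs.setdefault` come from the commit message. On tensors, `torch.clamp` would be the analogous call.

```python
SPEECH_EOS_ID = 6562     # speech EOS token id, from the commit message
FLOW_VOCAB_SIZE = 6561   # flow embedding rows, so valid ids are 0..6560


class ToyConfig:
    """Stand-in for a PretrainedConfig-style class (not the real one)."""

    def __init__(self, llm=None, **kwargs):
        self.llm = llm or {}
        # Promote the nested value to the top level, but only when the
        # caller did not pass eos_token_id explicitly.
        if "eos_token_id" in self.llm:
            kwargs.setdefault("eos_token_id", self.llm["eos_token_id"])
        self.eos_token_id = kwargs.get("eos_token_id")


def clamp_token_ids(token_ids, vocab_size=FLOW_VOCAB_SIZE):
    """Clamp every id into [0, vocab_size - 1] before an embedding lookup."""
    return [min(max(t, 0), vocab_size - 1) for t in token_ids]


cfg = ToyConfig(llm={"eos_token_id": SPEECH_EOS_ID})
print(cfg.eos_token_id)                        # 6562
print(clamp_token_ids([12, 6560, SPEECH_EOS_ID]))  # [12, 6560, 6560]
```

Using `setdefault` keeps an explicitly passed `eos_token_id` intact, which is why it is preferable to an unconditional assignment. Note that clamping maps the EOS id onto the last valid row rather than dropping it; whether the EOS token is also stripped before code2wav is not stated in this PR.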

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 64cbbcc281

- Wrap sr=22050 as torch.tensor in code2wav output so the generation model runner doesn't silently drop it (only tensor outputs accepted). Fixes "First audio chunk must include sample rate metadata" assertion.
- Skip test_voice_clone_zh_002 (stream=True) because CosyVoice3 does not have async_chunk streaming support yet.

Signed-off-by: linyueqian <linyueqian@outlook.com>

Gaohan123

LGTM. Thanks,

- Rename model_stage: talker -> cosyvoice3_talker (renamed in vllm-project#2486)
- Rename model_stage: code2wav -> cosyvoice3_code2wav
- Add default_sampling_params (max_tokens, stop_token_ids) to both stages
- Unskip streaming e2e test now that async_chunk is supported

Signed-off-by: linyueqian <linyueqian@outlook.com>

- Rename model_stage: talker -> cosyvoice3_talker (renamed in vllm-project#2486)
- Rename model_stage: code2wav -> cosyvoice3_code2wav
- Add default_sampling_params (max_tokens, stop_token_ids) to both stages
- Enable streaming e2e test with async_chunk config

Signed-off-by: linyueqian <linyueqian@outlook.com>

- Rename model_stage: talker -> cosyvoice3_talker (renamed in vllm-project#2486)
- Rename model_stage: code2wav -> cosyvoice3_code2wav
- Add default_sampling_params (max_tokens, stop_token_ids, repetition_penalty) to both stages; repetition_penalty=2.0 is required to enable output_token_ids tracking in vLLM, which feeds the RAS sampler's windowed token history
- Fix unit test model_stage references (code2wav -> cosyvoice3_code2wav)
- Enable streaming e2e test with async_chunk config
- Restore advanced_model markers for transcription quality checks

Signed-off-by: linyueqian <linyueqian@outlook.com>
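To illustrate why per-stage default_sampling_params matter once the engine default drops to 16 tokens, here is a rough sketch of a layered lookup. The precedence order (request over stage defaults over engine default), the function name `resolve_sampling_params`, and the dict-based representation are assumptions; the commit messages do not describe how vllm-omni actually merges these values. Only the numbers 16, 2048, 6562 and 2.0 come from this thread.

```python
ENGINE_DEFAULT_MAX_TOKENS = 16  # vLLM 0.19.0 default, per the PR description

# Values named in the commit messages, held in a plain dict for illustration.
STAGE_DEFAULTS = {
    "max_tokens": 2048,
    "stop_token_ids": [6562],
    "repetition_penalty": 2.0,
}


def resolve_sampling_params(request_params, stage_defaults=STAGE_DEFAULTS):
    """Hypothetical merge: request values win, stage defaults fill the gaps,
    and the engine default applies only when neither layer sets a value."""
    merged = {"max_tokens": ENGINE_DEFAULT_MAX_TOKENS}
    merged.update(stage_defaults)
    # Treat None as "not set" so an empty field does not erase a default.
    merged.update({k: v for k, v in request_params.items() if v is not None})
    return merged


print(resolve_sampling_params({})["max_tokens"])                     # 2048
print(resolve_sampling_params({"max_tokens": 64})["max_tokens"])     # 64
print(resolve_sampling_params({}, stage_defaults={})["max_tokens"])  # 16
```

The last call shows the failure mode described in the PR: with no stage defaults, a request that leaves max_tokens unset ends up capped at 16 tokens.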

linyueqian requested a review from hsliuustc0106 as a code owner April 4, 2026 18:59

linyueqian added the ready label (label to trigger buildkite CI) Apr 4, 2026

chatgpt-codex-connector Bot reviewed Apr 4, 2026

Comment thread vllm_omni/entrypoints/utils.py Outdated

linyueqian force-pushed the fix/cosyvoice3-vllm019-compat branch from 7c3cba3 to aee3884 Compare April 4, 2026 19:28

linyueqian force-pushed the fix/cosyvoice3-vllm019-compat branch from aee3884 to 3d59077 Compare April 4, 2026 19:34

linyueqian requested a review from Gaohan123 April 4, 2026 19:54

linyueqian mentioned this pull request Apr 5, 2026

[Model][Core] Enable async_chunk streaming pipeline for CosyVoice3 #1703

Merged

Gaohan123 approved these changes Apr 5, 2026

Gaohan123 merged commit d92439c into vllm-project:main Apr 5, 2026
8 checks passed

skf-1999 pushed a commit to Semmer2/vllm-omni that referenced this pull request Apr 7, 2026

[CosyVoice3] Fix vLLM 0.19.0 compatibility issues (vllm-project#2486)

502f589

vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026

[CosyVoice3] Fix vLLM 0.19.0 compatibility issues (vllm-project#2486)

da515d4

lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026

[CosyVoice3] Fix vLLM 0.19.0 compatibility issues (vllm-project#2486)

273afd3

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[CosyVoice3] Fix vLLM 0.19.0 compatibility issues (vllm-project#2486)

c7a36bc

Gaohan123 merged 2 commits into vllm-project:main from linyueqian:fix/cosyvoice3-vllm019-compat
