[CosyVoice3] Fix vLLM 0.19.0 compatibility issues #2486
Merged
Gaohan123 merged 2 commits on Apr 5, 2026
vLLM 0.19.0 changed several defaults that broke CosyVoice3:

1. **Stage config resolution**: `resolve_model_config_path` fails for models with an empty `config.json` (no `model_type`). Add a fallback that matches the model name against registered stage config filenames.
2. **EOS token not set**: `CosyVoice3Config` stores `eos_token_id=6562` in a nested `llm` dict, but the top-level `PretrainedConfig` field was `None`. vLLM reads the top-level field, so generation never stopped. Set it via `kwargs.setdefault` in `__init__`.
3. **`SamplingParams.max_tokens` default changed to 16**: vLLM 0.19.0 defaults to `max_tokens=16` (was higher before). Add `default_sampling_params` with `max_tokens=2048` and `stop_token_ids=[6562]` to both stages.
4. **Embedding OOB crash in code2wav**: The speech EOS token (6562) exceeds the flow model's embedding table size (6561). Clamp token IDs to the valid range before the embedding lookup.

Signed-off-by: linyueqian <linyueqian@outlook.com>
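The EOS fix described above can be illustrated with a minimal sketch. It does not reproduce the actual `CosyVoice3Config`; `BaseConfigSketch` is a hypothetical stand-in for a `PretrainedConfig`-like base class that stores `eos_token_id` at the top level, and `CosyVoice3ConfigSketch` is an assumed name used only for illustration. The point is that `kwargs.setdefault` promotes the nested value without overriding one the caller passes explicitly.

```python
class BaseConfigSketch:
    """Hypothetical stand-in for a PretrainedConfig-like base class."""

    def __init__(self, eos_token_id=None, **kwargs):
        # The top-level field is what the engine is described as reading.
        self.eos_token_id = eos_token_id


class CosyVoice3ConfigSketch(BaseConfigSketch):
    """Illustrative config that keeps the EOS id inside a nested `llm` dict."""

    def __init__(self, llm=None, **kwargs):
        # Default nested value taken from the PR description (6562).
        self.llm = dict(llm) if llm is not None else {"eos_token_id": 6562}
        nested_eos = self.llm.get("eos_token_id")
        if nested_eos is not None:
            # Promote the nested id only if the caller did not set one.
            kwargs.setdefault("eos_token_id", nested_eos)
        super().__init__(**kwargs)


default_cfg = CosyVoice3ConfigSketch()
explicit_cfg = CosyVoice3ConfigSketch(eos_token_id=5)
print(default_cfg.eos_token_id, explicit_cfg.eos_token_id)
```

Using `setdefault` rather than a plain assignment keeps an explicitly supplied `eos_token_id` intact, which is presumably why the commit message calls it out.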
- Wrap `sr=22050` as a `torch.tensor` in the code2wav output so the generation model runner doesn't silently drop it (only tensor outputs are accepted). Fixes the "First audio chunk must include sample rate metadata" assertion.
- Skip `test_voice_clone_zh_002` (`stream=True`) because CosyVoice3 does not have `async_chunk` streaming support yet.

Signed-off-by: linyueqian <linyueqian@outlook.com>
linyueqian added a commit to indevn/vllm-omni that referenced this pull request on Apr 5, 2026

- Rename `model_stage`: `talker` -> `cosyvoice3_talker` (renamed in vllm-project#2486)
- Rename `model_stage`: `code2wav` -> `cosyvoice3_code2wav`
- Add `default_sampling_params` (`max_tokens`, `stop_token_ids`) to both stages
- Unskip streaming e2e test now that `async_chunk` is supported

Signed-off-by: linyueqian <linyueqian@outlook.com>
linyueqian added a commit to indevn/vllm-omni that referenced this pull request on Apr 5, 2026

- Rename `model_stage`: `talker` -> `cosyvoice3_talker` (renamed in vllm-project#2486)
- Rename `model_stage`: `code2wav` -> `cosyvoice3_code2wav`
- Add `default_sampling_params` (`max_tokens`, `stop_token_ids`) to both stages
- Enable streaming e2e test with `async_chunk` config

Signed-off-by: linyueqian <linyueqian@outlook.com>
linyueqian added a commit to indevn/vllm-omni that referenced this pull request on Apr 5, 2026

- Rename `model_stage`: `talker` -> `cosyvoice3_talker` (renamed in vllm-project#2486)
- Rename `model_stage`: `code2wav` -> `cosyvoice3_code2wav`
- Add `default_sampling_params` (`max_tokens`, `stop_token_ids`, `repetition_penalty`) to both stages; `repetition_penalty=2.0` is required to enable `output_token_ids` tracking in vLLM, which feeds the RAS sampler's windowed token history
- Fix unit test `model_stage` references (`code2wav` -> `cosyvoice3_code2wav`)
- Enable streaming e2e test with `async_chunk` config
- Restore `advanced_model` markers for transcription quality checks

Signed-off-by: linyueqian <linyueqian@outlook.com>
skf-1999 pushed a commit to Semmer2/vllm-omni that referenced this pull request on Apr 7, 2026
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request on Apr 9, 2026
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request on May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request on May 12, 2026
Summary
vLLM 0.19.0 rebase (#2475) broke CosyVoice3 in several ways. This PR fixes all identified issues:
1. **Stage config resolution**: `resolve_model_config_path` fails for models with an empty `config.json` (no `model_type`). Added a fallback that matches the model name against registered stage config filenames.
2. **EOS token not set**: `CosyVoice3Config` stores `eos_token_id=6562` in a nested `llm` dict, but the top-level `PretrainedConfig` field was `None`. vLLM reads the top-level field, so generation never stopped (~76s of audio for a short sentence). Fixed by setting it in `__init__`.
3. **`SamplingParams.max_tokens` default changed to 16**: vLLM 0.19.0 defaults to `max_tokens=16` (was higher before), causing truncated 0.6s audio. Added `default_sampling_params` with `max_tokens=2048` and `stop_token_ids=[6562]` to both stages.
4. **Embedding OOB crash in code2wav**: The speech EOS token (6562) exceeds the flow model's embedding table size (6561), causing a `vectorized_gather_kernel` assertion failure. Clamped token IDs to the valid range.

Test plan
`advanced_model` level per [CI] Fix missing queue for Voxtral-TTS E2E test step #2484)
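The embedding out-of-bounds fix from the summary can be sketched in plain Python to show why token 6562 is a problem: a table of size 6561 only has valid indices 0 through 6560. The function name `clamp_token_ids` and the constants below are assumptions for illustration, not the identifiers used in the PR; on tensors, the equivalent would be something like `torch.clamp(ids, min=0, max=vocab_size - 1)`.

```python
FLOW_VOCAB_SIZE = 6561  # embedding table size quoted in the PR description
SPEECH_EOS_ID = 6562    # speech EOS token id quoted in the PR description


def clamp_token_ids(token_ids, vocab_size=FLOW_VOCAB_SIZE):
    """Clamp every id into [0, vocab_size - 1] before an embedding lookup."""
    upper = vocab_size - 1  # last valid row of the embedding table
    return [min(max(token_id, 0), upper) for token_id in token_ids]


clamped = clamp_token_ids([12, 6560, SPEECH_EOS_ID])
print(clamped)  # [12, 6560, 6560]
```

Clamping maps the EOS id onto the last valid row instead of removing it, so the sequence length is preserved; whether that or dropping the token is preferable depends on how the flow model treats the final frame, which the PR text does not specify.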