[CosyVoice3] Fix vLLM 0.19.0 compatibility issues #2486

Merged: Gaohan123 merged 2 commits into vllm-project:main from linyueqian:fix/cosyvoice3-vllm019-compat on Apr 5, 2026

@linyueqian
Collaborator

Summary

The vLLM 0.19.0 rebase (#2475) broke CosyVoice3 in several ways. This PR fixes all identified issues:

  • Stage config resolution: `resolve_model_config_path` fails for models whose config.json is empty (no `model_type`). Added a fallback that matches the model name against the registered stage config filenames.
  • EOS token not set: `CosyVoice3Config` stores `eos_token_id=6562` in a nested `llm` dict, but the top-level `PretrainedConfig` field was `None`. vLLM reads the top-level field, so generation never stopped (~76s of audio for a short sentence). Fixed by setting it in `__init__`.
  • `SamplingParams.max_tokens` default changed to 16: vLLM 0.19.0 defaults to `max_tokens=16` (it was higher before), which truncated the audio to 0.6s. Added `default_sampling_params` with `max_tokens=2048` and `stop_token_ids=[6562]` to both stages.
  • Embedding OOB crash in code2wav: the speech EOS token (6562) falls outside the flow model's embedding table (size 6561), causing a CUDA `vectorized_gather_kernel` assertion failure. Clamped token IDs to the valid range.
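
For readers following the EOS fix, here is a minimal sketch of promoting a nested `eos_token_id` to the top-level field with `kwargs.setdefault`. It is not the code from this PR: `BaseConfigSketch` is a simplified, hypothetical stand-in for `PretrainedConfig` so the example runs without `transformers`, and `CosyVoice3ConfigSketch` is an illustrative name.

```python
from typing import Any, Dict, Optional


class BaseConfigSketch:
    """Hypothetical stand-in for PretrainedConfig: keeps a top-level eos_token_id."""

    def __init__(self, **kwargs: Any) -> None:
        # The top-level field stays None unless a value arrives via kwargs.
        self.eos_token_id = kwargs.pop("eos_token_id", None)


class CosyVoice3ConfigSketch(BaseConfigSketch):
    """Illustrative config whose EOS id lives in a nested `llm` dict."""

    def __init__(self, llm: Optional[Dict[str, Any]] = None, **kwargs: Any) -> None:
        self.llm = dict(llm or {})
        nested_eos = self.llm.get("eos_token_id")
        if nested_eos is not None:
            # setdefault keeps an explicitly passed eos_token_id and only
            # fills the gap when the caller did not provide one.
            kwargs.setdefault("eos_token_id", nested_eos)
        super().__init__(**kwargs)


config = CosyVoice3ConfigSketch(llm={"eos_token_id": 6562})
print(config.eos_token_id)
```

Using `setdefault` rather than a plain assignment means a caller who passes `eos_token_id` explicitly still wins over the nested value.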

Test plan

vLLM 0.19.0 changed several defaults that broke CosyVoice3:

1. **Stage config resolution**: `resolve_model_config_path` fails for
   models with empty config.json (no model_type). Add fallback that
   matches model name against registered stage config filenames.

2. **EOS token not set**: CosyVoice3Config stores eos_token_id=6562 in
   a nested `llm` dict but the top-level PretrainedConfig field was None.
   vLLM reads the top-level field, so generation never stopped. Set it
   via kwargs.setdefault in __init__.

3. **SamplingParams.max_tokens default changed to 16**: vLLM 0.19.0
   defaults max_tokens=16 (was higher before). Add default_sampling_params
   with max_tokens=2048 and stop_token_ids=[6562] to both stages.

4. **Embedding OOB crash in code2wav**: Speech EOS token (6562) exceeds
   the flow model's embedding table size (6561). Clamp token IDs to valid
   range before embedding lookup.

Signed-off-by: linyueqian <linyueqian@outlook.com>
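
The clamping in item 4 can be illustrated with a small sketch. It assumes the valid embedding indices are 0 through 6560 (table size 6561, as stated above) and uses plain Python lists so it runs without PyTorch; on tensors, `torch.clamp` would be the natural equivalent. The function and constant names are hypothetical, not taken from the PR.

```python
from typing import Iterable, List

FLOW_VOCAB_SIZE = 6561  # embedding table size quoted in the PR description
SPEECH_EOS_ID = 6562    # speech EOS token id quoted in the PR description


def clamp_token_ids(token_ids: Iterable[int], vocab_size: int = FLOW_VOCAB_SIZE) -> List[int]:
    """Clamp every id into [0, vocab_size - 1] so an embedding lookup cannot go out of bounds."""
    if vocab_size <= 0:
        raise ValueError("vocab_size must be positive")
    upper = vocab_size - 1
    return [min(max(int(token_id), 0), upper) for token_id in token_ids]


# The EOS id is larger than the last valid index, so it is pulled back into range.
safe_ids = clamp_token_ids([12, 6560, SPEECH_EOS_ID])
print(safe_ids)
```

Clamping maps an out-of-range id onto the last valid entry instead of dropping it, which keeps the sequence length unchanged.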
@linyueqian added the `ready` label (label to trigger buildkite CI) on Apr 4, 2026

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 64cbbcc281

Comment thread vllm_omni/entrypoints/utils.py Outdated
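
The stage config fallback described in the summary (matching the model name against registered stage config filenames) might look roughly like the sketch below. The matching rule here, a case-insensitive substring match on the filename stem that prefers the longest stem, is an assumption for illustration; `match_stage_config` and the filenames are hypothetical and not taken from the PR.

```python
from pathlib import Path
from typing import Optional, Sequence


def _normalize(text: str) -> str:
    """Lowercase and keep only letters and digits so separators do not matter."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def match_stage_config(model_name: str, config_files: Sequence[str]) -> Optional[str]:
    """Return the config file whose stem occurs in the model name, or None."""
    # Only the last path component matters, e.g. "org/Model-Name" -> "Model-Name".
    target = _normalize(model_name.rstrip("/").split("/")[-1])
    best: Optional[str] = None
    best_len = 0
    for filename in config_files:
        stem = _normalize(Path(filename).stem)
        # Prefer the longest matching stem to avoid overly generic matches.
        if stem and stem in target and len(stem) > best_len:
            best, best_len = filename, len(stem)
    return best


registered = ["other_model.yaml", "cosyvoice3.yaml"]  # hypothetical filenames
print(match_stage_config("example-org/CosyVoice3-demo", registered))
```

Such a lookup would only run after the `model_type` based resolution has failed, so models with a populated config.json are unaffected.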
@linyueqian force-pushed the fix/cosyvoice3-vllm019-compat branch from 7c3cba3 to aee3884 on April 4, 2026 19:28
- Wrap sr=22050 as torch.tensor in code2wav output so the generation
  model runner doesn't silently drop it (only tensor outputs accepted).
  Fixes "First audio chunk must include sample rate metadata" assertion.

- Skip test_voice_clone_zh_002 (stream=True) because CosyVoice3 does
  not have async_chunk streaming support yet.

Signed-off-by: linyueqian <linyueqian@outlook.com>
Collaborator

@Gaohan123 left a comment

LGTM. Thanks.

@Gaohan123 merged commit d92439c into vllm-project:main on Apr 5, 2026 (8 checks passed)
linyueqian added a commit to indevn/vllm-omni that referenced this pull request Apr 5, 2026
- Rename model_stage: talker -> cosyvoice3_talker (renamed in vllm-project#2486)
- Rename model_stage: code2wav -> cosyvoice3_code2wav
- Add default_sampling_params (max_tokens, stop_token_ids) to both stages
- Unskip streaming e2e test now that async_chunk is supported

Signed-off-by: linyueqian <linyueqian@outlook.com>
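
As a rough illustration of per-stage `default_sampling_params`, the sketch below merges stage defaults with request-level overrides. How vllm-omni actually applies these defaults is not shown in this PR, so the merge rule (request values win unless they are `None`) and the name `resolve_sampling_params` are assumptions; only `max_tokens=2048` and `stop_token_ids=[6562]` come from the PR text.

```python
import copy
from typing import Any, Dict, Optional

# Values quoted in the PR description; the dict layout itself is an assumption.
STAGE_DEFAULTS: Dict[str, Any] = {"max_tokens": 2048, "stop_token_ids": [6562]}


def resolve_sampling_params(
    request_params: Optional[Dict[str, Any]] = None,
    stage_defaults: Dict[str, Any] = STAGE_DEFAULTS,
) -> Dict[str, Any]:
    """Start from the stage defaults and overlay any non-None request values."""
    # Deep copy so callers cannot mutate the shared defaults (e.g. the stop list).
    merged = copy.deepcopy(stage_defaults)
    for key, value in (request_params or {}).items():
        if value is not None:
            merged[key] = value
    return merged


# With no request values, the stage defaults replace the library default of 16.
print(resolve_sampling_params())
# An explicit request value still overrides the stage default.
print(resolve_sampling_params({"max_tokens": 64}))
```

Keeping the defaults in the stage config rather than in client code means every caller gets a usable token budget and stop condition without having to know the model's EOS id.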
linyueqian added a commit to indevn/vllm-omni that referenced this pull request Apr 5, 2026
- Rename model_stage: talker -> cosyvoice3_talker (renamed in vllm-project#2486)
- Rename model_stage: code2wav -> cosyvoice3_code2wav
- Add default_sampling_params (max_tokens, stop_token_ids) to both stages
- Enable streaming e2e test with async_chunk config

Signed-off-by: linyueqian <linyueqian@outlook.com>
linyueqian added a commit to indevn/vllm-omni that referenced this pull request Apr 5, 2026
- Rename model_stage: talker -> cosyvoice3_talker (renamed in vllm-project#2486)
- Rename model_stage: code2wav -> cosyvoice3_code2wav
- Add default_sampling_params (max_tokens, stop_token_ids, repetition_penalty)
  to both stages; repetition_penalty=2.0 is required to enable output_token_ids
  tracking in vLLM, which feeds the RAS sampler's windowed token history
- Fix unit test model_stage references (code2wav -> cosyvoice3_code2wav)
- Enable streaming e2e test with async_chunk config
- Restore advanced_model markers for transcription quality checks

Signed-off-by: linyueqian <linyueqian@outlook.com>
skf-1999 pushed a commit to Semmer2/vllm-omni that referenced this pull request Apr 7, 2026
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Labels

ready label to trigger buildkite CI

2 participants