
[CosyVoice3] Add online serving support, fix stage config, and add CI tests#2431

Merged
hsliuustc0106 merged 11 commits into vllm-project:main from linyueqian:feat/cosyvoice3-online-serving-ci
Apr 4, 2026

@linyueqian
Collaborator

Summary

CosyVoice3 online serving via /v1/audio/speech was broken due to generic stage names colliding with other models and missing model type registration. This PR fixes the serving path end-to-end and adds CI coverage.

Bug Fixes

  • Stage name collision: Namespace talker/code2wav → cosyvoice3_talker/cosyvoice3_code2wav to avoid conflicts with other models
  • cuDNN crash: Set enforce_eager=true for code2wav stage — Conv1d in HiFiGAN has dynamic shapes incompatible with CUDA graphs
  • Wrong sample rate: Add sr=22050 to code2wav multimodal output (was defaulting to 24000)
  • HF config resolution: Auto-inject model_type via hf_overrides for models with empty config.json (like CosyVoice3)
  • Tokenizer resolution: Auto-detect tokenizer in subdirectories (CosyVoice-BlankEN/) for both local and HF-hosted models
  • Mel filters: Auto-download mel_filters.npz from Whisper repo when missing instead of raising error
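
The sample-rate fix can be illustrated with a little arithmetic. This is a minimal sketch, not code from the PR: `playback_duration` is a hypothetical helper, and the only inputs taken from the description are the 22050 Hz and 24000 Hz rates and the 8.71 s clip length from the test plan.

```python
def playback_duration(num_samples: int, declared_sr: int) -> float:
    """Seconds a player needs to play num_samples at the declared sample rate."""
    return num_samples / declared_sr


TRUE_SR = 22050   # rate the PR sets on the code2wav output
WRONG_SR = 24000  # rate the PR says was used by default before the fix

# Sample count for the 8.71 s clip mentioned in the test plan (truncated to an int).
num_samples = int(8.71 * TRUE_SR)

correct = playback_duration(num_samples, TRUE_SR)
mislabeled = playback_duration(num_samples, WRONG_SR)
speedup = WRONG_SR / TRUE_SR  # how much faster (and higher-pitched) playback becomes
```

With the wrong label, the same samples play in roughly 8.0 s instead of 8.71 s, about 8.8% too fast, which is why the output sounded off even though the waveform itself was fine.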

Online Serving

  • Register cosyvoice3_talker in TTS model stages
  • Add model type detection returning "cosyvoice3"
  • Add _validate_cosyvoice3_request() — requires input, ref_audio, and ref_text
  • Add _build_cosyvoice3_prompt() — multimodal prompt with reference audio for voice cloning
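
As a rough illustration of the validation rule above (input, ref_audio, and ref_text are all required), here is a minimal sketch. It is not the PR's `_validate_cosyvoice3_request()`: the function name, the dict-shaped request, and the error wording are assumptions made for the example.

```python
from typing import Optional

# Fields the PR description lists as required for CosyVoice3 voice cloning.
REQUIRED_FIELDS = ("input", "ref_audio", "ref_text")


def validate_cosyvoice3_request(request: dict) -> Optional[str]:
    """Return an error message, or None when every required field is non-empty.

    Hypothetical stand-in: the real validator's signature is not shown in the PR.
    """
    missing = [
        name for name in REQUIRED_FIELDS
        if not str(request.get(name) or "").strip()
    ]
    if missing:
        return "CosyVoice3 requires: " + ", ".join(missing)
    return None
```

Returning a message instead of raising keeps the sketch easy to map onto an HTTP 400 response; whether the real serving code does this is not stated in the PR.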

Config Tuning

  • gpu_memory_utilization: 0.4→0.2 (stage 0), 0.2→0.1 (stage 1) — 0.5B model doesn't need high utilization
  • enforce_eager: true on both stages — avoids 20+ min CUDA graph capture on startup

Tests

  • 9 unit tests for validation, detection, and prompt building (run pre-merge under core_model and cpu)
  • E2E test with official CosyVoice zero-shot reference audio from GitHub
  • CI steps in merge pipeline (core_model, H100) and nightly pipeline (advanced_model)

Test plan

  • Unit tests pass (132/132, excluding pre-existing pytest-asyncio failures)
  • CosyVoice3 component tests pass (29/29)
  • Offline e2e inference verified — 8.71s audio output at 22050 Hz with official reference
  • Config resolution verified for both local paths and HF model IDs
  • Tokenizer auto-detection verified for local and HF paths
  • CI nightly run on H100
  • CI merge run on H100

Related: Supersedes #2121

@linyueqian linyueqian force-pushed the feat/cosyvoice3-online-serving-ci branch from 557f2a1 to 6d2a5e0 on April 1, 2026 20:15
@linyueqian
Collaborator Author

@yenuo26 PTAL

@linyueqian linyueqian added the ready (label to trigger buildkite CI) and nightly-test (label to trigger buildkite nightly test CI) labels on Apr 1, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 557f2a1910

Comment thread vllm_omni/engine/arg_utils.py Outdated
… tests

- Namespace stage names to cosyvoice3_talker/cosyvoice3_code2wav to avoid
  collision with other models using generic talker/code2wav names
- Register CosyVoice3 in the TTS serving layer with validation and prompt
  building for /v1/audio/speech endpoint (voice cloning with ref_audio)
- Fix cuDNN crash in code2wav by setting enforce_eager=true (Conv1d dynamic
  shapes are incompatible with CUDA graphs)
- Add sr=22050 to code2wav multimodal output for correct audio playback
- Tune gpu_memory_utilization (0.2/0.1) for the 0.5B model
- Auto-inject model_type into hf_overrides so models with empty config.json
  (like CosyVoice3) can be loaded directly from HuggingFace
- Register omni model configs in vLLM _CONFIG_REGISTRY for config resolution
- Auto-detect tokenizer in subdirectories for models that don't store it
  at the root (CosyVoice-BlankEN/)
- Auto-download mel_filters.npz asset from Whisper repo when missing
- Add unit tests for CosyVoice3 serving (validation, detection, prompt)
- Add e2e test with official CosyVoice zero-shot reference audio
- Add CI steps in merge (core_model) and nightly (advanced_model) pipelines

Signed-off-by: linyueqian <linyueqian@outlook.com>
@linyueqian linyueqian force-pushed the feat/cosyvoice3-online-serving-ci branch from 6d2a5e0 to 78f2d65 on April 1, 2026 20:24
linyueqian and others added 2 commits April 1, 2026 16:29
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
Comment thread .buildkite/test-merge.yml Outdated
- |
timeout 20m bash -c '
export VLLM_WORKER_MULTIPROC_METHOD=spawn
pytest -s -v tests/e2e/online_serving/test_cosyvoice3_tts.py -m "core_model" --run-level "core_model"

We don't need to run the same test cases in ready, merge, and nightly simultaneously. I think this line should be changed to: pytest -s -v tests/e2e/online_serving/test_cosyvoice3_tts.py -m "advanced_model" --run-level "advanced_model"

Comment thread .buildkite/test-nightly.yml Outdated
pytest -s -v tests/e2e/offline_inference/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
EXIT4=$$?
exit $$((EXIT1 | EXIT2 | EXIT3 | EXIT4))
pytest -s -v tests/e2e/online_serving/test_cosyvoice3_tts.py -m "advanced_model" --run-level "advanced_model"

I think this can be deleted.

@@ -0,0 +1,172 @@
# SPDX-License-Identifier: Apache-2.0

To unify the code style, could we rewrite this test case to follow tests/e2e/online_serving/test_qwen3_tts_base.py? If some validation points cannot be covered that way, we can add them to assert_audio_speech_response in tests/conftest.py.

- Use advanced_model marker in merge pipeline
- Remove duplicate CosyVoice3 step from nightly pipeline
- Rewrite test to follow test_qwen3_tts_base.py style with
  function-based tests and openai_client.send_audio_speech_request;
  basic zh test marked core_model+advanced_model, others advanced_model only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
@linyueqian
Collaborator Author

Thanks for the feedback @yenuo26! I've made the following changes:

  1. test-merge.yml: Changed the marker to advanced_model as suggested.
  2. test-nightly.yml: Removed the CosyVoice3 step since it's already covered by the merge pipeline.
  3. test_cosyvoice3_tts.py: Rewrote to follow the test_qwen3_tts_base.py style, using function-based tests with openai_client.send_audio_speech_request and assert_audio_speech_response via conftest. I kept test_voice_clone_zh_001 marked as both core_model and advanced_model so a basic smoke test still runs in the ready pipeline, while the streaming and English tests are advanced_model only.

Collaborator

@lishunyang12 lishunyang12 left a comment

Left a few comments on the arg_utils changes.

_TOKENIZER_SUBFOLDER_MAP = {
    "CosyVoice3Model": "CosyVoice-BlankEN",
}
subfolder = _TOKENIZER_SUBFOLDER_MAP.get(self.model_arch)

This downloads the full snapshot (all files matching the subfolder prefix), not just tokenizer files. For a model repo with large checkpoint files in nested dirs this could be slow or wasteful.

Suggested change
subfolder = _TOKENIZER_SUBFOLDER_MAP.get(self.model_arch)
local_dir = snapshot_download(
    model_path,
    allow_patterns=[
        f"{subfolder}/tokenizer*",
        f"{subfolder}/special_tokens*",
        f"{subfolder}/vocab*",
        f"{subfolder}/merges*",
        f"{subfolder}/added_tokens*",
    ],
)
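
To make the effect of the narrowed patterns concrete, the sketch below builds the same pattern list and checks which repository paths it would select. It is an illustration only: `tokenizer_allow_patterns` and `would_download` are hypothetical helpers, the file names are made up, and the standard-library `fnmatch` stands in for the hub's own glob matching.

```python
from fnmatch import fnmatch

# File-name prefixes taken from the suggested allow_patterns above.
TOKENIZER_FILE_PREFIXES = ("tokenizer", "special_tokens", "vocab", "merges", "added_tokens")


def tokenizer_allow_patterns(subfolder: str) -> list[str]:
    """Build glob patterns that select only tokenizer files inside subfolder."""
    return [f"{subfolder}/{prefix}*" for prefix in TOKENIZER_FILE_PREFIXES]


def would_download(path: str, patterns: list[str]) -> bool:
    """Approximate the allow-list check: keep a path if any pattern matches it."""
    return any(fnmatch(path, pattern) for pattern in patterns)


patterns = tokenizer_allow_patterns("CosyVoice-BlankEN")
```

With these patterns a path such as `CosyVoice-BlankEN/tokenizer.json` is selected, while a root-level checkpoint or a weights file inside the same subfolder is skipped.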

Comment thread vllm_omni/engine/arg_utils.py Outdated
self.hf_overrides = {}
if isinstance(self.hf_overrides, dict):
self.hf_overrides.setdefault("architectures", [self.model_arch])
# Derive model_type from known arch→model_type mappings.

Nit: defining this dict inside create_model_config means it gets rebuilt every call. Move it to module level.
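
A minimal sketch of what the module-level version could look like, assuming a single arch-to-model_type entry. The mapping contents and the `inject_model_type` helper are assumptions for illustration; only the `setdefault("architectures", ...)` call mirrors the snippet under review.

```python
from typing import Optional

# Built once at import time instead of on every create_model_config call.
# Assumed entry: the PR implies CosyVoice3Model maps to "cosyvoice3".
_ARCH_TO_MODEL_TYPE = {"CosyVoice3Model": "cosyvoice3"}


def inject_model_type(hf_overrides: Optional[dict], model_arch: str) -> dict:
    """Return a copy of hf_overrides with architectures and model_type filled in.

    Values already supplied by the caller are never overwritten.
    """
    overrides = dict(hf_overrides or {})
    overrides.setdefault("architectures", [model_arch])
    model_type = _ARCH_TO_MODEL_TYPE.get(model_arch)
    if model_type is not None:
        overrides.setdefault("model_type", model_type)
    return overrides
```

Using `setdefault` for both keys means an explicit user override still wins, and unknown architectures simply get no model_type injected.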

"Missing CosyVoice3 mel filter asset:\n"
f" {filters_path}\n"
"Auto-download failed. Download it manually from:\n"
f" {source_url}\n"

urlretrieve has no timeout — this can hang indefinitely in CI. Consider urllib.request.urlopen with a timeout + manual write, or just requests.get(timeout=30).
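
One way the urlopen-plus-manual-write option could look, as a sketch rather than the code that was merged. `download_with_timeout` is a hypothetical name, and the temp-file-then-rename step is an extra precaution not requested in the review.

```python
import os
import shutil
import tempfile
import urllib.request


def download_with_timeout(url: str, dest: str, timeout: float = 30.0) -> str:
    """Fetch url into dest, raising instead of hanging if the connection stalls.

    The timeout applies to each blocking socket operation (connect, read),
    not to the download as a whole.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    os.makedirs(dest_dir, exist_ok=True)
    # Write to a temporary file first so a failed download never leaves
    # a truncated asset at the final path.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url, timeout=timeout) as resp:
            shutil.copyfileobj(resp, out)
        os.replace(tmp_path, dest)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest
```

A caller can catch the resulting exception and fall back to the manual-download message shown in the snippet above.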

linyueqian and others added 7 commits April 2, 2026 11:25
…row download, add timeout

- Move _ARCH_TO_MODEL_TYPE and _TOKENIZER_SUBFOLDER_MAP to module level
- Narrow snapshot_download allow_patterns to tokenizer files only
- Replace urlretrieve with urlopen(timeout=30) to prevent CI hangs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
When model_dir is an HF repo ID (e.g. FunAudioLLM/Fun-CosyVoice3-0.5B-2512),
os.path.join with qwen_pretrain_path produces an invalid 3-part repo ID that
AutoTokenizer.from_pretrained rejects. Use snapshot_download to resolve to the
local cache directory first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
The previous fix only covered the processor path. CosyVoice3Model.load_weights
also uses self.model_dir with os.path.join for flow.pt, llm.pt, hift.pt etc.
Resolve the HF repo ID to local cache in __init__ so all downstream code works.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
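
The two commits above share one idea: turn a possible HF repo ID into a local directory before any os.path.join happens. A minimal sketch under that reading follows. `resolve_model_dir` and its injectable `download` parameter are hypothetical; the PR only says that snapshot_download is used for the resolution.

```python
import os
from typing import Callable, Optional


def resolve_model_dir(
    model_dir: str,
    download: Optional[Callable[[str], str]] = None,
) -> str:
    """Return a local directory for model_dir.

    Existing local directories are returned unchanged; anything else is
    treated as a hub repo ID and resolved to its local cache directory.
    """
    if os.path.isdir(model_dir):
        return model_dir
    if download is None:
        # Imported lazily so purely local use does not need the dependency.
        from huggingface_hub import snapshot_download
        download = snapshot_download
    return download(model_dir)
```

After this step, joining the result with a name such as CosyVoice-BlankEN or llm.pt yields a real filesystem path instead of an org/name/subfolder string that looks like an invalid three-part repo ID.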
… generation_config

Models like CosyVoice3 have an empty config.json ({}) without model_type,
which causes AutoConfig.from_pretrained to fail. This commit:

1. Registers omni config classes with vLLM's internal _CONFIG_REGISTRY
   (not just transformers AutoConfig) so HFConfigParser can resolve them
2. Injects model_type into hf_overrides when model_arch is specified
3. Patches try_get_generation_config in _attach_llm_stage to avoid
   crashes for models without generation_config.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
…ine-serving-ci

# Conflicts:
#	vllm_omni/entrypoints/openai/serving_speech.py

Signed-off-by: linyueqian <linyueqian@outlook.com>
The previous patch returned None, but get_diff_sampling_param() calls
.update() on the result, causing AttributeError on NoneType.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
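
The failure described in the commit above comes down to a caller that expects a dict and receives None. The sketch below shows the safer shape, assuming the patched function should return an empty dict when no generation config exists. `safe_generation_config` and the `load` callable are hypothetical names, not the identifiers used in the patch.

```python
from typing import Any, Callable, Optional


def safe_generation_config(
    load: Callable[[], Optional[dict[str, Any]]],
) -> dict[str, Any]:
    """Call load() and always return a dict, even when nothing was found.

    Returning {} rather than None keeps callers that do result.update(...)
    from failing with AttributeError.
    """
    try:
        config = load()
    except (OSError, ValueError):
        # Assumed failure modes for a missing or unreadable generation_config.json.
        config = None
    return dict(config) if config else {}


params = safe_generation_config(lambda: None)
params.update({"temperature": 0.7})  # works because params is a dict, not None
```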
@linyueqian linyueqian enabled auto-merge (squash) April 3, 2026 22:12
@hsliuustc0106 hsliuustc0106 disabled auto-merge April 4, 2026 01:55
@hsliuustc0106 hsliuustc0106 merged commit 4c03158 into vllm-project:main Apr 4, 2026
8 checks passed
linyueqian added a commit to linyueqian/vllm-omni that referenced this pull request Apr 4, 2026
vllm-project#2431 inserted the CosyVoice3 step between Voxtral's commands block
and its agents/plugins block, causing the agents/plugins to attach to
CosyVoice3 instead of Voxtral. This left the Voxtral step without a
queue, causing `buildkite-agent pipeline upload` to reject the entire
pipeline with "No queue specified".

Add back the kubernetes agents/plugins block for the Voxtral step.

Signed-off-by: linyueqian <linyueqian@outlook.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
skf-1999 pushed a commit to Semmer2/vllm-omni that referenced this pull request Apr 7, 2026
… tests (vllm-project#2431)

Signed-off-by: linyueqian <linyueqian@outlook.com>
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
… tests (vllm-project#2431)

Signed-off-by: linyueqian <linyueqian@outlook.com>
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
… tests (vllm-project#2431)

Signed-off-by: linyueqian <linyueqian@outlook.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
… tests (vllm-project#2431)

Signed-off-by: linyueqian <linyueqian@outlook.com>

Labels

nightly-test (label to trigger buildkite nightly test CI), ready (label to trigger buildkite CI)

4 participants