[CosyVoice3] Add online serving support, fix stage config, and add CI tests by linyueqian · Pull Request #2431 · vllm-project/vllm-omni

linyueqian · 2026-04-01T20:13:19Z

Summary

CosyVoice3 online serving via /v1/audio/speech was broken due to generic stage names colliding with other models and missing model type registration. This PR fixes the serving path end-to-end and adds CI coverage.

Bug Fixes

Stage name collision: Namespace talker/code2wav → cosyvoice3_talker/cosyvoice3_code2wav to avoid conflicts with other models
cuDNN crash: Set enforce_eager=true for code2wav stage — Conv1d in HiFiGAN has dynamic shapes incompatible with CUDA graphs
Wrong sample rate: Add sr=22050 to code2wav multimodal output (was defaulting to 24000)
HF config resolution: Auto-inject model_type via hf_overrides for models with empty config.json (like CosyVoice3)
Tokenizer resolution: Auto-detect tokenizer in subdirectories (CosyVoice-BlankEN/) for both local and HF-hosted models
Mel filters: Auto-download mel_filters.npz from Whisper repo when missing instead of raising error

Online Serving

Register cosyvoice3_talker in TTS model stages
Add model type detection returning "cosyvoice3"
Add _validate_cosyvoice3_request() — requires input, ref_audio, and ref_text
Add _build_cosyvoice3_prompt() — multimodal prompt with reference audio for voice cloning
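
A minimal sketch of the kind of check _validate_cosyvoice3_request() performs, assuming the request arrives as a plain mapping and the validator returns an error message or None. Only the three required field names come from the PR description; the function name, return convention, and message text below are illustrative assumptions, not the merged code.

```python
from typing import Mapping, Optional

# Field names taken from the PR description; everything else is hypothetical.
REQUIRED_FIELDS = ("input", "ref_audio", "ref_text")


def validate_cosyvoice3_request(request: Mapping[str, object]) -> Optional[str]:
    """Return an error message if a required field is missing or blank, else None."""
    missing = [
        name
        for name in REQUIRED_FIELDS
        if not str(request.get(name) or "").strip()
    ]
    if missing:
        return "CosyVoice3 voice cloning requires: " + ", ".join(missing)
    return None
```

Returning a message rather than raising keeps the sketch independent of any particular HTTP error type; a real serving layer would map it to a 4xx response.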

Config Tuning

gpu_memory_utilization: 0.4→0.2 (stage 0), 0.2→0.1 (stage 1) — 0.5B model doesn't need high utilization
enforce_eager: true on both stages — avoids 20+ min CUDA graph capture on startup

Tests

9 unit tests for validation, detection, and prompt building (runs in pre-merge core_model and cpu)
E2E test with official CosyVoice zero-shot reference audio from GitHub
CI steps in merge pipeline (core_model, H100) and nightly pipeline (advanced_model)

Test plan

Unit tests pass (132/132, excluding pre-existing pytest-asyncio failures)
CosyVoice3 component tests pass (29/29)
Offline e2e inference verified — 8.71s audio output at 22050 Hz with official reference
Config resolution verified for both local paths and HF model IDs
Tokenizer auto-detection verified for local and HF paths
CI nightly run on H100
CI merge run on H100
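
To see why the sample-rate fix matters, here is a small worked calculation. It assumes the waveform is actually generated at 22050 Hz and that the only error was the rate reported alongside it; the helper name is hypothetical.

```python
import math

TRUE_SR = 22050      # rate the PR sets on the code2wav output
WRONG_SR = 24000     # rate the output previously defaulted to


def playback_seconds(num_samples: int, sample_rate: int) -> float:
    """Duration a player reports when it trusts the given sample rate."""
    return num_samples / sample_rate


# Sample count for the 8.71 s clip mentioned in the test plan.
num_samples = int(8.71 * TRUE_SR)

correct = playback_seconds(num_samples, TRUE_SR)
mislabelled = playback_seconds(num_samples, WRONG_SR)

# Playing 22050 Hz audio as 24000 Hz speeds it up and raises the pitch.
speedup = WRONG_SR / TRUE_SR
semitones = 12 * math.log2(speedup)

print(f"correct: {correct:.2f}s, mislabelled: {mislabelled:.2f}s")
print(f"speedup: {speedup:.3f}x, pitch shift: {semitones:.2f} semitones")
```

Under these assumptions the clip would play back in roughly 8.0 s instead of 8.71 s, about 9% faster and noticeably higher in pitch.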

Related: Supersedes #2121

linyueqian · 2026-04-01T20:16:28Z

@yenuo26 PTAL

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 557f2a1910


… tests

- Namespace stage names to cosyvoice3_talker/cosyvoice3_code2wav to avoid collision with other models using generic talker/code2wav names
- Register CosyVoice3 in the TTS serving layer with validation and prompt building for /v1/audio/speech endpoint (voice cloning with ref_audio)
- Fix cuDNN crash in code2wav by setting enforce_eager=true (Conv1d dynamic shapes are incompatible with CUDA graphs)
- Add sr=22050 to code2wav multimodal output for correct audio playback
- Tune gpu_memory_utilization (0.2/0.1) for the 0.5B model
- Auto-inject model_type into hf_overrides so models with empty config.json (like CosyVoice3) can be loaded directly from HuggingFace
- Register omni model configs in vLLM _CONFIG_REGISTRY for config resolution
- Auto-detect tokenizer in subdirectories for models that don't store it at the root (CosyVoice-BlankEN/)
- Auto-download mel_filters.npz asset from Whisper repo when missing
- Add unit tests for CosyVoice3 serving (validation, detection, prompt)
- Add e2e test with official CosyVoice zero-shot reference audio
- Add CI steps in merge (core_model) and nightly (advanced_model) pipelines

Signed-off-by: linyueqian <linyueqian@outlook.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: linyueqian <linyueqian@outlook.com>

yenuo26 · 2026-04-02T01:42:43Z

+      - |
+        timeout 20m bash -c '
+          export VLLM_WORKER_MULTIPROC_METHOD=spawn
+          pytest -s -v tests/e2e/online_serving/test_cosyvoice3_tts.py -m "core_model" --run-level "core_model"


We don't need to put the same test cases in ready, merge, and nightly simultaneously. I think you may need to modify it here to pytest -s -v tests/e2e/online_serving/test_cosyvoice3_tts.py -m "advanced_model" --run-level "advanced_model"

yenuo26 · 2026-04-02T01:43:07Z

        pytest -s -v tests/e2e/offline_inference/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
        EXIT4=$$?
-        exit $$((EXIT1 | EXIT2 | EXIT3 | EXIT4))
+        pytest -s -v tests/e2e/online_serving/test_cosyvoice3_tts.py -m "advanced_model" --run-level "advanced_model"


I think maybe this can be deleted.

yenuo26 · 2026-04-02T01:47:45Z

@@ -0,0 +1,172 @@
+# SPDX-License-Identifier: Apache-2.0


To unify the code style, maybe we can rewrite this test case to follow tests/e2e/online_serving/test_qwen3_tts_base.py? If there are validation points it cannot cover, we can add them to assert_audio_speech_response in tests/conftest.py.

- Use advanced_model marker in merge pipeline
- Remove duplicate CosyVoice3 step from nightly pipeline
- Rewrite test to follow test_qwen3_tts_base.py style with function-based tests and openai_client.send_audio_speech_request; basic zh test marked core_model+advanced_model, others advanced_model only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>

linyueqian · 2026-04-02T01:55:12Z

Thanks for the feedback @yenuo26! I've made the following changes:

test-merge.yml: Changed the marker to advanced_model as suggested.
test-nightly.yml: Removed the CosyVoice3 step since it's already covered by the merge pipeline.
test_cosyvoice3_tts.py: Rewrote to follow the test_qwen3_tts_base.py style, using function-based tests with openai_client.send_audio_speech_request and assert_audio_speech_response via conftest. I kept test_voice_clone_zh_001 marked as both core_model and advanced_model so a basic smoke test still runs in the ready pipeline, while the streaming and English tests are advanced_model only.

lishunyang12

left a few comments on the arg_utils changes

lishunyang12 · 2026-04-02T15:19:26Z

+                _TOKENIZER_SUBFOLDER_MAP = {
+                    "CosyVoice3Model": "CosyVoice-BlankEN",
+                }
+                subfolder = _TOKENIZER_SUBFOLDER_MAP.get(self.model_arch)


This downloads the full snapshot (all files matching the subfolder prefix), not just tokenizer files. For a model repo with large checkpoint files in nested dirs this could be slow or wasteful.

Suggested change

subfolder = _TOKENIZER_SUBFOLDER_MAP.get(self.model_arch)
local_dir = snapshot_download(
    model_path,
    allow_patterns=[
        f"{subfolder}/tokenizer*",
        f"{subfolder}/special_tokens*",
        f"{subfolder}/vocab*",
        f"{subfolder}/merges*",
        f"{subfolder}/added_tokens*",
    ],
)
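
The effect of the narrowed patterns can be checked offline. The sketch below builds the same pattern list as the suggestion and approximates the Hub's filtering with fnmatch-style matching; the helper names and the example file list are hypothetical, and download_tokenizer_only is only a thin wrapper around huggingface_hub.snapshot_download that is not exercised here.

```python
import os
from fnmatch import fnmatch
from typing import List

# Filename prefixes taken from the suggested allow_patterns.
TOKENIZER_FILE_PREFIXES = ("tokenizer", "special_tokens", "vocab", "merges", "added_tokens")


def tokenizer_allow_patterns(subfolder: str) -> List[str]:
    """Build glob patterns that select only tokenizer files inside `subfolder`."""
    return [f"{subfolder}/{prefix}*" for prefix in TOKENIZER_FILE_PREFIXES]


def would_download(filename: str, patterns: List[str]) -> bool:
    """Approximate the Hub's allow_patterns filter with fnmatch-style matching."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


def download_tokenizer_only(repo_id: str, subfolder: str) -> str:
    """Fetch just the tokenizer files and return the local tokenizer directory."""
    from huggingface_hub import snapshot_download  # imported lazily; needs network

    local_dir = snapshot_download(repo_id, allow_patterns=tokenizer_allow_patterns(subfolder))
    return os.path.join(local_dir, subfolder)


patterns = tokenizer_allow_patterns("CosyVoice-BlankEN")
# Hypothetical repo listing, used only to illustrate which files are selected.
example_files = ["llm.pt", "CosyVoice-BlankEN/tokenizer.json", "CosyVoice-BlankEN/model.safetensors"]
selected = [name for name in example_files if would_download(name, patterns)]
```

With this filter, checkpoint files at the repo root or inside the subfolder are skipped, which is the point of the review comment.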

lishunyang12 · 2026-04-02T15:19:26Z

                self.hf_overrides = {}
            if isinstance(self.hf_overrides, dict):
                self.hf_overrides.setdefault("architectures", [self.model_arch])
+                # Derive model_type from known arch→model_type mappings.


Nit: defining this dict inside create_model_config means it gets rebuilt every call. Move it to module level.
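
A sketch of what the module-level version could look like. The setdefault on "architectures" mirrors the diff context above; the mapping entry (CosyVoice3Model to cosyvoice3) is inferred from the PR description, and the standalone function is a hypothetical stand-in for the logic inside create_model_config.

```python
from typing import Any, Dict, Optional

# Built once at import time instead of on every create_model_config call.
# The single entry is an assumption based on the PR description.
_ARCH_TO_MODEL_TYPE: Dict[str, str] = {
    "CosyVoice3Model": "cosyvoice3",
}


def inject_overrides(hf_overrides: Optional[Dict[str, Any]], model_arch: str) -> Dict[str, Any]:
    """Fill in architectures/model_type without clobbering user-supplied values."""
    overrides = {} if hf_overrides is None else hf_overrides
    if isinstance(overrides, dict):
        overrides.setdefault("architectures", [model_arch])
        model_type = _ARCH_TO_MODEL_TYPE.get(model_arch)
        if model_type is not None:
            # Needed when config.json is empty and carries no model_type.
            overrides.setdefault("model_type", model_type)
    return overrides
```

Using setdefault means an explicit model_type passed by the user still wins over the derived one.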

lishunyang12 · 2026-04-02T15:19:26Z

+                "Missing CosyVoice3 mel filter asset:\n"
+                f"  {filters_path}\n"
+                "Auto-download failed. Download it manually from:\n"
+                f"  {source_url}\n"


urlretrieve has no timeout — this can hang indefinitely in CI. Consider urllib.request.urlopen with a timeout + manual write, or just requests.get(timeout=30).
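
One way to implement the first option (urlopen with a timeout plus a manual write), shown as a standard-library sketch. The function name and the temporary ".part" file are illustrative choices, not taken from the PR. Note that the timeout bounds each blocking socket operation rather than the total download time.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Union


def download_with_timeout(url: str, dest: Union[str, Path], timeout: float = 30.0) -> Path:
    """Download `url` to `dest`, failing instead of hanging on a stalled socket."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so a failed download never leaves
    # a truncated file at the final path.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url, timeout=timeout) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return dest
```

A caller can wrap this in try/except OSError (urllib.error.URLError and socket timeouts are subclasses) and fall back to the manual-download message quoted above.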

…row download, add timeout

- Move _ARCH_TO_MODEL_TYPE and _TOKENIZER_SUBFOLDER_MAP to module level
- Narrow snapshot_download allow_patterns to tokenizer files only
- Replace urlretrieve with urlopen(timeout=30) to prevent CI hangs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>

When model_dir is an HF repo ID (e.g. FunAudioLLM/Fun-CosyVoice3-0.5B-2512), os.path.join with qwen_pretrain_path produces an invalid 3-part repo ID that AutoTokenizer.from_pretrained rejects. Use snapshot_download to resolve to the local cache directory first. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: linyueqian <linyueqian@outlook.com>

The previous fix only covered the processor path. CosyVoice3Model.load_weights also uses self.model_dir with os.path.join for flow.pt, llm.pt, hift.pt etc. Resolve the HF repo ID to local cache in __init__ so all downstream code works. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: linyueqian <linyueqian@outlook.com>
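
The failure mode described in the two commits above can be reproduced without any downloads. In this sketch the subfolder name comes from the PR description, while resolve_model_dir is a hypothetical helper showing the "resolve once, then join" pattern; it imports huggingface_hub lazily and only touches the network when the input is not a local directory.

```python
import os

REPO_ID = "FunAudioLLM/Fun-CosyVoice3-0.5B-2512"
TOKENIZER_SUBFOLDER = "CosyVoice-BlankEN"  # subfolder named in the PR description

# Joining a repo ID with a subfolder yields "org/name/subfolder", which is
# neither an existing local path nor a valid two-part Hub repo ID.
broken = os.path.join(REPO_ID, TOKENIZER_SUBFOLDER)


def resolve_model_dir(model_dir: str) -> str:
    """Return a local directory for `model_dir`, downloading from the Hub if needed."""
    if os.path.isdir(model_dir):
        return model_dir
    from huggingface_hub import snapshot_download  # lazy import; requires network

    return snapshot_download(model_dir)


def tokenizer_dir(model_dir: str) -> str:
    """Join only after the model directory has been resolved to a local path."""
    return os.path.join(resolve_model_dir(model_dir), TOKENIZER_SUBFOLDER)
```

Resolving once in __init__, as the second commit does, means later joins for flow.pt, llm.pt, and hift.pt all operate on a real directory.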

… generation_config

Models like CosyVoice3 have an empty config.json ({}) without model_type, which causes AutoConfig.from_pretrained to fail. This commit:

1. Registers omni config classes with vLLM's internal _CONFIG_REGISTRY (not just transformers AutoConfig) so HFConfigParser can resolve them
2. Injects model_type into hf_overrides when model_arch is specified
3. Patches try_get_generation_config in _attach_llm_stage to avoid crashes for models without generation_config.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>

…ine-serving-ci # Conflicts: # vllm_omni/entrypoints/openai/serving_speech.py Signed-off-by: linyueqian <linyueqian@outlook.com>

The previous patch returned None, but get_diff_sampling_param() calls .update() on the result, causing AttributeError on NoneType. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: linyueqian <linyueqian@outlook.com>
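
The bug fixed by this commit is easy to reproduce in isolation. The functions below are hypothetical stand-ins for the patched try_get_generation_config and for a caller that, like get_diff_sampling_param(), calls .update() on the result; they are not the vLLM implementations.

```python
from typing import Any, Callable, Dict, Optional


def merge_overrides(
    get_generation_config: Callable[[], Optional[Dict[str, Any]]],
    overrides: Dict[str, Any],
) -> Dict[str, Any]:
    """Stand-in caller that updates whatever the config getter returns."""
    config = get_generation_config()
    config.update(overrides)  # raises AttributeError when config is None
    return config


def patched_returning_none() -> None:
    """Shape of the earlier patch: no generation_config.json, so return None."""
    return None


def patched_returning_empty() -> Dict[str, Any]:
    """Shape of the fix: return an empty dict so callers can still call .update()."""
    return {}
```

Returning an empty dict keeps the "no generation config" case on the same code path as the normal one, so callers need no None checks.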

vllm-project#2431 inserted the CosyVoice3 step between Voxtral's commands block and its agents/plugins block, causing the agents/plugins to attach to CosyVoice3 instead of Voxtral. This left the Voxtral step without a queue, causing `buildkite-agent pipeline upload` to reject the entire pipeline with "No queue specified". Add back the kubernetes agents/plugins block for the Voxtral step. Signed-off-by: linyueqian <linyueqian@outlook.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… tests (vllm-project#2431) Signed-off-by: linyueqian <linyueqian@outlook.com>

linyueqian requested a review from hsliuustc0106 as a code owner April 1, 2026 20:13

linyueqian force-pushed the feat/cosyvoice3-online-serving-ci branch from 557f2a1 to 6d2a5e0 Compare April 1, 2026 20:15

linyueqian added the ready (label to trigger buildkite CI) and nightly-test (label to trigger buildkite nightly test CI) labels Apr 1, 2026

chatgpt-codex-connector Bot reviewed Apr 1, 2026

Comment thread vllm_omni/engine/arg_utils.py Outdated

linyueqian force-pushed the feat/cosyvoice3-online-serving-ci branch from 6d2a5e0 to 78f2d65 Compare April 1, 2026 20:24

linyueqian and others added 2 commits April 1, 2026 16:29

Merge branch 'main' into feat/cosyvoice3-online-serving-ci

c26a373

[CI] Add CosyVoice3-TTS E2E test to ready pipeline for PR testing

6435b24

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: linyueqian <linyueqian@outlook.com>

yenuo26 reviewed Apr 2, 2026

lishunyang12 reviewed Apr 2, 2026

linyueqian and others added 7 commits April 2, 2026 11:25

Merge branch 'main' into feat/cosyvoice3-online-serving-ci

71a0b05

Merge remote-tracking branch 'upstream/main' into feat/cosyvoice3-onl…

e98e4cd

…ine-serving-ci # Conflicts: # vllm_omni/entrypoints/openai/serving_speech.py Signed-off-by: linyueqian <linyueqian@outlook.com>

linyueqian enabled auto-merge (squash) April 3, 2026 22:12

hsliuustc0106 disabled auto-merge April 4, 2026 01:55

hsliuustc0106 merged commit 4c03158 into vllm-project:main Apr 4, 2026
8 checks passed

linyueqian mentioned this pull request Apr 4, 2026

[Bugfix] Fix CosyVoice3 online serving via /v1/audio/speech #2121

Closed

7 tasks

linyueqian mentioned this pull request Apr 4, 2026

[CI] Fix missing queue for Voxtral-TTS E2E test step #2484

Merged

1 task

skf-1999 pushed a commit to Semmer2/vllm-omni that referenced this pull request Apr 7, 2026

[CosyVoice3] Add online serving support, fix stage config, and add CI…

cf35a66

… tests (vllm-project#2431) Signed-off-by: linyueqian <linyueqian@outlook.com>

vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026

[CosyVoice3] Add online serving support, fix stage config, and add CI…

a353e02

… tests (vllm-project#2431) Signed-off-by: linyueqian <linyueqian@outlook.com>

Cccei000 mentioned this pull request Apr 16, 2026

[Feature]: Need online serving stream example for cosyvoice3 #2841

Open

1 task

lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026

[CosyVoice3] Add online serving support, fix stage config, and add CI…

55e566d

… tests (vllm-project#2431) Signed-off-by: linyueqian <linyueqian@outlook.com>

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[CosyVoice3] Add online serving support, fix stage config, and add CI…

0670a31

… tests (vllm-project#2431) Signed-off-by: linyueqian <linyueqian@outlook.com>

hsliuustc0106 merged 11 commits into vllm-project:main from linyueqian:feat/cosyvoice3-online-serving-ci

4 participants
