[CosyVoice3] Add online serving support, fix stage config, and add CI tests #2431
Merged
hsliuustc0106 merged 11 commits into vllm-project:main from linyueqian:feat/cosyvoice3-online-serving-ci on Apr 4, 2026
Commits
78f2d65 [CosyVoice3] Add online serving support, fix stage config, and add CI… (linyueqian)
c26a373 Merge branch 'main' into feat/cosyvoice3-online-serving-ci (linyueqian)
6435b24 [CI] Add CosyVoice3-TTS E2E test to ready pipeline for PR testing (linyueqian)
da56ff4 [CI] Address review feedback for CosyVoice3 E2E test (linyueqian)
96a2cb9 [CosyVoice3] Address review feedback: move dicts to module level, nar… (linyueqian)
71a0b05 Merge branch 'main' into feat/cosyvoice3-online-serving-ci (linyueqian)
40476a4 fix: resolve HF repo ID to local cache path in CosyVoice3 processor (linyueqian)
64fb8b5 fix: resolve HF repo ID to local cache path in CosyVoice3Model.__init__ (linyueqian)
9ccc054 fix: register omni model configs with vLLM _CONFIG_REGISTRY and patch… (linyueqian)
e98e4cd Merge remote-tracking branch 'upstream/main' into feat/cosyvoice3-onl… (linyueqian)
307b351 fix: return empty dict from patched try_get_generation_config (linyueqian)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,124 @@ | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
| # SPDX-FileCopyrightText: Copyright contributors to the vLLM project | ||
| """ | ||
| E2E Online tests for CosyVoice3 TTS model with voice cloning. | ||
| | ||
| These tests verify the /v1/audio/speech endpoint works correctly with | ||
| the CosyVoice3 model, which requires reference audio for voice cloning. | ||
| """ | ||
| | ||
| import os | ||
| | ||
| os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn" | ||
| os.environ["VLLM_TEST_CLEAN_GPU_MEMORY"] = "0" | ||
| | ||
| from pathlib import Path | ||
| | ||
| import pytest | ||
| | ||
| from tests.conftest import OmniServerParams | ||
| from tests.utils import hardware_test | ||
| | ||
| MODEL = "FunAudioLLM/Fun-CosyVoice3-0.5B-2512" | ||
| | ||
| # Official CosyVoice zero-shot prompt audio and its transcript | ||
| REF_AUDIO_URL = "https://raw.githubusercontent.com/FunAudioLLM/CosyVoice/main/asset/zero_shot_prompt.wav" | ||
| REF_TEXT = "希望你以后能够做的比我还好呦。" | ||
| | ||
| | ||
| def get_stage_config(name: str = "cosyvoice3.yaml"): | ||
|     """Get the stage config path from vllm_omni model_executor stage_configs.""" | ||
|     return str(Path(__file__).parent.parent.parent.parent / "vllm_omni" / "model_executor" / "stage_configs" / name) | ||
| | ||
| | ||
| def get_prompt(prompt_type="zh"): | ||
|     prompts = { | ||
|         "zh": "收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的感动让我热泪盈眶。", | ||
|         "en": "Hello, this is a voice cloning test with English text.", | ||
|     } | ||
|     return prompts.get(prompt_type, prompts["zh"]) | ||
| | ||
| | ||
| tts_server_params = [ | ||
|     pytest.param( | ||
|         OmniServerParams( | ||
|             model=MODEL, | ||
|             stage_config_path=get_stage_config(), | ||
|             server_args=["--trust-remote-code", "--disable-log-stats"], | ||
|         ), | ||
|         id="cosyvoice3", | ||
|     ) | ||
| ] | ||
| | ||
| | ||
| @pytest.mark.advanced_model | ||
| @pytest.mark.core_model | ||
| @pytest.mark.omni | ||
| @hardware_test(res={"cuda": "H100"}, num_cards=1) | ||
| @pytest.mark.parametrize("omni_server", tts_server_params, indirect=True) | ||
| def test_voice_clone_zh_001(omni_server, openai_client) -> None: | ||
|     """ | ||
|     Test voice cloning TTS with Chinese text via OpenAI API. | ||
|     Deploy Setting: default yaml | ||
|     Input Modal: text + ref_audio + ref_text | ||
|     Output Modal: audio | ||
|     Input Setting: stream=False | ||
|     Datasets: single request | ||
|     """ | ||
|     request_config = { | ||
|         "model": omni_server.model, | ||
|         "input": get_prompt("zh"), | ||
|         "stream": False, | ||
|         "response_format": "wav", | ||
|         "ref_audio": REF_AUDIO_URL, | ||
|         "ref_text": REF_TEXT, | ||
|     } | ||
|     openai_client.send_audio_speech_request(request_config) | ||
| | ||
| | ||
| @pytest.mark.advanced_model | ||
| @pytest.mark.omni | ||
| @hardware_test(res={"cuda": "H100"}, num_cards=1) | ||
| @pytest.mark.parametrize("omni_server", tts_server_params, indirect=True) | ||
| def test_voice_clone_zh_002(omni_server, openai_client) -> None: | ||
|     """ | ||
|     Test voice cloning TTS with Chinese text via OpenAI API. | ||
|     Deploy Setting: default yaml | ||
|     Input Modal: text + ref_audio + ref_text | ||
|     Output Modal: audio | ||
|     Input Setting: stream=True | ||
|     Datasets: single request | ||
|     """ | ||
|     request_config = { | ||
|         "model": omni_server.model, | ||
|         "input": get_prompt("zh"), | ||
|         "stream": True, | ||
|         "response_format": "wav", | ||
|         "ref_audio": REF_AUDIO_URL, | ||
|         "ref_text": REF_TEXT, | ||
|     } | ||
|     openai_client.send_audio_speech_request(request_config) | ||
| | ||
| | ||
| @pytest.mark.advanced_model | ||
| @pytest.mark.omni | ||
| @hardware_test(res={"cuda": "H100"}, num_cards=1) | ||
| @pytest.mark.parametrize("omni_server", tts_server_params, indirect=True) | ||
| def test_voice_clone_en_001(omni_server, openai_client) -> None: | ||
|     """ | ||
|     Test voice cloning TTS with English text via OpenAI API. | ||
|     Deploy Setting: default yaml | ||
|     Input Modal: text + ref_audio + ref_text | ||
|     Output Modal: audio | ||
|     Input Setting: stream=False | ||
|     Datasets: single request | ||
|     """ | ||
|     request_config = { | ||
|         "model": omni_server.model, | ||
|         "input": get_prompt("en"), | ||
|         "stream": False, | ||
|         "response_format": "wav", | ||
|         "ref_audio": REF_AUDIO_URL, | ||
|         "ref_text": REF_TEXT, | ||
|     } | ||
|     openai_client.send_audio_speech_request(request_config) | ||
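The three tests in this file build the same payload and differ only in the prompt language and the `stream` flag. As a rough illustration of how that shared part could live in one place, here is a minimal sketch. The helper `build_speech_request` and the `CASES` table are hypothetical names introduced for this example and are not part of the PR; the constants are copied from the test file.

```python
from typing import Any

# Values copied from the test file in this PR.
REF_AUDIO_URL = "https://raw.githubusercontent.com/FunAudioLLM/CosyVoice/main/asset/zero_shot_prompt.wav"
REF_TEXT = "希望你以后能够做的比我还好呦。"

PROMPTS = {
    "zh": "收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的感动让我热泪盈眶。",
    "en": "Hello, this is a voice cloning test with English text.",
}

# Hypothetical case table: (test id, prompt language, stream flag).
CASES = [
    ("zh_001", "zh", False),
    ("zh_002", "zh", True),
    ("en_001", "en", False),
]


def build_speech_request(model: str, lang: str, stream: bool) -> dict[str, Any]:
    """Hypothetical helper: assemble the payload the three tests build by hand."""
    return {
        "model": model,
        # Unknown languages fall back to the Chinese prompt, as get_prompt does.
        "input": PROMPTS.get(lang, PROMPTS["zh"]),
        "stream": stream,
        "response_format": "wav",
        "ref_audio": REF_AUDIO_URL,
        "ref_text": REF_TEXT,
    }


if __name__ == "__main__":
    for case_id, lang, stream in CASES:
        payload = build_speech_request("FunAudioLLM/Fun-CosyVoice3-0.5B-2512", lang, stream)
        print(case_id, lang, payload["stream"])
```

A table like `CASES` could be passed to `pytest.mark.parametrize`. Note that only `test_voice_clone_zh_001` carries the `core_model` marker, so a parametrized version would need per-case marks (for example through the `marks` argument of `pytest.param`) to keep the same test selection.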
To unify the code style, could this test case be modified to follow tests/e2e/online_serving/test_qwen3_tts_base.py? If there are validation points that it cannot cover, they can be added to assert_audio_speech_response in tests/conftest.py.
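To make "validation points" concrete, the sketch below shows one kind of check that could be applied to a `response_format: "wav"` body. It is only an illustration: `check_wav_bytes` and `make_silence` are hypothetical names, the actual contents and signature of `assert_audio_speech_response` are not shown in this PR excerpt, and the sketch assumes the server returns PCM-encoded WAV that Python's standard `wave` module can parse.

```python
import io
import wave


def check_wav_bytes(data: bytes, min_duration_s: float = 0.1) -> float:
    """Hypothetical check: parse `data` as WAV and return its duration in seconds."""
    # A RIFF/WAVE file starts with b"RIFF", a 4-byte size, then b"WAVE".
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise AssertionError("response is not a RIFF/WAVE container")
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    if rate <= 0:
        raise AssertionError("invalid sample rate in WAV header")
    duration = frames / rate
    if duration < min_duration_s:
        raise AssertionError(f"audio too short: {duration:.3f}s")
    return duration


def make_silence(seconds: float, rate: int = 16000) -> bytes:
    """Build an in-memory mono 16-bit WAV of silence, used only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    print(check_wav_bytes(make_silence(0.5)))
```

For streamed responses the length fields in the WAV header may not reflect the final audio size, so a duration check of this kind is aimed at the `stream=False` case; a streamed response may need a looser check, such as verifying only the container header and a non-empty body.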