
Add voxcpm model support (#2467)

Merged
Gaohan123 merged 51 commits into vllm-project:main from IsleOfDawnlight:pure_voxcpm
Apr 15, 2026

Conversation

Contributor

@IsleOfDawnlight IsleOfDawnlight commented Apr 3, 2026


Purpose

Add support for the voxcpm model, with capabilities for streaming inference and embedding input/output.

Test Plan

Verify VoxCPM voice cloning, high-efficiency synthesis, and batch processing functionality, covering both streaming and non-streaming inference modes.

Test Case Input

- Single text synthesis — TXT: "Meeting you was the most beautiful surprise."
- Voice cloning with single reference — Audio: link
- Batch processing from text file — TXT path: examples\offline_inference\voxcpm\example_texts.txt

Test Result

| Case | Device | Mode | Generate time | Time per sample | Batch size | Warm-up? | Stage0 | Stage1 | TTFP | RTF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Single txt | 910B | Non-streaming | 7.58s | 7.58s | / |  | 7.24s | 0.33s | 7.57s | 1.25 |
| Single clone | 910B | Non-streaming | 8.57s | 8.57s | / |  | 8.18s | 0.38s | 8.56s | 1.22 |
| Batch txt | 910B | Non-streaming | 22.81s | 7.60s | 3 |  | 7.27s | 0.33s | 7.60s | 1.25 |
| Batch clone | 910B | Non-streaming | 25.55s | 8.52s | 3 |  | 8.12s | 0.38s | 8.50s | 1.21 |
| Single txt | 910B | Streaming | 7.12s | 7.12s | / |  | 7.06s | 7.12s | 0.40s | 1.17 |
| Single clone | 910B | Streaming | 8.55s | 8.55s | / |  | 8.49s | 8.54s | 0.52s | 1.27 |
| Batch txt | 910B | Streaming | 23.72s | 7.91s | 3 |  | 7.84s | 7.89s | 0.44s | 1.3 |
| Batch clone | 910B | Streaming | 24.70s | 8.23s | 3 |  | 8.14s | 8.20s | 0.51s | 1.22 |
| Single txt | H20 | Non-streaming | 0.77s | 0.77s | / |  | 0.40s | 0.36s | 0.76s | 0.16 |
| Single clone | H20 | Non-streaming | 1.09s | 1.09s | / |  | 1.04s | 0.03s | 1.08s | 0.17 |
| Batch txt | H20 | Non-streaming | 1.35s | 0.45s | 3 |  | 0.37s | 0.02s | 0.39s | 0.08 |
| Batch clone | H20 | Non-streaming | 2.19s | 0.73s | 3 |  | 0.48s | 0.02s | 0.50s | 0.08 |
| Single txt | H20 | Non-streaming | 9.54s | 9.54s | / |  | 7.92s | 1.60s | 9.53s | 1.92 |
| Single clone | H20 | Non-streaming | 10.85s | 10.85s | / |  | 8.54s | 2.28s | 10.85s | 1.65 |
| Single txt | H20 | Streaming | 0.58s | 0.58s | / |  | 0.57s | 0.58s | 0.08s | 0.12 |
| Single clone | H20 | Streaming | 1.43s | 1.43s | / |  | 1.42s | 1.43s | 0.81s | 0.24 |
| Batch txt | H20 | Streaming | 1.73s | 0.58s | 3 |  | 0.54s | 0.55s | 0.05s | 0.11 |
| Batch clone | H20 | Streaming | 2.44s | 0.81s | 3 |  | 0.69s | 0.70s | 0.08s | 0.12 |
| Single txt | H20 | Streaming | 10.39s | 10.39s | / |  | 9.18s | 10.39s | 10.08s | 2.1 |
| Single clone | H20 | Streaming | 10.51s | 10.51s | / |  | 8.93s | 10.51s | 10.15s | 1.69 |

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts and test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


Celeste-jq and others added 12 commits March 18, 2026 17:38
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
… voxcpm_streaming_0180

Signed-off-by: Celeste-jq <591998922@qq.com>
Switch VoxCPM stage0 to the AR scheduler path, align the async-chunk flow with the common framework pattern, and restore scheduler/test_utils changes to match upstream where needed.

Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: IsleOfDawnlight <stellamou@qq.com>
Signed-off-by: IsleOfDawnlight <stellamou@qq.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5f8b8524b5


Comment thread vllm_omni/engine/arg_utils.py Outdated
Comment on lines +35 to +37
try:
    AutoConfig.register("qwen3_tts", Qwen3TTSConfig)
    AutoConfig.register("cosyvoice3", CosyVoice3Config)


P2 Badge Register each HF config independently

These two registrations are wrapped in a single try, so if qwen3_tts is already registered and raises ValueError, cosyvoice3 is never attempted. In environments where one config is pre-registered by another plugin/import path, this leaves the other config missing and later model/config resolution fails unexpectedly. Register each config in its own guarded block (as already done for voxtral_tts/voxcpm) to avoid this partial-registration regression.


Comment on lines 245 to 248
except (asyncio.CancelledError, GeneratorExit):
    if input_stream_task is not None and not input_stream_task.done():
        input_stream_task.cancel()
    await self.abort(request_id)
    logger.info(f"[AsyncOmni] Request {request_id} aborted.")
    raise

P1 Badge Abort request on non-cancellation generate errors

generate() now only aborts on cancellation, but _process_orchestrator_results() can raise regular exceptions (for example when it receives an error message). In that case this method exits without calling abort() or cleanup, so the request can remain active in engine/orchestrator state and self.request_states, causing leaked state and stuck/follow-on request behavior. A generic exception path should still abort and clean up the request before re-raising.
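A minimal sketch of the suggested fix, assuming a simplified engine with an `abort()` coroutine (the real AsyncOmni internals are not reproduced here): both cancellation and ordinary exceptions route through `abort()` before re-raising, so no request state leaks:

```python
import asyncio

class Engine:
    """Hypothetical stand-in for the AsyncOmni engine."""

    def __init__(self):
        self.aborted = []

    async def abort(self, request_id):
        # Stand-in for releasing engine/orchestrator state.
        self.aborted.append(request_id)

    async def generate(self, request_id):
        try:
            yield "chunk-0"
            # Simulate _process_orchestrator_results raising a normal error.
            raise RuntimeError("orchestrator error")
        except (asyncio.CancelledError, GeneratorExit):
            await self.abort(request_id)
            raise
        except Exception:
            # Generic failures must also abort and clean up; otherwise the
            # request stays active and blocks follow-on requests.
            await self.abort(request_id)
            raise

async def main():
    engine = Engine()
    try:
        async for _ in engine.generate("req-1"):
            pass
    except RuntimeError:
        pass
    assert engine.aborted == ["req-1"]

asyncio.run(main())
```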


Comment on lines +929 to +931
latent_audio_feat = self._extract_val(info, "latent_audio_feat", None)
print(f"---latent_audio_feat---:{latent_audio_feat.shape}")
audio_tensor = self._pipeline.decode(

P2 Badge Guard VAE path when latent chunk is missing

This path unconditionally accesses latent_audio_feat.shape, but async-chunk terminal payloads may intentionally omit latent_audio_feat (finish-only metadata). In a batched VAE decode step where one request has latent data and another is finish-only, this raises AttributeError and fails the whole batch instead of cleanly skipping/finishing that item.
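The guard could look like the following sketch. The decode pipeline is replaced by a plain callable, and the dict-based `info` payload is an assumption standing in for the real chunk structure:

```python
def decode_batch(infos, decode):
    """Decode a batch, skipping finish-only items that omit the latent.

    `infos` and `decode` are hypothetical stand-ins for the per-request
    payloads and the VAE decode call in the PR.
    """
    outputs = []
    for info in infos:
        latent = info.get("latent_audio_feat")
        if latent is None:
            # Finish-only metadata: nothing to decode, finish cleanly
            # instead of failing the whole batch with AttributeError.
            outputs.append(None)
            continue
        outputs.append(decode(latent))
    return outputs

results = decode_batch(
    [{"latent_audio_feat": [0.1, 0.2]}, {"finished": True}],
    decode=lambda latent: len(latent),
)
assert results == [2, None]
```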


…e_voxcpm_isle_first

Signed-off-by: Celeste-jq <591998922@qq.com>
IsleOfDawnlight and others added 5 commits April 3, 2026 14:47
Signed-off-by: IsleOfDawnlight <stellamou@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
(cherry picked from commit cff0398)

Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
@linyueqian
Collaborator

Fix the DCO and pre-commit checks and resolve conflicts, please.

@linyueqian linyueqian self-requested a review April 4, 2026 03:15
@hsliuustc0106
Collaborator

resolve conflicts @IsleOfDawnlight

Signed-off-by: Celeste-jq <591998922@qq.com>

# Conflicts:
#	vllm_omni/distributed/omni_connectors/transfer_adapter/chunk_transfer_adapter.py
#	vllm_omni/engine/arg_utils.py
#	vllm_omni/entrypoints/utils.py
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
@linyueqian
Collaborator

fix pre-commit pls.

Collaborator

@linyueqian linyueqian left a comment


Thanks for adding VoxCPM support! The two-stage latent+VAE architecture and async_chunk integration look solid. Left a few comments, mostly around import hygiene and file size.

Comment thread run.sh Outdated
@@ -0,0 +1,24 @@
# Point Python at VoxCPM's ``src`` (parent of ``voxcpm/model`` and ``voxcpm/modules``) if not next to this repo.
export VLLM_OMNI_VOXCPM_CODE_PATH=/home/l00613087/voxcpm/VoxCPM/src
export ASCEND_RT_VISIBLE_DEVICES=1
Collaborator


[blocker] This file has hardcoded user paths (/home/l00613087/...) and a device-specific env var. Should be gitignored or removed from the PR.

Contributor Author


I've updated it. Thank you for your suggestions.

Comment thread vllm_omni/engine/arg_utils.py Outdated
from vllm_omni.engine.output_modality import OutputModality
from vllm_omni.model_executor.models.voxcpm.configuration_voxcpm import VoxCPMConfig
from vllm_omni.model_executor.models.voxcpm.native_config import (
detect_native_voxcpm_model_type,
Collaborator


[high] These top-level imports mean every vllm-omni startup pays for VoxCPM even when it's not used. Other models register lazily. Can you move these inside _maybe_prepare_model_hf_config_path() and _register_omni_hf_configs()?
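The lazy-import style being requested can be illustrated with a stand-in: the import statement moves inside the function that needs it, so merely importing the module at startup no longer triggers it (`json` here stands in for the heavy VoxCPM imports; the function name follows the review comment but the body is hypothetical):

```python
def resolve_model_config_path(model: str):
    """Sketch of a lazy import: the VoxCPM-specific helper is imported
    only when this function actually runs, so startup for unrelated
    models skips the import entirely."""
    # Stand-in for:
    # from vllm_omni.model_executor.models.voxcpm.native_config import (
    #     detect_native_voxcpm_model_type,
    # )
    import json
    return json.dumps({"model": model})

assert resolve_model_config_path("voxcpm") == '{"model": "voxcpm"}'
```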

Contributor Author


I have removed the improper import statements.

Comment thread vllm_omni/entrypoints/utils.py Outdated

from vllm_omni.config.yaml_util import create_config, load_yaml_config, merge_configs
from vllm_omni.entrypoints.stage_utils import _to_dict
from vllm_omni.model_executor.models.voxcpm.native_config import detect_native_voxcpm_model_type
Collaborator


[high] Same as arg_utils. This import should be lazy, inside resolve_model_config_path where it's actually used.

Contributor Author


I have removed the improper import statements, thanks!

repo_root = Path(__file__).resolve().parents[4]
candidates.append(repo_root.parent / "VoxCPM" / "src")

for candidate in candidates:
Collaborator


[high] 1116 lines is quite large. Could you split the native model loading helpers, the stage wrappers (_DirectVoxCPMLatentGenerator / _DirectVoxCPMAudioVAE), and the main class into separate files?

Contributor Author


Good idea, I have split it into separate files.



def _import_voxcpm_audio_vae_classes():
env_path = os.environ.get("VLLM_OMNI_VOXCPM_CODE_PATH")
Collaborator


[medium] _import_voxcpm_audio_vae_classes below is nearly identical to this function. Worth extracting the shared sys.path discovery into one helper.

Contributor Author


OK, I have extracted it.

pass
if isinstance(val, (list, tuple)) and len(val) == 1:
    return _connector_finished_truthy(val[0])
return bool(val)
Collaborator


[medium] The recursive unwrap for single-element lists could loop on pathological input. Maybe just do an iterative unwrap with a small depth cap?


try:
    config_dict = json.loads(config_path.read_text())
except Exception:
Collaborator


[medium] Bare except Exception here swallows permission errors, disk errors, etc. Could narrow to (json.JSONDecodeError, OSError).
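The narrowed handler could be sketched as follows; falling back to an empty dict is an assumption here, the PR's actual fallback may differ:

```python
import json
from pathlib import Path

def load_config(config_path: Path):
    """Only JSON syntax errors and I/O errors (missing file, permission
    denied, disk errors) take the fallback path; unrelated bugs still
    surface instead of being swallowed by a bare ``except Exception``."""
    try:
        return json.loads(config_path.read_text())
    except (json.JSONDecodeError, OSError):
        return {}

assert load_config(Path("/nonexistent/config.json")) == {}
```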

min_len: int = 2,
max_len: int = 2000,
inference_timesteps: int = 10,
cfg_value: float = 2.0,
Collaborator


[medium] If symlink fails this falls back to shutil.copytree on potentially multi-GB model dirs without any logging. A warning would help users understand why /tmp is filling up.
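A sketch of the suggested fallback logging; the function name and call site are hypothetical:

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)

def link_or_copy_model(src: Path, dst: Path):
    """Prefer a symlink; if that fails, warn before copying a
    potentially multi-GB model directory, so users can see why
    /tmp is filling up."""
    try:
        dst.symlink_to(src, target_is_directory=True)
    except OSError as exc:
        logger.warning(
            "symlink %s -> %s failed (%s); falling back to a full copy",
            dst, src, exc,
        )
        shutil.copytree(src, dst)
```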

if not request_summaries:
    print("未解析到 stage 耗时摘要。")  # "No stage timing summary was parsed."
    return
print("每个 request 的 stage 耗时:")  # "Per-request stage timings:"
Collaborator


[nit] A few Chinese strings in the test output (未解析到, 汇总:, 失败用例:). Should be English for consistency with the rest of the repo.

@@ -0,0 +1,768 @@
"""Offline VoxCPM inference example for vLLM Omni.

Collaborator


[nit] os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn" at line 27 runs on import. Move it inside the if __name__ == "__main__" block?
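The suggested restructuring is straightforward; this sketch assumes the example's setup can live in a `main()` function (the engine-building steps are elided):

```python
import os

def main():
    # Side effects like forcing the multiprocessing method run only
    # when the example is executed directly, not on import.
    os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
    # ... build the engine and run inference ...

if __name__ == "__main__":
    main()
```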

@hsliuustc0106
Collaborator

Thanks for your contribution. Please add UTs to protect the key APIs, and an e2e test for the model, referring to the guidance: https://docs.vllm.ai/projects/vllm-omni/en/latest/contributing/ci/CI_5levels/

I will add the ready label after the UTs are added.

@@ -0,0 +1,68 @@
# VoxCPM two-stage (latent → VAE) without async_chunk: one-shot latent then decode.
stage_args:
Collaborator


@linyueqian maybe this model works better with one single stage

Collaborator


The current implementation of VoxCPM 2 is single-stage; it is worthwhile to try that in a follow-up PR.

@linyueqian
Collaborator

Fix pre-commit and DCO, please.

@Celeste-jq force-pushed the pure_voxcpm branch 2 times, most recently from a9aaaca to ba96e46, April 14, 2026 08:23
Signed-off-by: Celeste-jq <591998922@qq.com>

# Conflicts:
#	vllm_omni/entrypoints/openai/serving_speech.py
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>

# Conflicts:
#	tests/engine/test_arg_utils.py
#	vllm_omni/entrypoints/openai/serving_speech.py
@Celeste-jq force-pushed the pure_voxcpm branch 2 times, most recently from e50165f to 98a45fd, April 14, 2026 09:35
@Celeste-jq
Contributor

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
@linyueqian added the "ready" label (trigger buildkite CI) Apr 14, 2026
@linyueqian
Collaborator

fix ci

Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
@Celeste-jq
Contributor

@linyueqian @hsliuustc0106 CI passed, ptal, thank you.

Collaborator

@Gaohan123 Gaohan123 left a comment


LGTM. Thanks!

@Gaohan123 Gaohan123 merged commit 4bf4c63 into vllm-project:main Apr 15, 2026
8 checks passed
y123456y78 pushed a commit to y123456y78/vllm-omni that referenced this pull request Apr 15, 2026
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: IsleOfDawnlight <stellamou@qq.com>
Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Co-authored-by: Celeste-jq <591998922@qq.com>
Co-authored-by: lyj-jjj <liuyingjun5@huawei.com>
Co-authored-by: Yueqian Lin <linyueqian@outlook.com>
lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: IsleOfDawnlight <stellamou@qq.com>
Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Co-authored-by: Celeste-jq <591998922@qq.com>
Co-authored-by: lyj-jjj <liuyingjun5@huawei.com>
Co-authored-by: Yueqian Lin <linyueqian@outlook.com>

Labels

ready label to trigger buildkite CI


9 participants