
Add voxcpm model support (#2467)

Merged
Gaohan123 merged 51 commits into vllm-project:main from IsleOfDawnlight:pure_voxcpm
Apr 15, 2026

Conversation

Contributor

@IsleOfDawnlight IsleOfDawnlight commented Apr 3, 2026


Purpose

Add support for the voxcpm model, with capabilities for streaming inference and embedding input/output.

Test Plan

Verify VoxCPM voice cloning, high-efficiency synthesis, and batch processing functionality, covering both streaming and non-streaming inference modes.

Test Case Input

- Single text synthesis — TXT: "Meeting you was the most beautiful surprise."
- Voice cloning with single reference — Audio: link
- Batch processing from text file — TXT path: examples\offline_inference\voxcpm\example_texts.txt

Test Result

| Case | Device | Mode | Generate time | Time per sample | Batch size | Warm-up? | Stage0 | Stage1 | TTFP | RTF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Single txt | 910B | Non-streaming | 7.58s | 7.58s | / |  | 7.24s | 0.33s | 7.57s | 1.25 |
| Single clone | 910B | Non-streaming | 8.57s | 8.57s | / |  | 8.18s | 0.38s | 8.56s | 1.22 |
| Batch txt | 910B | Non-streaming | 22.81s | 7.60s | 3 |  | 7.27s | 0.33s | 7.60s | 1.25 |
| Batch clone | 910B | Non-streaming | 25.55s | 8.52s | 3 |  | 8.12s | 0.38s | 8.50s | 1.21 |
| Single txt | 910B | Streaming | 7.12s | 7.12s | / |  | 7.06s | 7.12s | 0.40s | 1.17 |
| Single clone | 910B | Streaming | 8.55s | 8.55s | / |  | 8.49s | 8.54s | 0.52s | 1.27 |
| Batch txt | 910B | Streaming | 23.72s | 7.91s | 3 |  | 7.84s | 7.89s | 0.44s | 1.3 |
| Batch clone | 910B | Streaming | 24.70s | 8.23s | 3 |  | 8.14s | 8.20s | 0.51s | 1.22 |
| Single txt | H20 | Non-streaming | 0.77s | 0.77s | / |  | 0.40s | 0.36s | 0.76s | 0.16 |
| Single clone | H20 | Non-streaming | 1.09s | 1.09s | / |  | 1.04s | 0.03s | 1.08s | 0.17 |
| Batch txt | H20 | Non-streaming | 1.35s | 0.45s | 3 |  | 0.37s | 0.02s | 0.39s | 0.08 |
| Batch clone | H20 | Non-streaming | 2.19s | 0.73s | 3 |  | 0.48s | 0.02s | 0.50s | 0.08 |
| Single txt | H20 | Non-streaming | 9.54s | 9.54s | / |  | 7.92s | 1.60s | 9.53s | 1.92 |
| Single clone | H20 | Non-streaming | 10.85s | 10.85s | / |  | 8.54s | 2.28s | 10.85s | 1.65 |
| Single txt | H20 | Streaming | 0.58s | 0.58s | / |  | 0.57s | 0.58s | 0.08s | 0.12 |
| Single clone | H20 | Streaming | 1.43s | 1.43s | / |  | 1.42s | 1.43s | 0.81s | 0.24 |
| Batch txt | H20 | Streaming | 1.73s | 0.58s | 3 |  | 0.54s | 0.55s | 0.05s | 0.11 |
| Batch clone | H20 | Streaming | 2.44s | 0.81s | 3 |  | 0.69s | 0.70s | 0.08s | 0.12 |
| Single txt | H20 | Streaming | 10.39s | 10.39s | / |  | 9.18s | 10.39s | 10.08s | 2.1 |
| Single clone | H20 | Streaming | 10.51s | 10.51s | / |  | 8.93s | 10.51s | 10.15s | 1.69 |

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts and test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


Celeste-jq and others added 12 commits March 18, 2026 17:38
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
… voxcpm_streaming_0180

Signed-off-by: Celeste-jq <591998922@qq.com>
Switch VoxCPM stage0 to the AR scheduler path, align the async-chunk flow with the common framework pattern, and restore scheduler/test_utils changes to match upstream where needed.

Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: IsleOfDawnlight <stellamou@qq.com>
Signed-off-by: IsleOfDawnlight <stellamou@qq.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5f8b8524b5


Comment thread vllm_omni/engine/arg_utils.py Outdated
Comment on lines +35 to +37
try:
    AutoConfig.register("qwen3_tts", Qwen3TTSConfig)
    AutoConfig.register("cosyvoice3", CosyVoice3Config)


P2 Badge Register each HF config independently

These two registrations are wrapped in a single try, so if qwen3_tts is already registered and raises ValueError, cosyvoice3 is never attempted. In environments where one config is pre-registered by another plugin/import path, this leaves the other config missing and later model/config resolution fails unexpectedly. Register each config in its own guarded block (as already done for voxtral_tts/voxcpm) to avoid this partial-registration regression.


Comment on lines 245 to 248
except (asyncio.CancelledError, GeneratorExit):
    if input_stream_task is not None and not input_stream_task.done():
        input_stream_task.cancel()
    await self.abort(request_id)
    logger.info(f"[AsyncOmni] Request {request_id} aborted.")
    raise

P1 Badge Abort request on non-cancellation generate errors

generate() now only aborts on cancellation, but _process_orchestrator_results() can raise regular exceptions (for example when it receives an error message). In that case this method exits without calling abort() or cleanup, so the request can remain active in engine/orchestrator state and self.request_states, causing leaked state and stuck/follow-on request behavior. A generic exception path should still abort and clean up the request before re-raising.
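A minimal sketch of the suggested fix, assuming a simplified engine with an `abort()` coroutine (the real AsyncOmni internals are not reproduced here): both cancellation and ordinary exceptions route through `abort()` before re-raising, so no request state leaks:

```python
import asyncio

class Engine:
    """Hypothetical stand-in for the AsyncOmni engine."""

    def __init__(self):
        self.aborted = []

    async def abort(self, request_id):
        # Stand-in for releasing engine/orchestrator state.
        self.aborted.append(request_id)

    async def generate(self, request_id):
        try:
            yield "chunk-0"
            # Simulate _process_orchestrator_results raising a normal error.
            raise RuntimeError("orchestrator error")
        except (asyncio.CancelledError, GeneratorExit):
            await self.abort(request_id)
            raise
        except Exception:
            # Generic failures must also abort and clean up; otherwise the
            # request stays active and blocks follow-on requests.
            await self.abort(request_id)
            raise

async def main():
    engine = Engine()
    try:
        async for _ in engine.generate("req-1"):
            pass
    except RuntimeError:
        pass
    assert engine.aborted == ["req-1"]

asyncio.run(main())
```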


Comment on lines +929 to +931
latent_audio_feat = self._extract_val(info, "latent_audio_feat", None)
print(f"---latent_audio_feat---:{latent_audio_feat.shape}")
audio_tensor = self._pipeline.decode(

P2 Badge Guard VAE path when latent chunk is missing

This path unconditionally accesses latent_audio_feat.shape, but async-chunk terminal payloads may intentionally omit latent_audio_feat (finish-only metadata). In a batched VAE decode step where one request has latent data and another is finish-only, this raises AttributeError and fails the whole batch instead of cleanly skipping/finishing that item.
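The guard could look like the following sketch. The decode pipeline is replaced by a plain callable, and the dict-based `info` payload is an assumption standing in for the real chunk structure:

```python
def decode_batch(infos, decode):
    """Decode a batch, skipping finish-only items that omit the latent.

    `infos` and `decode` are hypothetical stand-ins for the per-request
    payloads and the VAE decode call in the PR.
    """
    outputs = []
    for info in infos:
        latent = info.get("latent_audio_feat")
        if latent is None:
            # Finish-only metadata: nothing to decode, finish cleanly
            # instead of failing the whole batch with AttributeError.
            outputs.append(None)
            continue
        outputs.append(decode(latent))
    return outputs

results = decode_batch(
    [{"latent_audio_feat": [0.1, 0.2]}, {"finished": True}],
    decode=lambda latent: len(latent),
)
assert results == [2, None]
```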


…e_voxcpm_isle_first

Signed-off-by: Celeste-jq <591998922@qq.com>
IsleOfDawnlight and others added 5 commits April 3, 2026 14:47
Signed-off-by: IsleOfDawnlight <stellamou@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
(cherry picked from commit cff0398)

Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
@linyueqian
Collaborator

Fix the DCO and pre-commit checks and resolve conflicts, please.

@linyueqian linyueqian self-requested a review April 4, 2026 03:15
@hsliuustc0106
Collaborator

resolve conflicts @IsleOfDawnlight

Signed-off-by: Celeste-jq <591998922@qq.com>

# Conflicts:
#	vllm_omni/distributed/omni_connectors/transfer_adapter/chunk_transfer_adapter.py
#	vllm_omni/engine/arg_utils.py
#	vllm_omni/entrypoints/utils.py
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
@linyueqian
Collaborator

fix pre-commit pls.

Collaborator

@linyueqian linyueqian left a comment


Thanks for adding VoxCPM support! The two-stage latent+VAE architecture and async_chunk integration look solid. Left a few comments, mostly around import hygiene and file size.

Comment thread run.sh Outdated
@@ -0,0 +1,24 @@
# Point Python at VoxCPM's ``src`` (parent of ``voxcpm/model`` and ``voxcpm/modules``) if not next to this repo.
export VLLM_OMNI_VOXCPM_CODE_PATH=/home/l00613087/voxcpm/VoxCPM/src
export ASCEND_RT_VISIBLE_DEVICES=1
Collaborator


[blocker] This file has hardcoded user paths (/home/l00613087/...) and a device-specific env var. Should be gitignored or removed from the PR.

Contributor Author


I've updated it. Thank you for your suggestions.

Comment thread vllm_omni/engine/arg_utils.py Outdated
from vllm_omni.engine.output_modality import OutputModality
from vllm_omni.model_executor.models.voxcpm.configuration_voxcpm import VoxCPMConfig
from vllm_omni.model_executor.models.voxcpm.native_config import (
detect_native_voxcpm_model_type,
Collaborator


[high] These top-level imports mean every vllm-omni startup pays for VoxCPM even when it's not used. Other models register lazily. Can you move these inside _maybe_prepare_model_hf_config_path() and _register_omni_hf_configs()?
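The lazy-import style being requested can be illustrated with a stand-in: the import statement moves inside the function that needs it, so merely importing the module at startup no longer triggers it (`json` here stands in for the heavy VoxCPM imports; the function name follows the review comment but the body is hypothetical):

```python
def resolve_model_config_path(model: str):
    """Sketch of a lazy import: the VoxCPM-specific helper is imported
    only when this function actually runs, so startup for unrelated
    models skips the import entirely."""
    # Stand-in for:
    # from vllm_omni.model_executor.models.voxcpm.native_config import (
    #     detect_native_voxcpm_model_type,
    # )
    import json
    return json.dumps({"model": model})

assert resolve_model_config_path("voxcpm") == '{"model": "voxcpm"}'
```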

Contributor Author


I have removed the improper import statements.

Comment thread vllm_omni/entrypoints/utils.py Outdated

from vllm_omni.config.yaml_util import create_config, load_yaml_config, merge_configs
from vllm_omni.entrypoints.stage_utils import _to_dict
from vllm_omni.model_executor.models.voxcpm.native_config import detect_native_voxcpm_model_type
Collaborator


[high] Same as arg_utils. This import should be lazy, inside resolve_model_config_path where it's actually used.

Contributor Author


I have removed the improper import statements, thanks!

repo_root = Path(__file__).resolve().parents[4]
candidates.append(repo_root.parent / "VoxCPM" / "src")

for candidate in candidates:
Collaborator


[high] 1116 lines is quite large. Could you split the native model loading helpers, the stage wrappers (_DirectVoxCPMLatentGenerator / _DirectVoxCPMAudioVAE), and the main class into separate files?

Contributor Author


Good idea, I have split it into separate files.



def _import_voxcpm_audio_vae_classes():
env_path = os.environ.get("VLLM_OMNI_VOXCPM_CODE_PATH")
Collaborator


[medium] _import_voxcpm_audio_vae_classes below is nearly identical to this function. Worth extracting the shared sys.path discovery into one helper.

Contributor Author


OK, I have extracted it.

pass
if isinstance(val, (list, tuple)) and len(val) == 1:
    return _connector_finished_truthy(val[0])
return bool(val)
Collaborator


[medium] The recursive unwrap for single-element lists could loop on pathological input. Maybe just do an iterative unwrap with a small depth cap?


try:
    config_dict = json.loads(config_path.read_text())
except Exception:
Collaborator


[medium] Bare except Exception here swallows permission errors, disk errors, etc. Could narrow to (json.JSONDecodeError, OSError).
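The narrowed handler could be sketched as follows; falling back to an empty dict is an assumption here, the PR's actual fallback may differ:

```python
import json
from pathlib import Path

def load_config(config_path: Path):
    """Only JSON syntax errors and I/O errors (missing file, permission
    denied, disk errors) take the fallback path; unrelated bugs still
    surface instead of being swallowed by a bare ``except Exception``."""
    try:
        return json.loads(config_path.read_text())
    except (json.JSONDecodeError, OSError):
        return {}

assert load_config(Path("/nonexistent/config.json")) == {}
```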

min_len: int = 2,
max_len: int = 2000,
inference_timesteps: int = 10,
cfg_value: float = 2.0,
Collaborator


[medium] If symlink fails this falls back to shutil.copytree on potentially multi-GB model dirs without any logging. A warning would help users understand why /tmp is filling up.
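A sketch of the suggested fallback logging; the function name and call site are hypothetical:

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)

def link_or_copy_model(src: Path, dst: Path):
    """Prefer a symlink; if that fails, warn before copying a
    potentially multi-GB model directory, so users can see why
    /tmp is filling up."""
    try:
        dst.symlink_to(src, target_is_directory=True)
    except OSError as exc:
        logger.warning(
            "symlink %s -> %s failed (%s); falling back to a full copy",
            dst, src, exc,
        )
        shutil.copytree(src, dst)
```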

if not request_summaries:
    print("未解析到 stage 耗时摘要。")  # "No stage timing summary was parsed."
    return
print("每个 request 的 stage 耗时:")  # "Per-request stage timings:"
Collaborator


[nit] A few Chinese strings in the test output (未解析到, 汇总:, 失败用例:). Should be English for consistency with the rest of the repo.

@@ -0,0 +1,768 @@
"""Offline VoxCPM inference example for vLLM Omni.

Collaborator


[nit] os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn" at line 27 runs on import. Move it inside the if __name__ == "__main__" block?
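The suggested restructuring is straightforward; this sketch assumes the example's setup can live in a `main()` function (the engine-building steps are elided):

```python
import os

def main():
    # Side effects like forcing the multiprocessing method run only
    # when the example is executed directly, not on import.
    os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
    # ... build the engine and run inference ...

if __name__ == "__main__":
    main()
```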

@hsliuustc0106
Collaborator

Thanks for your contribution. Please add UTs to protect the key APIs, and an e2e test for the model, referring to the guidance: https://docs.vllm.ai/projects/vllm-omni/en/latest/contributing/ci/CI_5levels/

I will add the ready label after the UTs are added.

@@ -0,0 +1,68 @@
# VoxCPM two-stage (latent → VAE) without async_chunk: one-shot latent then decode.
stage_args:
Collaborator


@linyueqian maybe this model works better with one single stage

Collaborator


The current implementation of VoxCPM 2 is single-stage; it is worthwhile to try that in a follow-up PR.

@linyueqian
Collaborator

Fix pre-commit and DCO, please.

@Celeste-jq force-pushed the pure_voxcpm branch 2 times, most recently from a9aaaca to ba96e46, April 14, 2026 08:23
Signed-off-by: Celeste-jq <591998922@qq.com>

# Conflicts:
#	vllm_omni/entrypoints/openai/serving_speech.py
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>

# Conflicts:
#	tests/engine/test_arg_utils.py
#	vllm_omni/entrypoints/openai/serving_speech.py
@Celeste-jq force-pushed the pure_voxcpm branch 2 times, most recently from e50165f to 98a45fd, April 14, 2026 09:35
@Celeste-jq
Contributor

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
@linyueqian added the "ready" label (trigger buildkite CI) Apr 14, 2026
@linyueqian
Collaborator

fix ci

Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
@Celeste-jq
Contributor

@linyueqian @hsliuustc0106 CI passed, ptal, thank you.

Collaborator

@Gaohan123 Gaohan123 left a comment


LGTM. Thanks!

@Gaohan123 Gaohan123 merged commit 4bf4c63 into vllm-project:main Apr 15, 2026
8 checks passed
y123456y78 pushed a commit to y123456y78/vllm-omni that referenced this pull request Apr 15, 2026
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: IsleOfDawnlight <stellamou@qq.com>
Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Co-authored-by: Celeste-jq <591998922@qq.com>
Co-authored-by: lyj-jjj <liuyingjun5@huawei.com>
Co-authored-by: Yueqian Lin <linyueqian@outlook.com>
lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: IsleOfDawnlight <stellamou@qq.com>
Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Co-authored-by: Celeste-jq <591998922@qq.com>
Co-authored-by: lyj-jjj <liuyingjun5@huawei.com>
Co-authored-by: Yueqian Lin <linyueqian@outlook.com>

Labels

ready label to trigger buildkite CI


9 participants