Add voxcpm model support (#2467)
Conversation
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
… voxcpm_streaming_0180 Signed-off-by: Celeste-jq <591998922@qq.com>
Switch VoxCPM stage0 to the AR scheduler path, align the async-chunk flow with the common framework pattern, and restore scheduler/test_utils changes to match upstream where needed. Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: IsleOfDawnlight <stellamou@qq.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5f8b8524b5
```python
try:
    AutoConfig.register("qwen3_tts", Qwen3TTSConfig)
    AutoConfig.register("cosyvoice3", CosyVoice3Config)
```
Register each HF config independently
These two registrations are wrapped in a single try, so if qwen3_tts is already registered and raises ValueError, cosyvoice3 is never attempted. In environments where one config is pre-registered by another plugin/import path, this leaves the other config missing and later model/config resolution fails unexpectedly. Register each config in its own guarded block (as already done for voxtral_tts/voxcpm) to avoid this partial-registration regression.
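A minimal sketch of the suggested per-config guard (class names come from the quoted diff; treating `ValueError` as the duplicate-registration signal follows the description above):

```python
# Sketch: register each config in its own guarded block so one
# pre-registered name cannot prevent the other registration.
for name, cfg in (("qwen3_tts", Qwen3TTSConfig), ("cosyvoice3", CosyVoice3Config)):
    try:
        AutoConfig.register(name, cfg)
    except ValueError:
        # Already registered by another plugin/import path; safe to skip.
        pass
```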
```python
except (asyncio.CancelledError, GeneratorExit):
    if input_stream_task is not None and not input_stream_task.done():
        input_stream_task.cancel()
    await self.abort(request_id)
    logger.info(f"[AsyncOmni] Request {request_id} aborted.")
    raise
```
Abort request on non-cancellation generate errors
generate() now only aborts on cancellation, but _process_orchestrator_results() can raise regular exceptions (for example when it receives an error message). In that case this method exits without calling abort() or cleanup, so the request can remain active in engine/orchestrator state and self.request_states, causing leaked state and stuck/follow-on request behavior. A generic exception path should still abort and clean up the request before re-raising.
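A sketch of the missing branch, reusing the names from the quoted snippet (whether `abort()` is safe to call on an already-failed request is an assumption):

```python
# Added after the CancelledError/GeneratorExit branch shown above:
except Exception:
    # Non-cancellation failures (e.g. an error message raised by
    # _process_orchestrator_results) must also release engine/orchestrator
    # state and self.request_states before propagating.
    if input_stream_task is not None and not input_stream_task.done():
        input_stream_task.cancel()
    await self.abort(request_id)
    logger.exception(f"[AsyncOmni] Request {request_id} failed and was aborted.")
    raise
```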
```python
latent_audio_feat = self._extract_val(info, "latent_audio_feat", None)
print(f"---latent_audio_feat---:{latent_audio_feat.shape}")
audio_tensor = self._pipeline.decode(
```
Guard VAE path when latent chunk is missing
This path unconditionally accesses latent_audio_feat.shape, but async-chunk terminal payloads may intentionally omit latent_audio_feat (finish-only metadata). In a batched VAE decode step where one request has latent data and another is finish-only, this raises AttributeError and fails the whole batch instead of cleanly skipping/finishing that item.
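A sketch of the guard, assuming the decode loop handles one request's `info` at a time; the `decode()` arguments are truncated in the quote, so the call shown here is an assumed signature:

```python
latent_audio_feat = self._extract_val(info, "latent_audio_feat", None)
if latent_audio_feat is None:
    # Finish-only terminal payload: nothing to decode for this request,
    # so skip/finish it cleanly instead of failing the whole VAE batch.
    continue
audio_tensor = self._pipeline.decode(latent_audio_feat)  # assumed call shape
```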
…e_voxcpm_isle_first Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: IsleOfDawnlight <stellamou@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
(cherry picked from commit cff0398) Signed-off-by: Celeste-jq <591998922@qq.com>
Fix the DCO and pre-commit checks and resolve conflicts, please.
resolve conflicts @IsleOfDawnlight
Signed-off-by: Celeste-jq <591998922@qq.com>
# Conflicts:
#	vllm_omni/distributed/omni_connectors/transfer_adapter/chunk_transfer_adapter.py
#	vllm_omni/engine/arg_utils.py
#	vllm_omni/entrypoints/utils.py
Signed-off-by: Celeste-jq <591998922@qq.com>
Fix pre-commit, please.
linyueqian left a comment:
Thanks for adding VoxCPM support! The two-stage latent+VAE architecture and async_chunk integration look solid. Left a few comments, mostly around import hygiene and file size.
```sh
# Point Python at VoxCPM's ``src`` (parent of ``voxcpm/model`` and ``voxcpm/modules``) if not next to this repo.
export VLLM_OMNI_VOXCPM_CODE_PATH=/home/l00613087/voxcpm/VoxCPM/src
export ASCEND_RT_VISIBLE_DEVICES=1
```
[blocker] This file has hardcoded user paths (/home/l00613087/...) and a device-specific env var. Should be gitignored or removed from the PR.
I've updated it. Thank you for your suggestions.
```python
from vllm_omni.engine.output_modality import OutputModality
from vllm_omni.model_executor.models.voxcpm.configuration_voxcpm import VoxCPMConfig
from vllm_omni.model_executor.models.voxcpm.native_config import (
    detect_native_voxcpm_model_type,
```
[high] These top-level imports mean every vllm-omni startup pays for VoxCPM even when it's not used. Other models register lazily. Can you move these inside _maybe_prepare_model_hf_config_path() and _register_omni_hf_configs()?
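A hedged sketch of the lazy-import shape; only the function name comes from this comment, and the body (including the `"voxcpm"` registration name) is illustrative:

```python
from transformers import AutoConfig

def _register_omni_hf_configs() -> None:
    # Deferred import: vllm-omni startups that never touch VoxCPM
    # should not pay for these imports at module import time.
    from vllm_omni.model_executor.models.voxcpm.configuration_voxcpm import (
        VoxCPMConfig,
    )

    try:
        AutoConfig.register("voxcpm", VoxCPMConfig)
    except ValueError:
        pass  # Already registered elsewhere.
```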
I have removed the improper import statements.
```python
from vllm_omni.config.yaml_util import create_config, load_yaml_config, merge_configs
from vllm_omni.entrypoints.stage_utils import _to_dict
from vllm_omni.model_executor.models.voxcpm.native_config import detect_native_voxcpm_model_type
```
[high] Same as arg_utils. This import should be lazy, inside resolve_model_config_path where it's actually used.
I have removed the improper import statements, thanks!
```python
repo_root = Path(__file__).resolve().parents[4]
candidates.append(repo_root.parent / "VoxCPM" / "src")

for candidate in candidates:
```
[high] 1116 lines is quite large. Could you split the native model loading helpers, the stage wrappers (_DirectVoxCPMLatentGenerator / _DirectVoxCPMAudioVAE), and the main class into separate files?
Good idea, I have split it into separate files.
```python
def _import_voxcpm_audio_vae_classes():
    env_path = os.environ.get("VLLM_OMNI_VOXCPM_CODE_PATH")
```
[medium] _import_voxcpm_audio_vae_classes below is nearly identical to this function. Worth extracting the shared sys.path discovery into one helper.
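One possible shape for the shared helper (the helper name is hypothetical; the env var and candidate paths are taken from the quoted snippets):

```python
import os
from pathlib import Path

def _voxcpm_code_path_candidates() -> list[Path]:
    # Hypothetical shared helper: both import functions build the same
    # candidate list, so the sys.path discovery lives in one place.
    candidates: list[Path] = []
    env_path = os.environ.get("VLLM_OMNI_VOXCPM_CODE_PATH")
    if env_path:
        candidates.append(Path(env_path))
    repo_root = Path(__file__).resolve().parents[4]
    candidates.append(repo_root.parent / "VoxCPM" / "src")
    return candidates
```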
OK, I have extracted it.
```python
        pass
    if isinstance(val, (list, tuple)) and len(val) == 1:
        return _connector_finished_truthy(val[0])
    return bool(val)
```
[medium] The recursive unwrap for single-element lists could loop on pathological input. Maybe just do an iterative unwrap with a small depth cap?
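An iterative variant with a depth cap, as suggested (the cap of 8 is arbitrary, and the earlier branches of the function are elided):

```python
def _connector_finished_truthy(val, _max_unwrap: int = 8):
    # Unwrap nested single-element lists/tuples iteratively; the cap keeps
    # pathological self-referencing containers from looping forever.
    for _ in range(_max_unwrap):
        if isinstance(val, (list, tuple)) and len(val) == 1:
            val = val[0]
        else:
            break
    return bool(val)
```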
```python
try:
    config_dict = json.loads(config_path.read_text())
except Exception:
```
[medium] Bare except Exception here swallows permission errors, disk errors, etc. Could narrow to (json.JSONDecodeError, OSError).
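The narrowed handler would look roughly like this (the fallback value is an assumption; the quoted diff does not show what the except branch does):

```python
try:
    config_dict = json.loads(config_path.read_text())
except (json.JSONDecodeError, OSError):
    # Expected failure modes only: malformed JSON or an unreadable file.
    # Programming errors keep propagating instead of being swallowed.
    config_dict = None
```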
```python
    min_len: int = 2,
    max_len: int = 2000,
    inference_timesteps: int = 10,
    cfg_value: float = 2.0,
```
[medium] If symlink fails this falls back to shutil.copytree on potentially multi-GB model dirs without any logging. A warning would help users understand why /tmp is filling up.
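A sketch of the logged fallback (the `model_src`/`model_dst` names and the exact call site are hypothetical, since the quoted lines do not show the symlink code):

```python
import shutil

try:
    os.symlink(model_src, model_dst, target_is_directory=True)
except OSError as exc:
    logger.warning(
        "Symlinking %s -> %s failed (%s); falling back to shutil.copytree, "
        "which may copy a multi-GB model directory into /tmp.",
        model_src, model_dst, exc,
    )
    shutil.copytree(model_src, model_dst)
```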
```python
if not request_summaries:
    print("未解析到 stage 耗时摘要。")
    return
print("每个 request 的 stage 耗时:")
```
[nit] A few Chinese strings in the test output (未解析到, 汇总:, 失败用例:). Should be English for consistency with the rest of the repo.
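Possible English replacements for the strings in the quoted snippet (translations are mine):

```python
if not request_summaries:
    print("No stage timing summaries parsed.")
    return
print("Per-request stage timings:")
```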
```python
"""Offline VoxCPM inference example for vLLM Omni.
```
[nit] os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn" at line 27 runs on import. Move it inside the if __name__ == "__main__" block?
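i.e., something like the following (`main()` is a stand-in for the example's entry point):

```python
if __name__ == "__main__":
    # Set the worker start method only when the example actually runs,
    # not as an import side effect.
    os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
    main()
```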
I will add the ready label after UTs are added.
```yaml
# VoxCPM two-stage (latent → VAE) without async_chunk: one-shot latent then decode.
stage_args:
```
@linyueqian maybe this model works better with a single stage.
The current implementation of VoxCPM 2 is single-stage; it is worthwhile to try that in a follow-up PR.
Fix pre-commit and DCO, please.
Force-pushed from a9aaaca to ba96e46.
Signed-off-by: Celeste-jq <591998922@qq.com>
# Conflicts:
#	vllm_omni/entrypoints/openai/serving_speech.py
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: Celeste-jq <591998922@qq.com>
# Conflicts:
#	tests/engine/test_arg_utils.py
#	vllm_omni/entrypoints/openai/serving_speech.py
Force-pushed from e50165f to 98a45fd.
Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Fix CI, please.
Signed-off-by: Celeste-jq <591998922@qq.com>
@linyueqian @hsliuustc0106 CI passed, PTAL, thank you.
Signed-off-by: Celeste-jq <591998922@qq.com>
Signed-off-by: lyj-jjj <liuyingjun5@huawei.com>
Signed-off-by: IsleOfDawnlight <stellamou@qq.com>
Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Co-authored-by: Celeste-jq <591998922@qq.com>
Co-authored-by: lyj-jjj <liuyingjun5@huawei.com>
Co-authored-by: Yueqian Lin <linyueqian@outlook.com>
Purpose
Add support for the voxcpm model, with capabilities for streaming inference and embedding input/output.
Test Plan
Verify VoxCPM voice cloning, high-efficiency synthesis, and batch processing functionality, covering both streaming and non-streaming inference modes.
Test Result
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation editions to `./docs`.