
[Fix] VoxCPM2: support raw audio for voice cloning via OpenAI API #2720

Merged
hsliuustc0106 merged 3 commits into vllm-project:main from linyueqian:fix/voxcpm2-voice-clone-api on Apr 13, 2026

@linyueqian (Collaborator)

Summary

  • Fix voice cloning via the OpenAI /v1/audio/speech endpoint when using the ref_audio parameter
  • The serving layer's _resolve_ref_audio returns decoded audio as [samples_list, sr], but build_prompt_cache expects a file path string for librosa.load() -- this causes voice cloning to silently fail
  • Add _encode_raw_audio() that mirrors native _encode_wav but accepts in-memory samples
  • Add _build_prompt_cache() that auto-detects the input format (file path vs raw [samples, sr]) and routes accordingly
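
The auto-detection described above can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions, not the PR's code: `route_ref_audio`, `is_raw_audio`, and the `encode_path`/`encode_raw` callables are hypothetical stand-ins for the real file-path flow (librosa.load() plus _encode_wav) and the new _encode_raw_audio().

```python
from numbers import Integral
from typing import Any, Callable


def is_raw_audio(ref_audio: Any) -> bool:
    """True if ref_audio looks like an in-memory [samples, sr] pair (hypothetical check)."""
    return (
        isinstance(ref_audio, (list, tuple))
        and len(ref_audio) == 2
        and hasattr(ref_audio[0], "__len__")
        and not isinstance(ref_audio[0], (str, bytes))
        and isinstance(ref_audio[1], Integral)
    )


def route_ref_audio(
    ref_audio: Any,
    encode_path: Callable[[str], Any],
    encode_raw: Callable[[Any, int], Any],
) -> Any:
    """Dispatch reference audio to the file-path or in-memory encoder."""
    if isinstance(ref_audio, str):
        # Offline flow (backward compatible): a path on disk.
        return encode_path(ref_audio)
    if is_raw_audio(ref_audio):
        # Serving flow: samples were already decoded, so no file I/O is needed.
        samples, sr = ref_audio
        return encode_raw(samples, int(sr))
    # Fail loudly instead of handing an unexpected type to a path-based loader.
    raise TypeError(
        "ref_audio must be a file path or [samples, sample_rate], "
        f"got {type(ref_audio).__name__}"
    )
```

Checking for `str` first keeps the existing path-based behavior unchanged; anything that is neither a path nor a recognizable pair raises immediately rather than failing deeper in the stack.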

Test plan

  • Offline voice cloning with file path still works (backward compat)
  • Voice cloning via OpenAI API with base64 ref_audio produces cloned voice
  • Zero-shot (no ref_audio) still works unchanged

@linyueqian force-pushed the fix/voxcpm2-voice-clone-api branch 4 times, most recently from f253551 to 4cb3d9c (April 13, 2026 05:04)
The OpenAI speech API's _resolve_ref_audio returns decoded audio
samples [list[float], int] but build_prompt_cache expects a file
path string for librosa.load(). This causes voice cloning to fail
silently when using the /v1/audio/speech endpoint with ref_audio.

Add _encode_raw_audio() that mirrors native _encode_wav but accepts
in-memory samples, and _build_prompt_cache() that auto-detects the
input format (file path vs raw [samples, sr]) and routes accordingly.

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
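
Compared with _encode_wav, the in-memory variant skips decoding from disk and only has to bring the samples to the encoder's sample rate (tts._encode_sample_rate, per the later review fix) before the model-specific encoding. A rough stdlib-only sketch of that conversion step, assuming a mono float signal; `resample_linear` is hypothetical and uses plain linear interpolation as a stand-in for a proper resampler such as librosa's:

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], orig_sr: int, target_sr: int) -> List[float]:
    """Resample a mono signal by linear interpolation (illustrative stand-in only)."""
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sample rates must be positive")
    if orig_sr == target_sr or len(samples) == 0:
        return list(samples)
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input positions advanced per output sample
    last = len(samples) - 1
    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)  # left neighbour, clamped to the final sample
        hi = min(lo + 1, last)    # right neighbour, clamped as well
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out
```

For example, 240 samples at 24 kHz become 160 samples at 16 kHz. Linear interpolation applies no anti-aliasing filter when downsampling, which is why a library resampler is the better choice in real code.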
@linyueqian force-pushed the fix/voxcpm2-voice-clone-api branch from 4cb3d9c to 59aa005
@linyueqian (Collaborator, Author)

Tested on H20 (single GPU, enforce_eager=true, vllm 0.19.0)

All three voice cloning modes verified:

| Test | Status | RTF | Audio duration |
|---|---|---|---|
| Zero-shot (no ref audio) | Pass | 0.591 | 3.36 s |
| Voice clone via file path (existing offline flow) | Pass | 0.826 | 3.84 s |
| Voice clone via raw audio data (OpenAI API format) | Pass | 0.598 | 4.00 s |
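
Assuming RTF follows the usual definition (synthesis wall-clock time divided by the duration of the generated audio), the implied synthesis time for each case can be derived from the reported values. These figures are arithmetic on the table, not separate measurements:

```python
# Reported (RTF, audio seconds) per test case, copied from the results table.
results = {
    "zero_shot": (0.591, 3.36),
    "clone_file_path": (0.826, 3.84),
    "clone_raw_audio": (0.598, 4.00),
}


def implied_synthesis_seconds(rtf: float, audio_seconds: float) -> float:
    """RTF = synthesis_time / audio_duration, so synthesis_time = RTF * duration."""
    return rtf * audio_seconds


for name, (rtf, dur) in results.items():
    # RTF < 1 means audio was produced faster than real time.
    print(f"{name}: ~{implied_synthesis_seconds(rtf, dur):.2f} s")
```

This works out to roughly 1.99 s, 3.17 s, and 2.39 s respectively.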

The raw-audio test case simulates what serving_speech._resolve_ref_audio returns: [[samples_list, sr]]. Before this fix, passing that value to build_prompt_cache(reference_wav_path=...) would crash in librosa.load(), which expects a file path string.
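
Because the resolved value is wrapped in an outer list, [[samples_list, sr]], it has to be unwrapped before it can be treated as a [samples, sr] pair. A hedged sketch of that unwrapping with basic validation; `unwrap_resolved_ref_audio` is a hypothetical helper, and treating the outer list as holding exactly one reference clip is an assumption:

```python
from numbers import Integral
from typing import List, Sequence, Tuple


def unwrap_resolved_ref_audio(resolved: Sequence) -> Tuple[List[float], int]:
    """Turn [[samples_list, sr]] into (samples_list, sr), validating the shape."""
    if len(resolved) != 1:
        raise ValueError(f"expected exactly one reference clip, got {len(resolved)}")
    pair = resolved[0]
    if len(pair) != 2:
        raise ValueError("reference clip must be [samples, sample_rate]")
    samples, sr = pair
    if isinstance(samples, (str, bytes)):
        # A path here means the caller should use the file-path flow instead.
        raise TypeError("expected decoded samples, got a path or raw bytes")
    if not isinstance(sr, Integral) or sr <= 0:
        raise ValueError(f"invalid sample rate: {sr!r}")
    return [float(x) for x in samples], int(sr)
```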

Add OpenAI-compatible speech client and README for VoxCPM2 online
serving, demonstrating zero-shot synthesis and voice cloning via
ref_audio (local file, URL, or base64 data URI).

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
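
A minimal sketch of how a client can send a local reference file as a base64 data URI using only the standard library. The server URL, `model` name, and `response_format` value are placeholders. `model`, `input`, and `response_format` are standard OpenAI speech fields, `ref_audio` is the parameter this PR fixes, and "sk-empty" matches the default API key adopted in review. The request is built here but not sent:

```python
import base64
import json
import urllib.request


def wav_bytes_to_data_uri(wav_bytes: bytes, mime: str = "audio/wav") -> str:
    """Encode audio file bytes as a base64 data URI for ref_audio."""
    return f"data:{mime};base64," + base64.b64encode(wav_bytes).decode("ascii")


def build_speech_request(
    text: str,
    ref_audio: str,
    base_url: str = "http://localhost:8000",  # placeholder server address
    model: str = "VoxCPM2",  # placeholder; use the model name the server was launched with
    api_key: str = "sk-empty",
) -> urllib.request.Request:
    payload = {
        "model": model,
        "input": text,
        "ref_audio": ref_audio,  # URL or data URI; omit for zero-shot synthesis
        "response_format": "wav",
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To send it, read the reference file in binary mode, pass wav_bytes_to_data_uri(...) as ref_audio, and call urllib.request.urlopen(req).read() against a running server.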
@linyueqian force-pushed the fix/voxcpm2-voice-clone-api branch from aa86a07 to 6c2dae7 (April 13, 2026 05:23)
Review comment thread on vllm_omni/model_executor/models/voxcpm2/voxcpm2_talker.py (outdated)
Review comment thread on vllm_omni/model_executor/models/voxcpm2/voxcpm2_talker.py
Review comment thread on examples/online_serving/voxcpm2/openai_speech_client.py
- Move `import librosa` to top-level imports
- Use `tts._encode_sample_rate` directly instead of getattr fallback
- Use "sk-empty" as default API key in example client

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
@linyueqian (Collaborator, Author)

@hsliuustc0106 This should fix it; could you check again?

@hsliuustc0106 added the "ready" label (label to trigger buildkite CI) Apr 13, 2026
@hsliuustc0106 enabled auto-merge (squash) April 13, 2026 06:05
@hsliuustc0106 merged commit d9e745c into vllm-project:main Apr 13, 2026
7 of 8 checks passed
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

Labels: ready (label to trigger buildkite CI)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants