[Fix] VoxCPM2: support raw audio for voice cloning via OpenAI API by linyueqian · Pull Request #2720 · vllm-project/vllm-omni

linyueqian · 2026-04-13T04:57:38Z

Summary

- Fix voice cloning via the OpenAI `/v1/audio/speech` endpoint when the `ref_audio` parameter is used.
- The serving layer's `_resolve_ref_audio` returns decoded audio as `[samples_list, sr]`, but `build_prompt_cache` expects a file path string that it can pass to `librosa.load()`. As a result, voice cloning fails silently.
- Add `_encode_raw_audio()`, which mirrors the native `_encode_wav` but accepts in-memory samples.
- Add `_build_prompt_cache()`, which auto-detects the input format (file path vs. raw `[samples, sr]`) and routes accordingly.
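
The format detection described above could be sketched roughly as follows. This is only an illustration, not the code from this PR: `classify_ref_audio` is a hypothetical helper, and the accepted shapes (`[samples, sr]`, or the nested `[[samples, sr]]` mentioned later in this thread) are assumptions based on the description.

```python
import os
from typing import Any


def classify_ref_audio(ref_audio: Any) -> tuple[str, Any]:
    """Hypothetical helper: decide whether ref_audio is a file path or raw samples.

    Returns ("path", path_str) or ("raw", (samples, sample_rate)).
    """
    # A string or path-like object is treated as a file on disk.
    if isinstance(ref_audio, (str, os.PathLike)):
        return "path", os.fspath(ref_audio)

    # Assumption: the serving layer may wrap the pair once, as [[samples, sr]].
    if (
        isinstance(ref_audio, (list, tuple))
        and len(ref_audio) == 1
        and isinstance(ref_audio[0], (list, tuple))
    ):
        ref_audio = ref_audio[0]

    # Raw audio: a pair of (sequence of samples, positive integer sample rate).
    if isinstance(ref_audio, (list, tuple)) and len(ref_audio) == 2:
        samples, sr = ref_audio
        valid_sr = isinstance(sr, int) and not isinstance(sr, bool) and sr > 0
        if valid_sr and isinstance(samples, (list, tuple)):
            return "raw", ([float(s) for s in samples], sr)

    raise TypeError(f"Unsupported ref_audio format: {type(ref_audio).__name__}")
```

A caller would then send the `"path"` case to the existing file-based encoder and the `"raw"` case to an in-memory encoder such as the `_encode_raw_audio()` added here.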

Test plan

- Offline voice cloning with a file path still works (backward compatibility)
- Voice cloning via the OpenAI API with base64 `ref_audio` produces a cloned voice
- Zero-shot synthesis (no `ref_audio`) still works unchanged

The OpenAI speech API's _resolve_ref_audio returns decoded audio samples [list[float], int] but build_prompt_cache expects a file path string for librosa.load(). This causes voice cloning to fail silently when using the /v1/audio/speech endpoint with ref_audio.

Add _encode_raw_audio() that mirrors native _encode_wav but accepts in-memory samples, and _build_prompt_cache() that auto-detects the input format (file path vs raw [samples, sr]) and routes accordingly.

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

linyueqian · 2026-04-13T05:12:07Z

Tested on H20 (single GPU, enforce_eager=true, vllm 0.19.0)

All three voice cloning modes verified:

| Test | Status | RTF | Audio duration |
| --- | --- | --- | --- |
| Zero-shot (no ref audio) | Pass | 0.591 | 3.36s |
| Voice clone via file path (existing offline flow) | Pass | 0.826 | 3.84s |
| Voice clone via raw audio data (OpenAI API format) | Pass | 0.598 | 4.00s |

The raw audio path simulates what serving_speech._resolve_ref_audio returns: [[samples_list, sr]]. Before this fix, passing that format to build_prompt_cache(reference_wav_path=...) would crash on librosa.load() since it expects a file path string.

Add OpenAI-compatible speech client and README for VoxCPM2 online serving, demonstrating zero-shot synthesis and voice cloning via ref_audio (local file, URL, or base64 data URI).

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
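
As a rough sketch of the base64 data URI case, a request body for the speech endpoint could be assembled like this. It is not the example client added in this PR: `build_speech_payload` is a hypothetical helper, the `model` value is a placeholder, and the `model`/`input` field names follow the OpenAI speech API, with `ref_audio` being the parameter discussed in this thread.

```python
import base64
import json
from typing import Optional


def build_speech_payload(
    text: str,
    ref_audio_bytes: Optional[bytes] = None,
    mime: str = "audio/wav",
    model: str = "voxcpm2",  # placeholder; use the name the server was launched with
) -> dict:
    """Hypothetical helper: build a JSON body for POST /v1/audio/speech."""
    payload = {"model": model, "input": text}
    if ref_audio_bytes is not None:
        # Encode the reference clip as a base64 data URI for voice cloning.
        b64 = base64.b64encode(ref_audio_bytes).decode("ascii")
        payload["ref_audio"] = f"data:{mime};base64,{b64}"
    return payload


if __name__ == "__main__":
    # Zero-shot request: no reference audio attached.
    print(json.dumps(build_speech_payload("Hello from VoxCPM2")))
```

Omitting `ref_audio_bytes` gives the zero-shot request; passing the bytes of a local WAV file gives the voice cloning request. The resulting dictionary can be sent with any HTTP client.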

- Move `import librosa` to top-level imports
- Use `tts._encode_sample_rate` directly instead of getattr fallback
- Use "sk-empty" as default API key in example client

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

linyueqian · 2026-04-13T06:03:46Z

@hsliuustc0106 This should be fixed now. Could you check again?

linyueqian requested a review from hsliuustc0106 as a code owner April 13, 2026 04:57

linyueqian force-pushed the fix/voxcpm2-voice-clone-api branch 4 times, most recently from f253551 to 4cb3d9c, on April 13, 2026 at 05:04

linyueqian force-pushed the fix/voxcpm2-voice-clone-api branch from 4cb3d9c to 59aa005 on April 13, 2026 at 05:04

linyueqian force-pushed the fix/voxcpm2-voice-clone-api branch from aa86a07 to 6c2dae7 on April 13, 2026 at 05:23

hsliuustc0106 reviewed Apr 13, 2026

Comment thread vllm_omni/model_executor/models/voxcpm2/voxcpm2_talker.py Outdated

hsliuustc0106 reviewed Apr 13, 2026

Comment thread vllm_omni/model_executor/models/voxcpm2/voxcpm2_talker.py

hsliuustc0106 reviewed Apr 13, 2026

Comment thread examples/online_serving/voxcpm2/openai_speech_client.py

fix(voxcpm2): address review comments

6b50d9b

hsliuustc0106 approved these changes Apr 13, 2026

hsliuustc0106 added the "ready" label (triggers Buildkite CI) on Apr 13, 2026

hsliuustc0106 enabled auto-merge (squash) April 13, 2026 06:05

hsliuustc0106 merged commit d9e745c into vllm-project:main Apr 13, 2026
7 of 8 checks passed

daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026

[Fix] VoxCPM2: support raw audio for voice cloning via OpenAI API (vl…

90bbe80

…lm-project#2720) Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

NickCao mentioned this pull request Apr 21, 2026

[Bugfix] treewide: drop references to librosa #2996

Merged

5 tasks

lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026

[Fix] VoxCPM2: support raw audio for voice cloning via OpenAI API (vl…

b7d966c

…lm-project#2720) Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[Fix] VoxCPM2: support raw audio for voice cloning via OpenAI API (vl…

4489a3e

…lm-project#2720) Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

hsliuustc0106 merged 3 commits into vllm-project:main from linyueqian:fix/voxcpm2-voice-clone-api

2 participants
