[TTS][OmniVoice] Add voice cloning support for OmniVoice TTS by JuanPZuluaga · Pull Request #2676 · vllm-project/vllm-omni

JuanPZuluaga · 2026-04-10T07:23:07Z

Purpose

This PR enables voice cloning for OmniVoice through the ref_audio and ref_text API parameters, using HiggsAudioV2TokenizerModel (requires transformers >= 5.3). It also supports the language and instructions parameters.

In addition, it handles ref_audio given as a data: URI, an http(s):// URL, or a file:// path, and wires up the diffusion speech path (_create_diffusion_speech) to pass ref_audio/ref_text/lang/instruct to the OmniVoice pipeline.
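
To make the three accepted ref_audio forms concrete, here is a minimal standard-library sketch of how such a string could be turned into raw bytes. It is not the code from this PR: the function name `load_ref_audio_bytes` and its `timeout` parameter are hypothetical, and the real implementation may validate or decode the audio differently.

```python
import base64
from pathlib import Path
from urllib.parse import unquote_to_bytes, urlparse
from urllib.request import url2pathname, urlopen


def load_ref_audio_bytes(ref_audio: str, timeout: float = 10.0) -> bytes:
    """Return the raw bytes behind a ref_audio string (hypothetical helper)."""
    scheme = urlparse(ref_audio).scheme.lower()

    if scheme == "data":
        # Layout: data:[<mediatype>][;base64],<payload>
        header, sep, payload = ref_audio.partition(",")
        if not sep:
            raise ValueError("Malformed data: URI, missing ',' separator")
        if header.endswith(";base64"):
            return base64.b64decode(payload)
        # Without ;base64 the payload is percent-encoded.
        return unquote_to_bytes(payload)

    if scheme in ("http", "https"):
        with urlopen(ref_audio, timeout=timeout) as response:
            return response.read()

    if scheme == "file":
        # Convert the URL path back into a local filesystem path.
        local_path = url2pathname(urlparse(ref_audio).path)
        return Path(local_path).read_bytes()

    raise ValueError(f"Unsupported ref_audio scheme: {scheme!r}")
```

Rejecting unknown schemes with a `ValueError` keeps bad input from reaching the audio tokenizer; a server would likely also want size limits and an allow-list for remote hosts, which this sketch leaves out.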

Test Plan

Test Result

Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>

JuanPZuluaga · 2026-04-10T07:38:48Z

@linyueqian I extended voice cloning to OmniVoice. After this is merged, I can add it in #2630.

hsliuustc0106 · 2026-04-10T08:15:17Z

Missing tests and PR body evidence.

No test coverage for the voice cloning path. Please add at minimum a unit test for _encode_ref_audio and an integration test for the ref_audio/ref_text prompt flow.
PR body test plan and test result sections are empty.
torchaudio is imported inside _encode_ref_audio (hot path). Consider moving it to the top level with a conditional import, to match the HiggsAudioV2TokenizerModel pattern.
Silent fallback: audio_tokenizer is set to None on import failure with only a warning. If a user explicitly passes ref_audio, the error will surface deep in the pipeline rather than at startup. Consider failing fast when ref_audio is provided but tokenizer is unavailable.
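
The last two review points can be sketched together as follows. This is only an illustration of the fail-fast idea, not the code that was merged: `VoiceCloneUnavailableError` and `check_voice_clone_request` are hypothetical names, and the exact place where the pipeline would call such a check is an assumption.

```python
# Resolve the optional dependency once at module import, not per request.
try:
    import torchaudio
except (ImportError, OSError):
    torchaudio = None


class VoiceCloneUnavailableError(RuntimeError):
    """Raised when ref_audio is supplied but voice cloning cannot run."""


def check_voice_clone_request(ref_audio, audio_tokenizer) -> None:
    """Fail early, with a clear message, before any pipeline work starts."""
    if ref_audio is None:
        # Plain TTS request: nothing to validate.
        return
    if audio_tokenizer is None:
        raise VoiceCloneUnavailableError(
            "ref_audio was provided but the audio tokenizer failed to load; "
            "voice cloning requires transformers>=5.3.0"
        )
    if torchaudio is None:
        raise VoiceCloneUnavailableError(
            "ref_audio was provided but torchaudio is not installed"
        )
```

Calling a check like this at request validation time means a missing tokenizer is reported next to the parameter that needs it, instead of as an attribute error deep in the pipeline.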

linyueqian · 2026-04-11T05:37:41Z

fix ci pls

JuanPZuluaga · 2026-04-11T06:15:46Z

@linyueqian fixing it now. One question:

To run voice cloning we need HiggsAudioV2TokenizerModel (https://huggingface.co/bosonai/higgs-audio-v2-tokenizer), which requires "transformers>=5.3.0".

What should we do here? Should we add a specific message saying that one needs to run uv pip install "transformers==5.3.0" after installing vllm-omni in order to use voice cloning?

// EDIT

I just saw the message in: examples/offline_inference/omnivoice/README.md

JuanPZuluaga · 2026-04-11T06:26:36Z

Added a skip mark to the tests, since the required HF (transformers) version is not installed in the CI (@linyueqian, right?). The tests can be re-enabled once it is installed.
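
A conditional skip of this kind can be sketched as below. Note the assumptions: this uses the standard library's `unittest.skipUnless` as a stand-in for the pytest skip mark mentioned above, the import path `from transformers import HiggsAudioV2TokenizerModel` is assumed rather than confirmed by this thread, and the test class and `run_suite` helper are hypothetical.

```python
import unittest

# Assumed import path; any import-time failure counts as "unavailable".
try:
    from transformers import HiggsAudioV2TokenizerModel
except Exception:
    HiggsAudioV2TokenizerModel = None


@unittest.skipUnless(
    HiggsAudioV2TokenizerModel is not None,
    "requires a transformers release that provides HiggsAudioV2TokenizerModel",
)
class VoiceCloneTokenizerTest(unittest.TestCase):
    def test_tokenizer_class_is_importable(self):
        # Placeholder body; a real test would exercise the ref_audio path.
        self.assertTrue(callable(HiggsAudioV2TokenizerModel))


def run_suite() -> unittest.TestResult:
    """Run the test case and return the collected result."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(VoiceCloneTokenizerTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Because the skip condition is evaluated from the environment, the tests start running again on their own once CI installs a compatible transformers version, with no further code change.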

linyueqian · 2026-04-11T06:43:03Z

I looked into reimplementing HiggsAudioV2TokenizerModel directly, and it is not practical. It's a full audio codec (HuBERT + DAC + RVQ, 11.5GB) with heavy dependencies on transformers internals. Too much to vendor.

Current approach (lazy import + skip tests) makes sense. @tzhouam any idea when upstream vllm will support transformers 5.x?

linyueqian

LGTM

Sweet-john · 2026-04-16T06:57:37Z

Does the OmniVoice model currently support CUDA Graphs? Based on the test results from https://github.com/newgrit1004/omnivoice-triton, it seems that CUDA Graphs significantly improve RTF performance😁

linyueqian · 2026-04-16T12:19:14Z

> Does the OmniVoice model currently support CUDA Graphs? Based on the test results from https://github.com/newgrit1004/omnivoice-triton, it seems that CUDA Graphs significantly improve RTF performance😁

Thanks for spotting this! Are you interested in submitting a PR for that? It would definitely help with the speedup.

Sweet-john · 2026-04-20T08:27:02Z

> > Does the OmniVoice model currently support CUDA Graphs? Based on the test results from https://github.com/newgrit1004/omnivoice-triton, it seems that CUDA Graphs significantly improve RTF performance😁
>
> Thanks for spotting this! Are you interested in submitting a PR for that? It would definitely help with the speedup.

I don’t have that skill at the moment, but I’m willing to learn.

JuanPZuluaga added 2 commits April 10, 2026 07:18

feat, add voice cloning support to omnivoice

9d652d6

Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>

remove boilerplate and use resolve_audio

0291960

Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>

JuanPZuluaga marked this pull request as ready for review April 10, 2026 07:38

JuanPZuluaga requested a review from hsliuustc0106 as a code owner April 10, 2026 07:38

JuanPZuluaga added 7 commits April 10, 2026 08:34

fix server crash when not calling properly init

490db1a

Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>

Merge branch 'main' of https://github.com/vllm-project/vllm-omni into…

1c5b83e

… feat/omnivoive-clone-support

Merge branch 'main' of https://github.com/vllm-project/vllm-omni into…

c20bbaa

… feat/omnivoive-clone-support

test for voice cloning

8b0c611

Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>

update omnivoice pipeline

cc412d6

Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>

Merge branch 'main' of https://github.com/vllm-project/vllm-omni into…

6f3f04e

… feat/omnivoive-clone-support

use generate_synthetic_audio instead for testing

65aa00b

Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>

linyueqian added the "ready" label (label to trigger buildkite CI) Apr 10, 2026

linyueqian self-requested a review April 10, 2026 16:32

JuanPZuluaga added 2 commits April 10, 2026 18:20

revert

ba497ef

Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>

Merge branch 'main' of https://github.com/vllm-project/vllm-omni into…

a0cfa5d

… feat/omnivoive-clone-support

JuanPZuluaga added 3 commits April 11, 2026 06:24

add check if HF version 5.3.0 not available

9c4e8d5

Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>

skip tests for now

6bcfeb4

Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>

Merge branch 'main' of https://github.com/vllm-project/vllm-omni into…

df5b718

… feat/omnivoive-clone-support

linyueqian enabled auto-merge (squash) April 11, 2026 16:13

linyueqian approved these changes Apr 11, 2026

linyueqian merged commit f7e8df9 into vllm-project:main Apr 11, 2026
8 checks passed

daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026

[TTS][OmniVoice] Add voice cloning support for OmniVoice TTS (vllm-project#2676)

fb1bc53

Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>

linyueqian merged 14 commits into vllm-project:main from JuanPZuluaga:feat/omnivoive-clone-support

4 participants
