[TTS][OmniVoice] Add voice cloning support for OmniVoice TTS #2676

Merged
linyueqian merged 14 commits into vllm-project:main from JuanPZuluaga:feat/omnivoive-clone-support
Apr 11, 2026

@JuanPZuluaga
Contributor

Purpose

This PR enables voice cloning via the ref_audio + ref_text API parameters using HiggsAudioV2TokenizerModel (requires transformers >= 5.3). It also supports the language and instructions params.

It also handles the data: URI, http(s)://, and file:// ref_audio formats, and wires up the diffusion speech path (_create_diffusion_speech) to pass ref_audio/ref_text/lang/instruct to the OmniVoice pipeline.
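As a rough illustration of the three ref_audio formats listed above, the sketch below resolves a ref_audio string to raw bytes using only the standard library. The helper name load_ref_audio_bytes and its timeout parameter are hypothetical and are not taken from this PR; the actual implementation may differ.

```python
import base64
from pathlib import Path
from urllib.parse import unquote_to_bytes, urlparse
from urllib.request import url2pathname, urlopen


def load_ref_audio_bytes(ref_audio: str, timeout: float = 10.0) -> bytes:
    """Hypothetical helper: return the raw audio bytes behind a ref_audio string."""
    scheme = urlparse(ref_audio).scheme.lower()

    if scheme == "data":
        # Layout: data:[<mediatype>][;base64],<payload>
        header, sep, payload = ref_audio.partition(",")
        if not sep:
            raise ValueError("malformed data: URI (missing comma)")
        if header.lower().endswith(";base64"):
            return base64.b64decode(payload)
        # Non-base64 data URIs carry a percent-encoded payload.
        return unquote_to_bytes(payload)

    if scheme in ("http", "https"):
        with urlopen(ref_audio, timeout=timeout) as response:
            return response.read()

    if scheme == "file":
        # Convert the URL path back into a local filesystem path.
        return Path(url2pathname(urlparse(ref_audio).path)).read_bytes()

    raise ValueError(f"unsupported ref_audio scheme: {scheme!r}")
```

Rejecting unknown schemes up front keeps a malformed ref_audio value from failing later, at decode time. A real server would likely also want size limits and restrictions on which file:// and http(s):// locations may be read.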

Test Plan

Test Result


Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>
Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>
JuanPZuluaga marked this pull request as ready for review April 10, 2026 07:38
@JuanPZuluaga
Contributor Author

JuanPZuluaga commented Apr 10, 2026

@linyueqian I extended voice cloning to OmniVoice. After this is merged, I can add it in #2630.

@hsliuustc0106
Collaborator

Missing tests and PR body evidence.

  • No test coverage for the voice cloning path. Please add at minimum a unit test for _encode_ref_audio and an integration test for the ref_audio/ref_text prompt flow.
  • PR body test plan and test result sections are empty.
  • torchaudio is imported inside _encode_ref_audio (hot path). Consider moving to top-level with a conditional import to match the HiggsAudioV2TokenizerModel pattern.
  • Silent fallback: audio_tokenizer is set to None on import failure with only a warning. If a user explicitly passes ref_audio, the error will surface deep in the pipeline rather than at startup. Consider failing fast when ref_audio is provided but tokenizer is unavailable.
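The last two review points (a conditional top-level import and failing fast) could be sketched roughly as follows. The names optional_import, VoiceCloneFrontend, and validate_request are hypothetical and exist only for illustration; they are not the classes or functions touched by this PR.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def optional_import(module_name: str):
    """Import a module once at load time; return None if it is unavailable."""
    try:
        return importlib.import_module(module_name)
    except (ImportError, OSError):
        # OSError covers installs whose native libraries fail to load.
        logger.warning("%s is unavailable; voice cloning is disabled", module_name)
        return None


# Resolved once at module import, not inside the per-request encode path.
torchaudio = optional_import("torchaudio")


class VoiceCloneFrontend:
    """Hypothetical wrapper, used only to illustrate the fail-fast check."""

    def __init__(self, audio_tokenizer=None):
        # None means the tokenizer could not be loaded at startup.
        self.audio_tokenizer = audio_tokenizer

    def validate_request(self, ref_audio=None):
        if ref_audio is None:
            return  # plain TTS request, no tokenizer needed
        if self.audio_tokenizer is None:
            raise RuntimeError(
                "ref_audio was provided but the audio tokenizer is unavailable; "
                "voice cloning requires transformers >= 5.3"
            )
```

With this shape, a request that carries ref_audio is rejected at validation time with an actionable message, while plain TTS requests keep working when the tokenizer is missing.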

Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>
Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>
Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>
Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>
linyueqian added the ready label (label to trigger buildkite CI) Apr 10, 2026
linyueqian self-requested a review April 10, 2026 16:32
Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>
@linyueqian
Collaborator

fix ci pls

@JuanPZuluaga
Contributor Author

JuanPZuluaga commented Apr 11, 2026

@linyueqian fixing it now. One question:

What should we do here? Should we add a specific message saying that one needs to run uv pip install "transformers==5.3.0" after installing vllm-omni in order to use voice cloning properly?

// EDIT

I just saw the message in: examples/offline_inference/omnivoice/README.md

Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>
Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>
@JuanPZuluaga
Contributor Author

  • Added a skip mark to the tests, since HF is not installed in the CI (@linyueqian right?). They can be re-enabled once HF is installed there.
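A version-gated skip along these lines might look like the sketch below. It uses the standard library's unittest.skipUnless so that it has no extra dependencies; with pytest, pytest.mark.skipif would play the same role. The helper names and the placeholder test are assumptions, not the tests added in this PR.

```python
import re
import unittest
from importlib import metadata
from typing import Optional, Tuple

MIN_TRANSFORMERS = (5, 3)  # from the PR description: transformers >= 5.3


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '5.3.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_min_transformers(installed: Optional[str] = None) -> bool:
    """Return True if transformers is installed and new enough."""
    if installed is None:
        try:
            installed = metadata.version("transformers")
        except metadata.PackageNotFoundError:
            return False
    return parse_major_minor(installed) >= MIN_TRANSFORMERS


@unittest.skipUnless(has_min_transformers(), "voice cloning needs transformers >= 5.3")
class TestVoiceCloning(unittest.TestCase):
    def test_environment_supports_cloning(self):
        # Placeholder body; the real tests would exercise the ref_audio path.
        self.assertTrue(has_min_transformers())
```

Gating on the installed version rather than skipping unconditionally means the tests start running on their own once the CI image ships a compatible transformers release.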

@linyueqian
Collaborator

Looked into reimplementing HiggsAudioV2TokenizerModel directly, not practical. It's a full audio codec (HuBERT + DAC + RVQ, 11.5GB) with heavy dependencies on transformers internals. Too much to vendor.

Current approach (lazy import + skip tests) makes sense. @tzhouam any idea when upstream vllm will support transformers 5.x?

linyueqian enabled auto-merge (squash) April 11, 2026 16:13
Collaborator

linyueqian left a comment

LGTM

linyueqian merged commit f7e8df9 into vllm-project:main Apr 11, 2026
8 checks passed
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
@Sweet-john

Does the OmniVoice model currently support CUDA Graphs? Based on the test results from https://github.com/newgrit1004/omnivoice-triton, it seems that CUDA Graphs significantly improve RTF performance😁

@linyueqian
Collaborator

> Does the OmniVoice model currently support CUDA Graphs? Based on the test results from https://github.com/newgrit1004/omnivoice-triton, it seems that CUDA Graphs significantly improve RTF performance😁

Thanks for spotting this! Are you interested in submitting a PR for that? It would definitely help with the speedup.

@Sweet-john

> > Does the OmniVoice model currently support CUDA Graphs? Based on the test results from https://github.com/newgrit1004/omnivoice-triton, it seems that CUDA Graphs significantly improve RTF performance😁
>
> Thanks for spotting this! Are you interested in submitting a PR for that? It would definitely help with the speedup.

Don’t have that skill at the moment, but I’m willing to learn
