[TTS][OmniVoice] Add voice cloning support for OmniVoice TTS #2676
linyueqian merged 14 commits into vllm-project:main from
Signed-off-by: JuanPZuluaga <juanz9312@gmail.com>
@linyueqian I extended voice cloning to OmniVoice. After this is merged, I can add it in #2630.
Missing tests and PR body evidence.
… feat/omnivoive-clone-support
fix ci pls
@linyueqian Fixing it now. One question:
What should we do here? Should we add a specific message that one needs to run // EDIT: I just saw the message in:
Looked into reimplementing Current approach (lazy import + skip tests) makes sense. @tzhouam any idea when upstream vllm will support transformers 5.x?
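For readers unfamiliar with the "lazy import + skip tests" pattern mentioned above, here is a minimal sketch of how it might look. The helper names (`parse_major_minor`, `transformers_supports_cloning`, `load_tokenizer_class`) are hypothetical and not taken from the PR, and the sketch assumes `HiggsAudioV2TokenizerModel` is exported from the top-level `transformers` package. The only detail drawn from the PR description is the `transformers >= 5.3` requirement.

```python
import importlib.metadata
import re
from typing import Tuple

# From the PR description: voice cloning requires transformers >= 5.3.
MIN_TRANSFORMERS = (5, 3)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '5.3.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def transformers_supports_cloning() -> bool:
    """Hypothetical predicate: True if a new enough transformers is installed."""
    try:
        installed = importlib.metadata.version("transformers")
    except importlib.metadata.PackageNotFoundError:
        return False
    return parse_major_minor(installed) >= MIN_TRANSFORMERS


def load_tokenizer_class():
    """Lazy import: transformers is only imported when cloning is requested."""
    if not transformers_supports_cloning():
        raise ImportError("Voice cloning requires transformers >= 5.3")
    import transformers  # deferred so older installs can still import this module

    # Assumption: the class is available at the package top level.
    return getattr(transformers, "HiggsAudioV2TokenizerModel")
```

In a test module, the same predicate could drive a marker such as `pytest.mark.skipif(not transformers_supports_cloning(), reason="requires transformers >= 5.3")`, so the cloning tests are skipped rather than failing on older installs.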
Does the OmniVoice model currently support CUDA Graphs? Based on the test results from https://github.com/newgrit1004/omnivoice-triton, it seems that CUDA Graphs significantly improve RTF performance 😁
Thanks for spotting this! Are you interested in submitting a PR for that? It would definitely help with the speedup.
I don’t have that skill at the moment, but I’m willing to learn.
Purpose
Here we enable voice cloning via the `ref_audio` + `ref_text` API parameters using `HiggsAudioV2TokenizerModel` (requires transformers >= 5.3). It supports the language and instructions params. It also handles `data:` URI, `http(s)://`, and `file://` `ref_audio` formats. Wire up the diffusion speech path (`_create_diffusion_speech`) to pass `ref_audio`/`ref_text`/`lang`/`instruct` to the OmniVoice pipeline.
Test Plan
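As an illustration of the three `ref_audio` formats listed under Purpose, the following is a rough sketch of how such inputs could be told apart and partially resolved. It is not the PR's implementation: `classify_ref_audio`, `decode_data_uri`, and `file_uri_to_path` are hypothetical helpers, the data URI is assumed to be base64-encoded, and the `file://` handling assumes POSIX-style paths.

```python
import base64
from pathlib import Path
from urllib.parse import unquote, urlparse


def classify_ref_audio(ref_audio: str) -> str:
    """Return 'data', 'http', or 'file' depending on the URI scheme."""
    scheme = urlparse(ref_audio).scheme.lower()
    if scheme == "data":
        return "data"
    if scheme in ("http", "https"):
        return "http"
    if scheme == "file":
        return "file"
    raise ValueError(f"Unsupported ref_audio scheme: {scheme!r}")


def decode_data_uri(uri: str) -> bytes:
    """Decode a base64 data URI, e.g. 'data:audio/wav;base64,<payload>'."""
    header, sep, payload = uri.partition(",")
    if not header.startswith("data:") or not sep:
        raise ValueError("Not a data URI")
    if not header.endswith(";base64"):
        # Assumption for this sketch: only base64 payloads are accepted.
        raise ValueError("Only base64-encoded data URIs are handled here")
    return base64.b64decode(payload)


def file_uri_to_path(uri: str) -> Path:
    """Convert 'file:///tmp/ref.wav' to a local path (POSIX-style assumption)."""
    parsed = urlparse(uri)
    if parsed.scheme.lower() != "file":
        raise ValueError("Not a file URI")
    return Path(unquote(parsed.path))
```

Fetching `http(s)://` audio is left out on purpose, since timeouts, size limits, and allowed hosts are deployment decisions that the PR description does not specify.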
Test Result