
Feat/hyperclovax omni AD #2

Merged
with1015 merged 305 commits into model/hyperclovax-audio from feat/hyperclovax-omni-AD on Apr 6, 2026

Conversation


@with1015 with1015 commented Apr 6, 2026

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts and test commands. If your code doesn't require additional test scripts, please state the reason. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation updates to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

fhfuih and others added 30 commits January 28, 2026 16:07
vllm-project#797)

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>
Signed-off-by: dengyunyang <584797741@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: lishunyang <lishunyang12@163.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
…ct#927)

Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
…-project#983)

Signed-off-by: mxuax <mxuax@connect.ust.hk>
Signed-off-by: XU Mingshi <91017482+mxuax@users.noreply.github.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: amy-why-3459 <wuhaiyan17@huawei.com>
Co-authored-by: Rein Yang <ruiruyang2@gmail.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: wuzhongjian <wuzhongjian_yewu@cmss.chinamobile.com>
Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com>
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: ZeldaHuang <hzm414167@alibaba-inc.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
…llm-project#980)

Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
Signed-off-by: Samit <285365963@qq.com>
Co-authored-by: Samit <285365963@qq.com>
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli0116@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
…e configuration (vllm-project#987)

Signed-off-by: Ding Zuhao <e1583181@u.nus.edu>
Signed-off-by: jzz <e1583181@u.nus.edu>
tzhouam and others added 26 commits February 28, 2026 10:39
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: Rein Yang <ruiruyang2@gmail.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
…0 resource usage. (vllm-project#1543)

Signed-off-by: yenuo26 <410167048@qq.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Model files:
- vllm_omni/diffusion/models/hyperclovax_vision/: vision decoder pipeline
  (HyperCLOVAXVisionPipeline) using flow matching diffusion + VisionTransformer
- vllm_omni/diffusion/models/hyperclovax_audio/: audio decoder pipeline
  (HyperCLOVAXAudioPipeline) using Unit-BigVGAN codec
- vllm_omni/model_executor/stage_input_processors/hyperclovax_seed_omni.py:
  thinker2vision_decoder and thinker2audio_decoder — extract discrete tokens from
  LLM output; truncate/pad vision codes to 729 (27x27) for decoder
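The truncate/pad step above can be sketched as follows. This is a hypothetical illustration of the behavior described in the commit message; the real thinker2vision_decoder in vllm_omni has a different signature, and pad_id here is an assumed placeholder value.

```python
def fit_vision_codes(codes: list[int], target: int = 729, pad_id: int = 0) -> list[int]:
    """Truncate or pad a flat list of discrete vision codes to exactly
    `target` entries (729 = 27 x 27 latent grid expected by the decoder)."""
    if len(codes) >= target:
        return codes[:target]
    # Pad short sequences up to the fixed grid size (pad_id is illustrative).
    return codes + [pad_id] * (target - len(codes))
```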

Registry:
- vllm_omni/diffusion/registry.py: register HyperCLOVAXVisionPipeline and
  HyperCLOVAXAudioPipeline with post-process functions

Stage config:
- vllm_omni/model_executor/stage_configs/hcx_omni.yaml: 3-stage config
  Stage 0: LLM thinker (TP=4, GPUs 0-3), Stage 1: vision decoder (GPU 4),
  Stage 2: audio decoder (GPU 5)

Bug fixes for HyperCLOVAX compatibility:
- diffusion/request.py: add extra dict field to OmniDiffusionRequest so
  vision_tokens/audio_tokens from stage input processors reach the pipeline
- entrypoints/async_omni_diffusion.py: extract OmniTokensPrompt.additional_information
  into OmniDiffusionRequest.extra before creating request
- entrypoints/omni_stage.py: skip empty engine inputs (text-only requests where
  thinker2vision_decoder/thinker2audio_decoder return [])
- entrypoints/async_omni.py: handle skipped sentinel in _process_single_result
  so text-only requests complete without crashing on Stage 1/2
- hcx_omni.yaml: guidance_scale 3.5→0.75, num_inference_steps 30→50
  (matches OmniServe production defaults; 3.5 caused over-amplified
  autoguidance → shrunken/degraded output images)
- registry.py: HCX Omni diffusion model registration fix

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Wire HyperCLOVAXAudioPipeline as Stage 2 in hcx_omni.yaml
- GPU 5 assigned for audio decoder (Unit-BigVGAN / NCCosybigvganDecoder)
- Add runtime edge 0->2 (thinker -> audio decoder)
- Implement post-generation PCM chunk streaming for audio output
  (4800 samples / 200ms per SSE event @ 24kHz, int16 base64-encoded)
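The chunking arithmetic above (4800 samples = 200 ms at 24 kHz, int16, base64) can be sketched as below. This is a standalone illustration of the format, not the actual streaming code in the vllm-omni entrypoints.

```python
import base64
import struct

SAMPLE_RATE = 24_000   # Hz
CHUNK_SAMPLES = 4_800  # 200 ms per SSE event at 24 kHz

def pcm_chunks(samples):
    """Yield base64-encoded little-endian int16 PCM chunks of up to 200 ms.
    `samples` is a sequence of floats in [-1.0, 1.0]."""
    for start in range(0, len(samples), CHUNK_SAMPLES):
        window = samples[start:start + CHUNK_SAMPLES]
        # Scale float PCM to int16 with clipping at the rails.
        ints = [max(-32768, min(32767, int(x * 32767))) for x in window]
        raw = struct.pack(f"<{len(ints)}h", *ints)
        yield base64.b64encode(raw).decode("ascii")
```

One second of audio therefore produces five events; each full chunk decodes to 9600 bytes of raw PCM.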

Refs: github.com/vllm-project/pull/869 (already incorporated)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- config/model.py: try/except fallback for AttentionBackendEnum import
  (vllm.v1.attention.backends.registry absent in older vllm builds)
- pipeline_hyperclovax_audio.py: return actual named_parameters() from
  load_weights() when using MAR checkpoint so diffusers_loader strict
  check passes (weights loaded eagerly in __init__ via MAR extraction)
- qwen3_omni_moe_thinker.py, qwen2_5_omni_thinker.py: try/except stubs
  for check_interleaved_audio_video and merge_interleaved_embeddings
  which are absent in older vllm qwen2_5_omni_thinker; these symbols
  are only exercised by Qwen models, not HyperCLOVAX
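The import-fallback pattern used in config/model.py looks roughly like this. The try-branch module path follows the commit message; the fallback enum body is purely illustrative, since the commit does not show what the stand-in contains.

```python
import enum

try:
    # Present in newer vllm builds only.
    from vllm.v1.attention.backends.registry import AttentionBackendEnum
except ImportError:
    class AttentionBackendEnum(enum.Enum):
        """Minimal stand-in for older vllm builds that lack the registry
        module (members here are illustrative, not exhaustive)."""
        FLASH_ATTN = "FLASH_ATTN"
        XFORMERS = "XFORMERS"
```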

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add runtime edge from:1 to:2 (required for Stage-2 connector init;
  without it AsyncOrchestrator cannot route to audio decoder at runtime)
- Change model_subdir to model for Stage-2 engine_args to match
  total-poc working reference config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
HyperCLOVAXAudioPipeline (diffusion) stores audio in multimodal_output
directly (OmniRequestOutput.from_diffusion), not in outputs[0].multimodal_output
like LLM pipelines. Fix three locations:

1. _create_audio_choice (non-streaming): use omni_outputs.multimodal_output
   when final_res.outputs is empty (diffusion path).
2. Streaming audio path: same fix for _final_res.outputs[0].
3. Both loops (for output in final_res.outputs): fall back to single
   synthetic choice at index 0 when outputs list is empty.
4. Handle bytes audio output from HyperCLOVAXAudioPipeline post-process
   (returns WAV bytes, not tensors like Qwen3-Omni).

Also fixes audio input (A2T) regression: skip diffusion prompt extraction
when mm_data has audio content (added in previous session).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
HyperCLOVAXAudioPipeline returns WAV bytes including 44-byte header.
The previous byte-offset splitting included the header in the first
chunk, corrupting it. Fix: parse with soundfile to get float32 PCM,
then convert to int16 chunks uniformly regardless of source type
(bytes or tensor).
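The parse-then-chunk fix can be sketched as follows. The actual fix decodes with soundfile to float32 first; this dependency-free sketch uses the stdlib wave module instead and assumes 16-bit mono little-endian input, but it shows the key point: parse past the 44-byte header before splitting, never split the raw bytes.

```python
import array
import io
import wave

def wav_bytes_to_int16_chunks(wav_bytes: bytes, chunk_samples: int = 4800):
    """Parse a WAV byte blob (header included) into int16 PCM samples and
    yield fixed-size chunks, so no chunk ever contains header bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        pcm = array.array("h")  # int16 on common platforms
        pcm.frombytes(wf.readframes(wf.getnframes()))
    for start in range(0, len(pcm), chunk_samples):
        yield pcm[start:start + chunk_samples]
```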

Verified: 136 audio chunks x 200ms = 27.04s audio streamed correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- serving_chat.py: extract last input_audio base64 from request messages
  and inject as ref_audio_b64 into engine_prompt dict
- thinker2audio_decoder: read ref_audio_b64 from prompt and pass as
  ref_audio_tokens to Stage 2 (HyperCLOVAXAudioPipeline)
- hcx_omni.yaml: switch Stage 2 to NCZSCosybigvganDecoder.mar (zero-shot)
  which uses ECAPA-TDNN speaker encoder instead of finetuned ID lookup

Pipeline: input audio -> ECAPA-TDNN -> speaker embedding -> BigVGAN synthesis
matching the voice characteristics of the original speaker.
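The serving_chat.py extraction step (take the last input_audio base64 payload from the request messages) can be sketched as below. The message shape assumed here is the OpenAI-style chat format with `input_audio` content parts; the real function name and plumbing are not shown in the commit.

```python
def extract_ref_audio_b64(messages):
    """Return the base64 data of the last input_audio content part found
    in the chat messages, or None for text-only requests."""
    ref = None
    for msg in messages:
        content = msg.get("content") or []
        if isinstance(content, list):
            for part in content:
                if part.get("type") == "input_audio":
                    # Keep overwriting so the *last* input_audio wins.
                    ref = part.get("input_audio", {}).get("data")
    return ref
```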

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add Stage 2 (HyperCLOVAXAudioPipeline / NCZSCosybigvganDecoder) to hcx_omni.yaml
  with GPU 5, gpu_memory_utilization 0.4, edge 0->2 from thinker
- Fix thinker2audio_decoder: correct audio token range (128606-135167),
  remap to [0, 6561) for BigVGAN input, handle empty token case gracefully
- Fix pipeline_hyperclovax_audio.py post_process_func signature and
  incorporate PR#869 BUG FIX patches for stable audio generation
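The token-range fix can be sketched as below. The constants follow the commit message (start id 128606, codebook target [0, 6561)); the range is treated as half-open so that exactly 6561 ids map onto the codebook, and non-audio tokens are dropped.

```python
AUDIO_TOKEN_START = 128_606   # first audio token id in the thinker vocab
AUDIO_CODEBOOK_SIZE = 6_561   # BigVGAN expects codes in [0, 6561)

def remap_audio_tokens(token_ids):
    """Keep only ids in the audio token range and shift them to the
    codebook origin; returns [] for text-only output (the empty case
    the fix handles gracefully)."""
    return [t - AUDIO_TOKEN_START
            for t in token_ids
            if AUDIO_TOKEN_START <= t < AUDIO_TOKEN_START + AUDIO_CODEBOOK_SIZE]
```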
…lization

- hcx_omni.yaml: switch Stage 2 from NCZSCosybigvganDecoder (zero-shot,
  ECAPA-TDNN) to NCCosybigvganDecoder (finetuned, nn.Embedding speaker id).
  Zero-shot decoder required ref_audio (mel spectrogram) which is unavailable
  for text-only requests and incompatible with finetuned decoder path.

- pipeline_hyperclovax_audio.py: guard ref_audio processing with
  'not self.bigvgan.finetune' — finetuned decoder has no ECAPA-TDNN encoder,
  so passing ref_audio bytes would crash with 'expected 100 channels'.

- omni_stage.py: add HuggingFace modules cache (~/.cache/huggingface/modules)
  to sys.path before queue.get_nowait() in try_collect(). Stage-0 pickles
  outputs containing custom classes from transformers_modules (trust_remote_code),
  but the API server process doesn't have this path, causing deserialization
  failures that silently drop Stage-0 outputs.
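The sys.path fix amounts to the following sketch, assuming the default HF_HOME layout described in the commit message (the actual omni_stage.py change performs this before queue.get_nowait() in try_collect()).

```python
import os
import sys

def ensure_hf_modules_on_path():
    """Add the HuggingFace dynamic-modules cache to sys.path so that
    pickled Stage-0 outputs referencing transformers_modules classes
    (trust_remote_code) can be deserialized in this process."""
    hf_modules = os.path.expanduser("~/.cache/huggingface/modules")
    if hf_modules not in sys.path:
        sys.path.insert(0, hf_modules)
```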

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…quests

- hcx_omni.yaml: revert to NCZSCosybigvganDecoder.mar (zero-shot ECAPA-TDNN)
  for voice-preserving S2S synthesis. NCCosybigvganDecoder used a fixed
  integer speaker_id and lost the input speaker's voice.

- pipeline_hyperclovax_audio.py: add zero-mel fallback branch for
  finetune=False + ref_audio=None case. When a text-only request arrives
  (no input audio → no ref_audio), ECAPA-TDNN receives a zero mel tensor
  [1, num_mels, 64] instead of crashing with 'expected 100 channels'.
  S2S requests always have ref_audio so the zero-shot cloning path is
  unchanged.
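The zero-mel fallback can be sketched as below. The pipeline itself operates on torch tensors; this numpy sketch only illustrates the shape contract, where num_mels=100 matches the "expected 100 channels" error and 64 frames is the fallback length stated above.

```python
import numpy as np

def ref_mel_or_zero(ref_audio_mel, num_mels: int = 100, frames: int = 64):
    """Return the reference mel spectrogram when input audio was provided,
    else a zero mel of shape [1, num_mels, frames] so ECAPA-TDNN receives
    a validly-shaped input for text-only requests instead of crashing."""
    if ref_audio_mel is not None:
        return ref_audio_mel
    return np.zeros((1, num_mels, frames), dtype=np.float32)
```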

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
Signed-off-by: Hyunjoon Jeong <hyunjoon.jeong@navercorp.com>
Signed-off-by: Hyunjoon Jeong <with1015@unist.ac.kr>
@with1015 with1015 marked this pull request as ready for review April 6, 2026 06:33
@with1015 with1015 merged commit 8c3446a into model/hyperclovax-audio Apr 6, 2026
with1015 added a commit that referenced this pull request Apr 6, 2026