[Model]: Add HyperCLOVAX Audio Decoder (BigVGAN) support to vllm-omni by KilJaeeun · Pull Request #2512 · vllm-project/vllm-omni

KilJaeeun · 2026-04-06T04:17:51Z

Summary

Add HyperCLOVAX Audio Decoder (unit-BigVGAN vocoder) diffusion pipeline to vllm-omni.

This is a clean rebase of #869 onto current upstream/main.
#613 (vision decoder + full pipeline) stacks on top of this PR.

Changes

New Files

vllm_omni/diffusion/models/hyperclovax_audio/

pipeline_hyperclovax_audio.py — CosyVoice2 FSQ discrete unit tokens → BigVGAN waveform
hyperclovax_audio_decoder.py — BigVGAN decoder + EcapaTDNN speaker conditioning
ecapa_tdnn.py — ECAPA-TDNN speaker encoder
activations.py — Snake activation
constants.py — Mel filterbank constants
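`activations.py` provides the Snake activation used by BigVGAN-style vocoders: snake(x) = x + (1/α)·sin²(αx), where α is learned per channel. The plain-Python sketch below illustrates the formula only and is not the PR's implementation. The one-α-per-channel layout and the small `eps` guard against α = 0 are assumptions.

```python
import math
from typing import Sequence


def snake(x: float, alpha: float, eps: float = 1e-9) -> float:
    """Snake activation: x + (1 / alpha) * sin^2(alpha * x).

    eps is an assumed guard so that alpha == 0 does not divide by zero.
    """
    return x + math.sin(alpha * x) ** 2 / (alpha + eps)


def snake_channels(
    signal: Sequence[Sequence[float]], alphas: Sequence[float]
) -> list[list[float]]:
    """Apply Snake to a [channels][time] signal, using one alpha per channel."""
    if len(signal) != len(alphas):
        raise ValueError("need exactly one alpha per channel")
    return [[snake(x, a) for x in channel] for channel, a in zip(signal, alphas)]
```

The sin² term adds a periodic component on top of the identity. This periodic inductive bias is the usual motivation for using Snake in audio vocoders.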

Modified Files

vllm_omni/diffusion/registry.py — Register HyperCLOVAXAudioPipeline
pyproject.toml — Add pydub>=0.25.1
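Registering the pipeline amounts to adding one name-to-location entry. The diff quoted in the first review comment shows entries of the form `"ClassName": (module_dir, module_file, class_name)`. The sketch below shows how such a table can be resolved lazily with `importlib`. The `resolve_pipeline_class` helper, the `base_package` default, and the audio entry's tuple (inferred from the new file names by analogy with the vision entry) are all hypothetical, not the actual contents of `vllm_omni/diffusion/registry.py`.

```python
import importlib
from typing import Mapping

# Hypothetical registry shape: pipeline name -> (package dir, module file, class name).
# The audio entry is inferred from the new file names and is an assumption.
DIFFUSION_PIPELINES: dict[str, tuple[str, str, str]] = {
    "HyperCLOVAXAudioPipeline": (
        "hyperclovax_audio",
        "pipeline_hyperclovax_audio",
        "HyperCLOVAXAudioPipeline",
    ),
}


def resolve_pipeline_class(
    name: str,
    registry: Mapping[str, tuple[str, str, str]] = DIFFUSION_PIPELINES,
    base_package: str = "vllm_omni.diffusion.models",  # assumed package root
) -> type:
    """Import the pipeline module only when its class is first requested."""
    try:
        mod_dir, mod_file, cls_name = registry[name]
    except KeyError:
        raise ValueError(f"Model class {name!r} not found in registry") from None
    module = importlib.import_module(f"{base_package}.{mod_dir}.{mod_file}")
    return getattr(module, cls_name)
```

Lazy resolution keeps heavy model modules out of the import path until a config actually names them.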

PR dependency

Co-Authored-By: Hyunjoon Cho <with1015@github.com>

Test Plan

HyperCLOVAXAudioPipeline loads from HF hub
S2S pipeline generates audio output (see tests/e2e/online_serving/test_hcx_omni.py)

… vllm-omni

- Add HyperCLOVAXAudioPipeline: CosyVoice2 FSQ discrete unit → BigVGAN vocoder
- Add supporting layers: EcapaTDNN speaker encoder, activations, constants
- Register HyperCLOVAXAudioPipeline in diffusion model registry
- Add pydub>=0.25.1 dependency for audio I/O

Co-Authored-By: Hyunjoon Cho <with1015@github.com>
Signed-off-by: jaeeun.kil <jaeeun.kil@navercorp.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61deed5946


chatgpt-codex-connector · 2026-04-06T04:23:19Z

+    "HyperCLOVAXVisionPipeline": (
+        "hyperclovax_vision",
+        "pipeline_hyperclovax_vision",
+        "HyperCLOVAXVisionPipeline",
    ),


Restore removed pipeline registrations in model registry

This registry rewrite drops multiple existing model-class mappings (e.g., WanVACE, LTX2 variants, FluxKontext, Helios, Flux2, HunyuanVideo15, MagiHuman, OmniVoice, DreamIDOmni) while adding HyperCLOVAX entries; those pipeline classes still exist under vllm_omni/diffusion/models/*, so configs that previously worked will now fail in initialize_model with Model class ... not found and lose their pre/post-process hooks. Unless this is an explicit deprecation pass, this is a backward-incompatible regression introduced by this commit.
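One way to catch this kind of regression is a unit test that compares the registry before and after a change. The following minimal sketch assumes the registry is a plain dict of name → (dir, file, class) tuples, as in the quoted diff. `registry_diff` and `assert_additive` are hypothetical helpers, not part of vllm-omni.

```python
from typing import Mapping

Entry = tuple[str, str, str]  # (module_dir, module_file, class_name)


def registry_diff(
    before: Mapping[str, Entry], after: Mapping[str, Entry]
) -> tuple[set[str], set[str], set[str]]:
    """Return (dropped, retargeted, added) pipeline names between two registries."""
    dropped = set(before) - set(after)
    added = set(after) - set(before)
    retargeted = {n for n in set(before) & set(after) if before[n] != after[n]}
    return dropped, retargeted, added


def assert_additive(before: Mapping[str, Entry], after: Mapping[str, Entry]) -> None:
    """Fail if a change removes or re-points any existing registration."""
    dropped, retargeted, _ = registry_diff(before, after)
    if dropped or retargeted:
        raise AssertionError(
            f"dropped={sorted(dropped)} retargeted={sorted(retargeted)}"
        )
```

A PR that only adds pipelines, as this one intends, should pass `assert_additive(old_registry, new_registry)`.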


chatgpt-codex-connector · 2026-04-06T04:23:19Z

+            if len(units.size()) == 2 and units.size(0) == 1:
+                return DiffusionOutput(output=None, error="the underlying decoder does not support batch inference yet")


Reject all batched token tensors before unsqueeze

The batch guard is inverted: it returns an error only when units is 2-D with size(0) == 1, but true batched inputs ([B, T] with B > 1) pass through and are then unsqueezed to 3-D, which later embedding/conv code does not support. This turns an intended user-facing validation error into a downstream shape/runtime failure for pre-batched token inputs.
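The following sketch shows the corrected validation. It operates on the shape tuple so it runs without torch. Inside the pipeline, one would pass `tuple(units.shape)` and call `unsqueeze(0)` only in the 1-D case. The accepted shapes, 1-D `[T]` and 2-D `[1, T]`, are an assumption about what the decoder expects, and `normalize_unit_shape` is a hypothetical helper.

```python
from typing import Optional

BATCH_ERROR = "the underlying decoder does not support batch inference yet"


def normalize_unit_shape(
    shape: tuple[int, ...],
) -> tuple[Optional[tuple[int, ...]], Optional[str]]:
    """Validate a unit-token tensor shape before it reaches the decoder.

    Returns (normalized_shape, None) on success or (None, error_message) on rejection.
    """
    if len(shape) == 1:
        # [T] -> [1, T]: the shape units.unsqueeze(0) would produce
        return (1, shape[0]), None
    if len(shape) == 2:
        if shape[0] != 1:
            # True batch [B, T] with B > 1: reject *before* any unsqueeze
            return None, BATCH_ERROR
        return shape, None  # already [1, T]; unsqueezing would make it 3-D
    return None, f"expected unit tokens of shape [T] or [1, T], got {list(shape)}"
```

In the pipeline, a non-None error would be returned as `DiffusionOutput(output=None, error=error)`, matching the existing error path, so the user gets the validation message instead of a downstream shape failure.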


chatgpt-codex-connector · 2026-04-06T04:23:19Z

+                for l_i in layer:
+                    remove_weight_norm(l_i)


Remove weight norm from inner convs, not wrapper modules

weight_norm is applied to inner conv/deconv layers, but this code calls remove_weight_norm on wrapper modules (l_i) instead. That raises ValueError, gets swallowed by remove_weight_norm()'s broad handler, and leaves parametrizations intact; the fallback path in from_pretrained for checkpoints without weight norm can then still fail to load.


KilJaeeun requested a review from hsliuustc0106 as a code owner April 6, 2026 04:17

KilJaeeun mentioned this pull request on Apr 6, 2026 in #869, [Model]: Add HyperCLOVAX Audio Decoder support to vllm-omni (Open, 5 tasks)

KilJaeeun force-pushed the feat/hyperclovax-audio branch from 61deed5 to a8f5551 on April 6, 2026 04:18

chatgpt-codex-connector Bot reviewed Apr 6, 2026


KilJaeeun closed this Apr 6, 2026

KilJaeeun deleted the feat/hyperclovax-audio branch April 6, 2026 04:30

KilJaeeun mentioned this pull request on Apr 6, 2026 in #613, feat: Add HyperCLOVAX Vision Decoder diffusion model support to vllm-omni (Open, 4 tasks)


KilJaeeun wants to merge 1 commit into vllm-project:main from KilJaeeun:feat/hyperclovax-audio
