[Bugfix][StableAudio] Pass model_class_name to Omni() and declare audio class attrs by linyueqian · Pull Request #3406 · vllm-project/vllm-omni

linyueqian · 2026-05-07T05:58:37Z

Purpose

The L4 nightly run for test_stable_audio_quantization_and_teacache (build 9093) fails with:

AssertionError: assert 'image' == 'audio'
- audio
+ image
tests/e2e/offline_inference/test_stable_audio_expansion.py:61

#2077 added a branch in async_omni_engine._create_default_diffusion_stage_cfg that sets the default stage's final_output_type="audio" when kwargs["model_class_name"] resolves to a pipeline whose support_audio_output flag is True. It also tightened the stable-audio assertion to == "audio". The catch: OmniDiffusionConfig.enrich_config(), which auto-resolves model_class_name from model_index.json, runs after the default stage cfg is built. So when the engine branches on kwargs.get("model_class_name", None), the value is still None, the else arm fires, and the outer stage carries final_output_type="image".

The companion tests/e2e/offline_inference/test_audiox_model.py already side-steps this by passing model_class_name="AudioXPipeline" explicitly into Omni(). Mirror the same pattern in the stable-audio test.
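To make the ordering problem concrete, here is a toy model of the control flow described above. Every name in it (ToyAudioPipeline, PIPELINES, MODEL_INDEX_CLASS_NAME, create_default_stage_cfg, enrich_config, build) is hypothetical; it only mirrors the order of operations (stage cfg built from raw kwargs, model_class_name resolved afterwards), not the real async_omni_engine or OmniDiffusionConfig code.

```python
# Toy model of the ordering bug; all names are hypothetical stand-ins,
# not the real vllm-omni API.
from typing import Any, Dict, Optional


class ToyAudioPipeline:
    support_audio_output = True


class ToyImagePipeline:
    support_audio_output = False


# Hypothetical registry mapping model_class_name -> pipeline class.
PIPELINES = {
    "StableAudioPipeline": ToyAudioPipeline,
    "SomeImagePipeline": ToyImagePipeline,
}

# Hypothetical value that model_index.json would resolve to.
MODEL_INDEX_CLASS_NAME = "StableAudioPipeline"


def create_default_stage_cfg(kwargs: Dict[str, Any]) -> Dict[str, str]:
    """Branch on whatever model_class_name is in kwargs at call time."""
    name: Optional[str] = kwargs.get("model_class_name", None)
    cls = PIPELINES.get(name) if name else None
    if cls is not None and getattr(cls, "support_audio_output", False):
        return {"final_output_type": "audio"}
    return {"final_output_type": "image"}  # the else arm


def enrich_config(kwargs: Dict[str, Any]) -> None:
    """Auto-resolve model_class_name only if the caller did not pass one."""
    kwargs.setdefault("model_class_name", MODEL_INDEX_CLASS_NAME)


def build(kwargs: Dict[str, Any]) -> Dict[str, str]:
    stage_cfg = create_default_stage_cfg(kwargs)  # 1. stage cfg built first
    enrich_config(kwargs)  # 2. name resolved afterwards, too late to matter
    return stage_cfg


if __name__ == "__main__":
    print(build({"model": "stable-audio"}))
    print(build({"model": "stable-audio", "model_class_name": "StableAudioPipeline"}))
```

In this toy, the first call yields "image" even though enrich_config eventually fills in the audio pipeline name, while the second call, which passes the name explicitly as the test fix does, yields "audio".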

While I was there, also align StableAudioPipeline's class header with AudioXPipeline's by declaring the audio-output contract explicitly:

support_audio_output: ClassVar[bool] = True. This is currently inherited from the SupportAudioOutput Protocol, which works because Protocol class attributes carry through to subclasses; declaring it explicitly matches the AudioX/OmniVoice pattern and removes the dependency on Protocol default-attribute semantics.
audio_sample_rate: ClassVar[int] = 44100. diffusion_engine._audio_mm picks this up so that multimodal_output["audio_sample_rate"] is populated, and downstream consumers no longer need to hardcode 44.1 kHz for Stable Audio Open.
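A minimal sketch of the two class-attribute points, assuming only standard typing.Protocol semantics. SupportAudioOutput below is a stand-in with the same attribute name, and audio_mm is a hypothetical substitute for diffusion_engine._audio_mm that reads audio_sample_rate via getattr; the real engine code may differ.

```python
# Sketch of the class-attribute contract; SupportAudioOutput and audio_mm are
# hypothetical stand-ins for the real vllm-omni Protocol and engine helper.
from typing import Any, ClassVar, Dict, List, Optional, Protocol


class SupportAudioOutput(Protocol):
    # Protocol member with a default value; explicit subclasses inherit it.
    support_audio_output: ClassVar[bool] = True


class InheritedFlagPipeline(SupportAudioOutput):
    """Pre-PR shape: the flag comes only from the Protocol default."""


class ExplicitFlagPipeline(SupportAudioOutput):
    """Post-PR shape: the audio contract is declared on the class itself."""

    support_audio_output: ClassVar[bool] = True
    audio_sample_rate: ClassVar[int] = 44100


def audio_mm(pipeline_cls: type, audio: List[float]) -> Dict[str, Any]:
    """Hypothetical stand-in for diffusion_engine._audio_mm."""
    out: Dict[str, Any] = {"audio": audio}
    rate: Optional[int] = getattr(pipeline_cls, "audio_sample_rate", None)
    if rate is not None:
        # Only populated when the pipeline class declares a sample rate.
        out["audio_sample_rate"] = rate
    return out
```

Both classes report support_audio_output as True, which is why the flag declaration is defensive rather than load-bearing; only the explicit class produces an audio_sample_rate entry.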

Verification

On h20-server-0 against vllm-project/vllm-omni:main (3c85ca55):

| step | result |
|---|---|
| upstream main: `supports_audio_output("StableAudioPipeline")` | `True` (Protocol inheritance already provided the flag, so the class-attr addition is defensive, not load-bearing for the pass/fail) |
| upstream main: `_create_default_diffusion_stage_cfg(kwargs)` with `kwargs={"model": "..."}` | `final_output_type="image"` because `kwargs["model_class_name"]` is `None` (auto-resolution hasn't run yet), which matches the failing assertion |
| this PR: same call with `kwargs={"model": "...", "model_class_name": "StableAudioPipeline"}` | `final_output_type="audio"` ✓ |

Test Plan

tests/e2e/offline_inference/test_stable_audio_expansion.py::test_stable_audio_quantization_and_teacache should now go green on the next L4 nightly. The companion test_audiox_model is unchanged and should still pass.

The full L4 stable-audio inference run was not exercised on h20 (the 16 GB FP8 weights + tea_cache combination is L4-shaped), but the final_output_type mismatch that produced the assertion is fully reproducible with a Python-only check of _create_default_diffusion_stage_cfg's output.


…io class attrs

The L4 nightly test_stable_audio_quantization_and_teacache fails with 'image' != 'audio'. PR vllm-project#2077 added an engine branch in async_omni_engine._create_default_diffusion_stage_cfg that sets final_output_type='audio' when kwargs has a model_class_name pointing at a pipeline whose support_audio_output is True, and tightened the test assertion.

The model_class_name auto-resolution from model_index.json runs later (in OmniDiffusionConfig.enrich_config); by the time it runs, the default stage cfg's final_output_type is already locked to 'image'. Mirror the AudioX offline test, which already passes model_class_name='AudioXPipeline' explicitly.

Also align StableAudioPipeline with AudioXPipeline by declaring support_audio_output and audio_sample_rate as class attributes (the latter is read by diffusion_engine._audio_mm to populate multimodal_output['audio_sample_rate']).

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

linyueqian · 2026-05-07T06:07:52Z

Closing and reopening to retrigger Read the Docs (RTD) with the now-exposed pull/3406/head ref (RTD's earlier attempts raced GitHub's async ref propagation).

hsliuustc0106

LGTM. The root-cause analysis is clear (auto-resolution of model_class_name happens after the default stage cfg is built), and the fix is minimal and targeted. Adding explicit class attributes for support_audio_output and audio_sample_rate is a nice cleanup that aligns with the AudioXPipeline pattern.

hsliuustc0106 · 2026-05-16T13:07:34Z

Hi @linyueqian, friendly reminder — this PR hasn't had any activity (commits or reviews) in the past 9 days. 🕐

Could you please provide an update?

If you're still working on it, that's great — just let us know.
If you're blocked on something, feel free to ask for help.
If this PR is no longer being pursued, please consider closing it so we can keep the review queue manageable.

Thanks for your contribution! 🙏

linyueqian added 4 commits May 7, 2026 01:11

ci: nudge PR ref to retrigger readthedocs

2683940

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

ci: second nudge for pull/3404/head ref

827d336

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

ci: third nudge (force-resync pull/3404/head)

1b3e3e2

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

linyueqian requested a review from hsliuustc0106 as a code owner May 7, 2026 05:58

ci: nudge for pull/3406/head exposure

47354af

Signed-off-by: Yueqian Lin <linyueqian@outlook.com>

linyueqian closed this May 7, 2026

linyueqian reopened this May 7, 2026

hsliuustc0106 reviewed May 7, 2026

linyueqian wants to merge 5 commits into vllm-project:main from linyueqian:fix/stable_audio_audio_meta
